LP #1016272: pt-kill kills prepared statements without checking busy-time

Description

Reported in Launchpad by Gavin Towey. Last update: 06-09-2017 19:49:16

We're using pt-kill with the following options:
pt-kill --busy-time 120s --print --match-info "^(select|SELECT)" --interval 10

The idea is to only kill select queries running longer than 120 seconds.

However, there's a problem with the logic pt-kill uses to decide which queries to kill. If the Command is not Query, --busy-time is ignored, and the other match options alone decide whether the query is killed.

In our case, a prepared statement shows Command = Execute with a SELECT in the Info column. pt-kill kills it immediately every time, no matter how briefly it has been running.

The code currently assumes that only processlist entries with Command=Query are "busy", but that isn't the case.
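To make the failure concrete, here is a simplified Python model of the current check. It mirrors only the Perl condition quoted in this report, not the rest of pt-kill's matching code. Rows are plain dicts standing in for SHOW PROCESSLIST output, and `would_kill_current` is a hypothetical name. It is also an assumption that "longer than --busy-time" means `Time > 120`.

```python
import re

BUSY_TIME = 120                               # --busy-time 120s
MATCH_INFO = re.compile(r"^(select|SELECT)")  # --match-info


def would_kill_current(row: dict) -> bool:
    """Simplified model of the current logic (not pt-kill's actual code)."""
    # Only rows whose Command is exactly "Query" are held to --busy-time.
    if BUSY_TIME and (row.get("Command") or "") == "Query":
        if (row.get("Time") or 0) <= BUSY_TIME:
            return False
    # Every other row goes straight to the remaining match options.
    return bool(MATCH_INFO.search(row.get("Info") or ""))


# Hypothetical rows, both running for only 5 seconds.
stmt = {"Command": "Execute", "Time": 5, "Info": "SELECT * FROM t"}
query = {"Command": "Query", "Time": 5, "Info": "SELECT * FROM t"}

print(would_kill_current(stmt))   # True: busy-time was never checked
print(would_kill_current(query))  # False: under the 120 s threshold
```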

I think the right fix is to treat any thread that is not idle (any Command other than Sleep) as busy.

That is, change the current busy-time check from:
if ( $find_spec{busy_time} && ($query->{Command} || '') eq 'Query' ) {

to
if ( $find_spec{busy_time} && ($query->{Command} || '') ne 'Sleep' ) {
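The proposed condition can be modelled the same way. This is again a simplified Python sketch with a hypothetical function name, not pt-kill's code. Only Sleep rows skip the busy-time check, so a 5-second Execute is spared while a 300-second one still matches. As in the Perl, a missing Command is treated as an empty string and is therefore held to --busy-time.

```python
import re

BUSY_TIME = 120                               # --busy-time 120s
MATCH_INFO = re.compile(r"^(select|SELECT)")  # --match-info


def would_kill_proposed(row: dict) -> bool:
    """Simplified model of the proposed `ne 'Sleep'` check."""
    # Anything that is not idle is treated as busy.
    if BUSY_TIME and (row.get("Command") or "") != "Sleep":
        if (row.get("Time") or 0) <= BUSY_TIME:
            return False
    return bool(MATCH_INFO.search(row.get("Info") or ""))


# Hypothetical processlist rows.
rows = [
    {"Command": "Execute", "Time": 5, "Info": "SELECT * FROM t"},
    {"Command": "Execute", "Time": 300, "Info": "SELECT * FROM t"},
    {"Command": "Query", "Time": 5, "Info": "SELECT 1"},
    {"Command": "Sleep", "Time": 900, "Info": None},
]
for r in rows:
    print(r["Command"], r["Time"], would_kill_proposed(r))
```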

However, I understand that this isn't a trivial change. The smallest change that would fix this issue is to accept Execute as well as Query in the if statement. However, that could leave the script vulnerable to the same problem with other commands.
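A sketch of the minimal allow-list variant shows the concern. Any non-idle command missing from the list still bypasses --busy-time, for example Fetch, which the MySQL thread-command documentation lists for prepared statements. The rows and the function name below are hypothetical, and this is a simplified model rather than pt-kill's code.

```python
import re

BUSY_TIME = 120                               # --busy-time 120s
MATCH_INFO = re.compile(r"^(select|SELECT)")  # --match-info
BUSY_COMMANDS = {"Query", "Execute"}          # the "smallest change"


def would_kill_allowlist(row: dict) -> bool:
    """Simplified model: only commands in BUSY_COMMANDS get the busy-time check."""
    if BUSY_TIME and (row.get("Command") or "") in BUSY_COMMANDS:
        if (row.get("Time") or 0) <= BUSY_TIME:
            return False
    return bool(MATCH_INFO.search(row.get("Info") or ""))


# Hypothetical rows, both running for only 5 seconds.
execute_row = {"Command": "Execute", "Time": 5, "Info": "SELECT * FROM t"}
fetch_row = {"Command": "Fetch", "Time": 5, "Info": "SELECT * FROM t"}

print(would_kill_allowlist(execute_row))  # False: Execute is now covered
print(would_kill_allowlist(fetch_row))    # True: Fetch still bypasses --busy-time
```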

The MySQL documentation appears to support this view. Many threads whose Command is not Query should still be considered busy: http://dev.mysql.com/doc/refman/5.1/en/thread-commands.html

Environment

None

Activity

Manuel June 5, 2018 at 10:59 AM

Is there an expected version in which this will finally be fixed?

Jenni Snyder April 30, 2018 at 4:57 PM

This looks related to https://perconadev.atlassian.net/browse/PT-1492#icft=PT-1492, which is marked as closed in version 3.0.8, but still appears to be an issue for us after upgrading. These tickets should be linked.

Jason Corley February 14, 2018 at 1:43 AM

Passing --kill-busy-commands=Query,Execute to pt-kill yields the following error:

Option kill-busy-commands does not take an argument

This patch appears to fix it:

--- /usr/bin/pt-kill 2017-12-29 15:16:42.000000000 +0000
+++ /tmp/pt-kill 2018-02-14 00:54:51.000000000 +0000
@@ -8263,6 +8263,8 @@
 =item --kill-busy-commands
 
+type: string
+
 default: Query group: Actions
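With the patch, the option accepts a value such as Query,Execute. One plausible way to turn that value into a set of command names is sketched below. `parse_busy_commands` is a hypothetical helper, and pt-kill's actual parsing may differ, for example in how it treats whitespace or letter case.

```python
def parse_busy_commands(value: str) -> frozenset:
    """Split a comma-separated list of processlist Command names.

    Hypothetical helper: surrounding whitespace is dropped and empty
    entries are ignored; names stay case-sensitive like Command values.
    """
    return frozenset(name.strip() for name in value.split(",") if name.strip())


default_cmds = parse_busy_commands("Query")          # default shown in the patch
patched_cmds = parse_busy_commands("Query,Execute")  # value from this report

# A row is held to --busy-time only if its Command is in the set.
print("Execute" in default_cmds)  # False: Execute skips the busy-time check
print("Execute" in patched_cmds)  # True: Execute is now held to --busy-time
```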

Looking at the Travis builds (https://travis-ci.org/percona/percona-toolkit/), it appears that none of the tests in the t/ directory are being run, which might have masked this issue.

Done

Details

Created January 24, 2018 at 3:43 PM
Updated March 4, 2024 at 5:13 PM
Resolved October 8, 2018 at 2:09 AM