LP #1220341: pt-query-digest: a lot of information is removed when doing '--no-report --output'

Description

**Reported in Launchpad by Kenny Gryp last update 20-05-2017 09:52:53

In this particular case, I wanted to do some more complex groupby and ordering on a large slowlog which requires a lot of memory.
In order to be able to do this without gigabytes of memory and hours of wait time, I wanted to filter out most of the data first and create a smaller slowlog
But unfortunately all that data is lost when doing --no-report --output slowlog

What is lost:

  • SET TIMESTAMP

  • Time microsecond precision

  • Rows_affected

  • InnoDB_trx_id

  • hostname

  • Rows_read

  • Bytes_sent
    and many many more

To reproduce, run:

echo "

  1. Time: 130828 0:00:00.540005

  2. User@Host: masked[masked] @ [172.29.126.132]

  3. Thread_id: 352095 Schema: masked Last_errno: 0 Killed: 0

  4. Query_time: 0.000208 Lock_time: 0.000072 Rows_sent: 0 Rows_examined: 0 Rows_affected: 1 Rows_read: 0

  5. Bytes_sent: 11 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0

  6. InnoDB_trx_id: DF756D

  7. QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No

  8. Filesort: No Filesort_on_disk: No Merge_passes: 0

  9. InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000

  10. InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000

  11. InnoDB_pages_distinct: 5
    SET timestamp=1377640800;
    select 'we_love_daniel';
    " | ./pt-query-digest --no-report --output slowlog

Then you will get:

  1. Time: 130828 0:00:00

  2. User@Host: masked[masked] @ []

  3. Thread_id: 352095

  4. Query_time: 0.000208 Lock_time: 0.000072 Rows_sent: 0 Rows_examined: 0

  5. QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No

  6. Filesort: No Filesort_on_disk: No Merge_passes: 0

  7. InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000

  8. InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000

  9. InnoDB_pages_distinct: 5
    use masked;
    select 'we_love_daniel'
    ;

Environment

None

Smart Checklist

Activity

Show:

lpjirasync January 24, 2018 at 7:44 PM

**Comment from Launchpad by: Tom Desp on: 20-05-2017 09:52:52

I'm facing the exact same problem.

I'm using pt-query-digest (v2.2.20) slowlog output format to filter out queries based on their timestamp, to run further digests on the filtered slowlog :

ptqd 11-21.slow.log --output slowlog --no-report --filter '($event->{timestamp} > 1483957800) && ($event->{timestamp} < 1483959600)' > 11-21.halfhour.log

The major point for me is the loss of the query cache hit info.

Original log:

  1. Time: 170109 12:08:43

  2. User@Host: pidf_user[pidf_user] @ [10.226.13.132]

  3. Thread_id: 32829731 Schema: webmbpidf QC_hit: No

  4. Query_time: 0.015387 Lock_time: 0.000038 Rows_sent: 0 Rows_examined: 0

Filtered log

  1. Time: 170109 11:59:59

  2. User@Host: pest_user[pest_user] @ 10.226.13.13 []

  3. Thread_id: 32805954

  4. Query_time: 0.000221 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0

lpjirasync January 24, 2018 at 7:44 PM

**Comment from Launchpad by: Kenny Gryp on: 03-09-2013 18:03:47

Attached is a diff that fixes most of it:

This still remains:

-# Time: 130828 0:00:00.540005
+# Time: 130828 0:00:00

  1. User@Host: masked[masked] @ [172.29.126.132]

  2. Thread_id: 352095 Schema: masked Last_errno: 0 Killed: 0

  3. Query_time: 0.000208 Lock_time: 0.000072 Rows_sent: 0 Rows_examined: 0 Rows_affected: 1 Rows_read: 0
    @@ -9,5 +9,7 @@

  4. InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000

  5. InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000

  6. InnoDB_pages_distinct: 5
    +use masked;
    SET timestamp=1377640800;
    -select 'we_love_daniel';
    +select 'we_love_daniel'
    +;

lpjirasync January 24, 2018 at 7:44 PM

**Comment from Launchpad by: Kenny Gryp on: 03-09-2013 18:02:36

— pt-query-digest.orig 2013-09-03 19:22:28.423315853 +0200
+++ pt-query-digest 2013-09-03 20:01:27.827232189 +0200
@@ -5164,25 +5164,45 @@
print $fh "# Time: $event->{ts}\n";
}
if ( $event->{user} ) {

  • printf $fh "# User@Host: %s[%s] @ %s []\n",

  • $event->{user}, $event->{user}, $event->{host};
    + printf $fh "# User@Host: %s[%s] @ %s [%s]\n",
    + $event->{user}, $event->{user}, $event->{host}, $event->{ip};
    }
    if ( $event->{ip} && $event->{port} ) {
    printf $fh "# Client: $event->{ip}:$event->{port}\n";
    }
    +
    + printf $fh "#";
    if ( $event->{Thread_id} ) {

  • printf $fh "# Thread_id: $event->{Thread_id}\n";
    + printf $fh " Thread_id: $event->{Thread_id}";
    + }
    +
    + if ( defined $event->{db} ) {
    + printf $fh " Schema: $event->{db}";
    + }
    +
    + if ( defined $event->{Last_errno} ) {
    + printf $fh " Last_errno: $event->{Last_errno}";
    + }
    +
    + if ( defined $event->{Killed} ) {
    + printf $fh " Killed: $event->{Killed}";
    }
    + printf $fh "\n";

my $percona_patched = exists $event->{QC_Hit} ? 1 : 0;

printf $fh

  • "# Query_time: %.6f Lock_time: %.6f Rows_sent: %d Rows_examined: %d\n",
    + "# Query_time: %.6f Lock_time: %.6f Rows_sent: %d Rows_examined: %d Rows_affected: %d Rows_read: %d\n",
    map { $_ || 0 }

  • @{$event}{qw(Query_time Lock_time Rows_sent Rows_examined)};
    + @{$event}{qw(Query_time Lock_time Rows_sent Rows_examined Rows_affected Rows_read)};

if ( $percona_patched ) {

printf $fh
+ "# Bytes_sent: %d Tmp_tables: %d Tmp_disk_tables: %d Tmp_table_sizes: %d\n# InnoDB_trx_id: %s\n",
+ map { $_ || 0 }
+ @{$event}{qw(Bytes_sent Tmp_tables Tmp_disk_tables Tmp_table_sizes InnoDB_trx_id)};
+
+ printf $fh
"# QC_Hit: %s Full_scan: %s Full_join: %s Tmp_table: %s Tmp_table_on_disk: %s\n# Filesort: %s Filesort_on_disk: %s Merge_passes: %d\n",
map { $_ || 0 }
@{$event}{qw(QC_Hit Full_scan Full_join Tmp_table Tmp_table_on_disk Filesort Filesort_on_disk Merge_passes)};
@@ -5202,6 +5222,11 @@
if ( $event->{db} ) {
printf $fh "use %s;\n", $event->{db};
}
+
+ if ( $event->{timestamp} ) {
+ printf $fh "SET timestamp=$event->{timestamp};\n";
+ }
+
if ( $event->{arg} =~ m/^administrator command/ ) {
print $fh '# ';
}

lpjirasync January 24, 2018 at 7:44 PM

**Comment from Launchpad by: Kenny Gryp on: 03-09-2013 17:15:53

percona@debian7:/data/percona$ ./pt-query-digest --version
pt-query-digest 2.2.4

Details

Assignee

Reporter

Priority

Smart Checklist

Created January 24, 2018 at 7:44 PM
Updated December 21, 2019 at 2:32 PM