LP #1096274: pt-archiver misses data when only copying data

Description

**Reported in Launchpad by Pritam Mungse last update 17-11-2016 18:00:56

When using the '--no-delete' option in pt-archiver (to only copy data to the destination), if the column used in the WHERE clause can have the same value across multiple records, pt-archiver will miss records while copying.
e.g. create a table with the following columns: id, timestamp

data:

id            timestamp
001-25941882  1280490242452
001-25941883  1280490242452
001-25941884  1280490242452
001-25941885  1280490242452
001-25941886  1280495048106
001-25941887  1280495048108
001-25942993  1280495048108
001-25942994  1280495048108
001-25942995  1280490776956
001-25942996  1280493558772

With a batch size of 2 (--limit 2) and the --commit-each option, pt-archiver will miss copying a few records.

pt-archiver version: 2.1.7
MySQL version: 5.1.40
command:
/usr/bin/pt-archiver --source h=127.0.0.1,P=3306,D=test_db_1,u=root,p=xxx,t=test,i=idx_timestamp,A=utf8 --dest h=127.0.0.1,P=3306,D=test_db_2,u=root,p=xxx,t=test,A=utf8 --where "created < 1280493558772" --limit 2 --sentinel /tmp/sentinel --set-vars "SQL_MODE=''" --no-delete --commit-each --skip-foreign-key-checks --no-safe-auto-increment --why-quit --statistics --progress 2

It's pretty easy to reproduce. It happens because of the optimized SQL used for the next chunk (which, in the case above, adds timestamp > [max value from the last run] to the next SELECT), causing the next run to miss a few records.
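
A sketch of the two chunk queries involved (illustrative only: names follow the example above, and the exact SQL pt-archiver generates will differ):

-- Chunk 1 (--limit 2) returns 001-25941882 and 001-25941883,
-- both with timestamp 1280490242452.
SELECT id, timestamp FROM test
WHERE timestamp < 1280493558772
ORDER BY timestamp
LIMIT 2;

-- Chunk 2 resumes strictly after the largest timestamp already seen.
-- The strict ">" skips 001-25941884 and 001-25941885, which share
-- timestamp 1280490242452 with chunk 1 but were never copied.
SELECT id, timestamp FROM test
WHERE timestamp < 1280493558772 AND (timestamp > 1280490242452)
ORDER BY timestamp
LIMIT 2;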

Environment

None

Activity

lpjirasync January 24, 2018 at 2:18 PM

**Comment from Launchpad by: Carlos Salguero on: 17-11-2016 18:00:01

This is the default behavior.
From the docs:

--[no]safe-auto-increment
default: yes

Do not archive row with max AUTO_INCREMENT.

Adds an extra WHERE clause to prevent pt-archiver from removing the newest row when ascending a single-column AUTO_INCREMENT key. This guards against re-using AUTO_INCREMENT values if the server restarts, and is enabled by default.

The extra WHERE clause contains the maximum value of the auto-increment column as of the beginning of the archive or purge job. If new rows are inserted while pt-archiver is running, it will not see them.
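
In SQL terms, that extra clause looks roughly like this (a sketch; some_table, the id column, and the value 42 are illustrative, not actual pt-archiver output):

-- Read once when the job starts; say it returns 42.
SELECT MAX(id) FROM some_table;

-- Every chunk SELECT then carries the cutoff, so the newest row
-- (id = 42) is never archived; 1=1 stands in for the user's
-- --where condition.
SELECT * FROM some_table
WHERE 1=1 AND (id < 42)
ORDER BY id
LIMIT 1000;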

lpjirasync January 24, 2018 at 2:18 PM

**Comment from Launchpad by: Jesse McLaughlin on: 17-11-2016 14:14:41

Although if the test case listed above was added to the current code, it would also break. So either way you have broken tests.

lpjirasync January 24, 2018 at 2:18 PM

**Comment from Launchpad by: Carlos Salguero on: 17-11-2016 13:43:20

That 'fix' breaks all the tests

lpjirasync January 24, 2018 at 2:18 PM

**Comment from Launchpad by: Jesse McLaughlin on: 06-10-2016 10:53:29

FYI,

With reference to earlier in this thread, the following patch fixes the test case above:

$ diff pt-archiver pt-archiver-patched
6247c6247
< $first_sql .= " AND ($col < " . $q->quote_val($val) . ")";
---
> $first_sql .= " AND ($col <= " . $q->quote_val($val) . ")";
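
In terms of the generated SQL, the one-character change makes the safe-auto-increment cutoff inclusive (a sketch; table t is illustrative, and the value 2 matches the tiger_pony test case below):

-- Unpatched: strict cutoff, the newest row (id = 2) stays behind.
SELECT * FROM t WHERE (id < 2) ORDER BY id LIMIT 1000;

-- Patched: inclusive cutoff, the newest row is archived as well,
-- which is the behavior --safe-auto-increment exists to prevent.
SELECT * FROM t WHERE (id <= 2) ORDER BY id LIMIT 1000;

That would also explain why the existing tests, which are written around the default guard, fail once the patch is applied.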

lpjirasync January 24, 2018 at 2:18 PM

**Comment from Launchpad by: Jesse McLaughlin on: 04-10-2016 14:30:13

Here's a test case that reproduces this issue:

STEP 1. Create some test data (create table and inserts below):

$ mysql tmp

Server version: 5.5.37-MariaDB-log MariaDB Server
MariaDB [tmp]>

CREATE TABLE `tiger_pony` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`ref` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

insert into tiger_pony (ref) values ('Timber');
insert into tiger_pony (ref) values ('Shiny');

select * from tiger_pony;


+----+--------+
| id | ref    |
+----+--------+
|  1 | Timber |
|  2 | Shiny  |
+----+--------+

STEP 2. Run the Percona archiver utility on the table:

$ /opt/percona-toolkit-2.2.17/bin/pt-archiver -u <user> -p <password> --source h=<host>,D=tmp,t=tiger_pony --where 'true' --no-delete --limit 1000 --commit-each --file ./tmp.txt

$ cat tmp.txt
1 Timber

So the original select gives two rows, and the Percona archiver utility only outputs one.
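
That one-row output is consistent with the default --safe-auto-increment guard quoted earlier in this thread. A sketch of the effective chunk query, assuming MAX(id) = 2 is captured at job start:

SELECT id, ref FROM tiger_pony
WHERE true AND (id < 2)   -- excludes 'Shiny' (id = 2), the newest row
ORDER BY id
LIMIT 1000;

-- Only id 1 ('Timber') matches, hence the single line in tmp.txt.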

Resolution: Won't Do

Details

Created January 24, 2018 at 2:17 PM
Updated December 20, 2023 at 3:37 AM
Resolved January 24, 2018 at 2:18 PM