LP #1277016: pt-archiver returning duplicate rows when selecting data from InnoDB partitioned table
Description
Environment
Smart Checklist
Activity

lpjirasync January 24, 2018 at 8:08 PM
**Comment from Launchpad by: Frank Cizmich on: 13-04-2015 17:38:03
Fixed along with:

lpjirasync January 24, 2018 at 8:08 PM
**Comment from Launchpad by: Tibor Korocz on: 17-09-2014 09:03:00
Hi,
I have the same problem.. When using pt-archiver on an InnoDB partitioned table it gives me back duplicated rows... any update or news in this topic?
Thanks

lpjirasync January 24, 2018 at 8:08 PM
**Comment from Launchpad by: Nilnandan Joshi on: 10-09-2014 09:33:13
able to reproduce with pt-archiver 2.2.7
mysql> show create table tbl_part \G
*************************** 1. row ***************************
Table: tbl_part
Create Table: CREATE TABLE `tbl_part` (
`id` int(11) DEFAULT NULL,
`name` varchar(50) DEFAULT NULL,
`purchased` date DEFAULT NULL,
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE ( YEAR(purchased))
(PARTITION p0 VALUES LESS THAN (2000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (2005) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (2010) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (2015) ENGINE = InnoDB) */
1 row in set (0.00 sec)
mysql>
mysql> insert into tbl_part values (1, 'laptop', now());
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> insert into tbl_part values (2, 'mobile', '2010-09-05');
Query OK, 1 row affected (0.01 sec)
mysql> insert into tbl_part values (3, 'ipad', '2003-09-05');
Query OK, 1 row affected (0.02 sec)
mysql> insert into tbl_part values (4, 'mouse', '2005-09-05');
Query OK, 1 row affected (0.01 sec)
mysql> insert into tbl_part values (5, 'keyboard', '2007-09-05');
Query OK, 1 row affected (0.01 sec)
mysql> insert into tbl_part values (6, 'monitor', '2007-07-05');
Query OK, 1 row affected (0.02 sec)
mysql> insert into tbl_part values (7, 'cpu', '2006-09-05');
Query OK, 1 row affected (0.00 sec)
mysql> select * from tbl_part order by id;
----------------------------
id | name | purchased |
----------------------------
1 | laptop | 2014-09-10 |
2 | mobile | 2010-09-05 |
3 | ipad | 2003-09-05 |
4 | mouse | 2005-09-05 |
5 | keyboard | 2007-09-05 |
6 | monitor | 2007-07-05 |
7 | cpu | 2006-09-05 |
----------------------------
7 rows in set (0.00 sec)
mysql>
nilnandan@Dell-XPS:~$ pt-archiver --source u=root,h=localhost,D=test,t=tbl_part,i=id --file "/home/nilnandan/Data.txt" --where "purchased > '2005-04-02' and purchased < '2013-07-02'" --ask-pass --no-delete --statistics --why-quit --limit 5
Enter password:
Started at 2014-09-10T14:58:08, ended at 2014-09-10T14:58:08
Source: D=test,h=localhost,i=id,p=...,t=tbl_part,u=root
SELECT 9
INSERT 0
DELETE 0
Action Count Time Pct
select 3 0.0012 28.97
commit 10 0.0007 17.73
print_file 9 0.0004 9.81
other 0 0.0018 43.49
Exiting because there are no more rows.
nilnandan@Dell-XPS:~$
nilnandan@Dell-XPS:~$ cat Data.txt
4 mouse 2005-09-05
5 keyboard 2007-09-05
6 monitor 2007-07-05
7 cpu 2006-09-05
2 mobile 2010-09-05
4 mouse 2005-09-05
5 keyboard 2007-09-05
6 monitor 2007-07-05
7 cpu 2006-09-05
nilnandan@Dell-XPS:~$
nilnandan@Dell-XPS:~$ pt-archiver --version
pt-archiver 2.2.7
nilnandan@Dell-XPS:~$

lpjirasync January 24, 2018 at 8:08 PM
**Comment from Launchpad by: James on: 13-08-2014 16:42:34
Hi...is there any update on this? I opened it 6 months ago and it looks like it still hasn't been triaged.
Thanks.

lpjirasync January 24, 2018 at 8:08 PM
**Comment from Launchpad by: I C on: 12-03-2014 10:49:21
I'm also encountering this issue (although for me it doesn't seem to matter if the original table is MyISAM or InnoDB). My source table is also partitioned. Another difference is that I'm copying the rows to another DB instead of to a file.
Here is the command I'm running:
pt-archiver --source ... --ask-pass --no-delete --dest ... --where "CREATION_TIME >= '2000-01-01 12:00:01' AND CREATION_TIME < '2014-03-01 00:00:00'" --txn-size 1000000000 --limit 1000 --ignore --statistics --why-quit
Counting the rows with the condition I give the archiver returns 1405 rows. (SELECT COUNT FROM source_table WHERE CREATION_TIME >= '2000-01-01 12:00:01' AND CREATION_TIME < '2014-03-01 00:00:00'
I tried using the Archive engine for the destination table and got duplicate rows:
SELECT 1695
INSERT 1695
I tried working around this by changing the destination table's engine to InnoDB and defining the same primary key as the original table has, but this time I got LESS rows than expected:
SELECT 1695
INSERT 1228
pt-archiver version is 2.2.7
Both source and destination DB are running on Percona Server 5.5 (5.5.35-33 for the source DB, 5.5.27-28 for the destination DB).
Details
Details
Assignee
Reporter

Priority
Smart Checklist
Open Smart Checklist
Smart Checklist

**Reported in Launchpad by James last update 13-04-2015 17:38:22
When using pt-archiver on an InnoDB partitioned table (this particular table has 50m+ rows) it was returning many more rows than using the exact same where clause with mysql from the command line. After a bit of digging, I found that pt-archiver was duplicating rows when outputting the archive data into a file.
I then tried the exact same pt-archiver command with the same table data put into a MyIsam table with the exact same schema, and the row count returned was exactly the same as the same command put into a mysql command. This to me looks like pt-archiver has a problem with InnoDB partitioned tables, as I also see another bug to do with --purge not working properly with partitioned tables (#1044167).
Commands used:
pt-archiver --source u=someuser,h=localhost,D=somedb,t=SomeInnoDBTable,i=anIndex --file "largeData.txt" --where "someDate > '2013-04-02' and someData < '2013-07-02'" --password=somepass --no-delete --statistics --why-quit --limit 20000000 --charset utf8
This returned a few thousand more rows than the below mysql command
mysql -usomeuser -hlocalhost -psomepass -e "SELECT count from somedb.someInnoDBTable force index(valuedate) where someDate > '2013-04-02' and someDate < '2013-07-02'"
pt-archiver --source u=someuser,h=localhost,D=somedb,t=SomeMyIsamTable,i=anIndex --file "largeData.txt" --where "someDate > '2013-04-02' and someData < '2013-07-02'" --password=somepass --no-delete --statistics --why-quit --limit 20000000 --charset utf8
mysql -usomeuser -hlocalhost -psomepass -e "SELECT count from somedb.someMyIsamTable force index(valuedate) where someDate > '2013-04-02' and someDate < '2013-07-02'"
These both returned the same number of rows when using the MyIsam table
pt-archiver version used is 2.2.2
Mysql version is 14.14, distrib 5.5.11