LP #1253537: [FR] Add recursion-method=dsn in pt-table-sync

Description

**Reported in Launchpad by Przemek last update 27-04-2015 15:20:32

In some cases, available recursion methods in pt-table-sync are not capable of discovering the slave hosts. The mysqlsandbox replication sandbox is a good example, where both TCP port and report_host are custom:

master [localhost] {msandbox} ((none)) > show processlist;


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Id

User

Host

db

Command

Time

State

Info

Rows_sent

Rows_examined

Rows_read


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1

rsandbox

localhost:49943

NULL

Binlog Dump

73470

Master has sent all binlog to slave; waiting for binlog to be updated

NULL

0

0

0

2

rsandbox

localhost:49944

NULL

Binlog Dump

73468

Master has sent all binlog to slave; waiting for binlog to be updated

NULL

0

0

0

125

msandbox

localhost

NULL

Query

0

NULL

show processlist

0

0

0


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 rows in set (0.00 sec)

master [localhost] {msandbox} ((none)) > show slave hosts;
---------------------------------------+

Server_id

Host

Port

Master_id

---------------------------------------+

101

SBslave1

23189

1

102

SBslave2

23190

1

---------------------------------------+
2 rows in set (0.00 sec)

In the same case, you can successfully use --recursion-method=dsn=... in pt-table-checksum.

Please, implement --recursion-method=dsn also in pt-table-sync.

Environment

None

Activity

Show:

lpjirasync 
January 24, 2018 at 5:47 PM

**Comment from Launchpad by: Frank Cizmich on: 27-04-2015 15:20:31

Update.
After examining both pt-table-checksum and pt-table-table-sync code it seems a direct transfer of this feature is non trivial, being both code bases very different (implemented years appart).
Will continue examining the posibility though since the request is reasonable.

lpjirasync 
January 24, 2018 at 5:47 PM

**Comment from Launchpad by: Fernando Ipar on: 23-04-2015 12:53:09

Frank: That is not a workaround, at least not in all cases.

A typical use of recursion-method=dsns for me is when I want to exclude some replicas from being monitored by pt-table-checksum. Imagine intentionally delayed replicas (via pt-slave-delay): I don't want pt-table-checksum to wait for those to catch up before proceeding with the checksum process. Another case is when replicas are using non standard ports, or they don't all use the same credentials.

So if master M has replicas R1, R2 and R3, of which R3 is delayed, I'll usually create the percona.dsns table and only insert rows for R1 and R2.

That works well for pt-table-checksum, but since pt-table-sync does not support the option, if the delayed replica R3 is also on a non standard port and/or using different credentials, pt-table-sync will fail with an error like this:

[root@master ~]# pt-table-sync --print --replicate percona.checksums --databases dbname localhost
Cannot connect to h=some.example.ip.address

I think the only workaround would be to run --sync-to-master from some.example.ip.address directly, but in some cases, that can fail too (i.e. if the replication connection from replica to master is via an ssh tunnel, meaning the master is also on localhost but on a different port. It seems in that case pt-table-sync, when trying to connect to the master, connects to localhost:3306, which is the slave itself).

That means that, to me, there is no reliable workaround for this problem.

lpjirasync 
January 24, 2018 at 5:47 PM

**Comment from Launchpad by: Francisco Bordenave on: 09-04-2015 18:38:00

I think --recursion-method=dsn should be ported from pt-table-checksum I have a case (using tungsten replicator) where SHOW PROCESSLIST and SHOW SLAVE HOSTS doesn't work so using --replicate is not working.

Using PTDEBUG=1 throws:

  1. MasterSlave:6641 21976 Recursion methods: processlist hosts

  2. MasterSlave:6651 21976 Connected to h=masterxxx.net

  3. MasterSlave:6660 21976 SELECT @@SERVER_ID

  4. MasterSlave:6662 21976 Working on server ID 4000

  5. TableChecksum:4931 21976 SELECT db, tbl, CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM finance.cksums_finance_db_server WHERE master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)

  6. MasterSlave:6695 21976 Looking for slaves on h=masterxxx.net using methods processlist hosts

  7. MasterSlave:6702 21976 Finding slaves with _find_slaves_by_processlist

  8. MasterSlave:6764 21976 DBI::db=HASH(0x6f4a2b0) SHOW GRANTS FOR CURRENT_USER()

  9. MasterSlave:6794 21976 DBI::db=HASH(0x6f4a2b0) SHOW PROCESSLIST

  10. MasterSlave:6702 21976 Finding slaves with _find_slaves_by_hosts

  11. MasterSlave:6736 21976 DBI::db=HASH(0x6f4a2b0) SHOW SLAVE HOSTS

  12. MasterSlave:6707 21976 Found 0 slaves

  13. pt_table_sync:11009 21976 Disconnected dbh DBI::db=HASH(0x6f4a2b0)

  14. pt_table_sync:11009 21976 Disconnected dbh DBI::db=HASH(0x6f4fbb0)

However pt-table-checksum --replicate-check-only --replicate=percona.cksums reports differences.

lpjirasync 
January 24, 2018 at 5:47 PM

**Comment from Launchpad by: Frank Cizmich on: 25-09-2014 22:13:46

It is possible to use pt-table-checksum first, with --recursion-method=dsn ,
and then run pt-table-sync using the --replicate option to use the data gathered in the checksums table

see:
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html#cmdoption-pt-table-sync--replicate

lpjirasync 
January 24, 2018 at 5:47 PM

**Comment from Launchpad by: Matthew B on: 25-09-2014 19:57:58

Is there a workaround? This seems more like a bug.

Details

Assignee

Reporter

Priority

Fix versions

Created January 24, 2018 at 5:47 PM
Updated May 18, 2024 at 12:30 AM