pt-heartbeat doesn't reconnect for check-read-only

Description

Steps to reproduce:

1. Start pt-heartbeat like:

pt-heartbeat --update --database=percona --create-table --host=127.0.0.1 --port=5725 --user=msandbox --password=msandbox --replace --check-read-only

2. Kill the pt-heartbeat connection:

mysql [localhost:5725] {msandbox} ((none)) > show processlist;
+----+----------+-----------------+---------+---------+------+----------+------------------+-----------+---------------+
| Id | User     | Host            | db      | Command | Time | State    | Info             | Rows_sent | Rows_examined |
+----+----------+-----------------+---------+---------+------+----------+------------------+-----------+---------------+
| 16 | msandbox | localhost       | NULL    | Query   |    0 | starting | show processlist |         0 |             0 |
| 17 | msandbox | localhost:42608 | percona | Sleep   |    0 |          | NULL             |         0 |             0 |
+----+----------+-----------------+---------+---------+------+----------+------------------+-----------+---------------+
2 rows in set (0.00 sec)

mysql [localhost:5725] {msandbox} ((none)) > kill connection 17;
Query OK, 0 rows affected (0.00 sec)
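
To repeat this step without reading the process list by hand, the connection Id can be extracted from the tabular output. This is only a sketch: the function name `heartbeat_connection_ids` is hypothetical, and it assumes the pt-heartbeat session is the idle (`Sleep`) connection using the `percona` database, as in the output above.

```shell
# Hypothetical helper: read tabular SHOW PROCESSLIST output on stdin and
# print the Id of every idle (Sleep) connection whose db column matches $1.
heartbeat_connection_ids() {
  awk -F'|' -v db="${1:-percona}" '
    NF > 6 {
      # Columns after splitting on "|": $2 = Id, $5 = db, $6 = Command
      id = $2; d = $5; cmd = $6
      gsub(/ /, "", id); gsub(/ /, "", d); gsub(/ /, "", cmd)
      if (d == db && cmd == "Sleep") print id
    }'
}

# Intended use (commented out; needs a running sandbox on port 5725):
# mysql -t --host=127.0.0.1 --port=5725 --user=msandbox --password=msandbox \
#   -e 'SHOW PROCESSLIST' | heartbeat_connection_ids percona
```

Each printed Id can then be passed to `KILL CONNECTION`.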

pt-heartbeat then enters an endless loop of errors:

...
Lost connection to MySQL server during query [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
MySQL server has gone away [for Statement "SELECT @@global.read_only"]
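
The expected behaviour, reconnecting before retrying the check instead of repeating the query on a dead handle, can be sketched as a small retry loop. This is an illustration in shell, not pt-heartbeat's actual code (the tool is written in Perl); `run_with_reconnect`, the `RETRY_DELAY` variable, and the argument order are all hypothetical.

```shell
# Hypothetical sketch: run a check command and, whenever it fails,
# call a reconnect command before trying again.
# Usage: run_with_reconnect MAX_TRIES RECONNECT_CMD CHECK_CMD [ARGS...]
run_with_reconnect() {
  max_tries="$1"
  reconnect="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    if "$@"; then
      return 0                  # check succeeded
    fi
    "$reconnect"                # connection presumed lost: reconnect first
    sleep "${RETRY_DELAY:-1}"   # pause before the next attempt
    attempt=$((attempt + 1))
  done
  return 1                      # give up after max_tries failed attempts
}
```

The point of the sketch is the ordering: a failed check triggers a reconnect before the next attempt, so a killed connection costs one failed iteration rather than an endless stream of "MySQL server has gone away" errors.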

Environment

None

AFFECTED CS IDs

275153

Activity

Iwo Panowicz June 7, 2020 at 1:19 PM

There is another use case in which pt-heartbeat should reconnect.

Steps to reproduce:

  • non-root user

  • MySQL started with read_only=1

  • pt-heartbeat started as

    pt-heartbeat --update --database=percona --host=127.0.0.1 --port=5534 --user=msandbox --password=msandbox --replace --check-read-only --no-version-check

After the connection is killed, pt-heartbeat exits with exit code 2:

# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
# pt_heartbeat:5969 19725 Sleeping for 1.0 seconds
DBD::mysql::db selectrow_array failed: Lost connection to MySQL server during query [for Statement "SELECT @@global.read_only"] at ./pt line 6469.

Carlos Salguero May 19, 2020 at 12:49 PM

I cannot reproduce this.
I killed the connection three times and pt-heartbeat reconnected every time.

mysql> show processlist;
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
| Id   | User     | Host            | db     | Command     | Time  | State                                                         | Info             |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
|    8 | msandbox | localhost:44060 | NULL   | Binlog Dump | 84706 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 8389 | msandbox | localhost:38702 | NULL   | Sleep       |     4 |                                                               | NULL             |
| 8393 | msandbox | localhost:38714 | NULL   | Sleep       |     4 |                                                               | NULL             |
| 8394 | msandbox | localhost:38718 | NULL   | Sleep       |     4 |                                                               | NULL             |
| 8396 | msandbox | localhost:38726 | sakila | Sleep       |     0 |                                                               | NULL             |
| 8400 | msandbox | localhost:38754 | NULL   | Query       |     0 | starting                                                      | show processlist |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
6 rows in set (0.00 sec)

mysql> kill 8396
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> show processlist;
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
| Id   | User     | Host            | db     | Command     | Time  | State                                                         | Info             |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
|    8 | msandbox | localhost:44060 | NULL   | Binlog Dump | 84800 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 8400 | msandbox | localhost:38754 | NULL   | Query       |     0 | starting                                                      | show processlist |
| 8408 | msandbox | localhost:38796 | sakila | Sleep       |     0 |                                                               | NULL             |
| 8412 | msandbox | localhost:38830 | NULL   | Sleep       |     1 |                                                               | NULL             |
| 8414 | msandbox | localhost:38840 | NULL   | Sleep       |     1 |                                                               | NULL             |
| 8415 | msandbox | localhost:38842 | NULL   | Sleep       |     1 |                                                               | NULL             |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
6 rows in set (0.00 sec)

mysql> kill 8408;
Query OK, 0 rows affected (0.00 sec)

mysql> show processlist;
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
| Id   | User     | Host            | db     | Command     | Time  | State                                                         | Info             |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
|    8 | msandbox | localhost:44060 | NULL   | Binlog Dump | 84813 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 8400 | msandbox | localhost:38754 | NULL   | Query       |     0 | starting                                                      | show processlist |
| 8412 | msandbox | localhost:38830 | NULL   | Sleep       |     1 |                                                               | NULL             |
| 8414 | msandbox | localhost:38840 | NULL   | Sleep       |     1 |                                                               | NULL             |
| 8415 | msandbox | localhost:38842 | NULL   | Sleep       |     1 |                                                               | NULL             |
| 8418 | msandbox | localhost:38870 | sakila | Sleep       |     0 |                                                               | NULL             |
+------+----------+-----------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
6 rows in set (0.00 sec)

mysql> kill 8418;
Query OK, 0 rows affected (0.00 sec)

Sveta Smirnova May 5, 2020 at 10:39 AM

Workaround:

crontab:

* * * * * /etc/pt_heartbeat_restart.sh

script:

#!/bin/bash
# Restart pt-heartbeat if journalctl report error of:
# MySQL server has gone away [for Statement "SELECT @@global.read_only"]
journalctl -upt-heartbeat --since '1 min ago' | grep -q "MySQL server has gone away"
if [ $? -eq 0 ]; then
    systemctl restart pt-heartbeat.service
fi
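
The script above matches only the "MySQL server has gone away" message. A possible variant, sketched here under the assumption that the "Lost connection" error quoted elsewhere in this report also reaches the journal, factors the detection into a helper that matches both messages. The function name `log_shows_lost_connection` is hypothetical.

```shell
# Hypothetical helper: succeed if the log text on stdin contains either of
# the two connection errors quoted in this report.
log_shows_lost_connection() {
  grep -q -e 'MySQL server has gone away' \
          -e 'Lost connection to MySQL server during query'
}

# Intended use (commented out; needs systemd and a pt-heartbeat unit):
# if journalctl -upt-heartbeat --since '1 min ago' | log_shows_lost_connection; then
#     systemctl restart pt-heartbeat.service
# fi
```

Keeping the match in a function makes it possible to check the patterns against sample log lines without touching the service.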
Done

Details

Needs Review

Yes

Needs QA

Yes

Time tracking

7h logged

Created April 21, 2020 at 10:36 AM
Updated February 29, 2024 at 8:59 PM
Resolved May 20, 2020 at 2:52 PM