mysqld got signal 11 when replication master comes back after it was down.

Description

Hi,

We have just upgraded our infrastructure from Percona Server 5.7 to 8.0.18-9. Our servers run Ubuntu 18.04 (Bionic).

The packages that we use from Percona are:

percona-server-client 8.0.18-9-1.bionic amd64
percona-server-common 8.0.18-9-1.bionic amd64
percona-server-rocksdb 8.0.18-9-1.bionic amd64
percona-server-server 8.0.18-9-1.bionic amd64
percona-server-tokudb 8.0.18-9-1.bionic amd64

The setup is:

MasterServer (on top of the chain)

  • Slave

  • Slave

  • Slave

  • Slave

  • ...

If we restart the master, the slaves acknowledge that the master is gone with the following message in the logs:

2020-02-19T13:38:31.520367Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <9f2573cb-4374-11ea-9cab-00163e89a476> : Error
2020-02-19T13:38:31.520992Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-19T13:38:31.521196Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@SLAVEIP:3306' - retry-time: 60 retries: 1 message: Can't connect to MySQL server on 'SLAVEIP' (111), Error_code: MY-002003

At this point the slaves are still running. However, as soon as the master is available again, the slaves crash with this in the logs:

13:39:31 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f3f8400e000
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f4416b8f598 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x55a7f21ec8bd]
/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x55a7f143d233]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f4416642890]
/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x55a7f21e4484]
/usr/sbin/mysqld(+0x1196c85) [0x55a7f141fc85]
/usr/sbin/mysqld(mysql_real_connect+0x9e) [0x55a7f14234ce]
/usr/sbin/mysqld(+0x119be89) [0x55a7f1424e89]
/usr/sbin/mysqld(+0x1c99827) [0x55a7f1f22827]
/usr/sbin/mysqld(+0x1c9a437) [0x55a7f1f23437]
/usr/sbin/mysqld(handle_slave_io+0xbc5) [0x55a7f1f2dc95]
/usr/sbin/mysqld(+0x2442819) [0x55a7f26cb819]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f44166376db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f441464188f]
Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 4
Status: NOT_KILLED
Please help us make Percona Server better by reporting any bugs at https://bugs.percona.com/
You may download the Percona Server operations manual by visiting http://www.percona.com/software/percona-server/. You may find information in the manual which will help you identify the cause of the crash.
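For anyone watching a fleet of slaves for this, a minimal sketch of a check that pulls the mysqld symbol names out of such a trace could look like the following (a hypothetical helper, not part of any Percona tooling; the marker and frame format are taken from the log above):

```python
import re

CRASH_MARKER = "mysqld got signal 11"

def find_crash_frames(log_text):
    """Return mysqld symbol names from the backtrace after a signal-11 marker, or []."""
    if CRASH_MARKER not in log_text:
        return []
    tail = log_text.split(CRASH_MARKER, 1)[1]
    # Frames look like: /usr/sbin/mysqld(symbol_name...+0x..) [0x..];
    # capture only the leading identifier of each symbolized frame.
    return re.findall(r"/usr/sbin/mysqld\((\w+)", tail)

# Abbreviated sample built from the trace in this report:
sample = (
    "13:39:31 UTC - mysqld got signal 11 ;\n"
    "/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x55a7f21ec8bd]\n"
    "/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x55a7f143d233]\n"
    "/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x55a7f21e4484]\n"
)
frames = find_crash_frames(sample)
# mysql_compress_context_init showing up here is the signature of this bug
```

Frames with no symbol (the bare `+0x...` offsets) are skipped, which is fine for matching the `mysql_compress_context_init` signature.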

 

We have servers with RAM from 20 GB to 400 GB; all of our servers have much more RAM than MySQL is using, and all of them hit the bug when the master goes away and then comes back.

I attached an example of our my.cnf files; different servers have different InnoDB and TokuDB cache settings, but the rest of the settings are basically the same.

 

The issue is reproducible every time we restart the master.

Environment

None

Attachments

  • 1 attachment, added 19 Feb 2020, 02:12 PM


Activity


Lalit Choudhary March 4, 2020 at 10:30 AM

This is the MySQL 8 crash with MASTER_COMPRESSION_ALGORITHMS / slave_compressed_protocol.

As I can see from the provided my.cnf file, slave_compressed_protocol is enabled.
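For context, the relevant my.cnf fragment would look something like this (a sketch: slave_compressed_protocol is the real server variable, but the surrounding layout is an assumption, not the reporter's actual file):

```ini
[mysqld]
# Enabling this on the replica is what the PS-6876 crash is tied to.
# Setting it to 0 (or removing it) is the obvious mitigation to try
# until a fixed release is available.
slave_compressed_protocol = 1
```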

We already identified and verified this issue in https://jira.percona.com/browse/PS-6876

Please follow up on https://perconadev.atlassian.net/browse/PS-6876

Closing this ticket as a duplicate.

Atanas Stoyanov February 20, 2020 at 11:52 AM

Hi,

Our topology has two masters, behind which we have multiple slaves, which in turn are masters for others. However, I am able to recreate the issue every time on a completely separate environment with only two servers: one master and one slave behind it.

The UUIDs of both servers are:
MASTER: 93636590-5338-11ea-8954-00163e7ab2bc
SLAVE: df136b5e-5339-11ea-b146-00163e533740
The server IDs are also different:
MASTER: 99999
SLAVE: 99998

Other than the standard DBs that come with the initialization of MySQL, I have created an empty DB with just one table, just so I can verify that the replication is working.

The process to recreate this is:

  1. Gracefully stopped MySQL on the master (systemctl stop mysql), and the master logged this:

2020-02-20T11:21:57.288147Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.18-9) Percona Server (GPL), Release '9', Revision '53e606f'.
  2. The slave log had this:

2020-02-20T11:21:55.733774Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <df136b5e-5339-11ea-b146-00163e533740> : Error
2020-02-20T11:21:55.734398Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-20T11:21:55.734564Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@MASTERIP:3306' - retry-time: 60 retries: 1 message: Can't connect to MySQL server on 'MASTERIP' (111), Error_code: MY-002003
  3. Next I started the master and the master logged this:

2020-02-20T11:22:16.717109Z 0 [Warning] [MY-011069] [Server] The syntax '--relay-log-info-file' is deprecated and will be removed in a future release.
2020-02-20T11:22:16.718856Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18-9) starting as process 17224
2020-02-20T11:22:18.189924Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-02-20T11:22:18.210549Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.18-9' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release '9', Revision '53e606f'.
2020-02-20T11:22:18.337508Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
  4. Some seconds later the slave detected the presence of the master and logged this:

11:22:55 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f8505c0e000
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f89345f3598 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x5563174288bd]
/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x556316679233]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f89340a6890]
/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x556317420484]
/usr/sbin/mysqld(+0x1196c85) [0x55631665bc85]
/usr/sbin/mysqld(mysql_real_connect+0x9e) [0x55631665f4ce]
/usr/sbin/mysqld(+0x119be89) [0x556316660e89]
/usr/sbin/mysqld(+0x1c99827) [0x55631715e827]
/usr/sbin/mysqld(+0x1c9a437) [0x55631715f437]
/usr/sbin/mysqld(handle_slave_io+0xbc5) [0x556317169c95]
/usr/sbin/mysqld(+0x2442819) [0x556317907819]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f893409b6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f89320a588f]
Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 4
Status: NOT_KILLED
Please help us make Percona Server better by reporting any bugs at https://bugs.percona.com/
You may download the Percona Server operations manual by visiting http://www.percona.com/software/percona-server/. You may find information in the manual which will help you identify the cause of the crash.
2020-02-20T11:22:56.437083Z 0 [Warning] [MY-011069] [Server] The syntax '--relay-log-info-file' is deprecated and will be removed in a future release.
2020-02-20T11:22:56.438831Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18-9) starting as process 34218
2020-02-20T11:22:57.849381Z 0 [System] [MY-010229] [Server] Starting crash recovery...
2020-02-20T11:22:57.861261Z 0 [System] [MY-010232] [Server] Crash recovery finished.
2020-02-20T11:22:57.937573Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-02-20T11:22:57.960761Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-20T11:22:57.962095Z 4 [System] [MY-010562] [Repl] Slave I/O thread for channel '': connected to master 'OURUSER@MASTERIP:3306',replication started in log 'mysql-bin.000004' at position 155
2020-02-20T11:22:57.963994Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.18-9' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release '9', Revision '53e606f'.
2020-02-20T11:22:58.072744Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
  5. Nothing was logged in the master log file.

I made a similar test but left the master down for ~10 minutes. The slave ran without problems during that time; it only logged these messages every 60 seconds:

2020-02-20T11:40:54.569406Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@MASTERIP:3306' - retry-time: 60 retries: 2 message: Can't connect to MySQL server on 'MASTERIP' (111), Error_code: MY-002003

The moment the master appeared, and the slave detected that, the slave crashed.

This is the show slave status output you requested:

root@localhost [(none)] > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: MASTER.IP
Master_User: OURUSER
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 155
Relay_Log_File: mysqld-relay-bin.000020
Relay_Log_Pos: 369
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 745
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 99999
Master_UUID: 93636590-5338-11ea-8954-00163e7ab2bc
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)

Lalit Choudhary February 20, 2020 at 10:34 AM
Edited

Hi

Thank you for the report.
2020-02-19T13:38:31.520367Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <9f2573cb-4374-11ea-9cab-00163e89a476> : Error

Can you check the master and slave server UUIDs in auto.cnf, and the server-id values? Make sure they are different on master and slave.

Do you have one master and multiple slaves, like a chain replication topology?

Also, provide the show slave status\G output from the problematic slave.

Since the crash happens whenever the master server restarts, we would like to see the master error log in case it has any details. Can you provide the master log for the time of the restart?
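The UUID and server-id checks asked for above can be run directly (a sketch; the auto.cnf path mentioned in the comment assumes a default Ubuntu datadir):

```sql
-- Run on the master and on each slave; both values must differ
-- between every pair of servers in the topology:
SELECT @@server_uuid, @@server_id;
-- @@server_uuid is persisted in auto.cnf inside the datadir,
-- e.g. /var/lib/mysql/auto.cnf on a default Ubuntu install.
```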

 

Duplicate

Details


Created February 19, 2020 at 2:58 PM
Updated March 6, 2024 at 11:20 AM
Resolved March 4, 2020 at 10:32 AM