mysqld got signal 11 when replication master comes back after it was down.

Description

Hi,

We have just upgraded our infrastructure from Percona Server 5.7 to 8.0.18-9. Our servers run Ubuntu 18.04 (Bionic).

The packages that we use from Percona are:

percona-server-client 8.0.18-9-1.bionic amd64
percona-server-common 8.0.18-9-1.bionic amd64
percona-server-rocksdb 8.0.18-9-1.bionic amd64
percona-server-server 8.0.18-9-1.bionic amd64
percona-server-tokudb 8.0.18-9-1.bionic amd64

The setup is:

MasterServer (on top of the chain)

  • Slave

  • Slave

  • Slave

  • Slave

  • ...

If we restart the master, the slaves acknowledge that the master is gone with the following message in the logs:

2020-02-19T13:38:31.520367Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <9f2573cb-4374-11ea-9cab-00163e89a476> : Error
2020-02-19T13:38:31.520992Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-19T13:38:31.521196Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@SLAVEIP:3306' - retry-time: 60 retries: 1 message: Can't connect to MySQL server on 'SLAVEIP' (111), Error_code: MY-002003

At this point the slaves are still running. However, as soon as the master is available again, the slaves crash with this in the logs:

13:39:31 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f3f8400e000
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f4416b8f598 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x55a7f21ec8bd]
/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x55a7f143d233]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f4416642890]
/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x55a7f21e4484]
/usr/sbin/mysqld(+0x1196c85) [0x55a7f141fc85]
/usr/sbin/mysqld(mysql_real_connect+0x9e) [0x55a7f14234ce]
/usr/sbin/mysqld(+0x119be89) [0x55a7f1424e89]
/usr/sbin/mysqld(+0x1c99827) [0x55a7f1f22827]
/usr/sbin/mysqld(+0x1c9a437) [0x55a7f1f23437]
/usr/sbin/mysqld(handle_slave_io+0xbc5) [0x55a7f1f2dc95]
/usr/sbin/mysqld(+0x2442819) [0x55a7f26cb819]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f44166376db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f441464188f]
Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 4
Status: NOT_KILLED
Please help us make Percona Server better by reporting any bugs at https://bugs.percona.com/
You may download the Percona Server operations manual by visiting http://www.percona.com/software/percona-server/. You may find information in the manual which will help you identify the cause of the crash.
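For anyone watching a fleet of slaves for this, a minimal sketch of a check that pulls the mysqld symbol names out of such a trace could look like the following (a hypothetical helper, not part of any Percona tooling; the marker and frame format are taken from the log above):

```python
import re

CRASH_MARKER = "mysqld got signal 11"

def find_crash_frames(log_text):
    """Return mysqld symbol names from the backtrace after a signal-11 marker, or []."""
    if CRASH_MARKER not in log_text:
        return []
    tail = log_text.split(CRASH_MARKER, 1)[1]
    # Frames look like: /usr/sbin/mysqld(symbol_name...+0x..) [0x..];
    # capture only the leading identifier of each symbolized frame.
    return re.findall(r"/usr/sbin/mysqld\((\w+)", tail)

# Abbreviated sample built from the trace in this report:
sample = (
    "13:39:31 UTC - mysqld got signal 11 ;\n"
    "/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x55a7f21ec8bd]\n"
    "/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x55a7f143d233]\n"
    "/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x55a7f21e4484]\n"
)
frames = find_crash_frames(sample)
# mysql_compress_context_init showing up here is the signature of this bug
```

Frames with no symbol (the bare `+0x...` offsets) are skipped, which is fine for matching the `mysql_compress_context_init` signature.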

 

We have servers with RAM from 20 GB to 400 GB; all of our servers have much more RAM than MySQL is using, and all of them hit the bug when the master goes away and then comes back.

I attached an example of our my.cnf files; different servers have different InnoDB and TokuDB cache settings, but the rest of the settings are basically the same.

 

The issue is reproducible every time we restart the master.

Environment

None

Attachments

  • 1 attachment, added 19 Feb 2020, 02:12 PM


Activity


Lalit Choudhary March 4, 2020 at 10:30 AM

This is the MySQL 8 crash with MASTER_COMPRESSION_ALGORITHMS / slave_compressed_protocol.

As I can see from the provided my.cnf file, slave_compressed_protocol is enabled.
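For context, the relevant my.cnf fragment would look something like this (a sketch: slave_compressed_protocol is the real server variable, but the surrounding layout is an assumption, not the reporter's actual file):

```ini
[mysqld]
# Enabling this on the replica is what the PS-6876 crash is tied to.
# Setting it to 0 (or removing it) is the obvious mitigation to try
# until a fixed release is available.
slave_compressed_protocol = 1
```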

We already identified and verified this issue in https://jira.percona.com/browse/PS-6876

Please follow up on https://perconadev.atlassian.net/browse/PS-6876

Closing this ticket as a duplicate.

Atanas Stoyanov February 20, 2020 at 11:52 AM

Hi,

Our topology has two masters, behind which we have multiple slaves, which in turn are masters for others. However, I am able to recreate the issue every time on a completely separate environment with only two servers: one master and one slave behind it.

The UUIDs of both servers are:
MASTER: 93636590-5338-11ea-8954-00163e7ab2bc
SLAVE: df136b5e-5339-11ea-b146-00163e533740
The server IDs are also different:
MASTER: 99999
SLAVE: 99998

Other than the standard DBs that come with the initialization of MySQL, I have created an empty DB with just one table, just so I can verify that the replication is working.

The process to recreate this is:

  1. Gracefully stopped MySQL on the master (systemctl stop mysql), and the master logged this:

2020-02-20T11:21:57.288147Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.18-9) Percona Server (GPL), Release '9', Revision '53e606f'.
  2. The slave log had this:

2020-02-20T11:21:55.733774Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <df136b5e-5339-11ea-b146-00163e533740> : Error
2020-02-20T11:21:55.734398Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-20T11:21:55.734564Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@MASTERIP:3306' - retry-time: 60 retries: 1 message: Can't connect to MySQL server on 'MASTERIP' (111), Error_code: MY-002003
  3. Next I started the master and the master logged this:

2020-02-20T11:22:16.717109Z 0 [Warning] [MY-011069] [Server] The syntax '--relay-log-info-file' is deprecated and will be removed in a future release.
2020-02-20T11:22:16.718856Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18-9) starting as process 17224
2020-02-20T11:22:18.189924Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-02-20T11:22:18.210549Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.18-9' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release '9', Revision '53e606f'.
2020-02-20T11:22:18.337508Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
  4. Some seconds later the slave detected the presence of the master and logged this:

11:22:55 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f8505c0e000
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f89345f3598 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x5563174288bd]
/usr/sbin/mysqld(handle_fatal_signal+0x303) [0x556316679233]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f89340a6890]
/usr/sbin/mysqld(mysql_compress_context_init(mysql_compress_context*, enum_compression_algorithm, unsigned int)+0x4) [0x556317420484]
/usr/sbin/mysqld(+0x1196c85) [0x55631665bc85]
/usr/sbin/mysqld(mysql_real_connect+0x9e) [0x55631665f4ce]
/usr/sbin/mysqld(+0x119be89) [0x556316660e89]
/usr/sbin/mysqld(+0x1c99827) [0x55631715e827]
/usr/sbin/mysqld(+0x1c9a437) [0x55631715f437]
/usr/sbin/mysqld(handle_slave_io+0xbc5) [0x556317169c95]
/usr/sbin/mysqld(+0x2442819) [0x556317907819]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f893409b6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f89320a588f]
Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0):
Connection ID (thread ID): 4
Status: NOT_KILLED
Please help us make Percona Server better by reporting any bugs at https://bugs.percona.com/
You may download the Percona Server operations manual by visiting http://www.percona.com/software/percona-server/. You may find information in the manual which will help you identify the cause of the crash.
2020-02-20T11:22:56.437083Z 0 [Warning] [MY-011069] [Server] The syntax '--relay-log-info-file' is deprecated and will be removed in a future release.
2020-02-20T11:22:56.438831Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18-9) starting as process 34218
2020-02-20T11:22:57.849381Z 0 [System] [MY-010229] [Server] Starting crash recovery...
2020-02-20T11:22:57.861261Z 0 [System] [MY-010232] [Server] Crash recovery finished.
2020-02-20T11:22:57.937573Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-02-20T11:22:57.960761Z 4 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-02-20T11:22:57.962095Z 4 [System] [MY-010562] [Repl] Slave I/O thread for channel '': connected to master 'OURUSER@MASTERIP:3306',replication started in log 'mysql-bin.000004' at position 155
2020-02-20T11:22:57.963994Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.18-9' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release '9', Revision '53e606f'.
2020-02-20T11:22:58.072744Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/run/mysqld/mysqlx.sock' bind-address: '::' port: 33060
  5. Nothing was logged in the master log file.

I made a similar test but left the master down for ~10 minutes. The slave ran without problems during that time; it only logged these messages every 60 seconds:

2020-02-20T11:40:54.569406Z 4 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error reconnecting to master 'OURUSER@MASTERIP:3306' - retry-time: 60 retries: 2 message: Can't connect to MySQL server on 'MASTERIP' (111), Error_code: MY-002003

The moment the master appeared, and the slave detected that, the slave crashed.

This is the show slave status output you requested:

root@localhost [(none)] > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: MASTER.IP
Master_User: OURUSER
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 155
Relay_Log_File: mysqld-relay-bin.000020
Relay_Log_Pos: 369
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 745
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 99999
Master_UUID: 93636590-5338-11ea-8954-00163e7ab2bc
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)

Lalit Choudhary February 20, 2020 at 10:34 AM
Edited

Hi

Thank you for the report.
2020-02-19T13:38:31.520367Z 4 [System] [MY-010558] [Repl] Slave for channel '': received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <9f2573cb-4374-11ea-9cab-00163e89a476> : Error

Can you check the master and slave server UUIDs in auto.cnf, and the server-id values? Make sure they are different on master and slave.

Do you have one master and multiple slaves, like a chain replication topology?

Also, provide the show slave status\G output from the problematic slave.

Since the crash happens whenever the master server restarts, we would like to see the master error log in case it has any details. Can you provide the master log for the time of the restart?
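The UUID and server-id checks asked for above can be run directly (a sketch; the auto.cnf path mentioned in the comment assumes a default Ubuntu datadir):

```sql
-- Run on the master and on each slave; both values must differ
-- between every pair of servers in the topology:
SELECT @@server_uuid, @@server_id;
-- @@server_uuid is persisted in auto.cnf inside the datadir,
-- e.g. /var/lib/mysql/auto.cnf on a default Ubuntu install.
```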

 

Duplicate

Details


Created February 19, 2020 at 2:58 PM
Updated March 6, 2024 at 11:20 AM
Resolved March 4, 2020 at 10:32 AM