applying incremental backup fails with SIGSEGV
Description
Environment
Percona Server 5.7.35, Ubuntu 18.04, XtraBackup 2.4.25
Attachments
- 16 May 2023, 06:25 PM
Activity
Aaditya Dubey May 31, 2023 at 2:02 PM
Hi @Ernie Souhrada,
Thank you for the updates. Closing the report.
Ernie Souhrada May 30, 2023 at 8:35 PM
So it would appear that this was a bug in 2.4.25 that got fixed somewhere along the way with 2.4.28.
I found the DB/table with the space_id you provided and then did a SELECT * with every available index. Nothing happened. No crash, no error message, nothing. So, I guess we're all good.
mysql> SELECT SPACE, NAME FROM information_schema.INNODB_SYS_TABLESPACES WHERE SPACE = 1858\G
*************************** 1. row ***************************
SPACE: 1858
NAME: pbdata13029/board_has_pins
mysql> pager md5sum; select * from pbdata13029.board_has_pins force index (primary);
PAGER set to 'md5sum'
5bc6245c497635083cb655c7b2324d44 -
9634666 rows in set (17.58 sec)
mysql> select * from pbdata13029.board_has_pins force index (pin_id); select * from pbdata13029.board_has_pins force index (pin_type_idx); select * from pbdata13029.board_has_pins force index (cover_idx_new);
90ff3a4b80e6387ed853608aa3ff3968 -
9634749 rows in set (7.87 sec)
e0a90b25361394c689bc5d665a0d058e -
9634757 rows in set (7.87 sec)
db574a8b4d7b4472b97353f1c6743fcf -
9634763 rows in set (15.24 sec)
I mentioned that there was another replica set that saw a similar problem with applying incrementals - so I went and looked at that one, but that one is on 8.0.25 and the error message is completely different.
Since we don't really want to run multiple versions of 8.x until all of our 5.7s are gone, I just disabled incrementals for that cluster and we'll deal with it when we're done with the upgrade.
Thanks for looking into this. I think we can close it as "fixed in later version."
Marcelo Altmann May 29, 2023 at 3:45 PM
Hi @Ernie Souhrada
It was nice to meet you and discuss the action items for this issue in person last week.
Based on the core dump, I can see that we tried to load page 120875 from space 1858 and could not identify the offsets of the fields in the record that was read.
(gdb) f 11
#11 0x000055e7577aa2a3 in buf_page_io_complete (bpage=0x7f2f7c32e900, evict=evict@entry=false) at ./storage/innobase/buf/buf0buf.cc:5871
5871 ./storage/innobase/buf/buf0buf.cc: No such file or directory.
(gdb) p bpage->id
$10 = {m_space = 1858, m_page_no = 120875, m_fold = 1948376941}
Let's find out which table space ID 1858 belongs to and then try to access all rows in that table. To translate the space ID to a table name, we can use the query below:
SELECT SPACE, NAME FROM information_schema.INNODB_SYS_TABLESPACES WHERE SPACE = 1858\G
Then let's check the table by running:
SELECT * FROM table_name;
-- this will read the primary key.
For each secondary index on the table, let's run a query that forces the index to be loaded into memory by selecting only the fields in that index (covering index).
Example:
mysql> SHOW CREATE TABLE table_name;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table_name | CREATE TABLE `table_name` (
`ID` int(11) DEFAULT NULL,
`name` varchar(50) DEFAULT NULL,
`age` smallint(5) unsigned DEFAULT NULL,
`address` varchar(200) DEFAULT NULL,
KEY `name` (`name`),
KEY `name_address` (`name`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)
# PK index (key = NULL in EXPLAIN means the clustered index):
mysql> SELECT * FROM table_name;
-- results ---
mysql> EXPLAIN SELECT * FROM table_name;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | SIMPLE | table_name | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0,00 sec)
# name (key for explain name)
mysql> SELECT name FROM table_name;
-- results --
mysql> EXPLAIN SELECT name FROM table_name;
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | table_name | NULL | index | NULL | name | 53 | NULL | 3 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)
# name_address (key for explain name_address)
mysql> SELECT name, address FROM table_name;
-- results --
mysql> EXPLAIN SELECT name, address FROM table_name;
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | table_name | NULL | index | NULL | name_address | 256 | NULL | 3 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)
Executing the above steps should clarify whether the table on the original server has corruption in either the PK or a secondary index. If corruption is present, we should see either warnings in the error log or the server crashing when it attempts to read the corrupted page.
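The per-index checks described above can be generated mechanically. What follows is a minimal sketch in Python, assuming the index names and their columns have already been read from SHOW CREATE TABLE. The function name `build_index_scan_queries` and its arguments are hypothetical and not part of any MySQL or Percona tool. It only builds the SQL text; running the statements is left to the mysql client. The FORCE INDEX hint is an addition to the example above, meant to steer the optimizer when more than one index covers the selected columns, and the chosen key should still be confirmed with EXPLAIN.

```python
from typing import Dict, List


def build_index_scan_queries(table: str, indexes: Dict[str, List[str]]) -> List[str]:
    """Build one full-scan query for the primary key and one per secondary index.

    `indexes` maps each secondary index name to its column list, as shown by
    SHOW CREATE TABLE. This helper is hypothetical, for illustration only.
    """
    def quote(identifier: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    # SELECT * with no WHERE clause reads the clustered (primary key) index.
    queries = [f"SELECT * FROM {quote(table)};"]

    for name, columns in indexes.items():
        column_list = ", ".join(quote(column) for column in columns)
        # Selecting only the indexed columns allows a covering index scan;
        # the hint restricts the optimizer to the index under test.
        queries.append(
            f"SELECT {column_list} FROM {quote(table)} FORCE INDEX ({quote(name)});"
        )
    return queries


if __name__ == "__main__":
    example = {"name": ["name"], "name_address": ["name", "address"]}
    for query in build_index_scan_queries("table_name", example):
        print(query)
```

Each generated statement can be prefixed with EXPLAIN first to check that the `key` column shows the intended index before running the full scan.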
Michael Coburn May 18, 2023 at 3:57 PM
core dump uploaded
Aaditya Dubey May 18, 2023 at 9:04 AM (edited)
Hi @Ernie Souhrada,
Sure, I will connect with @Michael Coburn. Thank you.
Details
Assignee
Unassigned
Reporter
Ernie Souhrada
Needs QA
Yes
Affects versions
Priority
High

We take incremental backups every other day - one full, one incremental, one full, etc... When trying to restore an incremental on top of a full backup, sometimes (I don't know what triggers this - it doesn't happen all the time), we get this:
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
2023-05-16 06:01:32 0x7f4addc14700 InnoDB: Assertion failure in thread 139959524738816 in file rem0rec.cc line 586
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
06:01:32 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
/usr/bin/xtrabackup24(my_print_stacktrace+0x3b)[0x564c55a6722b]
/usr/bin/xtrabackup24(handle_fatal_signal+0x285)[0x564c5571e735]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f4ae06db980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f4ade47be87]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f4ade47d7f1]
/usr/bin/xtrabackup24(+0x56f237)[0x564c553da237]
/usr/bin/xtrabackup24(+0xa0c1dc)[0x564c558771dc]
/usr/bin/xtrabackup24(_Z25page_cur_parse_insert_recmPKhS0_P11buf_block_tP12dict_index_tP5mtr_t+0x33b)[0x564c558ac26b]
/usr/bin/xtrabackup24(+0xa62fd1)[0x564c558cdfd1]
/usr/bin/xtrabackup24(_Z22recv_recover_page_funcmP11buf_block_t+0xb1f)[0x564c558cfbcf]
/usr/bin/xtrabackup24(_Z20buf_page_io_completeP10buf_page_tb+0x333)[0x564c55a0a9a3]
/usr/bin/xtrabackup24(_Z12fil_aio_waitm+0x12f)[0x564c559977cf]
/usr/bin/xtrabackup24(io_handler_thread+0x28)[0x564c55808cf8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f4ae06d06db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4ade55e61f]
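For anyone comparing this backtrace against another crash, the frames can be pulled apart with a small script. This is a rough sketch in Python, not part of xtrabackup; the names `Frame`, `FRAME_RE` and `parse_backtrace` are made up for illustration. It assumes frames have the `binary(symbol+offset)[address]` shape seen above and works whether the frames are on one line or one per line.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    binary: str            # path of the executable or shared library
    symbol: Optional[str]  # function name (possibly C++-mangled), None if absent
    offset: str            # hex offset inside the symbol or binary
    address: str           # absolute hex address of the frame


# Matches e.g. /usr/bin/xtrabackup24(my_print_stacktrace+0x3b)[0x564c55a6722b]
# and symbol-less frames such as /usr/bin/xtrabackup24(+0x56f237)[0x564c553da237]
FRAME_RE = re.compile(
    r"(/\S+?)\(([A-Za-z_]\w*)?\+(0x[0-9a-f]+)\)\[(0x[0-9a-f]+)\]"
)


def parse_backtrace(text: str) -> List[Frame]:
    """Return every backtrace frame found in `text`, in order of appearance."""
    return [
        Frame(m.group(1), m.group(2), m.group(3), m.group(4))
        for m in FRAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "/usr/bin/xtrabackup24(_Z12fil_aio_waitm+0x12f)[0x564c559977cf] "
        "/usr/bin/xtrabackup24(+0xa62fd1)[0x564c558cdfd1]"
    )
    for frame in parse_backtrace(sample):
        print(frame.symbol or "<no symbol>", frame.offset)
```

The mangled symbols (those starting with `_Z`) can then be passed to `c++filt` from binutils to get readable function signatures.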
When we were taking full backups every night, I don't recall this ever occurring.
No DDL would have been taking place on the source, but it does have partitioned tables (static partitions - not partitions that are getting truncated or reorg'd). It also has tables with compressed columns, but this seems irrelevant, because the same issue has happened on another cluster that has neither of these features.
I'm going to try upgrading to 2.4.28, and I'll update the ticket after I've done so, but I didn't see anything in the release notes which would suggest that this will be fixed - the SIGSEGV bug fixed in 2.4.26 was related to temporary directories, which seems to not be happening here.