applying incremental backup fails with SIGSEGV

General

Escalation

Description

We take incremental backups every other day, alternating full, incremental, full, and so on. When restoring an incremental on top of a full backup, we sometimes get the following (I don't know what triggers it; it doesn't happen every time):

InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 2023-05-16 06:01:32 0x7f4addc14700  InnoDB: Assertion failure in thread 139959524738816 in file rem0rec.cc line 586
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
06:01:32 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.


Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
/usr/bin/xtrabackup24(my_print_stacktrace+0x3b)[0x564c55a6722b]
/usr/bin/xtrabackup24(handle_fatal_signal+0x285)[0x564c5571e735]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f4ae06db980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f4ade47be87]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f4ade47d7f1]
/usr/bin/xtrabackup24(+0x56f237)[0x564c553da237]
/usr/bin/xtrabackup24(+0xa0c1dc)[0x564c558771dc]
/usr/bin/xtrabackup24(_Z25page_cur_parse_insert_recmPKhS0_P11buf_block_tP12dict_index_tP5mtr_t+0x33b)[0x564c558ac26b]
/usr/bin/xtrabackup24(+0xa62fd1)[0x564c558cdfd1]
/usr/bin/xtrabackup24(_Z22recv_recover_page_funcmP11buf_block_t+0xb1f)[0x564c558cfbcf]
/usr/bin/xtrabackup24(_Z20buf_page_io_completeP10buf_page_tb+0x333)[0x564c55a0a9a3]
/usr/bin/xtrabackup24(_Z12fil_aio_waitm+0x12f)[0x564c559977cf]
/usr/bin/xtrabackup24(io_handler_thread+0x28)[0x564c55808cf8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f4ae06d06db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4ade55e61f]

When we were taking full backups every night, I don't recall this ever occurring.

No DDL would have been taking place on the source, but it does have partitioned tables (static partitions - not partitions that are getting truncated or reorg'd). It also has tables with compressed columns, but this seems irrelevant, because the same issue has happened on another cluster that has neither of these features.

I'm going to try upgrading to 2.4.28, and I'll update the ticket after I've done so, but I didn't see anything in the release notes which would suggest that this will be fixed - the SIGSEGV bug fixed in 2.4.26 was related to temporary directories, which seems to not be happening here.

Environment

Percona Server 5.7.35, Ubuntu 18.04, XtraBackup 2.4.25

Attachments

16 May 2023, 06:25 PM

Activity

Aaditya Dubey May 31, 2023 at 2:02 PM

Hi @Ernie Souhrada,

Thank you for the updates. Closing the report.

Ernie Souhrada May 30, 2023 at 8:35 PM

So it would appear that this was a bug in 2.4.25 that got fixed somewhere along the way with 2.4.28.

I found the DB/table with the space_id you provided and then did a SELECT * with every available index. Nothing happened. No crash, no error message, nothing. So, I guess we're all good.

mysql> SELECT SPACE, NAME FROM information_schema.INNODB_SYS_TABLESPACES WHERE SPACE = 1858\G
*************************** 1. row ***************************
SPACE: 1858
 NAME: pbdata13029/board_has_pins

mysql> pager md5sum; select * from pbdata13029.board_has_pins force index (primary);
PAGER set to 'md5sum'
5bc6245c497635083cb655c7b2324d44  -
9634666 rows in set (17.58 sec)


mysql> select * from pbdata13029.board_has_pins force index (pin_id); select * from pbdata13029.board_has_pins force index (pin_type_idx); select * from pbdata13029.board_has_pins force index (cover_idx_new);
90ff3a4b80e6387ed853608aa3ff3968  -
9634749 rows in set (7.87 sec)


e0a90b25361394c689bc5d665a0d058e  -
9634757 rows in set (7.87 sec)


db574a8b4d7b4472b97353f1c6743fcf  -
9634763 rows in set (15.24 sec)

I mentioned that there was another replica set that saw a similar problem with applying incrementals - so I went and looked at that one, but that one is on 8.0.25 and the error message is completely different.

Since we don't really want to run multiple versions of 8.x until all of our 5.7s are gone, I just disabled incrementals for that cluster and we'll deal with it when we're done with the upgrade.

Thanks for looking into this. I think we can close it as "fixed in later version."

Marcelo Altmann May 29, 2023 at 3:45 PM

Hi @Ernie Souhrada

It was nice to meet you and discuss the action items for this issue in person last week.

Based on the core dump, I can see that we tried to load page 120875 from space 1858 and could not determine the field offsets of the record that was read.

(gdb) f 11
#11 0x000055e7577aa2a3 in buf_page_io_complete (bpage=0x7f2f7c32e900, evict=evict@entry=false) at ./storage/innobase/buf/buf0buf.cc:5871
5871    ./storage/innobase/buf/buf0buf.cc: No such file or directory.
(gdb) p bpage->id
$10 = {m_space = 1858, m_page_no = 120875, m_fold = 1948376941}
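To locate that page in the tablespace file for closer inspection, the byte offset can be worked out from the page number. This is only a sketch: it assumes the default 16 KiB `innodb_page_size` and an uncompressed file-per-table tablespace, neither of which is confirmed in this ticket, and `page_byte_range` is a hypothetical helper name.

```python
from typing import Tuple

# Assumption: default InnoDB page size of 16 KiB (innodb_page_size=16384).
# If the server uses a different page size, pass it explicitly.
PAGE_SIZE = 16 * 1024


def page_byte_range(page_no: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return (start, end) byte offsets of a page inside its tablespace file.

    `end` is exclusive, so the page occupies bytes start .. end - 1.
    """
    start = page_no * page_size
    return start, start + page_size


if __name__ == "__main__":
    # Page number taken from the gdb output above.
    start, end = page_byte_range(120875)
    print(f"page 120875 occupies bytes {start}..{end - 1}")
```

With a different page size the same multiplication applies; only the `page_size` argument changes.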

Let's check which table space ID 1858 belongs to and try to access all rows in that table. To translate the space ID to a table name, we can use the query below:

SELECT SPACE, NAME FROM information_schema.INNODB_SYS_TABLESPACES WHERE SPACE = 1858\G

Then let's check the table and run:

SELECT * FROM table_name; -- this reads the primary key (clustered index)
For each secondary index on the table, let's run a query that forces that index to be loaded into memory by selecting only the fields in the index (a covering-index query).

Example:

mysql> SHOW CREATE TABLE table_name;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                                                                                                                                                                                                             |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table_name | CREATE TABLE `table_name` (
  `ID` int(11) DEFAULT NULL,
  `name` varchar(50) DEFAULT NULL,
  `age` smallint(5) unsigned DEFAULT NULL,
  `address` varchar(200) DEFAULT NULL,
  KEY `name` (`name`),
  KEY `name_address` (`name`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)


# PK index (key = NULL in EXPLAIN means the clustered index is scanned):
mysql> SELECT * FROM table_name;
-- results ---
mysql> EXPLAIN SELECT * FROM table_name;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | table_name | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0,00 sec)


# name (EXPLAIN should show key = name)
mysql> SELECT name FROM table_name;
-- results --
mysql> EXPLAIN SELECT name FROM table_name;
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | table_name | NULL       | index | NULL          | name | 53      | NULL |    3 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

# name_address (EXPLAIN should show key = name_address)
mysql> SELECT name, address FROM table_name;
-- results --
mysql> EXPLAIN SELECT name, address FROM table_name;
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | table_name | NULL       | index | NULL          | name_address | 256     | NULL |    3 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

Executing the steps above should show whether the table on the original server has corruption in either the PK or a secondary index. If corruption is present, we should see either warnings in the error log or the server crashing when it attempts to read the corrupted page.
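For a table with many indexes, the statements from the steps above can be generated rather than typed by hand. The following is a rough sketch, not a supported tool: `index_scan_queries` and `quote_ident` are hypothetical helpers, the index-to-columns mapping has to be filled in from `SHOW CREATE TABLE`, and the `FORCE INDEX` hint is an added assumption to pin each covering query to the intended index. The generated `EXPLAIN` should still be checked to confirm that `key` matches.

```python
from typing import Dict, List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def index_scan_queries(table: str, indexes: Dict[str, List[str]]) -> List[str]:
    """Build statements that read the clustered index and each secondary index.

    `indexes` maps a secondary index name to its column list, copied by hand
    from SHOW CREATE TABLE (assumption: the caller supplies it correctly).
    """
    # A plain SELECT * reads every row through the clustered index.
    statements = [f"SELECT * FROM {quote_ident(table)};"]
    for index_name, columns in indexes.items():
        cols = ", ".join(quote_ident(c) for c in columns)
        select = (
            f"SELECT {cols} FROM {quote_ident(table)} "
            f"FORCE INDEX ({quote_ident(index_name)});"
        )
        # EXPLAIN first, so the chosen key can be verified before the scan.
        statements.append(f"EXPLAIN {select}")
        statements.append(select)
    return statements


if __name__ == "__main__":
    # Index layout taken from the example table above.
    example = {"name": ["name"], "name_address": ["name", "address"]}
    for stmt in index_scan_queries("table_name", example):
        print(stmt)
```

The output can be pasted into the mysql client; any crash or error-log warning while a statement runs points at the index being read at that moment.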

Michael Coburn May 18, 2023 at 3:57 PM

core dump uploaded

Aaditya Dubey May 18, 2023 at 9:04 AM

Hi @Ernie Souhrada,

Sure, will connect with @Michael Coburn. Thank you.

Done

Details
Assignee
Unassigned
Reporter
Ernie Souhrada
Needs QA
Yes
Affects versions
2.4.25
Priority
High

Created May 16, 2023 at 6:24 PM

Updated March 6, 2024 at 6:09 PM

Resolved May 31, 2023 at 2:05 PM
