Correct incremental prepare failure with logical redo by skipping the apply of logical redos (MLOG_TABLE_DYNAMIC_META) during the incremental prepare (except the last prepare).

General

Escalation

General

Escalation

Description

pxb-1824.sh fails randomly and exposed a bug:

incremental prepare fails with assertion:

InnoDB: Assertion failure: page0page.cc:110:r >= page + ((FIL_PAGE_DATA + 36 + 2 * 10) + 2 * 5 + 8)
InnoDB: thread 140217405404928InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.percona.com/projects/PXB.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
10:37:54 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x46000
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(my_print_stacktrace(unsigned char const*, unsigned long)+0x55) [0x55b606870c85]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(handle_fatal_signal+0x2a8) [0x55b6050e344f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13f40) [0x7f86e9a82f40]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7f86e8ec0ed7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121) [0x7f86e8ea2535]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(+0x3d1e1f6) [0x55b6057b81f6]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(page_dir_find_owner_slot(unsigned char const*)+0x151) [0x55b605608ff8]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(page_cur_delete_rec(page_cur_t*, dict_index_t const*, unsigned long const*, mtr_t*)+0x320) [0x55b605604298]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(page_cur_parse_delete_rec(unsigned char*, unsigned char*, buf_block_t*, dict_index_t*, mtr_t*)+0x251) [0x55b605603f40]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(+0x3b17ad3) [0x55b6055b1ad3]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(recv_recover_page_func(bool, buf_block_t*)+0x893) [0x55b6055b2da5]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(buf_page_io_complete(buf_page_t*, bool)+0x6d3) [0x55b6052997bc]
/home/satya/WORK/pxb-8.0-orig/bld/storage/innobase/xtrabackup/test/../../../../runtime_output_directory/xtrabackup(fil_aio_wait(unsigned long)+0x202) [0x55b6053c2b20]

Environment

None

Smart Checklist

Activity

Show:

Marcelo Altmann February 12, 2021 at 11:33 AM

Problem:
PXB process MLOG_TABLE_DYNAMIC_META log events as part of prepare. This
will make table->dirty_status = METADATA_DIRTY. As part of the shutdown
procedure, we will persist those changes to mysql.ibd. MySQL server that
originated the backup will continue processing data and merging them
into memory DDBuffer. Lather when a checkpoint happens on MySQL, it
persist those changes to mysql.ibd and generate proper redo logs for
those changes. However, since we have already persisted the changes we
had until that point the layout of our mysql.ibd might not match the
layout of the redo log we try to apply and a crash can happen.

Solution:
Skip applying logical redos (MLOG_TABLE_DYNAMIC_META) during the
incremental prepare (except the last prepare).

Use metadata_from_lsn for copying delta pages of mysql.ibd since we skip
applying logical records during prepare. These logical redos are
converted to regular redo and flushed to pages in mysql.ibd when the
server process a checkpoint. So we directly take the physical changes
made to innodb_dynamic_metadata since the last backup. Hence we copy
all changes to mysql.ibd since last backup start_lsn instead of last
backup end_lsn.

Satya Bodapati January 5, 2021 at 7:03 AM

with the solution proposed, we take the physical changes (generated by MLOG_TABLE_DYNAMIC_META records). So we can safely ignore the logical redo records during pxb incremental prepare

Satya Bodapati January 5, 2021 at 6:57 AM

Right, just to add what marce said, the core problem is the nature of MLOG_TABLE_DYNAMIC_META records. These are LOGICAL REDO records. ie. applying them will cause newer regular redo + changes to actual pages. So these changes are applied post recovery.

Another problem of these type of records is merging/grouping of operations in memory and then the conversion to on-disk.

Marcelo Altmann December 30, 2020 at 3:20 PM
Edited

The problem here is that we are processing MLOG_TABLE_DYNAMIC_META during prepare of incrementals. This is marking metadata of tables as dirty and later when we finish the incremental backup, as part of the shutdown, those records get persistent to disk / mysql.ibd. The problem happens later when the server process a checkpoint during the next incremental backup, it will generate redo logs to be applied directly to mysql.ibd and since we have written to mysql.ibd as part of the previous incremental backup shutdown, the layout of the records/pages might not match.

and I have discussed a few ideas:

Have a separate redo log to store buffered changes to DD. PXB will process this file as part of incremental backup - Not chosen because this might be a bit too complicated and overengineering
Always copy mysql.ibd as part of incremental backups (not only delta) - Not chosen because mysql.ibd store a lot of other information, like gtid_executed / list of tables ... on server with a lot of tables or too many gtid gaps, the size of the incremental can be big.
Always copy delta changes from mysql.ibd only based on LSN from previous backup - We will give this option a try since this has most of the implementation already done. We need to adjust incremental filter to thread mysql.ibd separatelly.

The main idea is:

Skip MLOG_TABLE_DYNAMIC_META events if --apply-log-only is set (this is incremental, but its not the last incremental)

Apply them only when this is either a full backup alone or the last incremental backup.

Satya Bodapati December 30, 2020 at 7:44 AM

Looks like we are observing something..

1) Can you please confirm if PXB is writing to DDTabeBuffer as part of shutdown or did you mean only 'Server'? If PXB is doing the writes to DDTableBuffer, then we really have something.. (Also Check if PXB has log_checkpointer thread and that also starts writing to DDTableBuffer?) ?

2) Server can write to pages as part of shutdown. PXB will get this changes either via redo log or via 'delta pages'. Can you also please dump pages from inc delta files? See if we have copied a mysql.ibd page 5 as part of delta

Done

Details

Assignee

Marcelo Altmann(Deactivated)

Reporter

Satya Bodapati

Time tracking

2w 1d 4h 40m logged

Fix versions

8.0.23-16 (Q1 2021)

Affects versions

8.0.x

Priority

Medium

Smart Checklist

Created May 18, 2020 at 1:02 PM

Updated March 6, 2024 at 6:59 PM

Resolved February 12, 2021 at 11:33 AM

Configure

Correct incremental prepare failure with logical redo by skipping the apply of logical redos (MLOG_TABLE_DYNAMIC_META) during the incremental prepare (except the last prepare).

Description

Environment

Smart Checklist

Activity

Marcelo Altmann February 12, 2021 at 11:33 AM

Satya Bodapati January 5, 2021 at 7:03 AM

Satya Bodapati January 5, 2021 at 6:57 AM

Marcelo Altmann December 30, 2020 at 3:20 PMEdited

Satya Bodapati December 30, 2020 at 7:44 AM

Details

Assignee

Reporter

Time tracking

Fix versions

Affects versions

Priority

Smart Checklist

Marcelo Altmann December 30, 2020 at 3:20 PM
Edited