Race condition in redo-log-consumer causes redo log read failure

Description

While debugging another issue on a different feature today, I observed a race condition with --register-redo-log-consumer.

After we advance an LSN, we sometimes go back and ask for a previous LSN.

That happens because of block boundaries: the read start LSN is aligned down to a 512-byte boundary with ut_align_down(). If that aligned-down position lands on the last block of a file, xtrabackup has already signalled that it has consumed the LSN, and the server will quickly delete the file.


Later, xtrabackup asks for an LSN that has already been consumed, and by the time we check and read the file, the server has already deleted the redo log file.
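
For illustration, a minimal standalone sketch of the arithmetic (not PXB code; align_down() below is a hypothetical stand-in for InnoDB's ut_align_down()), using the values captured in the rr session below:

#include <cstdint>
#include <iostream>

// Hypothetical stand-in for InnoDB's ut_align_down(): round a value down to a
// power-of-two block boundary.
static uint64_t align_down(uint64_t lsn, uint64_t block) {
  return lsn & ~(block - 1);
}

int main() {
  const uint64_t log_scanned_lsn = 212977468;  // LSN the consumer was advanced to
  const uint64_t start_lsn = align_down(log_scanned_lsn, 512);
  // Prints 212977152: the next read starts one block behind the LSN that was
  // just reported as consumed, i.e. possibly inside a file the server has
  // already removed.
  std::cout << start_lsn << std::endl;
  return 0;
}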


(rr) p log_scanned_lsn
$10 = 212977468

(rr) p start_lsn
$11 = 212977152


(rr) n
(rr) p err
$16 = DB_NOT_FOUND
(rr)


The server has renamed the file (to _tmp for later reuse).


The consumer advanced to LSN 212977468, but we are trying to open the redo file at LSN 212977152:

(rr) bt
#0 Redo_Log_Consumer::advance (this=0x7fff6c99ee80, connection=0x556c972eeb40, lsn=212977468)
at /home/satya/WORK/pxb-aibek/storage/innobase/xtrabackup/src/redo_log_consumer.cc:42
#1 0x0000556c8fad7e1c in Redo_Log_Data_Manager::copy_func (this=0x7fff6c99ed50) at /home/satya/WORK/pxb-aibek/storage/innobase/xtrabackup/src/redo_log.cc:1216
#2 0x0000556c8fad6c97 in operator() (__closure=0x7fccf446a980) at /home/satya/WORK/pxb-aibek/storage/innobase/xtrabackup/src/redo_log.cc:966
#3 0x0000556c8fada027 in std::__invoke_impl<void, Redo_Log_Data_Manager::init()::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...)
at /usr/include/c++/11/bits/invoke.h:61
#4 0x0000556c8fad9fd3 in std::__invoke<Redo_Log_Data_Manager::init()::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#5 0x0000556c8fad9f58 in std::_Bind<Redo_Log_Data_Manager::init()::<lambda()>()>::__call<void>(std::tuple<> &&, std::_Index_tuple<>) (this=0x7fccf446a980,
__args=...) at /usr/include/c++/11/functional:420
#6 0x0000556c8fad9e4b in std::_Bind<Redo_Log_Data_Manager::init()::<lambda()>()>::operator()<>(void) (this=0x7fccf446a980) at /usr/include/c++/11/functional:503
#7 0x0000556c8fad9c46 in Detached_thread::operator()<Redo_Log_Data_Manager::init()::<lambda()> >(struct {...} &&) (this=0x556c96fa7120, f=...)
at /home/satya/WORK/pxb-aibek/storage/innobase/include/os0thread-create.h:194
#8 0x0000556c8fad9ac3 in std::__invoke_impl<void, Detached_thread, Redo_Log_Data_Manager::init()::<lambda()> >(std::__invoke_other, Detached_thread &&) (__f=...)
at /usr/include/c++/11/bits/invoke.h:61
#9 0x0000556c8fad99b3 in std::__invoke<Detached_thread, Redo_Log_Data_Manager::init()::<lambda()> >(Detached_thread &&) (__fn=...)
at /usr/include/c++/11/bits/invoke.h:96
#10 0x0000556c8fad98d9 in std::thread::_Invoker<std::tuple<Detached_thread, Redo_Log_Data_Manager::init()::<lambda()> > >::_M_invoke<0, 1>(std::_Index_tuple<0, 1>) (
this=0x556c96fa7118) at /usr/include/c++/11/bits/std_thread.h:259
#11 0x0000556c8fad9876 in std::thread::_Invoker<std::tuple<Detached_thread, Redo_Log_Data_Manager::init()::<lambda()> > >::operator()(void) (this=0x556c96fa7118)
at /usr/include/c++/11/bits/std_thread.h:266
#12 0x0000556c8fad9836 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, Redo_Log_Data_Manager::init()::<lambda()> > > >::_M_run(void) (
this=0x556c96fa7110) at /usr/include/c++/11/bits/std_thread.h:211
#13 0x00007fccf74dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007fccf7094ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007fccf7125a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
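
To make the interleaving explicit, here is a self-contained sketch (hypothetical names and timing only, not PXB or server code) of the sequence described above: the copy thread reports an LSN as consumed, the server recycles the redo file that ends at that LSN, and the copy thread then goes back to the 512-byte-aligned start inside the recycled file:

#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

// Hypothetical stand-in for InnoDB's ut_align_down().
static uint64_t align_down(uint64_t lsn, uint64_t block) { return lsn & ~(block - 1); }

int main() {
  const uint64_t file_end_lsn = 212977468;  // last LSN covered by one redo file
  std::atomic<uint64_t> consumed_lsn{0};    // what the registered consumer reported
  std::atomic<bool> file_exists{true};      // whether the server still keeps the file

  // "Server": once the consumer has advanced to the file's end LSN, the file
  // is considered fully consumed and is renamed/removed for reuse.
  std::thread server([&] {
    while (consumed_lsn.load() < file_end_lsn) std::this_thread::yield();
    file_exists.store(false);
  });

  // "xtrabackup copy thread": advance the consumer, then go back to the
  // block-aligned start LSN and try to read it from the (now missing) file.
  std::thread copier([&] {
    consumed_lsn.store(file_end_lsn);                          // advance(212977468)
    const uint64_t start_lsn = align_down(file_end_lsn, 512);  // 212977152
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    if (!file_exists.load())
      std::cout << "open redo file at " << start_lsn << " -> DB_NOT_FOUND" << std::endl;
  });

  copier.join();
  server.join();
  return 0;
}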

Environment

None

Attachments: 1

Activity


Aaditya Dubey April 30, 2024 at 3:21 PM

Hi

Thank you for the report.

Satya Bodapati April 25, 2024 at 8:27 PM

It is a very rare scenario, but it can happen nevertheless.

Details


Needs QA: Yes


Created April 25, 2024 at 8:26 PM
Updated March 2, 2025 at 8:00 PM