LP #1714684: Falling join node after a successful SST
Description
Environment
Smart Checklist
Activity

Peter Schwaller November 18, 2018 at 2:49 PM
Closing since we were never able to reproduce this and due to the age of the issue. If this problem still exists, please resubmit a new issue.

lpjirasync January 14, 2018 at 7:14 PM
**Comment from Launchpad by: Krunal Bauskar on: 05-09-2017 03:14:03
From the log and output it suggest that SST is still active.
Did you run out of disk space on JOINER ?

lpjirasync January 14, 2018 at 7:14 PM
**Comment from Launchpad by: Stanislav Gritsenko on: 02-09-2017 17:25:09
have added archive with
output of dmesg
output of pt-summary
output of pt-mysql-summary
full copy of /var/log/messages (or syslog)
full copy of /etc/my.cnf and any other cnf file that might be included
full copy of error logs
full copy of /var/lib/mysql/innobackup.*.log
output of rpm -qa |egrep -i "mysql|percona|maria" (or dpkg -l |egrep -i "mysql|percona|maria" |grep "ii")
Details
Details
Assignee

Reporter

Labels
Priority
Smart Checklist
Open Smart Checklist
Smart Checklist

**Reported in Launchpad by Stanislav Gritsenko last update 05-09-2017 03:14:04
I observe a strange situation when I try to add a second node to the percona galera cluster.
After a long file upload using xtrabackup-v2 (3.5TB of data and about 1,000,000 files) and successfully passed all the stages, judging by the log, the second node falls with an absolutely uninformative error:
2017-09-02 01:31:14 17707 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-09-02 01:31:14 17707 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-02 01:31:14 17707 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-02 01:31:14 17707 [Note] InnoDB: Memory barrier is not used
2017-09-02 01:31:14 17707 [Note] InnoDB: Compressed tables use zlib 1.2.7
2017-09-02 01:31:14 17707 [Note] InnoDB: Using Linux native AIO
2017-09-02 01:31:14 17707 [Note] InnoDB: Using CPU crc32 instructions
2017-09-02 01:31:14 17707 [Warning] InnoDB: innodb_file_io_threads is deprecated. Please use innodb_read_io_threads and innodb_write_io_threads instead
2017-09-02 01:31:14 17707 [Note] InnoDB: Initializing buffer pool, size = 10.0G
2017-09-02 01:31:15 17707 [Note] InnoDB: Completed initialization of buffer pool
2017-09-02 01:31:16 17707 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-02 01:32:27 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000047 of size 134217728 bytes
2017-09-02 02:44:02 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000048 of size 134217728 bytes
2017-09-02 03:03:40 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000049 of size 134217728 bytes
2017-09-02 03:21:14 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000050 of size 134217728 bytes
2017-09-02 03:40:19 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000051 of size 134217728 bytes
2017-09-02 03:44:41 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000052 of size 134217728 bytes
2017-09-02 03:49:21 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000053 of size 134217728 bytes
2017-09-02 03:53:24 17707 [Note] InnoDB: 128 rollback segment(s) are active.
2017-09-02 03:53:24 17707 [Note] InnoDB: Waiting for purge to start
2017-09-02 03:53:24 17707 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 7400309372985
2017-09-02 03:53:24 17707 [ERROR] Aborting
2017-09-02 03:53:24 17707 [Note] WSREP: Signalling cancellation of the SST request.
2017-09-02 03:53:24 17707 [Note] WSREP: SST request was cancelled
2017-09-02 03:53:24 17707 [Note] WSREP: Closing send monitor...
2017-09-02 03:53:24 17707 [Note] WSREP: Closed send monitor.
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: terminating thread
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: joining thread
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: closing backend
2017-09-02 03:53:26 17707 [Note] WSREP: Service disconnected.
2017-09-02 03:53:26 17707 [Note] WSREP: Waiting to close threads......
2017-09-02 03:53:26 17707 [Note] WSREP: rollbacker thread exiting
-------------------------------------
This info correspond with strace log data:
2049 03:53:24 <... setpriority resumed> ) = 0
2049 03:53:24 getpriority(PRIO_PROCESS, 2049) = 20
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 7400309372985\n", 137 <unfinished ...>
2049 03:53:24 select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
17707 03:53:24 <... write resumed> ) = 137
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb32cbf9000
17707 03:53:24 mprotect(0x7fb32cbf9000, 4096, PROT_NONE) = 0
17707 03:53:24 clone(child_stack=0x7fb32d3f8e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32d3f99d0, tls=0x7fb32d3f9700, child_tidptr=0x7fb32d3f99d0) = 2050
2050 03:53:24 set_robust_list(0x7fb32d3f99e0, 24 <unfinished ...>
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
2050 03:53:24 <... set_robust_list resumed> ) = 0
17707 03:53:24 <... mmap resumed> ) = 0x7fb32c3f8000
17707 03:53:24 mprotect(0x7fb32c3f8000, 4096, PROT_NONE) = 0
2050 03:53:24 futex(0x7fbf9746b58c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
17707 03:53:24 clone( <unfinished ...>
2051 03:53:24 set_robust_list(0x7fb32cbf89e0, 24 <unfinished ...>
17707 03:53:24 <... clone resumed> child_stack=0x7fb32cbf7e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32cbf89d0, tls=0x7fb32cbf8700, child_tidptr=0x7fb32cbf89d0) = 2051
2051 03:53:24 <... set_robust_list resumed> ) = 0
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
2051 03:53:24 futex(0x7fb4cd3ef90c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1504324414, 807821000}, ffffffff <unfinished ...>
17707 03:53:24 <... mmap resumed> ) = 0x7fb32bbf7000
17707 03:53:24 mprotect(0x7fb32bbf7000, 4096, PROT_NONE) = 0
17707 03:53:24 clone( <unfinished ...>
2052 03:53:24 set_robust_list(0x7fb32c3f79e0, 24 <unfinished ...>
17707 03:53:24 <... clone resumed> child_stack=0x7fb32c3f6e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32c3f79d0, tls=0x7fb32c3f7700, child_tidptr=0x7fb32c3f79d0) = 2052
2052 03:53:24 <... set_robust_list resumed> ) = 0
2052 03:53:24 futex(0x7fb334add74c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1504324409, 808928000}, ffffffff <unfinished ...>
2047 03:53:24 <... pread resumed> "K\315\1\320\0\t\276\200\0\2\321b\0\2\321\\\0\0\6\271\7U\2727E\277\0\0\0\0\0\0\0\0\0\0\0\0\0!=\251\0\233\16\356\6\353\0\0\0\1\0\1\0\206\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\1\0\0\3<\234infimum\0\t\6\0\10\3\0\0supremum\0NJ% \0\20\7;\335whois_record_delta/#sql-291557_8e08aawhois_record_delta/FK9F837BD2D0C7A375\0\n \350NJ% \0\30\0077\313whois_record_delta/#sql-291557_a77c0bwhois_record_delta/FKA6CD4E737B556202\0\n\4\276NJ% \0 \7\0\335whois_record_delta/#sql-291557_a7a611whois_record_delta/FKB4869798B8CF12D0\0\n\4\277NJ% \0(\7\0014whois_record_delta/#sql-291557_a7ba06whois_record_delta/FKC8DFC7E4D0C7A375\0\n \340`\\7\0\0000\7\2Kwhois_record_delta/registry_data_academy_add_2017_07_31whois_record_delta/FKA03AB7BF7B556202\0\3`Md`;\0\0008\7\2\270whois_record_delta/registry_data_accountants_add_2017_04_13whois_record_delta/FK5DCE39197B556202\0\t\352\21c_:\0\0@\7\3$whois_record_delta/registry_data_accountant_add_2017_07_28whois_record_delta/FK27E110F37B556202\0\4Z\36c_:\5\0H\7\3\220whois_record_delta/registry_data_accountant_add_2017_08_28whois_record_delta/FK27E18552D0C7A375\0\4Z\35^Z5\0\0P\7\3\367whois_record_delta/registry_data_actor_add_2017_07_06whois_record_delta/FK38C0C9FED0C7A375\0\0029\24^Z5\0\0X\7\4^whois_rec"..., 16384, 10462691328) = 16384
2047 03:53:24 pread(12, "\230S\314\376\0\n \354\0\3`J\0\3`M\0\0\6\271\0079B\302E\277\0\0\0\0\0\0\0\0\0\0\0\0\0\7*~\0\200\35\222 ~\0\0\0\5\0\0\0\31\0\0\0\0\246\0\312m\0\0\0\0\0\0\0\0\0\f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\1\0\0\3\24\17infimum\0\t\3\0\10\3\0\0supremum\0J% \0\20\5\2\303whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A9420710653J%$\0\30\5\1{whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A9479B00024J%$\0 \5\0\327whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A947B556202J% \0(\5\1\315whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A94B8CF12D0J% \0000\5\0\205whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A94D0C7A375J% \0008\5#zwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED1313020710653J%$\0@\5\3gwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED1313079B00024J%$\0H\5\2qwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED131307B556202J% \0P\5\3\271whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130B8CF12D0J% \0X\5\3\25whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130C7212EEFJ%$\0`\5\2\37whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130D0C7A"..., 16384, 10875502592) = 16384
2047 03:53:24 pread(12, <unfinished ...>
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [ERROR] Aborting\n\n", 44) = 44
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] WSREP: Signalling cancellation of the SST request.\n", 84) = 84
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] WSREP: SST request was cancelled\n", 66) = 66
17707 03:53:24 futex(0x7fb885bdb224, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fb885bdb220, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
------------------
All logs has been attached.