LP #1334606: Undetected state gap. Can't continue.
General
Escalation
General
Escalation
Description
**Reported in Launchpad by Teemu Ollakka last update 17-11-2015 12:05:15
Nod that had been previously part of the cluster was started so that it first entered into non-primary component. For some reason local state was cleared on mysqld side, leading to undetected state gap error.
2014-06-26 10:49:37 6056 [Note] WSREP: Read nil XID from storage engines, skippi ng position init 2014-06-26 10:49:37 6056 [Note] WSREP: wsrep_load(): loading provider library '/ home/vagrant/galera/local4/galera/lib/libgalera_smm.so' 2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_load(): Galera 3.6dev(rXXXX) by Cod ership Oy <info@codership.com> loaded successfully. 2014-06-26 10:49:38 6056 [Note] WSREP: CRC-32C: using hardware acceleration. 2014-06-26 10:49:38 6056 [Note] WSREP: Found saved state: 2a7f5131-fc61-11e3-809 a-e6e6e4d4557c:0 2014-06-26 10:49:38 6056 [Note] WSREP: Passing config to GCS: base_host = 192.16 8.17.11; base_port = 4567; cert.log_conflicts = no; debug = no; evs.debug_log_ma sk = 0x4; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs. info_log_mask = 0x4; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT 5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /hom e/vagrant/galera/local4/mysql/var/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/vagrant/galera/local4/mysql/var//galera.cache; gcache.p age_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs. fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_th rottle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limi t = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announ ce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo 2014-06-26 10:49:38 6056 [Note] WSREP: Service thread queue flushed. 2014-06-26 10:49:38 6056 [Note] WSREP: Assign initial position for certification : 0, protocol version: -1 2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_sst_grab() 2014-06-26 10:49:38 6056 [Note] WSREP: Start replication 2014-06-26 10:49:38 6056 [Note] WSREP: Setting initial position to 2a7f5131-fc61 -11e3-809a-e6e6e4d4557c:0 2014-06-26 10:49:38 6056 [Note] WSREP: protonet asio version 0 2014-06-26 10:49:38 6056 [Note] WSREP: Using CRC-32C (optimized) for message che cksums. 2014-06-26 10:49:38 6056 [Note] WSREP: backend: asio 2014-06-26 10:49:38 6056 [Warning] WSREP: access file(gvwstate.dat) failed(No su ch file or directory) 2014-06-26 10:49:38 6056 [Note] WSREP: restore pc from disk failed 2014-06-26 10:49:38 6056 [Note] WSREP: GMCast version 0 2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') listeni ng at tcp://0.0.0.0:10041 2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') multica st: , ttl: 1 2014-06-26 10:49:38 6056 [Note] WSREP: EVS version 0 2014-06-26 10:49:38 6056 [Note] WSREP: PC version 0 2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connecting to group 'my_wsrep_clus ter', peer '192.168.17.13:10031' 2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning message relay requesting on, nonlive peers: 2014-06-26 10:49:38 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13 :10031 stable 2014-06-26 10:49:38 6056 [Warning] WSREP: no nodes coming from prim view, prim n ot possible 2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connected 2014-06-26 10:49:38 6056 [Note] WSREP: Changing maximum packet size to 64500, re sulting msg size: 32636 2014-06-26 10:49:38 6056 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) 2014-06-26 10:49:38 6056 [Note] WSREP: Opened channel 'my_wsrep_cluster' 2014-06-26 10:49:38 6056 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2 2014-06-26 10:49:38 6056 [Note] WSREP: Waiting for SST to complete. 2014-06-26 10:49:38 6056 [Note] WSREP: Flow-control interval: [23, 23] 2014-06-26 10:49:38 6056 [Note] WSREP: Received NON-PRIMARY. 2014-06-26 10:49:38 6056 [Note] WSREP: New cluster view: global state: 2a7f5131- fc61-11e3-809a-e6e6e4d4557c:0, view# -1: non-Primary, number of nodes: 2, my ind ex: 1, protocol version -1 2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2014-06-26 10:49:46 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning message relay requesting off 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable 2014-06-26 10:49:48 6056 [Note] WSREP: Node 8e9b4e2a state prim 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9199b15d at tcp://192.168.17.13:10091 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a2005e at tcp://192.168.17.11:10071 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a93081 at tcp://192.168.17.13:10061 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable 2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9455a28c at tcp://192.168.17.12:10081 stable 2014-06-26 10:49:49 6056 [Note] WSREP: Node 8e9b4e2a state prim 2014-06-26 10:49:49 6056 [Note] WSREP: save pc into disk 2014-06-26 10:49:49 6056 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 5, memb_num = 9 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: sent state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 1 (node3) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 2 (node2) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 3 (node9) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 5 (node4) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 6 (node6) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 7 (node5) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 8 (node8) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 0 (node1) 2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 4 (node7) 2014-06-26 10:49:49 6056 [Note] WSREP: Quorum results: version = 3, component = PRIMARY, conf_id = 2, members = 9/9 (joined/total), act_id = 0, last_appl. = -1, protocols = 1/5/3 (gcs/repl/appl), group UUID = 2a7f5131-fc61-11e3-809a-e6e6e4d4557c 2014-06-26 10:49:49 6056 [Note] WSREP: Flow-control interval: [48, 48] 2014-06-26 10:49:49 6056 [Note] WSREP: Restored state OPEN -> JOINED (0) 2014-06-26 10:49:49 6056 [Note] WSREP: New cluster view: global state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0, view# 3: Primary, number of nodes: 9, my index: 5, protocol version 3 2014-06-26 10:49:49 6056 [ERROR] WSREP: Undetected state gap. Can't continue. 2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Group state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0 2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Local state: 00000000-0000-0000-0000-000000000000:-1 2014-06-26 10:49:49 6056 [ERROR] Aborting
Environment
None
Activity
Show:
Kamil Holubicki
December 15, 2022 at 11:06 AM
Marked as 'fixed' in Launchpad
lpjirasync
January 12, 2018 at 10:26 AM
**Comment from Launchpad by: Teemu Ollakka on: 01-07-2014 07:12:05
**Reported in Launchpad by Teemu Ollakka last update 17-11-2015 12:05:15
Nod that had been previously part of the cluster was started so that it first entered into non-primary component. For some reason local state was cleared on mysqld side, leading to undetected state gap error.
2014-06-26 10:49:37 6056 [Note] WSREP: Read nil XID from storage engines, skippi
ng position init
2014-06-26 10:49:37 6056 [Note] WSREP: wsrep_load(): loading provider library '/
home/vagrant/galera/local4/galera/lib/libgalera_smm.so'
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_load(): Galera 3.6dev(rXXXX) by Cod
ership Oy <info@codership.com> loaded successfully.
2014-06-26 10:49:38 6056 [Note] WSREP: CRC-32C: using hardware acceleration.
2014-06-26 10:49:38 6056 [Note] WSREP: Found saved state: 2a7f5131-fc61-11e3-809
a-e6e6e4d4557c:0
2014-06-26 10:49:38 6056 [Note] WSREP: Passing config to GCS: base_host = 192.16
8.17.11; base_port = 4567; cert.log_conflicts = no; debug = no; evs.debug_log_ma
sk = 0x4; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.
info_log_mask = 0x4; evs.join_retrans_period = PT1S; evs.max_install_timeouts =
3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT
5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /hom
e/vagrant/galera/local4/mysql/var/; gcache.keep_pages_size = 0; gcache.mem_size
= 0; gcache.name = /home/vagrant/galera/local4/mysql/var//galera.cache; gcache.p
age_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.
fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_th
rottle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limi
t = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announ
ce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb =
false; pc.npvo
2014-06-26 10:49:38 6056 [Note] WSREP: Service thread queue flushed.
2014-06-26 10:49:38 6056 [Note] WSREP: Assign initial position for certification
: 0, protocol version: -1
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_sst_grab()
2014-06-26 10:49:38 6056 [Note] WSREP: Start replication
2014-06-26 10:49:38 6056 [Note] WSREP: Setting initial position to 2a7f5131-fc61
-11e3-809a-e6e6e4d4557c:0
2014-06-26 10:49:38 6056 [Note] WSREP: protonet asio version 0
2014-06-26 10:49:38 6056 [Note] WSREP: Using CRC-32C (optimized) for message che
cksums.
2014-06-26 10:49:38 6056 [Note] WSREP: backend: asio
2014-06-26 10:49:38 6056 [Warning] WSREP: access file(gvwstate.dat) failed(No su
ch file or directory)
2014-06-26 10:49:38 6056 [Note] WSREP: restore pc from disk failed
2014-06-26 10:49:38 6056 [Note] WSREP: GMCast version 0
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') listeni
ng at tcp://0.0.0.0:10041
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') multica
st: , ttl: 1
2014-06-26 10:49:38 6056 [Note] WSREP: EVS version 0
2014-06-26 10:49:38 6056 [Note] WSREP: PC version 0
2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connecting to group 'my_wsrep_clus
ter', peer '192.168.17.13:10031'
2014-06-26 10:49:38 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning
message relay requesting on, nonlive peers:
2014-06-26 10:49:38 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13
:10031 stable
2014-06-26 10:49:38 6056 [Warning] WSREP: no nodes coming from prim view, prim n
ot possible
2014-06-26 10:49:38 6056 [Note] WSREP: gcomm: connected
2014-06-26 10:49:38 6056 [Note] WSREP: Changing maximum packet size to 64500, re
sulting msg size: 32636
2014-06-26 10:49:38 6056 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2014-06-26 10:49:38 6056 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2014-06-26 10:49:38 6056 [Note] WSREP: New COMPONENT: primary = no, bootstrap =
no, my_idx = 1, memb_num = 2
2014-06-26 10:49:38 6056 [Note] WSREP: Waiting for SST to complete.
2014-06-26 10:49:38 6056 [Note] WSREP: Flow-control interval: [23, 23]
2014-06-26 10:49:38 6056 [Note] WSREP: Received NON-PRIMARY.
2014-06-26 10:49:38 6056 [Note] WSREP: New cluster view: global state: 2a7f5131-
fc61-11e3-809a-e6e6e4d4557c:0, view# -1: non-Primary, number of nodes: 2, my ind
ex: 1, protocol version -1
2014-06-26 10:49:38 6056 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-06-26 10:49:46 6056 [Note] WSREP: (91a8037e, 'tcp://0.0.0.0:10041') turning message relay requesting off
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable
2014-06-26 10:49:48 6056 [Note] WSREP: Node 8e9b4e2a state prim
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 8e9b4e2a at tcp://192.168.17.11:10011 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 918e2134 at tcp://192.168.17.13:10031 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91931b6f at tcp://192.168.17.12:10021 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9199b15d at tcp://192.168.17.13:10091 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a2005e at tcp://192.168.17.11:10071 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91a93081 at tcp://192.168.17.13:10061 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 91aae867 at tcp://192.168.17.12:10051 stable
2014-06-26 10:49:48 6056 [Note] WSREP: declaring 9455a28c at tcp://192.168.17.12:10081 stable
2014-06-26 10:49:49 6056 [Note] WSREP: Node 8e9b4e2a state prim
2014-06-26 10:49:49 6056 [Note] WSREP: save pc into disk
2014-06-26 10:49:49 6056 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 5, memb_num = 9
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: sent state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 1 (node3)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 2 (node2)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 3 (node9)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 5 (node4)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 6 (node6)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 7 (node5)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 8 (node8)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 0 (node1)
2014-06-26 10:49:49 6056 [Note] WSREP: STATE EXCHANGE: got state msg: 98906d2d-fd1f-11e3-b1c0-cb3bb2038a9b from 4 (node7)
2014-06-26 10:49:49 6056 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 2,
members = 9/9 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 1/5/3 (gcs/repl/appl),
group UUID = 2a7f5131-fc61-11e3-809a-e6e6e4d4557c
2014-06-26 10:49:49 6056 [Note] WSREP: Flow-control interval: [48, 48]
2014-06-26 10:49:49 6056 [Note] WSREP: Restored state OPEN -> JOINED (0)
2014-06-26 10:49:49 6056 [Note] WSREP: New cluster view: global state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0, view# 3: Primary, number of nodes: 9, my index: 5, protocol version 3
2014-06-26 10:49:49 6056 [ERROR] WSREP: Undetected state gap. Can't continue.
2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Group state: 2a7f5131-fc61-11e3-809a-e6e6e4d4557c:0
2014-06-26 10:49:49 6056 [ERROR] WSREP: WSREP: Local state: 00000000-0000-0000-0000-000000000000:-1
2014-06-26 10:49:49 6056 [ERROR] Aborting