PXC-4590: Uninstalling Percona Telemetry DB component causing inconsistency
Activity
Kamil Holubicki January 29, 2025 at 9:13 AM
Hi @Matthew Boehm,
“Executing something on node1 should absolutely never cause another node to crash.”
What about:
1. Disable replication on node2. Drop table t1 on node2. Enable replication on node2. Drop table t1 on node1. Node2 will be evicted by inconsistency voting.
2. Start node1 with plugin A installed. On node1, execute ‘uninstall plugin A’; node2 will be evicted by inconsistency voting.
3. node1 is 8.4, node2 is 8.0. Execute ‘create table… format=compressed’ on node2.
I think my example was not accurate. The problem is specific to a 2-node cluster, which is not a recommended setup: you execute something on node1, and node2 is evicted. In a 3-node cluster, the inconsistency voting protocol lets the majority win, so node1 would be the one evicted.
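The voting outcome described above can be modelled in a few lines. This is a simplified sketch and not Galera's actual implementation. It assumes each node casts either vote 0 (the `0000000000000000` "Success" vote in the logs of this report) or a non-zero error vote. It also assumes that nodes failing with the same error cast the same vote, that the vote with the most members wins, and that a tie goes to the success vote. The tie rule is consistent with the two 1/2 vs 1/2 outcomes in the logs below, but it is an assumption. The name `resolve_vote` and the node names are hypothetical.

```python
from collections import Counter

SUCCESS = 0  # the all-zero vote that Galera logs as "Success"


def resolve_vote(votes: dict[str, int]) -> tuple[int, list[str]]:
    """Simplified model of inconsistency voting (hypothetical, not Galera code).

    votes maps node name -> vote (0 = applied OK, non-zero = error vote).
    Returns (winning vote, sorted list of nodes whose vote lost and would leave).
    """
    tally = Counter(votes.values())
    top = max(tally.values())
    leaders = [vote for vote, count in tally.items() if count == top]
    # Assumption: a tie is resolved in favour of SUCCESS, as in this report's logs.
    # A tie between error votes only is broken by the smaller value (arbitrary choice).
    winner = SUCCESS if SUCCESS in leaders else min(leaders)
    losers = sorted(node for node, vote in votes.items() if vote != winner)
    return winner, losers


# Hypothetical error vote; the value is taken from the first log in this report.
ERR = 0xDF132F0401AA2F52

# 2-node cluster: db1 succeeds, db2 fails -> 1/2 vs 1/2, success wins, db2 leaves.
winner, evicted = resolve_vote({"db1": SUCCESS, "db2": ERR})
print(f"winner={winner:016x} evicted={evicted}")  # winner=0000000000000000 evicted=['db2']

# 3-node cluster: only node1 succeeds -> the error vote has the majority, node1 leaves.
winner, evicted = resolve_vote({"node1": SUCCESS, "node2": ERR, "node3": ERR})
print(f"winner={winner:016x} evicted={evicted}")  # winner=df132f0401aa2f52 evicted=['node1']
```

The model never decides which node is "right". It only counts agreement, which is why the evicted node flips between the 2-node and 3-node cases.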
Going back to the original problem: this is an example of a configuration mismatch between nodes. node1 has the component installed while node2 doesn’t, and PXC requires all nodes to have the same configuration. Note that if all nodes had been upgraded before the component was uninstalled, all of them would have had it installed (the same configuration), and there would have been no problem.
Yet another reason I think this should be solved by documentation is that uninstalling the Percona Telemetry component is not the way to disable it permanently. The correct way is to disable it in the my.cnf file.
“If I unload a component on A, and that component isn’t loaded on B, it should not cause B to crash. B should error log”
Yes, it is feasible to implement, but it would work only for components. The example with tables (above) would still evict the node. That’s why DROP TABLE IF EXISTS should be used whenever the query might fail on any node. The ideal solution would be an UNINSTALL COMPONENT IF INSTALLED statement, but we don’t have it, and we don’t want to extend the SQL syntax.
Again, if we implement it for components, what about plugins? And what about other, similar inconsistencies? It would open the door to the next set of “bugs”. Instead, let’s stick to the already existing strict rule: “Nodes should have the same configuration. If not, make sure you know what you are doing.”
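The "same configuration" rule can be checked before running a replicated UNINSTALL COMPONENT. This is a minimal sketch. It assumes the installed component URNs have already been collected from each node, for example by running `SELECT component_urn FROM mysql.component` on every node; the collection step is not shown. The function name `component_mismatches` is hypothetical.

```python
def component_mismatches(components_by_node: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each node, the component URNs present on some other node but missing here.

    Nodes with nothing missing are omitted, so an empty result means every node
    reports the same component set.
    """
    all_urns: set[str] = set().union(*components_by_node.values())
    missing = {node: all_urns - urns for node, urns in components_by_node.items()}
    return {node: urns for node, urns in missing.items() if urns}


# The situation from this report: db1 (8.0.37) has the component, db2 (8.0.36) does not.
nodes = {
    "db1": {"file://component_percona_telemetry"},
    "db2": set(),
}
print(component_mismatches(nodes))  # {'db2': {'file://component_percona_telemetry'}}
```

A non-empty result flags the nodes where a replicated UNINSTALL COMPONENT for one of those URNs would fail and trigger inconsistency voting.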
Matthew Boehm January 28, 2025 at 11:39 PM
Hey @Kamil Holubicki
“I need to investigate it deeper, but 99% it is something to document, not fix.”
I 100% disagree. Executing something on node1 should absolutely never cause another node to crash. If I unload a component on A, and that component isn’t loaded on B, it should not cause B to crash. B should error log something like “A requested unload component but component not loaded” and B should continue on as normal. If the component was loaded on A but not on B and also not on C, that would bring down the whole cluster.
High quality software cannot brush off such disastrous behavior as a documentation note.
Kamil Holubicki January 28, 2025 at 10:58 PM
Hi @Matthew Boehm, yes, it is the same for every component. Moreover, it is not limited to components. It is similar to a situation where you have node1, node2, and node3, node1 supports some DDL syntax that node2 and node3 do not, and you execute this new syntax on node1.
I need to investigate it deeper, but 99% it is something to document, not fix.
Matthew Boehm January 28, 2025 at 6:21 PM
Has anyone tested other components in the same way? I’m curious whether this is specific to the Percona Telemetry component or applies to any component.
Kamil Holubicki January 2, 2025 at 9:33 AM
Hi @Alok Pathak, thank you for your report. Please note that executing UNINSTALL COMPONENT file://component_percona_telemetry does not permanently disable it. It will come back after the server restart. To permanently disable Percona Telemetry, please specify percona_telemetry_disable=1 in my.cnf.
I understand that you decided to permanently disable Percona Telemetry. In your scenario, as a workaround, you can add loose_percona_telemetry_disable=1 to all servers' my.cnf files and then proceed with the upgrade.
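The workaround can be verified on each server before proceeding with the upgrade. This is a minimal sketch. It assumes each node's my.cnf text is available locally and that the option sits in a plain `[mysqld]` section. It relies on two MySQL option-file rules. First, dashes and underscores in option names are interchangeable. Second, a `loose` prefix makes a server that does not recognize the option issue a warning instead of refusing to start. Include directives (`!include`, `!includedir`) are skipped, not followed. The function name `telemetry_disabled_in_cnf` is hypothetical.

```python
import configparser

TRUE_VALUES = {"1", "on", "true"}


def telemetry_disabled_in_cnf(cnf_text: str) -> bool:
    """Return True if [mysqld] sets percona_telemetry_disable (optionally loose-prefixed) to true.

    Hypothetical helper: include directives (!include, !includedir) are skipped, not followed.
    """
    # Drop include directives and leading whitespace so configparser accepts the file.
    lines = [line.strip() for line in cnf_text.splitlines()]
    cleaned = "\n".join(line for line in lines if not line.startswith("!"))

    parser = configparser.ConfigParser(
        allow_no_value=True,             # my.cnf allows bare options such as skip-name-resolve
        strict=False,                    # tolerate repeated options; the last one wins
        interpolation=None,              # values may contain '%'
        inline_comment_prefixes=("#",),  # my.cnf allows trailing '#' comments
    )
    parser.read_string(cleaned)
    if not parser.has_section("mysqld"):
        return False

    disabled = False
    for key, value in parser.items("mysqld"):
        # MySQL treats '-' and '_' in option names as equivalent.
        name = key.replace("-", "_")
        if name.startswith("loose_"):
            name = name[len("loose_"):]
        if name == "percona_telemetry_disable":
            disabled = (value or "").strip().lower() in TRUE_VALUES
    return disabled


sample = "[mysqld]\nwsrep_cluster_name=pxc\nloose_percona_telemetry_disable=1\n"
print(telemetry_disabled_in_cnf(sample))  # True
```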
Description
Hello,
I have a two-node PXC 8.0.36 cluster set up for testing purposes.
mysql> select @@hostname,@@version,@@version_comment;
+------------+-------------+---------------------------------------------------------------------------------------+
| @@hostname | @@version   | @@version_comment                                                                     |
+------------+-------------+---------------------------------------------------------------------------------------+
| db1        | 8.0.36-28.1 | Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3 |
+------------+-------------+---------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select @@hostname,@@version,@@version_comment;
+------------+-------------+---------------------------------------------------------------------------------------+
| @@hostname | @@version   | @@version_comment                                                                     |
+------------+-------------+---------------------------------------------------------------------------------------+
| db2        | 8.0.36-28.1 | Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3 |
+------------+-------------+---------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
To upgrade from PXC 8.0.36 → 8.0.37, I first upgraded the PXC version on the first node (db1):
mysql> select @@hostname,@@version,@@version_comment;
+------------+-------------+---------------------------------------------------------------------------------------+
| @@hostname | @@version   | @@version_comment                                                                     |
+------------+-------------+---------------------------------------------------------------------------------------+
| db1        | 8.0.37-29.1 | Percona XtraDB Cluster (GPL), Release rel29, Revision d29a325, WSREP version 26.1.4.3 |
+------------+-------------+---------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
db1 now has Percona Telemetry enabled, while db2 does not.

# db1
mysql> show global variables like '%percona%';
+-----------------------------------------+----------------------------------+
| Variable_name                           | Value                            |
+-----------------------------------------+----------------------------------+
| percona_telemetry.grace_interval        | 86400                            |
| percona_telemetry.history_keep_interval | 604800                           |
| percona_telemetry.scrape_interval       | 86400                            |
| percona_telemetry.telemetry_root_dir    | /usr/local/percona/telemetry/pxc |
| percona_telemetry_disable               | OFF                              |
+-----------------------------------------+----------------------------------+
5 rows in set (0.00 sec)

# db2
mysql> show global variables like '%percona%';
Empty set (0.01 sec)
When I uninstalled the Percona Telemetry DB component on db1, the command succeeded:

# db1
mysql> UNINSTALL COMPONENT "file://component_percona_telemetry";
Query OK, 0 rows affected (0.01 sec)
However, when the same command was replicated to db2 (which doesn't have Percona Telemetry), an error occurred, causing the node to be removed from the cluster:

# db2
mysql> show global status where variable_name IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_local_commits','wsrep_received','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected','wsrep_ready');
+---------------------------+--------------+
| Variable_name             | Value        |
+---------------------------+--------------+
| wsrep_cluster_size        | 0            |
| wsrep_cluster_status      | Disconnected |
| wsrep_connected           | OFF          |
| wsrep_local_commits       | 0            |
| wsrep_local_state         | 5            |
| wsrep_local_state_comment | Inconsistent |
| wsrep_ready               | OFF          |
| wsrep_received            | 9            |
+---------------------------+--------------+
8 rows in set (0.00 sec)
The cluster was disrupted, and the log shows errors related to Percona Telemetry:
2025-01-01T14:37:49.943198Z 1 [ERROR] [MY-010584] [Repl] Replica SQL: Error 'Component specified by URN 'file://component_percona_telemetry' to unload has not been loaded before.' on query. Default database: ''. Query: 'UNINSTALL COMPONENT "file://component_percona_telemetry"', Error_code: MY-003537
2025-01-01T14:37:49.943322Z 1 [Warning] [MY-000000] [WSREP] Event 1 Query apply failed: 1, seqno 3
2025-01-01T14:37:49.948080Z 0 [Note] [MY-000000] [Galera] Member 0(pxc2) initiates vote on b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3,e3a28324db7bc89f: The Persistent Dynamic Loader was used to unload a component 'file://component_percona_telemetry', but it was not used to load that component before., Error_code: 3542; Component specified by URN 'file://component_percona_telemetry' to unload has not been loaded before., Error_code: 3537;
2025-01-01T14:37:49.948523Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 3537, 3542. New vote df132f0401aa2f52 will be used for further steps. Old Vote: e3a28324db7bc89f
2025-01-01T14:37:49.948596Z 0 [Note] [MY-000000] [Galera] Votes over b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3: df132f0401aa2f52: 1/2 Waiting for more votes.
2025-01-01T14:37:49.949751Z 0 [Note] [MY-000000] [Galera] Member 1(pxc1) responds to vote on b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3,0000000000000000: Success
2025-01-01T14:37:49.949807Z 0 [Note] [MY-000000] [Galera] Votes over b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3: 0000000000000000: 1/2 df132f0401aa2f52: 1/2 Winner: 0000000000000000
2025-01-01T14:37:49.950006Z 1 [ERROR] [MY-000000] [Galera] Inconsistency detected: Inconsistent by consensus on b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3 at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.36/percona-xtradb-cluster-galera/galera/src/replicator_smm.cpp:process_apply_error():1462
2025-01-01T14:37:49.952243Z 1 [Note] [MY-000000] [Galera] Closing send monitor...
2025-01-01T14:37:49.952314Z 1 [Note] [MY-000000] [Galera] Closed send monitor.
2025-01-01T14:37:49.952338Z 1 [Note] [MY-000000] [Galera] gcomm: terminating thread
2025-01-01T14:37:49.952378Z 1 [Note] [MY-000000] [Galera] gcomm: joining thread
2025-01-01T14:37:49.952626Z 1 [Note] [MY-000000] [Galera] gcomm: closing backend
2025-01-01T14:37:50.955947Z 1 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view (view_id(NON_PRIM,79fc60f6-ab6c,18) memb { 79fc60f6-ab6c,0 } joined { } left { } partitioned { c2cad85a-ab8e,0 } )
2025-01-01T14:37:50.956169Z 1 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2025-01-01T14:37:50.956223Z 1 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view ((empty))
2025-01-01T14:37:50.956455Z 1 [Note] [MY-000000] [Galera] gcomm: closed
2025-01-01T14:37:50.956519Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2025-01-01T14:37:50.956655Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2025-01-01T14:37:50.956704Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2025-01-01T14:37:50.956741Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 3)
2025-01-01T14:37:50.956782Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
2025-01-01T14:37:50.956832Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
2025-01-01T14:37:50.956869Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
2025-01-01T14:37:50.956905Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 3)
2025-01-01T14:37:50.956944Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
2025-01-01T14:37:50.957117Z 1 [Note] [MY-000000] [Galera] recv_thread() joined.
2025-01-01T14:37:50.957165Z 1 [Note] [MY-000000] [Galera] Closing replication queue.
2025-01-01T14:37:50.957199Z 1 [Note] [MY-000000] [Galera] Closing slave action queue.
2025-01-01T14:37:50.957318Z 1 [Note] [MY-000000] [Galera] ================================================
View:
  id: b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
    0: 79fc60f6-c84d-11ef-ab6c-63433d297016, pxc2
=================================================
2025-01-01T14:37:50.957378Z 1 [Note] [MY-000000] [Galera] Non-primary view
2025-01-01T14:37:50.957417Z 1 [Note] [MY-000000] [WSREP] Server status change synced -> connected
2025-01-01T14:37:50.957453Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-01-01T14:37:50.957864Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-01-01T14:37:50.957944Z 1 [Note] [MY-000000] [Galera] ================================================
View:
  id: b82a7ad4-c84d-11ef-95ec-8aeb6af6c555:3
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
2025-01-01T14:37:50.957987Z 1 [Note] [MY-000000] [Galera] Non-primary view
2025-01-01T14:37:50.958025Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> disconnected
2025-01-01T14:37:50.958059Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-01-01T14:37:50.958098Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-01-01T14:37:50.958188Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2025-01-01T14:37:50.958251Z 1 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 6
2025-01-01T14:37:50.958291Z 1 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2025-01-01T14:37:50.958336Z 1 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 1
db2 failed to run the UNINSTALL COMPONENT command because it never had the telemetry component installed, resulting in an inconsistency. Is this expected behaviour?
Ideally, removing Percona Telemetry on db1 should not impact db2 if the component isn't installed there.

I performed another test with the same setup (db1: 8.0.37 with Percona Telemetry; db2: 8.0.36 without Telemetry). If I run UNINSTALL COMPONENT directly on db2, it fails, and db2 is removed from the cluster.

# db2
mysql> UNINSTALL COMPONENT "file://component_percona_telemetry";
ERROR 3537 (HY000): Component specified by URN 'file://component_percona_telemetry' to unload has not been loaded before.
Error Log
2025-01-01T14:29:37.002160Z 0 [Note] [MY-000000] [Galera] Member 1(pxc2) initiates vote on 785a7007-c788-11ef-b9b1-979fe3ae7c4e:31,d0626ec7b648e197: The Persistent Dynamic Loader was used to unload a component 'file://component_percona_telemetry', but it was not used to load that component before., Error_code: 3542; Component specified by URN 'file://component_percona_telemetry' to unload has not been loaded before., Error_code: 3537;
2025-01-01T14:29:37.002714Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 3537, 3542. New vote c486c89888ff9d45 will be used for further steps. Old Vote: d0626ec7b648e197
2025-01-01T14:29:37.002893Z 0 [Note] [MY-000000] [Galera] Votes over 785a7007-c788-11ef-b9b1-979fe3ae7c4e:31: c486c89888ff9d45: 1/2 Waiting for more votes.
2025-01-01T14:29:37.008065Z 0 [Note] [MY-000000] [Galera] Member 0(pxc1) responds to vote on 785a7007-c788-11ef-b9b1-979fe3ae7c4e:31,0000000000000000: Success
2025-01-01T14:29:37.008290Z 0 [Note] [MY-000000] [Galera] Votes over 785a7007-c788-11ef-b9b1-979fe3ae7c4e:31: 0000000000000000: 1/2 c486c89888ff9d45: 1/2 Winner: 0000000000000000
2025-01-01T14:29:37.008494Z 460 [ERROR] [MY-000000] [Galera] Inconsistency detected: Inconsistent by consensus on 785a7007-c788-11ef-b9b1-979fe3ae7c4e:31 at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.36/percona-xtradb-cluster-galera/galera/src/replicator_smm.cpp:process_apply_error():1462
2025-01-01T14:29:37.010077Z 460 [Note] [MY-000000] [Galera] Closing send monitor...
2025-01-01T14:29:37.010294Z 460 [Note] [MY-000000] [Galera] Closed send monitor.
2025-01-01T14:29:37.010438Z 460 [Note] [MY-000000] [Galera] gcomm: terminating thread
2025-01-01T14:29:37.010552Z 460 [Note] [MY-000000] [Galera] gcomm: joining thread
2025-01-01T14:29:37.010800Z 460 [Note] [MY-000000] [Galera] gcomm: closing backend
Is this expected behaviour?
If db2 doesn’t have the Percona Telemetry component installed and the command fails, it shouldn’t be replicated to other nodes.
Regards,
Alok