GRANT statement may be replicated in a wrong way if partial_revokes=1

Description

Description

This problem causes Galera cluster breaks replication on async node when GRANT statement is run remotely through TCP/IP because it damages or corrupts the binary log due to unable to detect propertly the CURRENT_USER value which results to be empty or null. However, on the node where you connect and run the GRANT, CURRENT_USER/CURRENT_USER() value reveals to be correct, but the rest of the nodes including async replica reveals to be corrupted.

 

Below is the scenario....

I have a 3-node Galera cluster with 1 async replica node. This is how it looks like:

+ pupnode57 (192.168.40.57)

+ pupnode58 (192.168.40.58) 

+ >>>>> pupnode60 (192.168.40.60)  (async replica)          

+ pupnode59 (192.168.40.59)

 

The point of this bug, whenever a GRANT statement is run remotely over TCP/IP it reveals that it fails to pass the CURRENT_USER/CURRENT_USER() logged-in on that originating node. This means that, binary log correctly writes the CURRENT_USER() value in that originating node but not when is synchronously and asynchronously written the binlog to the rest of the nodes (either a galera or async replica) it fails to detect the value of CURRENT_USER() result to a 'skip-grants user'@'skip-grants host' is used

For example, from running,

GRANT ALL PRIVILEGES ON *.* TO proxymaxdba23@'192.168.40.%' WITH GRANT OPTION

 

Binary/relay log writes:

GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'skip-grants user'@'skip-grants host' WITH ROLE NONE

 

The string skip-grants user and skip-grants host are placeholders whenever CURRENT_USER/CURRENT_USER() is empty or null from what I understand. See https://github.com/mysql/mysql-server/blob/ea1efa9822d81044b726aab20c857d5e1b7e046a/sql/auth/sql_security_ctx.h#L61 and https://github.com/mysql/mysql-server/blob/ea1efa9822d81044b726aab20c857d5e1b7e046a/sql/auth/sql_security_ctx.cc#L170 

Steps to Reproduce

  • Deploy a 3-node PXC galera. Current version i.e. 8.0.33-25.1 

  • Deploy an async replica node. Choose to whichever master you would like to use in the 3-node. For example, I choose pupnode58 (192.168.40.58) as my master of my pupnode60 (192.168.40.60) async replica in my environment when reproducing this bug.

  • Connect to ssh to any of your galera node and create user mytestuser@'%' through localhost

    mysql -uroot -p -P3306 -hlocalhost -e "CREATE USER mytestuser@'%' IDENTIFIED WITH caching_sha2_password BY 'admin'; GRANT ALL PRIVILEGES ON *.* TO mytestuser@'%' WITH GRANT OPTION;" -v
  • Run the following statements by connecting through TCP/IP as follows. However, at this time let's run on the non-master nodes of pupnode60 i.e. run only on pupnode57 (192.168.40.57) and pupnode59 (192.168.40.59). First, let's try to run in pupnode57 (192.168.40.57)

    export num=23; mysql -umytestuser -padmin -P3306 -h192.168.40.57 -e "CREATE USER proxymaxdba${num}@'192.168.40.%' IDENTIFIED WITH caching_sha2_password BY 'admin'; GRANT ALL PRIVILEGES ON *.* TO proxymaxdba${num}@'192.168.40.%' WITH GRANT OPTION;" -v 2>/dev/null
  • Then let's check the binary log within pupnode57,

    root@pupnode57:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows binlog.000028 |grep -i grant -C5|tail -10 SET @@SESSION.GTID_NEXT= 'a3a32317-46a5-11ee-9230-6e4f4e434910:524'/*!*/; # at 783 #230905 18:36:51 server id 8000 end_log_pos 994 CRC32 0xf6fea4be Query thread_id=26382 exec_time=0 error_code=0 Xid = 875 SET TIMESTAMP=1693939011/*!*/; GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'mytestuser'@'%' WITH ROLE NONE /*!*/; # at 994 ....

Because I run it on the node pupnode57 (192.168.40.57), the binary log write it correctly and was able to detect the CURRENT_USER()/CURRENT_USER value. Noticed the value inside the AS clause i.e. AS 'mytestuser'@'%'.

 

Now, let's check also the binlog to all the remaining nodes (pupnode58, pupnode59, and pupnode60):

root@pupnode58:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows binlog.000030 |grep -i grant -C5|tail -10 SET @@SESSION.GTID_NEXT= '5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:155'/*!*/; # at 1000 #230905 18:36:51 server id 8000 end_log_pos 1255 CRC32 0x05a1625c Query thread_id=26382 exec_time=0 error_code=0 Xid = 875 SET TIMESTAMP=1693939011/*!*/; GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'skip-grants user'@'skip-grants host' WITH ROLE NONE /*!*/; # at 1255 ... root@pupnode59:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows binlog.000034 |grep -i grant -C5|tail -10 SET @@SESSION.GTID_NEXT= 'a3a32317-46a5-11ee-9230-6e4f4e434910:524'/*!*/; # at 6104 #230905 18:36:51 server id 8000 end_log_pos 6359 CRC32 0x5b6ede42 Query thread_id=26382 exec_time=0 error_code=0 Xid = 875 SET TIMESTAMP=1693939011/*!*/; GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'skip-grants user'@'skip-grants host' WITH ROLE NONE /*!*/; ....

Lastly, pupnode60, let's check the relay log which is the replicated binary log from its master (pupnode58).

root@pupnode60:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows relay-bin.000002 |grep -i grant -C5 |tail -10 SET @@SESSION.GTID_NEXT= '5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:155'/*!*/; # at 1130 #230905 18:36:51 server id 8000 end_log_pos 1255 CRC32 0x05a1625c Query thread_id=26382 exec_time=0 error_code=0 Xid = 875 SET TIMESTAMP=1693939011/*!*/; GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'skip-grants user'@'skip-grants host' WITH ROLE NONE /*!*/; ....

As you can see, all the nodes except the originating nodes has the CURRENT_USER value stated correctly, while the rest reveals empty resulting to have 'skip-grants user'@'skip-grants host' as the value instead (default as stated in the code).

  • Because of that this cause the replication in pupnode60 to be broken. See the following output:

mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 192.168.40.58 Master_User: cmon_replication Master_Port: 3306 Connect_Retry: 10 Master_Log_File: binlog.000030 Read_Master_Log_Pos: 4737 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 717 Relay_Master_Log_File: binlog.000030 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 3836 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:155' at source log binlog.000030, end_log_pos 1255. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Skip_Counter: 0 Exec_Master_Log_Pos: 587 Relay_Log_Space: 5071 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: Yes Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 3836 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:155' at source log binlog.000030, end_log_pos 1255. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Replicate_Ignore_Server_Ids: Master_Server_Id: 8000 Master_UUID: 75d22a6b-4b8f-11ee-8fcb-0210bc02654d Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 230905 18:36:51 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:152-169 Executed_Gtid_Set: 5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:1-154, a3a32317-46a5-11ee-9230-6e4f4e434910:1-369 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) mysql> SELECT worker_id, last_error_number, last_error_message FROM performance_schema.replication_applier_status_by_worker\G *************************** 1. row ***************************          worker_id: 1  last_error_number: 3836 last_error_message: Worker 1 failed executing transaction '5c5cdce8-b95a-ee11-6dcf-91b0b1bcb6ef:155' at source log binlog.000030, end_log_pos 1255; Error 'Either some of the authorization IDs in the AS clause are invalid or the current user lacks privileges to execute the statement.' on query. Default database: ''. Query: 'GRANT ALL PRIVILEGES ON *.* TO 'proxymaxdba23'@'192.168.40.%' WITH GRANT OPTION AS 'skip-grants user'@'skip-grants host' WITH ROLE NONE' *************************** 2. row ***************************          worker_id: 2  last_error_number: 0 last_error_message: *************************** 3. row ***************************          worker_id: 3  last_error_number: 0 last_error_message: *************************** 4. row ***************************          worker_id: 4  last_error_number: 0 last_error_message: 4 rows in set (0.00 sec)

This means that whenever you do not run the GRANT statement on a non-master node of your async replica, it will always BREAK your replication. If you run the GRANT statement in a master-node of your replica this can be mitigated. However, this is again a problem if you have multiple replicas in every Galera node. For example, all your 3-galera nodes have its respective async replica nodes, for sure, one of the async replica will be fine but all the two nodes will fail because of the CURRENT_USER() bug value presence.

Actual Behavior

It fails to logged the CURRENT_USER() value.

Expected Behavior

It should properly logged the CURRENT_USER() value in the binary log  globally in the cluster, i.e. all involve database nodes in the cluster (synchronous and asynchronous)

 

Platform/OS and relevant software versions

PXC versions for all database nodes including the async replica:

$ mysqld --version /usr/sbin/mysqld  Ver 8.0.33-25.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3)

database nodes are using:

$ cat /etc/os-release|head -2 NAME="Ubuntu" VERSION="20.04.3 LTS (Focal Fossa)"

 

Environment

 

PXC versions for all database nodes including the async replica:

$ mysqld --version /usr/sbin/mysqld  Ver 8.0.33-25.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3)

database nodes are using:

$ cat /etc/os-release|head -2 NAME="Ubuntu" VERSION="20.04.3 LTS (Focal Fossa)"

 

Activity

Show:

Kamil Holubicki October 16, 2023 at 12:24 PM

Please note that for PXC cluster it is recommended that all nodes have the same configuration.

If we have 2-nodes cluster, and we enable partial_revokes only on node_2 and then do the above grant on node_1, it will result with skip-grants user to be binlogged by node_2. This is because node_2 (replica) executes queries from root user context.

The same situation we have when using standard, async replication node_1 -> node_2. If we set partial_revokes=1 on node_2 and execute grant on node_1, we will end up with skip-grant user in node_2's binlog.

In simple, when partial_revokes is enabled, the node rewrites the query adding explicit "AS ... WITH ROLE ..." clause.

Paul Namuag October 8, 2023 at 8:18 PM

Hi @kamil.holubicki,

 

That's a great update. Thank you for the update and great work. Will test it once released.

Kamil Holubicki October 4, 2023 at 6:41 PM
Edited

Hi ,

After considering all pros and cons, it was fixed by rewriting the query on the source node.

Paul Namuag September 8, 2023 at 5:15 PM

Hi ,

Thanks.

Although what you have said is not what is the intention of this bug report. The reason is that, we do not use AS clause nor in the example statements in the GRANT showed we use AS. But due to partial_revokes might be set or enabled by default for some reason (didn't dig on this), that cause the problem there. As what I have said, this is not the case with just using standard or async replication. It writes to the binlog and affected replicating nodes just fine for which was hoping it should also work the way it is with synchronous replication using Galra/wsrep plugin. The reason of that bug is causing breaking the replication because obviously that 'skip-grant user'@'skip-grant host' is obviously do not exist in mysql.users and which cannot be existent since there's no way a user ++ host combination with space can exist in mysql, and that cause the error or problem. It is now how to use AS clause as is.

Thank you for the feedback.

Kamil Holubicki September 8, 2023 at 12:35 PM

Hi ,

When using 'partial_revokes', please use AS clause explicitly when granting privileges on * . *

Analysis of the possible fix is in progress, but most probably the implicit usage will be disallowed (as it is for CURRENT_USER()) in future versions.

 

Done

Details

Assignee

Reporter

Needs Review

Yes

Needs QA

Yes

Priority

Smart Checklist

Created September 5, 2023 at 7:34 PM
Updated March 6, 2024 at 8:36 PM
Resolved October 16, 2023 at 12:19 PM