Results from CHECK TABLE from PXC server can cause the client libraries to crash
- 05 May 2020, 08:18 AM
Noemi Lapresta July 15, 2021 at 1:25 PM
Verified fix for PXC 5.7.34-31.51. MTR test passes.
Venkatesh Prasad April 6, 2021 at 10:26 AM
Problem
-------
A PXC node can send malformed packets to the client, causing the client to fail
with the assertion
`!check_buffer || (vio_pending(net->vio) <= 1)' in net_clear().
Background
----------
For any query that sends a result set to the client, the server does the
following as per the classic protocol:
1. Sends result metadata specifying the number of fields in the result set,
i.e., `protocol->start_result_metadata()`.
2. Sends field metadata specifying the types of the fields in the result set,
i.e., `protocol->send_field_metadata()` followed by
`protocol->end_result_metadata()`.
3. Sends the actual row data (any data sent between `protocol->start_row()` and
`protocol->end_row()`).
4. Sends an OK/EOF packet indicating the end of the result set:
`protocol->send_eof()` / `protocol->send_ok()`, usually called from
`THD::send_statement_status()` at the end of `dispatch_command()`.
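The four steps above can be modelled as a packet sequence. This is only a minimal sketch: the `Packet` enum and `result_set_packets()` are hypothetical names made up for illustration and are not part of the server code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packet kinds, one per step of the list above.
// These names are illustrative only; they are not real server types.
enum class Packet { ResultMetadata, FieldMetadata, Row, EndOfResult };

// Build the packet sequence a client expects for a single result set.
std::vector<Packet> result_set_packets(std::size_t n_fields,
                                       std::size_t n_rows) {
  std::vector<Packet> out;
  out.push_back(Packet::ResultMetadata);  // step 1: number of fields
  for (std::size_t i = 0; i < n_fields; ++i)
    out.push_back(Packet::FieldMetadata);  // step 2: one per field
  for (std::size_t i = 0; i < n_rows; ++i)
    out.push_back(Packet::Row);  // step 3: row data
  out.push_back(Packet::EndOfResult);  // step 4: OK/EOF
  return out;
}
```

For example, a result set with two fields and one row yields five packets, beginning with `Packet::ResultMetadata` and ending with `Packet::EndOfResult`.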
Analysis
--------
When the server is running in autocommit mode and a query that sends a result
set to the client is BF aborted by a TOI or a high-priority transaction, the
`wsrep_retry_autocommit` mechanism comes into effect and the server retries the
autocommit query without returning the error to the client.
However, by the time the server retries the query, the client program may have
already received a partial result from the server and may be waiting for the
OK/EOF packet in order to report the result to the user (i.e., Steps 1-3 are
done and the client is waiting for Step 4).
In such a scenario, the retry makes the server execute Step 1 to Step 4 again,
and if the query execution is successful, it sends an OK/EOF packet at the end
to indicate that the query is complete. On the client side, however, this means
the client program receives unexpected result metadata in place of an OK/EOF
packet, which causes the client to error out with a Malformed packet error.
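The mismatch can be illustrated with a toy client-side state machine. This is a simplified sketch under stated assumptions: `Packet`, `ClientState` and `read_result_set()` are hypothetical names, and the real client tracks field counts and packet headers rather than abstract packet kinds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical packet kinds and client states (illustrative only).
enum class Packet { ResultMetadata, FieldMetadata, Row, EndOfResult };
enum class ClientState {
  ExpectResultMetadata,
  ExpectFieldMetadata,
  ExpectRowOrEnd,
  Done,
  Malformed
};

// Consume a packet stream the way a simplified client would.
ClientState read_result_set(const std::vector<Packet>& stream) {
  ClientState s = ClientState::ExpectResultMetadata;
  for (Packet p : stream) {
    switch (s) {
      case ClientState::ExpectResultMetadata:
        s = (p == Packet::ResultMetadata) ? ClientState::ExpectFieldMetadata
                                          : ClientState::Malformed;
        break;
      case ClientState::ExpectFieldMetadata:
        if (p == Packet::FieldMetadata) break;  // more field definitions
        if (p == Packet::Row) s = ClientState::ExpectRowOrEnd;
        else if (p == Packet::EndOfResult) s = ClientState::Done;
        else s = ClientState::Malformed;
        break;
      case ClientState::ExpectRowOrEnd:
        if (p == Packet::Row) break;  // more rows
        // Anything other than OK/EOF here is unexpected, including the
        // result metadata that a retried query sends again.
        s = (p == Packet::EndOfResult) ? ClientState::Done
                                       : ClientState::Malformed;
        break;
      case ClientState::Done:
      case ClientState::Malformed:
        return s;  // stop reading once the result set is finished or broken
    }
  }
  return s;
}
```

Feeding it Steps 1-3 of an aborted attempt followed by Steps 1-4 of the retry ends in `ClientState::Malformed`, while a single uninterrupted sequence ends in `ClientState::Done`.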
Note:
This issue was mostly seen with the CHECK TABLE query when its execution was
interrupted by a TOI. However, the issue is not seen with commands that run
in TOI. So we can infer that it can happen with any query that returns a result
set, cannot run in TOI, and can be killed by PXC.
Fix
---
Disable the `wsrep_retry_autocommit` mechanism for CHECK TABLE and report
ER_DEADLOCK_ERROR instead.
Note: For now, it is disabled only for CHECK TABLE; if the problem exists for
other commands, they can be added later.
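The decision described in the fix can be sketched as follows. This is not the actual patch: `Command`, `Action` and `on_bf_abort()` are hypothetical names, and the fallback to a deadlock error once the retries are used up is an assumption about how `wsrep_retry_autocommit` behaves.

```cpp
#include <cassert>

// Hypothetical command and action kinds (illustrative only).
enum class Command { Select, CheckTable };
enum class Action { RetryStatement, ReportDeadlockError };

// Decide what to do when an autocommit statement has been BF aborted.
Action on_bf_abort(Command cmd, unsigned retries_left) {
  // CHECK TABLE may already have sent part of its result set, so it is
  // never retried; the client gets ER_DEADLOCK_ERROR instead.
  if (cmd == Command::CheckTable) return Action::ReportDeadlockError;
  // Assumption: other statements are retried while retries remain.
  return retries_left > 0 ? Action::RetryStatement
                          : Action::ReportDeadlockError;
}
```

Further commands could be added to the first check later if they turn out to be affected in the same way.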
Venkatesh Prasad April 6, 2021 at 10:22 AM
mohit.joshi February 25, 2021 at 1:33 PM
Update: The crash is seen with both CHECK TABLE and ANALYZE TABLE.
The logs show that the number of rows returned by the query is negative:
ANALYZE TABLE tt_1_t rows:-1
CHECK TABLE tt_2_t rows:-1

While running pstress, a crash is seen with the following assertion failure:
pstress-pxc: /home/mohit.joshi/pxc-8.0/sql-common/net_serv.cc:237: void net_clear(NET*, bool): Assertion `!check_buffer || (vio_pending(net->vio) <= 1)' failed.
On further investigation, the asserted condition depends on the value returned by vio_pending():
/**
  Number of bytes in the read or socket buffer

  @remark An EOF condition might count as one readable byte.

  @return number of bytes in one of the buffers or < 0 if error.
*/
ssize_t vio_pending(Vio *vio) {
  uint bytes = 0;

  /* Data pending on the read buffer. */
  if (vio->read_pos < vio->read_end) return vio->read_end - vio->read_pos;

  /* Skip non-socket based transport types. */
  if (vio->type == VIO_TYPE_TCPIP || vio->type == VIO_TYPE_SOCKET) {
    /* Obtain number of readable bytes in the socket buffer. */
    if (socket_peek_read(vio, &bytes)) return -1;
  }

  return (ssize_t)bytes;
}
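To make the failing condition concrete, here is a small model of the check quoted in the assertion message. `net_clear_precondition_holds()` is a hypothetical helper written for illustration; only the boolean expression itself is taken from the report.

```cpp
#include <cassert>

// Models the asserted expression
//   !check_buffer || (vio_pending(net->vio) <= 1)
// with the result of vio_pending() passed in as pending_bytes.
bool net_clear_precondition_holds(bool check_buffer, long pending_bytes) {
  return !check_buffer || pending_bytes <= 1;
}
```

With `check_buffer` set, zero or one pending byte passes (an EOF condition might count as one readable byte), whereas several unread bytes left on the connection before the next command is sent violate the condition.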
The stack trace is pasted below:
#0 0x00007f827857c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f827857e02a in __GI_abort () at abort.c:89
#2 0x00007f8278574bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5e1588 "!check_buffer || (vio_pending(net->vio) <= 1)", file=file@entry=0x5e1518 "/home/mohit.joshi/pxc-8.0/sql-common/net_serv.cc", line=line@entry=237, function=function@entry=0x5e1790 <net_clear(NET*, bool)::__PRETTY_FUNCTION__> "void net_clear(NET*, bool)") at assert.c:92
#3 0x00007f8278574c82 in __GI___assert_fail (assertion=0x5e1588 "!check_buffer || (vio_pending(net->vio) <= 1)", file=0x5e1518 "/home/mohit.joshi/pxc-8.0/sql-common/net_serv.cc", line=237, function=0x5e1790 <net_clear(NET*, bool)::__PRETTY_FUNCTION__> "void net_clear(NET*, bool)") at assert.c:101
#4 0x00000000004da531 in net_clear (net=0x7f8254002d70, check_buffer=true) at /home/mohit.joshi/pxc-8.0/sql-common/net_serv.cc:237
#5 0x00000000004bcd61 in cli_advanced_command (mysql=0x7f8254002d70, command=COM_QUERY, header=0x0, header_length=0, arg=0x7f827472cb70 "COMMIT", arg_length=6, skip_check=true, stmt=0x0) at /home/mohit.joshi/pxc-8.0/sql-common/client.cc:1309
#6 0x00000000004ce5ad in mysql_send_query (mysql=0x7f8254002d70, query=0x7f827472cb70 "COMMIT", length=6) at /home/mohit.joshi/pxc-8.0/sql-common/client.cc:7307
#7 0x00000000004ce89c in mysql_real_query (mysql=0x7f8254002d70, query=0x7f827472cb70 "COMMIT", length=6) at /home/mohit.joshi/pxc-8.0/sql-common/client.cc:7355
#8 0x000000000047c5d2 in execute_sql(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Thd1*) ()
#9 0x00000000004959bf in Thd1::run_some_query() ()
#10 0x0000000000479700 in Node::workerThread(int) ()
#11 0x00007f8278ee8c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f82797df6ba in start_thread (arg=0x7f8274731700) at pthread_create.c:333
#13 0x00007f827864e41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Steps to reproduce:
1. Start a 3-node cluster manually.
2. Run pstress with default options:
./pstress-pxc --database=test --threads=10 --queries-per-thread=2147483647 --logdir=/dev/shm/790213/2/node1/ --user=root --socket=/dev/shm/790213/2/node1/node1_socket.sock --seed 790213 --step 1 --metadata-path /home/mohit.joshi/pxc_runs/790213/ --seconds 60 --tables 20 --records 400 --insert-row 100 --update-with-cond 100 --delete-with-cond 100 --log-all-queries --log-failed-queries