Issues

Select view

Select search mode

 
31 of 31

Audit plug memory leak

Duplicate

Description

After three months of uptime our server crashed with out of memory logging

2023-01-16T04:49:48.537035Z 18 [ERROR] /usr/sbin/mysqld: Out of memory (Needed 3221225528 bytes) 04:49:48 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. Please help us make Percona Server better by reporting any bugs at https://bugs.percona.com/ ... Writing a core file ... 602023-01-16T06:00:33.264344Z 0 [Note] /usr/sbin/mysqld: ready for connections. 61Version: '5.7.29-32-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 32, Revision 56bce88

Our core file investigation found that the ~3GB memory allocation request came from the audit plugin

#0 0x00007f710db05aa1 in pthread_kill () from /lib64/libpthread.so.0 #1 0x0000000000d50a75 in handle_fatal_signal (sig=11) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-2/sql/signal_handler.cc:230 #2 <signal handler called> #3 0x00007f710bc5ea5d in __memset_sse2 () from /lib64/libc.so.6 #4 0x00007f706faace1e in memset (__len=3221225472, __ch=0, __dest=0x7f6dfaff6048) at /usr/include/bits/string3.h:84 #5 realloc_stack_frames (thd=thd@entry=0x7f70180008c0, size=134217728) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/plugin/audit_log/audit_log.c:2037 #6 0x00007f706faaec23 in audit_log_update_thd_local (event=0x7f70580bf630, event_class=4, local=0x7f7018011e98, thd=0x7f70180008c0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/plugin/audit_log/audit_log.c:1305 #7 audit_log_notify (thd=0x7f70180008c0, event_class=MYSQL_AUDIT_TABLE_ACCESS_CLASS, event=0x7f70580bf630) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/plugin/audit_log/audit_log.c:1336 #8 0x0000000000d52198 in plugins_dispatch (arg=<synthetic pointer>, plugin=<optimized out>, thd=0x7f70180008c0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_audit.cc:1392 #9 event_class_dispatch (event=0x7f70580bf630, event_class=MYSQL_AUDIT_TABLE_ACCESS_CLASS, thd=0x7f70180008c0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_audit.cc:1432 #10 event_class_dispatch_error (event=0x7f70580bf630, event_name=0x13b6650 "MYSQL_AUDIT_TABLE_ACCESS_UPDATE", event_class=MYSQL_AUDIT_TABLE_ACCESS_CLASS, thd=0x7f70180008c0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_audit.cc:1454 #11 mysql_audit_notify (thd=0x7f70180008c0, subclass=<optimized out>, subclass_name=0x13b6650 "MYSQL_AUDIT_TABLE_ACCESS_UPDATE", table=<optimized out>) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_audit.cc:636 #12 0x0000000000d52349 in mysql_audit_table_access_notify (thd=thd@entry=0x7f70180008c0, table=0x7f6fc0c73730) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_audit.cc:715 #13 0x0000000000c18864 in open_tables (thd=thd@entry=0x7f70180008c0, start=start@entry=0x7f70580bf848, counter=<optimized out>, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f70580bf850) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_base.cc:5969 #14 0x0000000000c190aa in open_tables_for_query (thd=thd@entry=0x7f70180008c0, tables=tables@entry=0x7f6fc0c73730, ---Type <return> to continue, or q <return> to quit--- flags=flags@entry=0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_base.cc:6696 #15 0x0000000000cfd581 in Sql_cmd_update::try_single_table_update (this=this@entry=0x7f6fc0c19598, thd=thd@entry=0x7f70180008c0, switch_to_multitable=switch_to_multitable@entry=0x7f70580bf950) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_update.cc:2905 #16 0x0000000000cfda06 in Sql_cmd_update::execute (this=0x7f6fc0c19598, thd=0x7f70180008c0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_update.cc:3052 #17 0x0000000000c6e72d in mysql_execute_command (thd=thd@entry=0x7f70180008c0, first_level=first_level@entry=true) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_parse.cc:3842 #18 0x0000000000c73b75 in mysql_parse (thd=0x7f70180008c0, parser_state=parser_state@entry=0x7f70580c0af0, update_userstat=update_userstat@entry=true) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/sql_parse.cc:5927 #19 0x0000000000e87cd5 in Query_log_event::do_apply_event (this=0x7f702402caf0, rli=0x7f7024161360, query_arg=0x7f702402ba35 "/* ch=latin1 */ UPDATE account_452_multi_channel_transcripts_export SET status=if(error_count>2400, 1, status), transfer_started_ts=1673843913897, last_error_ts = 1673843913903, error_count=error_cou"..., q_len_arg=<optimized out>) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/log_event.cc:4923 #20 0x0000000000eedd87 in slave_worker_exec_job_group (worker=0x7f7024161360, rli=0xee8a050) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/rpl_rli_pdb.cc:2759 #21 0x0000000000ed03bb in handle_slave_worker (arg=arg@entry=0x7f7024161360) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/rpl_slave.cc:6309 #22 0x0000000000f30fa4 in pfs_spawn_thread (arg=0x7f70241651f0) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/storage/perfschema/pfs.cc:2198 #23 0x00007f710db00ea5 in start_thread () from /lib64/libpthread.so.0 #24 0x00007f710bccd9fd in clone () from /lib64/libc.so.6

It appears that audit plugin tried to reallocate the stack doubling its size (from 64M elements to 128M elements)

(gdb) frame 5 #5 realloc_stack_frames (thd=thd@entry=0x7f70180008c0, size=134217728) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/plugin/audit_log/audit_log.c:2037 2037 memset(stack, 0, size * sizeof(query_stack_frame)); (gdb) list 2032 if (local->stack.size > 0) 2033 memcpy(buf, stack, local->stack.size * sizeof(query_stack_frame)); 2034 THDVAR_SET(thd, query_stack, 2035 buf + local->stack.size * sizeof(query_stack_frame)); 2036 stack= (query_stack_frame *) THDVAR(thd, query_stack); 2037 memset(stack, 0, size * sizeof(query_stack_frame)); 2038 if (local->stack.size > 0) 2039 memcpy(stack, buf, local->stack.size * sizeof(query_stack_frame)); 2040 local->stack.frames= stack; 2041 local->stack.size= size; (gdb) print sizeof(query_stack_frame) $2 = 24 (gdb) print size $3 = 134217728

here size = 128 * 1024 * 1024. As a result, the audit plugin tried to allocate ~3 GB of memory:

(size=128 * 1024 * 1024) * (sizeof stack element=24) = 3221225472

which is close to the error log entry: Out of memory (Needed 3221225528 bytes).{{{}{}}}

This issue looks specific to our audit log configuration in our /etc/my.cnf file

# Audit Plugin plugin-load="audit_log=audit_log.so;auth_pam=auth_pam.so" audit_log_policy = LOGINS audit_log_format = CSV audit_log_rotate_on_size = 1GB audit_log_rotations = 3 audit_log_strategy = PERFORMANCE audit_log_policy = LOGINS audit_log_format = CSV

Having configuration with audit_log_strategy = PERFORMANCE is supposed to drop events if unable to write into file. The source code indicates that the realloc_stack_frames is called only if (event_class == MYSQL_AUDIT_TABLE_ACCESS_CLASS). However, our current log policy is set to LOGINS, which in turn should ignore all such table access events. 

One possible solution could be to move the lines calling is_event_class_allowed_by_policy before the audit_log_update_thd_local call in the audit_log_notify  code:

https://github.com/percona/percona-server/blob/Percona-Server-5.7.29-32/plugin/audit_log/audit_log.c#L1325

 

Environment

CentOS

Details

Assignee

Reporter

Needs QA

Yes

Affects versions

Priority

Smart Checklist

Created February 2, 2023 at 11:38 PM
Updated March 6, 2024 at 9:49 AM
Resolved February 8, 2023 at 5:40 PM

Activity

Show:

Lenz Grimmer February 8, 2023 at 5:40 PM

Thank you for the feedback! We'll resolve this issue as a duplicate of https://perconadev.atlassian.net/browse/PS-7203#icft=PS-7203 for now.
Please re-open this issue if you can still reproduce this behavior after updating to 5.7.31.

Sasha February 3, 2023 at 3:24 PM

Indeed, the reported crash happen on our STATEMENT-based replication slave configured with

binlog_format = STATEMENT master_info_repository = 'TABLE' relay_log_info_repository = 'TABLE' slave_parallel_workers = 5

The master has mysql 5.6 release that does not exhibit this memory leak issue. We will test the https://perconadev.atlassian.net/browse/PS-7203#icft=PS-7203 fix.

Marcelo Altmann February 3, 2023 at 12:32 PM

Hi . Thanks for your complete report and analyzes.

#21 0x0000000000ed03bb in handle_slave_worker (arg=arg@entry=0x7f7024161360) at /usr/src/debug/percona-server-5.7.29-32/percona-server-5.7.29-32/sql/rpl_slave.cc:6309

Does the issue happen only on replicas or also on source?

Your version is affected by https://jira.percona.com/browse/PS-7203 which what you are reporting seems to be related (its happening on slave, and you are filtering events).

If possible, can you test one of the replicas running on version 5.7.31 and validate that the problem still occurs or its fixed by https://perconadev.atlassian.net/browse/PS-7203#icft=PS-7203 ?

Thanks.

Lenz Grimmer February 3, 2023 at 8:24 AM

Thank you for this very detailed bug report!