LP #1740459: replication delay too much on tokudb partition table

Description

**Reported in Launchpad by HolmerD last update 12-01-2018 02:40:17

Hi:
my slave is delay too much, the TPS and QPS very low, memory, io(fusion-io), cpu everything is ok.
analye the latest six binlog files, found 99% binlog event is abount table `abtest_monitor_fac`(partition table), event type is: Write_rows, Update_rows.

table message:
*************************** 1. row ***************************
Table: abtest_monitor_fac
Create Table: CREATE TABLE `abtest_monitor_fac` (
`day` int(11) NOT NULL COMMENT '20161022',
`time` bigint(11) NOT NULL COMMENT '201610221100',
`hash_id` bigint(20) unsigned NOT NULL COMMENT 'logic id',
`pv` bigint(20) NOT NULL COMMENT 'pv`',
`used_time` bigint(11) NOT NULL COMMENT ' ',
`uv` bigint(20) NOT NULL COMMENT 'uv`',
`last_update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '',
PRIMARY KEY (`time`,`hash_id`),
KEY `idx_day` (`day`),
KEY `idx_last_update_time` (`last_update_time`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8
/*!50500 PARTITION BY RANGE COLUMNS(`time`)
(PARTITION p20161028 VALUES LESS THAN (20161029000000) ENGINE = TokuDB,
........
PARTITION p20171230 VALUES LESS THAN (20171231000000) ENGINE = TokuDB) */

----------------------------------------------------

table_name

PARTITION_EXPRESSION

p_count

----------------------------------------------------

abtest_monitor_fac

`time`

429

----------------------------------------------------

-----------------------------------------------------------------------------

table_name

engine

table_rows

data_gb

index_gb

-----------------------------------------------------------------------------

abtest_monitor_fac

TokuDB

16723416943

763.579869242385

673.313777318224

-----------------------------------------------------------------------------

production environment: single master slave
version: Percona-Server-5.6.33

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Stepping: 4
CPU MHz: 3000.195
BogoMIPS: 5999.11
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-39

cpu used:
12:36:16 PM PID %usr %system %guest %CPU CPU Command
12:36:17 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:18 PM 5220 100.00 3.00 0.00 100.00 0 mysqld
12:36:19 PM 5220 100.00 3.00 0.00 100.00 2 mysqld
12:36:20 PM 5220 100.00 2.00 0.00 100.00 1 mysqld
12:36:21 PM 5220 100.00 2.00 0.00 100.00 11 mysqld
12:36:22 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:23 PM 5220 100.00 4.00 0.00 100.00 1 mysqld
12:36:24 PM 5220 100.00 4.00 0.00 100.00 11 mysqld
12:36:25 PM 5220 100.00 2.00 0.00 100.00 13 mysqld
12:36:26 PM 5220 100.00 5.00 0.00 100.00 10 mysqld
12:36:27 PM 5220 100.00 2.00 0.00 100.00 0 mysqld
12:36:28 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:29 PM 5220 100.00 11.00 0.00 100.00 12 mysqld
12:36:30 PM 5220 100.00 5.00 0.00 100.00 0 mysqld

memory used:
total used free shared buffers cached
Mem: 126 123 2 0 0 48
-/+ buffers/cache: 74 51
Swap: 31 0 31

perf record message:

  1. Overhead sys usr Samples Command Shared Object

  2. ........ ........ ........ ............ ....... ......................................................................................... ..................
    #
    15.89% 0.00% 15.89% 40054 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/bin/mysqld 0x4f5245
    |

    •  

      • bitmap_get_next_set
        |
        |--50.36%-- ha_partition::rnd_init(bool)
        | handler::ha_rnd_init(bool)
        | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
        | Rows_log_event::do_apply_event(Relay_log_info const*)
        | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
        | handle_slave_worker
        | start_thread
        |
        |--48.26%-- ha_partition::rnd_end()
        | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
        | Rows_log_event::do_apply_event(Relay_log_info const*)
        | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
        | handle_slave_worker
        | start_thread
        --1.38%-- [...]

7.30% 0.00% 7.30% 18401 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/lib/mysql/plugin/ha_tokudb.so 0x5a48e

 

— ha_tokudb::index_init(unsigned int, bool)

-98.76%- ha_tokudb::rnd_init(bool)

handler::ha_rnd_init(bool)

ha_partition::rnd_init(bool)

handler::ha_rnd_init(bool)

Rows_log_event::do_index_scan_and_update(Relay_log_info const*)

Rows_log_event::do_apply_event(Relay_log_info const*)

slave_worker_exec_job(Slave_worker*, Relay_log_info*)

handle_slave_worker

start_thread

-1.24%- handler::ha_rnd_init(bool)

ha_partition::rnd_init(bool)
handler::ha_rnd_init(bool)
Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
Rows_log_event::do_apply_event(Relay_log_info const*)
slave_worker_exec_job(Slave_worker*, Relay_log_info*)
handle_slave_worker
start_thread

6.21% 0.00% 6.21% 15657 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/lib/mysql/plugin/ha_tokudb.so 0xaf438

 

— toku_db_cursor_internal(__toku_db*, __toku_db_txn*, __toku_dbc*, unsigned int, int)

-98.99%- toku_db_cursor(__toku_db*, __toku_db_txn*, __toku_dbc**, unsigned int)

ha_tokudb::index_init(unsigned int, bool)

ha_tokudb::rnd_init(bool)

handler::ha_rnd_init(bool)

ha_partition::rnd_init(bool)

handler::ha_rnd_init(bool)

Rows_log_event::do_index_scan_and_update(Relay_log_info const*)

Rows_log_event::do_apply_event(Relay_log_info const*)

slave_worker_exec_job(Slave_worker*, Relay_log_info*)

handle_slave_worker

start_thread

-0.84%- ha_tokudb::index_init(unsigned int, bool)

ha_tokudb::rnd_init(bool)

handler::ha_rnd_init(bool)

ha_partition::rnd_init(bool)

handler::ha_rnd_init(bool)

Rows_log_event::do_index_scan_and_update(Relay_log_info const*)

Rows_log_event::do_apply_event(Relay_log_info const*)

slave_worker_exec_job(Slave_worker*, Relay_log_info*)

handle_slave_worker

start_thread
-0.17%- [...]

Is this a bug?
And how to fix it?

Environment

None

Smart Checklist

Activity

Show:

Lalit Choudhary September 28, 2018 at 1:23 PM

Hi HolmerD,

Thank you for report and update.

lpjirasync January 24, 2018 at 11:51 AM

**Comment from Launchpad by: HolmerD on: 12-01-2018 02:40:16

This bug fixed in 5.6.38, 5.7.20.
https://github.com/mysql/mysql-server/commit/9414c7035020a1915db56101aa7929d8b4069d83#diff-8c78bfc758be72b7df2306f1d9984c47

If slave_rows_search_algorithms is set to TABLE_SCAN,HASH_SCAN there is a performance improvement.

set global slave_rows_search_algorithms='TABLE_SCAN,HASH_SCAN';

Done

Details

Assignee

Reporter

Fix versions

Priority

Smart Checklist

Created January 24, 2018 at 11:51 AM
Updated September 28, 2018 at 1:24 PM
Resolved September 28, 2018 at 1:24 PM