LP #1113301: pt-online-schema-change blocks on metadata locks

Description

**Reported in Launchpad by Ovais Tariq last update 24-07-2015 08:28:05

  1. pt-online-schema-change --version
    pt-online-schema-change 2.1.8

The tool pt-online-schema-change when run against a MySQL 5.5 server can block indefinitely on metadata locks in certain conditions, rendering the table in accessible. The metadata locks need to be acquired at various stages of the online schema change, such as when creating triggers, or when renaming table, etc.

Please see the test cases below, they all involve the following table:
mysql [localhost] {msandbox} (test) > show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL,
`x` char(3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `x` (`x`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

– Test case for pt-online-schema-change blocking during trigger creation:
session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test;


------+

id

x


------+

2

bar

1

foo


------+
2 rows in set (0.00 sec)

  1. pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test > pt-osc.out 2> pt-osc.err

  2. /usr/bin/perl 5.010001

  3. Linux ovaistariq-net 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

  4. Arguments: [--alter] [add column c char(32) default 'dummy_test'] [--execute] [h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test]
    Altering `test`.`test`...
    Creating new table...
    Created new table test._test_new OK.
    Altering new table...
    Altered `test`.`_test_new` OK.
    Creating triggers... <-- blocks here

session2 > show processlist;


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Id

User

Host

db

Command

Time

State

Info


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

20

msandbox

localhost

test

Sleep

245

 

NULL

21

msandbox

localhost

test

Query

0

NULL

show processlist

24

msandbox

localhost

test

Query

115

Waiting for table metadata lock

CREATE TRIGGER `pt_osc_test_test_del` AFTER DELETE ON `test`.`test` FOR EACH ROW DELETE IGNORE FROM


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

– Test case for pt-online-schema-change blocking during the step where tables are swapped:
session1 > select sleep(3);start transaction;select * from test;
----------

sleep(3)

----------

0

----------
1 row in set (3.00 sec)

Query OK, 0 rows affected (0.00 sec)


------+

id

x


------+

2

bar

1

foo


------+
2 rows in set (0.00 sec)

[root@ovaistariq-net msb_5_5_29]# pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test
Altering `test`.`test`...
Creating new table...
Created new table test._test_new OK.
Altering new table...
Altered `test`.`_test_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 2 rows...
Copied rows OK.
Swapping tables... <-- blocks here

session1 > show processlist;


------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Id

User

Host

db

Command

Time

State

Info


------------------------------------------------------------------------------------------------------------------------------------------------------------------+

20

msandbox

localhost

test

Query

0

NULL

show processlist

30

msandbox

localhost

test

Query

209

Waiting for table metadata lock

RENAME TABLE `test`.`test` TO `test`.`_test_old`, `test`.`_test_new` TO `test`.`test`

31

msandbox

localhost

test

Query

4

Waiting for table metadata lock

select * from test


------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Indefinite metadata locking can be prevented by setting the MySQL session variable lock_wait_timeout to a low value, however in that case pt-osc will simply fail:

  1. pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --set-vars "lock_wait_timeout=5" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test > pt-osc.out 2> pt-osc.err

  2. /usr/bin/perl 5.010001

  3. Linux ovaistariq-net 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

  4. Arguments: [--alter] [add column c char(32) default 'dummy_test'] [--set-vars] [lock_wait_timeout=5] [--execute] [h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test]
    Altering `test`.`test`...
    Creating new table...
    Created new table test._test_new OK.
    Altering new table...
    Altered `test`.`_test_new` OK.
    Creating triggers...
    Dropping triggers...
    Dropped triggers OK.
    Dropping new table...
    Dropped new table OK.
    `test`.`test` was not altered.

The ideal solution would be to let the user specify lock_wait_timeout via another variable metadata_lock_wait_timeout, and let the user specify the number of retries to be done when metadata locks cannot be taken immediately via another variable metadat_lock_retries perhaps.

I am also wondering if the pt-osc variable --lock-wait-timeout should actually be called innodb-lock-wait-timeout, because from the perspective of MySQL lock-wait-timeout and innodb-lock-wait-timeout are two different variables, and this can be confusing.

Please see this blog post for details on MDL:
http://www.mysqlperformanceblog.com/2013/02/01/implications-of-metadata-locking-changes-in-mysql-5-5/

Environment

None

Smart Checklist

Activity

February 22, 2018 at 9:14 PM

Creating triggers still blocks on metadata locks for me with PT-OSC v3.0.6 and Mysql 5.6. Makes the tool much less useful for frequently accessed tables, even relatively small ones.

lpjirasync January 24, 2018 at 2:22 PM

**Comment from Launchpad by: Abdel-Mawla Gharieb on: 24-07-2015 08:28:05

Hi Daniel,

AFAIK, the potential metadata lock cases might happen in different stages with pt-osc (creating new table, altering new table, creating the triggers, swapping the tables, dropping old table, dropping the triggers).
Taking --tries into consideration, what will pt-osc tool do if the number of tries was reached in one of those stages?
For example:
1- Creating new table ... done
2- Altering new table ... done
3- Creating triggers ... failed after reaching the number of tries.

What is the case related to the newly created table? will it be dropped before exiting the tool?
1- if the table will be dropped this is another potential metadata lock issue?
2- if it will be kept - beside having unneeded table in the system - executing the tool to alter the same table again might cause trouble or fail.

Thanks,
Abdel-Mawla

lpjirasync January 24, 2018 at 2:22 PM

**Comment from Launchpad by: Daniel Nichter on: 13-12-2013 04:08:05

Sheyda, I think http://www.percona.com/forums/questions-discussions/percona-toolkit would be the best place to ask because Percona support staff watch the forms, as well as other non-Percona employee with experience with the tools. The short answer to your question is: there are no special options or anything in the tool to make it magically work on a very busy server. Maybe try really low lock wait timeouts and really high numbers of retries, but the tradeoff for doing this is that the tool make take a very long time to complete because it keeps being the victim of deadlocks and such. Maybe 5.6 online DDL is better in this case, but it's not perfect either. For example, it has no progress indicator so you're left wondering for a long time if it's working and when it will complete.

lpjirasync January 24, 2018 at 2:22 PM

**Comment from Launchpad by: sheyda amini on: 07-12-2013 00:02:06

Daniel,

I am going to use pt-online-schema-change to optimize a very large table which has lots of data deleted from. I did try it on a test server and seems to be working, but my worry is as in the past by inception of the triggers, the table got locked and in our environment which has high QPS , will generate an outage. Anything more to use this tool to avoid this scenario. I am running percona 5.5.34.

Thanks!

lpjirasync January 24, 2018 at 2:22 PM

**Comment from Launchpad by: sheyda amini on: 28-03-2013 15:26:35

Daniel, Thanks. At this point if I want to reproduce the incident, I have to create another outage, which the mgmt will not go for it.

I have rolled back, however I want to introduce it as an existing bug, since the fix has not made a difference in the behavior of

meta(locking) issues. We don't have support with percona and at this point I am not trying to reproduce the error.

This tool works for me just fine in my test server, but does not work with high write transactions in production.

Done

Details

Assignee

Reporter

Priority

Smart Checklist

Created January 24, 2018 at 2:22 PM
Updated February 22, 2018 at 9:14 PM
Resolved January 24, 2018 at 2:22 PM