LP #1472779: core dump - Can't connect to MySQL server on '10.0.2.148' (111) at /usr/local/bin/pt-online-schema-change line 2262

Description

**Reported in Launchpad by Dave Gregg, last update 06-06-2017 04:17:54**

I have an unusual replication topology.
I am attempting to connect to a master that is also a slave (10.0.2.148).

I have a DSN table (percona.dsns) set up on 10.0.2.148 with a single entry: the slave, 10.0.2.147.

The replication chain is:

10.0.2.150 (grand master)
    |
    v
10.0.2.149 (master/slave, has replication filters)
    |
    v
10.0.2.148 (second master/slave)
    |
    v
10.0.2.147 (slave)

The server I am connecting to and running the schema change on (10.0.2.148) has one slave replicating from it: ASE2TestVM (10.0.2.147).

When I run with --dry-run, there are no issues. Debug mode does not show anything extra either.

When I run with --execute, it always ends the same way: a core dump, and the tool exits. Each time I get the message that it cannot connect to the MySQL server on '10.0.2.148' (111) at /usr/local/bin/pt-online-schema-change line 2262.
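The number in parentheses is the operating-system error from the failed TCP connect. On Linux, 111 is ECONNREFUSED, meaning nothing accepted a connection on that address and port when the tool opened a new connection to drop the triggers (for example, because mysqld on 10.0.2.148 was down or restarting at that moment). Below is a minimal Python sketch, not part of Percona Toolkit, for decoding the number and checking whether anything is listening. The helper names describe_errno and probe_tcp are hypothetical, and port 3306 is assumed as the MySQL default.

```python
import errno
import os
import socket


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name}: {os.strerror(code)}"


def probe_tcp(host: str, port: int = 3306, timeout: float = 2.0) -> str:
    """Attempt a bare TCP connection to host:port (3306 assumed as MySQL's port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepting connections"
    except ConnectionRefusedError as exc:
        # Nothing is listening on host:port right now.
        return f"refused ({describe_errno(exc.errno)})"
    except OSError as exc:
        # Timeouts, unreachable hosts, name-resolution failures, etc.
        return f"other error: {exc}"
```

On Linux, describe_errno(111) returns "111 = ECONNREFUSED: Connection refused". Running probe_tcp("10.0.2.148") right after a failed run would show whether the server was refusing connections at that point, which helps separate a server-side outage from a tool problem.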

Once this happens, I notice that the connection has dropped and that replication has stopped on 10.0.2.148 (its slave role).
The slave below that master/slave (10.0.2.147) is fine and is still replicating.

After the core dump, I have to clean up the triggers and the temporary table manually.
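Because the run aborted before the table swap (the log ends with "ATData.Retest was not altered"), the leftovers are the three triggers and the new table _Retest_new. Below is a minimal Python sketch for generating the cleanup statements. The function names are hypothetical, and the naming pattern is assumed from this report's log, so it may differ in other pt-online-schema-change versions. Check that _Retest_new really is the leftover copy before dropping it.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def pt_osc_cleanup_sql(db: str, table: str) -> list[str]:
    """Build cleanup statements for an aborted pt-online-schema-change run.

    ASSUMPTION: object names follow the pattern seen in this report's log:
    triggers pt_osc_<db>_<table>_{del,upd,ins} and new table _<table>_new.
    """
    statements = []
    # Drop the triggers first so writes to the original table stop touching the copy.
    for kind in ("del", "upd", "ins"):
        trigger = f"pt_osc_{db}_{table}_{kind}"
        statements.append(f"DROP TRIGGER IF EXISTS {quote_ident(db)}.{quote_ident(trigger)};")
    # The new table holds only a partial copy; review it before running this statement.
    new_table = f"_{table}_new"
    statements.append(f"DROP TABLE IF EXISTS {quote_ident(db)}.{quote_ident(new_table)};")
    return statements


if __name__ == "__main__":
    for stmt in pt_osc_cleanup_sql("ATData", "Retest"):
        print(stmt)
```

The three DROP TRIGGER lines match the statements the tool itself printed in the log; the extra DROP TABLE covers the temporary table, which the tool did not list.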

The command I am running is as follows:

>pt-online-schema-change --recursion-method=dsn=h=10.0.2.148,D=percona,t=dsns --nocheck-replication-filters --max-lag=10 --critical-load Threads_running=15 --alter "modify ExtraText1 varchar(256)" h=10.0.2.148,D=Data,t=Retest,u=root,p=pswd --execute

Found 1 slaves:
ASE2TestVM
Will check slave lag on:
ASE2TestVM
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `ATData`.`Retest`...
Creating new table...
Created new table ATData._Retest_new OK.
Altering new table...
Altered `ATData`.`_Retest_new` OK.
2015-07-08T15:28:46 Creating triggers...
2015-07-08T15:28:46 Created triggers OK.
2015-07-08T15:28:46 Copying approximately 5406492 rows...
2015-07-08T15:28:49 Dropping triggers...
2015-07-08T15:28:49 Error dropping trigger: DBI connect('ATData;host=10.0.2.148;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on '10.0.2.148' (111) at /usr/local/bin/pt-online-schema-change line 2262.

2015-07-08T15:28:49 Error dropping trigger: DBI connect('ATData;host=10.0.2.148;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on '10.0.2.148' (111) at /usr/local/bin/pt-online-schema-change line 2262.

2015-07-08T15:28:49 Error dropping trigger: DBI connect('ATData;host=10.0.2.148;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on '10.0.2.148' (111) at /usr/local/bin/pt-online-schema-change line 2262.

2015-07-08T15:28:49 To try dropping the triggers again, execute:
DROP TRIGGER IF EXISTS `ATData`.`pt_osc_ATData_Retest_del`;
DROP TRIGGER IF EXISTS `ATData`.`pt_osc_ATData_Retest_upd`;
DROP TRIGGER IF EXISTS `ATData`.`pt_osc_ATData_Retest_ins`;
`ATData`.`Retest` was not altered.
Segmentation fault (core dumped)
dgregg@Slave6vm:/usr/local/bin$

Environment

None


Activity


lpjirasync January 24, 2018 at 8:44 PM

**Comment from Launchpad by: Launchpad Janitor on: 06-06-2017 04:17:53**

[Expired for Percona Toolkit because there has been no activity for 60 days.]

lpjirasync January 24, 2018 at 8:44 PM

**Comment from Launchpad by: Sveta Smirnova on: 06-04-2017 21:04:21**

Dave, I assume you no longer see the issue after reinstalling the virtual machines, and that this was not really a pt-online-schema-change bug? Please confirm.

Thanks in advance.

lpjirasync January 24, 2018 at 8:44 PM

**Comment from Launchpad by: Frank Cizmich on: 06-08-2015 19:19:29**

Great to hear that!

The old "reinstall and/or reboot" engineering solution to "everything".

lpjirasync January 24, 2018 at 8:44 PM

**Comment from Launchpad by: Dave Gregg on: 04-08-2015 20:13:27**

Hey Frank,

I think we now have this working!! I reinstalled multiple VMs and verified we have no file system corruption. Looks good in our initial testing. Just giving you a heads up.

Thanks
Dave

lpjirasync January 24, 2018 at 8:44 PM

**Comment from Launchpad by: Dave Gregg on: 23-07-2015 19:55:54**

Just a quick FYI: the VMs we are working with do have some file system/disk corruption.

I am rebuilding the VMs so they are clean. I will let you know what this looks like next week.

Thanks ~ Dave

Cannot Reproduce

Details


Created January 24, 2018 at 8:43 PM
Updated February 4, 2018 at 1:12 AM
Resolved January 24, 2018 at 8:43 PM