No Certificate Rotation possible in PXC 8 Cluster

Description

We upgraded to a PXC 8 cluster and found a pretty critical issue.

It seems that an SSL certificate rotation is no longer possible without downtime of the whole cluster.

Since PXC 8 no longer supports wsrep_sst_auth, because it is not really secure, you introduced a mechanism that creates a new user whenever an SST/IST happens. This is quite good; however, according to "vadimtk" in your IRC channel, the authentication used to create this user is based on the SSL certificates. And according to this article (https://www.percona.com/doc/percona-xtradb-cluster/8.0/security/encrypt-traffic.html), the CA, certificate, and key are used for authentication and must be the same on all nodes.

 

However, if we need to change the SSL certificate (for reasons, see below), we have to stop all the Galera nodes at the same time, change the CA, certificate, and key on all nodes, and start all nodes again. I do not think this can be the intended solution!

 

There are many reasons for companies to change those certificates:

  • ISO 27001 certified companies rotate them regularly, normally every year

  • The certificate uses a weak key length or a weak elliptic curve

  • The key or the CA key may have been compromised

 

These are only three reasons; I think there are more.

 

I think removing the wsrep_sst_auth user was quite a good step; on the other hand, we previously had the option of putting this setting in the MySQL config and setting appropriate permissions on the config file.

 

For me, the most valuable solution for such a certificate rotation would be the ability to add multiple certificates and keys to each of the files (simply one after the other). If two are available on a node, the node uses whichever certificate works for the SST user creation. However, the newest certificate/key pair should be preferred for any connection, while the old one is still accepted.

During the rotation, I would only need to add the new certificate and key to all nodes and restart the nodes, which pull an IST on startup. Once this is done, I can remove the old cert/key and restart the nodes again, one after the other, to ensure the new cert/key is used.
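To make the "one after the other" file layout more concrete, here is a minimal Python sketch of how such a combined file could be assembled. It assumes PEM-encoded certificates, where several certificates can be concatenated in one file. Whether PXC 8 actually honors more than one certificate or key per file for SST is exactly the open question of this ticket, so this only illustrates the file format. The helper names `split_pem_certs` and `build_ca_bundle` are hypothetical and not part of PXC.

```python
PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def split_pem_certs(text):
    """Return every complete PEM certificate block found in `text`."""
    certs, current = [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == PEM_BEGIN:
            current = [line]              # start collecting a new block
        elif line == PEM_END and current is not None:
            current.append(line)
            certs.append("\n".join(current))
            current = None                # block is complete
        elif current is not None and line:
            current.append(line)          # base64 payload line
    return certs


def build_ca_bundle(pem_texts):
    """Concatenate certificates from several PEM texts, in the order given.

    Duplicates are dropped so that re-running the rotation step does not
    grow the file. Assumption: the consumer accepts concatenated PEM blocks.
    """
    seen, ordered = set(), []
    for text in pem_texts:
        for cert in split_pem_certs(text):
            if cert not in seen:
                seen.add(cert)
                ordered.append(cert)
    return "\n".join(ordered) + "\n"
```

For example, `build_ca_bundle([new_pem, old_pem])` would place the new certificate first, and once the rotation is complete the file can be rebuilt from `new_pem` alone.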

 

I hope I have made clear what I mean.

Environment

PXC 8 on centos 8

Activity

Julia Vural March 4, 2025 at 9:27 PM

It appears that this issue is no longer being worked on, so we are closing it for housekeeping purposes. If you believe the issue still exists, please open a new ticket after confirming it's present in the latest release.

Lalit Choudhary September 7, 2020 at 12:19 PM
Edited

There are two types of traffic: (1) Galera traffic and (2) SST traffic. In both cases, the CA file is used for authorization (only connections that present a certificate generated from the CA file are allowed).

To answer your question:

"Does that mean the joiner node tries to authenticate at the donor node with both available certificates while we are in the rotation process, or is only the used CA checked at this point?"

 

If they use the same CA, yes. At this point, all we need is the same CA file.

The receiver (server) side of the connection checks if the cert/key (from the sender/client) was generated from the CA file.  

We have a test for the same.

https://github.com/percona/percona-xtradb-cluster/blob/8.0/mysql-test/suite/galera/t/galera_ssl_upgrade.test
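
The receiver-side check described above can be modeled with a small Python simulation. This is only a sketch under simplifying assumptions: a certificate is reduced to the name of the CA that issued it, a CA file is reduced to a set of trusted CA names, and the three-phase order (trust both CAs, switch the certificate, drop the old CA) is an assumed procedure for illustration, not a documented PXC workflow. The names `Node`, `accepts`, `rolling_rotation`, and `naive_rotation` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cert_issuer: str                                # CA that signed this node's cert/key
    trusted_cas: set = field(default_factory=set)   # CAs present in this node's CA file


def accepts(receiver, sender):
    """Receiver-side check: was the sender's cert issued by a trusted CA?"""
    return sender.cert_issuer in receiver.trusted_cas


def cluster_connected(nodes):
    """True if every node accepts every other node, in both directions."""
    return all(accepts(a, b) for a in nodes for b in nodes if a is not b)


def rolling_rotation(nodes, old_ca, new_ca):
    """Change one node at a time through three phases.

    Returns False as soon as an intermediate state would leave two nodes
    unable to authenticate to each other.
    """
    phases = [
        lambda n: n.trusted_cas.add(new_ca),          # 1. trust old and new CA
        lambda n: setattr(n, "cert_issuer", new_ca),  # 2. switch to the new cert/key
        lambda n: n.trusted_cas.discard(old_ca),      # 3. stop trusting the old CA
    ]
    for phase in phases:
        for node in nodes:
            phase(node)
            if not cluster_connected(nodes):
                return False
    return True


def naive_rotation(nodes, old_ca, new_ca):
    """Swap CA and cert together on one node at a time, with no overlap."""
    for node in nodes:
        node.trusted_cas = {new_ca}
        node.cert_issuer = new_ca
        if not cluster_connected(nodes):
            return False
    return True
```

In this model, the naive swap fails on the first node because the remaining nodes do not yet trust the new CA, while the phased variant keeps every pair of nodes able to authenticate at each step.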

 

But I do see some gaps in the documentation, as well as in the overall process, that we can improve. So I'm converting this report into an improvement request.

Issue/Improvement:
Improvement: Test and come up with a process for upgrading SSL certificates on PXC nodes that also considers the SSL configuration changes for IST/SST.

Once this is done, update the documentation for the SSL certificate upgrade.

Documentation bug: Provide a certificate upgrade process that also includes the SSL configuration changes for IST/SST along with Galera replication traffic.

 

For documenting the overall certificate upgrade process, I think we should first upgrade the SSL certificates as described in the documentation for (1) Galera traffic and, once all nodes are up and running with the new certificates, change the [sst] config in my.cnf to use these new SSL certificates on all nodes.
 

Thomas bruckmann September 3, 2020 at 3:49 PM

I have one more question, then. We found this issue because we wondered how the new SST user creation in PXC 8 is authenticated, and we thought the SSL certificate is used for this. Does that mean the joiner node tries to authenticate at the donor node with both available certificates while we are in the rotation process, or is only the used CA checked at this point?

Lalit Choudhary September 3, 2020 at 2:29 PM

Sure, Thomas.

Thanks for the update.

Thomas bruckmann September 3, 2020 at 2:27 PM

I will; however, we currently have trouble with our prod cluster. Once this is repaired, I will test.

Won't Do

Created September 2, 2020 at 1:55 PM
Updated March 4, 2025 at 9:27 PM
Resolved March 4, 2025 at 9:27 PM