missing error text on partly done restore from PBM side
Description
If something goes wrong during a restore, such as a pbm-agent crash on one of the nodes, PBM marks the restore as "partlyDone" and we mark it as "error", but we provide no meaningful error text for it.
From PBM:
[mongodb@my-cluster-name-rs0-0 db]$ /opt/percona/pbm describe-restore 2023-09-19T09:24:09.415068149Z --config /etc/pbm/pbm_config.yaml
name: "2023-09-19T09:24:09.415068149Z"
opid: 650968b9b2adf4b4ded22207
backup: "2023-09-18T19:58:06Z"
type: physical
status: partlyDone
last_transition_time: "2023-09-19T09:28:21Z"
replsets:
- name: rs0
  status: partlyDone
  last_transition_time: "2023-09-19T09:28:15Z"
  nodes:
  - name: my-cluster-name-rs0-2.my-cluster-name-rs0.test.svc.cluster.local:27017
    status: error
    error: 'Node lost. Last heartbeat: 1695115450'
    last_transition_time: "2023-09-19T09:24:30Z"
  - name: my-cluster-name-rs0-0.my-cluster-name-rs0.test.svc.cluster.local:27017
    status: done
    last_transition_time: "2023-09-19T09:26:36Z"
  - name: my-cluster-name-rs0-1.my-cluster-name-rs0.test.svc.cluster.local:27017
    status: done
    last_transition_time: "2023-09-19T09:26:44Z"
As can be seen, the restore failed on one node (my-cluster-name-rs0-2, with the error "Node lost") and completed on the other two.
Our restore object:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDBRestore","metadata":{"annotations":{},"name":"restore6","namespace":"test"},"spec":{"backupName":"backup1","clusterName":"my-cluster-name"}}
  creationTimestamp: "2023-09-19T09:21:38Z"
  generation: 1
  name: restore6
  namespace: test
  resourceVersion: "471036"
  uid: 5943b832-cd6d-4d87-87bb-4eef5b470ce2
spec:
  backupName: backup1
  clusterName: my-cluster-name
status:
  pbmName: "2023-09-19T09:24:09.415068149Z"
  state: error

$ k describe psmdb-restore restore6
Name:         restore6
Namespace:    test
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDBRestore
Metadata:
  Creation Timestamp:  2023-09-19T09:21:38Z
  Generation:          1
  Resource Version:    471036
  UID:                 5943b832-cd6d-4d87-87bb-4eef5b470ce2
Spec:
  Backup Name:   backup1
  Cluster Name:  my-cluster-name
Status:
  Pbm Name:  2023-09-19T09:24:09.415068149Z
  State:     error
Events:      <none>
It would be helpful if the restore status also carried a message saying that the restore was only partially done and that the user should check the PBM logs for details.
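To illustrate the kind of text that could be surfaced, here is a minimal sketch in Python (not the operator's actual code) that builds a message from a parsed describe-restore result. The function name `partly_done_message`, the message wording, and the dict layout are assumptions; the layout simply mirrors the fields visible in the PBM output above.

```python
def partly_done_message(restore: dict) -> str:
    """Return an error text for a partlyDone restore, or "" for any other status.

    `restore` is assumed to be the describe-restore output parsed into a dict
    (hypothetical input shape, modelled on the fields shown in this ticket).
    """
    if restore.get("status") != "partlyDone":
        return ""

    # Collect every node that PBM reports as failed, with its error text.
    failed = []
    for rs in restore.get("replsets", []):
        for node in rs.get("nodes", []):
            if node.get("status") == "error":
                reason = node.get("error", "unknown error")
                failed.append(f"{node.get('name', 'unknown node')}: {reason}")

    msg = "restore is partially done"
    if failed:
        msg += "; failed nodes: " + "; ".join(failed)
    return msg + "; check pbm-agent logs for details"


# Example input trimmed from the output in this ticket.
example = {
    "status": "partlyDone",
    "replsets": [
        {
            "name": "rs0",
            "status": "partlyDone",
            "nodes": [
                {
                    "name": "my-cluster-name-rs0-2.my-cluster-name-rs0.test.svc.cluster.local:27017",
                    "status": "error",
                    "error": "Node lost. Last heartbeat: 1695115450",
                },
                {
                    "name": "my-cluster-name-rs0-0.my-cluster-name-rs0.test.svc.cluster.local:27017",
                    "status": "done",
                },
            ],
        }
    ],
}

print(partly_done_message(example))
```

A string like this could be stored in an error field of the restore status next to `state: error`, so that `kubectl describe` shows why the restore is in that state. Where exactly the text would live is left open here.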