recovery ended before configured recovery target was reached

General

Escalation

General

Escalation

Description

I'm trying to restore the database from S3:

apiVersion: pg.percona.com/v1
kind: Pgtask
metadata:
  labels:
    pg-cluster: cluster1
    pgouser: admin
  name: cluster1-backrest-restore
  namespace: pgo
spec:
  name: cluster1-backrest-restore
  namespace: pgo
  parameters:
    backrest-restore-from-cluster: cluster1
    backrest-restore-opts: --type=time --target="2022-07-13 21:32:58" --no-repo1-storage-verify-tls
    backrest-storage-type: "s3"
    backrest-s3-verify-tls: "false"
  tasktype: restore

Restore fails without significant hint why:

2022-07-13 21:35:42.085 UTC [261] LOG:  redirecting log output to logging collector process
2022-07-13 21:35:42.085 UTC [261] HINT:  Future log output will appear in directory "pg_log".
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
2022-07-13 21:35:51,038 INFO: Lock owner: None; I am cluster1-bootstrap--1-wfcvb
2022-07-13 21:35:51,038 INFO: not healthy enough for leader race
/tmp:5432 - no response
2022-07-13 21:35:51,093 INFO: bootstrap in progress
2022-07-13 21:35:51,104 INFO: removing initialize key after failed attempt to bootstrap the cluster
2022-07-13 21:35:51,116 INFO: renaming data directory to /pgdata/cluster1_2022-07-13-21-35-51
Traceback (most recent call last):
  File "/usr/bin/patroni", line 11, in <module>
    load_entry_point('patroni==2.1.1', 'console_scripts', 'patroni')()
  File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 171, in main
    return patroni_main()
  File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 139, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 109, in run
    super(Patroni, self).run()
  File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 112, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1471, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1345, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1238, in post_bootstrap
    self.cancel_initialization()
  File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1231, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'

The next boostrap process shows infinite loop:

Thu Jul 14 01:20:51 UTC 2022 WARN: Detected an earlier failed attempt to initialize
Thu Jul 14 01:20:51 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again
Thu Jul 14 01:20:51 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51
Thu Jul 14 01:21:01 UTC 2022 WARN: Detected an earlier failed attempt to initialize
Thu Jul 14 01:21:01 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again
Thu Jul 14 01:21:01 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51

The error is actually hidden on postgresql pvc:
cluster1_2022-07-13-21-35-51/pg_log/postgresql-Wed.log contains:

2022-07-13 21:35:50.337 UTC [268] LOG:  restored log file "000000020000000000000011" from archive
2022-07-13 21:35:50.872 UTC [268] LOG:  redo done at 0/11000268 system usage: CPU: user: 0.00 s, system: 0.09 s, elapsed: 6.63 s
2022-07-13 21:35:50.872 UTC [268] LOG:  last completed transaction was at log time 2022-07-13 12:14:03.835829+00
2022-07-13 21:35:50.872 UTC [268] FATAL:  recovery ended before configured recovery target was reached
2022-07-13 21:35:50.875 UTC [261] LOG:  startup process (PID 268) exited with exit code 1
2022-07-13 21:35:50.875 UTC [261] LOG:  terminating any other active server processes
2022-07-13 21:35:50.877 UTC [261] LOG:  shutting down due to startup process failure
2022-07-13 21:35:50.899 UTC [261] LOG:  database system is shut down

Suggested fix:
a) Show full postgresql log boostrap fails
b) Describe this situation in documentation and mention recovery steps:
kubectl -n pgo delete pgtasks.pg.percona.com cluster1-backrest-restore
kubectl -n pgo delete job cluster1-bootstrap
kubectl -n pgo delete pvc cluster1

At this step the cluster is ready to apply restore with proper coordinates, because we have specified target restore time after latest restore time in backup repository, we can safely use restore without --type=time or --type=lsn and --target.

apiVersion: pg.percona.com/v1
kind: Pgtask
metadata:
  labels:
    pg-cluster: cluster1
    pgouser: admin
  name: cluster1-backrest-restore2
  namespace: pgo
spec:
  name: cluster1-backrest-restore2
  namespace: pgo
  parameters:
    backrest-restore-from-cluster: cluster1
    backrest-restore-opts: --no-repo1-storage-verify-tls
    backrest-storage-type: "s3"
    backrest-s3-verify-tls: "false"
  tasktype: restore

Environment

None

AFFECTED CS IDs

PS0001414

Smart Checklist

Activity

Show:

Details

Assignee

Unassigned

Reporter

Nickolay Ihalainen(Deactivated)

Labels

Priority

Medium

Smart Checklist

Created July 14, 2022 at 2:33 AM

Updated July 15, 2024 at 3:34 PM

Configure