recovery ended before configured recovery target was reached

Description

I'm trying to restore the database from S3:

apiVersion: pg.percona.com/v1 kind: Pgtask metadata: labels: pg-cluster: cluster1 pgouser: admin name: cluster1-backrest-restore namespace: pgo spec: name: cluster1-backrest-restore namespace: pgo parameters: backrest-restore-from-cluster: cluster1 backrest-restore-opts: --type=time --target="2022-07-13 21:32:58" --no-repo1-storage-verify-tls backrest-storage-type: "s3" backrest-s3-verify-tls: "false" tasktype: restore

Restore fails without significant hint why:

2022-07-13 21:35:42.085 UTC [261] LOG: redirecting log output to logging collector process 2022-07-13 21:35:42.085 UTC [261] HINT: Future log output will appear in directory "pg_log". /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections 2022-07-13 21:35:51,038 INFO: Lock owner: None; I am cluster1-bootstrap--1-wfcvb 2022-07-13 21:35:51,038 INFO: not healthy enough for leader race /tmp:5432 - no response 2022-07-13 21:35:51,093 INFO: bootstrap in progress 2022-07-13 21:35:51,104 INFO: removing initialize key after failed attempt to bootstrap the cluster 2022-07-13 21:35:51,116 INFO: renaming data directory to /pgdata/cluster1_2022-07-13-21-35-51 Traceback (most recent call last): File "/usr/bin/patroni", line 11, in <module> load_entry_point('patroni==2.1.1', 'console_scripts', 'patroni')() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 171, in main return patroni_main() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 139, in patroni_main abstract_main(Patroni, schema) File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main controller.run() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 109, in run super(Patroni, self).run() File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run self._run_cycle() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 112, in _run_cycle logger.info(self.ha.run_cycle()) File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1471, in run_cycle info = self._run_cycle() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1345, in _run_cycle return self.post_bootstrap() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1238, in post_bootstrap self.cancel_initialization() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1231, in cancel_initialization raise PatroniFatalException('Failed to bootstrap cluster') patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'

The next boostrap process shows infinite loop:

Thu Jul 14 01:20:51 UTC 2022 WARN: Detected an earlier failed attempt to initialize Thu Jul 14 01:20:51 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again Thu Jul 14 01:20:51 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51 Thu Jul 14 01:21:01 UTC 2022 WARN: Detected an earlier failed attempt to initialize Thu Jul 14 01:21:01 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again Thu Jul 14 01:21:01 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51

The error is actually hidden on postgresql pvc:
cluster1_2022-07-13-21-35-51/pg_log/postgresql-Wed.log contains:

2022-07-13 21:35:50.337 UTC [268] LOG: restored log file "000000020000000000000011" from archive 2022-07-13 21:35:50.872 UTC [268] LOG: redo done at 0/11000268 system usage: CPU: user: 0.00 s, system: 0.09 s, elapsed: 6.63 s 2022-07-13 21:35:50.872 UTC [268] LOG: last completed transaction was at log time 2022-07-13 12:14:03.835829+00 2022-07-13 21:35:50.872 UTC [268] FATAL: recovery ended before configured recovery target was reached 2022-07-13 21:35:50.875 UTC [261] LOG: startup process (PID 268) exited with exit code 1 2022-07-13 21:35:50.875 UTC [261] LOG: terminating any other active server processes 2022-07-13 21:35:50.877 UTC [261] LOG: shutting down due to startup process failure 2022-07-13 21:35:50.899 UTC [261] LOG: database system is shut down

Suggested fix:
a) Show full postgresql log boostrap fails
b) Describe this situation in documentation and mention recovery steps:
kubectl -n pgo delete pgtasks.pg.percona.com cluster1-backrest-restore
kubectl -n pgo delete job cluster1-bootstrap
kubectl -n pgo delete pvc cluster1

At this step the cluster is ready to apply restore with proper coordinates, because we have specified target restore time after latest restore time in backup repository, we can safely use restore without --type=time or --type=lsn and --target.

apiVersion: pg.percona.com/v1 kind: Pgtask metadata: labels: pg-cluster: cluster1 pgouser: admin name: cluster1-backrest-restore2 namespace: pgo spec: name: cluster1-backrest-restore2 namespace: pgo parameters: backrest-restore-from-cluster: cluster1 backrest-restore-opts: --no-repo1-storage-verify-tls backrest-storage-type: "s3" backrest-s3-verify-tls: "false" tasktype: restore

Environment

None

AFFECTED CS IDs

PS0001414

Smart Checklist

Activity

Show:

Details

Assignee

Reporter

Labels

Priority

Smart Checklist

Created July 14, 2022 at 2:33 AM
Updated July 15, 2024 at 3:34 PM