Atlassian uses cookies to improve your browsing experience, perform analytics and research, and conduct advertising. Accept all cookies to indicate that you agree to our use of cookies on your device. Atlassian cookies and tracking notice, (opens new window)
2022-07-13 21:35:42.085 UTC [261] LOG: redirecting log output to logging collector process
2022-07-13 21:35:42.085 UTC [261] HINT: Future log output will appear in directory "pg_log".
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
2022-07-13 21:35:51,038 INFO: Lock owner: None; I am cluster1-bootstrap--1-wfcvb
2022-07-13 21:35:51,038 INFO: not healthy enough for leader race
/tmp:5432 - no response
2022-07-13 21:35:51,093 INFO: bootstrap in progress
2022-07-13 21:35:51,104 INFO: removing initialize key after failed attempt to bootstrap the cluster
2022-07-13 21:35:51,116 INFO: renaming data directory to /pgdata/cluster1_2022-07-13-21-35-51
Traceback (most recent call last):
File "/usr/bin/patroni", line 11, in <module>
load_entry_point('patroni==2.1.1', 'console_scripts', 'patroni')()
File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 171, in main
return patroni_main()
File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 139, in patroni_main
abstract_main(Patroni, schema)
File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main
controller.run()
File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 109, in run
super(Patroni, self).run()
File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run
self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 112, in _run_cycle
logger.info(self.ha.run_cycle())
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1471, in run_cycle
info = self._run_cycle()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1345, in _run_cycle
return self.post_bootstrap()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1238, in post_bootstrap
self.cancel_initialization()
File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1231, in cancel_initialization
raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
The next boostrap process shows infinite loop:
Thu Jul 14 01:20:51 UTC 2022 WARN: Detected an earlier failed attempt to initialize
Thu Jul 14 01:20:51 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again
Thu Jul 14 01:20:51 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51
Thu Jul 14 01:21:01 UTC 2022 WARN: Detected an earlier failed attempt to initialize
Thu Jul 14 01:21:01 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again
Thu Jul 14 01:21:01 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51
The error is actually hidden on postgresql pvc: cluster1_2022-07-13-21-35-51/pg_log/postgresql-Wed.log contains:
2022-07-13 21:35:50.337 UTC [268] LOG: restored log file "000000020000000000000011" from archive
2022-07-13 21:35:50.872 UTC [268] LOG: redo done at 0/11000268 system usage: CPU: user: 0.00 s, system: 0.09 s, elapsed: 6.63 s
2022-07-13 21:35:50.872 UTC [268] LOG: last completed transaction was at log time 2022-07-13 12:14:03.835829+00
2022-07-13 21:35:50.872 UTC [268] FATAL: recovery ended before configured recovery target was reached
2022-07-13 21:35:50.875 UTC [261] LOG: startup process (PID 268) exited with exit code 1
2022-07-13 21:35:50.875 UTC [261] LOG: terminating any other active server processes
2022-07-13 21:35:50.877 UTC [261] LOG: shutting down due to startup process failure
2022-07-13 21:35:50.899 UTC [261] LOG: database system is shut down
Suggested fix: a) Show full postgresql log boostrap fails b) Describe this situation in documentation and mention recovery steps: kubectl -n pgo delete pgtasks.pg.percona.com cluster1-backrest-restore kubectl -n pgo delete job cluster1-bootstrap kubectl -n pgo delete pvc cluster1
At this step the cluster is ready to apply restore with proper coordinates, because we have specified target restore time after latest restore time in backup repository, we can safely use restore without --type=time or --type=lsn and --target.
I'm trying to restore the database from S3:
apiVersion: pg.percona.com/v1 kind: Pgtask metadata: labels: pg-cluster: cluster1 pgouser: admin name: cluster1-backrest-restore namespace: pgo spec: name: cluster1-backrest-restore namespace: pgo parameters: backrest-restore-from-cluster: cluster1 backrest-restore-opts: --type=time --target="2022-07-13 21:32:58" --no-repo1-storage-verify-tls backrest-storage-type: "s3" backrest-s3-verify-tls: "false" tasktype: restore
Restore fails without significant hint why:
2022-07-13 21:35:42.085 UTC [261] LOG: redirecting log output to logging collector process 2022-07-13 21:35:42.085 UTC [261] HINT: Future log output will appear in directory "pg_log". /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections /tmp:5432 - rejecting connections 2022-07-13 21:35:51,038 INFO: Lock owner: None; I am cluster1-bootstrap--1-wfcvb 2022-07-13 21:35:51,038 INFO: not healthy enough for leader race /tmp:5432 - no response 2022-07-13 21:35:51,093 INFO: bootstrap in progress 2022-07-13 21:35:51,104 INFO: removing initialize key after failed attempt to bootstrap the cluster 2022-07-13 21:35:51,116 INFO: renaming data directory to /pgdata/cluster1_2022-07-13-21-35-51 Traceback (most recent call last): File "/usr/bin/patroni", line 11, in <module> load_entry_point('patroni==2.1.1', 'console_scripts', 'patroni')() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 171, in main return patroni_main() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 139, in patroni_main abstract_main(Patroni, schema) File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main controller.run() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 109, in run super(Patroni, self).run() File "/usr/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run self._run_cycle() File "/usr/lib/python3.6/site-packages/patroni/__init__.py", line 112, in _run_cycle logger.info(self.ha.run_cycle()) File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1471, in run_cycle info = self._run_cycle() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1345, in _run_cycle return self.post_bootstrap() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1238, in post_bootstrap self.cancel_initialization() File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1231, in cancel_initialization raise PatroniFatalException('Failed to bootstrap cluster') patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
The next boostrap process shows infinite loop:
Thu Jul 14 01:20:51 UTC 2022 WARN: Detected an earlier failed attempt to initialize Thu Jul 14 01:20:51 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again Thu Jul 14 01:20:51 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51 Thu Jul 14 01:21:01 UTC 2022 WARN: Detected an earlier failed attempt to initialize Thu Jul 14 01:21:01 UTC 2022 INFO: Correct the issue, remove '/pgdata/cluster1.initializing', and try again Thu Jul 14 01:21:01 UTC 2022 INFO: Your data might be in: /pgdata/cluster1_2022-07-13-21-35-51
The error is actually hidden on postgresql pvc:
cluster1_2022-07-13-21-35-51/pg_log/postgresql-Wed.log contains:
2022-07-13 21:35:50.337 UTC [268] LOG: restored log file "000000020000000000000011" from archive 2022-07-13 21:35:50.872 UTC [268] LOG: redo done at 0/11000268 system usage: CPU: user: 0.00 s, system: 0.09 s, elapsed: 6.63 s 2022-07-13 21:35:50.872 UTC [268] LOG: last completed transaction was at log time 2022-07-13 12:14:03.835829+00 2022-07-13 21:35:50.872 UTC [268] FATAL: recovery ended before configured recovery target was reached 2022-07-13 21:35:50.875 UTC [261] LOG: startup process (PID 268) exited with exit code 1 2022-07-13 21:35:50.875 UTC [261] LOG: terminating any other active server processes 2022-07-13 21:35:50.877 UTC [261] LOG: shutting down due to startup process failure 2022-07-13 21:35:50.899 UTC [261] LOG: database system is shut down
Suggested fix:
a) Show full postgresql log boostrap fails
b) Describe this situation in documentation and mention recovery steps:
kubectl -n pgo delete pgtasks.pg.percona.com cluster1-backrest-restore
kubectl -n pgo delete job cluster1-bootstrap
kubectl -n pgo delete pvc cluster1
At this step the cluster is ready to apply restore with proper coordinates, because we have specified target restore time after latest restore time in backup repository, we can safely use restore without --type=time or --type=lsn and --target.
apiVersion: pg.percona.com/v1 kind: Pgtask metadata: labels: pg-cluster: cluster1 pgouser: admin name: cluster1-backrest-restore2 namespace: pgo spec: name: cluster1-backrest-restore2 namespace: pgo parameters: backrest-restore-from-cluster: cluster1 backrest-restore-opts: --no-repo1-storage-verify-tls backrest-storage-type: "s3" backrest-s3-verify-tls: "false" tasktype: restore