pg_stat_monitor hangs the primary instance and cannot be disabled

Description

The pg_stat_monitor extension is enabled unconditionally in the reconcile loop.
This makes it impossible to disable the extension when it causes serious issues:

https://github.com/percona/percona-postgresql-operator/blob/v2.2.0/internal/controller/postgrescluster/postgres.go#L244

All PostgreSQL backends are frozen with the following stack trace:
1 do_futex_wait.prop,__new_sem_wait_slow.prop.0,PGSemaphoreLock,LWLockAcquire,pgsm_store,pgsm_ExecutorEnd,PortalCleanup,PortalDrop,exec_simple_query,PostgresMain,ServerLoop,PostmasterMain,main

This makes the primary pod unready but does not cause the "database" container to be restarted, because Patroni passes the liveness check. As a result, the cluster loses servers one by one until it suffers a complete outage.
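The difference between the two probe results can be illustrated with a small Go model. This is only a sketch of the general Kubernetes probe semantics (a failed liveness probe leads to a container restart, a failed readiness probe only removes the pod from service endpoints). The function name probeOutcome and the returned strings are made up for illustration and are not part of the operator or of Kubernetes.

```go
package main

import "fmt"

// probeOutcome is a simplified, hypothetical model of how the kubelet
// reacts to probe results. It is not real kubelet or operator code.
func probeOutcome(livenessOK, readinessOK bool) string {
	switch {
	case !livenessOK:
		// A failed liveness probe is the only case that restarts the container.
		return "restart container"
	case !readinessOK:
		// A failed readiness probe only stops traffic; the container keeps running.
		return "remove pod from service endpoints, keep container running"
	default:
		return "serve traffic"
	}
}

func main() {
	// The situation in this report: Patroni answers the liveness check,
	// but the pod is not ready because the backends are blocked.
	fmt.Println(probeOutcome(true, false))
}
```

Under this model the hung primary stays in the second branch indefinitely, which matches the observed behavior: the container is never restarted on its own.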

Disabling pg_stat_monitor could be a workaround, but the operator reinstalls it in all databases on every pass of the reconcile loop, and there is no .spec parameter to disable this behavior.
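A minimal Go sketch of why the manual workaround does not stick, and of what a gated reconcile step might look like. Everything here is a simplified assumption: ClusterSpec, the DisablePGStatMonitor field, Database, and both reconcile functions are hypothetical names invented for illustration. They are not the operator's actual types or API, and no real SQL is executed.

```go
package main

import "fmt"

// ClusterSpec is a hypothetical, simplified stand-in for the cluster spec.
// DisablePGStatMonitor is NOT an existing operator field; it is invented
// here for illustration only.
type ClusterSpec struct {
	DisablePGStatMonitor bool
}

// Database models only the set of extensions installed in one database.
type Database struct {
	Name       string
	Extensions map[string]bool
}

// reconcileUnconditional mirrors the behavior described in this report:
// every pass ensures the extension exists in every database.
func reconcileUnconditional(dbs []*Database) {
	for _, db := range dbs {
		db.Extensions["pg_stat_monitor"] = true
	}
}

// reconcileGated consults the hypothetical spec flag before touching the
// extension, so an administrator's choice survives the next pass.
func reconcileGated(spec ClusterSpec, dbs []*Database) {
	for _, db := range dbs {
		if spec.DisablePGStatMonitor {
			delete(db.Extensions, "pg_stat_monitor")
			continue
		}
		db.Extensions["pg_stat_monitor"] = true
	}
}

func main() {
	db := &Database{Name: "app", Extensions: map[string]bool{"pg_stat_monitor": true}}
	dbs := []*Database{db}

	// An administrator drops the extension by hand as a workaround.
	delete(db.Extensions, "pg_stat_monitor")
	reconcileUnconditional(dbs)
	fmt.Println("after unconditional reconcile:", db.Extensions["pg_stat_monitor"])

	// With the hypothetical flag set, the extension stays removed.
	delete(db.Extensions, "pg_stat_monitor")
	reconcileGated(ClusterSpec{DisablePGStatMonitor: true}, dbs)
	fmt.Println("after gated reconcile:", db.Extensions["pg_stat_monitor"])
}
```

The first call puts the extension back right after it was dropped, which is the behavior reported above. The second call leaves it removed only because the assumed flag is set.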

Environment

None

AFFECTED CS IDs

CS0041658

Activity

Nickolay Ihalainen November 7, 2023 at 3:00 PM
Edited

Done

Details

Needs QA

Yes

Created November 7, 2023 at 12:41 PM
Updated March 8, 2024 at 2:10 PM
Resolved November 27, 2023 at 2:58 PM