Issues


Pgbouncer server DNS lookup failed

Description

I deployed the Percona operator on a K8s cluster v1.23 following the 'Generic installation instructions' described [here|https://docs.percona.com/percona-operator-for-postgresql/kubernetes.html].
I checked the connectivity with the following command:

 

The connection gets stuck until it times out.
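For reference, here is a minimal sketch of the kind of connectivity check involved; the exact command did not survive the export, and the namespace, user, and secret names below are assumptions based on the default `cluster1` example from the docs:

```bash
# Hypothetical connectivity check through the pgbouncer service from a throwaway client pod.
# Names follow the default "cluster1" example deployment; adjust them to your cluster.
PGPASSWORD=$(kubectl -n <namespace> get secret cluster1-pguser-cluster1 \
  -o jsonpath='{.data.password}' | base64 -d)
kubectl -n <namespace> run -it --rm pg-client --image=postgres:16 \
  --env="PGPASSWORD=$PGPASSWORD" -- \
  psql -h cluster1-pgbouncer -U cluster1 -d cluster1
```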

It seems to be an issue related to the pgbouncer configuration, as mentioned in this discussion: link

By manually editing the pgbouncer configmap and putting the whole service FQDN (like xxx.yyy.svc.cluster.local) instead of only the hostname (cluster1 in the example), another pod in the same namespace can successfully connect to the service. Please have a look at the `pgbouncer-cm` snippet below.
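The original `pgbouncer-cm` snippet did not survive the export; what follows is a minimal, purely illustrative sketch of the kind of edit described above, assuming a namespace called `pgo` (the data key name and the exact [databases] entry may differ between operator versions):

```yaml
# Hypothetical illustration of the workaround: the [databases] target points at the
# full service FQDN instead of the bare service/host name.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster1-pgbouncer        # the "pgbouncer-cm" referenced above
data:
  pgbouncer.ini: |
    [databases]
    * = host=cluster1-primary.pgo.svc.cluster.local port=5432
```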

 

 

 

Environment

None

Details

Assignee

Reporter

Labels

Needs QA

Yes

Story Points

Affects versions

Priority

Smart Checklist

Created April 18, 2023 at 9:36 AM
Updated February 24, 2025 at 9:05 AM

Activity

Slava Sarzhan February 24, 2025 at 9:05 AM
Edited

This issue was fixed in PGO v2.6.0. We will have a release in one week. From your end, you can confirm the fix using the release branch or main, but please do not use it in production.

ege.gunes February 5, 2025 at 4:04 PM
Edited

Hi . Thank you very much for investigating this and documenting your findings, much appreciated!

I think it’d be great to officially document suggestions for “optimal” DNS configuration. Do you continue to see this problem after configuration changes in your DNS servers?

Artem Reznikov November 22, 2024 at 5:39 PM

Hello!

I finally found the root of the problem.

By default, pgbouncer’s DNS library (c-ares) discards queries that return SERVFAIL (and a couple of other codes; look for ARES_FLAG_NOCHECKRESP for details).

In our k8s cluster we have our own local DNS servers set on the nodes, and I found that when you try to

from a pod, you get a SERVFAIL response.
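A minimal sketch of the kind of lookup that can confirm this from inside the cluster (the debug image and names here are assumptions, not the exact command used above):

```bash
# Hypothetical check from a throwaway debug pod; the "status:" field in dig's
# header shows NOERROR vs SERVFAIL for the answer coming from the node's DNS.
kubectl -n <namespace> run -it --rm dns-check --image=nicolaka/netshoot -- \
  dig +search cluster1-pgbouncer
# c-ares (pgbouncer's resolver) silently discards SERVFAIL answers by default,
# which can leave pgbouncer failing even when other tools eventually succeed.
```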

When I changed one of the DNS servers to 8.8.8.8 on all k8s nodes, the problem started occurring much more rarely, and when I changed both DNS servers to 8.8.8.8/4.4.4.4, it disappeared completely.

That is not a complete solution, but I think there is no possible solution from your side; it’s an issue in pgbouncer, and I think the only complete solution is a proper configuration of the DNS servers.

Hope this information helps someone (it cost me several days of research).

Artem Reznikov September 24, 2024 at 7:25 PM

Hello!

I am able to reproduce that issue.

Kubernetes cluster v1.28.10, three nodes on a LAN, deployed by kubespray 2.24.2. Almost default configuration, just the helm and nginx-ingress addons enabled.

I have used this instruction: , PPGO image v2.4.1 from Docker Hub (hash 3531696a36188) and the pg-db Helm chart (main branch, commit 9c50b797 from )

There are a bunch of cluster services:

All of them are resolved normally in all of the cluster pods (DBMS, pgbouncer, client pod), both by short name and by FQDN; psql, dig +search, and nslookup work correctly.

But pgbouncer still can’t resolve svc cluster-pg-db-primary:

I have tried a lot of ppgo-pgbouncer images: versions 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.3.1, 2.4.1, 2.5.0 (pgbouncer 1.17, 1.18, 1.21, 1.22, 1.23) with the same result.

There is an issue in a pgbouncer GitHub thread (), where some of the maintainers mentioned that the problem may persist in pgbouncer built with the c-ares DNS variant.

I have noticed that all of the images use pgbouncer with c-ares as the DNS resolver:

Possibly that library has some issues.
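For reference, one way to check which resolver backend a given image was built with; the deployment name is an assumption and the exact output format varies between pgbouncer versions:

```bash
# Hypothetical check: recent pgbouncer builds report the compiled-in libraries
# (libevent, adns backend, TLS) with --version; an "adns: c-ares ..." line
# indicates the c-ares resolver discussed above.
kubectl -n <namespace> exec deploy/cluster1-pgbouncer -c pgbouncer -- pgbouncer --version
```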

Planned actions: rebuild the Docker image for the proxy with another version of pgbouncer.

Hope that helps. If more information is needed, please reply here.

ege.gunes August 28, 2024 at 9:14 AM

I need to note here that if you want to connect using the postgres user or any user with SUPERUSER privileges, you need to enable spec.proxy.pgBouncer.exposeSuperusers.
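A minimal sketch of where that flag lives in the PerconaPGCluster custom resource (the apiVersion and cluster name here are assumptions; only the spec.proxy.pgBouncer.exposeSuperusers path comes from the comment above):

```yaml
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
spec:
  proxy:
    pgBouncer:
      exposeSuperusers: true   # allow postgres / SUPERUSER roles to connect through pgbouncer
```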
