The PMM Server API (via /v1/readyz) now also returns Grafana status information in addition to that for Prometheus.

Description

Currently, the pmm-managed readiness API (/v1/readyz) checks only Prometheus status (and, indirectly, returns nothing if nginx, pmm-managed, or PostgreSQL is down). Managed services also require a check for Grafana.

DoD

/v1/readyz returns an error if Grafana is not ready (down, starting up, or shutting down).

Implementation

Check what the Grafana Health API returns when Grafana is starting up or shutting down.
Add a method to our Grafana client to access that API. We might need to inspect the response body for that, not only the status code.
Use that method in the readiness API.

Discussion

We are not checking `supervisorctl status` output (as used by the update mechanism) because it is too brittle and a constant source of tricky update bugs.

How to test

None

How to document

None

Linked work items

causes

PMM-2004: Add /ping alias to nginx

relates to

PMM-4379: Update APIs stubs

Activity

Alexey Palazhchenko
April 20, 2020 at 9:41 AM

Merged into .0 branch.

C W
January 8, 2020 at 12:00 PM

that's fine, so long as all services that should be in the RUNNING state under normal operations are checked

Alexey Palazhchenko
January 8, 2020 at 11:16 AM

Please rely only on:

response code is 200 = container is ready;
any other response code or no code at all = container is not ready.

Do not rely on other response codes, any response body (including empty JSON), etc. Supporting many failure modes requires an amount of effort disproportionate to the benefits.

C W
January 8, 2020 at 11:01 AM

We require /v1/readyz to confirm that everything is ready, not just Prometheus. In particular, Grafana needs to be monitored, as there is otherwise no clean way to check that we can interact with the API.

Also, you currently get an HTML 500 response by stopping pmm-managed, so adding that will presumably require NGINX adjustments to return a JSON 500 when requesting with Content-type: application/json

Alexey Palazhchenko
July 30, 2019 at 10:57 AM

To plan / prioritize future work.

Done

Details

Assignee: Unassigned
Reporter: Tim Vaillancourt (Deactivated)
Priority: High
Components:
Labels: msp_blocker
Needs QA: Yes
Needs Doc:
Fix versions: 2.06.0
Story Points:
Sprint: None
Smart Checklist Progress: 0/1

Created January 23, 2018 at 2:34 PM

Updated April 22, 2025 at 7:12 AM

Resolved April 22, 2020 at 9:34 AM
