The PMM Server API (via /v1/readyz) now also returns Grafana status in addition to Prometheus status.
Activity

Alexey Palazhchenko April 20, 2020 at 9:41 AM
Merged into .0 branch.

C W January 8, 2020 at 12:00 PM
That's fine, so long as all services that should be in the RUNNING state under normal operation are checked.

Alexey Palazhchenko January 8, 2020 at 11:16 AM
Please rely only on the following:
response code 200 = container is ready;
any other response code, or no response at all = container is not ready.
Do not rely on other response codes, any response body (including empty JSON), etc. Supporting many failure modes requires an effort disproportionate to the benefits.

C W January 8, 2020 at 11:01 AM
We require /v1/readyz to confirm that everything is ready, not just Prometheus. In particular, Grafana needs to be monitored, as there is no clean way to check that we can interact with the API.
Also, stopping pmm-managed currently produces an HTML 500 response, so adding that check will presumably require NGINX adjustments to return a JSON 500 when requesting with Content-type: application/json.

Alexey Palazhchenko July 30, 2019 at 10:57 AM
For planning and prioritizing future work.
Description

Currently, the pmm-managed readiness API, /v1/readyz, checks only Prometheus status (and, indirectly, returns nothing if nginx, pmm-managed, or PostgreSQL is down). Managed services require a check for Grafana too.
DoD
/v1/readyz returns an error if Grafana is not ready (down, starting up, or shutting down).
Implementation
Check what the Grafana Health API returns when Grafana is starting up or shutting down.
Add a method to our Grafana client to access that API. We might need to inspect the response body for that, not only the status code.
Use that method in the readiness API.
Discussion
We are not checking `supervisorctl status` output (as the update mechanism does), as it is too brittle and a constant source of tricky update bugs.