mysql_up fluctuates against MariaDB 10.1

Description

Of all the MySQL hosts in PMM Demo, only mdb101 has a fluctuating mysql_up value. Is there something about MariaDB 10.1 that is causing this? Can we please investigate, as this may be affecting other PMM users running MariaDB? Thanks.

https://pmmdemo.percona.com/graph/dashboard/db/prometheus-exporter-status?refresh=1m&panelId=40&fullscreen&orgId=1&var-interval=$__auto_interval&var-host=mdb101

 

https://pmmdemo.percona.com/prometheus/graph?g0.range_input=2h&g0.step_input=1s&g0.stacked=0&g0.expr=mysql_up%7Binstance%3D%22mdb101%22%7D&g0.tab=0
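
To put a number on "fluctuating", one option is to count how often mysql_up changes state over the queried range. The following is a minimal sketch, assuming the samples arrive as (timestamp, value) pairs with string values, which is the shape a Prometheus range query returns. The function name `count_flaps` and the sample data are hypothetical and are not taken from mdb101.

```python
from typing import Sequence, Tuple


def count_flaps(values: Sequence[Tuple[float, str]]) -> int:
    """Count up/down transitions in a series of mysql_up samples.

    Each sample is (unix_timestamp, value_as_string); a value of "1"
    means the exporter reported MySQL as reachable, "0" as unreachable.
    """
    states = [float(value) >= 1.0 for _, value in values]
    # A flap is any pair of consecutive samples whose state differs.
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Hypothetical 1-second samples for illustration only.
samples = [(0, "1"), (1, "1"), (2, "0"), (3, "1"),
           (4, "1"), (5, "0"), (6, "0"), (7, "1")]
print(count_flaps(samples))
```

Comparing this count between mdb101 and a host that looks stable would show whether the fluctuation is specific to the MariaDB 10.1 host.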

How to test

None

How to document

None

Attachments

4

Activity

Roma Novikov January 31, 2019 at 5:59 PM

I can't see how we can reproduce this. Yes, I can observe it on the Demo.

I used our Stage system and installed a MariaDB server. On https://18.222.190.154/graph/d/o2zrwGNmz/prometheus-exporter-status?from=now-30m&to=now&var-interval=$__auto_interval_interval&var-host=MD_NODE-1&refresh=1m&orgId=1 I see no gaps and no problems.

 

 

Michael Coburn January 31, 2019 at 5:21 PM

Hi -

Yes, we need to find out whether this is a DB or an exporter issue. I suspect the exporter, because we have many hosts that are heavily loaded (ps57) and yet they don't fail any checks.

Roma Novikov January 18, 2019 at 10:08 PM

Can you check this on the MySQL graph, not Exporter Status? The Exporter Status dashboard is for getting more insight into what is going on with the system.

As I can see on https://pmmdemo.percona.com/graph/d/MQWgroiiz/mysql-overview?from=1547833042031&to=1547836052728&var-interval=$__auto_interval_interval&var-host=mdb102&refresh=1m&orgId=1 (for example), there are some spikes in DB and system metrics.

On https://pmmdemo.percona.com/graph/d/o2zrwGNmz/prometheus-exporter-status?from=1547833042031&to=1547836052728&orgId=1&var-interval=$__auto_interval_interval&var-host=mdb102 you can see some "MySQL down" events at the same time as the mysqld exporter errors on HR.


So we need to understand whether this is a DB or an exporter problem. Maybe this server can't survive the spikes and is failing at 1-sec resolution.

Also, alerts should not be based on 1-sec metrics.
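
One way to act on that last point is to alert on the fraction of time the host was up over a window rather than on any single 1-sec sample, similar in spirit to averaging with `avg_over_time` in PromQL. This is only a sketch under stated assumptions: the names `up_ratio` and `should_alert`, the 60-sample window, and the 0.5 threshold are all illustrative and are not PMM defaults.

```python
from typing import Sequence


def up_ratio(samples: Sequence[int]) -> float:
    """Return the fraction of samples in which mysql_up was 1."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples)


def should_alert(samples: Sequence[int], window: int = 60,
                 min_ratio: float = 0.5) -> bool:
    """Alert only if the host was mostly down over the last `window` samples.

    `window` and `min_ratio` are hypothetical values chosen for illustration.
    """
    recent = samples[-window:]
    return up_ratio(recent) < min_ratio


# A single missed 1-sec check does not trigger the alert...
brief_blip = [1] * 58 + [0, 1]
# ...but a sustained outage does.
sustained_outage = [1] * 20 + [0] * 40

print(should_alert(brief_blip), should_alert(sustained_outage))
```

With this approach a one-second gap like the ones seen on the demo host would be smoothed out, while a real outage would still be reported.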

Details

Assignee

Reporter

Priority

Components

Needs QA

Yes

Needs Doc

Yes

Affects versions

Created February 6, 2018 at 9:55 PM
Updated March 27, 2024 at 2:58 PM