Topology-identifying labels for metrics in mongodb_exporter v2
Activity
Akira Kurogane August 24, 2020 at 12:35 PM
Re the "hp" label: for PMM the instance-id will be the ID of the agent, so this will be a unique ID. We'll also add our standard labels, where the host will be one of the labels.
So maybe we can skip this label.
I agree. I was just coming to this ticket to say 'forget "hp", let's use instance' myself.
There's a problem with not being able to distinguish between different mongod nodes (and maybe mongos nodes) on the same host. But I think we can deal with that later.
I've come to think the better solution (for later) is to export the hostname + port as a string somehow, and use relabel_configs in the Prometheus config, as described in https://www.robustperception.io/controlling-the-instance-label, to get the port into the "instance" label as well.
For now this is not possible, as neither mongodb_ss_host (serverStatus.host) nor any other string is exported.
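To make the relabeling idea concrete, here is a minimal Python sketch that imitates what a "replace"-style relabel rule does: join the source label values, match them against a fully anchored regex, and write the replacement into the target label. It is not Prometheus configuration. The label name mongodb_host and the sample values are hypothetical, since no such string is exported today.

```python
import re
from typing import Dict, List


def relabel_replace(
    labels: Dict[str, str],
    source_labels: List[str],
    regex: str,
    target_label: str,
    replacement: str = "$1",
    separator: str = ";",
) -> Dict[str, str]:
    """Imitate a 'replace' relabel rule on a single label set."""
    # Join the source label values; a missing label counts as an empty string.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # The regex must match the whole joined value, not just a part of it.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate "$1"-style references into Python's "\g<1>" template form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical input: "mongodb_host" is an assumed label carrying host:port.
scraped = {"instance": "agent-1234", "mongodb_host": "db1.example.com:27018"}
relabeled = relabel_replace(scraped, ["mongodb_host"], r"(.+:\d+)", "instance")
```

With the hypothetical input above, relabeled["instance"] carries the host and port, which is what would let two mongod nodes on the same host be told apart.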
Roma Novikov August 19, 2020 at 8:48 AM
Re the "hp" label: for PMM the instance-id will be the ID of the agent, so this will be a unique ID. We'll also add our standard labels, where the host will be one of the labels.
So maybe we can skip this label.

I suggest we add the following labels to every Prometheus metric (excluding the v1 backward-compatibility ones) to make topological position and replica state easy to work with. I strongly request that we do this before the first GA release of mongodb_exporter v2.
Suggested format (I'm not entirely certain of the label names yet).
The labels are:
cl_role: Cluster role. One of "shardsvr", "configsvr", "mongos" or "". Set it to "mongos" if a sharding.configDB config value is set; otherwise use whatever the value of sharding.clusterRole is. It will be empty for a non-sharded replica set member or a standalone mongod node.
cl_id: Cluster ID. It is possible to get a unique GUID value shared by all nodes in any cluster or non-sharded replica set. In short, it is the clusterId value in a cluster and the replicaSetId in a non-sharded replica set. It is an empty string for a standalone mongod. In a cluster it is db.getSiblingDB("config").getCollection("version").findOne().clusterId on a configsvr or mongos, or db.getSiblingDB("admin").system.version.findOne({_id: "shardIdentity"}).clusterId on a shard. For a non-sharded replica set member use rs.conf().settings.replicaSetId.
hp: Host + mongod port. This can be found in the db.adminCommand({getDiagnosticData: 1}).data.host string value, or in db.hostInfo().system.hostname. N.B. the TCP port is needed; we can't rely on the Prometheus default "instance" label alone. (If an exporter could set its own "instance" value and include the mongod/mongos port, that would be better than the "hp" field I propose. I don't know if that's possible, though.)
rs_nm: Replica set name. This will be in db.adminCommand({getDiagnosticData: 1}).data.replSetGetStatus.set, or rs.conf()._id.
rs_state: Current replica state. This is in db.adminCommand({getDiagnosticData: 1}).data.replSetGetStatus.myState as a numeric value (1=PRIMARY, 2=SECONDARY, etc.), or in ...replSetGetStatus.members[x].stateStr, where x is the member that has self=true.
rs_state is the only label that can vary without restarting a mongod/mongos node. The others will be constant. Even after a restart they'll only change if the mongodb_exporter is pointed at a mongod/mongos node that has a different location or a different role.
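The label rules above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the exporter's implementation: the function name topology_labels is hypothetical, parsed_config is assumed to be the node's parsed configuration as a nested dict (for example the "parsed" section returned by getCmdLineOpts), repl_status is assumed to be the replSetGetStatus document (or None when the node is not in a replica set), and cluster_id and host_port are assumed to have been fetched already by one of the queries listed above.

```python
from typing import Any, Dict, Mapping, Optional


def topology_labels(
    parsed_config: Mapping[str, Any],
    repl_status: Optional[Mapping[str, Any]],
    cluster_id: str,
    host_port: str,
) -> Dict[str, str]:
    """Derive the proposed topology labels from already-fetched documents."""
    sharding = parsed_config.get("sharding", {})
    if sharding.get("configDB"):
        # Only a mongos has sharding.configDB set.
        cl_role = "mongos"
    else:
        # "shardsvr", "configsvr", or "" when the node is not part of a cluster.
        cl_role = sharding.get("clusterRole", "")

    rs_nm = ""
    rs_state = ""
    if repl_status:
        rs_nm = str(repl_status.get("set", ""))
        if "myState" in repl_status:
            # Label values are strings, so the numeric state is stringified.
            rs_state = str(repl_status["myState"])

    return {
        "cl_role": cl_role,
        "cl_id": cluster_id,
        "hp": host_port,
        "rs_nm": rs_nm,
        "rs_state": rs_state,
    }


# Hypothetical shard primary; every value here is made up for illustration.
shard_primary = topology_labels(
    {"sharding": {"clusterRole": "shardsvr"}},
    {"set": "sh0", "myState": 1},
    "example-cluster-id",
    "db1.example.com:27018",
)
```

Only rs_state needs to be recomputed on every scrape; the other four values could be resolved once when the exporter connects.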
The benefits of these labels are:
(cl_id): Ability to automatically group by cluster, or by non-sharded replica set. With mongodb_exporter v1, manual input is needed in the Prometheus config that adds every mongod or mongos node's exporter as a target. (I agree users would prefer a human-recognizable name, but these GUID or GUID-like values are the only identifiers inside MongoDB itself.)
(rs_state): In a replica set it gives an easy way to exclude secondaries when only the primary's stats matter (e.g. operations per second, oplog window), or to exclude the primary when only the secondaries' stats matter (e.g. replication lag).
Without this, what we currently have to do in so many dashboard graphs makes their equations very long and unreadable. In a nutshell, the hack is to aggregate-join the replset myState metric to add the rs state label, then aggregate it again in a negating way (e.g. + and then -, or * and then /) to remove the interference with the metric value.
(cl_role): Easily and automatically exclude the configsvr mongod nodes' metrics in cluster views.
(cl_role): Easily and automatically exclude mongos nodes. Mongos nodes' metrics don't have a role in most dashboards.
(rs_nm, hp, rs_state): Make it easy to compare shards' metrics in a combined graph for a cluster. Make it easy to select our way through the topology.
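As a rough illustration of the filtering and grouping these labels would allow, the Python sketch below works on in-memory samples rather than on a real Prometheus server. The helper names select and sum_by, the label values, and the numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def select(samples: List[Sample], **matchers: str) -> List[Sample]:
    """Keep only the samples whose labels equal every given matcher."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(name) == want for name, want in matchers.items())
    ]


def sum_by(samples: List[Sample], label: str) -> Dict[str, float]:
    """Sum sample values grouped by one label."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Made-up operations-per-second samples for one sharded cluster.
samples: List[Sample] = [
    ({"cl_id": "c1", "cl_role": "shardsvr", "rs_nm": "sh0", "rs_state": "1"}, 120.0),
    ({"cl_id": "c1", "cl_role": "shardsvr", "rs_nm": "sh0", "rs_state": "2"}, 5.0),
    ({"cl_id": "c1", "cl_role": "shardsvr", "rs_nm": "sh1", "rs_state": "1"}, 80.0),
    ({"cl_id": "c1", "cl_role": "configsvr", "rs_nm": "cfg", "rs_state": "1"}, 3.0),
]

# Shard primaries only: secondaries and the configsvr drop out via labels.
primaries = select(samples, cl_role="shardsvr", rs_state="1")
per_shard = sum_by(primaries, "rs_nm")    # {"sh0": 120.0, "sh1": 80.0}
per_cluster = sum_by(primaries, "cl_id")  # {"c1": 200.0}
```

The point of the sketch is that each exclusion becomes a plain label match, with no join against a separate state metric.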