Tracking deployment methods and platforms in Telemetry
Activity

duygu.aksoy February 16, 2023 at 11:46 AM
it is on you

Denys Kondratenko February 8, 2023 at 1:47 PM

Denys Kondratenko December 28, 2022 at 12:42 PM
pmm cli uses container labels to identify install method:
CC
So sooner or later we need some API, file or other identifier that could be used to identify the install/upgrade method.
For example, the proposed upgrade method uses an API, so its calls could be logged/audited and probably discovered by telemetry.
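A rough sketch of the label-based approach, only to illustrate the idea. The label key and the set of method names below are hypothetical; the actual labels pmm cli sets are not shown in this ticket.

```python
from typing import Mapping

# Hypothetical label key and values; the real labels used by pmm cli are not
# listed in this ticket.
INSTALL_METHOD_LABEL = "com.example.pmm.install-method"
KNOWN_METHODS = {"pmm-cli", "easy-install", "helm", "ansible"}


def install_method_from_labels(labels: Mapping[str, str]) -> str:
    """Return the install method recorded in container labels, or "unknown"."""
    value = labels.get(INSTALL_METHOD_LABEL, "").strip().lower()
    return value if value in KNOWN_METHODS else "unknown"
```

The same lookup would work for any other identifier (file, API response) as long as it ends up as a key/value map.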

Denys Kondratenko December 16, 2022 at 10:13 PM
Once this "Definition of Done" is defined, sure. See my concerns on the WIP Telemetry confluence page: https://confluence.percona.com/display/PM/WIP+Telemetry .
Is that new DoD applied ex post facto? Probably not. It is a new task. And as can be seen in the definition, it is quite a big topic.
As this data currently doesn't exist (and probably can't) in current sources - it can't be queried. As we agreed before, if it is a limitation of a framework - we need the Telemetry team to investigate and provide either new interfaces or guidance.
It doesn't look like this data could be metric data, as it is more or less constant and probably provides some internal information that might need to be secured against exposing critical info.
Or maybe not, in which case this is a question of node_exporter and metrics collection, probably the textfile collector (maybe we need an additional file such as "pmm-distribution").
So the Observability/Telemetry team can easily use a textfile for that if that is the way to go.
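If the textfile collector route is chosen, it could look roughly like the sketch below. The metric name `pmm_distribution_info`, the `method` label and the file name `pmm-distribution.prom` are assumptions for illustration. What is not an assumption: node_exporter's textfile collector only reads `*.prom` files from the directory passed via `--collector.textfile.directory`.

```python
import os
import tempfile


def write_distribution_metric(directory: str, method: str) -> str:
    """Atomically write a pmm-distribution.prom file and return its path."""
    # Escape the label value per the Prometheus text exposition format.
    escaped = method.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    body = (
        "# HELP pmm_distribution_info Install method of this PMM instance.\n"
        "# TYPE pmm_distribution_info gauge\n"
        f'pmm_distribution_info{{method="{escaped}"}} 1\n'
    )
    target = os.path.join(directory, "pmm-distribution.prom")
    # Write to a temp file in the same directory, then rename, so the
    # collector never reads a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(body)
    # mkstemp creates the file as 0600; make it readable for the collector.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, target)
    return target
```

A constant-valued "info" gauge with labels is the usual way to carry this kind of mostly static data, but it also means the value is visible to anyone who can read metrics, which is the exposure concern raised above.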
But it looks like there is a need to query some API that could execute some code, or to query an existing API (like list DBaaS DB Clusters). That should probably be better protected so it can be used only by people who have the right permissions.
So the issue here, and probably where the confusion starts - there is no data currently in the sources that Telemetry could query. So either we need guidance - where this data belongs (but then it is still Telemetry/Observability) or we need new interfaces.
>however individual teams own providing data to it
data exists, as we have API first - it is all there, almost every functionality has API, and could be tracked by either calling the API or auditing it.
Real examples:
Helm - there is no feasible way for Helm to push data to any of the supported data sources during installation. We can probably mount a special file from a ConfigMap to indicate that it is a Helm install, or maybe querying the k8s API directly could be a way.
So data actually exists - k8s API. Or we can push additional data - files. But that is up to the design.
So please get some committee and either define new interfaces or add the ability to query API endpoints. Data is there or could be provided, but there is a need to change telemetry.
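To make the "mount a file" option concrete, here is a minimal sketch of how a collector could read such a marker. The path `/etc/pmm/install-method` and the fallback value `kubernetes-unknown` are hypothetical; `KUBERNETES_SERVICE_HOST` is the environment variable Kubernetes normally sets inside pod containers.

```python
import os
from typing import Mapping, Optional

# Hypothetical path where a Helm chart could mount a ConfigMap key.
MARKER_PATH = "/etc/pmm/install-method"


def detect_install_method(
    env: Mapping[str, str] = os.environ,
    marker_path: str = MARKER_PATH,
) -> Optional[str]:
    """Prefer the mounted marker file; fall back to an environment hint."""
    try:
        with open(marker_path, encoding="utf-8") as fh:
            value = fh.read().strip()
        if value:
            return value
    except OSError:
        # No marker file mounted (or not readable): try the fallback below.
        pass
    # Running in a pod, but nothing tells us how it was installed.
    if env.get("KUBERNETES_SERVICE_HOST"):
        return "kubernetes-unknown"
    return None
```

Querying the k8s API directly would give more detail, but needs credentials and permissions, so the file is the cheaper option to start with.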

dave.poole December 16, 2022 at 6:12 PM
As a follow-up, and related to the work to set up the Observability team (effective Jan 1):
The Observability team owns the telemetry system (its framework/architecture/how it works), however individual teams own providing data to it. Teams should consider it a part of the "Definition of Done" to properly instrument the features/solutions they implement. After all, Product teams are responsible for the OUTCOMES their team produces, and you cannot understand or measure that without telemetry data. If it is unclear how a team should best implement that, I would expect them to consult with the Observability Lead (Alexey as mentioned) and the architect team for assistance/direction/standards.
Observability staffing/responsibility details include as the PM. The Design Lead is undefined currently, is figuring that out
Back to the original problem statement of this ticket. So to Denys' feedback: if addressing this needs changes to the telemetry system itself, I would expect the Observability team to tackle that, BUT tied to that is what changes are needed, and that would come from whatever team was looking to add the new data, which I suspect would be Core (seeing settings / installation type metrics in the ticket description). Luckily, Duygu is the PM for both, so she can figure out roadmap/planning etc.

Problem:
There are a number of ways to install PMM Server and PMM Clients:
docker
AWS Marketplace
VirtualBox, KVM, etc.
docker, podman
Helm
easy-install script
packages (client)
binaries (client)
pmm cli
ansible PoC: https://github.com/Percona-Lab/install-repo-pmm-server
There are a number of systems to run:
AWS instance
local VM
docker
podman
kubernetes
as a service binary in OS
docker swarm
nomad
etc
There are different vendors for the systems:
vanilla k8s
managed k8s (EKS, GKE, etc.)
cloud vendors (AWS, Linode, etc.)
OpenShift
VMware Tanzu
etc
There should be a method to have telemetry that answers these questions:
where is it run?
  system
  platform
  vendor
how was it installed?
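One possible shape for the resulting telemetry record, as a sketch only. The field names, the `deployment_` prefix and the example values are assumptions; how the lists above map onto system/platform/vendor is still to be decided.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeploymentInfo:
    system: str          # e.g. "docker", "podman", "kubernetes"
    platform: str        # e.g. "aws-instance", "local-vm"
    vendor: str          # e.g. "eks", "gke", "openshift", "vanilla-k8s"
    install_method: str  # e.g. "helm", "easy-install", "pmm-cli"


def to_telemetry_fields(info: DeploymentInfo) -> dict:
    """Flatten into prefixed key/value pairs; empty values become "unknown"."""
    return {f"deployment_{k}": (v or "unknown") for k, v in asdict(info).items()}
```

For example, `to_telemetry_fields(DeploymentInfo("kubernetes", "", "eks", "helm"))` reports the platform as "unknown" instead of dropping the field, so missing detection is visible in the data.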