Tracking deployment methods and platforms in Telemetry

Description

Problem:

There are a number of ways to install PMM Server and PMM Clients.

There are a number of systems to run them on:

  • AWS instance

  • local VM

  • Docker

  • Podman

  • Kubernetes

  • as a service binary in the OS

  • Docker Swarm

  • Nomad

  • etc.

There are different vendors for the systems:

  • vanilla k8s

  • managed k8s (EKS, GKE, etc.)

  • cloud vendors (AWS, Linode, etc.)

  • OpenShift

  • VMware Tanzu

  • etc.

 

Telemetry should be able to answer these questions:

  • where is it run?

    • system

    • platform

    • vendor

  • how was it installed?
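
A minimal sketch of what such a record could look like, purely for discussion. The `DeploymentInfo` type, its field names, and the example values are assumptions made up for this illustration; nothing like this is defined in PMM today.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeploymentInfo:
    """Hypothetical record answering the questions above."""

    system: str          # where it runs, e.g. "docker", "podman", "kubernetes", "vm"
    platform: str        # flavour of that system, e.g. "eks", "gke", "openshift"
    vendor: str          # who provides it, e.g. "aws", "linode", "unknown"
    install_method: str  # how it was installed, e.g. "helm", "ami", "unknown"


def to_payload(info: DeploymentInfo) -> str:
    """Serialize the record to JSON with stable key order."""
    return json.dumps(asdict(info), sort_keys=True)
```

Every field defaults naturally to a string such as "unknown" when it cannot be determined, so a partial answer can still be reported.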

 

How to test

None

How to document

None

Activity

duygu.aksoy 
February 16, 2023 at 11:46 AM

it is on you  

Denys Kondratenko 
February 8, 2023 at 1:47 PM

Denys Kondratenko 
December 28, 2022 at 12:42 PM

The PMM CLI uses container labels to identify the install method:

https://github.com/percona/pmm/blob/f0ab3696c9de559108624a1f8329a465e9493070/admin/commands/pmm/server/docker/upgrade.go#L125
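
To illustrate the label idea (not the actual code linked above): a rough sketch that reads a label from `docker inspect` JSON output. The label key `example.install-method` is a placeholder invented here; the real key used by the PMM CLI is in the linked source. The `Config.Labels` location is where `docker inspect` reports container labels.

```python
import json

# Placeholder label key for illustration only; not the key PMM actually uses.
INSTALL_LABEL = "example.install-method"


def install_method_from_inspect(inspect_json: str, label: str = INSTALL_LABEL) -> str:
    """Return the install method recorded in a container label, or "unknown".

    `inspect_json` is the JSON text printed by `docker inspect <container>`,
    which is a list with one object per inspected container.
    """
    containers = json.loads(inspect_json)
    if not containers:
        return "unknown"
    # Labels may be null when the container has none.
    labels = containers[0].get("Config", {}).get("Labels") or {}
    return labels.get(label, "unknown")
```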

 

CC  

 

So sooner or later we will need some API, file, or other identifier that can be used to identify the install/upgrade method.

 

For example, the proposed upgrade method uses an API, so it could be logged/audited and probably discovered by telemetry.

Denys Kondratenko 
December 16, 2022 at 10:13 PM

When this "Definition of Done" is defined, sure. See my concerns on the WIP Telemetry confluence page: https://confluence.percona.com/display/PM/WIP+Telemetry .

 

Is that new DoD applied ex post facto? Probably not. It is a new task. And as can be seen in the definition, it is quite a big topic.

As this data currently doesn't exist (and probably can't) in the current sources, it can't be queried. As we agreed before, if it is a limitation of the framework, we need the Telemetry team to investigate and provide either new interfaces or guidance.

 

It doesn't look like this data could be metric data, as it is more or less constant and probably contains inside information that might need to be protected so that nothing critical is exposed.

Or maybe it could; then this is a question of node_exporter and metrics collection, probably the textfile collector (maybe we need an additional file, such as "pmm-distribution").

 

So the Observability/Telemetry team can easily expose that through a textfile if that is the way to go.

 

 

But it looks like there is a need to query some API that could execute some code, or to query an existing API (like listing DBaaS DB Clusters). That should probably be better protected so that it can be used only by people who have the right permissions.

 

So the issue here, and probably where the confusion starts, is that there is currently no data in the sources that Telemetry could query. So either we need guidance on where this data belongs (but then it is still Telemetry/Observability) or we need new interfaces.

>however individual teams own providing data to it

The data exists: as we are API-first, it is all there. Almost every piece of functionality has an API and could be tracked by either calling the API or auditing it.

 

Real examples:

Helm: there is no feasible way for Helm to push data to any of the supported data sources during installation. We can probably mount a special file from a ConfigMap to indicate that it is a Helm install, or maybe querying the k8s API directly could be a way.

So the data actually exists: the k8s API. Or we can push additional data: files. But that is up to the design.
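
A rough sketch of how the "where is it run" part could be guessed from inside the container, plus the marker-file idea for Helm. The heuristics rely on well-known conventions: Kubernetes injects `KUBERNETES_SERVICE_HOST` into pods, Podman creates `/run/.containerenv`, and Docker creates `/.dockerenv`. The marker path `/etc/pmm/install-method` is hypothetical and would have to be mounted by the chart.

```python
import os

# Hypothetical path; a Helm chart could mount it from a ConfigMap.
INSTALL_MARKER = "/etc/pmm/install-method"


def detect_system(env=None, exists=os.path.exists) -> str:
    """Best-effort guess of the container runtime; heuristics, not guarantees."""
    env = os.environ if env is None else env
    if "KUBERNETES_SERVICE_HOST" in env:  # set by Kubernetes in every pod
        return "kubernetes"
    if exists("/run/.containerenv"):      # created by Podman
        return "podman"
    if exists("/.dockerenv"):             # created by Docker
        return "docker"
    return "unknown"


def read_install_method(path: str = INSTALL_MARKER) -> str:
    """Read the install method from the marker file, if one was mounted."""
    try:
        with open(path) as f:
            return f.read().strip() or "unknown"
    except OSError:
        return "unknown"
```

The Kubernetes check comes first because a pod may also run on top of a Docker or Podman-compatible runtime. This does not cover vendor detection (EKS, GKE, OpenShift and so on), which would need the k8s API or cloud metadata.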

 

So please form a committee and either define new interfaces or add the ability to query API endpoints. The data is there or could be provided, but telemetry needs to change.

 

dave.poole 
December 16, 2022 at 6:12 PM

As a follow-up, and related to the work to set up and create the Observability team (effective Jan 1):

The Observability team owns the telemetry system (its framework/architecture/how it works); however, individual teams own providing data to it. Teams should consider it a part of the "Definition of Done" to properly instrument the features/solutions they implement. After all, Product teams are responsible for the OUTCOMES their team produces, and you cannot understand or measure that without telemetry data. If it is unclear how a team should best implement that, I would expect them to consult with the Observability Lead (Alexey, as mentioned) and the architect team for assistance/direction/standards.

Observability staffing/responsibility details include as the PM. The Design Lead is undefined currently,   is figuring that out

Back to the original problem statement of this ticket. To Denys' feedback: if addressing this needs changes to the telemetry system itself, I would expect the Observability team to tackle that, BUT tied to that is the question of what changes are needed, and that would come from whichever team is looking to add the new data, which I suspect would be Core (seeing settings / installation type metrics in the ticket description). Luckily, Duygu is the PM for both, so she can figure out roadmap/planning etc.

 

 

 

 

Details

Created July 15, 2022 at 1:13 PM
Updated March 6, 2024 at 12:59 AM