General
Escalation
Description
How to test
None
How to document
None
AFFECTED CS IDs
CS0038570
Activity
Leonardo Bacchi Fernandes August 23, 2023 at 9:19 PM
Hello Roma,
Yes, that is the long-term solution the customer is going with, and it avoids this issue altogether. If I'm not mistaken, it doesn't work if the PMM account is not in the same AWS account as all the monitored RDSs, but it is the best solution otherwise.
Roma Novikov August 23, 2023 at 9:06 AM
A good workaround here is to use IAM roles for the PMM Server and rotate them instead of credentials.
Details
Assignee
Reporter
Priority
Medium
Needs QA
Yes
Needs Doc
Yes
Created August 18, 2023 at 9:40 PM
Updated July 23, 2024 at 1:08 AM
When you add an RDS instance to PMM (https://docs.percona.com/percona-monitoring-and-management/setting-up/client/aws.html#adding-an-amazon-rds-aurora-or-remote-instance), PMM keeps track of which aws_access_key and aws_secret_key were used for each instance (as you might have different AWS users monitoring different RDS instances). It uses that aws_access_key/aws_secret_key combination to retrieve OS data from CloudWatch.
If you rotate the key (generate a new aws_access_key/aws_secret_key and disable the old credentials), PMM stops tracking OS metrics*, because it can no longer authenticate to CloudWatch with the old credentials (as expected).
Currently, the only way to update an instance's aws_access_key and aws_secret_key is to remove the instance and discover it again with the new credentials, which is impractical for a large number of monitored instances.
One workaround is to manually update the aws_access_key and aws_secret_key in the PMM Server's PostgreSQL database (the information is stored in the agents table of the pmm-managed database).
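As a rough illustration of that manual workaround, the Python sketch below builds a parameterized UPDATE statement that swaps an old key pair for a new one. The table name agents comes from this ticket; the column names aws_access_key and aws_secret_key are an assumption based on the key names used here, and the helper build_key_rotation_update is hypothetical. Verify the actual schema of your PMM version, and back up the database, before running anything like this.

```python
from typing import Tuple

# The table name is taken from the ticket. The column names are ASSUMPTIONS
# that mirror the key names used in the ticket; check them against the
# real schema of the pmm-managed database before use.
TABLE = "agents"
ACCESS_KEY_COLUMN = "aws_access_key"
SECRET_KEY_COLUMN = "aws_secret_key"


def build_key_rotation_update(
    old_access_key: str, new_access_key: str, new_secret_key: str
) -> Tuple[str, Tuple[str, str, str]]:
    """Return (sql, params) replacing one key pair on every row that uses it."""
    if not (old_access_key and new_access_key and new_secret_key):
        raise ValueError("all three key values are required")
    # Values are passed as parameters (%s placeholders), never interpolated
    # into the SQL text, so the secrets are not subject to quoting issues.
    sql = (
        f"UPDATE {TABLE} "
        f"SET {ACCESS_KEY_COLUMN} = %s, {SECRET_KEY_COLUMN} = %s "
        f"WHERE {ACCESS_KEY_COLUMN} = %s"
    )
    return sql, (new_access_key, new_secret_key, old_access_key)
```

The returned pair can be handed to a PostgreSQL driver that uses %s placeholders, for example cursor.execute(sql, params). Matching on the old access key updates every instance that was discovered with it in one statement, which is the case that makes remove-and-rediscover impractical.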
It would be nice to have a way to do this through the PMM GUI, as rotating keys regularly is a security best practice, and currently, it is not an easy task.
*PMM only seems to stop tracking the OS metrics once you discover a new RDS instance, apparently because it refreshes a token at that point. Here is the output from the rds_exporter logs:
INFO[2023-08-18T16:36:18.503+00:00] ts=2023-08-18T16:36:18.503Z caller=main.go:41 level=info msg="Starting RDS exporter (version=0.7.3, branch=, revision=)" agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.503+00:00] ts=2023-08-18T16:36:18.503Z caller=main.go:42 level=info msg="Build context (go=go1.20.1, user=, date=2023-07-11T16:48:47+0000)" agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.503+00:00] ts=2023-08-18T16:36:18.503Z caller=sessions.go:49 level=info component=sessions msg="Creating sessions..." agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.513+00:00] Sending status: STARTING (port 42003). agentID=pmm-server/rds component=agent-process type=rds_exporter
ERRO[2023-08-18T16:36:18.550+00:00] ts=2023-08-18T16:36:18.550Z caller=sessions.go:122 level=error component=sessions msg="Failed to get resource IDs." error="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: 710e1fe2-bc6a-4480-b529-c1dc95a4396d" agentID=pmm-server/rds component=agent-process type=rds_exporter
ERRO[2023-08-18T16:36:18.982+00:00] ts=2023-08-18T16:36:18.982Z caller=sessions.go:145 level=error component=sessions msg="Skipping us-east-1/database-1 - can't determine resourceID." agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.982+00:00] Region     Instance    Resource ID                   Interval agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.982+00:00] us-east-1  leordstest  db-EPPRXBTMDZCKWLIG7ZHGYW7QOE  1m0s agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.982+00:00] ts=2023-08-18T16:36:18.982Z caller=sessions.go:169 level=info component=sessions msg="Using 1 sessions." agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:18.982+00:00] ts=2023-08-18T16:36:18.982Z caller=collector.go:52 level=info component=enhanced msg="Updating enhanced metrics every 1m0s." agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:19.112+00:00] ts=2023-08-18T16:36:19.112Z caller=main.go:77 level=info msg="Basic metrics : http://:42003/basic" agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:19.112+00:00] ts=2023-08-18T16:36:19.112Z caller=main.go:78 level=info msg="Enhanced metrics: http://:42003/enhanced" agentID=pmm-server/rds component=agent-process type=rds_exporter
INFO[2023-08-18T16:36:19.498+00:00] Sending status: RUNNING (port 42003). agentID=pmm-server/rds component=agent-process type=rds_exporter
ERRO[2023-08-18T16:37:08.060+00:00] ts=2023-08-18T16:37:08.060Z caller=collector.go:78 level=error component=basic msg="No scraper for us-east-1/database-1 (AKIAXAU2JF3DOEQMLUMK), skipping." agentID=pmm-server/rds component=agent-process type=rds_exporter
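To see which instances are affected after a rotation, the log can be scanned for the "No scraper for ..." message shown above. The following Python sketch is one possible way to do that; the regular expression is derived only from the single log line in this ticket, and the function name find_unscraped_instances is hypothetical, so other rds_exporter versions may format the message differently.

```python
import re
from typing import List, Tuple

# Pattern based on the one message seen in this ticket's log:
#   msg="No scraper for us-east-1/database-1 (AKIA...), skipping."
NO_SCRAPER = re.compile(
    r"No scraper for (?P<region>[^/\s]+)/(?P<instance>[^\s(]+) "
    r"\((?P<access_key>[^)]+)\), skipping"
)


def find_unscraped_instances(log_text: str) -> List[Tuple[str, str, str]]:
    """Return unique (region, instance, access_key) tuples in order of appearance."""
    seen: List[Tuple[str, str, str]] = []
    for match in NO_SCRAPER.finditer(log_text):
        entry = (match["region"], match["instance"], match["access_key"])
        if entry not in seen:  # the message repeats on every scrape interval
            seen.append(entry)
    return seen
```

Grouping the results by access key shows which rotated key each skipped instance still depends on.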