
Allow the pbm logs command to retrieve the complete logs from pbm-agents during physical restore

Description

Current scenario:

Regardless of the configuration (Replica Set or Sharded Cluster), watching the logs with the pbm logs command does not give us the complete information from the agents.

 

Instead, it shows only summarized information:

2023-04-07T17:11:48Z I [rs0/myubuntu:27017] got command restore [name: 2023-04-07T17:11:47.175923427Z, backup name: 2023-04-07T15:37:45Z] <ts: 1680887507>
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] got epoch {1680882800 2}
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] backup: 2023-04-07T15:37:45Z
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restore started
2023-04-07T17:11:49Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] moving to state running
2023-04-07T17:12:22Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restoring users and roles
2023-04-07T17:12:22Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] moving to state dumpDone
2023-04-07T17:12:24Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] starting oplog replay
2023-04-07T17:12:25Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] oplog replay finished on {1680881898 2}
2023-04-07T17:12:26Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restore finished successfully

Above, we can see that the restore command was acknowledged and that the restore started on node [rs0/myubuntu:27017].

 

If we want to check the full execution, or whether there was any problem on the agent, we need to go to the rs0/myubuntu:27017 node and check via the journalctl command.
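
For instance, that per-node check might look like the following (a minimal sketch; the pbm-agent systemd unit name assumes a standard package installation, and the timestamp is taken from the example above):

ssh myubuntu
journalctl -u pbm-agent --since "2023-04-07 17:11"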

 

Proposed feature:

  • Centralize all the logs needed for troubleshooting pbm in the pbm CLI.

    • That way, we reduce the problem of missing logs while analyzing an issue.

    • Also, in large environments such as sharded clusters, collecting the journal logs node by node can take considerable time.

Environment

None

AFFECTED CS IDs

CS0034448

Details

Labels

Needs QA

Created April 7, 2023 at 5:24 PM
Updated November 26, 2024 at 4:23 PM

Activity


radoslaw.szulgo 
September 19, 2024 at 9:36 AM

After the discussion with the team, we're tentatively planning this for next year (no urgency), when we come back to physical backup/restore improvements.

radoslaw.szulgo 
September 19, 2024 at 8:43 AM

I agree. I meant that PBM should send all logs to PMM. Optionally, PMM pulls logs via CLI/SDK from all PBM agents.

Dmytro Zghoba 
September 19, 2024 at 8:38 AM

PMM is not aware of PBM internals (and never should be). It should use the PBM CLI/SDK to get the logs correctly for any deployed version.

radoslaw.szulgo 
September 17, 2024 at 12:29 PM

At best, the central point of all logs should be PMM.

andrew.pogrebnoi 
April 10, 2023 at 12:03 PM

Hi  

Just a heads up.

`pbm logs` by default shows the last 20 lines of "Info"-severity logs. `pbm logs -sD -t0` will show all logs.
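
For reference, the same two invocations annotated (nothing beyond what is stated above):

pbm logs           # default: last 20 lines, "Info" severity
pbm logs -sD -t0   # everything: Debug severity selected (-sD), tail limit disabled (-t0)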

However, there are a few exceptions to the data that is currently stored in pbm logs:

  1. The output of the mongodump and mongorestore tools for logical backups and restores, respectively. The mongo tools write their logs to stderr, and there wasn't a clear and easy way to redirect that output into pbm logs (we have to revisit it). Besides, these logs only report which collection is currently being processed, and we do catch and store in pbm logs any errors that occur during the execution. (A small shell illustration follows after this list.)

  2. Almost all of the output of pbm-agents during physical (and incremental) restores. This is because PSMDB is shut down during the restore. There is a ticket to address that: https://jira.percona.com/browse/PBM-778. And a commit https://github.com/percona/percona-backup-mongodb/commit/f7b2395efcba086c0c2475a0e6ddc349e33a6f89 that will be included in the next release (v2.1.0). It doesn't solve the issue but is an improvement in that direction: physical restores now buffer and dump their restore logs to the storage (into `.pbm.restore/<restore_name>/log/`). So later we can add improvements to gather those logs back into the pbm db during `resync`, or to display them directly from the storage via something like `pbm logs -c </path/to/pbm/conf>` if the restore went wrong and PSMDB is down. (See the second sketch after this list for where those files land.)
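
Regarding item 1, a minimal shell-level illustration of the stderr behavior (this is not how pbm-agent invokes the tools internally; the URI and file paths are placeholders):

mongodump --uri="mongodb://localhost:27017" --archive=/tmp/dump.archive 2> /tmp/mongodump.log
# the dump itself goes to the --archive file, while the per-collection progress
# messages and any errors go to stderr, captured here in /tmp/mongodump.log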
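
Regarding item 2, with the v2.1.0 change the buffered restore logs can be inspected directly on the backup storage. A rough sketch for a filesystem-type storage (the /backups storage root is an assumption; the restore name is taken from the example above):

ls /backups/.pbm.restore/2023-04-07T17:11:47.175923427Z/log/
# a future `pbm logs -c </path/to/pbm/conf>` (hypothetical, as described above)
# could read these files even while PSMDB is down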