
Allow the pbm logs command to retrieve the complete logs from pbm-agents during physical restore

Description

Current scenario:

Regardless of the configuration (Replica Set or Sharded Cluster), watching the logs with the pbm logs command does not give us the complete information from the agents.

 

Instead, it shows only summarized information:

2023-04-07T17:11:48Z I [rs0/myubuntu:27017] got command restore [name: 2023-04-07T17:11:47.175923427Z, backup name: 2023-04-07T15:37:45Z] <ts: 1680887507>
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] got epoch {1680882800 2}
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] backup: 2023-04-07T15:37:45Z
2023-04-07T17:11:48Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restore started
2023-04-07T17:11:49Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] moving to state running
2023-04-07T17:12:22Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restoring users and roles
2023-04-07T17:12:22Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] moving to state dumpDone
2023-04-07T17:12:24Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] starting oplog replay
2023-04-07T17:12:25Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] oplog replay finished on {1680881898 2}
2023-04-07T17:12:26Z I [rs0/myubuntu:27017] [restore/2023-04-07T17:11:47.175923427Z] restore finished successfully

Above, we can see that the restore command was acknowledged and that the restore started on node [rs0/myubuntu:27017].

 

If we want to check the full execution, or whether there was any problem on the agent, we need to go to the rs0/myubuntu:27017 node and check via the journalctl command.
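
For instance, that per-node check might look like the following (a minimal sketch; the pbm-agent systemd unit name assumes a standard package installation, and the timestamp is taken from the example above):

ssh myubuntu
journalctl -u pbm-agent --since "2023-04-07 17:11"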

 

Proposed feature:

  • Centralize all the logs needed for troubleshooting pbm in the pbm CLI.

    • That way, we reduce the problem of missing logs while analyzing an issue.

    • Also, in large environments such as sharded clusters, collecting the journal logs node by node can take considerable time.

Environment

None

AFFECTED CS IDs

CS0034448

Details

Labels

Needs QA

Created April 7, 2023 at 5:24 PM
Updated November 26, 2024 at 4:23 PM

Activity


radoslaw.szulgo 
September 19, 2024 at 9:36 AM

After the discussion with the team, we're tentatively planning this for next year (no urgency), when we come back to physical backup/restore improvements.

radoslaw.szulgo 
September 19, 2024 at 8:43 AM

I agree. I meant that PBM should send all logs to PMM. Optionally, PMM pulls logs via CLI/SDK from all PBM agents.

Dmytro Zghoba 
September 19, 2024 at 8:38 AM

PMM is not aware of PBM internals (and never should be). It should use the PBM CLI/SDK to get the logs correctly for any deployed version.

radoslaw.szulgo 
September 17, 2024 at 12:29 PM

At best, the central point of all logs should be PMM.

andrew.pogrebnoi 
April 10, 2023 at 12:03 PM

Hi  

Just a heads up.

`pbm logs` by default shows the last 20 lines of "Info"-severity logs. `pbm logs -sD -t0` will show all logs.
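
For reference, the same two invocations annotated (nothing beyond what is stated above):

pbm logs           # default: last 20 lines, "Info" severity
pbm logs -sD -t0   # everything: Debug severity selected (-sD), tail limit disabled (-t0)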

However, there are a few exceptions to the data that is currently stored in pbm logs:

  1. The output of the mongodump and mongorestore tools for logical backups and restores, respectively. The mongo tools write their logs to stderr, and there wasn't a clear and easy way to redirect that output into pbm logs (we have to revisit it). Besides, these logs only report which collection is currently being processed, and we do catch and store in pbm logs any errors that occur during the execution. (A small shell illustration follows after this list.)

  2. Almost all of the output of pbm-agents during physical (and incremental) restores. This is because PSMDB is shut down during the restore. There is a ticket to address that: https://jira.percona.com/browse/PBM-778. And a commit https://github.com/percona/percona-backup-mongodb/commit/f7b2395efcba086c0c2475a0e6ddc349e33a6f89 that will be included in the next release (v2.1.0). It doesn't solve the issue but is an improvement in that direction: physical restores now buffer and dump their restore logs to the storage (into `.pbm.restore/<restore_name>/log/`). So later we can add improvements to gather those logs back into the pbm db during `resync`, or to display them directly from the storage via something like `pbm logs -c </path/to/pbm/conf>` if the restore went wrong and PSMDB is down. (See the second sketch after this list for where those files land.)
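
Regarding item 1, a minimal shell-level illustration of the stderr behavior (this is not how pbm-agent invokes the tools internally; the URI and file paths are placeholders):

mongodump --uri="mongodb://localhost:27017" --archive=/tmp/dump.archive 2> /tmp/mongodump.log
# the dump itself goes to the --archive file, while the per-collection progress
# messages and any errors go to stderr, captured here in /tmp/mongodump.log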
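
Regarding item 2, with the v2.1.0 change the buffered restore logs can be inspected directly on the backup storage. A rough sketch for a filesystem-type storage (the /backups storage root is an assumption; the restore name is taken from the example above):

ls /backups/.pbm.restore/2023-04-07T17:11:47.175923427Z/log/
# a future `pbm logs -c </path/to/pbm/conf>` (hypothetical, as described above)
# could read these files even while PSMDB is down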