"unknown transaction id" error during logical restore

Description

Error seen in CI tests:

Error: operation failed with: waiting for dumpDone: cluster failed: reply oplog: replay chunk 1709119532.1709119543: apply oplog for chunk: applying a transaction entry: apply txn: { "Timestamp": { "T": 1709119532, "I": 13 }, "Term": 1, "Hash": null, "Version": 2, "Operation": "c", "Namespace": "admin.$cmd", "Object": [ { "Key": "commitTransaction", "Value": 1 }, { "Key": "commitTimestamp", "Value": { "T": 1709119532, "I": 11 } } ], "Query": null, "UI": null, "LSID": "SAAAAAVpZAAQAAAABKMucOOQPE3khcq6dXEb5eMFdWlkACAAAAAAY5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fgA", "TxnNumber": 1, "PrevOpTime": "HAAAABF0cwAFAAAALBjfZRJ0AAEAAAAAAAAAAA==" }: unknown transaction id SAAAAAVpZAAQAAAABKMucOOQPE3khcq6dXEb5eMFdWlkACAAAAAAY5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fgA-1

Environment

None

Activity

Dmytro Zghoba March 12, 2024 at 3:42 PM

The fix includes saving the oplog starting from the timestamp of an older, still-active transaction.
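As a rough illustration of that idea (not the actual fix code), the sketch below picks the earliest timestamp the saved oplog would need to cover. The function name `oplog_start`, its parameters, and the representation of timestamps as `(T, I)` tuples are all assumptions made for this example.

```python
from typing import Iterable, Tuple

# A (seconds, increment) pair mirroring the T/I fields of a MongoDB Timestamp.
# Tuples compare field by field, which matches Timestamp ordering.
Ts = Tuple[int, int]


def oplog_start(backup_start: Ts, active_txn_starts: Iterable[Ts]) -> Ts:
    """Return the earliest timestamp the saved oplog must cover.

    Hypothetical helper: if any transaction that is still active began
    before the backup did, start from that transaction instead, so its
    earlier entries are present when the commit is replayed.
    """
    return min([backup_start, *active_txn_starts])
```

With no active transactions the function simply returns the backup start, so the behaviour only changes when an older transaction is still open.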

Oleksandr Havryliak February 21, 2024 at 9:54 AM

Let's split this issue into two issues:


1. The commit message has been sent to the transaction coordinator, and the coordinator has started the two-phase commit protocol. In this case we can see these transactions in the config.transactions collection with the prepared or inProgress state by querying:

db.getSiblingDB("config").transactions.find({ "state": { $in: ["prepared", "inProgress"] } })

So the upstream fix works fine for this case.


2. The commit message has not been sent, but there is an open client session with a transaction. In this case we can see these transactions in config.transactions, but they have no state field. We can still see the session by querying:

db.currentOp({ "transaction": { $exists: 1 } })

The output looks like this:

{
  inprog: [
    {
      type: 'idleSession',
      host: 'rs201:27017',
      desc: 'inactive transaction',
      client: '172.23.0.18:36008',
      connectionId: Long('107'),
      appName: '',
      clientMetadata: {
        driver: { name: 'PyMongo', version: '4.6.1' },
        os: { type: 'Linux', name: 'Linux', architecture: 'x86_64', version: '5.15.0-94-generic' },
        platform: 'CPython 3.11.6.final.0',
        mongos: { host: 'mongos:27017', client: '172.23.0.2:60670', version: '7.0.5-3' }
      },
      lsid: {
        id: UUID('f7dc265d-e971-4b38-a0c1-ebac7d358df7'),
        uid: Binary.createFromBase64('Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=', 0)
      },
      transaction: {
        parameters: {
          txnNumber: Long('2'),
          txnRetryCounter: 0,
          autocommit: false,
          readConcern: { afterClusterTime: Timestamp({ t: 1708360399, i: 68 }), provenance: 'clientSupplied' }
        },
        readTimestamp: Timestamp({ t: 0, i: 0 }),
        startWallClockTime: '2024-02-19T16:33:19.792+00:00',
        timeOpenMicros: Long('5655246'),
        timeActiveMicros: Long('383'),
        timeInactiveMicros: Long('5654863'),
        expiryTime: '2024-02-19T16:34:19.792+00:00'
      },
      waitingForLock: false,
      active: false,
      locks: { FeatureCompatibilityVersion: 'w', ReplicationStateTransition: 'w', Global: 'w', Database: 'w', Collection: 'w' },
      lockStats: {
        FeatureCompatibilityVersion: { acquireCount: { r: Long('1'), w: Long('1') } },
        ReplicationStateTransition: { acquireCount: { w: Long('4') } },
        Global: { acquireCount: { r: Long('1'), w: Long('1') } },
        Database: { acquireCount: { w: Long('1') } },
        Collection: { acquireCount: { w: Long('1') } },
        Mutex: { acquireCount: { r: Long('9') } }
      }
    }
  ],
  ok: 1
}


We should decide how to properly deal with the second issue.
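To make the two cases concrete, here is a minimal sketch that sorts documents read from config.transactions into the two groups described above. It assumes the documents are already available as plain Python dicts (for example, fetched with PyMongo); `split_transactions` and `COORDINATED_STATES` are hypothetical names for this example. A missing state field only marks a candidate for the second case, since other records in that collection may also lack it.

```python
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]

# States that indicate the coordinator has started two-phase commit (case 1).
COORDINATED_STATES = {"prepared", "inProgress"}


def split_transactions(docs: Iterable[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Split config.transactions documents into (case 1, case 2 candidates).

    Case 1: state is "prepared" or "inProgress".
    Case 2 candidates: the document has no state field at all.
    Documents in any other state (e.g. committed or aborted) are skipped.
    """
    coordinated: List[Doc] = []
    stateless: List[Doc] = []
    for doc in docs:
        state = doc.get("state")
        if state in COORDINATED_STATES:
            coordinated.append(doc)
        elif state is None:
            stateless.append(doc)
    return coordinated, stateless
```

Candidates from the second group would still need to be confirmed against the db.currentOp output, since that is where the open session is actually visible.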

Done

Details

Regression Issue

Yes

Found by Automation

Yes

Needs QA

Yes

Created December 15, 2023 at 8:51 AM
Updated March 12, 2024 at 3:42 PM
Resolved March 4, 2024 at 10:20 AM