Percona MongoDB cluster keeps restarting
Description
Environment
Activity

Jira Bot June 27, 2021 at 10:56 AM
Hello,
It's been 52 days since this issue went into Incomplete and we haven't heard
from you on this.
At this point, our policy is to Close this issue, to keep things from getting
too cluttered. If you have more information about this issue and wish to
reopen it, please reply with a comment containing "jira-bot=reopen".

Jira Bot June 19, 2021 at 10:56 AM
Hello,
It's jira-bot again. Your bug report is important to us, but we haven't heard
from you since the previous notification. If we don't hear from you on
this in 7 days, the issue will be automatically closed.

Jira Bot June 4, 2021 at 10:56 AM
Hello,
I'm jira-bot, Percona's automated helper script. Your bug report is important
to us but we've been unable to reproduce it, and asked you for more
information. If we haven't heard from you on this in 3 more weeks, the issue
will be automatically closed.

Lalit Choudhary May 6, 2021 at 10:05 AM
Hi,
Thank you for the report.
Do you still have this issue? You might also want to upgrade the operator version to the latest (currently 1.8.0) and test again.

Maurizio Vivarelli October 26, 2020 at 4:08 PM
This is another log fragment; the portion of interest (originally marked in red) is where the replica set nodes are identified by IP address rather than by name:
+ exec mongod --bind_ip_all --auth --dbpath=/data/db --port=27017 --replSet=rs0 --storageEngine=wiredTiger --relaxPermChecks --clusterAuthMode=x509 --slowms=100 --profile=1 --rateLimit=100 --enableEncryption --encryptionKeyFile=/etc/mongodb-encryption/encryption-key --encryptionCipherMode=AES256-CBC --wiredTigerCacheSizeGB=0.25 --wiredTigerCollectionBlockCompressor=snappy --wiredTigerJournalCompressor=snappy --wiredTigerIndexPrefixCompression=true --setParameter ttlMonitorSleepSecs=60 --setParameter wiredTigerConcurrentReadTransactions=128 --setParameter wiredTigerConcurrentWriteTransactions=128 --tlsMode preferTLS --tlsCertificateKeyFile /tmp/tls.pem --tlsAllowInvalidCertificates --tlsClusterFile /tmp/tls-internal.pem --tlsCAFile /etc/mongodb-ssl/ca.crt --tlsClusterCAFile /etc/mongodb-ssl-internal/ca.crt
2020-10-26T16:04:46.571+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-10-26T16:04:46.576+0000 W ASIO [main] No TransportLayer configured during NetworkInterface startup
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mymongo-rs0-0
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] db version v4.2.8-8
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] git version: 389dde50b8368b026e41abeeedc4498c24e27fd6
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] allocator: tcmalloc
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] modules: none
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] build environment:
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] distarch: x86_64
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] target_arch: x86_64
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] 476 MB of memory available to the process out of 3944 MB total system memory
2020-10-26T16:04:46.581+0000 I CONTROL [initandlisten] options: { net: { bindIp: "*", port: 27017, tls: { CAFile: "/etc/mongodb-ssl/ca.crt", allowInvalidCertificates: true, certificateKeyFile: "/tmp/tls.pem", clusterCAFile: "/etc/mongodb-ssl-internal/ca.crt", clusterFile: "/tmp/tls-internal.pem", mode: "preferTLS" } }, operationProfiling: { mode: "slowOp", rateLimit: 100, slowOpThresholdMs: 100 }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", enableEncryption: true, encryptionCipherMode: "AES256-CBC", encryptionKeyFile: "/etc/mongodb-encryption/encryption-key", relaxPermChecks: true }, setParameter: { ttlMonitorSleepSecs: "60", wiredTigerConcurrentReadTransactions: "128", wiredTigerConcurrentWriteTransactions: "128" }, storage: { dbPath: "/data/db", engine: "wiredTiger", wiredTiger: { collectionConfig: { blockCompressor: "snappy" }, engineConfig: { cacheSizeGB: 0.25, journalCompressor: "snappy" }, indexConfig: { prefixCompression: true } } } }
2020-10-26T16:04:46.654+0000 I STORAGE [initandlisten] Initializing KeyDB with wiredtiger_open config: create,config_base=false,extensions=[local=(entry=percona_encryption_extension_init,early_load=true,config=(cipher=AES256-CBC,rotation=false))],encryption=(name=percona,keyid=""),log=(enabled,file_max=5MB),transaction_sync=(enabled=true,method=fsync),
2020-10-26T16:04:48.928+0000 I STORAGE [initandlisten] Encryption keys DB is initialized successfully
2020-10-26T16:04:48.928+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],encryption=(name=percona,keyid="/default"),extensions=[local=(entry=percona_encryption_extension_init,early_load=true,config=(cipher=AES256-CBC)),],
2020-10-26T16:04:49.040+0000 I STORAGE [initandlisten] WiredTiger message [1603728289:40372][1:0x7f59e65f5c40], txn-recover: Recovering log 15 through 16
2020-10-26T16:04:49.455+0000 I STORAGE [initandlisten] WiredTiger message [1603728289:455617][1:0x7f59e65f5c40], txn-recover: Recovering log 16 through 16
2020-10-26T16:04:49.869+0000 I STORAGE [initandlisten] WiredTiger message [1603728289:869790][1:0x7f59e65f5c40], txn-recover: Main recovery loop: starting at 15/7040 to 16/256
2020-10-26T16:04:49.872+0000 I STORAGE [initandlisten] WiredTiger message [1603728289:872340][1:0x7f59e65f5c40], txn-recover: Recovering log 15 through 16
2020-10-26T16:04:50.169+0000 I STORAGE [initandlisten] WiredTiger message [1603728290:169059][1:0x7f59e65f5c40], txn-recover: Recovering log 16 through 16
2020-10-26T16:04:50.379+0000 I STORAGE [initandlisten] WiredTiger message [1603728290:379760][1:0x7f59e65f5c40], txn-recover: Set global recovery timestamp: (1603725030, 1)
2020-10-26T16:04:50.522+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1603725030, 1)
2020-10-26T16:04:50.566+0000 I STORAGE [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2020-10-26T16:04:50.566+0000 I STORAGE [initandlisten] The size storer reports that the oplog contains 161 records totaling to 60699 bytes
2020-10-26T16:04:50.566+0000 I STORAGE [initandlisten] Scanning the oplog to determine where to place markers for truncation
2020-10-26T16:04:50.573+0000 I STORAGE [initandlisten] WiredTiger record store oplog processing took 7ms
2020-10-26T16:04:50.579+0000 I STORAGE [initandlisten] Timestamp monitor starting
2020-10-26T16:04:50.592+0000 I CONTROL [initandlisten] ** WARNING: While invalid X509 certificates may be used to
2020-10-26T16:04:50.592+0000 I CONTROL [initandlisten] ** connect to this server, they will not be considered
2020-10-26T16:04:50.592+0000 I CONTROL [initandlisten] ** permissible for authentication.
2020-10-26T16:04:50.592+0000 I CONTROL [initandlisten]
2020-10-26T16:04:50.657+0000 I SHARDING [initandlisten] Marking collection local.system.replset as collection version: <unsharded>
2020-10-26T16:04:50.693+0000 I STORAGE [initandlisten] Flow Control is enabled on this deployment.
2020-10-26T16:04:50.693+0000 I SHARDING [initandlisten] Marking collection admin.system.roles as collection version: <unsharded>
2020-10-26T16:04:50.700+0000 I SHARDING [initandlisten] Marking collection admin.system.version as collection version: <unsharded>
2020-10-26T16:04:50.714+0000 I SHARDING [initandlisten] Marking collection local.startup_log as collection version: <unsharded>
2020-10-26T16:04:50.714+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2020-10-26T16:04:50.716+0000 I SHARDING [initandlisten] Marking collection local.replset.minvalid as collection version: <unsharded>
2020-10-26T16:04:50.716+0000 I SHARDING [initandlisten] Marking collection local.replset.election as collection version: <unsharded>
2020-10-26T16:04:50.735+0000 I REPL [initandlisten] Rollback ID is 1
2020-10-26T16:04:50.741+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1603725030, 1) (top of oplog: { ts: Timestamp(1603725030, 1), t: 1 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2020-10-26T16:04:50.741+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1603725030, 1)
2020-10-26T16:04:50.741+0000 I REPL [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
2020-10-26T16:04:50.741+0000 I SHARDING [initandlisten] Marking collection config.transactions as collection version: <unsharded>
2020-10-26T16:04:50.747+0000 I SHARDING [initandlisten] Marking collection local.oplog.rs as collection version: <unsharded>
2020-10-26T16:04:50.755+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2020-10-26T16:04:50.755+0000 I SHARDING [LogicalSessionCacheReap] Marking collection config.system.sessions as collection version: <unsharded>
2020-10-26T16:04:50.755+0000 I CONTROL [LogicalSessionCacheReap] Failed to reap transaction table: NotYetInitialized: Replication has not yet been configured
2020-10-26T16:04:50.755+0000 I NETWORK [listener] Listening on /tmp/mongodb-27017.sock
2020-10-26T16:04:50.755+0000 I NETWORK [listener] Listening on 0.0.0.0
2020-10-26T16:04:50.755+0000 I NETWORK [listener] waiting for connections on port 27017 ssl
2020-10-26T16:04:51.769+0000 W NETWORK [replexec-0] The server certificate does not match the host name. Hostname: 10.0.15.111 does not match SAN(s): localhost, mymongo-rs0, mymongo-rs0.psmdb, mymongo-rs0.psmdb.svc.cluster.local, *.mymongo-rs0, *.mymongo-rs0.psmdb, *.mymongo-rs0.psmdb.svc.cluster.local,
2020-10-26T16:04:51.778+0000 W NETWORK [replexec-0] The server certificate does not match the host name. Hostname: 10.0.15.112 does not match SAN(s): localhost, mymongo-rs0, mymongo-rs0.psmdb, mymongo-rs0.psmdb.svc.cluster.local, *.mymongo-rs0, *.mymongo-rs0.psmdb, *.mymongo-rs0.psmdb.svc.cluster.local,
2020-10-26T16:04:51.783+0000 W REPL [replexec-0] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 3 for replica set rs0 maps to this node" while validating { _id: "rs0", version: 3, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "10.0.15.110:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "10.0.15.111:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { serviceName: "mymongo" }, slaveDelay: 0, votes: 1 }, { _id: 2, host: "10.0.15.112:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { serviceName: "mymongo" }, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f96e3275c4115d94247c2a7') } }
2020-10-26T16:04:51.783+0000 I REPL [replexec-0] New replica set config in use: { _id: "rs0", version: 3, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "10.0.15.110:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "10.0.15.111:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { serviceName: "mymongo" }, slaveDelay: 0, votes: 1 }, { _id: 2, host: "10.0.15.112:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { serviceName: "mymongo" }, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f96e3275c4115d94247c2a7') } }
2020-10-26T16:04:51.783+0000 I REPL [replexec-0] This node is not a member of the config
2020-10-26T16:04:51.783+0000 I REPL [replexec-0] transition to REMOVED from STARTUP
2020-10-26T16:04:51.784+0000 I REPL [replexec-0] Starting replication storage threads
2020-10-26T16:04:51.786+0000 I REPL [replexec-0] Starting replication fetcher thread
2020-10-26T16:04:51.786+0000 I REPL [replexec-0] Starting replication applier thread
2020-10-26T16:04:51.786+0000 I REPL [replexec-0] Starting replication reporter thread
2020-10-26T16:04:51.787+0000 I REPL [rsSync-0] Starting oplog application
2020-10-26T16:04:53.733+0000 I NETWORK [listener] connection accepted from 10.244.2.0:45032 #3 (1 connection now open)
2020-10-26T16:04:53.741+0000 I NETWORK [conn3] SSL mode is set to 'preferred' and connection 3 to 10.244.2.0:45032 is not using SSL.
2020-10-26T16:04:53.741+0000 I NETWORK [conn3] received client metadata from 10.244.2.0:45032 conn3: { driver: { name: "mongo-go-driver", version: "v1.3.4" }, os: { type: "linux", architecture: "amd64" }, platform: "go1.14.4" }
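The two "server certificate does not match the host name" warnings above are the crux: every SAN in the certificate is a DNS name or a DNS wildcard, and a raw IP such as 10.0.15.111 can never satisfy those, so once the replica set config stores members by pod IP instead of by service DNS name the node fails to recognize itself and transitions to REMOVED. The helper below is an illustrative sketch of standard SAN matching (it is not MongoDB's actual verification code); the hostnames and SAN list are taken from the log above.

```python
def san_matches(hostname: str, san: str) -> bool:
    """Return True if hostname satisfies one DNS-type SAN entry.

    A leading '*.' wildcard matches exactly one DNS label (per the usual
    RFC 6125 rules); a bare IP address never matches a DNS SAN, which is
    why the log warns for 10.0.15.111 / 10.0.15.112.
    """
    if san.startswith("*."):
        suffix = san[2:]
        head, sep, tail = hostname.partition(".")
        return bool(sep) and head != "" and tail == suffix
    return hostname == san

# SAN list copied from the warning in the log.
sans = [
    "localhost", "mymongo-rs0", "mymongo-rs0.psmdb",
    "mymongo-rs0.psmdb.svc.cluster.local", "*.mymongo-rs0",
    "*.mymongo-rs0.psmdb", "*.mymongo-rs0.psmdb.svc.cluster.local",
]

# A pod DNS name is covered by the wildcard SANs...
print(any(san_matches("mymongo-rs0-0.mymongo-rs0.psmdb", s) for s in sans))  # True
# ...but a pod IP matches nothing.
print(any(san_matches("10.0.15.111", s) for s in sans))  # False
```

In other words, the certificate already covers the per-pod service names (via the `*.mymongo-rs0...` wildcards), so the problem is that the members ended up registered by IP in the replica set config, not that the certificate is wrong.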
Details
Assignee: Lalit Choudhary
Reporter: Maurizio Vivarelli
Priority: Medium
Installed following the instructions at:
https://www.percona.com/doc/kubernetes-operator-for-psmongodb/kubernetes.html
Operator version: 1.5.0
Kubernetes version: 1.18.3
The error is:
Liveness probe failed: 2020-10-26 15:16:36.825 main.go:109 INFO Running Kubernetes liveness check 2020-10-26 15:16:36.831 main.go:112 ERROR replSetGetStatus returned error Our replica set config is invalid or we are not a member of it