Simply killing the LDAP server is not enough to reproduce this; perhaps the LDAP server replied with something incorrect, or the connection failed at just the right moment. We haven't come up with a way to reproduce this.
Thank you for the detailed report. Your analysis is correct: ldap_search_ext_s() occasionally returns a NULL result while reporting success with the LDAP_SUCCESS return code. Unfortunately, this behavior is not documented, and it appears to be transient. We still cannot reproduce it in our testing environment, which suggests it only happens under specific network conditions or a specific load on some part of the production system. It is also possible that we are facing a bug in the OpenLDAP library. To mitigate the issue, we will implement a retry algorithm that makes several attempts at the query and reports an error gracefully instead of crashing the mongod process.
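For illustration only, here is a minimal sketch of what such a retry loop could look like. It is not the actual PSMDB fix. SearchOutcome, RetryResult and searchWithRetry are hypothetical names, and the search callable stands in for a wrapper around ldap_search_ext_s(), with rc == 0 playing the role of LDAP_SUCCESS. The key point is that a success code paired with a NULL result is treated as a failed attempt rather than being passed on to ldap_first_entry().

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for the outcome of one LDAP search call.
struct SearchOutcome {
    int rc;              // return code; 0 plays the role of LDAP_SUCCESS
    const void* result;  // plays the role of the LDAPMessage* out-parameter
};

// Hypothetical summary of the whole retry sequence.
struct RetryResult {
    bool ok;             // true only if a non-null result was obtained
    const void* result;  // the result to use; nullptr on failure
    std::string error;   // description of the last failure; empty on success
    int attempts;        // number of search calls actually made
};

// Calls `search` up to maxAttempts times, sleeping `delay` between attempts.
// An attempt counts as successful only if rc == 0 AND the result is non-null.
inline RetryResult searchWithRetry(const std::function<SearchOutcome()>& search,
                                   int maxAttempts,
                                   std::chrono::milliseconds delay) {
    std::string lastError = "no attempts made";
    int attempts = 0;
    while (attempts < maxAttempts) {
        SearchOutcome out = search();
        ++attempts;
        if (out.rc == 0 && out.result != nullptr) {
            return {true, out.result, "", attempts};
        }
        lastError = (out.rc != 0)
            ? "search failed with code " + std::to_string(out.rc)
            : std::string("search reported success but returned a null result");
        if (attempts < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    // All attempts failed: report the error to the caller instead of asserting.
    return {false, nullptr, lastError, attempts};
}
```

With something like this, the caller would check ok and return an authentication or authorization error to the client when it is false, so the result is never handed to libldap as NULL.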
Backtrace:
I think there was some sort of issue reaching the LDAP server, which led to PSMDB crashing.
If I read this correctly, ldap_first_entry() gets called from here: https://github.com/percona/percona-server-mongodb/blob/psmdb-4.4.21-20/src/mongo/db/ldap/ldap_manager_impl.cpp#L608
And the answer variable seems to be NULL, which leads to assert failing in libldap: https://github.com/openldap/openldap/blob/master/libraries/libldap/getentry.c#L36