LDAPIdResolver seems to keep LDAP connections open indefinitely

Dear privacyidea community,

I am using privacyIDEA in the following setup:

Cisco ASA 5506 → RADIUS → privacyIDEA 3.8.1 (started by uWSGI)

Up to privacyIDEA 3.7.x I experienced authentication failures after random periods of time. The root cause was exhausted file descriptors, which showed up as “OSError(24, ‘Too many open files’)!” in the log file. After increasing the number of file descriptors in the uwsgi.ini file (max-fd=1024 is the default) I still got the same issue. After switching to version 3.8.1 I no longer get the OSError above, but the service still becomes unresponsive.
The trigger of the fd exhaustion turned out to be a number of malicious login attempts with invalid usernames. The number of malicious logins is not very high, yet the file descriptor exhaustion that leads to the “too many open files” message appears to originate in privacyIDEA’s LDAPIdResolver. I often have many of the following messages in my log; as you can see, they are sometimes a few seconds apart and sometimes minutes.

[2023-05-16 11:03:43,204][5447][139728796583680][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:03:44,828][5447][139728788190976][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:04:36,046][5447][139728707909376][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:04:37,436][5447][139728699516672][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:03,557][5447][139728682731264][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:05,087][5447][139728691123968][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:06,517][5447][139728674338560][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:07,928][5447][139728665945856][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:09,339][5447][139728657553152][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:10,839][5447][139728036820736][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:12,368][5447][139729174165632][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:13,775][5447][139728821761792][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:22:02,275][5447][139728804976384][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.

I am not sure why privacyIDEA’s number of open file handles increases over time. I checked this as follows (example):
ps -fu privacyidea

UID PID PPID C STIME TTY TIME CMD
privacy+ 5406 5405 0 09:35 ? 00:00:03 /usr/sbin/uwsgi --ini pi.ini
privacy+ 5447 5406 0 09:35 ? 00:00:12 /usr/sbin/uwsgi --ini pi.ini

ll /proc/5447/fd | wc -l
13
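
For reference, counting the worker’s open descriptors and the sockets among them can also be done with a few lines of Python like the following (a minimal sketch; the PID 5447 is the worker PID from the ps output above and has to be adjusted to your setup):

import os

PID = 5447  # uWSGI worker PID from "ps -fu privacyidea"; adjust for your setup
FD_DIR = "/proc/{}/fd".format(PID)

fds = os.listdir(FD_DIR)
# Each entry is a symlink; socket descriptors point to "socket:[<inode>]".
sockets = [fd for fd in fds
           if os.readlink(os.path.join(FD_DIR, fd)).startswith("socket:")]
print("open fds: {}, of which sockets: {}".format(len(fds), len(sockets)))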

Does anyone have a similar setup and experience similar issues?
Can anyone think of a privacyIDEA configuration setting that keeps LDAP connections open indefinitely and that I might have set by accident?
Is there any way to debug the LDAPIdResolver in more detail to understand whether the open file descriptors are indeed caused by this module?

I am grateful for any insight, as the current situation leads to ugly surprises when I want to log in remotely, and I do not want to add another login option that excludes OTP…

Thanks & Best Regards
Andi

I have some additional information that confirms that the LDAP connections are kept open:

Analyzing the symlinks in /proc/<pid>/fd together with the information in /proc/net/tcp reveals that the open socket connections indeed go to my local LDAP server (a small script for this correlation is sketched after the log excerpts below). The log of the LDAP server also confirms this:

Normal LDAP connections are logged like this:
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 ACCEPT from IP=x.x.x.x:55262 (IP=0.0.0.0:389)
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 STARTTLS
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 RESULT oid= err=0 text=
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 TLS established tls_ssf=256 ssf=256
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SRCH attr=altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domainControllerFunctionality defaultNamingContext lastUSN highestCommittedUSN
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=

May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=2 RESULT tag=97 err=0 text=
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=3 UNBIND
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 closed

The strange connection that is kept open looks like this:
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 fd=72 ACCEPT from IP=x.x.x.x:45215 (IP=0.0.0.0:389)
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 STARTTLS
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 RESULT oid= err=0 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 fd=72 TLS established tls_ssf=256 ssf=256
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=1 BIND dn=“” method=128
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=1 RESULT tag=97 err=0 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SRCH base="" scope=0 deref=3 filter="(objectClass=*)"
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SRCH attr=subschemaSubentry +
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SRCH base=“cn=Subschema” scope=0 deref=3 filter=“(objectClass=subschema)”
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SRCH attr=objectClasses attributeTypes ldapSyntaxes matchingRules matchingRuleUse dITContentRules dITStructureRules nameForms createTimestamp modifyTimestamp * +
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SRCH base="ou=xx,dc=xx,dc=xx" scope=2 deref=3 filter="(&(uid=*)(objectClass=inetOrgPerson)(uid=scrowling))"
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SRCH attr=telephoneNumber mobile mail sn givenName uid entryUUID
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SEARCH RESULT tag=101 err=0 nentries=0 text=

There is no UNBIND at the end of connection 7058446; therefore the connection stays open.
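
For completeness, this is roughly how I correlated /proc/<pid>/fd with /proc/net/tcp (a sketch only; the PID is again the worker PID from above, and the address decoding assumes plain IPv4 on a little-endian Linux box):

import os, re

PID = 5447  # uWSGI worker PID; adjust for your setup
FD_DIR = "/proc/{}/fd".format(PID)

# Collect the inodes of the sockets held open by the worker process.
inodes = set()
for fd in os.listdir(FD_DIR):
    target = os.readlink(os.path.join(FD_DIR, fd))
    m = re.match(r"socket:\[(\d+)\]", target)
    if m:
        inodes.add(m.group(1))

# /proc/net/tcp lists the remote address/port (hex encoded) and the inode
# of every TCP connection; match them against the worker's socket inodes.
with open("/proc/net/tcp") as f:
    next(f)  # skip the header line
    for line in f:
        fields = line.split()
        rem_ip_hex, rem_port_hex = fields[2].split(":")
        if fields[9] in inodes:
            ip = ".".join(str(int(rem_ip_hex[i:i + 2], 16))
                          for i in range(6, -2, -2))
            print("worker socket -> {}:{}".format(ip, int(rem_port_hex, 16)))

In my case the lingering sockets all pointed to port 389 of the LDAP server.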

I therefore checked the LDAPIdResolver.py file and found that the functions _getDN (line 494 ff.) and _bind (line 520 ff.) use a bind call that has no matching unbind. Therefore any initial call to _getDN opens a connection to the LDAP server that is never closed/unbound. This seems to be intentional, as the _bind function checks self.i_am_bound and will not create a new bind, effectively reusing the existing connection. In that case the variable self.l contains the active connection to the LDAP server. Shouldn’t there be a self.l.unbind() somewhere (for example after some idle timeout)?
Compare, for example, the function testconnection (line 1041 ff.). There we find a local variable l that holds the LDAP connection created by an l.bind() call, and this connection is nicely closed at the end of the function with l.unbind() (line 1124).
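
If I read the resolver code correctly it uses the ldap3 library, so stripped down to a standalone example the pattern looks roughly like this (hostname, bind DN, password and base DN are placeholders; this is only meant to illustrate where the unbind would go, not a patch):

from ldap3 import Server, Connection

# Placeholder server and credentials, not my real setup.
server = Server("ldap.example.com", port=389)
conn = Connection(server, user="cn=service,dc=example,dc=com", password="secret")
conn.start_tls()
conn.bind()

conn.search("ou=people,dc=example,dc=com", "(uid=someuser)", attributes=["uid"])

# Without this call the TCP connection (and its file descriptor) stays open
# for the lifetime of the process -- exactly the pattern in the slapd log
# above: no UNBIND, the connection is never closed.
conn.unbind()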

Best Regards
Andi