LDAPIdResolver seems to keep ldap connections open indefinetly

andioddi · May 16, 2023, 11:41am

Dear privacyidea community,

I am using privacy idea in the following setup:

Cisco ASA 5506 → Radius → privacyidea (started by uwsgi) (Version 3.8.1)

Up to privacy idea 3.7.x I experience after random timeframes authentication failures. The root casue were exhausted file desrciptors which resulted in “OSError(24, ‘Too many open files’)!” in the log file. After increasing the number of file descriptors in the uwsgi.ini file (max-fd=1024 is default) I still got the same issue. After switching to Version 3.8.1 I do not get the OSError above but still the service becomes inresponsive.
The root cause of the fd exhaustion issue was discovered to be many malicious log-ins with invalid username. The number of malicious logins is not very high and the exhaustion of the file desriptors which leads to the “too many open files” message seems to be privacyidea’s LDAPIdResolver. Often I have many of the following message in my log. As you can see they are sometimes a few seconds apart and sometimes minutes.

[2023-05-16 11:03:43,204][5447][139728796583680][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:03:44,828][5447][139728788190976][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:04:36,046][5447][139728707909376][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:04:37,436][5447][139728699516672][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:03,557][5447][139728682731264][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:05,087][5447][139728691123968][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:06,517][5447][139728674338560][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:07,928][5447][139728665945856][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:09,339][5447][139728657553152][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:10,839][5447][139728036820736][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:12,368][5447][139729174165632][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:15:13,775][5447][139728821761792][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.
[2023-05-16 11:22:02,275][5447][139728804976384][INFO][privacyidea.lib.resolvers.LDAPIdResolver:333] Setting system wide POOLING_LOOP_TIMEOUT to 10.

I am not sure why privacy idea’s number of used file handles increases over time. I checked this by calling. Example:
ps -fu privacyidea

UID PID PPID C STIME TTY TIME CMD
privacy+ 5406 5405 0 09:35 ? 00:00:03 /usr/sbin/uwsgi --ini pi.ini
privacy+ 5447 5406 0 09:35 ? 00:00:12 /usr/sbin/uwsgi --ini pi.ini

ll /proc/5447/fd | wc -l
13

Has anyone a similar setup and experiences similar issues?
Can anyone think of a privacy idea configuration setting that keeps ldap connections up indefinitly that I might have set by accident?
Is there any way to debug the LDAPIdResolver in more detail to understand if the open file descriptors are in-deed caused by this module?

I am grateful for any insight as the current situation leads to ugly surprises when I want to login remotely and I do not want to add another login-option which excludes OTP…

Thanks & Best Regards
Andi

andioddi · May 17, 2023, 9:03am

I have some additional information that confirms that the ldap connections are kept open:

Analyzing the links in /proc//fd together with the information in /proc/net/tcp reveals that the open socket connections are indeed to my local ldap server. The log of the ldap server also confirms this:

Normal LDAP connections are logged with :
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 ACCEPT from IP=x.x.x.x:55262 (IP=0.0.0.0:389)
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 STARTTLS
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=0 RESULT oid= err=0 text=
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 TLS established tls_ssf=256 ssf=256
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SRCH base=“” scope=0 deref=0 filter="(objectClass=)"
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SRCH attr= altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domainControllerFunctionality defaultNamingContext lastUSN highestCommittedUSN
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
…
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=2 RESULT tag=97 err=0 text=
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 op=3 UNBIND
May 17 09:59:01 XXXX slapd[13667]: conn=7062001 fd=28 closed

The strange connection which is kept open looks like this:
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 fd=72 ACCEPT from IP=x.x.x.x:45215 (IP=0.0.0.0:389)
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 STARTTLS
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=0 RESULT oid= err=0 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 fd=72 TLS established tls_ssf=256 ssf=256
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=1 BIND dn=“” method=128
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=1 RESULT tag=97 err=0 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SRCH base=“” scope=0 deref=3 filter=“(objectClass=)"
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SRCH attr=subschemaSubentry +
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SRCH base=“cn=Subschema” scope=0 deref=3 filter=“(objectClass=subschema)”
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SRCH attr=objectClasses attributeTypes ldapSyntaxes matchingRules matchingRuleUse dITContentRules dITStructureRules nameForms createTimestamp modifyTimestamp * +
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=3 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SRCH base=“ou=xx,dc=xx,dc=xx” scope=2 deref=3 filter="(&(uid=)(objectClass=inetOrgPerson)(uid=scrowling))”
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SRCH attr=telephoneNumber mobile mail sn givenName uid entryUUID
May 17 08:10:28 XXXX slapd[13667]: conn=7058446 op=4 SEARCH RESULT tag=101 err=0 nentries=0 text=

There is not UNBIND at the end of the connection 7058446, therefore the connection stays open.

I therefore checked the LDAPIdResolver.py file and found that in function _getDN (line 494 ff) and _bind (line 520 ff) call is used which has no matching unbind. Therefore any initial call to _getDN will open a connection to the LDAP Server which is never closed/unbinded. This seems to be intentional as the _bind function checks for self.i_am_bound and will not create a new bind effectively reusing the existing connection. In that case the variable self.l contains the active connection to the LDAP server. Shouldn’t there by a self.l.unbind somewhere (for example after some timeout_time).
Compare for example the function testconnection (line 1041ff) . Here we find a lokal variable l which holds the ldap connection which is created by l.bind() call which is nicely closed at the end of this function with l.unbind() (line 1124)

Best Regards
Andi

andioddi · June 1, 2023, 4:07pm

Hi all,

as a workaround i am currently monitoring the number of file-handles in /proc//fd and restart the uwsgi daemon when the number of open file handles exceeds a limit.

Best Regards
Andi

cornelinux · June 6, 2023, 4:38am

I have not seen this before.

It is the intended behaviour to keep the bind open for performance reasons. privacyIDEA avoids issuing a bind with the service account on each authentication request.
This is why the testconnection differs from the operational behavirour.

However, most systems I saw also with tens of thousands of users run apache with mod_wsgi. I am not sure about your setup. How you are running uwsgi and how timeouts (or all components, also LDAP server) are configured and how recycles of your wsgi processes happen.
Maybe this is a direction for you to look into.

andioddi · June 7, 2023, 8:30am

Hi Cornelinux,

thanks for your feedback. My installation load due to users is “lesss than tiny”. I have only 11 users , so compared to your examples ~ close to nothing. I am in fact using uwsgi standalone. I do not use the mod_uwsgi module of apache… Not sure anymore why. Maybe due to the fact that we have other services running on uwsgi… Not sure if that was a smart decision… The apache is configured to use ProxyPass “/” “uwsgi://localhost:8000/”.

I will now check the uwsgi daemon settings for timeouts and process recycling. Maybe I will also try again to get mod_wsgi running. Thanks again for pointing me in this direction.

cornelinux · June 7, 2023, 8:52am

This is interesting.

Currently we only saw and used

Apache with mod_wsgi
nginx with uwsgi

So I never saw apache with uwsgi.

ste · June 7, 2023, 9:14am

Hi all,

same Problem here. We use Apache with mod_wsgi and freeradius. ca. 1200 Users (test enviroment).
if we use local users setup all is well. We configured LDAP_Resolver over webui. After a few Minutes ldap connections grow indefinetly until openfile limit is reached. Set openfile limit from 1024 to 65000, it takes a bit longer but problem still exist. PrivacyIdea Version is 3.8.1

belfhi · October 11, 2023, 10:45am

We are also affected by this bug.
This amounts to O(1000) Ldap Connections per PI Server.
Only restarts of Apache2 reset the connections.
Any idea what could be the cause?

Apach2 
mod_wsgi 
Privacyidea 3.8.1
Ubuntu 22

Volker_Wend · November 10, 2024, 7:33am

Hi, we are also affected by this Problem. Any news regarding this issue?

Regards,
Volker

andioddi · November 11, 2024, 7:04am

Hi Volker,

not as far as I can tell. I implemented a cron-job restarting the privacyidea regularily to prevent downtime in case we hit this problem. I also tried to pinpoint the issue by debugging it more but so far I had to admit defeat…

Best regards
Andi

nilsbehlen · November 14, 2024, 9:04am

Hi, we will look into this after finishing the main features for 3.11.