We have several reports from users with LDAP sync enabled who are seeing random users get blocked, only to be unblocked again later.
Browsing lib/gitlab/ldap/access.rb, particularly this method: https://gitlab.com/gitlab-org/gitlab-ce/blob/master/lib/gitlab/ldap/access.rb#L34, it appears that if the ldap_user is not found for any reason, the user is blocked. This would explain the symptoms if, for example, a transient network error causes Gitlab::LDAP::Person.find_by_dn(user.ldap_identity.extern_uid, adapter) to return nil rather than raise an exception.
The adapter is an instance of Gitlab::LDAP::Adapter, which rescues a range of errors in ldap_search and returns an empty array instead. A timeout, for instance, will end up blocking the user.
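To illustrate the pattern at play, here is a simplified sketch of the logic described above (approximate names only, not a verbatim quote of the linked method):

```ruby
# Simplified sketch of the Gitlab::LDAP::Access#allowed? logic described
# above; names are approximate, this is not the actual implementation.
def allowed?
  ldap_user = Gitlab::LDAP::Person.find_by_dn(user.ldap_identity.extern_uid, adapter)

  if ldap_user
    user.activate if user.ldap_blocked?
    true
  else
    # nil here can mean "the user was removed from LDAP" *or* "the search
    # silently failed" (timeout, refused connection, ...), because the
    # adapter rescues those errors and returns an empty result.
    block_user(user)
    false
  end
end
```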
Steps to reproduce
Stub ldap_search to fail with any LDAP or timeout error, then run the LdapSyncWorker with some LDAP users in the database (a spec sketch follows below).
Expected behavior
Users should not be blocked
Actual behavior
Users will be blocked
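A hedged spec sketch of the reproduction described above (factory names, worker invocation and matcher are assumptions, not taken from the real test suite):

```ruby
# Hypothetical repro spec; factories and matchers are illustrative only.
require 'spec_helper'

describe LdapSyncWorker do
  it 'does not block users when the LDAP search silently fails' do
    user = create(:omniauth_user, provider: 'ldapmain')

    # Mimic a timeout / connection error: the adapter rescues it internally
    # and hands back an empty result set.
    allow_any_instance_of(Gitlab::LDAP::Adapter)
      .to receive(:ldap_search).and_return([])

    described_class.new.perform

    expect(user.reload).not_to be_ldap_blocked
  end
end
```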
Nick Thomas changed title from LdapSyncWorker may block users if connectivity between GitLab and LDAP server is broken to LdapSyncWorker can block users if connectivity between GitLab and LDAP server is broken
It seems obvious that we shouldn't block users if we can't talk to the LDAP server. However, as @dblessing mentioned on Slack, we might want to consider what to do if we can't talk to the LDAP server from this job for an extended period of time.
Imagine we fix this bug by skipping the block when we can't talk to the LDAP server. A clever attacker who works out when the background sync runs, and who can block connectivity to the LDAP server for that window (a denial of service targeted at 1am every morning, for instance), would be able to retain login access to the GitLab installation indefinitely.
Perhaps a better solution would be to move the block / unblock actions out of this background worker and into the login flow. When a user gives us credentials and we go to authenticate them with LDAP, we learn at that point whether they should be blocked, and can then allow or deny the new session and block or unblock the user in the GitLab database, as appropriate.
We might still need a background job to take care of existing sessions.
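Very roughly, the login-flow part could look like this (an illustrative sketch only; LdapSignIn, deny_session and allow_session are made-up names, not existing code):

```ruby
# Illustrative sketch: make the block/unblock decision in the sign-in path,
# right after we have successfully talked to the LDAP server.
class LdapSignIn
  def call(user, adapter)
    person = Gitlab::LDAP::Person.find_by_dn(user.ldap_identity.extern_uid, adapter)

    if person.nil?
      # At this point we have just talked to the LDAP server for the bind, so
      # connectivity is known to be fine and an empty lookup really means the
      # user is gone or disabled there.
      user.ldap_block
      deny_session
    else
      user.activate if user.ldap_blocked?
      allow_session
    end
  end
end
```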
Also, we should not be using Timeout::timeout here. It's a good source of memory leaks and random insanity.
I agree, Timeout is probably a bad idea. The reason it was added is that there are situations where a hung connection can block indefinitely in GitLab. We saw this with lots of large customers. For whatever reason, when users tried to sign in, the LDAP authentication hung indefinitely and eventually ate up all the Unicorn workers. The server melted and it was not a nice experience. Net::LDAP is supposed to provide a query timeout but it doesn't handle things in all cases. Do you have suggestions to improve this?
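One possible direction (a hedged sketch, not a drop-in replacement): lean on the timeouts net-ldap exposes itself rather than wrapping calls in Timeout.timeout. Note that connect_timeout only bounds connection establishment, so a read that hangs mid-query would still need separate handling:

```ruby
# Sketch: use net-ldap's own connect_timeout instead of Timeout.timeout.
# Requires a reasonably recent net-ldap; this only bounds connection setup.
require 'net/ldap'

ldap = Net::LDAP.new(
  host: 'ldap.example.com',
  port: 636,
  connect_timeout: 10, # seconds
  encryption: { method: :simple_tls },
  auth: { method: :simple, username: 'cn=admin,dc=example,dc=com', password: 'secret' }
)

begin
  entries = ldap.search(
    base: 'dc=example,dc=com',
    filter: Net::LDAP::Filter.eq('uid', 'jsmith')
  )
rescue Net::LDAP::Error => e
  # Connection-level failure (including a connect timeout): let the caller
  # decide what to do, rather than pretending the search came back empty.
  warn "LDAP search failed: #{e.message}"
  raise
end
```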
We have to be able to differentiate between a failed connection and a search that returns no user. We have a mechanism that checks whether signed-in users are still present in LDAP; by default this happens every hour, but in EE it's configurable. If the user is no longer present in LDAP, we invalidate the session and mark them as ldap_blocked. Similarly, in Active Directory, if the user is present but has the special AD 'blocked' attribute, we mark the user as ldap_blocked.
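One way to make that differentiation explicit (a hedged sketch; the ConnectionError class is hypothetical and not in the current codebase) would be to have the adapter raise on connection-level failures instead of flattening them into an empty result:

```ruby
# Sketch: surface connection failures as a distinct error instead of
# returning [] for them. ConnectionError is a hypothetical class.
module Gitlab
  module LDAP
    ConnectionError = Class.new(StandardError)

    class Adapter
      def ldap_search(*args)
        ldap.search(*args)
      rescue Net::LDAP::Error, Timeout::Error, Errno::ECONNREFUSED => e
        # Don't pretend the user is absent; tell the caller the lookup failed.
        raise ConnectionError, e.message
      end
    end
  end
end
```

Callers that currently treat an empty result as "the user is gone" could then rescue ConnectionError and leave the user's blocked state untouched for that run.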
Perhaps a better solution would be to move the block / unblock actions out of this background worker and into the login flow.
It is already in both places. Gitlab::LDAP::Access.allowed? is called by the sign-in process, the hourly check, and the nightly user sync.
Up until now this issue has mostly been a minor nuisance: a user could be blocked when the connection fails, but the problem resolves itself once they sign in again. I don't know why we are suddenly seeing so many more reports; it may be coincidental.
GitLab is moving all development for both GitLab Community Edition
and Enterprise Edition into a single codebase. The current
gitlab-ce repository will become a read-only mirror, without any
proprietary code. All development is moved to the current
gitlab-ee repository, which we will rename to just gitlab in the
coming weeks. As part of this migration, issues will be moved to the
current gitlab-ee project.
If you have any questions about all of this, please ask them in our
dedicated FAQ issue.
Using "gitlab" and "gitlab-ce" would be confusing, so we decided to
rename gitlab-ce to gitlab-foss to make the purpose of this FOSS
repository more clear
I created a merge request for CE, and it got closed. What do I
need to do?
Everything in the ee/ directory is proprietary. Everything else is
free and open source software. If your merge request does not change
anything in the ee/ directory, the process of contributing changes
is the same as when using the gitlab-ce repository.
Will you accept merge requests on the gitlab-ce/gitlab-foss project
after it has been renamed?
No. Merge requests submitted to this project will be closed automatically.
Will I still be able to view old issues and merge requests in
gitlab-ce/gitlab-foss?
Yes.
How will this affect users of GitLab CE using Omnibus?
No changes will be necessary, as the packages built remain the same.
How will this affect users of GitLab CE that build from source?
Once the project has been renamed, you will need to change your Git
remotes to use this new URL. GitLab will take care of redirecting Git
operations so there is no hard deadline, but we recommend doing this
as soon as the projects have been renamed.
Where can I see a timeline of the remaining steps?