Strange issue with LDAP queries hanging
Zendesk issue: https://gitlab.zendesk.com/agent/tickets/14082
@patricio and I spent quite a lot of time troubleshooting this strange issue with a customer tonight. It came on suddenly and left a majority (but not all) of their LDAP users unable to sign in. We suspected it was related to LDAP because we started seeing read timeouts on the LDAP callback URL. Requests were timing out at 60 seconds and the workers were getting killed. Raising the timeout or the number of workers had no effect. Also, the behavior was entirely consistent per user: a given user either could always sign in or never could.
We started adding debug logging, beginning in the omniauth controller. We quickly got down into `lib/gitlab/ldap/access.rb` and traced it to the following code.

    def allowed?
      if Gitlab::LDAP::Person.find_by_dn(user.ldap_identity.extern_uid, adapter) # <--- This works fine!
        return true unless ldap_config.active_directory

        # Block user in GitLab if he/she was blocked in AD
        if Gitlab::LDAP::Person.disabled_via_active_directory?(user.ldap_identity.extern_uid, adapter)
          user.block
          false
        else
          user.activate if user.blocked? && !ldap_config.block_auto_created_users
          true
        end
      else
        # Block the user if they no longer exist in LDAP/AD
        user.block
        false
      end
    rescue
      false
    end

    def update_admin_status
      admin_group = Gitlab::LDAP::Group.find_by_cn(ldap_config.admin_group, adapter) # <--- This works fine!
      admin_user = Gitlab::LDAP::Person.find_by_dn(user.ldap_identity.extern_uid, adapter) # <--- This does *not* work fine! (Notice it's the same query from earlier.)

      if admin_group && admin_group.has_member?(admin_user)
        unless user.admin?
          user.admin = true
          user.save
        end
      else
        if user.admin?
          user.admin = false
          user.save
        end
      end
    end

At this point we noticed we could combine those two queries into one. `access.rb` already has an `ldap_user` method that memoizes the result with `||=`, so the query runs only once. After we changed both locations to use `ldap_user`, their users were all able to log in again.
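For anyone not familiar with the pattern, here is a minimal sketch of what memoizing the lookup with `||=` looks like. This is not the real `access.rb`: `CountingAdapter`, `AccessSketch`, and the example DN are hypothetical stand-ins, used only to show that both methods end up sharing a single query.

```ruby
# Hypothetical stand-in for the LDAP adapter; it only counts lookups.
class CountingAdapter
  attr_reader :queries

  def initialize
    @queries = 0
  end

  def find_by_dn(dn)
    @queries += 1
    { dn: dn }
  end
end

# Simplified stand-in for the access class (not the real Gitlab::LDAP::Access).
class AccessSketch
  def initialize(dn, adapter)
    @dn = dn
    @adapter = adapter
  end

  # Memoized lookup: the adapter is queried on the first call only.
  def ldap_user
    @ldap_user ||= @adapter.find_by_dn(@dn)
  end

  def allowed?
    !ldap_user.nil?
  end

  def update_admin_status
    ldap_user # reuses the cached entry instead of querying again
  end
end

adapter = CountingAdapter.new
access = AccessSketch.new("uid=jdoe,ou=people,dc=example,dc=com", adapter)
access.allowed?
access.update_admin_status
puts adapter.queries
```

One caveat with `||=`: a `nil` result is not cached, so a user who is missing from LDAP would still trigger a fresh query on every call.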
However, this only fixed existing users. New users who had never signed in before were still unable to sign in. We created a test LDAP user and noticed that the GitLab user is created in the DB, as is the LDAP identity. Again we traced it down to a hanging query immediately after the admin group query. This time, since we had already switched to `ldap_user`, it hung at the next LDAP query, where it checks group membership.

    def update_admin_status
      admin_group = Gitlab::LDAP::Group.find_by_cn(ldap_config.admin_group, adapter) # <--- This works fine!

      if admin_group && admin_group.has_member?(ldap_user) # <--- Now hangs here (member query in `group.rb`)
        unless user.admin?
          user.admin = true
          user.save
        end
      else
        if user.admin?
          user.admin = false
          user.save
        end
      end
    end

So the questions are:
- Why does it hang for some users and not others? We couldn't find any similarities between the affected users.
- Why does the query immediately after the `Gitlab::LDAP::Group.find_by_cn` query fail each time?
- Is something causing the `adapter` to become unusable?
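One way to narrow down the last two questions might be to time each LDAP call separately and abort it well before the 60-second worker timeout, so the logs show exactly which call stalls. The sketch below is only an idea, not something we have run against the customer's instance: `timed_ldap_call` and its `limit` parameter are hypothetical, and the blocks stand in for the real `find_by_cn` and `has_member?` calls.

```ruby
require "timeout"
require "logger"

LOGGER = Logger.new($stdout)

# Hypothetical helper: runs the block, logs how long it took, and raises
# Timeout::Error if the block runs longer than `limit` seconds.
def timed_ldap_call(label, limit: 10)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Timeout.timeout(limit) { yield }
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  LOGGER.info(format("%s finished or aborted after %.3fs", label, elapsed))
end

# A fast call returns the block's value as usual.
result = timed_ldap_call("find_by_cn") { :group_entry }

# A slow call (simulated with sleep) is cut off and can be handled explicitly.
hung = false
begin
  timed_ldap_call("has_member?", limit: 0.1) { sleep 1 }
rescue Timeout::Error
  hung = true
end
```

Wrapping the admin group query and the membership query in separate calls like this should show whether the second query on the same `adapter` is always the one that stalls.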
@patricio and I are going to debug again Thursday. The urgent part of the issue is over for now. However, they say they add new users all the time, so this cannot wait. Any help or ideas that others can give us would be greatly appreciated.