Improve LDAP sync worker performance: memory usage and runtime

ZD Link: https://gitlab.zendesk.com/agent/tickets/82070

changed title from LDAP sync worker performance and memory usage to Improve LDAP sync worker performance: memory usage and runtime

Step 1 is very simple, and could be applied with the customer using a patch.

Steps 2 and 3 require more investigation into the workings of the workers.

In the meantime, the customer can go to Admin Area > Monitoring > Background Jobs > Queues, find the cronjob queue, and hit Delete to get rid of the 7 workers that are trying to run concurrently.

Next time they get scheduled automatically (on the next hour and at 1:30am), they may finish faster since they run "alone".

The total number of enqueued jobs quickly dropped to 0 after we raised the memory cap of the Sidekiq workers from 1 GB to 4 GB.

About the LDAP syncs running: I am not seeing any jobs in the cronjob queue. Does that mean the concurrency issue solved itself?

@joustie Yes! I'm really happy to hear that upping the max RSS effectively resolved the problem for you.

It would be helpful to have full debug output of the customer's LdapAllGroupsSyncWorker run. Other than the very first group sync, subsequent runs should only be confirming membership and should be very quick. Something like the following, run from the Rails console, may work to get debug output for a full sync. It logs everything to stdout, so make sure you have a large scrollback buffer, or redirect the output to a file.

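# Raise log verbosity for this console session only.
Rails.logger.level = Logger::DEBUG
# Run a full sync of all LDAP groups inline, in this process.
LdapAllGroupsSyncWorker.new.perform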
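If the console session is going to be reused afterwards, it may be worth restoring the previous log level once the sync finishes. The following is only a sketch: `with_debug_logging` is a hypothetical helper, not something GitLab ships, and it assumes the logger responds to `level` and `level=` like Ruby's standard `Logger`.

```ruby
require "logger"

# Hypothetical helper (not part of GitLab): run a block with the given
# logger at DEBUG level, then restore whatever level was set before.
def with_debug_logging(logger)
  previous_level = logger.level
  begin
    logger.level = Logger::DEBUG
    yield
  ensure
    # Restore the original level even if the block raises.
    logger.level = previous_level
  end
end

# Assumed usage from the gitlab-rails console:
#   with_debug_logging(Rails.logger) { LdapAllGroupsSyncWorker.new.perform }
```

This keeps the extra verbosity scoped to the one sync run instead of the rest of the session.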

It should be completely fine to limit LdapAllGroupsSyncWorker to a single concurrent run, but we shouldn't do this for LdapGroupSyncWorker (this worker wasn't mentioned in this issue, but I'm mentioning it to avoid confusion).
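As a rough illustration of what "a single concurrent run" could look like, here is a minimal sketch that uses a non-blocking file lock. Everything in it is an assumption made for illustration: `run_exclusively` and the lock path are hypothetical, and a file lock only protects a single host, so a multi-node install would need a shared lock (for example one kept in Redis) instead.

```ruby
# Hypothetical helper (not GitLab code): runs the block only if nobody
# else currently holds the lock file. Returns the block's value, or
# :skipped when another run is already in progress.
def run_exclusively(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false immediately instead of waiting.
    return :skipped unless file.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

# Assumed usage inside the worker (the path is made up):
#   run_exclusively("/tmp/ldap_all_groups_sync.lock") { sync_all_groups }
```

With this shape, a second run that is scheduled while the first is still working returns right away rather than piling up alongside it, which is the behaviour we would want for LdapAllGroupsSyncWorker but not for LdapGroupSyncWorker.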

As for memory usage, the place we should look is probably the EE::Gitlab::LDAP::Sync::Proxy as it caches users and groups in an attempt to limit LDAP queries for the same user or group within a single group sync. Maybe we need an upper limit on the number of items the proxy stores.
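To make the "upper limit" idea concrete, here is a minimal sketch of a size-bounded cache with least-recently-used eviction. It is not the actual EE::Gitlab::LDAP::Sync::Proxy code: `BoundedCache`, its `fetch` method, and the limit value are all hypothetical, and the real proxy may cache entries quite differently.

```ruby
# Hypothetical bounded cache (not the real proxy): keeps at most
# max_size entries and evicts the least recently used one when full.
class BoundedCache
  def initialize(max_size)
    raise ArgumentError, "max_size must be positive" unless max_size.positive?

    @max_size = max_size
    @store = {} # Ruby hashes preserve insertion order
  end

  # Returns the cached value for key, or computes it with the block,
  # stores it, and evicts the oldest entry if the limit is exceeded.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert to mark as most recently used
      @store[key] = value
      return value
    end

    value = yield(key)
    @store[key] = value
    @store.shift if @store.size > @max_size # drop least recently used
    value
  end

  def size
    @store.size
  end
end

# Assumed usage (the lookup method name is made up):
#   cache = BoundedCache.new(10_000)
#   cache.fetch(dn) { |d| ldap_lookup(d) }
```

The trade-off is that an evicted user or group costs an extra LDAP query if it is needed again within the same sync, so the limit would need tuning against real directory sizes.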

added Platform label

changed milestone to %Next 2-3 months

mentioned in issue gitlab-com/infrastructure#943 (closed)

mentioned in issue #3188 (closed)

I have created a full debug output of the LdapAllGroupsSyncWorker run from the gitlab-rails console. However, this is not public information. To whom can I send it?

Hi @joustie, please send it via Zendesk in your existing ticket (#82070).

Joost sent the debug log in https://gitlab.zendesk.com/agent/#/tickets/82070. I looked at the output and I see that the sync of 1,306 LDAP groups took about 8 minutes, which works out to roughly 0.4 seconds per group. Nothing was updated in this run; it was only cross-checking membership. That's not too bad IMO.

I wonder if this was simply a case of jobs getting backed up in Sidekiq somehow, so that many of these syncs suddenly ran concurrently and took up lots of memory and CPU. If so, would limiting concurrency resolve the issue?
