Just checked that we have all sidekiq services running with be knife ssh 'roles:gitlab-base-be-sidekiq' 'sudo gitlab-ctl status'. We do, and workers are not overloaded.
When a mirror finishes, it schedules a new UpdateAllMirrorsWorker if a certain threshold has been reached. But the scheduler runs with a lease, which means two of them cannot run at the same time.
@pcarranza @DouweM I would say it was not a bad idea to just have the crontab run every minute and drop the threshold, because honestly I cannot think of an elegant solution that solves this.
If capacity is continuously below max capacity (as it is now), the threshold will always be reached, so a new one will get scheduled each time a mirror finishes, which is apparently about 10K/hour.
1. We can remove this threshold triggering of UpdateAllMirrorsWorker altogether and reduce the crontab interval to something smaller, like 2s.
2. Be more aggressive with the threshold.
Personally I am in favour of option 1. If we can add more backpressure without ever risking 10K/hour being scheduled, that would be preferable.
@tiagonbotelho If we only schedule up to capacity mirrors once a minute, we won't get a lot done. We'd want to schedule more often than that if we have mirrors to schedule.
What about instead of checking only that available >= threshold, also checking that wants to run >= threshold so that we can fill it up completely? That way we won't schedule a new scheduler if there is nothing to put into the hole that opened up. If there are fewer than threshold mirrors that want to run, they just have to wait until the next blip.
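A minimal sketch of the combined check proposed above. The method name and its keyword arguments are hypothetical and only illustrate the condition; this is not the actual code in UpdateAllMirrorsWorker.

```ruby
# Hypothetical helper: decide whether a finishing mirror should schedule
# a new scheduler. All names here are illustrative assumptions.
def schedule_new_scheduler?(capacity:, running:, wanting_to_run:, threshold:)
  available = capacity - running

  # Only schedule when a large enough hole has opened up AND there are
  # enough mirrors waiting to fill it.
  available >= threshold && wanting_to_run >= threshold
end

schedule_new_scheduler?(capacity: 100, running: 40, wanting_to_run: 60, threshold: 50) # => true
schedule_new_scheduler?(capacity: 100, running: 40, wanting_to_run: 10, threshold: 50) # => false
schedule_new_scheduler?(capacity: 100, running: 80, wanting_to_run: 60, threshold: 50) # => false
```

In the second call there is room for 60 mirrors but only 10 want to run, so they wait for the next regular run instead of triggering a scheduler.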
I think we should also use a lease around scheduling the scheduler so that we never schedule one more often than once every 2 seconds.
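A rough sketch of how such a lease could throttle scheduling, assuming a simple timeout-based lease. The class below is hypothetical and keeps its state in memory for illustration only; a real lease would have to live in a store shared by all workers, such as Redis.

```ruby
# Hypothetical in-memory lease, not GitLab's implementation.
# A real lease would need to be shared across processes.
class SchedulerLease
  def initialize(timeout:)
    @timeout = timeout   # lease duration in seconds
    @expires_at = nil
  end

  # Returns true if the lease was obtained, false if it is still held.
  # `now` is a timestamp in seconds (for example from a monotonic clock).
  def try_obtain(now)
    return false if @expires_at && now < @expires_at

    @expires_at = now + @timeout
    true
  end
end

lease = SchedulerLease.new(timeout: 2)
lease.try_obtain(0.0) # => true, schedule a scheduler
lease.try_obtain(1.0) # => false, still within the 2s window
lease.try_obtain(2.0) # => true, the lease has expired
```

With a 2-second lease, at most 3600 / 2 = 1800 schedulers can be scheduled per hour, no matter how many mirrors finish in that time.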
@tiagonbotelho With 1, we'll keep running schedulers every 2s even if there are no mirrors in the system at all, putting unnecessary strain on the DB and Redis.
@DouweM checking wants to run >= threshold and available >= threshold will also put a strain on the DB and Redis. Imagine all the mirrors doing that query when finishing up. What seems more expensive?
@pcarranza what are our current values for mirroring in the admin application_settings?
@tiagonbotelho That way it's at least limited by the number of mirrors in the system and how often those are run, which is less often than every 2s for small instances.