Use AuthorizedKeysCommand with fingerprint support
The problem
Accessing a Git repo over SSH has become really slow:
for i in {0..20} ; do
{ time git ls-remote git@gitlab.com:gitlab-org/gitlab-development-kit.git ; } 2>&1
done | awk 'BEGIN {total=0 ; max=0 ; min=10} /^git/ {if($10 > max) max = $10} {if ($10 != "" && $10 < min) min=$10} {total+=$10} END {print "total " total " avg " total/20 " min " min " max " max}'
total 154.159 avg 7.70795 min 5.508 max 11.007
Access over HTTPS, by contrast, is much faster:
for i in {0..20} ; do
{ time git ls-remote https://gitlab.com/gitlab-org/gitlab-development-kit.git ; } 2>&1
done | awk 'BEGIN {total=0 ; max=0 ; min=10} /^git/ {if($10 > max) max = $10} {if ($10 != "" && $10 < min) min=$10} {total+=$10} END {print "total " total " avg " total/20 " min " min " max " max}'
total 19.088 avg 0.9544 min 0.775 max 1.505
The slowdown does not depend on the project: the same tests were run against a private repo and it behaved the same way.
The reason
We write all the SSH keys to a single authorized_keys file, from oldest to newest. The file is currently 150 MB.
OpenSSH looks up a key during authentication by scanning this file linearly.
This means that a new user (or an existing user with a new key), whose key sits near the end of the file, forces OpenSSH to load and scan almost the whole file on every Git SSH operation. On top of this, the OS does not keep the file cached because it is being rewritten almost constantly, so IOPS are wasted as well.
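To make the cost of the linear search concrete, here is a toy sketch with synthetic data. The entries are fake placeholders, the file is a temporary one, and `lines_scanned` is a hypothetical helper that only counts how many lines a top-to-bottom scan reads before it reaches a given key.

```shell
#!/bin/sh
# Toy illustration only: synthetic entries, not real SSH keys.
keys_file=$(mktemp)

# Append 1000 fake entries, oldest first, as described above.
i=1
while [ "$i" -le 1000 ]; do
    printf 'ssh-rsa FAKEKEY%04d user%d@example\n' "$i" "$i" >> "$keys_file"
    i=$((i + 1))
done

# Report how many lines a top-to-bottom scan reads before finding a key.
lines_scanned() {
    awk -v key="$1" '$2 == key { print NR; exit }' "$keys_file"
}

oldest=$(lines_scanned FAKEKEY0001)
newest=$(lines_scanned FAKEKEY1000)
echo "oldest key found after $oldest line(s)"
echo "newest key found after $newest line(s)"
rm -f "$keys_file"
```

The newest key is only found after the entire file has been read, which is the case every newly added key hits.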
The solution
The idea is to configure our OpenSSH servers with an AuthorizedKeysCommand that receives the key fingerprint and returns only the lines that match that key. gitlab-shell already supports this kind of query, so providing the executable should require minimal modifications.
Key fingerprint expansion support was added in OpenSSH 6.9, so to get it we need to patch OpenSSH on our servers.
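As a rough sketch of the shape such a command could take: sshd runs the configured program with the fingerprint as an argument and treats whatever it prints as authorized_keys lines. Everything below is an assumption for illustration, not gitlab-shell's actual code: the script path, the `find_key_by_fingerprint` function, and the tab-separated index file that stands in for the real lookup through the GitLab internal API.

```shell
#!/bin/sh
# Hypothetical AuthorizedKeysCommand helper (sketch, not gitlab-shell's code).
# Assumed sshd_config wiring (the script path is a placeholder):
#   AuthorizedKeysCommand /usr/local/bin/find-key-by-fingerprint %f
#   AuthorizedKeysCommandUser git
# sshd expands %f to the fingerprint of the key being offered.

# Assumed index format, one key per line: <fingerprint><TAB><authorized_keys line>
# A real setup would query the GitLab internal API instead of a file.
find_key_by_fingerprint() {
    index_file=$1
    fingerprint=$2
    # Print only the authorized_keys line(s) whose fingerprint matches exactly.
    awk -F '\t' -v fp="$fingerprint" '$1 == fp { print $2 }' "$index_file"
}

# Demo with made-up fingerprints and keys.
demo_index=$(mktemp)
printf 'SHA256:aaaa\tssh-rsa AAAAfake1 alice\n' >  "$demo_index"
printf 'SHA256:bbbb\tssh-rsa AAAAfake2 bob\n'   >> "$demo_index"

match=$(find_key_by_fingerprint "$demo_index" 'SHA256:bbbb')
miss=$(find_key_by_fingerprint "$demo_index" 'SHA256:zzzz')
echo "match: $match"
echo "miss:  '$miss'"
rm -f "$demo_index"
```

An unknown fingerprint produces no output, so sshd has no line to match against and key authentication fails for that key.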
This configuration will still support the authorized_keys file, so from our customers' perspective it will be completely optional and only of interest for large enough installations.
To support this, we could provide a script that applies the patch and configures OpenSSH, so that anyone can do it.
A sample of this whole process can be found in this blog post: https://blog.heckel.xyz/2015/05/04/openssh-authorizedkeyscommand-with-fingerprint/
Related conversation: https://gitlab.com/gitlab-org/gitlab-git-http-server/issues/2
The plan
- Add an index to the key fingerprint field in the database.
- Add a method to the internal API in gitlab-ee to find a key by fingerprint (consider the MySQL limitation on indexes: an index key cannot be larger than 767 bytes, so we can't search by the whole key).
- Check the state of our fingerprints, since they were MD5-based initially and are SHA-based now. Consider fixing them in the database.
- Add a specific script command in gitlab-shell that searches for this SSH key and returns the whole ssh command line.
- Add the configuration to the omnibus package so we use this feature if it is available.
- Test it in a DigitalOcean VM instance.
- Replace workers with Ubuntu 16.04 LTS: https://gitlab.com/gitlab-com/operations/issues/224
- Write a Chef cookbook to enable this feature.
- Write documentation on how to enable this on any platform other than Ubuntu (patching OpenSSH to support this feature).
- Release this with the new Ubuntu LTS version (in April) so we don't need to provide any OpenSSH patching.
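For the fingerprint check in the plan, it may help to see how a legacy MD5-style fingerprint is derived from a public key line: it is the MD5 digest of the base64-decoded key blob, written as colon-separated hex. The sketch below assumes GNU coreutils (`base64 -d`, `md5sum`); `md5_fingerprint` is a hypothetical helper and the sample key is a made-up blob, not a real key. Whatever format is stored in the database would need to match the format sshd passes to the command.

```shell
#!/bin/sh
# Sketch: legacy MD5-style fingerprint of a public key line, computed with
# GNU coreutils (base64, md5sum). The key below is a made-up blob, not a real key.

md5_fingerprint() {
    # $1 is a full public key line: "<type> <base64 blob> [comment]".
    # The fingerprint is the MD5 of the decoded blob, as colon-separated hex.
    printf '%s\n' "$1" | awk '{ print $2 }' | base64 -d | md5sum |
        awk '{ print $1 }' | sed 's/../&:/g; s/:$//'
}

sample_key='ssh-rsa QUFBQWZha2VrZXlibG9i demo@example'
fp=$(md5_fingerprint "$sample_key")
echo "MD5 fingerprint: $fp"
```

The result is 16 hex pairs joined by colons, which is short enough to index without running into the 767-byte limit mentioned in the plan.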