Commits · d47cc16e1dd52b9bb7b3ce4cf28e5ac131d56352 · GitLab.org / gitlab_git

Sep 05, 2016
- Release 10.6.2 · d47cc16e
  Yorick Peterse authored 8 years ago
  
  View commits for tag v10.6.2 v10.6.2 Unverified
  
  d47cc16e
- Fix matching Git attributes using absolute paths · dd65dcef
  Yorick Peterse authored 8 years ago
  
  Git attributes can be defined using an absolute path. This commit fixes the matching procedure so that attributes and paths with leading slashes are supported properly.
  Verified
  
  dd65dcef
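
The idea of supporting leading slashes on both sides can be sketched as follows. This is a minimal illustration and not the code from this commit: the helper name `attribute_pattern_matches?` is hypothetical, and `File.fnmatch?` only approximates Git's own pattern rules.

```ruby
# Hypothetical helper, not the gitlab_git implementation.
# Strips a leading slash from both the pattern and the path before matching.
# A pattern that started with a slash is treated as anchored, so "*" is not
# allowed to cross directory separators (File::FNM_PATHNAME).
def attribute_pattern_matches?(pattern, path)
  flags = pattern.start_with?('/') ? File::FNM_PATHNAME : 0

  File.fnmatch?(pattern.sub(%r{\A/}, ''), path.sub(%r{\A/}, ''), flags)
end

puts attribute_pattern_matches?('/*.txt', 'foo.txt')      # absolute pattern
puts attribute_pattern_matches?('*.txt', '/foo.txt')      # absolute path
puts attribute_pattern_matches?('/*.txt', 'docs/foo.txt') # anchored pattern, nested file
```
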
Sep 01, 2016
- Release 10.6.1 · 209cae3e
  Yorick Peterse authored 8 years ago
  
  View commits for tag v10.6.1 v10.6.1 Unverified
  
  209cae3e
- Handle missing attribute files when parsing · afaeb8f2
  Yorick Peterse authored 8 years ago
  
  Previously using Gitlab::Git::Attributes when $GIT_DIR/info/attributes didn't exist would result in a runtime error. This commit fixes this so an empty Hash is produced instead.
  Verified
  
  afaeb8f2
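
The behaviour described above (an empty Hash instead of a runtime error) could look roughly like this. The method name `parse_attributes_file` and the shape of the returned Hash are assumptions made for this sketch, not the actual Gitlab::Git::Attributes code.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the parsing step.
# Returns an empty Hash when $GIT_DIR/info/attributes does not exist.
def parse_attributes_file(git_dir)
  path = File.join(git_dir, 'info', 'attributes')
  return {} unless File.file?(path)

  File.foreach(path).each_with_object({}) do |line, patterns|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    # The first field is the pattern, the remaining fields are the raw attributes.
    pattern, *attrs = line.split(/\s+/)
    patterns[pattern] = attrs
  end
end

Dir.mktmpdir do |dir|
  # A fresh directory has no info/attributes file.
  p parse_attributes_file(dir)
end
```
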
Aug 31, 2016

Bump version to 10.6.0 · 7e618d03
Douwe Maan authored 8 years ago

View commits for tag v10.6.0 v10.6.0

7e618d03

Merge branch 'ruby-gitattributes-parser' into 'master' · 62927165

Douwe Maan authored 8 years ago

Parse Git attribute files using Ruby

Commit 340e111e contains all the details. It's quite the read so the short summary is:

> Rugged is slow as heck because it runs multiple IO calls every time you request a set of Git attributes. gitlab_git now provides a pure Ruby parser that avoids this and is between 4 and 6 times faster.

Here's a Grafana screenshot to show how bad it can get:

![timings](/uploads/39f7b6b7b6a8d97f2b11a20a088988e4/timings.jpg)

See https://gitlab.com/gitlab-org/gitlab-ce/issues/10785 for more information.

See merge request !121

62927165

Parse Git attribute files using Ruby · 340e111e

Yorick Peterse authored 8 years ago

Rugged provides a way of parsing Git attribute files such as the one
located in $GIT_DIR/info/attributes. Per GitLab's performance monitoring
tools, quite a lot of time can be spent parsing and retrieving attributes.

This commit introduces a pure Ruby parser for gitlab_git that performs
drastically better than the one provided by Rugged.

== Production Timings

As an example, take the commit https://gitlab.com/nrclark/dummy_project/commit/81ebdea5df2fb42e59257cb3eaad671a5c53ca36
(as taken from https://gitlab.com/gitlab-org/gitlab-ce/issues/10785).
When loading this commit we spend between 4 and 6 seconds in
Rugged::Repository#fetch_attributes. This method is called around 1100
times. This is the result of two problems:

1. For every diff we call Gitlab::Git::Repository#diffable? and pass it
   a blob. This method in turn returns a boolean (based on the Git
   attributes for the blob's path) indicating if the content is
   diffable.

2. For every diff we use the GitLab class Gitlab::Highlight which calls
   Repository#gitattribute in the #custom_language method. This is used
   to determine what language to use for highlighting a diff.

As a result, in the worst case we'll end up with 2 calls to
Gitlab::Git::Repository#attributes (previously delegated to
Rugged::Repository#attributes) for every diff.

== Rugged Implementation

Rugged in turn implements the "attributes" method in a rather
inefficient way. The first time this method is called it will run at
least a single open() call to open the file. On top of that it appears
to run 2 stat() calls for every call to Rugged::Repository#attributes.
In other words, if you call it 100 times you will end up with 201 IO
calls:

* 200 stat() calls
* 1 open() call

== Rugged IO Overhead

To confirm the IO overhead of Rugged I created the following script
(saved as "confirm.rb"):

    require 'rugged'

    path = '/tmp/test/.git'
    repo = Rugged::Repository.new(path)

    10.times do
      repo.attributes('README.md')['gitlab-language']
    end

I then ran this as follows:

    strace -f ruby confirm.rb 2>&1 | grep -i 'info/attributes' | wc -l

This counts the IO calls that refer to the
"$GIT_DIR/info/attributes" file. The output is "21", meaning 21 IO calls
were executed.

While this may not be a big problem when using physical storage (even
less so when using SSDs), this _will_ be a problem when using network
storage. For example, say every IO operation takes 2 milliseconds to
complete. For 100 calls to Rugged::Repository#attributes this would
result in _at least_ 400 milliseconds being spent in _just_ the IO
operations.
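
The arithmetic behind these numbers can be written out as a short script. The method name is hypothetical, and the 2 millisecond latency is the example figure used here, not a measurement.

```ruby
# Hypothetical helper: IO calls made by Rugged for a number of attribute
# lookups, based on the observation of 2 stat() calls per lookup plus a
# single open() call.
def rugged_io_calls(lookups)
  stat_calls = lookups * 2
  open_calls = 1

  { stat: stat_calls, open: open_calls, total: stat_calls + open_calls }
end

latency_ms = 2 # assumed latency per IO operation on network storage

calls = rugged_io_calls(100)
puts "IO calls: #{calls[:total]}"
puts "stat() time: #{calls[:stat] * latency_ms} ms"
```
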

The Ruby parser on the other hand only uses a single open() IO call.
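
To illustrate the general approach, here is a minimal sketch of a parser that reads the attributes data once and answers every lookup from memory. It is not the Gitlab::Git::Attributes implementation: the class name, the use of File.fnmatch? and the "last matching line wins" rule are simplifying assumptions.

```ruby
# Hypothetical sketch of a pure Ruby attributes parser.
class AttributesSketch
  # source: the contents of an attributes file, read once by the caller.
  def initialize(source)
    @patterns = parse(source)
  end

  # Returns the attributes of the last pattern that matches the path, or
  # an empty Hash when nothing matches. No IO happens here.
  def attributes(path)
    @patterns.reverse_each do |pattern, attrs|
      return attrs if File.fnmatch?(pattern, path)
    end

    {}
  end

  private

  def parse(source)
    source.each_line.each_with_object([]) do |line, patterns|
      line = line.strip
      next if line.empty? || line.start_with?('#')

      # Fields are separated by spaces or tabs.
      pattern, *raw = line.split(/\s+/)
      patterns << [pattern, parse_attributes(raw)]
    end
  end

  # "text" => true, "-text" => false, "key=value" => "value"
  def parse_attributes(raw)
    raw.each_with_object({}) do |attr, hash|
      if attr.start_with?('-')
        hash[attr[1..-1]] = false
      elsif attr.include?('=')
        key, value = attr.split('=', 2)
        hash[key] = value
      else
        hash[attr] = true
      end
    end
  end
end

source = "# comment\n*.txt text\n*.jpg -text\n*.md\tgitlab-language=markdown\n"
attrs = AttributesSketch.new(source)
p attrs.attributes('README.md')
```
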

== Benchmarking

To measure the performance of this code I wrote the following benchmark:

    require 'rugged'
    require 'benchmark/ips'

    require_relative 'lib/gitlab_git/attributes'

    repo = Rugged::Repository.new('/tmp/test/.git')
    attr = Gitlab::Git::Attributes.new(repo.path)

    Benchmark.ips(time: 10) do |bench|
      bench.report 'Rugged' do
        repo.attributes('test.haml.html')['gitlab-language']
      end

      bench.report 'gitlab_git' do
        attr.attributes('test.haml.html')['gitlab-language']
      end

      bench.compare!
    end

The contents of /tmp/test/.git/info/attributes are as follows:

    # This is a comment, it should be ignored.

    *.txt     text
    *.jpg     -text
    *.sh      eol=lf gitlab-language=shell
    *.haml.*  gitlab-language=haml
    foo/bar.* foo
    *.cgi     key=value?p1=v1&p2=v2

    # This uses a tab instead of spaces to ensure the parser also supports this.
    *.md	gitlab-language=markdown

Running this benchmark on my development environment produces the
following output:

    Warming up --------------------------------------
                  Rugged     9.543k i/100ms
              gitlab_git    43.277k i/100ms
    Calculating -------------------------------------
                  Rugged    100.261k (± 2.0%) i/s -      1.012M in  10.093380s
              gitlab_git    482.186k (± 1.7%) i/s -      4.847M in  10.055286s

    Comparison:
              gitlab_git:   482185.6 i/s
                  Rugged:   100260.6 i/s - 4.81x  slower

The exact output varies with system load, but usually the new Ruby-based
parser is between 4 and 6 times faster than Rugged.

To further test this I wrote the following benchmark:

    require 'benchmark'
    require 'rugged'

    # Same require as in the benchmark above.
    require_relative 'lib/gitlab_git/attributes'

    amount = 5000
    repo = Rugged::Repository.new('/var/opt/gitlab/git-data-ceph/repositories/gitlab-org/gitlab-ce.git')
    attrs = Gitlab::Git::Attributes.new(repo.path)

    # Time every Rugged lookup, in milliseconds.
    rugged_timings = amount.times.map do
      timing = Benchmark.measure do
        repo.attributes('README.md').to_h
      end

      timing.real * 1000.0
    end

    # Time every lookup using the Ruby parser, in milliseconds.
    ruby_timings = amount.times.map do
      timing = Benchmark.measure do
        attrs.attributes('README.md')
      end

      timing.real * 1000.0
    end

    puts "Rugged: #{rugged_timings.inject(:+)} ms"
    puts "Ruby: #{ruby_timings.inject(:+)} ms"

This script requests the attributes of the same path 5000 times using
both Rugged and the new attributes parser, and then sums the processing
time of each. Running this script on worker1 produced the following output:

    Rugged: 131.95287296548486 ms
    Ruby: 30.17003694549203 ms

Here the Ruby-based solution is around 4.4 times faster than Rugged.

== Further Improvements

At some point GitLab may decide to cache the parsed data structures in,
for example, Redis. This is now possible because they are proper Ruby
data structures. Note that this is only really beneficial in cases where
Git attributes are requested for the same file path in different
requests. This also requires careful cache invalidation. For example, we
don't want to invalidate the entire cache when modifying some unrelated
file.

Because of the complexity involved it's best to leave this for later and
only implement it once we're certain it will actually be beneficial.
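
Purely as an illustration of the invalidation concern, the sketch below keys cached attributes on a digest of the attributes file contents, so changes to unrelated files leave existing entries valid. Everything here is hypothetical: the class name, the key layout, and the in-memory Hash that stands in for a store such as Redis.

```ruby
require 'digest'

# Hypothetical cache sketch; an in-memory Hash stands in for Redis.
class AttributesCacheSketch
  def initialize
    @store = {}
  end

  # The key changes only when the attributes source itself changes.
  # The block computes the attributes on a cache miss.
  def fetch(repo_path, attributes_source, file_path)
    digest = Digest::SHA256.hexdigest(attributes_source)
    key = [repo_path, digest, file_path].join(':')

    @store.fetch(key) { @store[key] = yield }
  end
end

cache = AttributesCacheSketch.new
source = "*.md gitlab-language=markdown\n"

2.times do
  cache.fetch('/tmp/test/.git', source, 'README.md') do
    puts 'cache miss' # only reached on the first iteration
    { 'gitlab-language' => 'markdown' }
  end
end
```
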

Unverified

340e111e

Aug 29, 2016
- Bump version to 10.5.0 · b205c79e
  Stan Hu authored 8 years ago
  
  View commits for tag v10.5.0 v10.5.0
  
  b205c79e
- Merge branch 'add-repository-find-branch' into 'master' · 8d987626
  Yorick Peterse authored 8 years ago
  
  Add Repository#find_branch to speed up branch lookups See merge request !119
  8d987626
Aug 27, 2016
- Add CHANGELOG entry about Repository#reload_rugged · 6cb92440
  Stan Hu authored 8 years ago
  
  6cb92440
- Provide Repository#reload_rugged to allow other callers to refresh the repository · 6a75ea50
  Stan Hu authored 8 years ago
  
  6a75ea50
- Add a force_reload parameter in Repository#find_branch to work around stale in-memory refs DB · 0bbe707a
  Stan Hu authored 8 years ago
  
  See https://gitlab.com/gitlab-org/gitlab-ce/issues/15392#note_14538333
  0bbe707a
- Add Repository#find_branch to speed up branch lookups · 3606265d
  Stan Hu authored 8 years ago
  
  A common call in GitLab is to look up a single branch, but previously this was done by calling Repository#branches, which loads all the branches into memory unnecessarily and causes many filesystem accesses for each branch. With Repository#find_branch, we can do a direct lookup for the branch we care about.
  3606265d
- Merge branch 'fix-specs' into 'master' · 1d1d6642
  Stan Hu authored 8 years ago
  
  Fix broken specs caused by update to gitlab-git-test
  
  Taken from https://gitlab.com/gitlab-org/gitlab_git/merge_requests/118
  
  See merge request !120
  1d1d6642
- Fix broken specs caused by update to gitlab-git-test · e169891b
  Stan Hu authored 8 years ago
  
  Taken from https://gitlab.com/gitlab-org/gitlab_git/merge_requests/118
  e169891b
Aug 17, 2016

Release 10.4.7 · bd8946f8
Yorick Peterse authored 8 years ago

View commits for tag v10.4.7 v10.4.7 Unverified

bd8946f8
Merge branch 'fix/remove-unneeded-root-ref-call' into 'master' · 07821837
Yorick Peterse authored 8 years ago
```
Remove unneeded call to Repository#root_ref in #log

See merge request !117
```
07821837

Remove unneeded call to Repository#root_ref in #log · fc0ea05f

Ahmad Sherif authored 8 years ago

From our monitoring data, it seems that Repository#root_ref can be slow
sometimes (probably because it involves iterating over all branches),
and there's no need to have it in the `default_options` hash since
a similar effect is achieved in `actual_ref = options[:ref] || root_ref`
below, and subsequent calls don't need a `:ref` key in the passed
options.
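
The difference can be shown with a small stand-alone sketch. The class and method names below are hypothetical and only mimic the shape of the code described above: a default value stored in a Hash is evaluated eagerly, whereas `||` only evaluates its right-hand side when needed.

```ruby
# Hypothetical sketch, not the Repository#log implementation.
class LogOptionsSketch
  attr_reader :root_ref_calls

  def initialize
    @root_ref_calls = 0
  end

  # Stand-in for the potentially slow Repository#root_ref.
  def root_ref
    @root_ref_calls += 1
    'master'
  end

  # Eager: root_ref runs while the defaults Hash is built, even when the
  # caller passes its own :ref.
  def eager_ref(options = {})
    default_options = { ref: root_ref }
    default_options.merge(options)[:ref]
  end

  # Lazy: root_ref only runs when options[:ref] is nil (or false).
  def lazy_ref(options = {})
    options[:ref] || root_ref
  end
end

sketch = LogOptionsSketch.new
sketch.eager_ref(ref: 'feature')
sketch.lazy_ref(ref: 'feature')
puts sketch.root_ref_calls # only the eager variant called root_ref
```
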

fc0ea05f

Aug 15, 2016
- Release 10.4.6 · 11944874
  Douwe Maan authored 8 years ago
  
  View commits for tag v10.4.6 v10.4.6
  
  11944874
- Merge branch '26-optimize-fetching-of-a-rugged-commit-s-author-and-committer' into 'master' · 6d1ead1d
  Douwe Maan authored 8 years ago
  
  Optimize fetch of the author and committer of a Rugged commit
  
  Closes #26
  
  See merge request !116
  6d1ead1d
Aug 12, 2016
- Optimize fetch of the author and committer of a Rugged commit · fd30738d
  Alejandro Rodríguez authored 8 years ago
  
  fd30738d
Aug 08, 2016
- Merge branch 'require-forwardable' into 'master' · 55e3d7bd
  Yorick Peterse authored 8 years ago
  
  We're using Forwardable so we need to require it See merge request !92
  55e3d7bd
Aug 05, 2016
- Release 10.4.5 · 316bb3b4
  Yorick Peterse authored 8 years ago
  
  View commits for tag v10.4.5 v10.4.5 Unverified
  
  316bb3b4
- Merge branch 'compare-commits-on-nil-refs' into 'master' · 8334ab21
  Yorick Peterse authored 8 years ago
  
  Compare returns an empty collection of commits on nil refs See merge request !114
  8334ab21
- Compare returns an empty collection of commits on nil refs · 83b7fec6
  Paco Guzman authored 8 years ago
  
  83b7fec6
- Merge branch 'write-gitattributes-raw' into 'master' · 20e42da4
  Rémy Coutable authored 8 years ago
  
  Write .gitattributes in binary mode to prevent Rails from converting ASCII-8BIT to UTF-8
  
  This avoids Sidekiq errors in the PostReceive task due to .gitattributes files having ISO-8859 characters, such as:
  
  ```
  Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8
  ```
  
  Closes gitlab-org/gitlab-ce#20647
  
  See merge request !115
  20e42da4
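
The error and the fix can be reproduced in isolation. This is a minimal sketch, assuming the failure comes from transcoding an ASCII-8BIT string to UTF-8: the helper name `write_raw` is hypothetical, and String#encode is used here to trigger the same conversion error outside of Rails.

```ruby
require 'tempfile'

# Hypothetical helper: writes the bytes as-is by opening the file in
# binary mode ("wb"), so no encoding conversion takes place.
def write_raw(path, content)
  File.open(path, 'wb') { |file| file.write(content) }
end

# An ASCII-8BIT string containing a byte that cannot be converted to UTF-8.
content = "caf\xC3".b

begin
  content.encode('UTF-8')
rescue Encoding::UndefinedConversionError => error
  puts "conversion failed: #{error.class}"
end

Tempfile.create('gitattributes') do |file|
  write_raw(file.path, content)
  puts File.binread(file.path).bytes == content.bytes
end
```
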
- Write .gitattributes in binary mode to prevent Rails from converting ASCII-8BIT to UTF-8 · d754c8bb
  Stan Hu authored 8 years ago
  
  Closes gitlab-org/gitlab-ce#20647
  d754c8bb
Aug 04, 2016
- Release 10.4.4 · cf2dd150
  Yorick Peterse authored 8 years ago
  
  View commits for tag v10.4.4 v10.4.4 Verified
  
  cf2dd150
- Merge branch 'compare-lazy-load-commits' into 'master' · 4f668368
  Yorick Peterse authored 8 years ago
  
  Lazy load compare commits See merge request !113
  4f668368
Aug 03, 2016
- Lazy load compare commits · 36f97f00
  Paco Guzman authored 8 years ago
  
  36f97f00
Aug 02, 2016
- Release 10.4.3 · e282c440
  Stan Hu authored 8 years ago
  
  View commits for tag v10.4.3 v10.4.3
  
  e282c440
- Merge branch 'spec-branch-cleanup' into 'master' · 19aebc48
  Stan Hu authored 8 years ago
  
  Clean up local branches after creation to make rspec pass See merge request !112
  19aebc48
- Clean up local branches after creation to make rspec pass · 240111d8
  Stan Hu authored 8 years ago
  
  240111d8
- Merge branch 'fix/use-exact-rubocop-rspec-version' into 'master' · e835d6a4
  Stan Hu authored 8 years ago
  
  Use an exact rubocop-rspec version that's compatible with ruby 2.1
  
  ... otherwise it breaks rubocop and rspec builds.
  
  See merge request !110
  e835d6a4
- Use an exact rubocop-rspec version that's compatible with ruby 2.1 · fb08acc6
  Ahmad Sherif authored 8 years ago
  
  fb08acc6
- Merge branch 'feature/delta-only-diff-collection' into 'master' · 52aeef60
  Yorick Peterse authored 8 years ago
  
  Add deltas_only option for DiffCollection See merge request !109
  52aeef60
Jul 29, 2016
- Add deltas_only option for DiffCollection · db699fc9
  Ahmad Sherif authored 8 years ago
  
  This helps avoid loading the actual patch (which can consume a lot of memory) when it is not needed.
  db699fc9
- Release 10.4.2 · 56d193ee
  Yorick Peterse authored 8 years ago
  
  View commits for tag v10.4.2 v10.4.2 Unverified
  
  56d193ee
Jul 28, 2016
- Merge branch 'fastest-decorated-diff-collection' into 'master' · eb55f51a
  Yorick Peterse authored 8 years ago
  
  Improve performance of a decorated DiffCollection instance See merge request !108
  eb55f51a
- Improve performance of a decorated DiffCollection instance · c43d998d
  Paco Guzman authored 8 years ago
  
  c43d998d
