Skip to content
Snippets Groups Projects
  1. Oct 25, 2016
    • Alejandro Rodríguez's avatar
      Refactor Ref's `target` to be consistent with Rugged · 43f5700b
      Alejandro Rodríguez authored
      !103 introduced an optimization where a Refs target would point to the
      dereferenced object, which saved us a lot operations, but became
      inconsistent with Rugged's API, where target would always reference the
      immediate object pointed at. These changes add a new property,
      dereference_target to be consistent with Rugged but still be able to get
      the target efficiently.
      43f5700b
  2. Oct 21, 2016
    • Alejandro Rodríguez's avatar
      Revert to previous diff pruning behavior · c297aa44
      Alejandro Rodríguez authored
      Before 10.6.9, if a rugged diff met the conditions for being too-large
      (which obiously meant it was also collapsable), too-large would take
      precedence. This reverts the behavior back to that, since gitlab-ce
      depends on this
      c297aa44
  3. Oct 12, 2016
  4. Oct 07, 2016
  5. Sep 29, 2016
  6. Sep 22, 2016
  7. Sep 14, 2016
  8. Sep 10, 2016
  9. Sep 08, 2016
    • Yorick Peterse's avatar
      Check for large diffs upon initialisation · 4c008a2f
      Yorick Peterse authored
      Prior to this commit the DiffCollection class was responsible for
      checking if a diff had to be collapsed or was too large to be displayed
      altogether.
      
      This commit changes both DiffCollection and Diff so that Diff itself
      checks if its too large or has to be collapsed. These checks happen when
      the Diff is being initialised. The patch size is based on the size of
      every line in every hunk of the diff, instead of relying on the diff as
      a string including diff markers.
      
      DiffCollection still has an extra check to collapse diffs when it has
      iterated over too many files. Since this is unrelated to the actual
      sizes this has been kept as-is.
      
      For binary files no pruning takes place as the diffs for these files are
      not displayed. In the past the size of a diff was reported based on the
      diff's size (including metadata). If we were to use the actual file's
      size a diff would be marked as being too large and in the case of an
      image would never be displayed.
      Unverified
      4c008a2f
  10. Sep 06, 2016
  11. Sep 05, 2016
  12. Sep 01, 2016
  13. Aug 31, 2016
    • Yorick Peterse's avatar
      Parse Git attribute files using Ruby · 340e111e
      Yorick Peterse authored
      Rugged provides a way of parsing Git attribute files such as the one
      located in $GIT_DIR/info/attributes. Per GitLab's performance monitoring
      tools quite a lot of time can be spent in parsing/retrieving attributes.
      
      This commit introduces a pure Ruby parser for gitlab_git that performs
      drastically better than the one provided by Rugged.
      
      == Production Timings
      
      As an example, take the commit https://gitlab.com/nrclark/dummy_project/commit/81ebdea5df2fb42e59257cb3eaad671a5c53ca36
      (as taken from https://gitlab.com/gitlab-org/gitlab-ce/issues/10785).
      When loading this commit we spend between 4 and 6 seconds in
      Rugged::Repository#fetch_attributes. This method is called around 1100
      times. This is the result of two problems:
      
      1. For every diff we call Gitlab::Git::Repository#diffable? and pass it
         a blob. This method in turn returns a boolean (based on the Git
         attributes for the blob's path) indicating if the content is
         diffable.
      
      2. For every diff we use the GitLab class Gitlab::Highlight which calls
         Repository#gitattribute in the #custom_language method. This is used
         to determine what language to use for highlighting a diff.
      
      As a result in the worst case we'll end up with 2 calls to
      Gitlab::Git::Repository#attributes (previously delegated to
      Rugged::Repository#attributes).
      
      == Rugged Implementation
      
      Rugged in turn implements the "attributes" method in a rather
      in-efficient way. The first time this method is called it will run at
      least a single open() call to open the file. On top of that it appears
      to run 2 stat() calls for every call to Rugged::Repository#attributes.
      In other words, if you call it a 100 times you will end up with 201 IO
      calls:
      
      * 200 stat() calls
      * 1 open() call
      
      == Rugged IO Overhead
      
      To confirm the IO overhead of Rugged I created the following script
      (saved as "confirm.rb"):
      
          require 'rugged'
      
          path = '/tmp/test/.git'
          repo = Rugged::Repository.new(path)
      
          10.times do
            repo.attributes('README.md')['gitlab-language']
          end
      
      I then ran this as follows:
      
          strace -f ruby confirm.rb 2>&1 | grep -i 'info/attributes' | wc -l
      
      This counts the number of instances an IO call refers to the
      "$GIT_DIR/info/attributes" file. The output is "21", meaning 21 IO calls
      were executed.
      
      While this may not be a big problem when using physical storage (even
      less so when using SSDs), this _will_ be a problem when using network
      storage. For example, say every operation takes 2 milliseconds to
      complete. This would result in _at least_ 400 milliseconds being spent
      in _just_ the IO operations.
      
      The Ruby parser on the other hand only uses a single open() IO call.
      
      == Benchmarking
      
      To measure the performance of this code I wrote the following benchmark:
      
          require 'rugged'
          require 'benchmark/ips'
      
          require_relative 'lib/gitlab_git/attributes'
      
          repo = Rugged::Repository.new('/tmp/test/.git')
          attr = Gitlab::Git::Attributes.new(repo.path)
      
          Benchmark.ips(time: 10) do |bench|
            bench.report 'Rugged' do
              repo.attributes('test.haml.html')['gitlab-language']
            end
      
            bench.report 'gitlab_git' do
              attr.attributes('test.haml.html')['gitlab-language']
            end
      
            bench.compare!
          end
      
      The contents of /tmp/test/.git/info/attributes are as follows:
      
          # This is a comment, it should be ignored.
      
          *.txt     text
          *.jpg     -text
          *.sh      eol=lf gitlab-language=shell
          *.haml.*  gitlab-language=haml
          foo/bar.* foo
          *.cgi     key=value?p1=v1&p2=v2
      
          # This uses a tab instead of spaces to ensure the parser also supports this.
          *.md	gitlab-language=markdown
      
      Running this benchmark on my development environment produces the
      following output:
      
          Warming up --------------------------------------
                        Rugged     9.543k i/100ms
                    gitlab_git    43.277k i/100ms
          Calculating -------------------------------------
                        Rugged    100.261k (± 2.0%) i/s -      1.012M in  10.093380s
                    gitlab_git    482.186k (± 1.7%) i/s -      4.847M in  10.055286s
      
          Comparison:
                    gitlab_git:   482185.6 i/s
                        Rugged:   100260.6 i/s - 4.81x  slower
      
      The exact output differs on system load but usually the new Ruby based
      parser is between 4 and 6 times faster than Rugged.
      
      To further test this I wrote the following benchmark:
      
          require 'benchmark'
      
          amount = 5000
          rugged = Rugged::Repository.new('/var/opt/gitlab/git-data-ceph/repositories/gitlab-org/gitlab-ce.git')
          attrs = Gitlab::Git::Attributes.new(rugged.path)
      
          rugged = amount.times.map do
            timing = Benchmark.measure do
              rugged.attributes('README.md').to_h
            end
      
            timing.real * 1000.0
          end
      
          ruby = amount.times.map do
            timing = Benchmark.measure do
              attrs.attributes('README.md')
            end
      
            timing.real * 1000.0
          end
      
          puts "Rugged: #{rugged.inject(:+)} ms"
          puts "Ruby: #{ruby.inject(:+)} ms"
      
      This script uses Rugged and the new attributes parser, parses the same
      attributes file 5000 times, and then counts the total processing time.
      Running this script on worker1 produced the following output:
      
          Rugged: 131.95287296548486 ms
          Ruby: 30.17003694549203 ms
      
      Here the Ruby based solution is around ~4.5 times faster than Rugged.
      
      == Further Improvements
      
      GitLab may decide to at some point cache the parsed data structures in
      for example Redis, which is now possible due to them being proper Ruby
      data structures. Note that this is only really beneficial in cases where
      Git attributes are requested for the same file path in different
      requests. This also requires careful cache invalidation. For example, we
      don't want to invalidate the entire cache when modifying some unrelated
      file.
      
      Because of the complexity involved it's best to leave this for later and
      only implement it once we're certain it will actually be beneficial.
      Unverified
      340e111e
  14. Aug 27, 2016
  15. Aug 17, 2016
  16. Aug 12, 2016
  17. Aug 05, 2016
  18. Aug 03, 2016
  19. Jul 28, 2016
  20. Jul 21, 2016
  21. Jul 19, 2016
  22. Jul 18, 2016
    • Alejandro Rodríguez's avatar
      Refactor Refs to preserve their target objects instead of just a string representation · 396cba39
      Alejandro Rodríguez authored
      Notice that in `spec/tag_spec.rb` I changed the sha because previously they
      referenced the tag annotation commit. With the refactor, the sha must be the one
      of the commit that the tag actually points at.
      396cba39
    • Paco Guzman's avatar
      Handle collapsable DiffCollection · 703105c3
      Paco Guzman authored
      We want to give the user the opportunity to see more
      files while preserving safe limits on rendering diffs
      so files bigger than 10KB (we consider too large 100KB)
      their lines are not taking into account to decide
      if the collection overflow and you can decide if
      you want to expand them on click or no.
      
      To keep good performance we start collapsing all
      the files since we are over the safe limits.
      
      To do that you have to methods `collapsed?` and
      `collapse` which means if it's quite large and the
      other if the collection collapsed it when iterated
      as a whole.
      
      It's quite similar to the `too_large`concept but
      give us a little bit more flexibility.
      703105c3
    • Valery Sizov's avatar
      Optional ref update on commit create · c0067588
      Valery Sizov authored
      c0067588
  23. Jul 06, 2016
  24. Jul 05, 2016
  25. Jun 23, 2016
  26. Jun 22, 2016
  27. Jun 16, 2016
Loading