Commits · 3214a6564d30233b63ca70cb3d96a69841709350 · GitLab.org / gitlab_git

Oct 10, 2016

Bump version to 10.6.9 · 3214a656
Douwe Maan authored 8 years ago

Tag: v10.6.9

3214a656

Merge branch 'patch-diff' into 'master' · 181fa5e8

Douwe Maan authored 8 years ago

Optimize diff creation from Rugged::Patch

Our diff threshold for pruning is fairly small, but we used to iterate over *each* line, even of a 100 MB patch, just to check these small thresholds. Now we stop as soon as any of the limits is met.

This should fix the issue reported in https://gitlab.com/gitlab-org/gitlab-ce/issues/2529

See merge request !128

181fa5e8

Merge branch 'rs-bundle-audit' into 'master' · c179b1e7

Douwe Maan authored 8 years ago

Add bundle-audit gem and CI task

Also correct `source` in Gemfile to use HTTPS.

See merge request !127

c179b1e7

Oct 07, 2016

Optimize diff creation from Rugged::Patch · ad3a18a7

Alejandro Rodríguez authored 8 years ago

Our diff threshold for pruning is fairly small, but we used to iterate
over *each* line, even of a 100 MB patch, just to check these small
thresholds. Now we stop as soon as any of the limits is met.
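
A minimal sketch of the early-exit idea described above, assuming a plain enumerable of lines rather than a real Rugged::Patch. The `prune_patch` name and both limits are hypothetical and are not taken from gitlab_git:

```ruby
# Walks the lines one by one and returns as soon as either limit is hit,
# instead of scanning the whole patch first. The default limits are made up.
def prune_patch(lines, max_lines: 3, max_bytes: 40)
  kept = []
  bytes = 0
  visited = 0

  lines.each do |line|
    visited += 1
    if kept.size >= max_lines || bytes + line.bytesize > max_bytes
      return { lines: kept, too_large: true, visited: visited }
    end

    kept << line
    bytes += line.bytesize
  end

  { lines: kept, too_large: false, visited: visited }
end

# An endless lazy enumerator standing in for a very large patch.
huge_patch = (1..Float::INFINITY).lazy.map { |i| "+line #{i}\n" }
result = prune_patch(huge_patch)
puts result[:visited] # => 4
```

Only four lines are visited even though the input never ends, which is the property the commit is after.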

ad3a18a7

Add bundle-audit gem and CI task · ae1e840e
Robert Speicher authored 8 years ago

Also correct `source` in Gemfile to use HTTPS.

ae1e840e

Oct 03, 2016

Merge branch 'rs-be_valid_commit-matcher' into 'master' · 08984b77

Douwe Maan authored 8 years ago

Fix invalid `be_valid_commit` matcher

This matcher could have theoretically evaluated to the following:

```ruby
true
false
true
true
```

...and never would have caused a failure. Indeed, it should have been
failing for quite a while.

See merge request !126

08984b77

Oct 02, 2016

Fix invalid `be_valid_commit` matcher · 80e3a7a5

Robert Speicher authored 8 years ago

This matcher could have theoretically evaluated to the following:

```ruby
true
false
true
true
```

...and never would have caused a failure. Indeed, it should have been
failing for quite a while.
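
The failure mode can be sketched in plain Ruby, without RSpec. The struct and the individual checks below are hypothetical; the log does not show what the real matcher verified. The point is only that a method or block returns its last expression, so earlier `false` results are discarded:

```ruby
# Hypothetical commit-like object for illustration.
FakeCommit = Struct.new(:id, :message, :author_name)

# Buggy shape: every line is evaluated, but only the last value is returned.
def valid_commit_buggy?(commit)
  commit.id.is_a?(String)
  commit.message.is_a?(String) # a false here is silently dropped
  commit.author_name.is_a?(String)
end

# Fixed shape: all checks must hold.
def valid_commit?(commit)
  commit.id.is_a?(String) &&
    commit.message.is_a?(String) &&
    commit.author_name.is_a?(String)
end

broken = FakeCommit.new('abc123', nil, 'Alice')
puts valid_commit_buggy?(broken) # => true
puts valid_commit?(broken)       # => false
```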

80e3a7a5

Sep 29, 2016
- Bump version to 10.6.8 · 347d234d
  Dmitriy Zaporozhets authored 8 years ago
  
  Signed-off-by: Dmitriy Zaporozhets <dmitriy.zaporozhets@gmail.com>
  Tag: v10.6.8 · Verified
  
  347d234d
- Merge branch 'dz-straight-diffs' into 'master' · a0d22d35
  Dmitriy Zaporozhets authored 8 years ago
  
  Straight diffs support. Based on https://gitlab.com/gitlab-org/gitlab_git/merge_requests/41. All credit to @ben.boeckel. See merge request !125
  a0d22d35
- Add changelog entry for straight diff feature · 595ed5d8
  Dmitriy Zaporozhets authored 8 years ago
  
  Signed-off-by: Dmitriy Zaporozhets <dmitriy.zaporozhets@gmail.com>
  Verified
  
  595ed5d8
- Update specs according to newline styleguide · 7b31c259
  Dmitriy Zaporozhets authored 8 years ago
  
  Signed-off-by: Dmitriy Zaporozhets <dmitriy.zaporozhets@gmail.com>
  Verified
  
  7b31c259
Sep 23, 2016
- diff, compare: add support for `git diff A B` · a4800e5f
  Ben Boeckel authored 8 years ago
  
  Currently, only `git diff A...B` is supported. Add a `straight` argument for direct diff and commit results.
  a4800e5f
- diff: remove unused variable · 53e4552b
  Ben Boeckel authored 9 years ago
  
  53e4552b
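
To illustrate the `straight` argument added in a4800e5f: `git diff A...B` compares the merge base of A and B with B, while `git diff A B` compares A with B directly. The toy, merge-free history and the helper names below are assumptions for illustration only, not gitlab_git code:

```ruby
# Toy history: each commit maps to its single parent (nil for the root).
PARENTS = {
  'root' => nil,
  'base' => 'root',
  'a1'   => 'base',
  'b1'   => 'base',
  'b2'   => 'b1'
}.freeze

# Returns the commit followed by all of its ancestors.
def ancestors(commit)
  chain = []
  while commit
    chain << commit
    commit = PARENTS[commit]
  end
  chain
end

# First commit reachable from both sides.
def merge_base(a, b)
  seen = ancestors(a)
  ancestors(b).find { |commit| seen.include?(commit) }
end

# Returns the pair of commits whose trees would be compared.
def diff_endpoints(a, b, straight: false)
  straight ? [a, b] : [merge_base(a, b), b]
end

p diff_endpoints('a1', 'b2')                 # => ["base", "b2"]
p diff_endpoints('a1', 'b2', straight: true) # => ["a1", "b2"]
```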
Sep 22, 2016

Merge branch 'fix/commit-no-repo-error' into 'master' · fc105f0d

Rémy Coutable authored 8 years ago

Fix broken repo error raised after requesting a commit

Prevents a 500 error on screen after `Rugged::RepositoryError: Could not find repository from '/var/opt/gitlab/git-data/repositories/user/project.git'`, which could happen when checking for the last commit in the list of projects (dashboard).

Related https://gitlab.com/gitlab-org/gitlab-ce/issues/20501

See merge request !124

fc105f0d

Commit.find returns nil and no longer throws an error on an empty repository · 7e4bde8f
James Lopez authored 8 years ago

Also added relevant spec with broken repo.

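
A rough sketch of the behaviour change, using stand-in classes. `BrokenRepositoryError`, `FakeRepository` and `find_commit` are hypothetical names; the real code deals with `Rugged::RepositoryError`:

```ruby
# Stand-in for the error raised when the repository cannot be opened.
class BrokenRepositoryError < StandardError; end

# Stand-in repository: a Hash of commits plus a "broken" switch.
class FakeRepository
  def initialize(commits, broken: false)
    @commits = commits
    @broken = broken
  end

  def lookup(id)
    raise BrokenRepositoryError, 'Could not find repository' if @broken

    @commits.fetch(id) # raises KeyError for unknown ids
  end
end

# Returns the commit, or nil when it cannot be loaded for any of the
# expected reasons, so callers never see the exception.
def find_commit(repo, id)
  repo.lookup(id)
rescue BrokenRepositoryError, KeyError
  nil
end

p find_commit(FakeRepository.new({}, broken: true), 'abc123') # => nil
```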
7e4bde8f

Sep 14, 2016
- Release 10.6.6 · 5870f87d
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.6 · Verified
  
  5870f87d
- Attribute parser support for paths without attrs · 8509cae3
  Yorick Peterse authored 8 years ago
  
  This adds Git attribute parser support for file paths that don't contain any attributes.
  Verified
  
  8509cae3
- Release 10.6.5 · 78682cf7
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.5 · Unverified
  
  78682cf7
Sep 12, 2016

Merge branch 'file-permissions' into 'master' · d2bdcf7a

Rémy Coutable authored 8 years ago

Give a file the same permissions even if the file was renamed

Used to fix https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/5979

See merge request !118

d2bdcf7a

Sep 10, 2016
- Retain file mode whenever file is renamed · 0f258836
  tiagonbotelho authored 8 years ago
  
  0f258836
Sep 09, 2016
- Release 10.6.4 · 4832ec2c
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.4 · Unverified
  
  4832ec2c
Sep 08, 2016

Merge branch 'mark-blobs-binary' into 'master' · 43cc7189

Douwe Maan authored 8 years ago

Mark blobs as binary whenever this is known

This removes the need for using `Linguist::BlobHelper#binary?` whenever the binary status is already known, in turn reducing loading times of https://gitlab.com/nrclark/dummy_project/commit/81ebdea5df2fb42e59257cb3eaad671a5c53ca36 (by about 2-ish seconds locally).

See merge request !123

43cc7189

Mark blobs as binary whenever this is known · 3bc9611e

Yorick Peterse authored 8 years ago

Previously we would rely on Linguist::BlobHelper to determine if a blob
was binary or not. Since Rugged knows if a blob is binary we can instead
just inherit this information and fall back to BlobHelper if the binary
flag wasn't set explicitly.
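
The "inherit the flag, otherwise fall back" logic might look roughly like this. `SketchBlob` is a hypothetical class, and the NUL-byte check is only a crude stand-in for `Linguist::BlobHelper#binary?`:

```ruby
class SketchBlob
  attr_reader :data

  # `binary` is nil when the caller does not know, true/false when it does.
  def initialize(data:, binary: nil)
    @data = data
    @binary = binary
  end

  def binary?
    # Use the known flag when present; only run detection when it is missing.
    return @binary unless @binary.nil?

    detect_binary
  end

  private

  # Crude stand-in for the expensive detection: a NUL byte means binary.
  def detect_binary
    @data.include?("\0")
  end
end

puts SketchBlob.new(data: 'plain text', binary: true).binary? # => true
puts SketchBlob.new(data: 'plain text').binary?               # => false
```

Checking `nil?` rather than truthiness matters here: an explicit `false` must be respected without triggering the fallback.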

When testing this with https://gitlab.com/nrclark/dummy_project/commit/81ebdea5df2fb42e59257cb3eaad671a5c53ca36
it reduces loading times (locally) by around 2 seconds.

Verified

3bc9611e

Merge branch 'better-large-diff-handling' into 'master' · 750b9e73

Douwe Maan authored 8 years ago

Improve handling of large diffs

This MR adjusts the way checking for large diffs takes place. Prior to this MR the procedure was basically as follows:

1. Iterate over every diff in a collection
2. Just load the entire diff into memory, why not
3. Check if the resulting content _including_ any diff markers/metadata exceeds a threshold
4. Prune or collapse the diff

This MR changes things around so the procedure is instead as follows:

1. Iterate over every diff in a collection
2. Check if the data modified (excluding diff markers) is larger than a threshold
3. If this is not the case, proceed as usual. If this _is_ the case, we'll prune/collapse the diff

See merge request !122

750b9e73

Check for large diffs upon initialisation · 4c008a2f

Yorick Peterse authored 8 years ago

Prior to this commit the DiffCollection class was responsible for
checking if a diff had to be collapsed or was too large to be displayed
altogether.

This commit changes both DiffCollection and Diff so that Diff itself
checks if it's too large or has to be collapsed. These checks happen when
the Diff is being initialised. The patch size is based on the size of
every line in every hunk of the diff, instead of relying on the diff as
a string including diff markers.
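
As a sketch of the initialisation-time check, assuming hunks are given as arrays of line contents with the diff markers already stripped. `SketchDiff` and both byte limits are hypothetical:

```ruby
class SketchDiff
  # Hypothetical thresholds, in bytes.
  COLLAPSE_LIMIT = 20
  SIZE_LIMIT = 50

  attr_reader :size

  # `hunks` is an array of hunks; each hunk is an array of line contents
  # without the leading "+", "-" or " " markers.
  def initialize(hunks)
    @size = hunks.sum { |hunk| hunk.sum(&:bytesize) }
    @too_large = @size >= SIZE_LIMIT
    @collapsed = !@too_large && @size >= COLLAPSE_LIMIT
  end

  def too_large?
    @too_large
  end

  def collapsed?
    @collapsed
  end
end

diff = SketchDiff.new([['a' * 10, 'b' * 15]])
puts diff.size       # => 25
puts diff.collapsed? # => true
puts diff.too_large? # => false
```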

DiffCollection still has an extra check to collapse diffs when it has
iterated over too many files. Since this is unrelated to the actual
sizes this has been kept as-is.

For binary files no pruning takes place as the diffs for these files are
not displayed. In the past the size of a diff was reported based on the
diff's size (including metadata). If we were to use the actual file's
size a diff would be marked as being too large and in the case of an
image would never be displayed.

Unverified

4c008a2f

Sep 06, 2016
- Release 10.6.3 · d49ed5b3
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.3 · Unverified
  
  d49ed5b3
- Fix attribute support for the "binary" option · 0f62db54
  Yorick Peterse authored 8 years ago
  
  When the "binary" option is set to true the "diff" option is to be set to false automatically.
  Verified
  
  0f62db54
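
The "binary" fix in 0f62db54 can be illustrated with a small token parser. The function name is hypothetical, and it models only the rule stated in the commit message (binary implies diff is false), not everything Git's `binary` macro does:

```ruby
# Turns attribute tokens such as "text", "-text" or "eol=lf" into a Hash.
def parse_attribute_tokens(tokens)
  attrs = {}

  tokens.each do |token|
    if token.start_with?('-')
      attrs[token[1..-1]] = false
    elsif token.include?('=')
      key, value = token.split('=', 2)
      attrs[key] = value
    else
      attrs[token] = true
    end
  end

  # Rule from the commit message: binary implies diff is disabled.
  attrs['diff'] = false if attrs['binary'] == true
  attrs
end

p parse_attribute_tokens(['binary']) == { 'binary' => true, 'diff' => false } # => true
```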
Sep 05, 2016
- Release 10.6.2 · d47cc16e
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.2 · Unverified
  
  d47cc16e
- Fix matching Git attributes using absolute paths · dd65dcef
  Yorick Peterse authored 8 years ago
  
  Git attributes can be defined using an absolute path. This commit fixes the matching procedure so that attributes and paths with leading slashes are supported properly.
  Verified
  
  dd65dcef
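
One way to make patterns and paths with leading slashes compare equal, as described in dd65dcef, is to normalise both sides before globbing. Stripping the slash is an assumption made for this sketch; the gem may normalise differently (for example by making both sides absolute):

```ruby
# Strips one leading slash from both sides so "/foo/*.txt" and "foo/*.txt"
# behave alike, then matches with shell-style globbing.
def attribute_pattern_match?(pattern, path)
  normalize = ->(value) { value.sub(%r{\A/}, '') }
  File.fnmatch(normalize.call(pattern), normalize.call(path))
end

puts attribute_pattern_match?('/*.txt', 'README.txt') # => true
puts attribute_pattern_match?('*.txt', '/README.txt') # => true
puts attribute_pattern_match?('/*.md', 'README.txt')  # => false
```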
Sep 01, 2016
- Release 10.6.1 · 209cae3e
  Yorick Peterse authored 8 years ago
  
  Tag: v10.6.1 · Unverified
  
  209cae3e
- Handle missing attribute files when parsing · afaeb8f2
  Yorick Peterse authored 8 years ago
  
  Previously using Gitlab::Git::Attributes when $GIT_DIR/info/attributes didn't exist would result in a runtime error. This commit fixes this so an empty Hash is produced instead.
  Verified
  
  afaeb8f2
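
The missing-file handling from afaeb8f2 could be sketched as follows. `load_attribute_patterns` is a hypothetical helper, and the shape of its return value (pattern mapped to raw tokens) is chosen for illustration only:

```ruby
require 'tmpdir'

# Returns { pattern => [raw attribute tokens] } for $GIT_DIR/info/attributes,
# or an empty Hash when the file does not exist.
def load_attribute_patterns(git_dir)
  path = File.join(git_dir, 'info', 'attributes')
  return {} unless File.exist?(path)

  File.readlines(path).each_with_object({}) do |line, patterns|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    pattern, *tokens = line.split(/\s+/)
    patterns[pattern] = tokens
  end
end

Dir.mktmpdir do |git_dir|
  p load_attribute_patterns(git_dir) # no info/attributes yet => {}
end
```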
Aug 31, 2016

Bump version to 10.6.0 · 7e618d03
Douwe Maan authored 8 years ago

Tag: v10.6.0

7e618d03

Merge branch 'ruby-gitattributes-parser' into 'master' · 62927165

Douwe Maan authored 8 years ago

Parse Git attribute files using Ruby

Commit 340e111e contains all the details. It's quite the read so the short summary is:

> Rugged is slow as heck because it runs multiple IO calls every time you request a set of Git attributes. gitlab_git now provides a pure Ruby parser that avoids this and is between 4 and 6 times faster.

Here's a Grafana screenshot to show how bad it can get:

![timings](/uploads/39f7b6b7b6a8d97f2b11a20a088988e4/timings.jpg)

See https://gitlab.com/gitlab-org/gitlab-ce/issues/10785 for more information.

See merge request !121

62927165

Parse Git attribute files using Ruby · 340e111e

Yorick Peterse authored 8 years ago

Rugged provides a way of parsing Git attribute files such as the one
located in $GIT_DIR/info/attributes. Per GitLab's performance monitoring
tools quite a lot of time can be spent in parsing/retrieving attributes.

This commit introduces a pure Ruby parser for gitlab_git that performs
drastically better than the one provided by Rugged.

== Production Timings

As an example, take the commit https://gitlab.com/nrclark/dummy_project/commit/81ebdea5df2fb42e59257cb3eaad671a5c53ca36
(as taken from https://gitlab.com/gitlab-org/gitlab-ce/issues/10785).
When loading this commit we spend between 4 and 6 seconds in
Rugged::Repository#fetch_attributes. This method is called around 1100
times. This is the result of two problems:

1. For every diff we call Gitlab::Git::Repository#diffable? and pass it
   a blob. This method in turn returns a boolean (based on the Git
   attributes for the blob's path) indicating if the content is
   diffable.

2. For every diff we use the GitLab class Gitlab::Highlight which calls
   Repository#gitattribute in the #custom_language method. This is used
   to determine what language to use for highlighting a diff.

As a result, in the worst case we'll end up with 2 calls per diff to
Gitlab::Git::Repository#attributes (previously delegated to
Rugged::Repository#attributes).

== Rugged Implementation

Rugged in turn implements the "attributes" method in a rather
inefficient way. The first time this method is called it will run at
least a single open() call to open the file. On top of that it appears
to run 2 stat() calls for every call to Rugged::Repository#attributes.
In other words, if you call it 100 times you will end up with 201 IO
calls:

* 200 stat() calls
* 1 open() call

== Rugged IO Overhead

To confirm the IO overhead of Rugged I created the following script
(saved as "confirm.rb"):

    require 'rugged'

    path = '/tmp/test/.git'
    repo = Rugged::Repository.new(path)

    10.times do
      repo.attributes('README.md')['gitlab-language']
    end

I then ran this as follows:

    strace -f ruby confirm.rb 2>&1 | grep -i 'info/attributes' | wc -l

This counts the number of instances an IO call refers to the
"$GIT_DIR/info/attributes" file. The output is "21", meaning 21 IO calls
were executed.

While this may not be a big problem when using physical storage (even
less so when using SSDs), this _will_ be a problem when using network
storage. For example, say every operation takes 2 milliseconds to
complete. For the 100 calls mentioned earlier (201 IO calls) this would
result in _at least_ 400 milliseconds being spent in _just_ the IO
operations.
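
The arithmetic behind these figures, written out as a sketch. The helper names are made up, and the 2 ms latency is the hypothetical value from the example:

```ruby
# IO calls Rugged makes against info/attributes for `lookups` attribute
# requests, per the figures above: two stat() calls each plus one open().
def rugged_io_calls(lookups)
  2 * lookups + 1
end

# Total time spent in IO, assuming a fixed latency per call.
def io_time_ms(lookups, latency_ms: 2)
  rugged_io_calls(lookups) * latency_ms
end

puts rugged_io_calls(10) # => 21, matching the strace count
puts io_time_ms(100)     # => 402
```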

The Ruby parser on the other hand only uses a single open() IO call.
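
A minimal sketch of how a parser can get away with a single read: parse the file lazily on first use and keep the result in memory. `SketchAttributes` is a hypothetical class, not the gem's implementation, and it returns raw tokens rather than parsed attributes:

```ruby
require 'tempfile'

class SketchAttributes
  attr_reader :reads

  def initialize(path)
    @path = path
    @reads = 0
  end

  # Returns the raw attribute tokens of every pattern matching `file_path`.
  def attributes(file_path)
    patterns.select { |pattern, _| File.fnmatch(pattern, file_path) }
            .flat_map { |_, tokens| tokens }
  end

  private

  # Reads and parses the file on first use only; later calls reuse the result.
  def patterns
    @patterns ||= begin
      @reads += 1
      File.read(@path).each_line.map(&:strip)
          .reject { |line| line.empty? || line.start_with?('#') }
          .map { |line| line.split(/\s+/) }
          .map { |pattern, *tokens| [pattern, tokens] }
    end
  end
end

Tempfile.create('attributes') do |file|
  file.write("*.md\tgitlab-language=markdown\n*.txt text\n")
  file.flush

  attrs = SketchAttributes.new(file.path)
  10.times { attrs.attributes('README.md') }
  puts attrs.reads # => 1
end
```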

== Benchmarking

To measure the performance of this code I wrote the following benchmark:

    require 'rugged'
    require 'benchmark/ips'

    require_relative 'lib/gitlab_git/attributes'

    repo = Rugged::Repository.new('/tmp/test/.git')
    attr = Gitlab::Git::Attributes.new(repo.path)

    Benchmark.ips(time: 10) do |bench|
      bench.report 'Rugged' do
        repo.attributes('test.haml.html')['gitlab-language']
      end

      bench.report 'gitlab_git' do
        attr.attributes('test.haml.html')['gitlab-language']
      end

      bench.compare!
    end

The contents of /tmp/test/.git/info/attributes are as follows:

    # This is a comment, it should be ignored.

    *.txt     text
    *.jpg     -text
    *.sh      eol=lf gitlab-language=shell
    *.haml.*  gitlab-language=haml
    foo/bar.* foo
    *.cgi     key=value?p1=v1&p2=v2

    # This uses a tab instead of spaces to ensure the parser also supports this.
    *.md	gitlab-language=markdown

Running this benchmark on my development environment produces the
following output:

    Warming up --------------------------------------
                  Rugged     9.543k i/100ms
              gitlab_git    43.277k i/100ms
    Calculating -------------------------------------
                  Rugged    100.261k (± 2.0%) i/s -      1.012M in  10.093380s
              gitlab_git    482.186k (± 1.7%) i/s -      4.847M in  10.055286s

    Comparison:
              gitlab_git:   482185.6 i/s
                  Rugged:   100260.6 i/s - 4.81x  slower

The exact output varies with system load, but usually the new Ruby-based
parser is between 4 and 6 times faster than Rugged.

To further test this I wrote the following benchmark:

    require 'benchmark'
    require 'rugged'

    require_relative 'lib/gitlab_git/attributes'

    amount = 5000
    repo = Rugged::Repository.new('/var/opt/gitlab/git-data-ceph/repositories/gitlab-org/gitlab-ce.git')
    attrs = Gitlab::Git::Attributes.new(repo.path)

    # Per-call timings for Rugged, in milliseconds.
    rugged = amount.times.map do
      timing = Benchmark.measure do
        repo.attributes('README.md').to_h
      end

      timing.real * 1000.0
    end

    # Per-call timings for the pure Ruby parser, in milliseconds.
    ruby = amount.times.map do
      timing = Benchmark.measure do
        attrs.attributes('README.md')
      end

      timing.real * 1000.0
    end

    puts "Rugged: #{rugged.inject(:+)} ms"
    puts "Ruby: #{ruby.inject(:+)} ms"

This script uses Rugged and the new attributes parser, parses the same
attributes file 5000 times, and then counts the total processing time.
Running this script on worker1 produced the following output:

    Rugged: 131.95287296548486 ms
    Ruby: 30.17003694549203 ms

Here the Ruby-based solution is roughly 4.4 times faster than Rugged.

== Further Improvements

GitLab may at some point decide to cache the parsed data structures in,
for example, Redis, which is now possible because they are plain Ruby
data structures. Note that this is only really beneficial in cases where
Git attributes are requested for the same file path in different
requests. This also requires careful cache invalidation. For example, we
don't want to invalidate the entire cache when modifying some unrelated
file.

Because of the complexity involved it's best to leave this for later and
only implement it once we're certain it will actually be beneficial.
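
Purely as an illustration of the invalidation concern, one possible shape is to key cached entries on a digest of the attributes file, so that changes to unrelated files leave the cache untouched. Everything below is hypothetical: the class name is made up, an in-memory Hash stands in for Redis, and nothing like this exists in gitlab_git per this log:

```ruby
require 'digest'

class SketchAttributesCache
  def initialize
    @store = {} # stand-in for Redis
  end

  # The key depends only on the repository path and the attributes file
  # contents, so edits to other files never invalidate an entry.
  def fetch(repo_path, attributes_source)
    key = "#{repo_path}:#{Digest::SHA256.hexdigest(attributes_source)}"
    @store[key] ||= yield
  end
end

cache = SketchAttributesCache.new
parses = 0
2.times { cache.fetch('/tmp/test/.git', "*.md text\n") { parses += 1; { '*.md' => ['text'] } } }
puts parses # => 1
```

Note the trade-off: computing the digest still requires reading the attributes file, so this only saves parsing work, not the read itself.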

Unverified

340e111e

Aug 29, 2016
- Bump version to 10.5.0 · b205c79e
  Stan Hu authored 8 years ago
  
  Tag: v10.5.0
  
  b205c79e
- Merge branch 'add-repository-find-branch' into 'master' · 8d987626
  Yorick Peterse authored 8 years ago
  
  Add Repository#find_branch to speed up branch lookups. See merge request !119
  8d987626
Aug 27, 2016
- Add CHANGELOG entry about Repository#reload_rugged · 6cb92440
  Stan Hu authored 8 years ago
  
  6cb92440
- Provide Repository#reload_rugged to allow other callers to refresh the repository · 6a75ea50
  Stan Hu authored 8 years ago
  
  6a75ea50
- Add a force_reload parameter in Repository#find_branch to workaround stale in-memory · 0bbe707a
  Stan Hu authored 8 years ago
  
  refs DB. See https://gitlab.com/gitlab-org/gitlab-ce/issues/15392#note_14538333
  0bbe707a
- Add Repository#find_branch to speed up branch lookups · 3606265d
  Stan Hu authored 8 years ago
  
  A common call in GitLab is to look up a single branch, but previously this was done by calling Repository#branches, which loads all the branches into memory unnecessarily and causes many filesystem accesses for each branch. With Repository#find_branch, we can do a direct lookup for the branch we care about.
  3606265d
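
The difference between the two lookup strategies in 3606265d can be sketched with a toy refs store. `SketchRefs` and its `loads` counter are hypothetical, and returning `nil` for a missing branch is an assumption of this sketch:

```ruby
class SketchRefs
  attr_reader :loads

  # `refs` maps full ref names to commit ids, e.g. 'refs/heads/master'.
  def initialize(refs)
    @refs = refs
    @loads = 0
  end

  # Old approach: materialise every branch, then search the list.
  def branches
    @refs.map do |ref, sha|
      @loads += 1
      { name: ref.sub('refs/heads/', ''), target: sha }
    end
  end

  # New approach: look up the single ref directly.
  def find_branch(name)
    sha = @refs["refs/heads/#{name}"]
    return nil unless sha

    @loads += 1
    { name: name, target: sha }
  end
end

refs = {
  'refs/heads/master'  => 'aaa111',
  'refs/heads/stable'  => 'bbb222',
  'refs/heads/feature' => 'ccc333'
}

slow = SketchRefs.new(refs)
slow.branches.find { |branch| branch[:name] == 'master' }
puts slow.loads # => 3

fast = SketchRefs.new(refs)
fast.find_branch('master')
puts fast.loads # => 1
```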
