From https://gitlab.slack.com/archives/core/p1443048046001327
expect a lot of merge conflicts in the CHANGELOG file this release. Seems that the .gitattributes is not being considered at the moment. (I saw an issue for this, gotta find it^^)
I had to ask a user to rebase again today because of this
I did some investigation in this by digging through libgit2 and git source code. git has support for custom merge drivers (e.g. union), which allows the .gitattributes merge=union trick. While libgit2 and rugged have support for reading .gitattributes, they don't appear to have support for custom merge drivers. However, as the issue alludes, we might be able to do something like the following:
Attempt to merge A and B
If there are conflicts, load the .gitattributes and iterate through each conflict.
For each conflicted file, if there is an entry in .gitattributes and the merge type is union, then set the favor to GIT_MERGE_FILE_FAVOR_UNION in the merge options. Attempt to merge just those files one-by-one.
If all conflicts merge successfully individually, then declare success.
I'll play around to see if this is possible. Long term it would be better for libgit2/rugged to support this use case natively.
Looks like this one is a bit difficult to fix. I can easily call rugged.merge_commits(first, second, favor: :union), but this would result in a blanket union merge, which is not correct. We want this only to apply to the CHANGELOG, which isn't support at the moment by libgit2 and rugged.
I'll see what I can do push this forward, but no guarantees.
Just an update on this issue: Edward Thomson has taken my PR and added full merge driver support: https://github.com/libgit2/libgit2/pull/3564. This is currently being reviewed.
This PR will not only solve the merge=union case in .gitattributes, but also merge=text etc.
This probably will take a while longer to land, so I'm guessing 8.5 or 8.6 might be a more realistic target.
Would this feature also bring support for marking specific files as binary or text using the .gitattributes file? This issue on Github is marked as closed, but there is a question over if it is actually supported and if this work will change that.
Essentially I'd like to be able to mark compiled files as binary and their diffs to not show up in the changes tab of commits or merge requests.
I'm also very much interested in marking files as binary. We have our gitlab failing to generate a diff time after time, because there are some huge text files present which are actually binary. We mark them like that in .gitattributes: dist/**/* -diff, and then they're shown as binary in git diff, but GitLab ignores .gitattributes for now, and that makes merge requests much more painful (and sometimes even impossible to create)
I've created a small test repository which demonstrates the behaviour I think Gitlab should be showing. Like @tsayen I would like to mark some text files as binary as they are large and cause Gitlab to become unusable when they are changed (which they frequently are).
There there two branches. master has a txt file and a .gitattributes file which has the rule *.txt binary which marks all txt files as binary. On the change branch I've added a lint into the txt file.
At first, I assumed that libgit2 was not correctly supporting the binary attribute in .gitattributes, however after playing around a bit it seems that it is supported already for diffs.
You can try it yourself by compiling the diff` example, copying it into the example repository and running it.
It seems to me that the issue with the .gitattributes file being ignored is somewhere in either Rugged or gitlab_git. I've not yet had time to test the behaviour or either of the wrappers yet.
Hopefully this helps you. This is functionality which is really needed for us, and is causing some to question moving to Github (interestingly Github also fails to look at the .gitattributes file, pointing to it being a Rugged bug, however, they are able to cope with having large text files much better than our Gitlab instance is).
@matto1990 Thank you so much for the detailed test. I think I'm beginning to understand what is going on here. The key here is that using .gitattributes for binary diff generation does not appear to work on a bare repository:
$ git clone --bare https://gitlab.com/matto1990/diff-example.git$ git diff da222d2f..068f578diff --git a/untimely.txt b/untimely.txtindex 8a239dc..09430a5 100644--- a/untimely.txt+++ b/untimely.txt@@ -1,6 +1,8 @@ Untimely ========+This is an edit.+ Nothing in life has been made by man for man's using But it was shown long since to man in ages Lost as the name of the maker of it,
I verified that libgit2/Rugged properly supports generating the right diff for the binary case in a non-bare repository. Using your repo, you can show this via this test code:
gitlab_git uses similar code above to generate the patch, but since GitLab operates only on bare repositories, the diff doesn't obey the .gitattributes file.
Previous versions of GitLab used to use satellites, which would have generated the proper diff. Perhaps there is a way to patch libgit2 to look at .gitattributes even for a bare repositories.
Note that git merge drivers are different from diff drivers. The merge=union case should work on bare repositories once the libgit2 changes are merged.
@stanhu Thanks for the reply! So it seems like the issue lies with libgit2. I've opened an issue on their Github repo, so we'll and see what the maintainers say about it. My experience with these projects is fairly minimal, so I'm not sure how big a task it will be to support .gitattributes for bare repositories.
@dzaporozhets@stanhu Where would it be best to make this change? I'm happy to take a look at it if it's a good task for a codebase beginner. I'm assuming it would be in the gitlab-git project. After a very quick inspection, somwhere in the repository.py file seems like a good place to start.
@matto1990 I would first try to write a script that uses Rugged to do the following:
Look up the SHA ID for HEAD
Figure out what blob OID is there for .gitattributes
Retrieve the contents of the blob
Copy this file into the bare repo info/attributes.
Then I think we might be able to add that functionality into gitlab_git and add a call to that in app/workers/git_push_service.rb that updates the info/attributes files for every push to the default branch.
I think the GitLab compare controller handles the binary diff message, but if it doesn't that would have to be adjusted as well.
I've opened a merge request for the gitlab_git part of the task, however, I think the part for inside Gitlab will need more thinking than you mentioned so far.
The cases I can think of for using the .gitattributes file are (I'm mainly thinking about generating the diffs for merge requests:
No attributes in the target branch. None in source branch.
No attributes in the target branch. New file in source branch.
Attributes in the target branch. Same in source branch.
Attributes in the target branch. Updated ones in source branch.
Attributes in the target branch. Removed in source branch.
For the diffs to be correct, it seems to me that we would need to make sure we are using the attributes from the source branch before every diff.
The method you mentioned with using the attributes from the default branch would be simpler, but would lead to incorrect results in a lot of the cases. However, it's unlikely that the .gitattributes file changes often, so maybe this is fine for a first version.
@matto1990 Thanks for submitting the MR, and good investigation. You're right. It's too bad this git thread didn't get more comments. It sounds like one of the git authors thought this:
I was discussing this on IRC with pasky and he suggested that the best
way to approach the matter was to use the .gitattributes from the
relevant head.
Based on my experiments, it looks like the command-line git reads from the current working branch on a non-bare repository.
The only way to match this functionality is to write out the info/attributes file before each diff, but that seems like it could lead to other issues (e.g. race conditions when multiple users generate a diff for different branches).
I think if we want to keep things simple, stick to using the default branch (HEAD) and use the .gitattributes from that. My only worry this could also lead to unintended issues down the road. What do you think, @DouweM?
You could always hand-edit the info/attributes for the repositories in question. Not ideal, but it would work today.
I started poking at the git source code to see what it would take to support .gitattributes for a bare repo.
Curiously, I can't seem to get it working when I manually edit the info/attributes file in the repo on our Gitlab server.
I'm editing the file in /var/opt/gitlab/git-data/repositories/redacted/redacted.git/info/attributes and adding in the exact contents as is working in .gitattributes. I've also ensured the file has the correct owner and group (git) and given is very open permissions (775).
When doing a diff via a merge the request, the files I've marked as binary are still showing up with a textual diff.
It's not super important as it's a hack anyway, but any idea what might be going on? Maybe there some caching happening somewhere?
@matto1990 Yes, I believe the diffs are cached each time after a push occurs. One trick: append ?w=1 to the URL, and I think the diff is generated on-the-fly. You can also try pushing to the branch with a new commit (or force push after a git commit --amend).
Once I did that, I think I ran into a bug with how diffs are shown. The file in the screenshot is one of the text files which I would like to be treated as binary. It is now no longer showing a diff of the changes, but the text content at the start of the diff is still shown to the user (I couldn't show you the whole file in case there's something under NDA in there).
I think if we want to keep things simple, stick to using the default branch (HEAD) and use the .gitattributes from that.
Sounds good.
My only worry this could also lead to unintended issues down the road.
The only issue I can see is the wrong attributes being used when .gitattributes was changed in the branch being compared. That's slightly annoying, but acceptable. It doesn't change often, and most compares will be against the default branch anyway.
I've tried it out locally it it works fine, however, the diff is still showing up slightly strange in the MR for my test repository. It seems to be that it outputs the letter B (I assume for binary) and then the first line of the text file which has been marked as binary.
I've had a look at chasing through where the diff is coming from, but it seems like it's a few layers deep (Gitlab -> gitlab_git -> rugged -> libgit2) so I'm not certain where the problem is being introduced.
Aside from the diff display issue (which is more cosmetic than anything else), it would also be good to get some advice on the best way to test this new behaviour.
@matto1990 My guess is that the compare controller does not handle the "Binary files X and Y differ" case properly. We may need to look at the diff for the file and add some special handling for it.
@matto1990 To determine whether to render the diff as text or not, we call blob.text? which is Linguist::BlobHelper#text?, which uses heuristics to determine if the file is text or binary, and doesn't actually check .gitattributes. I don't know if we can ask rugged/libgit2 if it sees the file as text as binary, or if we should somehow parse .gitattributes manually.
@stanhu@DouweM I've taken a look at that and have done a quick experiment to prove it works.
I've added an extra check in app/helpers/blob_helper.rb's blob_text_viewable? method to check if the blob is diffable. To do this it gets the attributes from Rugged and checks for the diff attribute, returning true if there isn't one. This seems to be the best attribute to use, as it is set the false if you have either of these lines, both of which are commonly suggested as a way to stop files showing up in a diff:
*.txt -diff*.txt binary
When adding this in, the file text or comment buttons in the header are no longer shown. Instead, it now shows a No preview for this file type message, which is something I think could be changed to be more descriptive.
@matto1990 Sounds like a great plan. I do wonder what the performance impact would be of doing a .gitattributes lookup for every blob in a diff. I wonder if it's easier just to handle the "Binary files X and Y differ" message gracefully in a diff.
@matto1990 This line strips out the binary diff header and leaves the diff empty. What if we set an attribute (e.g. @binary = true) in the diff, and then had GitLab display a special message for binary files? That would not only help existing diffs but also take care of the .gitattributes issue.
@stanhu Has suggested making a further change to gitlab_git to handle this behaviour inside the diff creation itself. I'm hesitant to make the changes myself to this method as even the comments within the method seem unsure of its behaviour. I think the steps to get this working would be to set a binary attribute if it detects the file is binary and then file diff template look for this attribute and show a specific message.
If that is correct, and that is the preferred method of doing this, I can look into it over the weekend. Let me know what you think my next steps should be.
FYI, this issue deals with multiple things with .gitattributes:
@matto1990 has a MR to suppress diffs in files specified as binary
Changes necessary to support the merge=union attribute that helps GitLab avoid conflicts with CHANGELOG
The second was merged to libgit2 on March 17 via https://github.com/libgit2/libgit2/pull/3564. Presumably it will be available in a release in libgit2 v0.25.0. Still waiting for that release.