Directory structure produces too long file paths

A fix for this (at least a partial one) is on master (bleeding edge). The discussion about the issue is here: https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/8.

If the path is still too long, you can always specify a different builds_dir (/builds?), which will cut the path even further than the default /home/gitlab_ci_multi_runner/tmp/builds/.

I still haven't decided what the default setting should be, so this is still a partial (and not final) solution for the issue. I would like it to work out of the box (OOB) and still stay compatible with Go paths, but that compatibility may unfortunately have to be dropped. It's a pity that this is an issue on Linux :(

Added ~58730 label

Hi @carmenbbakker, I had the same issue (see #8 (closed)) when using virtualenv for my builds. I solved it by creating the virtualenv under /tmp with the following lines in my build job:

# Create the virtualenv under /tmp so the interpreter path in its shebangs stays short
VIRTUALENV_DIR="/tmp/$(uuidgen)"
virtualenv -p python3 "$VIRTUALENV_DIR"
source "$VIRTUALENV_DIR/bin/activate"
python yourscript.py

Sorry! I hadn't looked at the closed issues. I think I also figured out what went wrong in the second error: tox explicitly calls pip internally, which resulted in the following command:

/home/gitlab_ci_multi_runner/tmp/builds/runner-9c5c3214-project-1329-concurrent-0/gitlab.com/carmenbbakker/schaakmat/.tox/py34/bin/pip install --pre -r/home/gitlab_ci_multi_runner/tmp/builds/runner-9c5c3214-project-1329-concurrent-0/gitlab.com/carmenbbakker/schaakmat/requirements.txt

This fails because the shebang line of that pip executable, which points at the interpreter inside .tox/py34, is too long.
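To make the failure concrete, here is a minimal Python sketch that reads the shebang of a script such as the tox-generated pip and flags it when the interpreter path is longer than 127 characters, the limit quoted later in this thread. The helper names, the 127 default and the exact interpreter path (`.tox/py34/bin/python`) are assumptions for illustration; the real kernel limit depends on the Linux version.

```python
import os
import tempfile

# Assumption: the 127-character limit quoted in this thread; the actual
# kernel limit differs between Linux versions.
SHEBANG_LIMIT = 127

# Assumed interpreter path of the tox env from the failing build above.
EXAMPLE_INTERPRETER = ("/home/gitlab_ci_multi_runner/tmp/builds/"
                       "runner-9c5c3214-project-1329-concurrent-0/"
                       "gitlab.com/carmenbbakker/schaakmat/.tox/py34/bin/python")


def read_interpreter(script_path):
    """Return the interpreter path from a script's shebang line, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().rstrip(b"\r\n")
    if not first_line.startswith(b"#!"):
        return None
    # The interpreter is the first whitespace-separated token after "#!".
    parts = first_line[2:].split()
    return parts[0].decode() if parts else None


def shebang_too_long(script_path, limit=SHEBANG_LIMIT):
    """True if the script's interpreter path exceeds the assumed limit."""
    interpreter = read_interpreter(script_path)
    return interpreter is not None and len(interpreter) > limit


if __name__ == "__main__":
    # Write a fake pip script with the long shebang and check it.
    with tempfile.NamedTemporaryFile("w", suffix="-pip", delete=False) as f:
        f.write("#!" + EXAMPLE_INTERPRETER + "\n")
    print(len(EXAMPLE_INTERPRETER), shebang_too_long(f.name))
    os.remove(f.name)
```

With the original directory layout the interpreter path alone is well over the limit, before pip's own arguments are even considered.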

I'll try to move my virtualenv to the /tmp folder and see what happens.

I fully understand your problem; since I use buildout, I had the same issue with my bin/buildout script. Make sure your /tmp directory gets cleaned up.

In any case, I have now managed to work around the issue by moving both my virtualenv and my tox environment to /tmp. Whether that solves this issue in general, I don't know.

@ayufan I'm not going to pretend that I know how this program works internally, but wouldn't a layout such as the following drastically reduce the path length?

/home
└── ci_runner
    └── builds
        └── $DESCRIPTION_OF_SPECIFIC_BUILD
            ├── build_info.json
            └── $GIT_REPO

The tmp directory isn't exactly needed, and neither is gitlab.com/$GITLAB_USER. I also shortened the user to ci_runner instead of gitlab_ci_multi_runner.

You can put some of the info that currently resides in the file path inside of build_info.json, such as whether the build is concurrent and which repository and user it belongs to. This allows $DESCRIPTION_OF_SPECIFIC_BUILD to be about as short as a single hash.
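A minimal Python sketch of this proposal, assuming a short random hex id as $DESCRIPTION_OF_SPECIFIC_BUILD and arbitrary metadata keys; `create_build_dir`, `load_build_info` and the 12-character id length are hypothetical choices, not the runner's actual behavior.

```python
import json
import os
import uuid


def create_build_dir(root, repo_name, metadata):
    """Create root/<short id>/build_info.json; return where $GIT_REPO goes.

    The details that used to be encoded in the path (runner, project,
    concurrency slot, ...) are stored in build_info.json instead.
    """
    build_id = uuid.uuid4().hex[:12]  # short, but not collision-proof
    build_dir = os.path.join(root, build_id)
    os.makedirs(build_dir)  # raises FileExistsError if the id is taken
    with open(os.path.join(build_dir, "build_info.json"), "w") as f:
        json.dump(metadata, f, indent=2)
    # The repository would be cloned into this subdirectory.
    return os.path.join(build_dir, repo_name)


def load_build_info(build_dir):
    """Read back the metadata of one build directory."""
    with open(os.path.join(build_dir, "build_info.json")) as f:
        return json.load(f)
```

A person looking for a build can then scan the build_info.json files instead of decoding long directory names.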

Just my two cents.

Yes, I'm open to proposals. It would be great to have this resolved in v0.4.0, which should ship at the end of next week.

What I want to achieve:

make the path compatible with Go: Go puts all sources in $GOPATH/src/gitlab.com/ayufan/some-project/
make the path work with the shebang limit: the maximum path to the interpreter binary is 127 characters
make the path contain a unique name for each concurrent build (runner-%s-project-%s-concurrent-%s, or anything similar, shorter, or structured differently) for directories that can be shared between builds

It is also important to find the best default: the path can be put anywhere; it can be /home/gitlab_ci_multi_runner/tmp/builds, or it can be /builds.

We could also change RUNNER_USER to gitlab-runner and rename the package from gitlab-ci-multi-runner to gitlab-runner. This was proposed by @sytses (but I would do that only for new installations).

Please post your ideas. I'll review and pick the best one as the solution :)

@ayufan using gitlab-runner as the user would be very nice, since the previous official runner used this user. Also, runner-9c5c3214-project-1329-concurrent-0 seems a bit much. We could do something like RUNNERID-project-PROJECTID-CONCURRENTID and document it well so anyone who needs to know what it means can easily decipher it.

@marin Thanks for the comments. The runner-... part is already shortened on master, but if anyone has a better idea I'll be happy to adopt it :)

@marin If we document it well, I think the word "project" in the path can be dropped.

@ayufan Why do you want the path to be compatible with Go conventions, though? I can understand that it'd be convenient for Go programmers, but a CI runner ought to be usable for projects in any language. If one language (Go) has some unusual requirements on folder structure, that doesn't mean that single requirement should be forced onto all projects.

But I don't know enough about Go to figure out why it has or needs such a folder structure. Does building/testing Go stuff become significantly more difficult if the convention isn't followed? It's a little weird to me, because every other language I know just puts their stuff in a build dir inside of the project root, or puts their build dir inside of the parent of the project root.

In any case, I don't think the runner-$RUNNERID-project-$PROJECTID-concurrent-$CONCURRENTID directory should contain that much information inside of its filename. You could shorten it to simply $RUNNERID-$PROJECTID-$CONCURRENTID and document what the numbers mean. Or you could shorten it all the way to $BUILDID and store all the metadata inside of a configuration file in that directory.

@ayufan would PROJECTID-CONCURRENTID work? (I assume multiple runners would be hosted in different directories.)

OK. I simply think Go paths are nice and quite descriptive, but if they are such a big problem I have no further objections to dropping them. One thing must be retained, though: the project name, because some projects rely on the directory name when creating executables.

What do you think about this default:

/home/gitlab-runner/builds/$RUNNER/$PROJECTID-$CONCURRENTID/project-name/

Currently, gitlab-ci-multi-runner can run projects from multiple CI servers, so we need something to differentiate between them; that's why $RUNNER needs to be added.

Ex. /home/gitlab-runner/builds/01234567/12345-0/project-name/

This gives roughly 58 characters and leaves plenty of room to play with. What do you think?
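A small Python sketch that formats the proposed template and measures it, using the example values above; `proposed_build_path` is a hypothetical helper, and the 127-character budget is the shebang limit mentioned earlier in the thread.

```python
import posixpath


def proposed_build_path(base, runner, project_id, concurrent_id, project_name):
    """Format $BASE/$RUNNER/$PROJECTID-$CONCURRENTID/project-name/."""
    return posixpath.join(base, runner, f"{project_id}-{concurrent_id}",
                          project_name) + "/"


path = proposed_build_path("/home/gitlab-runner/builds", "01234567", 12345, 0,
                           "project-name")
# Characters left for paths inside the project under a 127-char budget.
print(path, len(path), 127 - len(path))
```

The example path is exactly 57 characters, which leaves about 70 characters for paths inside the project before the shebang limit is reached.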

@ayufan if there is a project name already, can't we drop the $PROJECTID?

We could, but project-name alone is not unique because it lacks the group. Maybe something like this?

/home/gitlab-runner/builds/$RUNNER/$CONCURRENTID/group-name/project-name/

A second option is to drop $RUNNER and add the hostname:

/home/gitlab-runner/builds/$CONCURRENTID/gitlab-hostname/group-name/project-name/

What about hashing this information? HASH($RUNNER-$CONCURRENTID-GroupName-ProjectName)

It's a good idea, but I would like to at least have a valid ProjectName in the path.

You could make a path like the following:

/home/gitlab-runner/builds/$HASH/project-name/

$HASH is then a unique folder, and the second-level folder preserves a bit of meta-information.

@razer6 what hashing function do you propose and how long it should be?

@ayufan Any cryptographically secure hash function is sufficient. Do not use MD5 (broken) or SHA-1 (considered broken). I would go for SHA-3 or SHA-2, used at their full specified output length; truncating the output weakens their security guarantees.

SHA224 (one algorithm of the SHA-2 family) would be one choice. It produces a 224-bit hash.

OK, but in that case it makes the path longer, not shorter :) Full SHA-1 is 40 hex chars, SHA-256 is 64 hex chars.

I think our goal is not to obfuscate the path, just to make it shorter.

Agreed. SHA224 would be 56 hex chars. But what about extra-long group and project names?

Maybe we could use only part of the hash: the first 10 or 16 hex characters?
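A minimal Python sketch of the /home/gitlab-runner/builds/$HASH/project-name/ idea with a truncated hash. SHA-256 is picked here as one SHA-2 member, and the 16-character default follows the suggestion above; the function name and key format are assumptions.

```python
import hashlib
import posixpath


def hashed_build_path(base, runner, concurrent_id, group, project, length=16):
    """Return base/HASH/project/ where HASH is a truncated SHA-256 hex digest.

    Truncating keeps the directory name short and fixed-length, but lowers
    collision resistance, so callers still need a plan for collisions.
    """
    key = f"{runner}-{concurrent_id}-{group}-{project}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]
    # Keep the real project name as the second level for readability.
    return posixpath.join(base, digest, project) + "/"


print(hashed_build_path("/home/gitlab-runner/builds", "9c5c3214", 0,
                        "carmenbbakker", "schaakmat"))
```

The hash component is always `length` characters no matter how long the group name is; only the project name still varies.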

What advantages does hashing have? I think it is unneeded complexity for the end user trying to find a build.

Milestone changed to v0.4.0

Hashing would create a constant-length directory name regardless of the actual project name. We could use only part of the hash, but then we need to handle collisions, e.g. something like this:

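import hashlib
import os
import secrets

def unique_build_path(build_dir, directory_name, length=10):
    name = directory_name
    while True:
        # Short hex prefix of SHA-256 (built-in hash() is randomized per process)
        h = hashlib.sha256(name.encode("utf-8")).hexdigest()[:length]
        path = os.path.join(build_dir, h)
        if not os.path.exists(path):
            # No collision, we can use it
            return path
        # We have a collision: salt the name with random characters and hash again
        name = directory_name + secrets.token_hex(5)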

I totally agree that the current directory naming scheme is insanely verbose and will easily break. As a rule, no path should go beyond 240 chars as there are restrictions in many systems.

Now, regarding the hashing approach: I am not so keen on it because it kills accessibility. Imagine that you want to go into the builds directory to debug something, or to investigate where disk space is being wasted; with hashes you are blind.

Let's find a simplification that does not make the layout unusable by humans. Anything that is redundant should go away.

Here are few examples:

the default username "gitlab_ci_multi_runner": why not just "gitlab_ci" or "gitlab_runner"?
"builds": do we really need it? If so, use "b".

@sbarnea I don't think we have to be so aggressive in reducing the build path. We are looking for a solution that reduces the length of the path but still keeps it at least somewhat readable :)

Sorry, but I disagree: you do have to be very aggressive about reducing build paths. I think you are missing the big picture here and underestimating the reality of path length limitations.

I have worked with build systems for many years and always run into bugs related to path length, on all platforms. User accessibility, while important, is the least important concern when it comes to path length.

network shares (NFS, SANs, NAS...) often have a 250-character limit, sometimes imposed for performance reasons.
keep in mind that the build files themselves can have a very deep structure, which adds to the length of the build system's own path.

So if you want to avoid lots of unhappy users and lots of issues raised in the future now is the time to pick smart default settings that are less likely to break things later.

/home/gitlab-runner/b/{group-name}/{project-name}/

Do we really need to put the gitlab-hostname in the path? I guess 99% of users will use a single server. How about including it only when there is more than one, or using an ID instead (the ~/b2/... approach could also work)?

Do we really need $RUNNERID or $CONCURRENTID? I don't think so.

Here is how I see a simple layout:

/home/gitlab-runner/b/gimp/gimp-import/123/
/home/gitlab-runner/b2/team-a/myproject/456/ (where 2 means the 2nd GitLab server)

It would be great if we could specify builds_dir without the runner appending any junk to it. My requirement is to have a checkout in ..../myproject/branch. A config like this:

  executor = "shell"
  builds_dir = "myproject/$CI_BUILD_REF_NAME"

Results in myproject/master/runner-dcb16471-project-2581-concurrent-0/gitlab.com/...

For my use case, having SharedBuildsDir toggled via a config option would work. I'm willing to live with the consequences of multiple builders firing at the same time for the same branch (very unlikely to happen for us), and can always set concurrent = 0 if it becomes an issue.

@miskovic Thanks for the feedback. I'll think about it. Can you describe the use case you want to solve with that approach?

I'm trying to use the runner to set up my staging sites. Currently we use Jenkins and have it check out each branch into its own folder, manually providing the name of the branch we want staged. The folder name is significant because it allows nginx to find the correct code when loading http://branch-name.staging.myproject.com/.

My end goal is to have the runner triggered by GitLab CI on each commit and have staging sites for each branch "just happen"(tm).

In such a case, I would advise copying the content to a dedicated directory: first run the test suite steps, then, as a deploy step, rsync the content to the directory exposed by nginx.
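A minimal Python stand-in for that deploy step, assuming a hypothetical staging root and a simple rule that turns branch names into directory names; it uses `shutil.copytree` rather than rsync, so it is only a sketch of the idea, not the runner's behavior.

```python
import os
import re
import shutil


def deploy_branch(checkout_dir, staging_root, branch):
    """Copy a tested checkout into staging_root/<branch>/ for nginx to serve.

    The sanitising rule and staging_root are illustrative assumptions.
    """
    # Branch names may contain "/" (e.g. feature/login); flatten them so each
    # branch maps to a single, subdomain-friendly directory name.
    name = re.sub(r"[^A-Za-z0-9-]+", "-", branch).strip("-").lower()
    target = os.path.join(staging_root, name)
    # dirs_exist_ok (Python 3.8+) overwrites existing files, but unlike
    # `rsync --delete` it does not remove files deleted from the repository.
    shutil.copytree(checkout_dir, target, dirs_exist_ok=True,
                    ignore=shutil.ignore_patterns(".git"))
    return target
```

Keeping the copy separate from the runner's own build directory means the staging layout no longer depends on how the runner names its directories.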

Makes sense. I was starting to think in that direction but wasn't sure it was canonical enough. In my case I need the build folder in the right place even to run the tests, so I may as well rsync right after the checkout and take it from there. Thanks for the quick replies.

Status changed to closed

The final path for shared builds looks like this:

/home/gitlab_ci_multi_runner/builds/runner-short/0/group/repo/

In 0.5.0, gitlab_ci_multi_runner will be renamed to gitlab_runner for new installations.

If anyone is willing to test it, please try Bleeding Edge and report any bugs as new issues.

I should be able to give it a go. First, though, I would like to understand how multi-runners are supposed to be used. For example, in my case SSH is the way to connect to all machines, even Windows ones (Cygwin).

I am wondering whether it would be easier to install a single multi-runner instance on the GitLab server that runs builds via SSH on different machines. Is this a valid setup, or am I supposed to install a runner on each worker?

I also wonder what would happen if a worker, the gitlab-ci server, or the gitlab-ci-multi-runner is rebooted.

Please create separate issues for different questions and we will try to answer them.

Sure, I opened #57 (closed), a request to improve the documentation page by including the pros and cons of the different runner types.

BTW, the DEB download links do not work; I get Access Denied from Amazon S3.

@ayufan can you give an example of /home/gitlab_ci_multi_runner/builds/runner-short/0/group/repo/ with the sha1/tokens/numbers filled out?

@sytses Sure:

/home/gitlab_ci_multi_runner/builds/ae9f67c2/0/gitlab-org/gitlab-ci-multi-runner/

@ayufan Looks good!

The example above is 81 characters long, and since many systems have a 250-character limit, this will definitely cause problems.
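To put the 81 characters in context, here is a rough Python comparison of how much room each layout leaves for a tox interpreter path such as the one that broke pip earlier in this thread. The interpreter suffix `.tox/py34/bin/python`, the 127-character shebang limit, and the assumption that the runner's short id is the `9c5c3214` seen in the old path are all illustrative assumptions.

```python
import posixpath

# Assumed interpreter path inside a tox env, relative to the project dir.
TOX_PYTHON = ".tox/py34/bin/python"


def final_layout(builds_dir, runner_short, concurrent_id, group, repo):
    """builds_dir/<runner-short>/<concurrent id>/<group>/<repo>/"""
    return posixpath.join(builds_dir, runner_short, str(concurrent_id),
                          group, repo) + "/"


def headroom(project_dir, suffix=TOX_PYTHON, limit=127):
    """Characters left before project_dir + suffix exceeds limit (<0 = over)."""
    return limit - len(project_dir + suffix)


old = ("/home/gitlab_ci_multi_runner/tmp/builds/"
       "runner-9c5c3214-project-1329-concurrent-0/"
       "gitlab.com/carmenbbakker/schaakmat/")
new = final_layout("/home/gitlab_ci_multi_runner/builds", "9c5c3214", 0,
                   "carmenbbakker", "schaakmat")

for label, d in (("old", old), ("new", new)):
    print(label, len(d), headroom(d), headroom(d, limit=250))
```

For the same project, the old layout overshoots the 127-character budget while the final layout stays under it; a 250-character limit leaves far more room in both cases, though deep project trees eat into it.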

@sbarnea I think this is a good compromise between readability and the need for shortening. You always have the option to shorten it even further by defining builds_dir.

For deleting or handling long names, you could try the "Long Path Tool" software; many articles recommend it for this.

This is a peculiar limitation of Windows, which by default limits path names to 260 characters (MAX_PATH). You can shorten the file names, but that's just a temporary solution. You can also try GS Richcopy 360; it's paid, but it solved my problems related to this.

I would suggest trying the "Long Path Tool" program.
