
[WIP] GitLab pages (EE only)

Closed Kamil Trzciński requested to merge gitlab-pages into master

Fixes gitlab-org/gitlab-ce#3085

This is a pretty naive approach to building static pages on top of the artifacts support:

  1. When a job named pages is defined in .gitlab-ci.yml, the artifacts from that build will be used to serve static pages.
  2. The build job needs to put all served files in public/ and upload the artifacts to GitLab.
  3. GitLab detects this and fires a Sidekiq job to unpack the artifacts into the shared/pages/group/project folder.
  4. We then have an nginx config with dynamic virtual host support serving files from shared/pages.
  5. On every pages deploy we attempt an atomic update using a filesystem move operation.
  6. If no pages job is defined in .gitlab-ci.yml we inject a predefined job that is executed for the gl-pages branch.
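A minimal sketch of steps 3-5 in shell (the real implementation is a Sidekiq worker, not a script; every path here is a stand-in created with mktemp):

```shell
#!/bin/sh
# Hypothetical sketch of the unpack-and-publish sequence described above.
set -e

PAGES_ROOT=$(mktemp -d)     # stands in for shared/pages
ARTIFACTS=$(mktemp -d)      # stands in for the uploaded artifacts archive

# Simulate a build job that put its output into public/ and archived it.
mkdir -p "$ARTIFACTS/public"
echo '<h1>hello</h1>' > "$ARTIFACTS/public/index.html"
tar -czf "$ARTIFACTS/artifacts.tar.gz" -C "$ARTIFACTS" public

# 1. Unpack the artifacts into a scratch directory on the same filesystem...
TARGET="$PAGES_ROOT/group/project"
mkdir -p "$PAGES_ROOT/group"
SCRATCH=$(mktemp -d "$PAGES_ROOT/group/.tmp.XXXXXX")
tar -xzf "$ARTIFACTS/artifacts.tar.gz" -C "$SCRATCH"

# 2. ...then publish with a single rename, which is atomic on one
#    filesystem, so nginx never sees a half-extracted site. (Replacing
#    an existing deploy would first move the old directory out of the way.)
mv "$SCRATCH/public" "$TARGET"
rmdir "$SCRATCH"

cat "$TARGET/index.html"
```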

Example .gitlab-ci.yml:

```yaml
pages:
  image: jekyll/jekyll:builder
  script:
  - jekyll build --destination=public
  artifacts:
    paths:
    - public/
  only:
  - master
```

Pros:

  1. By using CI and Docker images we can build a static webpage with any tools; we are not limited to the stripped-down Jekyll setup as on GitHub Pages.
  2. Thanks to Shared Runners it will work out of the box on GitLab.com.
  3. By using Docker by default it will work out of the box on most installations.

Cons:

  1. We need to build the static page on CI, upload it to GitLab, and unpack it later - this is time-consuming
  2. We serve only static files; no dynamic content is allowed

This is a proof of concept and is missing:

  1. Custom domain names (a CNAME file?)
  2. A custom directory for static files; currently public/ is hardcoded
  3. Backup support
  4. Tests
  5. Documentation on how to use it
  6. Disabling symlink following for nginx
  7. The ability to define the domain on which the sites will be served
  8. A plan for handling custom domains (if we want to support them)

The code is not yet nice; I'll clean it up if we decide that this is the way we want to do it.

@sytses @jacobvosmaer @marin @dzaporozhets What do you think?

It's based on the build artifacts MR.

Activity

  • Kamil Trzciński Title changed from [WIP] Gitlab pages to [WIP] GitLab pages

  • Kamil Trzciński Target branch changed from master to artifacts

  • I think it is the perfect approach!

    "If no pages job is defined in .gitlab-ci.yml we inject a predefined job that is executed for the gl-pages branch." => I don't think we should do this, for three reasons:

    1. A lot of magic; it's hard to see and change what happens.
    2. You now have two ways of making pages, which makes it unclear which one is used, makes it harder to document, forces the user to decide, etc.
    3. I don't like the approach of having a branch that contains different content than the rest of the repository. I think the content to build should come from a subdirectory of the repository that is also present in master.
  • BTW it is essential that NGINX uses a different FQDN to serve the pages. It can be the same FQDN for all pages, but it can't be the same as the FQDN for GitLab. This is to prevent XSS attacks.

  • Author Maintainer

    BTW it is essential that NGINX uses a different FQDN to serve the pages. It can be the same FQDN for all pages, but it can't be the same as the FQDN for GitLab. This is to prevent XSS attacks.

    This is how it is designed. The change exposes content under: http://<group>.some.address/<project>
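The dynamic virtual host mentioned above could look roughly like the following nginx server block. This is a hypothetical sketch, not the actual PoC config: the server_name, the capture names, and the root path are all assumptions.

```nginx
## Hypothetical sketch of the dynamic vhost, not the actual PoC config.
## The wildcard server_name captures the group; nginx then serves
## /<project>/... straight from the unpacked artifacts on disk.
server {
  listen 80;
  server_name ~^(?<group>[^.]+)\.some\.address$;

  ## shared/pages/<group>/<project> holds the unpacked public/ contents
  root /home/git/gitlab/shared/pages/$group;

  location / {
    try_files $uri $uri/index.html =404;
  }
}
```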

    If no pages is defined in .gitlab-ci.yml we inject predefined job that is executed for gl-pages branch

    I agree with you on this, but it is also popular to have a branch to which pages are automatically deployed (as on GitHub). However, I'm not attached to this concept.

  • +1 for no special gl-pages branch. We already have .gitlab-ci.yml.

  • That is nice! It is something I have really missed in GitLab.

    I especially like the concept of building static pages using any tool. With this approach it would be possible to use, for example, a custom script that uses pandoc to combine LaTeX templates with Markdown content.

    And I also believe that the cons described by @ayufan are in fact pros, because:

    1. Building and uploading to GitLab

      As I understand it, it makes it possible to upload not only static pages but also static assets generated by the build process - for example, documentation in PDF created using pandoc.

    2. Serving only static content

      This is good, because serving static content from nginx is extremely efficient; nginx handles such requests very well. And, above all, it is also very secure.

    Moreover, this concept is (almost) compatible with what I was thinking about: a tool that could possibly replace Jekyll and would be better suited for generating documentation in HTML that can be converted into PDF easily.

    In large organizations documentation is very important, and currently (according to my research) there are no tools available that adapt easily to change management processes and provide pages/documentation/PDFs automatically. I have thought about this a lot recently; I can write more about the concept if you are interested.

  • I thought about it a lot recently, I can write more about this concept if you are interested.

    I am interested, can you write about it?

    @ayufan I like the POC. Not 100% sure about storing the pages in shared, although they are uploads so they fit the concept. I have a few comments on the actual implementation but will hold off since it is a POC.

  • @marin I was thinking about a solution that would help handle a project's documentation in a flexible way.

    The most important features would be:

    1. Building static content:

      1. generating static HTML pages (gl-pages build --html)

      2. generating PDF document from same source (gl-pages build --pdf)

    2. Serving static content (gl-pages serve --html)

    3. Serving a REST API allowing access to documentation units/artifacts (gl-pages serve --api)

    Source documentation would be stored in the project repository itself. A strict directory structure shouldn't be enforced - a lot of freedom/configurability needs to be supported in this regard.

    This tool should support LaTeX templates and Markdown files. I wasn't able to find an existing solution that makes it possible to generate rich HTML, with internal links and subpages, and yet generate a single PDF from the same source. However, I believe it is possible with LaTeX, Markdown, and some kind of additional annotation engine built into the tool.

    It may be quite an interesting feature to provide an API that makes it possible to access parsed documentation as easily as invoking http GET host:port/docs/my_module/introduction.json. It should, of course, be possible to access different formats, like introduction.html or introduction.markdown.

    Making GitLab::Pages mountable in Rails (like any Rack application) would also be a plus.

    So a tool like this would provide the means to:

    1. Use it with GitLab Pages to generate static content and PDFs
    2. Use it after deploying your application to serve documentation on http://app/docs
    3. Use it after deployment to serve a docs API that can then be consumed by the application itself to provide (for example) context help to your users
    4. Use it during development to generate docs you can send to your client after a sprint/iteration
    5. Keep your docs always up to date, as you are serving docs from a repository branch/tag/commit

    And all this in a single Ruby gem.

    What do you think about it? I think this would be a feature that fits nicely into GitLab EE, as it would be needed mostly by enterprise users / in bigger projects.

  • Author Maintainer

    Have a few comments on the actual implementation but will hold off since it is a POC .

    Thanks. The implementation will change :) No point in reviewing it now.

  • Author Maintainer

    And this is all in single Ruby Gem.

    I would like to keep it simple and not lock ourselves into one chosen solution. Also, serving custom pages from GitLab has potential XSS vulnerability problems.

    The proposal allows you to use any static site generator you can think of. This requires serving pages under a separate URL, but that shouldn't be a problem.

    Use it during development to generate docs you can send to your client after a sprint/iteration

    You can download a build artifact with the docs.

    Make your docs always up-to-date as you are serving docs from repository branch/tag/commit

    The CI building process ensures that docs are always up to date.

    From my point of view, I think you are more into refactoring the current wiki (i.e. generating the wiki automatically from docs/ in the repository?), not GitLab Pages. GitLab Pages as I see it is a more general solution, not locked to documentation only.

  • @ayufan The approach I described does not conflict with your vision, in my opinion.

    My approach simply replaces Jekyll with a more flexible tool, nothing more. But you can - as you described - use whatever tool you like, and you can still choose Jekyll.

    I like your solution for GitLab Pages very much, as it gives a lot of freedom and flexibility. Confining ourselves to one chosen solution is not what I meant.

    What I was talking about is creating a separate tool that can be used independently, just like Jekyll, and that may be used with GitLab Pages (but that is a matter of choice).

    Moreover, this is merely an introduction to what I have been thinking about recently, as @marin wanted me to write about it. I'm not necessarily saying that this is something we should actually create.

    Edited by Grzegorz Bizon
  • Author Maintainer

    What I was talking about is creating a separate tool that can be used independently, just like Jekyll, and that may be used with GitLab Pages (but that is a matter of choice).

    Ok, I get it now. Thanks for the clarification :)

  • @grzesiek I think the confusion stems from calling your command gl-pages. As you indicated your gem would be a replacement for Jekyll/Middleman, not a replacement for GitLab Pages. If you want to discuss it further I recommend opening another issue.

  • @sytses Yes, I think you are right. Maybe a name like Gitlab::Docs would be better, as this idea is related more to documentation than to pages.

    @sytses, @marin, @ayufan do you think this concept of gl-docs is worth discussing in a separate issue, or should we leave it for now?

  • Sid Sijbrandij Target branch changed from artifacts to master

  • Author Maintainer

    @grzesiek I'm not sure that we need it now, but feel free to create an issue so we have it in the backlog.

  • @grzesiek Can you create the issue?

  • mentioned in issue #3479 (moved)

  • Making GitLab::Pages mountable in Rails (like any Rack application) would also be a plus.

    I am jumping into the middle of a discussion here, but: I don't think we should be serving static pages / user websites with Unicorn.

    It makes more sense to me to build static content with a CI runner, send the result to GitLab as 'artifacts', and then serve the artifacts from a special 'user content' domain using gitlab-workhorse or NGINX. I probably just described @ayufan's plan.

  • Author Maintainer

    @jacobvosmaer This is what is implemented in this PoC :) It uses nginx for that.

  • @jacobvosmaer You are absolutely right; my idea wasn't about mounting static pages in GitLab but about giving users the possibility to mount pages in their own application using a standalone tool/gem like Gitlab::Docs (see above)

  • I would really like to use something like harpjs.

    I've seen that there's already a Docker project using harp and nginx, and here it is written that it can be done using only a little bit of hardware. Could that be possible?

    Using the right script, the pages could be compiled to static files once committed, right?

    Thank you!

  • Author Maintainer

    @bomba You will be able to use it as long as your page can be served statically.

  • @ayufan thank you! They could be. harp also provides a server application, but it is not needed if the files are compiled/built. harp also needs to be installed as root (or a sudoer).

    My question is:

    If I run "sudo harp compile" in the project folder, the files are compiled to static pages in a "www" subfolder. Is it possible to use this script in GitLab's build process and serve the pages/static files placed in the "www" subfolder?

    thanks!

  • Author Maintainer

    @bomba Yes.
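To make that exchange concrete, a hypothetical pages job for harp could look like the following. The node image, the npm install step, and the mv into the hardcoded public/ directory are all assumptions, not tested config:

```yaml
pages:
  image: node:4                # hypothetical image with npm available
  script:
  - npm install -g harp        # harp is distributed as an npm package
  - harp compile               # writes static files to www/ by default
  - mv www public              # the PoC serves public/ only, so rename
  artifacts:
    paths:
    - public/
  only:
  - master
```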

  • People would like to see functionality like this work with https: https://news.ycombinator.com/item?id=10635769

  • Responding to pros/cons from the description.

    We need to build static page on CI and upload it to GitLab and unpack it later - this is time consuming

    I don't think this is bad. Only if it takes hours before a shared runner on gitlab.com is available (which can happen these days), but that should be solved by adding more shared runners, not in the design of the 'pages' feature.

    We serve only static files, no dynamic content allowed

    That is for later; static files are the first step.

    Disable symlink following for Nginx

    This is very important, I agree. Also the custom domain support, as @sytses mentioned. Perhaps we should consider not even defaulting to the domain of the GitLab server, because of the security risk (XSS, I think) GitLab servers would otherwise be exposed to.

    @ayufan what about serving static pages with gitlab-workhorse?

  • Author Maintainer

    @ayufan what about serving static pages with gitlab-workhorse?

    I think it makes sense, but maybe not as part of gitlab-workhorse - rather as a new app that would also handle the virtual hosts and SNI certificates.

    Having SNI support (along with support for any static generator) would be a big advantage of GitLab Pages. It would be easy to implement in Go.

    Why a new app? I expect that GitLab Pages could get a lot of load, and that might affect the performance of GitLab Workhorse.

    /cc @jacobvosmaer

    Edited by Kamil Trzciński
  • Author Maintainer

    @jacobvosmaer I think we mostly agree on the concept. Do you have any thoughts about TAR unpacking?

    1. GitLab Pages cannot use the domain of the GitLab server in any case, for security reasons.
    2. I see a suggestion to use a custom Go app to serve the pages; why not use Nginx?
    3. Can we use build artifacts to upload/download the generated pages?
    4. FYI https://gitlab.com/jubianchi/labpages is an existing attempt at GitLab Pages; no idea about the approach
  • Author Maintainer

    GitLab Pages can not use the domain of the GitLab server in any case for security reasons

    I agree

    I see a suggestion to use a custom Go app to serve the pages, why not use Nginx?

    The PoC uses that. Using a Go app would allow us to have automatic support for SNI, which some people are asking about. However, it requires more work. For the first iteration we can stick with Nginx.

    Can we use build artifacts to upload/download the generated pages?

    The proposal uses it.

    labpages no idea about the approach

    It looks like a Rails app that receives hooks, updates a deployment directory, and uses nginx to serve the content.

  • I've done something similar in Go, and the reason for using it instead of nginx is also that you can code some logic into it - for example, to do dynamic host mapping (based on CNAMEs, for example).

    It's a lot easier to do that in Go than to rewrite vhosts and reload nginx for every single page

    Edited by Gabriel Mazetto
  • @sytses can we use gitlab.io? I see that you have it registered. Using another domain will prevent XSS and will require a new wildcard certificate if we want to have something like <gitlab_username>.gitlab.io

    If we plan to support CNAMEs in the future, we could embrace https://letsencrypt.org/ for that

    1. Build artifacts unzip should be safe (prevent directory traversal)
    2. GitLab Pages will have a separate domain name
    3. We'll ship with Nginx now; three options for the future: 1. Nginx (already done, but dynamic SNI is hard), 2. Go (SNI is easy, but more work than nginx), 3. HAProxy dynamic https termination (not sure if feasible)
    4. We'll use the gitlab.io domain name for now
    5. We'll need to beef up our infrastructure for this (DDoS protection)
    6. Would love to support https://letsencrypt.org/ but SNI support comes first
    7. @brodock can help with building the Go PoC if he wants
    Edited by Sid Sijbrandij
  • let's do it 👍

    Can we put something like CloudFlare in front of it, to protect it from DDoS and enable a CDN (as it's all static content)?

    Edited by Gabriel Mazetto
  • @brodock my thoughts exactly

  • @ayufan somebody looked into tar extraction and directory traversal not so long ago: http://www.openwall.com/lists/oss-security/2015/01/07/5

    Short version: use tar -x, no -P, and hope there is no unfixed old bug in the tar on the server doing the extracting. Some tar implementations did the wrong thing when extracting symlinks.

    busybox had a symlink directory traversal bug fixed only this October: http://www.openwall.com/lists/oss-security/2015/10/21/4

    libarchive (the tar used by OS X and FreeBSD; I think it may also be used for cpio in RPM) had some overflow bugs fixed not long ago.

    In short, extracting untrusted tar files carries risk. But I cannot think of a better solution, and the open source tar implementations have been probed for security holes recently.
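A hedged sketch of that advice: list the untrusted archive first and refuse to extract when any member path could escape the target directory. GNU tar without -P already guards against this at extraction time; the explicit check is defense in depth against buggy tar implementations. All paths are demo placeholders.

```shell
#!/bin/sh
# Hedged sketch: reject archives whose members could escape the target.
set -e

WORK=$(mktemp -d)
mkdir -p "$WORK/upload" "$WORK/extracted"
echo evil > "$WORK/payload"

# Build a "malicious" archive whose member path climbs out of the
# extraction directory (-P keeps the ../ prefix during creation).
( cd "$WORK/upload" && tar -cPf bad.tar ../payload )

# Refuse members that are absolute or contain a '..' component.
if tar -tf "$WORK/upload/bad.tar" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
  echo "rejected: archive contains unsafe paths"
else
  tar -xf "$WORK/upload/bad.tar" -C "$WORK/extracted"
fi
```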

  • @sytses about using gitlab-workhorse to serve static pages instead of NGINX: I like this idea because gitlab-workhorse (the Golang HTTP server) should be able to handle it just fine, and because in my opinion it is a source of problems to have critical GitLab application logic in NGINX configuration files.

    The NGINX configuration file is 'owned' both by the administrator deploying GitLab and by GitLab itself. Some people refuse to install NGINX, or want to do their own special things inside the NGINX configuration. By pulling GitLab-specific parts of routing, serving static files etc. into gitlab-workhorse we create a better separation between the responsibilities of the GitLab developers and the system administrator deploying GitLab.

  • Author Maintainer

    @jacobvosmaer Thanks, that's what I needed :)

  • Author Maintainer

    Good, so it appears we only need to finish some missing bits and we have the first iteration of a working GitLab Pages.

  • @ayufan I also think it is good that we are using uncompressed tar. That should make it harder for somebody to create a 'tar bomb': a file that looks small when archived but fills up your hard drive when extracted.

  • Author Maintainer

    @jacobvosmaer In the current PoC the tar is compressed.

  • @ayufan OK then I just added another thing to think about to your list: tar bombs.

  • If we use p7zip with either .zip or .7z, can't we overcome most of the problems here?

    I've also read about tar having issues with Unicode filenames.

  • I've done some investigation regarding the archive format and compression...

    Here is what I found:

    https://www.gnu.org/software/tar/manual/html_section/tar_68.html

    gnu
    Format used by GNU tar versions up to 1.13.25. This format derived from an early POSIX standard, adding some improvements such as sparse file handling and incremental archives. Unfortunately these features were implemented in a way incompatible with other archive formats.
    
    Archives in `gnu' format are able to hold file names of unlimited length.
    
    oldgnu
    Format used by GNU tar of versions prior to 1.12.
    
    v7
    Archive format, compatible with the V7 implementation of tar. This format imposes a number of limitations. The most important of them are:
    
    The maximum length of a file name is limited to 99 characters.
    The maximum length of a symbolic link is limited to 99 characters.
    It is impossible to store special files (block and character devices, fifos etc.)
    Maximum value of user or group ID is limited to 2097151 (7777777 octal)
    V7 archives do not contain symbolic ownership information (user and group name of the file owner).
    This format has traditionally been used by Automake when producing Makefiles. This practice will change in the future, in the meantime, however this means that projects containing file names more than 99 characters long will not be able to use GNU tar 1.28 and Automake prior to 1.9.

    If we use tar >= 1.13.25 we are good to go with regard to the file name length limitations.

    I've also made some tests with 7zip's own archive format, which could be used as an alternative when we don't have tar >= 1.13.25. It's a bit slower but has support for encryption (which may or may not be useful).

    .7z can work with many compression algorithms; by default it uses LZMA but it supports others.

    Here is a small benchmark I did to validate .7z as an alternative:

    ```
    generating 10 random files with 2 MB each

                            user     system      total        real
    tar+gzip                0.000000   0.010000   0.690000 (  0.727700)
    7z default              0.000000   0.000000   5.600000 (  3.015612)
    7z optimized            0.000000   0.000000   3.060000 (  3.231693)
    7z optimized + deflate  0.000000   0.000000   0.840000 (  0.905711)
    ```

    Code to run it: https://gitlab.com/snippets/11184

    Edited by Gabriel Mazetto
  • Author Maintainer

    @brodock

    I would stick to tar, since decompression is easily implemented in the app.

    @jacobvosmaer

    A tar bomb can be handled by splitting the gunzipping and the untarring. If we limit the gunzipped stream passed to tar, we control how big the sites we handle can be.

    An example of how it can be handled with a shell command:

    ```shell
    gunzip -c test.img.tar.gz | cstream -n 100 | tar ...
    ```

    The early-closed stdout makes gunzip return 141.
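A hypothetical end-to-end version of that pipeline, using head -c in place of cstream (same idea: truncate the decompressed stream at a byte limit). The file names and the limit are made up for the demo:

```shell
#!/bin/sh
# Demo: cap the decompressed stream so a "tar bomb" cannot fill the disk.
WORK=$(mktemp -d)
cd "$WORK"
LIMIT=1024                                  # max decompressed bytes accepted

# A small "tar bomb": 10 MB of zeros compresses down to a few KB.
dd if=/dev/zero of=big.bin bs=1024 count=10240 2>/dev/null
tar -czf bomb.tar.gz big.bin
mkdir out

# Record gunzip's exit status in a file, since the pipeline's own exit
# status is tar's. 141 = 128 + SIGPIPE: gunzip was killed because head
# closed the pipe after LIMIT bytes, so tar only ever sees LIMIT bytes.
( gunzip -c bomb.tar.gz; echo $? > gunzip.status ) \
  | head -c "$LIMIT" | tar -x -C out 2>/dev/null

echo "gunzip exited with $(cat gunzip.status)"
```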

    Edited by Kamil Trzciński
  • @ayufan yes I was thinking the same about how to stop the tar bombs.

  • This will go into EE, not CE as discussed in https://dev.gitlab.org/gitlab/organization/issues/474

  • Sid Sijbrandij Title changed from [WIP] GitLab pages to [WIP] GitLab pages (EE only)

  • Is there a chance that this will find its way into CE at some point, then?

    Because it wouldn't speak in GitLab's favor if a feature that is free for everyone to use on GitHub were limited to paid customers on GitLab

  • @neico everyone will be able to use this feature for free on GitLab.com (which runs GitLab EE). Meaning you can host your personal site completely at our cost, similar to GitHub.

    Only if you run a self-hosted instance will you be required to use GitLab EE for this functionality. We think this is mainly interesting for organisations with many resources to share, which is why we chose to make it EE exclusive.

    Edited by 🚄 Job van der Voort 🚀
  • Hello all!

    I'm the author of the labpages app linked earlier.

    Basically it is something I wrote (actually ported and improved) to learn Ruby/Rails, and it has been used internally where I work ever since.

    The app works great but is quite simple: it's a plain webhook listener which triggers Sidekiq jobs to deploy static content.

    It uses plain old git to clone the repository and fetch the gl-pages/gh-pages branch (yeah, at that time I was already thinking about users moving from GitHub ;)). It then builds the website using Jekyll if a configuration is found, or serves the content as-is.

    Websites are then accessible using URLs like .labpages.local/.

    The only difficult things I had to deal with were storage sharding and request routing. I did not implement custom domain names.

    Nothing complex nor advanced: what you are planning here seems more powerful, easier for end users thanks to GitLab CI, and easier to operate (I had many edge cases with deploy keys, moved repositories, and so on)

    Big 👍 for this feature. Let me know if I can help for anything ;)

  • Kamil Trzciński Status changed to closed

  • mentioned in issue gitlab#3927
