I'm thinking about something different. Maybe it would be nice to have the ability to browse the content of files on the right side. You click the image, you see the preview on the right side. You can do it without needing to download the file.
I can imagine that some people will start using artifacts to upload logs, Capybara-generated pages (.png), or other files that make debugging easier.
It's currently not possible, because artifacts are sent only on success, but it may be useful in the future.
@ayufan Yes, I agree this may be a useful feature. It would then be possible to embed pictures into READMEs, etc. And this is relatively easy to introduce.
@DouweM Yes, I think we do, but we would need to determine content type using an extension, which may not be accurate in some cases. But, maybe, it is enough.
Removing the Compress to column in gitlab-org/gitlab-ce!2509. This issue will be used to track implementation for displaying the content of a file in the browser (like the git blob view), so merging !2509 (merged) will not close it.
Grzegorz Bizon: Title changed from "Decide which metadata fields are useful to display in build artifacts browser" to "Displaying content of a artifact inside a browser instead of downloading it"
Grzegorz Bizon: Title changed from "Displaying content of a artifact inside a browser instead of downloading it" to "Displaying content of an artifact inside a browser instead of downloading it"
...but we would need to determine content type using an extension, which may not be accurate in some cases. But, maybe, it is enough.
@grzesiek Perhaps using an interface to libmagic (e.g. filemagic, or mimemagic) to determine the file's MIME type would be best.
For example, I would be interested in viewing linker-generated map files right in the artifact browser. These are just plain-text files so there's no reason they need to be downloaded to view.
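For what it's worth, a minimal sketch of what that detection could look like with the mimemagic gem mentioned above (the artifact path here is hypothetical):

```ruby
require "mimemagic"

path = "artifacts/report.html" # hypothetical artifact path

# Cheap guess from the extension alone (the approach worried about above):
by_ext = MimeMagic.by_path(path)

# Sniff the actual bytes, libmagic-style:
by_magic = File.open(path) { |io| MimeMagic.by_magic(io) }

# Prefer content sniffing, fall back to the extension, then to a safe default:
content_type = (by_magic || by_ext || "application/octet-stream").to_s
```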
It would be nice to display the contents of an artifact inside the browser, like the git blob view.
This was the most interesting part of the ticket for me. In particular, for HTML and image assets, just show them in the browser. A lot of build tools support generating HTML reports (e.g. Behat/Cucumber, code coverage tools), and accessing these without downloading from the artifact area is helpful.
Related to this, it would be helpful if relative links between files in HTML documents still worked. In practice, this means mapping the folder structure directly to the artifact URL structure, which appears to already be the case, so you should be fine. This will keep image references in HTML files working.
Yes, please! JUnit tests output to build/reports/test, and while I can publish the artifacts, the build/reports/test directory has an HTML representation of the test results, but the Content-Disposition header in the response is keeping it from functioning in a usable way. Browsers are smart; I'm not sure why the header is necessary in the first place.
Comparing the two discussions: GitLab does have functionality to display static content from a different domain (Pages). Could the same machinery be used to safely render artifacts, as a broader use case?
+1 for this one; it's keeping us from moving to GitLab CI from Jenkins.
As a side note, issue #13227 (moved) is a more flexible solution to what is essentially the same issue, and would work as well
FYI: There are a lot of users essentially clamoring for this feature, but doing so on the much more complex issues (#18664 (closed) and #13227 (moved), amongst others). I think having this would help satisfy people like myself, who really just want to be able to browse generated HTML reports without leaving the browser.
+1 vote
We are using GitLab and we would like to view the static content created by code coverage tools (PHPUnit, for example) from the artifacts section, without downloading it or setting up Pages.
I really wish GitLab would stop pretending like Pages is a reasonable workaround for this issue.
It's EE-only. Yes, there need to be EE-only features to continue to fund GitLab development. But when GitLab already supports collecting these files, paying for an EE subscription and adding a ton of configuration simply to see them is overkill.
There's only one visible version of a Gitlab Pages published page. Obviously, users will want to see their test/coverage results for branches that have not yet been merged to master - that's the whole point.
I understand that users tend to pick a feature that they really want, and then demand that GitLab devs implement it. But if I had to pick a single feature that I personally see on the issue tracker that people want, this is it.
Honestly, I don't think this feature would be much work for a moderately experienced Rails developer. As I mentioned, a first step would be to simply use libmagic to set the Content-Type header in the response, and let the browser decide whether to display or download the file. I see no reason to force the Content-Disposition: attachment header at all. In the future, you could get fancy and display stuff in an iframe, or with custom renderers for specific file types. But for now, avoiding the painful step of downloading a file just to see a report would be a huge win.
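To make the suggestion concrete, here is a rough sketch of that behaviour in a Rails controller. This is not GitLab's actual code path (which goes through Workhorse), and `safe_artifact_path` is a hypothetical helper:

```ruby
require "mimemagic"

class ArtifactsController < ApplicationController
  def file
    # Hypothetical helper that resolves the requested path inside the
    # artifacts directory and rejects traversal attempts.
    path = safe_artifact_path(params[:path])

    # Sniff the content type instead of forcing application/octet-stream.
    mime = File.open(path) { |io| MimeMagic.by_magic(io) } || MimeMagic.by_path(path)

    # disposition: "inline" lets the browser decide whether to render or
    # download, instead of the forced Content-Disposition: attachment.
    send_file path,
              type: (mime || "application/octet-stream").to_s,
              disposition: "inline"
  end
end
```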
@jonathon-reinhart +1. Though displaying page content right on the front page would be nice, the change you suggest would deliver most of the value with the least work.
Another really nice, and probably really cheap, change would be an endpoint that links to the latest version of an artifact. As a workaround, I tried to put links to the most recent artifacts into the README.md, which would make it easy to find things. But surprisingly, I found there is no way to create a static link that will always point to the latest version of an artifact. This is possible in Jenkins.
It's all very frustrating. We currently use Jenkins and I'm trying to champion moving to GitLab. This missing feature causes nearly every dev I meet to throw up his/her hands and say "well, we can't live without that; viewing Sonar and coverage is kind of like, the most important thing!"
Pages doesn't have this issue because it's on a separate host.
I am not saying that we shouldn't do this (it's in our %Backlog and it's also requested by our EE customer), but it's not that simple and easy to implement.
@godfat You're pointing to the solution: make a separate host for the artifacts; putting them in a common location and having us configure another virtual host (the name of which we supply you to make the links work out) alleviates your security concerns.
In the meantime, I may just have Apache remove the Content-Disposition header to solve the problem myself.
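For anyone wanting to try the same thing, a minimal sketch of that Apache workaround might look like this (requires mod_headers; the path is hypothetical and this is untested):

```apache
# Strip the Content-Disposition header GitLab adds on artifact responses,
# so the browser can render the file inline.
<Location "/myorg/myproject/builds">
    Header always unset Content-Disposition
</Location>
```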
Thanks guys. Yes, I should have clarified: there is no way to get a link to the latest artifact that works from within GitLab CI itself, without GitLab Pages, another host, or other such configuration.
Would it be possible to mitigate cross-site scripting risk by setting an appropriate Content-Security-Policy instead of Content-Disposition?
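The idea would presumably be something like the following sketch. Whether this is actually sufficient is exactly the open question (see the discussion of sandbox limitations further down); the specific policy is illustrative only:

```ruby
# Serve the artifact inline, but with a restrictive policy. The `sandbox`
# CSP directive applies iframe-style sandboxing to the document itself.
response.headers["Content-Security-Policy"] =
  "sandbox; default-src 'none'; img-src 'self'; style-src 'self'"
response.headers["Content-Disposition"] = "inline"
```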
@lksv's issue #23618 (moved) is closely related to my #13227 (moved), but these are both very "big" goals. I think no matter what is implemented there, we will always have users with the desire to simply view artifacts directly in the browser.
```nginx
# /etc/gitlab/nginx-tweaks.conf

# letsencrypt ftw
location /.well-known/acme-challenge/ {
    root /var/www/html/;
    try_files $uri $uri/ =404;
}

# workaround for https://gitlab.com/gitlab-org/gitlab-ce/issues/10982
# because it's very painful not to be able to see Robot Framework HTML reports
# with embedded screenshots directly in the browser (and if you download, the
# filenames get suffixes because of conflicts and then embedded links break)
location ~ /(myorg)/(myproject)/builds/[0-9]+/artifacts/file/(robottests/output/) {
    proxy_hide_header content-disposition;

    # and we have to duplicate this bit from the main "location /" part
    # because I don't know nginx
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-Ssl on;
    proxy_pass http://gitlab-workhorse;

    location ~ /(myorg)/(myproject)/builds/[0-9]+/artifacts/file/(robottests/output/(client|server)\.log) {
        # because application/octet-stream files are downloaded, not displayed in the browser
        add_header content-type text/plain; # OH COME ON
        proxy_pass http://gitlab-workhorse;
    }
}
```
You will want to replace the (myorg), (myproject) and (robottests/output) parts to match your projects and file paths. I was being excessively cautious, perhaps (although I do trust all the committers for this project not to inject malicious javascript).
Is there any update on this? Especially for self-hosted instances, you could just add a toggle in /admin to not send Content-Disposition, with a security warning next to it or something. We trust what we commit, and we're moving our entire flow to GitLab as we speak, without realising this was an issue.
Otherwise we'd have to implement ways to expose them while still being secured behind a login, which would break the cohesive experience we were going for with GitLab.
P.S. Isn't this missing the "feature-request" label?
@jonas1 See the workaround right above your comment. It uses NGINX to strip the Content-Disposition header added by GitLab. This doesn't change the fact that you still have to be auth'd to see the artifacts.
@JonathonReinhart I realize that; however, we're using githost.io, which doesn't allow us direct tinkering with the system. Additionally, this should be a trivial change :)
@jonas1 - Yes, it seems it should be a trivial change, but "we have a workaround", and apparently GitLab (even for paying customers) is not interested in making trivial changes if there exists a half-assed workaround. I'm not suggesting users should stop posting workarounds, because they do create a stopgap solution. The downside is that the stopgap becomes permanent and the GitLab response becomes easy: "we have a workaround, see 'here' or 'here'". They then apparently go back to working on features they enjoy developing instead of features we would all enjoy using.
@anton-akhmerov That MR appears to be for using "blob viewer" to look at images and PDFs. If it doesn't allow one to view generated HTML documents (along with images, CSS, etc), then this issue is not closed.
Any updates on this? This would be a great feature to have, especially when we deal with Protractor or WebdriverIO e2e tests, which produce Mochawesome, Jasmine, or Allure reports: static HTML report pages.
@markglenfletcher The new changes in v9.2 for viewing artifacts within the browser do not support HTML files. My job runs and generates code coverage reports (static files in HTML format, with JS and CSS), and I still cannot view the contents.
@leomyx I am having the same issue; that is what I was trying. There is an option to download, but nothing to view. GitLab Pages could be a good option, but it is not sensible. I can even push my reports to the wiki, but that is not a good way to show reports either.
I see that GoCD has an option to create a custom tab for such reports.
Is there any plugin for GitLab which can help view artifacts with static HTML reports?
Here is the link. Instead of displaying the file in the browser, it says something like the below:
The source could not be displayed because it is stored as a job artifact. You can download it instead.
The reason you are seeing this is that HTML is not rendered elsewhere in GitLab either. The reason for this is the need for sanitization, since otherwise it's a big security loophole.
Why are there no such security concerns with GitLab Pages? If we push the HTML files as pages, we get a way of browsing the HTML files. The problem is that Pages are public and we cannot restrict them to team members only, and some of us may not want all of our project's artifacts to be public.
This is a rightful concern, and using a separate domain is the only reliable way. And that also points to an easy mechanism for solving this problem: reuse the Pages domain. You may have to use a specific hostname ("assets.acando.io" or "assets-22691280758786573496.acando.io", or even a user-configurable hostname) to avoid conflicts with existing projects, but as long as assets are served through the Pages domain, you're home free.
@anton-akhmerov and @elygre are right: We cannot render user HTML on the main GitLab domain.
Using assets.gitlab.io if a Pages domain is configured is an interesting idea, especially since assets is already forbidden as a group/user name, so this will not conflict.
This feels like an "Abbott and Costello" skit that always comes back to Pages ("Who's on first? Pages. Who's on second? Pages. Who's on Third...").
Pages has "no extra configuration"? We aren't using Pages now so I believe there will be extra configuration otherwise direct me to the documentation for the non-configuration Pages setup (oh, and how to enable this Pages setup to support multiple parallel builds on multiple parallel branches).
As with other self-hosted GitLab customers, we aren't concerned with limitations imposed on gitlab.com. We want to render our own generated HTML on our own server, and gitlab.com should not be concerned with that. We trust ourselves and accept the consequences if that trust is misplaced... and we've been doing just that for nearly 10 years on a Jenkins deployment that is still operational and has yet to be taken down by auto-generated doxygen code.
What is the official justification/reason for "We cannot render user HTML on the main GitLab domain"? In other words, is this an official gitlab.com policy or an unofficial discussion-thread stance? If it is an official policy, it explains the reluctance to create an integrated feature instead of a good-enough-for-us-good-enough-for-you non-integrated Pages workaround.
I'm not a web developer but I find it surprising there is no sandbox available for safely rendering static HTML. We have self-driving cars that can safely navigate public roads but can't safely render static HTML?
To address gitlab.com concerns with rendering HTML, suggest an ability to disable the integrated HTML viewer (i.e. don't allow enabling it on gitlab.com).
We are using GitLab EE, and we have our own domain and servers to host it. It is for internal use, so I think the feature should be available for EE customers.
@ayufan just a remark: if a sandboxed iframe is secure enough, it could also then be used for svg and pdf, therefore simplifying the codebase. I don't understand what would control access to the assets that way though.
Now, the suggestion is not to use the GitLab Pages feature, but only the extra pages-domain. As you point out, GitLab Pages cannot handle the functional requirements of multiple builds and branches. The idea is only that serving static assets require a separate domain, and Pages already has that.
At least if it is set up. If you do not use Pages, then this feature requires some setup. True.
As for serving trusted content, I guess I understand the GitLab stance that this is a no-no, even for private installations. An extra domain is both easy and cheap to get, and GitLab Pages is then fairly easy to set up. I know, I managed to do it myself :-). To me, the trade-off between having a separate domain (but reused) and the risks involved indicate that a separate domain is worth it. YMMV.
@ayufan The one-time URL would also have to create a session of some sort. Many of these reports include other assets (scripts, images) from the same protected location, and I would also expect to be able to navigate links from the original asset, also into the protected location.
@ayufan The problem with sandboxed iframes is that older browsers that do not support the feature will render the iframe without sandboxing. Are we comfortable implementing a feature that leaves X% of users vulnerable? Or can we determine support and enable/disable it per user?
There's also the problem of authentication which I think you've already addressed.
I still feel like a separate domain is the right way to do this. Even if it's a bit of a hassle to configure.
@briann I'm worried that a separate domain will not be implementable for most on-premises installations, only for us; so we would solve a problem for us, but not for our customers. If we can reliably detect sandbox support, we can simply require a quite new browser, which in the end is a good idea, as it improves the security of our users overall.
I think the main question is which browsers are supported, compared to how many of our users actually use them. If it is over 95%, I don't think it is worth the hassle of pushing the separate domain into the application stack.
@markpundsack @bikebilly What is your opinion on limiting support for this to new browsers only? (We don't yet know exact numbers, but I expect this covers the majority of users.)
I'm with @ayufan here. To me, using a separate domain has always felt like a hack, a workaround for a missing browser feature. There's a lot of extra complexity involved with setting up a separate domain. This might be especially painful in enterprise scenarios where the person administering the GitLab server is not the person handing out DNS names.
@DouweM If Pages can be private, that would definitely solve it, IF Pages also gets supported on githost.io; otherwise you're leaving hosted EE customers out of something that would be most useful for them (as they're less likely to allow public viewing of such pages).
Besides this, I really think it should be considered to allow it to be hosted on the same domain for EE, with a security warning noting the potential risks.
I'd agree with @bikebilly in that the whole purpose of iframe sandboxing is to provide this type of security. So I don't think we should worry ourselves with it not being implemented properly by browsers.
@andy.helten @ayufan @anton-akhmerov @briann @bikebilly Unfortunately, using the sandbox attribute is not an alternative to using a separate domain, and will not effectively protect the user against cookie-stealing HTML/JS.
As written in the awesome Mozilla docs for <iframe>:
Sandboxing in general is only of minimal help if the attacker can arrange for the potentially hostile content to be displayed in the user's browser outside a sandboxed iframe. It is recommended that such content should be served from a separate dedicated domain, to limit the potential damage.
While sandbox prevents the page inside that iframe from accessing the cookies of the parent page while it's loaded inside that iframe, it doesn't actually prevent anything if that same page is loaded outside the sandboxed iframe, like in a separate tab.
If there was a way to detect on the server side if a page is being requested from inside an iframe, we could only render the HTML then and force a download or return 404 otherwise, but as far as @ayufan and I can see, there is no way to do that.
So unfortunately, sandbox is not going to help us.
The proper solution here is to use a separate domain, as discussed earlier. Since that domain will not have access to the cookies from the main domain, authentication is still something we would need to figure out, but potential options are to use a job-specific token that's impossible to guess, just like we do for uploaded files, or even a job-specific token that is hashed along with some kind of timestamp to generate a token with a validity of, say, only 5 minutes.
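A minimal sketch of that token scheme, assuming an HMAC over the job ID and an expiry timestamp. The secret, the URL layout, and the 5-minute window are illustrative assumptions, not a decided design:

```ruby
require "openssl"

SECRET   = ENV.fetch("ARTIFACT_TOKEN_SECRET") # assumed server-side secret
VALIDITY = 5 * 60                             # 5 minutes, per the suggestion above

def artifact_token(job_id, expires_at = Time.now.to_i + VALIDITY)
  digest = OpenSSL::HMAC.hexdigest("SHA256", SECRET, "#{job_id}:#{expires_at}")
  "#{expires_at}-#{digest}"
end

def valid_artifact_token?(job_id, token)
  expires_at, digest = token.split("-", 2)
  return false if expires_at.to_i < Time.now.to_i # token has expired
  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, "#{job_id}:#{expires_at}")
  expected == digest # real code should use a constant-time comparison
end

# The artifacts domain would then accept e.g.
#   https://artifacts.example.io/jobs/1234/index.html?token=#{artifact_token(1234)}
# without any cookie-based session on the main domain.
```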
As with other self-hosted GitLab customers, we aren't concerned with limitations imposed on gitlab.com. We want to render our own generated HTML on our own server, and gitlab.com should not be concerned with that. We trust ourselves and accept the consequences if that trust is misplaced... and we've been doing just that for nearly 10 years on a Jenkins deployment that is still operational and has yet to be taken down by auto-generated doxygen code.
What is the official justification/reason for "We cannot render user HTML on the main GitLab domain"? In other words, is this an official gitlab.com policy or an unofficial discussion-thread stance? If it is an official policy, it explains the reluctance to create an integrated feature instead of a good-enough-for-us-good-enough-for-you non-integrated Pages workaround.
To address gitlab.com concerns with rendering HTML, suggest an ability to disable the integrated HTML viewer (i.e. don't allow enabling it on gitlab.com).
@andy.helten Adding a checkbox to render arbitrary HTML on the same domain is effectively adding a checkbox to allow XSS, to allow any user on the system to steal any other user's credentials and gain access to their private data, etc. Even if an on-premises GitLab CE/EE admin may trust their users enough to effectively give them that access, I'm hesitant to add an option like that, even if it's off by default and has tons of warnings around the button to toggle it.
it is "official policy" to not make it too easy for users to shoot themselves in the foot :)
@DouweM is right. I hadn't considered the impact of directing users to the content outside of an iframe. We could restrict access to the content to a single individual, but that's just a hack and probably wouldn't satisfy those who need this functionality.
@briann A single individual, like the creator of the pipeline, the pusher of the commit? That makes the artifacts basically useless to anyone else, unfortunately.
Not being terribly familiar with the problem, I'm surprised there isn't a well-supported HTTP header that says "access to cookies forbidden on this response" which the GitLab server could send.
@DouweM @briann Restricting the content to a single individual would not really fix the security issue: Mallory would commit an update to the build system, which generates reports with security problems. Alice and Bob would commit innocent changes, each loading the malicious test report generated for their builds.
@jonas1 and others: The suggestion is not to use GitLab Pages. Pages cannot handle the multi-dimensional nature of the problem, where each build generates a new set of artifacts, which all need to be available in parallel. However, Pages exposes the same security challenge: user-generated content must not be allowed to interfere with the GitLab application itself. And Pages solved it by requiring a separate domain. This may feel inelegant, but it is the recommended solution by e.g. Mozilla, as pointed out by @DouweM above.
So, the solution is to serve build artifacts on a different domain than GitLab itself, and the suggestion is to reuse the GitLab Pages domain. This does not impose any additional configuration requirements beyond those required by Pages, and since other requirements in GitLab actually guarantee that certain hostnames in the GitLab Pages domain are free, artifacts can be served from one of those hostnames.
When discussing this with @DouweM, we came to the conclusion to generate a unique pre-authorized URL that would allow accessing artifacts for a limited time without any authentication.
This would probably be work for a separate daemon, building on top of the microservice architecture.
@elygre It will work as long as links are relative. The other way is to use dynamically generated domain names to support absolute paths. This is interesting, because it brings me to extending GitLab Pages to be much more flexible in terms of how we show data.
Fascinating thought about dynamically generated domain names! Though I don't think this will be required; these tools already assume that the reports may be shown inside e.g. Jenkins, where the path changes depending on project names.
After some research to better understand the issues (thanks for the github links), I'm now in agreement that the use of something like Pages on gitlab.com is the right approach for gitlab.com. However, I'm still not convinced it is the right approach for self-hosted EE customers. Or maybe I'm not convinced it should be the only solution made available to self-hosted EE customers. Indeed, as discussed previously, Pages is not a solution -- it's a partial workaround.
On the other hand, a checkbox-enabled feature that allows self-hosted EE customers to display build-specific static HTML directly from within their gitlab deployment is a solution. This is what we want. Period. Go ahead and put it in an iframe for increased security. In fact, do everything you can to provide a safe way to render static HTML from our gitlab server. We are fine with security, we just don't want it to prevent or slow down progress.
The arguments against self-hosted EE customers rendering HTML on their own gitlab server are along the lines of:
Protect the customer from their own incompetence.
Protect the customer from their own employees.
These are both bogus arguments. An employer already entrusts an employee with a GitLab account, gives them an email account, gives them access to numerous other computing resources, gives them an identification badge, and probably even gives them keys to the office building. The employer lets them print to the printer, make copies, maybe even send faxes. But when it comes to rendering static HTML, that's where we draw the line on trust? The point is: if an employee will purposely insert an XSS attack into your project code for the purpose of attacking/compromising their own co-workers' GitLab accounts, then you have much bigger problems. In other words, if an employee wants to do damage to their own company, stealing a co-worker's GitLab credentials is probably not high on their TODO list. And, by the way, if it is high on their TODO list, they will almost certainly get the GitLab credentials by some other means!
Bottom line: we want the ability to display our doxygen and code coverage directly from the gitlab build page. We want the ability to shoot ourselves in the foot. If gitlab wants to stop their customers from shooting themselves in the foot, they'll need to unplug our servers and disconnect our Internet. Even then, some of us will find a way to shoot our own foot.
BTW, it occurred to me that gitlab does in fact already render static HTML by virtue of displaying markdown as HTML. Is this HTML made safe by the markdown->html converter or by some other means?
@andy.helten My #1 gripe with GitLab (and I am talking about an on-premises install) is the lack of artifact browsing capabilities. I've been fairly vocal on this, in a number of issues on a number of GitLab projects, and I even have a merge request on the web repository to have them admit that Jenkins is better than GitLab in this respect. (Somewhat surprisingly, given their normal openness, that one has not been merged.)
I see the situation as follows:
Using GitLab Pages is not a solution. I think the people of GitLab realizes this.
Reusing the GitLab Pages domain is a solution. It solves the security issue, but it adds the requirement of registering a domain and setting up a DNS entry.
Serving assets directly is another solution. It removes the overhead of domain and DNS configuration, but also removes any security safeguard.
Whether GitLab should allow you to disable this security or not is a somewhat interesting question. I guess the answer is yes, but perhaps only if it also allows you to actually run in a secure manner. I believe that creating new features that will only run in an insecure manner is a slippery slope. New features are more easily created without respect to security, and hey, the customer chose to run GitLab, right?
So my preference is to definitely do the first of these, and optionally the second:
Securely serve assets from the GitLab Pages domain, if it is configured.
Enable a GitLab configuration to serve assets insecurely on the main domain, if GitLab Pages is not configured. (Perhaps call this configuration option "enableDangerousAssetServerOnMainDomain", feel free to require the configuration option "dangerousAssetServerOnMainDomainSignedOffBy: email-address", and feel free to have a big old red warning sign on the administration page.)
+1 from us too. All we want to do is expose code coverage reports from Gitlab CI. This seems such a trivial and obvious requirement, and it's absolutely basic in Jenkins and other tools we've used. Obviously our CI is company-internal, so as @andy.helten has explained there is no security issue here.
As I understand it, all that is required is a checkbox in config (under a red warning sign if you like, or guarded by a Gitlab configuration option which defaults to "off") which turns off the adding of Content-Disposition when serving assets.
Re-using GitLab Pages domain sounds like a great hack, but I can't help but wonder why we wouldn't just use a different, separate domain specifically for artifacts or other raw content? IIRC, GitHub serves raw content on another domain (separate from Pages).
Also, yeah, I can see the strong argument for letting on-prem installations re-use the main domain rather than using a separate domain, despite strong warnings otherwise. We wouldn't do that for GitLab.com, of course, and wouldn't recommend it for any enterprise serious about their security, but for companies just starting out with GitLab, or that totally trust their internal security, then sure.
But also, how hard is it to buy another domain and set up DNS? The biggest problem is with enterprises where the paperwork to get the domain wouldn't be worth it, but wouldn't those be the same enterprises that would need the security?
First, all of us that desire this feature are happy to have the debate because it implies a small chance we get a feature that meets our needs. I don't mean to imply expertise in the proper implementation of that feature, but I do know the feature will at least need to:
Be easy to set up and use
Support build-specific plus branch-specific HTML content (e.g. a Debug build on branch feature/blah1 and a Release build on the same branch)
Serve that HTML content at a known URL (not dynamic) so we can link to it
Clean up the HTML content when the branch is removed
Second, we should stop referring to this as an "insecure setup". It can easily be argued that the most secure setup is one that doesn't include any active accounts and is not accessible via the Internet. We can then agree this is not a useful setup. We can further agree that giving an account to every user that registers for one (i.e. gitlab.com) and then allowing that user to serve HTML from a shared server (gitlab.com) is definitely not secure. Fortunately, that's not what we are asking for.
A company generally hosts their own gitlab server because they desire more security and control than is possible with hosting at gitlab.com (i.e. a server to which we don't control access). At least, that's our motivation. If one of our employees should desire to release all of our proprietary source code to wikileaks, well, there's not much we can do except fire that employee. Needless to say our code is probably less interesting than NSA source code but, either way, this scenario (a malicious employee) is a much bigger insecure setup than one that allows trusted users to serve doxygen or code coverage via static HTML. Or is the scenario envisioned by allowing a malicious employee to serve XSS-infected code worse than releasing proprietary source code? I don't think so but maybe I'm missing something?
My question about security issues with gitlab.com serving static HTML via markdown files (README.md) was not answered. Is this a security concern? I'm not sure given my limited knowledge of the capabilities of XSS but it seems like a potential security hole.
Re-using GitLab Pages domain sounds like a great hack, but I can't help but wonder why we wouldn't just use a different, separate domain specifically for artifacts or other raw content
There is some level of friction (cost and work) involved in getting a new domain, in particular if you also want a proper SSL certificate. This work has already been done by those who run Pages, and I thought this was a "nice trick" to reduce the setup costs. And if there are no drawbacks, then why not? Anyway, the important takeaway is this: we really, really want this feature. It is an obvious feature, with lots of supporters and little action. This was just another idea meant to reduce whatever obstacles are stopping the GitLab team from implementing this feature.
But, in the end: Feel free to require another domain, if you want. Feel free to then automatically support letsencrypt.org for automatic certificates. But, please do not delay this feature, if reusing the Pages-domain means that you can deliver it sooner.
First, all of us that desire this feature are happy to have the debate because it implies a small chance we get a feature that meets our needs.
First, this. A thousand times this. As customers, we have chosen GitLab, we spend time and money getting it to run, and now are also investing our time in making it better. We do this only because we think this is an important feature!
Earlier in this thread, I've argued that it is acceptable to require the use of a separate domain. I believe this is a security issue (sorry, @andy.helten, we'll have to agree to disagree on this), and I will not enable this feature on our corporate project hosting platform without knowing that it can be done securely. Yet, here I am not being fair, because the actual project team where I spend most of my time runs a project-specific Jenkins setup, serving artifacts on the main domain. And if I were to run a project-specific GitLab instance, I would also be happy to run it on the main domain. So while I think this is a security issue, I would willingly and knowingly make a trade-off on this.
But this is a strawman, because this is not a difficult problem to solve. If you were to decide that this was an important feature, supporting both "main domain" and "separate domain" configurations is not difficult. Supporting "main domain" happens by default, supporting "separate domain" requires programming, but is a known problem. You already solve it for GitLab Pages, so it's not a feature that needs inventing.
It is a Nike-problem: Just do it. If you do not, it is not because it is difficult, but because you don't think it is important. But it is.
I totally agree with the previous post. But the most important thing is: there needs to be some progress on this. This seems like such a fundamental feature that I can't understand why it is not on the roadmap for the next release. To me it feels like there has been no real (read: important) progress in this discussion for the last few weeks/months, because all the points have already been discussed over and over again. It feels like this issue is at a stage where there really just needs to be a decision from one of the responsible people at GitLab. And then this feature just needs to happen!
@andy.helten All user-supplied HTML, whether it's raw HTML or converted from Markdown/Markup, is run through a sanitization filter to remove any dangerous content.
@briann - Is this filter appropriate for sanitizing the kind of HTML we are discussing here, such as auto-generated HTML from doxygen, code coverage tools, etc.? Or would it render such HTML unusable? If it is good enough for sanitizing Markdown/HTML on gitlab.com, it seems like it would be good enough for sanitizing user-generated (auto-generated) HTML on a self-hosted server.
@andy.helten I haven't tried it against output from those tools but I imagine it would render it unusable or at least difficult to read. The HTML allowed in markdown is a tiny subset of HTML.
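For context, the kind of allowlist filtering described here looks roughly like the following, using the sanitize gem for illustration; GitLab's actual Markdown pipeline differs in detail, and the input path is hypothetical:

```ruby
require "sanitize"

html = File.read("coverage/index.html") # hypothetical coverage report

# Only an allowlist of tags and attributes survives; <script>, <style>,
# <iframe>, inline event handlers, etc. are all stripped. That is why
# running a full doxygen/coverage page through it would likely leave the
# page unusable, as suggested above.
safe = Sanitize.fragment(html, Sanitize::Config::RELAXED)
```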
My question to all the experts here discussing the security issues of simply rendering a static HTML file: is it really that difficult to implement?
I believe that GitLab CE is open source, so creating a branch with the solution could be great for testing.
Or could someone provide a solution which can be implemented on our own for on-premise/self-hosted setups?
The risk part is then an individual concern.
I do agree with @jzielke; please show us some progress, or create a guide so we can do it on our own.
Create some settings for personal preferences. Please do something.
From a security perspective, the requirement is that the html static files are served using a different domain. Whether that is difficult to implement or not probably depends on your perspective, though it may be relevant to know that GitLab Pages has solved this problem in this way.
The technical challenges are well understood, and the only thing currently lacking is the will to do it. There is no hiding behind "don't understand", only "don't want to", or at least "want other things more".
@elygre OK, so you mean to say that it is not a priority as of now? It is really very important stuff. GitLab Pages seems to have a completely different target, while artifacts which contain reports address a different situation.
For me, a different domain is out of the question. I'm hoping this feature comes with some user setting.
@rahulraut I'm only a customer, and don't speak for GitLab. But, given that the technical problem is understood, the only reason left would be priority.
GitLab Pages has a completely different target, and cannot be used to solve this functional requirement. However, it presents the same security challenges: by solving the problem for GitLab Pages, they know how to solve it for this issue.
The only way to guarantee that there are no security issues is to use a separate domain. Allowing a solution without a separate domain is possible (and even easier to implement), but GitLab is reluctant to leave this hole open. I personally understand GitLab's position, but a lot of others do not. This, too, is a product management question: What should the product look like?
In any case, the product should not be insecure by design. We all agree on the final result: allowing HTML artifacts to be viewed. I don't think we will support an insecure way in any form, as this matters to small as well as large installations, like GitLab.com.
Actually, it doesn't sound like there will be any solution anytime in the near future, which is why this sprint we are working on a "Setup Jenkins" story. If we take the time to find a replacement for the merge request functionality, we suddenly have very few reasons to continue paying for GitLab.
I can appreciate your stance on security, but I disagree that this feature is "insecure by design" when it is running on a self-hosted server that is accessible to trusted users only. Consider that it would be just as easy to insert malicious code into downloaded HTML (e.g. phishing), so GitLab might already be considered "insecure by design" by virtue of allowing downloads.
@andy.helten I agree with you. The security reasons many people are giving here seem a bit irrelevant to me. I was talking about choice: I think orgs should have the right to decide what they do or do not allow. I am not personally interested in the Pages stuff for displaying reports; I am rather interested in viewing the HTML report. One more query: will simple HTML without JavaScript also have security issues?
Is it not possible to disable the parts of an HTML file which can pose a security issue? I am not a security expert, so please excuse the lack of technicality.
@bikebilly @ayufan I think it makes sense to use a fixed subdomain of the Pages domain; this is much less work for people who already have Pages set up, and no more work than Pages for people who don't.
It may also make sense to use the pages daemon itself.
The reason why I suggest doing this with the pages daemon, rather than letting workhorse and rails take these requests, is because we already have extensive discussion and a plan to extend gitlab authentication to pages here: https://gitlab.com/gitlab-org/gitlab-ce/issues/33422 - once implemented, we could use it to provide authentication here as well.
If we could use artifacts. instead of assets., I'd be happier - the latter could be misinterpreted.
We should not hijack artifacts. or assets., but it should be a per-project domain. So if we have Pages already, we could forward the request to the corresponding domain:
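For illustration, the forwarding might map URLs along these lines (a hypothetical scheme, not a decided design):

```ruby
# A request on the main domain, e.g.
#   https://gitlab.example.com/mygroup/myproject/-/jobs/1234/artifacts/file/index.html
# could be forwarded to the project's own hostname under the Pages domain:
def artifact_url(group, project, job_id, path, pages_domain: "example.io")
  "https://#{group}.#{pages_domain}/-/#{project}/-/jobs/#{job_id}/artifacts/#{path}"
end

artifact_url("mygroup", "myproject", 1234, "index.html")
# => "https://mygroup.example.io/-/myproject/-/jobs/1234/artifacts/index.html"
```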
Since reading an archive is a time-consuming operation, maybe Pages should fetch the archive and store it in a temporary directory, because if we switch to object storage it will be a "quite demanding" operation to fetch multiple files.
We should not hijack artifacts. or assets., but it should be a per-project domain
This opens the possibility of conflicts with actual pages content. Now that I think a bit harder, though, you're right that we don't want to use the same subdomain for all these unrelated projects - that's just as bad as having it on gitlab.com directly. Ugh.
maybe Pages should fetch the archive and store it in a temporary directory
Quite possibly, but we don't need this for an MVP, so I left it out.
On the object storage point, putting the caching in gitlab-rails or workhorse would allow the existing browse and download functionality to benefit as well as this pages-based suggestion, so perhaps it would be better there anyway.
Moving these artifacts to the Pages domain itself could be done, but then are we fine making these artifacts public? If not, authentication would have to be used and that would be a big security hole for Pages domains.
They must obviously be kept private (that is a key requirement). Your assumption that authentication leads to any security hole, big or small, is questionable.
You can substitute "user-provided scripts and content" with "scripts and content provided by the site owner". All content on the Pages domain is provided by the site owner. On any web site, for example one built by hand and published on an AWS-hosted server, the site owner can publish malicious content. This also applies to content published using GitLab Pages.
The key thing to ensure is that the session information used by (and provided to) the GitLab Pages content, and its domain, is usable only on that particular domain.
@nick.thomas @briann As one of the more vocal customers on this issue, let me be on the record with this: a solution which provides unauthenticated access to build artifacts is not one that we can use. For us, it does not satisfy the "viable" part of MVP. You may have other customers for whom this is sufficient, but my understanding from this thread, and other similar ones, is that keeping content secure is a minimum requirement for most users.
(On a related note: if you intend to go forward with a very limited implementation, please open another issue. It would be too bad if we lost this issue, with all its discussion and all its comments, to a limited solution. Having, and closing, a separate issue will more easily allow us to maintain the history of this one.)
We'd like to pick up a first iteration of this in 9.5, but it will likely be limited, probably only for public projects, for example. Since this won't solve the problem for a lot of people, we're going to create a new issue to track the reduced scope for 9.5. @bikebilly Can you update here when you have the new issue and scope?
The other issue looks like it's for public artifacts only so I'll post this here.
I had some ideas about how this could be done safely without requiring separate authentication for the artifacts domain and all the risks that go with mixing that and unsanitized user input.
If links to the artifacts are kept on the primary domain, the artifact content on a new user-specific Pages-style domain, and a one-time or expiring token is included with the request for each artifact, it would effectively restrict access to authorized users based on the project visibility settings. If links to the artifacts need to be shared then the primary domain link could be used.
With this being scheduled for 10.1, have you decided on a technical solution? Or, will this mechanism be reusable for #33422 (moved) (Make GitLab pages support access control), which has pretty much the same needs?
GitLab is moving all development for both GitLab Community Edition and Enterprise Edition into a single codebase. The current gitlab-ce repository will become a read-only mirror, without any proprietary code. All development is moved to the current gitlab-ee repository, which we will rename to just gitlab in the coming weeks. As part of this migration, issues will be moved to the current gitlab-ee project.

If you have any questions about all of this, please ask them in our dedicated FAQ issue.
Using "gitlab" and "gitlab-ce" would be confusing, so we decided to
rename gitlab-ce to gitlab-foss to make the purpose of this FOSS
repository more clear
I created a merge requests for CE, and this got closed. What do I
need to do?
Everything in the ee/ directory is proprietary. Everything else is
free and open source software. If your merge request does not change
anything in the ee/ directory, the process of contributing changes
is the same as when using the gitlab-ce repository.
Will you accept merge requests on the gitlab-ce/gitlab-foss project
after it has been renamed?
No. Merge requests submitted to this project will be closed automatically.
Will I still be able to view old issues and merge requests in
gitlab-ce/gitlab-foss?
Yes.
How will this affect users of GitLab CE using Omnibus?
No changes will be necessary, as the packages built remain the same.
How will this affect users of GitLab CE that build from source?
Once the project has been renamed, you will need to change your Git
remotes to use this new URL. GitLab will take care of redirecting Git
operations so there is no hard deadline, but we recommend doing this
as soon as the projects have been renamed.
Where can I see a timeline of the remaining steps?