Currently, we walk the directory tree of the pages root to decide what should be served. This takes longer and longer, the more pages there are. We have reports of it taking over a minute on some large installations.
Investigate ways to speed this up.
We should hook GitLab Pages up to Redis and make GitLab Rails publish all domains to Redis, with the project mapping and TLS certificate. That way we would always have the full data available in Redis.
Create a configuration file with all domains serialized into a single file, but publishing a configuration with 5k custom domains every time would take a long time.
This would be a good acceleration structure for having a list of domains with all namespaces. The configs for these domains (certificate, key) could then be read lazily when needed.
Ideally the solution should be 1. Maybe this is generally a better way to move forward.
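For concreteness, a minimal sketch of the Redis idea on the Pages side, assuming Rails keeps one hash per domain under a hypothetical `pages:domain:<host>` key (the key layout, field names, and `DomainConfig` type are illustrative, not an agreed format):

```go
package pages

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// DomainConfig mirrors the data Rails would publish per domain.
// Field names are illustrative only.
type DomainConfig struct {
	Project     string // e.g. "group/project"
	Certificate string // PEM-encoded certificate
	Key         string // PEM-encoded private key
}

// LookupDomain fetches the configuration for one host from Redis.
// It assumes Rails keeps a hash at "pages:domain:<host>" up to date.
func LookupDomain(ctx context.Context, rdb *redis.Client, host string) (*DomainConfig, error) {
	fields, err := rdb.HGetAll(ctx, "pages:domain:"+host).Result()
	if err != nil {
		return nil, err
	}
	if len(fields) == 0 {
		return nil, nil // unknown domain
	}
	return &DomainConfig{
		Project:     fields["project"],
		Certificate: fields["certificate"],
		Key:         fields["key"],
	}, nil
}
```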
We finally have monitoring on this. Just to stress how bad the performance is right now: the update process can take up to 5 minutes on .com.
I was reading through the Rails side of this this week, and thought: maybe we could let the Pages daemon behave more like a runner? We would have a special job that can only be picked up by that runner, and it would only update the Pages site that needs it. We could also have a trace where we show important information to the user. I'm not sure how this would work internally yet, or whether it would work at all, but it seems like the work GitLab is performing now could be transferred to the Pages daemon.
This missed %9.1(!) and %9.2 but I agree it should be a priority. I'll try to get on it.
We should be able to process these in the background, and do an eager load of the configuration if a request comes in for a domain that hasn't been processed yet. Even on GitLab.com, request volume is far lower than the number of domains served.
I'll try to do it for %9.3 - this should fix #31 too.
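As a rough sketch of that approach (hypothetical names, not the current gitlab-pages code): keep an in-memory map of domains that a background scan refreshes, and synchronously load a single domain's config the first time a request for it arrives:

```go
package pages

import "sync"

// Domain holds whatever we need to serve one site; fields are illustrative.
type Domain struct {
	Project     string
	Certificate string
	Key         string
}

// DomainCache answers lookups from memory and eagerly loads any domain it has
// not seen yet, so requests never wait for a full walk of the pages root.
type DomainCache struct {
	mu      sync.RWMutex
	domains map[string]*Domain
	load    func(host string) (*Domain, error) // reads one config from disk
}

func NewDomainCache(load func(host string) (*Domain, error)) *DomainCache {
	return &DomainCache{domains: make(map[string]*Domain), load: load}
}

// Get returns the cached entry, or loads just this one domain on a miss.
func (c *DomainCache) Get(host string) (*Domain, error) {
	c.mu.RLock()
	d, ok := c.domains[host]
	c.mu.RUnlock()
	if ok {
		return d, nil
	}

	d, err := c.load(host)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.domains[host] = d
	c.mu.Unlock()
	return d, nil
}

// Replace swaps in the result of a background scan in one step.
func (c *DomainCache) Replace(domains map[string]*Domain) {
	c.mu.Lock()
	c.domains = domains
	c.mu.Unlock()
}
```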
It's occurred to me that the list of custom domains is spread out across config files in the pages-root. As-is, that makes it difficult to "do an eager load of the configuration".
We may need to modify the data we pass from rails to make this possible, or try to come up with an alternative, along the lines of @ayufan's suggestions from 4(!) months ago.
Off the top of my head, I like 2) for a first iteration. Allowing that map to be published in redis later would be an incremental enhancement on top of that.
Custom domains are the sticking point here. gitlab-org/gitlab-ce!11830 passes them through the .update file, but handling https://gitlab.com/gitlab-org/gitlab-pages/issues/68 may need us to behave differently.
Right now, custom domains only exist as entries inside each project's config.json. A more future-proof - and generally better - alternative might be to "flatten" the custom domains onto the filesystem, so the pages root would look like this:
```
shared/
  pages/
    .custom-domains/ # avoid conflicts with group names
      example.com/
        config.json
          +> Key: "...."
          |> Certificate: "...."
          |> WWWRoot: "group/project/public"
        public
          +> "group/project"
    group/
      project/
        public/
          index.html
```
In short, we reify custom domains onto the filesystem and allow one project to reference another by providing a public file, instead of a public directory. With this setup, we don't need to scan the filesystem at startup at all. We just take the Host header / SNI value and look it up on the filesystem directly.
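A minimal sketch of that direct lookup, assuming the layout above; the struct fields follow the config.json keys shown, and error handling is simplified:

```go
package pages

import (
	"encoding/json"
	"os"
	"path/filepath"
	"strings"
)

// customDomainConfig mirrors the proposed config.json for a custom domain.
type customDomainConfig struct {
	Key         string `json:"Key"`
	Certificate string `json:"Certificate"`
	WWWRoot     string `json:"WWWRoot"` // e.g. "group/project/public"
}

// lookupCustomDomain resolves a Host header / SNI value straight on the
// filesystem, with no upfront scan of the pages root. A real implementation
// would also validate host (no path separators, "..", etc.).
func lookupCustomDomain(pagesRoot, host string) (*customDomainConfig, error) {
	host = strings.ToLower(host)
	path := filepath.Join(pagesRoot, ".custom-domains", host, "config.json")

	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return nil, nil // not a known custom domain
	}
	if err != nil {
		return nil, err
	}

	var cfg customDomainConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```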
We can provide a Rake task in GitLab CE to convert between layouts, and support both mechanisms in GitLab pages and CE for multiple releases, to avoid a downtime window.
Each custom domain would get a copy of the config.json, but I don't think that's a significant problem.
This is actually a very clever idea. I believe we could have a transition period: prepare a migration to generate the domains, but still use the old update mechanism. Then we don't need to prepare a Rake migration.
Having the folder with domains and resolving based on that: 💯
In the long term, I wonder whether we would then want to migrate to Redis for doing that?
@nick.thomas I wonder if our config.json shouldn't also include the path where the public folder is stored.
That would make it possible for us to create a structure with .custom-domains/example.com/.config.json (for example.com) and .pages/example/group/.config.json (for example.gitlab.io/group).
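A tiny sketch of how the daemon might map a request onto those two locations, taking the paths above literally (.custom-domains/<host>/.config.json for custom domains, .pages/<namespace>/<project>/.config.json for namespace sites); the helper name and parameters are hypothetical:

```go
package pages

import (
	"path/filepath"
	"strings"
)

// configPathFor returns where the daemon would look for the .config.json of a
// request under the proposed layout. pagesDomain would be "gitlab.io" on
// GitLab.com. Namespace root pages are left out of this sketch.
func configPathFor(pagesRoot, pagesDomain, host, urlPath string) string {
	host = strings.ToLower(host)
	if suffix := "." + pagesDomain; strings.HasSuffix(host, suffix) {
		// Namespace site: example.gitlab.io/group -> .pages/example/group/.config.json
		namespace := strings.TrimSuffix(host, suffix)
		project := strings.SplitN(strings.TrimPrefix(urlPath, "/"), "/", 2)[0]
		return filepath.Join(pagesRoot, ".pages", namespace, project, ".config.json")
	}
	// Custom domain: example.com -> .custom-domains/example.com/.config.json
	return filepath.Join(pagesRoot, ".custom-domains", host, ".config.json")
}
```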
We could start storing new pages in the new folder, and slowly migrate all pages from the old location to the new one. I would envision that the new location would be:
@ayufan I'm not a huge fan of putting the data into redis, as I think Pages should keep running if redis is down; you may also want to run Pages in a security or availability context with no access to redis at all.
Putting the path to the destination in config.json instead of in a separate public file makes perfect sense, I'll amend the proposal to incorporate that.
I don't think we can get away from having a Rake task to migrate the data. No matter how long a transition period, we can't guarantee that every site will be rebuilt, or even accessed, within it.
We have background migrations. If, alongside background migrations, we first build a separate model to hold Pages information for each project, we would know which projects have been migrated.
I know that Rake tasks are hard, and it is better to say that we don't support the other method of operation.
The FS (or object store, if we move to that) always has to be available - it's where the HTML is stored. Adding redis as well can only make Pages less reliable, and does nothing to help enable a HA environment.