As part of our effort to support a cloud native deployment, we should consider moving away from the Docker all-in-one image that we officially support.
Having one service per container would give us more flexibility at GitLab.com scale and better align with cloud native best practices. It would also let us cover a section of the market where our omnibus-gitlab package or the Docker all-in-one image was not possible to use, due to requiring root permissions.
We should consider creating separate docker images for:
For installation, we would provide a single `gitlab` Helm chart. Under the covers, it would use the subcharts and Docker containers we build through this issue.
This would allow us not only to create charts that are simpler to scale, but also to allow more flexibility with external services. These images could still be built in one place, together with the rest of our build process.
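As a rough illustration of that umbrella-chart idea (and only an illustration -- the subchart names, versions, and repositories below are placeholders, not an actual chart layout), a Helm v2 `requirements.yaml` for a parent `gitlab` chart could look like:

```yaml
# requirements.yaml of a hypothetical umbrella "gitlab" chart
dependencies:
  - name: gitlab-unicorn          # placeholder subchart per service
    version: 0.1.0
    repository: https://charts.gitlab.io
  - name: gitlab-sidekiq
    version: 0.1.0
    repository: https://charts.gitlab.io
  - name: gitlab-shell
    version: 0.1.0
    repository: https://charts.gitlab.io
  - name: redis
    version: 0.5.1
    repository: https://kubernetes-charts.storage.googleapis.com
    condition: redis.enabled     # lets a deployment disable this and point at an external Redis
```

The `condition` flag on the bundled dependencies is what would give the "more flexibility with external services" mentioned above.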
Separating services would also allow us to remove the root requirement that we currently have in our official image.
To make things clear, we would not be splitting up the existing Docker image; rather, we would be introducing a third option. This third option would be available through Helm charts at first, and down the line we could add other options.
It decomposes most of the GitLab services into separate containers and uses automated Let's Encrypt for SSL deployment.
Containers based on official images:
HAProxy
CertBot
NGINX (for certbot verification)
Redis
PostgreSQL
Containers based on GitLab's docker gitlab-ce Omnibus container:
NGINX (proxy for workhorse)
workhorse
unicorn
shell
sidekiq
registry + NGINX proxy for registry
Does not currently implement (would be fairly easy to add):
Mattermost
Pages
Runner
Prometheus
NFS or other storage engine
Currently it runs the Omnibus reconfiguration inside the gitlab-ce based service containers on first start, rather than at build time of the derived container.
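For readers who have not seen that stack, a heavily trimmed `docker-compose` sketch of the layout described above might look like this (image names and settings are placeholders, not the project's actual compose file):

```yaml
version: "3"
services:
  haproxy:
    image: haproxy:1.7              # official image: ingress / SSL routing
    ports:
      - "443:443"
  redis:
    image: redis:3.2                # official image
  postgresql:
    image: postgres:9.6             # official image
  unicorn:
    image: example/gitlab-unicorn   # placeholder: derived from the gitlab-ce Omnibus image
    depends_on: [redis, postgresql]
  sidekiq:
    image: example/gitlab-sidekiq   # placeholder: derived from the gitlab-ce Omnibus image
    depends_on: [redis, postgresql]
```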
The primary goal for us with this issue though is to create individual containers, not based upon the Omnibus image.
Let's go all the way and rid ourselves of Chef and Knife in this context. The offering we shared (MIT License) may be a good baseline for what you describe. The Cloud Git Cluster is already:
shared nothing (other than user-data and secrets access keys)
we learned a lot about the non-obvious interdependencies of existing services, which stem from their sharing of configuration data in a shared filesystem.
all ingress is SSL terminated at the point of entry to the cluster
a number of the internal services made assumptions about their immediate downstream connection being SSL
utilizing an encrypted overlay network (via swarm or rancher) for:
roles to talk to each other even on different nodes (no need for private VPN, SSL key management, or each role to provide its own SSL termination)
service discovery (no need for consul, etcd)
has secrets management available (via swarm)
has no privileged root for any of the roles.
container root and privileged root are two separate things in this deployment, and
as you suggest, getting rid of container root (except at docker build time) would be a good thing
discovered -- several roles outside the unicorn role actually use Redis and PostgreSQL access directly.
discovered -- identity, authentication, and general authorization permissions need to be moved out of the base rails application controller into their own services
discovered -- there needs to be a completely separate db migration role (see the sketch after this list).
discovered -- there needs to be a completely separate key/secrets generation role.
discovered -- there were unmanaged race conditions in the service configuration/startup dependency graph
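As a sketch of what a separate migration role could look like once the stack moves to Kubernetes (the resource name and image are hypothetical; the rake task is the standard Rails one):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gitlab-migrations                    # hypothetical one-shot migration role
spec:
  template:
    spec:
      restartPolicy: Never                   # run the migrations once, then stop
      containers:
        - name: migrations
          image: example/gitlab-rails:latest # placeholder rails image
          command: ["bundle", "exec", "rake", "db:migrate"]
```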
This will reduce the overall size, complexity, and some requirements as well (like root access).
one thing learned from this tooling is that the minimum node that can run all services is 4 GB of RAM (2 GB falls over with swap and the OOM killer, even on a site with ZERO front-end access -- a research issue)
the biggest challenge we see, in terms of engineering resources for containerization and orchestration, is addressing behavioral test coverage from a URL-driven test framework.
Every route, with all of its parameters, needs to be fuzzed to have a high degree of confidence in fault tolerance and to achieve >99.9% availability.
We need to track performance data from the URL-driven test framework and fail performance regressions as early as possible in the testing process.
reduce the overall size, complexity -- to go all the way with this theme, many of the Rails controllers and their supporting gems need to be separated into their own versioned containers/roles.
As for context, the Cloud Git Cluster is part of (POC for) a more ambitious project and we would love to collaborate on your requirements as part of the greater good.
If nothing else, the offering is a good prototyping framework for whatever direction GitLab chooses to go with container services.
Still WIP -- it is a little rough around the edges with respect to logging and full system integration testing (GitLab actually needs an end-user-perspective functionality and performance test suite, in addition to all of the unit tests it already has)
My 2 cents (which in this case, might be putting too high a price on it). We've been seriously exploring GitLab Enterprise as a way to replace GitHub/Jira/Jenkins. I've spent time installing it in a number of different ways: from source, the Omnibus package, into Kubernetes, and (my personal favorite) Rancher; the last two using Docker, obviously. For Docker, we ran across sameersbn/docker-gitlab and it fit the bill... almost. Serious props to sameersbn and @solidnerd for their work on that docker image. It's frankly pretty amazing and the documentation is at a level that one rarely sees for such a project. One of the only downsides has been configuration. While it's eminently configurable, almost all of the examples and documentation one can find on the web for a GitLab newbie like myself are targeted at the Omnibus package, and it's different enough that we initially struggled mightily with some things that eventually turned out to be pretty simple.
Due to this last issue, as well as a few small other ones, this week I'm setting up an alternative GitLab deployment based on the official Docker images (which is how I ran across this ticket). It will be an interesting comparison and it's why I have a keen interest in the subject of this ticket.
Looking at the initial proposition here, I appreciate that it, correctly I think, leaves the storage layers out of the picture. As much as I wish it weren't so, databases and Docker still aren't a good production combo. I firmly believe we'll get there; either with Docker, or whatever container-ish technology comes next, but just not yet.
Regarding the cloud-belt gitlab-stack... again, amazing work, but it feels like replacing one monolithic stack with another that is instead driven by a static compose file and startup/shutdown scripts. Someone with the right knowledge and experience can de-compose some of the elements and build something that, for instance, leaves the storage layers out of it and doesn't utilize the defined networks, etc., but then we've lost much of the benefit and work that was originally put into it. When I see this, I ask myself, "well, what am I going to have to change to get this working in Rancher/Kubernetes/Swarm/DC-OS?", and in the case of this setup, the answer would be, "a lot."
On that last point, I know that part of the reason behind both the cloud-belt work and this ticket is to adhere to the mantra of one-container-one-process, but even the official Docker docs state this as "one container, one concern". I don't know nearly enough about GitLab's processes to know if each of the items listed in the original proposal really is a separate "concern" with a need to scale independently of the others. If so, then yeah, it makes complete sense to break things apart, but if not, then it seems that it may just be adding unnecessary complexity.
Anyway, lots of rambling there. Back to a PoC using the existing official images in our Rancher environment for me!
Sounds like we have an alignment on goals and my team would love to incorporate more of your feedback into the cloud-belt project -- may save your team some time and effort if we collaborate. Yes, cloud-belt/cloud-git-stack is currently somewhat monolithic based on the single docker-compose file -- that is temporary while we were learning how to separate the various concerns and still have a working deployment.
> I know that part of the reason behind both the cloud-belt work and this ticket is to adhere to the mantra of one-container-one-process
The focus thus far has been learning -- cloud-belt is not wedded at all to the idea of one container per process.
> well, what am I going to have to change to get this working in Rancher/Kubernetes/Swarm/DC-OS ...
Let us know what you see as your requirements and my team is excited to collaborate and incorporate those.
> Back to a PoC using the existing official images in our Rancher environment for me!
which `official images` ? And ... Rancher is my favorite deployment environment right now ... kudos to Sheng and his team.
> I appreciate that it, correctly I think, leaves the storage layers out of the picture.
I disagree -- formal management of the data formats, whether in a persistent store, or on-the-wire, is paramount to longevity, stability, and scalability of an architecture and the availability of the deployments. The functional logic is a distant third in my book. (API interfaces are 2nd in my career experience)
Just look for a moment where the GitLab.com team has been most challenged in their site availability in this calendar year -- it is all about storage and deployments, and the fact that they have not [yet] architected a fault-tolerant logic layer above their storage layers. Creating HA via `failover` of a service such as Redis or PostgreSQL just doesn't cut it in my book -- ultimately everything needs to be partitioned into small redundant distributed chunks -- both the structured and unstructured data, AND needs to account for a dynamic mismatch between data formats (on the wire and persistent) and the logic bundles that are interacting with the data.
IMO - the GitLab team could save themselves a lot of growing pains and dramatically improve the availability of `GitLab.com` if they sharded EVERYTHING at a `project` level first, then pursued sharding of each of the independent storage components [Redis, which carries the concerns of caching, queues, and shared state; PostgreSQL, which implements the concern of persistence for the ActiveRecord ORM in Ruby; and Posix FS (git + blobs)] as a later phase of decomposition into separate scalable concerns. I liked Facebook's interim approach (circa 2009) of using WebDAV as an intermediate step in their migration from Posix to Object Storage, at a time when they were ingesting 100 TB of new data per week in 200 KB file blobs -- it avoided the scaling problems that CephFS and Gluster both still have with their Posix front-ends.
That being said, it would be really cool to have drop-in containers or a service API for the various storage components that `just work` and are fully fault tolerant at a globally distributed level [i.e.: Redis-like, PGSQL-like, Posix FS] for Docker/Rancher/K8S/DC-OS generally, for any deployment.
All of that being said, what can we best do to benefit the community at large?
I have been working on #2441 (closed) (standalone registry in helm, using official container), and as a part of that, have reviewed everything here.
My concern at this point is coordination of all of the services. We're trying to get away from the monolithic structure of Omnibus, but creating one gigantic cluster & namespace is not the route we're likely to end up taking, as it doesn't allow us to safely "spread the load" and create HA via geographically disparate locations and possibly even services. We'll need to be able to safely coordinate the necessary configuration items across not only pods or namespaces, but entire clusters from any point elsewhere in the wider network.
To this end, we can't rely on Kubernetes secrets and most definitely not on shared mounts.*(1) What other options do we have? I've considered the possibility of Consul, as we're beginning to package that into Omnibus GitLab and thus have existing applicable knowledge. As far as I can tell, this would require that Consul be online and secured before the rest of the suite of services. The trick here is that we'd want to be able to make use of it to perform the configuration of other services from the Helm charts.
Ideally speaking, we want to use upstream origin containers whenever possible, and not modify any part of those containers. How, if at all, can we do this when items such as the registry require having the FQDN of the GitLab endpoint for authentication pass-through (`auth.token.realm`), as well as the root certificate bundle from that server (`auth.token.rootcertbundle`)*(2)? While the GitLab (gitlab-rails) service FQDN may be known, the generated certificate bundle may not be. We'll need this information: either pre-generated so that it can be populated, or to be captured after spin-up. At that point we'll need an accessible, secured, and preferably resilient method to share the information with all concerned containers.
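For reference, both of those settings live in the Docker Distribution (registry) configuration file; a minimal sketch of that section, with illustrative hostnames and paths:

```yaml
# registry config.yml, token-auth section only; values are illustrative
auth:
  token:
    realm: https://gitlab.example.com/jwt/auth                # FQDN of the GitLab endpoint
    service: container_registry
    issuer: gitlab-issuer
    rootcertbundle: /etc/docker/registry/gitlab-registry.crt  # cert bundle generated by that server
```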
Notes:
1. There is a minor exception to shared mounts: one could make use of a cloud storage mount such as S3 or GCS, but there are severe security concerns about sharing sensitive communication keys on such a platform.
2. Omnibus GitLab generates this in `cookbooks/gitlab/libraries/registry.rb#L109`.
@WarheadsSE I would love to collaborate with you on your requirements.
> We'll need to be able to safely coordinate the necessary configuration items across not only pods or namespaces, but entire clusters from any point elsewhere in the wider network.
Both Rancher and docker-swarm do this for my team. We added a layer of node labelling that is used for autoscale (up/down).
> we can't rely on Kubernetes secrets and most definitely not on shared mounts
The latest docker-swarm has distributed secrets, backed by the Raft consensus protocol.
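For illustration, a compose v3 stack file can declare both an encrypted overlay network and a secret; the service name, node label, and secret name below are placeholders:

```yaml
version: "3.1"
networks:
  gitlab:
    driver: overlay
    driver_opts:
      encrypted: ""                  # encrypt traffic on the overlay network
secrets:
  gitlab_db_password:
    external: true                   # created out-of-band with `docker secret create`
services:
  postgresql:
    image: postgres:9.6
    networks: [gitlab]
    secrets: [gitlab_db_password]
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/gitlab_db_password
    deploy:
      placement:
        constraints: ["node.labels.role == db"]   # uses the node labelling mentioned above
```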
1. the lack of end-to-end integration tests for a deployed system
2. zero-downtime deployments will require multiple versions of each service to run concurrently, in parallel, without interfering with the down-level version of the same service, while being able to communicate with both upstream and downstream services that may be at a different rev level (a rolling-update sketch follows below). This requires some different thinking about high-level architecture, above and beyond the MVC of Rails and Active Record.
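On the orchestration side, a Kubernetes rolling update already covers the mechanical part of this (old and new pods running side by side during a rollout); the image name and tag below are placeholders:

```yaml
apiVersion: apps/v1beta1             # Deployment API version of this era
kind: Deployment
metadata:
  name: gitlab-unicorn
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                    # temporarily run one extra pod at the new version
      maxUnavailable: 0              # keep full old-version capacity while rolling
  template:
    metadata:
      labels:
        app: gitlab-unicorn
    spec:
      containers:
        - name: unicorn
          image: example/gitlab-unicorn:10.1.0   # placeholder image/tag
```

The harder part raised above -- rev N and rev N+1 agreeing on wire formats and shared schemas -- is not something the scheduler solves; it has to be handled in the application and protocol design.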
@techguru isn't the entire idea behind decomposing to an MSA that each service can have an independent lifecycle? Your #2 concern should be a solved issue. Are orchestration and message delivery within the scope of this issue?
I think each service should expose a generic TCP or RPC endpoint. When deployed, the services can be sidecar'd with a service mesh or other message-delivery infrastructure, depending on the deployment strategy. I personally would want to use Linkerd.
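A generic sketch of that sidecar pattern (not tied to Linkerd or any particular mesh; both images are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gitlab-sidekiq
  labels:
    app: gitlab-sidekiq
spec:
  containers:
    - name: sidekiq
      image: example/gitlab-sidekiq:latest   # placeholder application container
    - name: mesh-proxy
      image: example/mesh-proxy:latest       # placeholder for a Linkerd/Envoy-style proxy
      ports:
        - containerPort: 4140                # port the app uses for outbound calls via the proxy
```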
> isn't the entire idea behind decomposing to an MSA that each service can have an independent lifecycle?
of course. This particular issue does not seem to reach that far in its stated goals.
there is a lot of room to interpret the definition of a service. From what I see with GitLab, more scalability would come from packaging a lot of the internal Ruby MVC "Controllers" and "Models" as their own gems ... the current monolithic application has hundreds of controllers on about 100 models, uses hundreds of 3rd-party gems in total, mostly independent from each other, and weighs in at 400+ MB for a single "service"
> Are orchestration and message delivery within the scope of this issue?
I have not seen any of the GitLab internal folks comment on this publicly (in this or any other issue). It is a shift that must be made in order to achieve better than a 99.9% SLA when running on a cloud infrastructure that itself only offers a 99.9% SLA.
> Linkerd
Yes, there are so many people reinventing and repackaging the wheel ... Linkerd has the advantage of being promoted by the CNCF.
GitLab does not yet appear to be at the point of adopting a formal RPC strategy between services -- right now most of the interconnect is HTTP without an IDL, supplemented by opaque streams such as Redis, NFS, and PostgreSQL.
In regards to the current goals for separation of duties, GitLab currently intends to separate the services that make up the available Omnibus GitLab. That does not currently include separating the gitlab-rails component(s). We would, for now, have a container that contains the gitlab-rails component(s) (read: https://gitlab.com/gitlab-org/gitlab-ce) as a complete code base, and then configure it to perform a single role, such as Unicorn API, Unicorn Web, or Workhorse.
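As a sketch of how "one codebase, one role per container" might surface in a chart's values (the `role` keys, image name, and layout here are hypothetical, not the final configuration interface):

```yaml
# hypothetical values.yaml fragment: the same gitlab-rails image deployed per role
image: example/gitlab-rails:latest
deployments:
  unicorn-web:
    role: unicorn-web      # serves web UI traffic
    replicas: 3
  unicorn-api:
    role: unicorn-api      # serves API traffic only
    replicas: 2
  workhorse:
    role: workhorse        # Go reverse proxy in front of unicorn
    replicas: 2
```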
In the future we will consider further separating the individual concerns within the codebase into separate projects, however that is much further into the future and extends well beyond the Build Team and throughout all of the company.
Our current and foreseeable target is Kubernetes with Helm, so I do not currently see docker-swarm and related tooling as an option, though they may be future possibilities/alternatives. This choice was made explicitly for ease of integration with existing cloud providers. (cc @joshlambert )
In regards to the component interconnections, refactoring those methods is not within the current scope of work, but allow me to point out that Gitaly is intended to remove any NFS component from any node other than the data storage / Gitaly endpoint hosts, which have the final say on repository contents (where the files are actually housed).
@WarheadsSE I appreciate your clarity in the statement of intention
> then configure it to perform a single role, such as Unicorn API, Unicorn Web, or Workhorse
are Unicorn Web and Unicorn API currently separated in any meaningful way other than front end routing to otherwise identical code instances?
isn't Workhorse already a separate concern, written in Go?
> Gitaly is intended to remove any NFS component from any node other than the data storage
Gitaly is replacing git shared mounts; I have seen no mention of it replacing the other dozen uses of NFS storage shared between nodes (registry data store, secrets, configs, LFS, build artifacts, etc.).
> the generated certificate bundle may not be. We'll need this information: either pre-generated so that it can be populated, or to be captured after spin-up. At that point we'll need an accessible, secured, and preferably resilient method to share the information with all concerned containers.
In my experience, sharing a single secret for the entire cluster is asking for trouble and preventable downtime ... I advocate bulk-heading by creating separate failure zones even at this level, so that if one certificate is compromised, the remainder of the global cluster can keep running.
For example, the registry service could have multiple clusters of registry nodes, each cluster with its own private key, and the JWT configuration for gitlab-ce would then hold copies of all of the corresponding public keys ... this would be an architectural change toward run-time, rather than build- or deployment-time, configuration of those keys.
This separation is also relevant when you have some nodes running registry-service:1.n and other nodes running registry-service:1.(n+1) during a zero-downtime deployment ... they may want different keys.
@techguru in regards to the Docker Distribution / Registry and private keys: the GitLab instance with JWT has the private key, and the registry containers/service have only a full cert bundle from that instance.
Joshua Lambert changed title from Create a docker image per GitLab service to Create a docker image and chart per GitLab service
I'm glad that we can have a better deployment of GitLab (I've personally felt that GitLab seems so close to being great on k8s, but it has/had some monolithic areas) - so, as yet another guy who maintains a GitLab chart, thanks for making a better one!
That said, I was wondering if you had looked at / thought about doing:
Istio integration
using an operator
The reason for the operator is to help with things like backup/restore (GitLab seems to only do a partial backup, and while I understand the desire to store those things in separate locations, I'd still like to have an off-the-shelf answer to backup/restore), as well as things like growing the persistent storage (wouldn't it be great if this was done automatically?) -- the list goes on.
If you're interested in it, maybe we can work together on doing it. I appreciate that Helm makes things easy (but there's no reason why an operator and a CRD can't be delivered through a Helm chart).
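For illustration, a minimal CRD like the one below could ship in a Helm chart's `templates/` directory, with the operator Deployment alongside it; the API group and kind are hypothetical:

```yaml
# templates/gitlabbackup-crd.yaml -- hypothetical resource an operator could reconcile
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: gitlabbackups.example.gitlab.io
spec:
  group: example.gitlab.io
  version: v1alpha1
  scope: Namespaced
  names:
    plural: gitlabbackups
    singular: gitlabbackup
    kind: GitLabBackup
    shortNames: [glb]
```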
An operator is a great suggestion, will add it to the issue so we can discuss.
On Istio, that is pretty interesting. I'm not sure what our plans are for service-mesh like functionality, but will open an issue to discuss with our infrastructure team. https://gitlab.com/gitlab-com/infrastructure/issues/2770