Balasankar C
--- a/doc/architecture/README.md

+ 8

− 8
+++ b/doc/architecture/README.md

+ 8

− 8
 @@ -93,7 +93,7 @@ Omnibus-GitLab repository uses ChefSpec to test the cookbooks and recipes it shi
 So, of the components described above, some (software definitions, project metadata, tests, etc.) find use during the package building, in a build environment, and some (Chef cookbooks and recipes, GitLab configuration file, Runit, gitlab-ctl commands, etc.) are used to configure the user's installed instance.

 ## Work life cycle of Omnibus-GitLab
-### What happens when packages are built using our CI
+### What happens during package building

 The type of packages being built depends on the OS the build process is run. If build is done on a Debian environment, a `.deb` package will be created. What happens during package building can be summarized to the following steps
 1. Fetching sources of dependency softwares
 @@ -112,19 +112,19 @@ The type of packages being built depends on the OS the build process is run. If

 Omnibus uses two types of cache to optimize the build process- one to store the software artifacts (sources of dependent softwares), and one to store the project tree after each software component is built

-##### Software artifact cache
+##### Software artifact cache (for GitLab Inc builds)

-Software artifact cache uses an Amazon S3 bucket to store the sources of the dependent softwares. In our build process, this cache is populated using the command `bin/omnibus cache populate`. This will pull in all the necessary software sources from the Amazon bucket and store it in the necessary locations. When there is a change in the version requirement of a software, omnibus pulls it from the original upstream and add it to the artifact cache. This process is internal to omnibus and we configure the Amazon bucket to use in [omnibus.rb](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/omnibus.rb) file available in the root of the repository. So this cache ensures availability of the dependent softwares even if their original upstream remotes go down.
+Software artifact cache uses an Amazon S3 bucket to store the sources of the dependent softwares. In our build process, this cache is populated using the command `bin/omnibus cache populate`. This will pull in all the necessary software sources from the Amazon bucket and store it in the necessary locations. When there is a change in the version requirement of a software, omnibus pulls it from the original upstream and add it to the artifact cache. This process is internal to omnibus and we configure the Amazon bucket to use in [omnibus.rb](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/omnibus.rb) file available in the root of the repository. This cache ensures availability of the dependent softwares even if their original upstream remotes go down.

 ##### Build cache

-A second type of cache that plays an important role in our build process is the build cache. Build cache can be described in simple words as snapshots of the project tree (where the project actually gets built - `/opt/gitlab`) after each dependent software is built. To understand it easily, consider a project with 5 dependent softwares - A, B, C, D and E. For simplicity, we are not considering the dependencies of these individual softwares. So the build order is A -> B -> C -> D -> E. Build cache makes use of git tags to make snapshots. So, after each software is built, a git tag is computed and committed. Now, consider we made some change to the definition of software D. A, B, C and E remains the same. So, when we try to build again, omnibus can reuse the snapshot that was made before D was built in the previous build. So, the time taken to build A, B and C can be saved as it can simply checkout the snapshot that was made after C was built. Omnibus uses the snapshot just before the software which "dirtied" the cache (dirtying can happen either by a change in the software definition, a change in name/version of a previous component, or a change in version of the current component) was built. So, if in a build, there is a change in definition of software A, it has dirtied the cache and hence A and all the following dependencies get built from scratch. If C dirtied the cache, A and B gets reused and C, D and E gets built again from scratch.
+A second type of cache that plays an important role in our build process is the build cache. Build cache can be described in simple words as snapshots of the project tree (where the project actually gets built - `/opt/gitlab`) after each dependent software is built. To understand it easily, consider a project with 5 dependent softwares - A, B, C, D and E, built in that order. For simplicity, we are not considering the dependencies of these individual softwares. Build cache makes use of git tags to make snapshots. After each software is built, a git tag is computed and committed. Now, consider we made some change to the definition of software D. A, B, C and E remains the same. When we try to build again, omnibus can reuse the snapshot that was made before D was built in the previous build. Thus, the time taken to build A, B and C can be saved as it can simply checkout the snapshot that was made after C was built. Omnibus uses the snapshot just before the software which "dirtied" the cache (dirtying can happen either by a change in the software definition, a change in name/version of a previous component, or a change in version of the current component) was built. Similarly, if in a build there is a change in definition of software A, it will dirty the cache and hence A and all the following dependencies get built from scratch. If C dirties the cache, A and B gets reused and C, D and E gets built again from scratch.

-So, this cache makes sense only if it is retained across builds. For that, we use the caching mechanism of GitLab CI. We have a dedicated runner which is configured to store its internal cache in an Amazon bucket. So, before each build, we pull in this cache (`restore_cache_bundle` target in out Makefile), move it to appropriate location and start the build. It gets used by the omnibus until the point of dirtying. After the build, we pack the new cache and tells CI to back it up to the Amazon bucket (`pack_cache_bundle` in our Makefile).
+This cache makes sense only if it is retained across builds. For that, we use the caching mechanism of GitLab CI. We have a dedicated runner which is configured to store its internal cache in an Amazon bucket. Before each build, we pull in this cache (`restore_cache_bundle` target in out Makefile), move it to appropriate location and start the build. It gets used by the omnibus until the point of dirtying. After the build, we pack the new cache and tells CI to back it up to the Amazon bucket (`pack_cache_bundle` in our Makefile).

-So, this cache helps in reducing the overall build time of GitLab.
+Both types of cache reduce the overall build time of GitLab and dependencies on external factors.

-So the cache mechanism can be summarised as follows:
+The cache mechanism can be summarised as follows:
 1. For each software dependency
  1. Parse definition to understand version and SHA256
  1. If the source file tarball available in artifact cache in Amazon bucket matches the version and SHA256, use it
 @@ -141,4 +141,4 @@ So the cache mechanism can be summarised as follows:

 ### What happens during `gitlab-ctl reconfigure`

-One of the commonly used commands while managing a GitLab instance is `gitlab-ctl reconfigure`. This command, in short, parses the config file and runs the recipes with the values supplied from it. The recipes to be run are defined in a file called `dna.json` present in the `embedded` folder inside the installation directory (This file is generated by a software dependency named `gitlab-cookbooks` that is defined in the software definitions). In case of GitLab CE, the cookbook named `gitlab` will be selected as the master recipe, which in-turn invokes all other necessary recipes, including runit. So, reconfigure is basically a chef-client run that configures different files and services with the values provided in configuration template.
+One of the commonly used commands while managing a GitLab instance is `gitlab-ctl reconfigure`. This command, in short, parses the config file and runs the recipes with the values supplied from it. The recipes to be run are defined in a file called `dna.json` present in the `embedded` folder inside the installation directory (This file is generated by a software dependency named `gitlab-cookbooks` that is defined in the software definitions). In case of GitLab CE, the cookbook named `gitlab` will be selected as the master recipe, which in-turn invokes all other necessary recipes, including runit. In short, reconfigure is basically a chef-client run that configures different files and services with the values provided in configuration template.