Macro UX initial experiences for Kubernetes Agents

Commit 934102ee authored 3 years ago by Shinya Maeda
--- a/doc/stories/index.md
+++ b/doc/stories/index.md
+---
+stage: Multiple stages
+group: Macro UX
+info: The page to describe the stories.
+---
+
+# Stories at GitLab
+
+Welcome to the stories.
+This documentation showcases how a person accomplishes their mission within GitLab.
+Each story begins with a clear definition of the persona, their job role, and their responsibilities.
+It then describes how the person actually accomplishes their tasks with GitLab features.
+
+Unlike the other, feature-centric documentation, stories focus exclusively on the user journey,
+so a single story can use various features across different DevOps stages.
+Stories are also used to measure usability from a [macro perspective](https://about.gitlab.com/handbook/engineering/ux/product-design/#macro-ux).
+
+Here are the stories:
+
+- [Platform Engineer and Kubernetes Deployment](kubernetes_deployment.md)
--- a/doc/stories/kubernetes_deployment.md
+++ b/doc/stories/kubernetes_deployment.md
+---
+stage: Multiple stages
+group: Macro UX
+info: The page to describe a story related to Platform Engineer and Kubernetes Deployment.
+---
+
+# Platform Engineer and Kubernetes Deployment
+
+## Introduction
+
+I am a Platform Engineer at XYZ Corporation.
+Recently, my company launched a new project that runs an e-commerce website.
+My team consists of one developer, one product manager, and one platform engineer (me),
+and I'm in charge of deploying the web application to the production environment
+in a scalable, safe, and easy-to-maintain way.
+
+## Chapter: Launch the website
+
+After a few weeks of the initial sprint, the very first GA version of the application
+is ready.
+Now we have to run the application somewhere on a cloud service.
+
+According to the development notes,
+the application is compatible with container orchestration platforms,
+so I decided to use a Kubernetes cluster on Google Cloud Platform.
+First, I have to create and configure the cluster.
+Once it's ready, I'll deploy the application image to the
+cluster and make sure that the website is accessible to end users.
+
+### How I accomplished this task
+
+- Create a Kubernetes cluster on GKE.
+- Set up the [Kubernetes Agent](https://docs.gitlab.com/ee/user/clusters/agent/install/index.html).
+- Create a manifest for an Ingress controller. The agent pulls this manifest and applies it to the cluster.
+- Get the external endpoint from the CLI with `kubectl get svc --all-namespaces`.
+- Set up [Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/stages.html#auto-deploy).
+  - Fix the ports in `.gitlab/auto-deploy-values.yaml`. See [this troubleshooting entry](https://docs.gitlab.com/ee/topics/autodevops/troubleshooting.html#error-release--failed-timed-out-waiting-for-the-condition).
+  - Set the PostgreSQL `POSTGRES_USER` variable.
+  - Enter the database information during the initial installation.
+- Access the website.
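+
+For reference, the agent configuration that makes the agent pull the Ingress controller manifest might look like the following minimal sketch. It assumes the pull-based GitOps `manifest_projects` syntax of the agent configuration file; the agent name (`my-agent`), the manifest project path (`xyz/ecommerce-infra`), and the `manifests/` directory are hypothetical.
+
+```yaml
+# .gitlab/agents/my-agent/config.yaml (hypothetical agent name)
+gitops:
+  manifest_projects:
+    # Hypothetical project that stores the Kubernetes manifests,
+    # such as the Ingress controller manifest.
+    - id: xyz/ecommerce-infra
+      paths:
+        # The agent pulls every YAML file under manifests/ and applies it to the cluster.
+        - glob: '/manifests/**/*.yaml'
+```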
+
+## Chapter: Keep shipping a new feature
+
+The website was successfully launched. To expand the active user base,
+our team decided to add a new feature every week.
+To keep the production environment up to date, I have to repeat the manual deployment process
+every week. However, this approach is not sustainable and is prone to human error.
+
+So I decided to set up a continuous deployment pipeline to
+automate the deployment process.
+
+### How I accomplished this task
+
+- Customize `.gitlab-ci.yml` to control the deployment job.
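+
+A minimal sketch of such a `.gitlab-ci.yml`, assuming the deployment jobs come from the Auto DevOps template and reach the cluster through the agent. The `KUBE_CONTEXT` variable selects the agent that the deployment jobs use; the agent path `xyz/ecommerce-infra:my-agent` is hypothetical.
+
+```yaml
+include:
+  # Reuse the Auto DevOps pipeline, which includes the Auto Deploy jobs.
+  - template: Auto-DevOps.gitlab-ci.yml
+
+variables:
+  # Hypothetical agent path, in the form <agent-project-path>:<agent-name>.
+  KUBE_CONTEXT: xyz/ecommerce-infra:my-agent
+```
+
+With the template's defaults, a pipeline on the default branch should deploy to production without a manual step, so each merged feature ships on its own.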
+
+## Chapter: Monitor the infrastructure
+
+As the website grows, more users visit it, which means
+the server needs more computing resources.
+I have to monitor the load on the production cluster
+to make sure that the server is not overwhelmed by incoming requests.
+I also have to set up alerts that notify me when the error rate of server responses rises,
+so that I can quickly start investigating and mitigating the incident.
+
+### How I accomplished this task
+
+- Set up a Prometheus instance in the cluster.
+  - Add a manifest file that creates the Prometheus resource.
+  - The pull-based agent applies the configuration to the cluster.
+- Set up the [Prometheus integration](https://docs.gitlab.com/ee/user/project/integrations/prometheus.html).
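+
+A minimal sketch of the Prometheus manifest, assuming the Prometheus Operator and its custom resource definitions are already installed in the cluster, and that a `prometheus` service account with the required RBAC permissions exists. The resource name, namespace, and memory request are hypothetical.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: Prometheus
+metadata:
+  name: prometheus        # hypothetical name
+  namespace: monitoring   # hypothetical namespace
+spec:
+  # Service account Prometheus uses to discover scrape targets (assumed to exist).
+  serviceAccountName: prometheus
+  # An empty selector matches every ServiceMonitor; with no namespace selector,
+  # only ServiceMonitors in the same namespace are discovered.
+  serviceMonitorSelector: {}
+  resources:
+    requests:
+      memory: 400Mi
+```
+
+The operator exposes the instance through a `prometheus-operated` service on port 9090; the Prometheus integration needs an API URL that GitLab can actually reach.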
+
+## Chapter: Scale up the infrastructure
+
+One day, the product manager told me that
+website pages were taking a long time to render, which was frustrating end users.
+
+After some investigation, I realized that CPU usage on the server was saturated at 100%,
+so there were not enough resources to handle the large number of requests.
+So I decided to scale up the application by increasing the number of pods to resolve the performance issue.
+
+### How I accomplished this task
+
+- [Set `REPLICAS` to increase the number of pods](https://docs.gitlab.com/ee/topics/autodevops/customize.html)
+- Run a pipeline.
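+
+For example, the variable could be set in `.gitlab-ci.yml` as shown below; it can also be defined as a CI/CD variable in the project settings. The value `3` is illustrative, and an environment-specific variable such as `PRODUCTION_REPLICAS` scales only the production environment.
+
+```yaml
+variables:
+  # Number of application pods that Auto Deploy runs (defaults to 1).
+  REPLICAS: "3"
+```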
+
+## Chapter: Rollback when an incident happens
+
+One day I got an alert that all HTTP requests to the web server were resulting in errors.
+
+After some investigation, I realized that the latest version of the application had a critical bug.
+So I decided to roll back to the previous stable version to quickly mitigate the
+production incident.
+
+### How I accomplished this task
+
+- [Roll back a deployment](https://docs.gitlab.com/ee/ci/environments/#environment-rollback).
+  - Go to the production environment and find the last stable deployment.
+  - Click the rollback button for that deployment.
+
+## Chapter: Measure the deployment frequency
+
+One day, my manager asked me about the performance of our DevOps project,
+based on [the Four Keys](https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance).
+
+So I decided to measure our deployment frequency and share it with my manager.
+
+### How I accomplished this task
+
+- Check [DORA metrics](https://docs.gitlab.com/ee/user/analytics/ci_cd_analytics.html#devops-research-and-assessment-dora-key-metrics).