Commit 934102ee authored by Shinya Maeda

Macro UX initial experiences for Kubernetes Agents

parent d20b14cd
---
stage: Multiple stages
group: Macro UX
info: The page to describe the stories.
---
# Stories at GitLab
Welcome to the stories.
This documentation showcases how a person accomplishes their mission within GitLab.
Each story begins with a clear definition of the persona, their job role, and its responsibilities.
It then describes how the person actually accomplishes their tasks with GitLab features.
Unlike other feature-centric documentation, stories focus exclusively on the user journey,
so a single story can use features from several DevOps stages.
Stories are also used to measure usability factors from a [macro perspective](https://about.gitlab.com/handbook/engineering/ux/product-design/#macro-ux).
Here are the stories:
- [Platform Engineer and Kubernetes Deployment](kubernetes_deployment.md)
---
stage: Multiple stages
group: Macro UX
info: The page to describe a story related to Platform Engineer and Kubernetes Deployment.
---
# Platform Engineer and Kubernetes Deployment
## Introduction
I am a Platform Engineer at XYZ Corporation.
Recently, my company launched a new project that runs an ecommerce website.
My team consists of one developer, one product manager, and one platform engineer,
and I'm in charge of deploying the web application to the production environment
in a scalable, safe, and easy-to-maintain fashion.
## Chapter: Launch the website
After a few weeks of the initial sprint, the very first GA version of the application
has been developed.
Now we have to run the application somewhere on a cloud service.
According to the development notes,
the application is compatible with container orchestration platforms,
so I decided to use a Kubernetes cluster on Google Cloud Platform.
First, I have to create and configure the cluster.
Once it's ready, I'll deploy the application image to the
cluster and make sure that the website is accessible to end users.
### How I accomplished this task
- Create a Kubernetes cluster on GKE.
- Set up the [Kubernetes Agent](https://docs.gitlab.com/ee/user/clusters/agent/install/index.html).
- Create a manifest for the Ingress Controller. The agent pulls and applies it.
- Get the external endpoint via the CLI: `kubectl get svc --all-namespaces`.
- Set up [Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/stages.html#auto-deploy).
- Fix the ports in `.gitlab/auto-deploy-values.yaml`, as described in [the troubleshooting guide](https://docs.gitlab.com/ee/topics/autodevops/troubleshooting.html#error-release--failed-timed-out-waiting-for-the-condition).
- Set the PostgreSQL `POSTGRES_USER` variable.
- Enter the database information during the initial installation.
- Access the website.
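
The port fix in the steps above can be sketched as follows. This is a minimal example, assuming the application container listens on a port other than the one Auto Deploy expects by default. The port number `80` is an illustrative assumption and does not come from the story.

```yaml
# .gitlab/auto-deploy-values.yaml
# Assumption: the application listens on port 80 (hypothetical value).
# Replace it with the port that your image actually exposes.
service:
  internalPort: 80   # port the container listens on
  externalPort: 80   # port the Kubernetes Service exposes
```

If the ports do not match what the container serves, the deployment's health checks are likely to fail and the release can time out, which is the error that the linked troubleshooting page covers.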
## Chapter: Keep shipping a new feature
The website was successfully launched. In order to expand the active user base,
our team decided to add a new feature every week.
To keep the production environment up-to-date, I have to repeat the manual deployment process
every week. However, this approach is not sustainable and is prone to human error.
So I decided to set up a Continuous Deployment pipeline to
automate the deployment process.
### How I accomplished this task
- Customize `.gitlab-ci.yml` to control the deployment job.
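
As a starting point for that customization, a `.gitlab-ci.yml` might look like the sketch below. It assumes the project reuses the Auto DevOps template, which is consistent with the Auto Deploy setup from the previous chapter. The story does not show the actual file, so treat this as an illustration rather than the configuration that was used.

```yaml
# .gitlab-ci.yml
# Assumption: the pipeline is built on the Auto DevOps template.
# The template's deploy jobs are then available to override in this file.
include:
  - template: Auto-DevOps.gitlab-ci.yml
```

With the template included, jobs defined in this file under the same name as a template job override the template's definition, which is one way to control how and when the deployment job runs.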
## Chapter: Monitor the infrastructure
As the website grows, more users visit it, which means
the server needs more compute resources.
I have to monitor the load on the production cluster
to make sure that the server is not overwhelmed by traffic.
I also have to set up an alert system that notifies me when the error rate of server responses rises,
so that I can quickly start investigating and mitigating the incident.
### How I accomplished this task
- Set up a Prometheus instance in the cluster.
- Add a manifest file to create the Prometheus resource.
- The pull-based agent applies the configuration to the cluster.
- Set up the [Prometheus Integration](https://docs.gitlab.com/ee/user/project/integrations/prometheus.html).
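
The manifest mentioned in the steps above could look like the following sketch. It assumes that the Prometheus Operator and its custom resource definitions are already installed in the cluster, and that a `monitoring` namespace and a `prometheus` service account exist. The story does not state how Prometheus was installed, so all of these names are assumptions.

```yaml
# Assumption: the Prometheus Operator CRDs are installed in the cluster.
# The namespace and service account names below are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus
  # An empty selector matches all ServiceMonitor objects
  # in the namespaces that this Prometheus instance watches.
  serviceMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
```

Once the file is committed to the manifest repository, the agent picks it up and applies it in the same way as the Ingress Controller manifest.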
## Chapter: Scale up the infrastructure
One day I got a message from the product manager saying that
website pages take a long time to render, which is frustrating end users.
After some investigation, I found that the server's CPU usage was saturated at 100%,
so there were not enough resources to handle the large number of requests.
So I decided to scale up the cluster nodes to resolve the performance issue.
### How I accomplished this task
- [Set `REPLICAS` to increase the number of pods](https://docs.gitlab.com/ee/topics/autodevops/customize.html)
- Run a pipeline.
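
A minimal sketch of the `REPLICAS` setting, assuming it is defined in `.gitlab-ci.yml` on top of the Auto DevOps template. The value `3` is illustrative only. The variable can also be set in the project's CI/CD settings instead of in the file.

```yaml
# .gitlab-ci.yml
# Assumption: the pipeline is built on the Auto DevOps template.
include:
  - template: Auto-DevOps.gitlab-ci.yml

variables:
  # Number of pods to run for the deployment (illustrative value).
  REPLICAS: "3"
```

The new replica count takes effect the next time the deployment job runs, which is why the last step is to run a pipeline.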
## Chapter: Rollback when an incident happens
One day I got an alert that all HTTP requests to the web server were resulting in errors.
After some investigation, I realized that the latest application codebase had a critical bug.
So I decided to roll back to the previous stable version in order to quickly mitigate the
production incident.
### How I accomplished this task
- [Roll back a deployment](https://docs.gitlab.com/ee/ci/environments/#environment-rollback).
- Visit the production environment and find the stable deployment.
- Click the rollback button on that deployment.
## Chapter: Measure the deployment frequency
One day my manager asked me about the performance of our DevOps project
based on [the four keys to measure](https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance).
So I decided to measure our deployment frequency and share it with my manager.
### How I accomplished this task
- Check [DORA metrics](https://docs.gitlab.com/ee/user/analytics/ci_cd_analytics.html#devops-research-and-assessment-dora-key-metrics).