Commit a32581ed authored by Ahmad Sherif

Add an alert for Sidekiq stats no longer showing

Closes #6
parent e49a6ce1
1 merge request: !140 Add an alert for Sidekiq stats no longer showing
@@ -17,6 +17,7 @@ The aim of this project is to have a quick guide of what to do when an emergency
* [Kibana is down](troubleshooting/kibana_is_down.md)
* [SSL certificate expires](troubleshooting/ssl_cert.md)
* [GitLab registry is down](troubleshooting/gitlab-registry.md)
* [Sidekiq stats no longer showing](troubleshooting/sidekiq_stats_no_longer_showing.md)
 
### Replication fails
 
ALERT sidekiq_stats_are_not_scraped
IF absent(sidekiq_queue_size)
FOR 1m
LABELS {severity="warning"}
ANNOTATIONS {
title="Sidekiq stats failed to be scraped for the last minute",
runbook="troubleshooting/sidekiq_stats_no_longer_showing.md"
}
Sidekiq stats are collected by [gitlab-monitor](https://gitlab.com/gitlab-org/gitlab-monitor/blob/fdad76bdff3698111744c4bfbc129c57d99355b7/lib/gitlab_monitor/sidekiq.rb) by talking to Redis, and scraped by Prometheus.
If you see no stats in the [Sidekiq dashboard](http://performance.gitlab.net/dashboard/db/sidekiq-stats), something could be wrong with one of these three components: gitlab-monitor, Redis, or Prometheus.
## Symptoms
* A warning message in prometheus-alerts
* Total Running Jobs or Running Jobs panels are showing flat lines
## 1. Identify the Redis master
On any of the Redis nodes, run:
```
/opt/gitlab/embedded/bin/redis-cli -p 26379
```
Then type this in the Redis console:
```
sentinel master gitlab-redis
```
You should see output like this:
```
1) "name"
2) "gitlab-redis"
3) "ip"
4) "10.90.80.70"
5) "port"
6) "6379"
```
The master is the node with the private IP `10.90.80.70`. To find the actual node, run the following on your machine:
```
knife ssh -aipaddress "role:gitlab-cluster-redis*" "ifconfig | grep '10.90.80.70'"
```
The first column of the output is the IP you should SSH to.
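
For scripting step 1, the `ip` field can be pulled out of the Sentinel reply without reading it by eye. The following is a minimal sketch, not part of the original runbook: `sentinel_master_ip` is a hypothetical helper name, and it assumes the key/value layout shown in the sample output above.

```shell
# Hypothetical helper: print the value that follows the "ip" key in
# `sentinel master` output read from stdin.
# Accepts both the interactive layout (3) "ip") and the raw layout (ip).
sentinel_master_ip() {
  sed -e 's/^ *[0-9]*) *//' -e 's/"//g' |
    awk 'prev == "ip" { print; exit } { prev = $0 }'
}

# Usage (on any Redis node):
#   /opt/gitlab/embedded/bin/redis-cli -p 26379 sentinel master gitlab-redis | sentinel_master_ip
```

Sentinel also provides `sentinel get-master-addr-by-name gitlab-redis`, which returns only the master's IP and port.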
## 2. Verify gitlab-monitor service is running
On the master node, run:
```
sudo sv status gitlab-monitor
```
Ideally, it returns something like this:
```
run: gitlab-monitor: (pid 1271) 19889s; run: log: (pid 1267) 19889s
```
If not, start the service:
```
sudo sv start gitlab-monitor
```
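
The check and restart can be combined into one step. This is a sketch under assumptions: `sv_is_up` is a hypothetical helper, and it relies on the status line starting with `run:` for a running service, as in the sample output above.

```shell
# Hypothetical helper: succeeds when the given `sv status` line reports
# a running service (the line starts with "run:").
sv_is_up() {
  case "$1" in
    run:*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Usage (on the master node):
#   sv_is_up "$(sudo sv status gitlab-monitor)" || sudo sv start gitlab-monitor
```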
## 3. Verify gitlab-monitor is collecting metrics
On the master node, run:
```
curl http://localhost:4567/sidekiq
```
It should return something like:
```
sidekiq_queue_size{name="system_hook"} 0
sidekiq_queue_size{name="update_merge_requests"} 0
sidekiq_queue_latency{name="admin_emails"} 0
sidekiq_queue_latency{name="archive_repo"} 0
<snip>
```
If it returns Ruby errors, open an issue in the gitlab-monitor project.
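
The alert fires on `absent(sidekiq_queue_size)`, so the useful question is whether the endpoint exposes at least one `sidekiq_queue_size` sample. A minimal sketch of that check follows; `has_metric` is a hypothetical helper, and it assumes the text output format shown above.

```shell
# Hypothetical helper: succeeds if stdin contains at least one sample
# of the metric named in $1 (the name followed by "{" or a space).
has_metric() {
  grep -q "^$1[{ ]"
}

# Usage (on the master node):
#   curl -s http://localhost:4567/sidekiq | has_metric sidekiq_queue_size \
#     || echo "gitlab-monitor is not exposing sidekiq_queue_size"
```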
## 4. Verify Prometheus is scraping the master node
Log in to the Prometheus node and run:
```
less /opt/prometheus/prometheus/inventory/gitlab-monitor-redis.yml
```
It should have an entry for each of our Redis nodes. Make sure it has an entry
with the private IP you obtained in step 1. If not, make sure the `prometheus-server` Chef role
is configured to scrape both the `gitlab-cluster-redis-master` and `gitlab-cluster-redis-slave` roles (more on that in the
[gitlab-prometheus README](https://gitlab.com/gitlab-cookbooks/gitlab-prometheus)), then run:
```
sudo chef-client
```
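
Instead of paging through the inventory file, the entry can be checked directly. This sketch assumes the master's private IP appears literally in the file; `inventory_has_ip` is a hypothetical helper, and the file's exact layout is not specified in this runbook.

```shell
# Hypothetical helper: succeeds if file $1 contains IP $2 as a whole word,
# so that 10.90.80.7 does not match 10.90.80.70.
inventory_has_ip() {
  grep -qFw -- "$2" "$1"
}

# Usage (on the Prometheus node, with the IP from step 1):
#   inventory_has_ip /opt/prometheus/prometheus/inventory/gitlab-monitor-redis.yml 10.90.80.70 \
#     || echo "master node is missing from the inventory"
```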