Commit 0a8b45f6 authored by Alejandro Rodríguez

Add recording and alert rules per Gitaly gRPC method

parent 439f0a1d
 gitaly:grpc_server_handled_total:error_max_rate1m = max(gitaly:grpc_server_handled_total:error_rate1m)
+gitaly:grpc_server_handled_total:instance_error_max_rate1m = max(gitaly:grpc_server_handled_total:instance_error_rate1m)
 
-## Gitaly error rate
+## Gitaly error rate per method
 ALERT gitaly_error_rate_too_high
 IF gitaly:grpc_server_handled_total:error_max_rate1m > 5
 FOR 5m
 LABELS {severity="critical", channel="gitaly"}
 ANNOTATIONS {
 title="Gitaly error rate is too high: {{$value | printf \"%.2f\" }}",
-description="Gitaly error rate for the last 20 minutes is over 5. Check Gitaly logs and consider disabling it.",
+description="Gitaly error rate for the last 20 minutes is over 5 for {{$labels.grpc_method}}. Check Gitaly logs and consider disabling that method.",
 runbook="troubleshooting/gitaly_error_rate.md"
 }
+## Gitaly error rate per instance
+ALERT gitaly_error_rate_too_high
+IF gitaly:grpc_server_handled_total:instance_error_max_rate1m > 5
+FOR 5m
+LABELS {severity="critical", channel="gitaly"}
+ANNOTATIONS {
+title="Gitaly error rate is too high: {{$value | printf \"%.2f\" }}",
+description="Gitaly error rate for the last 20 minutes is over 5 on {{$labels.instance}}. Check Gitaly logs and consider disabling it on that host.",
+runbook="troubleshooting/gitaly_error_rate.md"
+}
@@ -13,4 +13,5 @@ cmd:redis_command_call_duration_seconds_count:irate1m = sum(irate(redis_command_
 
 # GRPC calls handled by Gitaly
 gitaly:grpc_server_handled_total:rate1m = sum(rate(grpc_server_handled_total[1m])) by (job, grpc_method)
-gitaly:grpc_server_handled_total:error_rate1m = sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) by (job, instance)
+gitaly:grpc_server_handled_total:error_rate1m = sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) by (job, grpc_method)
+gitaly:grpc_server_handled_total:instance_error_rate1m = sum(rate(grpc_server_handled_total{grpc_code!="OK"}[1m])) by (job, instance)
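
Note on the aggregated rules: in PromQL, an aggregation without a `by` clause drops every label, so `max(gitaly:grpc_server_handled_total:error_rate1m)` yields a single unlabeled series, and the `{{$labels.grpc_method}}` and `{{$labels.instance}}` templates in the alert descriptions would likely render empty. The fragment below is a minimal sketch of one way to keep those labels, assuming the same Prometheus 1.x rule syntax used in this commit. The `by` clauses are an illustration only and are not part of the commit.

```
# Hypothetical variant (not in the commit): keep the label that each
# alert description interpolates, so the max is taken per method / per instance.
gitaly:grpc_server_handled_total:error_max_rate1m = max(gitaly:grpc_server_handled_total:error_rate1m) by (grpc_method)
gitaly:grpc_server_handled_total:instance_error_max_rate1m = max(gitaly:grpc_server_handled_total:instance_error_rate1m) by (instance)
```

With definitions like these, each alert would evaluate once per method or per instance instead of once overall, so several alerts could fire at the same time.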