Runit restart timeout is too short for prometheus
While deploying a new prometheus setting, chef-client reported an error when restarting the service:
- execute the ruby block wait for prometheus service socket
================================================================================
Error executing action `run` on resource 'ruby_block[restart_service]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/sv restart /etc/service/prometheus ----
STDOUT: timeout: run: /etc/service/prometheus: (pid 47336) 65347s, got TERM
STDERR:
---- End output of /usr/bin/sv restart /etc/service/prometheus ----
Ran /usr/bin/sv restart /etc/service/prometheus returned 1
Cookbook Trace:
---------------
/var/chef/cache/cookbooks/runit/libraries/helpers.rb:162:in `safe_sv_shellout'
/var/chef/cache/cookbooks/runit/libraries/helpers.rb:194:in `restart_service'
/var/chef/cache/cookbooks/runit/libraries/provider_runit_service.rb:56:in `block (3 levels) in <class:RunitService>'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:78:in `run_action'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:136:in `run_delayed_notification'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:124:in `block in run_delayed_notifications'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:123:in `each'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:123:in `run_delayed_notifications'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:113:in `converge'
/var/chef/cache/cookbooks/runit/libraries/provider_runit_service.rb:282:in `block in <class:RunitService>'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:78:in `run_action'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:106:in `block (2 levels) in converge'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:106:in `each'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:106:in `block in converge'
/var/chef/cache/cookbooks/compat_resource/files/lib/chef_compat/monkeypatches/chef/runner.rb:105:in `converge'
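This probably happened because prometheus needed a few seconds to free all its allocated memory and had not exited when sv gave up waiting (sv waits 7 seconds by default, and the command above does not pass -w). The service did get restarted and is running fine with the new setting.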
We should increase the restart timeout for prometheus.
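
To make the proposal concrete, here is a rough sketch of what a longer timeout looks like at the sv level. The helper name `sv_restart_command` and the 30-second value are assumptions for illustration only, not code from the runit cookbook; the only thing it relies on is sv's documented `-w sec` option. If I remember correctly, the cookbook's runit_service resource also has an `sv_timeout` property that does the same thing, but that should be checked against the cookbook version we run.

```ruby
# Hypothetical helper, not part of the runit cookbook: builds an `sv restart`
# command that waits longer than sv's default of 7 seconds.
SV_BIN = '/usr/bin/sv'

def sv_restart_command(service_dir, timeout_seconds = 30)
  timeout = Integer(timeout_seconds)
  raise ArgumentError, 'timeout must be positive' unless timeout.positive?

  # -w overrides how long sv waits for the restart to take effect.
  [SV_BIN, '-w', timeout.to_s, 'restart', service_dir]
end

# Only prints the command; it does not run sv.
puts sv_restart_command('/etc/service/prometheus').join(' ')
```

Running the resulting command by hand on an affected host would be a quick way to confirm that a longer wait is enough before changing the cookbook.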