WIP: add backup monitoring
```diff
 module GitLab
   module Monitor
     # Probes a process for info then writes metrics to a target
     class BackupProber
+      Aws.config.update({
+        region: 'eu-central-1',
+        credentials: Aws::Credentials.new('AKIAI56UH4XKC4TMFL4A', 'DoV6g31mU11LAokd6yEyj2ta/6ujOhj/hObqL3TZ')
+      })
```

Pull this out please @maratkalibek
Read a YAML file or use environment variables.
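For instance, the environment-variable option could look roughly like this (the variable names are illustrative, not part of this MR):

```ruby
# Sketch: load the credentials from the environment instead of the source.
# ENV.fetch raises if a variable is missing, so misconfiguration fails fast.
Aws.config.update(
  region: ENV.fetch("GITLAB_BACKUP_AWS_REGION"),
  credentials: Aws::Credentials.new(
    ENV.fetch("GITLAB_BACKUP_AWS_KEY"),
    ENV.fetch("GITLAB_BACKUP_AWS_SECRET")
  )
)
```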
```diff
   @metrics = metrics
+  @wal_bucket = options[:wal_bucket]
+  @opts = options
 end

-def probe_backup_size
-  @metrics.add("gitlab_backup_size", 124)
-  self
-end
+# Assumed that with the specified prefixes, there are json files in the directory with the first 57 chars as the corresponding backup directory name
+def probe_backup_info
+  region = @opts[:backup_region]
+  key = @opts[:backup_key]
+  password = @opts[:backup_password]
+  bucket = @opts[:backup_bucket]
+  prefix = @opts[:backup_prefix]
```

@maratkalibek Make this a configuration object?
Use `.fetch` instead of `[]` - `.fetch` will raise an exception when the key is missing, while `[]` will silently return nil.
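For example:

```ruby
opts = { backup_region: "eu-central-1" }

opts[:backup_bucket]        # => nil, and the failure only surfaces much later
opts.fetch(:backup_bucket)  # => raises KeyError: key not found: :backup_bucket
```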
@maratkalibek You could extract these vars to the initializer; by loading them here you are failing way too late.
```diff
-      @metrics.add("gitlab_backup_size", 124)
-      self
-    end
+    # Assumed that with the specified prefixes, there are json files in the directory with the first 57 chars as the corresponding backup directory name
+    def probe_backup_info
+      region = @opts[:backup_region]
+      key = @opts[:backup_key]
+      password = @opts[:backup_password]
+      bucket = @opts[:backup_bucket]
+      prefix = @opts[:backup_prefix]
+
+      credentials = Aws::Credentials.new(key, password)
+      s3 = Aws::S3::Resource.new(region: region, credentials: credentials)
+
+      # reference an existing bucket by name
+      bucket = s3.bucket(bucket)
```

@maratkalibek don't reuse vars; you are changing the meaning of `bucket` in the middle of the method.
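For instance (the names are just suggestions):

```ruby
# Keep the option value and the bucket object in separate variables
bucket_name = @opts.fetch(:backup_bucket)
bucket = s3.bucket(bucket_name)
```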
```ruby
      # reference an existing bucket by name
      bucket = s3.bucket(bucket)

      obj = bucket.objects(prefix: prefix)

      # enumerate every object in a bucket
      last_backup_dir = nil
      last_backup_time = nil
      dir_name = nil
      bucket.objects(prefix: prefix).each do |obj|
        if obj.key.end_with?('.json')
          if last_backup_time == nil || last_backup_time < obj.last_modified
            last_backup_time = obj.last_modified
            dir_name = "#{obj.key[0..57]}/"
          end
        end
```

More Ruby-like: use objects
```ruby
Backup = Struct.new(:dir, :time) do
  include Comparable # needed so `<` works via `<=>`

  def <=>(other)
    time <=> other.time
  end
end

...

next unless obj.key.end_with?('.json')

possible_backup = Backup.new("#{obj.key[0..57]}/", obj.last_modified)
last_backup ||= possible_backup
last_backup = possible_backup if last_backup < possible_backup
```
@maratkalibek `probe_backup_info` and `probe_wal_info` are the same thing with small differences. Extract what is exactly the same so you only implement what's different. It looks like pure noise.

@pcarranza I can make them mostly similar, but actually they work in different ways:
- WAL - for this one I just get the last modified date and the total size of the bucket directory.
- Backup - for this one, I first look at the last modified date of the json file (which is not itself in the directory I need). Only then do I calculate the size of the directory derived from it. But I can reuse the method from the WAL version for this part.
These are time-consuming operations, so I tried not to do extra time-consuming calculations in a single method and to gather only the info that is needed.
```ruby
      s3 = Aws::S3::Resource.new(region: region, credentials: credentials)

      # reference an existing bucket by name
      bucket = s3.bucket(bucket)

      obj = bucket.objects(prefix: prefix)

      # enumerate every object in a bucket
      size = 0
      last_backup_time = nil
      bucket.objects(prefix: prefix).each do |obj|
        if last_backup_time == nil || last_backup_time < obj.last_modified
          last_backup_time = obj.last_modified
        end
        size += obj.content_length
      end
```

@maratkalibek Connascence of algorithm => you are implementing the same algorithm twice.
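One way to address the duplication, as a sketch only (`scan_bucket` is a hypothetical helper, not code from this MR): extract the common loop once, and let each probe keep only what differs.

```ruby
# Hypothetical shared helper: walks one prefix, returns the total size and
# the newest last_modified, optionally yielding each object for extra work.
def scan_bucket(bucket, prefix)
  size = 0
  last_modified = nil
  bucket.objects(prefix: prefix).each do |obj|
    last_modified = obj.last_modified if last_modified.nil? || last_modified < obj.last_modified
    size += obj.content_length
    yield obj if block_given?
  end
  [size, last_modified]
end
```

`probe_wal_info` could then use the return values directly, while `probe_backup_info` passes a block that watches only the `.json` keys.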
```diff
+      size += entry.content_length
+    end
+
+    @metrics.add('gitlab_backup_size', size)
+    @metrics.add('gitlab_backup_time', last_backup_time.to_i)
   end

   def write_to(target)
     target.write(@metrics.to_s)

+    # Assumed that with the specified prefixes, there are json files in the directory with the first 57 chars as the corresponding backup directory name
+    def probe_wal_info
+      region = @opts[:wal_region]
+      key = @opts[:wal_key]
+      password = @opts[:wal_password]
+      bucket = @opts[:wal_bucket]
+      prefix = @opts[:wal_prefix]
```

@maratkalibek This is being implemented already somewhere else. Extract this to an object.
Think about it this way:

```
echo $vars | s/(wal|backup)//
```

=> this leaves us with

```ruby
region = @opts[:region]
key = @opts[:key]
password = @opts[:password]
bucket = @opts[:bucket]
prefix = @opts[:prefix]
```
so they are the same, but you are splitting at the var level, and you may need to split at the object level as you suggested.
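A minimal sketch of what splitting at the object level could look like; `BucketConfig` and `from_options` are hypothetical names, not part of this MR:

```ruby
# Hypothetical illustration of the "split at the object level" suggestion
BucketConfig = Struct.new(:region, :key, :password, :bucket, :prefix) do
  # Builds a config from the flat options hash, e.g. kind = :wal or :backup.
  # .fetch raises on missing keys instead of silently returning nil.
  def self.from_options(opts, kind)
    new(*members.map { |field| opts.fetch(:"#{kind}_#{field}") })
  end
end

wal_config    = BucketConfig.from_options(@opts, :wal)
backup_config = BucketConfig.from_options(@opts, :backup)
```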
mentioned in issue gitlab-com/infrastructure#1199
added 2 commits
@pcarranza please review
- lib/gitlab_monitor/backup.rb 0 → 100644
```ruby
    end
  end

  # Probes a backup bucket/path for info then writes metrics to a target
  class BackupProber < S3Prober
    def initialize(options, metrics: PrometheusMetrics.new)
      @metrics = metrics

      @region = options[:aws_region]
      @key = options[:aws_key]
      @secret = options[:aws_secret]
      @bucket_name = options[:bucket]
      @prefix = options[:prefix]

      @s3 = obtain_s3(@key, @secret, @region)
    end
```

@maratkalibek Both initializers are the same. Also, please use composition instead of inheritance whenever possible. S3 is a concept on its own, so just have an S3 builder or something like that to which you send the options and it returns the S3 client object.
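A rough sketch of that direction, with hypothetical names (`S3Builder`, the `s3:` keyword) and the same option keys as above; not the final design:

```ruby
require "aws-sdk"

module GitLab
  module Monitor
    # Builds S3 resource objects from an options hash
    class S3Builder
      def self.build(options)
        credentials = Aws::Credentials.new(options.fetch(:aws_key), options.fetch(:aws_secret))
        Aws::S3::Resource.new(region: options.fetch(:aws_region), credentials: credentials)
      end
    end

    # Probes a backup bucket/path for info then writes metrics to a target
    class BackupProber
      def initialize(options, metrics: PrometheusMetrics.new, s3: S3Builder.build(options))
        @metrics = metrics
        @s3 = s3 # injected, so specs can hand in a double
        @bucket_name = options.fetch(:bucket)
        @prefix = options.fetch(:prefix)
      end
    end
  end
end
```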
- lib/gitlab_monitor/backup.rb 0 → 100644
```ruby
      wal = Wal.new(0)
      size = 0

      bucket = @s3.bucket(@bucket_name)
      bucket.objects(prefix: @prefix).each do |obj|
        size += obj.content_length
        current_wal = Wal.new(obj.last_modified.to_i)
        next unless wal < current_wal
        wal = current_wal
      end

      @metrics.add("gitlab_wal_size", size)
      @metrics.add("gitlab_wal_time", wal.time)
    end
  end
```

@maratkalibek isn't this approach going to take a lot of time as it cycles through all the files? Shouldn't we be using the CloudWatch option?
- lib/gitlab_monitor/backup.rb 0 → 100644
```ruby
    end

    def probe_wal_info
      # current_backup = Backup.new(obj.last_modified.to_i, "#{obj.key[0..57]}/")
      # next unless backup < current_backup

      wal = Wal.new(0)
      size = 0

      bucket = @s3.bucket(@bucket_name)
      bucket.objects(prefix: @prefix).each do |obj|
        size += obj.content_length
        current_wal = Wal.new(obj.last_modified.to_i)
        next unless wal < current_wal
        wal = current_wal
      end
```

@maratkalibek This looks way more complicated than it should be.
You just need something that accumulates and keeps the last modified date; please isolate all this logic in the WAL object, as it is not the responsibility of the prober to do it all here.
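One possible shape for that accumulator, purely as an illustration (this `Wal` is a sketch, not the MR's class):

```ruby
# Illustrative: the Wal object owns the "keep the newest timestamp, sum the
# sizes" logic, so the prober just feeds it objects and reports the totals.
class Wal
  attr_reader :time, :size

  def initialize
    @time = 0
    @size = 0
  end

  # Accepts anything that responds to #content_length and #last_modified
  def add(object)
    @size += object.content_length
    timestamp = object.last_modified.to_i
    @time = timestamp if timestamp > @time
    self
  end
end
```

`probe_wal_info` then reduces to `bucket.objects(prefix: @prefix).each { |obj| wal.add(obj) }` plus the two metric writes.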
- spec/backup_spec.rb 0 → 100644
1 require "spec_helper" 2 require "gitlab_monitor/backup" 3 require "aws-sdk" 4 5 describe GitLab::Monitor do 6 describe GitLab::Monitor::BackupProber do 7 let(:writer) { StringIO.new } 8 9 let(:options) { { aws_key: "key", aws_secret: "secret", aws_region: "region", bucket: "bucket", prefix: "prefix" } } 10 let(:s3) { double(Aws::S3::Client.new) } 11 let(:bucket) { double(Aws::S3::Bucket) } 12 let(:objects) { double(Aws::Resources::Collection) } 13 let(:files) { double(Aws::Resources::Collection) } 14 let(:object) { double(Aws::S3::ObjectSummary) } 15 let(:file1) { double(Aws::S3::ObjectSummary) } 16 let(:file2) { double(Aws::S3::ObjectSummary) } @maratkalibek This is a lot of doubles and mocking, can we inject dependencies instead of stubbing class methods?
- spec/backup_spec.rb 0 → 100644
```ruby
      allow(file1).to receive(:content_length).and_return(100)
      allow(file2).to receive(:last_modified).and_return(Time.at(123_123_122))
      allow(file2).to receive(:content_length).and_return(200)
    end

    it "backup prober responds with valid data" do
      prober = GitLab::Monitor::WalProber.new(options)
      prober.probe_wal_info

      prober.write_to(writer)

      expect(writer.string).to match(/gitlab_wal_size 300/)
      expect(writer.string).to match(/gitlab_wal_time 123123123/)
    end
  end
end
```

@maratkalibek tests are failing and old conversations are not resolved. Please fix all that before further reviewing.
@maratkalibek why was this closed?
@ernstvn this one was closed because we tried a different approach, but it's not responsive enough.
@maratkalibek could you point to the right MR?
Yes, as @pcarranza mentioned, I installed CloudWatch, but it was not OK for us because it returns metrics only once every 24h.
Even the method I'm currently trying here is not optimized for WAL backups, since it iterates over 80-90k files to calculate their total size, which results in a 5-6 minute run. And that is for less than one month of data; the size and time needed for WAL backups will increase linearly over time.
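For reference, a sketch of the CloudWatch option that was ruled out (the bucket name is illustrative); S3 storage metrics such as `BucketSizeBytes` are only published daily, which is the 24h limitation described:

```ruby
require "aws-sdk"

# Sketch: querying the daily AWS/S3 storage metric from CloudWatch.
cloudwatch = Aws::CloudWatch::Client.new(region: "eu-central-1")
resp = cloudwatch.get_metric_statistics(
  namespace: "AWS/S3",
  metric_name: "BucketSizeBytes",
  dimensions: [
    { name: "BucketName", value: "my-backup-bucket" }, # illustrative name
    { name: "StorageType", value: "StandardStorage" }
  ],
  start_time: Time.now - 2 * 24 * 3600,
  end_time: Time.now,
  period: 24 * 3600, # one datapoint per day is all S3 publishes
  statistics: ["Average"]
)
resp.datapoints.each { |dp| puts "#{dp.timestamp}: #{dp.average} bytes" }
```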