  • Developer

    test

  • Author Maintainer

    Benchmarks

    Ran benchmarks today. I don't remember it being this slow when I first ran it; perhaps it only seemed fast because it was so much faster than previous approaches.

    2.5m issues in 10m45s

    pry(main)> Benchmark.bm{|b| b.report('bulk_insert:'){Issue.bulk_insert{|w|([project_id: p.id, author_id: author.id, title: FFaker::Lorem.sentence]* 2500000).each{|i| w.add(i)}};nil} };nil
           user     system      total        real
    bulk_insert:
    312.370000 176.510000 488.880000 (645.096816)

    1.5m projects in 41m21s

    Benchmark.bm{|b| b.report('bulk_insert:'){Project.bulk_insert{|w| 1500000.times.map{|i|{name: n=FFaker::Product.product, path: n.parameterize("_"), creator_id: author.id, namespace_id: author.namespace_id}}.each{|p| w.add(p)}}}};nil
           user     system      total        real
    bulk_insert:
    930.670000 861.590000 1792.260000 (2481.286568)
  • Author Maintainer

    1.5m projects in 30m22s with columns specified

    This approach doesn't send the default values, so it might mean less data being sent to Postgres. The gem just generates INSERT statements in batches of 500 rows, so we could probably do better by increasing the batch size (a sketch follows the benchmark below).

    [97] pry(main)> Benchmark.bm do |b|
    [97] pry(main)*   b.report('bulk_insert:') do  
    [97] pry(main)*     Project.bulk_insert(:name, :path, :creator_id, :namespace_id){|w| 1500000.times.map{|i|{name: n=FFaker::Product.product, path: n.parameterize("_"), creator_id: author.id, namespace_id: author.namespace_id}}.each{|p| w.add(p)}}    
    [97] pry(main)*   end    
    [97] pry(main)* end  
           user     system      total        real
    bulk_insert:
    502.490000 749.220000 1251.710000 (1822.020671)
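
    The gem also takes a set_size option, so a larger batch is cheap to try; a rough sketch, with 5_000 as an arbitrary value to experiment with rather than a measured optimum:

    Project.bulk_insert(:name, :path, :creator_id, :namespace_id, set_size: 5_000) do |w|
      1500000.times do
        n = FFaker::Product.product
        w.add(name: n, path: n.parameterize("_"), creator_id: author.id, namespace_id: author.namespace_id)
      end
    end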

    1.5m projects in 4m23s using generate_series

    Much better performance this way, and we could extend it to use default values, similar to https://github.com/jamis/bulk_insert/blob/master/lib/bulk_insert/worker.rb#L38-L44 (a hand-rolled version is sketched after the benchmark below).

    [136] pry(main)> Benchmark.bm do |b|
      b.report('raw_insert:') do  
        query = <<-ENDSQL    
        INSERT INTO projects (name, path, creator_id, namespace_id)
        SELECT
          md5(random()::text),
          md5(random()::text),
          #{author.id},
          #{author.namespace_id}
        FROM generate_series(1, 1500000) s(i);
        ENDSQL
        puts ActiveRecord::Base.connection.execute(query).values.join("\n");nil    
      end    
    end  
           user     system      total        real
    raw_insert:
      0.000000   0.010000   0.010000 (263.568029)
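
    Before reaching for a gem, the defaults could also be inlined by hand in the same raw-SQL style; a minimal sketch, assuming created_at/updated_at are the defaults that matter here, with now() standing in for the Rails-side timestamps:

    query = <<-ENDSQL
    INSERT INTO projects (name, path, creator_id, namespace_id, created_at, updated_at)
    SELECT
      md5(random()::text),
      md5(random()::text),
      #{author.id},
      #{author.namespace_id},
      now(),
      now()
    FROM generate_series(1, 1500000) s(i);
    ENDSQL
    ActiveRecord::Base.connection.execute(query)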
  • Author Maintainer

    1.5m projects in 5m33s using generate_series and defaults

    Using the active_record-pg_generate_series gem to include default values for each column. Without the default values the UI would be broken, since values like created_at are used throughout.

    [6] pry(main)> Benchmark.bm do |b|
      b.report('generate_series:') do
        Project.insert_using_generate_series(1, 1500000) do |sql|
          sql.name = raw("md5(random()::text)")
          sql.path = raw("md5(random()::text)")
          sql.creator_id = author.id
          sql.namespace_id = author.namespace_id
        end
      end
    end
           user     system      total        real
    generate_series:  0.000000   0.000000   0.000000 (332.865800)

    1.5m projects in 2m7s by replacing md5 with number sequence

    This version saves time by naming projects seed_project_<seq> instead of calculating millions of md5 hashes. It still doesn't create associated records such as project_features, so the projects aren't very useful (a sketch for backfilling them follows the benchmark below), but it's a good starting point for slowing things down.

    [40] pry(main)> Benchmark.bm do |b|
      b.report('generate_series:') do
        Project.insert_using_generate_series(1, 1500000) do |sql|
          project_name = raw("'seed_project_' || seq")
          sql.name = project_name
          sql.path = project_name
          sql.creator_id = author.id
          sql.namespace_id = author.namespace_id
        end
      end
    end
           user     system      total        real
    generate_series:  0.000000   0.000000   0.000000 (127.377769)
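
    The missing project_features rows could probably be backfilled with the same trick; a hypothetical sketch (the column list is assumed, with the access-level columns left to their database defaults):

    query = <<-ENDSQL
    INSERT INTO project_features (project_id, created_at, updated_at)
    SELECT id, now(), now()
    FROM projects
    WHERE name LIKE 'seed_project_%';
    ENDSQL
    ActiveRecord::Base.connection.execute(query)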
  • Developer

    Rake task used for testing Related Issues feature:

    desc "Bulk insert"
    task bulk_insert: :environment do
      # require 'bulk_insert' or manually paste code from Gem first
      # After doing the following my queries ran at 1/10th of the speed of staging, instead of 1/5000th of the speed.
    
      # Disable database insertion logs so speed isn't limited by ability to print to console
      old_logger = ActiveRecord::Base.logger
      ActiveRecord::Base.logger = nil
    
      # Create lots of issues
      user = FactoryGirl.create(:user, username: "user-#{rand}", email: "foo-#{rand}@example.com")
      namespace = FactoryGirl.create(:namespace, owner: user, route: nil, name: "hi-#{rand}")
      p = FactoryGirl.create(:empty_project, creator: user, namespace: namespace)
      Issue.bulk_insert { |w| ([{ project_id: p.id, author_id: user.id, title: FFaker::Lorem.sentence }] * 2500000).each { |i| w.add(i) } }
    
      # Create lots of projects
      Project.bulk_insert { |w| 1500000.times.map { { name: n = FFaker::Product.product, path: n.parameterize("_"), creator_id: user.id, namespace_id: user.namespace_id } }.each { |row| w.add(row) } }
    
      # Force a different/slower query plan by updating project visibility
      Project.where(visibility_level: Gitlab::VisibilityLevel::PRIVATE).limit(200000).update_all(visibility_level: Gitlab::VisibilityLevel::PUBLIC)
      Project.where(visibility_level: Gitlab::VisibilityLevel::PRIVATE).limit(20000).update_all(visibility_level: Gitlab::VisibilityLevel::INTERNAL)
    
      # Link each of the first 100,000 issues to the six issues that follow it
      IssueLink.bulk_insert do |w|
        first_issues = Issue.all.take(100_000)
        first_issues.each_with_index do |issue, i|
          relations = ((i + 1)..(i + 6)).to_a.map do |x|
            target_issue = first_issues[x]

            { source_id: issue.id, target_id: target_issue.id } if target_issue
          end.compact

          relations.each { |r| w.add(r) }
        end
      end
    
      # Reset logging
      ActiveRecord::Base.logger = old_logger
    end
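
    Assuming the task file lives under lib/tasks, it can then be run with bundle exec rake bulk_insert.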
  • Developer

    Great to know @rymai, thanks!
