Skip to content

Avoid bisect command to get stuck

gitlab-qa-bot requested to merge deal_with_stuck_bisect into master

Created by: benoittgt

Fix: https://github.com/rspec/rspec-core/issues/2637


From Palkan in benoittgt/rspec_repro_bisect_deadlock#1

First, I've tried to play with the number of specs which led to the interesting conclusion: the process hangs only at 1548+ specs.

 RSpec.describe "a bunch of nothing" do
   (0...(ENV.fetch('N', 3000).to_i)).each do |t|
     it { expect(t).to eq t }
   end
 end

Try to run with N=1547 and N=1548.

Seems suspicious, right?

Let's add pry-byebug to the equation (or Gemfile).

In order it to work we need to tweak our runner code a bit:

- $stdout = $stderr = @spec_output
+ # $stdout = $stderr = @spec_output

After a bit of puts debugging I localized the problem: @channel.send.

Channel#send calls IO#write here https://github.com/rspec/rspec-core/blob/7b6b9c3/lib/rspec/core/bisect/utilities.rb#L41:

def send(message)
  packet = Marshal.dump(message)
  @write_io.write("#{packet.bytesize}\n#{packet}")
end

Do you know, what is the packet.bytesize for N=1548? It's 65548. This number is very important: the pipe size is only 65536 on MacOS (see docs for IO#write_nonblock for more).

That makes @write_io.write hangs forever, because no one reads the buffer: we call Channel#receive only after Process.waitpid(pid), thus waiting for the write operation to complete.


A basic proposal is to use WNOHANG. From waitpid doc:

If WNOHANG was specified in options and there were no children in a waitable state, then waitid() returns 0 immediately (...)

To validate this proposal on OSX we run just before running bisect: lsof -n -P -r1 -c ruby | grep -e 'PIP' -e '===' -e 'COMMAND'

This will give us in loop the PIPE sizes of Ruby processes. Without our patch we see that quickly we hit 65536 bytes on two pipes, with the patch we keep pipes at the right size.

COMMAND PID    USER     FD   TYPE DEVICE                SIZE/OFF  NODE NAME
ruby    40134  benoit    3   PIPE 0xf3b025a6a6cd6005    16384     ->0xf3b025a6a6cd5045
ruby    40134  benoit    4   PIPE 0xf3b025a6a6cd5045    16384     ->0xf3b025a6a6cd6005
ruby    40134  benoit    5   PIPE 0xf3b025a6a6cd7805    16384     ->0xf3b025a6a6cd7145
ruby    40134  benoit    7   PIPE 0xf3b025a6a6cd7145    16384     ->0xf3b025a6a6cd7805
ruby    40134  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40134  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5
ruby    40144  benoit    3   PIPE 0xf3b025a6a6cd5d05    16384     ->0xf3b025a6a6cd5c45
ruby    40144  benoit    4   PIPE 0xf3b025a6a6cd5c45    16384     ->0xf3b025a6a6cd5d05
ruby    40144  benoit    5   PIPE 0xf3b025a6a6cd7085    16384     ->0xf3b025a6a6cd6785
ruby    40144  benoit    7   PIPE 0xf3b025a6a6cd6785    16384     ->0xf3b025a6a6cd7085
ruby    40144  benoit   10   PIPE 0xf3b025a6a6cd6fc5    16384     ->0xf3b025a6a6cd5a05
ruby    40144  benoit   11   PIPE 0xf3b025a6a6cd5a05    16384     ->0xf3b025a6a6cd6fc5

Improvements: The bisect command request lot's of ram. The next step should be to reduce that consumption.

Thanks to Palkan, julik and Samuel Williams for the help. 🙇🏻‍♂️

Merge request reports