
Set timeouts for network.client

Tomasz Maczukin requested to merge fix/network-client-timeout-settings into master

What does this MR do?

Add timeout definitions for network.client's transport.

Why was this MR needed?

In some edge cases, jobs could hang in the finished state:

[Screenshot: ci-jobs-hanging-on-finished]

After some digging, we found that this goroutine appears to be responsible for the hanging jobs:

64 @ 0x42d88a 0x43d025 0x43bdbc 0x63f75b 0x635b6a 0x5ebe2f 0x5eb842 0x5eb9e2 0x69b2ff 0x69b6c7 0x69d55f 0x6a1933 0x6a67e4 0x6a7852 0x45e8c1
#    0x63f75a    net/http.(*persistConn).roundTrip+0x93a                                /usr/local/go/src/net/http/transport.go:1840
#    0x635b69    net/http.(*Transport).RoundTrip+0x4f9                                /usr/local/go/src/net/http/transport.go:380
#    0x5ebe2e    net/http.send+0x15e                                        /usr/local/go/src/net/http/client.go:256
#    0x5eb841    net/http.(*Client).send+0x101                                    /usr/local/go/src/net/http/client.go:146
#    0x5eb9e1    net/http.(*Client).Do+0xa1                                    /usr/local/go/src/net/http/client.go:189
#    0x69b2fe    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*client).doBackoffRequest+0x4e        /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/client.go:222
#    0x69b6c6    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*client).do+0x1b6                /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/client.go:261
#    0x69d55e    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*GitLabClient).doRaw+0xde            /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/gitlab.go:86
#    0x6a1932    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*GitLabClient).PatchTrace+0x492        /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/gitlab.go:294
#    0x6a67e3    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*clientJobTrace).incrementalUpdate+0x323    /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/trace.go:206
#    0x6a7851    gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*clientJobTrace).watch+0xd1        /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/trace.go:278

The request hangs at https://github.com/golang/go/blob/release-branch.go1.7/src/net/http/transport.go#L1840 (inside persistConn.roundTrip), while the transport is defined at https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/v9.3.0/network/client.go#L154 as follows:

n.Transport = &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    Dial: func(network, addr string) (net.Conn, error) {
        logrus.Debugln("Dialing:", network, addr, "...")
        return dialer.Dial(network, addr)
    },
    TLSHandshakeTimeout: 10 * time.Second,
    TLSClientConfig:     &tlsConfig,
}

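Setting the missing timeouts on the Transport should resolve the problem: a request to a stalled server will then fail with a timeout error instead of blocking indefinitely.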

Are there points in the code the reviewer needs to double check?

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Tests
    • Added for this feature/bug
    • All builds are passing
  • Branch has no merge conflicts with master (if it does, please rebase it)

What are the relevant issue numbers?
