Set timeouts for network.client
What does this MR do?
Add timeout definitions for network.client's transport.
Why was this MR needed?
In some edge cases, jobs could hang in the finished state.
After some digging, we found that this goroutine appears to be responsible for the hanging jobs:
64 @ 0x42d88a 0x43d025 0x43bdbc 0x63f75b 0x635b6a 0x5ebe2f 0x5eb842 0x5eb9e2 0x69b2ff 0x69b6c7 0x69d55f 0x6a1933 0x6a67e4 0x6a7852 0x45e8c1
# 0x63f75a net/http.(*persistConn).roundTrip+0x93a /usr/local/go/src/net/http/transport.go:1840
# 0x635b69 net/http.(*Transport).RoundTrip+0x4f9 /usr/local/go/src/net/http/transport.go:380
# 0x5ebe2e net/http.send+0x15e /usr/local/go/src/net/http/client.go:256
# 0x5eb841 net/http.(*Client).send+0x101 /usr/local/go/src/net/http/client.go:146
# 0x5eb9e1 net/http.(*Client).Do+0xa1 /usr/local/go/src/net/http/client.go:189
# 0x69b2fe gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*client).doBackoffRequest+0x4e /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/client.go:222
# 0x69b6c6 gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*client).do+0x1b6 /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/client.go:261
# 0x69d55e gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*GitLabClient).doRaw+0xde /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/gitlab.go:86
# 0x6a1932 gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*GitLabClient).PatchTrace+0x492 /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/gitlab.go:294
# 0x6a67e3 gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*clientJobTrace).incrementalUpdate+0x323 /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/trace.go:206
# 0x6a7851 gitlab.com/gitlab-org/gitlab-ci-multi-runner/network.(*clientJobTrace).watch+0xd1 /go/src/gitlab.com/gitlab-org/gitlab-ci-multi-runner/network/trace.go:278
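The MR does not say how this dump was collected. Its shape (a leading `64 @ ...` line, where the first number counts goroutines sharing the same stack) matches the aggregated text form of Go's goroutine profile. A minimal sketch of producing such a dump from inside a process, assuming nothing about the runner's own debugging setup (the helper name `goroutineDump` is hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// goroutineDump returns the goroutine profile in its aggregated text form
// (debug level 1), where identical stacks are grouped behind a count.
// The function name is illustrative; it is not part of the runner code.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("could not collect goroutine profile:", err)
		return
	}
	fmt.Print(dump)
}
```

A large count on a stack that ends in `net/http.(*persistConn).roundTrip` is the kind of signal that points at requests waiting on the transport.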
The process hangs at https://github.com/golang/go/blob/release-branch.go1.7/src/net/http/transport.go#L1840, while the transport is defined at https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/v9.3.0/network/client.go#L154 as:
n.Transport = &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    Dial: func(network, addr string) (net.Conn, error) {
        logrus.Debugln("Dialing:", network, addr, "...")
        return dialer.Dial(network, addr)
    },
    TLSHandshakeTimeout: 10 * time.Second,
    TLSClientConfig:     &tlsConfig,
}
The only timeout set here is TLSHandshakeTimeout. Setting the missing timeouts on the Transport should resolve the problem.
Are there points in the code the reviewer needs to double check?
Does this MR meet the acceptance criteria?
- Documentation created/updated
- Tests
  - Added for this feature/bug
  - All builds are passing
- Branch has no merge conflicts with master (if you do - rebase it please)