I think @45cali tried this approach and it seemed to work. Is this the right way of doing it? I.e. are we sure we are not making the code more complex to debug with this change?
Thanks!
I'm just lost: I could not find the function that sets the s.Config variable on the executors.
My guess is this code in https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/master/common/build.go:
```go
func (b *Build) Run(globalConfig *Config, trace JobTrace) (err error) {
	var executor Executor

	logger := NewBuildLogger(trace, b.Log())
	logger.Println(fmt.Sprintf("Running with %s\n on %s (%s)", AppVersion.Line(), b.Runner.Name, b.Runner.ShortDescription()))

	b.CurrentState = BuildRunStatePending

	defer func() {
		if _, ok := err.(*BuildError); ok {
			logger.SoftErrorln("Job failed:", err)
			trace.Fail(err)
		} else if err != nil {
			logger.Errorln("Job failed (system failure):", err)
			trace.Fail(err)
		} else {
			logger.Infoln("Job succeeded")
			trace.Success()
		}
		if executor != nil {
			executor.Cleanup()
		}
	}()

	context, cancel := context.WithTimeout(context.Background(), b.GetBuildTimeout())
	defer cancel()

	trace.SetCancelFunc(cancel)

	options := ExecutorPrepareOptions{
		Config:  b.Runner,
		Build:   b,
		Trace:   trace,
		User:    globalConfig.User,
		Context: context,
	}

	provider := GetExecutor(b.Runner.Executor)
	if provider == nil {
		return errors.New("executor not found")
	}

	executor, err = b.retryCreateExecutor(options, provider, logger)
	if err == nil {
		err = b.run(context, executor)
	}
	if executor != nil {
		executor.Finish(err)
	}
	return err
}
```
and we should change it to this:

```go
// Copy the runner config so the executor gets its own instance instead of
// mutating the shared b.Runner (note: b.Runner is a pointer, so copying it
// directly would only copy the pointer, not the underlying struct).
copyConfig := *b.Runner
options := ExecutorPrepareOptions{
	Config:  &copyConfig,
	Build:   b,
	Trace:   trace,
	User:    globalConfig.User,
	Context: context,
}
```
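A minimal, self-contained sketch of why the copy has to be of the struct value rather than of the pointer (the `runnerConfig` type and its field below are made-up stand-ins for illustration, not the real RunnerConfig):

```go
package main

import "fmt"

// Hypothetical stand-in for the real RunnerConfig; the field is illustrative only.
type runnerConfig struct {
	Name string
}

func main() {
	shared := &runnerConfig{Name: "original"}

	// Copying the pointer: both variables refer to the same struct,
	// so a change made while running one job is visible to the next.
	alias := shared
	alias.Name = "mutated"
	fmt.Println(shared.Name) // prints "mutated"

	// Copying the value: the executor works on its own struct and
	// cannot affect the config handed to later jobs.
	shared.Name = "original"
	byValue := *shared
	byValue.Name = "mutated"
	fmt.Println(shared.Name) // prints "original"
}
```

If the options really are being reused between jobs, a value copy at this point would at least isolate each job from mutations made by the executor.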
Any ideas?
Yes, I will probably need to deep dive into the code to see which method kicks off the job; I couldn't find it the other day while doing a quick check.
Maybe @ayufan or @bikebilly could give us some insight?
@ayufan it's from another CI/CD job that ran before. It's strange; it seems that the options are somehow cached.
Fixed.
Fixed.
@ayufan done.
Fixed.
Fixed.
@ayufan isn't it better to add the auto-deregister option to the runner?
Resolving this as there is no issue.
Fixed.
@ayufan the signal is never trapped. I spent a few hours debugging it and never got it to work; maybe I was doing something wrong? I remember the main problem was setting the command to ["/bin/bash", "/tmp/scripts/entrypoint"] and the fact that the runner entrypoint uses dumb-init, so the signal got lost on the first command. That part is fixed, but I never got it to work without the lines above. Do you have any suggestion that I can try?
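For reference, this is the kind of handler I mean by trapping the signal: a minimal Go sketch using plain os/signal handling in a long-running process (it does not reproduce the dumb-init/entrypoint forwarding chain, which is where the signal seemed to get lost):

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Register a handler for SIGTERM/SIGINT; without this the default
	// action terminates the process and no cleanup code ever runs.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	fmt.Println("waiting for a termination signal...")
	sig := <-sigs

	// This is where a graceful cleanup / de-registration step would go.
	fmt.Println("received", sig, "- cleaning up and exiting")
}
```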
Fixed.
We use the Kubernetes service endpoints to discover the pods behind the service; I think that is the recommended way of doing it.
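As a rough illustration of what I mean, a sketch of a Prometheus scrape config using endpoints-based service discovery (the job name and the service name in the relabeling rule are made up for the example, not taken from this MR):

```yaml
scrape_configs:
  - job_name: gitlab-runner-pods        # illustrative name only
    kubernetes_sd_configs:
      - role: endpoints                 # discover pods via the Endpoints behind a Service
    relabel_configs:
      # Keep only targets that belong to the (hypothetical) runner service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: gitlab-runner
        action: keep
```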