# gitlab-workhorse

Gitlab-workhorse is a smart reverse proxy for GitLab. It handles
"large" HTTP requests such as file downloads, file uploads, Git
push/pull and Git archive downloads.

## Features that rely on Workhorse

Workhorse itself is not a feature, but there are several features in
GitLab that would not work efficiently without Workhorse.

To put the efficiency benefit in context, consider that in 2020Q3 on GitLab.com [we see][thanos] Rails application threads using on average about 200MB of RSS vs about 200KB for Workhorse goroutines.

[thanos]: https://thanos-query.ops.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum(ruby_process_resident_memory_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%20%2F%20sum(puma_max_threads%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g0.tab=1&g1.range_input=1h&g1.max_source_resolution=0s&g1.expr=sum(go_memstats_sys_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%2Fsum(go_goroutines%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g1.tab=1

Examples of features that rely on Workhorse:

### 1. `git clone` and `git push` over HTTP

Git clone, pull and push are slow because they transfer large amounts
of data and because each is CPU intensive on the GitLab side. Without
Workhorse, HTTP access to Git repositories would compete with regular
web access to the application, requiring us to run many more Rails
application servers.

### 2. CI runner long polling

GitLab CI runners fetch new CI jobs by polling the GitLab server.
Workhorse acts as a kind of "waiting room" where CI runners can sit
and wait for new CI jobs. Because of Go's efficiency we can fit a lot
of runners in the waiting room at little cost. Without this waiting
room mechanism we would have to add a lot more Rails server capacity.
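
The "waiting room" idea can be sketched in a few lines of Go. This is only an illustration of the pattern, not Workhorse's actual implementation: the function name `waitForJob` and the notification channel are assumptions made for this example. Each waiting runner costs one goroutine that blocks until either a notification arrives or the long polling duration elapses.

```go
package main

import (
	"fmt"
	"time"
)

// waitForJob blocks until a notification arrives on notify or until timeout
// elapses. It reports whether a notification was seen. In this sketch a true
// result means "forward the runner's request to Rails now" and false means
// "tell the runner there is nothing to do yet".
func waitForJob(notify <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-notify:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	notify := make(chan struct{}, 1)

	// No job is queued: the runner waits for the full timeout.
	fmt.Println("job available:", waitForJob(notify, 50*time.Millisecond))

	// A job arrives while the runner is waiting.
	go func() {
		time.Sleep(10 * time.Millisecond)
		notify <- struct{}{}
	}()
	fmt.Println("job available:", waitForJob(notify, time.Second))
}
```

In a real server the notification would come from an external source; this README describes Workhorse using Redis for that purpose.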

### 3. File uploads and downloads

File uploads and downloads may be slow either because the file is
large or because the user's connection is slow. Workhorse can handle
the slow part for Rails. This improves the efficiency of features such
as CI artifacts, package repositories, LFS objects, etc.

### 4. WebSocket proxying

Features such as the web terminal require a long-lived connection
between the user's web browser and a container inside GitLab that is
not directly accessible from the internet. Dedicating a Rails
application thread to proxying such a connection would cost much more
memory than it costs to have Workhorse look after it.

## Quick facts (how does Workhorse work)

-   Workhorse can handle some requests without involving Rails at all:
    for example, JavaScript files and CSS files are served straight
    from disk.
-   Workhorse can modify responses sent by Rails: for example if you use
    `send_file` in Rails then gitlab-workhorse will open the file on
    disk and send its contents as the response body to the client.
-   Workhorse can take over requests after asking permission from Rails.
    Example: handling `git clone`.
-   Workhorse can modify requests before passing them to Rails. Example:
    when handling a Git LFS upload Workhorse first asks permission from
    Rails, then it stores the request body in a tempfile, then it sends
    a modified request containing the tempfile path to Rails.
-   Workhorse can manage long-lived WebSocket connections for Rails.
    Example: handling the terminal websocket for environments.
-   Workhorse does not connect to Postgres, only to Rails and (optionally) Redis.
-   We assume that all requests that reach Workhorse pass through an
    upstream proxy such as NGINX or Apache first.
-   Workhorse does not accept HTTPS connections.
-   Workhorse does not clean up idle client connections.
-   We assume that all requests to Rails pass through Workhorse.
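
Two of the facts above, serving static files straight from disk and passing everything else on to Rails, can be illustrated with Go's standard library. This is a minimal sketch under stated assumptions, not how Workhorse is actually structured: the names `newFrontend`, `fetch` and `demo`, the `/assets/` prefix and the stand-in backend are all hypothetical. The `demo` function builds a temporary document root and a fake backend, then requests one static file and one application page.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"os"
	"path/filepath"
)

// newFrontend returns a handler that serves paths under /assets/ from
// documentRoot on disk and proxies every other request to backend.
func newFrontend(documentRoot string, backend *url.URL) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/assets/", http.FileServer(http.Dir(documentRoot)))
	mux.Handle("/", httputil.NewSingleHostReverseProxy(backend))
	return mux
}

// fetch performs a GET request against handler and returns the response body.
func fetch(handler http.Handler, path string) string {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return string(body)
}

// demo wires the frontend to a temporary document root and a stand-in
// backend, then fetches one static file and one proxied page.
func demo() (asset, page string, err error) {
	// A stand-in for the Rails application.
	rails := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "rails handled %s", r.URL.Path)
	}))
	defer rails.Close()

	root, err := os.MkdirTemp("", "docroot")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "assets"), 0o755); err != nil {
		return "", "", err
	}
	if err := os.WriteFile(filepath.Join(root, "assets", "app.css"), []byte("body{}"), 0o644); err != nil {
		return "", "", err
	}

	backend, err := url.Parse(rails.URL)
	if err != nil {
		return "", "", err
	}
	frontend := newFrontend(root, backend)
	return fetch(frontend, "/assets/app.css"), fetch(frontend, "/projects"), nil
}

func main() {
	asset, page, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(asset)
	fmt.Println(page)
}
```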

For more information see ['A brief history of
gitlab-workhorse'][brief-history-blog].

## Configuring Workhorse

For historical reasons Workhorse uses command line flags, a configuration file and environment variables.

All new configuration options that get added to Workhorse should go into the configuration file.

## Usage

```
  gitlab-workhorse [OPTIONS]

Options:
  -apiCiLongPollingDuration duration
      Long polling duration for job requesting for runners (default 50s - enabled) (default 50ns)
  -apiLimit uint
      Number of API requests allowed at single time
  -apiQueueDuration duration
      Maximum queueing duration of requests (default 30s)
  -apiQueueLimit uint
      Number of API requests allowed to be queued
  -authBackend string
      Authentication/authorization backend (default "http://localhost:8080")
  -authSocket string
      Optional: Unix domain socket to dial authBackend at
  -cableBackend string
      Optional: ActionCable backend (default authBackend)
  -cableSocket string
      Optional: Unix domain socket to dial cableBackend at (default authSocket)
  -config string
      TOML file to load config from
  -developmentMode
      Allow the assets to be served from Rails app
  -documentRoot string
      Path to static files content (default "public")
  -listenAddr string
      Listen address for HTTP server (default "localhost:8181")
  -listenNetwork string
      Listen 'network' (tcp, tcp4, tcp6, unix) (default "tcp")
  -listenUmask int
      Umask for Unix socket
  -logFile string
      Log file location
  -logFormat string
      Log format to use defaults to text (text, json, structured, none) (default "text")
  -pprofListenAddr string
      pprof listening address, e.g. 'localhost:6060'
  -prometheusListenAddr string
      Prometheus listening address, e.g. 'localhost:9229'
  -proxyHeadersTimeout duration
      How long to wait for response headers when proxying the request (default 5m0s)
  -secretPath string
      File with secret key to authenticate with authBackend (default "./.gitlab_workhorse_secret")
  -version
      Print version and exit
```

The 'auth backend' refers to the GitLab Rails application. The name is
a holdover from when gitlab-workhorse only handled Git push/pull over
HTTP.

Gitlab-workhorse can listen on either a TCP or a Unix domain socket. It
can also open a second TCP listening socket with the Go
[net/http/pprof profiler server](http://golang.org/pkg/net/http/pprof/).

Gitlab-workhorse can listen on Redis events (currently only builds/register
for runners). This requires you to pass a valid TOML config file via the
`-config` flag. For regular setups it only requires the `[redis]` settings
described in the Redis section below (replacing the URL string with the
actual socket).

### Redis

Gitlab-workhorse integrates with Redis to do long polling for CI build
requests. This is configured via two things:

-   Redis settings in the TOML config file
-   The `-apiCiLongPollingDuration` command line flag to control polling
    behavior for CI build requests

It is OK to enable Redis in the config file but to leave CI polling
disabled; this just results in an idle Redis pubsub connection. The
opposite is not possible: CI long polling requires a correct Redis
configuration.

Below we discuss the options for the `[redis]` section in the config
file.

```
[redis]
URL = "unix:///var/run/gitlab/redis.sock"
Password = "my_awesome_password"
Sentinel = [ "tcp://sentinel1:23456", "tcp://sentinel2:23456" ]
SentinelMaster = "mymaster"
```

- `URL` takes a string in the format `unix://path/to/redis.sock` or
  `tcp://host:port`. With an absolute socket path this gives three slashes,
  as in the example above.
- `Password` is only required if your Redis instance is password-protected.
- `Sentinel` is used if you are using Sentinel.
  *NOTE* that if both `Sentinel` and `URL` are given, only `Sentinel` will be used
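
To make the two URL formats and the Sentinel precedence rule concrete, here is a small Go sketch. It is an illustration only and not Workhorse's own configuration code: the `redisConfig` type and the functions `dialTarget` and `endpoints` are hypothetical names introduced for this example. `dialTarget` turns a `unix://` or `tcp://` URL into the network and address pair that Go's `net.Dial` expects, and `endpoints` applies the rule that `Sentinel` wins when both fields are set.

```go
package main

import (
	"fmt"
	"net/url"
)

// redisConfig mirrors the [redis] fields discussed above (hypothetical type).
type redisConfig struct {
	URL            string
	Sentinel       []string
	SentinelMaster string
}

// dialTarget converts a unix:// or tcp:// URL into a network and address
// pair suitable for net.Dial.
func dialTarget(rawURL string) (network, address string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	switch u.Scheme {
	case "unix":
		// unix:///var/run/redis.sock has an empty host and an absolute path.
		return "unix", u.Path, nil
	case "tcp":
		return "tcp", u.Host, nil
	default:
		return "", "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

// endpoints returns the URLs that would be contacted: the Sentinel list when
// it is non-empty, otherwise the single URL.
func endpoints(cfg redisConfig) []string {
	if len(cfg.Sentinel) > 0 {
		return cfg.Sentinel
	}
	if cfg.URL != "" {
		return []string{cfg.URL}
	}
	return nil
}

func main() {
	cfg := redisConfig{
		URL:            "unix:///var/run/gitlab/redis.sock",
		Sentinel:       []string{"tcp://sentinel1:23456", "tcp://sentinel2:23456"},
		SentinelMaster: "mymaster",
	}
	for _, e := range endpoints(cfg) {
		network, address, err := dialTarget(e)
		fmt.Println(network, address, err)
	}
}
```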

Optional fields are as follows:
```
[redis]
DB = 0
ReadTimeout = "1s"
KeepAlivePeriod = "5m"
MaxIdle = 1
MaxActive = 1
```

- `DB` is the Database to connect to. Defaults to `0`
- `ReadTimeout` is how long a redis read-command can take. Defaults to `1s`
- `KeepAlivePeriod` is how long the redis connection is to be kept alive without anything flowing through it. Defaults to `5m`
- `MaxIdle` is how many idle connections can be in the redis-pool at once. Defaults to 1
- `MaxActive` is how many connections the pool can keep. Defaults to 1

### Relative URL support

If you are mounting GitLab at a relative URL, e.g.
`example.com/gitlab`, then you should also use this relative URL in
the `authBackend` setting:

```
gitlab-workhorse -authBackend http://localhost:8080/gitlab
```

### Interaction of authBackend and authSocket

The interaction between `authBackend` and `authSocket` can be a bit
confusing. It comes down to: if `authSocket` is set it overrides the
_host_ part of `authBackend` but not the relative path.

In table form:

|authBackend|authSocket|Workhorse connects to?|Rails relative URL|
|---|---|---|---|
|unset|unset|`localhost:8080`|`/`|
|`http://localhost:3000`|unset|`localhost:3000`|`/`|
|`http://localhost:3000/gitlab`|unset|`localhost:3000`|`/gitlab`|
|unset|`/path/to/socket`|`/path/to/socket`|`/`|
|`http://localhost:3000`|`/path/to/socket`|`/path/to/socket`|`/`|
|`http://localhost:3000/gitlab`|`/path/to/socket`|`/path/to/socket`|`/gitlab`|

The same applies to `cableBackend` and `cableSocket`.
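
The rule in the table can also be written as a short Go function. This is a sketch of the rule as described here, not a copy of Workhorse's code; `resolveBackend` and `defaultAuthBackend` are names made up for the example. The socket, when set, replaces only the address to dial, while the path of `authBackend` is always kept as the Rails relative URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// defaultAuthBackend is the documented default for -authBackend.
const defaultAuthBackend = "http://localhost:8080"

// resolveBackend returns the address to dial and the Rails relative URL for
// a given pair of settings. Empty strings stand for "unset".
func resolveBackend(authBackend, authSocket string) (dial, relativeURL string, err error) {
	if authBackend == "" {
		authBackend = defaultAuthBackend
	}
	u, err := url.Parse(authBackend)
	if err != nil {
		return "", "", err
	}

	// The socket overrides the host part only.
	dial = u.Host
	if authSocket != "" {
		dial = authSocket
	}

	// The relative path always comes from authBackend.
	relativeURL = u.Path
	if relativeURL == "" {
		relativeURL = "/"
	}
	return dial, relativeURL, nil
}

func main() {
	dial, rel, err := resolveBackend("http://localhost:3000/gitlab", "/path/to/socket")
	fmt.Println(dial, rel, err)
}
```

Calling `resolveBackend` with each pair of inputs from the table yields the corresponding "connects to" and "relative URL" columns.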

## Installation

To install gitlab-workhorse you need [Go 1.13 or
newer](https://golang.org/dl) and [GNU
Make](https://www.gnu.org/software/make/).

To install into `/usr/local/bin` run `make install`.

```
make install
```

To install into `/foo/bin` set the PREFIX variable.

```
make install PREFIX=/foo
```

On some operating systems, such as FreeBSD, you may have to use
`gmake` instead of `make`.

## Dependencies

### Exiftool

Workhorse uses [exiftool](https://www.sno.phy.queensu.ca/~phil/exiftool/) for
removing EXIF data (which may contain sensitive information) from uploaded
images. If you installed GitLab:

-   Using the Omnibus package, you're all set.
    *NOTE* that if you are using CentOS Minimal, you may need to install the
    `perl` package: `yum install perl`
-   From source, make sure `exiftool` is installed:

    ```sh
    # Debian/Ubuntu
    sudo apt-get install libimage-exiftool-perl

    # RHEL/CentOS
    sudo yum install perl-Image-ExifTool
    ```

## Error tracking

GitLab-Workhorse supports remote error tracking with
[Sentry](https://sentry.io). To enable this feature set the
`GITLAB_WORKHORSE_SENTRY_DSN` environment variable.
You can also set the `GITLAB_WORKHORSE_SENTRY_ENVIRONMENT` environment variable to
use the Sentry environment functionality to separate staging, production and
development.

Omnibus (`/etc/gitlab/gitlab.rb`):

```
gitlab_workhorse['env'] = {
    'GITLAB_WORKHORSE_SENTRY_DSN' => 'https://foobar',
    'GITLAB_WORKHORSE_SENTRY_ENVIRONMENT' => 'production'
}
```

Source installations (`/etc/default/gitlab`):

```
export GITLAB_WORKHORSE_SENTRY_DSN='https://foobar'
export GITLAB_WORKHORSE_SENTRY_ENVIRONMENT='production'
```

## Tests

Run the tests with:

```
make clean test
```

### Coverage / what to test

Each feature in gitlab-workhorse should have an integration test that
verifies that the feature 'kicks in' on the right requests and leaves
other requests unaffected. It is better to also have package-level tests
for specific behavior but the high-level integration tests should have
the first priority during development.

It is OK if a feature is only covered by integration tests.

## Distributed Tracing

Workhorse supports distributed tracing through [LabKit][] using [OpenTracing APIs](https://opentracing.io).

By default, no tracing implementation is linked into the binary, but different OpenTracing providers can be linked in using [build tags][build-tags]/[build constraints][build-tags]. This can be done by setting the `BUILD_TAGS` make variable.

For more details of the supported providers, see LabKit, but as an example, for Jaeger tracing support, include the tags: `BUILD_TAGS="tracer_static tracer_static_jaeger"`.

```shell
make BUILD_TAGS="tracer_static tracer_static_jaeger"
```

Once Workhorse is compiled with an OpenTracing provider, tracing is configured via the `GITLAB_TRACING` environment variable.

For example:

```shell
GITLAB_TRACING=opentracing://jaeger ./gitlab-workhorse
```

## Continuous Profiling

Workhorse supports continuous profiling through [LabKit][] using [Stackdriver Profiler](https://cloud.google.com/profiler).

By default, the Stackdriver Profiler implementation is linked in the binary using [build tags][build-tags], though it's not
required and can be skipped.

For example:

```shell
make BUILD_TAGS=""
```

Once Workhorse is compiled with Continuous Profiling, the profiler configuration can be set via the `GITLAB_CONTINUOUS_PROFILING`
environment variable.

For example:

```shell
GITLAB_CONTINUOUS_PROFILING="stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123" ./gitlab-workhorse
```
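
The value of `GITLAB_CONTINUOUS_PROFILING` has the shape of a URL without a scheme: a driver name followed by query parameters. The Go sketch below shows how such a string decomposes with the standard `net/url` package. It is only an illustration of the format; `parseProfilerConfig` is a hypothetical helper, and the parsing LabKit actually performs may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseProfilerConfig splits a "driver?key=value&..." string into the driver
// name and its parameters.
func parseProfilerConfig(config string) (driver string, params url.Values, err error) {
	u, err := url.Parse(config)
	if err != nil {
		return "", nil, err
	}
	// With no scheme or host, everything before "?" ends up in the path.
	return u.Path, u.Query(), nil
}

func main() {
	driver, params, err := parseProfilerConfig(
		"stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("driver:", driver)
	fmt.Println("service:", params.Get("service"))
	fmt.Println("service_version:", params.Get("service_version"))
	fmt.Println("project_id:", params.Get("project_id"))
}
```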

For more information, see the [LabKit monitoring docs](https://gitlab.com/gitlab-org/labkit/-/blob/master/monitoring/doc.go).

## License

This code is distributed under the MIT license, see the [LICENSE](LICENSE) file.

[brief-history-blog]: https://about.gitlab.com/2016/04/12/a-brief-history-of-gitlab-workhorse/
[LabKit]: https://gitlab.com/gitlab-org/labkit/
[build-tags]: https://golang.org/pkg/go/build/#hdr-Build_Constraints