Commits · 9812e5dd7c52e67b22781a440ee04dbb2a086000 · gpt / large_projects / gitlabhq1

Aug 01, 2018

Add repository languages for projects · 79a5d768

Zeger-Jan van de Weg authored 6 years ago

Our friends at GitHub have shown programming languages for a long time,
and inspired by that, this commit aims to create roughly the same
functionality.

Language detection is done through Linguist, as before; the difference
is that we cache the result in the database. Also, Gitaly can
incrementally scan a repository. This is done through a shell out, which
adds about 3s of overhead per run. For now this won't be improved.

Scans are triggered by pushes to the default branch, usually `master`.
However, one exception to this rule is the charts page. If we're
requesting this expensive data anyway, we just cache it in the database.

Edge cases where there is no repository, or it is empty, are caught in
the Repository model. This makes use of Redis caching, which is probably
already loaded.

The added model is called RepositoryLanguage, which will make things
harder if/when GitLab supports multiple repositories per project.
However, for now I think this shouldn't be a concern. Also, Language
could be confused with the i18n languages, and I felt the current name
was suitable too.

Design of the Project#Show page is done with help from @dimitrieh. This
change is not visible to the end user unless detections are done.

Unverified

79a5d768

Jul 30, 2018
- Delete todos when users lose target read permissions · 501fb04e
  Jarka Kadlecova authored 6 years ago
  
  501fb04e
Jul 18, 2018
- Delete UserActivities and related workers · c62fce98
  Imre (Admin) authored 6 years ago
  
  Unverified
  
  c62fce98
Jun 24, 2018
- Delete non-latest merge request diff files upon diffs reload · f5ed18e1
  Oswaldo Ferreir authored 6 years ago
  
  f5ed18e1
May 24, 2018

Persist truncated note diffs on a new table · bb8f2520

Oswaldo Ferreir authored 6 years ago

We request Gitaly in an N+1 manner to build discussion diffs. Since the diffs are from different revisions, it's hard to make a single request to the service in order to build the whole response.
With this change we solve this problem and greatly simplify fetching this piece of info.

bb8f2520

May 07, 2018
- Backports every CE related change from ee-5484 to CE · 9a130593
  Tiago Botelho authored 6 years ago
  
  9a130593
Mar 30, 2018

Send emails for issues due tomorrow · 2db218f8

Sean McGivern authored 6 years ago

Also, refactor the mail sending slightly: instead of one worker sending all
emails, create a worker per project with issues due, which will send all emails
for that project.
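
A minimal sketch of the per-project fan-out described above, in plain Ruby.
The method name `schedule_due_emails`, the hash-based issues and the array
standing in for a Sidekiq queue are all assumptions made for illustration;
they are not taken from the commit.

```ruby
# Hypothetical sketch: enqueue one job per project instead of one job that
# sends every email. Issues are plain hashes; `queue` stands in for Sidekiq.
def schedule_due_emails(issues, queue)
  issues.map { |issue| issue[:project_id] }.uniq.each do |project_id|
    # In real code this would enqueue a background job for the project.
    queue << { worker: "per-project mailer", project_id: project_id }
  end
  queue
end

# Illustrative data only.
issues_due_tomorrow = [
  { id: 1, project_id: 10 },
  { id: 2, project_id: 10 },
  { id: 3, project_id: 20 }
]
queue = schedule_due_emails(issues_due_tomorrow, [])
```

Each queued job then only has to load and mail the issues of its own
project.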

2db218f8

Mar 26, 2018
- Use cron for sending emails · 9d81d5aa
  Stuart Nelson authored 7 years ago
  
  9d81d5aa
- Add new queue to sidekiq_queues · a67f8486
  Stuart Nelson authored 7 years ago
  
  a67f8486
Mar 22, 2018
- Backport ee-40781-os-to-ce · 44f37504
  Micael Bergeron authored 6 years ago
  
  44f37504
Mar 06, 2018
- Integrate two workers into one ArchiveTraceWorker with pipeline_background... · 335bc0fe
  Shinya Maeda authored 7 years ago
  
  Integrate two workers into one ArchiveTraceWorker with pipeline_background queue. This queue takes lower precedence than pipeline_default.
  335bc0fe
- Add object_storage queue to sidekiq_queues.yml and correct queue name in all_queues.yml. · d4c9c522
  Shinya Maeda authored 7 years ago
  
  d4c9c522
Mar 01, 2018
- fix the prepare_untracked_uploads_spec from using the EE schema · e43d7d2b
  Micael Bergeron authored 7 years ago
  
  e43d7d2b
- port the object storage to CE · 0f1d348d
  Micael Bergeron authored 7 years ago
  
  0f1d348d
Feb 28, 2018
- Merge branch 'jej/lfs-object-storage' into 'master' · bc760627
  Douwe Maan authored 7 years ago
  
  Can migrate LFS objects to S3 style object storage. Closes #2841. See merge request !2760
  bc760627
Feb 26, 2018
- Add plugin queue to sidekiq config [ci skip] · 4b998239
  Dmitriy Zaporozhets authored 7 years ago
  
  Signed-off-by: Dmitriy Zaporozhets <dmitriy.zaporozhets@gmail.com>
  4b998239
Feb 23, 2018
- Add DNS verification to Pages custom domains · ee68bd97
  Nick Thomas authored 7 years ago
  
  Verified
  
  ee68bd97
Jan 06, 2018
- Remove check_gcp_project_billing queue from sidekiq · 2885dc06
  Matija Čupić authored 7 years ago
  
  Verified
  
  2885dc06
Dec 16, 2017
- Add CheckGcpProjectBillingWorker to sidekiq queue · 63859419
  Matija Čupić authored 7 years ago
  
  Verified
  
  63859419
Dec 13, 2017
- Remove unused queues · 4a6ba82b
  Douwe Maan authored 7 years ago
  
  4a6ba82b
Dec 12, 2017
- Use a dedicated queue for each worker · b1849ee2
  Douwe Maan authored 7 years ago
  
  b1849ee2
Nov 28, 2017
- BE for automatic pipeline when enabling Auto DevOps · a4a389a0
  Matija Čupić authored 7 years ago
  
  Fix https://gitlab.com/gitlab-org/gitlab-ce/issues/38962
  a4a389a0
Nov 07, 2017

Rewrite the GitHub importer from scratch · 4dfe26cd

Yorick Peterse authored 7 years ago

Prior to this MR there were two GitHub related importers:

* Github::Import: the main importer used for GitHub projects
* Gitlab::GithubImport: importer that's somewhat confusingly used for
  importing Gitea projects (apparently they have a compatible API)

This MR renames the Gitea importer to Gitlab::LegacyGithubImport and
introduces a new GitHub importer in the Gitlab::GithubImport namespace.
This new GitHub importer uses Sidekiq for importing multiple resources
in parallel, though it also has the ability to import data sequentially
should this be necessary.

The new code is spread across the following directories:

* lib/gitlab/github_import: this directory contains most of the importer
  code such as the classes used for importing resources.
* app/workers/gitlab/github_import: this directory contains the Sidekiq
  workers, most of which simply use the code from the directory above.
* app/workers/concerns/gitlab/github_import: this directory provides a
  few modules that are included in every GitHub importer worker.

== Stages

The import work is divided into separate stages, with each stage
importing a specific set of data. Stages will schedule the work that
needs to be performed, followed by scheduling a job for the
"AdvanceStageWorker" worker. This worker will periodically check if all
work is completed and schedule the next stage if this is the case. If
work is not yet completed this worker will reschedule itself.

Using this approach we don't have to block threads by calling `sleep()`,
as doing so for large projects could block the thread from doing any
work for many hours.
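
The rescheduling idea can be sketched in plain Ruby. This is only an
illustration under stated assumptions, not the importer's code:
`FakeScheduler`, `advance_stage`, the stage names and the `pending` counters
are hypothetical stand-ins for Sidekiq and the real workers.

```ruby
# Hypothetical stand-in for a job queue such as Sidekiq: jobs are stored
# as procs and run later, so nothing ever blocks in `sleep`.
class FakeScheduler
  attr_reader :log

  def initialize
    @jobs = []
    @log = []
  end

  def schedule(&job)
    @jobs << job
  end

  # Run queued jobs until none are left.
  def drain
    @jobs.shift.call until @jobs.empty?
  end
end

STAGES = %i[repository issues finish].freeze # illustrative stage names

# Checks whether the current stage's work is done. If so, the next stage is
# scheduled; otherwise the check re-enqueues itself and returns immediately.
def advance_stage(scheduler, pending, stage)
  if pending[stage].zero?
    next_stage = STAGES[STAGES.index(stage) + 1]
    scheduler.log << "#{stage} done"
    scheduler.schedule { advance_stage(scheduler, pending, next_stage) } if next_stage
  else
    scheduler.log << "#{stage} waiting"
    pending[stage] -= 1 # simulate one unit of work finishing elsewhere
    scheduler.schedule { advance_stage(scheduler, pending, stage) }
  end
end

scheduler = FakeScheduler.new
pending = { repository: 0, issues: 2, finish: 0 }
scheduler.schedule { advance_stage(scheduler, pending, :repository) }
scheduler.drain
```

Because every check returns right away and re-enqueues itself, the thread is
free to pick up other jobs between checks.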

== Retrying Work

Workers will reschedule themselves whenever necessary. For example,
hitting the GitHub API's rate limit will result in jobs rescheduling
themselves. These jobs are not processed until the rate limit has been
reset.
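
As a rough sketch of this behaviour, assuming a job can see how many API
calls remain and when the limit resets: `RateLimit` and `import_page` are
hypothetical names, and the return values merely model "ran" versus
"rescheduled".

```ruby
# Hypothetical sketch of a job that reschedules itself when the API rate
# limit is exhausted. `RateLimit` and `import_page` are illustrative names.
RateLimit = Struct.new(:remaining, :resets_in)

# Returns [:done] when the request was made, or [:rescheduled, delay] when
# the job should run again once the limit has reset.
def import_page(rate_limit)
  if rate_limit.remaining.zero?
    [:rescheduled, rate_limit.resets_in]
  else
    rate_limit.remaining -= 1 # one API call consumed
    [:done]
  end
end

limit = RateLimit.new(1, 60) # one call left, reset in 60 seconds
first  = import_page(limit)
second = import_page(limit)
```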

== User Lookups

Part of the importing process involves looking up user details in the
GitHub API so we can map them to GitLab users. The old importer used
an in-memory cache, but this obviously doesn't work when the work is
spread across different threads.

The new importer uses a Redis cache and makes sure we only perform
API/database calls if absolutely necessary. Frequently used keys are
refreshed, and lookup misses are also cached, removing the need for
performing API/database calls if we know we don't have the data we're
looking for.
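
Caching misses as well as hits can be sketched as follows. This is a
simplified, in-memory illustration: `UserLookupCache` and the fake API are
hypothetical, a Hash stands in for Redis, and the refreshing of frequently
used keys is left out.

```ruby
# Hypothetical in-memory stand-in for a Redis-backed cache. Misses are
# stored too (as MISSING) so a known-absent user never triggers a second
# API call.
class UserLookupCache
  MISSING = :missing

  attr_reader :api_calls

  def initialize(api)
    @api = api # any object responding to #call(username)
    @store = {}
    @api_calls = 0
  end

  def user_id_for(username)
    unless @store.key?(username)
      @api_calls += 1
      @store[username] = @api.call(username) || MISSING
    end

    value = @store[username]
    value == MISSING ? nil : value
  end
end

# Illustrative data only: one known user, everything else is a miss.
known_users = { "alice" => 42 }
fake_api = ->(username) { known_users[username] }
cache = UserLookupCache.new(fake_api)

first  = cache.user_id_for("alice")
second = cache.user_id_for("alice")
ghost1 = cache.user_id_for("ghost")
ghost2 = cache.user_id_for("ghost")
```

Four lookups cost two API calls here: one for the hit and one for the miss.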

== Performance & Models

In various places the new importer uses raw INSERT statements (as
generated by `Gitlab::Database.bulk_insert`) instead of using Rails
models. This allows us to bypass any validations and callbacks,
drastically reducing the number of SQL queries and Gitaly RPC calls
necessary to import projects.

To ensure the code produces valid data the corresponding tests check if
the produced rows are valid according to the model validation rules.
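
To illustrate what a raw multi-row INSERT looks like, here is a small sketch.
It is not the implementation of `Gitlab::Database.bulk_insert`:
`build_bulk_insert` and `quote` are hypothetical helpers, and the quoting is
deliberately naive (real code should rely on the database adapter).

```ruby
# Naive value quoting for illustration only; real code must use the
# database adapter's quoting.
def quote(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Builds one multi-row INSERT statement from an array of hashes, so many
# rows are written with a single query and no model callbacks run.
def build_bulk_insert(table, rows)
  columns = rows.first.keys
  tuples = rows.map do |row|
    values = columns.map { |column| quote(row.fetch(column)) }
    "(#{values.join(', ')})"
  end

  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

# Illustrative rows only.
rows = [
  { project_id: 1, title: "First issue" },
  { project_id: 1, title: "It's broken" }
]
sql = build_bulk_insert("issues", rows)
```

Since nothing validates the rows on the way in, tests that check them against
the model's validation rules carry the burden of keeping the data valid.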

Verified

4dfe26cd

Oct 03, 2017
- Specify defaults, fix policies, fix db columns · c6d53250
  Kamil Trzcińśki authored 7 years ago
  
  c6d53250
- Introduce manage_cluster queue for sidekiq workers · 1ce09b6e
  Shinya Maeda authored 7 years ago
  
  1ce09b6e
Sep 30, 2017
- Replace reactive_cache by multiple sidekiq workers · e499c1c3
  Shinya Maeda authored 7 years ago
  
  e499c1c3
Sep 28, 2017
- Add support to migrate existing projects to Hashed Storage async · f4de14d7
  Gabriel Mazetto authored 7 years ago and Nick Thomas committed 7 years ago
  
  Verified
  
  f4de14d7
Sep 20, 2017

Stop using Sidekiq for updating Key#last_used_at · b3566a01

Yorick Peterse authored 7 years ago

This makes things simpler as no scheduling is involved. Furthermore, we
no longer need to run a SELECT + UPDATE just to get the key and update
it; setting last_used_at directly in a request only needs an UPDATE.

The added service class takes care of updating Key#last_used_at without
using Sidekiq. Further it makes sure we only try to obtain a Redis lease
if we're confident that we actually need to do so, instead of always
obtaining it. We also make sure to _only_ update last_used_at instead of
also updating updated_at.
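
The "check first, then take the lease" order can be sketched as below. This
is a hedged illustration, not the service class from the commit:
`LastUsedUpdater`, the one-day `INTERVAL` and the Hash-based lease are
assumptions, and a plain Hash stands in for the database row.

```ruby
# Hypothetical sketch: refresh a timestamp at most once per interval, and
# only try to take a lease when the stored value is actually stale.
class LastUsedUpdater
  INTERVAL = 24 * 60 * 60 # seconds; an assumed value, not from the commit

  attr_reader :lease_attempts

  def initialize
    @leases = {} # stand-in for a Redis-based exclusive lease
    @lease_attempts = 0
  end

  # `key` is a plain Hash standing in for a database row.
  def execute(key, now)
    return false unless stale?(key, now) # cheap check first, no lease needed
    return false unless obtain_lease(key[:id], now)

    key[:last_used_at] = now # only this column changes; updated_at is untouched
    true
  end

  private

  def stale?(key, now)
    key[:last_used_at].nil? || key[:last_used_at] <= now - INTERVAL
  end

  def obtain_lease(id, now)
    @lease_attempts += 1
    return false if @leases[id] && @leases[id] > now

    @leases[id] = now + INTERVAL
    true
  end
end

updater = LastUsedUpdater.new
key = { id: 1, last_used_at: nil, updated_at: 0 }

first  = updater.execute(key, 1_000) # stale, so the lease is taken
second = updater.execute(key, 2_000) # fresh, so no lease attempt at all
```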

Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/36663

Verified

b3566a01

Aug 21, 2017
- Adjust sidekiq queues weights in queues config file · 82056644
  Grzegorz Bizon authored 7 years ago
  
  82056644
- Assign some CI/CD workers to pipeline default queue · ad12ee2a
  Grzegorz Bizon authored 7 years ago
  
  ad12ee2a
- Assign all pipeline workers to specific queues · 84175072
  Grzegorz Bizon authored 7 years ago
  
  84175072
- Simplify pipeline sidekiq queues naming scheme · 48776f27
  Grzegorz Bizon authored 7 years ago
  
  48776f27
- Make it possible to check if worker uses a known queue · ce274fd6
  Grzegorz Bizon authored 7 years ago
  
  ce274fd6
Aug 07, 2017
- Move some after_create parts to worker to improve performance · 9ef3c431
  Jarka Kadlecova authored 7 years ago
  
  9ef3c431
Jul 27, 2017
- generate gpg signature on push · e63b693f
  Alexis Reigel authored 7 years ago
  
  e63b693f
- perform signature update in sidekiq worker · 9816856d
  Alexis Reigel authored 7 years ago
  
  9816856d
Jun 12, 2017

Add the ability to perform background migrations · d83ee2bb

Yorick Peterse authored 7 years ago

Background migrations can be used to perform long-running data
migrations without blocking the deployment procedure.

See MR https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/11854 for
more information.

Unverified

d83ee2bb

May 25, 2017

Implement web hooks logging · 330789c2

Alexander Randa authored 7 years ago

* implemented logging of project and system web hooks
* implemented UI for user area (project hooks)
* implemented UI for admin area (system hooks)
* implemented retry of logged webhook
* NOT implemented: log remover

330789c2

May 10, 2017

Use worker to destroy namespaceless projects in post-deploy · 0ad80cab

Toon Claes authored 7 years ago

Destroying projects can be very time consuming. So instead of destroying them in
the post-deploy, just schedule them and make Sidekiq do the hard work.

They are scheduled in batches of 5000 records. This way the number of database
requests is limited, as is the amount of data read into memory.
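
A minimal sketch of scheduling in fixed-size batches, assuming the records
are identified by IDs. `schedule_in_batches` and the array standing in for
the Sidekiq queue are hypothetical; only the batch size of 5000 comes from
the text above.

```ruby
# Hypothetical sketch: schedule IDs in fixed-size batches so each job payload
# and each query stays bounded. `scheduled` stands in for a Sidekiq queue.
BATCH_SIZE = 5_000 # the batch size named in the commit message

def schedule_in_batches(ids, scheduled)
  ids.each_slice(BATCH_SIZE) do |batch|
    scheduled << batch # in real code this would enqueue a background job
  end
  scheduled
end

project_ids = (1..12_000).to_a # illustrative IDs
jobs = schedule_in_batches(project_ids, [])
```

With 12,000 illustrative IDs this yields three jobs, the last one holding the
remaining 2,000 records.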

0ad80cab

May 05, 2017
- refactor code based on feedback · 6ecf16b8
  James Lopez authored 7 years ago
  
  6ecf16b8
