Commits · 79a5d76801a45696db629e1f543f2e1d6fa4784f · gpt / large_projects / gitlabhq1

Aug 01, 2018

Add repository languages for projects · 79a5d768

Zeger-Jan van de Weg authored 6 years ago

Our friends at GitHub have shown the programming languages of a
repository for a long time, and inspired by that, this commit aims to
provide roughly the same functionality.

Language detection is done through Linguist, as before; the
difference is that we now cache the result in the database. Also, Gitaly can
incrementally scan a repository. This is done through a shell out, which
adds an overhead of about 3s for each run. For now this won't be improved.

Scans are triggered by pushes to the default branch, usually `master`.
The one exception to this rule is the charts page: if we're requesting
this expensive data anyway, we also cache it in the database.

Edge cases where there is no repository, or it's empty, are caught in the
Repository model. This makes use of Redis caching, which is probably
already loaded.
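
The trigger and caching rules described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `Repo`, `LanguageCache`, and the toy extension-based detector are hypothetical stand-ins, not the Linguist/Gitaly integration or the RepositoryLanguage model from the commit, and an instance variable stands in for the database rows.

```ruby
# Hypothetical stand-in for a repository: a default branch plus a
# hash of file name => size in bytes.
Repo = Struct.new(:default_branch, :files) do
  def empty?
    files.empty?
  end
end

# Hypothetical cache; @cached stands in for rows stored in the database.
class LanguageCache
  attr_reader :scans

  def initialize(repo)
    @repo = repo
    @cached = nil
    @scans = 0
  end

  # Called for every push; only pushes to the default branch trigger a scan.
  def on_push(branch)
    return unless branch == @repo.default_branch

    refresh
  end

  # Called by an expensive page (such as charts): reuse the cache or fill it.
  def languages
    @cached || refresh
  end

  private

  def refresh
    if @repo.empty?
      @cached = {}
      return @cached
    end

    @scans += 1
    @cached = detect(@repo.files)
  end

  # Toy detector (an assumption): share of bytes per file extension, in percent.
  def detect(files)
    total = files.values.sum.to_f
    files.group_by { |name, _| File.extname(name) }
         .transform_values { |pairs| (pairs.sum { |_, size| size } / total * 100).round(1) }
  end
end
```

In this sketch a push to a feature branch leaves the cache untouched, while reading `languages` after a default-branch push reuses the stored result instead of scanning again.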

The added model is called RepositoryLanguage, which will make things harder
if/when GitLab supports multiple repositories per project. However, for
now I think this shouldn't be a concern. Also, Language could be
confused with the i18n languages, and I felt the current name was
suitable too.

Design of the Project#Show page is done with help from @dimitrieh. This
change is not visible to the end user unless detections are done.

Unverified

79a5d768

Jul 30, 2018
- Create GPG commit signature in bulk · 9a81550f
  Francisco Javier López authored 6 years ago and Nick Thomas committed 6 years ago
  
  9a81550f
Jul 11, 2018
- Resolve "Rename the `Master` role to `Maintainer`" Backend · a63bce1a
  Mark Chao authored 6 years ago
  
  a63bce1a
May 07, 2018
- Backports every CE related change from ee-5484 to CE · 9a130593
  Tiago Botelho authored 6 years ago
  
  9a130593
Dec 22, 2017
- Replace '.team << [user, role]' with 'add_role(user)' in specs · 27c95364
  blackst0ne authored 7 years ago
  
  27c95364
Aug 29, 2017
- replace `is_default_branch?` with `default_branch?` · 1c0def2a
  Maxim Rydkin authored 7 years ago
  
  1c0def2a
Aug 16, 2017
- Only create commit GPG signature when necessary · ba7251fe
  Douwe Maan authored 7 years ago
  
  ba7251fe
Aug 10, 2017

Migrate events into a new format · 0395c471

Yorick Peterse authored 7 years ago

This commit migrates events data in such a way that push events are
stored much more efficiently. This is done by creating a shadow table
called "events_for_migration", and a table called "push_event_payloads"
which is used for storing push data of push events. The background
migration in this commit will copy events from the "events" table into
the "events_for_migration" table; push events will also have a row
created in "push_event_payloads".

This approach allows us to reclaim space in the next release by simply
swapping the "events" and "events_for_migration" tables, then dropping
the old events (now "events_for_migration") table.

The new table structure is also optimised for storage space, and does
not include the unused "title" column nor the "data" column (since this
data is moved to "push_event_payloads").

== Newly Created Events

Newly created events are inserted into both "events" and
"events_for_migration", both using the exact same primary key value. The
table "push_event_payloads" in turn has a foreign key to the _shadow_
table. This removes the need for recreating and validating the foreign
key after swapping the tables. Since the shadow table also has a foreign
key to "projects.id" we also don't have to worry about orphaned rows.
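
The dual-write scheme can be illustrated with in-memory hashes standing in for the three tables. This is a sketch under assumptions, not the migration code: `EventStore` and its methods are hypothetical names, and real foreign keys are only imitated by reusing the same id.

```ruby
# In-memory stand-ins for "events", "events_for_migration" and
# "push_event_payloads"; hypothetical, for illustration only.
class EventStore
  attr_reader :events, :events_for_migration, :push_event_payloads

  def initialize
    @events = {}
    @events_for_migration = {}
    @push_event_payloads = {}
    @next_id = 0
  end

  # Inserts a new event into both tables using the same primary key.
  def create_event(attrs, push_payload: nil)
    id = (@next_id += 1)
    @events[id] = attrs.merge(id: id)
    # The shadow row omits the "title" and "data" columns.
    shadow = attrs.reject { |key, _| %i[title data].include?(key) }
    @events_for_migration[id] = shadow.merge(id: id)
    # Payload rows point at the shadow table's key.
    @push_event_payloads[id] = push_payload.merge(event_id: id) if push_payload
    id
  end

  # After the swap, payload references are still valid without any rewrite.
  def swap_tables!
    @events, @events_for_migration = @events_for_migration, @events
  end
end
```

Because both rows share one id and the payload points at the shadow row, swapping the two tables needs no change to the payload references, which is the property the paragraph above relies on.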

This approach however does require some additional storage as we're
duplicating a portion of the events data for at least 1 release. The
exact amount is hard to estimate, but for GitLab.com this is expected to
be between 10 and 20 GB at most. The background migration in this commit
deliberately does _not_ update the "events" table as doing so would put
a lot of pressure on PostgreSQL's auto vacuuming system.

== Supporting Both Old And New Events

Application code has also been adjusted to support push events using
both the old and new data formats. This is done by creating a PushEvent
class which extends the regular Event class. Using Rails' Single Table
Inheritance system we can ensure the right class is used for the right
data, which in this case is based on the value of `events.action`. To
support displaying old and new data at the same time the PushEvent class
re-defines a few methods of the Event class, falling back to their
original implementations for push events in the old format.

Once all existing events have been migrated the various push event
related methods can be removed from the Event model, and the calls to
`super` can be removed from the methods in the PushEvent model.
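
The fallback pattern can be sketched in plain Ruby without Rails. The field names (`total_commits_count`, `commit_count`), the `:pushed` action value, and the `build_event` factory are assumptions made for illustration; in the real application Rails' Single Table Inheritance selects the class.

```ruby
# Base class reading the old data format (field names are assumed).
class Event
  attr_reader :data

  def initialize(data: nil)
    @data = data
  end

  def commits_count
    data ? data[:total_commits_count] : 0
  end
end

# Subclass that prefers the new payload and falls back to the old format.
class PushEvent < Event
  attr_reader :payload

  def initialize(data: nil, payload: nil)
    super(data: data)
    @payload = payload
  end

  def commits_count
    payload ? payload[:commit_count] : super
  end
end

# Hypothetical factory imitating class selection based on the action value.
def build_event(action, data: nil, payload: nil)
  if action == :pushed
    PushEvent.new(data: data, payload: payload)
  else
    Event.new(data: data)
  end
end
```

Once every row has a payload, the `super` branch in `PushEvent#commits_count` becomes dead code, which matches the cleanup step described above.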

The UI and event atom feed have also been slightly changed to better
handle this new setup. Fortunately, only a few changes were necessary to
make this work.

== API Changes

The API only displays push data of events in the new format. Supporting
both formats in the API is a bit more difficult compared to the UI.
Since the old push data was not really well documented (apart from one
example that used an incorrect "action" name) I decided that supporting
both was not worth the effort, especially since events will be migrated
in a few days _and_ new events are created in the correct format.

Verified

0395c471

Aug 09, 2017
- Enable the Layout/SpaceBeforeBlockBraces cop · c946ee12
  Rémy Coutable authored 7 years ago
  
  Signed-off-by: Rémy Coutable <remy@rymai.me>
  Verified
  
  c946ee12
Aug 01, 2017
- Rename many path_with_namespace -> full_path · abb87832
  Gabriel Mazetto authored 7 years ago
  
  abb87832
Jul 28, 2017
- Load and process at most 100 commits when pushing into default branch · 0e355e5c
  Douwe Maan authored 7 years ago
  
  0e355e5c
- refactors git push service spec code · 1cd43c38
  Tiago Botelho authored 7 years ago
  
  1cd43c38
Jul 27, 2017
- generate gpg signature on push · e63b693f
  Alexis Reigel authored 7 years ago
  
  e63b693f
- Remove superfluous lib: true, type: redis, service: true, models: true,... · ddccd24c
  Rémy Coutable authored 7 years ago
  
  Remove superfluous lib: true, type: redis, service: true, models: true, services: true, no_db: true, api: true

  Signed-off-by: Rémy Coutable <remy@rymai.me>
  ddccd24c
Jul 24, 2017
- Support both internal and external issue trackers · 7bee7b84
  Jarka Kadlecova authored 7 years ago
  
  7bee7b84
Jul 18, 2017
- Incorporate Gitaly's Commits#between RPC · 25b01b4c
  Alejandro Rodríguez authored 7 years ago
  
  25b01b4c
Jul 11, 2017
- Support multiple Redis instances based on queue type · cb3b4a15
  Paul Charlton authored 7 years ago and Robert Speicher committed 7 years ago
  
  cb3b4a15
Jun 30, 2017
- Improve support for external issue references · 9da30769
  Adam Niedzielski authored 7 years ago
  
  9da30769
Jun 21, 2017
- Enable Style/DotPosition Rubocop · 0430b764
  Grzegorz Bizon authored 7 years ago
  
  0430b764
May 31, 2017
- Introduce source to pipeline entity · 161af17c
  Kamil Trzcińśki authored 7 years ago
  
  161af17c
May 30, 2017
- Don’t create comment on JIRA if link already exists · ab8d54b2
  Jarka Kadlecova authored 7 years ago
  
  ab8d54b2
May 09, 2017
- Don't use DiffCollection for deltas · 48254d18
  Jacob Vosmaer (GitLab) authored 7 years ago
  
  48254d18
May 04, 2017
- Use regex to skip unnecessary reference processing in ProcessCommitWorker · 020295ff
  James Edwards-Jones authored 7 years ago
  
  020295ff
Mar 28, 2017
- Use `:empty_project` where possible in service specs · ca9a79f6
  Robert Speicher authored 7 years ago
  
  ca9a79f6
Mar 07, 2017
- Moved call of SystemHooksService from UpdateMergeRequestsWorker to GitPushServic… · 4bcd900f
  gpongelli authored 8 years ago
  
  4bcd900f
Feb 23, 2017
- Revert "Prefer leading style for Style/DotPosition" · 1fe7501b
  Douwe Maan authored 8 years ago
  
  This reverts commit cb10b725c8929b8b4460f89c9d96c773af39ba6b.
  1fe7501b
- Prefer leading style for Style/DotPosition · 206953a4
  Douwe Maan authored 8 years ago
  
  206953a4
Dec 23, 2016

Schedule at most 100 commits · 89d3ef38

Yorick Peterse authored 8 years ago

When processing push payloads we now schedule at most the 100 most
recent commits, instead of all commits that were in a payload. This
prevents one from overloading the system by pushing thousands if not
millions of commits in a single go.
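
A minimal sketch of the cap, assuming the payload lists commits oldest first so that the most recent ones sit at the end. The constant and method names are hypothetical.

```ruby
# Hypothetical names; the limit of 100 comes from the commit message.
PROCESS_COMMIT_LIMIT = 100

# Returns at most `limit` commits, keeping the most recent ones
# (assumes the list is ordered oldest first).
def commits_to_schedule(push_commits, limit = PROCESS_COMMIT_LIMIT)
  push_commits.last(limit)
end
```

For a push of 250 commits this keeps entries 151 through 250; a push with fewer than 100 commits is passed through unchanged.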

Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/25827

Verified

89d3ef38

Dec 21, 2016

Add more storage statistics · 3ef4f74b

Markus Koller authored 8 years ago

This adds counters for build artifacts and LFS objects, and moves
the preexisting repository_size and commit_count from the projects
table into a new project_statistics table.

The counters are displayed in the administration area for projects
and groups, and are also available through the API for admins (on */all)
and normal users (on */owned).

The statistics are updated through ProjectCacheWorker, which can now
do more granular updates with the new :statistics argument.

Verified

3ef4f74b

Dec 01, 2016

Pass commit data to ProcessCommitWorker · 6b4d3356

Yorick Peterse authored 8 years ago

By passing commit data to this worker we remove the need for querying
the Git repository for every job. This in turn reduces the time spent
processing each job.

The migration included migrates jobs from the old format to the new
format. For this to work properly it requires downtime as otherwise
workers may start producing errors until they're using a newer version
of the worker code.
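
The idea of passing commit data rather than a commit id can be sketched as follows. Sidekiq serialises job arguments as JSON, so the sketch round-trips a plain hash; the `Commit` struct, its fields, and both method names are assumptions rather than the worker's real interface.

```ruby
require 'json'
require 'time'

# Hypothetical, trimmed-down commit; the real object has more fields.
Commit = Struct.new(:id, :message, :authored_date, keyword_init: true)

# Producer side: turn the commit into JSON-friendly job arguments.
def commit_to_job_args(commit)
  {
    'id' => commit.id,
    'message' => commit.message,
    'authored_date' => commit.authored_date.iso8601
  }
end

# Worker side: rebuild the commit from the arguments, with no Git lookup.
def commit_from_job_args(args)
  Commit.new(
    id: args['id'],
    message: args['message'],
    authored_date: Time.iso8601(args['authored_date'])
  )
end
```

Changing the argument shape like this is also why jobs already queued in the old format need to be migrated, as the message notes.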

Unverified

6b4d3356

Nov 21, 2016

Refactor cache refreshing/expiring · ffb9b3ef

Yorick Peterse authored 8 years ago

This refactors repository caching so it's possible to selectively
refresh certain caches, instead of just expiring and refreshing
everything.

To allow this the various methods that were cached (e.g. "tag_count" and
"readme") use a similar pattern that makes expiring and refreshing
their data much easier.

In this new setup caches are refreshed as follows:

1. After a commit (but before running ProjectCacheWorker) we expire some
   basic caches such as the commit count and repository size.

2. ProjectCacheWorker will recalculate the commit count, repository
   size, then refresh a specific set of caches based on the list of
   files changed in a push payload.

This requires a number of changes to the various methods that may be
cached. For one, data should not be cached if the branch being used or
the entire repository does not exist. To prevent all these methods from
handling this manually, this is taken care of in
Repository#cache_method_output. Some methods still manually check for
the existence of a repository, but this result is also cached.
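
The caching pattern might look roughly like the sketch below. It borrows the method name `cache_method_output` from the message, but the signature, the in-memory hash used as the cache, and the `CachedRepository` class are assumptions made for illustration.

```ruby
# Hypothetical repository wrapper with an in-memory cache.
class CachedRepository
  attr_reader :computations

  def initialize(exists:, tags: [])
    @exists = exists
    @tags = tags
    @cache = {}
    @computations = 0
  end

  def exists?
    @exists
  end

  def tag_count
    cache_method_output(:tag_count, fallback: 0) do
      @computations += 1
      @tags.size
    end
  end

  # Expires only the named caches instead of everything.
  def expire_method_caches(names)
    names.each { |name| @cache.delete(name) }
  end

  # Expires and then recomputes only the named caches.
  def refresh_method_caches(names)
    expire_method_caches(names)
    names.each { |name| public_send(name) }
  end

  private

  # Returns the fallback without caching when the repository is missing.
  def cache_method_output(name, fallback: nil)
    return fallback unless exists?

    @cache.fetch(name) { @cache[name] = yield }
  end
end
```

Keeping the existence check in one helper means each cached method only supplies its key, a fallback, and the computation.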

With selective flushing implemented ProjectCacheWorker no longer uses an
exclusive lease for all of its work. Instead this worker only uses a
lease to limit the number of times the repository size is updated as
this is a fairly expensive operation.
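
Using a lease to rate-limit an expensive update can be sketched as below. The message does not say how the lease is stored, so this `Lease` class is a hypothetical, single-process, in-memory version with an injectable clock; a lease shared between workers would need external storage.

```ruby
# Hypothetical in-memory lease: at most one holder per timeout window.
class Lease
  def initialize(timeout:, clock: -> { Time.now })
    @timeout = timeout
    @clock = clock
    @expires_at = nil
  end

  # Returns true if the lease was obtained, false if it is still held.
  def try_obtain
    now = @clock.call
    return false if @expires_at && now < @expires_at

    @expires_at = now + @timeout
    true
  end
end

# Runs the expensive block only when the lease can be obtained.
def update_repository_size(lease)
  return false unless lease.try_obtain

  yield
  true
end
```

Calls that arrive while the lease is held simply skip the size update, while the cheaper cache refreshes can still run for every job.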

Verified

ffb9b3ef

Nov 18, 2016

Use `Gitlab.config.gitlab.host` over `'localhost'` · 9c4e0d64

Lin Jen-Shin authored 8 years ago

This fixes long-standing failures when running tests on
my development machine, which sets `Gitlab.config.gitlab.host`
to another host because it's not my local computer. I finally
could not stand it any longer and decided to fix them once and
for all.

9c4e0d64

Add JIRA remotelinks and prevent duplicated closing messages · 85dd05b5
Felipe Artur authored 8 years ago

85dd05b5

Nov 07, 2016

Process commits in a separate worker · 509910b8

Yorick Peterse authored 8 years ago

This moves the code used for processing commits from GitPushService to
its own Sidekiq worker: ProcessCommitWorker.

Using a Sidekiq worker allows us to process multiple commits in
parallel. This in turn will lead to issues being closed faster and cross
references being created faster. Furthermore by isolating this code into
a separate class it's easier to test and maintain the code.

The new worker also ensures it can efficiently check which issues can be
closed, without having to run numerous SQL queries for every issue.

Unverified

509910b8

Flush Housekeeping data from Redis specs · 89bb29b2

Yorick Peterse authored 8 years ago

These specs use raw Redis objects which can not use the memory based
caching mechanism used for tests. As such we have to explicitly flush
the data from Redis before/after each spec to ensure no data lingers on.

Unverified

89bb29b2

Oct 26, 2016
- Finish updates to use JIRA gem · c2d6822e
  Felipe Artur authored 8 years ago
  
  Code improvements, bug fixes, finish documentation and specs
  c2d6822e
Oct 19, 2016
- Prevent wrong markdown on issue ids when project has Jira service activated · 8e4301d9
  Felipe Artur authored 8 years ago
  
  8e4301d9
Oct 17, 2016

Add a be_like_time matcher and use it in specs · bfb20200

Nick Thomas authored 8 years ago

The amount of precision times have in databases is variable, so we need
tolerances when comparing in specs. It's better to have the tolerance defined
in one place than several.
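
The comparison behind such a matcher can be sketched as a plain Ruby predicate. The one-second default tolerance and the name `like_time?` are assumptions; in the specs this check would live inside a custom RSpec matcher instead.

```ruby
# Returns true when two times differ by no more than `tolerance` seconds.
# The default of one second is an assumption, not taken from the commit.
def like_time?(actual, expected, tolerance: 1.0)
  (actual - expected).abs <= tolerance
end
```

Defining the tolerance once means a change in database time precision only has to be handled in a single place.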

bfb20200

Oct 13, 2016
- Extract project#update_merge_requests and SystemHooks to its own worker from GitPushService · bba47886
  Paco Guzman authored 8 years ago
  
  bba47886
Oct 07, 2016

Add markdown cache columns to the database, but don't use them yet · e94cd6fd

Nick Thomas authored 8 years ago

This commit adds a number of _html columns and, with the exception of Note,
starts updating them whenever the content of their partner fields changes.

Note has a collision with the note_html attr_accessor; that will be fixed later

A background worker for clearing these cache columns is also introduced - use
`rake cache:clear` to set it off. You can clear the database or Redis caches
separately by running `rake cache:clear:db` or `rake cache:clear:redis`,
respectively.
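
The relationship between a field and its cached `_html` partner can be sketched like this. The class, the `render` method (which only escapes HTML and wraps the text in a paragraph), and `clear_cache!` are hypothetical stand-ins for the real Markdown pipeline and cache-clearing worker.

```ruby
require 'cgi'

# Hypothetical record with one source field and its cached HTML partner.
class CachedMarkdownRecord
  attr_reader :description, :description_html, :renders

  def initialize(description)
    @renders = 0
    self.description = description
  end

  # Re-renders only when the text changes or the cache was cleared.
  def description=(text)
    return if text == @description && !@description_html.nil?

    @description = text
    @description_html = render(text)
  end

  # Stands in for clearing the cache column.
  def clear_cache!
    @description_html = nil
  end

  private

  # Toy renderer (an assumption): escape the text and wrap it in <p>.
  def render(text)
    @renders += 1
    "<p>#{CGI.escapeHTML(text)}</p>"
  end
end
```

Assigning the same text twice does not render again, while changing the text or clearing the cache forces a fresh render on the next assignment.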

e94cd6fd
