  1. Aug 01, 2018
    • Add repository languages for projects · 79a5d768
      Zeger-Jan van de Weg authored
      Our friends at GitHub have shown the programming languages of a
      repository for a long time, and, inspired by that, this commit aims to
      provide roughly the same functionality.
      
      Language detection is done through Linguist, as before; the difference
      is that we now cache the result in the database. Also, Gitaly can
      incrementally scan a repository. This is done through a shell-out, which
      adds an overhead of about 3s per run. For now, this won't be improved.
      
      Scans are triggered by pushes to the default branch, usually `master`.
      One exception to this rule is the charts page: if we're requesting this
      expensive data anyway, we simply cache it in the database.
      
      Edge cases where there is no repository, or it's empty, are caught in
      the Repository model. This makes use of Redis caching, which is probably
      already loaded.
      
      The added model is called RepositoryLanguage, which will make things
      harder if/when GitLab supports multiple repositories per project.
      However, I don't think this is a concern for now. Also, the name
      Language could be confused with i18n languages, so the current name
      felt suitable.
      
      The design of the Project#Show page was done with help from @dimitrieh.
      This change is not visible to the end user until detection has run.
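The caching policy described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the class name `RepositoryLanguageCache` and its methods are hypothetical, a plain attribute stands in for the database rows, and `detect` stands in for the expensive Linguist/Gitaly scan.

```python
from typing import Callable, Dict, Optional

Languages = Dict[str, float]  # language name -> share of the repository (%)


class RepositoryLanguageCache:
    """Hypothetical sketch: persist detection results instead of rescanning per view."""

    def __init__(self, default_branch: str, detect: Callable[[], Languages]):
        self.default_branch = default_branch
        self._detect = detect  # expensive scan (a shell-out in the real system)
        self._stored: Optional[Languages] = None  # stands in for the database rows

    def on_push(self, branch: str) -> None:
        # Only pushes to the default branch trigger a new scan.
        if branch == self.default_branch:
            self._stored = self._detect()

    def for_charts_page(self) -> Languages:
        # The charts page needs this data anyway, so compute and persist it if missing.
        if self._stored is None:
            self._stored = self._detect()
        return self._stored

    def for_project_page(self) -> Optional[Languages]:
        # The project page only shows results that were already detected.
        return self._stored
```

Pushes to other branches leave the stored result untouched, so the cost of a scan is only paid when the default branch changes or the charts page asks for missing data.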
  2. Jul 30, 2018
  3. Jul 11, 2018
  4. May 07, 2018
  5. Dec 22, 2017
  6. Aug 29, 2017
  7. Aug 16, 2017
  8. Aug 10, 2017
    • Migrate events into a new format · 0395c471
      Yorick Peterse authored
      This commit migrates events data in such a way that push events are
      stored much more efficiently. This is done by creating a shadow table
      called "events_for_migration", and a table called "push_event_payloads"
      which is used for storing the push data of push events. The background
      migration in this commit copies events from the "events" table into
      the "events_for_migration" table; push events will also have a row
      created in "push_event_payloads".
      
      This approach allows us to reclaim space in the next release by simply
      swapping the "events" and "events_for_migration" tables, then dropping
      the old events (now "events_for_migration") table.
      
      The new table structure is also optimised for storage space, and does
      not include the unused "title" column nor the "data" column (since this
      data is moved to "push_event_payloads").
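The copy step can be sketched with SQLite. This is a hedged illustration, not the real migration: the column sets, the `PUSHED` action value, and the old-format JSON keys (`total_commits_count`, `ref`) are assumptions. Note that the sketch only reads from "events", copies rows into the shadow table without the "title" and "data" columns, and splits push data out into "push_event_payloads".

```python
import json
import sqlite3

PUSHED = 5  # hypothetical value of events.action for push events

SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY, project_id INTEGER, action INTEGER,
                     title TEXT, data TEXT);
CREATE TABLE events_for_migration (id INTEGER PRIMARY KEY, project_id INTEGER,
                                   action INTEGER);
CREATE TABLE push_event_payloads (
    event_id INTEGER PRIMARY KEY REFERENCES events_for_migration(id),
    commit_count INTEGER, ref TEXT);
"""


def migrate_range(conn: sqlite3.Connection, start_id: int, end_id: int) -> None:
    """Copy events with start_id <= id <= end_id into the shadow table.

    The "events" table is only read, never updated.
    """
    rows = conn.execute(
        "SELECT id, project_id, action, data FROM events WHERE id BETWEEN ? AND ?",
        (start_id, end_id),
    ).fetchall()
    for event_id, project_id, action, data in rows:
        # Same primary key; the unused title and the data column are dropped.
        conn.execute(
            "INSERT OR IGNORE INTO events_for_migration (id, project_id, action) "
            "VALUES (?, ?, ?)",
            (event_id, project_id, action),
        )
        if action == PUSHED and data:
            payload = json.loads(data)  # old format: push data serialized on the event
            conn.execute(
                "INSERT OR IGNORE INTO push_event_payloads (event_id, commit_count, ref) "
                "VALUES (?, ?, ?)",
                (event_id, payload["total_commits_count"], payload["ref"]),
            )
    conn.commit()
```

Using `INSERT OR IGNORE` keeps the sketch idempotent, so a batch can be retried safely if a background job is interrupted.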
      
      == Newly Created Events
      
      Newly created events are inserted into both "events" and
      "events_for_migration", both using the exact same primary key value. The
      table "push_event_payloads" in turn has a foreign key to the _shadow_
      table. This removes the need for recreating and validating the foreign
      key after swapping the tables. Since the shadow table also has a foreign
      key to "projects.id" we also don't have to worry about orphaned rows.
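The dual-write scheme for new events can be sketched with SQLite as well. The schema is simplified and hypothetical, and `ON DELETE CASCADE` is an assumption used here to show why the foreign keys to "projects.id" prevent orphaned rows; the function `create_event` is illustrative only.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE,
    action INTEGER);
CREATE TABLE events_for_migration (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id) ON DELETE CASCADE,
    action INTEGER);
CREATE TABLE push_event_payloads (
    event_id INTEGER PRIMARY KEY
        REFERENCES events_for_migration(id) ON DELETE CASCADE,
    ref TEXT);
"""


def create_event(conn: sqlite3.Connection, project_id: int, action: int,
                 ref: Optional[str] = None) -> int:
    with conn:  # one transaction for all rows
        cur = conn.execute(
            "INSERT INTO events (project_id, action) VALUES (?, ?)",
            (project_id, action),
        )
        event_id = cur.lastrowid
        # Reuse exactly the same primary key value in the shadow table.
        conn.execute(
            "INSERT INTO events_for_migration (id, project_id, action) VALUES (?, ?, ?)",
            (event_id, project_id, action),
        )
        if ref is not None:
            # The payload points at the shadow table, so no FK rework after the swap.
            conn.execute(
                "INSERT INTO push_event_payloads (event_id, ref) VALUES (?, ?)",
                (event_id, ref),
            )
    return event_id
```

Because the payload table already references the shadow table, swapping the table names later leaves its foreign key pointing at the right place.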
      
      This approach however does require some additional storage as we're
      duplicating a portion of the events data for at least 1 release. The
      exact amount is hard to estimate, but for GitLab.com this is expected to
      be between 10 and 20 GB at most. The background migration in this commit
      deliberately does _not_ update the "events" table as doing so would put
      a lot of pressure on PostgreSQL's auto vacuuming system.
      
      == Supporting Both Old And New Events
      
      Application code has also been adjusted to support push events using
      both the old and new data formats. This is done by creating a PushEvent
      class which extends the regular Event class. Using Rails' Single Table
      Inheritance system we can ensure the right class is used for the right
      data, which in this case is based on the value of `events.action`. To
      support displaying old and new data at the same time the PushEvent class
      re-defines a few methods of the Event class, falling back to their
      original implementations for push events in the old format.
      
      Once all existing events have been migrated the various push event
      related methods can be removed from the Event model, and the calls to
      `super` can be removed from the methods in the PushEvent model.
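The class selection and fallback behaviour can be mimicked in Python. This is only an analogy to Rails' Single Table Inheritance: the `PUSHED` value, the `instantiate` factory, and the `commit_count` method are hypothetical stand-ins for the real models.

```python
from typing import Any, Dict, Optional

PUSHED = 5  # hypothetical events.action value that marks push events


class Event:
    def __init__(self, action: int, data: Optional[Dict[str, Any]] = None):
        self.action = action
        self.data = data  # old format: push data serialized on the event row

    def commit_count(self) -> int:
        return (self.data or {}).get("total_commits_count", 0)

    @staticmethod
    def instantiate(row: Dict[str, Any]) -> "Event":
        # Like STI, pick the class from a column; here it is events.action.
        if row["action"] == PUSHED:
            return PushEvent(row["action"], row.get("data"), row.get("payload"))
        return Event(row["action"], row.get("data"))


class PushEvent(Event):
    def __init__(self, action: int, data: Optional[Dict[str, Any]] = None,
                 payload: Optional[Dict[str, Any]] = None):
        super().__init__(action, data)
        self.push_event_payload = payload  # new format: row in push_event_payloads

    def commit_count(self) -> int:
        if self.push_event_payload is not None:
            return self.push_event_payload["commit_count"]
        return super().commit_count()  # fall back to the old-format data
```

Once every event has been migrated, the `super()` fallback and the old-format code in `Event` become dead code and can be deleted, which mirrors the clean-up described above.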
      
      The UI and the event Atom feed have also been slightly changed to better
      handle this new setup; fortunately, only a few changes were necessary to
      make this work.
      
      == API Changes
      
      The API only displays push data of events in the new format. Supporting
      both formats in the API is a bit more difficult compared to the UI.
      Since the old push data was not really well documented (apart from one
      example that used an incorrect "action" name), I decided that supporting
      both was not worth the effort, especially since events will be migrated
      in a few days _and_ new events are created in the correct format.
  9. Aug 09, 2017
  10. Aug 01, 2017
  11. Jul 28, 2017
  12. Jul 27, 2017
  13. Jul 24, 2017
  14. Jul 18, 2017
  15. Jul 11, 2017
  16. Jun 30, 2017
  17. Jun 21, 2017
  18. May 31, 2017
  19. May 30, 2017
  20. May 09, 2017
  21. May 04, 2017
  22. Mar 28, 2017
  23. Mar 07, 2017
  24. Feb 23, 2017
  25. Dec 23, 2016
  26. Dec 21, 2016
    • Add more storage statistics · 3ef4f74b
      Markus Koller authored
      This adds counters for build artifacts and LFS objects, and moves
      the preexisting repository_size and commit_count from the projects
      table into a new project_statistics table.
      
      The counters are displayed in the administration area for projects
      and groups, and are also available through the API for admins (on
      */all) and normal users (on */owned).
      
      The statistics are updated through ProjectCacheWorker, which can now
      do more granular updates with the new :statistics argument.
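A granular refresh of this kind can be sketched as follows. The counter names and the `refresh_statistics` helper are assumptions for illustration, and the calculator callables stand in for the real (and much more expensive) Git, LFS, and CI storage queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ProjectStatistics:
    commit_count: int = 0
    repository_size: int = 0
    lfs_objects_size: int = 0
    build_artifacts_size: int = 0


# Hypothetical calculators, one per counter.
Calculators = Dict[str, Callable[[], int]]


def refresh_statistics(stats: ProjectStatistics, calculators: Calculators,
                       only: Optional[Iterable[str]] = None) -> None:
    """Recompute only the requested counters (all of them when `only` is None)."""
    names = list(calculators) if only is None else list(only)
    for name in names:
        setattr(stats, name, calculators[name]())
```

Passing a subset, for example `only=["commit_count"]`, plays the role of the `:statistics` argument: the worker skips counters that the triggering event could not have changed.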
  27. Dec 01, 2016
    • Pass commit data to ProcessCommitWorker · 6b4d3356
      Yorick Peterse authored
      By passing commit data to this worker we remove the need for querying
      the Git repository for every job. This in turn reduces the time spent
      processing each job.
      
      The migration included migrates jobs from the old format to the new
      format. For this to work properly it requires downtime as otherwise
      workers may start producing errors until they're using a newer version
      of the worker code.
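The job-format change can be illustrated with a small Python sketch. The argument layouts, `migrate_job_args`, and `perform` are hypothetical; `load_commit` stands in for the single Git lookup the migration performs so that later job runs do not have to.

```python
from typing import Any, Callable, Dict, List

# Hypothetical job argument layouts:
#   old: [project_id, user_id, commit_sha]
#   new: [project_id, user_id, {"id": sha, "message": ..., ...}]


def migrate_job_args(args: List[Any],
                     load_commit: Callable[[int, str], Dict[str, Any]]) -> List[Any]:
    project_id, user_id, commit = args
    if isinstance(commit, dict):
        return args  # already in the new format
    # Look the commit up once, during the migration, instead of in every job run.
    return [project_id, user_id, load_commit(project_id, commit)]


def perform(project_id: int, user_id: int, commit: Dict[str, Any]) -> str:
    # All the data the job needs travels with it: no Git repository access here.
    return commit["message"]
```

An old-format job handed to a new worker (or the reverse) would fail on the third argument, which is why the commit requires downtime while the queued jobs are converted.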
  28. Nov 21, 2016
    • Refactor cache refreshing/expiring · ffb9b3ef
      Yorick Peterse authored
      This refactors repository caching so it's possible to selectively
      refresh certain caches, instead of just expiring and refreshing
      everything.
      
      To allow this the various methods that were cached (e.g. "tag_count" and
      "readme") use a similar pattern that makes expiring and refreshing
      their data much easier.
      
      In this new setup caches are refreshed as follows:
      
      1. After a commit (but before running ProjectCacheWorker) we expire some
         basic caches such as the commit count and repository size.
      
      2. ProjectCacheWorker will recalculate the commit count, repository
         size, then refresh a specific set of caches based on the list of
         files changed in a push payload.
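Step 2 can be sketched as a mapping from changed files to cache names. The cache names and matching rules here are hypothetical, and the sketch assumes that only files in the repository root matter.

```python
from typing import Iterable, Optional, Set

# Caches recalculated after every push (names are illustrative).
ALWAYS_REFRESHED = {"commit_count", "size"}


def cache_for_file(path: str) -> Optional[str]:
    """Map a changed file to the cache that depends on it (hypothetical rules)."""
    if "/" in path:
        return None  # only files in the repository root are considered here
    name = path.lower()
    if name.startswith("readme"):
        return "readme"
    if name.startswith(("license", "licence", "copying")):
        return "license"
    if name.startswith("changelog"):
        return "changelog"
    if name == ".gitlab-ci.yml":
        return "gitlab_ci_yml"
    return None


def caches_to_refresh(changed_files: Iterable[str]) -> Set[str]:
    selected = {cache_for_file(path) for path in changed_files}
    selected.discard(None)
    return ALWAYS_REFRESHED | selected
```

A push that only touches source code therefore refreshes just the basic caches, instead of expiring and rebuilding every cached value.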
      
      This requires a number of changes to the various methods that may be
      cached. For one, data should not be cached if the branch being used, or
      the entire repository, does not exist. To avoid every method handling
      this manually, this is taken care of in Repository#cache_method_output.
      Some methods still manually check for the existence of a repository, but
      this result is also cached.
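The shared caching pattern can be sketched as a Python decorator. The `cache_method` decorator, the `fallback` parameter, and the in-memory dict standing in for Redis are all assumptions; the point is only that nothing is cached while the repository is missing, and that each cached value can be expired by name.

```python
import functools
from typing import Any, Callable, Dict, List


def cache_method(fallback: Any = None) -> Callable:
    """Cache a method's output per repository, but never for a missing repository."""
    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(self: "Repository") -> Any:
            if not self.exists:
                return fallback  # nothing is stored, so a later push is picked up
            key = method.__name__
            if key not in self._cache:
                self._cache[key] = method(self)
            return self._cache[key]
        return wrapper
    return decorator


class Repository:
    def __init__(self, exists: bool, tags: List[str]):
        self.exists = exists
        self.tags = tags
        self.calls = 0  # counts real (uncached) computations
        self._cache: Dict[str, Any] = {}  # stands in for Redis

    def expire(self, *names: str) -> None:
        for name in names:
            self._cache.pop(name, None)

    @cache_method(fallback=0)
    def tag_count(self) -> int:
        self.calls += 1
        return len(self.tags)
```

Because every cached method goes through the same wrapper, refreshing a single cache is just `expire("tag_count")` followed by the next call.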
      
      With selective flushing implemented, ProjectCacheWorker no longer uses
      an exclusive lease for all of its work. Instead, this worker only uses a
      lease to limit how often the repository size is updated, as this is a
      fairly expensive operation.
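A lease of this kind can be sketched with an in-memory store that behaves like a Redis `SET key value NX EX timeout`. The `LeaseStore` class, the key format, and the one-hour default timeout are hypothetical; the injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Optional


class LeaseStore:
    """In-memory stand-in for a Redis-backed exclusive lease."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def try_obtain(self, key: str, timeout: float) -> bool:
        now = self._clock()
        if self._expiry.get(key, float("-inf")) > now:
            return False  # still held by an earlier caller
        self._expiry[key] = now + timeout
        return True


def update_repository_size(store: LeaseStore, project_id: int,
                           compute: Callable[[], int],
                           timeout: float = 3600.0) -> Optional[int]:
    """Recompute the repository size at most once per `timeout` seconds per project."""
    key = f"project_cache_worker:{project_id}:update_repository_size"
    if not store.try_obtain(key, timeout):
        return None  # updated recently; skip the expensive work
    return compute()
```

Only the size calculation is gated; the other cache refreshes in the worker can run on every job without taking the lease.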
  29. Nov 18, 2016
  30. Nov 07, 2016
    • Process commits in a separate worker · 509910b8
      Yorick Peterse authored
      This moves the code used for processing commits from GitPushService to
      its own Sidekiq worker: ProcessCommitWorker.
      
      Using a Sidekiq worker allows us to process multiple commits in
      parallel. This in turn will lead to issues being closed faster and cross
      references being created faster. Furthermore, isolating this code in a
      separate class makes it easier to test and maintain.
      
      The new worker also ensures it can efficiently check which issues can be
      closed, without having to run numerous SQL queries for every issue.
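The batching idea can be sketched with SQLite: collect every referenced issue from the commit message, then fetch the ones that can still be closed with a single query rather than one query per issue. The closing regex below is a deliberately simplified assumption (GitLab's configurable default pattern is broader), and the `issues` schema and `issues_to_close` helper are hypothetical.

```python
import re
import sqlite3
from typing import List

# Simplified closing pattern (assumption): "fixes #1", "closes #2", "resolved #3", ...
CLOSING_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def issues_to_close(conn: sqlite3.Connection, project_id: int, message: str) -> List[int]:
    iids = sorted({int(ref) for ref in CLOSING_PATTERN.findall(message)})
    if not iids:
        return []
    placeholders = ",".join("?" for _ in iids)
    # One query for all referenced issues instead of one query per issue.
    rows = conn.execute(
        "SELECT iid FROM issues WHERE project_id = ? AND state = 'opened' "
        f"AND iid IN ({placeholders}) ORDER BY iid",
        (project_id, *iids),
    ).fetchall()
    return [iid for (iid,) in rows]
```

Already-closed issues and issues from other projects are filtered out by the same query, so the worker only acts on the rows it gets back.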
    • Flush Housekeeping data from Redis specs · 89bb29b2
      Yorick Peterse authored
      These specs use raw Redis objects, which cannot use the memory-based
      caching mechanism used for tests. As such, we have to explicitly flush
      the data from Redis before/after each spec to ensure no data lingers.
  31. Oct 26, 2016
  32. Oct 19, 2016
  33. Oct 17, 2016
  34. Oct 13, 2016
  35. Oct 07, 2016
    • Add markdown cache columns to the database, but don't use them yet · e94cd6fd
      Nick Thomas authored
      This commit adds a number of _html columns and, with the exception of Note,
      starts updating them whenever the content of their partner fields changes.
      
      Note has a collision with the note_html attr_accessor; that will be fixed later.
      
      A background worker for clearing these cache columns is also introduced; use
      `rake cache:clear` to set it off. You can clear the database or Redis caches
      separately by running `rake cache:clear:db` or `rake cache:clear:redis`,
      respectively.
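The "update the _html column whenever its partner field changes" rule can be sketched in Python. The `MarkdownCachedRecord` class and its field list are hypothetical, and `render_markdown` is a placeholder that only escapes HTML and wraps the text in a paragraph; the real renderer is a full Markdown pipeline.

```python
import html
from typing import Dict, Optional


def render_markdown(text: str) -> str:
    # Placeholder renderer (assumption): escape HTML and wrap in a paragraph.
    return f"<p>{html.escape(text)}</p>"


class MarkdownCachedRecord:
    """Hypothetical model whose `<field>_html` columns mirror its markdown fields."""

    CACHED_FIELDS = ("title", "description")

    def __init__(self, **fields: str):
        self.columns: Dict[str, Optional[str]] = {}
        self.renders = 0  # counts how often the (expensive) renderer ran
        self.update(**fields)

    def update(self, **changes: str) -> None:
        for field, value in changes.items():
            changed = self.columns.get(field) != value
            self.columns[field] = value
            if field in self.CACHED_FIELDS and changed:
                # Re-render only when the partner field actually changed.
                self.columns[f"{field}_html"] = render_markdown(value)
                self.renders += 1
```

Writing an unchanged value leaves the cached HTML alone, so reads can use the `_html` column without re-rendering.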