It is mostly slow on projects with a huge number of open issues and merge requests because, it appears, the autocomplete sources for users, issues, and merge requests are all loaded with a single request.
Just looking at my network log for this issue, I see this:
Notice the `autocomplete_sources` endpoint is being called twice. Also, the content download dominates the overall time, since so much is being retrieved all at once.
This appears to be slow because of the following (ignoring gzip):
The autocomplete endpoint is requested twice. This appears to be caused by the "focus" event of the comment box being triggered twice.
When requesting the list of users for the autocomplete the code will look up all participants of the current issue. This goes through the whole user reference filter pipeline, which is quite slow. Removing this cuts down loading times by about a second, at the cost of only being able to auto complete group members (thus it's not really a solution).
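To make the cost concrete, here is roughly what that lookup amounts to today. This is a simplified sketch (the real code lives in Participable, Mentionable and the reference extractor), and the extractor calls below are approximations rather than the exact API:

```ruby
# Simplified sketch of the current slow path: every note is rendered through
# the full Markdown/reference pipeline just to discover which users it
# mentions. Constructor/method names approximate the real extractor.
def participants_via_pipeline(issue, current_user)
  issue.notes.flat_map do |note|
    extractor = Gitlab::ReferenceExtractor.new(issue.project, current_user)

    # Renders the note: Markdown -> HTML, sanitization, *all* reference
    # filters (users, issues, MRs, labels, ...), link building, etc.
    extractor.analyze(note.note)

    # Only then are the mentioned users pulled back out of the document.
    extractor.users
  end.uniq
end
```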
I looked into this at one point. I believe we're calling GfmAutoComplete.setup() multiple times per page load due to the spaghetti-like nature of our JS.
We've got the list of participants at the top of the notes; perhaps we could load just the team members server-side and then add in any participants that aren't already there via JavaScript?
Further digging reveals that this code processes the notes of an issue as if they're about to be rendered. This means the code is replacing user references with actual links, the whole shebang. Instead all we need is basically the following process:
1. Grab all notes of an issue
2. Scan for all user references in every note, stuffing these in a single Set
3. Find all User objects based on the Set created in step 2
This would need at most 1 SQL query, and no string replacements. I'll do some more digging to see if this is even possible to implement without too much trouble.
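Something along these lines, perhaps. This is a minimal sketch assuming a simplified username pattern and an illustrative method name, not actual GitLab code:

```ruby
require 'set'

# Minimal sketch of the proposed approach; the reference pattern is
# simplified and the method is illustrative, not the real implementation.
USERNAME_PATTERN = /@(?<username>[a-zA-Z0-9_\-\.]+)/

def autocomplete_users_for(issue)
  usernames = Set.new

  # Steps 1 & 2: grab all notes and scan the raw text for user references,
  # collecting the usernames in a single Set.
  issue.notes.pluck(:note).each do |body|
    usernames.merge(body.scan(USERNAME_PATTERN).flatten)
  end

  # Step 3: resolve every collected username with one SQL query, without
  # rendering a single note.
  User.where(username: usernames.to_a)
end
```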
In MR !1602 (merged), @DouweM & friends are working on adding caching to the HTML pipeline. Using this branch, the time it takes for the auto-complete page to load is reduced from around 2 seconds to around 800 milliseconds. This is already a pretty good step, but I think we can shave off some extra time.
Because !1602 (merged) is still a work in progress I'll hold off on making any further changes until it's merged; otherwise I may end up making changes to code that's going to be changed/removed/etc. @DouweM mentioned he was hoping to get said MR released in 8.3, which would nicely coincide with the intended milestone of this issue.
!1602 (merged) seems to have made some improvements (from what I remember when I checked a week or two ago), but the performance varies and I think we can tune things even further. I'll try to dig up more.
According to Sherlock/Yorlock only about 5% of the time is spent in SQL queries. Looking at the line profiler, most of the time appears to be spent as follows (timings are inclusive):
| Time (inclusive) | Count | Path |
|------------------|-------|------|
| 842.06 ms | 1 | app/models/concerns/participable.rb |
| 814.94 ms | 1 | lib/banzai/reference_extractor.rb |
| 784.01 ms | 1 | app/models/concerns/mentionable.rb |
| 762.74 ms | 1 | lib/gitlab/reference_extractor.rb |
| 637.91 ms | 1 | lib/banzai/renderer.rb |
| 629.87 ms | 1 | lib/banzai/pipeline/base_pipeline.rb |
| 513.66 ms | 1 | lib/banzai/filter/reference_filter.rb |
| 501.5 ms | 1 | lib/banzai/filter/abstract_reference_filter.rb |
| 147.53 ms | 1 | app/services/projects/participants_service.rb |
| 103.43 ms | 1 | lib/banzai/filter/user_reference_filter.rb |
| 80.9 ms | 1 | lib/gitlab/sherlock/query.rb |
| 71.76 ms | 1 | app/models/project.rb |
A large chunk of time is spent in extracting all text nodes from a document. Sadly I don't see a way of optimizing this at this time.
When extracting the users for the dropdown the full HTML pipeline is activated, e.g. issue links are also processed. Skipping this should speed things up quite a bit.
I'm sure we can do some kind of custom pipeline for this, right @DouweM?
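For illustration, a reduced pipeline could look roughly like this. The filter names follow the existing Banzai layout, but the exact wiring here is a sketch and would need checking against the actual pipeline setup:

```ruby
# Hypothetical reduced pipeline that only resolves user references, skipping
# the issue/MR/label/etc. filters. Filter names mirror existing Banzai
# filters; the wiring is illustrative.
module Banzai
  module Pipeline
    class UserAutocompletePipeline < BasePipeline
      def self.filters
        [
          Filter::MarkdownFilter,
          Filter::SanitizationFilter,
          Filter::UserReferenceFilter
        ]
      end
    end
  end
end
```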
The more I look into this, the more I think we really need to start storing participants in a table. Even when we exclude finding issue links and all that, the process is still going to take some time. Should participants be stored in a database table, the only thing you'd have to do afterwards is:
1. Get the participants of an issue (from the issue_participants table or something like that)
2. Filter out any participants based on permissions (if applicable at all)
I don't really see any other way that would get the loading times of the autocomplete dropdown under a second.
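For the sake of argument, the table and lookup could look roughly like this; the names are made up for illustration and this is not an actual migration:

```ruby
# Hypothetical issue_participants table; table and index names are made up.
class CreateIssueParticipants < ActiveRecord::Migration
  def change
    create_table :issue_participants do |t|
      t.integer :issue_id, null: false
      t.integer :user_id,  null: false
    end

    add_index :issue_participants, [:issue_id, :user_id], unique: true
  end
end

# The autocomplete lookup then becomes a single indexed query plus a
# permission check, with no note parsing at all:
participants = User
  .joins('INNER JOIN issue_participants ON issue_participants.user_id = users.id')
  .where(issue_participants: { issue_id: issue.id })
```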
One little trick we can do is to simply load the autocomplete data when loading the page (but still using an XHR call). This should hopefully ensure the list of usernames is present when one starts typing a comment, at the cost of always having to load it.
@dzaporozhets should we just make autocomplete for public and innersource projects search all users and groups on the server? And in the case of private projects, just load the participants instead of parsing all participants. I frequently miss the ability to mention people outside of the project. We can discuss in https://gitlab.com/gitlab-org/gitlab-ce/issues/3872 /cc @yorickpeterse
> should we just make autocomplete for public and innersource projects search all users and groups on the server? And in the case of private projects, just load the participants instead of parsing all participants. I frequently miss the ability to mention people outside of the project.
@sytses makes sense, but I think @mention should list issue participants as the first results rather than plain name matches. For example, say we have an issue with @dmitriy participating and 100 users on the server whose names start with @dm.... If I start typing, it should give @dmitriy as the first match instead of @dmaaa, @dmbbb, etc., because in 90% of cases I want to mention someone who is already participating in the issue or the project.
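In code terms, roughly this ordering (a sketch; the method and names are illustrative):

```ruby
# Illustrative ordering: participants that match the typed prefix come first,
# followed by other matching users.
def autocomplete_matches(prefix, participants, other_users)
  matches = ->(user) do
    user.username.downcase.start_with?(prefix.downcase) ||
      user.name.downcase.start_with?(prefix.downcase)
  end

  participants.select(&matches) + (other_users - participants).select(&matches)
end

# Typing "dm" on an issue where @dmitriy participates would then return
# @dmitriy before @dmaaa, @dmbbb, etc.
```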
Another thing to keep in mind with autocomplete for all users: running a query for every new character entered is going to have a serious impact on the database.
I think we should hold off on a participants table until we have caching fully working, which will resolve many of these issues:
> According to Sherlock/Yorlock only about 5% of the time is spent in SQL queries.
Caching will save us SQL queries, but also much of the Banzai / Markdown pipeline.
> A large chunk of time is spent in extracting all text nodes from a document.
With caching, the only lookup is `doc.css('a.gfm')`, which should be quite fast (see the sketch at the end of this comment).
> When extracting the users for the dropdown the full HTML pipeline is activated, e.g. issue links are also processed. Skipping this should speed things up quite a bit.
> I'm sure we can do some kind of custom pipeline for this, right @DouweM?
With caching, the whole pipeline is only run once.
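For reference, the lookup against cached output would be roughly this. It's a sketch that assumes the user reference filter tags links with the gfm class and a data-user attribute holding the user id, which would need verifying against the cached HTML:

```ruby
require 'nokogiri'

# Sketch of extracting mentioned users from already-rendered (cached) HTML.
# Assumes reference links carry the "gfm" class and a data-user attribute.
def user_ids_from_cached_html(html)
  doc = Nokogiri::HTML.fragment(html)

  doc.css('a.gfm')
     .map { |node| node['data-user'] }
     .compact
     .map(&:to_i)
end
```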
With !2312 (merged) this should at least no longer annoy users. Since the remainder of this depends on caching the HTML pipeline output and/or moving participants to a table, I'll close this issue.