Fuzzy search issues / merge requests
What does this MR do?
This MR resolves the following problem.
> When performing a multi-word issue search, we appear to be looking for the exact string. e.g. "smart deploy" must match both words, in that order, consecutively. But I want it to find any issue with "smart" and "deploy" in any order. e.g. "deploy smart" or "smart f'ing deploy".
Example from comment:
The search term foo "really bar" baz
would return results with:
foo f'ing really bar f'ing baz
foo f'ing baz f'ing really bar
really bar f'ing foo f'ing baz
really bar f'ing baz f'ing foo
baz f'ing foo f'ing really bar
baz f'ing really bar f'ing foo
For performance reasons, words shorter than 3 characters are ignored. The performance problem with short search terms already exists on GitLab.com today, although it is not very noticeable.
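The matching rules described above can be sketched in plain Ruby. This is only an illustration, not the code in this MR: `MIN_TERM_LENGTH`, `search_terms` and `fuzzy_match?` are hypothetical names, and applying the length limit to quoted phrases as well as single words is an assumption.

```ruby
# Hypothetical constant: terms shorter than this are dropped from the query.
MIN_TERM_LENGTH = 3

# Split a query into terms. A double-quoted phrase stays together as one term.
def search_terms(query)
  query.scan(/"([^"]+)"|(\S+)/)
       .map { |quoted, bare| quoted || bare }
       .reject { |term| term.length < MIN_TERM_LENGTH }
end

# True when every term occurs somewhere in the text, in any order,
# compared case-insensitively (similar in spirit to ILIKE '%term%').
def fuzzy_match?(text, query)
  haystack = text.downcase
  search_terms(query).all? { |term| haystack.include?(term.downcase) }
end

search_terms('foo "really bar" baz') # => ["foo", "really bar", "baz"]
```

Note that in this sketch a query made up only of short words yields no terms and therefore matches every text; how the MR handles that case is not covered here.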
I tested the Issues API with curl:
1 char -> X-Runtime: 47.466618
% time curl -I --header "PRIVATE-TOKEN: xxx" "https://gitlab.com/api/v4/issues?scope=all&search=a"
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Aug 2017 13:43:23 GMT
Content-Type: application/json
Content-Length: 58696
Cache-Control: no-cache
Link: ; rel="next", ; rel="first", ; rel="last"
Vary: Origin
X-Frame-Options: SAMEORIGIN
X-Next-Page: 2
X-Page: 1
X-Per-Page: 20
X-Prev-Page:
X-Request-Id: 23f26aa2-a19f-4a45-b915-288349b90601
X-Runtime: 47.466618
X-Total: 326873
X-Total-Pages: 16344
Strict-Transport-Security: max-age=31536000
curl -I --header "PRIVATE-TOKEN: xxx" 0.03s user 0.02s system 0% cpu 51.296 total
2 chars -> 502 Bad Gateway
$ time curl -I --header "PRIVATE-TOKEN: xxx" "https://gitlab.com/api/v4/issues?scope=all&search=aa"
HTTP/1.1 502 Bad Gateway
Server: nginx
Date: Wed, 30 Aug 2017 13:42:09 GMT
Content-Type: text/plain
Content-Length: 24
curl -I --header "PRIVATE-TOKEN: xxx" 0.03s user 0.02s system 0% cpu 1:03.42 total
3 chars -> X-Runtime: 8.509205
% time curl -I --header "PRIVATE-TOKEN: xxx" "https://gitlab.com/api/v4/issues?scope=all&search=aaa"
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Aug 2017 13:43:37 GMT
Content-Type: application/json
Content-Length: 60599
Cache-Control: no-cache
Link: ; rel="next", ; rel="first", ; rel="last"
Vary: Origin
X-Frame-Options: SAMEORIGIN
X-Next-Page: 2
X-Page: 1
X-Per-Page: 20
X-Prev-Page:
X-Request-Id: 759f61b1-2305-4b3f-a15f-7921c18ef407
X-Runtime: 8.509205
X-Total: 1462
X-Total-Pages: 74
Strict-Transport-Security: max-age=31536000
curl -I --header "PRIVATE-TOKEN: xxx" 0.03s user 0.01s system 0% cpu 9.771 total
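For anyone who wants to repeat these measurements from a script, the same HEAD request can be sketched with Ruby's standard `Net::HTTP`. The helper names `issues_search_uri` and `measure_search` are hypothetical, and the 120 second read timeout is an arbitrary choice so that slow searches are not cut off early.

```ruby
require 'net/http'
require 'uri'

# Build the Issues API URL used in the curl commands above.
def issues_search_uri(term, base: 'https://gitlab.com/api/v4/issues')
  uri = URI(base)
  uri.query = URI.encode_www_form(scope: 'all', search: term)
  uri
end

# Send a HEAD request (the equivalent of `curl -I`) and return the
# status code plus the X-Runtime and X-Total response headers.
def measure_search(term, token:)
  uri = issues_search_uri(term)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             read_timeout: 120) do |http|
    http.head(uri.request_uri, 'PRIVATE-TOKEN' => token)
  end
  {
    status: response.code,
    runtime: response['X-Runtime']&.to_f, # nil when the header is absent (e.g. on a 502)
    total: response['X-Total']&.to_i
  }
end
```

Calling `measure_search('aaa', token: 'xxx')` with a real token performs one request; nothing is sent when the file is merely loaded.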
Are there points in the code the reviewer needs to double check?
- SQL performance
The following is the EXPLAIN ANALYSE output for the query generated by Issue.full_search("foo bar").
gitlabhq_development=# SELECT COUNT(*) FROM issues;
count
-------
12333
(1 row)
gitlabhq_development=#
gitlabhq_development=# EXPLAIN ANALYSE SELECT "issues".* FROM "issues" WHERE "issues"."deleted_at" IS NULL AND ("issues"."title" ILIKE '%foo%' AND "issues"."title" ILIKE '%bar%' OR "issues"."description" ILIKE '%foo%' AND "issues"."description" ILIKE '%bar%') ORDER BY "issues"."id" DESC;
QUERY PLAN
Sort (cost=52.03..52.04 rows=1 width=347) (actual time=0.053..0.053 rows=0 loops=1)
Sort Key: id DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on issues (cost=48.00..52.02 rows=1 width=347) (actual time=0.042..0.042 rows=0 loops=1)
Recheck Cond: ((((title)::text ~~* '%foo%'::text) AND ((title)::text ~~* '%bar%'::text)) OR ((description ~~* '%foo%'::text) AND (description ~~* '%bar%'::text)))
Filter: (deleted_at IS NULL)
-> BitmapOr (cost=48.00..48.00 rows=1 width=0) (actual time=0.040..0.040 rows=0 loops=1)
-> Bitmap Index Scan on index_issues_on_title_trigram (cost=0.00..24.00 rows=1 width=0) (actual time=0.024..0.024 rows=0 loops=1)
Index Cond: (((title)::text ~~* '%foo%'::text) AND ((title)::text ~~* '%bar%'::text))
-> Bitmap Index Scan on index_issues_on_description_trigram (cost=0.00..24.00 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)
Index Cond: ((description ~~* '%foo%'::text) AND (description ~~* '%bar%'::text))
Planning time: 0.694 ms
Execution time: 0.110 ms
(13 rows)
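The shape of the WHERE clause in the query above (all terms must match the title, or all terms must match the description) can be sketched as a small condition builder. This is a plain-Ruby illustration with hypothetical helpers `like_pattern` and `search_condition`; it does not use ActiveRecord or Arel and is not the implementation in this MR. It returns a SQL fragment with `?` placeholders plus the values to bind, and it escapes LIKE wildcards in the user input, assuming the default backslash escape character.

```ruby
# Wrap a term in % wildcards, escaping \, % and _ so they match literally.
def like_pattern(term)
  "%#{term.gsub(/[\\%_]/) { |char| "\\#{char}" }}%"
end

# Build "(col ILIKE ? AND col ILIKE ? ...) OR (...)" for the given terms.
# Returns [sql_fragment, bind_values].
def search_condition(terms, columns: %w[title description])
  raise ArgumentError, 'at least one term is required' if terms.empty?

  binds = []
  groups = columns.map do |column|
    conditions = terms.map do |term|
      binds << like_pattern(term)
      %("issues"."#{column}" ILIKE ?)
    end
    "(#{conditions.join(' AND ')})"
  end
  [groups.join(' OR '), binds]
end

sql, binds = search_condition(%w[foo bar])
# binds => ["%foo%", "%bar%", "%foo%", "%bar%"]
```

Keeping each column's terms inside one parenthesised AND group is what allows a single trigram index scan per column, as in the plan above.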
Why was this MR needed?
It is difficult to find issues with a multi-word query.
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Review
  - Has been reviewed by UX
  - Has been reviewed by Frontend
  - Has been reviewed by Backend
  - Has been reviewed by Database
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together