use "Full Text Search" features to improve search results

Reassigned to @mvdan

Assignee removed

mentioned in merge request !166 (closed)

FWIW, this should be solved by using "full text search" in sqlite: http://blog.andresteingress.com/2011/09/30/android-quick-tip-using-sqlite-fts-tables/

mentioned in issue #511 (closed)

mentioned in issue #549

mentioned in issue #66 (closed)

Milestone changed to %1.1

Added ~16415 label

I came here to suggest sorting the search results by relevance. Is that what this issue is referring to, or should I open a new one for that? The title of this one seems to imply that some form of ranking already takes place and simply needs improving.

There is no ranking at the moment. Don't open another issue.

Improved searching is definitely needed; the hard part is determining "relevance". Google does it by tracking everything you do in detail, and then selling that info to people to purchase relevance (aka ads). F-Droid actively avoids all kinds of tracking of its users, so those kinds of models don't work.

You may be overestimating the scale of the problem. We're not trying to rank millions of apps with identical names according to personal preferences here. Simply determining a score based on some common sense criteria would go a long way.

For example, consider the following list of apps:

Dictionary: For looking up words
Introspection: An app that gets you thinking
Planetarium: Astronomy tool with many special features
Spec: Tower Defense Game

The search term "Spec" would lead to the following ranking:

Spec [highest score, match on title]
Introspection [medium score, substring match in title]
Planetarium [low score, substring match in description]
Dictionary [zero score]
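
A minimal sketch of the scoring described above, using the four example apps. The numeric score tiers (3, 2, 1, 0) and the function names are assumptions made for illustration; the comment only specifies the relative order (highest, medium, low, zero).

```python
# Illustrative score tiers. The numbers are assumptions; only their order matters.
EXACT_TITLE, TITLE_SUBSTRING, DESCRIPTION_SUBSTRING, NO_MATCH = 3, 2, 1, 0

APPS = [
    ("Dictionary", "For looking up words"),
    ("Introspection", "An app that gets you thinking"),
    ("Planetarium", "Astronomy tool with many special features"),
    ("Spec", "Tower Defense Game"),
]


def score(term, title, description):
    """Return a relevance tier for one app, comparing case-insensitively."""
    term, title, description = term.lower(), title.lower(), description.lower()
    if title == term:
        return EXACT_TITLE
    if term in title:
        return TITLE_SUBSTRING
    if term in description:
        return DESCRIPTION_SUBSTRING
    return NO_MATCH


def rank(term, apps):
    """Return (score, title) pairs, best score first, ties broken by title."""
    scored = [(score(term, title, desc), title) for title, desc in apps]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


for points, title in rank("Spec", APPS):
    print(points, title)
```

In a real search you would probably drop the zero-score entries instead of listing them last.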

I believe the FTS (full text search) engine in SQLite3 helps us with this by providing relevance numbers for each result. I'm not sure whether you can weight the relevance by which field matches, but it is still better than nothing. I'm not sure which algorithm they use, but there are plenty of algorithms that work without collecting user data. One of the most famous is TF/IDF (term frequency / inverse document frequency). The intuition is: if a term from the query appears with a high frequency in, e.g., a description, rank that result higher. However, if the term appears in many different documents (e.g. "the"), weight it lower.
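
To make the TF/IDF intuition concrete, here is a small self-contained sketch. It is not what SQLite does internally (that is unknown here); the whitespace tokenizer, the `log(N / df)` weighting, and the sample descriptions are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Return one TF/IDF score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = 0.0
        for term in tokenize(query):
            if df[term] == 0 or not tokens:
                continue  # term unknown, or empty document
            tf = counts[term] / len(tokens)       # frequent in this document: higher
            idf = math.log(n_docs / df[term])     # common across documents: lower
            total += tf * idf
        scores.append(total)
    return scores


# Hypothetical app descriptions, made up for this example.
descriptions = [
    "the map app shows the offline map",
    "the music player",
    "the offline dictionary",
]
print(tf_idf_scores("the map", descriptions))
```

Because "the" occurs in every description, its weight is `log(3 / 3) = 0`, so only "map" contributes and only the first description gets a non-zero score.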

The other improvement you get from the FTS engine is word stemming. So if I search for "runner", it will reduce the word to its base (unfortunately English-only) form "run" and then match other words with the same stem, such as "running". Not sure how this goes for other languages though.
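
A rough sketch of stemming with SQLite's FTS3 and its built-in `porter` tokenizer, run through Python's `sqlite3` module for brevity. It assumes the SQLite build has FTS3 enabled. The table name `app_fts` and the sample rows are hypothetical. Exactly which words end up sharing a stem depends on the stemmer, so this sketch uses "runs" and "running", which the Porter algorithm reduces to the same stem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# tokenize=porter asks FTS3 to stem both indexed text and query terms.
conn.execute(
    "CREATE VIRTUAL TABLE app_fts USING fts3(name, description, tokenize=porter)"
)
conn.executemany(
    "INSERT INTO app_fts (name, description) VALUES (?, ?)",
    [
        ("Trail Log", "Track your running routes"),
        ("Word Book", "For looking up words"),
        ("Step Count", "Counts how far each user runs"),
    ],
)


def search(term):
    """Return names of apps whose name or description matches the term."""
    rows = conn.execute(
        "SELECT name FROM app_fts WHERE app_fts MATCH ? ORDER BY name", (term,)
    )
    return [row[0] for row in rows]


print(search("runs"))   # also finds the row that only contains "running"
print(search("word"))   # also finds "words"
```

Dropping `tokenize=porter` falls back to the default tokenizer, which only matches whole words exactly (case-insensitively for ASCII).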

Looks like SQLite's FTS4 or FTS3 should give us a lot. FTS4 was added in android-11, but I think it would be worth keeping the existing search on android-10 and using FTS4 on android-11 and later. And as usual, when I search for a technical answer about Android, I find something good from @commonsguy: https://commonsware.com/blog/2015/11/30/book-excerpt-full-text-indexing-searching-part-1.html

This looks like a useful overview: https://stackoverflow.com/questions/29815248/full-text-search-example-in-android/29926430#29926430

It seems that SQLite FTS is a lot easier to use with ASCII, but full Unicode support is possible: https://stackoverflow.com/questions/29669342/unicode-support-for-sqlite-full-text-search-in-android

Changed title: Improve the search ranking → use "Full Text Search" features to improve search results

@eighthave Beyond those blog posts, the full FTS4 sample app is available.

I use FTS3 for full-text search of the APK edition of my book. It's astonishing how fast it is. However, do note that it will consume a chunk of disk space. For example, for 11.5MB of book prose HTML, the FTS3 index is 9.6MB.

@commonsguy Any experience with FTS3 vs FTS4? Given that we don't want to bump minSdk past android-10, should we use FTS3 on all versions or some combination of FTS4 and FTS3/nothing?

Edit: just noticed his blog post mentions pros and cons, which answers the first question.

@mvdan The docs have a section on the differences. Unless you need certain matchinfo fields that are only available on FTS4, you should be OK with FTS3. Or, do what @eighthave suggested, and go FTS4 with limited searching on Android 2.3.
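
The "FTS4 where available, limited searching otherwise" option could look roughly like this. It is only a language-neutral sketch of the branching, written in Python for brevity. The table names `app` and `app_fts`, the function name, and the `LIKE` fallback are assumptions; the API level 11 threshold is the one mentioned earlier in this thread.

```python
# FTS4 availability threshold as stated in this thread (android-11).
FTS4_MIN_API_LEVEL = 11


def build_search(api_level, term):
    """Return (sql, params) for a search, picking FTS or a plain LIKE fallback.

    `app_fts` is a hypothetical FTS4 virtual table and `app` a hypothetical
    regular table; neither name comes from the F-Droid code base.
    """
    if api_level >= FTS4_MIN_API_LEVEL:
        return ("SELECT name FROM app_fts WHERE app_fts MATCH ?", (term,))

    # Older devices: substring match only, no stemming and no ranking data.
    pattern = f"%{term}%"
    return (
        "SELECT name FROM app WHERE name LIKE ? OR description LIKE ?",
        (pattern, pattern),
    )


print(build_search(10, "spec"))
print(build_search(11, "spec"))
```

Using FTS3 on all versions would remove the branch entirely, at the cost of the FTS4-only matchinfo fields.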

mentioned in issue #691 (closed)
