Why are search functions on certain sites considerably worse than others? (Not slower, but with far more inaccurate results.)

6 Answers

Anonymous

TL;DR: Indexing a small collection of information so that searches have good recall (finding everything relevant) and good precision (returning little that is irrelevant) against the huge variety of ways humans ask questions is ***hard***.
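
For concreteness, here is a toy calculation of those two measures; the document ids are made up purely for illustration:

```python
# Toy numbers, made up for illustration: ids of relevant vs. retrieved docs.
relevant  = {1, 2, 5}        # what the user actually wanted
retrieved = {1, 2, 3, 4}     # what the search returned

hits = relevant & retrieved
precision = len(hits) / len(retrieved)   # 2/4 = 0.50 -- how clean the results are
recall    = len(hits) / len(relevant)    # 2/3 ~ 0.67 -- how complete they are

print(f"precision={precision:.2f} recall={recall:.2f}")
```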

There’s a name for this issue in the search community: “the small corpus problem,” the mismatch between the small number of words in the search index and the infinite variety of things people search for. A couple of factors are involved:

* The small number of words in the corpus means that the likelihood of **all** query terms being present in any search result is small, unless the user happens to “guess lucky” in their search. This drives the search designer to allow a hit on **any** word in the query in the hope that the user will get lucky.
* Users tend to include extraneous information in their requests, e.g. on a site that sells motors, they’ll search for “12 Volt 6-inch long yellow dc motor.” In a corpus composed of larger documents, or in a huge corpus (such as Google’s), such extra information is more likely to be present in at least some of the documents, so the user will get some results. In a small corpus, a document containing all of these terms is unlikely to exist, so constraining searches to require every query term will likely return no results. The sketch after this list contrasts the two matching modes.
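
To illustrate the trade-off both bullets describe, here is a minimal sketch over a made-up four-document product corpus; it is not any real engine’s code. Requiring **all** query terms returns nothing, while accepting **any** term matches every document:

```python
# Hypothetical product corpus for a small motor shop (four short documents).
corpus = {
    1: "12 volt dc motor with mounting bracket",
    2: "6 volt brushed dc motor",
    3: "yellow servo motor 5 volt",
    4: "nema 17 stepper motor",
}

def tokenize(text):
    """Lowercase and split on whitespace; real engines do far more."""
    return set(text.lower().split())

def search(query, require_all):
    """Return ids of docs containing all (AND) or any (OR) query terms."""
    terms = tokenize(query)
    hits = []
    for doc_id, text in corpus.items():
        words = tokenize(text)
        matched = terms <= words if require_all else bool(terms & words)
        if matched:
            hits.append(doc_id)
    return hits

query = "12 Volt 6-inch long yellow dc motor"
print("AND:", search(query, require_all=True))    # -> []  (no doc has every term)
print("OR: ", search(query, require_all=False))   # -> [1, 2, 3, 4]  (every doc shares a term)
```

Engines that fall back to any-term matching typically rank results by how many query terms matched, which is why a small site’s top hits can still look only loosely related to what was typed.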

Search has been getting better over time as website designers learn to include more descriptive text on their pages, increasing the chances of a match.
