why are search functions on certain sites considerably worse than others? (not slower, but rather with way more inaccurate results)

6 Answers

Anonymous 0 Comments

As an e-commerce website developer, I can say it mostly comes down to the search implementation.

Some clients treat search as one of the most important features of a store and spend real time and money researching and finding the best available solution.

Other clients don’t place much importance on search and/or have a tight budget, and so settle for whatever comes out of the box (which is usually implemented very poorly).
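
To make “out of the box” concrete: a default store search often boils down to a bare substring match, with no ranking, stemming, or typo tolerance. A rough Python sketch (the product data here is made up):

```python
# Rough sketch of what "out of the box" search often amounts to:
# a bare substring match over product names. No ranking, no stemming,
# no typo tolerance. The product names here are made up.
products = [
    "Red Cotton T-Shirt",
    "Blue Denim Jacket",
    "Cotton Bed Sheets",
]

def naive_search(query, items):
    """Return every item whose name contains the query text verbatim."""
    return [item for item in items if query.lower() in item.lower()]

print(naive_search("cotton", products))   # both cotton items
print(naive_search("t shirt", products))  # nothing: the catalog spells it "T-Shirt"
```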

Anonymous 0 Comments

Let’s say you want to find a book called “Obscure Book”. You give two libraries a try: Library A and Library B.

Library A experience: As you walk into this library, you notice that the hundreds of books are not organized, labelled, or indexed in any way. You don’t find the book, and the librarian instead recommends similar books to you.

Library B experience: As you walk into this library, you notice that the books are organized by name, author, revision, year of publication, and other such things. You find the book, and quickly at that.

Think of these libraries as websites, and “Obscure Book” as your search term. Library (website) A will have a poor search experience compared to Library (website) B. Obviously, Library A can organize its books (webpages) better. However, organizing and maintaining that structure takes time, effort, and money.
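
In code, the difference between the two libraries is roughly the difference between scanning every shelf and looking things up in a catalog built ahead of time. A toy Python sketch (the book data is made up):

```python
# Library A: no organization; every lookup walks the whole shelf.
# Library B: a catalog (index) built once up front, then looked up directly.
books = [
    {"title": "Obscure Book", "author": "A. Writer", "year": 1987},
    {"title": "Popular Book", "author": "B. Writer", "year": 2001},
]

def library_a(title):
    return [b for b in books if b["title"] == title]  # full scan every time

catalog = {b["title"]: b for b in books}  # built once, maintained over time

def library_b(title):
    return catalog.get(title)  # direct lookup, no scanning

print(library_a("Obscure Book"))
print(library_b("Obscure Book"))
```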

Anonymous 0 Comments

Ohhhh, if only anyone freaking knew.

I’m genuinely not sure how one could break this down in an ELI5 way, because there is a lot of stuff going on here. It’s not necessarily all very difficult to understand; it’s just very long chains of causality, and things don’t make a lot of sense unless you wander down that chain for quite some time.

On a technical level, searching for stuff is actually really, really difficult, because what a computer thinks is important and what a human thinks is important are two very different things, so you need to do a lot of translating between the two. For instance, if you searched for “I like bread”, a computer might naively take those words, run through the content, and throw a million results at you, 99% of which are useless, because it turns out you don’t actually care about “I”, you care about “like bread”. But the computer didn’t know that.

That’s what some sites with really terrible search do: they just naively return whatever happens to match what you asked for, which is usually extremely unhelpful. You can do various things to improve this approach, but a lot of them are either computationally expensive, require user input, or aren’t very user-friendly.
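
A hedged sketch of the “I like bread” problem, with made-up documents. Naive matching returns any document that shares any word with the query, so “I” drags in everything; filtering out common “stop words” is one of the cheap improvements:

```python
# Naive matching returns any document sharing ANY word with the query,
# so "I" matches nearly everything. Filtering common "stop words" first
# is one cheap improvement. Documents here are made up.
docs = [
    "I went to the park",
    "I like bread",
    "bread recipes I have tried",
    "I am tired",
]

STOP_WORDS = {"i", "a", "the", "am", "to"}

def shared_words(query, doc):
    return set(query.lower().split()) & set(doc.lower().split())

print([d for d in docs if shared_words("I like bread", d)])
# all four documents match, thanks to "I"

print([d for d in docs
       if shared_words("I like bread", d) - STOP_WORDS])
# only the two documents actually about bread remain
```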

For example, people could actively tag their content, which makes it a lot easier to translate what a human wants into something a computer understands. But it requires human input, and humans input lots of nonsense. Then there are algorithms that weigh content in certain ways; for example, they might massively devalue the “I” from the earlier example, because it’s a fairly useless data point. But, as you may imagine, getting that right is pretty damn tricky, so a lot of sites just don’t bother with all that.
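
One common version of that weighting (not necessarily what any given site uses) is inverse document frequency: a word that appears in almost every document carries almost no information, so it gets weighted toward zero. A minimal sketch:

```python
import math

# Inverse document frequency: the more documents a word appears in,
# the less it tells you, so the lower its weight. Documents are made up.
docs = [
    "i like bread",
    "i like apples",
    "i went home",
    "bread recipes",
]

def idf(word):
    containing = sum(1 for d in docs if word in d.split())
    return math.log(len(docs) / (1 + containing))

print(round(idf("i"), 3))      # 0.0   -- in 3 of 4 docs, basically worthless
print(round(idf("bread"), 3))  # 0.288 -- in 2 of 4 docs, worth keeping
```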

Then there are decently effective tools, like boolean search, which lets you construct logical connections between search terms. You know, something like:

>I AND like AND (apples OR oranges)

That would, effectively, search for both “I like apples” and “I like oranges” at the same time, allowing you to be vastly more precise about what you want. A big problem is that, even today, most people don’t find this very intuitive. It looks very simple, and it is, but there is a certain switch you have to flick in your head to think in that kind of cold logic, and, presumably, that’s not the user experience people want. Frankly, I don’t know. That’s the only halfway reasonable explanation I could ever come up with for why most sites don’t even give you this basic luxury. Worse: a lot of sites do support it, but they don’t tell you that it actually works, or how exactly their version works. Which isn’t awfully useful.
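
A rough sketch of how a site might evaluate that query, assuming the parentheses shown above and made-up documents:

```python
# Evaluating: I AND like AND (apples OR oranges), over made-up documents.
docs = [
    "i like apples",
    "i like oranges",
    "i like bread",
    "oranges are cheap",
]

def matches(doc):
    words = set(doc.split())
    return ("i" in words and "like" in words
            and ("apples" in words or "oranges" in words))

print([d for d in docs if matches(d)])
# ['i like apples', 'i like oranges']
```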

Then there is the big data approach, which comes in various flavors as well. Think Google. When people weren’t in a constant arms race with the algorithm, it often felt like Google could read your mind. It’s a bit more awkward these days because Google and the websites it crawls are constantly trying to mess with each other (also, there’s just waaaaay more content to go through), but the point is, you can achieve truly marvelous things this way. However, it’s also basically cutting edge, so to speak. You can’t expect a random site to rival Google.

Having said all that, the actual answer probably has more to do with the infrastructure of the internet itself. Few websites are truly built from scratch. For instance, remember forums? It’s no accident that there were like 2 designs and every forum ever looked like one of those, same features and everything. That’s because writing the code behind something like a forum is a boatload of work, and there is little point in reinventing the wheel for every little forum. So there is a bit of a Lego principle happening here and, for whatever reason, people just don’t seem to value search very much. Furthermore, because making a useful search is hard in the first place, you have to keep a lot of things in mind when structuring your website’s data; you can’t (usually) just take a random solution, plop it into your project, and expect everything to work perfectly. For truly good results, you’ll have to fine-tune a bunch.

Now, truth be told, everything I just told you is half-true, simplified, and doesn’t always apply equally to everything. As I said, it’s a really complex chain of causes that affects a lot of things on the internet in general, not just search. To tell you even more truth: I genuinely have no idea why the state of website search is so utterly abysmal; I, too, failed to find an actually good reason why things are like this. The only explanation that ever made any amount of logical sense to me is that, perhaps, people just generally don’t care.

Anonymous 0 Comments

Lots of long answers here and I appreciate the knowledge in them. Here’s a shorter version:

There’s not a standard way to search things in computers, and there’s not a standard way to build websites. Every single site organizes its data differently.

Searching stuff in computers is hard. If there’s enough data, you can’t just inspect every word to see if it matches. Instead, you have to build “indexes” that make it easier to find certain things, or other data structures that help.
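
The classic example of such a structure is an inverted index: map each word to the documents containing it once, up front, so queries become dictionary lookups instead of full scans. A minimal sketch with made-up documents:

```python
from collections import defaultdict

# Build the index once: word -> set of document ids containing it.
docs = {
    1: "obscure book about search",
    2: "popular book about cooking",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(index["book"])     # {1, 2} -- both documents, found without scanning
print(index["obscure"])  # {1}
```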

Some programmers are very talented and come up with good techniques that produce high-quality search results fast. Other programmers are not so talented, and their results are slower or lower quality. Sometimes the concept of “high quality” isn’t even very clear, and if you can’t describe what you want well in English, there is no hope of describing it to a computer in any programming language.

Anonymous 0 Comments

TL;DR: Successfully indexing a small collection of information to get good recall and precision against the huge variety of ways humans ask questions is ***hard***.

There’s a name for this issue in the search community: “the small corpus problem.” It refers to the small number of words in the search index compared to the infinite variety of things people search for. There are a couple of factors involved:

* The small number of words in the corpus means that the likelihood of **all** query terms being present in any search result is small, unless the user happens to “guess lucky” in their search. This drives the search designer to allow a hit on **any** word in the query in the hope that the user will get lucky.
* Users tend to include extraneous information in their request, e.g. on a site that sells motors, they’ll search for “12 Volt 6-inch long yellow dc motor.” In a corpus composed of larger documents, or a huge corpus (such as Google’s), such extra information is more likely to be present in at least some of the documents, so the user will get some results. In a small corpus, having a document containing all of these terms is unlikely. Constraining searches to require all terms in the query means that a search like this will likely return no results (see the sketch after this list).
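
Here’s the sketch referenced above: with a tiny made-up motor catalog, requiring every query term returns nothing, while allowing any term returns nearly everything.

```python
# The small-corpus trade-off: ALL terms vs ANY term, on made-up products.
corpus = [
    "12 volt dc motor",
    "24 volt dc motor",
    "6-inch yellow bracket",
]
query = "12 volt 6-inch long yellow dc motor".split()

def match_all(doc):
    return all(term in doc.split() for term in query)

def match_any(doc):
    return any(term in doc.split() for term in query)

print([d for d in corpus if match_all(d)])  # [] -- no product contains "long"
print([d for d in corpus if match_any(d)])  # every product matches something
```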

Search has been getting better over time as website designers learn to include more descriptive text on their pages, increasing the chances of a match.

Anonymous 0 Comments

Since this has already been explained, I’m going to share some other useful information: you can filter Google searches using “site:”. You can put anything from a full URL to just a top-level domain after the “site:”, and then you will only get results from there.
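
For example (with example.com standing in for whatever site you actually want to search):

>site:example.com obscure book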