How does an internet search work? Do searches get slower for each new internet page created, or are there mechanisms to avoid that?

To put it very simply, Google has a robot that is constantly reading the entire internet and organizing it so that only pages relevant to what you type appear.

The search engine couldn’t search all websites each time someone puts in a search word. Instead, it creates a list of all words or phrases it finds on every webpage, and next to each word it writes down the links to every webpage containing that word. Now if someone puts in a search term, the search engine can simply check its list and immediately give you your search results. Such a list is called an index. The search engine will visit all websites on a regular basis in order to see if something’s changed. This is called crawling.
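To make the "list of words with links next to them" idea concrete, here is a minimal sketch in Python. The page URLs and text are made up for illustration, and `build_index` and `search` are hypothetical names, not anything a real search engine exposes. A real index is vastly larger and stores much more than plain words.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Look a word up in the index; no page is re-read at query time."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical pages standing in for what a crawler would have fetched.
pages = {
    "a.example/cats": "cats sleep a lot",
    "b.example/dogs": "dogs bark a lot",
    "c.example/pets": "cats and dogs are pets",
}

index = build_index(pages)
print(search(index, "cats"))  # ['a.example/cats', 'c.example/pets']
```

The slow part (reading every page) happens once, when the index is built. Answering a search is then just one lookup.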

Does internet search get slower for each new internet page? Well, updating the index takes longer each time a new website is created. Looking something up also takes a little longer as the index grows, but not by much, because the engine jumps straight to the entry for a word instead of reading the whole list. Meanwhile, servers keep getting faster and database systems probably keep getting more efficient, so I doubt the slowdown would be noticeable.

For a complete answer check out Nine Algorithms That Changed the Future by MacCormick. Best $10.00 I ever spent.

Others have mentioned how, in general, a search engine crawls the internet and indexes the pages in advance, but there’s probably value in talking specifically about PageRank, which is how Google became the winner of the early search engine competition.

Put simply, since you’re five, the indexer may tell you what subjects a website covers, but it doesn’t really tell you which websites are *the best* or *the most important* for your search terms. PageRank uses links from other websites as a way to determine how important a site is. Lots of other sites link to the Wikipedia entry for a subject, so that Wikipedia entry must be one of the most important pages about that subject, and it should come up really early in the search results.
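For the curious, here is a rough sketch of the "links count as votes" idea in Python. It is a simplified textbook version of PageRank, not Google's actual code. The link graph, the `pagerank` function name, and the 0.85 damping value are assumptions for illustration, and the sketch assumes every linked page also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly pass each page's score along its links.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links spreads its score evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Otherwise its score is split equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "wiki", nobody links to "shop".
links = {
    "wiki": ["blog", "forum"],
    "blog": ["wiki"],
    "forum": ["wiki"],
    "shop": ["wiki"],
}

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)  # wiki
```

Notice that "blog" and "forum" also end up ahead of "shop": a link from an important page is worth more than a link from an unimportant one.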

Now, of course, Google does more than that to rank search results these days, but PageRank was pretty innovative, and was one of the big things that put Google out in front.

A web search engine has a copy of every web page. Each one is given a number.

An index is built. For every word, there is a list of all the numbers of the web pages that have that word. This list is in order.

To search for multiple words, we walk through the index lists of each of them, looking for numbers of web pages that appear in both lists. For example, if one word is on pages 2, 5, and 100, and the second is on 1, 5, and 50, then page 5 is one of our hits. These scans are fast because we can “fast forward” on the index lists and skip lots of comparisons.
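Here is a small Python sketch of that list walk, using the same numbers as the example. The `intersect` name is made up, and the binary search from the standard `bisect` module is only a stand-in for the "fast forward" step. Real engines typically use their own skip structures, so treat this as one possible way to do it.

```python
from bisect import bisect_left

def intersect(a, b):
    """Return the page numbers present in both sorted lists a and b."""
    hits = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # "Fast forward" in a to the first number >= b[j].
            i = bisect_left(a, b[j], i)
        else:
            # "Fast forward" in b to the first number >= a[i].
            j = bisect_left(b, a[i], j)
    return hits

print(intersect([2, 5, 100], [1, 5, 50]))  # [5]
```

Because both lists are in order, a jump can leap over a long run of numbers that cannot possibly match, instead of comparing them one by one.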

Once we have the pages that might answer your query, we have to pick the best ones and show them to you.

When the web gets bigger, the index gets bigger. But searches stay about the same speed because many computers work together on them, and computers keep getting faster and cheaper. Figuring out how to get many computers to work together effectively is hard and fun.
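As a toy illustration of "many computers working together", here is a Python sketch where the index is split into pieces and all the pieces are searched at once. The three shards, the page numbers, and the function names are invented for the example, and threads on one machine stand in for separate computers. A real system also has to deal with networks, failures, and ranking, which this ignores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each "computer" holds the index for a slice of the pages.
shards = [
    {"cats": [1, 4], "dogs": [4]},
    {"cats": [12], "pets": [10, 12]},
    {"dogs": [21, 25], "cats": [25]},
]

def search_shard(shard, word):
    """What one machine does: look the word up in its own small index."""
    return shard.get(word, [])

def search_all(shards, word):
    """Ask every shard at the same time, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = pool.map(search_shard, shards, [word] * len(shards))
        return sorted(page for hits in partial for page in hits)

print(search_all(shards, "cats"))  # [1, 4, 12, 25]
```

When the web grows, you add more shards rather than making each one bigger, so each machine's share of the work stays about the same.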

Slightly off topic: I told my computer-illiterate mother-in-law that when she Googles something, she should always say “please” and “thank you” in the search, because the Google employees who do the searching for you will do a better job.

Still does it.