Search engines like Google use programs called “web crawlers” or “spiders”. These programs move around the web, starting from a list of known web pages. When they arrive at a page, they do two things:
1. They scan the page’s content to understand what it’s about. This is how search engines can tell you which websites mention the thing you’re interested in.
2. They also look for links to other web pages, and add those pages to their list of pages to visit next.
So, web crawlers are constantly hopping from page to page, scanning content and looking for new pages. This process allows the search engine to keep an updated index of what’s on the web. When you search for something, the search engine looks in that index to find relevant pages.
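Here’s a minimal sketch of those two per-page steps in Python, using only the standard library. The class and function names are just illustrative, and a real crawler would also handle relative URLs, robots.txt, errors, and much more:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageScanner(HTMLParser):
    """Collects the two things a crawler wants from a page: words and links."""

    def __init__(self):
        super().__init__()
        self.words = []   # text content, for the search index
        self.links = []   # URLs of other pages, for the list of pages to visit next

    def handle_data(self, data):
        # Step 1: scan the page's text so we know what it's about.
        self.words.extend(data.lower().split())

    def handle_starttag(self, tag, attrs):
        # Step 2: look for links (<a href="..."> tags) to other pages.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scan_page(url):
    """Fetch one page and return (words, links) found on it."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    scanner = PageScanner()
    scanner.feed(html)
    return scanner.words, scanner.links
```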
In the earliest days of the Web, you could tell people about your website through non-electronic means (word-of-mouth, paper newsletters), or older forms of computer-based communication (email, BBS’s, Usenet).
Some people quickly started making lists of useful websites. Heck, there were so few websites that some people made lists of *every website they knew about*.
Anyone who has a website can put links to other websites on it.
So if you want to make a search engine, you start with a list of websites — maybe a list you made, or maybe a list made by somebody else.
Then you write a program to go to each website, check it for links, and follow those links to more websites. Those websites also have links, and you can follow them to even more websites.
If there’s a new website, and somebody posts a link to that website on any other website, odds are that your program will eventually find it by following a chain of links from one of the sites in your original list.
Visiting massive numbers of websites by following links is referred to as “crawling,” short for “crawling the web”. A program that crawls the web is called a “crawler” or “spider.”
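Here’s a toy version of that process in Python, with the “web” stood in for by an in-memory dictionary (the site names and link structure are made up for illustration):

```python
from collections import deque

# A pretend web: each "site" lists the sites it links to.
TOY_WEB = {
    "siteA": ["siteB", "siteC"],
    "siteB": ["siteA", "siteD"],
    "siteC": [],
    "siteD": ["siteE"],   # siteE is only reachable through siteD
    "siteE": [],
}


def crawl(seed_sites):
    """Start from a list of known sites and follow links to find the rest."""
    to_visit = deque(seed_sites)   # the crawler's list of pages to visit next
    visited = set()

    while to_visit:
        site = to_visit.popleft()
        if site in visited:
            continue               # don't crawl the same page twice
        visited.add(site)
        # Follow every link on this page to discover new pages.
        for linked_site in TOY_WEB.get(site, []):
            if linked_site not in visited:
                to_visit.append(linked_site)
    return visited


# Even though the seed list only contains siteA, the crawler finds siteE
# by following the chain siteA -> siteB -> siteD -> siteE.
print(crawl(["siteA"]))   # all five sites are found
```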
Yahoo! (one of the earliest search engines) started in 1994 as a list of websites called “Jerry and David’s Guide to the World Wide Web.” That list of websites continued to be updated at least through 2014.
The short and simple answer is that they create an index just like the one in the back of your textbook.
When you go to the index in your textbook, you can look up a specific word, topic, or thing, and it will tell you which pages of the book it appears on. The fundamental process is the same.
Search engines basically try to read the whole internet and reduce it to an index, so that you can do the same thing you do with your textbook. When you put the whole internet into an index, there are going to be a lot of results! So search engines differentiate mostly on how they decide which pages to list first. Google became the biggest by creating the best algorithm for determining which pages were most relevant.
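In code, that “index in the back of the book” is usually a mapping from each word to the set of pages it appears on. A tiny sketch in Python, with the page URLs and text invented for illustration:

```python
from collections import defaultdict

# A few "pages" and their text, standing in for the crawled web.
PAGES = {
    "example.com/cats": "cats are small furry pets",
    "example.com/dogs": "dogs are loyal furry pets",
    "example.com/cars": "cars are fast machines",
}

# Build the index: each word points to the set of pages containing it,
# just like a textbook index points a topic to its page numbers.
index = defaultdict(set)
for url, text in PAGES.items():
    for word in text.lower().split():
        index[word].add(url)


def search(query):
    """Return the pages that contain every word in the query."""
    results = set(PAGES)
    for word in query.lower().split():
        results &= index.get(word, set())
    return results


print(search("furry pets"))   # {'example.com/cats', 'example.com/dogs'}
```

Deciding which of those matching pages to show first is the ranking step, which is where the real search engines compete.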