How do web crawlers (like Google) find pages that aren’t linked to/from anything?

I get the basic workings of a web crawler. It starts at a page and indexes it, then follows all of that page’s links and indexes those, and so on. Eventually it builds up a good enough representation of the entire internet.
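To make that loop concrete, here is a minimal sketch of a breadth-first crawler in Python, using only the standard library. The seed URL, the page limit, and the index stub are illustrative placeholders; a real crawler adds robots.txt checks, rate limiting, and far more robust parsing.

```python
# Minimal breadth-first crawl: fetch a page, "index" it, queue its links.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def index(url, html):
    # Stand-in for a real indexing pipeline (tokenize, store, rank, ...).
    print(f"indexed {url} ({len(html)} bytes)")

def crawl(seed, max_pages=10):
    queue, seen, fetched = deque([seed]), {seed}, 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable or broken page: skip it
        fetched += 1
        index(url, html)                   # "start at a page and index it"
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:       # "then follow all its links"
            absolute = urljoin(url, href)  # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

crawl("https://example.com/")
```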

Recently I made a new web page on a new domain, and soon enough it showed up in a Google search. How did Google know the address of my site (or any other one) to begin with? Is there an actual directory of every site on the internet? Or does it try random combinations of domain names?

3 Answers

Anonymous

Even if no one knows about your site yet, whoever created the site and published it likely tested it from their browser to make sure they could access it over the internet. When they did this, the browser asked the DNS system, “who is at <yourdomain.com>?”, and the domain name servers, which convert domain names to their actual IP addresses, replied, “I just got an update yesterday that that particular domain name is at <your server’s IP address>”. That update exists because registering the domain added a record to the global DNS, and domain registries publish data about newly registered names. Search companies have access to this registration data and use it, in addition to links and other signals, to discover and index content they don’t yet know about. And each lookup happens in milliseconds.
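To illustrate the lookup described above, here is a tiny Python sketch of asking “who is at <yourdomain.com>?”. Here example.com stands in for the new domain; Python simply forwards the question to the system’s configured DNS resolver.

```python
import socket

# DNS lookup: ask the resolver which IP address a domain name points to.
# "example.com" is a stand-in for the newly published domain.
domain = "example.com"
ip_address = socket.gethostbyname(domain)
print(f"{domain} resolves to {ip_address}")
```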
