How do web crawlers (like Google) find pages that aren’t linked to/from anything


I get the basic working of a web crawler. They start at a page and index it, then follow all its links and index those, and so on. Eventually they have a good enough representation of the entire internet.
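The follow-the-links process described above is essentially a breadth-first traversal. A minimal sketch (in Python, with a pluggable `fetch` function standing in for a real HTTP client; the names here are illustrative, and a production crawler would also obey robots.txt, throttle requests, and handle failures):

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, limit=100):
    """Breadth-first crawl: index a page, then follow its links.

    `fetch(url)` is assumed to return the page's HTML as a string.
    """
    seen = {start_url}
    queue = deque([start_url])
    index = {}
    while queue and len(index) < limit:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        index[url] = parser.links          # "index" the page
        for link in parser.links:          # then follow its links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

The catch, as the question points out, is that this traversal can only ever reach pages linked from pages it already knows about; a brand-new domain with no inbound links is invisible to it.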

Recently I made a new web page on a new domain, and soon enough it showed up in a Google search. How did Google know the address of my site (or any other one) to begin with? Is there an actual directory of every site on the internet? Or does it try random combinations of domain names?


3 Answers

Anonymous 0 Comments

If your domain is www.IMovedYourCheese.com, the registration itself is public data: the domain appears in the .com registry's zone file (and in WHOIS), and Google can obtain lists of newly registered domains from that data.

If your domain is www.IMovedYourCheese.provider.com, then provider.com probably tells Google about new subdomains it creates (for example via a sitemap), or simply has a page somewhere listing all www.*.provider.com subdomains.
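To make the first case concrete: the registry for a zone keeps a file of DNS records, and anyone with access to that file can enumerate the names in it. A toy sketch (this assumes a heavily simplified one-record-per-line format; real zone files also have $ORIGIN directives, comments, and many other record types, and the hostnames below are made up):

```python
def subdomains(zone_file_text, parent="provider.com"):
    """List hostnames found in a (simplified) DNS zone file.

    Assumes each record is a line of the form:
        name  ttl  IN  A  address
    """
    names = []
    for line in zone_file_text.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[2:4] == ["IN", "A"]:
            names.append(f"{parts[0]}.{parent}")
    return names

# Hypothetical zone data for provider.com:
zone = """\
www.IMovedYourCheese  3600  IN  A  203.0.113.7
www.OtherCustomer     3600  IN  A  203.0.113.8
"""
```

A crawler with access to data like this gets a seed list of domains to visit without needing a single inbound link.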

Anonymous 0 Comments

Even if no one links to your site yet, whoever created and published it likely tested it from a browser to make sure it was reachable. That test triggers a DNS lookup: the browser asks a resolver "what IP address is <yourdomain.com>?", and the resolver walks the DNS hierarchy until the registry that holds your record answers. The lookup itself is private and doesn't notify Google (and, contrary to a common misconception, it doesn't change any router's route tables, since routing works on IP prefixes, not domain names). What *is* visible is the setup that made the lookup possible: registering the domain puts it in the registry's public zone data, and getting a TLS certificate for it publishes the hostname in public Certificate Transparency logs. Search companies monitor both of these sources to discover brand-new domains, often within hours of registration.

Anonymous 0 Comments

Did you test your new site using Chrome? Rest assured Google put it on the list for indexing soon after.