How do web crawlers and other engines not constantly get infected with viruses?

By constantly downloading random information from the internet, wouldn’t you be exposing yourself to tons of malicious content? Aren’t there pages that can run malware without you even clicking on anything?

A better example than search engines might be something like the Wayback Machine, a site that actually saves the pages themselves, not just links to them.

6 Answers

Anonymous

When you click on a link, the browser loads a lot of stuff, including page layout information (HTML & CSS), references to images, and JavaScript code, which is essentially a small program. Normally that code talks to the host platform to fetch information from databases and pass along things like passwords and emails, but it can just as easily be written to do bad things.

Search engines don’t do that. The contents of the page are downloaded and scanned for text, links, and images, but no code is run. It’s like the difference between looking at directions on a map, and actually following those directions.

Anonymous

Think of it like the difference between photocopying a book and reading one. Your web browser reads the page code and interprets it. Crawlers and things like the Wayback Machine just copy the page code, or specific bits of it.

Anonymous

Because they just read and write, and don’t execute.
Kinda like copypasta

You can read a manual on how to physically hurt yourself without being harmed; only acting on it would actually damage you.

Anonymous

No, not really. Modern browsers are pretty resilient: they generally don't trust the code on the page and limit what it can do. Loopholes still happen, but they get patched quickly. This is the first line of defense.

Then, the crawler code runs under a restricted user account, so the operating system will refuse it any access to system files. That's the second line.

Finally, if the malicious code somehow finds a loophole in the browser, AND THEN a loophole in the OS, it gets to live, but only until the next system wipe.
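
A hedged sketch of that second line of defense in Python (Unix-only, and it assumes the process starts as root; the "crawler" account name is an invented example):

```python
import os
import pwd

def drop_privileges(username: str = "crawler") -> None:
    """Switch from root to an unprivileged account before crawling."""
    info = pwd.getpwnam(username)  # look up the restricted account
    os.setgid(info.pw_gid)         # drop the group first, while still root
    os.setuid(info.pw_uid)         # then drop the user; root access is now gone

drop_privileges()
# ... run the crawl loop here; the OS now refuses access to system files ...
```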

Anonymous

ELI5: You can pretty easily tell if something is a book, right? Say you're looking for something to read. Pick it up. Is it a book? No? Toss it. Yes? Read it.

Search engines do the same with everything they process. Malware generally can't run from within a webpage on its own; it's a separate executable downloaded by the page. So any time the crawler reads something: is it a webpage? No? Toss it. Yes? Process it, find everything it links to, and repeat.
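
A rough sketch of that "is it a webpage?" check in Python (standard library only; the URL is a placeholder). It looks at the Content-Type header and tosses anything that isn't HTML before processing:

```python
from urllib.request import urlopen

def fetch_if_html(url: str) -> str | None:
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        if "text/html" not in content_type:
            return None  # not a webpage: toss it
        # It claims to be a webpage: read it as text for processing.
        return response.read().decode("utf-8", errors="replace")

page = fetch_if_html("https://example.com")
if page is not None:
    pass  # process it, find everything it links to, repeat
```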

Anonymous

a) zero-day exploits really aren't that common anymore; most viruses require a human to manually start them, so just visiting a website and clicking links won't do it

b) most crawlers aren't actually “looking” at most of the content, so they'd just copy the virus around without being affected by it

c) any exploit would likely target common browsers; the crawler's environment is different, so the exploit/virus probably wouldn't work there unless it specifically targeted the crawler (and targeting the crawler is hard because, unlike a browser, it isn't public, so you can't easily test your attack)

d) if the operators have any common sense, the crawlers run inside a sandbox, so exploiting the crawler achieves nothing, and the sandbox is automatically destroyed and recreated from a clean version on a regular basis (see the sketch after this list)

e) targeting crawlers specifically would be a dangerous game: thanks to the sandboxing it's not very valuable, but you'd be exposing your (valuable) zero-day to an environment that could be tightly monitored. If you get caught, your zero-day will be fixed and become worthless.
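
To illustrate point (d), here's a speculative sketch of the throwaway-sandbox idea in Python: each crawl batch runs in a container that's destroyed when the batch ends, so any compromise dies with it. It assumes Docker is installed, and the "my-crawler" image name is invented for illustration:

```python
import subprocess

def crawl_batch(urls: list[str]) -> None:
    """Run one batch of URLs inside a disposable container."""
    subprocess.run(
        # --rm deletes the container when it exits, wiping any compromise;
        # the next batch starts from the clean image again.
        ["docker", "run", "--rm", "my-crawler", *urls],
        check=True,
        timeout=600,  # kill runaway or stuck batches
    )

crawl_batch(["https://example.com"])
```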