[ELI5] What is data scraping and why is it bad?

6 Answers

Anonymous 0 Comments

Data scraping is simply taking data, no matter what it is, that was put on the internet for humans to read, and having a bot (i.e. software) collect it instead. “Data” could be actual tables of information like weather reports, or it could be news articles on the front page of the New York Times.
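To make that concrete, here is a minimal sketch of what a scraping bot does once it has a page in hand. Everything specific in it is made up for illustration: the `PAGE` string stands in for a downloaded weather page, and `TableScraper` and `scrape_table` are hypothetical names, not part of any real site or tool. It uses only Python's standard `html.parser` module and skips the download step entirely.

```python
from html.parser import HTMLParser

# Hypothetical page content. A real bot would download this over HTTP;
# here it is hard-coded so the example runs without a network.
PAGE = """
<html><body>
<h1>Weather report</h1>
<table>
  <tr><th>City</th><th>High</th></tr>
  <tr><td>Boston</td><td>71</td></tr>
  <tr><td>Denver</td><td>84</td></tr>
</table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row currently being read, or None
        self._in_cell = False   # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table in `html` as a list of rows of cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in scrape_table(PAGE):
        print(row)
```

The page was laid out for a person to look at, but the bot ignores the layout and pulls out just the cells it wants. Nothing is rendered, so no ads are ever shown.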

Why is it bad? Well, this isn’t the intended use of the information on the website. There are often “better” ways to get the data, and maybe those are meant to sit behind some kind of paywall. The human-intended version of a web page probably serves ads, which the bot won’t load or display. The bot’s visits also get counted in the site’s traffic statistics, which skews them compared to what human visitors alone would produce.

It’s one of those weird edge cases where the morality is questionable. The information is generally available, but it wasn’t intended to be slurped up by a bot at high speed, and people are bothered by that for several reasons: bots load far more data than humans do, you may be trying to get around the paywall, and you may be collecting the data to build your own database for your own purposes.
