How do websites collect data from their users?




When you visit a website, you share some basic information about your device, such as the browser you are using and your IP address. Websites store this information to build a general picture of their visitors.

Another type of data collection is done with cookies. I believe you have seen the cookie disclaimers on websites. Cookies store some information on your device, so when you visit another website of the same company, it can quickly identify you.

E.g. let's say you have a Gmail account. Google knows exactly who you are even if you are not signed in to YouTube.
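A minimal sketch of the cookie mechanism, using Python's standard `http.cookies` module. The cookie name `visitor_id` and the one-year lifetime are assumptions for illustration. It only shows how a single site recognizes a returning browser; how a company links these IDs across its separate domains (such as Gmail and YouTube) is more involved and is not shown here.

```python
from http.cookies import SimpleCookie
import secrets
from typing import Optional

COOKIE_NAME = "visitor_id"  # hypothetical cookie name

def issue_visitor_cookie() -> str:
    """Return a Set-Cookie header value carrying a fresh random visitor ID."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = secrets.token_hex(8)  # 16 hex characters
    cookie[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # persist for about a year
    cookie[COOKIE_NAME]["path"] = "/"  # sent back with every page on the site
    return cookie[COOKIE_NAME].OutputString()

def read_visitor_id(cookie_header: str) -> Optional[str]:
    """Extract the visitor ID from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None

# First visit: the server hands out an ID in a Set-Cookie header.
set_cookie = issue_visitor_cookie()
# Later visits: the browser returns just the name=value pair.
returned = set_cookie.split(";")[0]
print(read_visitor_id(returned))
```

The ID itself is random and meaningless; its value comes from the server remembering everything it has seen alongside that ID.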

The simplest answer is that you enter your information in a form to receive the service. This is the very old-school way, though.

Marketing companies (which include Big Tech like Google and Facebook) figured out that if they spy on some of their users, they can convince them to buy stuff. At least, that’s the premise.

The crux of tracking users is information theory. Basically, they combine little bits of information to uniquely identify users. A browser feature that can be in one of two states gives one “bit” of information. An option with four possible values gives two bits. If you collect enough bits, you end up with a unique combination of values that singles out a user. After that, the marketing companies either group these unique users into interest categories or, worse, find specific patterns in their behavior (e.g. someone is overweight and listens to depressive songs, so maybe show them more junk-food ads or even miracle drugs) and show ads for it.
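The bit arithmetic can be checked directly: an attribute with n equally likely values carries log2(n) bits, and the bits of independent attributes add up. The attributes and their counts below are made up for illustration. Real attributes are neither uniformly distributed nor independent, so they carry fewer bits than this idealized count.

```python
import math

def bits(num_values: int) -> float:
    """Bits of information from one attribute, assuming its values are equally likely."""
    return math.log2(num_values)

# Hypothetical attributes and how many distinct values each can take.
attributes = {
    "feature enabled or not": 2,
    "option with four settings": 4,
    "time zone": 32,
    "screen size bucket": 16,
}

total_bits = sum(bits(n) for n in attributes.values())
combinations = math.prod(attributes.values())  # every distinct profile

# Bits needed to tell apart roughly 8 billion people.
needed = math.ceil(math.log2(8_000_000_000))

print(f"{total_bits:.0f} bits -> {combinations} combinations; "
      f"about {needed} bits single out one person")
```

So a few dozen loosely related facts about a browser can, in principle, be enough to distinguish one visitor from everyone else.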

The old-style web is limited in the communication interface it offers. When you request a web page, your browser sends some information about which file formats and languages it supports. It also sends an identity field about itself (the User-Agent). This information, combined with your IP address and perhaps your previous connections, could identify a user uniquely.
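A minimal sketch of combining passively received request details into one identifier. The header values and the IP address (from the 203.0.113.0/24 documentation range) are made-up examples, and hashing them together with SHA-256 is an illustrative choice, not how any particular tracker works.

```python
import hashlib

def fingerprint(headers: dict, ip: str) -> str:
    """Hash a few passively received request details into a short identifier."""
    parts = [
        headers.get("User-Agent", ""),       # the browser's self-description
        headers.get("Accept", ""),           # file formats it supports
        headers.get("Accept-Language", ""),  # preferred languages
        ip,                                  # address the request came from
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]

request_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}
visitor = fingerprint(request_headers, "203.0.113.7")
print(visitor)
```

An exact hash like this changes whenever any single part changes (a browser update, a new IP), so a real system would need some tolerance for such drift.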

However, the quest to provide ever more features in the browser has produced browsers that expose far more about the user. At the most basic level comes styling. A website can use many images and fonts, and the rules that load them can be conditional. So a developer can say: if the user's display is at least this many pixels wide, load this image; otherwise, load that one. It looks like a harmless, very useful feature, but in the end it provides information, which is the most important part. It goes into the bucket with the other bits of information collected.
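In a real page the condition would be written in CSS, for example a media query such as `@media (min-width: 1200px)`. The Python sketch below models both sides: which image a browser of a given width would fetch, and what the server can infer from seeing that request in its log. The breakpoints and image paths are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical breakpoints: the stylesheet picks exactly one hero image
# depending on viewport width, the way CSS media queries would.
BREAKPOINTS = [
    (0, 599, "/img/hero-small.jpg"),
    (600, 1199, "/img/hero-medium.jpg"),
    (1200, None, "/img/hero-large.jpg"),  # None means "no upper limit"
]

def image_for_width(width: int) -> str:
    """Which image the browser would fetch for a given viewport width."""
    for low, high, url in BREAKPOINTS:
        if width >= low and (high is None or width <= high):
            return url
    raise ValueError(f"no rule matches width {width}")

def width_range_from_request(url: str) -> Optional[Tuple[int, Optional[int]]]:
    """What the server learns about the visitor's screen from its access log."""
    for low, high, candidate in BREAKPOINTS:
        if candidate == url:
            return (low, high)
    return None

logged = image_for_width(1440)            # the browser fetches this image...
print(width_range_from_request(logged))   # ...and the server infers the width range
```

No script runs here at all; the information leaks simply because the browser only downloads the image whose condition matched.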

The worst part comes with the ability to program browsers. At the start of the 21st century, website owners and browser developers (like Netscape and Microsoft) wanted to do things without reloading pages: change how elements looked, show messages, rearrange things, etc. They also created ways to send smaller requests without a full page reload. Every time you upvote something, you use that feature. With this programmability, a developer can track all of the user's activity: they can collect more information about the user's system, log key presses and mouse movements, detect whether the user is interacting with the page, and so on. Basically, everything you do on a website creates information.
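In the page itself, a script could listen for DOM events such as `keydown`, `mousemove` and `click` and send them to the server in small background requests. The sketch below shows what a server might derive from one such batch. The `(seconds, event type)` batch format and the 30-second idle threshold are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical event format: (seconds since page load, DOM event type).
Event = Tuple[float, str]

def summarize(events: List[Event], idle_after: float = 30.0) -> dict:
    """Summarize a batch of interaction events a page script might report."""
    counts = Counter(kind for _, kind in events)
    times = sorted(t for t, _ in events)
    active = 0.0
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap <= idle_after:  # longer gaps are treated as the user being away
            active += gap
    return {
        "keypresses": counts["keydown"],
        "mouse_moves": counts["mousemove"],
        "clicks": counts["click"],
        "active_seconds": active,
    }

batch = [
    (0.0, "mousemove"), (1.5, "mousemove"), (2.0, "click"),
    (3.0, "keydown"), (3.2, "keydown"),
    (120.0, "mousemove"), (121.0, "click"),
]
print(summarize(batch))
```

Even this crude summary already tells the site whether the visitor was actually engaged with the page or had left it open in the background.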

Either you give it to them, or you give it to someone else who gives it to them.

Your browser gives them some information about itself and your operating system. Your operating system or home router sends your IP address, which can be correlated with geographic information.
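Correlating an IP address with a location is essentially a lookup of which address block the IP falls into. The table below is hypothetical and uses reserved documentation ranges; real geolocation databases map a very large number of blocks to approximate locations. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table of address blocks and the places they map to.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "Example City A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example City B"),
]

def locate(ip: str) -> Optional[str]:
    """Return the place associated with the block containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    for network, place in GEO_TABLE:
        if addr in network:
            return place
    return None

print(locate("203.0.113.7"))
```

The result is usually only approximate (often a city or region), since address blocks are assigned to providers rather than to individual homes.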

Most websites ask for an email address. Some ask for your name and credit card number. The real value is often in the platform-specific things. Shopping websites keep track of all the things you bought or clicked on, along with associated things like the price of things you didn’t buy, how long you looked at them, etc. Every interaction on the web page, like typing, moving a mouse, clicking, and scrolling, can potentially be registered by your browser and sent to the website. Social media websites will often ask for more personal information, will host your photographs and written content, and will allow you to add contacts. All of this information, and the patterns by which you produce and consume it, go into the data pipeline.
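A small sketch of how a shopping site could turn a clickstream into the kind of signals described above: how long you looked at each product and what you looked at but did not buy. The event format, product names and prices are hypothetical, and measuring dwell time as "time until the next action" is one simple assumption among several possible ones.

```python
from collections import defaultdict

# Hypothetical clickstream for one shopper: (seconds, action, product, price).
events = [
    (0, "view", "kettle", 39.0),
    (45, "view", "toaster", 25.0),
    (60, "view", "kettle", 39.0),
    (150, "buy", "kettle", 39.0),
]

def profile(events):
    """Derive dwell time per product and the items viewed but not bought."""
    dwell = defaultdict(float)
    for current, nxt in zip(events, events[1:]):
        t, action, product, _price = current
        if action == "view":
            # Time on a product page = time until the shopper's next action.
            dwell[product] += nxt[0] - t
    bought = {product for _, action, product, _ in events if action == "buy"}
    skipped = {
        product: price
        for _, action, product, price in events
        if action == "view" and product not in bought
    }
    return dict(dwell), skipped

dwell, skipped = profile(events)
print(dwell)    # seconds spent looking at each product
print(skipped)  # products looked at but not bought, with their prices
```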

Companies will also sometimes share this data using a variety of mechanisms. It used to be fashionable to simply buy and sell it, but that’s become less popular recently since people caught on to it. Nowadays, many companies will share data with the big collectors (Google, Twitter, Facebook) by allowing them to embed little web elements into other websites. Then they use all the normal tracking mechanisms to collect data about your actions on this other website. In return, Google/FB/Twitter provide tools for other companies to describe how they want advertisements to be targeted, and then the ad companies will perform the targeting on the other companies’ behalf.
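A sketch of why an embedded element is enough for cross-site tracking: each time a page loads the embed, the browser contacts the third party, which can see its own cookie (identifying the browser) and the Referer header (identifying the host page). The class, cookie value and `.example` sites are hypothetical. Several browsers now block or partition third-party cookies by default, so treat this as the classic mechanism rather than a description of any current browser.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse

class EmbedTracker:
    """Hypothetical third-party server behind a share button or ad frame."""

    def __init__(self) -> None:
        self.history: Dict[str, List[str]] = defaultdict(list)

    def handle_embed_request(self, tracker_cookie: str, referer: str) -> None:
        # The cookie identifies the browser to the tracker; the Referer header
        # tells it which page the embed was loaded on.
        site = urlparse(referer).hostname or "unknown"
        self.history[tracker_cookie].append(site)

    def sites_seen(self, tracker_cookie: str) -> List[str]:
        return list(self.history.get(tracker_cookie, []))

tracker = EmbedTracker()
# The same browser (same tracker cookie) visits three unrelated sites
# that all embed the tracker's widget.
for page in ["https://news.example/story",
             "https://shop.example/shoes",
             "https://health.example/diet"]:
    tracker.handle_embed_request("cookie-abc", page)
print(tracker.sites_seen("cookie-abc"))
```

None of the three sites has to send anything to the tracker directly; embedding its widget is enough for it to assemble a browsing history.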