Why do some specific web pages have addresses that contain SEVERAL dozen nonsense characters in the address bar? Even if there are quadrillions of individual web pages there are still way too many characters than necessary for them all to be unique and leave room for more.

6 Answers

Anonymous 0 Comments

Those “nonsense” characters are more than likely URL parameters, which don’t have anything to do with the page itself but instead pass information back to the server, usually for tracking. These could be things like where the user was referred from. E.g. if you went to www.reddit.com/r/explainlikeimfive?source=google, you’re getting this subreddit, but there’s a parameter saying that you came from Google to get here (assuming that the server wants a parameter called “source”; this is a hypothetical example).

If you see a question mark in the URL, you can, in most cases, remove it and everything that comes after it and still get the page. There will be some instances where this breaks, because what gets passed over might be a validation key that allows you to view something (you can see this, for example, if you open an image from Facebook: the parameters are a way of allowing users who aren’t logged in to still view the actual photograph).
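To make that concrete, here is a minimal Python sketch using the standard `urllib.parse` module. It reads the parameters out of the hypothetical `?source=google` address from above and then drops the question mark and everything after it. The helper names `read_params` and `strip_query` are made up for this example.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def read_params(url: str) -> dict:
    """Return the URL parameters as a dict mapping name -> list of values."""
    return parse_qs(urlsplit(url).query)


def strip_query(url: str) -> str:
    """Drop the '?' and everything after it, keeping only the page address."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical address from the answer above; "source" is an assumed parameter.
url = "https://www.reddit.com/r/explainlikeimfive?source=google"

print(read_params(url))  # {'source': ['google']}
print(strip_query(url))  # https://www.reddit.com/r/explainlikeimfive
```

Whether the stripped address still loads depends on the site; as noted, some parameters are access keys rather than tracking.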

Anonymous 0 Comments

If you had given an example, it would really have been helpful…

But what you are probably talking about is a tracking ID that websites use to know exactly who is doing the clicking, at what time, and where the link came from, such as which ad or chat or share or whatever. These are typically called “GUID”s, for “Globally Unique Identifier”s (also known as UUIDs). They are large (128 bits), and are either generated randomly or built partly from the current time and the machine generating them, so in practice the same system will never duplicate an ID.
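As a rough illustration, Python’s standard `uuid` module can produce both flavours of such IDs: `uuid4()` is purely random, while `uuid1()` mixes in a timestamp and a machine identifier. This only shows what the IDs look like; how any particular website generates its tracking IDs is not something this sketch claims to know.

```python
import uuid

# Version 4: 122 random bits, no time component.
random_id = uuid.uuid4()

# Version 1: built from the current timestamp plus a node (machine) identifier.
time_id = uuid.uuid1()

print(random_id)            # different on every run
print(time_id)              # different on every run
print(len(str(random_id)))  # 36 characters, counting the four hyphens
print(len(random_id.hex))   # 32 hexadecimal characters = 128 bits
```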

Anonymous 0 Comments

I imagine you’re talking about the autogenerated IDs?

Things like this post [https://www.reddit.com/r/explainlikeimfive/comments/**1c74iqz**/eli5_why_do_some_specific_web_pages_have/](https://www.reddit.com/r/explainlikeimfive/comments/1c74iqz/eli5_why_do_some_specific_web_pages_have/) or everyone’s favorite YouTube Video [https://www.youtube.com/watch?v=**dQw4w9WgXcQ**](https://www.youtube.com/watch?v=dQw4w9WgXcQ) contain an automatically generated ID in them.

This ID is essentially a representation of a number. Normally we count up 0-9 and then roll over into the next column and start again. If you add more “digits”, you can count 0-9, then a-z, then A-Z, and then two more special characters (usually **-** and **_** in URLs, because the **+** and **/** of the normal convention already have special meaning in URLs). That gives 10 + 26 + 26 + 2 = 64 possible values per character.
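Here is a small Python sketch of that counting scheme, treating each character as one base-64 “digit”. The digit order (0-9, a-z, A-Z, `-`, `_`) follows the description above and is an assumption for illustration; standard Base64 orders its alphabet differently (A-Z, a-z, 0-9 and then the two symbols). The names `encode` and `decode` are made up for this example.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase + 2 symbols = 64 "digits".
# This ordering is illustrative; real Base64 uses a different order.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"


def encode(n: int) -> str:
    """Write a non-negative integer using the 64-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, remainder = divmod(n, 64)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


def decode(text: str) -> int:
    """Turn a string written in the 64-character alphabet back into an integer."""
    n = 0
    for ch in text:
        n = n * 64 + ALPHABET.index(ch)
    return n


print(encode(9))   # 9
print(encode(10))  # a
print(encode(63))  # _
print(encode(64))  # 10  (rolled over into the next column)
```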

Now there are reasons you don’t just increment this number. If I made videos 1, 2 and 3, but made video 2 unlisted, then someone could just go looking for it. Using random numbers from a large range makes IDs much harder to guess. There should be a very good chance that someone guessing random numbers does not actually find a result.

In addition, with something that is decentralized, you need to add a mechanism for a server in, for example, Australia to generate a number and know that another server in, for example, New York does not also generate the same number (or even a secondary server in the same location that is handling excess traffic). Having very large numbers is part of the solution to this.

So once you’ve figured out how big of a range you need to make it so that you don’t have collisions on IDs when posts or videos are created, and so that people can’t randomly guess IDs to find things, you’ve got your upper bound. Now you just randomly generate digits in that range and turn them into Base64 for the URL.
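A minimal sketch of that last step in Python, assuming a 64-bit range (a number picked for illustration, not a claim about what YouTube or Reddit actually use). `secrets.token_urlsafe` draws random bytes and encodes them with the URL-safe Base64 alphabet; 8 random bytes come out as 11 characters. The helper name `new_id` and the figure of 10 billion existing items are hypothetical.

```python
import secrets


def new_id(num_bytes: int = 8) -> str:
    """Generate a random, URL-safe ID from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


video_id = new_id()
print(video_id, len(video_id))  # a random 11-character string and its length

# How likely is one random guess to hit an existing item?
# Assumes (hypothetically) 10 billion items in a 64-bit ID space.
existing_items = 10_000_000_000
id_space = 2 ** 64
print(existing_items / id_space)  # roughly 5e-10, about 1 in 2 billion
```

A bigger range lowers both the chance of a lucky guess and the chance that two servers independently pick the same ID.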

Anonymous 0 Comments

The quick answer is that what seems nonsense to you, makes perfect sense to the computer. And importantly, the opposite is also true (what makes sense to you seems nonsense to the computer).

The most basic example of this is the ‘space’ character; many times, computers can only interpret it one way: as a break between objects. So what happens when your web address, which is a single object, has a space in it? Well, by default, it breaks. So instead the computer replaces the space with a code that stands for it. When *that* shows up in a web address, you see a % followed by two hexadecimal digits (a space becomes %20). This also happens with other symbols that the computer has trouble with (” marks, slashes, rare/non-English characters). So if that’s what you’re talking about, that’s your answer.
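You can see this substitution (usually called percent-encoding) with Python’s standard `urllib.parse.quote` and `unquote`. The sample strings are made up for the example.

```python
from urllib.parse import quote, unquote

print(quote("my file.html"))      # my%20file.html
print(quote('say "hi"'))          # say%20%22hi%22
print(quote("café"))              # caf%C3%A9
print(quote("a/b", safe=""))      # a%2Fb  (slashes are only encoded if asked)
print(unquote("my%20file.html"))  # my file.html
```

By default `quote` leaves `/` alone because it normally separates parts of the path; passing `safe=""` encodes it too.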

Now, if you mean the string of random letters/numbers some web pages use (using this question as an example, the ‘1c74iqz’ in the URL), it’s a way to be consistent and brief. Every time that sequence is properly used on Reddit, it will point to this page. And it’s not that every single combination before it in sequence has been used up; it’s that the programmers look at how fast new pages are created, and they think ‘7 characters will last us long enough’ (e.g. a few years or decades before they need to reprogram anything). And then when a new page is created they just take a random sequence and use it.
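Some quick arithmetic on how far a given length goes, sketched in Python. It assumes the ID uses only digits and lowercase letters (36 symbols), which is a guess based on how ‘1c74iqz’ looks. The helpers `to_base36` and `chars_needed` are made up for this example.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed 36-symbol alphabet


def to_base36(n: int) -> str:
    """Write a non-negative integer using digits and lowercase letters."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
    return out


def chars_needed(count: int, base: int) -> int:
    """Smallest ID length that gives at least `count` distinct IDs."""
    length, capacity = 1, base
    while capacity < count:
        capacity *= base
        length += 1
    return length


print(36 ** 7)                           # 78364164096 possible 7-character IDs
print(to_base36(int("1c74iqz", 36)))     # 1c74iqz (the ID is just a number)
print(chars_needed(10 ** 15, base=36))   # 10 characters cover a quadrillion
print(chars_needed(10 ** 15, base=64))   # 9 characters with a 64-symbol alphabet
```

So uniqueness alone needs only about ten characters even for a quadrillion pages; anything much longer than that is there for other reasons, such as being hard to guess or carrying extra data.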

Anonymous 0 Comments

Sometimes it’s encoded data. Like data either you entered or the server got from a third party being passed to another page. If the data is sensitive, then it won’t be in plain text.

Anonymous 0 Comments

A lot of it is for tracking purposes and for filtering products. Rather than seeing the address as one thing, see it as a series of 8-10 things. Some are names, others are sets of random alphanumeric characters, and they can include some of:

The name of the website, e.g. Nike.com

The referring search engine, e.g. Google, normally as a code

The name of the ad campaign Nike is paying for, e.g. utm856304

The name of any affiliate partners involved, as a code, e.g. 649263850 might mean NYTimes.com, so that if you buy shoes after clicking on a link from NYTimes.com, Nike will pay 3% or so to the NY Times.

The product name, either as the name or as a code. Of note, there could be some system to these where the 2023 version of a shoe is 3456 and the 2024 version is 3457. It could also be random or chronological or a host of other systems that make sense to the people at Nike but are gobbledygook to an Internet user.

Then it gets to filtering. If you are looking at shoes and start clicking the filters on the side of the page, more sophisticated e-commerce sites will handle that with a new address, often separated by a % or other less-used character. If you click ‘Red’, that might add %_red/ to your address.

And then there is tracking of you or your session on the site, as a specific computer/phone, using a randomly generated string of characters. Sometimes this is in the address, sometimes it is kept hidden from you.
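To see the “series of things” idea in practice, here is a Python sketch that pulls a made-up shop address apart with the standard `urllib.parse` module. The address, the product code in the path and every value are hypothetical. `utm_source` and `utm_campaign` are common campaign-tracking parameter names, while `affiliate`, `color` and `session` are invented for this example, and real sites may encode filters quite differently.

```python
from urllib.parse import parse_qsl, urlsplit

# Entirely made-up address, assembled from the kinds of pieces listed above.
url = (
    "https://www.example-shop.com/shoes/3457"
    "?utm_source=google&utm_campaign=856304"
    "&affiliate=649263850&color=red&session=9f8a7c"
)


def describe(url: str) -> dict:
    """Split an address into website, page path and individual parameters."""
    parts = urlsplit(url)
    return {
        "website": parts.netloc,
        "page": parts.path,
        "parameters": dict(parse_qsl(parts.query)),
    }


info = describe(url)
print(info["website"])  # www.example-shop.com
print(info["page"])     # /shoes/3457
for name, value in info["parameters"].items():
    print(name, "=", value)
```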