Why do most downloaded files have such weird nonsense names like 2a0000016009f479?



It’s a UUID – a unique identifier that avoids the problems associated with filenames, namely weird characters (think non-roman alphabet characters which may or may not be supported on a given system), multiple files with the same name, filenames that contain personal information etc.


Instead the file is given a UUID and then this is stored in a database along with whatever information you want, so that the two are associated and you can retrieve the correct file despite its name being a load of balls.
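A rough sketch of that idea in Python, assuming a plain dictionary as a stand-in for the database (the names `metadata_db`, `store_upload` and `lookup` are made up for illustration, not taken from any real site):

```python
import uuid

# Hypothetical in-memory stand-in for the database table described above.
metadata_db = {}

def store_upload(original_name: str, uploader: str) -> str:
    """Give the file a random identifier and remember the human-readable details."""
    file_id = uuid.uuid4().hex  # 32 hexadecimal digits, no dashes
    metadata_db[file_id] = {"original_name": original_name, "uploader": uploader}
    return file_id

def lookup(file_id: str) -> dict:
    """Find the stored details again using only the identifier."""
    return metadata_db[file_id]

file_id = store_upload("Résumé (final) v2.pdf", "alice")
print(file_id)                           # random, different on every run
print(lookup(file_id)["original_name"])  # Résumé (final) v2.pdf
```

The file on disk would be named after `file_id`, so the awkward characters and personal details in the original name never touch the file system.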

This is a hash. SHA-1, SHA-256, SHA-512 and so on are “hashing algorithms” that can be used to generate (in an ideal world) a unique hash for each file.

It’s used when the filename and the hash are stored in a database while the actual file sits on some file system, instead of bloating the database with the file contents.

Also, a file with the exact same content (and thus the same hash) just overwrites the existing copy, instead of being stored multiple times under different names.
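To make the deduplication point concrete, here is a minimal sketch, assuming SHA-256 and two dictionaries standing in for the file system and the database (`blobs`, `names` and `save` are illustrative names only):

```python
import hashlib

blobs = {}  # hash -> file content (stands in for the file system)
names = {}  # user-visible name -> hash (stands in for the database)

def save(name: str, content: bytes) -> str:
    """Store content under its hash; identical content lands on the same key."""
    digest = hashlib.sha256(content).hexdigest()
    blobs[digest] = content  # same content => same key, so it is stored once
    names[name] = digest
    return digest

a = save("holiday.jpg", b"same bytes")
b = save("IMG_0042.jpg", b"same bytes")
c = save("other.jpg", b"different bytes")

print(a == b)      # True: both names point at one stored copy
print(len(names))  # 3 names recorded
print(len(blobs))  # 2 actual files stored
```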

Etc. There are multiple reasons why you might want to do that.

Currently there are two conflicting answers on this post, and neither is necessarily wrong. *TL;DR at the bottom.*

OP, the example name you gave is a hexadecimal number. Hexadecimal is a number system that has sixteen unique digits (0-9, then A-F), unlike the decimal system you’re used to, which has ten digits (0-9). Hexadecimal is often used to represent numbers in computing because it maps so neatly to the binary numbers (digits 0-1) computers are built to operate with (conceptually, because they actually represent low and high voltage electrical signals).

Computers often work with groups of eight **bi**nary dig**its** (bits), 00000000 to 11111111 – also called a byte – which is 0 to 255 in our decimal system, but 00 to FF in hexadecimal. Each group of 4 bits maps exactly to a single hexadecimal digit. So you will see hexadecimal numbers used to represent binary values that often don’t meaningfully translate to decimal numbers (which is why you often see numbers like 16, 32, 64, 256 etc., which seem rather arbitrary in decimal, but are actually nice round numbers in binary and hexadecimal).
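If you want to see that mapping for yourself, a few lines of Python will do it. This is just an illustration using the built-in `format` and `int` functions; the variable names are arbitrary:

```python
byte = 0xFF                 # the largest value one byte can hold
print(byte)                 # 255
print(format(byte, "08b"))  # 11111111
print(format(byte, "02x"))  # ff

# Each hexadecimal digit corresponds to exactly 4 bits.
nibbles = [format(int(digit, 16), "04b") for digit in "2a"]
print(nibbles)              # ['0010', '1010']

# "Arbitrary" decimal numbers are round numbers in hexadecimal.
print(format(16, "x"), format(256, "x"))  # 10 100

# The file name from the question: 16 hex digits * 4 bits each.
name = "2a0000016009f479"
bits = len(name) * 4
print(bits)                 # 64
```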

As to why a downloaded file might be named that way – well, firstly I dispute your premise: most files that I download *don’t* look like that. They have proper file names given by a human that indicate what the file is. So it’s probably more to do with where you’re downloading files from than some common thing. But ignoring that, there isn’t one single reason why files might be named with hexadecimal values.

For a site that hosts thousands of files, it might make more sense to organise them in a way that’s easier for the computer to manage than for a human. So giving each file a unique (but essentially random) identifier and keeping other data about that file (the metadata) in a database might be a better way to store those files for easy downloading. That’s where the UUID thing CyclopsRock mentioned comes in, although UUID is a specific standard for issuing Universally Unique IDentifiers and does not represent the concept as a whole. A UUID requires 32 hexadecimal digits (128 bits); your example only has 16 (64 bits).
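The length difference is easy to check with Python's standard `uuid` module. The helper `looks_like_uuid` below is a made-up name for illustration; it only tests whether a string has the shape of a UUID, not where it came from:

```python
import uuid

example = "2a0000016009f479"  # the name from the question
u = uuid.uuid4()              # a random (version 4) UUID

print(len(u.hex))    # 32 hex digits = 128 bits
print(len(example))  # 16 hex digits = 64 bits

def looks_like_uuid(name: str) -> bool:
    """Return True if the string can be parsed as a UUID."""
    try:
        uuid.UUID(name)
        return True
    except ValueError:
        return False

print(looks_like_uuid(u.hex))    # True
print(looks_like_uuid(example))  # False: too short to be a UUID
```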

But what Sp0olio said may also be true – a hash function is a mathematical function that deterministically generates an output of fixed length for a given input (deterministically means there’s no randomness: the same input will always give the same output). You might be dealing with a system that generates a 64-bit hash of the file, which again would be used to make it easier for the computer system to organise and find files for download. And like they said, this also has the advantage that two identical files will get the same hash, meaning you don’t store duplicate files even if they were uploaded by different users and originally had different file names; and if you know the hashing algorithm, you can potentially verify the file integrity by running the same algorithm and making sure the result matches the file name.
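As a sketch of that integrity check: the code below assumes the site uses BLAKE2b cut down to 8 bytes (64 bits), which is purely my pick for the example since nobody here knows which algorithm produced OP's file name. `short_hash` and `verify` are illustrative names.

```python
import hashlib

def short_hash(content: bytes) -> str:
    """64-bit hash as 16 hex digits. BLAKE2b is an assumption for this example."""
    return hashlib.blake2b(content, digest_size=8).hexdigest()

def verify(file_name: str, content: bytes) -> bool:
    """Re-run the hash and compare it with the name the file was given."""
    return short_hash(content) == file_name

data = b"example file contents"
name = short_hash(data)

print(len(name))                  # 16 hex digits
print(short_hash(data) == name)   # True: same input, same output
print(verify(name, data))         # True: file is intact
print(verify(name, data + b"!"))  # False: content was changed
```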

Either way, whether those hexadecimal values are assigned randomly (ie. UUID or similar) or deterministically (ie. a hash) or by some other mechanism, the organisation of those files by the system might include storing them across multiple hierarchical folders. Putting lots of files in a single folder can be bad for performance, but if you take a file like “6f1d.jpg” and store it in folder “6”, then subfolder “f”, then subfolder “1”, then you limit the number of files in any given folder. Then take into account that files may need to be stored across multiple disks and even multiple computers; if you have some predictable method to determine exactly where a file is located based on its file name, it’s much easier for the website’s backend code to find it and send it to the user requesting it for download.
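The folder trick can be sketched in a few lines. This assumes one folder level per leading character of the name, as in the “6f1d.jpg” example; `shard_path` and `pick_server` are hypothetical helpers, and real systems often use two characters per level or a different scheme entirely:

```python
from pathlib import PurePosixPath

def shard_path(file_name: str, depth: int = 3) -> PurePosixPath:
    """Build a nested folder path from the first `depth` characters of the name."""
    stem = file_name.split(".")[0]
    folders = list(stem[:depth])  # "6f1d" -> ["6", "f", "1"]
    return PurePosixPath(*folders, file_name)

def pick_server(file_name: str, server_count: int) -> int:
    """Choose a server predictably by treating the hex name as a number."""
    stem = file_name.split(".")[0]
    return int(stem, 16) % server_count

print(shard_path("6f1d.jpg"))  # 6/f/1/6f1d.jpg
server = pick_server("6f1d.jpg", 4)
print(server)                  # always the same server for this name
```

With hexadecimal names, each folder level has at most 16 subfolders, so the tree stays shallow and evenly spread.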

This is the way that many sites that host lots of files operate internally. However, usually they would also store a meaningful name for the file with the metadata in the database. That “proper” name would come from the website’s operator, a data entry person, the user uploading the file etc., depending on how the site works. The site might have good reasons not to show it (e.g. for image hosting sites and social media, it might be a privacy issue to expose the original filename), or it could be developer laziness.

TL;DR what you have there is a 16-digit hexadecimal number. Such numbers make it easier for computers to organise large numbers of files, so sites will often name files this way internally, either by issuing random numbers or deterministically based on the file content. The website you’re downloading from can choose to present a more meaningful file name to you upon download, but it may or may not have good reasons not to do so.