Base16 maps nicely to bytes. 1 byte has 8 bits and you need 4 bits for each hex (base16) character. So 1 byte can be written down as 2 hex characters.
For base32 you need 5 bits for each character and that’s just awkward.
You would have to split each byte into 3+5 bits, and you really don’t want to combine unrelated bytes into a single number just to make the characters line up, so you wouldn’t really be helping yourself.
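To see that concretely, here’s a quick sketch in Python (the standard library’s Base32 uses the RFC 4648 alphabet A–Z and 2–7, but the byte-boundary problem is the same for any 32-character alphabet):

```python
import base64

data = bytes([0xFF, 0x80, 0x00])

# Base-16: each byte becomes exactly two characters, independent of its neighbours.
print(data.hex().upper())       # FF8000

# Base-32: 5 bits per character, so characters straddle byte boundaries
# and the output needs padding to line up again.
print(base64.b32encode(data))   # b'76AAA==='
```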
Two base-16 characters represent a single byte in computer memory. A byte is 8 bits, and each bit can be 1 or 0, so you have 2^8 = 256 = 16^2 possible binary values.
Representations of many common values, such as colors, are also structured along bytes. For instance, colors are often encoded as RGB triplets: a red, a green and a blue value, each ranging from 0 to 255, for a total of 256 possible values – in other words 16^2, or precisely one byte per value. This makes base-16, or hexadecimal, notation very convenient, as every individual value converts to two characters, leading to 6-character “words”.

In base-32, you’d either be stuck still using two characters each for R, G and B, or you could convert the entire 3-byte binary number to a 5-character base-32 “word”. But that word would not be very readable to human eyes, because you’ve lost the triplet structure. That is, in a hexadecimal representation like FF8000, I can easily tell that this means an orange color (FF = 100% red, 80 = 50% green and 00 = no blue). Using a 5-character base-32 word, you’d get FV000, which I can’t read.
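If you want to reproduce that worked example, here is a small Python sketch; `to_base` is just a hypothetical helper, and it uses the 0–9/A–V digit convention assumed above rather than any standard Base32 alphabet:

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

def to_base(n, base, width):
    # Repeatedly divide by the base and collect the remainders as digits.
    out = ""
    while n:
        out = DIGITS[n % base] + out
        n //= base
    return out.rjust(width, "0")

rgb = 0xFF8000                 # orange: FF red, 80 green, 00 blue
print(to_base(rgb, 16, 6))     # FF8000 - the R/G/B triplet structure is visible
print(to_base(rgb, 32, 5))     # FV000  - same number, but the structure is gone
```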
And therein lies the rub. This kind of notation is really only for human eyes; the underlying data is still binary. When we make the representation more “compact”, we’re not gaining any computational advantage. If anything, it makes things harder, because the base-32 representation is more difficult to convert back to individual bytes: with base-16, you can just take each character and convert it separately, while with base-32 you have to convert the entire word with what amounts to a long-division procedure. So there’s no advantage for the computer, and no advantage on the human side either, since the more “compact” representation is actually harder to read.

Even if you’re not dealing with colors, it’s easier to think in base-16 than base-32, since base-16 only adds 6 digit characters beyond the familiar base-10. Almost everyone knows that F is the 6th letter of the alphabet (and thus F in hex converts to 9+6 = 15 in decimal), but very few people know off the top of their head that, say, S is the 19th (and thus would represent a decimal value of 28).
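The difference in conversion effort is also easy to see in a couple of lines of Python, which happens to accept base-32 integers written with the same 0–9/A–V digits used above:

```python
word16 = "FF8000"
# Base-16: decode each character pair on its own - no arithmetic on the whole word.
print([int(word16[i:i + 2], 16) for i in range(0, len(word16), 2)])  # [255, 128, 0]

word32 = "FV000"
# Base-32: rebuild the whole number first, then divide it back into bytes.
n = int(word32, 32)
print(list(n.to_bytes(3, "big")))                                    # [255, 128, 0]
```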
Edit: dumb math mistakes pointed out by u/jackerhack
The point of using hex isn’t compactness, it’s human readability. Inside the computer, it’s all binary anyway. Octal was also used in the old days (and is still often supported), but ever since word sizes got standardized to 8 bits and multiples thereof, hex has most often struck the right balance between compactness and readability.
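A quick Python comparison shows that balance: octal digits carry 3 bits, which doesn’t divide an 8-bit byte evenly, while hex digits carry 4 bits, which does:

```python
n = 0b11111111_10000000      # two bytes: 0xFF and 0x80
print(bin(n))                # 0b1111111110000000
print(oct(n))                # 0o177600 - 3-bit octal digits don't line up with the bytes
print(hex(n))                # 0xff80   - 4-bit hex digits do: FF | 80
```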
Base16 means that you can encode one byte in two symbols. Each symbol represents half a byte, or 4 bits.
Base32 would allow for 5 bits per symbol, but that wouldn’t map cleanly onto bytes. You would be able to represent the contents of 5 bytes in 8 symbols instead of the ten you get with hex, but the easy ability to read the contents of a byte would be lost.
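You can check that trade-off with Python’s standard library (its Base32 uses the RFC 4648 A–Z/2–7 alphabet, but the character counts are the same for any 32-symbol alphabet):

```python
import base64

five_bytes = bytes([0x01, 0x23, 0x45, 0x67, 0x89])
print(five_bytes.hex())              # '0123456789' - 10 hex characters
print(base64.b32encode(five_bytes))  # b'AERUKZ4J'   - 8 Base32 characters, no padding
```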
That’s a small increase in the amount of data you can represent with human-readable symbols, at the cost of actual readability.
Not to mention that there are very few occasions where you need to represent 5 bytes’ worth of data. Mostly it is multiples of two like 2, 4 or 6; occasionally 3 for colors and the like, but never 5.
So the one character you would save would still be wasted much of the time.
You would be able to represent a 24-bit color in 5 characters instead of 6, but you wouldn’t be able to extract the amounts of red, green and blue just by looking at it.
It would be very little benefit at too great a cost.
For the rare occasions where space is much, much more important than readability, Base64 is preferable.
Even in the most restrictive environments, there are enough printable characters that are sure to work everywhere to reach 64 possible symbols.
This lets you encode 6 bits of information per character. It is not very readable, but it is useful in situations where you can’t send raw 8-bit binary and are limited to the characters that ASCII (really 7 bits per character) can safely carry.
For example, if you want to put an ID into a URL, space is obviously at a premium and you are very limited in the characters you can use there, but 64 is doable. This is how YouTube IDs work, for example.
MIME encoding for email also uses Base64.
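As a rough sketch of the idea (not necessarily exactly how YouTube generates its IDs), URL-safe Base64 swaps ‘+’ and ‘/’ for ‘-’ and ‘_’, and 8 random bytes come out as an 11-character token; `raw_id` and `token` here are just illustrative names:

```python
import base64
import os

raw_id = os.urandom(8)                                          # a hypothetical 8-byte ID
token = base64.urlsafe_b64encode(raw_id).rstrip(b"=").decode()  # drop the '=' padding
print(raw_id.hex(), "->", token)                                # output varies; token is 11 characters
```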
It’s primarily to do with making data human readable in a practical way. Hex makes it fairly easy to know the underlying binary states, as it’s easy to work with sums of 1, 2, 4, and 8. Base-32 would just be more complex to understand.
Also, many processors work on 8-, 16-, 32-, or 64-bit values, so 4-bit nibbles divide into those evenly for the purposes of presentation.
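As a tiny illustration (a sketch with made-up values), a byte splits cleanly into two hex digits, and each digit is just a sum of 8, 4, 2 and 1:

```python
b = 0xB6                       # one byte, binary 1011 0110
high, low = b >> 4, b & 0xF    # split into the two nibbles
print(hex(high), hex(low))     # 0xb 0x6  (0xB = 8+2+1 = 11, 0x6 = 4+2 = 6)
```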
Warning: fuzzy memory below. For the source of everything I mention here, here is (I think) [the video](https://youtu.be/thrx3SBEpL8) I keep referencing.
I watched a Computerphile episode that delved deep into the history of memory addressing and the choice of what numerical base to represent everything in. In short, we think in base-10 because we have 10 fingers and 10 toes, but it really sucks when you try to fit it to computer logic, because there’s no great hardware analog for base-10. If memory serves, thermionic valves were originally implemented using a base-5 numbering system, but they revisited the decision when implementing transistors in silicon and went with a binary (base-2) representation. This was in part because the whole system was built on the idea of on/off or high/low voltages.
Again, really pulling from the deep memory here, but I believe in that same episode, Professor Brailsford mentioned that the origin of calling them bits (short for binary digits) was also the creation of the new term “digital” since it was the end of analog computing.