I’m learning the basics about computers and got to Unicode. Apparently it can be divided into 3 with UTF (Unicode Transformation Format), which would be UTF-8, UTF-16 and UTF-32. I understand that each one has a different unit size: UTF-8 – 1 byte; UTF-16 – 2 bytes; UTF-32 – 4 bytes. But beyond how much space each one of them takes, what’s the difference between one and the other?
Also, apologies if I got any concept wrong :$ Feel free to correct me if I did
In: Technology
Unicode assigns a number to every possible character. However, it doesn’t dictate how these numbers are represented as bits – that’s what UTF-8/16/32 do.
In UTF-8 the base unit is a single byte. Code points from 0 to 127 are stored as a single byte, code points from 128 to 2047 are stored using two bytes, code points from 2048 to 65535 take 3 bytes, and from 65536 upward they take 4 bytes.
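You can check these sizes yourself, for example in Python with `str.encode` (just an illustration, with one sample character from each bracket):

```python
# Each character below falls in a different UTF-8 size bracket.
for ch in ("A", "é", "€", "😀"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# "A" (U+0041) is 1 byte, "é" (U+00E9) is 2,
# "€" (U+20AC) is 3, and "😀" (U+1F600) is 4.
```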
Meanwhile, in UTF-16 the base unit is 2 bytes: characters up to 65535 take one 2-byte unit, and everything above is stored as a pair of 2-byte units called a surrogate pair (so the actual encoding is a bit more involved than UTF-8).
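You can verify the UTF-16 sizes the same way in Python (the `utf-16-le` codec is used here so Python doesn’t prepend a 2-byte byte-order mark to the result):

```python
# Characters up to U+FFFF take one 2-byte unit;
# anything above is stored as a 4-byte surrogate pair.
print(len("A".encode("utf-16-le")))   # 2
print(len("€".encode("utf-16-le")))   # 2
print(len("😀".encode("utf-16-le")))  # 4
```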
Finally, UTF-32 is a fixed-width encoding – every character is simply encoded as a 4-byte integer.
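In UTF-32 every character really is the same size, which a quick Python check confirms (`utf-32-le` is used to skip the byte-order mark):

```python
# Fixed width: 4 bytes per character, no matter the code point.
for ch in ("A", "€", "😀"):
    print(ch, "->", len(ch.encode("utf-32-le")), "bytes")  # 4 every time
```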
UTF-8 is backwards compatible with ASCII and is therefore most efficient if you’re mainly using Latin letters. UTF-16 tends to be more compact for text dominated by characters in the 2048–65535 range (many Asian scripts, for example), where UTF-8 needs 3 bytes per character but UTF-16 only needs 2.
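A quick size comparison makes the trade-off concrete – here an English word versus a Japanese one of the same character count (my own illustrative sample text):

```python
english = "hello"
japanese = "こんにちは"  # 5 hiragana characters, each in the 2048-65535 range
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(codec, len(english.encode(codec)), len(japanese.encode(codec)))
# utf-8:     5 bytes vs 15 bytes  (UTF-8 wins for English)
# utf-16-le: 10 bytes vs 10 bytes (UTF-16 wins for Japanese)
# utf-32-le: 20 bytes vs 20 bytes (never the smallest here)
```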