How did ASCII assign letters a specific number? How did that system come to be? How did early computers adapt to it?

For example: how was the letter A given the binary code of “01000001”? (I really don’t know anything about this but I’m interested)

8 Answers

Anonymous 0 Comments

Surprisingly there is a Wikipedia page called … wait for it … [*ASCII*](https://en.wikipedia.org/wiki/ASCII)!

Anonymous 0 Comments

They had to represent characters somehow. And binary was what they needed to do it in, because binary is how computers work.

You can represent 2 unique things with 1 bit: 0 and 1. You can represent 4 with 2 bits: 00, 01, 10, and 11. You can represent 2^N (“2 to the power of N”) things with N bits.
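The counting rule above is easy to check by brute force. This is only a small sketch in Python; the helper name `bit_patterns` is made up for illustration.

```python
from itertools import product

def bit_patterns(n):
    """Return every distinct pattern of n bits as a string, e.g. '01'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# The number of patterns doubles with every extra bit: 2, 4, 8, ...
for n in (1, 2, 3, 7, 8):
    print(n, "bits ->", len(bit_patterns(n)), "patterns")
```

With 7 bits that gives 128 patterns, and with 8 bits 256.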

So they counted up the number of characters in English, including lower case, upper case, numbers, punctuation, and a number of special other characters. It turns out that number was greater than 2^6 (= 64) and no more than 2^7 (= 128). So 7 bits it was. Computers later settled on 8-bit bytes, so an ASCII character is normally stored in one byte with the top bit left as 0.

So they could represent 128 things in 7 bits. And then they just started assigning numbers to the characters. There was some reason to it, for instance “B” is one greater than “A”. For obvious reasons. But otherwise, they just decided. Chose a number. No science behind that.

And then they designed computers to expect that. When you type an “A” and the keyboard sends “01000001”, the computer sees that as “A” because there’s literally a chip in the computer which knows that 01000001 is an A. No magic. Just hardware designed to match the ASCII table, which someone made by assigning characters to numbers.
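You can poke at that table yourself. A minimal sketch in Python, using the built-in `ord` and `chr` functions; the helper names `to_bits` and `from_bits` are just for illustration.

```python
def to_bits(ch, width=8):
    """Look up a character's number and write it as a binary string."""
    return format(ord(ch), f"0{width}b")

def from_bits(bits):
    """Go the other way: read a binary string as a number, then look up the character."""
    return chr(int(bits, 2))

print(to_bits("A"))           # 01000001
print(from_bits("01000010"))  # B
print(ord("B") - ord("A"))    # 1, because B is one greater than A
```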

Anonymous 0 Comments

I don’t know, but it’s the bane of my existence incorporating the old ASCII with newer technology.

Anonymous 0 Comments

> How did ASCII assign letters a specific number? How did that system come to be?

The answer to this isn’t really that interesting. ASCII is a Camel Standard (a camel being, as everyone knows, a horse designed by committee): Before ASCII every computer manufacturer sort of did their own thing for character codes, which made interoperability a pain in the ass, so a bunch of groups came together and created an “American” (US) standard.
They standardized the first 7 bits of ASCII, the first 32 values (0-31) being control characters (things like BEL which is the terminal bell, now a beep), and the remainder being the printable alphabet, numbers, spaces, and the most common punctuation. Basically all the stuff you see on a keyboard.
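As a rough sketch of that layout in Python (the function name `classify` is made up here; note that code 127, DEL, is also a control character even though it sits at the very end of the table):

```python
def classify(code):
    """Say which part of the 7-bit ASCII table a code falls in."""
    if not 0 <= code <= 127:
        raise ValueError("7-bit ASCII only covers 0-127")
    if code < 32 or code == 127:
        return "control"
    return "printable"

BEL = 7  # the terminal bell mentioned above

print(classify(BEL))       # control
print(classify(ord("A")))  # printable
print(sum(classify(c) == "printable" for c in range(128)))  # 95
```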

Letters and numbers were obviously assigned in their “traditional” order (A-Z, a-z, 0-9) and punctuation is mostly in one series.

> How did early computers adapt to it?

Badly, at first. In fact until 8-bit ASCII was standardized there were a bunch of “extended ASCII” systems (for example [PETSCII](https://en.wikipedia.org/wiki/PETSCII) used by Commodore computers), and you’ll notice parts of that don’t match the 1963 ASCII table it was derived from. The printable stuff is more-or-less the same, but the space between upper-case Z and lower-case a differs and a bunch of the control codes are replaced with Commodore-specific function codes.

As more software was written using and expecting ASCII-conforming data, though, it became not just a published US standard but a de facto standard for interoperability.

It’s not the only one though – [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC) is still around (and is not compatible with ASCII). [JIS X 0201](https://en.wikipedia.org/wiki/JIS_X_0201) was commonly used in Japan and is compatible with printable US-ASCII but differs significantly in the extended (8-bit) range.
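To see the incompatibility concretely, here is a small Python sketch. It assumes the `cp037` codec (one of several EBCDIC code pages that ship with Python) is a fair stand-in for "EBCDIC" in general.

```python
text = "A"

ascii_bytes = text.encode("ascii")   # the ASCII code for A
ebcdic_bytes = text.encode("cp037")  # the same letter in one EBCDIC code page

print(ascii_bytes.hex())   # 41
print(ebcdic_bytes.hex())  # c1

# Reading EBCDIC bytes as if they were ASCII fails outright,
# because 0xC1 is outside the 7-bit range.
try:
    ebcdic_bytes.decode("ascii")
except UnicodeDecodeError:
    print("not valid ASCII")
```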

Today Unicode (specifically variable-length UTF-8) has largely replaced ASCII by consuming it (the first 128 code points of UTF-8 are encoded as exactly the same bytes as 7-bit ASCII, including all the control characters) and allowing it to be extended to encompass many other languages and graphical characters.
As with ASCII there was a period where Unicode adoption and interoperability was pretty awful. Competing encodings like UTF-16 and UTF-32 were tried but largely lost out for files and data interchange because they are not ASCII-compatible.
Even today some software *assumes* ASCII encoding and fails to process Unícödē Characters ☹️.
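A short Python sketch of those three points: UTF-8 leaves plain ASCII untouched, UTF-16 does not, and code that insists on ASCII chokes on anything else. The helper `try_ascii` is hypothetical, written only for this example.

```python
def try_ascii(text):
    """Encode as ASCII, or return None if the text has non-ASCII characters."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None

# Plain ASCII text is byte-for-byte identical in UTF-8.
print("ASCII".encode("utf-8") == "ASCII".encode("ascii"))  # True

# UTF-16 pads every ASCII character out to two bytes, so old tools can't read it.
print("A".encode("utf-16-le"))  # b'A\x00'

# Characters outside ASCII take more than one byte in UTF-8...
print(len("é".encode("utf-8")))  # 2

# ...and can't be represented in ASCII at all.
print(try_ascii("Unícödē"))  # None
```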

Anonymous 0 Comments

ASCII didn’t come out of nowhere. Before ASCII there was Baudot, and before Baudot there was Morse. In Morse code, a sequence of long and short signals (dashes and dots), built from 1s and 0s (sound or silence, voltage or no voltage, or whatever), allowed a human operator with an electromechanical switch and a sound-making device to send and receive numbers and letters digitally (i.e. on a wire or radio signal that is either “on” or “off”). The rate of Morse is limited by how fast a human operator can accurately encode and decode.

When typewriters became common, the idea was that instead of a human Morse key operator, a machine could be produced that would automatically generate a digital encoding of the key press on a typewriter, and send that to a remote typewriter, so that when I press a key here, the typewriter there types the letter. Baudot was designed with typewriters in mind, so it contained the keys on a typewriter and the necessary control signals (like carriage return and line feed). Different languages had standardised on different typewriters to reflect things like accented letters and the different frequency of letters in the language (e.g. the French AZERTY layout rather than the English QWERTY layout). Baudot is a 5-bit digital encoding, and in addition to immediate typing and printing, it was possible to punch the code as holes in a paper tape that could be read electromechanically, to allow storage and sending/receiving at a faster rate (the data connection could support a faster speed than a human typist, so having multiple human typists typing to paper tape, and then sending the tapes at the maximum rate, enabled a higher throughput).

When computers reached the level of development that text input/output was feasible, the idea was landed upon of using a teletype as an interface. Rather than typing here and printing there, a person could type here, and the digitally encoded text would go to the computer. The computer output would then be sent back to the teletype as a digital signal, which would be printed in the same way as a conventional teletype. This use is why on Unix (and Linux) systems, the serial port is referred to as TTY (for teletype).

There were a couple of problems with using Baudot for computers, though. First, different languages had different encoding standards. Second, to use a computer, you want to be able to send control signals to make the computer do things that are not needed for simple sending and receiving text. Also there are symbols that are useful for computers that are not really useful for simple text.

The idea of ASCII was to create something that was based on the same concept of a digital signal that could be generated by a keyboard, sent to a computer, and the output sent to a typewriter type printer, that would actually be designed with computer use in mind. Baudot was only 5 bit, and that didn’t allow enough different symbols for all the uses computer users wanted, so ASCII was set at 7 bits. That allowed more characters (for example Baudot generally doesn’t have separate upper and lower case letters, ASCII does), as well as the extra control signals. In the same way that upper case letters are produced by holding down the shift key and typing the key for a lower case letter, the control signals are produced by holding down a “control” key and typing the relevant letter. That’s the origin of the CTRL key on a modern computer keyboard.
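The control-key trick can be sketched in a few lines of Python. This assumes the classic ASCII terminal behaviour, where Ctrl keeps only the low 5 bits of the letter's code; modern keyboards and operating systems handle Ctrl in their own ways. The function name `ctrl` is made up for the example.

```python
def ctrl(letter):
    """Code a classic ASCII terminal sends for Ctrl+letter: keep only the low 5 bits."""
    return ord(letter.upper()) & 0b11111

print(2 ** 5, 2 ** 7)  # 32 128: what 5-bit Baudot and 7-bit ASCII can each distinguish

print(ctrl("G"))  # 7  (BEL, the terminal bell)
print(ctrl("M"))  # 13 (carriage return)
print(ctrl("J"))  # 10 (line feed)
```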

Anonymous 0 Comments

> For example: how was the letter A given the binary code of “01000001”

There’s actually a lot of thought that went into making ‘A’ =(0)1000001.

So to start off, you need 5 bits to represent the alphabet. 5 bits gives us 32 possible combinations. We don’t need 32, but 4 bits gives us 16, which is not enough. We need more than 16, so we get 32.

The next step is surprisingly intuitive. In this 5-bit set, A becomes 00001 because it’s the first letter of the alphabet. B becomes 00010 because it’s the second, etc.

Next we’ll build up another 5-bit set. Why? We’ve assigned A-Z, next we want to assign a-z.

‘a’ is still the first letter of the alphabet, so in this second set, we’ll assign a=1, b=2, etc.

The last one I’ll hit on is digits. We’re going to need us some digits. There’s only 10 digits, so we can do this in a 4-bit set, 0000-1111. 0=0000, 1=0001, etc because, well, 0001=1, it totally makes sense.

But because we’re building this system out of 5-bit sets, we’ll drop that 4 bit set inside a 5-bit set, it’ll be fine.

In its original 7-bit form, you can read it as two bits for ‘set’, and 5 bits for value. Essentially there’s a control set (00), a punctuation (and digits) set (01), an upper-case set (10), and a lower-case set (11).

(Plus a whole bunch of things that are stuffed in the gaps, because 26 letters doesn’t need 32 positions, and we really couldn’t afford to waste any)

10 00001 = ‘A’, the first letter in set 10 (upper case).
11 00001 = ‘a’, the first letter in set 11 (lower case).

This kinda thing was really important back in the day, because computers didn’t have a lot of grunt to them at all, so it allowed a lot of shortcuts. You can convert text to upper-case by just unsetting one bit (the second of the two set bits, worth 32), you don’t need to go through and say “if a, then A, if b, then B”. Similarly, you can convert the text “101” to the digits 1,0,1 by just unsetting the first three bits.

So for the specific example of why A is 01000001, it’s because A is the first letter of the alphabet, and (0)10 is the upper-case set.
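Here is a rough Python sketch of the set/value reading and the two shortcuts. The function names are invented for this example, and the bit tricks only make sense for the inputs named in the comments (letters a-z, digits 0-9).

```python
def split(ch):
    """Split a 7-bit code into (set, value): top 2 bits and low 5 bits."""
    code = ord(ch)
    return code >> 5, code & 0b11111

def to_upper(ch):
    """Upper-case a letter a-z by clearing the single bit worth 32."""
    return chr(ord(ch) & ~0b0100000)

def digit_value(ch):
    """Turn a digit character 0-9 into its number by clearing the top three bits."""
    return ord(ch) & 0b0001111

print(split("A"))  # (2, 1): set 10, first letter
print(split("a"))  # (3, 1): set 11, first letter
print(to_upper("b"))  # B
print([digit_value(c) for c in "101"])  # [1, 0, 1]
```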

Anonymous 0 Comments

To add to the other comments: you may or may not be aware (or interested) that on Windows you can enter ASCII codes directly on your keyboard. Hold down the Alt key, type 6 then 5 on the numeric keypad, then let the Alt key go, and up pops an “A” in your document.

Anonymous 0 Comments

#ELI5

[You know what Morse Code is](https://www.youtube.com/watch?v=_J8YcQETyTw)?

For a while, we could only communicate long distances with Morse Code. Basically, a dude pressing a button that makes a sound. I could give it a long press or a short press. Beep bip bip beeeep!

How the heck do I communicate with someone on the other side, if all I’m sending is beeps?

Well, we’d have to get together and agree that “bip bip bip” means the letter S and “beeep beeep beeep” means the letter O.

So everyone who wanted to talk Morse Code had to agree on what the beeps mean.

You’ve heard it said many times that computers talk in 1s and 0s.

When all you can send is 1s and 0s, how the heck do you communicate with someone?

Well, just like Morse Code, people got together and agreed that 01000001 means A and 01000010 means B, and so on.

Bottom line, a bunch of computer people (manufacturers, software engineers, scientists) got together and proposed a STANDARD. Like a dictionary. Like a secret key decoder.

How was 01000001 chosen as A? It doesn’t really matter, does it? It doesn’t really have to have any rhyme or reason, as long as everyone agrees to the standard. Right?

The standard was called [ASCII](https://en.wikipedia.org/wiki/ASCII). It stands for American **Standard Code for Information Interchange**. That makes sense, right? It’s a standard for communicating with 1s and 0s.

There’s some history at that link, but it’s kind of boring.

It’s not the only standard, by far! There’s others, like [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC), and [Unicode](https://en.wikipedia.org/wiki/Unicode).

Both parties communicating have to agree on which standard they want to use.
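Here is what goes wrong when they don’t agree, as a small Python sketch. The choice of UTF-8 for the sender and Latin-1 for the receiver is just an example pairing, not something from the thread.

```python
message = "café"

# The sender uses one standard...
sent = message.encode("utf-8")

# ...and the receiver has to use the same one to get the message back.
right = sent.decode("utf-8")
wrong = sent.decode("latin-1")  # a different standard reads the same bytes differently

print(right)  # café
print(wrong)  # cafÃ©
```

The bytes on the wire are identical in both cases; only the agreed-upon dictionary changes what they mean.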