Chomsky Hierarchy


I’ve never been able to wrap my head around the concept of the Chomsky Hierarchy. Whenever each level is explained, I just don’t see what it’s telling me. Can someone please explain?


You can think of the Chomsky hierarchy as a classification of games for making sentences. Each game is as follows: you have symbols and text, and rules for rewriting something you’ve written into some other string of symbols and text. From a starting symbol, you apply the rules however you like until you end up with only text, and that is one of the sentences produced by this system (called a grammar).

For example, let’s say I have the symbols A and B, and the pieces of text are a and b. My starting symbol is A, and the rules are (1) A -> Ba, (2) B -> Ba, and (3) B -> b. So let’s try to make some sentences.

Ba (rule 1)
Baa (rule 2)
Baaa (rule 2)
baaa (rule 3, no more symbols so this is one of our sentences)
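If it helps to see this mechanically, here’s a minimal Python sketch (my own illustration, not anything standard) that retraces the derivation above by rewriting one occurrence of a rule’s left-hand side at each step:

```python
# Symbols are uppercase, text is lowercase, as in the example above.

def apply_rule(sentence, lhs, rhs):
    # Rewrite the first occurrence of lhs into rhs. A real derivation
    # may pick any occurrence, but one is enough to retrace the steps.
    return sentence.replace(lhs, rhs, 1)

s = "A"                         # starting symbol
s = apply_rule(s, "A", "Ba")    # rule 1: "Ba"
s = apply_rule(s, "B", "Ba")    # rule 2: "Baa"
s = apply_rule(s, "B", "Ba")    # rule 2: "Baaa"
s = apply_rule(s, "B", "b")     # rule 3: "baaa"
print(s)                        # all lowercase, so this is a finished sentence
```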

Here’s a slightly more complex set of rules: same symbols and text, but now the rules are (1) A -> aA, (2) A -> B, (3) aB -> b. I can do something like this

aA (rule 1)
aB (rule 2)
b (rule 3)

I hope these examples helped a little bit by showing you how broad the rules are – whenever you have a partially derived sentence, you can pick any bit of it that matches the left hand side of a rule, and transform it into the right hand side of a rule. It can take some work to figure out all the possible sentences that some set of rules can produce.
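To make “figuring out all the possible sentences” concrete, here’s a sketch (again my own, using a length cap as a simplifying assumption) that brute-forces every sentence of the second grammar by trying all possible rewrites:

```python
from collections import deque

def sentences(rules, start, max_len=6):
    # Breadth-first search over all possible rewrites. Strings with no
    # uppercase symbols left are finished sentences.
    seen, done = {start}, set()
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur.islower():          # no symbols left: a finished sentence
            done.add(cur)
            continue
        for lhs, rhs in rules:
            for i in range(len(cur)):
                if cur.startswith(lhs, i):
                    nxt = cur[:i] + rhs + cur[i + len(lhs):]
                    if len(nxt) <= max_len and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return done

# The second example's rules: A -> aA, A -> B, aB -> b
result = sorted(sentences([("A", "aA"), ("A", "B"), ("aB", "b")], "A"))
print(result)   # every sentence is some number of a's followed by one b
```

Running this shows the grammar produces exactly the strings b, ab, aab, … (up to the length cap), which you could also work out by hand from the rules.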

The Chomsky hierarchy is one way to classify sets of these rules by their relative power. I’ll try to give an intuitive explanation of each level, based on the wiki article, by looking at these rules as capabilities that you have to construct sentences.

Type-3 grammars only know what’s going on at the rightmost end of your derivation. You could never use these, for example, to describe palindromes, because to build a palindrome from left to right you need to keep track of what was going on on the left hand side, and type-3 grammars only know about the current symbol. However, you can use them to produce strings that have a length that’s a multiple of 5, by keeping track of every chunk of 5 you create.
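The multiple-of-5 example corresponds to an ordinary regular expression (regular expressions without extensions like back-references have exactly Type-3 power). A quick Python illustration:

```python
import re

# Checking "length is a multiple of 5" only needs a fixed amount of
# memory (count to 5, then reset), so a regular expression can do it:
chunks_of_five = re.compile(r"(?:.{5})*")

print(bool(chunks_of_five.fullmatch("aaaaa")))    # True
print(bool(chunks_of_five.fullmatch("aaaa")))     # False
print(bool(chunks_of_five.fullmatch("")))         # True: zero chunks
```

Palindromes, by contrast, need unbounded memory of everything emitted on the left so far, which no fixed-memory pattern like this can provide.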

Type-2 grammars are more flexible – they know what’s going on at every symbol that hasn’t yet been turned into a piece of text, which can be on the right, on the left, or in the middle. You can use them to build a palindrome, by keeping your symbol in the middle and emitting mirrored text to either side (e.g. A -> aAa, A -> bAb, plus a finishing rule like A -> a so the derivation can end). However, they still don’t know about stuff that’s happening far away, so you can’t use them to describe even something like abc, aabbcc, aaabbbccc, … – the sentences with three matching parts. This is because whatever you use to match up a’s and b’s will lose track of whatever you’re using to match up b’s and c’s.
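A sketch of that palindrome construction in Python (my own, with the finishing choices as a stated assumption): each step either wraps the middle symbol in mirrored text, or finishes.

```python
import random

def derive_palindrome(depth=3):
    # Mimics the context-free rules A -> aAa / A -> bAb, applied
    # `depth` times, then a finishing rule A -> a / b / empty string.
    if depth == 0:
        return random.choice(["a", "b", ""])
    c = random.choice("ab")
    return c + derive_palindrome(depth - 1) + c

s = derive_palindrome()
print(s, s == s[::-1])   # every derived string reads the same reversed
```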

Type-1 grammars are even more flexible – instead of only the symbol, the rule can also ‘look at’ some of the surrounding text and symbols. Since you can look at context, you can now actually do that aaabbbccc example above, by leaving behind context while matching up a and b that allows you to further match up c. It’s honestly somewhat hard, when you’re thinking of sentences, to make something that you can’t do with a type-1 grammar – however, note that you cannot delete any text with this grammar, so if you have short sentences which nevertheless take a huge amount of work to verify, you won’t be able to do this in a type-1 grammar. The most natural examples are often facts about arbitrarily large games (e.g. best first move on a chessboard of any size), and if you look at sentence production as a game, this includes facts about Chomsky grammars themselves (e.g. printing all the type-3 grammars that produce the same sentences can’t be done by a type-1 grammar).
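Note that *checking* membership in the aaabbbccc language is easy in a general-purpose language – the point is about what a grammar can *generate*. A small Python checker for contrast:

```python
# abc, aabbcc, aaabbbccc, ... is the classic language that needs
# Type-1 (context-sensitive) power to generate: no context-free
# grammar produces it, yet recognizing it is trivial with a counter.

def is_anbncn(s):
    n, remainder = divmod(len(s), 3)
    return remainder == 0 and n > 0 and s == "a" * n + "b" * n + "c" * n

print(is_anbncn("aabbcc"))   # True
print(is_anbncn("aabbc"))    # False: counts don't match
```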

Type-0 grammars have no restrictions, meaning they can also rewrite existing text. At this point you essentially have a computer – you have memory that you can access and change at will to produce some end result. Type-0 grammars are as powerful as any kind of computation we know about (at least if you don’t consider complexity – we don’t have anything that can produce more kinds of languages, but we do have systems that are more efficient at doing so). There are things type-0 grammars can’t do, and they are similar in nature to the things type-1 grammars can’t do. Type-1 grammars can’t print small sentences that take a huge amount of work, and type-0 grammars can’t print sentences that take an infinite amount of work. For example, while you can now solve the previous problem of printing all the type-3 grammars that produce the same sentences, the next step of this (all the type-2 grammars that produce the same sentences) can’t even be done here, because (very loosely speaking) there’s no theoretical shortcut to keeping track of the infinite strings that each grammar is producing.

It’s basically a way to classify formal languages (sets of strings) by their complexity.

The more complex a language is, the more complex the patterns it can contain.

Regular languages are the simplest ones (those described by non-extended regular expressions). Among other things, they are decidable: for any string, you can always tell with 100% certainty whether it’s part of a given regular language or not.
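That decidability is easy to see from how a finite automaton works: it reads each character once and always halts with a yes or no. A sketch (my own toy example) for the language of strings with an even number of a’s:

```python
# A two-state finite automaton: state 0 = even number of a's seen,
# state 1 = odd. Reading any string terminates in len(s) steps.

def accepts_even_as(s):
    state = 0
    for ch in s:
        if ch == "a":
            state = 1 - state   # flip parity on each 'a'
    return state == 0

print(accepts_even_as("abba"))   # True: two a's
print(accepts_even_as("ab"))     # False: one a
```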

Context-free languages are a bit more complex, but they are still nicely behaved. The syntax of most actual programming languages is (mostly) context-free. Parsing them is easy enough and still decidable, while allowing plenty of flexibility.
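Nested, balanced brackets are the signature context-free pattern in programming languages: no regular expression can match arbitrary nesting, but a counter (or a recursive-descent parser) decides it easily. A minimal sketch:

```python
# Decides the context-free language of balanced parentheses,
# e.g. "(()())" is in the language, "(()" and ")(" are not.

def balanced(s):
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False    # closing bracket with no opener
    return depth == 0

print(balanced("(()())"))   # True
print(balanced("(()"))      # False
```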

Context-sensitive languages are harder to parse; several programming languages and nearly all natural languages are context-sensitive.

Recursively enumerable languages are the last layer of the hierarchy. Basically, you need a full-on Turing machine to parse them – which includes the possibility of infinite loops, so a parser might never halt on some inputs.