As others have said, we can only speculate because there are no records of the original process, but we can see analogies in the modern day, both in humans and in animals.
*All* language is about communication, and communication is fundamentally a two-party thing. If the listener (or reader or whatever) isn’t able to understand (or at least *guess*) your meaning, you need to adapt to that. At the same time, the listener is trying to discern your intended meaning in the face of innumerable ambiguities. The truth of this isn’t really affected by how much shared language you already have, just the amount of ambiguity that needs to be clarified (and inversely how much meaning gets successfully communicated through those ambiguities).
Whatever sounds (or actions or facial expressions etc.) you start with, provided the meaning is effectively communicated, they can be re-used next time without having to go through the whole rigmarole of working out what you meant. Of course, they may be abbreviated or modified to save time and strain on the vocal cords or whatever. Miaow replaces “myEAAAHwwrr!” or whatever, ditto words like “thud” or “clap”. This is similar to what happens to loan words where the donor language uses sounds not generally used in the recipient language (so anglophones rarely pronounce “Paris” with the guttural “R” of the original, for example). The most frequently used words are generally very short, partly because we don’t have the patience to say “The Person That Is Talking” every time we want to say “I”.
The first “words” are likely to have been motivated by particularly important (or urgent) communication requirements, such as getting attention or warnings of danger. Use of sounds rather than, say, pointing means you don’t rely on them already looking at you, or even being in line of sight.
We see this in the surprisingly sophisticated vocabulary of some animals – they can have different alert calls for snakes vs. birds of prey, for example (which direction you look / run matters!). I would expect that a human (or proto-human) would have similarly used a variety of different sounds in the period before the use of more concrete “words” – perhaps some of our words are 1000th-generation descendants of such sounds…
Referring to various objects (or actions) is a little less “urgent” but still an extremely powerful improvement over no language at all. Onomatopoeia* probably formed a key part of very early language development.
If you want to refer to something that makes a sound of its own, it’s very easy to just use the sound it makes, and that’s easier for a random second caveman to guess the meaning of than some arbitrary sound that needs “explaining” by pointing or miming or whatever. For example, in Mandarin the word for “cat” is pretty much a miaow, “Māo” (as in “Mao Zedong”, making the “[Chairman Miaow](https://images-na.ssl-images-amazon.com/images/I/512FB96B5XL.jpg)” jokes amusingly circular :-P).
The sounds made by objects are also affected by what’s being done with them, so you start to have ways to indicate actions. With nouns and verbs, you’ve got a pretty good start on “language”.
Referring to, say, a stone by the “tok tok” sound it makes when smacking it against another is a pretty simple way to get started here. Perhaps you could use a deeper version of the sound to indicate a larger rock. Perhaps a “shik shik” sound to indicate a sharp rock used for scraping, or a “scrunch!” sound for when it’s used to stove in the skull of a hapless meal.
Less obvious forms of onomatopoeia can also be seen, where a sound represents some *non-sound* characteristic of a thing. See the [Bouba/kiki effect](https://en.wikipedia.org/wiki/Bouba/kiki_effect), for example. The “sharpness” of a sound is near-universally perceived to correspond to the “sharpness” of a shape. I imagine a lot of early words would be formed with reference to this kind of pattern.
“[Metonymy](https://en.wikipedia.org/wiki/Metonymy)” is another multiplier of vocabulary: perhaps the different shapes of rocks used for bashing seeds vs. skinning a wildebeest vs. stoving in its skull (so it objects less to the skinning) could be referred to with sounds that previously indicated the actions, rather than the tool.
If you want to refer to something that doesn’t have a word, but you see some kind of similarity with something you *do* have a word for, you can re-use that word by *metaphor*. Perhaps you use the sound for “rock” to indicate a particularly hard stick, and your co-cavemen realise you don’t actually mean a “rock” but something else *similar*.
Some of these words will be modified to distinguish between different uses, perhaps by combining them with other words – a sound indicating “big” and a sound representing a “cat” might be used together to indicate a “lion”.
If you’re trying to communicate with somebody and you don’t share a language, you’ll go through all the same processes except for the initial invention of sounds to use (you’ll start with your own language’s words), but with a little imagination / pantomiming / pointing etc. you’ll soon start to grasp the meaning of words from the other’s language. Eventually, you’ll both be able to communicate by using whatever bits of each other’s language you’ve managed to learn the meaning of, and a shared repertoire of sounds you both realise are likely to be understood, based on the other’s response to your attempts to use them.
If you want to try a fun experiment, hop over to some random part of the world where nobody speaks your language (and you don’t speak any of theirs) and try to communicate, but refuse to use any English at all. Make up some sounds and use them instead. I’d wager that (with sufficiently kind-hearted and patient locals) you’ll be able to communicate without using either your own language *or* the local language. Congratulations, you are now a caveman 😛
* *so* proud I spelled that right first attempt!