Like the title says, how have we gone from ChatGPT being the apex of its type (and therefore presumably very complex and rare) to seeing so many clones in so short a time? Wouldn't the code/system that makes up ChatGPT be under copyright, or be difficult to mimic? Wouldn't the amount of data scraping take forever?
> Wouldn’t the code/ system that makes up ChatGPT be in copyright or the code be difficult to mimic
The code is surprisingly simple. OpenAI started as a non-profit research group (it now has a for-profit arm), and they've [published](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) all the initial research that went into GPT. Nowadays you can create an LLM in just a [couple hundred lines of code](https://github.com/karpathy/nanoGPT/blob/master/train.py).
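To give a feel for how small that core really is, here's a toy sketch in PyTorch: a character-level bigram model trained with the same next-token objective a GPT uses. The corpus, hyperparameters, and model are purely illustrative; a real GPT swaps in a stack of transformer blocks, but the surrounding training loop looks much like this.

```python
# Toy next-token language model: illustrative only, not a real GPT.
import torch
import torch.nn.functional as F

text = "hello world, this is a tiny training corpus"  # stand-in for web-scale data
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

vocab_size = len(chars)
# Bigram model: a table of logits for "next character given current character".
logits_table = torch.zeros(vocab_size, vocab_size, requires_grad=True)
optimizer = torch.optim.AdamW([logits_table], lr=0.1)

for step in range(200):
    x, y = data[:-1], data[1:]          # predict each next character
    logits = logits_table[x]            # (N, vocab_size)
    loss = F.cross_entropy(logits, y)   # standard next-token objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```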
What's prohibitive about creating them is the training data and compute resources. GPT-4 cost more than $100 million to train. Getting models that high in quality is really only doable with Microsoft or Google money, but everything's accessible enough that *anyone* can create something that works *reasonably* well for specific uses.
OpenAI also lets anybody build on top of ChatGPT through its API.
Imagine a blank t-shirt store opened its doors and said, "Anybody who wants to buy our blank t-shirts can draw whatever they want on them and do whatever they wish with them."
The underlying t-shirt doesn’t change, but whatever goes on top of it is up to any developer.
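In practice, "drawing on the t-shirt" often just means wrapping a prompt around an API call. A minimal sketch, assuming the official `openai` Python package, an API key in the `OPENAI_API_KEY` environment variable, and an illustrative model name and prompt:

```python
# A thin "product" built on top of someone else's model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def pirate_translator(text: str) -> str:
    """Hypothetical example app: really just a prompt on top of the model."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[
            {"role": "system", "content": "Rewrite everything the user says as a pirate."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(pirate_translator("The quarterly report is due on Friday."))
```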
ChatGPT and other AI models are the results of programs. People have been researching how to best write programs that make AI models for a long time, with some nice breakthroughs in the last decade.
Yet there is nothing magical about ChatGPT other than that we finally got over a threshold. Running the same training on a computer that's five years older would produce an unusable model that barely manages to string five words together. Not because the older computer does anything different, but because it is slower.
Note that at the scale of computing power an AI needs, even small differences in hardware can make a huge difference in the outcome. When training a model already takes several months, doubling the total compute by simply training twice as long isn't practical.
The same goes double for the amount of training data that can be used. Getting a 50 TB SSD today is expensive; getting one 20 years ago was impossible. Even collecting all that data has become easier.
And it goes triple for the size of the model. You can't train a model as big as ChatGPT on a GPU with 8 GB of RAM. So the available training hardware limited the size of the models, and bigger models unsurprisingly work better (brain size matters; try asking a mouse about it!).
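A quick back-of-envelope calculation shows why, assuming a GPT-3-sized model of 175 billion parameters stored as 16-bit floats:

```python
# Why a ChatGPT-scale model doesn't fit on an 8 GB GPU (rough estimate).
params = 175e9            # GPT-3-class parameter count, used here for illustration
bytes_per_param = 2       # fp16/bf16 weights
weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.0f} GB just for the weights")  # ~350 GB

# Training needs several times more memory again (gradients, optimizer state,
# activations), which is why such models are trained across many GPUs at once.
```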
—
So, in short: We got to the point where we have the hardware to train a big enough model with enough data for long enough to get a good result.
Additionally, models don’t scale linearly but appear to have thresholds where they suddenly get better.
It's relatively simple technology. The transformer architecture that GPT is built on was created at Google and published in 2017 in the breakthrough paper "Attention Is All You Need"; OpenAI's own GPT papers followed from 2018 on. Meta (Facebook) released its open-source LLaMA models right after the ChatGPT surge. Everything else is training, data collection, and fine-tuning. It's expensive, but nothing here is especially complex.
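For a sense of how small the core idea is, here is the scaled dot-product attention from "Attention Is All You Need", written out in plain PyTorch (shapes and sizes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # how strongly each token attends to each other token
    weights = F.softmax(scores, dim=-1)
    return weights @ v                               # weighted mix of value vectors

q = k = v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```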