How did new large language models come out so quickly after chatgpt was unveiled?

745 views

How does ‘cutting edge’ technology suddenly get implemented by other parties soon after new technology is unveiled…for example. chatgpt came out as an amazing new technology that was like nothing we ever saw, and then right after, we hear that other parties are coming up with their own version of chatgpt…If the technology was so secret, then how come everyone else just started coming up with their own version so soon?

In: 506

18 Answers

Anonymous 0 Comments

OpenAI’s models are priviate, but the methods for creating a large language model are mostly public. The challange isn’t inventing the method; the challenge is training it and running it.

To do that, you need lots of high-powered computers, skilled programmers, and *tons and tons* of data. Those are all things that companies like Google, Facebook/Meta, Snapchat, Microsoft, etc. all have access to already. Once it became apparent to them that demand for LLMs was high, it was just a matter of shuffling their resources around to make it happen.

Also, a lot of AI — such as Bing Chat — are actually just GPT with pre-prompts or minor tweaks.

Anonymous 0 Comments

They started around the same time. It takes a while to train a model, so when these large language models became possible, everyone started, and chatGPT was unveiled first as openAI has some of the most resources invested

Anonymous 0 Comments

i’m going to guess a bunch of small tweaks (creative) that worked well on similar-related language tasks.

Anonymous 0 Comments

Google has actually had Bard ready for a few years, but wasn’t willing to risk reputational damage (in the unveiling, Bard gave an incorrect answer to a question and it knocked billions off their market cap) but once there was competition they had to do it.

Anonymous 0 Comments

History is full of people inventing the same things at the same time. Isaac Newton and Gottfried Leibniz invented (discovered?) calculus independently around the same time. It’s because the ideas, math, techniques and technology behind the invention are ready and sharp minds see the possibilities.

So, don’t hoard that idea you have. Someone else will get to it. Get started now even if you’re not ready.

Anonymous 0 Comments

Those technologies are never « new », because R&D are always working on it. They’re just « ready » at a certain point in time. Also, even if everybody has their own product ready, it may be a good idea to let a competitor take the risk first, see how the public react and then launch your own product taking into account the feedback. If you haven’t noticed, exactly the same thing happened with multi-blades razors. Once the first 2-blades came out, every-body had their own 2-blades ready. Then 3-blades, 4…(are we at 5 now?). You may think « surely, 4 blades is out-of-the-box thinking », but I guarantee you these people have been experimenting with 3-10 blades from the moment 2 blades became popular, and they’re just waiting for the market to be ready.

Anonymous 0 Comments

TL;DR: It was never secret, just expensive, the “hype” pushed academia and business to pour money into this.

So, this turned out to not really be a ELI-5, but more like ELI-CS-Undergrad, but here goes:

First, to get some definitions out of the way

* Model architecture – the complete description of how a model is built (e.g. a picture of how the data flows through it input to output). Having the architecture will allow you to *train* the model.
* Model weights (aka model parameters) – specific numbers that control how the model changes the input data into the output. Having the weights (and the architecture) allows you to *use* the model.
* Training – setting the model parameters in such a way that the model fits the data you have well (for generative LLMs this means predicting the next word in the text well).

Watch [this series](https://www.youtube.com/watch?v=aircAruvnKk) from 3b1b for a much better introduction than I could ever do. With that out of the way, back to LLMs.

The underlying technology behind ChatGPT, GPT in general, and other language models (including [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)) and friends) has been public for years / decades. They are based on the transformer architecture ([2017](https://arxiv.org/abs/1706.03762)), which is based on the attention mechanism ([2015](https://arxiv.org/abs/1409.0473)), which itself was designed to alleviate some problems with training [LSTMs](https://en.wikipedia.org/wiki/Long_short-term_memory) ([1997](https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory?redirectedFrom=fulltext)) and other such models (e.g. [GRUs](https://en.wikipedia.org/wiki/Gated_recurrent_unit)).

ChatGPT itself was not really secret technology either, it (was initially) based on GPT-3.5, which is essentially GPT-3 with some inference and tuning tricks built in. [GPT-1](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf), [GPT-2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GPT-3](https://arxiv.org/abs/2005.14165) all have their architectures public. We have model weights (essentially, the stuff we need to run the models) public, we do not have it for GPT-3 or GPT-4.

We do not have architecture for GPT-4 public, and while it kinda does make it secret, there have been [leaks](https://twitter.com/swyx/status/1671272883379908608) saying that it’s an [MoE model](https://machinelearningmastery.com/mixture-of-experts/) with GPT-3 like transformers as experts.

Other open source models, like [Falcon](https://arxiv.org/abs/2306.01116) (this mostly describes training, but publishing weights implies publishing architecture) and [LLaMA](https://arxiv.org/abs/2302.13971), also have both their architectures and weights public, and have performance comparable to GPT-3.5.

Basically, the challenge here was not the technology, but amassing enough data and computational power to train these models. With compute becoming cheaper over time (especially after the chip shortage ended), and with people creating and sharing these extremely large datasets ([here’s](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) the dataset for Falcon-180B for example), it became easier for people with money to train these models. The incentives here are also quite clear, it brings fame for academics, and the ability to control your own models for companies, so both are ready to pay / work on them.

Additionally, fine-tuning these models (basically making them tailored to a specific) task is possible on commodity hardware now, so there will be more model variants popping up all over the world.

Anonymous 0 Comments

Long story short: Everyone was working on their models. Once OpenAI released a model that looked like it was actually working, everyone panicked and released their equally half baked versions. Turns out OpenAI’s really isn’t that good after all, and neither was the competition.

Anonymous 0 Comments

I chose my undergrad dissertation on the subject a few months before chat-gpt was released. For the most part, the technology is a few years old. June 2017 is when the most important, general paper was published

Anonymous 0 Comments

As others have noted, the technology is mostly not secret. If you want a good introduction, see [The generative AI revolution has begun—how did we get here?](https://arstechnica.com/gadgets/2023/01/the-generative-ai-revolution-has-begun-how-did-we-get-here/)