How did new large language models come out so quickly after ChatGPT was unveiled?

How does ‘cutting edge’ technology suddenly get implemented by other parties soon after it is unveiled? For example, ChatGPT came out as an amazing new technology that was like nothing we had ever seen, and then right after, we heard that other parties were coming up with their own versions of ChatGPT. If the technology was so secret, then how come everyone else started coming up with their own version so soon?

18 Answers

Anonymous 0 Comments

One of the most fascinating things throughout history is that a large number of inventions seem to have been invented in parallel, often by people who didn’t know of each other’s work.

It’s the meaning behind the phrase “Every idea has its day”: when conditions are right, many people seem to have the same idea at the same time (or manage to realise an existing idea at the same time).

Anonymous 0 Comments

The data has been around for a while: every Google search query provided text, and for natural language data, there is a reason Google Voice was always free.

Anonymous 0 Comments

Like you are five?

Imagine you see someone build a really cool sandcastle at the beach, and all the adults are clapping around the castle. You already have buckets and shovels, and you’ve made small sandcastles before. So, you quickly focus on building your own cool sandcastle, maybe adding a moat or a tower to make it special. You have to use your own buckets, but as long as you have enough sand on your patch of beach, you can build a big castle. Big companies did the same thing; they saw ChatGPT and quickly made their own versions because they already had the tools and enough data.
In this analogy the tools are accessible but the sand is not, the same way that the techniques are open but the data is not.

Anonymous 0 Comments

There was a key paper about attention released in 2017. Basically everyone apart from Google realised it was a big thing, so lots of companies started building LLMs of a similar type after that paper.

>Attention Is All You Need
>
>https://arxiv.org/abs/1706.03762
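For reference, the core operation that paper introduced, scaled dot-product attention (softmax(QKᵀ/√d_k)·V), is small enough to sketch in a few lines of NumPy. This is just a toy single-head illustration of the formula, not the full multi-head Transformer:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from "Attention Is All You Need":
    softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # how much each query "attends" to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                            # weighted average of the value vectors

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

The point of the post above stands: the formula itself fits on a napkin; the expensive part is stacking it into a huge model and training it on a huge dataset.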

Anonymous 0 Comments

I think everyone here covers a point, but I want to add that Facebook had their LLM leaked to the public, and the open source community made a great deal of improvements on it, which sped some things up.

Anonymous 0 Comments

The general algorithm isn’t super hard. The trick is spending a butt ton of money to run it on a huge dataset. No one was bothering until ChatGPT made it a big deal; then everyone went, “oh, yeah, let’s spend the money.”

Anonymous 0 Comments

It is simply that research is always years ahead of any consumer product.

OpenAI invested tons of money in letting ChatGPT make predictions for free. This costs a lot in server infrastructure, but it was such good marketing that they risked it, and it paid off. Other companies already had research LLMs and just rushed to release their “consumer friendly” versions.

Anonymous 0 Comments

This is a really common phenomenon.

Someone invented a telephone? Great! Turns out a dozen people were all inventing the telephone at the same time. Same as with most other breakthroughs: they are just the *final* piece of the puzzle. The inventor gets the credit, but they are standing on the shoulders of ever-growing giants. That giant grows & suddenly anyone looking can see something new on the horizon.

It’s truly rare for anything revolutionary to come out of left field; what generally happens is lots of pieces fall into place & something that was impossible (or impossibly expensive) suddenly becomes viable.

For LLMs it was a lot of factors, an important one being GPUs. They were designed for games & that industry funded them for 30 years. They do lots of small operations in parallel with insane throughput, & eventually they reached critical mass, which enabled lots of new technology.

Once we had this cheap, fast & specialized compute we found lots of cool things to do with it.

… The thing that bums me out is that the math used for ray tracing doesn’t seem to have any useful applications outside of ray & wave tracing. There is a rumor that AMD has an algorithm which uses massive matrix multiplications, which *is* useful math, & *may* build out silicon to perform them.