How are image generation models trained? Why do they understand prompts and generate images from text?

Anonymous

I’m going to avoid going into detail about neural networks (the innards, so to speak), as that has already been [covered in another answer](https://www.reddit.com/r/explainlikeimfive/comments/1428yqy/comment/jn3nfg3/?utm_source=reddit&utm_medium=web2x&context=3).

There are multiple techniques, but all of them require humans to provide context and meaning first. The computer needs to learn relationships between the shapes, shadows, and details in an image and the words that describe it. That takes a significant amount of data to obtain, because the sample size has to be huge to get even passable results.
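
The answer doesn’t name a specific technique, but one well-known way to learn that image–text relationship is CLIP-style contrastive training: matching (image, caption) pairs are pulled together in a shared embedding space, and mismatched pairs are pushed apart. Here is a minimal PyTorch sketch; the encoders, dimensions, and random “features” are all toy stand-ins, not a real model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a real system would use a vision backbone (CNN/ViT)
# and a text encoder (transformer) instead of single linear layers.
image_encoder = nn.Linear(2048, 256)  # maps image features -> shared space
text_encoder = nn.Linear(512, 256)    # maps caption features -> shared space

def contrastive_step(image_feats, text_feats, temperature=0.07):
    """One CLIP-style step: pull matching (image, caption) pairs
    together, push mismatched pairs apart."""
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    logits = img @ txt.T / temperature   # pairwise similarity matrix
    targets = torch.arange(len(img))     # i-th image matches i-th caption
    # Symmetric cross-entropy: rows score image->text, columns text->image
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
    return loss

# Fake batch of 8 pre-extracted feature vectors
images = torch.randn(8, 2048)
captions = torch.randn(8, 512)
print(contrastive_step(images, captions))
```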

So how do we get those descriptions? We can farm them ourselves, buy them from a private data seller, hire a bunch of people to label images in house, or just scrape the public web (Google Images and the like). Algorithms also collect data from us all the time to improve their visual recognition.
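
To make “descriptions” concrete: whatever the source, the end result is typically millions of image–caption pairs. The field names and URLs below are purely illustrative, not any real dataset’s schema:

```python
# Hypothetical shape of a captioned-image dataset; real web-scraped
# datasets store millions of rows like this.
training_pairs = [
    {"image_url": "https://example.com/cat.jpg",
     "caption": "a tabby cat sleeping on a windowsill"},
    {"image_url": "https://example.com/bike.jpg",
     "caption": "a red bicycle leaning against a brick wall"},
]

for pair in training_pairs:
    print(pair["caption"])
```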

You know those “prove you’re not a robot” tests where you have to select all the squares containing a bicycle or a traffic light? The image you got was probably low-quality and grainy to make automated solving harder, but it was chosen because the algorithm has some uncertainty about it and wants human input.

Google knows where it thinks the bike is, but it’s not 100% confident. By clicking the correct squares, you confirm it. Your answer feeds into a large data set of other people’s answers, and from there the algorithm can confirm or deny its own guesses and evolve as needed. Even when you fail the test, the algorithm still learns: you’ve just told it to be more uncertain about that image.
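
One plausible way to picture that feedback loop (this is a hypothetical sketch, not Google’s actual method) is the system blending its prior confidence in a label with the fraction of humans who agreed:

```python
# Hypothetical confidence update from aggregated human answers;
# the function, weights, and numbers are all illustrative.
def updated_confidence(prior, human_votes):
    """Blend the model's prior belief that a square contains a bicycle
    with the fraction of humans who clicked that square."""
    human_rate = sum(human_votes) / len(human_votes)
    # Simple weighted average: the more votes, the more we trust humans
    weight = len(human_votes) / (len(human_votes) + 10)
    return (1 - weight) * prior + weight * human_rate

# Model was 70% sure; 9 of 12 people clicked the square
print(updated_confidence(0.70, [1] * 9 + [0] * 3))  # -> roughly 0.73
```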

In practice, though, this topic gets far more convoluted and complex.
