ELI5: Why is DALL-E special, and how do we know it isn’t just doing a Google image search behind the scenes and mixing the results together?


We know it isn’t doing a Google search for a very simple reason: some prompts yield vastly different results from what Google returns, sometimes to the point of missing the mark completely. I ran into this with video game characters who come from lesser-known games but are still the main character of their game, prominent enough that Google finds them without the game being named, while DALL-E returns something completely different. Try searching Google for “Lotus Real Form”, then try the same words in DALL-E. Google will likely prioritize the Warframe character, while DALL-E will give you a flower.

This is because DALL-E was trained on content taken off the internet and paired with text labels, and its output tends to show one related object per distinct noun in the prompt (or at least, that is what it seems to have landed on, as far as my testing of its limits has shown ***me***).

The artwork is also generally distinct from what Google Image Search returns: it is usually not just a “remix” of a few images, tried 9 times from a limited set. Rather, it seems to generate a new image by basing the components on what it picked up from its training data, using copy/paste-like functions as little as possible. It also seems to generate the images upon request rather than storing preset results for popular prompts: I tried “Mario Jumping Around” 3 times, a couple of hours apart, and received 27 different images, or at least images different enough that I couldn’t pinpoint any saved ones.
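To make the reasoning behind that “Mario Jumping Around” test concrete, here is a toy sketch in Python. Everything in it is hypothetical (the `STORED` cache, `lookup`, `generate`, and the string “images”); it only illustrates why 3 requests of 9 images each would give 9 distinct results from a system serving stored images, but 27 from one that builds each image from fresh randomness. It says nothing about DALL-E’s real internals.

```python
import random

# Hypothetical cache of 9 pre-made results for one popular prompt.
STORED = {"mario jumping around": [f"stored_image_{i}" for i in range(9)]}


def lookup(prompt):
    """Retrieval-style system: the same prompt always returns the same stored set."""
    return STORED[prompt.lower()]


def generate(prompt, rng, n=9):
    """Generation-style system: each 'image' is derived from fresh random noise."""
    return [f"{prompt.lower()}#{rng.getrandbits(64):016x}" for _ in range(n)]


def distinct_over_requests(fetch, requests=3):
    """Make several requests and count how many distinct results came back."""
    seen = set()
    for _ in range(requests):
        seen.update(fetch())
    return len(seen)


rng = random.Random(42)
retrieved = distinct_over_requests(lambda: lookup("Mario Jumping Around"))
generated = distinct_over_requests(lambda: generate("Mario Jumping Around", rng))
print(retrieved, generated)
```

A real test is fuzzier, of course, since “different” has to be judged by eye, which is why I can only say the images were different enough.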

Now, though, there is a word I’ve been using this whole time when talking about what DALL-E actually does (as opposed to how it was trained). A word you may have noticed and been curious about, or a word you may have ignored: ***“Seems”.***

DALL-E is an AI platform, and AI platforms are trained towards a goal that we humans tend to think of as objective and unbiased. But give 10 machines the same starting code, the same set of data to train upon, and the same “objective” goal, and you’ll end up with 10 vastly different AIs. DALL-E was likely the end result of many different “generations” of AI, with controlled and highly subjective judgments deciding which one counted as “the best one”. The simplest way to explain it is this:

Give 10 machines the same starting AI code, access to the same set of data, the same goal, and the same amount of time to reach that goal. This is Generation 1 (G1, for short). At the end of the time period, you ask each G1 machine to produce something, the same request for all 10, and you pick the one whose output is closest to what you want. That machine becomes your base for the next lot.

You take that “winning” G1 machine’s code, use it as the new starting AI code on all 10 machines, and repeat the training: same data set, same goal, same training time. This is Generation 2 (G2). At the end of the period, you repeat the test that you did at the end of G1, and pick a new winner.

Repeat until the result is close enough to, or highly proficient at, the goal you had in mind. This process may take thousands of generations, and could use thousands of machines per Generation, though I suspect that, to keep it manageable, the number of machines per Generation was below 100.

By G2, we already have no real idea what the code does, since it has essentially been modifying itself in ways that we can’t always trace back properly or easily.
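For the curious, the G1/G2 loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the “AI code” is just a short list of numbers, “training” is random trial and error, the “goal” is a fixed target list, and every name here (`score`, `train`, `run_generations`) is made up for illustration. It shows the pick-a-winner-and-repeat idea only, and is not a claim about how DALL-E was actually trained.

```python
import random


def score(params, target):
    """Higher is better: negative squared distance from the goal."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def train(start, target, steps, rng):
    """Hypothetical 'training': random tweaks, keeping only the ones that help."""
    params = list(start)
    for _ in range(steps):
        i = rng.randrange(len(params))
        candidate = list(params)
        candidate[i] += rng.gauss(0, 0.1)
        if score(candidate, target) > score(params, target):
            params = candidate
    return params


def run_generations(start, target, machines=10, generations=5, steps=50, seed=0):
    """Every generation: all machines start from the last winner, train
    independently, and the best result becomes the next starting point."""
    rng = random.Random(seed)
    best = list(start)
    history = [score(best, target)]
    for _ in range(generations):
        results = [
            train(best, target, steps, random.Random(rng.randrange(2**32)))
            for _ in range(machines)
        ]
        best = max(results, key=lambda p: score(p, target))
        history.append(score(best, target))
    return best, history


best, history = run_generations([0.0, 0.0, 0.0], [1.0, -2.0, 0.5])
print(best)
print(history)
```

Because each machine gets its own random seed, the 10 machines end up in different places even though they all start from identical “code”, and the winner’s score never gets worse from one generation to the next.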

Now, why is this important to understand? Because art, like what DALL-E generates, is highly subjective. Training an AI to roughly guesstimate what is artistically passable (or better), and fit to be presented as “art”, means giving it the ability to be “objectively subjective”, a concept that tends not to exist in AI. Biases are a thing, but overall, ***DALL-E manages to provide what is subjectively artistic.*** Now, it may be displeasing art. It may come across as abstract at best. Or it could simply be splotches of color in the vague shape of what you are seeking. But it is art, made with decent skill, nonetheless.

I wrote [this post](https://www.reddit.com/r/bigsleep/comments/u08sjh/how_openais_dalle_2_works_explained_at_the_level/) about how DALL-E 2 works technically, intended for the layperson; included is a link from one of the co-creators of DALL-E 2.

[Here](https://www.reddit.com/r/dalle2/comments/v1sc2z/kermit_the_frog_through_the_ages/) is a post with 20 DALL-E 2 images of Kermit the Frog in the style of various movies, any of which you can try searching for evidence of previous existence using image search engines such as [these 4](https://www.bellingcat.com/resources/how-tos/2019/12/26/guide-to-using-reverse-image-search-for-investigations/).
