how does a transformer in machine learning work?

I know that Answer in Progress on YouTube has a really great explanation with an example (the ‘I taught an AI to make pasta’ video), but I’d like to hear a few more. I’ve tried researching, but they all use very technical terms 😅.

The math and the structure of the model are complicated and not really suited to a meaningful ELI5 answer, but the core idea is approachable.

Transformers are a type of model for dealing with sequential data. The quintessential example is text: if you’re trying to predict what word comes next in a sentence, your model is, in one form or another, using information from the prior words to predict the next one.
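To make “predict the next word” concrete, here’s a deliberately naive sketch in Python (nothing transformer-specific here, and the tiny corpus is made up for illustration): it just counts which word tends to follow which. Real models are vastly more sophisticated, but the job description is the same.

```python
from collections import Counter, defaultdict

# A toy "predict the next word" model (not a transformer!): just count
# which word follows which in a tiny made-up corpus.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent follower of `word` in the corpus.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat' ('cat' follows 'the' twice, 'mat' once)
```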

Before transformers, we often used recurrent neural networks (RNNs) for tasks like this. These models read the sentence one word at a time, updating an internal “memory” at each step. That makes them slow and computationally expensive, because each step has to wait for the one before it, and they struggle to make connections between words that are far apart in a sentence.
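If you want to see why that’s slow, here’s a minimal NumPy sketch of the RNN idea (toy sizes, untrained random weights, all numbers invented for illustration). Each step’s hidden state depends on the previous step’s, so the loop can’t be parallelized, and everything the model remembers has to be squeezed into one fixed-size vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; weights are random and untrained
W_h = rng.normal(scale=0.1, size=(d, d))  # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(d, d))  # input-to-hidden weights

def rnn_encode(word_vectors):
    h = np.zeros(d)                      # the model's entire "memory"
    for x in word_vectors:               # strictly one word at a time:
        h = np.tanh(W_h @ h + W_x @ x)   # each step needs the previous h
    return h                             # one vector has to hold everything

sentence = rng.normal(size=(12, d))      # 12 stand-in word vectors
print(rnn_encode(sentence).shape)        # (8,)
```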

Transformers handle the same task by processing all the information at once. When predicting a word, the model draws directly on information from every previous word, through a mechanism called attention, rather than just the state immediately prior (some forms of RNNs do carry information forward from earlier states, but not in the same way, and without preserving all of it).
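Here’s roughly what that attention step looks like as a minimal NumPy sketch (again toy sizes and untrained random weights; a real transformer stacks many of these layers with a lot of extra machinery around them). Every word is compared against every other word in a single batch of matrix multiplications, with no word-by-word loop.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy dimension; real models use hundreds and learn these weights
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v  # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)        # how relevant is word j to word i?
    weights = softmax(scores)            # each row sums to 1
    return weights @ V                   # per word: a blend of ALL the words

X = rng.normal(size=(12, d))             # a whole 12-word "sentence" at once
print(self_attention(X).shape)           # (12, 8): one new vector per word
```

Because there’s no loop, all of that can run in parallel on a GPU, which is a big part of why transformers took over.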

Think of the text, “Lisa had to do a bunch of stuff and, oh my god, it was just so much. I just can’t believe how much they made her do.” To predict “her”, the key word to analyze is really “Lisa” (that’s how you know the gender). In an RNN, the information from the word “Lisa” would still technically be in the model when predicting what word should go after “made”, but it would be seriously diluted, for lack of a better term. A transformer, by contrast, can look straight back at “Lisa” at full strength, no matter how many words sit in between.
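To make that concrete, here’s a hand-rigged toy (the numbers are planted by hand, not learned from data) where a single feature dimension stands for “refers to a female person”. The query coming from the position where “her” belongs asks for that feature, and the dot-product scores land almost entirely on “Lisa”, regardless of how far back in the sentence it sits.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One hand-planted feature meaning "refers to a female person".
# Real models learn features like this; here it's rigged for illustration.
words =          ["Lisa", "had", "to",  "do",  "stuff", "made"]
keys  = np.array([[5.0],  [0.0], [0.0], [0.0], [0.0],   [0.0]])

query = np.array([5.0])            # "I'm a pronoun: who's the female person?"

weights = softmax(keys @ query)    # dot product of the query with every key
for word, w in zip(words, weights):
    print(f"{word:>6}: {w:.2f}")   # 'Lisa' gets ~1.00, everything else ~0.00
```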
