How does ‘self-attention’ work in transformer models?

I’m currently diving into machine learning, and I’m trying to wrap my head around the concept of “attention” in transformer models. I’ve been reading papers and documentation, but I’m still struggling to fully grasp it.

**My Struggle:**

I get that attention involves taking the dot product of “query” and “key” vectors to score how relevant each word in a sequence is to every other word, but I don’t quite understand why this multiplication gives us a meaningful measure of importance.
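To make the step I’m describing concrete, here’s a minimal NumPy sketch of scaled dot-product attention as I currently understand it. The function and variable names are my own, loosely following the “Attention Is All You Need” formulation, so treat it as an illustration rather than a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    # scores[i, j] is the dot product of query i with key j: it is large
    # when the two vectors point in a similar direction.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = softmax(scores, axis=-1)
    # Each output position is then a weighted average of the value vectors.
    return weights @ V

# Toy example: 3 tokens with 4-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

If that sketch is roughly right, my confusion is specifically about why `Q @ K.T` is a sensible similarity score in the first place.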

**What I’m looking for:**

I’m comfortable with a moderate level of technical detail, but I’d like deeper insight into the inner workings and rationale behind these mechanisms. Please share any insights, analogies, or technical details that can shed light on this concept.

Thanks a bunch!

Anonymous:

You may have more luck on an ML-specific subreddit. This is an extremely technical question that requires specialist knowledge even to understand what you’re asking.