Saturday, January 24, 2026

Self Attention

 

x → Embedding → MultiHeadAttention (per-head Q/K/V projections to a lower dim) → Concat heads → Project back to embed_dim → Add(x) → LayerNorm → FFN → Add → LayerNorm
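A minimal sketch of this block in PyTorch (the dims d_model=512, n_heads=8, d_ff=2048 are illustrative assumptions, not values from the post; nn.MultiheadAttention performs the per-head projections, concat, and output projection internally):

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Fuses per-head Q/K/V projections, concat, and the output projection
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (Batch, Seq Len, d_model)
        attn_out, _ = self.attn(x, x, x)     # self-attention: Q = K = V = x
        x = self.norm1(x + attn_out)         # Add(x) -> LayerNorm
        x = self.norm2(x + self.ffn(x))      # FFN -> Add -> LayerNorm
        return x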



Vocab to embedding
torch.nn.Embedding(vocab_size, embed_dim)


Batch X Seq Len token ids (conceptually one-hot: Batch X Seq Len X Vocab) → Batch X Seq Len X embed_dim
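A quick shape check (the sizes here are made-up):

import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 512
emb = nn.Embedding(vocab_size, embed_dim)

tokens = torch.randint(0, vocab_size, (2, 16))   # (Batch=2, Seq Len=16) token ids
print(emb(tokens).shape)                         # torch.Size([2, 16, 512])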


PE (positional encoding): Batch X Seq Len X embed_dim, the same shape as the embeddings so the two can be added elementwise
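One common way to build a PE of that shape is the fixed sinusoidal encoding from "Attention Is All You Need"; a sketch assuming that variant, since the post does not name one:

import math
import torch

def sinusoidal_pe(seq_len, embed_dim):
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)       # (Seq Len, 1)
    div = torch.exp(torch.arange(0, embed_dim, 2, dtype=torch.float)
                    * (-math.log(10000.0) / embed_dim))               # (embed_dim / 2,)
    pe = torch.zeros(seq_len, embed_dim)
    pe[:, 0::2] = torch.sin(pos * div)    # even dims get sin
    pe[:, 1::2] = torch.cos(pos * div)    # odd dims get cos
    return pe.unsqueeze(0)                # (1, Seq Len, embed_dim), broadcasts over Batch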



Self Attention - Q, K matrices & attention weight scores
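A single-head sketch of how Q and K are produced from x and turned into scores (dims are illustrative; the 1/sqrt(d_k) scaling follows the standard scaled dot-product formulation):

import math
import torch
import torch.nn as nn

B, T, d_model, d_k = 2, 16, 512, 64
x = torch.randn(B, T, d_model)

W_q = nn.Linear(d_model, d_k, bias=False)   # learned Q projection
W_k = nn.Linear(d_model, d_k, bias=False)   # learned K projection

Q, K = W_q(x), W_k(x)                                # (B, T, d_k) each
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)    # (B, T, T) raw attention scores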


Self Attention - Attention weight softmax example
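A tiny worked example of the softmax step (made-up scores for one query over three keys):

import torch

scores = torch.tensor([2.0, 1.0, 0.1])    # raw scores for one query
weights = torch.softmax(scores, dim=-1)   # exponentiate and normalize to sum to 1
print(weights)                            # tensor([0.6590, 0.2424, 0.0986])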



Self Attention - Attention-weighted features
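Putting the pieces together: the softmaxed weights mix the value vectors into attention-weighted features (a self-contained sketch with the same illustrative dims as above):

import torch
import torch.nn as nn

B, T, d_model, d_k = 2, 16, 512, 64
x = torch.randn(B, T, d_model)
W_q, W_k, W_v = (nn.Linear(d_model, d_k, bias=False) for _ in range(3))

Q, K, V = W_q(x), W_k(x), W_v(x)
attn = torch.softmax(Q @ K.transpose(-2, -1) / d_k**0.5, dim=-1)   # (B, T, T), rows sum to 1
features = attn @ V                                                # (B, T, d_k) weighted sum of values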


Linear Attention
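A sketch of the reordering trick behind linear attention, assuming the kernel feature map phi(x) = elu(x) + 1 of Katharopoulos et al. (2020), since the post does not name a variant. Replacing softmax(QK^T)V with phi(Q)(phi(K)^T V) avoids materializing the T x T score matrix, so cost grows linearly in sequence length:

import torch
import torch.nn.functional as F

B, T, d_k = 2, 16, 64
Q, K, V = (torch.randn(B, T, d_k) for _ in range(3))

phi = lambda x: F.elu(x) + 1                       # positive feature map

kv = phi(K).transpose(-2, -1) @ V                  # (B, d_k, d_k): no (T, T) matrix needed
z = phi(Q) @ phi(K).sum(dim=1, keepdim=True).transpose(-2, -1)   # (B, T, 1) normalizer
out = (phi(Q) @ kv) / z                            # (B, T, d_k) attention-weighted features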


