Vasu’s Newsletter • 78 implied HN points • 25 Jan 26
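- Each token is turned into query, key, and value vectors by learned linear projections: the query expresses what the token is looking for, the key is what other tokens' queries are matched against, and the value carries the information that gets gathered.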
- Each token compares its query with every key (a dot product, typically scaled by the square root of the key dimension) to get raw scores. Softmax converts those scores into attention weights that are non-negative and sum to 1, and the weighted sum of the value vectors becomes the token's new contextual vector.
- Self-attention makes token meanings contextual, which helps with pronoun resolution, word-sense disambiguation, and long-range dependencies. Models run multiple attention heads in parallel, each able to capture a different pattern of relations, and follow attention with feed-forward layers that refine each token's representation.
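The projection, scoring, softmax, and weighted-sum steps can be written compactly. This assumes the standard Transformer formulation, which the summary does not spell out. The symbols are not from the original: $X$ stacks the token embeddings as rows, $W_Q, W_K, W_V$ are learned projection matrices, and $d_k$ is the key dimension.

$$
Q = XW_Q,\qquad K = XW_K,\qquad V = XW_V
$$

$$
\alpha_{ij} = \frac{\exp\!\left(q_i \cdot k_j / \sqrt{d_k}\right)}{\sum_{l} \exp\!\left(q_i \cdot k_l / \sqrt{d_k}\right)},\qquad
z_i = \sum_{j} \alpha_{ij}\, v_j
$$

In matrix form this is $\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(QK^\top / \sqrt{d_k}\right)V$, with softmax applied row by row. Row $i$ of the result is $z_i$, the new contextual vector for token $i$.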
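The following is a minimal single-head self-attention sketch in plain Python, with no external libraries. It makes the query/key/value steps concrete. The projection matrices and the toy embeddings are hypothetical values chosen for illustration. Real models learn these weights and use optimized tensor libraries. Every token attends to every key here, with no causal mask.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(scores):
    """Numerically stable softmax over one row of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: one embedding row per token (n x d_model).
    w_q, w_k, w_v: hypothetical projection matrices (d_model x d_k).
    Returns (outputs, weights): one contextual vector per token and the n x n attention matrix.
    """
    q = matmul(x, w_q)  # what each token is looking for
    k = matmul(x, w_k)  # what each token offers for matching
    v = matmul(x, w_v)  # the information each token passes on
    d_k = len(k[0])
    outputs, weights = [], []
    for qi in q:
        # Raw scores: dot product of this query with every key, scaled by sqrt(d_k).
        raw = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        w = softmax(raw)  # attention weights: non-negative, summing to 1
        # New contextual vector: weighted sum of all value vectors.
        out = [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy usage: three 2-d token embeddings and identity projections (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
```

With identity projections, token 0's query matches keys 0 and 2 equally well and key 1 less well. It therefore puts equal, larger weights on tokens 0 and 2 and a smaller weight on token 1.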
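The multi-head and feed-forward stages can be sketched the same way. This assumes the common Transformer layout: each head has its own projections, head outputs are concatenated per token and mixed by an output matrix `w_o`, and a position-wise feed-forward network is applied to each token separately. All weights below are hypothetical toy values. Residual connections, layer normalization, and biases are deliberately omitted to keep the sketch short, although real Transformer blocks include them.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def attention_head(x, w_q, w_k, w_v):
    """One scaled dot-product attention head; returns one vector per token."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    out = []
    for qi in q:
        raw = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        m = max(raw)
        exps = [math.exp(r - m) for r in raw]
        total = sum(exps)
        w = [e / total for e in exps]  # softmax attention weights
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out

def multi_head(x, heads, w_o):
    """Run each head independently, concatenate outputs per token, then mix with w_o."""
    per_head = [attention_head(x, *h) for h in heads]
    concat = [[val for head_out in per_head for val in head_out[i]] for i in range(len(x))]
    return matmul(concat, w_o)

def feed_forward(x, w1, w2):
    """Position-wise FFN applied to each token on its own: ReLU(x W1) W2 (biases omitted)."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

# Toy usage (hypothetical weights): two heads, each projecting d_model=2 down to d_k=1.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_a = ([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]])  # reads only feature 0
head_b = ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]])  # reads only feature 1
w_o = [[1.0, 0.0], [0.0, 1.0]]  # maps the 2 concatenated head dims back to d_model=2
mixed = multi_head(x, [head_a, head_b], w_o)
refined = feed_forward(mixed, [[1.0, -1.0], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]])
```

Because each toy head sees a different slice of the embedding, the two heads produce different attention patterns for the same tokens. The feed-forward step then transforms each token's mixed vector without looking at other tokens.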