What is Attention Mechanism
Mechanism for neural network to focus on important input parts
Attention Mechanism
Attention Mechanism — a key component of modern neural networks that allows the model to dynamically weigh the importance of different parts of input data.
How It Works
- Computing attention weights for each element
- Query, Key, Value — three computation components
- Weighted sum of values by importance
- Allows model to "look at" relevant parts
Types of Attention
| Type | Description | |------|-------------| | Self-Attention | Attention within single sequence | | Cross-Attention | Attention between different sequences | | Multi-Head | Multiple parallel attention heads | | Sparse Attention | Optimized sparse attention |
Applications
- NLP — machine translation, GPT, BERT
- Computer Vision — Vision Transformer (ViT)
- Multimodal models — CLIP, DALL-E
- Recommendation systems — personalization
Self-Attention Formula
Attention(Q, K, V) = softmax(QK^T / √d_k) × V
Advantages
- Capturing long-range dependencies
- Computation parallelization
- Interpretability through attention weights