What Are Embeddings
Vector representations of data for ML
Embeddings are numerical vectors that represent objects (words, images, users) in a multidimensional space so that similar objects are located near each other.
Types of Embeddings
- Text — Word2Vec, GloVe, FastText, BERT embeddings
- Sentence — Sentence-BERT, Universal Sentence Encoder
- Image — ResNet features, CLIP embeddings
- User/product — for recommendation systems
- Graph — Node2Vec, GraphSAGE for network data
Key Properties
- Semantic similarity — similar objects are close in space
- Vector arithmetic — king - man + woman ≈ queen (holds approximately in word-embedding models such as Word2Vec, not as an exact equality)
- Dimensionality — typically between 128 and 1536 dimensions, depending on the model (for example, 768 for BERT-base)
- Cosine similarity — a common metric for comparing vectors: the cosine of the angle between them, ranging from -1 (opposite) to 1 (same direction)
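The properties above can be illustrated with a minimal Python sketch. The three-dimensional vectors below are hand-picked, hypothetical values chosen only to make the arithmetic easy to follow; real embeddings come from a trained model and have far more dimensions. The function names `cosine_similarity` and `analogy` are illustrative, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors (assumption: not produced by any real model).
# The axes loosely stand for: royalty, maleness, femaleness.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

# king - man + woman lands nearest to "queen" in this toy space.
result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With these toy values the analogy resolves to "queen" because the result vector points in almost the same direction as the "queen" vector. In real models the match is only approximate, and the input words are usually excluded from the candidates, as done here.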
Business Applications
- Semantic search — search by meaning, not keywords
- Recommendations — "similar products", "you might like"
- Chatbots — RAG (retrieval-augmented generation) systems that answer questions from a knowledge base
- Clustering — automatic content grouping
- Duplicate detection — finding similar documents and images
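Several of these applications reduce to the same operation: comparing vectors and ranking or thresholding by similarity. The sketch below shows semantic search and duplicate detection in Python under stated assumptions: the document vectors and the query vector are hypothetical hand-written values standing in for the output of an embedding model, and the 0.95 threshold is an arbitrary illustrative choice, not a recommended setting.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed document embeddings (assumption: in practice
# these would come from an embedding model, not be written by hand).
doc_vectors = {
    "refund policy":         [0.9, 0.1, 0.0],
    "how to return an item": [0.8, 0.2, 0.1],
    "shipping times":        [0.1, 0.9, 0.1],
    "careers page":          [0.0, 0.1, 0.9],
}

def search(query_vector, vectors, top_k=2):
    """Semantic search: rank documents by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, v), name) for name, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

def near_duplicates(vectors, threshold=0.95):
    """Duplicate detection: return document pairs whose similarity meets the threshold."""
    names = sorted(vectors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(vectors[a], vectors[b]) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical embedding of a query such as "can I get my money back?"
query = [0.9, 0.1, 0.05]
top = search(query, doc_vectors)
dups = near_duplicates(doc_vectors)
print(top)
print(dups)
```

The query shares no keywords with "refund policy", yet that document ranks first because the vectors point in nearly the same direction. This brute-force comparison is fine for a handful of documents; larger collections typically rely on an approximate nearest-neighbor index instead.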