
Embeddings

Embeddings as representations for semantic search, RAG, recommendations, multimodal retrieval, and language systems.

Related Wiki Pages

Vector Databases, Search, Retrieval-Augmented Generation, Multimodal LLMs, NLP

Embeddings are numerical representations of text, images, users, products, or other objects. They let a system compare meaning or behavior by distance in a shared vector space instead of relying only on exact word matches or hand-written rules. This page covers what gets embedded, how systems use the resulting vectors, and what can go wrong when a representation doesn’t preserve the distinctions a task needs.

Embeddings sit behind semantic search, vector databases, retrieval-augmented generation (RAG), recommendation systems, and multimodal retrieval.

In production ML systems and weak-supervision workflows alike, embeddings are a representation layer, not the whole product: generating them is a separate concern from storage, ranking, evaluation, citations, and business logic. For choosing a retrieval method, see Vector Search vs Keyword Search. For the infrastructure boundary, see Vector Database vs Search Engine.

Representation Space

A search system can map queries and searchable items into the same representation space, so retrieval can find items with similar meaning even when the words differ (^[1]). Computing vectors is separate from storing them: the embedding model that produces vectors is distinct from the database that stores and searches them.
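A minimal sketch of "similar meaning, different words" in a shared space, assuming the vectors below were produced by some embedding model. They are hypothetical hand-written 3-d values, not real model output, and the item texts are illustrative. Cosine similarity is one common closeness measure; the source does not specify which one any particular system uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-d vectors standing in for model output; real models use
# hundreds of dimensions. The first two items share no words with the query
# but sit close to it in the vector space.
items = {
    "refund policy for damaged orders": [0.9, 0.1, 0.0],
    "how to return a broken item": [0.8, 0.2, 0.1],
    "summer sale on garden furniture": [0.0, 0.3, 0.9],
}
query = [0.85, 0.15, 0.05]  # hypothetical vector for "send back a faulty product"

# Rank every item by similarity to the query, most similar first.
ranked = sorted(items, key=lambda text: cosine(query, items[text]), reverse=True)
print(ranked)
```

A keyword matcher would score all three items at zero for this query; in the vector space the two return-related items rank above the unrelated one.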

A transcript-chatbot example uses the same representation idea in a retrieval system: transcripts are split into overlapping chunks, which are embedded and stored as vectors (^[2]). The embedding model creates the representation, and the vector database retrieves nearby vectors. The application still needs prompts, references, and evaluation.
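The split between the embedding model and the vector store can be sketched as two independent pieces joined only by a function argument. This is a minimal illustration, not the transcript chatbot's actual code: `hashed_bow_embed` is a hypothetical toy embedder (a hashed bag of words) standing in for a real model, `InMemoryVectorStore` is a hypothetical stand-in for a vector database, and the chunk ids and texts are made up.

```python
import math
import re
import zlib
from typing import Callable

Vector = list[float]


def hashed_bow_embed(text: str, dim: int = 1024) -> Vector:
    """Toy stand-in for an embedding model: a hashed bag of words, L2-normalized.

    It only matches shared words; a real model would also place paraphrases nearby.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryVectorStore:
    """Stores vectors and returns nearest neighbors; it never calls the model."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[Vector] = []

    def add(self, item_id: str, vector: Vector) -> None:
        self.ids.append(item_id)
        self.vectors.append(vector)

    def search(self, query: Vector, k: int) -> list[tuple[str, float]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (item_id, sum(q * v for q, v in zip(query, vec)))
            for item_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def index_chunks(chunks: dict[str, str], embed: Callable[[str], Vector],
                 store: InMemoryVectorStore) -> None:
    """Compute and storage stay separate: any embed function can be plugged in."""
    for chunk_id, text in chunks.items():
        store.add(chunk_id, embed(text))


chunks = {
    "ep1-c1": "we discussed feature stores and data pipelines",
    "ep1-c2": "the guest explained vector databases and embeddings",
    "ep2-c1": "questions about career changes into data engineering",
}
store = InMemoryVectorStore()
index_chunks(chunks, hashed_bow_embed, store)
hits = store.search(hashed_bow_embed("what are vector databases"), k=1)
print(hits)
```

Swapping in a real embedding model means replacing only the `embed` argument; the store, and whatever prompts and evaluation sit on top, stay unchanged. The query must be embedded with the same function used at indexing time.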

Marcello La Rocca connects this representation layer to the underlying nearest-neighbor problem. Once items, users, or images become vectors, search becomes a matter of finding nearby points in a multi-dimensional space. Exact search can become too costly as dimensionality grows, so approximate nearest-neighbor structures and libraries such as Faiss trade a small amount of optimality for faster candidate retrieval (^[3]).
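The exact-versus-approximate trade-off can be illustrated with random-hyperplane locality-sensitive hashing, one simple approximate nearest-neighbor technique. This is a teaching sketch, not how Faiss works internally (Faiss ships its own index types, such as IVF and HNSW variants). The class name, plane count, and random data are assumptions chosen for illustration.

```python
import math
import random

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query: Vector, vectors: list[Vector], k: int) -> list[int]:
    """Brute force: scores every stored vector, so cost grows with N * dimensions."""
    return sorted(range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


class RandomHyperplaneLSH:
    """Approximate index: vectors on the same side of every random hyperplane share a bucket."""

    def __init__(self, dim: int, n_planes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: dict[tuple[bool, ...], list[int]] = {}
        self.vectors: list[Vector] = []

    def _key(self, v: Vector) -> tuple[bool, ...]:
        return tuple(dot(plane, v) >= 0.0 for plane in self.planes)

    def add(self, v: Vector) -> None:
        self.buckets.setdefault(self._key(v), []).append(len(self.vectors))
        self.vectors.append(v)

    def bucket_size(self, query: Vector) -> int:
        return len(self.buckets.get(self._key(query), []))

    def search(self, query: Vector, k: int) -> list[int]:
        # Only the query's bucket is scored: much less work, but true neighbors
        # that fell into other buckets are missed.
        candidates = self.buckets.get(self._key(query), [])
        return sorted(candidates, key=lambda i: dot(query, self.vectors[i]), reverse=True)[:k]


def random_unit_vector(rng: random.Random, dim: int) -> Vector:
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


rng = random.Random(42)
data = [random_unit_vector(rng, 16) for _ in range(2000)]
index = RandomHyperplaneLSH(dim=16, n_planes=8)
for vec in data:
    index.add(vec)

query = data[7]
exact = exact_top_k(query, data, k=5)
approx = index.search(query, k=5)
scanned = index.bucket_size(query)
print(f"scanned {scanned} of {len(data)} vectors; recall@5 = {len(set(exact) & set(approx)) / 5}")
```

More hyperplanes give smaller buckets and faster search but lower recall. That knob is the same "small amount of optimality for speed" trade described above.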

In production LLM systems, vector databases combine embeddings, indexing, and semantic search (^[4]). Retrieval suits knowledge that changes, while fine-tuning changes model behavior or style. RAG vs Fine-Tuning expands on this boundary.

Semantic Search

Keyword matching can be too brittle when users express the same intent with different language. Search teams may then use vector search as a semantic candidate-generation method (^[1]). Embeddings provide the shared representation that makes that method possible, while Vector Search vs Keyword Search covers the lexical-versus-semantic retrieval comparison.

Candidate generation is separate from ML ranking (^[1]). A vector match finds plausible candidates, but the ranking stage still decides which result comes first, trading semantic similarity against freshness and popularity.
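A minimal sketch of that second stage, assuming vector search has already returned candidates with a similarity score. The linear blend, the weights, the 30-day freshness half-life, and the field names are all hypothetical; production rankers are often learned models rather than hand-tuned formulas.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    similarity: float  # cosine similarity from the vector-search stage
    age_days: float    # hypothetical freshness signal
    popularity: float  # hypothetical popularity, normalized to [0, 1]


def rank(candidates: list[Candidate], w_sim: float = 0.6, w_fresh: float = 0.25,
         w_pop: float = 0.15, half_life_days: float = 30.0) -> list[Candidate]:
    """Second stage: blend semantic similarity with product signals.

    Freshness decays exponentially, halving every `half_life_days`.
    """
    def score(c: Candidate) -> float:
        freshness = 0.5 ** (c.age_days / half_life_days)
        return w_sim * c.similarity + w_fresh * freshness + w_pop * c.popularity

    return sorted(candidates, key=score, reverse=True)


# Stage 1 output (hypothetical): the closest vector match is a year-old, little-viewed page.
candidates = [
    Candidate("old-exact-match", similarity=0.95, age_days=365, popularity=0.1),
    Candidate("fresh-close-match", similarity=0.85, age_days=2, popularity=0.8),
]
print([c.item_id for c in rank(candidates)])
```

With these weights the fresher, more popular item wins despite its lower similarity; with the freshness and popularity weights set to zero, pure vector order returns.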

Metadata, behavior, query-time weights, and business rules still matter. Alongside filters and recency, embeddings are one signal in production search and are evaluated together with the others, not used as a substitute for product ranking (^[1]).
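Hard metadata filters can be sketched as a step that runs before similarity scoring. The catalog, the `in_stock` and `region` fields, and the region codes are hypothetical; real vector databases and search engines expose their own filter syntax, which is not shown here.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical catalog: unit vectors plus metadata that the embedding does not encode.
catalog = [
    {"id": "a", "vec": [1.0, 0.0], "in_stock": False, "region": "PL"},
    {"id": "b", "vec": [0.8, 0.6], "in_stock": True, "region": "PL"},
    {"id": "c", "vec": [0.6, 0.8], "in_stock": True, "region": "PL"},
    {"id": "d", "vec": [0.96, 0.28], "in_stock": True, "region": "DE"},
]


def filtered_search(query: Vector, items: list[dict], k: int, region: str) -> list[str]:
    """Apply hard metadata filters first, then rank the eligible items by similarity.

    An item that fails a business rule never becomes a candidate, however close its vector is.
    """
    eligible = [it for it in items if it["in_stock"] and it["region"] == region]
    eligible.sort(key=lambda it: dot(query, it["vec"]), reverse=True)
    return [it["id"] for it in eligible[:k]]


print(filtered_search([1.0, 0.0], catalog, k=2, region="PL"))
```

Item `a` is the closest vector but is out of stock, so it never appears. Filtering before the top-k cut matters: filtering after a top-k vector search can leave fewer than k results.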

The architecture choice, plug-and-play vector search versus vector support inside an existing search system, is separate from the representation choice (^[2]). It is the same boundary covered in Vector Database vs Search Engine. Teams can treat the embedding model, vector storage, and search application behavior as separate design decisions.

RAG Systems

In RAG, embeddings retrieve context for a language model. A transcript-chatbot example chunks transcripts with overlap, embeds them, retrieves relevant passages, and generates an answer with references (^[2]). Evaluation then extends beyond nearest-neighbor retrieval into generated answer quality, citation quality, and human review.
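The "answer with references" step can be sketched as numbering the retrieved passages in the prompt, then checking that the generated answer cites only passages that exist. The prompt wording, the `[n]` citation convention, and the passage sources are assumptions for illustration. This check catches one kind of citation error; it does not verify that a cited passage actually supports the claim.

```python
import re


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Number retrieved (source, text) passages so the model can cite them as [n]."""
    context = "\n".join(
        f"[{i}] {text} (source: {source})"
        for i, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages below and cite them as [n]. "
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def invalid_citations(answer: str, n_passages: int) -> list[int]:
    """Citation check: markers in the answer that point to no retrieved passage."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= n_passages)


# Hypothetical passages returned by the vector search step.
passages = [
    ("episode-12 transcript", "Feature stores keep training and serving features consistent."),
    ("episode-30 transcript", "Vector databases index embeddings for nearest-neighbor search."),
]
prompt = build_prompt("What does a vector database do?", passages)
print(prompt)
print(invalid_citations("It indexes embeddings [2] and caches results [3].", len(passages)))
```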

The update path favors retrieval over retraining for systems that need current or proprietary knowledge (^[4]). A team can re-ingest, re-embed, and re-index documents instead of fine-tuning the model every time facts change.
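The re-ingest path can be kept cheap by re-embedding only what changed. A minimal sketch, assuming the index records a content hash per document id; the function names and documents are hypothetical. It also assumes the embedding model itself is unchanged: a new model version would require re-embedding everything, not just changed documents.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs: dict[str, str],
                 indexed_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare source documents with what the index was built from.

    Returns (to_embed, to_delete): new or changed documents to re-embed and
    upsert, and removed documents whose vectors should be deleted.
    """
    to_embed = [doc_id for doc_id, text in current_docs.items()
                if indexed_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return sorted(to_embed), sorted(to_delete)


indexed = {
    "about": content_hash("We sell outdoor gear."),
    "faq": content_hash("Shipping takes 3 days."),
    "promo-2023": content_hash("Winter sale."),
}
current = {
    "about": "We sell outdoor gear.",
    "faq": "Shipping takes 5 days.",
    "returns": "Returns are free within 30 days.",
}
print(plan_reindex(current, indexed))
```

The unchanged `about` page is skipped, the edited `faq` and new `returns` pages are re-embedded, and the retired promo page is removed, all without touching the language model.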

Chunking and embeddings are a practical first step for useful LLM systems (^[5]). Fixed-size chunks, sliding windows, and context quality determine what the embedding model can retrieve. Embeddings help only when the chunks preserve the information an answer needs. The broader Retrieval-Augmented Generation page treats retrieval as search with generation attached.
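A minimal sliding-window chunker, assuming chunk size and overlap are measured in whitespace-separated words. Many pipelines count model tokens instead, and the sizes below are illustrative rather than recommended values.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one.

    Overlap means text near a chunk boundary also appears in the neighboring chunk,
    so less context is cut off at the seams.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in sliding_window_chunks(text, size=4, overlap=2):
    print(chunk)
```

Setting `overlap=0` gives plain fixed-size chunks. Larger overlaps preserve more boundary context at the cost of more vectors to embed and store.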

Recommendations and Multimodal Retrieval

Embeddings aren’t limited to text. Multimodal embeddings, such as CLIP-style representations, place images and text in a shared space for image-text matching, and that shared space helps multimodal LLMs retrieve across modalities. A vector can also extend beyond raw text or image content by adding metadata, behavior, and popularity, as in e-commerce personalization (^[1]).
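One simple way to extend a content vector with metadata and popularity is concatenation with per-block weights. A minimal sketch, assuming the content embedding already exists (for example from a text or CLIP-style image encoder, not shown). The function name, one-hot categories, and weights are hypothetical, and the source does not say this is how any specific e-commerce system builds its vectors.

```python
import math

Vector = list[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def item_vector(content_emb: Vector, category_onehot: Vector, popularity: float,
                w_content: float = 1.0, w_category: float = 0.5,
                w_popularity: float = 0.3) -> Vector:
    """Concatenate a normalized content embedding with weighted metadata features.

    The (hypothetical) weights control how much each block contributes to
    dot-product or cosine similarity between item vectors.
    """
    return (
        [w_content * x for x in l2_normalize(content_emb)]
        + [w_category * x for x in category_onehot]
        + [w_popularity * popularity]
    )


# Same content embedding, different categories: the vectors are no longer identical.
shoe_a = item_vector([3.0, 4.0, 0.0], [1.0, 0.0], popularity=0.5)
shoe_b = item_vector([3.0, 4.0, 0.0], [0.0, 1.0], popularity=0.5)
print(shoe_a)
```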

Vector databases also serve session-based recommendations and re-ranking outside RAG (^[2]). Embeddings retrieve candidates for the next stage. Ranking, constraints, and product goals decide what users actually see.
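A minimal session-based candidate step, assuming a session is represented by the mean of the vectors of the items viewed in it. Mean pooling is one simple choice among many, and the item vectors and names are hypothetical. The output is a candidate list for the ranking stage described above, not the final list shown to users.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def session_vector(session: list[str], item_vecs: dict[str, Vector]) -> Vector:
    """Represent a (non-empty) session as the mean of the vectors of the items viewed in it."""
    vecs = [item_vecs[item_id] for item_id in session]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def candidate_items(session: list[str], item_vecs: dict[str, Vector], k: int) -> list[str]:
    """Retrieve the k unseen items closest to the session vector.

    These are candidates for a later ranking stage, not the final list shown to the user.
    """
    query = session_vector(session, item_vecs)
    seen = set(session)
    scored = [(item_id, dot(query, vec)) for item_id, vec in item_vecs.items() if item_id not in seen]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]


# Hypothetical item vectors.
item_vecs = {
    "tent": [1.0, 0.0, 0.0],
    "sleeping_bag": [0.9, 0.1, 0.0],
    "camp_stove": [0.8, 0.2, 0.0],
    "lipstick": [0.0, 0.0, 1.0],
}
print(candidate_items(["tent", "sleeping_bag"], item_vecs, k=2))
```

Constraints such as stock, diversity, or business goals would be applied after this step, which is why the function returns ids rather than a rendered recommendation list.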

Embeddings therefore sit beneath machine learning personalization: they make users, items, and sessions comparable, while the product still decides what to personalize for the current context.

In the OLX recommender example, users and items are fixed-length vectors, so the system can search for item vectors close to a user’s vector. Similar-image retrieval uses the same vector-search structure: the embedding narrows the candidate set, and the recommender or search system then decides which nearby items are useful enough to show. Marcello La Rocca makes the same general connection between vector similarity, embeddings, recommender systems, and Faiss (^[3]).

NLP Data Work

From an NLP tooling perspective, embeddings connect to weak supervision, labeling workflows, Hugging Face, and data management (^[6]). They help teams explore text, cluster similar examples, build heuristics, and manage messy labels before a production search system exists.
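Clustering similar examples for labeling can be sketched with a simple single-pass threshold method. This is a deliberately basic stand-in for algorithms such as k-means, not a method named in the source. The ticket texts, their 2-d vectors, and the 0.9 threshold are hypothetical; real vectors would come from an embedding model.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def greedy_threshold_clusters(examples: dict[str, Vector], threshold: float) -> list[list[str]]:
    """Single-pass clustering: join the first cluster whose seed example is at least
    `threshold` similar, otherwise start a new cluster. Results depend on input order."""
    clusters: list[tuple[Vector, list[str]]] = []
    for text, vec in examples.items():
        for seed_vec, members in clusters:
            if cosine(vec, seed_vec) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


# Hypothetical vectors for unlabeled support tickets.
tickets = {
    "refund please": [1.0, 0.0],
    "I want my money back": [0.95, 0.31],
    "app crashes on start": [0.0, 1.0],
    "crash after login": [0.1, 0.99],
}
for cluster in greedy_threshold_clusters(tickets, threshold=0.9):
    print(cluster)
```

A reviewer can label each cluster once, or read a cluster to spot wording patterns worth turning into a labeling heuristic.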

This data-work framing makes embedding versioning part of model governance. If labels, source documents, or model versions change, the stored vectors and downstream checks may need to change too. Public search, RAG, and labeling use embeddings differently, but they share the same representation risk: a vector only helps if it preserves the distinction the downstream task needs.

Production Evaluation

Vector search has multiple moving parts, so embeddings create operational work: teams have to manage model versioning, query-vector compatibility, batch reindexing, latency, and rollback (^[1]). A vector database can store and retrieve vectors, but it can’t repair stale embeddings or a mismatch between document and query encoders.
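A small guard against encoder mismatch can be sketched by recording which model produced each index and each stored vector. The `EmbeddingSpec` record, the model name, and the version strings are hypothetical; vector databases do not enforce this for you in any standard way, so this is one possible application-side convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    model_name: str     # hypothetical identifiers
    model_version: str
    dim: int


class IncompatibleEmbeddingError(Exception):
    pass


def check_query_compatible(index_spec: EmbeddingSpec, query_spec: EmbeddingSpec,
                           query_vector: list[float]) -> None:
    """Refuse to search an index with vectors from a different encoder.

    Matching dimensions are not enough: two model versions can share a dimension
    while placing the same text at unrelated points.
    """
    if query_spec != index_spec:
        raise IncompatibleEmbeddingError(f"index built with {index_spec}, query encoded with {query_spec}")
    if len(query_vector) != index_spec.dim:
        raise IncompatibleEmbeddingError(f"expected {index_spec.dim} dims, got {len(query_vector)}")


def stale_ids(records: dict[str, EmbeddingSpec], current: EmbeddingSpec) -> list[str]:
    """Ids whose stored vectors came from another model version and need batch re-embedding."""
    return sorted(doc_id for doc_id, spec in records.items() if spec != current)


v1 = EmbeddingSpec("text-encoder", "v1", 384)
v2 = EmbeddingSpec("text-encoder", "v2", 384)
check_query_compatible(v1, v1, [0.0] * 384)  # passes silently
try:
    check_query_compatible(v1, v2, [0.0] * 384)
except IncompatibleEmbeddingError as err:
    print("blocked:", err)
print(stale_ids({"doc-1": v1, "doc-2": v2, "doc-3": v1}, current=v2))
```

A common pattern is to keep serving the old index while the stale ids are re-embedded into a new one, which leaves a rollback target if the new version underperforms.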

Evaluation has to match the product. Search quality ties to business KPIs and A/B tests (^[1]). RAG adds answer quality, citation quality, and human review (^[2]). LLM workflows add gold evaluation sets, failure analysis, logs, and traces (^[5]).

Nearest-neighbor matches are candidate evidence, not proof. A retrieved passage can be wrong, stale, incomplete, or irrelevant to the user’s real task. The production system needs provenance, citations, feedback loops, and regression tests. LLM Evaluation Workflows covers LLM-specific checks, while Production Search Evaluation covers retrieval and ranking checks.

DataTalks.Club