Wiki
Knowledge Graph vs Vector Search
How DataTalks.Club podcast guests compare explicit graph relationships with embedding-based retrieval for search, RAG, and domain knowledge systems.
Related Wiki Pages
Knowledge graphs and vector search solve different retrieval problems because a knowledge graph stores entities and relation types. It also stores properties, paths, and neighborhoods. Vector search stores embeddings and retrieves nearby items by similarity.
In the podcast archive, Anahita Pakiman uses graphs for automotive R&D relationships and semantic reporting. She also uses them for Cypher-driven retrieval. Atita Arora uses vectors for transcript RAG and semantic search. Reem Mahmoud uses them for production search and hybrid retrieval (Knowledge Graphs and LLMs for Automotive R&D, Modern Search Systems, Production ML Search).
This comparison covers retrieval substrates. Use Graph RAG vs Vector RAG when the question is what context to give an LLM. Use Vector Database vs Search Engine when the question is whether a dedicated vector store belongs next to an existing search stack.
Atita and Reem discuss Solr, Lucene, Elasticsearch, and OpenSearch in that architecture choice. Reem also discusses Postgres and specialized vector stores (Modern Search Systems, 20:27. Production ML Search, 55:53).
Link Map
Start with these retrieval and knowledge-system pages:
- Search, RAG, and Knowledge Systems
- Graph RAG vs Vector RAG
- Vector Databases
- Embeddings
- Search
- Retrieval-Augmented Generation
- Production Search Evaluation
- LLM Evaluation Workflows
The closest podcast discussions are:
- Knowledge Graphs and LLMs for Automotive R&D with Anahita Pakiman
- Modern Search Systems with Atita Arora
- Production ML Search with Reem Mahmoud
- Building Agentic AI Systems with Ranjitha Kulkarni
- Deploying LLMs in Production with Meryem Arik
- Practical LLM Engineering and RAG with Hugo Bowne-Anderson
Common Definition
The archive’s common definition is representation-first. Vector search turns a query and candidate items into vectors. It then retrieves nearby vectors. Reem explains this as shared representation space at 21:55 in Production ML Search: the embedding model must encode the properties the product cares about. The retrieval system then matches query vectors against item vectors.
Atita shows the RAG version at 38:24-42:49 in Modern Search Systems: transcripts are chunked and embedded. They’re stored, retrieved, and passed to the LLM with prompt instructions and citations.
A knowledge graph makes the relationships explicit before retrieval. Anahita’s automotive R&D discussion uses Neo4j and graph structure for semantic reporting and simulation comparison. It also uses graphs for clustering and load-path detection (Knowledge Graphs and LLMs for Automotive R&D, 15:58-20:32). Later in the same episode, she contrasts “chunks in a vector database” with knowledge graph semantics and relations, then shows Cypher-driven prompt templates around 38:10-39:56.
The useful comparison isn’t “which one is smarter.” Vector search is better when the retrieval task is semantic recall across words, passages, users, or products. It also works for media similarity.
A knowledge graph is better when the relation, path, constraint, or lineage is part of the answer. Entity neighborhoods fit the same graph side. This is the same boundary used by Search and Retrieval-Augmented Generation.
First define what must be retrieved. Then choose the representation that preserves it (Production ML Search, 12:45 and 34:00. Knowledge Graphs and LLMs for Automotive R&D, 38:10).
Common Decision Rule
Use vector search when the main failure is missing semantically related material. Atita’s transcript RAG example needs a user question to find relevant podcast chunks even when the wording differs from the transcript. That points toward embeddings and chunking. It also points toward vector retrieval and RAG evaluation (Modern Search Systems, 38:24-48:09. Embeddings).
Use a knowledge graph when the main failure is losing relationships. Anahita’s examples depend on simulation-to-part and chapter-to-section connections. They also depend on domain semantics. She says vector chunks can miss relations and semantics. A graph can preserve the structure that retrieval should expose
(Knowledge Graphs and LLMs for Automotive R&D, 20:32 and 38:10. Graph RAG vs Vector RAG).
Combine them when both failures matter. The archive’s production-search episodes already combine vector similarity with filters and recency. Reem’s hybrid search discussion also adds metadata, popularity, and ranking weights at 34:00-45:11 in Production ML Search.
The graph version of the same principle is to retrieve candidate text with vectors. Then use graph traversal for entity context and paths. Constraints and validated facts can come from the graph too (Knowledge Graphs and LLMs for Automotive R&D, 39:56. Search, RAG, and Knowledge Systems).
Guest Differences
Atita Arora starts from classical information retrieval. Her vector-search discussion asks whether teams should use a standalone vector database or add vectors to an existing search stack. At 17:01 and 20:27 in Modern Search Systems, she frames Qdrant and vector databases as useful when the use case needs vectors. She still keeps Solr, Lucene, Elasticsearch, and OpenSearch in the architecture conversation.
Reem Mahmoud starts from production search as candidate generation plus ranking. Her 29:00-30:22 discussion in Production ML Search separates vector compute from vector storage. A vector database doesn’t remove the need for ingestion and embedding-model consistency. It also doesn’t remove reindexing or query-time encoding.
Her 34:00-45:11 discussion adds filters and recency while also covering metadata, behavior, popularity, and query-time weights. Together, those choices make production search evaluation part of the vector-search decision.
Anahita Pakiman starts from domain semantics. In Knowledge Graphs and LLMs for Automotive R&D, her comparison isn’t only about retrieval relevance. It’s about whether the system can preserve relationships across simulations and reports. It also needs to preserve relationships across sections, entities, and engineering concepts. Around 42:42, she also warns that LLM-extracted graph content needs verification, so graphs shift trust work into modeling and validation rather than eliminating it.
Ranjitha Kulkarni adds the agent boundary. At 29:30-37:39 in Building Agentic AI Systems, she treats RAG and search as tools with latency and cost constraints. Metadata and garbage-in/garbage-out constraints matter too.
Retrieval is enough when it reduces a large search space to useful context. Agents enter when the workflow needs planning, multiple tools, or dynamic state. That keeps both vector search and graph lookup inside a wider agent engineering design rather than turning either into the whole system.
Practical Retrieval Comparison
Read the comparison by the question the system must answer.
Retrieval Units
Vector search retrieves nearby embeddings for chunks and products, and it can represent users, images, and sessions too.
Knowledge graphs retrieve entities and relationships during traversal through paths and neighborhoods. That traversal can include section hierarchy and domain semantics from simulation links. Reem supports the vector side in Production ML Search at 21:55. Atita supports the transcript-chunk side in Modern Search Systems at 38:24. Anahita supports the graph side in Knowledge Graphs and LLMs for Automotive R&D at 15:58-20:32 and 38:10.
Question Types
Vector search fits requests for passages or items that mean something similar to the query. Knowledge graphs fit requests about connected entities and constraints. They also fit requests about order and explanations (Embeddings, Search, Graph RAG vs Vector RAG).
Production Work
Vector search needs chunking, embedding generation, vector storage, and reindexing. Hybrid ranking, citations, and evaluation belong on the vector side too.
Knowledge graphs need entity modeling, relation design, and graph ingestion. Cypher queries, provenance, and verification belong on the graph side (Modern Search Systems, 42:49-48:09. Production ML Search, 29:00-34:00. Knowledge Graphs and LLMs for Automotive R&D, 39:56-42:42).
Failure Modes
Vector systems can return similar but wrong neighbors. They can also have stale embeddings, missing metadata, or weak ranking. Poor chunks create another vector failure.
Graph systems can return wrong relations and stale graph facts. Brittle schemas and unverified LLM extraction create graph failures too (Modern Search Systems, 38:24-48:09. LLM Evaluation Workflows, Knowledge Graphs and LLMs for Automotive R&D, 42:42).
RAG Role
Vector search supplies candidate chunks and semantic recall for answers grounded in source material. Graph retrieval supplies structured facts and relations. It can also supply query-derived context (Retrieval-Augmented Generation, Modern Search Systems, 30:38-42:49. Knowledge Graphs and LLMs for Automotive R&D, 33:43-39:56).
Vector Search Works Best For Fuzzy Recall
Vector search is the stronger starting point when users don’t know the exact terms used in the source. Reem describes embeddings as a more robust candidate retrieval mechanism than brittle keyword rules at 20:02-21:55 in Production ML Search. Atita’s podcast-transcript system uses the same idea for RAG. A question is embedded and matched against transcript chunks. The system then builds an answer with source references (Modern Search Systems, 38:24-42:49).
This doesn’t make vector search a complete knowledge system. Reem separates candidate retrieval from ranking at 12:45 in Production ML Search and then adds hybrid constraints at 34:00-45:11. Atita adds chunking, retrieval-count choices and prompt design. She also adds citations and multi-level evaluation at 38:24-48:09 in Modern Search Systems. For adjacent tooling choices, see Vector Databases and Vector Database vs Search Engine.
Knowledge Graphs Work Best When Relationships Are The Answer
Knowledge graphs are the stronger starting point when the user needs a path or dependency. They also fit hierarchy and lineage. Containment relations and domain-specific connections fit graph retrieval too.
Anahita’s automotive R&D example uses graph structure because crash simulation teams compare many simulations and identify common parts. They also detect load paths and preserve semantic relationships that are difficult to see in tables (Knowledge Graphs and LLMs for Automotive R&D, 15:58-20:32).
The LLM version of the same point appears at 38:10-39:56 in Knowledge Graphs and LLMs for Automotive R&D. Anahita contrasts vector chunks with graph semantics and then uses graph queries in prompt templates. That design matters for RAG systems where the answer should explain how entities are connected, not only quote a similar paragraph. See Graph RAG vs Vector RAG for the LLM-context version of this tradeoff.
Hybrid Retrieval Is The Real Production Design
The archive’s strongest production guidance is to route by failure mode. If retrieval misses paraphrases or synonyms, improve embeddings and chunking. If it misses semantically related examples, improve vector search or reranking (Modern Search Systems, 38:24-48:09. Embeddings).
If retrieval loses entity relationships or paths, add graph modeling or graph queries (Knowledge Graphs and LLMs for Automotive R&D, 38:10-39:56. Graph RAG vs Vector RAG).
If results are plausible but not useful, add search ranking and metadata. Then add business constraints and evaluation (Production ML Search, 34:00-45:11 and 1:01:25. Production Search Evaluation).
This hybrid view also fits production LLM systems. Meryem Arik frames retrieval as the better fit for changing knowledge in Deploying LLMs in Production, 40:46-48:01.
Hugo Bowne-Anderson turns that into a builder workflow in Practical LLM Engineering and RAG. Chunking and RAG are a quick business win only when the retrieved context is good enough to support the answer.
Ranjitha Kulkarni then places retrieval inside agent systems. In her view, retrieval is one tool among knowledge stores, APIs, and memory. Workflow actions sit beside it (Building Agentic AI Systems, 36:11-37:39).
Evaluation And Trust
Vector search should be evaluated as retrieval and ranking. It should also be evaluated as answer quality, not only nearest-neighbor speed. Atita breaks RAG evaluation into multiple levels at 42:49-48:09 in Modern Search Systems.
She checks the embedding model, chunking strategy, retrieval strategy, and generated answer quality. She also adds human-in-the-loop review.
Reem adds business KPIs and A/B testing. She also adds offline tests and fast iteration at 1:01:25-1:03:50 in Production ML Search.
Knowledge graphs should be evaluated as modeled knowledge, not only as a nicer retrieval interface. Anahita’s warning at 42:42 in Knowledge Graphs and LLMs for Automotive R&D is that LLM-generated content and extracted semantics can be untrustworthy without verification. A graph can make provenance, relation type, and paths visible, but incorrect nodes or edges still corrupt downstream RAG and analysis.
Use LLM Evaluation Workflows for answer-level checks and Production Search Evaluation for retrieval and ranking checks.
Related Pages
Use these pages for the surrounding retrieval, search, RAG, and evaluation topics: