RAG Portfolio Projects
Archive-backed guidance for RAG portfolio projects that prove retrieval quality, context design, citations, evaluation, failure analysis, and production-minded AI engineering.
Definition and Scope
A RAG portfolio project is a public artifact that proves the builder can turn a document corpus into a retrieval-backed LLM system with grounded answers. Atita Arora gives the core path in Modern Search Systems at 30:38-42:49, connecting retrieval and context packaging and covering generation, prompt design, and citations.
Hugo Bowne-Anderson adds evaluation sets and failure categories. He also covers logs, traces, and chunking choices in Practical LLM Engineering and RAG at 23:00-48:20.
This topic covers RAG projects aimed at AI engineering, LLM production, search engineering, or career-transition proof. For the concept page, start with Retrieval-Augmented Generation and Search, RAG, and Knowledge Systems. For general project evidence, compare Machine Learning Portfolio Projects and Open Source Portfolio Evidence.
Link Map
Core wiki routes:
- Retrieval-Augmented Generation
- Search, RAG, and Knowledge Systems
- LLM Evaluation Workflows
- RAG vs Fine-Tuning
- Vector Databases
- Embeddings
- Graph RAG vs Vector RAG
- Agent Engineering
Podcast anchors:
- Modern Search Systems with Atita Arora
- Practical LLM Engineering and RAG with Hugo Bowne-Anderson
- Deploying LLMs in Production with Meryem Arik
- Building Agentic AI Systems with Ranjitha Kulkarni
- Production ML Search
- Knowledge Graphs and LLMs for Automotive R&D with Anahita Pakiman
- AI Engineering: Skill Stack, Agents, LLMOps, and How to Ship AI Products with Paul Iusztin
Common Definition
Across the archive, a strong RAG project proves the full retrieval loop. It covers source ingestion, chunking, and metadata. It also covers retrieval, answer generation, and citations. Evaluation and iteration are part of the same loop.
Atita’s transcript-chatbot example in Modern Search Systems at 35:49-42:49 is the clearest portfolio model. The corpus is long, the chunks need provenance, and the answer has to point back to the source.
Hugo adds the review standard in Practical LLM Engineering and RAG at 23:00-27:38. Build representative gold tests. Categorize failures. Log traces. Change chunking or retrieval before polishing the UI when retrieval is the larger failure class.
The portfolio signal is strongest when readers can look at the system’s work. A README should show example questions, retrieved chunks, and citations. It should also show wrong answers, latency or cost notes, and the next retrieval fix. This follows Atita’s multi-level RAG evaluation discussion at 48:09 in Modern Search Systems and Hugo’s debugging workflow at 27:38 in Practical LLM Engineering and RAG.
Guest Differences
Atita starts from search engineering. Her useful portfolio standard is to compare existing search infrastructure with standalone vector databases. The project should preserve metadata and evaluate retrieval quality before the LLM answer becomes the product (Modern Search Systems, 17:01-48:09).
Hugo starts from practical LLM shipping. He presents RAG as a quick business win when the corpus, chunking, and embeddings fit the task. He also says teams should add tools or agents only when the workflow needs actions beyond lookup (Practical LLM Engineering and RAG, 44:26-56:21).
Ranjitha Kulkarni draws the same boundary from agent engineering. RAG remains useful when retrieval controls latency, cost, noisy context, and chunk metadata. Source quality and wrappers also decide whether retrieval helps (Building Agentic AI Systems, 29:30-37:39).
Meryem Arik frames RAG against fine-tuning and deployment constraints. In Deploying LLMs in Production, she covers API risk and model drift at 18:46. She treats retrieval as a better fit for changing knowledge at 40:46-46:42, and she covers latency, cost, and self-hosting at 49:44-51:35.
Source-Cited Knowledge Assistant
Start with a source-cited assistant over a real corpus. Good archive-backed choices include podcast transcripts, internal docs, and policies. Tickets, research papers, and course notes can also work.
Atita’s podcast-transcript RAG example at 35:49-42:49 in Modern Search Systems shows the essential proof. Parse long source documents and chunk them. Attach source metadata, retrieve relevant passages, and return citations users can open.
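The parse, chunk, attach-metadata, retrieve, and cite steps above can be sketched in a few functions. This is a minimal illustration, not the pipeline from the episode: the `Chunk` fields, the word-window chunking, and the word-overlap scoring are all assumptions chosen to keep the example dependency-free. A real project would likely swap in a search engine or embedding model for `retrieve`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str      # hypothetical field: episode or document title
    start_word: int  # word offset into the source, kept for provenance
    end_word: int


def chunk_document(text: str, source: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into overlapping word windows, keeping source metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(" ".join(window), source, start, start + len(window)))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Rank chunks by naive word overlap with the query (a stand-in for real search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def cite(chunk: Chunk) -> str:
    """Format a citation a reader can trace back to the source."""
    return f"{chunk.source}, words {chunk.start_word}-{chunk.end_word}"
```

For transcripts, storing a timestamp range instead of word offsets would make the citation directly openable; the word offsets here are only a placeholder for whatever locator the corpus supports.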
For a portfolio README, include questions where the assistant answers with citations. Include questions where it refuses because evidence is missing. That matches the grounding work in Retrieval-Augmented Generation and the human-review layer Atita describes at 48:09 in Modern Search Systems.
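One way to make the answer-or-refuse behavior visible is to gate generation on retrieval scores. The sketch below is an assumption-laden illustration: the `min_score` threshold, the tuple layout of `hits`, and the `fake_generate` placeholder are hypothetical, and a real system would call an LLM where the placeholder sits.

```python
from typing import Callable

REFUSAL = "I could not find supporting evidence in the corpus."


def answer_with_citations(
    question: str,
    hits: list[tuple[str, str, float]],  # (citation, passage, retrieval score)
    generate: Callable[[str, list[str]], str],
    min_score: float = 0.5,  # hypothetical threshold; tune it against a gold set
) -> dict:
    """Answer only from passages that clear the threshold; otherwise refuse."""
    evidence = [(cit, passage) for cit, passage, score in hits if score >= min_score]
    if not evidence:
        return {"answer": REFUSAL, "citations": [], "refused": True}
    answer = generate(question, [passage for _, passage in evidence])
    return {"answer": answer, "citations": [cit for cit, _ in evidence], "refused": False}


def fake_generate(question: str, passages: list[str]) -> str:
    """Placeholder for an LLM call: echoes the first passage."""
    return passages[0]
```

Recording both outcomes in the README (a cited answer and an explicit refusal) shows that the refusal path is exercised rather than assumed.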
Search-First RAG Project
A search-first project proves that generation isn’t hiding weak retrieval. Build keyword search or vector search first. Add filters or hybrid search when the corpus needs them. Then add the answer-generation layer.
This follows Production ML Search, where the discussion separates candidate generation from ML ranking at 12:45-17:40. The same episode covers embeddings and vector compute at 21:55-29:00. It also covers vector storage, filters, recency, and business constraints at 29:00-45:11.
This project is useful for candidates targeting search, embeddings, or vector database roles. It can show retrieval metrics before and after changes. Compare one simple baseline with one semantic or hybrid path. That reflects Atita’s migration discussion at 20:27 in Modern Search Systems and the hybrid-search discussion at 34:00 in Production ML Search.
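A before-and-after comparison can be as small as one metric and one fusion step. The sketch below uses recall@k and reciprocal rank fusion; the fusion method is an assumption picked for illustration, since the episodes discuss hybrid search without prescribing a specific combiner. The document IDs and the constant `c = 60` are likewise illustrative.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Merge several rankings by summing 1 / (c + rank) for each document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one gold question.
keyword_hits = ["d1", "d2", "d3"]
vector_hits = ["d4", "d1", "d5"]
relevant = {"d1", "d4"}

hybrid_hits = reciprocal_rank_fusion([keyword_hits, vector_hits])
baseline_recall = recall_at_k(keyword_hits, relevant, k=2)
hybrid_recall = recall_at_k(hybrid_hits, relevant, k=2)
```

Averaging these numbers over the whole gold set, rather than one query, is what makes the comparison meaningful in a README.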
Evaluation and Failure Analysis
An evaluation-focused project can start from an existing demo and make it measurable. Hugo’s workflow in Practical LLM Engineering and RAG at 23:00-27:38 gives the structure. Create a representative gold set, run the system, label failures, and separate retrieval failures from generation failures. Then log enough traces to debug the next change. This connects to LLM Evaluation Workflows.
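The split between retrieval failures and generation failures can be sketched as a small labeling function over a gold set. This is one possible shape under stated assumptions: the `GoldCase` fields are hypothetical, and the substring check is a crude stand-in for whatever answer-quality judgment (human review or a grader) the project actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GoldCase:
    question: str
    expected_chunk_ids: set[str]
    expected_phrase: str  # crude stand-in for an answer-quality check


def classify(case: GoldCase, retrieved_ids: list[str], answer: str) -> str:
    """Label one run: check retrieval first, then generation."""
    if not case.expected_chunk_ids & set(retrieved_ids):
        return "retrieval_failure"  # the right evidence never reached the prompt
    if case.expected_phrase.lower() not in answer.lower():
        return "generation_failure"  # evidence was present, but the answer missed it
    return "pass"


def summarize(labels: list[str]) -> Counter:
    """Count labels so the largest failure class is obvious."""
    return Counter(labels)
```

If `summarize` shows that retrieval failures dominate, that points toward changing chunking or retrieval before touching prompts or the UI.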
Ranjitha’s agent-evaluation guidance extends the same idea. It applies when retrieval is one tool in a larger workflow. At 51:17-57:23 in Building Agentic AI Systems, she argues for custom datasets and mocked tools. She also covers integration tests and outcome assertions.
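A mocked tool with an outcome assertion might look like the sketch below. It uses `unittest.mock.Mock` from the Python standard library; the `run_workflow` function, its `top_k` parameter, and the sample passage are hypothetical and only illustrate the testing pattern, not any workflow from the episode.

```python
from unittest.mock import Mock


def run_workflow(question: str, search_tool) -> dict:
    """Hypothetical one-step workflow: call the retrieval tool, then report the outcome."""
    passages = search_tool(question, top_k=3)
    if not passages:
        return {"status": "no_evidence", "sources": []}
    return {"status": "answered", "sources": [p["source"] for p in passages]}


def check_workflow_uses_retrieval_once() -> dict:
    """Replace the real tool with a mock, then assert on the call and the outcome."""
    search_tool = Mock(return_value=[{"source": "handbook.md", "text": "Refunds take 5 days."}])
    outcome = run_workflow("How long do refunds take?", search_tool)
    # Interaction check: the tool was called exactly once with the expected arguments.
    search_tool.assert_called_once_with("How long do refunds take?", top_k=3)
    # Outcome check: the workflow reported the source it was given.
    assert outcome == {"status": "answered", "sources": ["handbook.md"]}
    return outcome
```

Mocking the tool keeps the test fast and deterministic, so a failure points at the workflow logic rather than at the index or the network.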
A portfolio project can show this with a small report. Include the query, retrieved evidence, and generated answer. Add the expected evidence, failure class, and proposed fix.
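The report fields listed above map directly onto a small record type. The sketch below renders such records as a pipe-delimited table that could be pasted into a README; the `ReportRow` name and the table layout are assumptions, and the sample row is invented purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ReportRow:
    query: str
    retrieved_evidence: str
    generated_answer: str
    expected_evidence: str
    failure_class: str
    proposed_fix: str


def to_markdown(rows: list[ReportRow]) -> str:
    """Render report rows as a pipe-delimited table, one row per evaluated query."""
    names = [f.name for f in fields(ReportRow)]
    header = "| " + " | ".join(names) + " |"
    divider = "| " + " | ".join("---" for _ in names) + " |"
    body = [
        # Escape pipes inside cell text so they do not break the table.
        "| " + " | ".join(str(getattr(row, n)).replace("|", "\\|") for n in names) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


# Invented example row, for illustration only.
example = ReportRow(
    query="Which chunk size was used?",
    retrieved_evidence="intro.md#2",
    generated_answer="Not stated.",
    expected_evidence="setup.md#5",
    failure_class="retrieval_failure",
    proposed_fix="Index setup.md with section titles as metadata",
)
report = to_markdown([example])
```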
Domain Knowledge or Graph RAG
Some RAG projects shouldn’t be only nearest-neighbor text search. In Knowledge Graphs and LLMs for Automotive R&D, Anahita Pakiman contrasts text chunking, embeddings, and vector databases with graph semantics at 33:43-38:10. She then covers Cypher-driven retrieval and verification limits at 39:56-42:42.
That supports RAG projects for domains where relationships matter. Examples include papers and citations, parts and simulations, regulations and clauses, or incident reports and linked systems. A useful Graph RAG vs Vector RAG portfolio example can retrieve text snippets and relationship paths. It should show where each method fails or succeeds against the same gold questions (Knowledge Graphs and LLMs for Automotive R&D, 38:10-47:10).
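Retrieving a relationship path, as opposed to a text snippet, can be illustrated without a graph database. The sketch below is a plain-Python stand-in and not Cypher: the toy graph, its node names, and its relation labels are all hypothetical, and breadth-first search is simply one straightforward way to find a shortest path.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relation, neighbor)
GRAPH = {
    "Paper A": [("cites", "Paper B")],
    "Paper B": [("uses_dataset", "Dataset X")],
    "Dataset X": [],
}


def relationship_path(graph: dict, start: str, goal: str):
    """Breadth-first search returning the shortest list of (node, relation, node) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no path: report this as a miss instead of guessing


path = relationship_path(GRAPH, "Paper A", "Dataset X")
```

A question such as "which dataset does Paper A depend on indirectly?" needs the two-hop path, which similarity search over isolated chunks may not surface. Running both retrievers against the same gold questions makes that difference measurable.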
Career-Transition RAG Project
A career-switcher RAG project should connect old domain knowledge to current AI engineering. How to Become an AI Engineer After a Career Break follows Revathy Ramalingam through her restart path. That path relies on current project evidence, GitHub work, a deployed project, and a PDF Q&A assistant (22:15-33:45).
The Career Transition page connects that story to a broader archive lesson. Visible artifacts make prior experience legible to a target role.
Paul Iusztin adds the AI-engineering version in his AI engineering episode. His 29:12 chapter puts RAG and knowledge management inside the AI engineer skill set. His 54:05 chapter connects portfolio work with a “second brain” artifact. A personal knowledge assistant is credible when it shows software quality, evaluation, and knowledge-management judgment.
Practical Review Criteria
Use these criteria as the project review standard because each one maps to a recurring archive discussion.
- Define the corpus and user questions. Atita’s transcript-chatbot example in Modern Search Systems works because the corpus and question type are clear at 35:49.
- Preserve source provenance. Chunk metadata and wrappers matter because Atita and Ranjitha connect retrieval quality to context design and metadata. Useful provenance can include titles, timestamps, and sections. Authors or permissions can matter too (Modern Search Systems, 38:24-42:49 and Building Agentic AI Systems, 32:48).
- Compare retrieval approaches. A portfolio project shouldn’t assume vector search is the whole answer because Production ML Search separates keyword search from vector search. It also covers filters, recency, and ranking at 11:29-45:11.
- Require grounded answers. The generation step should cite retrieved evidence and expose missing-evidence cases, following Modern Search Systems at 42:49 and Deploying LLMs in Production at 42:02-46:42.
- Build a small gold set. Hugo’s test-set discussion at 23:00-25:25 in Practical LLM Engineering and RAG supports a compact evaluation file with questions, expected evidence, acceptable answers, and failure labels.
- Log the debugging path. Retrieved chunks, scores, prompts, and model versions are portfolio evidence. Log outputs, latency, cost, and feedback as well, because Hugo ties logs and traces to debuggable MVPs at 27:38 in Practical LLM Engineering and RAG.
- State production boundaries. Meryem’s production discussion at 18:46 and 49:44-51:35 in Deploying LLMs in Production supports documenting API risk, model drift, and privacy. It also supports latency, cost, serving, and reindexing notes.
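The logging criterion above can be met with one JSON line per request. The sketch below is a minimal example using only the standard library; the field names, the `example-model` identifier, and the chunk ID format are assumptions rather than a schema from the archive.

```python
import io
import json
import time


def log_trace(stream, *, question, chunks, prompt, model, answer, started, cost_usd=None):
    """Append one JSON line describing a single RAG request to a writable stream."""
    record = {
        "question": question,
        "retrieved": [{"id": cid, "score": score} for cid, score in chunks],
        "prompt": prompt,
        "model": model,  # record the model version so runs stay comparable
        "answer": answer,
        "latency_s": round(time.perf_counter() - started, 4),
        "cost_usd": cost_usd,  # None when cost is unknown
    }
    stream.write(json.dumps(record) + "\n")
    return record


# Usage with an in-memory buffer; a real project would open a .jsonl file instead.
buffer = io.StringIO()
started = time.perf_counter()
record = log_trace(
    buffer,
    question="What does the guest say about chunking?",
    chunks=[("ep12#3", 0.82)],
    prompt="Answer using only the passages provided.",
    model="example-model",  # hypothetical model name
    answer="Chunks need provenance.",
    started=started,
)
```

Because each line is self-contained JSON, the same file can feed failure labeling, latency summaries, and side-by-side comparisons after a retrieval change.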
Weak Signals to Avoid
A RAG portfolio is weak when it only shows “chat with PDF” behavior. That misses retrieval evaluation and source citations. It also misses visible chunks and failure analysis.
The project then misses Atita’s retrieval-plus-generation path in Modern Search Systems at 30:38-48:09. It also misses the gold-set workflow Hugo describes in Practical LLM Engineering and RAG at 23:00-27:38.
It’s also weak to use agents or long context as a way to skip retrieval design. A vector database doesn’t remove that work either.
Ranjitha’s RAG reality check at 29:30-37:39 in Building Agentic AI Systems keeps latency and cost in scope. It also keeps noisy context, chunk metadata, and tool boundaries in scope. The Production ML Search discussion keeps filters and recency in scope. It also covers ranking and offline tests at 34:00-63:50.
Related Pages
Use these pages for adjacent concepts and project standards:
- Retrieval-Augmented Generation for the core RAG architecture.
- Search, RAG, and Knowledge Systems for retrieval architecture and knowledge-system tradeoffs.
- LLM Evaluation Workflows for gold sets, failure analysis, and agent tests.
- LLM Production Patterns for deployment, latency, cost, observability, and model-risk context.
- RAG vs Fine-Tuning for deciding whether changing knowledge belongs in retrieval or model adaptation.
- Graph RAG vs Vector RAG for projects where relationships matter as much as text similarity.
- Machine Learning Portfolio Projects for the broader project-evidence standard.
- Career Transition and Job Search for turning the project into hiring evidence.