
AI Agents

What DataTalks.Club guests have said about AI agents: autonomy, tool use, memory, RAG boundaries, evaluation, governance, and infrastructure.

Related Wiki Pages

Agent Engineering
LLM Production Patterns
Retrieval-Augmented Generation
AI Red Teaming
Multi-Agent Systems
LLM Evaluation Workflows
Responsible AI and Governance
Tools

AI agents are LLM-backed systems that pursue a goal over time, often using tools, retrieval, memory, and workflow state. In the DataTalks.Club archive, guests treat agents as more than chat interfaces. In Building Agentic AI Systems (11:00 and 12:31), Ranjitha Kulkarni ties them to autonomy, objectives, orchestration, and tool use, and adds memory and knowledge stores.

For agent-specific reading, start with agent planning and tool calls, then compare how guests treat memory, the boundary with retrieval-augmented generation, evaluation, and governance. The implementation discipline around these systems lives in Agent Engineering, and the broader production practices live in LLM Production Patterns.

Starting Points

Start with these archive discussions:

Building Agentic AI Systems with Ranjitha Kulkarni for autonomy, orchestration, tools, memory, and SRE workflows. The same episode covers agentic RAG, MCP-style tool protocols, and goal-based evaluation.
The Future of AI Agents with Aditya Gautam for enterprise adoption, specialized models, guardrails, and data lineage. He also covers multi-tenant evaluation, human-label alignment, and deployment risk.
From Game AI to LLM Agents with Micheal Lanham for the lineage from game AI and multi-agent systems to LLM agents. He also covers task decomposition, orchestration designs, SDKs, MCP integration, coding agents, and monitoring.
Practical LLM Engineering and RAG with Hugo Bowne-Anderson for generator-evaluator loops, embedded workflow assistants, and the progression from RAG to agents. He also covers an email assistant example, agent memory, and starting evaluation small.
AI Engineering: Skill Stack, Agents, LLMOps, and How to Ship AI Products with Paul Iusztin for agents inside the wider AI engineering skill stack.
Hardening Generative AI Chatbots with Maria Sukhareva for security risks that become sharper when an LLM can retrieve data or take actions.

Adjacent pages cover the supporting pieces:

Retrieval-Augmented Generation and RAG for search-backed grounding.
LLM Evaluation Workflows for tests, judges, and failure analysis.
AI Red Teaming for adversarial testing and security.
Multi-Agent Systems for manager agents and collaboration.
Tools for action interfaces.
Responsible AI and Governance for controls.

Common Definition

Across the agent-focused episodes, guests describe an AI agent as an LLM system that can plan or route the next step in a task, use tools, and update workflow state. Ranjitha makes the definition explicit at 11:00 and 12:31 in Building Agentic AI Systems: the agent has an objective and some autonomy, plus an orchestration layer and access to tools, memory, or knowledge stores.

That definition separates an agent from a plain chatbot. In Practical LLM Engineering and RAG, Hugo moves from prompting and retrieval toward embedded assistants. At 33:14 he discusses Slack-style workflow assistants, and at 40:12 he describes agentic value through actions, documents, and automation. An agent earns its place when the system changes a workflow, not only when it writes a response.

The archive also treats agents as an extension of normal product engineering. In his AI engineering episode, Paul places agents beside RAG and knowledge management and connects them to LLMOps and shipped AI products. That puts agents close to the AI Engineer Role and the AI Engineering Roadmap rather than in a separate research-only category.

Guest Differences

Guests agree that tools, context, evaluation, and guardrails matter, but they start from different failure modes.

Ranjitha starts from production workflows. Her on-call automation example at 7:44 and SRE workflow section at 22:50 show agents reading logs and metrics, then helping with remediation. At 24:59 she emphasizes integration abstractions, because the agent can’t act unless tools expose the right interfaces.

Hugo starts from adoption and evaluation. In Practical LLM Engineering and RAG, at 56:21, he recommends starting with the problem and a small workflow, then adding data and evaluation. His email assistant example at 53:34 combines Gmail API access with RAG to show how teams can grow from retrieval into actions.

Micheal starts from agent design history. In From Game AI to LLM Agents, he ties modern LLM agents to game AI and reinforcement learning. He also links them to multi-agent systems and workflow design. His 20:57 section favors minimalism and task decomposition, while 23:48 contrasts sequential flows with manager-agent orchestration. That emphasis connects AI agents to Multi-Agent Systems and Software Engineering.

Aditya starts from enterprise reliability. In The Future of AI Agents, he discusses legal and healthcare reliability at 13:13. He covers specialized models and agent governance at 19:16, guardrails and lineage at 30:26, and deployment risks at 56:40. His view connects agents to Responsible AI and Governance and AI Red Teaming.

Autonomy and Workflow Action

An agent needs a bounded objective and a way to stop. Ranjitha’s 15:10 section compares single-step, multi-pass, and self-reflection planning. Those planning styles only help when the system has a concrete task. Examples include triaging an incident, retrieving enterprise knowledge, scheduling work, and calling an API. Her 40:30 calendar and meeting assistant example shows the agent changing plans as workflow state changes.
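A bounded objective with a way to stop can be sketched as a small loop with an explicit `finish` action and a step budget. This is a minimal sketch, not Ranjitha's implementation: `choose_next` is a hypothetical rule-based stand-in for the LLM planner, the incident-triage tools and log line are made up, and the step budget is an assumed safeguard.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool signature: reads/updates workflow state, returns an observation.
Tool = Callable[[dict], str]


@dataclass
class AgentRun:
    objective: str                                    # would go into a real planner's prompt
    tools: dict[str, Tool]
    state: dict = field(default_factory=dict)         # workflow state the tools update
    history: list[str] = field(default_factory=list)  # action log kept as simple memory
    max_steps: int = 5                                # hard budget so the loop always stops


def choose_next(run: AgentRun) -> str:
    """Stand-in for the LLM planner: return a tool name or 'finish'."""
    if "logs" not in run.state:
        return "read_logs"
    if "diagnosis" not in run.state:
        return "diagnose"
    return "finish"


def run_agent(run: AgentRun) -> str:
    for _ in range(run.max_steps):
        action = choose_next(run)
        if action == "finish":
            return "done"
        observation = run.tools[action](run.state)
        run.history.append(f"{action}: {observation}")
    return "stopped: step budget exhausted"


# Hypothetical tools for an incident-triage objective.
def read_logs(state: dict) -> str:
    state["logs"] = ["ERROR disk full on node-3"]
    return "1 error line"


def diagnose(state: dict) -> str:
    found = any("disk full" in line for line in state["logs"])
    state["diagnosis"] = "disk full" if found else "unknown"
    return state["diagnosis"]


run = AgentRun(objective="triage incident", tools={"read_logs": read_logs, "diagnose": diagnose})
result = run_agent(run)
print(result)       # done
print(run.history)  # ['read_logs: 1 error line', 'diagnose: disk full']
```

Replacing `choose_next` with a model call keeps the same shape; the budget and the explicit `finish` action are what give the loop a guaranteed exit.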

Hugo’s examples draw the same boundary from the product side. At 31:56 he discusses developer assistants such as Copilot, Cursor, and IDE agents. At 33:14 and 40:12 he moves to embedded assistants that work in Slack, documents, and automation flows. The agent needs context and tools because the task spans more than one prompt and operates on a real work surface.

Micheal’s 20:57 and 23:48 sections add a design constraint: start with the smallest flow that works. A sequential pipeline is easier to inspect and debug than a manager-agent system. A manager agent or parallel agent collaboration only makes sense when the task naturally splits across roles, tools, or state, as he discusses at 26:25.
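The difference between a sequential flow and manager-style routing can be sketched in a few lines. This is an illustrative sketch only: the workers are toy string functions, and the manager's lookup table is an assumed stand-in for an LLM that decomposes a task and delegates subtasks by role.

```python
from typing import Callable

Worker = Callable[[str], str]


def normalize(text: str) -> str:
    return " ".join(text.split()).lower()


def count_words(text: str) -> str:
    return str(len(text.split()))


def run_sequential(text: str, steps: list[Worker]) -> str:
    """Sequential flow: every run follows the same fixed path, so it is easy to trace."""
    for step in steps:
        text = step(text)
    return text


def run_manager(subtasks: list[tuple[str, str]], workers: dict[str, Worker]) -> list[str]:
    """Manager flow: route each (role, payload) subtask to the worker for that role.
    The lookup table stands in for an LLM manager deciding who handles what."""
    results = []
    for role, payload in subtasks:
        worker = workers.get(role)
        results.append(worker(payload) if worker else f"unroutable: {role}")
    return results


print(run_sequential("  Disk   FULL on node-3 ", [normalize, count_words]))  # 4
print(run_manager(
    [("normalize", " A  B "), ("count", "a b c"), ("deploy", "x")],
    {"normalize": normalize, "count": count_words},
))  # ['a b', '3', 'unroutable: deploy']
```

The manager version adds a routing decision and a new failure case (an unroutable subtask), which is the extra surface Micheal suggests taking on only when the task really splits by role.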

Tools, Interfaces, and Agent Infrastructure

Tools turn an agent from a text generator into a system that can act, but they also create the main operational risk. Ranjitha discusses prompts, SDKs, and tool wrappers at 18:23. At 24:59 she shows why teams need integration abstractions for operational systems. Examples include logs, metrics, tickets, and calendars. The agent can only pick a useful next step when each tool has a clear action, schema, permission boundary, and observable result.
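One way to make a clear action, schema, permission boundary, and observable result explicit is to wrap each tool in a small spec. This is a hedged sketch, not any framework's API: `ToolSpec`, `call_tool`, the role names, and the metrics backend are hypothetical, and real agent SDKs and MCP usually describe parameters with JSON Schema rather than Python types.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    description: str          # the clear action the model is told about
    params: dict[str, type]   # minimal schema: required parameter name -> type
    allowed_roles: frozenset  # permission boundary: which agent roles may call it
    fn: Callable[..., Any]


@dataclass
class ToolResult:
    ok: bool                  # observable outcome the agent (and a trace) can inspect
    value: Any = None
    error: str = ""


def call_tool(spec: ToolSpec, role: str, args: dict[str, Any]) -> ToolResult:
    if role not in spec.allowed_roles:
        return ToolResult(ok=False, error=f"role {role!r} may not call {spec.name}")
    missing = spec.params.keys() - args.keys()
    if missing:
        return ToolResult(ok=False, error=f"missing params: {sorted(missing)}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            return ToolResult(ok=False, error=f"{key} must be {expected.__name__}")
    try:
        return ToolResult(ok=True, value=spec.fn(**args))
    except Exception as exc:  # surface tool failures as data instead of crashing the loop
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


# Hypothetical metrics backend for illustration.
FAKE_ERROR_COUNTS = {"checkout": 12}


def fetch_error_count(service: str, minutes: int) -> int:
    return FAKE_ERROR_COUNTS[service]  # `minutes` is ignored by this fake backend


metrics_tool = ToolSpec(
    name="fetch_error_count",
    description="Return the error count for a service over the last N minutes.",
    params={"service": str, "minutes": int},
    allowed_roles=frozenset({"sre_agent"}),
    fn=fetch_error_count,
)

print(call_tool(metrics_tool, "sre_agent", {"service": "checkout", "minutes": 15}))
print(call_tool(metrics_tool, "support_agent", {"service": "checkout", "minutes": 15}).error)
print(call_tool(metrics_tool, "sre_agent", {"service": "checkout"}).error)
```

Returning a `ToolResult` instead of raising lets the planner see why a call failed and choose a different next step.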

Both Ranjitha and Micheal discuss newer infrastructure for exposing tools. Ranjitha covers framework choices at 44:08 in her agentic AI episode. At 46:00 she discusses LangChain, the OpenAI Agents SDK, and small agents. At 48:00 she turns to agent marketplaces and MCP-style protocols.

Micheal discusses the OpenAI Agents SDK and MCP integration at 31:31 in From Game AI to LLM Agents. At 33:25 he covers sequential thinking servers and scratchpads.

Those discussions connect this topic to Tools and LLM Tools. Exposing a tool isn’t enough: teams also need to know whether the agent can safely call it without human review, whether its results are traceable, and what state the workflow is left in after a failed call.
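Traceable results and a known state after a failed call can be sketched with a snapshot-and-restore wrapper around each tool call. This is a minimal sketch under stated assumptions: the workflow state is a plain in-memory dict, and `Workflow`, `execute`, and both tools are hypothetical. Restoring the snapshot only repairs the agent's own view; side effects already sent to an external system would need a compensating action, which this sketch does not model.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Workflow:
    state: dict[str, Any]
    trace: list[dict[str, Any]] = field(default_factory=list)  # one entry per tool call


def execute(wf: Workflow, name: str, fn: Callable[[dict[str, Any]], Any]) -> bool:
    """Run one state-changing tool call; on failure, restore the pre-call state and record why."""
    snapshot = copy.deepcopy(wf.state)
    try:
        result = fn(wf.state)
    except Exception as exc:
        wf.state = snapshot  # known state after a failed call
        wf.trace.append({"tool": name, "status": "failed", "error": repr(exc)})
        return False
    wf.trace.append({"tool": name, "status": "ok", "result": result})
    return True


# Hypothetical tools.
def add_note(state: dict[str, Any]) -> str:
    state.setdefault("notes", []).append("checked disk usage")
    return "note added"


def restart_service(state: dict[str, Any]) -> str:
    state["service"] = "restarting"  # partial update before the failure
    raise RuntimeError("restart API timed out")


wf = Workflow(state={"service": "degraded"})
execute(wf, "add_note", add_note)
execute(wf, "restart_service", restart_service)
print(wf.state)                         # {'service': 'degraded', 'notes': ['checked disk usage']}
print([t["status"] for t in wf.trace])  # ['ok', 'failed']
```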

Memory, Context, and RAG Boundaries

Agents often use retrieval, but RAG and agents aren’t the same thing. Ranjitha’s 29:30 section gives a RAG reality check around latency, cost, and bad context. At 31:38 and 32:48 she discusses reworking retrieval backends, chunking, metadata, and wrappers so the model receives usable context. At 36:11 she frames retrieval as a tool inside an agent. At 37:39 she separates cases where RAG is enough from cases that need agent behavior.
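Framing retrieval as a tool means the agent calls it like any other action and gets back cited chunks, filtered by metadata. In this sketch, word-overlap scoring is a deliberately simple stand-in for embeddings, and the chunk ids, the `team` metadata field, and `search_docs` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    doc_id: str
    text: str
    team: str  # metadata used to filter before scoring


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs(query: str, chunks: list[Chunk], team: Optional[str] = None, k: int = 2) -> list[dict]:
    """Retrieval exposed as one tool: metadata filter, overlap scoring, top-k with source ids."""
    q = tokens(query)
    candidates = [c for c in chunks if team is None or c.team == team]
    scored = [(len(q & tokens(c.text)), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]  # return nothing rather than pad with irrelevant context
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"doc_id": c.doc_id, "score": s, "text": c.text} for s, c in scored[:k]]


# Hypothetical knowledge base.
KB = [
    Chunk("runbook-1", "Restart the checkout service after clearing the disk cache.", "sre"),
    Chunk("policy-7", "Refunds over 500 EUR need manager approval.", "support"),
    Chunk("runbook-2", "If disk usage exceeds 90 percent, rotate logs first.", "sre"),
]

hits = search_docs("disk full on checkout", KB, team="sre")
print([(h["doc_id"], h["score"]) for h in hits])  # [('runbook-1', 2), ('runbook-2', 1)]
```

Returning an empty list when nothing matches gives the agent a signal to try another tool instead of answering from bad context.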

Hugo makes the same progression in Practical LLM Engineering and RAG. At 44:26 he starts with quick RAG wins through chunking and embeddings. At 48:20 he compares chunking strategies and context rot. At 50:19 he discusses when teams should add tooling and tool calls. At 57:41 he distinguishes retrieval-based memory from multi-turn conversation memory.
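The two kinds of memory Hugo distinguishes behave differently once a conversation grows. In this sketch, a fixed-size window stands in for multi-turn conversation memory and word-overlap lookup stands in for retrieval-based memory; both classes and the example facts are hypothetical.

```python
import re
from collections import deque


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ConversationMemory:
    """Multi-turn memory: keep the last `max_turns` turns verbatim; older turns drop off."""

    def __init__(self, max_turns: int) -> None:
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def context(self) -> list[tuple[str, str]]:
        return list(self.turns)


class RetrievalMemory:
    """Retrieval-based memory: store facts durably and fetch only those relevant to a query."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = _words(query)
        scored = [(len(q & _words(f)), f) for f in self.facts]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        return [fact for _, fact in ranked[:k]]


chat = ConversationMemory(max_turns=2)
facts = RetrievalMemory()
chat.add("user", "My order id is A-17.")
facts.remember("customer order id is A-17")
chat.add("assistant", "Thanks, noted.")
chat.add("user", "Also, I prefer email updates.")
facts.remember("customer prefers email updates")

print(chat.context())                        # the order-id turn has scrolled out of the window
print(facts.recall("what is the order id"))  # ['customer order id is A-17']
```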

That distinction matters for Search, RAG, and Knowledge Systems and RAG. A support agent may need customer history, policy documents, ticket state, and previous actions. A coding agent may need repository structure, open files, test results, and task history. A short chat window can’t substitute for a designed memory and retrieval layer.

Evaluation and Feedback Loops

Agent evaluation asks whether the system completed the task under realistic conditions, not whether one final sentence matches a reference answer. Ranjitha’s 51:17 section recommends custom datasets and system benchmarks. At 53:20 she discusses mocked tools, integration tests, and regression tests. At 56:02 she emphasizes goal-based evaluation and outcome assertions rather than exact path matching.
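Goal-based evaluation against mocked tools can be sketched as an assertion on the mock's final state rather than on the sequence of calls. This is a hedged illustration, not Ranjitha's test harness: `MockTicketAPI`, the ticket id, and the three scripted runs are hypothetical, and in a real suite the runs would come from the agent under test.

```python
class MockTicketAPI:
    """Mocked tool backend: in-memory ticket state plus a log of calls made."""

    def __init__(self) -> None:
        self.tickets = {"T-1": {"status": "open", "comment": ""}}
        self.calls: list[str] = []

    def comment(self, ticket_id: str, text: str) -> None:
        self.calls.append("comment")
        self.tickets[ticket_id]["comment"] = text

    def close(self, ticket_id: str) -> None:
        self.calls.append("close")
        self.tickets[ticket_id]["status"] = "closed"


# Scripted stand-ins for agent runs that take different paths.
def run_a(api: MockTicketAPI) -> None:
    api.comment("T-1", "Resolved: rotated logs.")
    api.close("T-1")


def run_b(api: MockTicketAPI) -> None:
    api.close("T-1")
    api.comment("T-1", "Resolved: rotated logs.")


def run_incomplete(api: MockTicketAPI) -> None:
    api.comment("T-1", "Looking into it.")


def goal_met(api: MockTicketAPI) -> bool:
    """Outcome assertion: the ticket ends closed with a resolution note, whatever the call order."""
    ticket = api.tickets["T-1"]
    return ticket["status"] == "closed" and ticket["comment"].startswith("Resolved")


for agent_run in (run_a, run_b, run_incomplete):
    api = MockTicketAPI()
    agent_run(api)
    print(agent_run.__name__, api.calls, goal_met(api))
# run_a ['comment', 'close'] True
# run_b ['close', 'comment'] True
# run_incomplete ['comment'] False
```

An exact-path test would fail `run_b` even though it reached the goal; the outcome check passes both valid paths and still catches the incomplete run.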

Hugo gives a practical evaluation path before and after agents enter the workflow. His generator-evaluator section at 13:56 shows automated quality control. At 23:00 he covers gold test sets, cost, and representativeness. At 26:43 he uses failure analysis to decide whether a retrieval component needs repair. Teams keep those habits for agents because each tool call can introduce a new failure mode.
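A generator-evaluator loop can be sketched with both roles stubbed out. Here `generate` and `evaluate` are hypothetical rule-based stand-ins: in practice the generator would be an LLM call and the evaluator could be rules, a judge model, or both. The round limit is an assumed safeguard.

```python
def generate(task: str, feedback: list[str]) -> str:
    """Stand-in generator: a real one would prompt an LLM with the task plus feedback."""
    draft = f"Summary of {task}."
    if "mention the ticket id" in feedback:
        draft += " Ticket: T-1."
    return draft


def evaluate(draft: str) -> list[str]:
    """Stand-in evaluator: returns a list of problems; an empty list means the draft passes."""
    problems = []
    if "T-1" not in draft:
        problems.append("mention the ticket id")
    if len(draft) > 200:
        problems.append("shorten the draft")
    return problems


def generate_with_review(task: str, max_rounds: int = 3) -> tuple[str, int, bool]:
    feedback: list[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = evaluate(draft)
        if not feedback:
            return draft, round_no, True
    return draft, max_rounds, False  # caller can route unresolved drafts to a human


print(generate_with_review("disk incident"))
# ('Summary of disk incident. Ticket: T-1.', 2, True)
print(generate_with_review("disk incident", max_rounds=1))
# ('Summary of disk incident.', 1, False)
```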

Aditya extends evaluation to enterprise scale: at 43:30 he discusses evals for multi-tenancy and scale, and at 50:18 he covers aligning LLM judges with human labels. Micheal adds monitoring tools such as Arize Phoenix at 57:39.
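The page doesn't say which agreement metric Aditya uses, so treat this as one common option rather than his method: raw agreement plus Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The pass/fail labels below are made up.

```python
from collections import Counter


def agreement(judge: list[str], human: list[str]) -> float:
    """Observed agreement p_o: share of items where judge and human labels match."""
    if not judge or len(judge) != len(human):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohens_kappa(judge: list[str], human: list[str]) -> float:
    p_o = agreement(judge, human)
    n = len(judge)
    judge_counts, human_counts = Counter(judge), Counter(human)
    # Chance agreement p_e: sum over labels of the product of both raters' label proportions.
    p_e = sum(judge_counts[lbl] * human_counts[lbl] for lbl in judge_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical pass/fail labels on eight agent transcripts.
judge = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(agreement(judge, human))               # 0.75
print(round(cohens_kappa(judge, human), 3))  # 0.467
```

Raw agreement of 75% looks reassuring, but kappa near 0.47 shows that much of it is what two raters who mostly say "pass" would reach by chance.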

In practice, teams collect failures and label what went wrong. They update tools or context, then rerun regression tests. See LLM Evaluation Workflows for the broader evaluation vocabulary.
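That loop can be sketched as counting labeled failures and turning the cases behind a fixed category into regression tests. The failure labels and case ids here are hypothetical; the point is the bookkeeping, not a specific taxonomy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Failure:
    case_id: str
    label: str  # human-assigned failure category


def triage(failures: list[Failure]) -> list[tuple[str, int]]:
    """Count failure labels so the most frequent cause is fixed first."""
    return Counter(f.label for f in failures).most_common()


def regression_cases(failures: list[Failure], fixed_label: str) -> list[str]:
    """Cases behind a fixed failure category become regression tests to rerun after the fix."""
    return sorted({f.case_id for f in failures if f.label == fixed_label})


# Hypothetical labeled failures from reviewing agent traces.
failures = [
    Failure("c1", "wrong tool arguments"),
    Failure("c2", "stale retrieved context"),
    Failure("c3", "wrong tool arguments"),
    Failure("c4", "did not stop"),
    Failure("c5", "wrong tool arguments"),
]
print(triage(failures)[0])                                 # ('wrong tool arguments', 3)
print(regression_cases(failures, "wrong tool arguments"))  # ['c1', 'c3', 'c5']
```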

Governance, Guardrails, and Security

Agents need stronger controls than one-shot generation because they can combine retrieval with actions. Aditya gives the strongest governance discussion. He discusses reliability in legal and healthcare at 13:13, specialized models and governance at 19:16, and guardrails and data lineage at 30:26. He then covers user feedback at 36:55 and deployment risks at 56:40.

Those controls overlap with security work from Hardening Generative AI Chatbots. At 13:20 in that episode, Maria Sukhareva discusses prompt overload and knowledge-base retrieval attacks. For agents, the same retrieval risk can become an action risk once the system can call tools, write data, send messages, or trigger workflows.

Across the archive, teams keep agents governed by narrowing tool permissions and tracing the data used for each answer or action. They also keep human review around high-impact decisions and test failures repeatedly. That connects AI agents to AI Red Teaming, Responsible AI and Governance, Data Governance, and Production.
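Human review for high-impact actions and tracing the data behind each action can live in one small gate. This is a minimal sketch, not a governance product: `ActionGate`, the action names, the approver, and the source ids are hypothetical, and a real system would persist the audit log and authenticate approvers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionGate:
    high_impact: frozenset                           # hypothetical: actions that always need a human
    audit: list[dict] = field(default_factory=list)  # who approved what, using which data

    def request(self, action: str, sources: list[str], approved_by: Optional[str] = None) -> str:
        if action in self.high_impact and approved_by is None:
            decision = "needs_review"
        else:
            decision = "allow"
        self.audit.append({
            "action": action,
            "sources": list(sources),  # lineage: the data the agent relied on
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision


gate = ActionGate(high_impact=frozenset({"issue_refund", "send_email"}))
print(gate.request("draft_reply", ["policy-7"]))                 # allow
print(gate.request("issue_refund", ["policy-7", "ticket-T-1"]))  # needs_review
print(gate.request("issue_refund", ["policy-7", "ticket-T-1"], approved_by="support-lead"))  # allow
print(len(gate.audit))                                           # 3
```

Logging the blocked request as well as the approved one keeps a record of what the agent tried to do, not only what it was allowed to do.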