
Context Engineering

Designing effective LLM inputs with chunking strategies, metadata, wrappers, context windows, and context rot.

Related Wiki Pages

Agent Engineering, Retrieval-Augmented Generation, LLM Production Patterns, Prompt Engineering, Embeddings, Long-Context LLM Evaluation, AI Engineering, LLMs

Context engineering is the deliberate design of what information goes into an LLM prompt. It extends Prompt Engineering beyond instruction phrasing: engineers choose and package the data the model sees, selecting which information reaches the model instead of “stuffing everything in.”^[1]

The topic connects RAG, Embeddings, Agent Engineering, and LLM Production Patterns. Retrieval pipelines engineer context by selecting passages. Agents engineer context by exposing tools, memory, examples, and state only when the task needs them. Use LLM and RAG Production Roadmap when those context choices become rollout milestones.

Reducing Noise

Recent LLM episodes keep returning to one constraint: larger context windows don’t remove the need for selection. Noisy prompts increase latency and cost, and they create garbage-in/garbage-out failures. Preprocessing still matters even with 32k-token windows because a smaller context can improve reliability.^[1]
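
As a minimal sketch of selecting before generating, the function below keeps only the highest-scoring passages that fit a token budget. It assumes each passage already carries a relevance score from some retriever and approximates tokens as whitespace-separated words; real tokenizers count differently, so treat the budget as rough. The names `approx_tokens` and `select_context` are illustrative, not from any library.

```python
def approx_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # Real tokenizers usually produce more tokens than words.
    return len(text.split())


def select_context(passages: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring passages that fit within `budget` tokens."""
    selected: list[str] = []
    used = 0
    for score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost > budget:
            continue  # skip this one; a smaller, lower-scoring passage may still fit
        selected.append(text)
        used += cost
    return selected


passages = [
    (0.9, "refund policy lasts 30 days"),
    (0.2, "office party photos from last year"),
    (0.7, "refunds require a receipt"),
]
print(select_context(passages, budget=9))
# ['refund policy lasts 30 days', 'refunds require a receipt']
```

The low-scoring, off-topic passage is dropped even though a larger window could have held it, which is the point: the budget forces a selection decision.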

Teams sometimes keep a stable prompt prefix or retrieved block after selection. In those cases, caching can reduce repeated LLM work without changing what the model receives. The cost and latency tradeoff is also an LLM Deployment concern.^[2]
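
One way to picture application-level caching is an exact-match cache keyed on the full prompt. This is a hypothetical sketch: `PrefixResponseCache` and the `call_llm` callable are illustrative names. It is not the same as provider-side prompt caching, which reuses computation for a shared prefix; here, on a miss, the model receives exactly the prompt it would have received anyway.

```python
import hashlib
from typing import Callable


class PrefixResponseCache:
    """Exact-match cache for prompts built from a stable prefix plus a query."""

    def __init__(self, call_llm: Callable[[str], str]) -> None:
        # `call_llm` is a hypothetical client function: prompt in, completion out.
        self.call_llm = call_llm
        self._store: dict[str, str] = {}
        self.hits = 0

    def ask(self, prefix: str, query: str) -> str:
        prompt = prefix + "\n\n" + query
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1  # identical prompt seen before: skip the model call
            return self._store[key]
        answer = self.call_llm(prompt)  # the model sees the unchanged prompt
        self._store[key] = answer
        return answer


# Usage with a stand-in model that just counts calls.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer {len(calls)}"


cache = PrefixResponseCache(fake_llm)
cache.ask("SYSTEM PROMPT", "q1")
cache.ask("SYSTEM PROMPT", "q1")
cache.ask("SYSTEM PROMPT", "q2")
print(len(calls), cache.hits)  # 2 1
```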

Context rot describes how precision and relevance can degrade as prompts grow longer. Important instructions may need prominent placement at both the start and the end of the prompt.^[3] For context engineering, “more context” isn’t automatically safer; engineers still decide what deserves attention.
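
A minimal sketch of placing instructions at both ends, assuming a plain-text prompt with labeled sections. The section labels and the `build_prompt` name are illustrative; whether the repetition helps a particular model is something to check in evaluation rather than assume.

```python
def build_prompt(instructions: str, context_blocks: list[str], question: str) -> str:
    """Put key instructions before and after a long context section."""
    sections = [
        "Instructions:\n" + instructions,
        "Context:\n" + "\n\n".join(context_blocks),
        "Question:\n" + question,
        # Repeat the instructions at the end so they are not buried mid-prompt.
        "Reminder:\n" + instructions,
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    "Answer only from the context.", ["chunk one", "chunk two"], "What changed?"
)
print(prompt)
```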

Hugo Bowne-Anderson connects context rot to chunking strategy. Fixed-length chunks are a fast starting point, while sliding windows can preserve continuity across boundaries. Neither choice is complete until the team reviews the failures. The chunking rule should change when retrieval misses the useful passage or when the model receives too much distracting context.^[4]
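
The two starting points can be sketched over a list of words. This assumes word-level chunking for readability; production pipelines often count tokens or characters instead. `size` and `overlap` are illustrative parameters. The overlap is what lets text cut at one boundary appear intact at the start of the next chunk.

```python
def fixed_chunks(words: list[str], size: int) -> list[list[str]]:
    """Consecutive, non-overlapping chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def sliding_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Chunks of `size` words where neighboring chunks share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


words = "a b c d e f g".split()
print(fixed_chunks(words, 3))       # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(sliding_chunks(words, 3, 1))  # [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```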

Long-Context Boundaries

Financial LLM benchmarking adds evaluation evidence for the same boundary. Long-context tests, split into inputs below and above 32k tokens, showed a clear performance dip around that boundary. The same specialized-domain tests exposed pitfalls that public benchmarks can hide.^[5]
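
To look for a dip like this in your own evaluations, one hedged approach is to bucket results by prompt length and compare accuracy on each side of the boundary. The sketch assumes each result is a `(prompt_tokens, is_correct)` pair from your own test set; the 32k default mirrors the split described above, and the sample data is invented purely to show the output shape.

```python
from collections import defaultdict


def accuracy_by_length(
    results: list[tuple[int, bool]], boundary: int = 32_000
) -> dict[str, float]:
    """Compute accuracy separately for prompts below and at/above `boundary` tokens."""
    buckets: dict[str, list[bool]] = defaultdict(list)
    for tokens, correct in results:
        label = f"<{boundary}" if tokens < boundary else f">={boundary}"
        buckets[label].append(correct)
    # sum() over booleans counts the correct answers in each bucket
    return {label: sum(v) / len(v) for label, v in buckets.items()}


# Invented toy results, only to illustrate the shape of the output.
toy = [(8_000, True), (20_000, True), (31_000, False), (40_000, False), (90_000, True)]
print(accuracy_by_length(toy))
```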

At the bank, the practical response was still to chunk large inputs before downstream processing, even when the team used models advertised with much larger windows.^[5]

Long-context models can help, but teams still use retrieval or preprocessing for large material, and summarization can help when the material is specialized or hard to verify. Lavanya explicitly names chunking, retrieval, and summarization as fallbacks to sending the whole document blindly. The team needs evidence for when each path is reliable, which puts long-context work next to Long-Context LLM Evaluation and LLM Evaluation Workflows.^[6]
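
The fallback choice can be written as an explicit routing rule so it is easy to test and revise. This is a hypothetical sketch: the flags and path names are illustrative, and `reliable_limit` should come from the team's own long-context evaluation rather than a model's advertised window.

```python
def choose_path(
    doc_tokens: int,
    reliable_limit: int,
    needs_specific_passage: bool,
    needs_overview: bool,
) -> str:
    """Pick how to handle a document instead of sending it whole by default."""
    if doc_tokens <= reliable_limit:
        return "full_context"  # within the range the team has verified
    if needs_specific_passage:
        return "retrieval"  # fetch only the passages that answer the question
    if needs_overview:
        return "summarize"  # condense first, then reason over the summary
    return "chunk"  # process the document piece by piece


print(choose_path(120_000, reliable_limit=32_000,
                  needs_specific_passage=True, needs_overview=False))  # retrieval
```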

Chunking and Source Structure

Chunking is one of the most visible context engineering decisions: teams choose units that match the structure of the data.^[3]

Podcast transcripts can use question-and-answer pairs or speaker turns, while multi-person conversations may work better with topic-based chunks. Look at the raw source before choosing one split rule.^[3]
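
For a podcast transcript, speaker turns and question-and-answer pairs can be derived from lines in a `Speaker: text` format. That format, the `host` parameter, and the function names are assumptions for this sketch; real transcripts often need cleanup (timestamps, lines whose colons are not speaker labels) before a rule like this works.

```python
import re

# Assumed line format: "Speaker: text". Anything before the first colon is the speaker.
TURN_RE = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.*)$")


def speaker_turns(transcript: str) -> list[tuple[str, str]]:
    """Merge consecutive lines from the same speaker into one turn."""
    turns: list[tuple[str, str]] = []
    for line in transcript.splitlines():
        match = TURN_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and lines without a speaker label
        speaker, text = match["speaker"].strip(), match["text"].strip()
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def qa_pairs(turns: list[tuple[str, str]], host: str) -> list[str]:
    """Pair each host turn with the non-host turn that directly follows it."""
    pairs: list[str] = []
    for (s1, t1), (s2, t2) in zip(turns, turns[1:]):
        if s1 == host and s2 != host:
            pairs.append(f"Q ({s1}): {t1}\nA ({s2}): {t2}")
    return pairs


transcript = """Host: What is context rot?
Guest: Long prompts lose precision.
Guest: Placement matters too.
Host: How do you chunk transcripts?
Guest: By speaker turns first."""
pairs = qa_pairs(speaker_turns(transcript), host="Host")
print(len(pairs))  # 2
print(pairs[0])
```

Each pair keeps the question with its answer, so a retrieved chunk carries the context that a bare answer turn would lose.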

Start with fixed-length chunks, then refine based on observed failures.^[3] Chunk overlap belongs in the same decision because references can cross chunk boundaries. If a retrieved chunk says “they” or “that result” without its neighboring context, the model may receive a similar but unusable passage.^[7]
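
Overlap is one fix. Another heuristic, sketched here as an assumption rather than a recommended method, is to pull in the preceding chunk when a retrieved chunk opens with an unresolved reference such as “they” or “that result.” The opener list is illustrative and deliberately crude; it will miss many references and flag some harmless ones.

```python
# Illustrative openers that often point back to earlier text.
DANGLING_OPENERS = ("they ", "this ", "that ", "these ", "those ", "it ")


def expand_dangling(chunks: list[str], hit: int) -> str:
    """Return chunk `hit`, prefixed by its predecessor if it starts with a back-reference."""
    text = chunks[hit]
    if hit > 0 and text.lower().startswith(DANGLING_OPENERS):
        return chunks[hit - 1] + " " + text
    return text


chunks = [
    "Model A scored higher than the baseline.",
    "That result held on the holdout set.",
    "Costs were reviewed separately.",
]
print(expand_dangling(chunks, 1))
# Model A scored higher than the baseline. That result held on the holdout set.
```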

A chunk is lossy when it drops surrounding context.^[1] Useful chunks keep their source context, the questions they target, and current findings. That connects chunking to Embeddings and RAG: retrieval quality depends not only on vector similarity but also on whether the retrieved unit is self-contained enough for the model to use.
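
A minimal sketch of a self-contained chunk, assuming the pipeline embeds (and later shows the model) a header built from source metadata along with the text. The `Chunk` fields are hypothetical; the idea is that a retrieved unit carries enough of its origin, and the questions it is meant to answer, to be usable on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source: str  # e.g. document or episode title (hypothetical field)
    section: str  # heading the text came from
    target_questions: list[str] = field(default_factory=list)

    def contextualized(self) -> str:
        """Text to embed and show the model: origin header, target questions, body."""
        header = f"[{self.source} > {self.section}]"
        if self.target_questions:
            header += "\nAnswers: " + "; ".join(self.target_questions)
        return header + "\n" + self.text


chunk = Chunk(
    text="Overlap lets a sentence cut at one boundary appear whole in the next chunk.",
    source="Chunking guide",
    section="Sliding windows",
    target_questions=["Why do chunks overlap?"],
)
print(chunk.contextualized())
```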

Metadata, Wrappers, and Tools

Context engineering also includes the wrapper around retrieved information. Wrappers present chunks in a form the LLM can use. Tool lists and prior problem-solving examples are also context that influences the output.^[1]
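
A wrapper can be as simple as clearly delimited sections for retrieved chunks and available tools. The XML-style tags, the `source` key, and the tool name below are assumptions; tag conventions vary by model and team, and this sketch does not escape quotes or angle brackets in its inputs.

```python
def wrap_context(chunks: list[dict[str, str]], tools: list[str]) -> str:
    """Render retrieved chunks and tool names as delimited prompt sections."""
    lines = ["<documents>"]
    for i, chunk in enumerate(chunks, start=1):
        # Source metadata travels with each chunk so the model can tell sources apart.
        lines.append(f'<document index="{i}" source="{chunk["source"]}">')
        lines.append(chunk["text"])
        lines.append("</document>")
    lines.append("</documents>")
    if tools:
        lines.append("<tools>")
        lines.extend(f"- {name}" for name in tools)
        lines.append("</tools>")
    return "\n".join(lines)


print(wrap_context([{"source": "faq.md", "text": "Refunds take five days."}],
                   tools=["search_orders"]))
```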

Repository files, error messages, and nearby tests become context in AI coding tools. Better context selection changes the quality of the generated diff.^[2] In AI Engineering Portfolios, those artifacts show which context the system selected and why.

For Agent Engineering, context can include tools and API affordances. It can also include memory, source metadata, user state, and similar-problem history. The Game AI to LLM Agents bridge is useful here because game AI makes state and actions explicit. It also treats feedback and environments as part of the design. Those ideas reappear as tool lists, scratchpads, and task state in LLM agents.^[8]
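
Exposing tools only when the task needs them can be sketched as filtering a registry by the capabilities a task requires. The registry, tool names, and capability tags are hypothetical; real agent frameworks usually describe tools with schemas rather than tag sets.

```python
# Hypothetical tool registry: each tool lists the capabilities it provides.
TOOLS: dict[str, set[str]] = {
    "search_docs": {"lookup"},
    "get_order_status": {"current_state"},
    "issue_refund": {"action"},
}


def tools_for(task_needs: set[str], registry: dict[str, set[str]] = TOOLS) -> list[str]:
    """Return only the tools whose capabilities overlap the task's needs."""
    return sorted(name for name, caps in registry.items() if caps & task_needs)


print(tools_for({"lookup"}))                   # ['search_docs']
print(tools_for({"lookup", "current_state"}))  # ['get_order_status', 'search_docs']
```

A task that only needs lookup never sees `issue_refund`, which keeps both the prompt and the agent's action space smaller.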

Search isn’t always the whole answer. Search and information retrieval are tools an agent may use when needed, not a flow to apply everywhere.^[1]

Memory adds another context boundary. Retrieval memory stores facts or documents the system can look up later. Conversation memory decides what from the interaction history should remain active. Many single-turn systems don’t need either one. Add memory only when the task requires durable user, document, or workflow state.^[9]
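
The two kinds of memory can be sketched separately. This is a minimal illustration under stated assumptions: conversation memory is a fixed-size window of recent exchanges, and retrieval memory ranks stored facts by word overlap as a stand-in for embedding search. Both class names are hypothetical.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent `max_turns` exchanges active in the prompt."""

    def __init__(self, max_turns: int) -> None:
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))  # the oldest turn drops out when full

    def render(self) -> str:
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)


class RetrievalMemory:
    """Durable facts looked up by word overlap (a stand-in for embedding search)."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def lookup(self, query: str, k: int = 1) -> list[str]:
        words = set(query.lower().split())

        def overlap(fact: str) -> int:
            return len(words & set(fact.lower().split()))

        ranked = sorted(self.facts, key=overlap, reverse=True)
        return [fact for fact in ranked[:k] if overlap(fact) > 0]


chat = ConversationMemory(max_turns=2)
for user, bot in [("hi", "hello"), ("a", "b"), ("c", "d")]:
    chat.add(user, bot)
print(chat.render())  # only the last two exchanges remain

facts = RetrievalMemory()
facts.remember("user prefers python examples")
facts.remember("invoices go to billing")
print(facts.lookup("which language does the user prefer"))
# ['user prefers python examples']
```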

For personal assistants that move from one-off drafting to remembered workflows, see AI tools for personal productivity.

RAG, Agents, and Scope

The boundary between RAG and agents is a context decision. A restrained edtech RAG use case doesn’t need to become an all-purpose tutor. A team could use a simple RAG bot with good chunking and embeddings.

That bot could answer common support questions, such as “which class contained a lesson,” and resolve a meaningful share of tickets quickly.^[3]

RAG fits large search spaces and simple question answering over many documents. When the task depends on current state or dynamic planning, context engineering becomes part of agent orchestration. The same shift happens when the system needs multiple data sources, API integrations, or Multi-Agent Systems.^[1]
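
The criteria above can be expressed as a simple routing check. This is a hypothetical sketch: the `TaskProfile` fields are illustrative, and a real decision would also weigh cost, latency, and evaluation results.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    needs_current_state: bool = False
    needs_dynamic_planning: bool = False
    data_sources: int = 1
    needs_api_calls: bool = False


def choose_architecture(task: TaskProfile) -> str:
    """Default to RAG; move to agent orchestration only when the task requires it."""
    if (
        task.needs_current_state
        or task.needs_dynamic_planning
        or task.data_sources > 1
        or task.needs_api_calls
    ):
        return "agent"
    return "rag"


print(choose_architecture(TaskProfile()))                          # rag
print(choose_architecture(TaskProfile(needs_current_state=True)))  # agent
```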

Knowledge management is the hard part of many AI engineering systems. The team has to model knowledge so an agent or RAG system can access it. Chunks, metadata, a knowledge graph, or another retrieval layer can provide that context.^[10]

Tool calls fit when the simpler RAG path can’t answer the user’s question. Tools increase both power and system complexity.^[3]

Hugo recommends starting with a useful RAG path before adding tools for current state, external APIs, or actions. That keeps Agent Engineering from becoming the default answer for every retrieval problem.^[11] The same RAG-to-tools ordering appears in LLM and RAG Production Roadmap.
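
The RAG-first ordering can be sketched as a fallback: answer from retrieved passages when they look relevant, and call a tool only otherwise. The similarity threshold is a hypothetical signal for “the RAG path can’t answer”; in practice teams may need a better check, such as an evaluated answerability classifier. All function names here are illustrative stand-ins.

```python
from typing import Callable


def answer(
    question: str,
    retrieve: Callable[[str], list[tuple[float, str]]],
    generate: Callable[[str, list[str]], str],
    call_tool: Callable[[str], str],
    min_score: float = 0.5,
) -> tuple[str, str]:
    """Try the RAG path first; fall back to a tool only when retrieval finds nothing usable."""
    passages = [text for score, text in retrieve(question) if score >= min_score]
    if passages:
        return "rag", generate(question, passages)
    return "tool", call_tool(question)


# Stand-in components for illustration only.
def fake_retrieve(q: str) -> list[tuple[float, str]]:
    if "lesson" in q.lower():
        return [(0.8, "Lesson 3 is part of Class B.")]
    return [(0.1, "unrelated")]


def fake_generate(q: str, passages: list[str]) -> str:
    return passages[0]


def fake_tool(q: str) -> str:
    return "tool result"


print(answer("Which class contains lesson 3?", fake_retrieve, fake_generate, fake_tool))
# ('rag', 'Lesson 3 is part of Class B.')
print(answer("What is my order status?", fake_retrieve, fake_generate, fake_tool))
# ('tool', 'tool result')
```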

Context decisions also shape adjacent areas: agents, retrieval, production paths, prompts, embeddings, and long-context evaluation.

DataTalks.Club