Architecture Overview
As agents tackle longer-horizon tasks, model performance degrades as the context window fills. This drives a core principle: context must be treated as a finite resource with diminishing marginal returns. These design patterns — drawn from Claude Code ($1B run rate), Manus ($2B acquisition), and other production agents — show how to strategically populate the context window with only the information essential to the agent's immediate next step.
Based on Agent Design Patterns by Lance Martin (Jan 2026)
Seven Agent Design Patterns
These patterns address the central challenge of context engineering — getting the right information into the context window at the right time, while keeping it lean enough for high-quality reasoning.
| Pattern | Core Idea | How It Works | Example |
|---|---|---|---|
| Give Agents a Computer | OS-layer access (filesystem + shell) | Agents interact with the operating system via CLI, gaining persistent storage and executable capabilities beyond isolated tool sets | Claude Code, Manus — agents that write, execute, and iterate on code directly |
| Multi-Layer Action Space | Hierarchical tools instead of flat tool lists | 6-20 atomic tool definitions at the top layer; complex workflows executed via shell/code at the computer layer (CodeAct pattern) | Agent calls a "run_code" tool that executes a script invoking dozens of APIs, avoiding intermediate tool result bloat |
| Progressive Disclosure | Reveal information on demand, not upfront | Tool indexing retrieves definitions on demand; skill folders store detailed docs agents access selectively; agents call --help flags when needed | Cursor Agent syncs MCP tool descriptions to folders, providing abbreviated lists with full docs retrievable on demand |
| Offload Context | Write old results to filesystem storage | Agent writes tool results and trajectories to files. Plans stored as files and periodically reloaded to reinforce objectives. Selective summarization only when offloading value diminishes | Agent writes intermediate research findings to scratch files, reads them back when synthesizing final answer |
| Cache Context | Prompt caching changes agent economics | Resume from cached prefixes instead of replaying linear chat history. Manus identified cache hit rate as the most critical production metric. Without caching, coding agents become economically prohibitive | A higher-capability model with caching can cost less than a lower-capability model without it |
| Isolate Context | Sub-agents with independent context windows | Each sub-agent has its own context, tools, and instructions. Enables parallelizable tasks and long-running sequential loops. Git-backed coordination communicates progress across instances | "Ralph Wiggum" pattern: sequential agents each tackle one discrete plan item, coordinating via git history |
| Evolve Context | Continual learning in token space | Analyze past sessions, extract learnings, update master documentation. Diary-based memory distills sessions into concise entries. Skill extraction saves reusable procedures as new skills | GEPA framework: collect trajectories, score outcomes, refine prompt variants over time |
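The CodeAct idea behind the Multi-Layer Action Space row can be sketched in a few lines. This is an illustrative toy, not any product's implementation: `fetch_price` stands in for a verbose API, and `run_code` is a minimal "computer layer" that executes agent-written code and captures only the final printed output.

```python
import io
import contextlib

def fetch_price(ticker: str) -> float:
    """Stand-in for a verbose API; a real response might be kilobytes of JSON."""
    return {"AAPL": 230.10, "MSFT": 415.30, "NVDA": 118.90}[ticker]

def run_code(script: str) -> str:
    """Minimal 'computer layer': execute agent-written code, return stdout only."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(script, {"fetch_price": fetch_price})
    return buf.getvalue().strip()  # only this string re-enters the context window

# The agent composes, loops, and aggregates in code; the three full API
# responses never pass through the model's context.
script = """
prices = {t: fetch_price(t) for t in ["AAPL", "MSFT", "NVDA"]}
print(f"max: {max(prices, key=prices.get)}")
"""
result = run_code(script)
```

With a flat tool list, each of those three lookups would round-trip through the context window as a separate tool result; here the model sees a single short string.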
Context Engineering Deep Dive
The fundamental insight: context engineering is not prompt engineering. It is the discipline of building systems that populate the context window with exactly the right information at each step of agent execution.
- Finite Resource Model: Context windows have diminishing marginal returns — adding more information beyond a threshold actively degrades performance. Every token in the window must earn its place.
- CodeAct Pattern: Instead of processing intermediate tool results in context, agents execute code that captures only final outputs. This avoids context bloat from verbose API responses and lets agents compose, loop, and branch over tool calls in code.
- Cache Hit Rate as North Star: Manus identified prompt cache hit rate as the single most important metric for production agent economics. High cache hit rates make high-capability models affordable; low hit rates make even cheap models expensive at scale.
- Selective Summarization: Not all context should be summarized — compress only when the cost of keeping full context exceeds the information loss from summarization. Plans and critical decisions should be preserved verbatim.
- File-Backed Memory: The filesystem serves as an external memory bank. Agents write plans, intermediate results, and learnings to files, then selectively read them back. This decouples memory capacity from context window size.
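The file-backed memory and offloading points above can be made concrete with a small sketch. The layout and helper names (`offload`, `recall`, a per-session scratch directory) are invented for illustration; the pattern is simply "verbose result goes to disk, short pointer stays in context, full text is read back only when needed."

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())  # per-session scratch directory

def offload(name: str, content: str) -> str:
    """Write a verbose tool result to disk; only a short pointer stays in context."""
    path = workdir / f"{name}.md"
    path.write_text(content)
    return f"[saved {len(content)} chars to {path.name}]"  # what the model sees

def recall(name: str) -> str:
    """Selectively read a result back when it becomes relevant again."""
    return (workdir / f"{name}.md").read_text()

# A long scrape result costs thousands of tokens in context but only a
# one-line pointer once offloaded.
pointer = offload("findings_step3", "Long web-scrape output... " * 50)
full = recall("findings_step3")
```

This decouples memory capacity from window size: the agent can accumulate arbitrarily many findings on disk while the context holds only the pointers and the plan.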
Emerging Frontiers
- Learned Context Management: Instead of hand-crafted compression strategies, models may learn their own context management. Recursive Language Models (RLM) suggest LLMs could absorb scaffolding currently embedded in agent harnesses. Sleep-time compute enables agents to reflect offline, consolidating memories without explicit prompting.
- Multi-Agent Coordination at Scale: Scaling to concurrent agent swarms introduces shared-context and conflict-resolution challenges. Gas Town demonstrates coordination using git-backed tracking, a specialized "Mayor" agent maintaining workspace context, and merge queues for parallel work.
- Infrastructure for Long-Running Agents: Production requirements include observability into agent behavior, human-review hooks, graceful degradation frameworks, standardized debugging interfaces, and human-in-the-loop monitoring — most of which remain immature.
Key Design Decisions
| Decision | Chosen Approach | Alternative | Rationale |
|---|---|---|---|
| Agent-computer interface | OS-level access (filesystem + shell) | Sandboxed tool-only access | OS access provides persistent storage, executable capabilities, and scales to arbitrary workflows. Tool-only access limits agents to predefined actions. |
| Action space design | Multi-layer: minimal tools + computer layer | Flat list of all available tools | Dozens of tool definitions pollute context. A hierarchical action space keeps the prompt lean while enabling complex workflows via code execution. |
| Information loading | Progressive disclosure (retrieve on demand) | Load all tool/context upfront | Upfront loading wastes context on information the agent may never need. On-demand retrieval keeps context focused on the current subtask. |
| Memory architecture | File-backed offloading + selective recall | Keep everything in context window | Context windows are finite with diminishing returns. Filesystem storage provides unlimited capacity; selective recall retrieves only what's relevant. |
| Cost optimization | Prompt caching with cache hit rate as KPI | Use cheaper models to reduce cost | A high-capability model with high cache hit rate can cost less than a low-capability model without caching, while producing better results. |
| Multi-agent coordination | Isolated contexts + git-backed coordination | Shared context window across agents | Isolated contexts prevent cross-contamination and enable parallelism. Git provides durable, auditable coordination without shared-state complexity. |
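The cost-optimization row hinges on one structural property: prefix caches only hit when the serialized prompt of turn N begins byte-for-byte with turn N-1's prompt. A minimal sketch of cache-friendly prompt assembly, assuming a generic message-list API (no specific provider's SDK):

```python
# Stable prefix: never edited mid-session, no timestamps, fixed tool order.
SYSTEM = "You are a coding agent."
TOOLS = ["read_file", "write_file", "bash"]

def build_prompt(history: list[dict]) -> list[dict]:
    # Append-only: never rewrite, reorder, or summarize earlier turns in place.
    # Any edit above the first changed byte invalidates the cached prefix.
    system_msg = {"role": "system", "content": SYSTEM + " Tools: " + ",".join(TOOLS)}
    return [system_msg] + history

turn1 = build_prompt([{"role": "user", "content": "fix the bug"}])
turn2 = build_prompt([{"role": "user", "content": "fix the bug"},
                      {"role": "assistant", "content": "done"}])
```

The invariant worth designing for: `turn2[:len(turn1)] == turn1`. Dynamic content (current time, shuffled tool lists, in-place summarization) silently breaks it and drives the cache hit rate, and the economics, off a cliff.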
Interview Talking Points
- Context is a finite resource with diminishing returns: Adding more information to the context window eventually degrades performance. The core skill of agent design is curating what goes in — not maximizing what goes in. This principle drives every pattern in this section.
- Give agents a computer, not just tools: The most successful agents (Claude Code, Manus) interact at the OS layer — filesystem for persistence, shell for execution. This is fundamentally different from giving an LLM a list of API tools. It enables agents to write their own tools, store intermediate results, and compose arbitrary workflows.
- Multi-layer action spaces prevent context pollution: Instead of loading 50+ tool definitions into context, use 6-20 atomic tools at the top layer and let the agent execute complex workflows via code at the computer layer. The CodeAct pattern avoids processing verbose intermediate results by capturing only final outputs.
- Cache hit rate is the most important production metric: Manus found that prompt cache hit rate determines agent economics more than model choice. High cache hit rates make powerful models affordable; without caching, even coding agents become economically prohibitive. Design your agent's prompt structure to maximize cache hits.
- Progressive disclosure keeps context lean: Don't load all tool definitions and documentation upfront. Use tool indexing, skill folders, and help flags to let agents retrieve information on demand. This mirrors how human developers work — you don't read every man page before starting a task.
- Evolving context is the path to continual improvement: Agents that reflect on past trajectories, extract reusable skills, and update master documentation improve over time without retraining. This operates in "token space" — updating what goes into context rather than updating model weights.
- Multi-agent coordination is an unsolved problem at scale: Isolated context windows with git-backed coordination work for small agent teams, but scaling to concurrent swarms requires new infrastructure — observability, merge queues, human-review hooks, and graceful degradation. This is an active area of research worth flagging in an interview.
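The progressive-disclosure talking point maps directly to a tool index: an abbreviated listing in the prompt, with full documentation retrieved on demand (the analogue of a `--help` flag). The tool names and doc strings below are invented for illustration.

```python
# Full documentation lives outside the prompt; only the index is loaded upfront.
TOOL_DOCS = {
    "search_web": "search_web(query) -> results. Full docs: rate limits, "
                  "result schema, pagination flags, error codes ...",
    "run_sql": "run_sql(query) -> rows. Full docs: dialects, timeouts ...",
}

def tool_index() -> str:
    """Abbreviated list placed in the prompt: name plus first sentence only."""
    return "\n".join(f"- {name}: {doc.split('.')[0]}." for name, doc in TOOL_DOCS.items())

def describe_tool(name: str) -> str:
    """Retrieved on demand when the agent decides it needs a tool's details."""
    return TOOL_DOCS[name]

index = tool_index()
```

The prompt carries one line per tool; the agent pays the token cost of full documentation only for the tools it actually decides to use.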