Architecture Overview
As agents tackle longer-horizon tasks, model performance degrades as the context window fills. This drives a core principle: context must be treated as a finite resource with diminishing marginal returns. These design patterns — drawn from Claude Code ($1B run rate), Manus ($2B acquisition), and other production agents — show how to strategically populate the context window with only the information essential to the agent's immediate next step.
Based on Agent Design Patterns by Lance Martin (Jan 2026)
Seven Agent Design Patterns
These patterns address the central challenge of context engineering — getting the right information into the context window at the right time, while keeping it lean enough for high-quality reasoning.
| Pattern | Core Idea | How It Works | Example |
|---|---|---|---|
| Give Agents a Computer | OS-layer access (filesystem + shell) | Agents interact with the operating system via CLI, gaining persistent storage and executable capabilities beyond isolated tool sets | Claude Code, Manus — agents that write, execute, and iterate on code directly |
| Multi-Layer Action Space | Hierarchical tools instead of flat tool lists | 6-20 atomic tool definitions at the top layer; complex workflows executed via shell/code at the computer layer (CodeAct pattern) | Agent calls a "run_code" tool that executes a script invoking dozens of APIs, avoiding intermediate tool result bloat |
| Progressive Disclosure | Reveal information on demand, not upfront | Tool indexing retrieves definitions on demand; skill folders store detailed docs agents access selectively; agents call --help flags when needed | Cursor Agent syncs MCP tool descriptions to folders, providing abbreviated lists with full docs retrievable on demand |
| Offload Context | Write old results to filesystem storage | Agent writes tool results and trajectories to files. Plans stored as files and periodically reloaded to reinforce objectives. Selective summarization only when offloading value diminishes | Agent writes intermediate research findings to scratch files, reads them back when synthesizing final answer |
| Cache Context | Prompt caching changes agent economics | Resume from cached prefixes instead of replaying linear chat history. Manus identified cache hit rate as the most critical production metric. Without caching, coding agents become economically prohibitive | A higher-capability model with caching can cost less than a lower-capability model without it |
| Isolate Context | Sub-agents with independent context windows | Each sub-agent has its own context, tools, and instructions. Enables parallelizable tasks and long-running sequential loops. Git-backed coordination communicates progress across instances | "Ralph Wiggum" pattern: sequential agents each tackle one discrete plan item, coordinating via git history |
| Evolve Context | Continual learning in token space | Analyze past sessions, extract learnings, update master documentation. Diary-based memory distills sessions into concise entries. Skill extraction saves reusable procedures as new skills | GEPA framework: collect trajectories, score outcomes, refine prompt variants over time |
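The CodeAct idea behind the Multi-Layer Action Space row can be sketched in a few lines. This is an illustrative toy, not any product's implementation: `fetch_price` stands in for a verbose API, and `run_code` is a minimal "computer layer" that executes agent-written code and captures only the final printed output.

```python
import io
import contextlib

def fetch_price(ticker: str) -> float:
    """Stand-in for a verbose API; a real response might be kilobytes of JSON."""
    return {"AAPL": 230.10, "MSFT": 415.30, "NVDA": 118.90}[ticker]

def run_code(script: str) -> str:
    """Minimal 'computer layer': execute agent-written code, return stdout only."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(script, {"fetch_price": fetch_price})
    return buf.getvalue().strip()  # only this string re-enters the context window

# The agent composes, loops, and aggregates in code; the three full API
# responses never pass through the model's context.
script = """
prices = {t: fetch_price(t) for t in ["AAPL", "MSFT", "NVDA"]}
print(f"max: {max(prices, key=prices.get)}")
"""
result = run_code(script)
```

With a flat tool list, each of those three lookups would round-trip through the context window as a separate tool result; here the model sees a single short string.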
Context Engineering Deep Dive
The fundamental insight: context engineering is not prompt engineering. It is the discipline of building systems that populate the context window with exactly the right information at each step of agent execution.
- Finite Resource Model: Context windows have diminishing marginal returns — adding more information beyond a threshold actively degrades performance. Every token in the window must earn its place.
- CodeAct Pattern: Instead of processing intermediate tool results in context, agents execute code that captures only final outputs. This avoids context bloat from verbose API responses and lets agents compose, loop, and branch over tool calls in code.
- Cache Hit Rate as North Star: Manus identified prompt cache hit rate as the single most important metric for production agent economics. High cache hit rates make high-capability models affordable; low hit rates make even cheap models expensive at scale.
- Selective Summarization: Not all context should be summarized — compress only when the cost of keeping full context exceeds the information loss from summarization. Plans and critical decisions should be preserved verbatim.
- File-Backed Memory: The filesystem serves as an external memory bank. Agents write plans, intermediate results, and learnings to files, then selectively read them back. This decouples memory capacity from context window size.
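The file-backed memory and offloading points above can be made concrete with a small sketch. The layout and helper names (`offload`, `recall`, a per-session scratch directory) are invented for illustration; the pattern is simply "verbose result goes to disk, short pointer stays in context, full text is read back only when needed."

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())  # per-session scratch directory

def offload(name: str, content: str) -> str:
    """Write a verbose tool result to disk; only a short pointer stays in context."""
    path = workdir / f"{name}.md"
    path.write_text(content)
    return f"[saved {len(content)} chars to {path.name}]"  # what the model sees

def recall(name: str) -> str:
    """Selectively read a result back when it becomes relevant again."""
    return (workdir / f"{name}.md").read_text()

# A long scrape result costs thousands of tokens in context but only a
# one-line pointer once offloaded.
pointer = offload("findings_step3", "Long web-scrape output... " * 50)
full = recall("findings_step3")
```

This decouples memory capacity from window size: the agent can accumulate arbitrarily many findings on disk while the context holds only the pointers and the plan.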
Emerging Frontiers
- Learned Context Management: Instead of hand-crafted compression strategies, models may learn their own context management. Recursive Language Models (RLM) suggest LLMs could absorb scaffolding currently embedded in agent harnesses. Sleep-time compute enables agents to reflect offline, consolidating memories without explicit prompting.
- Multi-Agent Coordination at Scale: Scaling to concurrent agent swarms introduces shared-context and conflict-resolution challenges. Gas Town demonstrates coordination using git-backed tracking, a specialized "Mayor" agent maintaining workspace context, and merge queues for parallel work.
- Infrastructure for Long-Running Agents: Production requirements include observability into agent behavior, human-review hooks, graceful degradation frameworks, standardized debugging interfaces, and human-in-the-loop monitoring — most of which remain immature.
Key Design Decisions
| Decision | Chosen Approach | Alternative | Rationale |
|---|---|---|---|
| Agent-computer interface | OS-level access (filesystem + shell) | Sandboxed tool-only access | OS access provides persistent storage, executable capabilities, and scales to arbitrary workflows. Tool-only access limits agents to predefined actions. |
| Action space design | Multi-layer: minimal tools + computer layer | Flat list of all available tools | Dozens of tool definitions pollute context. A hierarchical action space keeps the prompt lean while enabling complex workflows via code execution. |
| Information loading | Progressive disclosure (retrieve on demand) | Load all tool/context upfront | Upfront loading wastes context on information the agent may never need. On-demand retrieval keeps context focused on the current subtask. |
| Memory architecture | File-backed offloading + selective recall | Keep everything in context window | Context windows are finite with diminishing returns. Filesystem storage provides unlimited capacity; selective recall retrieves only what's relevant. |
| Cost optimization | Prompt caching with cache hit rate as KPI | Use cheaper models to reduce cost | A high-capability model with high cache hit rate can cost less than a low-capability model without caching, while producing better results. |
| Multi-agent coordination | Isolated contexts + git-backed coordination | Shared context window across agents | Isolated contexts prevent cross-contamination and enable parallelism. Git provides durable, auditable coordination without shared-state complexity. |
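The cost-optimization row hinges on one structural property: prefix caches only hit when the serialized prompt of turn N begins byte-for-byte with turn N-1's prompt. A minimal sketch of cache-friendly prompt assembly, assuming a generic message-list API (no specific provider's SDK):

```python
# Stable prefix: never edited mid-session, no timestamps, fixed tool order.
SYSTEM = "You are a coding agent."
TOOLS = ["read_file", "write_file", "bash"]

def build_prompt(history: list[dict]) -> list[dict]:
    # Append-only: never rewrite, reorder, or summarize earlier turns in place.
    # Any edit above the first changed byte invalidates the cached prefix.
    system_msg = {"role": "system", "content": SYSTEM + " Tools: " + ",".join(TOOLS)}
    return [system_msg] + history

turn1 = build_prompt([{"role": "user", "content": "fix the bug"}])
turn2 = build_prompt([{"role": "user", "content": "fix the bug"},
                      {"role": "assistant", "content": "done"}])
```

The invariant worth designing for: `turn2[:len(turn1)] == turn1`. Dynamic content (current time, shuffled tool lists, in-place summarization) silently breaks it and drives the cache hit rate, and the economics, off a cliff.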
Interview Talking Points
- Context is a finite resource with diminishing returns: Adding more information to the context window eventually degrades performance. The core skill of agent design is curating what goes in — not maximizing what goes in. This principle drives every pattern in this section.
- Give agents a computer, not just tools: The most successful agents (Claude Code, Manus) interact at the OS layer — filesystem for persistence, shell for execution. This is fundamentally different from giving an LLM a list of API tools. It enables agents to write their own tools, store intermediate results, and compose arbitrary workflows.
- Multi-layer action spaces prevent context pollution: Instead of loading 50+ tool definitions into context, use 6-20 atomic tools at the top layer and let the agent execute complex workflows via code at the computer layer. The CodeAct pattern avoids processing verbose intermediate results by capturing only final outputs.
- Cache hit rate is the most important production metric: Manus found that prompt cache hit rate determines agent economics more than model choice. High cache hit rates make powerful models affordable; without caching, even coding agents become economically prohibitive. Design your agent's prompt structure to maximize cache hits.
- Progressive disclosure keeps context lean: Don't load all tool definitions and documentation upfront. Use tool indexing, skill folders, and help flags to let agents retrieve information on demand. This mirrors how human developers work — you don't read every man page before starting a task.
- Evolving context is the path to continual improvement: Agents that reflect on past trajectories, extract reusable skills, and update master documentation improve over time without retraining. This operates in "token space" — updating what goes into context rather than updating model weights.
- Multi-agent coordination is an unsolved problem at scale: Isolated context windows with git-backed coordination work for small agent teams, but scaling to concurrent swarms requires new infrastructure — observability, merge queues, human-review hooks, and graceful degradation. This is an active area of research worth flagging in an interview.
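The progressive-disclosure talking point maps directly to a tool index: an abbreviated listing in the prompt, with full documentation retrieved on demand (the analogue of a `--help` flag). The tool names and doc strings below are invented for illustration.

```python
# Full documentation lives outside the prompt; only the index is loaded upfront.
TOOL_DOCS = {
    "search_web": "search_web(query) -> results. Full docs: rate limits, "
                  "result schema, pagination flags, error codes ...",
    "run_sql": "run_sql(query) -> rows. Full docs: dialects, timeouts ...",
}

def tool_index() -> str:
    """Abbreviated list placed in the prompt: name plus first sentence only."""
    return "\n".join(f"- {name}: {doc.split('.')[0]}." for name, doc in TOOL_DOCS.items())

def describe_tool(name: str) -> str:
    """Retrieved on demand when the agent decides it needs a tool's details."""
    return TOOL_DOCS[name]

index = tool_index()
```

The prompt carries one line per tool; the agent pays the token cost of full documentation only for the tools it actually decides to use.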