Context Engineering for AI Agents: The Real Bottleneck

Agent frameworks focus on tool orchestration. But the real bottleneck is context management — what enters the context window, how conversation history accumulates, and why relevant context beats raw data every time.

context-engineering · context-window · context-management · agent-memory · conversation-history

Everyone talks about agent orchestration. Which tool to call. When to loop. How to plan. But orchestration is a solved problem — ReAct, tool-use, chain-of-thought all work well enough.

The unsolved problem is context engineering: controlling what information enters the context window, how it accumulates across turns, and how to keep it relevant as conversations grow.

The context window is a budget, not a container

Models have a context window — 128K, 200K, 1M tokens. It's tempting to treat this as a container: throw everything in, let the model sort it out. But the context window isn't a container. It's a budget.

Every token in the window competes for attention. Self-attention is a softmax over all tokens — a zero-sum allocation of the model's reasoning capacity. Add 10,000 irrelevant tokens, and every relevant token gets proportionally less attention. The context window has a capacity, but capacity isn't the constraint. Attention quality is.
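The zero-sum effect is easy to see in a toy softmax. The numbers below are illustrative (arbitrary attention scores, not real model internals): three "relevant" tokens with high scores, then the same three plus a hundred low-score "irrelevant" tokens.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One query attending over 3 relevant tokens...
relevant = [2.0, 2.0, 2.0]
attn_small = softmax(relevant)

# ...versus the same 3 relevant tokens plus 100 irrelevant ones.
irrelevant = [0.0] * 100
attn_large = softmax(relevant + irrelevant)

# Softmax is zero-sum: attention on each relevant token drops
# from 1/3 to well under a tenth once the noise is added.
print(attn_small[0])
print(attn_large[0])
```

The irrelevant tokens don't need high scores to do damage; they dilute the distribution just by being present.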

This is why context management matters more than context size.

Three failures of naive context

Most agent implementations fail at context in predictable ways:

1. Conversation history bloat

Every tool call adds its full response to the conversation history. A 200-field API response becomes part of the permanent context. By turn 10, the model is reasoning through 50K tokens of accumulated JSON — most of which is irrelevant to the current question.

The fix isn't truncation (you lose information). It's curation — extracting the relevant fields before they enter the conversation history.
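A minimal sketch of that curation step, with hypothetical field names modeled on a price API: project the raw response down to the fields the task needs before it ever touches the history.

```python
def curate(response: dict, keep: list[str]) -> dict:
    """Project a raw tool response down to the fields the task needs,
    so only relevant tokens enter the conversation history."""
    return {k: response[k] for k in keep if k in response}

# A raw response with many fields, reduced to the two that matter here.
raw = {"usd": 67234.12, "usd_24h_change": 2.34, "last_updated_at": 1707900000,
       "usd_market_cap": 1320984173209, "usd_24h_vol": 28394857234}
history_entry = curate(raw, keep=["usd", "usd_24h_change"])
print(history_entry)  # {'usd': 67234.12, 'usd_24h_change': 2.34}
```

Nothing is thrown away irreversibly: the raw response can live in a log or cache; only the projection enters the context.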

2. No relevant context across sessions

Agent A analyzes your data and reaches a conclusion. The session ends. Agent B starts from scratch — re-fetching, re-analyzing, re-reasoning. The conclusion Agent A reached is lost because nothing persists across sessions.

This isn't a model limitation. It's an infrastructure gap. The conversation history dies with the session. There's no memory layer to carry relevant context forward.

3. Context management is manual

Today, if you want an agent to have relevant context, you write it into the system prompt. Manually. Every time. "Remember that the user prefers X." "The API returns Y format." "Previous analysis showed Z."

This doesn't scale. You can't manually curate context for every session, every agent, every API response.

Context engineering as infrastructure

Context engineering isn't prompt engineering. Prompt engineering is about crafting a single message. Context engineering is about what the model sees across its entire context window — and making that view precise.

Three principles:

Normalize: one format, any source

If every API returns a different shape, the model wastes reasoning on format parsing instead of content analysis. Normalize responses into a consistent data[] + meta{} + errors[] structure. The model learns one format and focuses on content.
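As a sketch, here are two hypothetical upstream shapes for the same kind of data, each mapped into one envelope. The adapter functions and source shapes are illustrative, not a real API surface:

```python
# Two hypothetical upstream shapes for the same kind of data.
source_a = {"bitcoin": {"usd": 67234.12, "usd_24h_change": 2.34}}
source_b = [{"symbol": "BTC", "price": 67234.12, "change": 2.34}]

def envelope(data, meta=None, errors=None):
    """The one shape the model ever sees: data[] + meta{} + errors[]."""
    return {"data": data, "meta": meta or {}, "errors": errors or []}

def from_source_a(raw):
    # keyed-by-id dict -> list of flat records
    return envelope([{"id": cid, "price_usd": v["usd"], "change_24h": v["usd_24h_change"]}
                     for cid, v in raw.items()])

def from_source_b(raw):
    # list of records with different field names -> same flat records
    return envelope([{"id": r["symbol"].lower(), "price_usd": r["price"], "change_24h": r["change"]}
                     for r in raw])

# Different sources, identical structure downstream.
print(from_source_a(source_a)["data"][0])
print(from_source_b(source_b)["data"][0])
```

The adapters absorb the per-source weirdness once, at the infrastructure layer, instead of the model re-deriving each format on every call.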

Curate: right density for the task

Not every field deserves context window space. An API returning 47 fields when the agent needs 3 is wasting 44 fields' worth of attention budget. Learn which fields matter and project the response down to those fields before it enters the conversation history.

This is what schema learning does — the agent teaches the infrastructure which fields are relevant, and all future responses are curated automatically.
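A minimal sketch of that loop, with a hypothetical endpoint name and field set: the agent registers the fields that mattered once, and every later response for that endpoint is projected automatically.

```python
class SchemaLearner:
    """Sketch: agents teach which fields matter per endpoint;
    the infrastructure curates every future response automatically."""

    def __init__(self):
        self.schemas: dict[str, set] = {}

    def learn(self, endpoint: str, fields: list) -> None:
        """Record (or extend) the relevant fields for an endpoint."""
        self.schemas.setdefault(endpoint, set()).update(fields)

    def curate(self, endpoint: str, response: dict) -> dict:
        """Project a response down to the learned fields.
        Nothing learned yet -> pass the response through untouched."""
        keep = self.schemas.get(endpoint)
        if not keep:
            return response
        return {k: v for k, v in response.items() if k in keep}

learner = SchemaLearner()
learner.learn("/simple/price", ["usd", "usd_24h_change"])
raw = {"usd": 67234.12, "usd_24h_change": 2.34, "usd_market_cap": 1320984173209}
print(learner.curate("/simple/price", raw))  # {'usd': 67234.12, 'usd_24h_change': 2.34}
```

The pass-through default matters: before anything is learned, the agent sees the full response and can decide what to teach.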

Recall: relevant context from past sessions

When an agent queries an API it's queried before, the most relevant context isn't the raw response — it's what a previous agent concluded from that response. "BTC dominance rising, SOL underperforming" is more useful than 200 fields of raw price data.

Cross-session memory turns past analysis into relevant context for future sessions. Not the raw data — the conclusions.
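One way this could work, sketched with an in-memory store (a real system would persist to a database; the query keys and summaries here are hypothetical): sessions write conclusions, and the next session recalls the most recent one with its age attached.

```python
import time

class Memory:
    """Sketch of cross-session memory: store conclusions keyed by query,
    recall the most recent one in the next session."""

    def __init__(self):
        self.entries = []  # (timestamp, query, summary)

    def remember(self, query: str, summary: str) -> None:
        self.entries.append((time.time(), query, summary))

    def recall(self, query: str):
        matches = [e for e in self.entries if e[1] == query]
        if not matches:
            return None
        ts, _, summary = max(matches)  # tuples sort by timestamp: newest wins
        age_days = int((time.time() - ts) / 86400)
        return {"age": f"{age_days}d", "summary": summary}

mem = Memory()
mem.remember("crypto/overview", "BTC at $71k, market bullish but elevated risk.")
print(mem.recall("crypto/overview"))
```

Note what is stored: the one-line conclusion, not the raw response it came from. That is the whole point of the recall layer.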

What this looks like in practice

Without context engineering:

{"bitcoin":{"usd":67234.12,"usd_market_cap":1320984173209,"usd_24h_vol":28394857234,
"usd_24h_change":2.34,"last_updated_at":1707900000},"ethereum":{"usd":3456.78,...}}

47 fields dumped into the context window. The model parses the format, finds the 3 fields it needs, and discards the rest — but attention was already wasted.

With context engineering:

{
  "data": [
    {"id": "bitcoin", "price_usd": 67234.12, "change_24h": 2.34},
    {"id": "ethereum", "price_usd": 3456.78, "change_24h": -0.82}
  ],
  "meta": {
    "context": {"summary": "BTC dominance rising. SOL underperforming vs ETH."},
    "recalls": [{"age": "1d", "summary": "BTC at $71k, market bullish but elevated risk."}]
  }
}

3 relevant fields per item. A one-line summary from a previous session. Past conclusions injected as relevant context. The model spends zero attention on format parsing and 100% on reasoning.

The conversation history problem

Long-running agents accumulate massive conversation histories. Each turn adds system prompts, user messages, tool calls, and tool results. By turn 20, the conversation history might be 100K tokens — mostly stale.

The standard fix is conversation history truncation or summarization. But both lose information. Truncation drops early context that might be relevant. Summarization is lossy.

A better approach: don't put everything in the conversation history in the first place. Curate tool responses before they enter the context. Store conclusions in external memory. Inject only relevant context on the next call.

The context window should contain:

  1. The current task and system prompt
  2. Recent conversation turns (not all of them)
  3. Curated tool responses (not raw API dumps)
  4. Relevant recalled context from past sessions

Everything else is noise.
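The four-item list above can be sketched as an assembly function. Everything here is illustrative: the whitespace split is a crude stand-in for a real tokenizer, and the budget and turn limits are arbitrary defaults.

```python
def build_context(system_prompt, turns, curated_results, recalls,
                  max_turns=6, token_budget=8000):
    """Assemble the four things the window should contain:
    task/system prompt, recalled context, curated tool results,
    and only as many recent turns as the budget allows."""
    def count(text):
        return len(text.split())  # crude whitespace token estimate

    parts = [system_prompt]
    parts += [f"[recall {r['age']}] {r['summary']}" for r in recalls]
    parts += [f"[tool] {r}" for r in curated_results]

    # Spend what remains of the budget on turns, newest first.
    budget = token_budget - sum(count(p) for p in parts)
    recent = []
    for turn in reversed(turns[-max_turns:]):
        if count(turn) > budget:
            break
        recent.append(turn)
        budget -= count(turn)
    return parts + list(reversed(recent))

ctx = build_context(
    system_prompt="You are a market analyst.",
    turns=["user: how is BTC doing?", "assistant: fetching prices..."],
    curated_results=['{"id": "bitcoin", "price_usd": 67234.12}'],
    recalls=[{"age": "1d", "summary": "BTC at $71k, market bullish."}],
)
print(len(ctx))  # system prompt + 1 recall + 1 tool result + 2 turns = 5
```

Older turns simply never make it in; the stale 100K-token history never exists, so there is nothing to truncate or summarize away.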

Context management is the next layer

Agent frameworks solved tool orchestration. Prompt engineering solved instruction quality. The next layer is context management — automated, infrastructure-level control over what enters the context window.

This means:

  • Schema learning — the agent teaches which fields matter, infrastructure curates automatically
  • Cross-session memory — conclusions persist and resurface as relevant context
  • Attention-aware curation — fewer tokens that matter more, not more tokens that compete for attention

The model doesn't need more context. It needs relevant context.


Harbor is open-source agent context infrastructure. It sits between your agents and their data sources, curating what enters the context window so every token earns its place. Learn more or read the docs.