Inform, Then Trust | Monika Bishnoi

The Problem

Every agent harness hits the same wall: the model doesn't know when to stop.

I discovered this while building Cal, my own agent harness, which I've been running daily for months. It would hit max iterations on simple tasks and finish in two calls on complex ones. Same model. Same tools. Completely unpredictable. The loop had no self-awareness.

When I studied how others handle this, I found what looked like two opposite approaches. But they turned out to be two points on the same spectrum.

OpenAI's Codex sits at one end. Rich context. No iteration cap. No spinning detection. The model runs until it decides it's done. High inform, high trust. This works beautifully for bounded tasks where "done" is obvious: tests pass, code compiles, stop. It breaks on anything open-ended. The model optimizes for thoroughness because RLHF trained it to, and without a frame for "done," it researches forever.

Anthropic's Claude Code sits at the other end. Rich context plus explicit constraints. Hard caps. Nine exit paths. Five tiers of compaction. A spinning detector that halts after three low-output turns. High inform, low trust. Built for a generation of models that genuinely needed guardrails. Claude 3 would hallucinate tool names and loop on the same query. Those guardrails saved sessions.

But models got smarter. The harness didn't relax its grip. The constraint that once protected the system now kills productive runs that need more room.

Both harnesses inform the model. They supply context, tools, system prompts. The difference is what happens next: how much autonomy the model gets to act on what it knows.

That's not two philosophies. It's one spectrum: The Trust Spectrum.

◄──────────────────────── TRUST ────────────────────────►

  Claude Code                                         Codex
  Low trust                                      High trust
  (heavy guardrails)                     (minimal guardrails)
  
              ▲
              │
         Every other harness
         sits somewhere here

Both inform. The difference is how much they trust the model to act on what it knows.

The Insight

This isn't an intelligence problem. It's an information asymmetry problem. The asymmetry is between the harness (which knows everything about the loop) and the model (which knows nothing about it).

RLHF trains models to be thorough. Without a fuel gauge, they optimize for "more" indefinitely. More research. More context. More tool calls. Not because they lack judgment. Because "good enough" is subjective, and no one gave them the constraint as information.

The moment you show a model what success looks like for this specific task (the budget, the tool patterns, what's worked before), it self-regulates. Not because you forced it. Because you informed it.

A brilliant researcher with infinite time and no deadline will refine forever. Give them a deadline and they ship. Not because the deadline made them smarter. Because it gave them a frame for "done." The model needs the same frame.

The foundational principle: inform, then trust.

Not a hard cap. Not blind hope. Informed autonomy.

Where you land on the spectrum depends on your model's maturity, your task's complexity, and your tolerance for risk. But the principle is the same everywhere: the more you inform, the more you can trust. Close the information gap, and the model's existing intelligence handles the rest.

Every harness is on this spectrum. Most just don't know where they've landed.

What This Means for Harness Design

If you see "inform, then trust" as a spectrum rather than a binary choice, the implications cascade:

Your harness needs behavioral memory. Not just knowledge retrieval. A record of how past tasks actually ran. What tools were used. How many calls. What succeeded. What failed. What hit the cap. This is the raw material for informing the model. Without it, you have nothing to show it.
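A minimal sketch of what such a record could look like. The names here (RunRecord, BehaviorMemory, task_kind, and the summary fields) are illustrative assumptions, not Cal's or Loop Pilot's actual schema, and a real harness would persist this rather than keep it in memory.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from statistics import median


@dataclass
class RunRecord:
    """One finished task run as the harness observed it (hypothetical schema)."""
    task_kind: str
    tools_used: list[str]
    tool_calls: int
    succeeded: bool
    hit_cap: bool = False


@dataclass
class BehaviorMemory:
    """In-memory log of past runs; a real harness would write this to disk."""
    records: list[RunRecord] = field(default_factory=list)

    def log(self, record: RunRecord) -> None:
        self.records.append(record)

    def summarize(self, task_kind: str) -> dict | None:
        """Aggregate runs of one task kind; None if nothing has succeeded yet."""
        runs = [r for r in self.records if r.task_kind == task_kind]
        wins = [r for r in runs if r.succeeded]
        if not wins:
            return None
        # Count tool usage across successful runs only.
        tool_counts = Counter(t for r in wins for t in r.tools_used)
        return {
            "runs": len(runs),
            "successes": len(wins),
            "cap_hits": sum(r.hit_cap for r in runs),
            "median_calls": median(r.tool_calls for r in wins),
            "common_tools": [t for t, _ in tool_counts.most_common(3)],
        }
```

Failures and cap hits are logged alongside successes on purpose: they are part of what the model should eventually be told about tasks like this one.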

Your harness needs to surface that history as context. Before the loop starts, the model should see what success looks like for tasks like this one. Not as a command. As situational awareness. "Tasks like yours typically take 3-5 calls. Here's what's worked before. Use your judgment." One paragraph. Injected into the system prompt. The model reads it and plans accordingly.
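One way that paragraph might be produced, as a rough sketch. The function names and the choice of a trimmed range (dropping the lowest and highest quarter of runs) are assumptions made for illustration; any summary statistic that reflects typical runs would serve.

```python
from __future__ import annotations


def behavior_briefing(call_counts: list[int], common_tools: list[str]) -> str:
    """Turn call counts from past successful runs into one advisory paragraph."""
    if not call_counts:
        return "No history for tasks like this one. Use your judgment."
    ordered = sorted(call_counts)
    # Rough "typical" range: ignore the lowest and highest quarter of runs.
    k = len(ordered) // 4
    core = ordered[k:len(ordered) - k]
    low, high = core[0], core[-1]
    span = f"{low}" if low == high else f"{low}-{high}"
    tools = ", ".join(common_tools) if common_tools else "no consistent tool pattern"
    return (
        f"Tasks like yours typically take {span} calls. "
        f"What has worked before: {tools}. "
        "This is context, not a limit. Use your judgment."
    )


def build_system_prompt(base_prompt: str, briefing: str) -> str:
    """Append the briefing to the existing system prompt as its own paragraph."""
    return f"{base_prompt}\n\n{briefing}"
```

The wording matters as much as the numbers: the briefing is phrased as information about past runs, not as an instruction to stop at a given count.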

Your safety cap becomes a backstop, not the primary control. The model lands because it knows where the runway is. Not because you cut the engine at an arbitrary altitude. The hard limit stays as a safety net for the rare case where everything else fails. But it's no longer the mechanism for loop control.
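A sketch of a loop shaped this way, under stated assumptions: step is a hypothetical stand-in for one model turn that returns its output and whether the model considers the task done, and the default hard_cap of 50 is arbitrary. The model sees the briefing and its position in the loop on every turn, and the cap only fires if the model never declares itself finished.

```python
from __future__ import annotations

from typing import Callable


def run_loop(
    step: Callable[[str], tuple[str, bool]],
    briefing: str,
    hard_cap: int = 50,
) -> tuple[list[str], str]:
    """Run until the model says it is done; the hard cap is only a safety net.

    step(status) stands in for one model turn and returns (output, done).
    Returns the collected outputs and the reason the loop stopped.
    """
    outputs: list[str] = []
    for call in range(1, hard_cap + 1):
        # Inform: the model sees the history-based briefing and where it is now.
        status = f"{briefing} Call {call} of at most {hard_cap}."
        output, done = step(status)
        outputs.append(output)
        if done:
            # Trust: the expected path is the model deciding it has finished.
            return outputs, "model_finished"
    # Backstop: reached only when the model never signals completion.
    return outputs, "hard_cap"
```

Recording the stop reason lets the harness tell the two exits apart afterwards, so cap hits can be fed back into the history that informs later runs.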

The better models get, the further right you can move on the spectrum, toward higher trust. But you never stop informing. Even the best pilot still checks the fuel gauge. Post-trained models will internalize general tool-use efficiency. They'll learn "enough researching, time to write" the way a senior engineer does. But they'll never be trained on your specific harness, your tools, your user's patterns. That's a long tail no foundation model lab will cover. For your custom system, behavioral context remains valuable. Indefinitely.

I'm building toward this with Loop Pilot, an open experiment in behavioral memory and trajectory prediction for custom agent harnesses. It's early. The predictions aren't accurate enough yet. But the principle is proven: when the model can see what success looks like, it flies differently.

The First Question

If you're building an agent harness, the first question isn't "what tools should it have?" or "what model should it use?"

It's: where on the spectrum do you want to be, and does the model know it's there?

If the answer is "I don't know," you've chosen a position. You just haven't chosen it consciously.

github.com/monbishnoi/loop-pilot