Manifesto Architecture Projects Insights Resources Engage →
All Insights
AI Architecture

Human-in-the-Loop Isn't a Compromise.
It's the Design.

There's a shape that appears in every AI deployment that quietly works. It's not model choice, it's not prompt engineering, it's not the integration layer. It's the review boundary: where, exactly, a human signs off before the system takes an action that can't be taken back.

Teams under pressure to ship AI features keep trying to remove that boundary. They frame it as a compromise — the human is a latency tax, a bottleneck, a reason the agent "isn't really autonomous yet." It isn't. The human checkpoint is the architecture. Everything else is implementation detail.

A working example

A weekly executive status report used to take 60–90 minutes of a manager's time. The obvious AI play is: let the agent ingest the week's Slack activity and write the report end-to-end. Push it. Done.

The system we actually built does the first part automatically and stops before the send. The AI drafts; the manager reviews; a one-click action posts the approved version to Slack. Total human time dropped from 60–90 minutes to about 10. And because every approval cycle logs the changes the manager made, the draft quality improves every week without anyone re-tuning a prompt.

"Full autonomy is a vanity metric. Shipped autonomy is a review architecture that disappears in the background."

Where HITL belongs

Not everywhere. A checkpoint in the wrong place is just friction. The places where it earns its keep share three properties:

The action is irreversible. Anything that touches an external system someone else can see — a message sent, a record created, a file shared — needs an approval boundary. Internal state is cheap; external trust isn't.

The judgment is expensive to re-train. When the agent's decision reflects domain knowledge a model doesn't have (a client relationship, a commercial nuance, a political sensitivity), the human review isn't approval — it's the signal that keeps the system calibrated. Log the edits; the system gets better.

The downside is asymmetric. If being wrong once costs you a week of cleanup, and being right 99 times saves you three hours each, the 1% failure case still wins. That's when you hold the boundary even if it looks pedantic.

The checkpoint test

For any proposed autonomous action, ask: what does "wrong" actually cost, and how hard is it to undo? If the answer involves an apology email, a reconciliation, or a client call — you need a human in the loop.

Where HITL doesn't

Reads. Summaries. Categorisation. Internal routing. Any step that ends in a human seeing something on a screen is already supervised by the nature of what happens next. Putting an approval gate there doesn't add trust, it just adds a click.

The agents we're designing at the edge of this work — the multi-agent proposal system is a good example — have human checkpoints in three specific places: before drafting starts, after the draft is produced, and before the final version is sent. Everything in between runs unsupervised, because nothing in between can break the world.

The quiet advantage

There's a second reason HITL is the design, not a compromise: it's the only honest answer to the liability question. When something goes wrong, "our AI agent decided" is not a defence anyone wants to rely on. "A human approved it after the agent drafted it" is a defence that reflects how the work actually got done — and it's the version that survives a board conversation.

Compounding this: every approval event is training data. Every edit teaches the system what this organisation actually means by "ready to send." The agents that will still be running in three years are the ones that logged their corrections. The ones that shipped without them have already been turned off.

Autonomy is not a binary. It's a diagram of who signs off on what, and why. Draw that diagram first; the rest of the stack falls into place.

Building an agent?
Let's map the checkpoints first.

The difference between an agent that ships and an agent that gets quietly shelved is almost always the review architecture. That's a conversation, not a proposal.

Start the conversation →