March 24, 2026

Guardrails Before Capabilities

The instinct with a new agent is to make it do more. The discipline is to first decide what it must never do. That order is not a detail. It is the design.

Ahmed

Founder, CEO & Software Engineer

6 min read

When a team gets a capable model in their hands, the first question is always "what else can we make it do?" It is the wrong first question. The right one is "what must it never do, no matter how the conversation goes?"

We build every production agent in that order. Boundaries first, abilities second. Not because we are cautious by temperament, but because the boundary is the part that determines whether the thing is safe to put in front of real users and real money.

Capability is easy, restraint is hard

Modern models are eager. Ask one to help and it will try, even when the honest answer is "I don't know" or "a human needs to handle this." Left alone, an agent will happily quote a policy it half-remembers or promise something the business cannot deliver.

Capability comes for free now. Restraint is the thing you have to engineer. So we start by drawing hard lines: what actions require confirmation, what the agent can commit to on the company's behalf, what inputs it should refuse outright, and where it must escalate rather than improvise.
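To make those four kinds of lines concrete, here is a minimal sketch of what writing them down before any capability might look like. Everything in it is hypothetical and for illustration only: the `Envelope` class, the `decide` function, and the action and topic names are assumptions, not a real library or our production code.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical policy object: the names and categories are illustrative only.
@dataclass(frozen=True)
class Envelope:
    # Actions that must never run without explicit human confirmation.
    needs_confirmation: frozenset = frozenset({"issue_refund", "change_plan"})
    # The only commitments the agent may make on the company's behalf.
    allowed_commitments: frozenset = frozenset({"reply_within_24h"})
    # Input categories the agent refuses outright.
    refused_inputs: frozenset = frozenset({"legal_advice", "medical_advice"})
    # Topics where the agent must hand off instead of improvising.
    escalate_topics: frozenset = frozenset({"complaint", "account_closure"})


def decide(env: Envelope, action: str, topic: str,
           commitment: Optional[str] = None) -> str:
    """Return what the agent may do: refuse, escalate, confirm, or allow."""
    if topic in env.refused_inputs:
        return "refuse"
    if topic in env.escalate_topics:
        return "escalate"
    # Any promise outside the approved list goes to a human.
    if commitment is not None and commitment not in env.allowed_commitments:
        return "escalate"
    if action in env.needs_confirmation:
        return "confirm"
    return "allow"
```

The order of the checks is itself a decision: refusals win over escalation, and escalation wins over confirmation, so a risky topic never reaches the point where the agent merely asks "are you sure?"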

An agent's value is not what it can do. It is what it reliably refuses to do when it should not.

The envelope comes first

We think of it as drawing a safe envelope, then adding capability inside it. The envelope is the set of things that are always true no matter what the user types. The agent never moves money without a human. It never invents a fact it cannot ground. It never argues with a frustrated customer; it escalates.
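One way to hold the "never invents a fact it cannot ground" line is to refuse to release any draft whose claims do not point back to something the agent actually retrieved. The sketch below assumes a hypothetical setup where the model returns its answer as a list of claims, each tagged with a source id; `Claim`, `release_or_escalate`, and the ids are illustrative names. It checks provenance, not truth: it only guarantees that every claim cites a document that was in front of the model.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Claim:
    text: str
    # Id of the retrieved document that supports this claim (hypothetical format).
    source_id: Optional[str]


def release_or_escalate(claims: List[Claim], retrieved_ids: Set[str]) -> str:
    """Release the answer only if every claim cites a retrieved document."""
    ungrounded = [c for c in claims if c.source_id not in retrieved_ids]
    # An empty answer or any uncited claim goes to a human instead.
    if not claims or ungrounded:
        return "ESCALATE"
    return " ".join(c.text for c in claims)
```

For example, `release_or_escalate([Claim("Refunds take five business days.", "kb-12")], {"kb-12"})` returns the sentence, while the same claim with `source_id=None` returns `"ESCALATE"`.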

Once that envelope exists, adding abilities is low-risk, because every new ability lives inside the same boundaries. Without the envelope, every new ability is a new way to fail. This is why bolting guardrails on at the end never works. By then the capabilities have already shaped the system, and you are patching holes instead of preventing them.
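The claim that new abilities inherit the envelope can be sketched as a tool registry where every call passes through the same check before any tool runs. This is an illustration under assumptions, not a description of a specific framework: `GuardedRegistry`, `no_money_without_human`, and the tool names are hypothetical, and the `approved_by` value is assumed to come from a human confirmation step rather than from the model.

```python
from typing import Any, Callable, Dict


class GuardedRegistry:
    """Every tool registered here is reached only through one invariant check."""

    def __init__(self, invariant: Callable[[str, dict], bool]):
        self._invariant = invariant
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str):
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs: Any) -> Any:
        # The check runs before every tool, so adding a tool cannot bypass it.
        if not self._invariant(name, kwargs):
            raise PermissionError(f"{name} blocked by envelope")
        return self._tools[name](**kwargs)


def no_money_without_human(name: str, args: dict) -> bool:
    # Hypothetical rule: tools that move money need a human approver attached.
    moves_money = name in {"issue_refund", "transfer"}
    return not moves_money or bool(args.get("approved_by"))


registry = GuardedRegistry(no_money_without_human)


@registry.register("lookup_order")
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


@registry.register("issue_refund")
def issue_refund(order_id: str, approved_by: str = "") -> str:
    return f"refunded {order_id} (approved by {approved_by})"
```

Here `registry.call("issue_refund", order_id="A1")` raises `PermissionError`, while the same call with `approved_by="dana"` goes through. The point of the design is that the next tool someone adds is covered by the rule without anyone remembering to add it.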

Guardrails are product decisions, not safety theater

It is tempting to treat guardrails as a compliance checkbox. They are not. They are some of the most important product decisions you will make, because they define the agent's character.

A bot that refuses cleanly and hands off to a person feels trustworthy. A bot that tries to handle everything feels reckless the first time it gets something important wrong. Users do not remember the hundred questions an agent answered. They remember the one time it confidently told them something false. The guardrail is what prevents that memory.

How we actually enforce them

Instructions in a prompt are necessary but not sufficient. A model can be talked out of an instruction. So the lines that really matter are enforced in code, outside the model's reach.

A read-only database connection cannot be prompted into writing. A confirmation step the model cannot skip will always fire. A schema on the output means downstream systems never act on free-form text. The model is one layer of defense. The system around it is the layer you actually rely on. When a boundary matters, we put it somewhere the model cannot argue with it.
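Two of those examples can be sketched with nothing but the Python standard library. SQLite's `mode=ro` URI parameter makes the database engine itself reject writes, whatever SQL the model produces. For the output schema, a hand-written check stands in for a schema library such as JSON Schema or pydantic, purely to keep the sketch dependency-free. `open_read_only`, `AgentOutput`, `parse_output`, and the action vocabulary are hypothetical names for illustration.

```python
import sqlite3
import tempfile
from dataclasses import dataclass
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    # mode=ro: SQLite rejects every write on this connection.
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)


ALLOWED_ACTIONS = {"reply", "escalate"}  # hypothetical action vocabulary


@dataclass(frozen=True)
class AgentOutput:
    action: str
    message: str


def parse_output(raw: dict) -> AgentOutput:
    """Reject anything that is not exactly the expected shape."""
    if set(raw) != {"action", "message"}:
        raise ValueError(f"unexpected keys: {sorted(raw)}")
    if raw["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {raw['action']!r}")
    if not isinstance(raw["message"], str):
        raise ValueError("message must be a string")
    return AgentOutput(raw["action"], raw["message"])


if __name__ == "__main__":
    # Set up a throwaway database with a normal (writable) connection.
    db = Path(tempfile.mkdtemp()) / "orders.db"
    setup = sqlite3.connect(db)
    setup.execute("CREATE TABLE orders (id TEXT, status TEXT)")
    setup.execute("INSERT INTO orders VALUES ('A1', 'shipped')")
    setup.commit()
    setup.close()

    # The agent only ever gets the read-only connection.
    ro = open_read_only(db)
    print(ro.execute("SELECT status FROM orders WHERE id = 'A1'").fetchone())
    try:
        ro.execute("DELETE FROM orders")
    except sqlite3.OperationalError as err:
        print("blocked:", err)
```

Neither check asks the model to behave. The write fails inside SQLite, and a malformed or unexpected output raises before any downstream code sees it.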

Start with the no

If you are building an agent, write the list of things it must never do before you write a single capability. Decide where it stops, encode those limits where the model cannot override them, and only then start adding what it can do. The capabilities are the easy, fun part. The boundary is the part that lets you sleep.
