The Cumulative Restrictions Framework
The CRF is Corral’s security model for governing AI agent behavior based on what the agent has been exposed to. This page provides the technical detail behind the framework.
For the executive summary, see Security Architecture →.
The Problem
AI agents act on what they read. An agent that reads an email, searches the web, or processes a user-uploaded document is ingesting content from outside the trust boundary. That content can contain instructions — prompt injections — that manipulate the agent’s behavior.
The traditional security approach (access control: who can see what) is necessary but insufficient. Access control governs what the agent can read. The CRF governs what the agent can do after reading.
Core Principle
“If the agent ingests anything, then its permissions should be dropped to the level of the author of that information.”
Once an agent has processed untrusted content, you can't trust its reasoning, including its reasoning about which data is safe to send outward. The CRF therefore tracks trust per conversation, not per data item: either the agent's judgment is compromised, or it isn't.
Conversation States
| State | Value | Condition |
|---|---|---|
| Clean | Highest trust | Fresh conversation, no external data accessed |
| Internal | Medium trust | Has accessed internal/trusted data |
| Public (Tainted) | Lowest trust | Has accessed untrusted data |
State only escalates toward lower trust; it never recovers within a conversation. The transition function is Math.Min(current, source): the conversation takes on the lowest trust level of any data it has ingested.
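A minimal sketch of the transition function. The doc's Math.Min spelling suggests C#; TypeScript is used here (and in the examples below) purely for illustration, and the enum values are assumptions, not the platform's actual TrustLevel definition:

```typescript
// Trust levels ordered numerically so that a lower number means lower trust.
// Illustrative values only; the platform's own TrustLevel enum may differ.
enum TrustLevel {
  Public = 0,   // Tainted: untrusted data has been ingested
  Internal = 1, // internal/trusted data has been ingested
  Clean = 2,    // fresh conversation, nothing external ingested
}

// The conversation takes on the lowest trust level of any data it has
// ingested: state only escalates toward lower trust, never back up.
function transition(current: TrustLevel, source: TrustLevel): TrustLevel {
  return Math.min(current, source);
}
```

Note that transition(Internal, Clean) stays Internal: reading trusted data later never restores a conversation's trust.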
Tool Classification
Every tool call is classified along two dimensions:
Direction
| Direction | Test | Effect |
|---|---|---|
| Ingress | New content enters agent context | Escalates conversation state |
| Egress | Freeform content sent outward | Restricted by conversation state |
| Operation | All parameters are sanitizable, no new content ingested | No state change, not restricted |
Boundary
| Boundary | Meaning |
|---|---|
| Internal | Tool interacts with internal/trusted systems |
| Public | Tool interacts with external/untrusted systems |
| None | Operations (no boundary crossing) |
Classification Examples
| Tool | Direction | Boundary |
|---|---|---|
| Web search | Ingress | Public |
| Read external email | Ingress | Public |
| Read internal documents | Ingress | Internal |
| Send internal email | Egress | Internal |
| Send external email | Egress | Public |
| Move a file | Operation | None |
| Rename a folder | Operation | None |
The key distinction: sanitizable parameters (file paths, IDs, enum values, structured queries) vs. freeform parameters (message bodies, file content, arbitrary text). If every outbound parameter is sanitizable, the call is an operation. If freeform content goes outward, it’s egress.
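The sanitizable-vs-freeform distinction reads directly as a classification rule. A sketch in TypeScript, where the ToolSpec shape and its field names are hypothetical, not the platform's actual ToolClassification types:

```typescript
// Hypothetical tool description for illustration.
type ParamKind = "sanitizable" | "freeform";

interface ToolSpec {
  name: string;
  ingestsContent: boolean;     // does the tool's result enter agent context?
  outboundParams: ParamKind[]; // parameters sent across the boundary
}

type Direction = "ingress" | "egress" | "operation";

function classifyDirection(tool: ToolSpec): Direction {
  // Any freeform content going outward makes the call egress.
  if (tool.outboundParams.includes("freeform")) return "egress";
  // New content entering the agent's context makes it ingress.
  if (tool.ingestsContent) return "ingress";
  // All-sanitizable parameters and no ingestion: a plain operation.
  return "operation";
}
```

This single-direction simplification matches the examples table: a web search ingests results (ingress), sending an email carries a freeform body (egress), and moving a file takes only sanitizable paths (operation).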
Egress Rules
| Conversation State | Egress to Internal | Egress to Public |
|---|---|---|
| Clean | Allowed | Allowed |
| Internal | Allowed | Fork required |
| Tainted | Fork required | Fork required |
Reading and operations are always allowed. A tainted conversation can continue reading sources and performing operations. Restrictions apply only to egress.
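The egress table above reads directly as a pure function. A sketch, using illustrative string literals rather than the platform's real enums:

```typescript
// Conversation state and tool boundary, as illustrative string unions.
type ConversationState = "clean" | "internal" | "tainted";
type EgressBoundary = "internal" | "public";
type EgressDecision = "allowed" | "fork-required";

function checkEgress(
  state: ConversationState,
  boundary: EgressBoundary
): EgressDecision {
  if (state === "clean") return "allowed";
  if (state === "internal" && boundary === "internal") return "allowed";
  // Internal state egressing publicly, or any egress from a tainted
  // conversation, must go through fork review.
  return "fork-required";
}
```

Ingress and operations never reach this check; only egress calls are gated.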
The Fork Mechanism
When egress is blocked, the system serializes the intended action into a self-contained, human-readable prompt:
- The agent describes what it wants to do (send this email, call this API, push this code)
- The serialized action is presented to the user in a review card
- The user sees exactly what would be sent, including any content that originated from untrusted sources
- The user can edit, approve, reject, or report
- On approval, a fresh clean conversation executes the action
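The steps above can be sketched as a serialization step. The ForkRequest shape, field names, and card layout below are invented for illustration; the real review card will differ:

```typescript
// Hypothetical shape of a serialized fork request: the blocked action
// rendered as self-contained, human-readable text for the review card.
interface ForkRequest {
  tool: string;                    // e.g. "send_external_email"
  summary: string;                 // what the agent wants to do, in plain words
  payload: Record<string, string>; // the exact content that would be sent
}

function renderReviewCard(req: ForkRequest): string {
  const lines = [
    `Action: ${req.tool}`,
    `Agent's intent: ${req.summary}`,
    "Exact outbound content:",
  ];
  for (const [key, value] of Object.entries(req.payload)) {
    lines.push(`  ${key}: ${value}`);
  }
  lines.push("Options: edit / approve / reject / report");
  return lines.join("\n");
}
```

The point of the format is that everything the clean conversation will execute, including any content that originated from untrusted sources, is visible verbatim to the reviewer.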
Why This Works
Injection attacks hide instructions in content the agent reads. The fork mechanism forces those instructions to be serialized as visible text. The attack surface shifts from "hidden instructions manipulate the agent" to "the user must be fooled by visible text", an attack that is dramatically easier to catch.
Anomaly Detection
The fork review includes basic anomaly detection:
- New domains — highlight email addresses or URLs the agent hasn’t seen before
- Typosquatting — flag domains that look like known domains with subtle differences
- Fork frequency — flag bursts of fork requests in quick succession
- Visual indicators — highlight suspicious elements in the review UI
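To illustrate the first two checks, here are hypothetical new-domain and typosquatting heuristics (edit distance against known domains); real detection would be considerably more robust:

```typescript
// Crude domain extraction for illustration only.
function extractDomains(text: string): string[] {
  const matches = text.match(/[\w.-]+\.[a-z]{2,}/gi) ?? [];
  return matches.map((d) => d.toLowerCase());
}

// Levenshtein distance, used to spot near-miss (typosquatted) domains.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      );
  return dp[a.length][b.length];
}

// Flag outbound content that mentions domains the agent hasn't seen before,
// distinguishing near-misses of known domains from genuinely new ones.
function flagDomains(outbound: string, knownDomains: string[]): string[] {
  const flags: string[] = [];
  for (const domain of extractDomains(outbound)) {
    if (knownDomains.includes(domain)) continue;
    const nearMiss = knownDomains.some((k) => k !== domain && editDistance(domain, k) <= 2);
    flags.push(nearMiss ? `typosquat? ${domain}` : `new domain: ${domain}`);
  }
  return flags;
}
```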
Modes of Operation
Copilot Mode
- Starts Clean — fresh context per session
- Accumulates taint as it works
- When tainted, egress requires fork review
Assistant Mode
- Starts Tainted — long-running, has ingested mixed-trust content over time
- Can read and reason about anything
- All egress goes through fork review
- Produces prompts for clean copilots to execute
Assistant Sleep Cycle (Planned)
After a period of inactivity, the assistant:
- Summarizes the session's key context
- Wipes its working context
- Presents the summary to the user for review before the next session continues
This creates another airlock where accumulated injection must survive as visible text.
Implementation Status
| Phase | Status | What It Does |
|---|---|---|
| Phase 1: Core Vocabulary | Complete | TrustLevel, ToolDirection, ToolClassification, RestrictionRules — the shared types and pure functions |
| Phase 2: State Tracking | In progress | Conversation taint tracking, persistence, taint transition events, frontend indicators |
| Phase 3: Soft Enforcement | Planned | Classify tools, wrap invocations, log violations without blocking |
| Phase 4: Hard Enforcement | Planned | Block egress when taint rules prohibit, return structured ForkRequired results |
| Phase 5: Fork Mechanism | Planned | Serialize blocked actions, user review UI, clean conversation execution |
| Phase 6: Memory & Sub-Agent Taint | Future | Taint propagation through persistent memory and sub-agent composition |
The vocabulary and rules engine are tested and in production code. Enforcement is being rolled out incrementally — observation before blocking.
Why Conversation-Level Taint
Some approaches (e.g., per-variable taint tracking) attempt finer granularity — tracking which output came from which input so you can apply policies per data item.
The CRF rejects this approach. If an agent’s context contains a prompt injection, the agent’s reasoning is already compromised when it decides which variable to assign data to. An injection could tell the agent to label exfiltration payload as “clean,” and a per-variable policy engine would allow it through.
Conversation-level taint is honest about the reality: once compromised, the agent’s judgment is untrusted for everything, including self-assessment.
Extensibility
The classification scheme (direction, boundary, conversation state, egress rules) is shared vocabulary in the platform’s core. Features consume it independently:
| Feature | Consumes | Purpose |
|---|---|---|
| Cumulative Restrictions | Classification + state + rules | Enforce what a conversation can do |
| Risk Scoring (future) | Classification from agent config | Assess agent tool combinations before publishing |
| Audit Trail (future) | Classification + state transitions | Log taint transitions for compliance |
The vocabulary is core; the opinions are features. “This tool is egress + public” is vocabulary. “An agent with 3 public egress tools needs approval” is an opinion that can be added or removed.
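To illustrate the split, here is a hypothetical risk-scoring opinion layered on the shared classification vocabulary; the threshold, names, and shapes are invented:

```typescript
// Vocabulary: facts about tools, shared in the platform core
// (illustrative shape, not the real core types).
interface ToolClass {
  direction: "ingress" | "egress" | "operation";
  boundary: "internal" | "public" | "none";
}

// Opinion: a feature-level policy built on top of that vocabulary.
// The threshold is a hypothetical example, not a shipped rule.
function needsApproval(tools: ToolClass[], maxPublicEgress = 2): boolean {
  const publicEgress = tools.filter(
    (t) => t.direction === "egress" && t.boundary === "public"
  ).length;
  return publicEgress > maxPublicEgress;
}
```

The vocabulary ("this tool is egress + public") stays stable; the opinion (the threshold, or the policy itself) can be added, tuned, or removed without touching the core.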