The Cumulative Restrictions Framework
The CRF is Corral’s security model for governing AI agent behavior based on what the agent has been exposed to. This page provides the technical detail behind the framework.
For the executive summary, see Security Architecture →.
The Problem
AI agents act on what they read. An agent that reads an email, searches the web, or processes a user-uploaded document is ingesting content from outside the trust boundary. That content can contain instructions — prompt injections — that manipulate the agent’s behavior.
The traditional security approach (access control: who can see what) is necessary but insufficient. Access control governs what the agent can read. The CRF governs what the agent can do after reading.
Core Principle
“If the agent ingests anything, then its permissions should be dropped to the level of the author of that information.”
Once an agent has processed untrusted content, you can't trust its reasoning, including its reasoning about which data is safe to send outward. The CRF therefore tracks trust per conversation, not per data item: either the agent's judgment is compromised, or it isn't.
Conversation States
| State | Value | Condition |
|---|---|---|
| Clean | Highest trust | Fresh conversation, no external data accessed |
| Internal | Medium trust | Has accessed internal/trusted data |
| Public (Tainted) | Lowest trust | Has accessed untrusted data |
State only escalates toward lower trust; it never recovers within a conversation. The transition function is Math.Min(current, source): the conversation takes on the lowest trust level of any data it has ingested.
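A minimal sketch of the transition function. The doc's Math.Min spelling suggests C#; TypeScript is used here (and in the examples below) purely for illustration, and the enum values are assumptions, not the platform's actual TrustLevel definition:

```typescript
// Trust levels ordered numerically so that a lower number means lower trust.
// Illustrative values only; the platform's own TrustLevel enum may differ.
enum TrustLevel {
  Public = 0,   // Tainted: untrusted data has been ingested
  Internal = 1, // internal/trusted data has been ingested
  Clean = 2,    // fresh conversation, nothing external ingested
}

// The conversation takes on the lowest trust level of any data it has
// ingested: state only escalates toward lower trust, never back up.
function transition(current: TrustLevel, source: TrustLevel): TrustLevel {
  return Math.min(current, source);
}
```

Note that transition(Internal, Clean) stays Internal: reading trusted data later never restores a conversation's trust.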
Tool Classification
Every tool call is classified along two dimensions:
Direction
| Direction | Test | Effect |
|---|---|---|
| Ingress | New content enters agent context | Escalates conversation state |
| Egress | Freeform content sent outward | Restricted by conversation state |
| Operation | All parameters are sanitizable, no new content ingested | No state change, not restricted |
Boundary
| Boundary | Meaning |
|---|---|
| Internal | Tool interacts with internal/trusted systems |
| Public | Tool interacts with external/untrusted systems |
| None | Operations (no boundary crossing) |
Classification Examples
| Tool | Direction | Boundary |
|---|---|---|
| Web search | Ingress | Public |
| Read external email | Ingress | Public |
| Read internal documents | Ingress | Internal |
| Send internal email | Egress | Internal |
| Send external email | Egress | Public |
| Move a file | Operation | None |
| Rename a folder | Operation | None |
The key distinction: sanitizable parameters (file paths, IDs, enum values, structured queries) vs. freeform parameters (message bodies, file content, arbitrary text). If every outbound parameter is sanitizable, the call is an operation. If freeform content goes outward, it’s egress.
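The sanitizable-vs-freeform distinction reads directly as a classification rule. A sketch in TypeScript, where the ToolSpec shape and its field names are hypothetical, not the platform's actual ToolClassification types:

```typescript
// Hypothetical tool description for illustration.
type ParamKind = "sanitizable" | "freeform";

interface ToolSpec {
  name: string;
  ingestsContent: boolean;     // does the tool's result enter agent context?
  outboundParams: ParamKind[]; // parameters sent across the boundary
}

type Direction = "ingress" | "egress" | "operation";

function classifyDirection(tool: ToolSpec): Direction {
  // Any freeform content going outward makes the call egress.
  if (tool.outboundParams.includes("freeform")) return "egress";
  // New content entering the agent's context makes it ingress.
  if (tool.ingestsContent) return "ingress";
  // All-sanitizable parameters and no ingestion: a plain operation.
  return "operation";
}
```

This single-direction simplification matches the examples table: a web search ingests results (ingress), sending an email carries a freeform body (egress), and moving a file takes only sanitizable paths (operation).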
Egress Rules
| Conversation State | Egress to Internal | Egress to Public |
|---|---|---|
| Clean | Allowed | Allowed |
| Internal | Allowed | Fork required |
| Tainted | Fork required | Fork required |
Reading and operations are always allowed. A tainted conversation can continue reading sources and performing operations. Restrictions apply only to egress.
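The egress table above reads directly as a pure function. A sketch, using illustrative string literals rather than the platform's real enums:

```typescript
// Conversation state and tool boundary, as illustrative string unions.
type ConversationState = "clean" | "internal" | "tainted";
type EgressBoundary = "internal" | "public";
type EgressDecision = "allowed" | "fork-required";

function checkEgress(
  state: ConversationState,
  boundary: EgressBoundary
): EgressDecision {
  if (state === "clean") return "allowed";
  if (state === "internal" && boundary === "internal") return "allowed";
  // Internal state egressing publicly, or any egress from a tainted
  // conversation, must go through fork review.
  return "fork-required";
}
```

Ingress and operations never reach this check; only egress calls are gated.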
The Fork Mechanism
When egress is blocked, the system serializes the intended action into a self-contained, human-readable prompt:
- The agent describes what it wants to do (send this email, call this API, push this code)
- The serialized action is presented to the user in a review card
- The user sees exactly what would be sent, including any content that originated from untrusted sources
- The user can edit, approve, reject, or report
- On approval, a fresh clean conversation executes the action
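The steps above can be sketched as a serialization step. The ForkRequest shape, field names, and card layout below are invented for illustration; the real review card will differ:

```typescript
// Hypothetical shape of a serialized fork request: the blocked action
// rendered as self-contained, human-readable text for the review card.
interface ForkRequest {
  tool: string;                    // e.g. "send_external_email"
  summary: string;                 // what the agent wants to do, in plain words
  payload: Record<string, string>; // the exact content that would be sent
}

function renderReviewCard(req: ForkRequest): string {
  const lines = [
    `Action: ${req.tool}`,
    `Agent's intent: ${req.summary}`,
    "Exact outbound content:",
  ];
  for (const [key, value] of Object.entries(req.payload)) {
    lines.push(`  ${key}: ${value}`);
  }
  lines.push("Options: edit / approve / reject / report");
  return lines.join("\n");
}
```

The point of the format is that everything the clean conversation will execute, including any content that originated from untrusted sources, is visible verbatim to the reviewer.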
Why This Works
Injection attacks hide instructions in content the agent reads. The fork mechanism forces those instructions to be serialized as visible text. The attack surface shifts from "hidden instructions manipulate the agent" to "the user must be fooled by visible text", an attack that is dramatically easier to catch.
Anomaly Detection
The fork review includes basic anomaly detection:
- New domains — highlight email addresses or URLs the agent hasn’t seen before
- Typosquatting — flag domains that look like known domains with subtle differences
- Fork frequency — flag bursts of fork requests in quick succession
- Visual indicators — highlight suspicious elements in the review UI
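To illustrate the first two checks, here are hypothetical new-domain and typosquatting heuristics (edit distance against known domains); real detection would be considerably more robust:

```typescript
// Crude domain extraction for illustration only.
function extractDomains(text: string): string[] {
  const matches = text.match(/[\w.-]+\.[a-z]{2,}/gi) ?? [];
  return matches.map((d) => d.toLowerCase());
}

// Levenshtein distance, used to spot near-miss (typosquatted) domains.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      );
  return dp[a.length][b.length];
}

// Flag outbound content that mentions domains the agent hasn't seen before,
// distinguishing near-misses of known domains from genuinely new ones.
function flagDomains(outbound: string, knownDomains: string[]): string[] {
  const flags: string[] = [];
  for (const domain of extractDomains(outbound)) {
    if (knownDomains.includes(domain)) continue;
    const nearMiss = knownDomains.some((k) => k !== domain && editDistance(domain, k) <= 2);
    flags.push(nearMiss ? `typosquat? ${domain}` : `new domain: ${domain}`);
  }
  return flags;
}
```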
Modes of Operation
Copilot Mode
- Starts Clean — fresh context per session
- Accumulates taint as it works
- When tainted, egress requires fork review
Assistant Mode
- Starts Tainted — long-running, has ingested mixed-trust content over time
- Can read and reason about anything
- All egress goes through fork review
- Produces prompts for clean copilots to execute
Assistant Sleep Cycle (Planned)
After a period of inactivity, the assistant:
- Summarizes the session's key context
- Wipes its working context
- Presents the summary to the user for review before the next session continues
This creates another airlock where accumulated injection must survive as visible text.
Implementation Status
| Phase | Status | What It Does |
|---|---|---|
| Phase 1: Core Vocabulary | Complete | TrustLevel, ToolDirection, ToolClassification, RestrictionRules — the shared types and pure functions |
| Phase 2: State Tracking | In progress | Conversation taint tracking, persistence, taint transition events, frontend indicators |
| Phase 3: Soft Enforcement | Planned | Classify tools, wrap invocations, log violations without blocking |
| Phase 4: Hard Enforcement | Planned | Block egress when taint rules prohibit, return structured ForkRequired results |
| Phase 5: Fork Mechanism | Planned | Serialize blocked actions, user review UI, clean conversation execution |
| Phase 6: Memory & Sub-Agent Taint | Future | Taint propagation through persistent memory and sub-agent composition |
The vocabulary and rules engine are tested and in production code. Enforcement is being rolled out incrementally — observation before blocking.
Why Conversation-Level Taint
Some approaches (e.g., per-variable taint tracking) attempt finer granularity — tracking which output came from which input so you can apply policies per data item.
The CRF rejects this approach. If an agent’s context contains a prompt injection, the agent’s reasoning is already compromised when it decides which variable to assign data to. An injection could tell the agent to label exfiltration payload as “clean,” and a per-variable policy engine would allow it through.
Conversation-level taint is honest about the reality: once compromised, the agent’s judgment is untrusted for everything, including self-assessment.
Extensibility
The classification scheme (direction, boundary, conversation state, egress rules) is shared vocabulary in the platform’s core. Features consume it independently:
| Feature | Consumes | Purpose |
|---|---|---|
| Cumulative Restrictions | Classification + state + rules | Enforce what a conversation can do |
| Risk Scoring (future) | Classification from agent config | Assess agent tool combinations before publishing |
| Audit Trail (future) | Classification + state transitions | Log taint transitions for compliance |
The vocabulary is core; the opinions are features. “This tool is egress + public” is vocabulary. “An agent with 3 public egress tools needs approval” is an opinion that can be added or removed.
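To illustrate the split, here is a hypothetical risk-scoring opinion layered on the shared classification vocabulary; the threshold, names, and shapes are invented:

```typescript
// Vocabulary: facts about tools, shared in the platform core
// (illustrative shape, not the real core types).
interface ToolClass {
  direction: "ingress" | "egress" | "operation";
  boundary: "internal" | "public" | "none";
}

// Opinion: a feature-level policy built on top of that vocabulary.
// The threshold is a hypothetical example, not a shipped rule.
function needsApproval(tools: ToolClass[], maxPublicEgress = 2): boolean {
  const publicEgress = tools.filter(
    (t) => t.direction === "egress" && t.boundary === "public"
  ).length;
  return publicEgress > maxPublicEgress;
}
```

The vocabulary ("this tool is egress + public") stays stable; the opinion (the threshold, or the policy itself) can be added, tuned, or removed without touching the core.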