
The Cumulative Restrictions Framework

The CRF is Corral’s security model for governing AI agent behavior based on what the agent has been exposed to. This page provides the technical detail behind the framework.

For the executive summary, see Security Architecture →.


The Problem

AI agents act on what they read. An agent that reads an email, searches the web, or processes a user-uploaded document is ingesting content from outside the trust boundary. That content can contain instructions — prompt injections — that manipulate the agent’s behavior.

The traditional security approach (access control: who can see what) is necessary but insufficient. Access control governs what the agent can read. The CRF governs what the agent can do after reading.


Core Principle

“If the agent ingests anything, then its permissions should be dropped to the level of the author of that information.”

Once an agent has processed untrusted content, you can’t trust its reasoning — including its reasoning about which data is safe to send outward. The CRF treats this as a binary per-conversation state: either the agent’s judgment is compromised, or it isn’t.


Conversation States

| State | Value | Condition |
|---|---|---|
| Clean | Highest trust | Fresh conversation, no external data accessed |
| Internal | Medium trust | Has accessed internal/trusted data |
| Public (Tainted) | Lowest trust | Has accessed untrusted data |

States only escalate, never de-escalate within a conversation. The transition function is Math.Min(current, source) — the conversation takes on the lowest trust level of any data it has ingested.
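The transition rule can be sketched in a few lines. This is an illustrative rendering in TypeScript (the document's `Math.Min` becomes `Math.min`); the enum values and function name are assumptions, not Corral's actual API:

```typescript
// Lower values mean lower trust, so Math.min always moves the conversation
// toward the least-trusted source it has ever ingested.
enum TrustLevel {
  Public = 0,   // lowest trust (tainted)
  Internal = 1, // medium trust
  Clean = 2,    // highest trust
}

// States only escalate: the result is never more trusted than the current state.
function transition(current: TrustLevel, source: TrustLevel): TrustLevel {
  return Math.min(current, source);
}

// A clean conversation reads an internal doc, then a public web page:
let state = TrustLevel.Clean;
state = transition(state, TrustLevel.Internal); // -> Internal
state = transition(state, TrustLevel.Public);   // -> Public (tainted)
state = transition(state, TrustLevel.Internal); // stays Public: no de-escalation
console.log(TrustLevel[state]); // "Public"
```

Because `min` is idempotent and order-insensitive, it does not matter in what order sources are ingested: the conversation ends at the trust level of its least-trusted input.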


Tool Classification

Every tool call is classified along two dimensions:

Direction

| Direction | Test | Effect |
|---|---|---|
| Ingress | New content enters agent context | Escalates conversation state |
| Egress | Freeform content sent outward | Restricted by conversation state |
| Operation | All parameters are sanitizable; no new content ingested | No state change, not restricted |

Boundary

| Boundary | Meaning |
|---|---|
| Internal | Tool interacts with internal/trusted systems |
| Public | Tool interacts with external/untrusted systems |
| None | Operations (no boundary crossing) |

Classification Examples

| Tool | Direction | Boundary |
|---|---|---|
| Web search | Ingress | Public |
| Read external email | Ingress | Public |
| Read internal documents | Ingress | Internal |
| Send internal email | Egress | Internal |
| Send external email | Egress | Public |
| Move a file | Operation | None |
| Rename a folder | Operation | None |

The key distinction: sanitizable parameters (file paths, IDs, enum values, structured queries) vs. freeform parameters (message bodies, file content, arbitrary text). If every outbound parameter is sanitizable, the call is an operation. If freeform content goes outward, it’s egress.
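The operation-vs-egress test reduces to a check over outbound parameters. A minimal sketch, with hypothetical type names (ingress classification, which concerns incoming content, is out of scope here):

```typescript
// A call is an operation only if every outbound parameter is sanitizable
// (paths, IDs, enum values, structured queries). Any freeform parameter
// (message body, file content, arbitrary text) makes it egress.
type ParamKind = "sanitizable" | "freeform";

interface OutboundCall {
  outboundParams: ParamKind[];
}

function directionOf(call: OutboundCall): "operation" | "egress" {
  return call.outboundParams.every((p) => p === "sanitizable")
    ? "operation"
    : "egress";
}

// "Move a file" sends only two paths outward -> operation.
console.log(directionOf({ outboundParams: ["sanitizable", "sanitizable"] })); // "operation"
// "Send email" includes a freeform body -> egress.
console.log(directionOf({ outboundParams: ["sanitizable", "freeform"] })); // "egress"
```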


Egress Rules

| Conversation State | Egress to Internal | Egress to Public |
|---|---|---|
| Clean | Allowed | Allowed |
| Internal | Allowed | Fork required |
| Tainted | Fork required | Fork required |

Reading and operations are always allowed. A tainted conversation can continue reading sources and performing operations. Restrictions apply only to egress.
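The rules table above is small enough to encode directly. A sketch with illustrative names (not Corral's actual rules engine):

```typescript
type ConversationState = "clean" | "internal" | "tainted";
type EgressBoundary = "internal" | "public";
type Decision = "allowed" | "fork-required";

// Only egress consults this table; ingress and operations are always allowed.
function checkEgress(state: ConversationState, target: EgressBoundary): Decision {
  if (state === "clean") return "allowed";
  if (state === "internal") {
    return target === "internal" ? "allowed" : "fork-required";
  }
  return "fork-required"; // tainted: all egress requires fork review
}

console.log(checkEgress("internal", "public")); // "fork-required"
console.log(checkEgress("internal", "internal")); // "allowed"
```

Note the invariant: egress to a boundary is allowed only when the conversation's trust level is at least that of the target, which is why `clean` passes everywhere and `tainted` passes nowhere.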


The Fork Mechanism

When egress is blocked, the system serializes the intended action into a self-contained, human-readable prompt:

  1. The agent describes what it wants to do (send this email, call this API, push this code)
  2. The serialized action is presented to the user in a review card
  3. The user sees exactly what would be sent, including any content that originated from untrusted sources
  4. The user can edit, approve, reject, or report
  5. On approval, a fresh clean conversation executes the action
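Steps 1–3 amount to rendering the blocked action as plain text. A hypothetical sketch of that serialization (field names and content are illustrative, not Corral's actual format):

```typescript
interface BlockedAction {
  tool: string;                     // e.g. "send_external_email"
  description: string;              // what the agent says it wants to do
  payload: Record<string, string>;  // the exact content that would be sent
}

// Render the action as a self-contained, human-readable review prompt.
// Anything an injection smuggled into the payload must appear here as visible text.
function toReviewPrompt(action: BlockedAction): string {
  const lines = [
    `Proposed action: ${action.tool}`,
    `Agent's description: ${action.description}`,
    `Exact outbound content:`,
    ...Object.entries(action.payload).map(([key, value]) => `  ${key}: ${value}`),
  ];
  return lines.join("\n");
}

const prompt = toReviewPrompt({
  tool: "send_external_email",
  description: "Reply to the vendor with the quarterly numbers",
  payload: { to: "vendor@example.com", body: "Q3 revenue was ..." },
});
console.log(prompt);
```

On approval, this text (possibly edited by the user) becomes the input to a fresh, clean conversation — step 5 — so the executing agent never sees the tainted context that produced it.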

Why This Works

Injection attacks hide instructions in content the agent reads. The fork mechanism forces those instructions to be serialized as visible text. The attack surface shifts from “hidden instructions manipulate the agent” to “the user must be fooled by visible text” — a dramatically easier problem to catch.

Anomaly Detection

The fork review includes basic anomaly detection:

  • New domains — highlight email addresses or URLs the agent hasn’t seen before
  • Typosquatting — flag domains that look like known domains with subtle differences
  • Fork frequency — many fork requests in quick succession are flagged
  • Visual indicators — suspicious elements are highlighted in the review UI
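The first two checks can be approximated cheaply. A sketch, assuming a simple edit-distance heuristic for typosquatting (the real detection may differ):

```typescript
// Standard Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Flag a domain the conversation is about to egress to: typosquat if it is
// one edit away from a known domain, otherwise "new domain" if unseen.
function flagDomain(domain: string, seen: Set<string>): string | null {
  if (seen.has(domain)) return null;
  for (const known of seen) {
    if (editDistance(domain, known) === 1) return `typosquat? resembles ${known}`;
  }
  return "new domain";
}

const seen = new Set(["example.com", "corral.dev"]);
console.log(flagDomain("examp1e.com", seen)); // "typosquat? resembles example.com"
console.log(flagDomain("other.org", seen));   // "new domain"
console.log(flagDomain("example.com", seen)); // null
```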

Modes of Operation

Copilot Mode

  • Starts Clean — fresh context per session
  • Accumulates taint as it works
  • When tainted, egress requires fork review

Assistant Mode

  • Starts Tainted — long-running, has ingested mixed-trust content over time
  • Can read and reason about anything
  • All egress goes through fork review
  • Produces prompts for clean copilots to execute

Assistant Sleep Cycle (Planned)

After a period of inactivity, the assistant:

  1. The assistant summarizes the session’s key context
  2. The context window is wiped
  3. At the next session, the summary is presented to the user for review before continuing

This creates another airlock where accumulated injection must survive as visible text.


Implementation Status

| Phase | Status | What It Does |
|---|---|---|
| Phase 1: Core Vocabulary | Complete | TrustLevel, ToolDirection, ToolClassification, RestrictionRules — the shared types and pure functions |
| Phase 2: State Tracking | In progress | Conversation taint tracking, persistence, taint transition events, frontend indicators |
| Phase 3: Soft Enforcement | Planned | Classify tools, wrap invocations, log violations without blocking |
| Phase 4: Hard Enforcement | Planned | Block egress when taint rules prohibit, return structured ForkRequired results |
| Phase 5: Fork Mechanism | Planned | Serialize blocked actions, user review UI, clean conversation execution |
| Phase 6: Memory & Sub-Agent Taint | Future | Taint propagation through persistent memory and sub-agent composition |

The vocabulary and rules engine are tested and in production code. Enforcement is being rolled out incrementally — observation before blocking.


Why Conversation-Level Taint

Some approaches (e.g., per-variable taint tracking) attempt finer granularity — tracking which output came from which input so you can apply policies per data item.

The CRF rejects this approach. If an agent’s context contains a prompt injection, the agent’s reasoning is already compromised when it decides which variable to assign data to. An injection could tell the agent to label exfiltration payload as “clean,” and a per-variable policy engine would allow it through.

Conversation-level taint is honest about the reality: once compromised, the agent’s judgment is untrusted for everything, including self-assessment.


Extensibility

The classification scheme (direction, boundary, conversation state, egress rules) is shared vocabulary in the platform’s core. Features consume it independently:

| Feature | Consumes | Purpose |
|---|---|---|
| Cumulative Restrictions | Classification + state + rules | Enforce what a conversation can do |
| Risk Scoring (future) | Classification from agent config | Assess agent tool combinations before publishing |
| Audit Trail (future) | Classification + state transitions | Log taint transitions for compliance |

The vocabulary is core; the opinions are features. “This tool is egress + public” is vocabulary. “An agent with 3 public egress tools needs approval” is an opinion that can be added or removed.
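To make the vocabulary/opinion split concrete, here is how the example opinion might be layered on top of the shared types. Everything here is a hypothetical sketch — the threshold, type names, and function are illustrative:

```typescript
// Shared vocabulary: facts about each tool, stated without judgment.
interface ToolClassification {
  direction: "ingress" | "egress" | "operation";
  boundary: "internal" | "public" | "none";
}

// An opinion layered on the vocabulary: "an agent with 3 public egress
// tools needs approval". Removing this function removes only the opinion;
// the vocabulary underneath is untouched.
function needsApproval(tools: ToolClassification[], threshold = 3): boolean {
  const publicEgress = tools.filter(
    (t) => t.direction === "egress" && t.boundary === "public"
  ).length;
  return publicEgress >= threshold;
}
```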