Core Assumption

AARM operates on a fundamental principle: the AI orchestration layer cannot be trusted as a security boundary. Unlike traditional applications where code behavior is deterministic, AI agents process untrusted inputs—user prompts, tool outputs, retrieved documents—that can manipulate their behavior. The agent may be:
  • Instructed to perform harmful actions
  • Confused about what it should do
  • Deceived about what it is doing
AARM treats the agent as a potentially compromised component and enforces security at the action layer—the boundary where decisions become operations on external systems.
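
A minimal sketch of what action-layer enforcement can look like under this assumption (the `PolicyEngine` interface and helper names are illustrative, not AARM's actual API):

```python
from dataclasses import dataclass

@dataclass
class ActionRequest:
    """A tool invocation the agent wants to perform, captured before it runs."""
    tool: str
    params: dict
    principal: str  # the human or service on whose behalf the agent acts

class PolicyDecisionError(Exception):
    pass

def execute_action(request: ActionRequest, policy_engine, tool_registry):
    """Enforce policy at the action boundary: the agent's decision is treated
    as untrusted input; only the policy verdict authorizes execution."""
    verdict = policy_engine.evaluate(request)
    if not verdict.allowed:
        raise PolicyDecisionError(f"Denied: {verdict.reason}")
    # Execute only after an explicit allow from the trusted control plane.
    return tool_registry[request.tool](**request.params)
```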

Trust Model

Trusted

  • AARM control plane
  • Cryptographic primitives
  • Policy store

Untrusted

  • AI model
  • Agent orchestration
  • User inputs
  • Tool outputs
  • Retrieved documents

Partially Trusted

  • Tool implementations
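
One way to make these tiers operational is to tag each component with its tier and let only fully trusted components influence policy. A hypothetical sketch (the component names and mapping are illustrative):

```python
from enum import Enum

class TrustTier(Enum):
    TRUSTED = "trusted"            # AARM control plane, crypto primitives, policy store
    PARTIALLY_TRUSTED = "partial"  # tool implementations
    UNTRUSTED = "untrusted"        # model, orchestration, inputs, outputs, documents

# Illustrative component-to-tier mapping mirroring the lists above.
COMPONENT_TIERS = {
    "control_plane": TrustTier.TRUSTED,
    "policy_store": TrustTier.TRUSTED,
    "tool:payments": TrustTier.PARTIALLY_TRUSTED,
    "agent_orchestrator": TrustTier.UNTRUSTED,
    "retrieved_document": TrustTier.UNTRUSTED,
}

def may_author_policy(component: str) -> bool:
    """Only fully trusted components may modify or supply policy.
    Unknown components default to untrusted."""
    return COMPONENT_TIERS.get(component, TrustTier.UNTRUSTED) is TrustTier.TRUSTED
```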

Primary Threats

AARM addresses six threat categories. Three represent the most critical attack patterns:
| Threat | Description | Impact |
| --- | --- | --- |
| Prompt Injection | Malicious instructions override agent behavior | Unauthorized actions executed with legitimate credentials |
| Confused Deputy | Agent manipulated into unintended operations | Destructive actions the user never requested |
| Data Exfiltration | Composition of allowed actions creates a breach | Sensitive data sent to unauthorized destinations |
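
The data-exfiltration row is the subtlest: each action is individually allowed, and the breach emerges from their composition. A minimal sketch of a composition-aware check (all names are hypothetical, not AARM's actual mechanism):

```python
# Individually-allowed actions ("read CRM", "send email") become a breach
# when chained. Track whether sensitive data entered the session and gate
# outbound actions on it.
SENSITIVE_SOURCES = {"crm.export", "vault.read"}
OUTBOUND_ACTIONS = {"email.send", "http.post"}

class Session:
    def __init__(self):
        self.tainted = False  # has sensitive data entered the session?

def check(session: Session, action: str, destination: str | None,
          allowed_destinations: set[str]) -> bool:
    """Return True if the action may proceed."""
    if action in SENSITIVE_SOURCES:
        session.tainted = True
        return True
    if action in OUTBOUND_ACTIONS and session.tainted:
        # After touching sensitive data, outbound traffic is allowlist-only.
        return destination in allowed_destinations
    return True
```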
Additional threats AARM mitigates:
| Threat | Description | AARM Control |
| --- | --- | --- |
| Malicious Tool Outputs | Tool returns adversarial content that manipulates the agent | Post-tool action restrictions |
| Over-Privileged Credentials | Tokens grant excessive permissions | Least-privilege, scoped credentials |
| Memory Poisoning | False data injected into persistent memory | Provenance tracking, anomaly detection |
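
As an illustration of the least-privilege control above, a scoped credential might be minted per approved request rather than granted broadly (a sketch; the field names are illustrative, not AARM's token format):

```python
import datetime as dt

def mint_scoped_token(principal: str, tool: str, actions: list[str],
                      ttl_seconds: int = 300) -> dict:
    """Issue a short-lived credential scoped to exactly the actions the
    approved request needs, instead of a broad long-lived token."""
    expiry = dt.datetime.now(dt.timezone.utc) + dt.timedelta(seconds=ttl_seconds)
    return {
        "sub": principal,
        "aud": tool,                 # usable only against this tool
        "scope": " ".join(actions),  # e.g. "calendar:read"
        "exp": expiry.isoformat(),   # expires in minutes, not months
    }
```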

Attack Lifecycle

1. Injection: Attacker embeds malicious instructions in user input, documents, or tool outputs.
2. Hijacking: Agent interprets the malicious content as legitimate instructions.
3. Execution: Agent invokes tools with attacker-controlled parameters.
4. Impact: Irreversible effects follow, such as data theft, unauthorized transactions, and system damage.
AARM intervenes between steps 2 and 3—after the agent decides to act, but before the action executes.
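
Conceptually, that intervention point sits in the tool-dispatch path of the agent loop. A sketch under the assumptions above (the `aarm_gate` hook and surrounding interfaces are illustrative):

```python
def agent_loop(agent, aarm_gate, tools):
    """The agent plans freely (steps 1-2 may be compromised); AARM gates the
    transition from decision to execution (between steps 2 and 3)."""
    while True:
        decision = agent.next_action()           # may be hijacked upstream
        if decision is None:
            break
        if not aarm_gate.authorize(decision):    # trusted enforcement point
            agent.observe("action denied by policy")
            continue
        result = tools.run(decision)             # step 3 runs only after allow
        agent.observe(result)
```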

Out of Scope

| Threat | Why | Complementary Control |
| --- | --- | --- |
| Model training poisoning | Occurs pre-deployment | ML security, model provenance |
| DoS against AARM | Infrastructure concern | Availability controls |
| Social engineering of approvers | Human factor | Security training |

Deep Dives