Core Assumption
AARM operates on a fundamental principle: the AI orchestration layer cannot be trusted as a security boundary. Unlike traditional applications where code behavior is deterministic, AI agents process untrusted inputs—user prompts, tool outputs, retrieved documents—that can manipulate their behavior. The agent may be:- Instructed to perform harmful actions
- Confused about what it should do
- Deceived about what it is doing
Trust Model
Trusted
- AARM control plane
- Cryptographic primitives
- Policy store
Untrusted
- AI model
- Agent orchestration
- User inputs
- Tool outputs
- Retrieved documents
Partially Trusted
- Tool implementations
Primary Threats
AARM addresses six threat categories. Three represent the most critical attack patterns:| Threat | Description | Impact |
|---|---|---|
| Prompt Injection | Malicious instructions override agent behavior | Unauthorized actions executed with legitimate credentials |
| Confused Deputy | Agent manipulated into unintended operations | Destructive actions the user never requested |
| Data Exfiltration | Composition of allowed actions creates breach | Sensitive data sent to unauthorized destinations |
| Threat | Description | AARM Control |
|---|---|---|
| Malicious Tool Outputs | Tool returns adversarial content that manipulates agent | Post-tool action restrictions |
| Over-Privileged Credentials | Tokens grant excessive permissions | Least-privilege, scoped credentials |
| Memory Poisoning | False data injected into persistent memory | Provenance tracking, anomaly detection |
Attack Lifecycle
1
Injection
Attacker embeds malicious instructions in user input, documents, or tool outputs
2
Hijacking
Agent interprets malicious content as legitimate instructions
3
Execution
Agent invokes tools with attacker-controlled parameters
4
Impact
Irreversible effects: data theft, unauthorized transactions, system damage
Out of Scope
| Threat | Why | Complementary Control |
|---|---|---|
| Model training poisoning | Pre-deployment | ML security, model provenance |
| DoS against AARM | Infrastructure | Availability controls |
| Social engineering of approvers | Human factor | Security training |