Fundamental Assumption
AARM operates on a core premise: the AI orchestration layer cannot be trusted as a security boundary. The model processes untrusted inputs through opaque reasoning, producing actions that may serve attacker goals rather than user intent.
Threat Summary
| Threat | Attack Vector | AARM Control |
|---|---|---|
| Prompt Injection | User input, documents, tool outputs | Policy enforcement, context-dependent deny |
| Malicious Tool Outputs | Adversarial tool responses | Post-tool action restrictions, context tracking |
| Confused Deputy | Ambiguous/malicious instructions | Step-up approval, intent alignment check |
| Over-Privileged Credentials | Excessive token scopes | Least-privilege, scoped credentials |
| Data Exfiltration | Action composition | Context accumulation, compositional policies |
| Goal Hijacking | Injected objectives | Action-level policy, semantic distance |
| Intent Drift | Agent reasoning divergence | Context accumulation, semantic distance, deferral |
| Memory Poisoning | Persistent context manipulation | Provenance tracking, anomaly detection |
| Cross-Agent Propagation | Multi-agent delegation | Cross-agent context tracking, transitive trust limits |
| Side-Channel Leakage | Logs, debug traces, API metadata | Output filtering, contextual sensitivity scoring |
| Environmental Manipulation | Modified system/environment state | Input provenance tracking, anomaly detection |
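As a concrete illustration of the controls above, a context-dependent deny can be expressed as a small rule evaluated against the context the agent has accumulated. The following is a minimal sketch, not AARM's actual policy schema; the tool names, fields, and the `evaluate` function are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    # Sources of untrusted content the agent has already processed
    untrusted_sources_seen: set[str] = field(default_factory=set)

@dataclass
class Decision:
    allow: bool
    reason: str

# Hypothetical set of outbound (data-moving) tools; illustrative only.
OUTBOUND_TOOLS = {"send_email", "http_post", "create_ticket"}

def evaluate(action: str, params: dict, ctx: AgentContext) -> Decision:
    """Context-dependent deny: block outbound actions once untrusted content
    has entered the context, forcing step-up approval instead."""
    if action in OUTBOUND_TOOLS and ctx.untrusted_sources_seen:
        return Decision(
            allow=False,
            reason=(f"outbound action '{action}' after reading untrusted sources "
                    f"{sorted(ctx.untrusted_sources_seen)}; step-up approval required"),
        )
    return Decision(allow=True, reason="no contextual restriction triggered")

# Usage: the agent reads an attacker-controlled document, then tries to mail it out.
ctx = AgentContext()
ctx.untrusted_sources_seen.add("shared_drive/external_report.pdf")
print(evaluate("send_email", {"to": "someone@example.com"}, ctx))
```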
Attack Lifecycle
Attacks against AI agents typically follow four stages; the sketch after the list shows where AARM can intervene:
- Injection — Attacker embeds malicious instructions in content the agent will process
- Hijacking — Agent interprets malicious content as legitimate instructions (unobservable)
- Execution — Agent invokes tools with attacker-controlled parameters using legitimate credentials
- Impact — Actions produce effects in external systems, often irreversible
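Because the Hijacking stage is unobservable from outside the model, the Execution stage is the first point where enforcement is possible: every tool invocation passes through a gate that applies policy before the call reaches the real tool. The gate, deny list, and tool below are a hypothetical sketch of that interception point, not AARM's API.

```python
from typing import Any, Callable

class ActionDenied(Exception):
    """Raised when the action gate blocks a tool invocation."""

# Illustrative deny list; a real deployment would consult the full policy
# engine and accumulated context rather than a static set.
DENIED_ACTIONS = {"delete_repository", "transfer_funds"}

def gated_call(action: str, params: dict, tool: Callable[..., Any]) -> Any:
    """Intercept a tool invocation at the Execution stage."""
    # The check runs outside the model, so it applies no matter how the agent
    # was persuaded during the Injection and Hijacking stages.
    if action in DENIED_ACTIONS:
        raise ActionDenied(f"policy denies '{action}' without step-up approval")
    return tool(**params)

# Usage: the denied action never reaches the Impact stage.
def transfer_funds(amount: int, to: str) -> str:
    return f"sent {amount} to {to}"

try:
    gated_call("transfer_funds", {"amount": 10_000, "to": "attacker"}, transfer_funds)
except ActionDenied as exc:
    print("blocked:", exc)
```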
Trust Assumptions
| Trust Level | Components |
|---|---|
| Trusted | AARM system, cryptographic primitives, policy store, underlying infrastructure |
| Untrusted | AI model, orchestration layer, user inputs, tool outputs, documents, agent memory |
| Partially Trusted | Tool implementations (AARM constrains invocation, not internal behavior), human approvers (subject to social engineering) |
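One way to make these trust levels operational is to tag everything entering the agent's context with its source and trust level, so that anything derived from model output, tool results, or documents stays untrusted by construction. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"            # AARM system, policy store, infrastructure
    PARTIAL = "partially_trusted"  # tool implementations, human approvers
    UNTRUSTED = "untrusted"        # model output, user input, documents, memory

@dataclass(frozen=True)
class TaggedContent:
    source: str
    trust: Trust
    text: str

def derived_trust(inputs: list[TaggedContent]) -> Trust:
    """A derived artifact is only as trusted as its least trusted input."""
    order = [Trust.UNTRUSTED, Trust.PARTIAL, Trust.TRUSTED]
    return min((item.trust for item in inputs), key=order.index)

# Usage: a summary generated from an external document remains untrusted,
# however plausible the generated text reads.
inputs = [
    TaggedContent("user_prompt", Trust.UNTRUSTED, "Summarise the attached report"),
    TaggedContent("shared_drive/report.pdf", Trust.UNTRUSTED, "..."),
]
print(derived_trust(inputs))  # Trust.UNTRUSTED
```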
Out of Scope
AARM addresses runtime action security. These threats require complementary controls:
- Model training data poisoning — requires ML security and supply chain controls
- Denial of service against AARM — requires infrastructure redundancy
- Physical/infrastructure attacks — requires physical security
- Social engineering of approvers — requires security awareness training
- Vulnerabilities within tools — requires secure development practices
- Memory storage security — requires separate storage-level controls