Overview
Intent drift occurs when an AI agent’s actions gradually diverge from what the user originally asked for. Unlike prompt injection (external manipulation) or confused deputy (credential misuse), intent drift can happen through the agent’s own reasoning process.
Intent drift is subtle. Each individual action may seem reasonable. It’s only when you examine the full sequence that the divergence becomes apparent.
How Intent Drift Happens
Reasoning Chain Divergence
The agent interprets instructions, makes inferences, and takes actions. At each step, small misinterpretations compound.
Goal Expansion
Agents may expand the scope of a task beyond what was requested.

| User Asked | Agent Interpreted | Drift |
|---|---|---|
| “Send the report to my team” | “Ensure the team can act on it” | Sends to team + schedules follow-up meetings + creates tasks |
| “Fix the bug in the login page” | “Improve the login experience” | Fixes bug + refactors authentication + updates dependencies |
| “Find sales numbers for Q3” | “Provide comprehensive sales analysis” | Queries Q3 + pulls Q1-Q4 + accesses compensation data for context |
Error Recovery Escalation
When an agent encounters errors, it may escalate privileges or expand scope to “solve” the problem.
Why Intent Drift Is Dangerous
Legitimate Credentials
Unlike prompt injection, the agent isn’t being manipulated by external content. It’s using its legitimate access based on its own (flawed) reasoning. This makes detection harder.
Plausible Deniability
Each action in the chain has a reasonable explanation. There’s no obvious “malicious” moment.
Gradual Escalation
By the time the drift becomes severe, the agent has already accessed sensitive resources or taken consequential actions.
User Trust
Users may not monitor agent actions closely, especially for routine tasks. They trust the agent to stay on task.
Detection Signals
| Signal | Indicates |
|---|---|
| Scope expansion | Actions accessing resources unrelated to original request |
| Privilege escalation | Attempting higher-privilege operations than task requires |
| Chain length | Unusually long action sequences for simple requests |
| Classification creep | Accessing increasingly sensitive data categories |
| Goal restatement | Agent’s stated goal differs from user’s original request |
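Several of these signals can be computed directly from an action log. The following is a minimal sketch, not an AARM API: the `Action` record, the `LEVELS` ordering, the `detect_signals` function, and the default chain-length limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ordering; real deployments define their own labels.
LEVELS = {"PUBLIC": 0, "INTERNAL": 1, "CONFIDENTIAL": 2, "RESTRICTED": 3}


@dataclass
class Action:
    tool: str            # e.g. "crm.query"
    resource: str        # what the action touched
    classification: str  # sensitivity label of that resource


def detect_signals(actions, allowed_resources, max_chain_length=4):
    """Return the set of drift signals present in an action log."""
    signals = set()

    # Scope expansion: any resource outside what the request implies.
    if any(a.resource not in allowed_resources for a in actions):
        signals.add("scope_expansion")

    # Chain length: more actions than expected for this kind of request.
    if len(actions) > max_chain_length:
        signals.add("chain_length")

    # Classification creep: a later action touches more sensitive data
    # than the first action did.
    if actions:
        baseline = LEVELS[actions[0].classification]
        if any(LEVELS[a.classification] > baseline for a in actions[1:]):
            signals.add("classification_creep")

    return signals


# Illustrative log; resource names and labels are made up for this sketch.
log = [
    Action("crm.query", "johnson-account", "INTERNAL"),
    Action("email.search", "johnson-email", "INTERNAL"),
    Action("file.read", "competitive-response", "CONFIDENTIAL"),
]
signals = detect_signals(log, allowed_resources={"johnson-account", "johnson-email"})
print(sorted(signals))
```

Privilege escalation and goal restatement are left out here because they need information (required privileges, the agent's stated goal) that this simple log does not carry.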
AARM Mitigations
Context Accumulator
Track the chain of reasoning from original request through each action. Detect when the semantic distance grows too large.
Intent Boundaries
Define boundaries around what actions are reasonable for a given intent category.
Drift Detection Rules
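One way a drift rule could combine the context accumulator with intent boundaries is sketched below. Everything in it is an assumption made for illustration rather than AARM's actual rule format: the `ContextAccumulator` class, the `INTENT_BOUNDARIES` table, and the 0.8 threshold are hypothetical, and `semantic_distance` uses Jaccard distance over word sets as a crude stand-in for an embedding-based measure.

```python
import re

# Hypothetical intent categories mapped to the tool prefixes they may use.
INTENT_BOUNDARIES = {
    "meeting_prep": ("crm.", "email.", "calendar."),
    "bug_fix": ("code.",),
}


def tokens(text):
    """Lowercased alphanumeric words in the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_distance(a, b):
    """Jaccard distance on word sets: a stand-in for embedding distance."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


class ContextAccumulator:
    """Records each action's stated goal and compares it to the request."""

    def __init__(self, request, intent, max_distance=0.8):
        self.request = request
        self.intent = intent
        self.max_distance = max_distance  # illustrative threshold
        self.history = []                 # (tool, stated_goal, distance)

    def check(self, tool, stated_goal):
        distance = semantic_distance(self.request, stated_goal)
        self.history.append((tool, stated_goal, distance))
        # Rule 1: the tool must fall inside the intent's boundary.
        if not tool.startswith(INTENT_BOUNDARIES[self.intent]):
            return "DENY"
        # Rule 2: the stated goal must stay close to the original request.
        if distance > self.max_distance:
            return "STEP_UP"
        return "ALLOW"


acc = ContextAccumulator("Prepare briefing for Johnson meeting", "meeting_prep")
for tool, goal in [
    ("crm.query", "Look up the Johnson account for the meeting briefing"),
    ("email.search", "Research competitor pricing strategy"),
    ("file.read", "Read the competitive response plan"),
]:
    print(tool, acc.check(tool, goal))
```

In this run the first action is allowed, the second triggers a step-up because its stated goal shares no words with the request, and the third is denied because `file.` tools are outside the hypothetical `meeting_prep` boundary.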
Periodic Re-validation
For long-running tasks, periodically verify the agent is still aligned with user intent.
Examples
Example: Meeting prep drift
Original request: “Prepare briefing for Johnson meeting”
Action chain:
1. crm.query("Johnson account") ✓
2. email.search("from:johnson") ✓
3. email.search("competitor pricing") ⚠️ Drift starting
4. documents.search("market analysis") ⚠️
5. file.read("/strategy/competitive-response.docx") ❌ Significant drift
- Semantic distance: 0.75 at step 4
- Data classification escalated: INTERNAL → CONFIDENTIAL
- Action: STEP-UP at step 5
Example: Bug fix scope creep
Original request: “Fix the null pointer on line 42”
Action chain:
1. code.read("login.py") ✓
2. code.write("login.py", fix) ✓
3. code.read("auth_utils.py") ⚠️ Expanding scope
4. code.write("auth_utils.py", refactor) ⚠️
5. package.update("cryptography") ❌ Beyond request
- Files modified: 1 → 3
- Dependencies changed: 0 → 1
- Action: DENY at step 5, notify user
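The decisions in this example can be reproduced with a small scope check. This is a sketch under stated assumptions: the `evaluate` function and the `WARN` label are hypothetical, and `login.py` is assumed to be the only file in scope because the request itself does not name a file.

```python
# Chain from the bug fix example, as (tool, target) pairs.
CHAIN = [
    ("code.read", "login.py"),
    ("code.write", "login.py"),
    ("code.read", "auth_utils.py"),
    ("code.write", "auth_utils.py"),
    ("package.update", "cryptography"),
]


def evaluate(chain, files_in_scope):
    """Return (step, decision) pairs, stopping at the first DENY."""
    decisions = []
    for step, (tool, target) in enumerate(chain, start=1):
        if tool.startswith("package."):
            decision = "DENY"   # a dependency change was never requested
        elif target not in files_in_scope:
            decision = "WARN"   # touches a file outside the assumed scope
        else:
            decision = "ALLOW"
        decisions.append((step, decision))
        if decision == "DENY":
            break               # halt the chain; the caller notifies the user
    return decisions


# Assumption: the request implicitly scopes the task to login.py.
for step, decision in evaluate(CHAIN, files_in_scope={"login.py"}):
    print(step, decision)
```

Steps 1 and 2 are allowed, steps 3 and 4 are flagged, and step 5 is denied, matching the markers in the action chain.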
Comparison to Other Threats
| Threat | Source | Detection | Mitigation |
|---|---|---|---|
| Prompt Injection | External malicious content | Injection patterns, anomaly | Input filtering, action validation |
| Confused Deputy | Credential misuse under manipulation | Unusual credential usage | Least privilege, step-up auth |
| Data Exfiltration | Compositional attack | Read-then-send patterns | Destination allowlists, session tracking |
| Intent Drift | Agent’s own reasoning | Semantic distance, scope expansion | Intent boundaries, periodic re-validation |