
Conformance Levels

Level | Requirements
AARM Core | R1–R6 (all MUST)
AARM Extended | R1–R9 (all MUST + SHOULD)
Systems may claim partial conformance by specifying which requirements are satisfied, but only systems satisfying all MUST requirements may use the designation “AARM-conformant.”

Claiming Conformance

AARM is an open specification maintained by a community that has invested significant effort in defining rigorous, vendor-neutral requirements for securing AI-driven actions at runtime. To protect the integrity of that work and ensure that “AARM-conformant” carries real meaning, we ask that organizations claiming conformance agree to the following.

1. Help shape the AARM system category definition

If your system is conformant, you agree to help shape and drive the AARM system category definition.

2. Engage with the AARM community

Be prepared to participate in the AARM Technical Working Group, conformance discussions, or related community activities. Claiming conformance is not a one-way assertion. It comes with an expectation of engagement with the community that maintains the specification. This ensures that conformance claims are grounded in real implementation experience, that feedback flows back into the specification, and that the broader ecosystem benefits.

3. Operate a production system with active customers

Conformance claims should be backed by a working, production-deployed solution that is actively used by customers. AARM conformance describes the runtime behavior of a live system, not a design document or a roadmap.

4. Hold a recognized security certification

The organization claiming conformance should hold at least one recognized security certification (e.g., SOC 2 Type II, ISO 27001, FedRAMP) relevant to the environment in which the AARM system operates. This establishes baseline organizational security maturity and provides independent assurance that the system is operated within a controlled security program.

5. Participate in future benchmarking

AARM will publish comparable benchmarks to measure policy detection and enforcement metrics across conformant systems, giving buyers objective data to evaluate which tools are most effective for their needs. By claiming conformance, you agree to actively participate in those benchmarking efforts when they become available.

Required (MUST)

R1: Pre-Execution Interception

The system MUST intercept actions before execution and be capable of blocking or deferring based on policy evaluation.
✓ Actions matching DENY policies do not execute
✓ No effects occur on target systems for denied or deferred actions
✓ No fail-open mode that bypasses policy evaluation
✓ Denial and deferral decisions recorded with the matching policy and reason
Verification: Configure a DENY policy, submit a matching action, verify the action does not execute and a denial receipt is generated. Configure a DEFER condition, verify the action is suspended without effects.
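
The interception requirement above can be illustrated with a minimal sketch. Everything here is hypothetical and not part of the AARM specification: the `intercept` function, the `Receipt` fields, and the `demo_policy` rule are invented for illustration. The point it shows is the fail-closed shape: the action runs only on an explicit ALLOW, and a policy engine error results in a recorded denial rather than a bypass.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    DEFER = "DEFER"


@dataclass
class Receipt:
    decision: Decision
    policy: str   # name of the matching policy
    reason: str   # human-readable explanation


def intercept(action: dict,
              evaluate: Callable[[dict], Receipt],
              execute: Callable[[dict], object]):
    """Evaluate policy first; run the action only on an explicit ALLOW."""
    try:
        receipt = evaluate(action)
    except Exception as exc:
        # Fail closed: an evaluation error never falls through to execution.
        receipt = Receipt(Decision.DENY, "fail-closed", f"policy error: {exc}")
    result = execute(action) if receipt.decision is Decision.ALLOW else None
    return receipt, result


def demo_policy(action: dict) -> Receipt:
    # Hypothetical rule set: one DENY policy, everything else allowed.
    if action["tool"] == "shell":
        return Receipt(Decision.DENY, "no-shell", "shell access is forbidden")
    return Receipt(Decision.ALLOW, "default-allow", "no deny policy matched")


executed = []  # stands in for the target system
receipt, _ = intercept({"tool": "shell"}, demo_policy, executed.append)
print(receipt.decision.value, receipt.policy, executed)
```

In this sketch the denied action leaves `executed` empty, which mirrors the verification step: the action has no effect on the target and the denial carries the matching policy and reason.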

R2: Context Accumulation

The system MUST accumulate session context across actions within a session.
✓ Track prior actions executed in the session
✓ Track data classifications accessed (via explicit labels, pattern detection, or policy-defined rules)
✓ Default to highest sensitivity level when no classification mechanism produces a result
✓ Maintain original user request (when available) for intent alignment
✓ Make accumulated context available to policy evaluation
The session context store SHOULD be implemented as an append-only, hash-chained log. Each context entry SHOULD include a cryptographic hash of the previous entry, forming a tamper-evident chain that detects retroactive modification.
Verification: Execute a sequence of actions, verify the policy engine receives accumulated context for each subsequent action. If hash-chaining is implemented, verify that tampering with a prior context entry is detectable.
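
One possible shape for an append-only, hash-chained context store is sketched below. The class name `ContextLog`, the all-zero genesis value, the use of SHA-256, and the JSON encoding are assumptions made for this example; the specification does not prescribe them.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class ContextLog:
    """Append-only session context; each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, payload: dict) -> None:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = dict(payload)  # copy so later caller mutations do not leak in
        self._entries.append(
            {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any retroactive edit breaks a link."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != entry_hash(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True


log = ContextLog()
log.append({"tool": "db.query", "classification": "internal"})
log.append({"tool": "email.send", "classification": "restricted"})
print(log.verify())  # chain is intact

# Simulate tampering with an earlier entry (reaching into internals on purpose).
log._entries[0]["payload"]["classification"] = "public"
print(log.verify())  # the edit is detected
```

A chain like this only makes tampering evident. It does not prevent an attacker who can rewrite every later entry from rebuilding the chain, so a real deployment would likely also anchor or sign the latest hash.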

R3: Policy Evaluation with Intent Alignment

The system MUST evaluate actions against both static policy and contextual intent alignment.
✓ Support action classification: forbidden, context-dependent deny, context-dependent allow, context-dependent defer
✓ Evaluate forbidden actions against static policy with immediate denial
✓ Evaluate context-dependent actions against accumulated session context
✓ Defer actions when the policy engine cannot reach a confident decision
✓ Support parameter validation: type, range, pattern, allowlist/blocklist
Deferral MUST be triggered when:
  • A policy rule’s match predicate references context fields not yet populated in the session
  • Multiple applicable policies produce conflicting decisions at the same priority level
  • A confidence score (if implemented) falls below a deployment-configured threshold
The conditions triggering deferral MUST be documented and auditable.
Verification: Configure policies for each classification type, verify correct evaluation behavior for each, including deferral for ambiguous context.
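
The three deferral triggers listed above could be checked roughly as follows. This is a sketch under stated assumptions: the `Rule` fields, the convention that a larger `priority` number wins, and the default threshold of 0.8 are all hypothetical. The function expects the rules that already apply to the action and returns the decision together with an auditable reason.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                 # assumption: larger number wins
    decision: str                 # e.g. "ALLOW", "DENY", "MODIFY", "STEP_UP"
    required_fields: tuple = ()   # context fields the match predicate reads


def resolve(rules: list, context: dict,
            confidence: Optional[float] = None,
            threshold: float = 0.8) -> tuple:
    """Return (decision, reason) for the rules applicable to one action."""
    if not rules:
        raise ValueError("expected at least one applicable rule")
    # Trigger 1: a predicate references context that is not populated yet.
    for rule in rules:
        missing = [f for f in rule.required_fields if f not in context]
        if missing:
            return "DEFER", f"{rule.name} needs unpopulated context: {missing}"
    # Trigger 2: conflicting decisions at the same (highest) priority level.
    top = max(rule.priority for rule in rules)
    decisions = {rule.decision for rule in rules if rule.priority == top}
    if len(decisions) > 1:
        return "DEFER", f"conflict at priority {top}: {sorted(decisions)}"
    # Trigger 3: optional confidence score below the configured threshold.
    if confidence is not None and confidence < threshold:
        return "DEFER", f"confidence {confidence} is below {threshold}"
    return decisions.pop(), "unambiguous decision"


print(resolve([Rule("allow-read", 1, "ALLOW"), Rule("deny-pii", 1, "DENY")], {}))
print(resolve([Rule("deny-pii", 2, "DENY"), Rule("allow-read", 1, "ALLOW")], {}))
```

Returning the reason string alongside the decision is one way to keep the triggering condition auditable, since the same text can be written into the receipt.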

R4: Authorization Decisions

The system MUST support five authorization decisions: ALLOW, DENY, MODIFY, STEP_UP, and DEFER.
Decision | Behavior
ALLOW | Action proceeds unchanged
DENY | Action blocked, no effects occur
MODIFY | Action proceeds with transformed parameters
STEP_UP | Action paused pending human approval
DEFER | Action temporarily suspended due to insufficient, ambiguous, or conflicting context
STEP_UP requirements:
✓ Action execution blocks until approval decision is received
✓ Approval requests routed to configured approvers
✓ Configurable timeouts enforced (DENY on timeout recommended)
✓ Full action context available to approvers
DEFER requirements:
✓ Execution paused until sufficient context is collected, additional validation performed, or safe constraints applied
✓ Deferred actions tracked with execution order maintained relative to other operations
✓ Deferred actions preserve security: no premature execution of high-risk operations
✓ Configurable timeouts enforced; DENY on timeout is the default; fail-open on timeout is not permitted
✓ Dependent actions (those relying on a deferred action's output) must also be deferred
✓ Independent actions should proceed without blocking
✓ Cascading deferrals bounded by a configurable limit; exceeding the limit results in DENY
✓ Deferred actions recorded in receipts with deferral reason; resolution or timeout generates a follow-up receipt
Verification: Configure policies producing each decision type, verify correct enforcement behavior including deferral suspension and resolution.
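
The DEFER rules for dependent and independent actions can be sketched as a walk over a dependency graph. The specification does not say how the cascade limit is measured, so this example assumes it bounds the depth of the dependency chain; the function name `cascade`, the `depends_on` mapping, and the `PROCEED` label are hypothetical.

```python
from collections import deque


def cascade(deferred: str, depends_on: dict, max_depth: int) -> dict:
    """Map each action to PROCEED, DEFER, or DENY once `deferred` is suspended.

    depends_on maps an action name to the set of actions whose output it needs.
    """
    outcome = {action: "PROCEED" for action in depends_on}
    outcome[deferred] = "DEFER"
    queue = deque([(deferred, 0)])
    while queue:
        current, depth = queue.popleft()
        for action, deps in depends_on.items():
            if current in deps and outcome[action] == "PROCEED":
                # Dependents inherit the deferral until the limit is exceeded.
                outcome[action] = "DEFER" if depth + 1 <= max_depth else "DENY"
                queue.append((action, depth + 1))
    return outcome


graph = {
    "read_file": set(),
    "summarize": {"read_file"},
    "email_summary": {"summarize"},
    "list_dir": set(),          # independent of the deferred action
}
print(cascade("read_file", graph, max_depth=1))
```

With `max_depth=1`, `summarize` is deferred along with `read_file`, `email_summary` exceeds the limit and is denied, and `list_dir` proceeds without blocking.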

R5: Tamper-Evident Receipts

The system MUST generate cryptographically signed receipts for all actions. Receipts MUST contain:
✓ Action: tool, operation, parameters, timestamp
✓ Context: session identifier, accumulated context at decision time
✓ Identity: human principal, service identity, agent identity, role/privilege scope
✓ Decision: result (ALLOW/DENY/MODIFY/STEP_UP/DEFER), policy matched, reason
✓ Approval: if applicable, approver identity, decision, timestamp
✓ Deferral: if applicable, deferral reason, resolution method, resolution timestamp
✓ Outcome: execution result, error details if failed
✓ Signature: cryptographic signature verifiable offline
Signature requirements:
✓ Secure algorithm (Ed25519, ECDSA P-256, or RSA-2048 minimum)
✓ Sign canonical serialization of receipt contents
✓ Public keys available for offline verification
Verification: Generate receipts for allowed, denied, deferred, and step-up actions. Verify all fields present and signature validates.
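
Signing a canonical serialization matters because two logically identical receipts must produce identical bytes, or offline verification will fail. The sketch below shows one assumed canonical form (JSON with sorted keys and compact separators); the specification does not mandate this encoding, and the helper names are hypothetical. The signing step itself (for example Ed25519) is deliberately left out and would come from a cryptographic library.

```python
import hashlib
import json


def signing_input(receipt: dict) -> bytes:
    """Canonical bytes to hand to the signer; the signature field is excluded."""
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    return json.dumps(
        unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(receipt: dict) -> str:
    """SHA-256 of the canonical bytes, handy for comparing receipts in tests."""
    return hashlib.sha256(signing_input(receipt)).hexdigest()


a = {"decision": "DENY", "tool": "shell", "signature": "placeholder"}
b = {"tool": "shell", "decision": "DENY"}  # same content, different key order

print(signing_input(b))
print(fingerprint(a) == fingerprint(b))
```

Key order and the presence of the signature field do not change the bytes, while any change to a receipt field does, which is the property a verifier relies on.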

R6: Identity Binding

The system MUST bind actions to identities at multiple levels:
✓ Human principal: the user on whose behalf the agent acts
✓ Service identity: the service account executing the action
✓ Agent identity: the specific agent instance
✓ Session context: identifier linking related actions
✓ Role and privilege scope: permissions associated with each identity at the time of action
Requirements:
✓ Identity captured at action submission time and preserved for deferred or delegated actions
✓ Identity claims validated against trusted sources, including freshness and revocation status
✓ Actions without verifiable identity denied or flagged
✓ Identity information recorded in tamper-evident receipts for audit and forensic purposes
Verification: Submit actions from different principals and sessions, verify receipts correctly attribute each including role/privilege scope. Verify that identity is preserved across deferral and resolution.
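
Validation of identity claims for freshness and revocation might look like the following sketch. The `IdentityClaim` fields, the `verify_claim` helper, and the idea of passing a set of revoked token identifiers are assumptions for illustration; a real system would consult its identity provider or another trusted source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityClaim:
    subject: str       # e.g. human principal, service, or agent identifier
    token_id: str      # identifier used for revocation lookups
    issued_at: float   # seconds since the epoch


def verify_claim(claim: Optional[IdentityClaim], now: float,
                 max_age_seconds: float, revoked_token_ids: set) -> tuple:
    """Return (ok, reason); callers deny or flag the action when ok is False."""
    if claim is None:
        return False, "no identity presented"
    if claim.token_id in revoked_token_ids:
        return False, "credential revoked"
    if now - claim.issued_at > max_age_seconds:
        return False, "claim is stale"
    return True, "verified"


claim = IdentityClaim(subject="alice", token_id="t-1", issued_at=1000.0)
print(verify_claim(claim, now=1100.0, max_age_seconds=300.0, revoked_token_ids=set()))
print(verify_claim(claim, now=1100.0, max_age_seconds=300.0, revoked_token_ids={"t-1"}))
```

Because the claim object is immutable, the identity captured at submission time can be stored with a deferred action and re-validated at resolution without being altered in between.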

R7: Semantic Distance Tracking

The system SHOULD compute semantic distance between actions and stated intent to detect intent drift. Given the original request r₀ and the current action aₙ, semantic distance can be computed via embedding similarity:
d(r₀, aₙ) = 1 − cosine(embed(r₀), embed(aₙ))
Implementation considerations:
○ Embedding model should produce meaningful similarity scores between natural language requests and structured action descriptors
○ Validate embedding model suitability through calibration against known-benign and known-malicious action sequences
○ Track cumulative drift across action sequences, not only per-action
○ Aggregation method (e.g., running maximum, exponential moving average) is implementation-defined but should be documented
○ Drift thresholds are deployment-specific and should be calibrated empirically
○ Trigger alerts, deferral, or escalation when drift exceeds configured thresholds
Verification: Configure drift thresholds, execute diverging action sequences, verify escalation or deferral triggers. Verify that cumulative drift tracking detects gradual divergence across multi-step sequences, not only single-action anomalies.
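
The distance formula and the two aggregation methods mentioned above (running maximum and exponential moving average) can be sketched in plain Python. The vectors stand in for embeddings; the embedding model, the smoothing factor `alpha`, and the function names are assumptions, not part of the specification.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def distance(request_vec, action_vec) -> float:
    """d(r0, an) = 1 - cosine(embed(r0), embed(an)), on precomputed embeddings."""
    return 1.0 - cosine(request_vec, action_vec)


def cumulative_drift(request_vec, action_vecs, alpha: float = 0.5):
    """Return (running maximum, exponential moving average) of per-action distance."""
    running_max, ema = 0.0, None
    for vec in action_vecs:
        d = distance(request_vec, vec)
        running_max = max(running_max, d)
        ema = d if ema is None else alpha * d + (1 - alpha) * ema
    return running_max, ema


request = [1.0, 0.0]                        # toy embedding of the original request
actions = [[1.0, 0.0], [0.0, 1.0]]          # first action aligned, second orthogonal
print(cumulative_drift(request, actions))
```

The running maximum reacts to a single outlier, while the moving average rises more slowly and is better suited to catching gradual divergence; which one to compare against the threshold is a deployment choice.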

R8: Telemetry Export

The system SHOULD export structured telemetry to security platforms.
○ Real-time streaming within seconds of occurrence
○ Standard schemas (OCSF, CEF, or documented custom)
○ Configurable filtering by action type, decision (including DEFER), identity
○ Batch export for historical analysis
Verification: Configure export to a SIEM, verify events appear with the correct schema, including deferral events.

R9: Least Privilege Enforcement

The system SHOULD support credential scoping for minimal permissions per action.
○ Just-in-time credential issuance with minimal validity period
○ Operation-specific scoping (e.g., read-only for query operations)
○ Credential usage logged for audit
Verification: Submit a read operation, verify the issued credential cannot perform writes.
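
A just-in-time, operation-scoped credential can be modeled as below. This is only a sketch: `ScopedCredential`, `issue_for`, and the 30-second default lifetime are hypothetical, and a real system would issue credentials through its secrets manager or cloud provider rather than an in-memory object.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedCredential:
    operations: frozenset   # operations this credential may perform
    expires_at: float       # seconds since the epoch

    def permits(self, operation: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return operation in self.operations and now < self.expires_at


def issue_for(operation: str, ttl_seconds: float = 30.0,
              now: Optional[float] = None) -> ScopedCredential:
    """Issue a credential valid for exactly one operation and a short lifetime."""
    now = time.time() if now is None else now
    return ScopedCredential(frozenset({operation}), now + ttl_seconds)


cred = issue_for("read", ttl_seconds=30.0, now=1000.0)
print(cred.permits("read", now=1010.0))    # within scope and lifetime
print(cred.permits("write", now=1010.0))   # outside scope
print(cred.permits("read", now=1031.0))    # expired
```

This mirrors the verification step: a credential issued for a read cannot be used to write, and it stops working shortly after the action it was issued for.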

Summary Table

ID | Level | Requirement
R1 | MUST | Pre-execution interception: block or defer actions before execution
R2 | MUST | Context accumulation: track prior actions, data classifications, original request
R3 | MUST | Policy evaluation with intent alignment: forbidden, context-dependent deny/allow/defer
R4 | MUST | Five authorization decisions: ALLOW, DENY, MODIFY, STEP_UP, DEFER
R5 | MUST | Tamper-evident receipts: cryptographically signed with full context
R6 | MUST | Identity binding: human, service, agent, session, and role/privilege scope
R7 | SHOULD | Semantic distance tracking: detect intent drift via embedding similarity
R8 | SHOULD | Telemetry export: structured events to SIEM/SOAR platforms
R9 | SHOULD | Least privilege enforcement: scoped, just-in-time credentials