AIUC-1
The Security, Safety, and Reliability standard for AI agents
B002. Detect adversarial input

Implement monitoring capabilities to detect adversarial inputs and prompt injection attempts and to enable response to them

Keywords: Monitor, Adversarial, Jailbreak, Prompt Injection

Application: Optional

Frequency: Every 3 months

Type: Detective

Crosswalks

MITRE ATLAS
AML-M0003: Model Hardening
AML-M0015: Adversarial Input Detection
AML-M0024: AI Telemetry Logging
AML-M0021: Generative AI Guidelines
EU AI Act
Article 15: Accuracy, Robustness and Cybersecurity
Article 72: Post-Market Monitoring by Providers and Post-Market Monitoring Plan for High-Risk AI Systems
NIST AI RMF
GOVERN 1.5: Risk monitoring and review
MEASURE 2.4: Production monitoring
MEASURE 2.7: Security and resilience
MEASURE 3.1: Emergent risk tracking
OWASP Top 10
LLM01:2025 - Prompt Injection
LLM08:2025 - Vector and Embedding Weaknesses
LLM10:2025 - Unbounded Consumption
CSA AICM
AIS-08: Input Validation
MDS-07: Robustness against Adversarial Attack / Model Hardening
TVM-01: Threat and Vulnerability Management Policy and Procedures
TVM-04: Detection Updates
UEM-09: Anti-Malware Detection and Prevention
TVM-02: Malware and Malicious Instructions Protection Policy and Procedure
AIS-10: API Security
LOG-14: Input Monitoring
OWASP AIVSS
Agent Goal and Instruction Manipulation
IBM AI Risk Atlas
IBM 41: Inference - Evasion attack
IBM 43: Inference - Jailbreaking
IBM 46: Inference - Prompt injection attack
IBM 49: Inference - Context overload attack
IBM 52: Inference - Indirect instructions attack
IBM 53: Inference - Social hacking attack
Cisco AI Security Framework
AITech-1.1: Direct Prompt Injection
AITech-1.2: Indirect Prompt Injection
AITech-1.4: Multi-Modal Injection and Manipulation
AITech-2.1: Jailbreak
AITech-3.1: Masquerading / Obfuscation / Impersonation
AITech-4.3: Protocol Manipulation
AITech-5.1: Memory System Persistence
AITech-5.2: Configuration Persistence
AITech-7.2: Memory System Corruption
AITech-7.4: Token Manipulation
AITech-9.1: Model or Agentic System Manipulation
AITech-9.2: Detection Evasion
AITech-11.2: Model-Selective Evasion
AITech-17.1: Sensor Spoofing

Control activities and typical evidence

B002.1 Config: Adversarial input detection and alerting

Control activity: Establishing detection and alerting. For example, implementing monitoring for prompt injection patterns, jailbreak techniques, adversarial input attempts, and rate-limit violations, and configuring alerts and threat notifications for suspicious activities.

Typical evidence: Monitoring system, SIEM, or detection code showing rules and alerts for adversarial inputs. May include prompt injection detection patterns, jailbreak technique signatures, rate limit monitoring with threshold alerts, or notification configurations (Slack, PagerDuty, email).

Category: Technical Implementation, Engineering Code, Universal
B002.2 Logs: Adversarial incident and response

Control activity: Implementing incident logging and response procedures. For example, logging suspected adversarial attacks with relevant context, escalating to designated personnel based on severity, and documenting response actions in a centralized system.

Typical evidence: Incident management system or logs showing adversarial attack handling. May include log entries with timestamps and user/session context, escalation runbooks defining severity thresholds, or incident tickets in Jira/PagerDuty/ServiceNow documenting response actions and workflows.

Category: Technical Implementation, Logs, Engineering Tooling, Universal
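A sketch of what the logging side could look like, assuming a JSON-lines incident log and a severity-to-responder mapping; the severity names and queue identifiers are invented for illustration and would live in an escalation runbook in practice:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("adversarial_incidents")

# Hypothetical escalation mapping; real thresholds belong in a runbook.
ESCALATION = {
    "low": "triage-queue",
    "medium": "on-call-engineer",
    "high": "security-team",
}

def log_incident(user_id: str, session_id: str, rule: str, severity: str) -> dict:
    """Record a suspected adversarial attack with enough context to investigate,
    routed to the designated responder for its severity."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "detection_rule": rule,
        "severity": severity,
        "escalated_to": ESCALATION.get(severity, "triage-queue"),
    }
    logger.warning(json.dumps(record))  # one JSON object per line
    return record
```

Structured records like these are what make the timestamped, user/session-scoped log entries described above straightforward to query during an incident review.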
B002.3 Documentation: Updates to detection config

Control activity: Maintaining detection effectiveness through quarterly reviews. For example, updating detection rules based on emerging adversarial techniques, analyzing incident patterns, and documenting system improvements.

Typical evidence: Quarterly review documentation showing detection updates. For example, review meeting notes with incident pattern analysis, updated detection rules with version history, or tracking records showing rule improvements (e.g. GitHub/Jira tickets).

Category: Technical Implementation, Engineering Practice, Internal Processes, Universal
B002.4 Config: Pre-processing adversarial detection

Control activity: Implementing adversarial input detection prior to AI model processing where feasible. For example, using pre-processing filters to flag likely threats before model processing.

Typical evidence: Pre-processing filtering logic or gateway. May include pattern-matching or heuristic code checking inputs before model processing, WAF or API gateway rules blocking adversarial patterns, or IP-based filtering.

Category: Technical Implementation, Engineering Code, Universal
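One way to sketch such a pre-processing filter, assuming an application that wraps the model call: cheap heuristics run first, and flagged input never reaches the model. The heuristics here are illustrative, not a vetted ruleset:

```python
import re
from typing import Callable

# Illustrative heuristics checked before the input reaches the model.
HEURISTICS: list[tuple[str, re.Pattern]] = [
    ("injection", re.compile(r"(?i)ignore (all )?(previous|prior) instructions")),
    ("encoding_evasion", re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")),  # long base64-like runs
]

def guarded_call(user_input: str, model: Callable[[str], str]) -> str:
    """Screen input with pre-processing heuristics; only clean input is processed."""
    for name, pattern in HEURISTICS:
        if pattern.search(user_input):
            # Flagged before model processing; in practice also log and alert here.
            return f"Input rejected by pre-processing filter ({name})."
    return model(user_input)
```

The same shape applies at a WAF or API gateway, with the checks expressed as gateway rules instead of application code.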
B002.5 Config: AI security alerts

Control activity: Integrating adversarial input detection into existing security operations tooling. For example, forwarding flagged inputs to SIEM platforms, correlating detections with authentication and network logs, and enabling SOC teams to triage AI-related security events.

Typical evidence: SIEM platform, SOC tooling, or log forwarding configuration showing adversarial detection integration. May include Splunk/Datadog/Elastic SIEM ingesting AI adversarial alerts, correlation rules linking AI events with authentication or network logs, SOC dashboards displaying AI security event triage, or code forwarding flagged inputs to security platforms.

Category: Technical Implementation, Engineering Tooling, Universal
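A minimal sketch of the forwarding piece, assuming flagged detections are serialized as JSON events and handed to a log-forwarding sink (HTTP client, syslog, agent). The field names loosely follow Splunk HEC conventions but are an assumption, not a fixed schema:

```python
import json
from datetime import datetime, timezone
from typing import Callable

def to_siem_event(detection: dict, source: str = "ai-gateway") -> str:
    """Serialize a flagged input as a JSON event for SIEM ingestion."""
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "sourcetype": "ai:adversarial_input",  # lets SOC correlation rules key on AI events
        "event": detection,
    })

def forward(detection: dict, sink: Callable[[str], None]) -> None:
    """Hand the event to whatever transport ships logs to the SIEM."""
    sink(to_siem_event(detection))
```

Tagging AI events with a dedicated sourcetype is what lets SOC teams write correlation rules joining them against authentication and network logs.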

Organizations can submit alternative evidence demonstrating how they meet the requirement.

AIUC-1 is built with industry leaders

"We need a SOC 2 for AI agents — a familiar, actionable standard for security and trust."
Phil Venables, former CISO of Google Cloud

"Integrating MITRE ATLAS ensures AI security risk management tools are informed by the latest AI threat patterns and leverage state-of-the-art defensive strategies."
Dr. Christina Liaghati, MITRE ATLAS lead

"Today, enterprises can't reliably assess the security of their AI vendors — we need a standard to address this gap."
Hyrum Anderson, Senior Director, Security & AI, Cisco

"Built on the latest advances in AI research, AIUC-1 empowers organizations to identify, assess, and mitigate AI risks with confidence."
Prof. Sanmi Koyejo, lead for Stanford Trustworthy AI Research

"AIUC-1 standardizes how AI is adopted. That's powerful."
John Bautista, Partner at Orrick

"An AIUC-1 certificate enables me to sign contracts much faster — it's a clear signal I can trust."
Lena Smart, Head of Trust for SecurityPal and former CISO of MongoDB