AIUC-1
B005

Implement real-time input filtering

Implement real-time input filtering using automated moderation tools

Keywords: Prompt Injection, Jailbreak, Adversarial Input Protection
Application: Optional
Frequency: Every 12 months
Type: Detective
Crosswalks
LLM01:25 - Prompt Injection
LLM04:25 - Data and Model Poisoning
LLM10:25 - Unbounded Consumption
AML-M0015: Adversarial Input Detection
AML-M0021: Generative AI Guidelines
MEASURE 2.7: Security and resilience
LOG-14: Input Monitoring
AIS-08: Input Validation
AIS-15: Prompt Differentiation
Integrating automated moderation tools to filter inputs before they reach the foundation model. For example, connecting third-party moderation APIs, implementing custom filtering rules, configuring blocking or warning actions for flagged content, and establishing confidence thresholds based on risk category and severity.
B005.1 Config: Input filtering

Screenshot of moderation tool integration showing API configuration, filtering rules, action settings (block/warn/modify), and confidence thresholds for different violation categories - this could be screenshots of configuration files, admin dashboard settings, or API integration code. Example moderation tools: OpenAI Moderation API, Claude content filtering, VirtueAI/Hive/Spectrum Labs

Eng: User LLM input filtering logic
Engineering Tooling
Text-generation, Voice-generation, Image-generation
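
A minimal sketch of this control, assuming the OpenAI Python SDK (v1+) and its Moderation API. The threshold values, selected categories, and the filter_input name are illustrative assumptions, not settings prescribed by AIUC-1.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical per-category block thresholds: lower (stricter) for
# higher-severity categories.
BLOCK_THRESHOLDS = {"self_harm": 0.30, "violence": 0.40, "harassment": 0.60}
WARN_THRESHOLD = 0.20  # above this but below a block threshold: warn

def filter_input(text: str) -> tuple[str, dict[str, float]]:
    """Return ("block" | "warn" | "allow", per-category scores) for an input."""
    result = client.moderations.create(
        model="omni-moderation-latest", input=text
    ).results[0]
    scores = {
        "self_harm": result.category_scores.self_harm,
        "violence": result.category_scores.violence,
        "harassment": result.category_scores.harassment,
    }
    for category, threshold in BLOCK_THRESHOLDS.items():
        if scores[category] >= threshold:
            return "block", scores
    if any(score >= WARN_THRESHOLD for score in scores.values()):
        return "warn", scores
    return "allow", scores

Blocking happens before the foundation model call, so flagged content never enters the context window; the "warn" path passes lower-severity inputs through while still flagging them for the logging evidence expected in B005.4.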
Documenting the moderation logic and rationale. For example, explaining the choice of moderation tools, threshold justifications, and decision criteria for different risk categories.
B005.2 Documentation: Input moderation approach

Document explaining moderation approach including tool selection rationale, threshold settings with justifications, action logic for different violation types, and examples of how different input categories are handled.

Internal processes
Engineering Practice
Text-generation, Voice-generation, Image-generation
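
One lightweight way to keep this documentation auditable is to co-locate each threshold and action with its written justification in a reviewed policy file. The structure and wording below are a hypothetical sketch, not a format the standard prescribes.

# Hypothetical policy record pairing each threshold and action with the
# rationale an auditor would expect to see documented.
MODERATION_POLICY = {
    "self_harm": {
        "threshold": 0.30,
        "action": "block",
        "rationale": "High-severity category; tolerate more false positives.",
    },
    "harassment": {
        "threshold": 0.60,
        "action": "warn",
        "rationale": "Lower severity; warn rather than block to limit over-filtering.",
    },
}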

Providing feedback to users when inputs are blocked.

B005.3 Demonstration: Warning for blocked inputs

Screenshot of user-facing messages or UI flows showing how blocked inputs are communicated to users - this could be error messages, warning dialogs, or alternative suggestions provided when content is filtered.

Product
Text-generation, Voice-generation, Image-generation
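
A minimal sketch of user-facing feedback, reusing the decision values from the hypothetical filter_input sketch above; the message wording is illustrative.

# Map filtering decisions to user-facing notices instead of failing silently.
BLOCKED_MESSAGE = (
    "Your message couldn't be sent because it appears to conflict with our "
    "content policy. Please rephrase and try again."
)
WARNED_MESSAGE = (
    "Parts of your message may conflict with our content policy, so the "
    "response may be limited."
)

def notice_for(decision: str) -> str | None:
    """Return the notice to show the user, or None if the input was allowed."""
    if decision == "block":
        return BLOCKED_MESSAGE
    if decision == "warn":
        return WARNED_MESSAGE
    return None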
Logging flagged prompts for analysis and refinement of filters, while ensuring compliance with privacy obligations.
B005.4 Logs: Input filtering

Screenshot of logging system showing how flagged inputs are captured, what metadata is included/excluded for privacy, retention policies, and audit trail - may include privacy documentation explaining logging disclosures to users.

Logs
Text-generation, Voice-generation, Image-generation
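
A sketch of privacy-aware audit logging for flagged inputs. What to hash, redact, or retain is an assumption here; align it with your own privacy obligations and the disclosures made to users.

import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("input_filter.audit")

def log_flagged_input(user_id: str, text: str, decision: str,
                      scores: dict[str, float]) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Pseudonymize the user; the raw identifier never reaches the log.
        "user_hash": hashlib.sha256(user_id.encode()).hexdigest()[:16],
        # Hash the prompt so analysts can spot repeat inputs without
        # retaining user text.
        "input_hash": hashlib.sha256(text.encode()).hexdigest(),
        "decision": decision,
        "category_scores": {k: round(v, 3) for k, v in scores.items()},
    }
    logger.info(json.dumps(record))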
Periodically evaluating filter performance and adjusting thresholds accordingly. For example, tracking accuracy, latency, and false positive/negative rates.
B005.5 Documentation: Input filter performance

Report or dashboard showing analysis of filter performance metrics (false positives, false negatives, accuracy, latency) and documented threshold adjustments made based on performance data - should include timestamps and rationale for changes.

Engineering Practice
Text-generation, Voice-generation, Image-generation
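
A sketch of a periodic offline evaluation against a labeled review set, reusing the hypothetical filter_input sketch from B005.1; the dataset shape and metric selection are assumptions.

import time

def evaluate_filter(labeled: list[tuple[str, bool]]) -> dict[str, float]:
    """labeled: (prompt, should_block) pairs, e.g. from a human review queue."""
    tp = fp = fn = tn = 0
    latencies = []
    for text, should_block in labeled:
        start = time.perf_counter()
        decision, _ = filter_input(text)  # sketch from B005.1 above
        latencies.append(time.perf_counter() - start)
        blocked = decision == "block"
        if blocked and should_block:
            tp += 1
        elif blocked and not should_block:
            fp += 1  # false positive: benign input blocked
        elif not blocked and should_block:
            fn += 1  # false negative: harmful input passed through
        else:
            tn += 1
    total = len(labeled)
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / max(fp + tn, 1),
        "false_negative_rate": fn / max(fn + tp, 1),
        "avg_latency_ms": 1000 * sum(latencies) / total,
    }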

Organizations can submit alternative evidence demonstrating how they meet the requirement.

AIUC-1 is built with industry leaders

"We need a SOC 2 for AI agents - a familiar, actionable standard for security and trust."
Phil Venables, former CISO of Google Cloud

"Integrating MITRE ATLAS ensures AI security risk management tools are informed by the latest AI threat patterns and leverage state-of-the-art defensive strategies."
Dr. Christina Liaghati, MITRE ATLAS lead

"Today, enterprises can't reliably assess the security of their AI vendors - we need a standard to address this gap."
Hyrum Anderson, Senior Director, Security & AI, Cisco

"Built on the latest advances in AI research, AIUC-1 empowers organizations to identify, assess, and mitigate AI risks with confidence."
Prof. Sanmi Koyejo, Lead for Stanford Trustworthy AI Research

"AIUC-1 standardizes how AI is adopted. That's powerful."
John Bautista, Partner at Orrick

"An AIUC-1 certificate enables me to sign contracts much faster - it's a clear signal I can trust."
Lena Smart, Head of Trust for SecurityPal and former CISO of MongoDB