AIUC-1 Standard → B. Security

B005. Implement real-time input filtering

Implement real-time input filtering using automated moderation tools

Keywords

Prompt Injection · Jailbreak · Adversarial Input Protection

Application: Optional
Frequency: Every 12 months
Type: Detective

Crosswalks

OWASP Top 10
LLM01:25 - Prompt Injection
LLM04:25 - Data and Model Poisoning
LLM10:25 - Unbounded Consumption
MITRE ATLAS
AML-M0015: Adversarial Input Detection
AML-M0021: Generative AI Guidelines
NIST AI RMF
MEASURE 2.7: Security and resilience
CSA AICM
LOG-14: Input Monitoring
AIS-08: Input Validation
AIS-15: Prompt Differentiation
OWASP AIVSS
Agent Goal and Instruction Manipulation
IBM AI Risk Atlas
IBM 50: Inference - Direct instructions attack
IBM 53: Inference - Social hacking attack
Cisco AI Security Framework
AITech-1.2: Indirect Prompt Injection
AITech-7.4: Token Manipulation
AITech-11.2: Model-Selective Evasion

Control activities and typical evidence

Control activity
Integrating automated moderation tools to filter inputs before they reach the foundation model. For example, integrating third-party moderation APIs, implementing custom filtering rules, configuring blocking or warning actions for flagged content, and establishing confidence thresholds based on risk category and severity.

Typical evidence: B005.1 (Config: Input filtering)
Moderation tool integration showing API configuration, filtering rules, action settings (block/warn/modify), and confidence thresholds for different violation categories. This could be screenshots of configuration files, admin dashboard settings, or API integration code. Example moderation tools: OpenAI Moderation API, Claude content filtering, VirtueAI/Hive/Spectrum Labs.

Category: Technical Implementation · Eng: User LLM input filtering logic · Engineering Tooling
Applies to: Text-generation, Voice-generation, Image-generation
Control activity
Documenting the moderation logic and rationale. For example, explaining chosen moderation tools, threshold justifications, and decision criteria for different risk categories.

Typical evidence: B005.2 (Documentation: Input moderation approach)
Document explaining the moderation approach, including tool selection rationale, threshold settings with justifications, action logic for different violation types, and examples of how different input categories are handled.

Category: Technical Implementation · Internal processes · Engineering Practice
Applies to: Text-generation, Voice-generation, Image-generation

Control activity
Providing feedback to users when inputs are blocked.

Typical evidence: B005.3 (Demonstration: Warning for blocked inputs)
User-facing messages or UI flows showing how blocked inputs are communicated to users. This could be error messages, warning dialogs, or alternative suggestions provided when content is filtered.

Category: Technical Implementation · Product
Applies to: Text-generation, Voice-generation, Image-generation
Control activity
Logging flagged prompts for analysis and refinement of filters, while ensuring compliance with privacy obligations.

Typical evidence: B005.4 (Logs: Input filtering)
Logging system showing how flagged inputs are captured, what metadata is included or excluded for privacy, retention policies, and the audit trail. May include privacy documentation explaining logging disclosures to users.

Category: Technical Implementation · Logs
Applies to: Text-generation, Voice-generation, Image-generation
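One way to log flagged prompts without retaining the raw text is sketched below. The field choices (salted hash for deduplication, character count, category, score) are illustrative assumptions, not a prescribed schema; the control requires documenting what is included or excluded and why.

```python
import hashlib
import json
import time

def log_flagged_input(text: str, category: str, score: float) -> dict:
    """Build a privacy-conscious log record for a flagged input.

    The raw prompt is not stored; only a salted hash (for deduplication)
    and a length are kept. Field choices here are illustrative.
    """
    record = {
        "ts": time.time(),
        "category": category,
        "score": round(score, 3),
        "prompt_sha256": hashlib.sha256(b"example-salt" + text.encode()).hexdigest(),
        "prompt_chars": len(text),
    }
    # In production this would go to an append-only audit log with a
    # documented retention policy; printing stands in for that sink.
    print(json.dumps(record))
    return record
```

Keeping only a hash means the filter team can spot repeated attack strings without ever handling the user's content directly, which simplifies the privacy analysis the control asks for.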
Control activity
Periodically evaluating filter performance (for example, accuracy, latency, false positives/negatives) and adjusting thresholds accordingly.

Typical evidence: B005.5 (Documentation: Input filter performance)
Report or dashboard showing analysis of filter performance metrics (false positives, false negatives, accuracy, latency) and documented threshold adjustments made based on performance data. Should include timestamps and rationale for changes.

Category: Technical Implementation · Engineering Practice
Applies to: Text-generation, Voice-generation, Image-generation
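The performance metrics named above can be computed from a periodically reviewed sample of filter decisions. A minimal sketch, assuming labels come from human review of logged inputs as (flagged, should_flag) pairs:

```python
def filter_metrics(outcomes: list[tuple[bool, bool]]) -> dict[str, float]:
    """Compute basic filter quality metrics from (flagged, should_flag) pairs.

    Labels would come from periodic human review of logged inputs; the
    metric set mirrors the evidence item above (accuracy, FP/FN rates).
    """
    tp = sum(1 for flagged, should in outcomes if flagged and should)
    fp = sum(1 for flagged, should in outcomes if flagged and not should)
    fn = sum(1 for flagged, should in outcomes if not flagged and should)
    tn = sum(1 for flagged, should in outcomes if not flagged and not should)
    n = len(outcomes)
    return {
        "accuracy": (tp + tn) / n,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Example review batch: 2 correct flags, 1 false positive, 1 miss, 6 correct passes.
sample = [(True, True)] * 2 + [(True, False)] + [(False, True)] + [(False, False)] * 6
print(filter_metrics(sample))  # accuracy 0.8
```

A rising false-negative rate in such a report would be the documented rationale for lowering a block threshold; a rising false-positive rate, for raising it.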

Organizations can submit alternative evidence demonstrating how they meet the requirement.
