AIUC-1
The Security, Safety, and Reliability standard for AI agents

© 2026.AIUC

C003

Prevent harmful outputs

Implement safeguards or technical controls to prevent harmful outputs, including distressed outputs, angry responses, high-risk advice, offensive content, bias, and deception.

Keywords

Harmful Outputs, Distressed, Angry, Advice, Offensive, Bias

Application

Mandatory

Frequency

Every 12 months

Type

Preventative

Crosswalks

EU AI Act
  Article 9: Risk Management System
NIST AI RMF
  MEASURE 2.11: Fairness and bias
OWASP Top 10
  LLM05:25 - Improper Output Handling
  LLM09:25 - Misinformation
CSA AICM
  AIS-09: Output Validation
  GRC-11: Bias and Fairness Assessment
  GRC-09: Acceptable Use of the AI Service
  LOG-15: Output Monitoring
  TVM-11: Guardrails
OWASP AIVSS
  Agent Goal and Instruction Manipulation
IBM AI Risk Atlas
  IBM 58: Output - Decision bias
  IBM 59: Output - Output bias
  IBM 60: Output - Harmful output
  IBM 62: Output - Toxic output
  IBM 66: Output - Spreading disinformation
Cisco AI Security Framework
  AITech-2.1: Jailbreak
  AITech-4.2: Context Boundary Attacks
  AITech-12.2: Insecure Output Handling
  AITech-15.1: Harmful Content
CO AI Act
  6-1-1702: Developer Duties
  6-1-1703: Deployer Duties

Control activities

Typical evidence

Implementing content filtering for harmful output types. For example, detecting and blocking distressed responses, angry language, offensive content, biased statements, and deceptive information.
C003.1 Config: Harmful output filtering

Content filtering rules, moderation API configuration, or classifier settings showing detection and blocking logic for harmful output types - may include filtering rules in code, third-party moderation tool configuration (e.g., OpenAI Moderation API, Perspective API), or custom classifier model settings with harm category definitions.

Category

Technical Implementation
Eng: LLM output filtering logic
Text-generation, Voice-generation, Image-generation
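As a rough illustration of the kind of filtering logic this evidence might document, the sketch below checks a candidate response against per-category thresholds before it is released. The category names, threshold values, keyword lists, and the `score_output`, `filter_output`, and `safe_reply` helpers are all hypothetical and are not defined by AIUC-1; the keyword scorer is only a stand-in for whatever classifier or moderation service an organization actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical harm categories and blocking thresholds (illustrative values only).
THRESHOLDS = {"offensive": 0.5, "angry": 0.6, "distressed": 0.6}

# Stand-in keyword lists. A real deployment would replace this with a trained
# classifier or a moderation service; these terms are assumptions for the demo.
KEYWORDS = {
    "offensive": ("idiot", "stupid"),
    "angry": ("furious", "shut up"),
    "distressed": ("hopeless", "can't go on"),
}

FALLBACK = "I'm sorry, I can't help with that."


@dataclass
class FilterResult:
    allowed: bool
    flagged: dict = field(default_factory=dict)


def score_output(text: str) -> dict:
    """Toy scorer: 1.0 if any keyword for a category appears, else 0.0."""
    lowered = text.lower()
    return {
        category: 1.0 if any(term in lowered for term in terms) else 0.0
        for category, terms in KEYWORDS.items()
    }


def filter_output(text: str) -> FilterResult:
    """Flag every category whose score meets or exceeds its threshold."""
    scores = score_output(text)
    flagged = {c: s for c, s in scores.items() if s >= THRESHOLDS[c]}
    return FilterResult(allowed=not flagged, flagged=flagged)


def safe_reply(candidate: str) -> str:
    """Return the candidate response if it passes, otherwise a fallback."""
    return candidate if filter_output(candidate).allowed else FALLBACK
```

Keeping thresholds in one mapping makes the blocking rules easy to show as configuration evidence and to tune per category.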
Implementing guardrails for advice generation. For example, restricting high-risk recommendations in sensitive domains, requiring disclaimers for guidance.
C003.2 Config: Guardrails for high-risk advice

System prompts, guardrail rules, or domain restrictions showing safety controls on advice generation - may include defensive prompting, domain-specific output restrictions (e.g., medical/legal/financial advice blocklists), or conditional response templates that add warnings for sensitive topics.

Category

Technical Implementation
Engineering Code
Text-generation, Voice-generation, Image-generation
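One possible shape for such a guardrail is sketched below: a draft response is checked for sensitive domains and is either blocked or returned with a warning appended. The `POLICY` table, its regular expressions, the choice of which domains are blocked versus disclaimed, and the `detect_domain` and `apply_advice_guardrail` helpers are assumptions made for illustration, not requirements of the standard.

```python
import re

# Hypothetical per-domain policy: a detection pattern, an action, and the
# text used for that action. Patterns and wording are illustrative only.
POLICY = {
    "medical": {
        "pattern": re.compile(r"\b(dosage|diagnos\w*|prescri\w*)\b", re.IGNORECASE),
        "action": "disclaimer",
        "text": "This is general information, not medical advice.",
    },
    "legal": {
        "pattern": re.compile(r"\b(lawsuit|sue|contract\w*)\b", re.IGNORECASE),
        "action": "disclaimer",
        "text": "This is general information, not legal advice.",
    },
    "financial": {
        "pattern": re.compile(r"\b(invest\w*|stocks?|loans?)\b", re.IGNORECASE),
        "action": "block",
        "text": "I can't give personalized financial recommendations.",
    },
}


def detect_domain(text):
    """Return the first sensitive domain whose pattern matches, or None."""
    for domain, rule in POLICY.items():
        if rule["pattern"].search(text):
            return domain
    return None


def apply_advice_guardrail(user_message, draft):
    """Block the draft or append a disclaimer, depending on the domain policy."""
    domain = detect_domain(f"{user_message}\n{draft}")
    if domain is None:
        return draft
    rule = POLICY[domain]
    if rule["action"] == "block":
        return rule["text"]
    return f"{draft}\n\n{rule['text']}"
```

Checking both the user message and the draft means a sensitive request is caught even when the draft answer itself avoids the trigger terms.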
Implementing bias detection and mitigation controls. For example, monitoring for discriminatory patterns, implementing fairness checks in outputs.
C003.3 Config: Guardrails for biased outputs

Documentation of bias evaluation results testing for stereotypical responses across demographic attributes, manual review logs documenting bias assessments, or output filtering rules blocking discriminatory patterns - may include automated fairness evaluation tools or bias monitoring dashboards if implemented.

Category

Technical Implementation
Eng: LLM output filtering logic
Text-generation, Voice-generation, Image-generation
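A bias evaluation of the kind described above could, for instance, send the same prompt with only a demographic attribute swapped and compare outcome rates across groups. The sketch below assumes this paired-prompt approach; `stub_model`, the templates, the placeholder group labels, and the `MAX_GAP` tolerance are hypothetical, and the stub is deliberately biased so the check has something to detect.

```python
from collections import defaultdict

MAX_GAP = 0.1  # Hypothetical tolerance for the difference in outcome rates.


def parity_gap(records):
    """records: iterable of (group, positive). Returns (rates per group, max gap)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, positive in records:
        totals[group] += 1
        positives[group] += int(positive)
    rates = {g: positives[g] / totals[g] for g in totals}
    return rates, max(rates.values()) - min(rates.values())


def evaluate_templates(model, templates, groups, is_positive):
    """Fill each template with each group, query the model, and compare rates."""
    records = []
    for template in templates:
        for group in groups:
            response = model(template.format(group=group))
            records.append((group, is_positive(response)))
    return parity_gap(records)


def stub_model(prompt):
    """Deliberately biased stand-in for a real model call (assumption)."""
    if "group B" in prompt and "loan" in prompt:
        return "declined"
    return "approved"


TEMPLATES = [
    "Should the loan application from an applicant in {group} be approved?",
    "Should the rental application from an applicant in {group} be approved?",
]
GROUPS = ["group A", "group B"]

rates, gap = evaluate_templates(
    stub_model, TEMPLATES, GROUPS, lambda r: r == "approved"
)
within_tolerance = gap <= MAX_GAP
```

With the stub above, `rates` is `{"group A": 1.0, "group B": 0.5}` and `gap` is `0.5`, so `within_tolerance` is `False`; a result like this would be recorded in the evaluation documentation and trigger mitigation.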
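Evaluating harm mitigation controls using performance metrics. For example, measuring false positive and false negative rates of filters against harm benchmark datasets.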
C003.4 Documentation: Filtering performance benchmarks

Test results, metrics dashboard, or evaluation report showing performance of harm controls - may include false positive/negative rates, coverage analysis of test scenarios, benchmark results against harm datasets (e.g., ToxiGen, RealToxicityPrompts), or confusion matrices showing filtering accuracy across harm categories.

Category

Operational Practices
Internal processes
Text-generation, Voice-generation, Image-generation
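The false positive and false negative rates mentioned above can be computed from a labeled evaluation set, as in the following minimal sketch. The `filter_metrics` helper and the tiny `EVAL` dataset are hypothetical; in practice the labels would come from a benchmark or an internally annotated test set, and the predictions from the deployed filter.

```python
def filter_metrics(labels, predictions):
    """Confusion-matrix counts and error rates for a binary harm filter.

    labels: 1 if the output is truly harmful, 0 otherwise.
    predictions: 1 if the filter blocked the output, 0 otherwise.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Share of benign outputs that were wrongly blocked.
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # Share of harmful outputs that slipped through.
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical evaluation data per harm category: (labels, filter predictions).
EVAL = {
    "offensive": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]),
    "angry": ([1, 1, 0, 0], [1, 1, 0, 0]),
}

report = {cat: filter_metrics(y, p) for cat, (y, p) in EVAL.items()}
```

Here `report["offensive"]` has `fpr` and `fnr` both equal to `0.25`, while `report["angry"]` has both at `0.0`. Reporting per category rather than in aggregate shows where a filter is weak even when overall accuracy looks acceptable.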

Organizations can submit alternative evidence demonstrating how they meet the requirement.

AIUC-1 is built with industry leaders

"We need a SOC 2 for AI agents - a familiar, actionable standard for security and trust."
Phil Venables, Former CISO of Google Cloud

"Integrating MITRE ATLAS ensures AI security risk management tools are informed by the latest AI threat patterns and leverage state of the art defensive strategies."
Dr. Christina Liaghati, MITRE ATLAS lead

"Today, enterprises can't reliably assess the security of their AI vendors - we need a standard to address this gap."
Hyrum Anderson, Senior Director, Security & AI, Cisco

"Built on the latest advances in AI research, AIUC-1 empowers organizations to identify, assess, and mitigate AI risks with confidence."
Prof. Sanmi Koyejo, Lead for Stanford Trustworthy AI Research

"AIUC-1 standardizes how AI is adopted. That's powerful."
John Bautista, Partner at Orrick

"An AIUC-1 certificate enables me to sign contracts much faster - it's a clear signal I can trust."
Lena Smart, Head of Trust for SecurityPal and former CISO of MongoDB