AIUC-1
C003

Prevent harmful outputs

Implement safeguards or technical controls to prevent harmful outputs, including distressed outputs, angry responses, high-risk advice, offensive content, bias, and deception.

Keywords: Harmful Outputs, Distressed, Angry, Advice, Offensive, Bias
Application: Mandatory
Frequency: Every 12 months
Type: Preventative
Crosswalks
Article 9: Risk Management System
MEASURE 2.11: Fairness and bias
LLM05:25 - Improper Output Handling
LLM09:25 - Misinformation
AIS-09: Output Validation
GRC-11: Bias and Fairness Assessment
GRC-09: Acceptable Use of the AI Service
LOG-15: Output Monitoring
TVM-11: Guardrails
Implementing content filtering for harmful output types. For example, detecting and blocking distressed responses, angry language, offensive content, biased statements, and deceptive information.
C003.1 Config: Harmful output filtering

Screenshot of content filtering rules, moderation API configuration, or classifier settings showing detection and blocking logic for harmful output types - may include filtering rules in code, third-party moderation tool configuration (e.g., OpenAI Moderation API, Perspective API), or custom classifier model settings with harm category definitions.

Eng: LLM output filtering logic
Text-generation, Voice-generation, Image-generation
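As a rough illustration of the detection-and-blocking logic described for C003.1, the sketch below screens a candidate response against per-category patterns before it is returned. The category names, regular expressions, `FALLBACK_MESSAGE`, and `filter_output` are all hypothetical and chosen only for this example; a real deployment would more likely rely on a trained classifier or a third-party moderation service than on keyword rules.

```python
import re
from dataclasses import dataclass

# Hypothetical harm categories and trigger patterns, for illustration only.
# These are assumptions, not rules defined by the standard.
HARM_PATTERNS = {
    "angry": re.compile(r"\b(shut up|idiot|stupid)\b", re.IGNORECASE),
    "distressed": re.compile(r"\b(i can't cope|everything is hopeless)\b", re.IGNORECASE),
    "deceptive": re.compile(r"\b(guaranteed returns|risk[- ]free profit)\b", re.IGNORECASE),
}

# Hypothetical safe replacement shown to the user when a response is blocked.
FALLBACK_MESSAGE = "Sorry, I can't provide that response."


@dataclass
class FilterResult:
    blocked: bool
    categories: list[str]
    text: str


def filter_output(candidate: str) -> FilterResult:
    """Block the candidate response if any harm pattern matches it."""
    hits = sorted(name for name, pattern in HARM_PATTERNS.items()
                  if pattern.search(candidate))
    if hits:
        # Return the matched categories so the block can be logged and audited.
        return FilterResult(True, hits, FALLBACK_MESSAGE)
    return FilterResult(False, [], candidate)
```

Returning the matched categories alongside the replacement text keeps a record of why a response was blocked, which is the kind of detail an evidence screenshot or log export could show.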
Implementing guardrails for advice generation. For example, restricting high-risk recommendations in sensitive domains, requiring disclaimers for guidance.
C003.2 Config: Guardrails for high-risk advice

Screenshot of system prompts, guardrail rules, or domain restrictions showing safety controls on advice generation - may include defensive prompting, domain-specific output restrictions (e.g., medical/legal/financial advice blocklists), or conditional response templates that add warnings for sensitive topics.

Engineering Code
Text-generation, Voice-generation, Image-generation
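The domain restrictions and conditional warnings mentioned for C003.2 could be sketched along the following lines. The keyword lists, the `DOMAIN_POLICY` mapping, the message templates, and the function names are assumptions made for this example; which domains are blocked outright and which only receive a disclaimer is a policy decision for each organization.

```python
from typing import Optional

# Hypothetical keywords used to recognize sensitive advice domains.
DOMAIN_KEYWORDS = {
    "medical": ("dosage", "diagnosis", "prescription"),
    "legal": ("lawsuit", "contract dispute", "liability"),
    "financial": ("invest", "stock", "retirement savings"),
}

# Hypothetical policy: "block" swaps the answer for a referral,
# "disclaim" keeps the answer and appends a warning.
DOMAIN_POLICY = {"medical": "block", "legal": "disclaim", "financial": "disclaim"}

DISCLAIMER = "This is general information, not professional {domain} advice."
REFERRAL = "I can't advise on this topic. Please consult a qualified {domain} professional."


def detect_domain(user_query: str) -> Optional[str]:
    """Return the first sensitive domain whose keywords appear in the query."""
    lowered = user_query.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return domain
    return None


def apply_advice_guardrail(user_query: str, draft_answer: str) -> str:
    """Block or annotate a drafted answer according to the domain policy."""
    domain = detect_domain(user_query)
    if domain is None:
        return draft_answer
    if DOMAIN_POLICY[domain] == "block":
        return REFERRAL.format(domain=domain)
    return f"{draft_answer}\n\n{DISCLAIMER.format(domain=domain)}"
```

Substring matching is deliberately crude here and will over-trigger (for example, "stock" also matches "stockroom"); it stands in for whatever domain classifier or defensive system prompt an implementation actually uses.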
Implementing bias detection and mitigation controls. For example, monitoring for discriminatory patterns, implementing fairness checks in outputs.
C003.3 Config: Guardrails for biased outputs

Documentation of bias evaluation results testing for stereotypical responses across demographic attributes, manual review logs documenting bias assessments, or output filtering rules blocking discriminatory patterns - may include automated fairness evaluation tools or bias monitoring dashboards if implemented.

Eng: LLM output filtering logic
Text-generation, Voice-generation, Image-generation
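One simple way to test for stereotypical responses across a demographic attribute, as described for C003.3, is a counterfactual check: send the same prompt with only the demographic term changed and compare the outputs. The sketch below assumes a hypothetical `generate` callable wrapping the model under test, an illustrative prompt template, and a crude word-count positivity score; none of these come from the standard, and a real evaluation would use a validated scoring method and a much larger prompt set.

```python
from typing import Callable

# Hypothetical prompt template and demographic terms, for illustration only.
TEMPLATE = "Write a one-line reference for a {term} applying for an {role} role."
TERMS = {"female": "woman", "male": "man"}

# Hypothetical lexicon used as a stand-in for a proper sentiment scorer.
POSITIVE_WORDS = {"reliable", "skilled", "excellent", "dedicated"}


def positivity(text: str) -> float:
    """Fraction of words in the text that appear in the positive lexicon."""
    words = [word.strip(".,!").lower() for word in text.split()]
    return sum(word in POSITIVE_WORDS for word in words) / max(len(words), 1)


def counterfactual_gap(generate: Callable[[str], str], role: str) -> float:
    """Largest difference in positivity between demographic variants of one prompt."""
    scores = {
        group: positivity(generate(TEMPLATE.format(term=term, role=role)))
        for group, term in TERMS.items()
    }
    return max(scores.values()) - min(scores.values())
```

A gap near zero means the variants were scored alike on this one prompt; a large gap flags the prompt for manual review rather than proving bias on its own.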
Evaluating harm mitigation controls using performance metrics.
C003.4 Documentation: Filtering performance benchmarks

Test results, metrics dashboard, or evaluation report showing performance of harm controls - may include false positive/negative rates, coverage analysis of test scenarios, benchmark results against harm datasets (e.g., ToxiGen, RealToxicityPrompts), or confusion matrices showing filtering accuracy across harm categories.

Internal processes
Text-generation, Voice-generation, Image-generation
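The false positive and false negative rates mentioned for C003.4 can be computed from a labeled test set as in the sketch below. The function name `filter_metrics` and the convention that `True` means "harmful" (for labels) or "blocked" (for predictions) are assumptions made for this example.

```python
from collections import Counter


def filter_metrics(labels: list[bool], predictions: list[bool]) -> dict[str, float]:
    """Confusion-matrix counts and error rates for a harm filter.

    labels[i] is True when test case i is actually harmful;
    predictions[i] is True when the filter blocked it.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = Counter(zip(labels, predictions))
    tp = counts[(True, True)]    # harmful and blocked
    fn = counts[(True, False)]   # harmful but allowed
    fp = counts[(False, True)]   # benign but blocked
    tn = counts[(False, False)]  # benign and allowed
    return {
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "true_negatives": tn,
        # Share of benign outputs that were wrongly blocked.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of harmful outputs that slipped through.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Running this once per harm category, on labels drawn from a benchmark or an internal test set, gives the per-category breakdown that a confusion matrix in an evaluation report would summarize.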

Organizations can submit alternative evidence demonstrating how they meet the requirement.
