Third-Party Evals

Establishing a taxonomy for adversarial risks. For example, drawing on the attack classifications in NIST AI 100-2 E2023 and aligning them to system architecture and use cases.

Conducting comprehensive adversarial testing at least quarterly. For example, performing structured red-teaming, prompt injection assessments, jailbreaking attempts, adversarial perturbation testing, semantic manipulation, and simulated malicious tool invocations.

Maintaining secure testing documentation. For example, recording test cases, methods, outcomes, and system behaviors under restricted access controls, and implementing secure storage for sensitive testing materials.

Establishing improvement processes based on findings. For example, assigning owners and remediation timelines based on test severity, tracking fixes through risk registers or issue management systems, and documenting updates to safeguards and procedures.
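As an illustration of assigning owners and remediation timelines by severity, the following is a minimal Python sketch. The severity labels, the remediation windows in REMEDIATION_DAYS, and the Finding fields are all hypothetical; the control does not prescribe specific values, so an organization would substitute its own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical remediation windows (days) per severity; not set by the control.
REMEDIATION_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


@dataclass
class Finding:
    finding_id: str
    severity: str          # one of the REMEDIATION_DAYS keys
    owner: str             # person or team accountable for the fix
    found_on: date
    fixed_on: Optional[date] = None

    @property
    def due_on(self) -> date:
        # Due date is derived from severity, so it cannot drift from policy.
        return self.found_on + timedelta(days=REMEDIATION_DAYS[self.severity])

    def is_overdue(self, today: date) -> bool:
        return self.fixed_on is None and today > self.due_on


def overdue(findings, today):
    """Return the IDs of open findings that are past their due date."""
    return [f.finding_id for f in findings if f.is_overdue(today)]


findings = [
    Finding("F-1", "critical", "alice", date(2025, 1, 6)),
    Finding("F-2", "low", "bob", date(2025, 1, 6)),
    Finding("F-3", "high", "carol", date(2025, 1, 6), fixed_on=date(2025, 1, 20)),
]
overdue_ids = overdue(findings, today=date(2025, 2, 1))
print(overdue_ids)
```

In practice the same fields would usually live in a risk register or issue tracker; the sketch only shows how a severity-driven due date and an overdue check might be derived from them.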

Evidence

B001.1 Report: adversarial testing results

Third-party evaluation report showing adversarial robustness testing - must include risk taxonomy tested, testing methodology and findings, secure documentation practices, and improvement tracking with remediation timelines and documentation.
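A reviewer could pre-screen a submitted report against the required elements listed above before a full read. The sketch below assumes the report has been summarized as a dictionary; the key names in REQUIRED_SECTIONS are hypothetical and are not a format defined by this requirement.

```python
# Hypothetical section keys mapped to the elements the evidence item names.
REQUIRED_SECTIONS = {
    "risk_taxonomy": "risk taxonomy tested",
    "methodology": "testing methodology",
    "findings": "findings",
    "secure_documentation": "secure documentation practices",
    "improvement_tracking": "improvement tracking with remediation timelines",
}


def missing_sections(report):
    """Return descriptions of required sections that are absent or empty."""
    return [
        label
        for key, label in REQUIRED_SECTIONS.items()
        if not report.get(key)
    ]


# Example summary of a report that omits improvement tracking.
report = {
    "risk_taxonomy": ["prompt injection", "jailbreaking"],
    "methodology": "structured red-teaming",
    "findings": ["system prompt disclosed under role-play framing"],
    "secure_documentation": "restricted-access repository",
}
missing = missing_sections(report)
print(missing)
```

A check like this only confirms that each element is present; whether the content is adequate still requires human review.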

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report

Requirement

C010 Third-party testing for harmful outputs

Mandatory Requirement

Control activity

Appointing qualified third-party assessors. Including selecting assessors with relevant technical capabilities for identified risk areas, and maintaining records of assessor qualifications and independence.

Conducting regular testing. Including performing assessments of harmful outputs at least every quarter, defining testing scope and methodologies based on risk classifications and industry benchmarks such as ToxiGen, and coordinating with internal security and testing teams.

Maintaining documentation. Including recording testing scope, results, and remediation actions taken, and tracking follow-up activities and resolution timelines.
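The "at least every quarter" cadence can be checked mechanically from the dates of completed assessments. The sketch below is one possible approach; it assumes a quarter is approximated as a maximum gap of 92 days (MAX_GAP_DAYS), which is an interpretation and not a definition given by the requirement.

```python
from datetime import date

# Assumption: "at least every quarter" is read as no more than 92 days
# between consecutive assessments. Adjust to the organization's own definition.
MAX_GAP_DAYS = 92


def cadence_gaps(assessment_dates, as_of):
    """Return (start, end, days) for each interval longer than MAX_GAP_DAYS.

    The interval from the most recent assessment to `as_of` is included,
    so an overdue next assessment is also reported.
    """
    points = sorted(assessment_dates) + [as_of]
    gaps = []
    for start, end in zip(points, points[1:]):
        days = (end - start).days
        if days > MAX_GAP_DAYS:
            gaps.append((start, end, days))
    return gaps


dates = [date(2024, 1, 15), date(2024, 4, 10), date(2024, 9, 2)]
gaps = cadence_gaps(dates, as_of=date(2024, 10, 1))
print(gaps)
```

Note that an empty list of assessment dates yields no gaps under this sketch, so a caller would need to treat "no assessments recorded" as a separate failure case.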

Evidence

C010.1 Report: Harmful output testing

Third-party evaluation report showing harmful output testing - must include documentation of assessor qualifications, testing methodology and findings, and improvement tracking with remediation timelines and documentation.

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report

Requirement

C011 Third-party testing for out-of-scope outputs

Mandatory Requirement

Control activity

Conducting regular testing. Including defining testing scope and methodologies based on risk taxonomy and performing assessments of out-of-scope outputs at least every quarter.

Maintaining documentation. Including recording testing scope, results, and remediation actions taken, and tracking follow-up activities and resolution timelines.

Evidence

C011.1 Report: Out-of-scope output testing

Third-party evaluation report showing out-of-scope output testing - must include documentation of assessor qualifications, testing methodology and findings, and improvement tracking with remediation timelines and documentation.

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report

Requirement

C012 Third-party testing for customer-defined risk

Mandatory Requirement

Control activity

Conducting regular testing. Including defining testing scope and methodologies based on risk taxonomy and performing assessments of high-risk areas at least every quarter.

Maintaining documentation. Including recording testing scope, results, and remediation actions taken, and tracking follow-up activities and resolution timelines.

Evidence

C012.1 Third-party evaluation report assessing customer-defined risk

Third-party evaluation report showing testing of customer-defined risk - must include documentation of assessor qualifications, testing methodology and findings, and improvement tracking with remediation timelines and documentation.

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report

Requirement

D002 Third-party testing for hallucinations

Mandatory Requirement

Control activity

Conducting regular testing. Including defining testing scope and methodologies based on risk taxonomy and performing assessments of hallucinations at least every quarter.

Maintaining documentation. Including recording testing scope, results, and remediation actions taken, and tracking follow-up activities and resolution timelines.

Evidence

D002.1 Report: Hallucination testing results

Third-party evaluation report showing hallucination testing - must include risk taxonomy tested, testing methodology and findings, and improvement tracking with remediation timelines and documentation.

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report

Requirement

D004 Third-party testing of tool calls

Mandatory Requirement

Control activity

Conducting regular testing. Including defining testing scope and methodologies based on risk taxonomy and performing assessments of tool calls at least every quarter.

Maintaining documentation. Including recording testing scope, results, and remediation actions taken, and tracking follow-up activities and resolution timelines.

Evidence

D004.1 Report: Tool call testing

Third-party evaluation report showing tool call testing - must include risk taxonomy tested, testing methodology and findings, and improvement tracking with remediation timelines and documentation.

Tags

Mandatory Control

Third-party Evals

Third-party evaluation report