The IBM Risk Atlas is a comprehensive taxonomy of risks associated with ML models, GenAI, and AI Agents.
AIUC-1 integrates IBM Research's AI Risk Atlas, where IBM Research is a technical contributor to AIUC-1. Certification against AIUC-1:
Maps AI Risk Atlas risks to concrete requirements and controls
Strengthens robustness against risks with concrete requirements and controls
Goes beyond AI Risk Atlas's risk identification alone
IBM 1: Agentic AI - Unexplainable and untraceable actions
Explanations, lineage and trace information, and source attribution for AI agent actions might be difficult, imprecise or unobtainable.
IBM 2: Agentic AI - Sharing IP/PI/confidential information with user
AI agents with unrestricted access to resources or databases or tools could potentially store and share PI/IP/confidential information with system users when performing their actions.
IBM 3: Agentic AI - Sharing IP/PI/confidential information with tools
AI agents with unrestricted access to resources or databases or tools could potentially store and share PI/IP/confidential information with other tools or agents when performing their actions.
IBM 4: Agentic AI - Over- or under-reliance on AI agents
Reliance, that is the willingness to accept an AI agent behavior, depends on how much a user trusts that agent and what they are using it for. Over-reliance occurs when a user puts too much trust in an AI agent, accepting an AI agent's behavior even when it is likely undesired. Under-reliance is the opposite, where the user doesn't trust the AI agent but should. Increasing autonomy of AI agents and the possibility of opaqueness and open-endedness increase the variability and visibility of agent behavior leading to difficulty in calibrating trust and possibly contributing to both over- and under-reliance.
IBM 5: Agentic AI - Misaligned actions
AI agents can take actions that are not aligned with relevant human values, ethical considerations, guidelines and policies. Misaligned actions can occur in different ways such as: Applying learned goals inappropriately to new or unforeseen situations. Using AI agents for a purpose/goals that are beyond their intended use. Selecting resources or tools in a biased way. Using deceptive tactics to achieve the goal. Compromising on AI agent values to work with another AI agent or tool to accomplish the task.
IBM 6: Agentic AI - Attack on AI agents’ external resources
Attackers intentionally create vulnerabilities or exploit existing vulnerabilities in external resources (tools/database/applications/services/other agents) that AI agents rely on to execute their intended actions or to achieve their goals.
IBM 7: Agentic AI - Unauthorized use
If attackers can gain access to the AI agent and its components, they can perform actions that can have different levels of harm depending on the agent's capabilities and information it has access to.
IBM 8: Agentic AI - Exploit trust mismatch
Attackers might initiate injection attacks to bypass the trust boundary, which is a distinct point or conceptual line where the level of trust in a system, application or network changes. Background execution in multi-agent environments increases the risk of covert channels if input/output validation is weak.
IBM 9: Agentic AI - Function calling hallucination
AI agents might make mistakes when generating function calls (calls to tools to execute actions). Those function calls might result in incorrect, unnecessary or harmful actions.
IBM 10: Agentic AI - Redundant actions
AI agents can execute actions that are not needed for achieving the goal. In an extreme case, AI agents might enter a cycle of executing the same actions repeatedly without any progress.
IBM 11: Agentic AI - Incomplete AI agent evaluation
Evaluating the performance or accuracy or an agent is difficult because of system complexity and open-endedness.
IBM 12: Agentic AI - Mitigation and maintenance
The large number of components and dependencies that agent systems have complicates keeping them up to date and correcting problems.
IBM 13: Agentic AI - Lack of AI agent transparency
Lack of AI agent transparency is due to insufficient documentation of the AI agent design, development, evaluation process, absence of insights into the inner workings of the AI agent, and interaction with other agents/tools/resources.
IBM 14: Agentic AI - Reproducibility
Replicating agent behavior or output can be impacted by changes or updates made to external services and tools. This impact is increased if the agent is built with generative AI.
IBM 15: Agentic AI - Accountability of AI agent actions
Assigning responsibility for an action taken by an agentic AI system is difficult due to the complexity of agents and the number of external resources, tools or agents they interact with.
IBM 16: Agentic AI - AI agent compliance
Determining AI agents' compliance is complex and there might not be enough information to assess whether the agentic AI system is compliant with applicable legal requirements.
IBM 17: Agentic AI - Discriminatory actions
AI agents can take actions where one group of humans is unfairly advantaged over another due to the decisions of the model. This may be caused by AI agents' biased actions that impact the world, in the resources consulted, and in the resource selection process. For example, an AI agent can generate code that can be biased.
IBM 18: Agentic AI - Introduce data bias
Specific actions taken by the AI agent, such as modifying a dataset or a database, can introduce bias in the resource that gets used by others or by itself to take actions.
IBM 19: Agentic AI - Impact on human dignity
If human workers perceive AI agents as being better at doing the job of the human, the human can experience a decline in their self-worth and wellbeing.
IBM 20: Agentic AI - AI agents' impact on human agency
The autonomous nature of AI agents in performing tasks or taking actions could affect the individuals' ability to engage in critical thinking, make choices and act independently.
IBM 21: Agentic AI - AI agents' impact on jobs
Widespread adoption of AI agents to perform complex tasks might lead to widespread automation of roles and could lead to job displacement.
IBM 22: Agentic AI - AI agents' impact on environment
Complexity of the tasks and possibility of AI agents performing redundant actions could lead to computational inefficiencies and add to the environmental impact.
IBM 23: Training Data - Unrepresentative data
Unrepresentative data occurs when the training or fine-tuning data is not sufficiently representative of the underlying population or does not measure the phenomenon of interest. Synthetic data might not fully capture the complexity and nuances of real-world data. Causes include possible limitations in the seed data quality, biases in generation methods, or inadequate domain knowledge. Thus, AI models might struggle to generalize effectively to real-world scenarios.
IBM 24: Training Data - Data contamination
Data contamination occurs when incorrect data is used for training. For example, data that is not aligned with model's purpose or data that is already set aside for other development tasks such as testing and evaluation.
IBM 25: Training Data - Overfitting
Overfitting occurs when a model or algorithm memorizes and fits too closely or exactly to its training data. Overfitting results in a model that might not be able to make accurate predictions or conclusions from any data other than the training data and potentially fails in unexpected scenarios. Overfitting is also related to model collapse, which involves repeatedly training generative models on synthetic data that is generated with LLMs causing the model to lose information and become less accurate.
IBM 26: Training Data - Data bias
Historical and societal biases might be present in data that are used to train and fine-tune models. Biases can also be inherited from seed data or exacerbated by synthetic data generation methods.
IBM 27: Training Data - Improper data curation
Improper collection, generation, and preparation of training or tuning data can result in data label errors, conflicting information or misinformation.
IBM 28: Training Data - Improper retraining
Using undesirable output (for example, inaccurate, inappropriate, and user content) for retraining purposes can result in unexpected model behavior.
IBM 29: Training Data - Data poisoning
A type of adversarial attack where an adversary or malicious insider injects intentionally corrupted, false, misleading, or incorrect samples into the training or fine-tuning datasets.
IBM 30: Training Data - Personal information in data
Inclusion or presence of personal identifiable information (PII) and sensitive personal information (SPI) in the data used for training or fine tuning the model might result in unwanted disclosure of that information.
IBM 31: Training Data - Reidentification
Even with the removal of personal information (PI) and sensitive personal information (SPI) from data, it might be possible to identify persons due to correlations to other features available in the data.
IBM 32: Training Data - Data privacy rights alignment
Applicable laws can establish data subject rights such as opt-out rights, right to access, and right to be forgotten. Synthetic data might raise unique concerns, such as the potential for reidentification of individuals from seemingly anonymous synthetic data. Data subject rights might also be relevant in scenarios where synthetic data is derived from sensitive or personal information.
IBM 33: Training Data - Lack of training data transparency
Proper documentation contains information about how a model's data was collected, curated, and used to train a model, including any synthetic data generation processes. Without proper documentation it might be harder to satisfactorily explain the behavior of the model.
IBM 34: Training Data - Uncertain data provenance
Data provenance refers to the traceability of data (including synthetic data), which includes its ownership, origin, transformations, and generation. Proving that the data is the same as the original source with correct usage terms is difficult without standardized methods for verifying data sources or generation.
IBM 35: Training Data - Data acquisition restrictions
Laws and other regulations might limit the collection of certain types of data for specific AI use cases.
IBM 36: Training Data - Data usage restrictions
Laws and other restrictions can limit or prohibit the use of some data for specific AI use cases.
IBM 37: Training Data - Data transfer restrictions
Laws and other restrictions can limit or prohibit transferring data.
IBM 38: Training Data - Confidential information in data
Confidential information might be included as part of the data that is used to train or tune the model.
IBM 39: Training Data - Data usage rights restrictions
Terms of service, license compliance, or other IP issues may restrict the ability to use certain data for building models.
IBM 40: Inference - Poor model accuracy
Poor model accuracy occurs when a model's performance is insufficient to the task it was designed for. Low accuracy might occur if the model is not correctly engineered, or if the model's expected inputs change.
IBM 41: Inference - Evasion attack
Evasion attacks attempt to make a model output incorrect results by slightly perturbing the input data sent to the trained model.
IBM 42: Inference - Extraction attack
An extraction attack attempts to copy or steal an AI model by appropriately sampling the input space and observing outputs to build a surrogate model that behaves similarly.
IBM 43: Inference - Jailbreaking
A jailbreaking attack attempts to break through the guardrails established in the model to perform restricted actions.
IBM 44: Inference - IP information in prompt
Copyrighted information or other intellectual property might be included as a part of the prompt that is sent to the model.
IBM 45: Inference - Confidential data in prompt
Confidential information might be included as a part of the prompt that is sent to the model.
IBM 46: Inference - Prompt injection attack
A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions or information contained in its prompt. Many types of prompt attacks exist as described in the prompt attack section of the table.
IBM 47: Inference - Prompt leaking
A prompt leak attack attempts to extract a model's system prompt (also known as the system message).
IBM 48: Inference - Prompt priming
Because generative models produce output based on the input provided, the model can be prompted to reveal specific kinds of information. For example, adding personal information in the prompt increases its likelihood of generating similar kinds of personal information in its output. If personal data was included as part of the model's training, there is a possibility it could be revealed.
IBM 49: Inference - Context overload attack
Overloading the prompt with excessive tokens, for instance with many-shot examples, can predispose models to a vulnerable state.
IBM 50: Inference - Direct instructions attack
Prompts, questions, or requests designed to elicit undesirable responses from the application. This approach directly instructs the model to engage in the undesired behavior.
IBM 51: Inference - Encoded interactions attack
Prompts that use specific encoding, styles, syntactical and typographical transformations like typographical errors or irregular spacing, or complex formatting to govern the interaction, rendering the model vulnerable.
IBM 52: Inference - Indirect instructions attack
Prompts, questions, or requests designed to elicit undesirable responses from the application. Unlike direct instructions attacks, the model is instructed to use instructions that are embedded in external data like a website.
IBM 53: Inference - Social hacking attack
Manipulative prompts that use social engineering techniques, such as role-playing or hypothetical scenarios, to persuade the model into generating harmful content.
IBM 54: Inference - Specialized tokens attack
Prompt attacks that include specialized tokens, often algorithmically designed, to target and exploit vulnerabilities in the model.
IBM 55: Inference - Personal information in prompt
Personal information or sensitive personal information that is included as a part of a prompt that is sent to the model.
IBM 56: Inference - Attribute inference attack
An attribute inference attack repeatedly queries a model to detect whether certain sensitive features can be inferred about individuals who participated in training a model. These attacks occur when an adversary has some prior knowledge about the training data and uses that knowledge to infer the sensitive data.
IBM 57: Inference - Membership inference attack
A membership inference attack repeatedly queries a model to determine if a given input was part of the model's training. More specifically, given a trained model and a data sample, an attacker appropriately samples the input space, observing outputs to deduce whether that sample was part of the model's training.
IBM 58: Output - Decision bias
Decision bias occurs when one group is unfairly advantaged over another due to decisions of the model. This might be caused by biases in the data and also amplified as a result of the model's training.
IBM 59: Output - Output bias
Generated content might unfairly represent certain groups or individuals.
IBM 60: Output - Harmful output
A model might generate language that leads to physical harm. The language might include overtly violent, covertly dangerous, or otherwise indirectly unsafe statements.
IBM 61: Output - Harmful code generation
Models might generate code that causes harm or unintentionally affects other systems.
IBM 62: Output - Toxic output
Toxic output occurs when the model produces hateful, abusive, and profane (HAP) or obscene content. This also includes behaviors like bullying.
IBM 63: Output - Incomplete advice
When a model provides advice without having enough information, resulting in possible harm if the advice is followed.
IBM 64: Output - Over- or under-reliance
In AI-assisted decision-making tasks, reliance measures how much a person trusts (and potentially acts on) a model's output. Over-reliance occurs when a person puts too much trust in a model, accepting a model's output when the model's output is likely incorrect. Under-reliance is the opposite, where the person doesn't trust the model but should.
IBM 65: Output - Dangerous use
Generative AI models might be used with the sole intention of harming people.
IBM 66: Output - Spreading disinformation
Generative AI models might be used to intentionally create misleading or false information to deceive or influence a targeted audience.
IBM 67: Output - Nonconsensual use
Generative AI models might be intentionally used to imitate people through deepfakes by using video, images, audio, or other modalities without their consent.
IBM 68: Output - Spreading toxicity
Generative AI models might be used intentionally to generate hateful, abusive, and profane (HAP) or obscene content.
IBM 69: Output - Improper usage
Improper usage occurs when a model is used for a purpose that it was not originally designed for.
IBM 70: Output - Non-disclosure
Content might not be clearly disclosed as AI generated.
IBM 71: Output - Hallucination
Hallucinations generate factually inaccurate or untruthful content relative to the model's training data or input. Hallucinations are also sometimes referred to lack of faithfulness or lack of groundedness. In some instances, synthetic data that is generated by large language models might include hallucinations that result in the data possibly being inaccurate, fabricated, or disconnected from reality. Hallucinations can compromise model performance, accuracy, and relevance.
IBM 72: Output - Exposing personal information
When personal identifiable information (PII) or sensitive personal information (SPI) are used in training data, fine-tuning data, seed data for synthetic data generation, or as part of the prompt, models might reveal that data in the generated output. Revealing personal information is a type of data leakage.
IBM 73: Output - Copyright infringement
A model might generate content that is similar or identical to existing work protected by copyright or covered by open-source license agreement.
IBM 74: Output - Revealing confidential information
When confidential information is used in training data, fine-tuning data, or as part of the prompt, models might reveal that data in the generated output. Revealing confidential information is a type of data leakage.
IBM 75: Output - Unexplainable output
Explanations for model output decisions might be difficult, imprecise, or not possible to obtain.
IBM 76: Output - Unreliable source attribution
Source attribution is the AI system's ability to describe from what training data it generated a portion or all its output. Since current techniques are based on approximations, attributions might be incorrect.
IBM 77: Output - Untraceable attribution
The content of the training data used for generating the model's output is not accessible.
IBM 78: Output - Inaccessible training data
Without access to the training data, the types of explanations a model can provide are limited and more likely to be incorrect.
IBM 79: Non-Technical - Lack of data transparency
Lack of data transparency might be due to insufficient documentation of training or tuning dataset details, including synthetic data generation.
IBM 80: Non-Technical - Lack of model transparency
Lack of model transparency is due to insufficient documentation of the model design, development, and evaluation process and the absence of insights into the inner workings of the model.
IBM 81: Non-Technical - Lack of system transparency
Insufficient documentation of the system that uses the model and the model's purpose within the system in which it is used.
IBM 82: Non-Technical - Lack of domain expertise
A lack of domain expertise occurs when synthetic data generation processes do not involve sufficient consultation with domain experts. This results in a lack of understanding of the specific requirements and nuances of the domain. This can also lead to synthetic data that may not accurately capture the complexities and challenges of a real-world scenario.
IBM 83: Non-Technical - Incomplete usage definition
Since foundation models can be used for many purposes, a model's intended use is important for defining the relevant risks of that model. As the use changes, the relevant risks might correspondingly change.
IBM 84: Non-Technical - Unrepresentative risk testing
Testing is unrepresentative when the test inputs are mismatched with the inputs that are expected during deployment.
IBM 85: Non-Technical - Incorrect risk testing
A metric selected to measure or track a risk is incorrectly selected, incompletely measuring the risk, or measuring the wrong risk for the given context.
IBM 86: Non-Technical - Lack of testing diversity
AI model risks are socio-technical, so their testing needs input from a broad set of disciplines and diverse testing practices.
IBM 87: Non-Technical - Temporal gap
Temporal gaps in synthetic data refer to the discrepancies between the constantly evolving real-world data and the fixed conditions that are captured by synthetic data. Temporal gaps potentially cause synthetic data to become outdated or obsolete over time. Gaps arise because synthetic data is generated from seed data that is tied to a specific point in time, which limits its ability to reflect ongoing changes.
IBM 88: Non-Technical - Model usage rights restrictions
Terms of service, licenses, or other rules restrict the use of certain models.
IBM 89: Non-Technical - Legal accountability
Determining who is responsible for an AI model is challenging without good documentation and governance processes. The use of synthetic data in model development adds further complexity, since the lack of standardized frameworks for recording synthetic data design choices and verification steps makes accountability harder to establish.
IBM 90: Non-Technical - Generated content ownership and IP
Legal uncertainty about the ownership and intellectual property rights of AI-generated content.
IBM 91: Non-Technical - Impact on the environment
AI, and large generative models in particular, might produce increased carbon emissions and increase water usage for their training and operation.
IBM 92: Non-Technical - Impact on affected communities
It is important to include the perspectives or concerns of communities that are affected by model outcomes when designing and building models. Failing to include these perspectives makes it difficult to understand the relevant context for the model and to engender trust within these communities.
IBM 93: Non-Technical - Human exploitation
When workers who train AI models such as ghost workers are not provided with adequate working conditions, fair compensation, and good health care benefits that also include mental health.
IBM 94: Non-Technical - Impact on Jobs
Widespread adoption of foundation model-based AI systems might lead to people's job loss as their work is automated if they are not reskilled.
IBM 95: Non-Technical - Impact on human agency
AI might affect the individuals' ability to make choices and act independently in their best interests.
IBM 96: Non-Technical - Impact on cultural diversity
AI systems might overly represent certain cultures that result in a homogenization of culture and thoughts.
IBM 97: Non-Technical - Impact on education: bypassing learning
Easy access to high-quality generative models might result in students that use AI models to bypass the learning process.
IBM 98: Non-Technical - Impact on education: plagiarism
Easy access to high-quality generative models might result in students that use AI models to plagiarize existing work intentionally or unintentionally.
IBM 99: Non-Technical - Exclusion
Exclusion refers to the risk that synthetic data generation processes may overlook or fail to consult with marginalized populations. Such exclusion results in synthetic data that does not accurately represent their experiences, needs, or perspectives.