ASL-2

ASL-2, or "AI Safety Level 2," as defined in Anthropic's Responsible Scaling Policy, denotes the safety and security measures required for current state-of-the-art AI models: systems that show early signs of dangerous capabilities but do not yet pose a significant risk of catastrophe.

Key Aspects of ASL-2:

  1. Capabilities and Threat Models:

    • ASL-2 models do not pose a significant risk of catastrophe but may show early signs of capabilities that could lead to catastrophic misuse if not properly contained.

    • For example, a model might provide bioweapon-related information, but not reliably enough for that information to be practically dangerous.

  2. Containment Measures:

    • Although today's ASL-2 models do not pose significant risks merely by existing, Anthropic commits to treating AI model weights as core intellectual property, with an emphasis on cybersecurity and insider threat prevention.

    • Security commitments include limiting access to model weights to essential personnel, establishing a robust insider threat detection program, and ensuring secure environments for storing and working with model weights.

  3. Deployment Measures:

    • Although the catastrophic risk is lower, deploying ASL-2 models still involves trust and safety, legal, and ethical risks. To mitigate these, Anthropic commits to several measures:

      • Model Cards: Publishing detailed model cards for new models to describe their capabilities, limitations, evaluations, and intended use cases.

      • Acceptable Use Policy (AUP): Enforcing an AUP that restricts high-risk use cases, including catastrophic harm scenarios, and maintaining the ability to restrict access in cases of extreme misuse.

      • Vulnerability Reporting: Providing clear paths for users to report harmful or dangerous model outputs or use cases.

      • Harm Refusal Techniques: Training models to refuse requests that could lead to harm, using techniques like Constitutional AI (a sketch of its critique-and-revision loop appears after this list).

      • Trust & Safety (T&S) Tooling: Requiring enhanced trust and safety detection and enforcement, such as classifiers to identify harmful user prompts and model completions.
