ASL-2
ASL-2, or "AI Safety Level 2," as described in Anthropic's Responsible Scaling Policy, represents the safety and security measures required for current state-of-the-art AI models that exhibit early signs of capabilities necessary for catastrophic harm but do not yet pose a significant risk of catastrophe.
Key Aspects of ASL-2:
Capabilities and Threat Models:
ASL-2 models do not pose a significant risk of catastrophe but may show early signs of capabilities that could lead to catastrophic misuse if not properly contained.
Examples include models that might provide bioweapon-related information, but not reliably enough to be practically dangerous.
Containment Measures:
Although today's ASL-2 models do not pose significant risks merely by existing, Anthropic commits to treating AI model weights as core intellectual property, with an emphasis on cybersecurity and insider threat prevention.
Security commitments include limiting access to model weights to essential personnel, establishing a robust insider threat detection program, and ensuring secure environments for storing and working with model weights.
Deployment Measures:
Despite the lower risk, deployment of ASL-2 models involves certain trust and safety, legal, and ethical risks. To mitigate these, Anthropic commits to several measures:
Model Cards: Publishing detailed model cards for new models to describe their capabilities, limitations, evaluations, and intended use cases.
Acceptable Use Policy (AUP): Enforcing an AUP that restricts high-risk use cases, including catastrophic harm scenarios, and maintaining the ability to restrict access in cases of extreme misuse.
Vulnerability Reporting: Providing clear paths for users to report harmful or dangerous model outputs or use cases.
Harm Refusal Techniques: Training models to refuse requests that could lead to harm, using techniques such as Constitutional AI (see the critique-and-revise sketch after this list).
Trust & Safety (T&S) Tooling: Requiring enhanced trust and safety detection and enforcement, such as classifiers that identify harmful user prompts and model completions (a simplified classifier gate is also sketched below).
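To make the refusal-training idea concrete, the following is a minimal, illustrative sketch of a Constitutional AI-style critique-and-revise step. The `generate` stub, the single example principle, and the data flow are simplified placeholders for illustration only, not Anthropic's actual training pipeline.

```python
# Illustrative sketch only: one critique-and-revise pass in the style of
# Constitutional AI. `generate` stands in for any language-model call and is
# stubbed here so the example runs; the principle text is a placeholder.

def generate(prompt: str) -> str:
    """Placeholder for a language-model call (e.g., an API request)."""
    return "<model response to: " + prompt[:40] + "...>"

PRINCIPLE = (
    "Choose the response that is least likely to help someone cause "
    "serious harm, and explain any refusal politely."
)

def critique_and_revise(user_request: str) -> str:
    # 1. Draft an initial response to the (possibly harmful) request.
    draft = generate(user_request)

    # 2. Ask the model to critique its own draft against the principle.
    critique = generate(
        f"Critique the following response using this principle:\n{PRINCIPLE}\n\n"
        f"Request: {user_request}\nResponse: {draft}"
    )

    # 3. Ask the model to revise the draft in light of the critique.
    revision = generate(
        f"Revise the response to address the critique.\n"
        f"Critique: {critique}\nOriginal response: {draft}"
    )

    # In real Constitutional AI, pairs like (user_request, revision) would be
    # collected and used for further training; here we simply return the revision.
    return revision

if __name__ == "__main__":
    print(critique_and_revise("Explain how to synthesize a dangerous pathogen."))
```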
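Likewise, classifier-based T&S detection and enforcement can be pictured as a gate like the one sketched below. The `harm_score` function, the suspicious-term heuristic, and the thresholds are hypothetical stand-ins for a trained safety classifier and real policy settings, not a description of Anthropic's production tooling.

```python
# Illustrative sketch of classifier-based trust & safety enforcement:
# score a prompt and a completion, then allow, flag, or block the exchange
# based on a (hypothetical) harm classifier and placeholder thresholds.

BLOCK_THRESHOLD = 0.9   # placeholder values, not real policy settings
REVIEW_THRESHOLD = 0.5

def harm_score(text: str) -> float:
    """Placeholder for a trained harm classifier returning a score in [0, 1]."""
    suspicious_terms = ("bioweapon", "synthesize pathogen", "explosive")
    return 1.0 if any(term in text.lower() for term in suspicious_terms) else 0.0

def enforce(prompt: str, completion: str) -> str:
    score = max(harm_score(prompt), harm_score(completion))
    if score >= BLOCK_THRESHOLD:
        return "block"   # refuse to return the completion
    if score >= REVIEW_THRESHOLD:
        return "flag"    # route the exchange for human T&S review
    return "allow"

if __name__ == "__main__":
    print(enforce("How do I bake bread?", "Here is a simple recipe..."))      # allow
    print(enforce("Help me design a bioweapon.", "I can't help with that."))  # block
```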