ASL-3

ASL-3, or "AI Safety Level 3," as described in Anthropic's Responsible Scaling Policy, requires substantially stronger safety and security measures than ASL-2.

ASL-3 models are characterized by their potential to present significant risks if misused or if they develop autonomous capabilities.

Key Aspects of ASL-3:

  1. Capabilities and Threat Models:

    • Significant Risk of Catastrophic Misuse: ASL-3 models are those that could substantially increase the risk of catastrophic harm if misused. This might involve making dangerous capabilities more accessible, lowering the cost of carrying out attacks, or enabling new methods of attack.

    • Autonomous Replication: These models might also show early signs of autonomous replication, which could lead to catastrophic risks if not properly contained. For example, the model might be able to accumulate resources or survive in the real world without human intervention.

  2. Containment Measures:

    • Model Weight and Code Security: ASL-3 requires heightened security to prevent the theft or misuse of model weights. Security measures must be strong enough to deter non-state attackers and make it significantly costly for state-level attackers to steal model weights.

    • Internal Compartmentalization: Access to training techniques and model hyperparameters is limited to a need-to-know basis to prevent the proliferation of dangerous AI models (see the access-control sketch after this list).

    • ASL-4 Warning Signs: Before training ASL-3 models, Anthropic commits to defining ASL-4 capabilities and developing evaluation protocols to detect warning signs. If these signs are detected, training must be paused until appropriate safety measures are implemented.
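
To make the compartmentalization idea concrete, here is a minimal sketch of a deny-by-default, need-to-know store for sensitive training artifacts. It is illustrative only: the `Artifact` and `CompartmentedStore` names, the allow-list mechanism, and the audit log are assumptions made for the example, not a description of Anthropic's actual internal systems.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str                                        # e.g. "asl3/hyperparameters"
    authorized: set = field(default_factory=set)     # explicit need-to-know allow-list


class CompartmentedStore:
    """Deny-by-default store for sensitive training artifacts; every access attempt is logged."""

    def __init__(self):
        self._artifacts = {}
        self.audit_log = []                          # (user, artifact, allowed) tuples

    def register(self, artifact):
        self._artifacts[artifact.name] = artifact

    def read(self, user, name):
        artifact = self._artifacts[name]
        allowed = user in artifact.authorized
        self.audit_log.append((user, name, allowed))  # retained for later review
        if not allowed:
            raise PermissionError(f"{user} has no need-to-know for {name}")
        return artifact


# Hypothetical usage: only explicitly authorized users can read the artifact.
store = CompartmentedStore()
store.register(Artifact("asl3/hyperparameters", authorized={"alice"}))
store.read("alice", "asl3/hyperparameters")          # permitted and logged
# store.read("mallory", "asl3/hyperparameters")      # raises PermissionError
```

The design choice the sketch highlights is that access is denied unless explicitly granted, and every request is recorded whether or not it succeeds.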

  3. Deployment Measures:

    • Red-Teaming: ASL-3 models must undergo thorough red-teaming by world-class experts before deployment. This involves testing whether the model can cause harm under realistic conditions, with a focus on catastrophic risks such as bioweapons or cybersecurity threats.

    • Automated Detection: ASL-3 models must be accompanied by automated detection systems that monitor and minimize the risk of catastrophic misuse at the user or customer level. This includes retaining inputs and outputs for at least 30 days to assist in emergency situations (see the sketch after this list).

    • Internal Usage Controls: Internal use of ASL-3 models is also closely monitored, with logs retained for all internal activities and alarms triggered for abnormal behavior, such as harmful use or unexpected attempts by the model to take real-world actions.

    • Tiered Access: In limited cases, ASL-3 models may be made available to vetted users with legitimate, beneficial use-cases that involve dangerous capabilities. This access is provided only with sufficient oversight and strong security controls.

    • Vulnerability and Incident Disclosure: Anthropic commits to a process for disclosing vulnerabilities and incidents, especially those related to national security threats or autonomous replication risks.

    • Rapid Response: Any newly discovered vulnerabilities in the model that could enable catastrophic harm must be promptly mitigated or patched, with public channels available for reporting such vulnerabilities.
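
The following minimal sketch shows how an automated detection layer with a 30-day input/output retention window and abnormal-behavior alarms might be structured. The `MisuseMonitor` class, the `classifier` callback, and the print-based alert are illustrative assumptions, not Anthropic's actual deployment or logging stack.

```python
import datetime as dt
from dataclasses import dataclass

RETENTION = dt.timedelta(days=30)        # "at least 30 days" per the deployment measures


@dataclass
class Exchange:
    timestamp: dt.datetime
    prompt: str
    completion: str
    flagged: bool


class MisuseMonitor:
    """Records every input/output pair, flags suspected misuse, and prunes records past retention."""

    def __init__(self, classifier):
        self.classifier = classifier     # assumed callable(prompt, completion) -> bool
        self.retained = []

    def record(self, prompt, completion):
        exchange = Exchange(
            timestamp=dt.datetime.now(dt.timezone.utc),
            prompt=prompt,
            completion=completion,
            flagged=self.classifier(prompt, completion),
        )
        self.retained.append(exchange)   # retained for the 30-day window
        if exchange.flagged:
            self.raise_alarm(exchange)
        return exchange

    def raise_alarm(self, exchange):
        # Placeholder: in practice this would page a response team rather than print.
        print(f"[ALERT] suspected misuse at {exchange.timestamp.isoformat()}")

    def prune(self, now=None):
        """Drop records older than the retention window, keeping the full 30 days of history."""
        now = now or dt.datetime.now(dt.timezone.utc)
        self.retained = [e for e in self.retained if now - e.timestamp <= RETENTION]
```

In this sketch, detection, retention, and alerting live in one place so that flagged exchanges are always backed by the retained inputs and outputs needed to respond to an incident.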