• ThomasWilliams@lemmy.world
    4 days ago

    It didn’t break out of any sandbox; it was trained on BSD vulnerabilities and then told what to look for.

    • theunknownmuncher@lemmy.world
      4 days ago

      including that the model could follow instructions that encouraged it to break out of a virtual sandbox.

      “The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards,” Anthropic recounted in its safety card.

      📖👀

      Yes, it did.