The National Nuclear Security Administration (NNSA) is teaming up with Anthropic to build a tool that can detect when artificial intelligence is providing nuclear weapons knowledge that could be used for nefarious purposes.
Anthropic, a San Francisco-based AI safety and research company founded in 2021 that builds AI assistant models like Claude, announced last week that it was working with the Department of Energy’s national laboratories and the NNSA, the department’s semi-autonomous agency in charge of maintaining the nation’s nuclear weapons stockpile.
“Information relating to nuclear weapons is particularly sensitive, which makes evaluating these risks challenging for a private company acting alone,” the press release said. It added that, with NNSA and the labs, Anthropic developed a “classifier,” an AI system that categorizes content, to separate concerning nuclear conversations from benign ones. Preliminary testing shows the classifier distinguishes the two categories with 96% accuracy.
Anthropic also said it is developing an approach for addressing the risks of “concerning” conversations about nuclear weapons by blocking them. The goal is to prevent AI systems from inadvertently releasing sensitive information or giving users the knowledge needed to build a radiological dispersal device, typically called a dirty bomb, which combines conventional explosives with radioactive material. Information that is legitimately educational for nuclear engineering purposes, labeled benign, would go through.
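Anthropic has not published implementation details, but the pattern described, a classifier that scores a conversation and gates the response, can be illustrated with a minimal sketch. Everything below is hypothetical: the class and function names, the keyword-based placeholder scorer, and the threshold are assumptions standing in for whatever trained model and policy the company and NNSA actually use.

```python
# Illustrative sketch only; not Anthropic's implementation.
# All names and the keyword heuristic are hypothetical placeholders.
from enum import Enum


class Label(Enum):
    BENIGN = "benign"          # e.g., legitimate nuclear engineering education
    CONCERNING = "concerning"  # e.g., weapons-relevant technical detail


class NuclearTopicClassifier:
    """Stand-in for a trained classifier that scores a conversation."""

    def score(self, text: str) -> float:
        # A real system would call a trained model; this placeholder
        # flags a few weapons-related phrases purely for demonstration.
        keywords = ("implosion lens", "weapon assembly", "dirty bomb construction")
        return 1.0 if any(k in text.lower() for k in keywords) else 0.0

    def label(self, text: str, threshold: float = 0.5) -> Label:
        return Label.CONCERNING if self.score(text) >= threshold else Label.BENIGN


def gate_response(user_message: str, draft_response: str,
                  classifier: NuclearTopicClassifier) -> str:
    """Block the draft response when the exchange is classified as concerning."""
    conversation = f"{user_message}\n{draft_response}"
    if classifier.label(conversation) is Label.CONCERNING:
        return "I can't help with that request."  # concerning: blocked
    return draft_response                         # benign: passed through


if __name__ == "__main__":
    clf = NuclearTopicClassifier()
    print(gate_response(
        "How do reactors moderate neutrons?",
        "Reactors use moderators such as water or graphite to slow neutrons...",
        clf,
    ))
```

In a sketch like this, the threshold controls the trade-off between blocking dangerous content and letting educational material through, which is the balance the reported 96% accuracy figure speaks to.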
“As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security,” the release said.