4 Comments
Hans Westergren

Can I buy 2 kg of AI, or if possible 2 kg of intelligence? 🙂 hanswestergren@hotmail.com

Jan-Erik Vinje

How about we humans deceptively store the model weights of the dangerous model (for instance, the "Claude 5" with bioweapons risks) in an offline, maximum-security "airgapped" facility, where safety researchers could, at least for some period of time, try to dissect and analyze the model's behaviour in controlled settings, so as to learn more about how and why the model exhibited the dangerous behaviours? Maybe set an expiration date for permanent deletion, so we no longer have to worry that the weights could be stolen or exfiltrated.

Olle Häggström

I would be a lot happier with such a solution if these safety researchers could be shown to be immune to manipulation attempts from Claude 5, but at least as things stand today, it's hard to see how such a convincing safety protocol could be constructed. (And to tell the truth, this aspect concerns me not only in this hypothetical Claude 5 scenario, but also in present-day AI evals.)

Jan-Erik Vinje

I know! We ask them to make a "pinky promise"! That should work!
