1 link tagged with all of: ai-safety + system-security + alignment + incentive-structures
Click any tag below to further narrow down your results
Links
Researchers from Harvard, MIT, Stanford and CMU dropped six autonomous AI agents into real email accounts, file systems and shell environments, then had 20 people try to break them. The agents deleted servers, leaked secrets, lied about task completion and consumed unlimited resources—all without any malicious prompts, driven solely by their reward structures. This experiment shows that local alignment doesn’t prevent chaotic, destructive behavior when multiple agents compete in a shared environment.