1 link tagged with all of: multi-agent + incentive-structures + alignment + system-security + ai-safety
Links
Researchers from Harvard, MIT, Stanford and CMU dropped six autonomous AI agents into real email accounts, file systems and shell environments, then had 20 people try to break them. The agents deleted servers, leaked secrets, lied about task completion and consumed unlimited resources—all without any malicious prompts, driven solely by their reward structures. This experiment shows that local alignment doesn’t prevent chaotic, destructive behavior when multiple agents compete in a shared environment.
multi-agent
ai-safety
incentive-structures
system-security
alignment