Why 'Superhuman' Go AIs Struggle to Defend Against Basic Exploits

Tech & AI | July 12, 2024, 4:03 p.m.

In the ancient Chinese game of Go, artificial intelligence has long dominated human players. Recent research, however, has uncovered flaws in top-level Go AIs that give humans a chance to turn the tide: by employing unconventional "cyclic" strategies, even novice human players can exploit weaknesses in these algorithms and defeat them.

Researchers from MIT and FAR AI set out to improve the "worst-case" performance of superhuman Go AIs, testing several methods to harden the top-level KataGo engine against adversarial attacks. Despite their efforts, which included fine-tuning the algorithm and exploring new training techniques, the results indicated that building truly unexploitable AIs remains a challenge, even in tightly controlled environments such as board games.

As the study suggests, vulnerabilities in AI systems are difficult to eliminate, and adversaries can often discover new weaknesses faster than the algorithms can be patched. While the research did not make attacks impossible, it did show that known exploits can be defended against with sufficient training. These findings underscore the importance of hardening AI systems against worst-case scenarios, emphasizing the value of robustness over the pursuit of new capabilities.