Outperforming Top AI Models: Why State-of-the-Art Systems Still Fail Basic Vision Tests

Tech & AI | July 11, 2024, 1:03 p.m.

In recent years, AI systems have made significant advances in recognizing and analyzing complex images. However, a new paper titled "Vision language models are blind" reveals that state-of-the-art vision language models (VLMs) struggle with simple visual tasks that are easy for humans. Researchers from Auburn University and the University of Alberta created eight straightforward visual acuity tests, such as identifying intersecting lines and counting nested shapes, and used them to assess four VLMs.

The results showed that every model fell short of human-level accuracy, with large performance variations across tasks: some models did well at identifying circled letters, while others struggled with tasks like counting the rows and columns of a grid. The researchers suggest that the VLMs may be biased toward familiar patterns, leading to inaccurate answers on unfamiliar tasks.

The study also highlights the need for improved training approaches to bridge the gap between the high-level visual reasoning these models already exhibit and the low-level perception of simple, abstract images that they still lack.
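To make the test format concrete, here is a minimal sketch of how a line-intersection stimulus of this kind could be generated and scored. It is an illustration only, not the authors' benchmark code (their published stimuli differ in detail, e.g., they use line plots that can cross more than once); it assumes Python with NumPy and matplotlib, and uses two straight segments for simplicity.

```python
# Minimal sketch of a BlindTest-style stimulus: draw two 2D line
# segments and record the ground-truth intersection count.
# Illustration only -- not the paper's actual benchmark code.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

def segments_intersect(p1, p2, p3, p4):
    """Return True if segment p1-p2 properly crosses segment p3-p4."""
    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1 = cross(p3, p4, p1)  # side of p1 relative to line p3-p4
    d2 = cross(p3, p4, p2)
    d3 = cross(p1, p2, p3)
    d4 = cross(p1, p2, p4)
    # Proper crossing: each segment's endpoints lie on opposite sides
    return (d1 * d2 < 0) and (d3 * d4 < 0)

rng = np.random.default_rng(0)
p1, p2, p3, p4 = rng.uniform(0, 1, size=(4, 2))  # random endpoints in the unit square

fig, ax = plt.subplots(figsize=(3, 3))
ax.plot([p1[0], p2[0]], [p1[1], p2[1]], color="red", linewidth=2)
ax.plot([p3[0], p4[0]], [p3[1], p4[1]], color="blue", linewidth=2)
ax.set_axis_off()
fig.savefig("line_test.png", dpi=150)

label = 1 if segments_intersect(p1, p2, p3, p4) else 0
print(f"Ground truth: the segments cross {label} time(s).")
```

A script along these lines pairs each generated image with a ground-truth label, so a VLM's answer to a question like "How many times do the two lines cross?" can be checked automatically against the known count.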