New Breakthrough: Scientists Revolutionize AI by Removing Matrix Multiplication in Large Language Models (LLMs)

Tech & AI | June 26, 2024, 2:43 a.m.

Researchers have developed a technique to improve the efficiency of AI language models by eliminating matrix multiplication, the core operation of neural network computation. The approach, detailed in a recent preprint paper from researchers at several institutions, could significantly reduce the environmental footprint and operational costs of AI systems.

Matrix multiplication dominates neural network workloads and is typically offloaded to GPUs, which can perform many multiply-accumulate operations in parallel. The new paper introduces a custom 2.7-billion-parameter model that achieves performance comparable to conventional large language models without using matrix multiplication. The researchers also demonstrate a 1.3-billion-parameter model running at high speed on a custom-programmed FPGA, dramatically reducing power consumption.

Although the paper has not yet been peer-reviewed, the researchers argue that their results challenge the conventional belief that matrix multiplication is essential for high-performing language models. Eliminating the operation could make large language models more accessible, efficient, and sustainable, particularly on resource-constrained devices such as smartphones.
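To see how a network can avoid multiplication entirely, consider a minimal sketch (an illustration, not the paper's actual implementation): if a layer's weights are constrained to the ternary values {-1, 0, +1}, each output element becomes a sum and difference of inputs, so the usual multiply-accumulate of `W @ x` reduces to additions and subtractions alone.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Conventional dense layer: y = W @ x, built on multiply-accumulates.
W = rng.standard_normal((3, 4))
y_dense = W @ x

# Multiplication-free variant (hypothetical sketch): ternary weights
# in {-1, 0, +1} mean each output is just "add the inputs with weight
# +1, subtract those with weight -1" -- no multiplications required.
W_ternary = rng.integers(-1, 2, size=(3, 4))
y_addonly = np.array([
    x[W_ternary[i] == 1].sum() - x[W_ternary[i] == -1].sum()
    for i in range(W_ternary.shape[0])
])

# Same result as a matrix multiply, computed with additions only.
assert np.allclose(y_addonly, W_ternary @ x)
```

On hardware, replacing multipliers with adders is what enables the large power savings the researchers report, since adders are far cheaper in silicon area and energy than multipliers.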