Study Finds Apple, Anthropic, and Other Companies Leveraging YouTube for AI Training

Tech & AI | July 16, 2024, 12:53 p.m.

An investigation by Proof News and Wired reveals that transcripts from more than 170,000 YouTube videos were used without permission to train AI systems for tech giants Apple, Anthropic, Nvidia, and Salesforce. The “YouTube Subtitles” dataset includes transcripts from popular creators such as MrBeast and Marques Brownlee, as well as news clips from outlets such as ABC News and The New York Times. Apple admitted to sourcing data from companies that scraped YouTube, raising concerns about transparency in AI development. The dataset is part of The Pile, a larger EleutherAI collection of text sources used for training AI systems. YouTube has not responded to inquiries about the unauthorized use of content from its platform. The revelation raises questions about the ethical and legal implications of AI companies training on publicly available data, especially from platforms like YouTube.