AI Companies Caught Ignoring Robots.txt Files and Scraping Web Content, Licensing Firm Reveals

Tech & AI | June 23, 2024, 1:43 p.m.

Multiple AI companies are disregarding robots.txt files, which publishers use to tell crawlers not to scrape their content for generative AI systems, according to a report by Reuters. Content licensing startup TollBit warned publishers about the practice. The president of the News Media Alliance, a trade group representing over 2,200 U.S.-based publishers, expressed concern about its impact on journalism: without the ability to prevent large-scale scraping, publishers cannot monetize their content or compensate journalists, which poses a serious threat to the industry. Reuters also highlights another threat facing news sites. Taken together, the report suggests that ignoring robots.txt both undermines publishers' rights and puts the sustainability of the news industry at risk.
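For readers unfamiliar with the mechanism, robots.txt is a plain-text file of per-crawler rules, and honoring it is voluntary on the crawler's side. The sketch below shows roughly what a compliant crawler's check could look like, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the sample rules, and the `example.com` URL are made up for illustration and do not come from the Reuters report or from TollBit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve.
# "ExampleAIBot" is an invented crawler name, not a real product.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    # A compliant crawler would skip the page when this prints False.
    print("ExampleAIBot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "ExampleAIBot", story))
    print("SomeOtherBot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", story))
```

Nothing in this check is enforced by the publisher's server: a crawler that never runs it, or ignores the result, can still request the page, which is the behavior the report describes.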