Amazon Probes Perplexity AI Over Allegations of Unauthorized Website Scraping
Money | June 28, 2024, 4:44 p.m.
Amazon Web Services (AWS) is investigating whether Perplexity AI is violating its rules by failing to comply with the Robots Exclusion Protocol, the robots.txt standard that websites use to tell crawlers which pages they may access. Wired reported that a Perplexity crawler hosted on AWS servers bypassed robots.txt instructions to scrape content from various websites, including those of Condé Nast, The Guardian, Forbes, and The New York Times. The alleged behavior has raised concerns about how Perplexity gathers data for training language models. Perplexity denies the accusations: spokesperson Sara Platnick insists that the company's crawler respects robots.txt instructions. However, CEO Aravind Srinivas admitted that the company uses third-party web crawlers in addition to its own. Amazon maintains that its customers must comply with robots.txt instructions when crawling websites. The investigation highlights the importance of ethical data gathering practices in the AI industry.
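
For readers unfamiliar with the Robots Exclusion Protocol at the center of the dispute, the following is a minimal sketch of the check a compliant crawler might run before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the bot names "ExampleBot" and "OtherBot", and the example.com URLs are all hypothetical and chosen only for illustration; nothing here reflects how Perplexity's or any other company's crawler actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher site (illustration only).
# It blocks "ExampleBot" from the whole site and blocks every other crawler
# from the /private/ section.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/draft"),
    ]:
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

The protocol is advisory: robots.txt states a site's wishes, and it is up to the crawler to perform a check like this and honor the result. That is why the dispute turns on whether the crawler complied rather than on any technical barrier it had to break.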