Reddit CEO states that Microsoft must pay to access and search the site

Following agreements with Google and OpenAI, Reddit CEO Steve Huffman insists that Microsoft and other companies should pay to continue scraping the site’s data.

“Without these agreements, we lack control and insight into how our data is presented and utilized, which has led us to block those unwilling to agree to our terms for data use,” Huffman said in an interview this week. He specifically called out Microsoft, Anthropic, and Perplexity for their refusal to negotiate, describing it as “a real pain to block these companies.”

In recent months, Reddit has escalated its fight against crawlers. At the start of July, it updated its robots.txt file to block web crawlers operated by companies it has no agreement with. Users subsequently noticed that Reddit results appeared only in Google, which has a paid data agreement with Reddit, and not in other search engines such as Bing.
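
For readers unfamiliar with how a robots.txt rule set is interpreted, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The file contents, the crawler names `ExampleLicensedBot` and `SomeOtherBot`, and the example URL are hypothetical placeholders for illustration; they are not taken from Reddit's actual robots.txt. The sketch only shows how a well-behaved crawler would check the rules before fetching a page. Compliance is voluntary, so a rule like this signals a site's preference rather than enforcing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, all others are blocked.
# These rules are an illustration, not Reddit's actual file.
ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; a real crawler would check the site's own robots.txt.
    url = "https://www.example.com/r/some-thread"
    for agent in ("ExampleLicensedBot", "SomeOtherBot"):
        print(agent, can_crawl(agent, url))
```

Under these assumed rules, the named crawler is permitted and any other crawler that honors the file would skip the site entirely.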

According to Huffman, Microsoft has been using Reddit’s data to train its AI and to summarize Reddit content in Bing results “without telling us,” and Reddit’s data has also been sold to other search engines via the Bing API. During the interview, he also brought up Microsoft AI CEO Mustafa Suleyman’s recent remark at a conference that publicly available online data is “freeware.”

“We’ve seen Microsoft, Anthropic, and Perplexity behave as if all online content is freely available for their use,” Huffman said. “That reflects their true stance.”

In response to Reddit results recently vanishing from Bing, Microsoft’s head of search, Jordi Ribas, stated on X that “Reddit has blocked Bing from crawling their site for search purposes, favoring another search engine, which affects competition for Bing and Bing-powered engines.” Microsoft spokesperson Caitlin Roulston also told The Verge last week that “we respect the preferences of websites that do not wish their content to be used with our generative AI models.”

Huffman pointed to OpenAI’s recent launch of SearchGPT, which can display Reddit results thanks to a deal made earlier this year, as an example of the arrangement he wants to see. According to Reddit spokesperson Tim Rathschmidt, none of Reddit’s content licensing agreements so far include exclusive use cases for its data.

By seeking licensing agreements, Reddit aligns itself with traditional media publishers (such as Vox Media, The Verge’s parent company) in requesting compensation for their content used in generative AI. “The traditional value exchange with search engines has shifted,” Huffman remarked. “Search, summarization, and training are blending, and the clarity of the value exchange—crawling for traffic—is becoming unclear.”

Jennifer Martinez, a spokesperson for Anthropic, provided the following statement after this story was published: “We haven’t added any URLs from Reddit to our crawler since mid-May when Reddit was placed on our block list for web crawling. Robots.txt is a recognized industry signal for preventing site crawling, and we respect it.”

Microsoft declined to comment for this report. Perplexity did not respond to a request for comment.
