Cloudflare has accused Perplexity AI of using stealthy, undeclared crawlers to bypass website restrictions that prevent bots from accessing content. The company alleges that Perplexity's crawlers evade no-crawl directives by rotating IP addresses and modifying their user agents. Despite conventions like robots.txt, which tell bots which parts of a site they may crawl, Perplexity’s bots have reportedly continued to scrape websites that explicitly block such activity. In response, Cloudflare has blocked Perplexity's crawlers on its platform.

What Cloudflare Is Saying About It

Cloudflare claims that Perplexity’s AI bots have been violating standard web crawling practices by attempting to access websites that have clearly requested not to be crawled. They have also stated that Perplexity’s use of stealth tactics, including user agent changes and IP address rotation, undermines the integrity of website data security. In response, Cloudflare has removed Perplexity from its list of verified bots and increased blocking measures to protect websites.

What Is Perplexity’s Response?

Perplexity has denied the accusations, calling Cloudflare’s claims exaggerated and inaccurate. They argue that the crawling activity in question may have been caused by third-party services and not directly by their bots. Perplexity also pointed out that many AI companies, including theirs, rely on third-party services for web scraping, which complicates accountability. They criticized Cloudflare for sensationalizing the issue and argued that their practices are no different from those used by other AI systems. Perplexity’s response emphasizes that they are not intentionally bypassing website restrictions and suggests that Cloudflare might be overreacting to the situation.

What it means (in human words)

If you have a website using Cloudflare and you've told bots not to access it, Cloudflare says Perplexity found a way to ignore those rules and scrape your site anyway. Cloudflare caught on to this and blocked Perplexity completely. So now, even if you're using Cloudflare and you've allowed bot access, Perplexity won't be able to get to your site.

Connecting the Dots

We appreciate that without all the details, it’s hard to make sense of what’s going on. All you have to go on is the headline: “We said no bots allowed, and Perplexity said we don’t care.” But there’s another side to this too: site owners who said yes to bots are now finding that Perplexity gets a no-entry sign anyway. So, what’s really going on here? Let’s take a closer look.

What Is Cloudflare and What Does It Do?

Cloudflare is a service that protects websites from security threats and optimizes performance. One of its main functions is acting as a shield for websites against unwanted bots, which are automated programs that scrape data, spam, or even launch attacks. Cloudflare helps manage which bots are allowed to access a website, using tools like "robots.txt" files, which tell bots whether they’re welcome or not.
They also have a verification system to ensure that only trusted bots are crawling sites, and they block any suspicious or harmful activity. This makes Cloudflare a key player in maintaining web integrity and security.
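
To make the idea of bot verification a bit more concrete, here’s a minimal Python sketch of one common approach: checking whether a request that claims to be a known bot really comes from an IP range that the bot’s operator has published. This is our own simplified illustration, not Cloudflare’s actual system. The bot name ExampleAIBot, the function classify_request, and the IP ranges (reserved documentation addresses) are all made up for the example.

```python
import ipaddress

# Hypothetical registry: bot name -> IP ranges its operator has published.
# These are reserved documentation ranges (RFC 5737), not real crawler IPs.
PUBLISHED_RANGES = {
    "ExampleAIBot": ["192.0.2.0/24", "198.51.100.0/24"],
}


def classify_request(user_agent: str, source_ip: str) -> str:
    """Return 'verified', 'spoofed', or 'undeclared' for a single request."""
    ip = ipaddress.ip_address(source_ip)
    for bot_name, ranges in PUBLISHED_RANGES.items():
        # Does the user agent claim to be this bot?
        if bot_name.lower() in user_agent.lower():
            # The claim only counts if the IP is in a published range.
            if any(ip in ipaddress.ip_network(cidr) for cidr in ranges):
                return "verified"
            return "spoofed"
    # No known bot name in the user agent: it looks like ordinary traffic.
    return "undeclared"


if __name__ == "__main__":
    print(classify_request("ExampleAIBot/1.0", "192.0.2.10"))
    print(classify_request("ExampleAIBot/1.0", "203.0.113.5"))
    print(classify_request("Mozilla/5.0 (Windows NT 10.0)", "203.0.113.5"))
```

Notice that a crawler sending a generic browser user agent from an unlisted IP lands in the "undeclared" bucket. That is why this kind of traffic is hard to tell apart from ordinary visitors without extra signals.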

What Are the Rules Agreed Upon?

Just like in real life, where there are rules we need to follow, the same goes for the world of web traffic. Websites use tools like “robots.txt” to set those rules, telling bots what they can and can’t access. And just like you need an ID to prove who you are in the real world, bots are expected to identify themselves, typically through a declared user agent and known IP addresses.

Perplexity knows this, and so does everyone else. If bots didn’t follow the rules, the world of web traffic would fall apart. At the end of the day, this system works for everyone because it keeps information accurate and useful. If anyone could do whatever they wanted, there’d be no value in the information, or worse, no information at all.
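
For readers who want to see what following the rules looks like in practice, here’s a small Python sketch of how a well-behaved crawler might check robots.txt before fetching a page, using the standard library’s urllib.robotparser. The robots.txt content, the bot names, and the example.com URLs are invented for illustration. Keep in mind that robots.txt is a request, not a lock: nothing in the file itself stops a bot that chooses to skip this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/private/notes"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check depends entirely on the user agent the crawler reports. A bot that announces itself under a different name is simply matched against a different set of rules.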

So, what happened? According to Cloudflare, Perplexity requested the content but kept changing its IP addresses and user agents, sidestepping the rules that sites had set in "robots.txt." Because the requests didn’t identify themselves as coming from an AI bot, websites couldn’t match them to their "robots.txt" rules or respond to them accordingly.

To put it in shocking terms, this is theft. It’s like tricking other bots and getting what you want by constantly changing your identity, using different IPs to sneak in.

What Does "Industry Standard" Third-Party Use Really Mean?

When it comes to web scraping and data collection, many companies rely on third-party services to gather information from the web. This is often described as the "industry standard" because it’s a common practice across many businesses, especially in AI and machine learning. These third-party services or bots act as intermediaries, accessing websites on behalf of the company and collecting data.

The idea behind using third parties is efficiency and scalability. Rather than building and managing their own web crawlers, companies can outsource this task to specialized services that are set up to handle large volumes of data collection. These services may use various techniques to gather information quickly and without the company’s direct involvement, including rotating IPs or using multiple bots.

While this practice is widespread and technically accepted in many cases, it raises important questions around ethics and consent. Just because something is an industry standard doesn’t mean it’s always in line with best practices or respects website owners' wishes. In this case, Perplexity’s reliance on third-party bots has led to questions about whether those bots are bypassing established rules and protocols, like "robots.txt," to get the data they want.

Bottom Line

Is there an investigation?
Yes. Cloudflare says it has identified Perplexity’s bots bypassing no-crawl directives and is actively blocking them.

What happens next?
Cloudflare has removed Perplexity from its list of verified bots, and stricter blocking measures are in place.

What is the situation now?
Perplexity’s bots are no longer able to access websites using Cloudflare’s services, and the controversy over web scraping practices continues. This highlights the tension between AI data collection and respecting website owners' rules.

Prompt It Up

Looking to configure your robots.txt file? Here’s a prompt you can use to get the right instructions for your chosen vendor.

Prompt to Configure robots.txt

Simply copy and paste this:

"I am using [Enter the vendor name here] to configure my website’s robots.txt file. Please search online for instructions on how to properly configure this file to allow or block specific bots. Provide examples and step-by-step instructions from [Enter the vendor name here] on how to set up the robots.txt file. Please include links to the vendor’s official documentation and any helpful resources to guide me in creating the file."
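
If you’d like a feel for what the resulting file looks like before you ask, a robots.txt file is just plain text: groups that start with a User-agent line, followed by Allow or Disallow lines. The Python sketch below builds one from a simple dictionary. The function name build_robots_txt, the bot name ExampleAIBot, and the paths are placeholders we made up; how you upload the file depends on your vendor.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from {user-agent: [disallowed path prefixes]}.

    An empty list means that user agent may crawl everything.
    """
    groups = []
    for agent, disallowed in rules.items():
        lines = [f"User-agent: {agent}"]
        if disallowed:
            lines += [f"Disallow: {path}" for path in disallowed]
        else:
            lines.append("Allow: /")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical policy: block one AI bot entirely, keep /private/ off limits for all others.
    print(build_robots_txt({"ExampleAIBot": ["/"], "*": ["/private/"]}))
```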

Frozen Light Team Perspective

We believe that rules are rules and should be followed. We could have stopped there, but we wanted to draw your attention to a new legal situation that has arisen from the alleged behavior by Perplexity. (We are not saying they did this, but let's consider the implications if they did.)

We investigated the legal aspects, and here is what we found:

Based on the information available, Cloudflare could potentially sue Perplexity, but the legal landscape for such cases is still developing. Here's a breakdown of the potential legal arguments based on similar lawsuits:

  • Breach of Contract/Terms of Service: Many websites, including those protected by Cloudflare, have terms of service that explicitly prohibit web scraping. If Perplexity's actions are found to violate these terms, it could be a basis for a lawsuit.

  • Copyright Infringement: Cloudflare's customers, who are content creators and publishers, could sue Perplexity for using their copyrighted content without permission. This is the same argument being used by media outlets like The New York Times in their lawsuit against OpenAI.

  • Computer Fraud and Abuse Act (CFAA): This law makes it illegal to access a computer system without authorization. Cloudflare's accusation that Perplexity's "stealth crawlers" impersonate legitimate users and bypass security measures could be interpreted as a violation of this act.

While Cloudflare has not announced a lawsuit against Perplexity, their public statements and technical actions, such as delisting Perplexity as a verified bot and blocking its crawlers, have highlighted the legal and ethical issues at the heart of this conflict.

We are walking into a new landscape where the rules have changed, but enforcement has not. It's like a gentleman's agreement where everyone is asked to play by the rules, but there's no action to enforce them.

As users, we feel we have no say in this, and we have no way to protect ourselves. It sounds shocking, but that is the reality.

Our position is to ask everyone to play fairly. Our investigation shows that what a bot can crawl when it isn't following the rules is more than just public content; sensitive information is also involved. We, as content creators, are the only ones who truly know what is what, and we need everyone to stick to the rules.

Furthermore, we often have no way of knowing if the rules have been broken unless vendors like Cloudflare conduct their own investigations. We hope the legal system will wake up to the world we are all operating in and protect us.


