Update robots and fix typo
@@ -6,7 +6,7 @@ tags: [ "AI", "Machine Learning", "Crawler", "robots.txt", ]
<p class="block">Today's gonna be a bit of a ranter. The whole AI-Crawler situation has gone from bad to awful, potentially ending the web as we know it. AI companies like OpenAI (ChatGPT) and Anthropic (Claude) have gotten so used to stealing that they're no longer afraid to cause extreme costs to others for their own gain. Many of them now employ crawlers that include methods to bypass limits and filters. It needs to stop.</p><!--more-->
-<p class="block">I've personally been hit by OpenAI weeks ago, which managed to <a href="https://x.com/Xaymar/status/1831961709327880250" target="_blank">generate 49 TB of traffic costing me about 13€ for that day</a>, and later on Anthropic tried to do the same based on the access logs. And it seems I'm not the only one, with <a href="https://blog.uberspace.de/2024/08/bad-robots/" target="_blank">Uberspace</a> being hit even worse. <a href="https://x.com/rileyb3d/status/1831375847212904566" target="_blank">rileyb3d appears to have been harassed in a similar way and was forced to take down their own website entirely.</a> And you know it's gotten really bad the CloudFlare out of all things makes their AI-Crawler protection tool available for Free users. They only make things available for Free users that are widespread, so even CloudFlare has had enough now.</p>
+<p class="block">I've personally been hit by OpenAI weeks ago, which managed to <a href="https://x.com/Xaymar/status/1831961709327880250" target="_blank">generate 49 TB of traffic costing me about 13€ for that day</a>, and later on Anthropic tried to do the same based on the access logs. And it seems I'm not the only one, with <a href="https://blog.uberspace.de/2024/08/bad-robots/" target="_blank">Uberspace</a> being hit even worse. <a href="https://x.com/rileyb3d/status/1831375847212904566" target="_blank">rileyb3d appears to have been harassed in a similar way and was forced to take down their own website entirely.</a> And you know it's gotten really bad when CloudFlare out of all things makes their AI-Crawler protection tool available for Free users. They only make things available for Free users that are widespread, so even CloudFlare has had enough now.</p>
<p class="block">This situation has completely spiraled out of control for everyone and I don't see much future in the free web anymore if it continues. AI Companies no longer care about copyright, licensing or similar, and it's only going to get worse until governments wake up. Any work you published is being used to train AI models, no matter if your license allows for it or requires payment. None of them care, and lawsuits are piling up.</p>
+39
@@ -5,3 +5,42 @@ Disallow: /feed.xml
Disallow: /restricted/
Disallow: /404.html
Disallow: /redirects.json
+
+User-agent: AI2Bot
+User-agent: Ai2Bot-Dolma
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: iaskspider/2.0
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
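The added group stacks many `User-agent` lines over a single `Disallow: /`, so every listed crawler shares that one rule. A quick way to sanity-check a file shaped like this is Python's standard `urllib.robotparser`. The sketch below is only an illustration under assumptions: the `User-agent: *` group, the shortened agent list, the `is_allowed` helper, and the `example.com` URLs are placeholders, not taken from the actual file, whose beginning is not visible in this hunk.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated stand-in for the real file. The "User-agent: *" group is an
# assumption; the hunk above only shows the tail of the existing rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot/1.0", "Mozilla/5.0"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/blog/")
        print(f"{agent}: {'allowed' if verdict else 'blocked'}")
```

This only checks what a well-behaved crawler would conclude from the file. robots.txt is advisory, so crawlers that ignore it, as the post complains, still have to be blocked at the server or CDN level.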