Update robots and fix typo

Michael Fabian 'Xaymar' Dirks
2024-09-07 13:45:37 +02:00
parent 59c6086c3c
commit 1be1a7d451
2 changed files with 40 additions and 1 deletions
@@ -6,7 +6,7 @@ tags: [ "AI", "Machine Learning", "Crawler", "robots.txt", ]
<p class="block">Today's gonna be a bit of a ranter. The whole AI-Crawler situation has gone from bad to awful, potentially ending the web as we know it. AI companies like OpenAI (ChatGPT) and Anthropic (Claude) have gotten so used to stealing that they're no longer afraid to cause extreme costs to others for their own gain. Many of them now employ crawlers that include methods to bypass limits and filters. It needs to stop.</p><!--more-->
<p class="block">I've personally been hit by OpenAI weeks ago, which managed to <a href="https://x.com/Xaymar/status/1831961709327880250" target="_blank">generate 49 TB of traffic costing me about 13€ for that day</a>, and later on Anthropic tried to do the same based on the access logs. And it seems I'm not the only one, with <a href="https://blog.uberspace.de/2024/08/bad-robots/" target="_blank">Uberspace</a> being hit even worse. <a href="https://x.com/rileyb3d/status/1831375847212904566" target="_blank">rileyb3d appears to have been harassed in a similar way and was forced to take down their own website entirely.</a> And you know it's gotten really bad the CloudFlare out of all things makes their AI-Crawler protection tool available for Free users. They only make things available for Free users that are widespread, so even CloudFlare has had enough now.</p>
<p class="block">I've personally been hit by OpenAI weeks ago, which managed to <a href="https://x.com/Xaymar/status/1831961709327880250" target="_blank">generate 49 TB of traffic costing me about 13€ for that day</a>, and later on Anthropic tried to do the same based on the access logs. And it seems I'm not the only one, with <a href="https://blog.uberspace.de/2024/08/bad-robots/" target="_blank">Uberspace</a> being hit even worse. <a href="https://x.com/rileyb3d/status/1831375847212904566" target="_blank">rileyb3d appears to have been harassed in a similar way and was forced to take down their own website entirely.</a> And you know it's gotten really bad when CloudFlare out of all things makes their AI-Crawler protection tool available for Free users. They only make things available for Free users that are widespread, so even CloudFlare has had enough now.</p>
<p class="block">This situation has completely spiraled out of control for everyone and I don't see much future in the free web anymore if it continues. AI Companies no longer care about copyright, licensing or similar, and it's only going to get worse until governments wake up. Any work you published is being used to train AI models, no matter if your license allows for it or requires payment. None of them care, and lawsuits are piling up.</p>
+39
@@ -5,3 +5,42 @@ Disallow: /feed.xml
Disallow: /restricted/
Disallow: /404.html
Disallow: /redirects.json
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: Amazonbot
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: GPTBot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: iaskspider/2.0
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: Scrapy
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: Webzio-Extended
User-agent: YouBot
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: facebookexternalhit
User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
Disallow: /
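The grouped block above relies on several `User-agent` lines sharing a single `Disallow: /` rule. A minimal sketch for checking how a compliant parser reads such a group, using Python's standard `urllib.robotparser`. The sample text is an abbreviated stand-in, not the real file: the `User-agent: *` line and the blank line between groups are assumptions (the diff only shows the `Disallow` context lines), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated stand-in for the robots.txt in this commit.
# Assumption: the existing Disallow rules sit under "User-agent: *",
# which the diff context does not show.
ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; consecutive User-agent lines
    # are collected into one group that shares the rules following them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "CCBot", "SomeBrowser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/index.html"))
```

This only shows what a well-behaved parser concludes from the file. robots.txt is advisory, so a crawler that ignores it is not stopped by these rules.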