TL;DR:
- AI models like ChatGPT use vast amounts of text, often without permission.
- The New York Times has sued OpenAI for copyright infringement.
- You can protect your writing by editing your robots.txt file.
The Rise of AI and Its Hunger for Words
Artificial Intelligence (AI) is transforming the world, but it comes with challenges. AI models like ChatGPT require enormous amounts of text to train. For instance, the first version of ChatGPT was trained on about 300 billion words. That’s equivalent to writing a thousand words a day for over 800,000 years!
But where does all this text come from? Often, it’s scraped from the internet without permission, raising serious copyright concerns.
The Case of The New York Times vs. OpenAI
In a high-profile case, The New York Times sued OpenAI, the company behind ChatGPT, for copyright infringement. The lawsuit alleges that OpenAI scraped millions of articles from The New York Times and used them to train its AI models. Sometimes, these models even reproduce chunks of text verbatim.
“OpenAI made $300 million in August and expects to hit $3.7 billion this year.” – The New York Times
This raises a crucial question: How would you feel if AI models were using your writing without permission?
The Looming Content Crisis
AI companies face a potential content crisis. A study by Epoch AI suggests that AI models could run out of human-generated content as early as 2026. This could lead to stagnation, as AI models need fresh content to keep improving.
“The AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.” – Tamay Besiroglu, author of the Epoch AI study
Protecting Your Writing: The robots.txt File
So, how can you protect your writing? The solution lies in a simple text file called robots.txt. This file tells robots (including AI bots) what they can and can’t access on your website.
Here’s how it works:
- User-agent: This is the name of the robot. For example, ‘GPTBot’ for ChatGPT.
- Disallow: This tells the bot which paths it must not crawl.
- The slash (/): On its own, the slash means the entire website.
So, if you want to block ChatGPT from accessing your writing, you would add this to your robots.txt file:
User-agent: GPTBot
Disallow: /
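If you want to sanity-check how a crawler will interpret those two lines, Python’s built-in urllib.robotparser module applies the same matching logic that well-behaved bots use. Here’s a minimal sketch; the example.com URL is just a placeholder for your own site:

from urllib.robotparser import RobotFileParser

# The two-line rule from above: block OpenAI's GPTBot everywhere.
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() answers: may this user agent fetch this URL?
print(parser.can_fetch("GPTBot", "https://example.com/my-post/"))        # False: blocked
print(parser.can_fetch("SomeOtherBot", "https://example.com/my-post/"))  # True: unaffected

If the first call prints False, the rule is written correctly.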
How to Edit Your robots.txt File
If you have your own website, you can edit the robots.txt file to block AI bots.
Here’s how:
- Using the Yoast SEO plugin: Go to Yoast > Tools > File Editor.
- Using FTP access: The robots.txt file lives in your site’s root directory. Download it, edit it, and upload it back.
- Using the WP Robots Txt plugin: This is a simple, non-technical option. Go to Plugins > Add New, search for ‘WP Robots Txt’, then install and activate it.
Once you’re in the robots.txt file, copy and paste the following to block common AI bots:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
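Once you’ve saved the file, it’s worth confirming the rules are actually live on your site. The sketch below (again using Python’s built-in urllib.robotparser; replace example.com with your own domain) downloads your published robots.txt and reports which of the bots listed above are blocked:

from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder: swap in your own domain

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended",
    "Omgilibot", "ClaudeBot", "Claude-Web",
]

parser = RobotFileParser()
parser.set_url(SITE + "/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in AI_BOTS:
    allowed = parser.can_fetch(bot, SITE + "/")
    print(bot, "-", "still allowed" if allowed else "blocked")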
The Common Crawl Dilemma
Common Crawl is a non-profit organisation that maintains a huge, freely available archive of web pages for research and analysis. Unfortunately, OpenAI used Common Crawl data to train its AI models. If you want to block Common Crawl’s crawler, add this to your robots.txt file:
User-agent: CCBot
Disallow: /
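If you’re using the verification sketch from the previous section, add ‘CCBot’ to its AI_BOTS list and rerun it to confirm the Common Crawl rule is live too.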
The Future of AI and Copyright Law
The future of AI and copyright law is uncertain. Until the laws change, the best way to protect your writing is to block AI bots using the robots.txt file.
“Until they change copyright and intellectual property laws and hand the rights to whoever has the most money, your words are yours.”
Comment and Share:
How do you feel about AI models using your writing without permission? Have you checked your robots.txt file? Share your thoughts and experiences below. And don’t forget to subscribe for updates on AI and AGI developments!