AI in ASIA

Protect Your Writing from AI Bots: A Simple Guide

This article explains how to protect your writing from AI bots using the robots.txt file, and discusses the copyright issues surrounding AI models.

Intelligence Desk · 4 min read

AI Snapshot

The TL;DR: what matters, fast.

AI models require vast amounts of text for training, often scraping content without permission, as highlighted by The New York Times' lawsuit against OpenAI.

AI companies face a potential content crisis as human-generated text could run out by 2026, hindering further AI development.

Protect your writing by using the robots.txt file to block AI bots and web crawlers like Common Crawl from accessing your website's content.

Who should pay attention: Writers | Publishers | AI developers | Copyright lawyers

What changes next: The legal landscape around AI training data and copyright is likely to evolve rapidly.

- AI models like ChatGPT use vast amounts of text, often without permission.
- The New York Times has sued OpenAI for copyright infringement.
- You can protect your writing by editing your robots.txt file.

The Rise of AI and Its Hunger for Words

Artificial Intelligence (AI) is transforming the world, but it comes with challenges. AI models like ChatGPT require enormous amounts of text to train. For instance, the first version of ChatGPT was trained on about 300 billion words. That's equivalent to writing a thousand words a day for over 800,000 years!
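That back-of-the-envelope figure checks out, as a couple of lines of Python show (the 300 billion word count comes from the article above; the rest is simple arithmetic):

```python
# 300 billion training words, written by hand at 1,000 words per day
words = 300_000_000_000
words_per_day = 1_000

days = words / words_per_day   # 300 million days of writing
years = days / 365             # convert days to years

print(f"{years:,.0f} years")   # roughly 821,918 years
```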

But where does all this text come from? Often, it's scraped from the internet without permission, raising serious copyright concerns.

The Case of The New York Times vs. OpenAI

In a high-profile case, The New York Times sued OpenAI, the company behind ChatGPT, for copyright infringement. The lawsuit alleges that OpenAI scraped millions of articles from The New York Times and used them to train its AI models. Sometimes, these models even reproduce chunks of text verbatim.

"OpenAI made three hundred million in August and expects to hit $3.7 billion this year." - The New York Times

This raises a crucial question: How would you feel if AI models were using your writing without permission?

The Looming Content Crisis

AI companies face a potential content crisis. A study by Epoch AI suggests that AI models could run out of human-generated content as early as 2026. This could lead to stagnation, as AI models need fresh content to keep improving.

"The AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing." - Tamay Besiroglu, author of the Epoch AI study

Protecting Your Writing: The robots.txt File

So, how can you protect your writing? The solution lies in a simple text file called robots.txt. This file tells robots (including AI bots) what they can and can't access on your website.

Here's how it works:

- User-agent: the name of the robot. For example, 'GPTBot' is ChatGPT's crawler.
- Disallow: tells the robot which paths it may not access.
- The slash (/): on its own, this means the entire website.

So, if you want to block ChatGPT from accessing your writing, you would add this to your robots.txt file:

User-agent: GPTBot
Disallow: /
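Before publishing a rule like this, you can sanity-check it with Python's built-in robots.txt parser. In this sketch, example.com and the article path are placeholders for your own site:

```python
from urllib import robotparser

# Hypothetical robots.txt content for your site
rules = """\
User-agent: GPTBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked from every path...
print(rp.can_fetch("GPTBot", "https://example.com/my-article"))    # False
# ...but crawlers not named in the file are unaffected by this rule.
print(rp.can_fetch("Googlebot", "https://example.com/my-article")) # True
```

Note that robots.txt is advisory: well-behaved crawlers honour it, but nothing technically forces a bot to comply.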

How to Edit Your robots.txt File

If you have your own website, you can edit the robots.txt file to block AI bots.

Here's how:

- Using the Yoast SEO plugin: go to Yoast > Tools > File Editor.
- Using FTP access: the robots.txt file sits in your site's root directory.
- Using the WP Robots Txt plugin: a simple, non-technical option. Go to Plugins > Add New, search for 'WP Robots Txt' and click Install.

Once you're in the robots.txt file, copy and paste the following to block common AI bots:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /
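If you want to confirm the whole block behaves as expected, the same standard-library parser can read all the rules at once and report which crawlers are shut out. Again, example.com stands in for your own site:

```python
from urllib import robotparser

ai_bots = ["GPTBot", "ChatGPT-User", "Google-Extended",
           "Omgilibot", "ClaudeBot", "Claude-Web"]

# Build the combined robots.txt from the list of AI user-agents
rules = "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in ai_bots)

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for bot in ai_bots:
    blocked = not rp.can_fetch(bot, "https://example.com/post")
    print(f"{bot}: {'blocked' if blocked else 'allowed'}")

# Ordinary search crawlers keep their access:
print(rp.can_fetch("Googlebot", "https://example.com/post"))  # True
```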

The Common Crawl Dilemma

Common Crawl is a non-profit organisation that creates a copy of the internet for research and analysis. Unfortunately, OpenAI used Common Crawl data to train its AI models. If you want to block Common Crawl, add this to your robots.txt file:

User-agent: CCBot
Disallow: /

The Future of AI and Copyright Law

The future of AI and copyright law is uncertain. Until the laws change, the best way to protect your writing is to block AI bots using the robots.txt file.

"Until they change copyright laws and intellectual property laws and give the rights to he with the most money — your words are yours."

Comment and Share:

How do you feel about AI models using your writing without permission? Have you checked your robots.txt file? Share your thoughts and experiences below. And don't forget to Subscribe to our newsletter for updates on AI and AGI developments!
