How to Identify, Disable and Prevent Scrapers from Thieving Your Blogs’ Content

The first time I ever saw one of my blog posts on another site I was enraged. “They stole my content!” It was verbatim. They could have at least changed the wording a bit. Then I realized that they were tapping into my RSS feed to steal it, strip it of links, and post it without credit.

Unless you religiously monitor your analytics and server logs, you may not even know that your content has been stolen. I found someone copying my content when I viewed trackbacks on another blog, supposedly linking to mine but actually linking to the thief’s website. If you have an active blogging community that is cranking out some high quality content, you need to be able to give your bloggers some protection against scrapers. Here’s how:

How To Identify Scraper Sites:

  • No RSS feed available
  • Many quality posts that contain no links
  • Many quality posts but very low subscriber count
  • Great content but with zero comments on any posts
  • Lots of good content but with lots of Adsense or other ads
  • No “About” page or business information
  • And the number one brain-dead giveaway: no contact form or email address

 (Source for list above: Perishable Press)

How to Diagnose the Scraper’s Source:

Monitor your incoming links.

In your WordPress dashboard you will see a section for “Incoming Links.” This is probably the easiest way to track down content thieves.

Monitor your bandwidth.

Log analysis software is imperative if you want to find the source of some of your most persistent bots. If you have cPanel, you have Analog, AWStats, and Webalizer available to help you identify the evil bots crawling your server. Dig into the data and find out their IP addresses and user agents. Then you can block them in your .htaccess file.
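
If you would rather not click through stats pages, a short script can pull the heaviest hitters straight out of a raw access log. This is only a rough sketch: it assumes your server writes Apache's "combined" log format, and the function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches Apache "combined" log lines:
# IP, identity, user, [time], "request", status, bytes, "referer", "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, limit=10):
    """Return the most frequent (ip, user_agent) pairs with their hit counts."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that are not in combined format are skipped
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample lines; in practice pass an open log file instead.
    sample = [
        '203.0.113.5 - - [10/Oct/2010:13:55:36 -0700] "GET /feed/ HTTP/1.1" 200 2326 "-" "FeedGrabber/1.0"',
        '203.0.113.5 - - [10/Oct/2010:13:56:02 -0700] "GET /feed/ HTTP/1.1" 200 2326 "-" "FeedGrabber/1.0"',
        '198.51.100.7 - - [10/Oct/2010:13:57:00 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    for (ip, agent), count in top_clients(sample):
        print(f"{count:6d}  {ip}  {agent}")
```

An IP that hammers your feed far more often than any real reader would is a good candidate for a closer look.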

Monitor your trackbacks.

Scrapers will often send a “ping” to your site for a trackback. You will generally be notified in the same way as if it were a comment. Check out their content, verify the theft, and then grab their IP address and block it in order to prevent them from returning to your site.
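
Verifying the theft by eye is fine for one post, but if you have several suspects a quick similarity score can help. The sketch below compares two pieces of text word by word with Python's standard `difflib`; the function name is hypothetical, and what counts as "stolen" is your call, not something the score decides for you.

```python
from difflib import SequenceMatcher


def overlap_ratio(original, suspect):
    """Return a similarity score between 0.0 and 1.0 for two pieces of text."""
    # Compare lowercased words so that changes in case or spacing
    # do not hide a verbatim copy.
    a = original.lower().split()
    b = suspect.lower().split()
    # autojunk=False keeps common words from being ignored in long posts.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    mine = "The first time I ever saw one of my blogs on another site I was enraged."
    theirs = "The first time I ever saw one of my blogs on another site   I was ENRAGED."
    print(overlap_ratio(mine, theirs))
```

Paste in the plain text of your post and of the suspect page. A score close to 1.0 means the wording is nearly identical.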

How to Disable and Prevent Scrapers

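Find the IP address of your scraper and then deny it access in your .htaccess file:

# Apache 2.2 syntax; on Apache 2.4 these directives need mod_access_compat.
# Swap in the real IP addresses of the scrapers you want to block.
order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all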
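
If your block list keeps growing, you may prefer to generate those lines rather than type them. Here is a minimal sketch using Python's standard `ipaddress` module; the function name is my own invention. It rejects typos and skips private, loopback, and multicast addresses, which normally never belong to a remote scraper. The three addresses in the example above fall into those ranges, so treat them as placeholders.

```python
import ipaddress


def deny_block(addresses):
    """Build an Apache 2.2-style deny block from a list of IP strings."""
    lines = ["order allow,deny"]
    for raw in addresses:
        # ip_address() raises ValueError on anything that is not a valid IP.
        ip = ipaddress.ip_address(raw.strip())
        if ip.is_private or ip.is_loopback or ip.is_multicast:
            # Not a plausible public visitor address, so leave it out.
            continue
        lines.append(f"deny from {ip}")
    lines.append("allow from all")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical input: in practice, read one address per line from a file.
    print(deny_block(["192.168.44.201", "224.39.163.12", "172.16.7.92"]))
```

Paste the output into your .htaccess file in place of the hand-written list.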

Crack down on hotlinking:

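RewriteEngine on
# Match http or https referers from the offending domain and its subdomains.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?lame-bandwidth-theft\.com [NC]
# The substitution must be a plain hyphen ("-"); [F] returns 403 Forbidden.
RewriteRule .* - [F]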
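
A typo in a rewrite pattern can block nobody, or everybody, so it is worth trying the pattern on a few sample referers before you upload the file. The sketch below does that in Python with a slightly tightened version of the pattern (dots escaped, https allowed, subdomains optional). Python's `re` is not Apache's regex engine, but for a pattern this simple I would expect the two to agree. The function name and sample URLs are illustrative.

```python
import re

# Python rendering of the RewriteCond pattern; [NC] maps to re.IGNORECASE.
HOTLINK = re.compile(
    r"^https?://([^/]+\.)?lame-bandwidth-theft\.com", re.IGNORECASE
)


def is_blocked(referer):
    """Return True if a request with this Referer header would match the rule."""
    return HOTLINK.search(referer) is not None


if __name__ == "__main__":
    for referer in (
        "http://www.lame-bandwidth-theft.com/stolen-post/",
        "https://lame-bandwidth-theft.com/",
        "http://example.com/?ref=lame-bandwidth-theft.com",
    ):
        print(is_blocked(referer), referer)
```

The third sample is the interesting one: an innocent site that merely mentions the domain in its URL should not be blocked.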

Use partial feeds.

This may not be desirable to you but I guarantee you that you’ll attract fewer scrapers with content excerpts or titles only. This is one measure that will seriously curb the amount of content being scraped from your site.

Contact the Scraper Directly

This might seem ridiculous but if you ask them to remove your content, they may comply. If they do not, there are places where you can report their thieving. File a formal DMCA notice with each of the major search engines.

Add Copyright Information to Your RSS Footer

Make sure that anyone who reads your content knows the original source. You can customize your RSS footer using this WordPress plugin.
