A commenter on the recent post of mine regarding Bitacle’s insults wrote in to stand up for Bitacle saying, in part:
“Google, yahoo, and technorati scrape other people’s content every day (which I believe, there was a US court case that google won). They also display advertisements. Just like bitacle….”
“If you want your content off of bitacle, it also needs to be taken off of every other search engine that is caching it.”
While the commenter, who used the name Ricardo Sanborn, is correct that Google and other search engines do many of the same things as Bitacle, he, and others like him, are wrong to equate the two.
There’s a world of difference both legally and ethically between Bitacle (as well as other scraper sites) and the legitimate search engines. All one has to do is stop making excuses and start looking to see the difference.
Six Points of Distinction
As I said in my reply, there are at least six points of distinction between Bitacle and the legitimate search engines:
- Lack of Opt Out – Bitacle completely ignores robots.txt files, disregards meta tags and offers no means to opt out of the site. Though Bitacle claims there isn’t any “norm that forces (them) to obey the robots.txt”, the presence of an opt-out mechanism was critical to Google, and other search engines, in having their cache judged to be fair use (PDF, see pages 20 and 21).*
- Displays Full Content – Though major search engines cache Web pages, they do not display the full content of the sites they index on their own result pages. They display, at the most, small snippets of content. Also, search engine caches display the content in its original context, capturing all images, licenses and attribution, whereas Bitacle merely scrapes the content and formats it for its own site.
- Destination, Not Direction – Major search engines exist to direct users to the sites they want to see, not to be end destinations. Bitacle’s “aggregates” feature not only displays the full content of every post, but also offers a Digg feature and a comment form. Users have almost no motivation to leave Bitacle’s version and visit the original site. These are clear signs that Bitacle’s goal is not to direct users to the sites it scrapes, but to keep the traffic (and money) for itself. This means that Bitacle’s use is not transformative (where the use of the copy is different than the original intent) and thus almost certainly not fair use (see above PDF, pages 14-16).
- All About the Benjamins – Until Bitacle’s Adsense account was forcibly shut down, Bitacle was displaying ads next to the scraped content and, at last check, was still attempting to do so (leaving the Adsense block intact). Commercial use is heavily frowned upon in fair use arguments, and profiting directly from someone else’s material without permission is generally not considered fair use or fair dealing (see above PDF, pages 16 and 17).
- Bitacle’s Past – When Bitacle first started gaining attention, it provided no attribution to the original author of the posts and relicensed every post it scraped under a new Creative Commons license, regardless of how it was licensed on the original site. Though that behavior has stopped, it shows the lack of consideration that Bitacle has for bloggers at large.
- The Spam Factor – Finally, where most search engines do not allow other sites to index their cached copies, Bitacle encourages others to do so by automatically adding search-engine friendly metadata to every post they scrape. They have hundreds of thousands of pages indexed in Google, most of them from the aggregates section. Bitacle may not be the largest search engine spam operation, but it is definitely one of the most dangerous to copyright holders.
All in all, Bitacle is light years apart from Google and the other search engines in terms of both law and ethics. Any comparison between the two is flawed.
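For readers curious what honoring an opt out actually involves, here is a minimal sketch of the two checks a well-behaved crawler might run before caching a page: one against robots.txt and one against the page's robots meta tag. The function names, the "ExampleBot" user agent and the example.com URLs are all hypothetical, chosen only for illustration; this is not how Google or any other search engine implements it, just the general idea using Python's standard library.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_cache(html: str) -> bool:
    """Return False if the page asks not to be archived.

    Assumption for this sketch: only the "noarchive" directive is
    treated as an opt out from caching.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


# Hypothetical robots.txt and page, for illustration only.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noarchive, follow"></head></html>'

print(allowed_by_robots(robots_txt, "ExampleBot", "https://example.com/blog/post"))
print(allowed_by_robots(robots_txt, "ExampleBot", "https://example.com/private/post"))
print(may_cache(page))
```

A crawler that skips both checks, as Bitacle is described as doing in the first point, leaves site owners with no technical way to say no.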
A Change of Venue
Astute readers will quickly point out that all of the laws I have cited are American and that Bitacle is located in Spain. However, it is unlikely that Bitacle would find a much friendlier audience in its home country.
Spain is part of the European Union (E.U.) and the E.U. is where Google News was successfully sued for copyright infringement by Belgian newspapers. Though the case leaves many unanswered questions, it is clear that European courts are no friends to search engines, even ones that do actually offer an opt out.
While it remains to be seen how a Spanish court would react to Bitacle, E.U. copyright law is notoriously strict and would be unlikely to favor Bitacle.
In short, if Bitacle cannot meet the standards of U.S. law, it will almost certainly fail the standards of E.U. law.
The law has made it clear that caching, for certain purposes, is very much legal and acceptable. However, Bitacle caches both the wrong way and for the wrong reasons. It is a violation of copyright law and ethically divorced from the search services it tries to emulate.
Though Bitacle apologists may try to make excuses and attempt to label those who are upset with Bitacle as hypocrites, it is Bitacle itself that is in the moral dilemma.
There is little question as to Bitacle’s legal and ethical standing, even if some people don’t want that to be the case.
I intended posting on my blog about this issue yesterday anyway, but something rather serendipitous occurred to make my post even sweeter. I love a touch of irony in the mornings…
On my previous blog post I received a little comment from a lovely visitor called Anonymous – in Spanish:
Pero que puta eres, y tu hijo es un hijo de puta bastardo de mala madre.
The lovely Lioness happened to be online when I received this in my inbox this morning and reluctantly translated it for me:
“What a whore you are and your son is a son of a whore, son of a bad mother.”
It took about 10 seconds for the penny to drop. Silly me for sticking my head up at Bitacle the previous day.
Really guys, if you’re going to centre your entire website around stealing other people’s content, allow people to leave comments and then spam me with hateful abuse the day after I dare to point out ON MY STOLEN BLOG that I don’t really appreciate the fact that YOU HAVE STOLEN MY ENTIRE BLOG, then have the sense to cover your tracks when you visit my actual blog. If you have the technology to block my IP address surely you must realise I have the technology to trace who was online at the time the comment was left. Bitacle, you scum-sucking bastards, you may have blocked my IP address so I can no longer access your website from home, but you can’t block every IP in the world.
It seems they may have also disabled the comments feature now so people can’t leave comments on their website thinking it is the actual blog. Since I can’t access the website I had someone else do it for me. No comments feature anymore…gee wonder why???
The most disturbing thing about this whole Bitacle debacle is not the mere displaying of copyrighted material. It is that we cannot now protect our privacy. Feedburner, Technorati, etc, they can’t provide you with a page that no longer exists. Bitacle keeps your information. Keeps it. If you delete your blog and your photos having decided that your privacy has been compromised…too bad. Your blog may no longer exist but every word you ever wrote and every image you ever uploaded is right there on their server for anyone to search.
Next time they want to abuse me for asserting my legal rights I hope they get their facts right: I never charge for sex.