September 09, 2003
Seven quick tips for a spam-free blog

Blog comment spam, while certainly not a pest on the scale of its email equivalent, has still made enough of a presence felt for it to be considered a threat. Nobody wants to spend two hours a day cleaning out penis-enlargement ads from their blogs, so the blogosphere's brightest stars have been mulling ways to knock it on the head before it becomes a major problem. Blacklists have been proposed, along with CAPTCHAs, comment throttling, authentication checks and others.

Thing is, I think spam prevention is simpler than that. Much simpler. Read on for a few quick solutions which MovableType users can implement right now to stave off all but the most persistent spammers.

The reason I consider this problem trivial is that we already have a small but significant advantage over the blog spammer, which is that blog spamming is an order of magnitude harder than email spamming. If a spammer wants to get an ad in your inbox, all he needs is your email address (and, admittedly, an open relay through which to spam). If a spammer wants to get an ad on your blog, there's rather more work involved: he needs to visit your blog and scrape pages to work out where your comment script lives, then submit a POST to it, all the while assuming that this comment script works the way he expects. Of course, when I say "he", I mean his blog-spidering robot, since it'd be pretty painful to do this kind of thing manually (but many seem to, as I'll discuss later). In making such robots, spammers work on the assumption that the majority of Movable Type blogs will still have the vital components intact from the base install, and referenced in all the standard places in the MT templates. It's this assumption that we can use to trip them up.

Before we get going, it may be worth checking out Mark Pilgrim's overview, comprehensive as ever. Actually, since I promised quickness, I'll summarise the summary: As well as the review of existing anti-spam ideas, Mark discusses "Club" solutions versus "Lojack" solutions (both of which are car-theft prevention devices). A "Club" solution is one which blocks perpetration of the crime (with a big lock on the steering wheel). These are effective because though defeating them is far from impossible, it's just not worth it, given all the unprotected cars around that make for quicker thefts. A "Lojack" solution, on the other hand, is invisible to the thief until after the theft, when it alerts the police to the car's location. Lojacks not only make recovery quick and easy, but deter criminals because they have no idea which cars are equipped with them. Of the solutions presented here, most are Clubs, with a couple of much-wanted Lojacks at the end.

Notes And Warnings: Firstly, be careful with these tweaks: As well as the usual advice to back up everything, be aware that several of these fixes to the core of MT require changes to templates as well, and that means changes to all the blogs on the current installation. Also, there are quite a few different tips here. You don't need to use all of them, especially since some make others redundant. Start with the easiest ones, and if they don't keep the spammers away, keep adding (I'll list some good combinations at the end of this piece). Also note that I'm not saying that these tips will fend off all robots, but they'll certainly make your site less susceptible.

Tip 1: Rename your comment script

The reasoning: The quickest and easiest way to discover the URL of the comment script is just to search the page for mt-comments.cgi, which is its default name.

The fix: As well as giving the mt-comments.cgi file a new name (something relatively random, though make sure you keep the .cgi suffix) you'll need to edit the CommentScript setting in your mt.cfg file. The best way to do this so as to ensure uninterrupted service is to copy (not rename) the script to the new filename, then edit the config, then delete the old script.

Unfortunately, it won't be quite that easy for most MT users, as they're still using templates based on older versions that didn't use the <$MTCommentScript$> tag to get the location of the script, and have mt-comments.cgi hard-coded instead. It's worth the time to go through your templates and do some search-and-replaces for this.
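
If you can get at your templates as plain text, the search-and-replace can be scripted. Here's a minimal sketch in Perl, assuming the hard-coded name appears literally as mt-comments.cgi. The sub name use_comment_script_tag and the sample form line are made up for illustration and aren't part of MT:

```perl
use strict;
use warnings;

# Hypothetical helper: swap the hard-coded script name for the template tag,
# so the template follows whatever CommentScript is set to in mt.cfg.
sub use_comment_script_tag {
    my ($template) = @_;
    $template =~ s/mt-comments\.cgi/<\$MTCommentScript\$>/g;
    return $template;
}

# Illustrative template fragment, not copied from a real MT template.
my $old = '<form method="post" action="<$MTCGIPath$>mt-comments.cgi">';
print use_comment_script_tag($old), "\n";
```

Once a template goes through the tag, any later rename only needs the mt.cfg change.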

Tip 2: Don't link to the comment script on your front page

The reasoning: By default, the MT front page template links to the comment script with the text Comments (n), which is also pretty easy to scan for.

The fix: You can change the link text, but I prefer a different tack - instead of linking to the comment script, link to the comments section of the individual entry, like so:

<a href="<$MTEntryPermalink$>#comments">Comments (<$MTEntryCommentCount$>)</a>

Not only does it get the comments script URL off the front page, it's also a much nicer way of getting to entry comments that doesn't involve pop-up windows.

Tip 3: Include several decoy forms in the Individual Entry template

The reasoning: Now that we've removed all the easy links to the comment script from the site's HTML, a spam robot will have to scan for something that looks like the comment posting form and pull the script URL out of the form's action attribute.

The fix: Dot some random decoy <form>s around the place, with meaningless actions, <input>s that are visible to the robot (to make them look real) but hidden to the user, etc.
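
One possible shape for such a decoy, sketched in Perl. Treat it as an illustration rather than tested anti-spam code: the sub name decoy_form, the /cgi-bin/ path and the field names are all invented, and hiding the form with inline CSS is just one way of keeping it away from human visitors:

```perl
use strict;
use warnings;

# Build one decoy form. The action is a random, meaningless .cgi name.
# The form as a whole is hidden from people with inline CSS, but its inputs
# are ordinary fields, so a scraper sees something that looks real.
sub decoy_form {
    my @chars = ('a' .. 'z', 0 .. 9);
    my $name  = join '', map { $chars[ int rand @chars ] } 1 .. 10;
    return join "\n",
        qq{<form method="post" action="/cgi-bin/$name.cgi" style="display:none">},
        qq{  <input type="text" name="author" />},
        qq{  <input type="text" name="email" />},
        qq{  <textarea name="text"></textarea>},
        qq{  <input type="submit" name="post" value="Post" />},
        qq{</form>};
}

print decoy_form(), "\n";
```

Run it a few times and paste the output at different points in the Individual Entry template, so the real form isn't reliably the first or last one on the page.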

Tip 4: Require a hidden variable for the comment script

The reasoning: The second line of defence: the URL of the script is known by the robot, so we have to use protection in the script itself. Fortunately, most robots will have a fixed idea of how the MT comment script works.

The fix: Shelley's already done the work on this one.
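
The general idea can be sketched as follows. This is not Shelley's code, just a minimal illustration under assumptions: the field name blog_check, the value tangerine and the sub has_hidden_token are invented, and the submitted form fields are assumed to have been parsed into a hash already:

```perl
use strict;
use warnings;

# Both values are made up for this sketch. Pick your own and put a matching
#   <input type="hidden" name="blog_check" value="tangerine" />
# inside the real comment form.
my $FIELD    = 'blog_check';
my $EXPECTED = 'tangerine';

# Return true only if the submitted parameters carry the hidden value.
sub has_hidden_token {
    my ($params) = @_;
    return defined $params->{$FIELD} && $params->{$FIELD} eq $EXPECTED;
}

# A robot posting the stock MT fields, then a post from the real form.
print has_hidden_token({ author => 'bot', text => 'buy pills' })
    ? "accept\n" : "reject\n";
print has_hidden_token({ author => 'yoz', text => 'hi', blog_check => 'tangerine' })
    ? "accept\n" : "reject\n";
```

A robot that replays the default field list fails the check; one that scrapes every field out of the real form will still get through, which is where tip 5 comes in.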

Tip 5: Separate "Preview" and "Post" into two separate scripts

The reasoning: Even if robots are able to parse the page well enough to detect the hidden variables they need, they'll still be assuming that posting to the script creates a posted comment, and detecting otherwise is much harder.

The fix: Phil got halfway there by removing the "Post" button from his Individual Entry template. Trouble is, that only forces humans to preview, and not bots. To achieve this, we need to dive into the code.

Firstly, copy your comment script to another file, named whatever you like, but ensuring it has the suffix .cgi and execution permissions. This new file's going to be the comment-posting script from now on. Then edit the original comment script to add the middle line below (the first and last lines are already in the script, and are shown only so you can find the right spot):

local $SIG{__WARN__} = sub { $app->trace($_[0]) };
$app->add_methods( post => \&MT::App::Comments::preview );
$app->run;

(What this fix does is replace the script's ability to post with its existing ability to preview.)

Next, edit your Comment Preview Template, and replace the reference to the existing comment script (or the <$MTCommentScript$> tag) with the name of the new posting script. Oh, and you'll also want to tweak your Individual Entry archive page to take the "Post" button out, along with some warning that all posts go through a preview process.

Tip 6: Include a "Delete this post" link in notification mail

The reasoning: All the previous tips have been "Club" solutions - this one's a "Lojack", and particularly useful when most of your comment spam seems to come from people doing manual entry rather than robots. One of the reasons that comment spam is such a pain is that it takes several clicks through MT's interface to get rid of a single post. If you could kill a spam easily as soon as it appeared, its effectiveness would be reduced dramatically, hopefully deterring spammers entirely.

The fix: This is another Perl insert, this time into the file lib/MT/App/Comments.pm (about line 150). The three-line $body .= statement is the addition; the lines around it are already in the file:

$Text::Wrap::cols = 72;
$body = Text::Wrap::wrap('', '', $body) . "\n$link_url\n\n" .
   $app->translate('IP Address:') . ' ' . $comment->ip . "\n" .
   $app->translate('Name:') . ' ' . $comment->author . "\n" .
   $app->translate('Email Address:') . ' ' . $comment->email . "\n" .
   $app->translate('URL:') . ' ' . $comment->url . "\n\n" .
   $app->translate('Comments:') . "\n\n" . $comment->text . "\n";
$body .= "\nTo delete this comment, click this link:\n".
   $app->{cfg}->CGIPath . "mt.cgi?__mode=delete_confirm&" .
   "_type=comment&id=".$comment->id ."&blog_id=" . $blog->id ."\n";

MT::Mail->send(\%head, $body);

It inserts a link into the mail that, when followed, jumps straight to the "Delete comment? [Yes/No]" page in MT (though, irritatingly, this page will close the browser window after you've hit the button, so you'll want to ensure the page appears in a spare/new window when you click on the link).

Tip 7: What to do if you're Six Apart (or another blog-tool producer)

  1. Include tip 6 in the next release, please, because I'm not the only one who wants it.
  2. Separate comment-posting and comment-preview into two separate scripts, as in tip 5.
  3. More generally, include more configuration and randomisation into key parts of the install process, so that every install is not identical and cannot be gamed that way. (More on this later.)
  4. Improve the comment-deletion interfaces so that bulk deletion's easier (e.g. delete all comments matching a particular IP or body regexp). Also, IP banning on its own isn't nearly enough - we'd like banning by regexp, please. (All the manual spams I've had so far have been for zipcode sites or DVD rental)
  5. Oh, and Ben 'n' Mena keychain dolls. The blog world will go nuts for 'em. Trust me on this.
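
To make item 4 concrete, here's a small sketch of the selection step for bulk deletion. The comments are plain hashes invented for the example rather than real MT objects, and matching_comments is a hypothetical name. Actually deleting the matches is left out:

```perl
use strict;
use warnings;

# Pick out comments whose IP equals the given ip, or whose body matches the
# given pattern. Either rule may be omitted.
sub matching_comments {
    my ($comments, %rule) = @_;
    return grep {
        (defined $rule{ip}      && $_->{ip}   eq $rule{ip})
     || (defined $rule{pattern} && $_->{text} =~ $rule{pattern})
    } @$comments;
}

# Sample data only; the addresses are from a documentation-reserved range.
my @comments = (
    { id => 1, ip => '192.0.2.10', text => 'Great tips, thanks!' },
    { id => 2, ip => '192.0.2.77', text => 'Find any zipcode here' },
    { id => 3, ip => '192.0.2.77', text => 'Nice blog' },
    { id => 4, ip => '192.0.2.20', text => 'Cheap DVD rental online' },
);

my @doomed = matching_comments(\@comments,
    ip => '192.0.2.77', pattern => qr/zipcode|dvd rental/i);
print join(',', map { $_->{id} } @doomed), "\n";
```

Showing the matched list for confirmation before deleting anything would be a sensible default, since a sloppy regexp can catch legitimate comments too.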

How to use the tips

As I said before, you don't need all of them. Personally, I'm only using tips 2 and 6, for two reasons: Firstly, the majority of the spam that I get is manually-entered (though I know that plenty of other blogs get hassled by robots) and there isn't that much of it, so I'll save the other tips for later. Secondly, my MT install is home to many blogs, and I haven't the time yet to write the script that goes through and fixes all the templates (though when I do, I promise to put it here). When I have the chance, I'll probably implement 1, 4 and 5 as well.

Why I've kept things simple

As I said earlier, I think the talk of complex systems like centralised MD5 hash databases is overkill - just because it's necessary for email spam, it doesn't mean comment spam needs such heavy artillery. If we use simple tricks to keep blog configurations sufficiently varied, we'll show potential spammers that trying to hijack our blogs just isn't worth their effort, because the amount of work it'll take to win this war is considerably smaller on our side of the fence.

Variance is the key here. The current spate of Windows viruses is effective because most Windows installations work the same way - the virus rarely has to investigate possible differences that would break it. If we consider a spamming robot's base assumptions about our blogs, it only takes a small amount of work to defeat them. Sure, it's still possible to write a robot that would defeat all of the tips above (well, the first five, anyway), but the complexity of the code would be fairly nasty, and it would still only take another few small changes to break it.

At present, though, there are enough vanilla MT installs out there that blog spam robots will still work most of the time. To defeat them at the roots, the installation routine has to introduce enough variance that building a spam robot is a non-trivial task, and also provide maintenance systems to regularly churn that variance.
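
As a tiny illustration of install-time variance, an installer could pick the comment script's name at random. The sketch below is hypothetical (random_script_name, the c- prefix and the length are arbitrary choices) and assumes the CommentScript setting sits on its own "Name value" line in mt.cfg:

```perl
use strict;
use warnings;

# Generate a per-install script name: "c-" plus ten random characters.
# rand() is not cryptographically strong, but the goal here is only that
# no two installs share a name, not that the name is secret.
sub random_script_name {
    my @chars = ('a' .. 'z', 0 .. 9);
    my $id    = join '', map { $chars[ int rand @chars ] } 1 .. 10;
    return "c-$id.cgi";
}

my $name = random_script_name();
print "CommentScript $name\n";   # the line that would go into mt.cfg
```

Churning the variance would then mean generating a fresh name, copying the script and updating the setting on a schedule.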

Posted by Yoz at September 09, 2003 02:00 PM
Comments

Great stuff Yoz, I'll be implementing some of these tips pretty soon I think, even though I only get a little bit of spam (already using Tip 2).

For anyone wanting to use Tip 1 on an older installation of MT with hardcoded mt-comments.cgi, you can use this to find and replace that string on all cgi scripts in the current directory (it will create backups of any files it changes):

perl -p -i.bak -e 's/mt-comments\.cgi/new-filename.cgi/g' *.cgi

Posted by: Ben on September 9, 2003 02:30 PM

Actually, I have nothing of value to offer; just thought I'd submit another comment to boost your spirits in light of your paltry 300 count.

Posted by: Uninvisible on September 9, 2003 08:20 PM

I suppose by keychain you mean some type of central "remember me" database. I've been thinking about that a lot because I am so sick of punching in my info on a new blog I visit. I want to be able to go to a form and click "Get my info." without typing a thing.

Thanks for all the great tips, I'm going to implement them on my blog!

Posted by: eliot on September 10, 2003 02:03 AM

No, by "keychain" I mean "a little thing you carry around that groups your keys together". My girlfriend has a cute Ozzy Osbourne doll on hers.

(But you have a good idea there, nonetheless.)

Posted by: Yoz on September 10, 2003 10:13 AM

Talking of spam, I had a really interesting ride to work today, with the conversation turning to the future of telemarketing and spam.

One of my co-travellers was relating how his university's student union had given his information to telemarketers to canvass donations. He then recalled that he had never joined the student union! He came to the conclusion that his university had sold certain information on him to "interested" third parties.

The conversation then changed paths to his present situation. He is studying for a doctorate, and he was approached on campus to donate blood so that DNA could be extracted for a scientific paper. The driver of the car mentioned that he had also been approached about the same donation. He then mentioned that they were collecting all sorts of information in the paperwork, like ID number etc.
He then asked what they would do with the information. He was told they would store it, as is required for doctoral theses, for around 10 years.
The conversation then put the two pieces of information together.

Imagine this: 10 years from now you could get a piece of tailored spam like this.
"We know that you are 1 of 500000 people in the world that have degenerative disease X. Our medication ..."
Or companies targeting you, knowing that you have a genetic sweet tooth and cannot resist fairy floss.

d boy

Posted by: d boy on September 10, 2003 01:55 PM

I want a CAPTCHA, as it's got to be the best anti-robot tool ever and everyone is getting used to it these days.

Posted by: Chris on September 11, 2003 01:09 PM

For tip 6, I'd suggest the code posted by mentalized.net. Because it provides an edit link, it offers more flexibility than a straight delete link would. =)

Posted by: girlie on September 13, 2003 07:25 PM

Enlarge your penis here >> http://www.colfelt.com/blog

;-)

Hey yoz! Thanks for the tips.

Posted by: Ant on September 17, 2003 11:42 AM

Let's hope this works. Less than four days after starting my blog, I got a penis spam on it. I deal with enough spam in my email. Anyway, I did your options 1 and 2. I renamed my comment script to custom-comments.cgi, and replaced mt-comments.cgi with one that just returns the text "FUCK OFF, SPAMMER". I put a hard-coded reference to mt-comments.cgi in a comment, so that the spammer's script thinks it's dealing with the normal MT comment script.

Posted by: Paul Tomblin on September 27, 2003 12:51 AM

You've posted some pretty nice solutions. Here is one more you might find interesting...

Posted by: Jay on September 28, 2003 12:41 AM

There are some similar tips here: http://kalsey.com/2003/09/ounce_of_prevention/

Posted by: Phil on September 29, 2003 03:11 PM

This was so helpful, and worked beautifully, but today, my spammer came back, and in the e-mails I received from him, there was no delete link - how strange!

Details here http://www.movabletype.org/support/index.php?s=5de6001b64d56312142d5e9c00291008&act=ST&f=14&t=28575&st=0#entry129810

Could it be the inclusion of (open) HTML tags in his comment? Just a thought. Thanks for an elegant solution.

Posted by: Donna on October 2, 2003 03:56 PM

All great ideas, Yoz -

I just wrote a little script that just makes a redirect page for the user's link. It doesn't try to stop spammers, but it takes away some of the incentive of doing it in the first place, which I'm assuming is to get inbound links for Google or Daypop rankings.

http://www.wirefarm.com/archives/001789.html

(Look at the CommentAuthor links - they now go to a redir.pl - The redir.pl just presents the link for the user to click.)

Posted by: Jim OConnell on October 3, 2003 09:27 AM

Well, I tried tip #1, and even after rebuilding and checking my templates to ensure that they refer to $MTCommentScript$ it didn't find the newly named comment.cgi app, and tip #6 works, but when I actually delete on the comment instead of closing the window, I get:

MT::App::CMS=HASH(0x83651a4) Use of uninitialized value in concatenation (.) or string at /usr/local/etc/httpd/htdocs/intuitive/blog/lib/MT/App/CMS.pm line 1263.

Any suggestions? I'm running MT 2.63

Posted by: Dave Taylor on October 6, 2003 09:22 PM

Dave - I get that too sometimes, not sure what it means. However, despite it looking broken, it *has* managed to delete the link. I'll chase it down when I have a spare moment, something I'm hideously short of at present!

Posted by: Yoz on October 7, 2003 02:04 AM

Another idea is to waste the spammers' time (not sure how sophisticated they are about checking how long a script runs): replace your mt-comments.cgi script with something like this:

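#!/usr/bin/perl
use strict;
use warnings;

$| = 1;  # unbuffered output, so each character is sent as it is printed

print "Content-Type: text/html\n\n";
my @message = split(//, "FUCK OFF SPAMMER");
foreach (1 .. 100) {
    foreach my $char (@message) {
        print $char;
        sleep 5;
    }
}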

The script will go through printing out the message, one character at a time, with a 5 second pause in between each character, 100 times. Feel free to adjust the message and timing to meet your desires. End result, whatever they are using will just sit there, retrieving data for a very long time.

Posted by: Arcterex on October 12, 2003 08:19 PM

Tip 6 is nice and it marks the comment for deletion but nothing gets deleted until you rebuild the pages or am I missing something?

Posted by: Philip on October 13, 2003 05:32 AM

I don't see anything about CAPTCHAs in the link to CAPTCHAs. Anyway, I did implement them for my blog here: http://www.toyz.org/mrblog/archives/00000078.html

And released the code, available at the wiki page here:

http://www.toyz.org/cgi-bin/wiki.cgi?GreymatterCommentHack

This hack is for Greymatter, so I'm still looking for someone to port it to MT.

Posted by: David Beckemeyer on October 13, 2003 06:50 AM

I have a working captcha for MT... IT'S BEEN DONE!

Read all about it!

I really don't care how much of a pain it is on the accessibility front, the spammers have driven me to finding a working solution. The "don't allow comments from Google searches" hack also makes first-time comments an accessibility problem, so this is more workable if you ask me.

Posted by: Chris on October 13, 2003 07:15 AM

I used Arcterex's idea as well, but in a streak of sadism, I had it kick out the scripture that Samuel L. Jackson quotes in Pulp Fiction.

Posted by: juby on October 14, 2003 07:53 AM

Also regarding people saying that spammers leaving open HTML tags removes the delete link:

My Perl skills are weak, to say the least, so I don't know *how* to do this, but isn't there some way to include the link *above* any other data from the comment? That way, any open tags won't affect the link, because they will come after it.

Posted by: on October 14, 2003 08:04 AM

Is there a way to do this:
"Dot some random decoy <form>s around the place, with meaningless actions, <input>s that are visible to the robot (to make them look real) but hidden to the user, etc."

so that bad-behaving agents (i.e. spambots) will trigger some sort of IP-capturing script? (For example... someone who comes to my site asking for a formmail.pl that's not there would be redirected (via .htaccess) to an IP-capturing script, which would allow me to manually or automatically update my .htaccess file to block that IP... couldn't I do this with decoy comment FORM elements?)

Posted by: Mike on October 14, 2003 09:44 PM

Someone else trying to combat the same thing (posted for the sake of completeness!):

http://golem.ph.utexas.edu/~distler/blog/archives/000236.html

Posted by: Phil on October 15, 2003 04:34 PM

Okay, so I renamed my mt-comments.cgi ...

... then I created a new mt-comments.cgi that records some of the essential information I'll use to block these putzes ...

... if over time, I find the information collected reliable, I'll directly apply the IP to my .htaccess file.

In the meantime, my new "improved/fake" mt-comments.cgi redirects the offender to the URL they're advertising (after I record the essential data).

In this way, I'm hoping they essentially effect a DDoS against themselves ... at least burn up a bunch of bandwidth at their own hands.

Posted by: Mean Dean on October 16, 2003 05:06 AM

I saw in the referer logs that a spammer is using these keywords in their google search:

blog oct 2003 Name URL Comments -spam

Follow the links and you'll see. So, in addition, everyone might want to change the text on their comment forms to something unique.

Posted by: Louis on October 17, 2003 12:03 PM

I notice in my common log that so far the spammers that have come to sites hosted by me have all had an empty user agent and referer. Easy for us to block on those, but certainly not a good long-term solution since it would be trivial for spammers to add user agents that look legit.

Posted by: R.I.Pienaar on October 18, 2003 10:08 AM

One way we can beat the blog spammers is if we all work together. I have created the Blog Spam Database as a place to share information about known spam sources. The database can output files in a text format that can be used with the MT-Blacklist plugin. The Blog Spam Database can be found at http://www.markcarey.com/spamdb/

Posted by: Mark Carey on October 22, 2003 08:31 PM

Eat at Joe's. The best steaks around!

(hehe)

Posted by: Joe on October 24, 2003 05:33 PM


Thanks for the link-up ... and an FYI that I've:

1 - Packaged and published the code behind my email obfuscator into an easy to use, improve and integrate Perl module
2 - Added an option to render the email hyperlink as inline JavaScript

Gory details and general silliness at:
http://www.healyourchurchwebsite.com/archives/001055.shtml

Posted by: Mean Dean on October 29, 2003 04:20 PM

Great comments guys. Peter FDA

Posted by: Peter on November 7, 2003 12:51 AM

Spam is the reason that keeps me from installing a blog system on my website. Many of my visitors keep asking me to offer the possibility to blog, so I'm thinking about some ways to block spam.
Here they are:

1. Create a timer that lets the user post only after one minute (users must read the article before posting, right?).

2. Create a stop-word database (or text file) with words like viagra, casino, porno etc. If the post contains such words, it should be blocked or reviewed by the admin before being published.

3. A "this is spam" link could give users the possibility to inform the administrator if a comment contains spam.

4. Implementing noindex tags could be a nice idea:
http://www.google.com/bot.html#noindextags

5. Requiring login (only registered users can post) should block the spam bots.

6. Using an image with a random number and asking users to enter the characters they see in this image is another way to block spam bots.

7. Check the referer and print it in the post.
Referers such as:
http://www.google.com/search?q=allinurl%3A+mt-comments.cgi should give you an idea why the user is posting

thats my 2 cents
excuse my english :)
Marco from Pisa,Italy

Posted by: Marco on November 12, 2003 02:30 PM

you really helpd me out...thanks yoz

Posted by: unknown cute girl on November 12, 2003 10:45 PM

Good ideas here, thanks folks. Getting tired of spam recently too.

Posted by: Zpager on November 17, 2003 01:02 PM

What's a pretty blog!!!

Posted by: Barom on November 17, 2003 02:25 PM

yeah!!! Good Idea!!!

Posted by: Pok on November 17, 2003 03:21 PM

yeah!!! Good Idea!!!

Posted by: Pok on November 17, 2003 04:46 PM

yeah!!! Good Idea!!!

Posted by: Pok on November 17, 2003 05:18 PM

I do not think so.

Posted by: Gans on November 18, 2003 06:49 PM

hey .. nice people where can i find a Good Blog for free ?? in orther to be used in my cool web page ? thanks !!! bye

Posted by: peter boss on November 23, 2003 04:32 PM

Awesome tips I haven't seen elsewhere. Thanks!

Posted by: homer jay on November 25, 2003 07:38 PM

Cool blog!

Posted by: Mister X on December 19, 2003 02:54 PM

Great ideas, great tips, a lot of hacking of course...

Personally I prefer MT-Blacklist for my blog which gives me an all-in-one solution against all the stuff people try to advertise on my blog for...

merry xmas, christoph

Posted by: Christoph C. Cemper on December 26, 2003 12:40 AM

Great idea. I mean, spamming blogs doesn't drive the right traffic, so why do it? I'm a webmaster myself, but spamming blogs with mass-blog-spammer software (yes, these evil pieces of software are out there) is the worst thing after spam - I'll never do it, and mostly the people doing it have "bought" the software from scammers...

Anyway, they will not last with the blacklists out there, I hope - I wish you all a nice start into the year 2004, god bless you & your families, Dave from Germany.

Posted by: Dave on December 29, 2003 10:46 PM

Chuc mung nam moi!

Posted by: chungdung on December 31, 2003 03:33 PM

I do not think so.

Posted by: Jack on January 3, 2004 09:40 PM

Great tips. www.junkeater.com is offering an additional service that is based on picture recognition and can be integrated into existing weblogs fairly easily. Might be worth mentioning.

Posted by: Junkeater on January 4, 2004 02:18 PM

Hi. You guys all make great points here. The spam is not going to stop. You just have to create smarter scripts. :) There are a few small steps you can take to make your scripts smarter than the average bot. You can add a referer check to your script. If the script is accessed from anywhere but your site's URL, then you exit. :)

Posted by: Hot Topic on January 6, 2004 06:25 AM

I spent really a lot of time developing similar thing but now I see it was time lost. My anti spam scripts don't work as expected and this is becoming pain in my ...

Posted by: Markie on January 6, 2004 04:29 PM

wonderful site, as some sort of a blog developer, i found it absolutely perfect as this was exactly what i was looking for!

Posted by: skinny arms on January 24, 2004 01:08 AM

Are there any new blog programs out there that can combat spam? I am up against spam all the time. I have a website that gives out a free guest book to websites, and when I go to check the websites of people who sign up for our guest books, the guest books are just full of spam, which is where this type of spam originated. To me it isn't so much the spam, it is the way it is spammed. If someone visits your website, leaves a message and tells you about themselves with a reference to their website, that is OK by me, but if they just type ad words that have no coherence to them, that bothers me, no class. If there was a code I could use to prevent spam in my guest books, I would surely like to know about it. I will check back and see what is said here later on. I find this very interesting.
Take care and bless you
Steve

Posted by: Steve on January 24, 2004 03:35 AM

When I use the term 'repugnant' I do so in my own opinion: I do not use non-free
software on machines I control. This licence is non-free, and masquerading it as free
is offensive. I have contributed lots to the Free Software community myself, and I
would be completely outraged if any of my contributions were being shipped in a
non-free product. Contributions are contributions to public software, not private
profits.

Posted by: gigel on September 8, 2004 10:00 PM

can we protect from sprider thanks

Posted by: Tom Gargne on September 29, 2004 05:16 PM

is there a open source to build this blog

Posted by: Tom Gargne on September 29, 2004 05:19 PM
