Seven quick tips for a spam-free blog
Posted: September 9th, 2003 | 53 Comments
Harry Beck’s map – a design classic
As September 11th approaches, The Guardian is doing a special series of reports commemorating the tragedy that brought the end of thousands of lives and hundreds of thousands of freedoms – namely, more about the bloody, US-backed coup that brought General Augusto Pinochet to power in Chile thirty years ago. There’s lots of fascinating and horrifying material here, but of most obvious interest to a sad geek like myself is the astonishing story of a Surrey engineer’s project to bring ubiquitous radio-networked democratic mechanisms to Chile, as told in this superb piece by Andy Beckett:
What this collaboration produced was startling: a new communications system reaching the whole spindly length of Chile, from the deserts of the north to the icy grasslands of the south, carrying daily information about the output of individual factories, about the flow of important raw materials, about rates of absenteeism and other economic problems.
The ambition of the scheme is incredible: Firstly in technical terms – even today, when we have technology several orders of magnitude more powerful and more prevalent, such a scheme would still be considered little more than a pipe dream – but even more so in the core idea of pioneering technology to provide the feedback mechanisms that would enable a socialist economy to operate more efficiently than its capitalist equivalent.
The project was operational for a couple of years before the coup, but never fully completed. The story is little-known outside of Chile, but is another testament to the far-sightedness of President Allende and his dedication to his society, and another poignant reminder of all that was lost in this stupid, evil tragedy. Go read.
Some more updates to the moblog script, including image rotation (thanks, Chris) and a new version from Ben Milleare that picks mail up from a remote account via POP3.
Blog comment spam, while certainly not a pest on the scale of its email equivalent, has still made its presence felt enough to be considered a threat. Nobody wants to spend two hours a day cleaning out penis-enlargement ads from their blogs, so the blogosphere’s brightest stars have been mulling ways to hit it on the head before it becomes a major problem. Various blacklists have been proposed, along with CAPTCHAs, comment throttling, authentication checks and more.
Thing is, I think spam prevention is simpler than that. Much simpler. Read on for a few quick solutions which Movable Type users can implement right now to stave off all but the most persistent spammers.
The reason I consider this problem trivial is that we already have a small but significant advantage over the blog spammer, which is that blog spamming is an order of magnitude harder than email spamming. If a spammer wants to get an ad in your inbox, all he needs is your email address (and, admittedly, an open relay through which to spam). If a spammer wants to get an ad on your blog, there’s rather more work involved: he needs to visit your blog and scrape pages to work out where your comment script lives, then submit a POST to it, all the while assuming that this comment script works the way he expects. Of course, when I say “he”, I mean his blog-spidering robot, since it’d be pretty painful to do this kind of thing manually (but many seem to, as I’ll discuss later). In making such robots, spammers work on the assumption that the majority of Movable Type blogs will still have the vital components intact from the base install, and referenced in all the standard places in the MT templates. It’s this assumption that we can use to trip them up.
Before we get going, it may be worth checking out Mark Pilgrim’s overview, comprehensive as ever. Actually, since I promised quickness, I’ll summarise the summary: As well as the review of existing anti-spam ideas, Mark discusses “Club” solutions versus “Lojack” solutions (both of which are car-theft prevention devices). A “Club” solution is one which blocks perpetration of the crime (with a big lock on the steering wheel). These are effective because though defeating them is far from impossible, it’s just not worth it, given all the unprotected cars around that make for quicker thefts. A “Lojack” solution, on the other hand, is invisible to the thief until after the theft, when it alerts the police to the car’s location. Lojacks not only make recovery quick and easy, but deter criminals because they have no idea which cars are equipped with them. Of the solutions presented here, most are Clubs, with a couple of much-wanted Lojacks at the end.
Notes And Warnings: Firstly, be careful with these tweaks: As well as the usual advice to back up everything, be aware that several of these fixes to the core of MT require changes to templates as well, and that means changes to all the blogs on the current installation. Also, there are quite a few different tips here. You don’t need to use all of them, especially since some make others redundant. Start with the easiest ones, and if they don’t keep the spammers away, keep adding (I’ll list some good combinations at the end of this piece). Also note that I’m not saying that these tips will fend off all robots, but they’ll certainly make your site less susceptible.
Tip 1: Rename your comment script
The reasoning: The quickest and easiest way to discover the URL of the comment script is just to search the page for mt-comments.cgi, which is its default name.
The fix: As well as giving the mt-comments.cgi file a new name (something relatively random, though make sure you keep the .cgi suffix) you’ll need to edit the CommentScript setting in your mt.cfg file. The best way to do this so as to ensure uninterrupted service is to copy (not rename) the script to the new filename, then edit the config, then delete the old script.
Unfortunately, it won’t be quite that easy for most MT users, as they’re still using templates based on older versions that didn’t use the <$MTCommentScript$> tag to get the location of the script, and have mt-comments.cgi hard-coded instead. It’s worth the time to go through your templates and do some search-and-replaces for this.
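That search-and-replace is easy to script. Here’s a minimal sketch in Python (not part of MT, which is Perl); the function name and the sample template string are my own inventions for illustration. It swaps the hard-coded name for the <$MTCommentScript$> tag, so that any later rename only needs the mt.cfg change.

```python
import re

# Default script name that older templates hard-code (see the tip above).
HARDCODED_NAME = "mt-comments.cgi"
# MT template tag that resolves to the configured CommentScript value.
SCRIPT_TAG = "<$MTCommentScript$>"


def use_script_tag(template_text):
    """Replace each hard-coded script name with the template tag.

    Returns a (new_text, replacement_count) tuple.
    """
    pattern = re.compile(re.escape(HARDCODED_NAME))
    # A function replacement avoids any backslash or group interpretation
    # of the replacement string.
    return pattern.subn(lambda _match: SCRIPT_TAG, template_text)


if __name__ == "__main__":
    # Hypothetical template fragment, for illustration only.
    sample = '<form method="post" action="<$MTCGIPath$>mt-comments.cgi">'
    print(use_script_tag(sample))
```

You’d still need to feed each template through it and paste the result back into MT, so treat it as a starting point rather than a finished tool.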
Tip 2: Don’t link to the comment script on your front page
The reasoning: By default, the MT front page template links to the comment script with the text Comments (n), which is also pretty easy to scan for.
The fix: You can change the link text, but I prefer a different tack – instead of linking to the comment script, link to the comments section of the individual entry, like so:
<a href="<$MTEntryPermalink$>#comments">Comments (<$MTEntryCommentCount$>)</a>
Not only does it get the comments script URL off the front page, it’s also a much nicer way of getting to entry comments that doesn’t involve pop-up windows.
Tip 3: Include several decoy forms in the Individual Entry template
The reasoning: Now that we’ve removed all the easy links to the comment script from the site’s HTML, a spam robot will have to scan for something that looks like the comment posting form and pull the script URL out of the form’s action attribute.
The fix: Dot some random decoy <form>s around the place, with meaningless actions, <input>s that are visible to the robot (to make them look real) but hidden to the user, etc.
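If you’d rather generate the decoys than hand-write them, something along these lines would do. It’s a rough Python sketch under my own assumptions: the /cgi-bin/ path, the field names and the inline style are all placeholders, not anything MT provides.

```python
import secrets


def decoy_form(field_names=("author", "email", "url", "text")):
    """Return HTML for one decoy form with a random, meaningless action."""
    # Hypothetical path: it should point at nothing that actually exists.
    action = f"/cgi-bin/{secrets.token_hex(4)}.cgi"
    inputs = "\n".join(
        f'  <input type="text" name="{name}" />' for name in field_names
    )
    # The inline style hides the whole form from human visitors, while a
    # robot scanning the raw HTML still sees ordinary text inputs.
    return (
        f'<form method="post" action="{action}" style="display:none">\n'
        f"{inputs}\n"
        "</form>"
    )


if __name__ == "__main__":
    print(decoy_form())
```

Paste the output of a few runs into the Individual Entry template, above and below the real form, so a robot can’t just grab the first or last <form> on the page.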
Tip 4: Require a hidden variable for the comment script
The reasoning: The second line of defence: the URL of the script is known by the robot, so we have to use protection in the script itself. Fortunately, most robots will have a fixed idea of how the MT comment script works.
The fix: Shelley’s already done the work on this one.
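For a rough idea of the principle (this is my own Python sketch, not Shelley’s code and not MT’s): the comment form carries a hidden input whose name and value you choose yourself, and the script refuses any submission that doesn’t include it. The field name and value below are made up.

```python
# Hypothetical, site-specific values: pick your own, so that no two
# installs share them.
HIDDEN_FIELD = "snooze"
HIDDEN_VALUE = "7f3a"


def has_hidden_variable(form):
    """Return True when the submitted form carries the expected hidden field.

    `form` is a plain dict of submitted field names to values.
    """
    return form.get(HIDDEN_FIELD) == HIDDEN_VALUE


if __name__ == "__main__":
    print(has_hidden_variable({"snooze": "7f3a", "text": "Nice post"}))
    print(has_hidden_variable({"text": "Buy now"}))
```

A robot that posts straight to the script with the standard MT fields, without reading your form first, fails the check.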
Tip 5: Separate “Preview” and “Post” into two separate scripts
The reasoning: Even if robots are able to parse the page well enough to detect the hidden variables they need, they’ll still be assuming that posting to the script creates a posted comment, and detecting otherwise is much harder.
The fix: Phil got halfway there by removing the “Post” button from his Individual Entry template. Trouble is, that only forces humans to preview, and not bots. To achieve this, we need to dive into the code.
Firstly, copy your comment script to another file, named whatever you like, but ensuring it has the suffix .cgi and execution permissions. This new file’s going to be the comment-posting script from now on. Then edit the original comment script to include the add_methods line (I’ve placed it between the lines you don’t need to edit):
local $SIG{__WARN__} = sub { $app->trace($_[0]) };
# New line: route "post" requests to the preview handler (\& passes a code reference)
$app->add_methods( post => \&MT::App::Comments::preview );
$app->run;
(What this fix does is replace the script’s ability to post with its existing ability to preview.)
Next, edit your Comment Preview Template, and replace the reference to the existing comment script (or the <$MTCommentScript$> tag) with the name of the new posting script. Oh, and you’ll also want to tweak your Individual Entry archive page to take the “Post” button out, along with some warning that all posts go through a preview process.
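If the dispatch trick isn’t obvious, here’s the same idea boiled down to a toy Python sketch. None of these names exist in MT; the handlers and the list standing in for the comment database are assumptions for illustration only.

```python
def preview(comment, store):
    """Stand-in preview handler: shows the comment but stores nothing."""
    return "preview"


def post(comment, store):
    """Stand-in post handler: actually saves the comment."""
    store.append(comment)
    return "posted"


def build_methods(is_public_script):
    """Build the mode-to-handler table for one of the two scripts."""
    methods = {"preview": preview, "post": post}
    if is_public_script:
        # Same idea as the add_methods line above: on the well-known
        # script, a "post" request only ever gets a preview.
        methods["post"] = preview
    return methods


def handle(methods, mode, comment, store):
    """Dispatch one request to whichever handler is registered for its mode."""
    return methods[mode](comment, store)


if __name__ == "__main__":
    saved = []
    print(handle(build_methods(True), "post", "spam", saved), saved)
    print(handle(build_methods(False), "post", "hello", saved), saved)
```

A robot that posts to the original script believes it has succeeded, but nothing is saved; only the renamed copy, which is reachable from the preview page, really posts.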
Tip 6: Include a “Delete this post” link in notification mail
The reasoning: All the previous tips have been “Club” solutions – this one’s a “Lojack”, and particularly useful when most of your comment spam seems to come from people doing manual entry rather than robots. One of the reasons that comment spam is such a pain is that it takes several clicks through MT’s interface to get rid of a single post. If you could kill a spam easily as soon as it appeared, its effectiveness would be reduced dramatically, with the hopeful aim of deterring spammers entirely.
The fix: This is another Perl insert, this time into the file lib/MT/App/Comments.pm (about line 150):
$Text::Wrap::cols = 72;
$body = Text::Wrap::wrap('', '', $body) . "\n$link_url\n\n" .
  $app->translate('IP Address:') . ' ' . $comment->ip . "\n" .
  $app->translate('Name:') . ' ' . $comment->author . "\n" .
  $app->translate('Email Address:') . ' ' . $comment->email . "\n" .
  $app->translate('URL:') . ' ' . $comment->url . "\n\n" .
  $app->translate('Comments:') . "\n\n" . $comment->text . "\n";
# Added statement: append a one-click link to MT's delete-confirmation page
$body .= "\nTo delete this comment, click this link:\n" .
  $app->{cfg}->CGIPath . "mt.cgi?__mode=delete_confirm&" .
  "_type=comment&id=" . $comment->id . "&blog_id=" . $blog->id . "\n";
MT::Mail->send(\%head, $body);
It inserts a link into the mail that, when followed, jumps straight to the “Delete comment? [Yes/No]” page in MT (though, irritatingly, this page will close the browser window after you’ve hit the button, so you’ll want to ensure the page appears in a spare/new window when you click on the link)
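For anyone wiring the same link into a different tool, the URL is just the admin script plus four query parameters. Here’s a small Python sketch of how it’s put together; the function name and the example.com path are mine, while the parameter names are the ones used in the Perl above.

```python
from urllib.parse import urlencode


def delete_link(cgi_path, comment_id, blog_id):
    """Build the URL of MT's delete-confirmation page for one comment.

    `cgi_path` plays the role of the CGIPath setting and is assumed to
    end with a slash.
    """
    query = urlencode({
        "__mode": "delete_confirm",
        "_type": "comment",
        "id": comment_id,
        "blog_id": blog_id,
    })
    return f"{cgi_path}mt.cgi?{query}"


if __name__ == "__main__":
    # Hypothetical install location and IDs.
    print(delete_link("http://example.com/mt/", 42, 3))
```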
Tip 7: What to do if you’re Six Apart (or another blog-tool producer)
- Include tip 6 in the next release, please, because I’m not the only one who wants it.
- Separate comment-posting and comment-preview into two separate scripts, as in tip 5.
- More generally, include more configuration and randomisation into key parts of the install process, so that every install is not identical and cannot be gamed that way. (More on this later.)
- Improve the comment-deletion interfaces so that bulk deletion’s easier (e.g. delete all comments matching a particular IP or body regexp). Also, IP banning on its own isn’t nearly enough – we’d like banning by regexp, please. (All the manual spams I’ve had so far have been for zipcode sites or DVD rental)
- Oh, and Ben ‘n’ Mena keychain dolls. The blog world will go nuts for ’em. Trust me on this.
How to use the tips
As I said before, you don’t need all of them. Personally, I’m only using tips 2 and 6, for two reasons: Firstly, the majority of the spam that I get is manually-entered (though I know that plenty of other blogs get hassled by robots) and there isn’t that much of it, so I’ll save the other tips for later. Secondly, my MT install is home to many blogs, and I haven’t the time yet to write the script that goes through and fixes all the templates (though when I do, I promise to put it here). When I have the chance, I’ll probably implement 1, 4 and 5 as well.
Why I’ve kept things simple
As I said earlier, I think the talk of complex systems like centralised MD5 hash databases is overkill – just because it’s necessary for email spam, it doesn’t mean comment spam needs such heavy artillery. If we use simple tricks to keep blog configurations sufficiently varied, we’ll show potential spammers that trying to hijack our blogs just isn’t worth their effort, because the amount of work it’ll take to win this war is considerably smaller on our side of the fence.
Variance is the key here. The current spate of Windows viruses are effective because most Windows installations work the same way – the virus rarely has to investigate possible differences that would break it. If we consider a spamming robot’s base assumptions about our blogs, it only takes a small amount of work to defeat them. Sure, it’s still possible to write a robot that would defeat all of the tips above (well, the first five, anyway), but the complexity of the code would be fairly nasty, and would still only take another few small changes to break.
At present, though, there are enough vanilla MT installs out there that blog spam robots will still work most of the time. To defeat them at the roots, the installation routine has to introduce enough variance that building a spam robot is a non-trivial task, and also provide maintenance systems to regularly churn that variance.
Great stuff Yoz, i’ll be implementing some of these tips pretty soon I think, even though I only get a little bit of spam (already using Tip 2).
For anyone wanting to use Tip 1 on an older installation of MT with hardcoded mt-comments.cgi, you can use this to find and replace that string on all cgi scripts in the current directory (it will create backups of any files it changes):
perl -p -i.bak -e 's/mt-comments\.cgi/new-filename.cgi/g' *.cgi
Actually, I have nothing of value to offer; just thought I’d submit another comment to boost your spirits in light of your paltry 300 count.
I suppose by keychain you mean some type of central “remember me” database. I’ve been thinking about that a lot because I am so sick of punching in my info on a new blog I visit. I want to be able to go to a form and click “Get my info.” without typing a thing.
Thanks for all the great tips, I’m going to implement them on my blog!
No, by “keychain” I mean “a little thing you carry around that groups your keys together”. My girlfriend has a cute Ozzy Osbourne doll on hers.
(But you have a good idea there, nonetheless.)
Talking of spam, I had a really interesting ride to work today, with the conversation turning to the future of telemarketing and spam.
One of my co-travellers was relating how his university’s student union had given his information to telemarketers to canvass donations. He then recalled that he had never joined the student union! He came to the conclusion that his university had sold certain information on him to “interested” third parties.
The conversation then changed paths to his present situation. He is studying for a doctorate, and he was approached on campus to donate blood so that DNA could be extracted for a scientific paper. The driver of the car mentioned that he had also been approached about the same donation. He then mentioned that they were collecting all sorts of information in the paperwork, like ID number etc.
He then asked what they would do with the information. He was told they would store it, as is required for doctoral theses, for around 10 years.
The conversation then placed the two pieces of information together.
Imagine this: in 10 years from now you could get a piece of tailored spam like this.
“We know that you are 1 of 500000 people in the world that have degenerative disease X. Our medication …”
Or companies targeting you, knowing that you have a genetic sweet tooth and cannot resist fairy floss.
d boy
I want a CAPTCHA, as it’s got to be the best anti-robot tool ever and everyone is getting used to it these days.
For tip 6, I’d suggest the code posted by mentalized.net. Because it provides an edit link, it offers more flexibility than a straight delete link would. =)
Oops. Guess you have HTML turned off:
http://www.mentalized.net/journal/archives/2003/09/09/movable_type_easier_editremoval_of_new_comments/index.asp
Enlarge your penis here >> http://www.colfelt.com/blog
😉
Hey yoz! Thanks for the tips.
Let’s hope this works. Less than four days after starting my blog, I got a penis spam on it. I deal with enough spam in my email. Anyway, I did your options 1 and 2. I renamed my comment script to custom-comments.cgi, and replaced mt-comments.cgi with one that just returns the text “FUCK OFF, SPAMMER”. I put a hard-coded reference to mt-comments.cgi in a comment, so that the spammer’s script thinks he’s dealing with the normal MT comment script.
You’ve posted some pretty nice solutions. Here is one more you might find interesting…
And once again WITHOUT HTML: http://www.jayallen.org/journey/2003/09/killing_comment_spam_dead
There are some similar tips here: http://kalsey.com/2003/09/ounce_of_prevention/
This was so helpful, and worked beautifully, but today, my spammer came back, and in the e-mails I received from him, there was no delete link – how strange!
Details here http://www.movabletype.org/support/index.php?s=5de6001b64d56312142d5e9c00291008&act=ST&f=14&t=28575&st=0#entry129810
Could it be the inclusion of (open) HTML tags in his comment? Just a thought. Thanks for an elegant solution.
All great ideas, Yoz –
I just wrote a little script that just makes a redirect page for the user’s link. It doesn’t try to stop spammers, but it takes away some of the incentive of doing it in the first place, which I’m assuming is to get inbound links for Google or Daypop rankings.
http://www.wirefarm.com/archives/001789.html
(Look at the CommentAuthor links – they now go to a redir.pl – The redir.pl just presents the link for the user to click.)
Well, I tried tip #1, and even after rebuilding and checking my templates to ensure that they refer to $MTCommentScript$ it didn’t find the newly named comment.cgi app, and tip #6 works, but when I actually delete on the comment instead of closing the window, I get:
MT::App::CMS=HASH(0x83651a4) Use of uninitialized value in concatenation (.) or string at /usr/local/etc/httpd/htdocs/intuitive/blog/lib/MT/App/CMS.pm line 1263.
Any suggestions? I’m running MT 2.63
Dave – I get that too sometimes, not sure what it means. However, despite it looking broken, it *has* managed to delete the link. I’ll chase it down when I have a spare moment, something I’m hideously short of at present!
Another idea is to waste the spammers’ time (not sure how sophisticated they are with checking for how long a script runs) but if you replace your mt-comments.cgi script with something like this:
$| = 1;  # unbuffered output, so each character is sent as soon as it is printed
print "Content-Type: text/html\n\n";
my @message = split(//, "FUCK OFF SPAMMER");
foreach (1..100) {
    foreach (@message) {
        print $_;
        sleep 5;
    }
}
The script will go through printing out the message, one character at a time, with a 5 second pause in between each character, 100 times. Feel free to adjust the message and timing to meet your desires. End result, whatever they are using will just sit there, retrieving data for a very long time.
Tip 6 is nice and it marks the comment for deletion but nothing gets deleted until you rebuild the pages or am I missing something?
I don’t see anything about CAPTCHAs in the link to CAPTCHAs. Anyway, I did implement them for my blog here: http://www.toyz.org/mrblog/archives/00000078.html
And released the code, available at the wiki page here:
http://www.toyz.org/cgi-bin/wiki.cgi?GreymatterCommentHack
This hack is for Greymatter, so I’m still looking for someone to port it to MT.
I have a working captcha for MT… ITS BEEN DONE!
Read all about it!
I really don’t care how much of a pain it is on the accessibility front; the spammers have driven me to finding a working solution. The “don’t allow comments from Google searches” hack also makes first-time comments an accessibility problem, so this is more workable if you ask me.
I used Arcterex’s idea as well, but in a streak of sadism, I had it kick out the scripture that Samuel L. Jackson quotes in Pulp Fiction.
Also regarding people saying that spammers leaving open HTML tags removes the delete link:
My Perl skills are weak, to say the least, so I don’t know *how* to do this, but isn’t there some way to include the link *above* any other data from the comment? That way, any open tags won’t affect the link, because they will come after it.
Is there a way to do this:
“Dot some random decoy <form>s around the place, with meaningless actions, <input>s that are visible to the robot (to make them look real) but hidden to the user, etc.”
so that bad-behaving agents (i.e. spambots) will trigger some sort of IP-capturing script? (For example… someone who comes to my site asking for a formmail.pl that’s not there would be redirected (via .htaccess) to an IP-capturing script, which would allow me to manually or automatically update my .htaccess file to block that IP… couldn’t I do this with decoy comment FORM elements?)
Someone else trying to combat the same thing (posted for the sake of completeness!):
http://golem.ph.utexas.edu/~distler/blog/archives/000236.html
Okay, so I renamed my mt-comments.cgi …
… then I created a new mt-comments.cgi that records some of the essential information I’ll use to block these putzes …
… if over time, I find the information collected reliable, I’ll directly apply the IP to my .htaccess file.
In the meantime, my new “improved/fake” mt-comments.cgi redirects the offender to the URL they’re advertising (after I record the essential data).
In this way, I’m hoping they essentially effect a DDoS against themselves … at least burn up a bunch of bandwidth at their own hands.
I saw in the referer logs that a spammer is using these keywords in their google search:
blog oct 2003 Name URL Comments -spam
Follow the links and you’ll see. So, in addition, everyone might want to change the text on their comment forms to something unique.
I notice in my common log that so far the spammers that have come to sites hosted by me have all had an empty user agent and referer. It’s easy for us to block on those, but it’s certainly not a good long-term solution, since it would be trivial for spammers to add user agents that look legit.
One way we can beat the blog spammers is if we all work together. I have created the Blog Spam Database as a place to share information about known spam sources. The database can output files in a text format that can be used with the MT-Blacklist plugin. The Blog Spam Database can be found at http://www.markcarey.com/spamdb/
Eat at Joe’s. The best steaks around!
(hehe)
Thanks for the link-up … and an FYI that I’ve:
1 – Packaged and published the code behind my email obfuscator into an easy to use, improve and integrate Perl module
2 – Added an option to render the email hyperlink as inline JavaScript
Gory details and general silliness at:
http://www.healyourchurchwebsite.com/archives/001055.shtml
Great comments guys. Peter FDA
Spam is the reason that keeps me from installing a blog system on my website. Many of my visitors keep asking me to offer the possibility to blog, so i’m thinking about some ways to block spam.
Here they are:
1. Create a timer that lets the user post only after one minute (users must read the article before posting, right?).
2. Create a stop-word database (or text file) with words like viagra, casino, porno etc. If a post contains such words, it should be blocked or reviewed by the admin before being published.
3. A “this is spam” link could give users the possibility to inform the administrator if a comment contains spam.
4. Implementing noindex tags could be a nice idea:
http://www.google.com/bot.html#noindextags
5. Requiring login (only registered users can post) should block the spam bots.
6. Using an image with a random number and asking users to enter the characters they see in this image is another way to block spam bots.
7. Check the referer and print it in the post.
referers such as :
http://www.google.com/search?q=allinurl%3A+mt-comments.cgi should give you an idea why the user is posting
thats my 2 cents
excuse my english 🙂
Marco from Pisa,Italy
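Ideas 1 and 2 from the list above are simple enough to sketch. This is a minimal Python illustration under assumptions of my own: the one-minute threshold and the three stop words come from the comment, while the function name, the return values and the way the time on page is measured are hypothetical.

```python
import re

# Example stop words and delay taken from the list above.
STOP_WORDS = {"viagra", "casino", "porno"}
MIN_SECONDS = 60


def screen_comment(text, seconds_on_page):
    """Classify a comment as 'reject', 'review' or 'accept'.

    `seconds_on_page` is how long ago the entry page was served to this
    visitor; how that gets measured is left to the blog tool.
    """
    if seconds_on_page < MIN_SECONDS:
        # Posted faster than anyone could have read the article.
        return "reject"
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & STOP_WORDS:
        # Hold for the admin rather than publishing straight away.
        return "review"
    return "accept"


if __name__ == "__main__":
    print(screen_comment("Cheap VIAGRA here", 120))
```

Holding stop-word matches for review rather than dropping them means a legitimate comment that happens to mention one of the words still gets through eventually.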
you really helped me out…thanks yoz
Good ideas here, thanks folks. Getting tired of spam recently too.
What’s a pretty blog!!!
yeah!!! Good Idea!!!
I do not think so.
hey .. nice people where can i find a Good Blog for free ?? in order to be used in my cool web page ? thanks !!! bye
Awesome tips I haven’t seen elsewhere. Thanks!
Cool blog!
Great ideas, great tips, a lot of hacking of course…
Personally I prefer MT-Blacklist for my blog which gives me an all-in-one solution against all the stuff people try to advertise on my blog for…
merry xmas, christoph
Great idea, I mean spamming blogs doesn’t drive the right traffic, why do it? I’m a webmaster myself, but spamming blogs with mass-blog-spammer-software (yes, these evil pieces of software are out there) is the worst thing after spam – I’ll never do it & mostly the people doing it have “bought” the software from scammers…
Anyway, they will not last with the blacklists out there I hope – I wish you all a nice start into the year 2004, god bless you & your families, Dave from Germany.
Chuc mung nam moi!
I do not think so.
Great tips. http://www.junkeater.com is offering an additional service that is based on picture recognition and can be integrated into existing weblogs fairly easily. Might be worth mentioning.
Hi. You guys all make great points here. The spam is not going to stop. You just have to create smarter scripts. 🙂 There are a few small steps you can take to make your scripts smarter than the average bot. You can add a referer check to your script. If the script is accessed from anywhere but your site’s URL, then you exit. 🙂
I spent really a lot of time developing similar thing but now I see it was time lost. My anti spam scripts don’t work as expected and this is becoming pain in my …
wonderful site, as some sort of a blog developer, i found it absolutly perfect as this was exactly what i was looking for!
Is there any new blog programs out there that can combat spam as I am up against spam all the time. I have a website that gives out a free guest book to websites and I go to check the websites of people who sign up for our guest books and the guest books are just full of spam, which is where this type of spam originated. To me it isn’t so much the spam, it is the way it is spammed. If someone visits your website then leaves a message and tells about themselves with reference to their website is ok by me, but if they just type ad words that have no coherence to them, that bothers me, no class. If there was a code I could use to prevent spam in my guest books, I would surely like to know about it. I will check back and see what is said here later on. I find this very interesting.
Take care and bless you
Steve
When I use the term ‘repugnant’ I do so in my own opinion: I do not use non-free
software on machines I control. This licence is non-free, and masquerading it as free
is offensive. I have contributed lots to the Free Software community myself, and I
would be completely outraged if any of my contributions were being shipped in a
non-free product. Contributions are contributions to public software, not private
profits.