Possibly the coolest page on the Internet, #34
Posted: September 11th, 2003
Harry Beck’s map – a design classic
As September 11th approaches, The Guardian is doing a special series of reports commemorating the tragedy that brought the end of thousands of lives and hundreds of thousands of freedoms – namely, more about the bloody, US-backed coup that brought General Augusto Pinochet to power in Chile thirty years ago. There’s lots of fascinating and horrifying material here, but of most obvious interest to a sad geek like myself is the astonishing story of a Surrey engineer’s project to bring ubiquitous radio-networked democratic mechanisms to Chile, as told in this superb piece by Andy Beckett:
What this collaboration produced was startling: a new communications system reaching the whole spindly length of Chile, from the deserts of the north to the icy grasslands of the south, carrying daily information about the output of individual factories, about the flow of important raw materials, about rates of absenteeism and other economic problems.
The ambition of the scheme is incredible: Firstly in technical terms – even today, when we have technology several orders of magnitude more powerful and more prevalent, such a scheme would still be considered little more than a pipe dream – but even more so in the core idea of pioneering technology to provide the feedback mechanisms that would enable a socialist economy to operate more efficiently than its capitalist equivalent.
The project was operational for a couple of years before the coup, but never fully completed. The story is little-known outside of Chile, but is another testament to the far-sightedness of President Allende and his dedication to his society, and another poignant reminder of all that was lost in this stupid, evil tragedy. Go read.
Some more updates to the moblog script,
including image rotation (thanks, Chris) and a new version from Ben Milleare that picks mail up from a remote account via POP3.
Blog comment spam, while certainly not a pest on the scale of its email equivalent, has still made enough of a presence felt for it to be considered a threat. Nobody wants to spend two hours a day cleaning out penis-enlargement ads from their blogs, so the blogosphere’s brightest stars have been mulling ways to hit it on the head before it becomes a major problem. Various blacklists have been proposed, along with CAPTCHAs, comment throttling, authentication checks and various others.
Thing is, I think spam prevention is simpler than that. Much simpler. Read on for a few quick solutions which MovableType users can implement right now to stave off all but the most persistent spammers.
The reason I consider this problem trivial is that we already have a small but significant advantage over the blog spammer, which is that blog spamming is an order of magnitude harder than email spamming. If a spammer wants to get an ad in your inbox, all he needs is your email address (and, admittedly, an open relay through which to spam). If a spammer wants to get an ad on your blog, there’s rather more work involved: he needs to visit your blog and scrape pages to work out where your comment script lives, then submit a POST to it, all the while assuming that this comment script works the way he expects. Of course, when I say “he”, I mean his blog-spidering robot, since it’d be pretty painful to do this kind of thing manually (but many seem to, as I’ll discuss later). In making such robots, spammers work on the assumption that the majority of Movable Type blogs will still have the vital components intact from the base install, and referenced in all the standard places in the MT templates. It’s this assumption that we can use to trip them up.
Before we get going, it may be worth checking out Mark Pilgrim’s overview, comprehensive as ever. Actually, since I promised quickness, I’ll summarise the summary: As well as the review of existing anti-spam ideas, Mark discusses “Club” solutions versus “Lojack” solutions (both of which are car-theft prevention devices). A “Club” solution is one which blocks perpetration of the crime (with a big lock on the steering wheel). These are effective because though defeating them is far from impossible, it’s just not worth it, given all the unprotected cars around that make for quicker thefts. A “Lojack” solution, on the other hand, is invisible to the thief until after the theft, when it alerts the police to the car’s location. Lojacks not only make recovery quick and easy, but deter criminals because they have no idea which cars are equipped with them. Of the solutions presented here, most are Clubs, with a couple of much-wanted Lojacks at the end.
Notes And Warnings: Firstly, be careful with these tweaks: As well as the usual advice to back up everything, be aware that several of these fixes to the core of MT require changes to templates as well, and that means changes to all the blogs on the current installation. Also, there are quite a few different tips here. You don’t need to use all of them, especially since some make others redundant. Start with the easiest ones, and if they don’t keep the spammers away, keep adding (I’ll list some good combinations at the end of this piece). Also note that I’m not saying that these tips will fend off all robots, but they’ll certainly make your site less susceptible.
Tip 1: Rename your comment script
The reasoning: The quickest and easiest way to discover the URL of the comment script is just to search the page for mt-comments.cgi, which is its default name.
The fix: As well as giving the mt-comments.cgi file a new name (something relatively random, though make sure you keep the .cgi suffix) you’ll need to edit the CommentScript setting in your mt.cfg file. The best way to do this so as to ensure uninterrupted service is to copy (not rename) the script to the new filename, then edit the config, then delete the old script.
Unfortunately, it won’t be quite that easy for most MT users, as they’re still using templates based on older versions that didn’t use the <$MTCommentScript$> tag to get the location of the script, and have mt-comments.cgi hard-coded instead. It’s worth the time to go through your templates and do some search-and-replaces for this.
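If you’d rather not do the renaming and the search-and-replace by hand, something like the following might help. It’s a minimal sketch in Python (not part of MT, and run outside it); the function names are my own invention, and it assumes your templates reference the script by its default name. It picks a hard-to-guess name that keeps the .cgi suffix, and swaps hard-coded references for the <$MTCommentScript$> tag.

```python
import re
import secrets

DEFAULT_NAME = "mt-comments.cgi"  # the default name the robots search for


def random_script_name(prefix="c"):
    """Return a hard-to-guess script name that keeps the .cgi suffix."""
    return f"{prefix}{secrets.token_hex(6)}.cgi"


def use_comment_script_tag(template):
    """Replace hard-coded references to the default script name with the
    <$MTCommentScript$> tag. Returns (new_text, number_of_replacements)."""
    return re.subn(re.escape(DEFAULT_NAME), "<$MTCommentScript$>", template)


if __name__ == "__main__":
    old = '<form method="post" action="<$MTCGIPath$>mt-comments.cgi">'
    new, count = use_comment_script_tag(old)
    print(random_script_name())  # use this as the new file name and CommentScript value
    print(new, count)
```

You’d still copy the script and edit mt.cfg yourself, in the order described above; the sketch only covers picking a name and cleaning up template text.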
Tip 2: Don’t link to the comment script on your front page
The reasoning: By default, the MT front page template links to the comment script with the text Comments (n), which is also pretty easy to scan for.
The fix: You can change the link text, but I prefer a different tack – instead of linking to the comment script, link to the comments section of the individual entry, like so:
<a href="<$MTEntryPermalink$>#comments">Comments (<$MTEntryCommentCount$>)</a>
Not only does it get the comments script URL off the front page, it’s also a much nicer way of getting to entry comments that doesn’t involve pop-up windows.
Tip 3: Include several decoy forms in the Individual Entry template
The reasoning: Now that we’ve removed all the easy links to the comment script from the site’s HTML, a spam robot will have to scan for something that looks like the comment posting form and pull the script URL out of the form’s action attribute.
The fix: Dot some random decoy <form>s around the place, with meaningless actions, <input>s that are visible to the robot (to make them look real) but hidden to the user, etc.
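As a rough illustration of what a decoy might look like, here’s a small Python sketch that generates a few. Everything in it is an assumption on my part: the field names merely imitate a comment form, the /cgi-bin/ action paths are random and point nowhere, and hiding via an inline display:none style is just one way of keeping the forms out of a human visitor’s sight.

```python
import html
import secrets


def decoy_form(field_names=("author", "email", "url", "text")):
    """Build one decoy <form> with a random, meaningless action.

    The inline display:none style hides it from human visitors, while a
    robot scanning the raw HTML still sees a plausible comment form.
    """
    action = f"/cgi-bin/{secrets.token_hex(4)}.cgi"  # hypothetical path, points nowhere
    inputs = "\n".join(
        f'  <input type="text" name="{html.escape(name)}" />' for name in field_names
    )
    return (
        f'<form method="post" action="{action}" style="display:none">\n'
        f"{inputs}\n"
        '  <input type="submit" value="Post" />\n'
        "</form>"
    )


def decoy_forms(count=3):
    """Return several decoys, ready to paste around the Individual Entry template."""
    return "\n".join(decoy_form() for _ in range(count))


if __name__ == "__main__":
    print(decoy_forms())
```

Paste the output before and after the real comment form, so that a robot taking the first (or last) form on the page picks up a dud.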
Tip 4: Require a hidden variable for the comment script
The reasoning: The second line of defence: the URL of the script is known by the robot, so we have to use protection in the script itself. Fortunately, most robots will have a fixed idea of how the MT comment script works.
The fix: Shelley’s already done the work on this one.
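For anyone who just wants the gist, the general idea can be sketched like so. This is not Shelley’s code and not MT’s API, only a generic Python illustration under my own assumptions: the field name and value are made up, and the check would have to live wherever your comment script reads its parameters.

```python
# Hypothetical per-install values: pick your own, so no two blogs match.
HIDDEN_FIELD = "snoop"
HIDDEN_VALUE = "x7f3-change-me"


def hidden_input():
    """Markup to add to the real comment form in the Individual Entry template."""
    return f'<input type="hidden" name="{HIDDEN_FIELD}" value="{HIDDEN_VALUE}" />'


def accept_comment(params):
    """Accept a submission only if it carries the expected hidden value.

    A robot posting straight to the script, with its fixed idea of which
    fields MT expects, will leave this one out and be turned away.
    """
    return params.get(HIDDEN_FIELD) == HIDDEN_VALUE


if __name__ == "__main__":
    print(hidden_input())
    print(accept_comment({"author": "bot", "text": "buy now"}))
    print(accept_comment({"author": "me", "text": "hi", HIDDEN_FIELD: HIDDEN_VALUE}))
```

The point is the same variance as everywhere else in this piece: the robot has to read your particular form to learn the field, rather than replaying what worked on the last blog.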
Tip 5: Separate “Preview” and “Post” into two separate scripts
The reasoning: Even if robots are able to parse the page well enough to detect the hidden variables they need, they’ll still be assuming that posting to the script creates a posted comment, and detecting otherwise is much harder.
The fix: Phil got halfway there by removing the “Post” button from his Individual Entry template. Trouble is, that only forces humans to preview, and not bots. To force bots through the preview as well, we need to dive into the code.
Firstly, copy your comment script to another file, named whatever you like, but ensuring it has the suffix .cgi and execution permissions. This new file’s going to be the comment-posting script from now on. Then edit the original comment script to include the add_methods line (I’ve placed it between the two lines you don’t need to edit):
local $SIG{__WARN__} = sub { $app->trace($_[0]) };
$app->add_methods( post => \&MT::App::Comments::preview );
$app->run;
(What this fix does is replace the script’s ability to post with its existing ability to preview.)
Next, edit your Comment Preview Template, and replace the reference to the existing comment script (or the <$MTCommentScript$> tag) with the name of the new posting script. Oh, and you’ll also want to tweak your Individual Entry archive page to take the “Post” button out, along with some warning that all posts go through a preview process.
Tip 6: Include a “Delete this post” link in notification mail
The reasoning: All the previous tips have been “Club” solutions – this one’s a “Lojack”, and particularly useful when most of your comment spam seems to come from people doing manual entry rather than robots. One of the reasons that comment spam is such a pain is that it takes several clicks through MT’s interface to get rid of a single post. If you could kill each spam comment easily as soon as it appeared, its effectiveness would be reduced dramatically, ideally deterring spammers entirely.
The fix: This is another Perl insert, this time into the file lib/MT/App/Comments.pm (about line 150):
$Text::Wrap::cols = 72;
$body = Text::Wrap::wrap('', '', $body) . "\n$link_url\n\n" .
    $app->translate('IP Address:') . ' ' . $comment->ip . "\n" .
    $app->translate('Name:') . ' ' . $comment->author . "\n" .
    $app->translate('Email Address:') . ' ' . $comment->email . "\n" .
    $app->translate('URL:') . ' ' . $comment->url . "\n\n" .
    $app->translate('Comments:') . "\n\n" . $comment->text . "\n";
# The insert: append a link that goes straight to MT's delete-confirmation page.
$body .= "\nTo delete this comment, click this link:\n" .
    $app->{cfg}->CGIPath . "mt.cgi?__mode=delete_confirm&" .
    "_type=comment&id=" . $comment->id . "&blog_id=" . $blog->id . "\n";
MT::Mail->send(\%head, $body);
It inserts a link into the mail that, when followed, jumps straight to the “Delete comment? [Yes/No]” page in MT (though, irritatingly, this page will close the browser window after you’ve hit the button, so you’ll want to ensure the page appears in a spare/new window when you click on the link).
Tip 7: What to do if you’re Six Apart (or another blog-tool producer)
- Include tip 6 in the next release, please, because I’m not the only one who wants it.
- Separate comment-posting and comment-preview into two separate scripts, as in tip 5.
- More generally, include more configuration and randomisation into key parts of the install process, so that every install is not identical and cannot be gamed that way. (More on this later.)
- Improve the comment-deletion interfaces so that bulk deletion’s easier (e.g. delete all comments matching a particular IP or body regexp). Also, IP banning on its own isn’t nearly enough – we’d like banning by regexp, please. (All the manual spams I’ve had so far have been for zipcode sites or DVD rental.)
- Oh, and Ben ‘n’ Mena keychain dolls. The blog world will go nuts for ’em. Trust me on this.
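To make the bulk-deletion request a little more concrete, here’s roughly the selection logic I have in mind, as a Python sketch. None of this is MT code: the Comment class and the function name are hypothetical stand-ins, and a real version would work against MT’s own comment storage.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stand-in for a stored comment."""
    id: int
    ip: str
    text: str


def select_for_deletion(comments, ip=None, body_pattern=None):
    """Return the comments that match the given IP or the body regexp.

    With neither criterion supplied, nothing is selected.
    """
    regex = re.compile(body_pattern, re.IGNORECASE) if body_pattern is not None else None
    matches = []
    for comment in comments:
        if ip is not None and comment.ip == ip:
            matches.append(comment)
        elif regex is not None and regex.search(comment.text):
            matches.append(comment)
    return matches


if __name__ == "__main__":
    comments = [
        Comment(1, "10.0.0.1", "Nice post"),
        Comment(2, "10.0.0.2", "Cheap DVD rental here"),
        Comment(3, "10.0.0.2", "I agree"),
        Comment(4, "10.0.0.3", "zipcode lookup"),
    ]
    doomed = select_for_deletion(comments, body_pattern=r"dvd rental|zipcode")
    print([c.id for c in doomed])
```

The same matching function could back a regexp ban list: run it against each incoming comment rather than against the ones already stored.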
How to use the tips
As I said before, you don’t need all of them. Personally, I’m only using tips 2 and 6, for two reasons: Firstly, the majority of the spam that I get is manually-entered (though I know that plenty of other blogs get hassled by robots) and there isn’t that much of it, so I’ll save the other tips for later. Secondly, my MT install is home to many blogs, and I haven’t the time yet to write the script that goes through and fixes all the templates (though when I do, I promise to put it here). When I have the chance, I’ll probably implement 1, 4 and 5 as well.
Why I’ve kept things simple
As I said earlier, I think the talk of complex systems like centralised MD5 hash databases is overkill – just because it’s necessary for email spam, it doesn’t mean comment spam needs such heavy artillery. If we use simple tricks to keep blog configurations sufficiently varied, we’ll show potential spammers that trying to hijack our blogs just isn’t worth their effort, because the amount of work it’ll take to win this war is considerably smaller on our side of the fence.
Variance is the key here. The current spate of Windows viruses is effective because most Windows installations work the same way – the virus rarely has to investigate possible differences that would break it. If we consider a spamming robot’s base assumptions about our blogs, it only takes a small amount of work to defeat them. Sure, it’s still possible to write a robot that would defeat all of the tips above (well, the first five, anyway), but the complexity of the code would be fairly nasty, and it would still only take another few small changes to break.
At present, though, there are enough vanilla MT installs out there that blog spam robots will still work most of the time. To defeat them at the roots, the installation routine has to introduce enough variance that building a spam robot is a non-trivial task, and also provide maintenance systems to regularly churn that variance.
It is the future, doctor
and the BBC World Service teaches English with the lyrics to “Born Slippy” and “Dark And Long”.