Yoz Grahame's Unresolvable Discrepancy

I came here to apologise and eat biscuits, and I'm all out of biscuits

Goodbye CAPTCHAs, hello Distributed Porn-Powered Processing

Posted: December 2nd, 2004

There are some things that machines are better at doing than people, and vice versa. Automation is all about the former and CAPTCHAs – those little mangled-text images that you have to type in before you’re allowed a free email account – are all about the latter.

The purpose of CAPTCHAs is to foil automated attempts by spammers to harvest tons of free email accounts. The trouble is that, as was identified over a year ago, you can automate circumvention, if you’re clever about how you harness and use human processing power. In this case, you set up a site with content that people really want to get. (Porn, or warez, or… you get the idea.) In order for people to get to the content, they have to go through a CAPTCHA test – except that the CAPTCHA is actually grabbed from the web service whose defenses you want to breach. Your eager porn-surfing visitors are doing all the hard work for you.

I’m writing about this now in response to this post by Jon Udell, which discusses some of the pros and cons of CAPTCHAs. The main downside he identifies is that, in order to withstand computational defeat, some CAPTCHAs have become so hard that the average human can’t pass them. Similarly, as Matt May points out in this excellent post, CAPTCHAs are an accessibility black hole.
While these are notable problems, I think they’re pretty minor compared to the CAPTCHA-farming idea I’ve outlined above, which lowers the CAPTCHA barrier to a trivially breakable level.

To sum up: CAPTCHAs are a pain to users, they trample all over good accessibility practice and, most importantly, they’re useless as a defense against automation. So why the hell are Yahoo et al still using them? Am I wrong in calling CAPTCHA a dead duck? (I have no metrics to back this up, and invite any web techies from large CAPTCHA-using services to contradict me.)

I took note of the CAPTCHA-farming idea when I saw it because it’s an ingenious way of harnessing large amounts of brainpower in tiny chunks, for which there are all kinds of applications. Here’s an example: instead of making CAPTCHA-style image tests which look like this…

A CAPTCHA image taken from Yahoo mail, which is probably useless to you if you're reading this

… make ones that look like this…

A scanned field from a census form, containing human handwriting

… and then you can lay off half of your data entry & verification staff. (The above image is an excerpt from a census form on this Lockheed Martin press release, which claims that they have handwriting-recognition up to 85% accuracy. That still leaves a ton of human intervention if you’re dealing with 100,000 forms)

Okay, I’m not being entirely serious with that example, but there are industries out there existing entirely to harness the power of web surfers who’ve lost their way. Prime example: those websites full of secondary link lists that exist purely to show up in Google results and act as a banner-loaded intermediary before sending them on their way to buy a digital camera, via an affiliate link. Popular Power – the late lamented startup that wanted to sell spare cycles of desktop computers to computationally-hungry customers – was aiming at the wrong resource. Distributed CPU cycles are worthless unless you’re SETI or Pixar. Distributed brain cycles… now that’s a much more intriguing proposition.

Or, to put it another way…

Tired: Third-world data-processing sweatshops

Wired: Thousands of clueless web surfers + a good aggregation engine


12 Comments on “Goodbye CAPTCHAs, hello Distributed Porn-Powered Processing”

  1. Anonymous said at 2:58 pm on December 4th, 2004:

    data entry & verification?
    So when the user types ‘John smith’ how do you know they are right?

  2. nick said at 3:37 am on December 5th, 2004:

    Just to get this into the Lazyweb: it’s a perfect dovetail with the Project Gutenberg Distributed Proofreading Project: at least, I think so.
    Have wiggle-room for difficult captchas, use it as part of the cross-checking process, and turn a volunteer process into a massive distributed squinting at scans of 19th-century Bodoni-heavy texts.

  3. nick said at 3:43 am on December 5th, 2004:

    >> So when the user types ‘John smith’ how do you know they are right?
    You can’t. But you can have a preliminary set of criteria that have to be satisfied which leaves enough wiggle-room.
    If you used it with Project Gutenberg’s Distributed Proofreading, you have an ‘initial scan’ string for these sort of things. I presume that you have an initial scan with the ‘John Smith’ handwriting recognition stuff as well. So you use that as your base captcha entry, and allow a certain amount of leeway on what’s entered.

  4. vague said at 9:01 pm on December 5th, 2004:

    heh, how about drug prescription verification? that could be a possibly life-saving use for this specific example….

  5. Yoz said at 11:42 pm on December 6th, 2004:

    >> So when the user types ‘John smith’ how do you know they are right?
    As well as the criteria method that Nick suggests, there’s the old-fashioned way used by many big data entry departments: have the data entered twice and compare. If they differ, have it entered a third time.

  6. Stuart Robinson said at 3:48 pm on December 7th, 2004:

    GMail only uses CAPTCHAs to prevent brute-force password guessing attacks. This seems to be a valid use.

  7. thechak said at 10:51 pm on December 8th, 2004:

    The technology for this already exists, though their web site is pretty broke right now.
    http://www.openmind.org
    There are lots of silly online games that have been written to assign keywords to all the images on google, improve handwriting recognition, etc. They are based on multiple disconnected users viewing the same image and entering the same answer (for verification that their answer is correct) and recording the data.

  8. thefoo said at 5:20 pm on December 9th, 2004:

    A scientific application — I’ll try to explain this one quickly. We have tens of thousands of fragments of DNA that we’ve run on gels, and we want to know what size they are. The size of the DNA can be determined by comparing the position of the DNA band to markers of known sizes. I want to get all of this information into a database, so we don’t have to manually look up the size of the fragment in question.
    I could imagine scanning in all the images and then making them into captchas, with lots of redundancy for improved accuracy. But first you’d have to get your porn-surfing/game-playing audience to learn how to read gels. Hmm…

  9. L said at 9:57 pm on June 3rd, 2005:

    Captcha’s aren’t “useless”. Do you think someone is going to go to the extreme efforts to build an automated circumvention system, which may take them several days/months/years, let alone it might not even work?
    Plus, even with this automated system you speak of, they still have to do the extreme work of emailing all these people who are going to do this work for them.. and they may get caught spamming while doing so. And why would they even bother emailing people to verify captcha’s if they could just email the people about their offer they are trying to sell, instead? It’s double the work for nothing.
    The point is: it deters people from spamming a significant amount. It’s not like your door locks on the house are useless now that you’ve heard on the news that one person’s house got broken into even though it was locked.
    You can say it’s possible for someone to break into your home if you have door locks and a security system. But isn’t it harder for them to break in than if you left the door wide open?

  10. Yoz said at 3:11 am on June 5th, 2005:

    L: Firstly, yes, people *are* going to build automated circumvention systems. If attackers weren’t automating already, CAPTCHAs wouldn’t have been invented. It’s an arms race – the attackers will just automate further.
    Secondly, I think you’ve misunderstood how CAPTCHA farms work – “they still have to do the extreme work of emailing all these people who are going to do this work for them.” No, they don’t. I’m not sure why you think CAPTCHA farming involves email. It involves websites. Email has very little to do with it.

  11. tom sherman said at 6:57 am on August 7th, 2005:

    CAPTCHAs are not useless. In fact, the “automated system” of which you speak seems to trace to a rather famous blog entry by Cory Doctorow of Boing Boing. In the lovely echochamber of the blogosphere, Cory’s “[s]omeone told me” about the porn-defeats-CAPTCHAs idea has become gospel. That the techies have accepted this as fact is pathetic and laughable. (I thought some of you were trained in the scientific method?!)
    http://www.boingboing.net/2004/01/27/solving_and_creating.html
    CAPTCHAs can be done better, e.g. by combining visual CAPTCHAs with an audio alternative. Bottom line: it’s about making it harder to abuse a system, and CAPTCHAs do that. There’s no reason they can’t be used in combination with other techniques as well. The near-religious hatred I see of this technology is a joke.

  12. Yoz said at 1:54 pm on August 7th, 2005:

    Firstly, Tom, it traces back several months earlier than Cory’s post – back to a newspaper article from 2003, which I’ve linked to in my post and you seem to have entirely missed. But more importantly, you dismiss it as “pathetic and laughable” without giving any argument about *why* it won’t work. As an experienced web techie I can see exactly how it’d work. Now, please stick to actual scientific method and disprove it with something more than bluster about the echo chamber.
    Combining visual and audio CAPTCHAs may solve the problem of dealing with partially-sighted users but it still doesn’t cope with the overall usability problem. And “near-religious hatred”? What the hell are you talking about?
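
The answer to "how do you know they are right?" that comes out of the comments above (Nick's leeway against an initial machine scan, plus the enter-it-twice-and-compare routine, with a third entry to break ties) can be sketched roughly as below. This is only an illustration in Python, not how any real data entry shop or CAPTCHA service does it: the function names, the 0.6 similarity threshold and the choice of difflib for fuzzy matching are all assumptions made up for the example.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case, so 'John smith' equals 'john  Smith'."""
    return " ".join(text.split()).casefold()


def within_leeway(entry: str, ocr_guess: str, min_similarity: float = 0.6) -> bool:
    """Plausibility check: is the typed entry close enough to the machine's first guess?

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ratio = SequenceMatcher(None, normalise(entry), normalise(ocr_guess)).ratio()
    return ratio >= min_similarity


def resolve(entries: Sequence[str], ocr_guess: str) -> Optional[str]:
    """Return the accepted value, or None if another entry is still needed.

    Entries that fail the leeway check are thrown away. The first value that
    two independent entries agree on (after normalising) is accepted, which
    covers both "entered twice and they match" and "third entry breaks the tie".
    """
    seen = {}  # normalised text -> first spelling we saw for it
    for entry in entries:
        if not within_leeway(entry, ocr_guess):
            continue
        key = normalise(entry)
        if key in seen:
            return seen[key]
        seen[key] = entry.strip()
    return None
```

Calling `resolve(["John Smith", "Jon Smith"], "Jahn Smlth")` gives `None` because the two entries disagree, so the field would be shown to a third person; adding a third entry of `"john smith"` settles it. Junk that looks nothing like the initial scan never counts towards agreement, which is the wiggle-room idea: the machine's guess is too unreliable to be the answer, but good enough to reject obvious nonsense.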

Content licensed under the Creative Commons (Attribution - Share Alike)