task 0000002 Louis J. Sheehan

In a neat example of Internet-enabled “crowdsourcing,” the method of distributing a large task to many contributors, researchers are using an anti-spam program to get people to decipher damaged or faded texts, one word at a time. Chances are that if you’ve solved one of those distorted-word tests to secure an account with Facebook, Craigslist, or Ticketmaster, you’ve helped The New York Times inch a little closer to digitizing its entire print newspaper archive from 1851 to 1980 [CNET].

The program, known as reCAPTCHA, is widely used to ensure that humans, rather than spam bots, are commenting on blogs (including some of DISCOVER’s) and signing up for free email accounts. “More web sites are adopting reCAPTCHAs each day, so the rate of transcription keeps growing,” said [lead researcher Luis] von Ahn. “More than 4 million words are being transcribed every day. It would take more than 1,500 people working 40 hours a week at a rate of 60 words a minute to match our weekly output” [Telegraph]. The service is available for free to any site.

Von Ahn’s lab uses two different optical character recognition (OCR) software programs to scan an old book or newspaper article and convert it into a digital, searchable file. But when the programs disagree on the reading of a word, that word is added to the reCAPTCHA database and used as part of an anti-spam puzzle. According to a report published in the journal Science [subscription required], humans decipher such words with 99 percent accuracy.
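The disagreement step can be sketched in a few lines of Python. This is only an illustration, assuming each OCR program returns a list of word readings aligned by position; the function name `find_suspicious_words` and the sample readings are hypothetical and are not taken from the reCAPTCHA system.

```python
def find_suspicious_words(reading_a, reading_b):
    """Return (position, reading_a, reading_b) for each word the two OCR outputs disagree on."""
    if len(reading_a) != len(reading_b):
        # Assumption of this sketch: both readings are already aligned word by word.
        raise ValueError("readings must contain the same number of words")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(reading_a, reading_b))
        if a != b
    ]


# Hypothetical output of two OCR programs for the same scanned line.
ocr_a = ["This", "aged", "portion", "of", "society", "were"]
ocr_b = ["This", "agcd", "portion", "of", "socicty", "were"]

# Words the programs disagree on are the ones that would be handed to humans.
suspicious = find_suspicious_words(ocr_a, ocr_b)
print(suspicious)
```

Words on which both programs agree are kept as read; only the disputed ones would go into the puzzle pool.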

In 2000, von Ahn helped invent the first “CAPTCHA,” which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” with a nod to the early computer scientist Alan Turing. The new reCAPTCHA cleverly slips a useful task into what has already become a mundane Internet activity. Says von Ahn: “We are demonstrating that we can take human effort, human processing power, that would otherwise be wasted and redirect it to accomplish tasks that computers cannot yet solve” [Wired News].

Last year DISCOVER saw how humans could act as artificial artificial intelligence at the Amazon Mechanical Turk, another fine example of crowdsourcing.

Image: Science/AAAS

August 14th, 2008
by Eliza Strickland in Technology | 19 comments

19 Responses to “Computers Exploit Human Brainpower to Decipher Faded Texts”

  1. Jeremiah Says:
    August 15th, 2008 at 7:36 am Um… shouldn’t that be “This aged portion of society was”? Haha.
  2. john powell Says:
    August 16th, 2008 at 11:33 am A Mental Blockage

    In the current is often found
    Unknown particles of sky and ground.
    Oft they appear as phantasms or as dreams
    Or oft illusions of what is or only seems.

    Nonetheless they do appear as real or imagined fear
    Or as unknowns, unnaturals, torments to eye and ear.
    Look what the fresh new breeze doth bring–
    With its mysterious voice, it doth sing.

    Soft on the air with voice or visual treat,
    It lays its bearing or bounty at your feet.
    Now it is yours, this new thought;
    By this new wind, it is brought.

    Up from the abyss or down from heaven,
    In a current, air now is given.
    It’s oft a creature of what we ingest
    That gives unto us this worst or best.

    Oh, the hazards of seeing or hearing
    That soon become our reasons for fearing!
    The things accepted without investigation
    Causes the brain its mental constipation.

    120205

  3. Sir Mildred Pierce Says:
    August 17th, 2008 at 5:06 pm “Um… shouldn’t that be “This aged portion of society was”? Haha.”

    Common mistake. “society” is a plurality, and is treated as such in the grammar. Another good example: one might say “Queen is Freddie Mercury, Brian May…” but the proper way to say it would be “Queen are Freddie Mercury, Brian May…” etc. The brain thinks otherwise because the previous word doesn’t end in “s”, but nevertheless it’s a plurality and thus treated as such.

  4. Sir Mildred Pierce Says:
    August 17th, 2008 at 5:10 pm Or rather “This aged portion of society” as a whole is a plurality, not just “society”…

    I would like to see the famous “Roswell Memo” given the treatment, as it seems that previously the only ones interpreting it were those biased toward the answer that the memo really does talk about aliens and discs.

  5. Duck Says:
    August 17th, 2008 at 5:13 pm Hm, how then does the system verify if the typed-in word is correct? Wouldn’t someone have to physically write out the correct answer so the CAPTCHA would know later on if someone entered the correct word, or something else. I could just write ‘poop’ and it wouldn’t catch it.
  6. Ash Says:
    August 17th, 2008 at 5:19 pm I’m all for typing inane responses to articles if it means the furthering of literacy.
    Imagine if Youtube incorporated it.
  7. @MildredPierce Says:
    August 17th, 2008 at 5:22 pm Actually, that depends on whether you’re speaking British English or American English. In British English, collective nouns are treated as plural, “The class were…”, “The team were…”, “U2 are…”, but in American English they are treated as singular nouns.

    Furthermore, in the example above it should be “was” no matter what side of the Atlantic you’re on. The “was” refers to “this [aged] portion”, which is clearly singular because of the “this”. If the quote were “The aged portion of society…” then it would depend on B.E. vs. A.E.

    I’m guessing the quote is an archaic formulation.

  8. @Duck Says:
    August 17th, 2008 at 5:24 pm The system gives the same words to multiple people. If they agree on what the word should be, then the word is accepted as correct. If some of the writers disagree, then the word is given to more people.
  9. Grimmygrim Says:
    August 17th, 2008 at 6:01 pm Portion is singular so “was” would be correct. Using “was” or “were” would depend on the context (are they talking about the portion or the society). I’m leaning towards “was”.
  10. ayeroxor Says:
    August 17th, 2008 at 6:07 pm “Um… shouldn’t that be “This aged portion of society was”? Haha.”

    It can be either. Haha.

  11. Jmar Says:
    August 17th, 2008 at 6:12 pm I do not understand how this would work for “new words” yet to be deciphered. Above, someone suggested it sends the word to multiple people… does the first person have to wait until enough people verify? Haha. All my experience with this CAPTCHA has been an instant correct or incorrect; from my understanding it’s asking me to verify, not decipher. Am I just not getting a “new word” or what?
  12. rprebel Says:
    August 17th, 2008 at 7:17 pm It sounds like CAPTCHAs, for the commenter, aren’t new words at all. When I type ‘suffolk’ and ‘chiffon’ into the little box below this bigger box, I’m not helping to decipher anything. I’m placing a vote in an election that’s already been decided. They’re also annoying, but spam is more so.
  13. Ron Delta Says:
    August 17th, 2008 at 8:41 pm Wow dude, those folks are pretty amazing, aren’t they. Very smart bunch.

    RD
    www.anondo.alturl.com

  14. Fabrizio Says:
    August 17th, 2008 at 8:47 pm Andrei Broder, not Luis von Ahn, was the first to invent a CAPTCHA, while at AltaVista.
  15. Hank Roberts Says:
    August 17th, 2008 at 9:09 pm When all else fails, read the fine manual:

    http://recaptcha.net/learnmore.html

    “how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”

    See also: http://web.sbu.edu/history/tschaeper/Hist101/101wwwfbacon.html

  16. Jerome Says:
    August 18th, 2008 at 2:09 am Yes, that’s not clear to me either… if I’m deciphering the word, how does the program know what is correct?
  17. thomas Says:
    August 18th, 2008 at 3:21 am Here’s how they do it (From the website):

    “But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”

    Very cool idea.

  18. komatzu Says:
    August 18th, 2008 at 10:59 am @thomas: thanks for the answer!
    I think it should have been mentioned in the article.
  19. Fat Jolly Penguin Says:
    August 20th, 2008 at 6:39 pm ““Um… shouldn’t that be “This aged portion of society was”? Haha.”

    It can be either. Haha.”

    Actually, it should be “was.” The subject of the sentence is “portion.”
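The verification scheme quoted in comments 15 and 17 can be sketched in Python. This is a minimal sketch under stated assumptions, not reCAPTCHA's actual implementation: the names `check_challenge` and `accepted_reading`, the case-insensitive comparison, and the threshold of three agreeing votes are all hypothetical choices made for illustration. For brevity the example also reuses one control word, whereas a real deployment would presumably vary it.

```python
from collections import Counter


def check_challenge(control_answer, control_truth, unknown_answer, votes):
    """Grade the user on the known (control) word only.

    If the control word is right, the answer for the unknown word is
    recorded as one vote; otherwise it is discarded.
    """
    passed = control_answer.strip().lower() == control_truth.lower()
    if passed:
        votes[unknown_answer.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_votes=3):
    """Return the leading reading once it has at least min_votes votes, else None.

    min_votes=3 is an assumed threshold, not a documented value.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()  # tallies readings for one unknown word image
results = [
    check_challenge("chiffon", "chiffon", "suffolk", votes),  # passes, vote counted
    check_challenge("chifon", "chiffon", "poop", votes),      # fails, vote discarded
    check_challenge("Chiffon", "chiffon", "Suffolk", votes),  # passes, vote counted
    check_challenge("chiffon", "chiffon", "suffolk", votes),  # passes, vote counted
]
print(results, accepted_reading(votes))
```

This is why the response feels instant to the user: only the control word is graded on the spot, while the reading of the unknown word is settled later as more people see it. It also addresses Duck's question, since an answer like ‘poop’ either fails the control word or is simply outvoted.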
