Anti-spam methods: securing the future of books

By at home

It has to be said that one of the most annoying features of surfing the web is the online security check that, when you're doing a spot of online shopping, social networking or the like, asks you to prove you're human by copying squiggly, blurred letters and numbers that often spell out apparently nonsensical words. But now your answers are being put to good use: as well as keeping websites secure, these checks are helping to digitise books.

Captcha (short for Completely Automated Public Turing test to tell Computers and Humans Apart), the software behind those distorted images of words and numbers, was invented by Luis von Ahn in 2000. It is now used by more than 350,000 websites to stop computer programs from attacking them with spam.

In 2007, von Ahn calculated that people all over the world were typing 200 million Captchas every day, spending about 10 seconds on each one. Multiply 10 seconds by 200 million and you get 2 billion seconds, or more than 500,000 hours a day wasted on these frustrating security codes.
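For anyone who wants to check the sum, here is a quick back-of-the-envelope calculation in Python. It uses only the two figures quoted above (200 million Captchas a day, about 10 seconds each); the variable names are our own, chosen for illustration.

```python
# Back-of-the-envelope check of von Ahn's 2007 estimate.
captchas_per_day = 200_000_000   # Captchas typed worldwide each day (figure from the article)
seconds_per_captcha = 10         # rough time spent on each one (figure from the article)

total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{total_seconds:,} seconds = about {total_hours:,.0f} hours per day")
```

This prints 2,000,000,000 seconds = about 555,556 hours per day, which is where the "more than 500,000 hours" figure comes from.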

He decided to put those hours to good use and devised ReCaptcha, a system that uses each human-typed response both as a security check and as a way to digitise books one word at a time. Unlike the original Captcha, each ReCaptcha form shows two words: one randomly generated word, and a photo of a word taken from the pages of an old book, newspaper or journal that needs digitising.

Documents are usually digitised by scanning the printed pages and running them through a program that converts every word into digital text (a process known as optical character recognition, or OCR). But when the pages are very old, with faded type or yellowed, torn paper, the computer struggles to read them and needs human help. This is where the second word on the ReCaptcha form comes in.

To make sure the answers are accurate, ReCaptcha only records a person's answer for the second word if they typed the first word correctly. It then compares the answers many different people have given for the same second word and stores the most popular one, as this is the most likely to be correct.
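The checking-and-voting step described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not Google's actual code: the function names, the case-insensitive comparison and the hypothetical `min_votes` threshold (how many matching answers are needed before a word is accepted) are all invented for the example.

```python
from collections import Counter

def accept_answer(control_expected, control_typed, unknown_typed, votes):
    """Count a guess for the unknown word only if the known word was typed correctly.

    Hypothetical sketch: comparisons ignore case and surrounding spaces.
    """
    if control_typed.strip().lower() != control_expected.strip().lower():
        return False  # failed the known word, so the guess for the unknown word is discarded
    votes[unknown_typed.strip().lower()] += 1
    return True

def best_transcription(votes, min_votes=3):
    """Return the most popular guess once it has at least min_votes (an assumed threshold)."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None

# Example: several visitors see the known word "morning" next to the same scanned word.
votes = Counter()
accept_answer("morning", "morning", "upon", votes)
accept_answer("morning", "mornin", "uqon", votes)   # typo in the known word: ignored
accept_answer("morning", "Morning", "upon", votes)
accept_answer("morning", "morning", "open", votes)
accept_answer("morning", "morning", "upon", votes)
print(best_transcription(votes))
```

Here the scanned word ends up transcribed as "upon", because three visitors who passed the check agreed on it while only one typed "open".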

Google bought ReCaptcha in 2009 and now uses its transcription technology for Google Books, the company's attempt to digitise every book in the world. ReCaptcha is free for websites to use. To find out more, visit www.google.com/recaptcha/learnmore

So next time you’re shouting at a web page because it requires some word guessing, take a deep breath and remind yourself that those 10 seconds of your time are helping the greater good.


Picture credit: © Google
