New anti-bot Captcha mod -- simple, elegant, effective

Started by mrcj, September 08, 2011, 02:13:43 AM

mrcj

Like many of you, my forum was being slammed by bots that could defeat Captcha. They added lots of bogus users and spam messages. I tried some of the human-verification enhancement packages and found them to be very complex, problematic, or inoperative. So I came up with my own enhancement. I believe it is quite elegant in its simplicity.

Using CorelDRAW, I generated a .TTF font file that has what appears to be two letters for each letter form. The two letters are, for the most part, a phonetic representation of the desired letter. So, ABCDEFG comes out AY BE CE DE EE EF GE. The human puzzle is to read the phonetic pairs and enter the correct single letter they represent. I did have to punt on H and W, since the sounds can't be represented by two letters. I made those HH and WW.

The beauty of the system is that it works seamlessly in Captcha, including the "Listen..." function. Captcha doesn't know it's displaying pairs – the call to the font outline is just for a single letter. The bots, however, see all the letters and enter twice as many letters as they should. Result? Rejected registration.
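
To make the failure mode concrete, here is a minimal sketch of what a bot that transcribes every visible glyph would submit. The map holds only the pairs named in this post; the rest of the alphabet is in the font file and is not reproduced here. The function names and the case-insensitive comparison are assumptions for illustration, not SMF's own code.

```php
<?php
// Partial glyph map: only the pairs stated in the post above.
const PHONETIC_PAIRS = [
    'A' => 'AY', 'B' => 'BE', 'C' => 'CE', 'D' => 'DE',
    'E' => 'EE', 'F' => 'EF', 'G' => 'GE',
    'H' => 'HH', 'W' => 'WW', 'Z' => 'ZE',
];

// What appears in the image for a given verification code
// (hypothetical helper; the real rendering is done by the font itself).
function rendered_text(string $code): string
{
    $out = '';
    foreach (str_split($code) as $letter) {
        if (!isset(PHONETIC_PAIRS[$letter])) {
            throw new InvalidArgumentException("No pair listed for '$letter'");
        }
        $out .= PHONETIC_PAIRS[$letter];
    }
    return $out;
}

// Assumption: the typed answer is compared with the stored code, ignoring case.
function is_correct(string $code, string $answer): bool
{
    return strtoupper($answer) === $code;
}

$code = 'BADGE';
$bot_answer = rendered_text($code);        // every visible glyph: "BEAYDEGEEE"
var_dump(is_correct($code, $bot_answer));  // bool(false)
var_dump(is_correct($code, 'badge'));      // bool(true)
```

The stored code never changes; only the picture does, which is why nothing else in the verification logic needs to be touched.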

Since implementing this, I've had no bot registrations. I had a mild concern about users who are not native English speakers, but ironically, the first registration to come through was from Germany! Canadian and British boards might want to change ZE to ZD, though.  :)

So here are the particulars. I placed the file (attached to this message) arial_phont.ttf in the Themes/default/fonts directory.

In the file Themes/default/languages/Login.english.php, I changed:

$txt['visual_verification_label'] = 'Visual verification';
$txt['visual_verification_description'] = 'Type the letters shown in the picture';


to:

$txt['visual_verification_label'] = 'Phonetic/Visual verification puzzle';
$txt['visual_verification_description'] = 'Letters are shown in (mostly) phonetic pairs. For every TWO letters,<br />type in the SINGLE letter that sounds like the pair. Say them out loud<br />or use the Listen feature if you get stuck.';


Then in the file Sources/Sub-Graphics.php, I changed:

      // Try use Screenge if we can - it looks good!
      if (in_array('Screenge.ttf', $ttfont_list))
         $ttfont_list = array('Screenge.ttf');


to:

      // Use arial_phont for double-letter bot defeat
      if (in_array('arial_phont.ttf', $ttfont_list))
         $ttfont_list = array('arial_phont.ttf');


Finally, I changed the "Complexity of visual verification image" in Registration/Settings to "Simple - overlapping colored letters, no noise". There's no point in making these letters unreadable for humans, but the colored letters help highlight the pairs. Plus, this setting is not as jarringly unsightly for your visitors.

Is this the be-all and end-all for defeating bots? Of course not – the slimy spammer bottom-feeders will eventually program against it. However, it does open a few more possibilities, for example, letter pairs in which the user is asked to enter only the left or right letter, or even stacked letters where the top or bottom letter is the correct choice.

I know nothing about making a package for automated installation into SMF. Also, I run 1.1.14 and I don't know anything about 2.0 or other versions. I leave it to some of you guys who are interested in such stuff to package these changes. Just give me some credit for the idea, OK?  ;)

Dave

Norty

Thanks mrcj.  Looks like a good solution.

I need to try and reduce the number of bot registrations that I get on my forum, so I will look at implementing this when I deploy my customisations in the next couple of weeks.


It would be nice to see something like this built into SMF core, as an extension of the current captcha validator.

My thoughts:

  • When using captcha, randomly select from a number of ttf files, each with its own rule: double letter, return 1st letter; double letter, return 2nd letter; stacked letter, return top; stacked letter, return bottom; any combination, return the highlighted (or specified colour) letters/numbers.
  • SMF updates/upgrades could include additional ttf files, which would keep expanding the number of options that bot writers have to account for.
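
A rough sketch of the random-selection idea, in case it helps anyone packaging this. The font file names and the pick_variant() helper are hypothetical (only arial_phont.ttf exists in this thread), and nothing here is existing SMF code.

```php
<?php
// Hypothetical variant fonts, each paired with the instruction a human needs.
$variants = [
    'pairs_left.ttf'     => 'Type only the LEFT letter of each pair.',
    'pairs_right.ttf'    => 'Type only the RIGHT letter of each pair.',
    'stacked_top.ttf'    => 'Type only the TOP letter of each stack.',
    'stacked_bottom.ttf' => 'Type only the BOTTOM letter of each stack.',
];

// Pick one variant at random from those whose font file is actually installed.
// Returns null when none of the variant fonts is available.
function pick_variant(array $variants, array $installed_fonts): ?array
{
    $usable = array_intersect(array_keys($variants), $installed_fonts);
    if (empty($usable)) {
        return null;
    }
    $font = $usable[array_rand($usable)];
    return ['font' => $font, 'instruction' => $variants[$font]];
}

$choice = pick_variant($variants, ['pairs_left.ttf', 'stacked_top.ttf', 'Screenge.ttf']);
echo $choice['font'], ': ', $choice['instruction'], "\n";
```

One design point to watch: the instruction text has to match the font used for the image, so the choice would need to be remembered (for example in the visitor's session) between drawing the image and printing the form.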

mrcj

Norty,

I'll be interested in your feedback once you've installed it. It's been spectacularly successful on my site -- human registrations are coming through fine, and not a single bot registration all month.  :D

MrMike

Outstanding solution. And it is indeed, "simple, elegant, effective". Super tricky.  :)

MrPhil

Interesting. As you pointed out, this is going to be not only dependent on language, but also on culture (e.g., 'zed' for Z instead of 'zee'). You'll need to give extra instructions in the prompt, which spammers might eventually look for if this comes into widespread use. Even "spelling out" letters in full (e.g., "aitch" for H) would also be language and culture dependent. Even so, it could be helpful.

A variation might be to display maritime signal flags instead of letters. A human wishing to join would have to go through the bother of looking up signal flags to get the right letters. The display would have to be in full color, too. Another variation could be to generate nonsense 2, 3, or 4 letter words, mapped to the CAPTCHA's individual letters, and instruct the human to enter only the nth letter of each word. Another could be to display digits that somehow map to 26 individual letters (maybe two digits at a time, 01 through 26). Any graphic or glyph that corresponds 1-to-1 with a Latin letter might work.
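
As an illustration of the two-digit variation, here is a minimal sketch of the 01 through 26 mapping. In practice the font would draw the digits in place of each letter; these helper names are made up for the example and only show the 1-to-1 correspondence.

```php
<?php
// Map each letter to its two-digit position in the alphabet (A = 01 ... Z = 26).
function letters_to_digits(string $code): string
{
    $parts = [];
    foreach (str_split(strtoupper($code)) as $letter) {
        if (!preg_match('/^[A-Z]$/', $letter)) {
            throw new InvalidArgumentException("Not a Latin letter: '$letter'");
        }
        $parts[] = sprintf('%02d', ord($letter) - ord('A') + 1);
    }
    return implode(' ', $parts);
}

// The reverse step a human performs: turn each two-digit group back into a letter.
function digits_to_letters(string $digits): string
{
    $out = '';
    foreach (preg_split('/\s+/', trim($digits)) as $group) {
        $n = (int) $group;
        if ($n < 1 || $n > 26) {
            throw new InvalidArgumentException("Out of range: '$group'");
        }
        $out .= chr(ord('A') + $n - 1);
    }
    return $out;
}

echo letters_to_digits('SMF'), "\n";        // 19 13 06
echo digits_to_letters('19 13 06'), "\n";   // SMF
```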

MrMike

Quote from: MrPhil on September 18, 2011, 02:52:52 PM
A variation might be to display maritime signal flags instead of letters. A human wishing to join would have to go through the bother of looking up signal flags to get the right letters.

Or you could just have them solve a couple of quadratic equations using n-space dimensional variables (with a generous 10-second time limit).

Seriously, if your CAPTCHA forces people to look up maritime signal flags, your forum will be a ghost town. Or populated solely by retired navy signal specialists.

MrPhil

True that it would discourage registration by all but the most die-hard, but sometimes that's what you want. Anyway, I was just brainstorming (thoughtshowering for our British cousins  ;D ) on ways you might exploit CAPTCHA's use of font files to fool bots and perhaps even discourage spam farm humans.

mrcj

Quote from: MrMike on September 18, 2011, 10:36:43 PM
Or you could just have them solve a couple of quadratic equations using n-space dimensional variables (with a generous 10-second time limit).

:laugh:  :laugh:  :laugh:
Having worked for a software company for eight years (and having subsequently given up high-tech almost completely), I've seen this before -- the tendency of programmer types to favor the more complex (or over-intellectualized) solution. My background was in electronic publishing, so the font-substitution idea came to me pretty quickly once I set my mind to the problem. I think it would probably be wiser to stick with the phonetic pairs until they're cracked, then go to left/right or up/down choices. Semaphore and quadratic equations are far down the list, I think. Simplify, simplify, simplify!
