Enhancements to Q&A

Arantor · April 08, 2013, 01:50:12 PM

Quote: So, instead of suggesting using the right tool for the job (a full-text search database) we should rewrite them in PHP and MySQL? Seems futile.

You do realise that what Quora does is very different to what we're actually talking about here, right?

We're not talking about a QA system for *support*. We're talking about Q&A that members have to complete as an anti-spam tool. Whereupon you don't want vast databases, or search tools or anything else.

EDIT: Ninja'd

Getting back on track:

Quote: How about scattering Questions around the page, with human-readable instructions on which one to do next? If "unwanted" ones are answered, you know it's a bot. The old trick of using invisible text to trick bots into answering a question might even work for a while!

The problem with that is the inherent slow-down you apply to legitimate users. The harder you make it for legitimate users to join up and get started, the harder it'll be to get a forum going - and a forum where legitimate users are being penalised excessively is not one people are going to join or contribute to - which will just kill it. See also visually impaired user requirements.

Quote: My intent here is to give some flexibility in the range of answers that are acceptable, rather than forcing an exact match, as is done today (or exhaustively listing all acceptable answers)

While from a theoretical point of view, yes, it would be advantageous - the practical reality is that most people are just not developers. It's a level of complexity that most people just don't want - even regex matching in profile fields has proven to be beyond most people (bad documentation notwithstanding).

Quote: That's why I also suggested Soundex as a way of accepting slight misspellings

I'd personally suggest metaphone rather than soundex (not capped to 4 characters, also has some ideas about pronunciation) but even that is not that hard to fool. It also doesn't handle numeric versus textual answers.
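
To make the two limitations concrete, here is a minimal sketch of classic American Soundex in Python. It is an illustration only, not SMF code, and PHP's built-in soundex() may differ in edge cases. The function name and the sample words are made up for the example.

```python
# Letter groups of classic American Soundex; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex key, or "" if the input has no ASCII letters."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break  # the key is capped at one letter plus three digits
    return result.ljust(4, "0")
```

With this sketch "purple" and "purpel" share a key, which is the forgiving behaviour being asked for. But "Washington" and "Washingmachine" also share a key because of the 4-character cap, and "3" produces an empty key while "three" does not, so a numeric answer never matches its spelled-out form.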

Quote: As I've said many times before, you have to get away from relying on a hard shell defense (only keeping bots from registering), and implement defense-in-depth, where you look at member behavior and the content of their posts.

Some of us have been doing just this for a couple of years now.

MrPhil · April 08, 2013, 08:53:11 PM

Quote from: Arantor on April 08, 2013, 01:50:12 PM
Quote: How about scattering Questions around the page, with human-readable instructions on which one to do next? If "unwanted" ones are answered, you know it's a bot. The old trick of using invisible text to trick bots into answering a question might even work for a while!

The problem with that is the inherent slow-down you apply to legitimate users. The harder you make it for legitimate users to join up and get started, the harder it'll be to get a forum going - and a forum where legitimate users are being penalised excessively is not one people are going to join or contribute to - which will just kill it. See also visually impaired user requirements.

No question that making it tougher for bots risks making it tougher for legitimate members to sign up, too. It's a matter of striking a balance between blocking spambots/spam farm humans and making it too difficult for legit humans, including those using screen readers. For the latter, how about some kind of "SKIP THIS QUESTION" in the invisible text? You'd have to use different wording so bots can't be trained to look for that specific text. You'd probably want to set up tabs to get blind applicants easily to the right place, so physically skipping around the page might not accomplish anything useful in the end.
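
The decoy idea boils down to a very small check on the submitted form. A rough sketch in Python, assuming hypothetical field names and a plain dict of submitted values (none of this is taken from SMF):

```python
def looks_like_bot(submitted: dict, decoy_fields: set) -> bool:
    """True if any decoy question, which a human was told to skip, got an answer."""
    return any(submitted.get(name, "").strip() for name in decoy_fields)


# Hypothetical form: q2 is the decoy whose hidden text says to skip it.
DECOYS = {"q2"}
human_form = {"q1": "blue", "q2": "", "q3": "seven"}
bot_form = {"q1": "blue", "q2": "blue", "q3": "blue"}
```

Here looks_like_bot(human_form, DECOYS) is False and looks_like_bot(bot_form, DECOYS) is True. Whitespace-only input in a decoy is treated as unanswered, which is a design choice of this sketch rather than anything specified above.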

Quote
Quote: My intent here is to give some flexibility in the range of answers that are acceptable, rather than forcing an exact match, as is done today (or exhaustively listing all acceptable answers)

While from a theoretical point of view, yes, it would be advantageous - the practical reality is that most people are just not developers. It's a level of complexity that most people just don't want - even regex matching in profile fields has proven to be beyond most people (bad documentation notwithstanding).

I find it hard to believe that people of normal intelligence would have such a hard time creating a regexp using some very basic rules -- alternation of choices, fixed text, groups, */?/+ on groups or characters, and escaping special command characters. The setup page could have a place to test various inputs. The forum admin would have a choice of plain text (exact match answer) and regexp.
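
As an illustration of the "very basic rules" (alternation, fixed text, optional groups, escaping), here is a small Python sketch of how a regexp answer might be tested. The questions, patterns and the helper name are invented for the example, and Python's re syntax is used here rather than PHP's PCRE functions.

```python
import re


def answer_matches(pattern: str, given: str) -> bool:
    """True when the whole trimmed answer matches the admin's pattern, ignoring case."""
    return re.fullmatch(pattern, given.strip(), flags=re.IGNORECASE) is not None


# Hypothetical question: "What colour is the sky on a clear day?"
SKY = r"(light |sky )?blue"          # optional group with two alternatives + fixed text
# Hypothetical question: "How much is 1.5 + 1.5?"
THREE = r"3(\.0+)?|three"            # the dot is escaped so it only matches a literal "."
```

A "test various inputs" box on the setup page could simply call something like answer_matches(SKY, "Sky Blue") for each sample the admin types in. Matching the whole answer, not a substring, means "blue cheese" is rejected.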

Arantor · April 08, 2013, 09:47:36 PM

Quote: For the latter, how about some kind of "SKIP THIS QUESTION" in the invisible text?

Works quite well for me.

Quote: You'd have to use different wording so bots can't be trained to look for that specific text.

Actually that hasn't happened yet in my experience.

Quote: You'd probably want to set up tabs to get blind applicants easily to the right place, so physically skipping around the page might not accomplish anything useful in the end.

SMF should be setting tabindexes pretty much automatically anyway.

Quote: I find it hard to believe that people of normal intelligence would have such a hard time creating a regexp using some very basic rules

That's because you and I aren't 'people of normal intelligence'. Being a programmer, especially the longer you do it, alters the way you handle logic.

Case in point, I know a site that had the question: '5 - 8 = what?' and the number of people that couldn't even get *that* right (actual humans, not bots) was staggering. I saw a good many people most insistent that the answer was 3. In all other respects, these people were 'of normal intelligence' and could communicate quite fluently (with a decent grasp of grammar and punctuation etc.)

People of this kind are also the kind of people who will be running forums - your average forum admin is *not* technically minded and has not been for some time. Long gone are the days when a site operator needed to actually get their hands dirty with code.

MrPhil · June 08, 2013, 11:55:09 PM

Thought I'd prod this (hopefully not quite dead) Norwegian Blue...

I've been thinking a bit about how multiple languages and even multiple correct answers (and/or multiple choice format) might be handled. The following fields exist in smf_log_comments:

id_comment -- already used as the ID for a Q-A pair; would be the ID for the question text and the first (or only) correct answer, for one language
id_member -- 0 for Question record, id_comment of Question record for records with additional correct answers or red herrings
member_name -- use as language code; could be 'en' or 'en_US' etc.
comment_type -- already used ('ver_test') to distinguish from other users of log_comments
id_recipient -- use as answer number (0 is the first or only correct answer)
recipient_name -- already used as the answer (would add additional correct answers, as well as red herrings, in other records)
log_time -- use as number of red herrings (0 default)
id_notice -- use as number of correct answers (treat 0 as 1)
counter -- future use as 0 = text, 1 = regexp answer (for fill-in-the-blank only)
body -- already used as the question

Every record, whatever it's used for, still gets its own id_comment for uniqueness. Within a single language (member_name = language code), you would have the question + first (or only) correct answer in one record (id_member=0). Additional correct answers could be in additional records (question = '') as answer 1, 2,... For multiple choice format, there could be 1 or more red herrings available (start numbering after the last correct answer). You could have multiple languages with a non-'' entry in "member_name".
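
To show how the numbering would work in practice, here is a rough Python model of the proposed records. The column names are the ones listed above; the class name, the helper and the sample rows are invented for illustration, and IDs are assumed to start at 1.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    id_comment: int               # unique ID of this record
    id_member: int = 0            # 0 = question record, else id_comment of its question
    member_name: str = ""         # language code ('' = legacy/default)
    comment_type: str = "ver_test"
    id_recipient: int = 0         # answer number (0 = first or only correct answer)
    recipient_name: str = ""      # answer text
    log_time: int = 0             # number of red herrings
    id_notice: int = 0            # number of correct answers (0 treated as 1)
    counter: int = 0              # 0 = text, 1 = regexp answer
    body: str = ""                # question text ('' on extra answer records)


def split_answers(rows, question_id):
    """Return (correct_answers, red_herrings) for one question."""
    question = next(r for r in rows
                    if r.id_comment == question_id and r.id_member == 0)
    n_correct = question.id_notice or 1
    extras = sorted((r for r in rows if r.id_member == question_id),
                    key=lambda r: r.id_recipient)
    numbered = [question] + extras
    correct = [r.recipient_name for r in numbered if r.id_recipient < n_correct]
    herrings = [r.recipient_name for r in numbered if r.id_recipient >= n_correct]
    return correct, herrings


# Hypothetical English question with two correct answers and two red herrings.
rows = [
    QARecord(10, member_name="en", body="What colour is grass?",
             recipient_name="green", id_notice=2, log_time=2),
    QARecord(11, id_member=10, member_name="en", id_recipient=1, recipient_name="emerald"),
    QARecord(12, id_member=10, member_name="en", id_recipient=2, recipient_name="purple"),
    QARecord(13, id_member=10, member_name="en", id_recipient=3, recipient_name="orange"),
]
```

An existing single-answer row (id_notice = 0, no extra records) still comes back as one correct answer and no red herrings, which is the backward compatibility the layout is aiming for.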

Unless I overlooked something in create_control_verification(), the language would have to be passed in to that function. The language codes used would have to match whatever's available, be it 1 or english or en or en_GB, etc. '' could be the default, so that current databases would still work, and could be the fallback (if it still exists) if the current language is not found in the Questions. Or, one of the entered languages could be designated ("default:en_US") as the default. We need to successfully handle '' as one (or only) language code, as well as various languages, and handle the case where Q&A doesn't exist for the current language (force the default language, or ask the registrant to select from the supported languages).
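
The fallback order described here could look roughly like the following Python sketch. The function name and arguments are hypothetical; a return value of None stands for "ask the registrant to pick one of the supported languages".

```python
from typing import Iterable, Optional


def pick_language(available: Iterable[str], current: str,
                  default: Optional[str] = None) -> Optional[str]:
    """Choose which language's Q&A set to use for this registrant."""
    available = set(available)
    if current in available:
        return current                      # Q&A exists for the current language
    if default is not None and default in available:
        return default                      # admin-designated default, e.g. "en_US"
    if "" in available:
        return ""                           # legacy rows with no language code
    return None                             # nothing usable: let the registrant choose
```

For a pre-existing database every question has '' as its code, so pick_language({""}, "en_GB") returns '' and the forum behaves as it does today.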

I'll leave the editing process in ManageSettings.php as an exercise for the reader. Potentially, different languages might even have different numbers of questions (so long as it is at least the number of questions that will be picked), as well as different numbers of right answers and red herrings. There's no reason that different languages can't have a completely different set of Q&A. Whether to present a question as fill-in-the-blank or a multiple choice format should be randomly picked at run time, not in the setup (a regexp answer forces fill-in-the-blank).

In the presentation of Questions, the first thing is to get a list of all the available questions (to choose a subset in random order). Currently that is done simply with comment_type. The revised code would have the current language, and would select only the questions and answers for the given language and id_member=0, at least for the purposes of making an array of questions to be asked. That should give the same number of id_comments to pick from as today. For multiple choice questions, a new method would be needed to pass over the answers and generate that format. Questions would be asked in the same manner as today (for fill-in-the-blank format) and the answer returned. For multiple choice, pick one of the "correct" answers and some number of red herrings. The text of the answer (which will match exactly) will be returned from multiple choice (checkbox, radio button, selection list) formats. In any case, we loop through the list of correct answers, comparing the returned answer to them. If the answer is a regexp, that would work only for fill-in-the-blank format.
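
A sketch of the last two steps in Python: building a multiple-choice list from one correct answer plus some red herrings, and looping over the correct answers to check what came back. Function names are invented, and the case-insensitive comparison for plain text is an assumption of this sketch, not a statement about what SMF does today.

```python
import random
import re


def build_choices(correct, herrings, how_many, rng=random):
    """One randomly chosen correct answer plus up to how_many red herrings, shuffled."""
    options = [rng.choice(correct)]
    options += rng.sample(herrings, min(how_many, len(herrings)))
    rng.shuffle(options)
    return options


def is_correct(given, correct, is_regexp=False):
    """Compare the returned answer against every acceptable answer."""
    given = given.strip()
    if is_regexp:
        # Regexp answers only make sense for fill-in-the-blank questions.
        return any(re.fullmatch(p, given, re.IGNORECASE) for p in correct)
    return any(given.lower() == c.lower() for c in correct)
```

For example, build_choices(["green", "emerald"], ["purple", "orange", "blue"], 2) yields three options of which exactly one is right, and whichever text the form returns is then passed through is_correct() like a typed-in answer.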

It's getting late, so I'll sign off and hope that these thoughts will get someone interested in pursuing this. I hope I'm not beating a deceased equine here. Certainly multiple languages could come first, and then multiple correct answers. Multiple choice and/or regexp answers could come further down the road -- this format should support it.

live627 · June 09, 2013, 03:12:07 AM

Don't use smf_log_comments for this. Didn't anyone stop and think that it's illogical?

MrPhil · June 09, 2013, 10:59:32 AM

Of course it's not logical to use it. But someone already did. It would be better to bite the bullet and use a new table just for Q&A, but we have to think about compatibility with existing databases (not forcing people to re-enter their questions, but that may be a small price to pay for great new features).

emanuele · June 09, 2013, 12:43:48 PM

Have you ever heard of "upgrade.php"?

MrPhil · June 09, 2013, 12:53:34 PM

Fine by me. Make a new table for Q&A, and give SMF all the new functions I listed. Then you will have multiple language support, multiple correct answers, regexp answers if wanted, and various forms of multiple choice formats randomly selected to screw with the bots. I'll bet you'll hear "we can't do that -- the database is sacred" if you seriously propose this for 2.1.

Kindred · June 09, 2013, 02:30:12 PM

ummmm.... since emanuele is one of the devs working on 2.1, you won't hear that...

and if you have the idea and a way to do it, why not go ahead and submit the change to the github repository?

MrPhil · June 09, 2013, 03:45:57 PM

I don't have the time to do the full job (many other things demanding my attention), and I've heard too much about the political infighting and nitpicking when someone tries to contribute to SMF. That said, I'd be happy to collaborate with others (especially those already with a track record accepted as contributors to SMF) on designing this thing, offering feedback, and maybe doing some of the coding.

Kindred · June 09, 2013, 04:47:37 PM

I don't know what "political infighting" you are talking about...
there have been some comments made on the copyright change and the plans for the credit page - but I don't know anything else that has gotten any attention or nitpicking.

live627 · June 09, 2013, 06:09:44 PM

Something fishy is going on here, or maybe a misunderstanding. Forgive and forget?

Arantor · September 28, 2013, 01:03:58 PM

So, I'm going to go out on a limb here and say I'm working on this for 2.1 RIGHT NOW OMG!

(And also, I'm not implementing the full gamut of things MrPhil believes are so important. I'm implementing basically what I put in the opening post, which is not some hypothetical solution.)

Arantor · November 07, 2013, 01:05:17 PM

This has been in 2.1 for some time now, moving to the 'applied' board.
