Author Topic: 2 character minimum in search  (Read 29462 times)

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,401
    • StoryBB/StoryBB on GitHub
Re: 2 character minimum in search
« Reply #20 on: June 25, 2012, 08:49:33 AM »
Sounds like a plan to me :)
Don’t try to tell me that some power can corrupt a person. You haven’t had enough to know what it’s like.

No good deed goes unpunished / No act of charity goes unresented.

Offline emanuele

  • SMF Super Hero
  • *******
  • Posts: 14,156
  • Gender: Male
  • THERE'S JUST ME
Re: 2 character minimum in search
« Reply #21 on: June 25, 2012, 08:51:17 AM »
I wonder if I'm special somehow...



Take a peek at what I'm doing! ;D



Need support in Italian?

Help us help you: explain your problem clearly. No, "it doesn't work" is not an explanation!!
1) What you do,
2) what you expect,
3) what you get.

kat

  • Guest
Re: 2 character minimum in search
« Reply #22 on: June 25, 2012, 09:00:36 AM »
Oh, you're special, all right. ;)

The difference is that with engines, when you use the "exact phrase" option, should you include the word "a", it ignores that letter, if it appears as part of a word, rather than as a word unto itself, doesn't it?

Offline Arantor

Re: 2 character minimum in search
« Reply #23 on: June 25, 2012, 09:08:33 AM »
What is highlighted is not what is searched ;)

kat

  • Guest
Re: 2 character minimum in search
« Reply #24 on: June 25, 2012, 09:12:52 AM »

Offline Arantor

Re: 2 character minimum in search
« Reply #25 on: June 25, 2012, 09:15:45 AM »
Quote
it ignores that letter, if it appears as part of a word, rather than as a word unto itself, doesn't it?

Yes, that's the problem. It's too short to be a word on its own as far as search engines are generally concerned. The usual rule (certainly Sphinx and MySQL FULLTEXT default) is 3 letters.

But let me just drop another complexity into the mix: '

Is "doesn't" a word? Should it be searchable as such? If that's true, that would imply we should include the ' as part of a word, but then that screws up searching on forums that have code, e.g. $txt['string'] = 'string' - 'string' is not the same as string then...

And that's before you realise that SMF doesn't store the ' as a ' but as an HTML entity code (&#039;).
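
To make the apostrophe problem concrete, here is a rough Python sketch. It is not SMF's tokenizer: the `tokenize` function and both regular expressions are made up for illustration, and it assumes the stored form of ' is the HTML entity &#039;. It shows that keeping ' inside words makes "doesn't" searchable but turns the code string into 'string' with its quotes attached.

```python
import html
import re

# Hypothetical tokenizers, not SMF code: one treats ' as part of a word, one does not.
WORD_WITH_APOSTROPHE = re.compile(r"[A-Za-z0-9']+")
WORD_WITHOUT_APOSTROPHE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text, keep_apostrophe):
    """Decode HTML entities, then split the text into lowercase tokens."""
    decoded = html.unescape(text)  # turns &#039; back into '
    pattern = WORD_WITH_APOSTROPHE if keep_apostrophe else WORD_WITHOUT_APOSTROPHE
    return [token.lower() for token in pattern.findall(decoded)]


# Both samples use the entity form assumed above.
prose = "It doesn&#039;t work"
code = "$txt[&#039;string&#039;] = &#039;string&#039;"

print(tokenize(prose, keep_apostrophe=True))   # "doesn't" stays one word
print(tokenize(prose, keep_apostrophe=False))  # "doesn't" splits in two
print(tokenize(code, keep_apostrophe=True))    # quotes stick to the word
print(tokenize(code, keep_apostrophe=False))   # plain "string"
```

Either choice breaks one of the two cases, which is the trade-off being described.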

kat

  • Guest
Re: 2 character minimum in search
« Reply #26 on: June 25, 2012, 09:20:38 AM »
I'd've imagined that a search motor would see the word "Doesn't", realise that it's a contraction of "Does not" and say "OK, I'll search both, in-situ, and put up the relevant results, along with a bit of pr0n, coz that's what they really wanna see".

At least, I'd want it to display something, rather than saying "Can't do that. Single letter words don't exist, sucker, so go do it all, again, so I can spit-out another error, just to piss you off".

Offline Arantor

Re: 2 character minimum in search
« Reply #27 on: June 25, 2012, 09:24:21 AM »
Oh no, no, no. Most search engines just are not that clever and would be thoroughly confused with such.

Search theory is still very young.

kat

  • Guest
Re: 2 character minimum in search
« Reply #28 on: June 25, 2012, 09:33:46 AM »
Sounds like we need a good coder to sort this one out, then... ;)

Maybe, if we can refine it and get it to work well, we could flog it to Google. ;)

Offline emanuele

Re: 2 character minimum in search
« Reply #29 on: June 25, 2012, 09:41:56 AM »
Quote
What is highlighted is not what is searched ;)

Right, that's what I saw when I clicked on the link I posted in my previous post:
Quote
K@ have you tried using the double quotes around the exact phrase?
http://www.simplemachines.org/community/index.php?action=search2;search="A brown kitten"

that would be the same as typing:
Code: [Select]
"A brown kitten"
in the search box.
That is in fact the "exact phrase" thing you were asking about.



kat

  • Guest
Re: 2 character minimum in search
« Reply #30 on: June 25, 2012, 09:49:40 AM »
Yeah, I know. The major engines do it that way.

What I'm saying is: "Could it be done so that ONLY that exact phrase would appear, and nothing else at all?"

Seems not, as things stand. :(

Offline Arantor

Re: 2 character minimum in search
« Reply #31 on: June 25, 2012, 09:51:37 AM »
Quote
That is in fact the "exact phrase" thing you were asking about.

Yes, but it still disregards the 'a' part of that. It still only searches on 'brown kitten', even if it highlights 'a brown kitten', and after thousands of support posts with Sphinx, I learned the hard way that having exact matching on single character words is actually a recipe for disaster.

Offline emanuele

Re: 2 character minimum in search
« Reply #32 on: June 25, 2012, 10:06:26 AM »
Let's go back for a second to the original request:
Quote
as an admin and a regular user on several SMF's this annoys me when i have to omit characters like "a" and numbers and such when they are critical to finding the posts and or threads

He is not annoyed by the fact that "a" is not searched; he is annoyed by the fact that he gets the error "Each word must be at least two characters long".

So, even silently dropping the single-character "words" and searching on the remaining ones would probably be an acceptable solution for him. Of course, searching for
Code: [Select]
a e i o u
would return an error too, but that would probably be less irritating.
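
A minimal Python sketch of that idea, just to pin it down. This is not SMF's code: `drop_short_words`, `NoSearchableWords` and the `MIN_WORD_LENGTH` constant are hypothetical names, and the threshold of 2 is only assumed from the error message quoted above.

```python
MIN_WORD_LENGTH = 2  # assumed to mirror the "at least two characters" rule


class NoSearchableWords(ValueError):
    """Raised when nothing is left to search for after filtering."""


def drop_short_words(query, min_length=MIN_WORD_LENGTH):
    """Return the words worth searching, silently skipping the short ones."""
    kept = [word for word in query.split() if len(word) >= min_length]
    if not kept:
        # Only complain when the whole query was made of short words.
        raise NoSearchableWords(f"no word of at least {min_length} characters")
    return kept


print(drop_short_words("a brown kitten"))

try:
    drop_short_words("a e i o u")
except NoSearchableWords as error:
    print("error:", error)
```

The error then only fires when every word is too short, instead of on the first short word.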



Offline Arantor

Re: 2 character minimum in search
« Reply #33 on: June 25, 2012, 10:11:13 AM »
Therein lies the question: would it really be less irritating?

Would it really be less irritating to search for something you *know* is there and find that it can't find it?

Offline emanuele

Re: 2 character minimum in search
« Reply #34 on: June 25, 2012, 11:08:56 AM »
Disclaimer:
1) I know nothing about searches; all I'm saying is just based on the material in this topic. ;)
2) it's too hot here and my brain is crashing every two seconds... ::)

Right now we are forcing the user to remove what SMF cannot use in a search; silently dropping it would allow him to at least get a result.
Is it the result he was expecting? Well, considering how little an "a" matters in a search, the probability that it is seems rather high, I think. Additionally, if he wants to do the search he has to remove the "a" anyway, and he will only get the result SMF is able to provide him. Nothing more, nothing less.
Is that the result he was expecting? It will surely be the same result he would get with SMF silently dropping the single characters, I imagine.
Instead, if he encloses the search string in double quotes, SMF searches for it without telling the user anything (so from his perspective it is searching for the entire string).

So, given that the presence or absence of the "a" doesn't make any difference, I'd expect the search to return the same result either way (of course, my experience here is basically nil).

So, instead of saying "before being able to search anything you have to remove all the things SMF doesn't want (I know it's not that, but from the user perspective it is) to search", wouldn't it be much friendlier to say: "We have searched without considering the "a" because it would be meaningless to search for such a piece of information anyway"? (That, AFAIR, is what Google used to do a long time ago, but it has since stopped.)

It would be one less click and one less edit for the user searching for something (an improvement, I'd say), and the result would be the same anyway.

Am I completely off track?



Offline Arantor

Re: 2 character minimum in search
« Reply #35 on: June 25, 2012, 11:17:28 AM »
Quote
Right now we are forcing the user to remove what SMF cannot use in a search; silently dropping it would allow him to at least get a result.

You get a result, maybe. It probably won't be the result expected, but that's dependent on a lot of things like search method too.

Quote
Additionally, if he wants to do the search he has to remove the "a" anyway, and he will only get the result SMF is able to provide him

Again it depends on the search method.

Quote
Am I completely off track?

I don't think you're completely off track but at the same time I'm not convinced that it's actually meaningful to silently allow that search too. Especially as the exact behaviour is dependent on the search engine used, which will not be consistent.

Here's the thing. Certain engines will treat that search string differently. Some will silently ignore the 'a' in 'a brown kitten' and parse it as 'brown kitten', even for phrase matching. Some will attempt to match it as 'a brown kitten' but the index won't have the 'a' in it (certain versions of MySQL FULLTEXT) and so will just return nothing at all.

Some will treat it as a literal and try to match it literally, but if there happened to be an extra space in there, all bets are off anyway when phrase matching because of everything else going on.
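
A toy Python model of those three behaviours, purely for illustration. Nothing here is real Sphinx or MySQL code: the function names, the sample post and the 3-character `MIN_LEN` threshold are all assumptions made for this sketch.

```python
MIN_LEN = 3  # assumed minimum indexed word length, purely for this sketch

POST = "I saw a brown kitten today"


def index_words(text):
    """Words an engine would index: short words never make it in."""
    return [w for w in text.lower().split() if len(w) >= MIN_LEN]


def contains_run(words, phrase_words):
    """True if phrase_words occurs as a consecutive run inside words."""
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words for i in range(len(words) - n + 1))


def engine_drops_short(phrase, text):
    """Style 1: drop short query words too, then phrase-match the rest."""
    return contains_run(index_words(text), index_words(phrase))


def engine_keeps_short(phrase, text):
    """Style 2: keep 'a' in the query although the index never stored it."""
    return contains_run(index_words(text), phrase.lower().split())


def engine_literal(phrase, text):
    """Style 3: plain substring match on the raw text."""
    return phrase.lower() in text.lower()


print(engine_drops_short("a brown kitten", POST))             # finds the post
print(engine_keeps_short("a brown kitten", POST))             # finds nothing
print(engine_literal("a brown kitten", POST))                 # finds the post
print(engine_literal("a brown kitten", "a brown  kitten"))    # extra space: no match
```

Same query, same post, three different answers, which is why the front end can't promise anything consistent.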

I honestly believe that if you change the behaviour of the front end to silently accept one-character searches, you're unfairly giving an expectation that it'll be matched in phrases etc. when it won't.

The correct compromise as far as I'm concerned would be to drop the single character for searching purposes, and tell the user that's what you've done, as you've outlined. However, that raises its own problems with respect to hitting the server unnecessarily hard because in all likelihood the user will then proceed to reword their query anyway - and as it happens in neither case would the OP actually get the result he wants...

Offline emanuele

Re: 2 character minimum in search
« Reply #36 on: June 25, 2012, 11:30:46 AM »
Quote
in neither case would the OP actually get the result he wants...

That's life... :P



MrPhil

  • Guest
Re: 2 character minimum in search
« Reply #37 on: June 25, 2012, 11:55:09 AM »
Something interesting...
  • a brown kitten gives the error message that words must be at least two characters
  • "a brown kitten" matches the letter 'a' anywhere, plus 'brown', plus 'kitten' (the phrase brown kitten is matched, even if it doesn't have an a in front)
  • "i brown kitten" matches the letter 'i' anywhere, plus 'brown', but does not match 'kitten'
  • "q brown kitten" matches the letter 'q' anywhere, plus 'brown' and 'kitten'
I'm guessing that in #3, since the 'i' in kitten was already matched, that word is 'taken' and the 'kitten' pattern is not applied against it.

It makes sense to ignore noise words (shorter than some minimum), but it would be nice to simply tell the user that short words are being ignored, rather than giving them a dope slap and telling them to try again. Using a quoted phrase, my expectation would be not that it be broken up into a list of words and individually matched (including the 'a'), but that the entire phrase be matched. Why else would I add quotation marks, except to indicate that I want the whole thing taken as an intact unit? BTW, that's how Google does it. (They do exclude punctuation within the target text, so possibly internally they are splitting it up, but they only return cases where the subterms are all there, adjacent, and in the correct order.) I would say that SMF's search is broken because it breaks up a quoted phrase into individual words (including short words) and matches them individually, rather than checking whether they are all present in the correct order.
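
To show the difference between the two kinds of matching, here is a small Python sketch using a positional word index. It is only a guess at the general technique, not how SMF, Sphinx or Google actually do it, and every function name in it is hypothetical.

```python
import re
from collections import defaultdict


def build_positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    positions = defaultdict(list)
    # Punctuation is skipped, so "brown, kitten" still counts as adjacent.
    for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions[word].append(position)
    return positions


def matches_all_words(query, text):
    """Loose match: every query word occurs somewhere, in any order."""
    positions = build_positions(text)
    return all(word in positions for word in re.findall(r"[a-z0-9]+", query.lower()))


def matches_phrase(query, text):
    """Strict match: the query words occur adjacent and in the given order."""
    positions = build_positions(text)
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return False
    # Try every occurrence of the first word as a possible phrase start.
    return any(
        all(start + offset in positions.get(word, []) for offset, word in enumerate(words))
        for start in positions.get(words[0], [])
    )


scattered = "The kitten chased a brown leaf"
exact = "She adopted a brown kitten."

print(matches_all_words("a brown kitten", scattered), matches_phrase("a brown kitten", scattered))
print(matches_all_words("a brown kitten", exact), matches_phrase("a brown kitten", exact))
```

The first text contains all three words but not the phrase, so only the loose match accepts it; that is the behaviour I'd call broken for a quoted search.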

Offline Arantor

Re: 2 character minimum in search
« Reply #38 on: June 25, 2012, 11:57:52 AM »
How, exactly, did you validate the above? Did you do it here or on a base SMF installation?

Here uses Sphinx, with some slightly atypical configuration therein.

MrPhil

  • Guest
Re: 2 character minimum in search
« Reply #39 on: June 25, 2012, 12:19:56 PM »
Here, in this very topic.