Simple Machines Community Forum

SMF Development => Bug Reports => Fixed or Bogus Bugs => Topic started by: ashish101 on September 26, 2014, 01:56:45 AM

Title: Search results bug
Post by: ashish101 on September 26, 2014, 01:56:45 AM
I just discovered that when searching for a particular string, the search results are broken and show weird results - http://www.simplemachines.org/community/index.php?action=search2&search=j.s.%20sc

As far as I understand, it's happening because of Sphinx + SMF, not with a normal installation of SMF. Just wanted to report it; I searched this board before posting and didn't see any thread on this.
Title: Re: Search results bug
Post by: Arantor on September 26, 2014, 05:14:56 AM
What are you calling broken exactly? I don't see where it's 'broken'.
Title: Re: Search results bug
Post by: margarett on September 26, 2014, 06:17:57 AM
I'm guessing it's because it shows &nbsp; in the search results...
Title: Re: Search results bug
Post by: Arantor on September 26, 2014, 06:20:33 AM
I'm guessing they're supposed to do that seeing how the preparser replaces pairs of spaces with space+nbsp...
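To illustrate the preparser behaviour described above, here is a minimal sketch, assuming each pair of spaces becomes a space followed by `&nbsp;` as stated in this post (the exact replacement SMF's preparser uses is not shown in this thread). The function name `preparse_spaces` is hypothetical.

```python
def preparse_spaces(text: str) -> str:
    """Hypothetical sketch: replace each pair of spaces with a space plus
    &nbsp; so runs of spaces survive HTML whitespace collapsing."""
    # str.replace works left to right on non-overlapping pairs,
    # so four spaces become two " &nbsp;" replacements.
    return text.replace("  ", " &nbsp;")
```

For example, `preparse_spaces("a    b")` gives `"a &nbsp; &nbsp;b"`, which is why stored posts can contain `&nbsp;` entities the author never typed.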
Title: Re: Search results bug
Post by: ashish101 on September 26, 2014, 06:48:19 AM
Quote from: Arantor on September 26, 2014, 06:20:33 AM
I'm guessing they're supposed to do that seeing how the preparser replaces pairs of spaces with space+nbsp...
Yeah, but why would &nbsp; appear for the "j.s. sc" query?
Title: Re: Search results bug
Post by: Arantor on September 26, 2014, 06:54:20 AM
Because search indexes are a serious pain in the proverbial to get correct.

Here's what happens.

First, it works out what the 'words' are, since any search index works on words, not characters.
-> A word is defined as a sequence of characters that are legitimately part of words. A full stop is not one of them.
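A rough sketch of that word-splitting step, assuming a "word" is any run of letters or digits and everything else (full stops, spaces, underscores, `&`, `;`) is a separator. Sphinx and SMF each have their own configurable rules, so the regex and the name `split_words` here are illustrative assumptions only.

```python
import re

# Assumption: letters and digits are word characters; anything else,
# including '.', '_', '&' and ';', separates words.
WORD_RE = re.compile(r"[^\W_]+")


def split_words(text: str) -> list[str]:
    """Split text into lowercase words the way a simple search index might."""
    return [w.lower() for w in WORD_RE.findall(text)]
```

Under these rules `split_words("j.s. sc")` yields `["j", "s", "sc"]`, and `split_words("a&nbsp;b")` yields `["a", "nbsp", "b"]`, so `nbsp` ends up as a word of its own.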

So, what it's actually doing is searching for a match against 'j', 's' and 'sc' as separate words. Your first result there contains sc_stuff (and similar), which gets split up the same way, and that's what it matched on.
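Continuing the same assumed tokenizer, this sketch shows which query words a post would match on. It only reports the overlap; how Sphinx actually combines and ranks the matches is not covered here, and `matched_words` is a hypothetical name.

```python
import re

# Same assumed word rule as before: letters and digits only.
WORD_RE = re.compile(r"[^\W_]+")


def split_words(text: str) -> list[str]:
    return [w.lower() for w in WORD_RE.findall(text)]


def matched_words(query: str, post: str) -> set[str]:
    """Return the query words that also appear as whole words in the post."""
    return set(split_words(query)) & set(split_words(post))
```

With this, `matched_words("j.s. sc", "see sc_stuff here")` returns `{"sc"}`: the post never contains "j.s. sc" as typed, but it still matches on the word `sc`.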

The highlighting step is a very different process: it just looks for the same words and matches them even inside text that wouldn't otherwise be considered a 'word'. nbsp is internally treated as a word, but for *highlighting* purposes it isn't, so matches inside it get highlighted regardless.
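One plausible way the visible `&nbsp;` arises is sketched below, assuming the highlighter does a plain case-insensitive substring replace on the rendered HTML with no awareness of word boundaries or HTML entities. This is not SMF's actual highlighting code, and `naive_highlight` is a hypothetical name.

```python
import re


def naive_highlight(html: str, query_words: list[str]) -> str:
    """Wrap every occurrence of any query word in <strong> tags.

    Longer words are tried first so 'sc' wins over 's' at the same position.
    There is deliberately no word-boundary or entity handling.
    """
    ordered = sorted(query_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"<strong>{m.group(0)}</strong>", html)
```

Here `naive_highlight("a &nbsp;b", ["j", "s", "sc"])` produces `"a &nb<strong>s</strong>p;b"`. The entity is split by the tags, so the browser no longer decodes it and shows the literal text `&nbsp;`.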

This is not a bug. It is not ideal behaviour, but it is working as originally designed. It could be handled differently, but doing so leads to other strangeness.