Search results bug

Started by ashish101, September 26, 2014, 01:56:45 AM


ashish101

I just discovered that when searching a particular string the search results are broken and show weird results - http://www.simplemachines.org/community/index.php?action=search2&search=j.s.%20sc

As far as I understand, it's happening because of Sphinx + SMF, not with a normal installation of SMF. Just wanted to report it; I searched this board before posting and didn't see any thread on this.

Arantor

What are you calling broken exactly? I don't see where it's 'broken'.

margarett

I'm guessing it's because it shows &nbsp; in the search results...
If you're going to drive, don't drink. If you're going to drink... CALL ME!!!! :D

Quote
Over 90% of all computer problems can be traced back to the interface between the keyboard and the chair

Arantor

I'm guessing they're supposed to do that seeing how the preparser replaces pairs of spaces with space+nbsp...
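A minimal sketch of the space handling described above, in Python. The function name preparse_spaces is hypothetical and this is not SMF's actual preparser code; it only assumes the rule as stated, that each pair of spaces becomes a space followed by &nbsp; so the browser doesn't collapse the run of whitespace.

```python
def preparse_spaces(text: str) -> str:
    """Hypothetical sketch: replace each pair of spaces with ' &nbsp;'.

    Assumes the rule described in the post; str.replace works left to
    right on non-overlapping pairs, so four spaces become two pairs.
    """
    return text.replace("  ", " &nbsp;")


print(preparse_spaces("a  b"))    # a &nbsp;b
print(preparse_spaces("a    b"))  # a &nbsp; &nbsp;b
```

The point is that a post typed with double spaces ends up stored with &nbsp; entities in its body, which is what the search highlighter later runs over.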

ashish101

Quote from: Arantor on September 26, 2014, 06:20:33 AM
I'm guessing they're supposed to do that seeing how the preparser replaces pairs of spaces with space+nbsp...
Yeah, but why would &nbsp; appear for a "j.s. sc" query?

Arantor

Because search indexes are a serious pain in the proverbial to get correct.

Here's what happens.

First, it works out what the 'words' are, since any search index works on words, not characters.
-> A word is defined as a sequence of characters that can legitimately be part of a word. A full stop can't.
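To make the word-splitting step concrete, here is a rough Python sketch. The rule is an assumption for illustration, not SMF's or Sphinx's real tokenizer: a word is a run of letters or digits, and everything else (full stops, spaces, underscores) acts as a separator. The underscore rule is inferred from the sc_stuff example in the thread.

```python
import re

# Assumed rule: a word is a run of letters/digits. [^\W_] means "a word
# character that is not an underscore", so '.', ' ' and '_' all split words.
WORD_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased index words (hypothetical rule)."""
    return WORD_RE.findall(text.lower())


print(tokenize("j.s. sc"))   # ['j', 's', 'sc']
print(tokenize("sc_stuff"))  # ['sc', 'stuff']
```

So the query "j.s. sc" never reaches the index as a single phrase; it becomes three separate short words.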

So what it's actually doing is searching for 'j', 's' and 'sc' as separate words. Your first result there contains sc_stuff (and similar), which gets split up the same way, and that's what it matched on.
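A hedged sketch of why a post containing sc_stuff comes back for "j.s. sc": both the query and the post are reduced to words under the same assumed rule, and the post shares the word 'sc'. The helper matched_words is hypothetical; real SMF/Sphinx matching (minimum word length, all-words vs any-word) is not modelled here.

```python
import re

# Same assumed word rule as before: letters/digits only, '_' and '.' split.
WORD_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    return WORD_RE.findall(text.lower())


def matched_words(query: str, post: str) -> list[str]:
    """Return the query words that also occur as whole words in the post."""
    post_words = set(tokenize(post))
    return [w for w in tokenize(query) if w in post_words]


print(matched_words("j.s. sc", "see sc_stuff for details"))  # ['sc']
```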

The highlighting step is a very different process: it just looks for the same words and matches them even where they wouldn't otherwise be considered 'words'. nbsp is internally treated as a word, but for *highlighting* purposes it isn't, so it just gets highlighted regardless.
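A rough Python sketch of how plain substring highlighting can produce a visible "&nbsp;". This is an illustration under assumptions, not SMF's actual highlight code, and both function names are hypothetical. If the highlighter wraps every occurrence of 's', it also wraps the 's' inside the &nbsp; entity; the result, &nb<strong>s</strong>p;, is no longer a valid entity, so the browser shows it as literal text. The second function is one possible alternative that steps over tags and entities; as noted below, handling it differently may bring its own oddities.

```python
import re


def _words_pattern(words: list[str]) -> str:
    # Longest words first, so "sc" wins over "s" at the same position.
    return "|".join(sorted((re.escape(w) for w in words), key=len, reverse=True))


def highlight_naive(html: str, words: list[str]) -> str:
    """Wrap every occurrence of any search word, even inside entities.

    One pass with an alternation, so inserted markup is never re-scanned.
    """
    pattern = re.compile(_words_pattern(words), re.IGNORECASE)
    return pattern.sub(lambda m: "<strong>" + m.group(0) + "</strong>", html)


def highlight_entity_safe(html: str, words: list[str]) -> str:
    """Same idea, but HTML tags and entities such as &nbsp; are left alone."""
    pattern = re.compile(
        r"(<[^>]*>|&#?\w+;)|(?:" + _words_pattern(words) + ")", re.IGNORECASE
    )

    def repl(m: re.Match) -> str:
        if m.group(1):  # matched a tag or entity: copy it through unchanged
            return m.group(1)
        return "<strong>" + m.group(0) + "</strong>"

    return pattern.sub(repl, html)


post = "j.s. &nbsp;sc"
print(highlight_naive(post, ["j", "s", "sc"]))
print(highlight_entity_safe(post, ["j", "s", "sc"]))
```

The first call yields ...&nb<strong>s</strong>p;... (the broken entity the thread is about); the second keeps &nbsp; intact.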

This is not a bug. It's not ideal behaviour, but it is working as originally designed. It could be handled differently, but doing so leads to other strangeness.
