Search issue - doesn't search my name!

Started by Adish - (F.L.A.M.E.R), December 06, 2010, 01:54:41 PM


Adish - (F.L.A.M.E.R)

If you search my name via the search function:
F.L.A.M.E.R
"F.L.A.M.E.R"
(F.L.A.M.E.R)
"(F.L.A.M.E.R)"

no results are returned. I haven't had a chance to test this on a clean 2.x or even 1.x install, but searching this forum (sm.org), where my name appears in many boards, returns no results.

I'll test it if I can get hold of a computer anytime soon and post the results, unless someone else tests it before me.

Illori

It seems the search function also returns no results when searching for something like i. which could be the issue (F.L.A.M.E.R) is seeing.

Adish - (F.L.A.M.E.R)

I scared the search engine... Boo! Muahaha :P

Arantor

Before we can begin to debug this, let's take stock of a few things.

Firstly, I can guarantee your local install will perform differently to here - this server uses Sphinx. Sphinx tokenises words (building hashes out of word units) and IIRC (though SleePy can confirm), the . character is not considered a word character, so it's only interested in the individual letters. And, again IIRC, the minimum word length is 3 (certainly that's the Sphinx default), which means it won't even have indexed your name.
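
To make the effect concrete, here is a minimal Python sketch of the indexing behaviour described above. It is not Sphinx's actual tokeniser: the function name `index_tokens`, the ASCII-only word-character class, and the minimum length of 3 are assumptions taken from this description only.

```python
import re

# Assumption: only ASCII letters, digits and underscore count as word characters.
WORD_CHARS = re.compile(r"[0-9A-Za-z_]+")

def index_tokens(text, min_word_len=3):
    """Split text into word units and drop those shorter than min_word_len."""
    words = WORD_CHARS.findall(text)
    return [w.lower() for w in words if len(w) >= min_word_len]

print(index_tokens("Adish - (F.L.A.M.E.R)"))  # ['adish']
print(index_tokens("F.L.A.M.E.R"))            # []
```

Because the dots act as separators, every piece of the name is a single letter, and none of them survives the length filter.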

Same deal for & *here*. Your mileage will vary in other systems seeing how there are 3 different ways that SMF will perform a search depending on backend, and what characters are being stripped first.
Holder of controversial views, all of which my own.


Aleksi "Lex" Kilpinen

The search on sm.org has never been any good with special characters. Stick to the basic alphabet and you're golden though ;)
Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

Arantor

It's configurable though: as part of the Sphinx configuration, you give it a list of characters it should treat as parts of words. The trouble is that if you add something like ; to the list, it becomes part of a word, so in $variable = $othervariable; the ; is part of the 'word' othervariable.
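
A rough Python sketch of that trade-off, assuming a toy tokeniser rather than Sphinx itself (the function `tokenise` and its `extra_word_chars` parameter are hypothetical names used only for illustration):

```python
import re

def tokenise(text, extra_word_chars=""):
    """Split text into word units; extra_word_chars are treated as parts of words."""
    # Build a character class from the base set plus any extra characters.
    char_class = "0-9A-Za-z_" + re.escape(extra_word_chars)
    return re.findall("[" + char_class + "]+", text)

code = "$variable = $othervariable;"
print(tokenise(code))                        # ['variable', 'othervariable']
print(tokenise(code, extra_word_chars=";"))  # ['variable', 'othervariable;']
```

With ; in the list, a search for othervariable would no longer match the token stored for that line, since the stored token carries the trailing ;.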


Joshua Dickerson

Come work with me at Promenade Group



Need help? See the wiki. Want to help SMF? See the wiki!

Did you know you can help develop SMF? See us on Github.

How have you bettered the world today?

Arantor

Quote
Common English punctuation shouldn't be part of the word.

Correct, and that's why it ignores the . in F.L.A.M.E.R. But on a technical forum, wouldn't you possibly want to consider punctuation as well?

In any case, you can't tell Sphinx to differentiate between when it should and when it shouldn't treat characters as punctuation. They're either part of what Sphinx considers a word unit or they're not, simple as that.


SlammedDime

The Sphinx Search API we use here strips all non-alphanumeric characters from the search string, replaces them with spaces, and then searches on that. In the case of your name, "(F.L.A.M.E.R)" becomes "F L A M E R", and since our minimum word length is set to 3 in the Sphinx config file, your name will never show up when searched for...
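
The query-side step can be sketched in Python as follows. This is only an approximation of what is described above, not the actual API code: the function names are hypothetical and the pattern is ASCII-only for brevity.

```python
import re

def sanitise_query(raw):
    """Replace every non-alphanumeric character with a space, then tidy whitespace."""
    # Assumption: ASCII-only pattern, chosen for brevity.
    spaced = re.sub(r"[^0-9A-Za-z]", " ", raw)
    return " ".join(spaced.split())

def searchable_terms(raw, min_word_len=3):
    """Keep only the terms long enough to have been indexed."""
    return [t for t in sanitise_query(raw).split() if len(t) >= min_word_len]

print(sanitise_query('"(F.L.A.M.E.R)"'))    # F L A M E R
print(searchable_terms('"(F.L.A.M.E.R)"'))  # []
```

The quotes and brackets make no difference here, because they are removed before the query ever reaches the search daemon.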

Change your name, it's better that way for everyone... :P
SlammedDime
Former Lead Customizer
BitBucket Projects
GeekStorage.com Hosting
                      My Mods
SimpleSEF
Ajax Quick Reply
Sitemap
more...
                     

Arantor

...I thought I already explained this a few posts up? (though I won't deny you explained it a little more simply than I did... and I think you'll find certain characters are still indexed, that it's not just strictly alphanumeric)


SlammedDime

Quote from: Arantor on January 28, 2011, 05:29:21 PM
...I thought I already explained this a few posts up? (though I won't deny you explained it a little more simply than I did... and I think you'll find certain characters are still indexed, that it's not just strictly alphanumeric)
Your post related to how Sphinx indexes the words; I was pointing to how the search string is 'reformatted' before being sent to Sphinx. Here on SM.org, it doesn't matter if you try searching for "!#)*#(#HI!@#(*(*#$" or "...HI...", Sphinx will be sent "HI" either way.

Arantor

Funny, it certainly used to bother it which is why there were all the 'unable to access the search daemon' issues which were Sphinx-side, not SMF side...


SlammedDime

When I rewrote the API recently to use SphinxQL, I modified how search strings were formed.  Instead of relying on what SMF was passing in to the API, I took the raw search string and cut out all the non-alphanumerics and used that instead (you have access to big boards, take a gander at the file).

Arantor

Oh, so it's using the new API here? Interesting.

What's more interesting is that your API is nice enough to pass true alphanumeric characters to Sphinx, looking at the regexp you put in, but that the default configuration file that's built just discards anything that isn't 0..9, A..Z, a..z or _. (I know the configuration here has a much more thorough charset_table though)


SlammedDime

Yeah, our charset_table is very verbose here, meant to encompass as many UTF-8 characters as possible (which I had to take into account when doing the regex). The default config file shouldn't be construed as an 'end all' solution though; it forms the basis to get Sphinx up and running and operating with SMF. As with anything that serves a wide range of audiences, some fine tuning is necessary (although I'm sure SM.org could share the finer details of its sphinx.conf and maybe we could create a more robust solution).

Arantor

There's not really a lot of need to share, it's basically just a dump of all the characters listed on the Sphinx wiki for all the different regions.


vbgamer45

Not sure how big of an issue this is. It seems all major search engines have the same issue. I tried searching for his name, or any name with different punctuation, on Google and it fails as well.
Community Suite for SMF - Grow your forum with SMF, Gallery,Store,Classifieds,Downloads,more!

SMFHacks.com - Paid Modifications for SMF

Mods:
EzPortal - Portal System for SMF
SMF Gallery Pro
SMF Store SMF Classifieds Ad Seller Pro

Adish - (F.L.A.M.E.R)

When we use "F.L.A.M.E.R" or "(F.L.A.M.E.R)", shouldn't that mean the word being searched for must match exactly? Whereas in F.L.A.M.E.R or (F.L.A.M.E.R), the dots are treated as full stops?

Also, I think it is a better idea to treat F.L.A.M.E.R as FLAMER instead of F L A M E R, that is what all the search engines do.
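
As a rough illustration of that suggestion, the Python sketch below collapses dotted single-letter runs into one word before any tokenising happens. The pattern and the name `collapse_acronyms` are assumptions made for this example; to be useful, the same normalisation would have to be applied both when indexing posts and when preparing the search string.

```python
import re

# Single letters separated by dots, e.g. "F.L.A.M.E.R".
# The lookarounds stop it from firing inside longer words such as "www.x.y" or "a.bc".
DOTTED = re.compile(r"(?<![\w.])[A-Za-z](?:\.[A-Za-z])+(?!\w)")

def collapse_acronyms(text):
    """Remove the dots from dotted single-letter runs, leaving other text untouched."""
    return DOTTED.sub(lambda m: m.group(0).replace(".", ""), text)

print(collapse_acronyms("Adish - (F.L.A.M.E.R)"))  # Adish - (FLAMER)
```

After this step the name is a six-letter word, so a minimum word length of 3 no longer filters it out.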

As a note: we can search for (F.L.A.M.E.R) or F.L.A.M.E.R without getting a warning that the search needs at least 3 characters, yet it gives no results. So it does consider it in some other way...

The funny thing: Members search gives my name as the result for both (F.L.A.M.E.R) and F.L.A.M.E.R ;D ;D

coloradorockies

Your name so popular (About 4,530,000 results ) next time try more unique and you will success with search result.  :)
service 

SlammedDime

LOL... that just screams "I'm a bot, ban me now!!!!!"
