Support Full-Text Search for Non-English Chars?

Started by Sefank, September 24, 2017, 12:52:58 AM

Sefank

It seems that SMF does not support full-text search for non-English characters such as Chinese, which leaves a Chinese forum unable to use the search.
Are there any solutions?  :laugh:

albertlast

The default fulltext search index in MySQL/PostgreSQL doesn't support Chinese/Japanese,
but it does support other "non-English chars" such as Russian or Arabic.

Sefank

Quote from: albertlast on September 24, 2017, 01:05:55 AM
The default fulltext search index in MySQL/PostgreSQL doesn't support Chinese/Japanese,
but it does support other "non-English chars" such as Russian or Arabic.
So is there any solution for Chinese/Japanese? Thanks :D

albertlast

Well, the easiest one would be not to "speak" any Asian language  ;D

If that's not possible, maybe try an external full-text search engine like Sphinx and connect it to your SMF:
https://github.com/SimpleMachines/sphinx-for-smf
but don't ask me how the mod works...

Sefank

It seems that it only works on a VPS, and I'm just a shared-hosting user. Maybe it can't be solved... :'(

shawnb61

Bear in mind that the term "fulltext search" is a specific reference to DB-engine (e.g., MySQL) built-in search.

The problem is that most of these engines interpret "words" as having white-space between them.  Some languages, in particular Asian languages, don't have white-space between words...
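
To illustrate the point, here is a minimal Python sketch of what whitespace-based word splitting does to a Chinese sentence. It only mimics the general idea; the real DB parsers also handle punctuation, stopwords and minimum word length, and the function name and sample strings are made up for this example.

```python
def whitespace_tokens(text):
    """Split text into 'words' the way a purely whitespace-based indexer would."""
    return text.split()


# Sample strings are illustrative only.
english = "full text search works"
chinese = "全文搜索不起作用"  # no spaces between the words

print(whitespace_tokens(english))  # several separate, searchable words
print(whitespace_tokens(chinese))  # the whole sentence ends up as one "word"
```

Because the whole Chinese sentence is indexed as a single token, searching for a word inside it finds nothing.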

SMF also offers an alternate approach - a "custom index", but I am not sure how well that works for Asian languages.  I'm not sure how well Google works for them either.

Anyone interested may want to know that recent versions of MySQL have a couple of enhancements that may help address these issues, the ngram parser & the MeCab parser:
   https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html
   https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html

You may need to reach out to your host to see if these features can be enabled.  Then, testing will be needed to confirm compatibility with SMF and SMF's fulltext search option.

If you enable ngram or MeCab, and have luck with SMF, please report back. 
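
For anyone curious how an n-gram approach gets around the missing white-space, here is a rough Python sketch of the idea: the text is cut into overlapping runs of n characters (n = 2 here, which is the default `ngram_token_size` in MySQL), and a search term matches when all of its n-grams are in the index. This is not MySQL's implementation, and the function names and sample text are hypothetical.

```python
def ngram_tokens(text, n=2):
    """Return overlapping n-character tokens; whitespace is never part of a token."""
    tokens = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


def matches(document, query, n=2):
    """True if every n-gram of the query also occurs in the document."""
    query_tokens = ngram_tokens(query, n)
    if not query_tokens:  # query shorter than n characters: nothing to look up
        return False
    indexed = set(ngram_tokens(document, n))
    return all(token in indexed for token in query_tokens)


# Sample post text, for illustration only.
post = "论坛全文搜索功能"

print(ngram_tokens("全文搜索"))
print(matches(post, "搜索"))
print(matches(post, "数据"))
```

The trade-off is a larger index, and this simplified check ignores whether the n-grams are adjacent in the document.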

Sefank

My host only supports MySQL 5.5. :'(
Hoping someone will try these new features with 5.7.
Thanks! :D

albertlast

Just for completeness (so this information won't help you),
on the PostgreSQL side you need some kind of extension,
such as
https://github.com/amutu/zhparser/
which could run directly with SMF once installed and set up,
or
https://github.com/pgroonga/pgroonga
which also needs a mod on the SMF side, because its query syntax differs from the standard fulltext search built into PostgreSQL.