Author Topic: Support Full-Text Search for Non-English Chars?  (Read 1251 times)

Offline Sefank

  • Newbie
  • *
  • Posts: 7
Support Full-Text Search for Non-English Chars?
« on: September 24, 2017, 12:52:58 AM »
It seems that SMF does not support full-text search for non-English characters such as Chinese, which leaves a Chinese-language forum unable to use search.
Are there any solutions?  :laugh:

Offline albertlast

  • Development Contributor
  • Jr. Member
  • *
  • Posts: 171
Re: Support Full-Text Search for Non-English Chars?
« Reply #1 on: September 24, 2017, 01:05:55 AM »
MySQL/PostgreSQL don't support a fulltext search index for Chinese/Japanese,
but they do provide support for other "non-English chars" like Russian or Arabic.

Offline Sefank

  • Newbie
  • *
  • Posts: 7
Re: Support Full-Text Search for Non-English Chars?
« Reply #2 on: September 24, 2017, 01:09:00 AM »
So is there any solution for Chinese/Japanese? Thanks :D

Offline albertlast

  • Development Contributor
  • Jr. Member
  • *
  • Posts: 171
Re: Support Full-Text Search for Non-English Chars?
« Reply #3 on: September 24, 2017, 02:16:22 AM »
Well, the easiest one would be not to "speak" any Asian language  ;D

If that's not possible, maybe try an external full-text search engine like Sphinx and connect it to your SMF:
https://github.com/SimpleMachines/sphinx-for-smf
but don't ask me how the mod works...

Offline Sefank

  • Newbie
  • *
  • Posts: 7
Re: Support Full-Text Search for Non-English Chars?
« Reply #4 on: December 09, 2017, 11:29:10 PM »
It seems that it only works on a VPS, and I'm only on shared (virtual) hosting. Maybe it can't be solved... :'(

Online shawnb61

  • Support Specialist
  • Full Member
  • *
  • Posts: 508
    • sbulen on GitHub
Re: Support Full-Text Search for Non-English Chars?
« Reply #5 on: December 09, 2017, 11:51:51 PM »
Bear in mind that the term "fulltext search" refers specifically to the search built into the DB engine (e.g., MySQL).

The problem is that most of these engines interpret "words" as having white-space between them.  Some languages, in particular Asian languages, don't have white-space between words...
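
To make that concrete, here is a rough Python sketch. It is not what SMF or MySQL actually runs, and the sample strings are made up for illustration. Splitting on white-space gives usable words for English but one unbroken token for Chinese:

```python
def whitespace_tokens(text):
    """Split text on white-space, the way a naive word-based indexer would."""
    return text.split()


english = "full text search works"
chinese = "全文搜索不起作用"  # hypothetical sample: no spaces between words

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)

print(english_tokens)  # four separate words
print(chinese_tokens)  # a single token holding the whole sentence
```

With only one giant token in the index, a search for a shorter term inside that sentence has nothing to match against.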

SMF also offers an alternate approach, a "custom index", but I am not sure how well that works for Asian languages.  I'm not sure how well Google works for them either.

Folks who are interested may like to know that recent versions of MySQL have a couple of enhancements that may help address these issues, the ngram parser & the MeCab parser:
   https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html
   https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html
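
As a rough idea of what an n-gram parser does, here is a simplified Python sketch. It is not MySQL's actual implementation: the token size of 2 follows MySQL's documented default for ngram_token_size, while the function name and sample text are made up for illustration. The text is cut into overlapping runs of n characters, so a search term can match without any white-space between words:

```python
def ngrams(text, n=2):
    """Return overlapping n-character tokens; tokens never span white-space."""
    tokens = []
    for chunk in text.split():
        for i in range(len(chunk) - n + 1):
            tokens.append(chunk[i:i + n])
    return tokens


post = "全文搜索"  # hypothetical forum text
index = set(ngrams(post))

# A query matches when every one of its n-grams is present in the index.
query_tokens = ngrams("搜索")
found = all(tok in index for tok in query_tokens)

print(ngrams(post))
print(found)
```

The trade-off is a larger index, and search terms shorter than n characters produce no tokens at all in this sketch.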

You may need to reach out to your host to see if these features may be enabled.  Then, testing will be needed to confirm compatibility with SMF and SMF's fulltext search option.   

If you enable ngram or MeCab, and have luck with SMF, please report back. 

Offline Sefank

  • Newbie
  • *
  • Posts: 7
Re: Support Full-Text Search for Non-English Chars?
« Reply #6 on: Yesterday at 12:20:38 AM »
My host only supports MySQL 5.5. :'(
Hoping someone will try these new features with 5.7.
Thanks! :D

Offline albertlast

  • Development Contributor
  • Jr. Member
  • *
  • Posts: 171
Re: Support Full-Text Search for Non-English Chars?
« Reply #7 on: Yesterday at 02:00:07 AM »
Just for completeness (so this information won't help you):
on the PostgreSQL side you need some kind of extension,
like
https://github.com/amutu/zhparser/
which could run directly with SMF once installed and set up,
or
https://github.com/pgroonga/pgroonga
which also needs a mod on the SMF side, because its query language differs from PostgreSQL's built-in fulltext search.