Rethinking search

Started by Arantor, April 09, 2022, 09:10:29 AM

Arantor

I think it's time to talk about rethinking search.

2.1 does sort of nibble at the edges of what should be done, but ultimately the whole thing needs a rework. The rules have changed since 2006, and the needs have changed too.


First up, getting rid of the custom index. The value of it is simply not what it once was. The lack of stemming support is also painful in usability, and the optimisation vs MySQL's own fulltext is no longer true in any case I can find.

Secondly, making it actually pluggable - bits of the code for the custom index and the fulltext index are intermashed throughout. Basically, it should be a proper API where the code simply says 'here's a new message, board 1, topic 2, message 3, its content is XYZ' and the search backend figures out how to put that into the index. Similarly the search frontend should then solely query the API.
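
To make the 'proper API' idea concrete, here is a minimal sketch of the shape it could take. It is written in Python purely for illustration (SMF itself is PHP), and every name in it (`Message`, `SearchBackend`, `InMemoryBackend`) is hypothetical rather than anything that exists in SMF. The toy backend stands in for whatever a real one (fulltext, ElasticSearch, Sphinx) would do.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical payload the forum hands to the search backend."""
    board_id: int
    topic_id: int
    message_id: int
    content: str


class SearchBackend(ABC):
    """Hypothetical interface; not SMF's actual search API."""

    @abstractmethod
    def index_message(self, message: Message) -> None: ...

    @abstractmethod
    def remove_message(self, message_id: int) -> None: ...

    @abstractmethod
    def search(self, query: str) -> list[int]: ...


class InMemoryBackend(SearchBackend):
    """Toy backend: maps each lowercased word to the message ids containing it."""

    def __init__(self) -> None:
        self._words: dict[str, set[int]] = {}

    def index_message(self, message: Message) -> None:
        # Re-indexing an edited message replaces its old entries.
        self.remove_message(message.message_id)
        for word in message.content.lower().split():
            self._words.setdefault(word, set()).add(message.message_id)

    def remove_message(self, message_id: int) -> None:
        for ids in self._words.values():
            ids.discard(message_id)

    def search(self, query: str) -> list[int]:
        words = query.lower().split()
        if not words:
            return []
        # A message matches only if it contains every query word.
        hits = set.intersection(*(self._words.get(w, set()) for w in words))
        return sorted(hits)
```

The point of the split is that the posting code only ever calls `index_message` / `remove_message` and the search page only ever calls `search`; neither needs to know which backend is configured.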

Thirdly, supporting other content types. The search index is currently forum only, but making it know about different kinds of content wouldn't be particularly hard to do. Define content types (such as message), plus a scope (topic) and a permission boundary (board), and you can build a system that can cope with searching all the things from all the places. LevGal would totally have used this if it had been an option.
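
As a rough illustration of the content type / scope / permission boundary idea, the sketch below (Python for illustration only, all names hypothetical, including the "gallery_item" type) tags each indexed item with those three things and filters results by what the user is allowed to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    content_type: str   # e.g. "message", or a mod-defined type such as "gallery_item"
    item_id: int
    scope_id: int       # grouping container, e.g. the topic
    boundary_id: int    # permission boundary, e.g. the board
    text: str


def visible_results(items, query, allowed_boundaries):
    """Return items containing `query` that the user is allowed to see.

    `allowed_boundaries` maps a content type to the set of boundary ids
    (boards, albums, ...) the current user can access for that type.
    """
    needle = query.lower()
    return [
        item for item in items
        if item.boundary_id in allowed_boundaries.get(item.content_type, set())
        and needle in item.text.lower()
    ]
```

A mod would register its own content type and supply its own boundary ids; the search code itself never needs to know what a board or an album is.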

Fourth, the search index shouldn't just accept the bare content - ever. This is surprisingly ineffective in SMF in general if you're not careful, and renders a lot of things very unhelpful. What it should do is parse the bbcode, remove things that are quotes of other messages (where they can be reliably identified as such), flatten out the preserved spacing and index the content that's left. This also means, incidentally, that there's no bootstrapping of the index on top of the messages table, so no fudging around of column type to support fulltext on older MySQL versions. (Side, side note: just make it a mediumtext already)
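
A minimal sketch of that cleanup step, assuming a regex pass is good enough for illustration (a real implementation would go through the forum's actual bbcode parser, and would need to decide what to do with things like code blocks). The function name `text_for_index` is hypothetical. It removes quote blocks innermost-first so nested quotes are handled, strips the remaining tags, and collapses whitespace.

```python
import re

# An innermost quote block: no further "[quote" between its open and close tags.
_INNER_QUOTE = re.compile(
    r"\[quote\b[^\]]*\](?:(?!\[quote\b).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)
# Any remaining bbcode-style tag, e.g. [b], [/b], [url=...].
_ANY_TAG = re.compile(r"\[/?[a-z][a-z0-9_]*(?:[= ][^\]]*)?\]", re.IGNORECASE)


def text_for_index(raw: str) -> str:
    """Return the text of a post with quotes, tags and extra spacing removed."""
    previous = None
    # Peel quote blocks from the inside out until nothing changes.
    while previous != raw:
        previous = raw
        raw = _INNER_QUOTE.sub(" ", raw)
    raw = _ANY_TAG.sub(" ", raw)
    # Flatten newlines and runs of spaces into single spaces.
    return " ".join(raw.split())
```

An unclosed quote tag is not treated as a quote here; its tag is stripped and its text is kept, which errs on the side of indexing too much rather than too little.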

Fifth, the search system should support its own index by default (with none of this 'no index' nonsense), plus ideally ElasticSearch and maybe Sphinx; ES won the war even if Sphinx still wins on performance, just because ES is easier to throw things into.

Last, make the index rebuild a background task.


Most people won't see a significant alteration to their life, other than the folks who never had a search index before now getting more space usage, and search improves for everyone. On top of that the folks who do the modding thing should be able to hook into the search system meaning that content just becomes more discoverable by design.

Other than the effort of implementation I see no downside to any of this.
No good deed goes unpunished / All helpful urges should be circumvented

I have something to say: it's better to burn out than to fade away. There can be only one.

spiros

I once did make a recommendation on ES (used by Xenforo among others). You replied there too!

I would also like to see something like search autocomplete out of the box (for example with a separate index just for topic subjects). It is the way search is done nowadays. Bugo has written a nice mod that provides similar-topics autocomplete when creating new topics; this could be enhanced to run on standard search as well as to support Sphinx.

Arantor

Search autocomplete is actually massively hard to do. It only works well against certain kinds of corpus, and unless you're doing it just against titles (which isn't particularly useful - not even in 'similar topics' contexts), it's hard to do *well*.

The only reason Google does it even passably well is because they're basing it fairly heavily on what people search for, and can use the volume of search queries to improve that index - something the rest of us simply won't have.

spiros

The idea is using it just against titles :) I think it is still much better and faster at retrieving certain kinds of information than not having it there.

Arantor

I think you'd want that to be a configurable option. There are use cases this would work really well in - I think your site would *definitely* benefit. But on this site, for example, it would actually be counterproductive.

spiros

Well, certainly an option. Just imagine it this way: you get 5-10 suggestions based on your search, if you see something that is immediately helpful you go directly to that topic, failing that, you just hit Enter to get the full search results. I see that as helpful without disrupting the traditional search workflow. It can actually save time.

To give you an example, if I want to search here how to increase subject length I may get a lot of irrelevant posts with the traditional search, with autocomplete (searching in topics only), I can find and access what I want much faster.
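
For illustration only, a title-only suggestion lookup along those lines could be as small as the sketch below (Python, hypothetical names, a plain dict standing in for the separate subjects index). It treats each typed word as a prefix and returns at most `limit` matching topics.

```python
def suggest_topics(subjects, typed, limit=10):
    """Suggest topics whose subject matches what has been typed so far.

    `subjects` maps topic id -> subject line. A topic matches when every
    typed word is a prefix of some word in its subject (case-insensitive).
    """
    prefixes = typed.lower().split()
    if not prefixes:
        return []
    matches = []
    for topic_id, subject in subjects.items():
        words = subject.lower().split()
        if all(any(w.startswith(p) for w in words) for p in prefixes):
            matches.append((topic_id, subject))
            if len(matches) == limit:
                break
    return matches
```

Hitting Enter would still run the normal full search; the suggestions are only a shortcut when one of the listed subjects is already what you wanted.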

Arantor

It *can*, but for most people it *doesn't tend to*; I've observed this in practice in other contexts. It relies too much on people putting in good titles, which in a lot of communities just isn't what people do.

spiros

Yes, you are bound to get a lot of "help with a..." topics. Still, I find it better than nothing since it is an extra functionality and not the whole functionality.

Sesquipedalian

Quote from: Arantor on April 09, 2022, 09:10:29 AM
I think it's time to talk about rethinking search.

[snip...]

All of these suggestions make sense to me.
Slava Ukraini!
Heroiam slava!

I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

spiros

Well, at the very least 2.1 does away with the dreaded "Each word in your search query must be at least two characters long..."
