News:

SMF 2.1.4 has been released! Take it for a spin! Read more.

Main Menu

Search index creation woes

Started by Roph, February 13, 2008, 08:11:37 PM

Previous topic - Next topic

Roph

As my forum has continued to grow and search performance continued to degrade, today I thought I'd recreate the (custom) search index, but this time on the large setting instead of moderate.

Removing the custom index worked like a charm, and I went though the processes of starting to create a large index.

As those of you with larger forums may know, this takes absolutely ages and is very boring :P, so I left it going and went off for a while. When I came back, I was met with the admin login screen. Maybe my session timed out or something? No worries anyway, I know SMF lets you resume, so I went back to Admin -> Search -> Search Method and clicked resume.

All I get (the same result on many tries) is just an offer to download index.php though  ::).

From here, I thought I'd try removing how far it'd gone in creating it and starting over. Removing it seemed to work ok, but now I'm at my problem position:



This is what I'm greeted with now. Clicking on [remove custom index] does nothing except basically refresh the page. The option still stays there. Chosing [resume] just offers me a download of index.php, every time without fail.

Um,  :'(

Bigguy

If your asked to download an index.php file  that is a server side issue and your host should probably be contacted to see what is going on.

Roph

#2
I'm the host, pretty much. I run my own VPS.

Sources/ManageSearch.php is a little cryptic under examination  :(

There is no "smf_log_search_words" table in my database.


Pretty much, I just want to restart creating the index, but SMF refuses to think that there isn't one, and will only let me either try to delete or resume a non-existant index (both of which obviously fail). Any help?  :'(


Ben_S

Empty any log_search tables and see what happens.
Liverpool FC Forum with 14 million+ posts.

Roph

Tried both emptying and then dropping the log_search tables, got the same result as is in the first post each time. :(

Sarge

Go to Admin > Forum Maintenance and run the first 5 operations.


    Please do not PM me with support requests unless I invite you to.

http://www.zeriyt.com/   ~   http://www.galeriashqiptare.net/


Quote
<H> I had zero posts when I started posting

Roph

Done, still exactly the same  :(

Incase needed or relevant, phpinfo / status.

Roph

Bump?  :'(

I'm currently running on no index at all, and searches not only consume enormous amounts of my VPS' resources, they take over a minute for some queries  :(

Rumbaar

Have you checked to see if the Table is actually stored as Type = MyISAM for those particular tables?
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

Every single table in the DB is in MyISAM :(

Rumbaar

I'm not sure what is refers too, but what is the body field in the smf_messages type set too?  (TEXT?)
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

#11
In the structure of the messages table under body it's set to "MEDIUMTEXT" for type.

Should I try changing it to TEXT?

[edit] I backed up incase anything went wrong, set the field to TEXT and now SMF let me generate a fulltext index, which is working perfectly =)

Rumbaar

#12
Yes, well you've solved it.

Just some additional info for you. TEXT type has a maximum character set of 65535 bytes, were as MEDIUMTEXT has something like 4 million bytes.  Hence the error you got.  It was either of the two, but as the standard for the BODY is TEXT I didn't think that first and most people don't change that.

The problem you might find is that if any posts were above 65535 bytes they will now have been truncated to that maximum.

EDIT: Correct incorrect information.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Sarge

Quote from: Rumbaar on February 29, 2008, 02:00:19 AM
Yes, well you've solved it.

Just some additional info for you. TEXT type has a maximum character set of 65535 characters, were as MEDIUMTEXT has something like 4 million characters.  Hence the error you got.  It was either of the two, but as the standard for the BODY is TEXT I didn't think that first and most people don't change that.

The problem you might find is that if any posts were above 65535 character they will now have been truncated to that maximum.

Actually, I think the limits are in bytes. Some character sets (UTF-8 for example) can use more than 1 byte per character, depending on the character, and that would further decrease the limit of characters in a post.

However, it should be pretty safe to say that English texts (excluding non-standard characters like "smart quotes", em—dash etc.) take up 1 byte per character.

    Please do not PM me with support requests unless I invite you to.

http://www.zeriyt.com/   ~   http://www.galeriashqiptare.net/


Quote
<H> I had zero posts when I started posting

Rumbaar

Sorry about that, yes it's bytes and not characters, and yes utf8 characters can take up to three bytes depending on the language.

Thx Sarge.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

My maximum allowed post size is set at 40,000 characters, so I think I'm safe in that regard. Thanks for the help & Info =)

Roph

Ok, here's an update. I just had to use a backup from the 27th, due to a severe table crash of the messages table today(29th). Attempts at repairing failed and actually brought down my entire VPS.

I'm running fine now on the 2 days old backup, with the field set back on "MEDIUMTEXT", and I hope you'll forgive me for not being too keen on trying out TEXT again.

I'd still like to just use the custom index option. My old custom index (on the moderate setting) performed acceptably I suppose, I was just hoping to shoot for better performance with a large index, though the creation failed. Read my first post again I guess. I'd like to just re-start creating a custom index. :(

Rumbaar

Well the BODY field in a default install of SMF is set to TEXT, so I don't see how that was the cause of your issue.  Do you know how and why it was changed to MEDIUMTEXT in your install?

With the field set higher than 65535 (which is MEDIUMTEXT) I don't know how you're going to get past that error message and how you were able to create one in the past.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

The database has quite a story to tell - Invision powerboard originally, then phpBB2, then SMF 1.0 and finally SMF 1.1. It's been around the block, so to speak :P

I've never edited anything in the db "raw" before, so I can only assume it's a side effect of one of the conversions.

Is a custom index limited by TEXT/MEDIUMTEXT also then? Up until I removed it to attempt to create the large one, my moderate sized custom index was functioning properly, just showing signs of slowness on large queries or at busy times.

Rumbaar

Ah, I'm not sure how those conversion would affect that field.  I'm sure at one point it was MEDIUMTEXT from a previous forum software and was just retained.

I can't say I'm too familiar with process behind standard/custom indexing, so can't comment on the affect of each type.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

Going to have to bump, is there no way I can re-start creating a custom index?

I'm back to my problem as I explained it in the first post, so please read that.

Rumbaar

I'm still at a loss to how you were able to create an index in the first place, with the field set to MEDIUMTEXT.  But then I'm not sure of how it might have turned from just TEXT to MEDIUMTEXT.  From the error message it indicated it is in fact limited to TEXT property.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Roph

Hmm.

Is there a query I can run on the database to make sure that none of my messages are over the size limitation of TEXT?

As said, my post size limit is 40,000, though I know it hasn't always been that. If there are any over 65535, the likely culprits will be scripts (game creation scripting language boards), and if they were cut off it would render them useless.

H

Hi Silverline,

Did you manage to sort this out?

If not I don't know if you can achieve this with just SQL but I imagine you should write a php batch script to grab all the messages and then count each one.
-H
Former Support Team Lead
                              I recommend:
Namecheap (domains)
Fastmail (e-mail)
Linode (VPS)
                             

NightHorse

Hi, i have the same problem !
the table "smf_messages is in "text" " so I think there would be something else..

I was successfully using custom index, but when server crashed down (after restart) I had the same problems like silverline also I couldn't create it anymore. Now i'm using fulltext index but forum is always crashing.. sometimes cannot be run via watchdog and so we must go to telehouse and start it manually..

H

Quote
I was successfully using custom index, but when server crashed down (after restart) I had the same problems like silverline also I couldn't create it anymore. Now i'm using fulltext index but forum is always crashing.. sometimes cannot be run via watchdog and so we must go to telehouse and start it manually..

So you can't even recover the server via SSH? SMF shouldn't cause the whole server to crash or reboot :)
-H
Former Support Team Lead
                              I recommend:
Namecheap (domains)
Fastmail (e-mail)
Linode (VPS)
                             

orange

Quote from: Silverline on February 13, 2008, 08:11:37 PM
I went though the processes of starting to create a large index.

As those of you with larger forums may know, this takes absolutely ages and is very boring :P, so I left it going and went off for a while. When I came back, I was met with the admin login screen. Maybe my session timed out or something?

I'm trying to create a search index at the moment, and yes it's painfully slow. But the worst part is that I can't just leave it overnight, because once an hour it stops and prompts for my admin password, though I'm still logged in (I presume the admin session is timing out or something).

Is there a fix for this? For me to remember to come back to the thing and type in my password every hour, in an operation that looks like it's going to take about 24 hours, is a painful prospect...

Alternatively, is there a query script to generate the index that I could run manually in MySQL while taking the forum offline, so that I don't have to keep waiting for this page to refresh itself 5000 times and finally get to 100%?

Rumbaar

I'm not sure if it will allow for the whole process to continue, but you can temporarily disable 'admin security' while you run this process and it shouldn't prompt for the admin password after the default timeout period for admin security.
"An important reward for a job well done is a personal sense of worthwhile achievement."

[ Themes ]

Sarge

Quote from: orange on May 03, 2008, 03:35:18 PM
I'm trying to create a search index at the moment, and yes it's painfully slow. But the worst part is that I can't just leave it overnight, because once an hour it stops and prompts for my admin password, though I'm still logged in (I presume the admin session is timing out or something).

Is there a fix for this?

Quote from: Rumbaar on May 03, 2008, 08:30:40 PM
I'm not sure if it will allow for the whole process to continue, but you can temporarily disable 'admin security' while you run this process and it shouldn't prompt for the admin password after the default timeout period for admin security.

Yes, creation of a search index should complete automatically if "Disable administration security" is checked from Admin > Features and Options > Basic Features tab.

Once I was creating an index in a large forum and SMF prompted me for the admin password every hour or so; I just disabled it and the index creation finished by itself after that.

    Please do not PM me with support requests unless I invite you to.

http://www.zeriyt.com/   ~   http://www.galeriashqiptare.net/


Quote
<H> I had zero posts when I started posting

orange

Thanks guys, that should do the trick!

Advertisement: