Extreme issues with Copying to Tmp table.

Started by Tom17, August 05, 2014, 10:25:59 AM

Previous topic - Next topic

Tom17

Hey all,

So I have a 'large forum' (I am in the large forums section of the site, but understand that I need to post this kind of question here first instead), that is suffering terribly from queries taking 20s to over 5 mins on certain queries. I have performed most of what I can in the '24 things' thread, and when these long-running queries are NOT taking place, the forum runs speedily enough. It is running 2.0.8 on a VPS with 2GB ram and a seemingly slow disk. It has ~2.5m posts, ~80K members and ~50K views per day.

I have been monitoring the mysql variables for tmp table creation and it doesn't seem like it's creating many on-disk tmp tables - instead I assume that it's just the disk reading that's clogging things up. It's a crappy server, way under-sized for the forum, but I am restrained in that department for now (I am only the volunteer sys admin for it and there is no official line of funding other than donations). I think the disk access on this rubbish VPS is the bottleneck, as well as the paltry 2GB ram allocation. Believe me, I am trying to find options to get onto a more appropriate server.

But in the meantime....

One of the queries in question (And most slow queries suffer the same problem) is this one, which just took over 1000 secs:

SELECT t.id_topic
   FROM smf_topics AS t
     INNER JOIN smf_messages AS mf ON (mf.id_msg = t.id_first_msg)
   WHERE t.id_board = 34
   ORDER BY is_sticky DESC, mf.subject DESC
   LIMIT 5425, 35;

From my fast-track learning of MySQL recently, I understand that by having a text field, the smf_messages table (with ~2.5m rows) will always create an on-disk tmp table when you perform a sorting command on them. I clarified this by running the query without the sort and the query is almost instant.

In the 24 steps thread, it went through the conversion to InnoDB and the conversion of text fields to varchar, except for in certain tables, including smf_messages. I saw one post that alluded to the fact that converting smf_messages was an 'oops' moment, but no clarification as to why.

My real question is this: What *are* the ramifications of converting smf_messages to InnoDB and converting its TEXT field to varchar? We have already had to disable search 2 years ago due to performance, so I won't be missing any functionality in that regard.

Note that I am not looking for tuning tips right now, that's why I have not given much info on the setup amd what exactly I have done so far. This query is obviously a bottleneck so I just want to understand the implications of the conversions I mentioned above. I may just go ahead and test this on a copy of the database anyway.

Thanks,

Tom...

Arantor

Well... converting everything from text to varchar isn't what the tip was recommending because in most cases it isn't possible. It was suggesting converting *tinytext* (i.e. < 255 characters) columns to varchar, which generally is a good idea.

You simply cannot convert a text column to varchar because it's too long. A table row is limited to 65535 bytes in general, and varchars are included in that while text is not. So if you have even modestly long posts you may well find yourself hitting that limit. It isn't a magic bullet, either, because you can still get temporary tables out of that.

Converting to InnoDB will help in other regards but not particularly the sorting issue you see here.

What I would wonder is whether or not you'd be more interested in simply neutering that particular query which is the most expensive way to traverse the message list.

Tom17

Ahh yes, tinytext. Sorry, I have been taking in so much of late with this in the few minutes I get here and there to work on it, forgot that 'minor' detail :) Makes sense.

So neutering this, and any other queries that order rows derived from smf_messages... I'm not 100% familiar with the code, but I'm assuming that the query above is the one that's done when you go to a board index and it shows recent posts in that board. Neutering that would mean no more recent posts in that board index, right? (And a similar loss of functionality in other slow queries) Or did you mean something else by neutering it?

Thanks for your speedy reply :)

Tom...
(Yes, I want my cake and I want to eat it too lol)

Arantor

Specifically, that's the one that does sorting of topics inside a board... but most of the time it doesn't sort on both the topics and messages tables. In the case you've posted, it's sorting by subject, which means the lookup (and thus the penalty)

You could remove the option to sort by subject and cut out that entire query in the first place.

LiroyvH

afaik you can convert all tables to InnoDB, have multiple forums running that way with no problems at all.
Funnily enough though, you're not looking for tweaks/tuning, but InnoDB really *does* need a good bunch of configuring and tuning to actually make it run like a charm. Leaving it to the defaults or setting wrong values will result in extremely poor performance on larger databases.

I do know recent topics/unread replies can cause huge queries for tmp tables, if you see those than the following may help.
Butch2k suggested a fix for that a while ago, most useful on larger forums imho;
http://www.simplemachines.org/community/index.php?topic=502082.msg3530311#msg3530311
((U + C + I)x(10 − S)) / 20xAx1 / (1 − sin(F / 10))
President/CEO of Simple Machines - Server Manager
Please do not PM for support - anything else is usually OK.

Tom17

I had previously tuned, and foolishly didn't backup my good my.cnf when the provider overwrote it (Maybe WHM did it - ugh-, not sure). I do need to sit down and get this thing tuned up again, but I fear my problem is more a case of cramming this thing into 2GB more than anything else.

I don't think losing the 'recent posts' functionality would go down at all well... Will check that other thread, ta!

Tom...

Arantor

Except you're not losing 'recent posts'. The function I'm talking about is a specific subset of 'the list of topics inside a board' not recent posts.

Ben_S

Quote from: ‽ on August 05, 2014, 10:39:32 AMYou could remove the option to sort by subject and cut out that entire query in the first place.

By far the simplest solution, it really is a pointless option for a big forum.
Liverpool FC Forum with 14 million+ posts.

Tom17

OK, I have never been much of a 'user' of the forum myself. Not a 'power user' at least (I am not a mother - it's a parenting board).

Trying to work out where our terminology is getting muddled up, I only just noticed the sort by subject. I guess people are clicking that and causing the issue. I was wondering why it was doing a sort on the subject, that would explain it  ::)

I'll try disabling it and see how it goes :)

I can't see an option for this - I assume I need to twiddle the code?

Arantor

Yup. It's almost certainly bots doing it, as well, rather than humans - at least generally.

Dirty fix would be to modify Sources/MessageIndex.php:
// We only know these.
if (isset($_REQUEST['sort']) && !in_array($_REQUEST['sort'], array('subject', 'starter', 'last_poster', 'replies', 'views', 'first_post', 'last_post')))
$_REQUEST['sort'] = 'last_post';


Replace with:
// We only know these.
if (isset($_REQUEST['sort']) && !in_array($_REQUEST['sort'], array('starter', 'last_poster', 'replies', 'views', 'first_post', 'last_post')))
$_REQUEST['sort'] = 'last_post';


On a more serious note, you could also remove starter and last_poster which sort by the name of the person posting the first/last post respectively which would likely be as bad.

This won't remove the links on the UI for them or anything but it will prevent those types of sorting being done.

Tom17

I tried that, and then re-enabled the forums. Started getting order by queries straight away. Trued removing ALL ordering options and so far, touch wood, I have had no slow queries.

Will monitor this for today. Thanks for the help, and sorry I took so long to 'get' what you were all saying - fuzzled head. Learning more though and I always love to do that :)

My guess is that users ARE sorting the forums based on various things, as the search is disabled and I guess it's a good way to find what they are looking for. I guess I will get complaints in. If this frees up some resources, I may be able to squeeze a tiny Sphinx in there until we get a bigger server.

Thanks,

Tom...

Arantor

Experience suggests it isn't actually users doing it but search engines since the links are visible to guests, even when search is disabled. See if you get complaints, but I'd be a bit surprised to be honest.

Tom17

Again, me and words lol. Yeah, I get that the majority of it is bots, especially as the tmp table hits came straight away after I turned the forum back on. What I meant is that I would not be surprised if some users do use it.

On the flipside, I do also know that they use google to search for their posts for now, so maybe they don't :)

Oh and will google-et-al be able to crawl the site properly now? I assume they will be able to because of the links to the different pages. In fact, following the 'sort by' links seems like a pretty inefficient way to searc - not that the bots would know that. Is it normally advised to try to prevent them following them in robots.txt?

Thanks again for the help. The forum is flying again. We can live on a little longer in 2GB RAM, but I need to get on that soon :)

Tom...

Arantor

Well, Google etc. will still have those links indexed - removing them is more fiddly - but they won't *do* anything if that makes sense.

It's not normally covered in robots.txt to be honest.

LiroyvH

Quote
(Maybe WHM did it - ugh-, not sure).

Nah, it can totally kill your php.ini, but my.cnf is untouched.
In fact, when you upgrade to a higher version of mySQL (eg 5.1 to 5.5) it can refuse to restart until my.cnf is fixed or throw a bunch of errors about functions it has no idea what to do with anymore.

Quote
but I fear my problem is more a case of cramming this thing into 2GB more than anything else.

I'm curious what the size of your db is.
((U + C + I)x(10 − S)) / 20xAx1 / (1 − sin(F / 10))
President/CEO of Simple Machines - Server Manager
Please do not PM for support - anything else is usually OK.

Advertisement: