News:

Wondering if this will always be free?  See why free is better.

Main Menu

Table to large, strategies?

Started by Biology Forums, May 04, 2019, 11:23:56 PM

Previous topic - Next topic

Biology Forums

What do you do when a MYSQL database table gets so large that it starts to eat away at the server's CPU? I have a table called ***_related_topics that's currently 14 million rows. The CPU is constantly running >100% when the query associated with it run - take it away, it normalizes. Recently I switched to INNODB from MYISAM, but it didn't help at all.

What are some other strategies?

shawnb61

Configs for innodb are very different than those for myisam. 

Lots of articles out there on this topic.

https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Biology Forums

Not sure how that helps. Anyway, here's the query that's causing the hurt

SELECT IF(rt.id_topic_first = 1849000, rt.id_topic_second, rt.id_topic_first) AS ID_TOPIC
                FROM ***_related_topics AS rt
                        JOIN ***_topics AS t ON (t.ID_TOPIC = IF(rt.id_topic_first = 1849000, rt.id_topic_second, rt.id_topic_first))
                        JOIN ***_boards AS b ON (b.ID_BOARD = t.ID_BOARD)
                WHERE (id_topic_first = 1849000 OR id_topic_second = 1849000)
                        AND FIND_IN_SET(-1, b.memberGroups)
                ORDER BY rt.score DESC
                LIMIT 8

shawnb61

Quote from: Study Force on May 05, 2019, 01:05:00 AM
Not sure how that helps.

You recently changed to innodb, and you have a problem with a query.  Some mysql installs, especially older ones, have default settings that are good for myisam, but not innodb.  Mine was...   I had to increase innodb buffer pools after converting to innodb.

The first thing I would check is the innodb buffer size. 

If that looks reasonable, next I'd tune the query.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Biology Forums

I changed it back to MYISAM. I think what's best is if I optimize the query because what I'm noticing is that while the table I mentioned above has many rows, I can't seem to find documentation suggesting that # of rows affects performance. Any idea as to how I can optimize it?

Arantor

As usual, we'd want to see the EXPLAIN because the indexes are probably not good on it - and without indexes, the query must by definition examine every single row in the table (thus # of rows becomes directly related to how fast it runs)

Biology Forums

Thanks for the response. Do you mind helping me interrupt the EXPLAIN below? :-\

Arantor

So the first thing MySQL does is work out which tables need to be looked at and it tries to work out the most efficient way it can based on the tables, the types of joins and the things it guesses it will be able to find. Explaining the actual mechanics of that is multiple chapters in a textbook's worth of complexity, but it doesn't matter, you just need to understand it's going to look at all the tables and try to work out how to join them together.

In this particular case, it's started with the related topics table, worked out it has zero indexes it can actually use to find anything - the key column tells you what it is trying to use for an index (or what it did use as an index), and the Extra column provides notes on how it did it.

So it's started with related topics, decided that it will need to start there, then add the others in the order topics then boards (based on the order of rows in the list) and it can match 1 row to 1 row from related topics to actual topics, and 1:1 related topics to boards, so that's probably not the immediate cause of problem.

The where on membergroups (on boards) isn't tragic, because the real work of filtering was done in the first row and it can't use anything to help it.

I'd suggest adding a new non-unique index on the id_topic_first column and seeing what that does. Alternatively a combination of (id_topic_first, id_topic_second) or even (id_topic_second, id_topic_first) on top might help. It's hard to say because seeing the query without context (the IF() is a bit weird), it's hard to know quite what's going on.

But right now it's going over every single row of the rt table because it has no indexes to help it ignore the rows that don't matter, though the IF might nuke that regardless of indexes :(

Biology Forums

Currently both id_topic_first and id_topic_second have primary keys.

How do I add a non-unique index to a column? I've search online, but couldn't find an answer that sufficed.

Arantor

In what order? This matters. (Also, there's only one primary key on a table, that's why it's called primary)

Biology Forums

if_topic_first, then id_topic_second (shown below).

That said, should I drop the key on one of these?

Arantor

No, you should not drop these; it will break if you do.

Also, no, that's not what I asked. You can have phpMyAdmin show up with the little gold key on 2 columns and not be in the order given there.

What does the indexes section under the structure tab say? That will list the actual order of the columns in table indexes.

Biology Forums

Noted, I was about to drop them, luckily I did not.

Here's what I see:

Arantor

So there's already an index on first then second, well that eliminates my first suggestion as being useful. You can try adding a new index on id_topic_second + id_topic_first (use the 'create index on [] columns' option, with 2 in the box, then put id_topic_second in the first row, id_topic_first in the second row, and tell it to use type INDEX)

(This, incidentally, is how to make a non-unique index, column order is important.)

Biology Forums

Would it be something like:

create index idx on ***_related_topics (id_topic_second, id_topic_second);

I want to make sure it's correct before executing the query.

Arantor

Or you could just use the UI to do it for you which will get the syntax correct.

Biology Forums

Good call, just executed:

ALTER TABLE `***_db`.`***_related_topics` ADD INDEX `idx` (`id_topic_first`, `id_topic_second`);

Out of curiosity, what happens if this strategy doesn't work?

Do you suggest creating a third index where the columns are reversed?

Arantor

Wait, which one did you make? You previously had id_topic_second, id_topic_first, now you're doing id_topic_first, id_topic_second... you already have an index on id_topic_first, id_topic_second - the primary key (and MySQL would have used it if it was of any use here, but it wasn't).

If adding a second index doesn't help, we'll need to look at rewriting that query to actually use an index and work out wtf it's doing. I actually suspect it won't help because of the IF() meaning MySQL will still probably want to visit every single row regardless.

Biology Forums

I created the non-unique index respecting the column order as mentioned earlier. See the attachment for details. Unfortunately it's still eating up my CPU >300% :(

Here's what the function looks like in its raw from:

function loadRelated($topic)
{
global $modSettings, $context, $db_prefix, $user_info;

$result = db_query("
SELECT IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first) AS ID_TOPIC
FROM {$db_prefix}related_topics AS rt
JOIN {$db_prefix}topics AS t ON (t.ID_TOPIC = IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first))
JOIN {$db_prefix}boards AS b ON (b.ID_BOARD = t.ID_BOARD)
WHERE (id_topic_first = $topic OR id_topic_second = $topic)
AND $user_info[query_see_board]
ORDER BY rt.score DESC
LIMIT $modSettings[relatedTopicsCount]", __FILE__, __LINE__);

if (mysql_num_rows($result) == 0)
return false;

$topics_ids = array();
while ($row = mysql_fetch_assoc($result))
$topics_ids[] = $row['ID_TOPIC'];

mysql_free_result($result);

return prepareTopicArray($topics_ids);
}


You'll notice that it's designed for SMF 1.x. Important to mention that on my 2.x community, I use the same mod except its adapted for 2.x; the code is identical and I get the same sort of choke up. I'm assuming the mod author never had >1 mil rows in mind when developing it.

How can we improve the query?

Arantor

Delete the index you just made, it's not helping you out any. I didn't realise that there was already a primary key covering first-then-second when I suggested the changes, but as usual you jump ahead without giving me opportunity to respond and save you the time and trouble, and unfortunately you're at the point where jumping ahead and just doing things is potentially likely to make things a lot worse if not really careful.

In fact in your current situation you will have made your problem worse, not better, because now you have two indexes that do the exact same thing (and MySQL will quite happily keep both up to date but use neither in this case)

The reality is that this is simply badly written and at this point I don't have the patience to sit and fix it for you; the fact it was written for 1.x and ported to 2.0 is simply not relevant, it's just a badly written query and no amount of indexes can actually fix it, they can just mitigate (if applied properly) but at this point I'm actually afraid of suggesting anything because you'll shortcut through it without understanding any of it, which is something I don't want to deal with when inevitably it makes the problem worse.

Advertisement: