Simple Machines Community Forum

General Community => Scripting Help => Topic started by: Biology Forums on May 04, 2019, 11:23:56 PM

Title: Table to large, strategies?
Post by: Biology Forums on May 04, 2019, 11:23:56 PM
What do you do when a MYSQL database table gets so large that it starts to eat away at the server's CPU? I have a table called ***_related_topics that's currently 14 million rows. The CPU is constantly running >100% when the query associated with it run - take it away, it normalizes. Recently I switched to INNODB from MYISAM, but it didn't help at all.

What are some other strategies?
Title: Re: Table to large, strategies?
Post by: shawnb61 on May 05, 2019, 12:05:53 AM
Configs for innodb are very different than those for myisam. 

Lots of articles out there on this topic.

https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 01:05:00 AM
Not sure how that helps. Anyway, here's the query that's causing the hurt

SELECT IF(rt.id_topic_first = 1849000, rt.id_topic_second, rt.id_topic_first) AS ID_TOPIC
                FROM ***_related_topics AS rt
                        JOIN ***_topics AS t ON (t.ID_TOPIC = IF(rt.id_topic_first = 1849000, rt.id_topic_second, rt.id_topic_first))
                        JOIN ***_boards AS b ON (b.ID_BOARD = t.ID_BOARD)
                WHERE (id_topic_first = 1849000 OR id_topic_second = 1849000)
                        AND FIND_IN_SET(-1, b.memberGroups)
                ORDER BY rt.score DESC
                LIMIT 8
Title: Re: Table to large, strategies?
Post by: shawnb61 on May 05, 2019, 03:01:08 AM
Quote from: Study Force on May 05, 2019, 01:05:00 AM
Not sure how that helps.

You recently changed to innodb, and you have a problem with a query.  Some mysql installs, especially older ones, have default settings that are good for myisam, but not innodb.  Mine was...   I had to increase innodb buffer pools after converting to innodb.

The first thing I would check is the innodb buffer size. 

If that looks reasonable, next I'd tune the query.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 08:38:52 AM
I changed it back to MYISAM. I think what's best is if I optimize the query because what I'm noticing is that while the table I mentioned above has many rows, I can't seem to find documentation suggesting that # of rows affects performance. Any idea as to how I can optimize it?
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 08:52:11 AM
As usual, we'd want to see the EXPLAIN because the indexes are probably not good on it - and without indexes, the query must by definition examine every single row in the table (thus # of rows becomes directly related to how fast it runs)
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 09:25:58 AM
Thanks for the response. Do you mind helping me interrupt the EXPLAIN below? :-\
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 09:35:25 AM
So the first thing MySQL does is work out which tables need to be looked at and it tries to work out the most efficient way it can based on the tables, the types of joins and the things it guesses it will be able to find. Explaining the actual mechanics of that is multiple chapters in a textbook's worth of complexity, but it doesn't matter, you just need to understand it's going to look at all the tables and try to work out how to join them together.

In this particular case, it's started with the related topics table, worked out it has zero indexes it can actually use to find anything - the key column tells you what it is trying to use for an index (or what it did use as an index), and the Extra column provides notes on how it did it.

So it's started with related topics, decided that it will need to start there, then add the others in the order topics then boards (based on the order of rows in the list) and it can match 1 row to 1 row from related topics to actual topics, and 1:1 related topics to boards, so that's probably not the immediate cause of problem.

The where on membergroups (on boards) isn't tragic, because the real work of filtering was done in the first row and it can't use anything to help it.

I'd suggest adding a new non-unique index on the id_topic_first column and seeing what that does. Alternatively a combination of (id_topic_first, id_topic_second) or even (id_topic_second, id_topic_first) on top might help. It's hard to say because seeing the query without context (the IF() is a bit weird), it's hard to know quite what's going on.

But right now it's going over every single row of the rt table because it has no indexes to help it ignore the rows that don't matter, though the IF might nuke that regardless of indexes :(
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 09:44:15 AM
Currently both id_topic_first and id_topic_second have primary keys.

How do I add a non-unique index to a column? I've search online, but couldn't find an answer that sufficed.
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 10:03:53 AM
In what order? This matters. (Also, there's only one primary key on a table, that's why it's called primary)
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 10:08:48 AM
if_topic_first, then id_topic_second (shown below).

That said, should I drop the key on one of these?
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 10:14:29 AM
No, you should not drop these; it will break if you do.

Also, no, that's not what I asked. You can have phpMyAdmin show up with the little gold key on 2 columns and not be in the order given there.

What does the indexes section under the structure tab say? That will list the actual order of the columns in table indexes.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 10:33:48 AM
Noted, I was about to drop them, luckily I did not.

Here's what I see:
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 10:56:40 AM
So there's already an index on first then second, well that eliminates my first suggestion as being useful. You can try adding a new index on id_topic_second + id_topic_first (use the 'create index on [] columns' option, with 2 in the box, then put id_topic_second in the first row, id_topic_first in the second row, and tell it to use type INDEX)

(This, incidentally, is how to make a non-unique index, column order is important.)
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 11:06:05 AM
Would it be something like:

create index idx on ***_related_topics (id_topic_second, id_topic_second);

I want to make sure it's correct before executing the query.
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 11:14:19 AM
Or you could just use the UI to do it for you which will get the syntax correct.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 11:35:35 AM
Good call, just executed:

ALTER TABLE `***_db`.`***_related_topics` ADD INDEX `idx` (`id_topic_first`, `id_topic_second`);

Out of curiosity, what happens if this strategy doesn't work?

Do you suggest creating a third index where the columns are reversed?
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 11:56:21 AM
Wait, which one did you make? You previously had id_topic_second, id_topic_first, now you're doing id_topic_first, id_topic_second... you already have an index on id_topic_first, id_topic_second - the primary key (and MySQL would have used it if it was of any use here, but it wasn't).

If adding a second index doesn't help, we'll need to look at rewriting that query to actually use an index and work out wtf it's doing. I actually suspect it won't help because of the IF() meaning MySQL will still probably want to visit every single row regardless.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 01:38:09 PM
I created the non-unique index respecting the column order as mentioned earlier. See the attachment for details. Unfortunately it's still eating up my CPU >300% :(

Here's what the function looks like in its raw from:

function loadRelated($topic)
{
global $modSettings, $context, $db_prefix, $user_info;

$result = db_query("
SELECT IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first) AS ID_TOPIC
FROM {$db_prefix}related_topics AS rt
JOIN {$db_prefix}topics AS t ON (t.ID_TOPIC = IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first))
JOIN {$db_prefix}boards AS b ON (b.ID_BOARD = t.ID_BOARD)
WHERE (id_topic_first = $topic OR id_topic_second = $topic)
AND $user_info[query_see_board]
ORDER BY rt.score DESC
LIMIT $modSettings[relatedTopicsCount]", __FILE__, __LINE__);

if (mysql_num_rows($result) == 0)
return false;

$topics_ids = array();
while ($row = mysql_fetch_assoc($result))
$topics_ids[] = $row['ID_TOPIC'];

mysql_free_result($result);

return prepareTopicArray($topics_ids);
}


You'll notice that it's designed for SMF 1.x. Important to mention that on my 2.x community, I use the same mod except its adapted for 2.x; the code is identical and I get the same sort of choke up. I'm assuming the mod author never had >1 mil rows in mind when developing it.

How can we improve the query?
Title: Re: Table to large, strategies?
Post by: Arantor on May 05, 2019, 01:55:40 PM
Delete the index you just made, it's not helping you out any. I didn't realise that there was already a primary key covering first-then-second when I suggested the changes, but as usual you jump ahead without giving me opportunity to respond and save you the time and trouble, and unfortunately you're at the point where jumping ahead and just doing things is potentially likely to make things a lot worse if not really careful.

In fact in your current situation you will have made your problem worse, not better, because now you have two indexes that do the exact same thing (and MySQL will quite happily keep both up to date but use neither in this case)

The reality is that this is simply badly written and at this point I don't have the patience to sit and fix it for you; the fact it was written for 1.x and ported to 2.0 is simply not relevant, it's just a badly written query and no amount of indexes can actually fix it, they can just mitigate (if applied properly) but at this point I'm actually afraid of suggesting anything because you'll shortcut through it without understanding any of it, which is something I don't want to deal with when inevitably it makes the problem worse.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 02:01:43 PM
That's fine -- I backup all my databases regularly, but I swear I followed everything you suggested to a T. I even provided screenshots. Anyway, I don't mind hiring someone to do this, as I don't think I'm in the position to do this correctly either. Disappointing though, since it has worked flawlessly for several years
Title: Re: Table to large, strategies?
Post by: vbgamer45 on May 05, 2019, 02:20:05 PM
Optimize your mysql config and or add more resources such as more ram, faster cpu. You want to be able to fit that whole table into memory.
Title: Re: Table to large, strategies?
Post by: SpacePhoenix on May 05, 2019, 03:40:01 PM
Have you looked to see if there's a 2.0.x mod that gives you the same functionality?
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 08:44:11 PM
Quote from: vbgamer45 on May 05, 2019, 02:20:05 PM
Optimize your mysql config and or add more resources such as more ram, faster cpu. You want to be able to fit that whole table into memory.

Sometimes no amount of optimization could remedy a bad query. I tried doing that, adding the following components to mysqld with limited improvement:

tmp_table_size = 256M
max_heap_table_size = 256M
innodb_buffer_pool_size = 1000M
innodb_stats_on_metadata = OFF

query_cache_type=0
query_cache_size=0


I think it's best to optimize the query. Could any of you recommend a coding genius to help me rewrite that function?

Check these numbers out, it's about to BLOW!
Title: Re: Table to large, strategies?
Post by: vbgamer45 on May 05, 2019, 08:48:30 PM
I do these as well
performance_schema=OFF
innodb_flush_log_at_trx_commit=2


The second is more important less disk usage.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 09:07:29 PM
Unfortunately, no major improvement. Without the function running, the CPU hovers between 0% and 150%. With the function, it jumps to >300%. Remember that the function runs on every thread load, hence it's best to be optimized than anything... Thoughts on optimizing the function?
Title: Re: Table to large, strategies?
Post by: SpacePhoenix on May 05, 2019, 09:35:49 PM
What's the name of the mod that this query is a part of?
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 05, 2019, 09:45:39 PM
Quote from: SpacePhoenix on May 05, 2019, 09:35:49 PM
What's the name of the mod that this query is a part of?

Here it is: https://custom.simplemachines.org/mods/index.php?mod=189

Initially I thought it was really well-written, but now I'm second guessing that assertion.

Quote from: SpacePhoenix on May 05, 2019, 03:40:01 PM
Have you looked to see if there's a 2.0.x mod that gives you the same functionality?

I have it installed on another website running SMF 2.x, but given that community is a lot smaller ~700 000 threads, I see the problem appear here and there.
Title: Re: Table to large, strategies?
Post by: SpacePhoenix on May 06, 2019, 05:13:54 AM
Don't know what version's database this was intended for but might be worth investigating (it's in related_topics.php)

// Check that all indexes are in and correct
foreach ($table['indexes'] as $id => $index)
{
$exists = false;

foreach ($currentTable['indexes'] as $index2)
{
// Primary is special case
if ($index['type'] == 'primary' && $index2['type'] == 'primary')
{
$exists = true;

if ($index['columns'] !== $index2['columns'])
{
$smcFunc['db_remove_index']('{db_prefix}' . $table_name, 'primary');
$smcFunc['db_add_index']('{db_prefix}' . $table_name, $index);
}

break;
}
// Make sure index is correct
elseif (isset($index['name']) && isset($index2['name']) && $index['name'] == $index2['name'])
{
$exists = true;

// Need to be changed?
if ($index['type'] != $index2['type'] || $index['columns'] !== $index2['columns'])
{
$smcFunc['db_remove_index']('{db_prefix}' . $table_name, $index['name']);
$smcFunc['db_add_index']('{db_prefix}' . $table_name, $index);
}

break;
}
}

if (!$exists)
$smcFunc['db_add_index']('{db_prefix}' . $table_name, $index);
}

// Remove unnecassary indexes
foreach ($currentTable['indexes'] as $index)
{
$exists = false;

foreach ($table['indexes'] as $index2)
{
// Primary is special case
if ($index['type'] == 'primary' && $index2['type'] == 'primary')
$exists = true;
// Make sure index is correct
elseif (isset($index['name']) && isset($index2['name']) && $index['name'] == $index2['name'])
$exists = true;
}

if (!$exists)
{
if (isset($table['upgrade']['indexes']))
{
foreach ($table['upgrade']['indexes'] as $index2)
{
if ($index['type'] == 'primary' && $index2['type'] == 'primary' && $index['columns'] === $index2['columns'])
$smcFunc['db_remove_index']('{db_prefix}' . $table_name, 'primary');
elseif (isset($index['name']) && isset($index2['name']) && $index['name'] == $index2['name'] && $index['type'] == $index2['type'] && $index['columns'] === $index2['columns'])
$smcFunc['db_remove_index']('{db_prefix}' . $table_name, $index['name']);
elseif (!empty($db_show_debug))
$log[] = $table_name . ' has Unneeded index ' . var_dump($index);
}
}
elseif (!empty($db_show_debug))
$log[] = $table_name . ' has Unneeded index ' . var_dump($index);
}
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 06, 2019, 08:34:08 AM
That file I believe is ran once during installation or when you manually build the index. It has little or nothing to do with the query being run on every page load.
Title: Re: Table to large, strategies?
Post by: SpacePhoenix on May 06, 2019, 09:21:11 AM
Have you tried installing the newer version of the mod as the new version doesn't have any of the mysql_ calls that the version that you're using has?
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 06, 2019, 09:52:55 AM
Quote from: SpacePhoenix on May 06, 2019, 09:21:11 AM
Have you tried installing the newer version of the mod as the new version doesn't have any of the mysql_ calls that the version that you're using has?

As mentioned previously, I have the latest version installed on my 2.x website, "but given that community is a lot smaller ~700 000 threads, I see the problem appear here and there."

Unless you found a newer version elsewhere, that's the extent of what I've done.
Title: Re: Table to large, strategies?
Post by: SpacePhoenix on May 06, 2019, 12:26:49 PM

Quote from: Study Force on May 05, 2019, 01:38:09 PM
I created the non-unique index respecting the column order as mentioned earlier. See the attachment for details. Unfortunately it's still eating up my CPU >300% :(

Here's what the function looks like in its raw from:

function loadRelated($topic)
{
global $modSettings, $context, $db_prefix, $user_info;

$result = db_query("
SELECT IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first) AS ID_TOPIC
FROM {$db_prefix}related_topics AS rt
JOIN {$db_prefix}topics AS t ON (t.ID_TOPIC = IF(rt.id_topic_first = $topic, rt.id_topic_second, rt.id_topic_first))
JOIN {$db_prefix}boards AS b ON (b.ID_BOARD = t.ID_BOARD)
WHERE (id_topic_first = $topic OR id_topic_second = $topic)
AND $user_info[query_see_board]
ORDER BY rt.score DESC
LIMIT $modSettings[relatedTopicsCount]", __FILE__, __LINE__);

if (mysql_num_rows($result) == 0)
return false;

$topics_ids = array();
while ($row = mysql_fetch_assoc($result))
$topics_ids[] = $row['ID_TOPIC'];

mysql_free_result($result);

return prepareTopicArray($topics_ids);
}


You'll notice that it's designed for SMF 1.x. Important to mention that on my 2.x community, I use the same mod except its adapted for 2.x; the code is identical and I get the same sort of choke up. I'm assuming the mod author never had >1 mil rows in mind when developing it.

How can we improve the query?
Quote from: Study Force on May 06, 2019, 09:52:55 AM
Quote from: SpacePhoenix on May 06, 2019, 09:21:11 AM
Have you tried installing the newer version of the mod as the new version doesn't have any of the mysql_ calls that the version that you're using has?

As mentioned previously, I have the latest version installed on my 2.x website, "but given that community is a lot smaller ~700 000 threads, I see the problem appear here and there."

Unless you found a newer version elsewhere, that's the extent of what I've done.

In that code snippet that you posted there's the use of some of the mysql_* functions. In the current version of the mod, them functions aren't used
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 06, 2019, 01:10:06 PM
SMF 2.x replaces that the 2.x version. I believe under the hood they're the same thing. It's the main query that's problematic.
Title: Re: Table to large, strategies?
Post by: shawnb61 on May 06, 2019, 06:18:39 PM
I'm pretty sure that query needs an index on id_topic_second.  It will be of type INDEX, pretty sure it's not unique. 

Not sure you tried that yet?   I did a quick test & it yielded a 10x improvement on that query. 

Do an explain plan on that query before & after adding an index there.  You want to see that ALL go away under "type"... 

Title: Re: Table to large, strategies?
Post by: Biology Forums on May 06, 2019, 11:59:16 PM
That did it!

@shawnb61: I *really* appreciate that, thank you. And to think I had to rewrite the query

If I notice anything wonky, I'll return...
Title: Re: Table to large, strategies?
Post by: shawnb61 on May 07, 2019, 09:30:44 AM
I'm sure much more could/should be done.  But I think this finding & discussion really belongs in the mod support thread, where other mod users can benefit & help.
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 07, 2019, 12:15:21 PM
I agree, can a moderator merge it?

BTW, what else do you think can be done?

To add, I think this belongs in the big board discussion forum too
Title: Re: Table to large, strategies?
Post by: vbgamer45 on May 07, 2019, 12:48:59 PM
Can you run and post back show create table smf_releated_topics;
With the change made now.
Title: Re: Table to large, strategies?
Post by: shawnb61 on May 07, 2019, 01:33:24 PM
Quote from: Study Force on May 07, 2019, 12:15:21 PMI agree, can a moderator merge it?
...
To add, I think this belongs in the big board discussion forum too
Most of this thread doesn't really apply; I don't think we should spam those threads. 

I think a simple helpful post on that board with a link here would suffice. 


Quote from: Study Force on May 07, 2019, 12:15:21 PMBTW, what else do you think can be done?

I'm really not familiar with this mod, so I really can't help much more.  The mod support thread is the best place. 

Tuning takes time, patience, knowledge, & most won't admit it, but more than a little trial & error...   I would suggest reading up more.  These two articles are reasonably good introductions, and will give you ideas where to look next:
https://code.tutsplus.com/tutorials/profiling-mysql-queries-with-phpmyadmin--net-12687
https://dzone.com/articles/how-to-optimize-mysql-queries-for-speed-and-perfor

Like most tech articles out there, you can argue lots of points, but as I said, they are reasonably good introductions.  Not too technical, & a good fly-over of what to look for. 

EDIT:  I'm going to flag this thread as solved for now...
Title: Re: Table to large, strategies?
Post by: Biology Forums on May 08, 2019, 10:29:33 AM
Quote from: vbgamer45 on May 07, 2019, 12:48:59 PM
Can you run and post back show create table smf_releated_topics;
With the change made now.


Here you go!

Thanks for the links, shawnb61
Title: Re: Table to large, strategies?
Post by: albertlast on May 10, 2019, 05:42:14 PM
Possible improvment would be to use the new version of authority check.
You see:

FIND_IN_SET(-1, b.memberGroups)

since this got no index support,
2.1 changed the code of "query_see_board": https://github.com/SimpleMachines/SMF2.1/blob/release-2.1/Sources/Subs.php#L6559
this version use now index.
Also in rc3 query_see_message_board and query_see_topic_board got added: https://github.com/SimpleMachines/SMF2.1/blob/release-2.1/Sources/Subs.php#L6589
Less amounts of join are needed for authority check.