Performance issues on certain queries (and some semi-related questions)

RobertMfromLI · January 15, 2011, 06:33:08 PM

Hello all,

I am having some interesting performance issues only with certain queries. First, here's the system details:

RUNNING:

Software:
- MySQL v5.3.4
- PHP v5.2.14
- Apache v2.2.17

- SMF v2.0rc4
- ModernDark64rc4 Theme (ie: version for SMF v2.0rc4)
- Ad Management for v2.0rc4
- No real customizations

Server running with:
- 4GB ECC Registered RAM (8GB split and mirrored)
- Dual 2.4GHz Intel Pentium 4 Xeon DP CPUs
- 147GB HDD space via two 147GB Ultra320 15K SCSI drives (ie: mirrored setup)
- RAID support via IBM ServeRAID 6m with 128MB cache
Forum Statistics:
- Approx 10,000 members
- Approx 114,000 posts
- Approx 7,800 topics
- Approx half a million pageviews a month
- Avg page creation times range between .1 seconds to .8 seconds (except as noted below)

.

The interesting performance issues and problems:

Virtually everything runs blazingly fast except:
- Viewing "All Unread" if there are none
- Marking all posts as read (this one I am sure is related to the one above, see notes):
- If marking all posts as read returns to forum home screen, it processes blazingly fast
- If marking all... returns to the "View unread topics" page, it takes 2-5 seconds
The forums have been generating a lot of table errors of late:
- Database Error: Incorrect key file for table '.\iftcomma_stnvforum\smf_messages.MYI'; try to repair it

- Or problems with smf_sessions

- Everything has been taken offline, and ALL of the tables have been repaired using extended repair, and then optimized.
- No change in this. This, btw, was something I had been noticing on our previous host (before we started hosting the forums ourselves), on a little less than a daily basis (20-25 times a month).
.
MSNBot (an unlabeled or oddly labeled (via useragent) version) seems to create a LOT of errors (such as):
- (WEB ADDRESS)/index.php?action=dlattach;attach=747;type=avatar
8: Undefined index: session_var
File: (FORUM HOME)/Sources/ManageSearchEngines.php
Line: 563

- (WEB ADDRESS)/index.php?action=dlattach;topic=9938.0;attach=4773
2: array_reverse() [<a href='function.array-reverse'>function.array-reverse</a>]: The argument should be an array
File: (FORUM HOME)/Sources/Subs.php
Line: 3491

- And so on... I've currently banned 5 Class B address ranges that MSNBot uses, largely due to (besides the above, which is commonplace and ONLY for MSNBot) them ignoring (yet again) the robots.txt file, and repeatedly downloading the same file ten of times or hundreds of times a day (depending on their mood, apparently), sometimes with as many as 200 bots on the server doing requests simultaneously. Some trivial single-digit (2%-4%?) amount of our traffic comes from them (of the whole 100% search engine referred traffic).

Q: Is there some sort of security issue buried someplace in there that Microsoft is trying to exploit (knowingly or through stupidity)? Others have already noticed MSNBot sending functions as part of requests to Wordpress - functions it should not know about because they should never be exposed to the outside world.

And finally, any tips on tweaking MySQL for the type of load we have, would be greatly appreciated. I've found a lot of conflicting information out there. Figure, we've got 750MB to 1GB (after all caches in place for stuff like Apache, disk, and other programs).

Best,
Robert

Joshua Dickerson · January 15, 2011, 07:45:24 PM

Have you looked at the sticky about how to optimize SMF? Start there. Your hardware is very good and should meet your needs with ease. Consider using nginx or similar web server as you'll notice a considerable stability in resource usage.

Your second one appears to be an issue relating to themes. Possibly due to your custom theme but I'm not sure of that.

I've notified the devs to look in to this.

RobertMfromLI · January 15, 2011, 07:58:55 PM

I've looked over them and made changes for the max key seeks value, played with the different cache values and so on (combination of what's on this site and elsewhere).

The key corruption issue seems to be the biggie I can't resolve. Id been hoping upgrading MySQL a few months ago would fix that (there were some bugs in the earlier v5 releases that could cause such) but alas, no such luck.

Moving attachments and avatars next.

Certain Linux stuff I cannot (yet) run on this box. The underlying OS is OS/2 Warp Server for eBusiness (2002 release with updates from 2008 & 2009). The AMP Stack is comprised entirely of Linux ports (including the "support modules" such as GBM, etc).

I'll re-dig thru the tips to see if I missed something, and any pointers would also be appreciated.

Best,
Rob

RobertMfromLI · January 15, 2011, 09:50:37 PM

My apologies for not posting that I'd already read that (and elsewhere) and already did many of the changes. I know you guys get a lot of people who don't bother to read that stuff.

Rundown of what I've done (or why I havent done certain ones):

1: Move your uploaded avatars directory.
- DONE (but just today)

2: Disable hostname lookups.
- DONE (ages ago, no noticeable difference)

3: Enable (or disable) gzip compression.
- Switched from enabled to disabled about 4 months ago.
Note: If your webserver compresses the type of data you use to serve your webpages... (YES)

4: Tune topics, members, and posts per page.
- Tuned about 4 months ago. Increased performance somewhat (pageloads at 75% time on medium size topics, 40-50% time on large size topics (ie: size topics=#posts in topic)

5: If you are going to enable search, and are not or cannot use Sphinx, use a Large Custom Index.
- DONE April 2010

NOTE: not using UTF-8. I dont see how much of a difference that may make performance wise, but there is UTF-8 character data in the posts, so I suspect I need to fix this. For some reason, our previous host (BlueHost/HostMonster) rebuilt the database without it set as UTF-8 charset.

6: Don't use post moderation on an active forum.
- We dont

7: Disable the recent posts feature on the board index.
- Ages ago

8: Run the mark read for inactive users script periodically. The log_topics, log_boards, and log_mark_read tables
- NOT done until RIGHT NOW (still processing)

9: Do you really need that many boards?
N/A - we don't have a ton of boards, and we need the ones we have.

10: Do you really need the calendar?
- Sadly

11: Ben has a post about disabling features according to server load here.
- Not quite willing to try the v1.1 recommendations on v2.0rc4 due to some changes I've noticed in the database since earlier rc's.

12a: Consider using database-driven sessions
- Since day three (Apr 2010)

12b: and offloading all attachments to Mediawiki.
- Last time I checked, I couldn't find a recommended version for v2.0rc4 that was reported as stable.
- We've had very few attachments until just recently (last couple months) and the issues we've had predate that.

Addition #1 (March 5th): Prune zero-post members
- CANT: some of our lurkers are our contributors or crew.

Addition #2 (May 3rd): Perform only necessary tasks, and spread them out (SMF 2.0 only)
- Done 4 months ago

Addition #3 (July 1st): Cap the length of your threads
- Not really applicable to the type of traffic or prolonged interest we generate. With #4 tweaked, even pages from the largest thread we have (which isn't that large at 2600 posts) are now instantaneous.

Addition #4 (November 1st): Don't go crazy on on-line time
- We haven't

Addition #5 (November 1st): If you really want, disable tracking daily page views
- Well, we kinda do need this. At least till we do some custom reports via Google Analytics (but it's currently having problems with our subdomain and keeps claiming all traffic is from www.domain instead of forums.domain (as well as other issues in recording page info)).

So, that pretty much covers it. As you can see, I already went through all of that.

Joshua Dickerson · January 15, 2011, 10:10:08 PM

That is only the basic improvements. Moving to InnoDB and Memory engines is a significant one. You already run current LAMP versions. The text to varchar is another HUGE one although I think those have all been fixed in 2.0 and you probably won't be affected by it. I can only assume you're using caching in some way and Sphinx.

A status.php would help a lot.

RobertMfromLI · January 15, 2011, 10:12:44 PM

Quote from: groundup on January 15, 2011, 10:10:08 PM
That is only the basic improvements. Moving to InnoDB and Memory engines is a significant one. You already run current LAMP versions. The text to varchar is another HUGE one although I think those have all been fixed in 2.0 and you probably won't be affected by it. I can only assume you're using caching in some way and Sphinx.

A status.php would help a lot.

We did other tweaks, from links to other pages found here and via a Google search. I'll post a status.php report as soon as the log unread stuff finishes.

Thanks very much,
Robert

RobertMfromLI · January 16, 2011, 12:03:16 AM

Status.php results attached as an html file (page save).

Some of the MySQL settings (indicated in the file) have been changed to default or different settings than we previously had.

Best,
Robert

NOTE: Of course, some of the performance values are skewed because I am clearing unread messages still. (I didnt end up waiting)

Joshua Dickerson · January 16, 2011, 09:42:59 AM

It would be easier and better if you just provide the URL

RobertMfromLI · January 16, 2011, 04:54:49 PM

Quote from: groundup on January 16, 2011, 09:42:59 AM
It would be easier and better if you just provide the URL

http://forums.startreknewvoyages.com/stts.php

Ensiferous · January 18, 2011, 08:17:21 AM

Enable the slow query log for MySQL and see what you're actually dealing with. Making changes in the dark is fun and all, but it's rarely useful.

RobertMfromLI · January 18, 2011, 08:42:21 AM

Quote from: Ensiferous on January 18, 2011, 08:17:21 AM
Enable the slow query log for MySQL and see what you're actually dealing with. Making changes in the dark is fun and all, but it's rarely useful.

Percentage of slow queries:

0.0001

No seeming problems there. I've tweaked a bunch of the settings, not so much via making changes in the dark, but by updating the configuration files based off (a) status.php and (b) taking advantage of the better hardware that the forums are on now.

I've also done some more digging, and I suspect the crashes may be indeed related to msnbot after all. As an example, I've found (in the error logs) 783 requests by 5 bots in only TWO seconds. There's a lot more like that in the logs (of which, in 35.5 hours, 3330 errors were generated by msnbot/bingbot).

Quick stats on the few I have looked at so far:

331 in a two second period
782 in a two second period
1181 in a four second period

It's like a bunch of attack spurts all at once. They get confused when they get something that they don't like (probably an SMF forum error due to a broken bit of PHP in the theme I am using), and then go crazy.

Anyway, I KNOW I did not have the server daemons tuned to handle 150-390 requests a second. I figured no more than 100 when it was originally tuned. Something that I have since rectified (kinda a moot point since msnbot is blocked in the firewall via five rules for five Class B ranges).

I think that leaves me watching over things for a while, and seeing what I can do to fix the forum theme.

Best,
Robert

RobertMfromLI · January 18, 2011, 04:31:50 PM

Currently the stats that status.php is showing are:

Connections per second:    0.2052

Kilobytes received per second:    1.0382

Kilobytes sent per second:    9.8871

Queries per second:    4.5492

Percentage of slow queries:    0

Opened vs. Open tables:
(table_cache)    15.1556 (should be <= 80)

Key buffer read hit rate:
(key_buffer_size)    0.0305 (should be <= 0.01)

Key buffer write hit rate:
(key_buffer_size)    0.2104 (should be <= 0.5)

Thread cache hit rate:
(thread_cache_size)    138.8824 (should be >= 30 )

Thread cache usage:
(thread_cache_size)    [/color]0.1667 (should be >= 0.7 and <= 0.9)

Temporary table disk usage:
(tmp_table_size)    0.4498 (should be <= 0.5)

Sort merge pass rate:
(sort_buffer)    0 (should be <= 0.001)

Query cache enabled:
(query_cache_type)    1 (should be >= 1 and <= 1)

Query cache miss rate:
(query_cache_limit)    0.4086 (should be <= 0.5)

Query cache prune rate:
(query_cache_size)    0 (should be <= 0.05)

The key buffer is quite large - it simply doesn't seem to be efficiently using it. In that, I am not sure what else I can do.

Thread cache usage is the other interesting one. I suspect that number is low simply because not many threads are being used concurrently.

RobertMfromLI · January 20, 2011, 08:56:42 AM

For anyone interest, with 2GB assigned to OS/2's High Memory Arena (of 4GB, with the rest assigned to shared memory and system memory), below is my my.cnf with the irrelevant statements (such as directories) removed. On *nix, your memory pool will be set up differently obviously (OS/2 handles memory a bit differently), so with a system with 4GB of RAM, you will undoubtedly have a different amount of memory you can allocate to AMP.

Code Select


[client]
compress
user=root
[mysqld]
skip-innodb #deactivate innodb
read_buffer_size=4M
read_rnd_buffer_size=4M
myisam_sort_buffer_size=128M
max_heap_table_size=160MB
tmp_table_size=384M
max_seeks_for_key=100
key_cache_block_size=2048
key_buffer_size=512M
table_cache=288
sort_buffer_size=8M
thread_cache_size=96
thread_concurrency=4
skip_name_resolve
query_cache_size=128M
query_cache_limit=8M
myisam_recover=BACKUP,FORCE

If anyone has any further suggestions, they'd be appreciated. So far, MySQL is only using about 300MB of RAM at peak periods, which is currently a lot less than it can use (not much else running on the system other than AMP).

The biggest slowdowns I have right now are as follows:

Marking all messages read
I don't see any way of speeding this up from looking at the code that gets executed (other than clearing all of those tables or running the mark read script that marks messages read for absent forum members - which I am doing, but will probably take another week or two to complete).
Doing a forum search (this is the one I am interested in improving if possible)
I switched from the standard fulltext index to a large one. Somehow this slowed down searches tremendously. The new large index is roughly 132MB (about twice the size of the original).

Any suggestions would be greatly appreciated.

Joshua Dickerson · January 21, 2011, 10:52:37 AM

What user variable cache are you using and what level do you have caching at for SMF?

RobertMfromLI · January 21, 2011, 03:49:52 PM

Quote from: groundup on January 21, 2011, 10:52:37 AM
What user variable cache are you using and what level do you have caching at for SMF?

I have no additional caching (specific to SMF or MySQL) enabled, and no caching mods installed. Nor did I previously. I've increased the cache values considerably since the time I was running with a standard fulltext index (at which time, the exact same search queries would take roughly 2-3 seconds, as opposed to 15). I've increased max_heap_table_size to 256MB (last night) which improved search performance a little (roughly 8-9 seconds for the same query). I checked for real memory exhaustion (ie: "Am I forcing swapping?") and 70% of the memory pool is untouched by AMP. And of what is being "allocated" for MySQL, much seems unused, so I am suspecting those cache values are fine.

I would consider enabling one of the caching options in SMF, but would really like a recommendation as to which one is best. If they are reliant on a separate caching module, I'd have to compile/port it (hence the preference for a recommendation before I simply start porting and testing the various packages).

By "user variable caching" I am not 100% sure what you mean.

Best,
Robert

RobertMfromLI · January 21, 2011, 06:14:38 PM

Quote from: RobertMfromLI on January 20, 2011, 08:56:42 AM

The biggest slowdowns I have right now are as follows:

Doing a forum search (this is the one I am interested in improving if possible)

I switched from the standard fulltext index to a large one. Somehow this slowed down searches tremendously. The new large index is roughly 132MB (about twice the size of the original).
Any suggestions would be greatly appreciated.

Just a clarification on what "slow" is. Whatever setup BlueHost had (the one we couldn't change) would cause searches to take upwards of 45 seconds (on a dedicated box)... straight up to and over the timeout directives at times.

When we took over hosting, and implemented a proper fulltext index and did our initial MySQL (etc) configuration, we cut that down to an acceptable 2-3 seconds. After switching to a large index, searches WERE taking about 15 seconds. We've done some further tweaking to take into account the larger index and reduced that to 7-9 seconds.

While that isn't horrendous, it's still roughly three times the length of time it used to take with a standard fulltext index.

Best,
Rob

RobertMfromLI · January 21, 2011, 06:23:15 PM

ZendCache:

Here's what phpinfo reports is our current Zend configuration:

Zend Extension[/color]220060519

[/color]Debug Build[/color]no

[/color]Thread Safety[/color]enabled

[/color]Zend Memory Manager[/color]enabled

[/size]I believe I have to get sqlite fixed (for SMP on Warp Server) in order to enable ZendCache?[/size][/font]

Joshua Dickerson · January 22, 2011, 08:39:13 PM

ZendCache = suck
APC or memcached = awesome

Use Sphinx for searching.

Something like that · January 28, 2011, 02:07:46 AM

Quote from: groundup on January 15, 2011, 10:10:08 PM
That is only the basic improvements. Moving to InnoDB and Memory engines is a significant one.

This.

RobertMfromLI · January 28, 2011, 06:09:24 AM

Quote from: «Mark» on January 28, 2011, 02:07:46 AM
Quote from: groundup on January 15, 2011, 10:10:08 PM
That is only the basic improvements. Moving to InnoDB and Memory engines is a significant one.

This.

...is sadly not (yet) applicable for solving my remaining issues: all of which center around smf_messages.

- Table corruption (smf_messages)
- Slow search (smf_messages)

Still waiting on an answer on memchached. If that goes ok, then if I can manage to switch to Sphinx, THEN, finally I can move smf_messages to memcached.

青山素子 · January 28, 2011, 10:46:07 AM

Table corruption is almost always a database engine issue. Even on sites with very high load, tables shouldn't normally become corrupt. There is no "corrupt the table" SQL command.

Given you're running on an unusual OS, it's likely that you're hitting some odd bug in the OS or MySQL or an interaction between the two.

I'm likely to consider that such a thing is at least contributing to the search issues as well, but MySQL's full-text search can be slow at the best of times.

RobertMfromLI · January 28, 2011, 10:50:20 AM

Quote from: 青山素子 on January 28, 2011, 10:46:07 AM
Table corruption is almost always a database engine issue. Even on sites with very high load, tables shouldn't normally become corrupt. There is no "corrupt the table" SQL command.

Given you're running on an unusual OS, it's likely that you're hitting some odd bug in the OS or MySQL or an interaction between the two.

I'm likely to consider that such a thing is at least contributing to the search issues as well, but MySQL's full-text search can be slow at the best of times.

I considered as much, BUT:

as there are only thousands of similar complaints out there from people running other OS's (Linux primarily)
and DOZENS of bugs in earlier MySQL versions that caused this same behavior (maybe some left?)
and this problem persisted when we were on BlueHost (20 times a month) which was an EIGHT core Linux machine. When msnbot was hammering the BlueHost Linux server (before we added the crawl delay), this would occur 5-10 times a DAY.

Somehow I do not think my choice of operating systems or the way it interacts with AMP is the problem. Would you come to the same conclusion? (I hope... because if not, this will be difficult for the OS/2 dev/porter to track down, as his site does not get the type of traffic we do).

On a possibly related, I'm not seeing these issues on the Wordpress sites on that server.

Best,
Robert

Addendum (11:17EST):
Perhaps there is an issue in certain branches of MySQL or the AMP package as a whole? I am not sure what Linux source is being used in the OS/2 version, but if that Linux version is the cause of the issues, that would explain why numerous Linux users (and myself on OS/2) are experiencing this problem. The problem I have with this is though I can find out which Linux source/distro/whatever is being used in the OS/2 port, I'd then have to track down all the Linux related threads on this and ask which source tree their version(s) are being compiled from and compare the results.

Addendum #2 (11:47EST):
Is it possible it's a charset issue? There were earlier bugs in that (in MySQL) as well, and I am noting that for whatever reason, the build of MySQL I am using is defaulting to an odd charset (latin?) while the tables in question are UTF8.

character_set_client:

latin1

character_set_connection:

latin1

character_set_database:

latin1

character_set_filesystem:

binary

character_set_results:

latin1

character_set_server:

latin1

character_set_system:

utf8

collation_connection:

latin1_swedish_ci

collation_database:

latin1_swedish_ci

collation_server:

latin1_swedish_ci

RobertMfromLI · January 28, 2011, 10:52:56 AM

Quote from: 青山素子 on January 28, 2011, 10:46:07 AM
I'm likely to consider that such a thing is at least contributing to the search issues as well, but MySQL's full-text search can be slow at the best of times.

MySQL's fulltext search is not a problem. That's reasonably fast (a few seconds). Switching to SMF's large text search has caused slowdowns in searching. I have increased (this morning) the relevant caches and will see how that goes.

青山素子 · January 28, 2011, 12:35:44 PM

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

as there are only thousands of similar complaints out there from people running other OS's (Linux primarily)

How old are the complaints? I know when I was running early versions of 5.0 (or was it 5.1?) on transaction-heavy sites, I experienced crashed tables fairly often. It only occurred with MyISAM tables, which seems to indicate there was a bug specifically for that type. Upgrading to a newer version of MySQL fixed this issue.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

and DOZENS of bugs in earlier MySQL versions that caused this same behavior (maybe some left?)

I experienced it as well. Newer versions fixed those bugs. I haven't seen any recurrence yet, even on a site running at over 4mbit/sec in traffic. Of course, most of the transaction-heavy tables are InnoDB on that site because of prior issues with locking causing 20 minutes plus of backlog with queries.

The only time I've seen crashed tables since is when I've had either a power loss on my own personal servers (since fixed with a UPS unit and using NUT to do an orderly shutdown) or when I've had to kill -9 the daemon due to various issues (since resolved when the developers cleaned up the database schema).

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

and this problem persisted when we were on BlueHost (20 times a month) which was an EIGHT core Linux machine. When msnbot was hammering the BlueHost Linux server (before we added the crawl delay), this would occur 5-10 times a DAY.

Interesting. What MySQL version were they running? Also, given how packed they run their servers, it might have been an issue with disk space filling up, leading to MySQL shutting down and then an unclean startup.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Somehow I do not think my choice of operating systems or the way it interacts with AMP is the problem. Would you come to the same conclusion? (I hope... because if not, this will be difficult for the OS/2 dev/porter to track down, as his site does not get the type of traffic we do).

Given the only table corruption I've ever seen for my hosting customers was on certain early versions of MySQL's 5-series, I'm inclined to believe it's a MyISAM issue causing the table crashes. Looking at the daemon log files might help disagnose things better.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
On a possibly related, I'm not seeing these issues on the Wordpress sites on that server.

Wordpress is not very transactional. It's primarily SELECT-oriented, so you don't as much of a chance of corruption as the table files are only written when the table has changed. SMF has many tables that are changed almost constantly, which will have a greater chance of causing the problem.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Addendum (11:17EST):
Perhaps there is an issue in certain branches of MySQL or the AMP package as a whole? I am not sure what Linux source is being used in the OS/2 version, but if that Linux version is the cause of the issues, that would explain why numerous Linux users (and myself on OS/2) are experiencing this problem. The problem I have with this is though I can find out which Linux source/distro/whatever is being used in the OS/2 port, I'd then have to track down all the Linux related threads on this and ask which source tree their version(s) are being compiled from and compare the results.

It is possible, depending on how far your OS/2 port has diverged from the upstream editions and how closely it follows upstream development. If it's based on an early, buggy, release then that would make sense.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Addendum #2 (11:47EST):
Is it possible it's a charset issue? There were earlier bugs in that (in MySQL) as well, and I am noting that for whatever reason, the build of MySQL I am using is defaulting to an odd charset (latin?) while the tables in question are UTF8.

Nah, that wouldn't cause the table files to corrupt themselves. Solar radiation is probably a more likely cause.

Latin-1 is a fine default character set and has been default for years. It covers nearly all languages based on the Roman Alphabet, which is why It's been the default. UTF-8 should also be stable if you're running one of the 5.x series of MySQL. I've not seen any reports of a character set causing corruption, anyway.

RobertMfromLI · January 28, 2011, 12:43:55 PM

Quote from: 青山素子 on January 28, 2011, 12:35:44 PM
Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

as there are only thousands of similar complaints out there from people running other OS's (Linux primarily)

How old are the complaints? I know when I was running early versions of 5.0 (or was it 5.1?) on transaction-heavy sites, I experienced crashed tables fairly often. It only occurred with MyISAM tables, which seems to indicate there was a bug specifically for that type. Upgrading to a newer version of MySQL fixed this issue.

Some very recent - but yes, as I cannot determine the MySQL version(s) running, it's possible the issues are fixed in the new version.

Quote from: 青山素子 on January 28, 2011, 12:35:44 PM

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

and DOZENS of bugs in earlier MySQL versions that caused this same behavior (maybe some left?)

I experienced it as well. Newer versions fixed those bugs. I haven't seen any recurrence yet, even on a site running at over 4mbit/sec in traffic. Of course, most of the transaction-heavy tables are InnoDB on that site because of prior issues with locking causing 20 minutes plus of backlog with queries.

The only time I've seen crashed tables since is when I've had either a power loss on my own personal servers (since fixed with a UPS unit and using NUT to do an orderly shutdown) or when I've had to kill -9 the daemon due to various issues (since resolved when the developers cleaned up the database schema).

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM

and this problem persisted when we were on BlueHost (20 times a month) which was an EIGHT core Linux machine. When msnbot was hammering the BlueHost Linux server (before we added the crawl delay), this would occur 5-10 times a DAY.

Interesting. What MySQL version were they running? Also, given how packed they run their servers, it might have been an issue with disk space filling up, leading to MySQL shutting down and then an unclean startup.

We were the only ones on that box - or so we were told. We were switched to our own box after lots of high cpu usage and a few extra bucks for it. The msnbot issue I suspect was not restart issues... the server itself never went down, just MySQL table crashes. As for what version, that I sadly cannot recall. It was a v5 release, but as far as I can tell, quite a few of the earlier ones had the MyISAM storage engine/corruption bug(s).

Quote from: 青山素子 on January 28, 2011, 12:35:44 PM

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Somehow I do not think my choice of operating systems or the way it interacts with AMP is the problem. Would you come to the same conclusion? (I hope... because if not, this will be difficult for the OS/2 dev/porter to track down, as his site does not get the type of traffic we do).

Given the only table corruption I've ever seen for my hosting customers was on certain early versions of MySQL's 5-series, I'm inclined to believe it's a MyISAM issue causing the table crashes. Looking at the daemon log files might help disagnose things better.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
On a possibly related, I'm not seeing these issues on the Wordpress sites on that server.

Wordpress is not very transactional. It's primarily SELECT-oriented, so you don't as much of a chance of corruption as the table files are only written when the table has changed. SMF has many tables that are changed almost constantly, which will have a greater chance of causing the problem.

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Addendum (11:17EST):
Perhaps there is an issue in certain branches of MySQL or the AMP package as a whole? I am not sure what Linux source is being used in the OS/2 version, but if that Linux version is the cause of the issues, that would explain why numerous Linux users (and myself on OS/2) are experiencing this problem. The problem I have with this is though I can find out which Linux source/distro/whatever is being used in the OS/2 port, I'd then have to track down all the Linux related threads on this and ask which source tree their version(s) are being compiled from and compare the results.

It is possible, depending on how far your OS/2 port has diverged from the upstream editions and how closely it follows upstream development. If it's based on an early, buggy, release then that would make sense.

MySQL v5.1.52 - perhaps not recent enough to take advantage of the fixes of the MyISAM corruption issues? As far as I know, no code from earlier versions was cobbled into it.

Quote from: 青山素子 on January 28, 2011, 12:35:44 PM

Quote from: RobertMfromLI on January 28, 2011, 10:50:20 AM
Addendum #2 (11:47EST):
Is it possible it's a charset issue? There were earlier bugs in that (in MySQL) as well, and I am noting that for whatever reason, the build of MySQL I am using is defaulting to an odd charset (latin?) while the tables in question are UTF8.

Nah, that wouldn't cause the table files to corrupt themselves. Solar radiation is probably a more likely cause.

LoL, I'll look into that one. Thanks for the smile/laugh!

Quote from: 青山素子 on January 28, 2011, 12:35:44 PM

Latin-1 is a fine default character set and has been default for years. It covers nearly all languages based on the Roman Alphabet, which is why It's been the default. UTF-8 should also be stable if you're running one of the 5.x series of MySQL. I've not seen any reports of a character set causing corruption, anyway.

Well, that's at least one more I can cross off my list.

RobertMfromLI · January 28, 2011, 12:59:31 PM

Is this (see link) a possibility?

http://www.hikaro.com/linux/mysql/mysql-table-corruption-post-migration-to-mysql5%C2%A0%C2%A0.html

Synopsis: upgrading versions of MySQL can cause such issues and require using a third party tool to properly correct the tables (specific mention of v4).

Note: I never ran v4.0, but I did run some v5.0 versions that upgrading required running the upgrade_table script.

Note 2: It is possible the table was originally on a MySQL v4 series release when it was on BlueHost, though I did not copy the database; I did a full export and then full import (into an early v5 version of MySQL).

Joshua Dickerson · January 28, 2011, 02:28:07 PM

Move your messages table to memcached? Your corruption probably relates to locking contentions due to MyISAM. Your searches are most likely slow due to locking. I agree with Motoko-Chan in that it is an odd combination.

青山素子 · January 28, 2011, 02:41:25 PM

Quote from: RobertMfromLI on January 28, 2011, 12:43:55 PM
We were the only ones on that box - or so we were told. We were switched to our own box after lots of high cpu usage and a few extra bucks for it. The msnbot issue I suspect was not restart issues... the server itself never went down, just MySQL table crashes. As for what version, that I sadly cannot recall. It was a v5 release, but as far as I can tell, quite a few of the earlier ones had the MyISAM storage engine/corruption bug(s).

You probably weren't the only one on the box, unless you were paying at least $200/mo for it alone, which would still cost them money (management of servers isn't cheap).

Quote from: RobertMfromLI on January 28, 2011, 12:43:55 PM
MySQL v5.1.52 - perhaps not recent enough to take advantage of the fixes of the MyISAM corruption issues? As far as I know, no code from earlier versions was cobbled into it.

Not too bad. That shouldn't be having issues on supported platforms at least. It is very possible that there is a bug between the interaction of the HPFS filesystem (I believe OS/2 still uses that) and MyISAM. As it's not a widely tested configuration, there are likely to be quite a few bugs.

Quote from: RobertMfromLI on January 28, 2011, 12:59:31 PM
Synopsis: upgrading versions of MySQL can cause such issues and require using a third party tool to properly correct the tables (specific mention of v4).

Note 2: It is possible the table was originally on a MySQL v4 series release when it was on BlueHost, though I did not copy the database; I did a full export and then full import (into an early v5 version of MySQL).

That issue is specific to when you copy the actual table files from one install to another. Importing a SQL script would re-create the table files and not be part of the issue.

If you have kept the same table files with upgrades to MySQL itself and have not run mysql_upgrade, that might be a potential cause. You should at least run mysql_upgrade on each update, even minor. I'd also advise running mysqldump and re-importing the data that way so you know that the backing data files are good.

Quote from: groundup on January 28, 2011, 02:28:07 PM
Move your messages table to memcached? Your corruption probably relates to locking contentions due to MyISAM. Your searches are most likely slow due to locking. I agree with Motoko-Chan in that it is an odd combination.

Locking shouldn't cause table corruption under normal circumstances, but this is a MySQL port onto an OS and there is likely very little testing, especially with heavy loads and transactions. It wouldn't surprise me if there was a condition that caused table corruption under high loads on an unsupported platform.

RobertMfromLI · January 28, 2011, 03:01:36 PM

Quote from: 青山素子 on January 28, 2011, 02:41:25 PM

Quote from: RobertMfromLI on January 28, 2011, 12:43:55 PM
MySQL v5.1.52 - perhaps not recent enough to take advantage of the fixes of the MyISAM corruption issues? As far as I know, no code from earlier versions was cobbled into it.

Not too bad. That shouldn't be having issues on supported platforms at least. It is very possible that there is a bug between the interaction of the HPFS filesystem (I believe OS/2 still uses that) and MyISAM. As it's not a widely tested configuration, there are likely to be quite a few bugs.

I thought about that. It was actually originally running on JFS (which is what the Linux JFS port is based off of). I have since switched it to HPFS386 (a bit different and more thoroughly tested than HPFS) and noticed no change in behavior.

Quote from: 青山素子 on January 28, 2011, 02:41:25 PM
Quote from: RobertMfromLI on January 28, 2011, 12:59:31 PM
Synopsis: upgrading versions of MySQL can cause such issues and require using a third party tool to properly correct the tables (specific mention of v4).

Note 2: It is possible the table was originally on a MySQL v4 series release when it was on BlueHost, though I did not copy the database; I did a full export and then full import (into an early v5 version of MySQL).

That issue is specific to when you copy the actual table files from one install to another. Importing a SQL script would re-create the table files and not be part of the issue.

If you have kept the same table files with upgrades to MySQL itself and have not run mysql_upgrade, that might be a potential cause. You should at least run mysql_upgrade on each update, even minor. I'd also advise running mysqldump and re-importing the data that way so you know that the backing data files are good.

Excellent suggestion. Will do. I did not run mysqlupgrade on the minor version changes. Just on the first v5.0 to v5.1 change.

Quote from: 青山素子 on January 28, 2011, 02:41:25 PM
Quote from: groundup on January 28, 2011, 02:28:07 PM
Move your messages table to memcached? Your corruption probably relates to locking contentions due to MyISAM. Your searches are most likely slow due to locking. I agree with Motoko-Chan in that it is an odd combination.

Locking shouldn't cause table corruption under normal circumstances, but this is a MySQL port onto an OS and there is likely very little testing, especially with heavy loads and transactions. It wouldn't surprise me if there was a condition that caused table corruption under high loads on an unsupported platform.

The interesting thing is, I run tables equally as large (on an older, far less powerful server no less (Quad 550MHz)) and never run into these issues. This page (http://store.aibpc.com/) is entirely dynamically generated (all daemons running on the older server), and is comprised of about 100,000 records in the main table. Each page load does all the "stats" in real time via SQL queries (ie, the item counts, vendor counts, category counts). That machine manages to handle 249 inserts a second (via using one of the CPUs for the table inserts and leaving the rest open for other stuff) with no table corruption. Same MySQL version. BUT, no Apache or PHP (Lotus DominoGo Webserver and REXX and the REXXSQL DLL to support MySQL inserts, queries, etc).

That'd point to it being... ? ? A PHP/MySQL interaction? Apache/MySQL interaction? I just can't figure out how that could create the problem.

Something like that · January 28, 2011, 03:55:03 PM

If you have the opportunity, stress test your hardware. Run something that maxes the CPU. Check for SMART errors. Etc.

I'd run memtest86, too.

RobertMfromLI · January 28, 2011, 03:58:39 PM

Quote from: «Mark» on January 28, 2011, 03:55:03 PM
If you have the opportunity, stress test your hardware. Run something that maxes the CPU. Check for SMART errors. Etc.

I'd run memtest86, too.

I did all of that. The box is an IBM eServer xSeries 440, with the memory in RAIDed redundant mode and full ECC support. The drives have been thoroughly tested and scanned (and are in mirrored RAID configuration), as well full testing of the SCSI RAID adapter and it's battery & cache.

The server has a very extensive built in diagnostic suite (and all hardware passes), and I've additionally run various benchmarking and stress test tools against it with no issue.

Something like that · January 28, 2011, 04:07:45 PM

Quote from: RobertMfromLI on January 28, 2011, 03:58:39 PM
Quote from: «Mark» on January 28, 2011, 03:55:03 PM
If you have the opportunity, stress test your hardware. Run something that maxes the CPU. Check for SMART errors. Etc.

I'd run memtest86, too.

I did all of that. The box is an IBM eServer xSeries 440, with the memory in RAIDed redundant mode and full ECC support. The drives have been thoroughly tested and scanned (and are in mirrored RAID configuration), as well full testing of the SCSI RAID adapter and it's battery & cache.

The server has a very extensive built in diagnostic suite (and all hardware passes), and I've additionally run various benchmarking and stress test tools against it with no issue.

Okay. Double checking, because I've had more than one flaky machine reveal itself that way.

That being done, I echo Motoko's suggestion of dumping and reloading.

RobertMfromLI · January 28, 2011, 04:09:56 PM

Much appreciated, as are any suggestions to anything I may have missed, forgotten, or overlooked.

-R

Something like that · January 28, 2011, 04:27:41 PM

Quote from: RobertMfromLI on January 20, 2011, 08:56:42 AM
Code Select Expand
max_seeks_for_key=100 key_cache_block_size=2048

Have you tested without these explicitly set?

I'm especially suspicious of key_cache_block_size. The default is 1024, and you may have stumbled upon a bug/assumption in the code. Google reveals there were corruption issues with setting it to 4096 in the past.

RobertMfromLI · January 28, 2011, 04:30:19 PM

Quote from: «Mark» on January 28, 2011, 04:27:41 PM
Quote from: RobertMfromLI on January 20, 2011, 08:56:42 AM
Code Select Expand
max_seeks_for_key=100 key_cache_block_size=2048

Have you tested without these explicitly set?

I'm especially suspicious of key_cache_block_size. The default is 1024, and you may have stumbled upon a bug/assumption in the code. Google reveals there were corruption issues with setting it to 4096 in the past.

I've since set it back. It had originally been default (via no entry), forced 1024 (via the my.cnf entry) and then 2048 as noted above. :-(

And always a number that was a factor of 2 (which I believe was the other earlier issue with that setting).

青山素子 · January 28, 2011, 05:05:37 PM

On that high-transaction code, what table type are you using?

Joshua Dickerson · January 28, 2011, 11:23:12 PM

http://blogs.oracle.com/mysql/2011/01/comparing_innodb_to_myisam_performance.html

RobertMfromLI · January 29, 2011, 07:30:32 AM

Quote from: 青山素子 on January 28, 2011, 05:05:37 PM
On that high-transaction code, what table type are you using?

All MyISAM. Only default caches enabled on that machine.

Correction: the Netfinity 7000 M10 runs at 229 inserts a second per 550MHz CPU, not 249/sec. Averaged over 10 runs, it's 229.1 (with them varying between 229.05 to 229.12).

In all fairness, I have very little idea how PHP deals with SQL. On the REXX end, the REXX DLL is simply a wrapper to the MySQL DLL, meaning the REXX code is virtually directly calling MySQL to do inserts. REXX DLLs are wonderful at wrapping and calling C code directly. Assuming the PHP integration is a little more complex, I'd expect performance degradation on the PHP side.

In this, I also expect higher performance then an AMP integrated solution, simply because I know that MySQL is better optimized to utilize OS/2's threading engine (and since it is the only component of AMP being used on that machine, this implementation won't suffer from the lack of optimization on the PHP and Apache end). And LDGW is entirely thread oriented, meaning the website side also benefits.

But that's getting a little off topic. To get back on topic, does anyone suspect that the PHP/MySQL integration could be the issue? As I said, I have no idea how PHP actually handles (back end/internally) MySQL requests; while the REXX DLL is only limited to the capabilities (or bugs) of MySQL itself.

RobertMfromLI · January 29, 2011, 07:35:37 AM

I'll be giving APC a try maybe tonight if I have the time. The OS/2 AMP maintainer just ported it for me yesterday... so, it'll be a test run.

青山素子 · January 29, 2011, 12:56:27 PM

Quote from: RobertMfromLI on January 29, 2011, 07:30:32 AM
To get back on topic, does anyone suspect that the PHP/MySQL integration could be the issue? As I said, I have no idea how PHP actually handles (back end/internally) MySQL requests; while the REXX DLL is only limited to the capabilities (or bugs) of MySQL itself.

I don't think so, but it's a possibility. PHP's MySQL module is just a simple wrapper around libmysql, the MySQL client library. It's a pretty stable thing, so any bugs would likely be in the MySQL side.

RobertMfromLI · January 29, 2011, 01:01:11 PM

Quote from: 青山素子 on January 29, 2011, 12:56:27 PM
Quote from: RobertMfromLI on January 29, 2011, 07:30:32 AM
To get back on topic, does anyone suspect that the PHP/MySQL integration could be the issue? As I said, I have no idea how PHP actually handles (back end/internally) MySQL requests; while the REXX DLL is only limited to the capabilities (or bugs) of MySQL itself.

I don't think so, but it's a possibility. PHP's MySQL module is just a simple wrapper around libmysql, the MySQL client library. It's a pretty stable thing, so any bugs would likely be in the MySQL side.

ok... so the issue isn't there - it's doing the same thing as the REXX wrapper DLL.

Something like that · January 29, 2011, 01:13:31 PM

Quote from: RobertMfromLI on January 29, 2011, 07:30:32 AM
In all fairness, I have very little idea how PHP deals with SQL. On the REXX end, the REXX DLL is simply a wrapper to the MySQL DLL, meaning the REXX code is virtually directly calling MySQL to do inserts. REXX DLLs are wonderful at wrapping and calling C code directly. Assuming the PHP integration is a little more complex, I'd expect performance degradation on the PHP side.

It uses the MySQL C interface. All data is passed back and forth as plain ASCII strings, so it's practically impossible that any corruption would happen there. This is also why PHP returns numeric columns in MySQL as strings.

RobertMfromLI · January 29, 2011, 01:25:09 PM

Quote from: «Mark» on January 29, 2011, 01:13:31 PM
Quote from: RobertMfromLI on January 29, 2011, 07:30:32 AM
In all fairness, I have very little idea how PHP deals with SQL. On the REXX end, the REXX DLL is simply a wrapper to the MySQL DLL, meaning the REXX code is virtually directly calling MySQL to do inserts. REXX DLLs are wonderful at wrapping and calling C code directly. Assuming the PHP integration is a little more complex, I'd expect performance degradation on the PHP side.

It uses the MySQL C interface. All data is passed back and forth as plain ASCII strings, so it's practically impossible that any corruption would happen there. This is also why PHP returns numeric columns in MySQL as strings.

So, it's virtually exactly the same as what the REXX DLL does. This still leaves me baffled. 229 inserts a second (on the older Netfinity) on one 550MHz CPU is far less than our forum's total 6-8 MySQL queries a second (with peaks that I presume aren't much more than 30/sec) on two 2.4GHz CPUs.

I think I'll start by installing and using APC, then take a look at whether I'm (technically) capable of porting Sphinx over (or if the OS/2 AMP maintainer will do so for me).

Question: If I can get Sphinx installed and running, can I correctly presume that I can turn smf_messages into an INNODB format?

Something like that · January 29, 2011, 02:21:01 PM

Quote from: RobertMfromLI on January 29, 2011, 01:25:09 PM
Question: If I can get Sphinx installed and running, can I correctly presume that I can turn smf_messages into an INNODB format?

Yes, definitely.

I run entirely InnoDB, using Sphinx for search.

RobertMfromLI · February 04, 2011, 12:35:04 PM

Here's an oddity I cannot explain (though I can speculate maybe). The only change I've done in a little while was doubling the thread_concurrency value (to twice the recommended - currently at 8 for a dual CPU machine). This seems to have alleviated table corruption issues.

I'm at a loss to explain why... though I do know Warp Server for eBusiness was designed for exceptional thread handling... but I am not sure how using more threads is actually alleviating the problem.

Something like that · February 04, 2011, 06:16:32 PM

Quote from: RobertMfromLI on February 04, 2011, 12:35:04 PM
Here's an oddity I cannot explain (though I can speculate maybe). The only change I've done in a little while was doubling the thread_concurrency value (to twice the recommended - currently at 8 for a dual CPU machine). This seems to have alleviated table corruption issues.

I'm at a loss to explain why... though I do know Warp Server for eBusiness was designed for exceptional thread handling... but I am not sure how using more threads is actually alleviating the problem.

I'd file a bug with MySQL then!

RobertMfromLI · February 04, 2011, 06:24:25 PM

Quote from: «Mark» on February 04, 2011, 06:16:32 PM
Quote from: RobertMfromLI on February 04, 2011, 12:35:04 PM
Here's an oddity I cannot explain (though I can speculate maybe). The only change I've done in a little while was doubling the thread_concurrency value (to twice the recommended - currently at 8 for a dual CPU machine). This seems to have alleviated table corruption issues.

I'm at a loss to explain why... though I do know Warp Server for eBusiness was designed for exceptional thread handling... but I am not sure how using more threads is actually alleviating the problem.

I'd file a bug with MySQL then!

Will do... I'm going to give it another week first though, and see what happens. I'll also re-enable the old settings, debug mode and try to capture an error or crash log.

Thanks to everyone who's responded!

Deaks · March 05, 2011, 07:35:19 PM

RobertMfromLI, did you get this sorted? its been longer than a week

I have marked this as solved until you get back to us

RobertMfromLI · March 05, 2011, 08:17:13 PM

Quote from: Runic on March 05, 2011, 07:35:19 PM
RobertMfromLI, did you get this sorted? its been longer than a week

I have marked this as solved until you get back to us

Hi Runic,

I guess the answer would be "worked-around", so, for now (since there isn't such an option), "Solved" should be fine. I'm guessing from my experience on both Linux and OS/2 boxes with SMF, that this is indeed a bug in MySQL, which I am currently working around by doubling the thread_concurrency settings. As of now (since I mentioned I'd done that), I havent had the type of problems I had been previously experiencing. Auto recovery from crash works now, and table crashes are a lot lot less frequent.

In the event we get somewhere with tracking down the bug and the MySQL team finds a solution, I'll post here and change the solution to that post. I think I may need to fire up a Linux box to do that though (even though the OS/2 sources are nearly a direct compile of the Linux version) just to make things easier and bypass that hurdle (ensuring they understand that I've managed to replicate this on Linux as well, so it's not OS/2 specific).

Best,
Robert

News:

Performance issues on certain queries (and some semi-related questions)