News:

Bored?  Looking to kill some time?  Want to chat with other SMF users?  Join us in IRC chat or Discord

Main Menu

Streamlining large forums by splitting the database?

Started by Conrad, August 28, 2024, 08:50:04 PM

Previous topic - Next topic

Conrad

Is there a feature already in existence that helps large forum databases (10 GB+ size) become more manageable and less server-intensive by splitting old posts (maybe by year?) into separate database(s) so that the main one that is most used is smaller in size?

Kindred

#1
No...  and I don't think that doing that would have the effect that you think either
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Sir Osis of Liver

What you could do is clone the forum and set it up as an archive, with no permissions to post/pm, members can just read old topics.  Then you can remove all topics not posted in for a specified length of time (i.e., a year) on the production install so that only active topics remain.
Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

Arantor

Out of interest where actually is that size in your database?

But also, no, sharing it really wouldn't have the performance boost expected unless you planned on making everything older completely read only. (No likes, no view updates, no replies.) And even then... less likely than you'd imagine.

There are more effective techniques for scaling than redesigning the entire system for what is a fringe use case. My example was the largest regular SMF, it had 90 million posts, was getting 90k posts per day and required just two servers (dedicated DB server, dedicated front end server)
Holder of controversial views, all of which my own.

Conrad

My database takes up several gigs, of course mainly due to posts.

When reading about features across various forum software solutions I came across the archive feature which is supposed to streamline large databases by splitting up the posts table. The admin basically selects such-and-such date and then all older posts get moved from the main posts table to a separate, archived-posts table that is read-only.

Apparently this allows the current posts table, that is then a lot smaller, to run more efficiently from a server standpoint as all the masses of old posts have been moved to the read-only archive table.

Does this make sense from a technical standpoint?

vbgamer45

Yes but at several gigs is not large. You can optimize mysql many ways, using latest software, innodb and then using mysql tuner..

partitioning is something as well but again I wouldn't even look at that till you get extremely large.
https://planetscale.com/blog/what-is-mysql-partitioning
Community Suite for SMF - Take your forum to the next level built for SMF, Gallery,Store,Classifieds,Downloads,more!

SMFHacks.com -  Paid Modifications for SMF

Mods:
EzPortal - Portal System for SMF
SMF Gallery Pro
SMF Store SMF Classifieds Ad Seller Pro

Arantor

It made sense 20 years ago when systems didn't have the scaling capacity we have now - my first VPS in 2006 had 160MB RAM, the same plan from the same company now is an 8GB RAM VPS.

Archiving just isn't the requirement it used to be.

Better question: do you actually have a performance bottleneck? How are you determining it if so? Chances are there are things you can do to fix that without tearing up the entire system.
Holder of controversial views, all of which my own.

Conrad

Quote from: Arantor on September 03, 2024, 05:35:04 AMBetter question: do you actually have a performance bottleneck? How are you determining it if so? Chances are there are things you can do to fix that without tearing up the entire system.

I just remember how our host moved us away from shared hosting not so much due to traffic but the size of the database slowing things down as it grew into several gigs+ and started being a burden.

I assume having all posts ever written stored in just one table hasn't helped much. That's why I figured the idea of splitting up posts into two tables and making one of them just a read-only archive might help, and I see one of the forum software solutions (albeit a pricy one) doing just that.

Currently I am looking at a new server though so it should handle everything, but I thought it would be useful to keep things tidy this way with new and old posts in separate tables.

Arantor

Like I said, I've seen an SMF forum with 90,000,000 posts all in one table perform just fine.

If you're at the "few gigabytes" stage, you're probably in small VPS territory (usual recommendation, 100k to 250k posts is the point you should be there), and you don't need to split the database at that point.

I believe vBulletin does have the archive option but that's mostly because it's had it for the last 20 years, not because it's that useful now. Notably, vBulletin's spiritual successor XenForo does not have it to the best of my knowledge, and they definitely have customers in the 100 million post range who don't need it. E.g. IGN's board is 140 million posts with no archive.
Holder of controversial views, all of which my own.

Aleksi "Lex" Kilpinen

Just to support that, I run a forum that's about 4-5gb I think (haven't checked in a while), and running fine on a VPS, in a single database.
Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

Kindred

He'll, I run a multi gigabyte database on a shared server, because we don't get nearly as much traffic as we used to.
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Conrad

Quote from: Arantor on September 03, 2024, 09:04:13 AMI believe vBulletin does have the archive option but that's mostly because it's had it for the last 20 years, not because it's that useful now.

As you mentioned it's an old feature (2011?) but it's IPS/Invision that has it.

I have a funny feeling though that server processors and storage technology have advanced much faster than the growth of discussion forums which are mostly text stored in a database, so as many of you have mentioned even a shared account (or VPS) today will suffice for something that might have required a dedicated server years ago.

But looking back at when the feature was introduced they were convinced it would really lower the server load:
- Free up your forum post table so it runs speedy again by giving MySQL less old data to have to work with all the time
- Searches are faster, look-ups are faster and everything zips along
- In testing, it actually saves 30% database access showing an archived topic (they mentioned earlier that: It doesn't have to load up the editor javascript or load up any follow data from the database)

I have about 5.6M posts that will soon be imported into SMF so I hope everything runs smoothly! :)

On a side note, are most VPS or shared accounts configured automatically to work with SMF in terms of SQL/PHP/etc.? Are there any database-specific options/software to aim for so that a large(r) SMF forum can run most efficiently?

Arantor

I'm surprised, I thought they'd dropped that years ago, especially given the changes in architecture between 3 and 4, though given how they're actively pushing people to the cloud where that kind of scaling is firmly not your problem, I'd be highly surprised if it remains in 5 (currently in beta)

It's also worth noting that Invision Community (the company is IPS, the software renamed years ago) does very different things around loading CSS and JS - loading the editor in SMF is at most 2 database calls (often 1) and that one is frequently cached anyway assuming the site isn't running with caching turned off. Serving assets isn't done via the DB (the common call is for getting the smileys for the editor, the less common for the message icon), just as an example, and if the topic is in a board that is read only, even that won't happen.

Worth noting that Invision's archive mode limits further interaction, including removing topics from the site's search function, which may or may not be a deal breaker.

As far as VPSes go, you want something with a decent amount of RAM (the more the beter, ideally) and a caching solution. 2.0 supports memcache, 2.1 supports a few more things and I believe there's a Redis plug-in kicking around. Other than that, any host actively using SSDs will win over hosts that aren't (and plenty don't because legacy HDDs are still surprisingly cost effective for systems that will run multiple drives in parallel)
Holder of controversial views, all of which my own.

Advertisement: