How long should conversion take???

Started by dmorris68, March 14, 2008, 05:26:52 PM


dmorris68

Trying to convert a very large phpBB 2.0.22 forum on my development box.  After a few failed attempts due to outdated convert/SQL scripts (why does the main SMF site even host the conversion script archives if the forum attachments are so much newer?), I finally got past the "duplicate key" errors on the topics conversion.  With the latest scripts from the sticky, I'm now getting an annoying but apparently harmless PHP error (Notice: Undefined index: charsets in D:\Tools\Development\Web\SMF\phpbb2\convert.php on line 378), but otherwise the conversion is proceeding... very slowly.

It's been running for nearly 24 hours now.  It's converting PM's at the moment (step=1&substep=23&start=1201500).  Yes, that appears to be 1.2M PM's.  Yes, this is a huge forum -- over 30K registered users, and the db is about 1.3GB uncompressed.  And it's running on a local instance of Apache/PHP/MySQL, on a 3.5GHz quad-core machine with 4GB of RAM!  I'd hate to see how long it would take to convert on the live server, which is dedicated but a much lower-powered box.  I may have to just tell everybody to export their important PM's and then prune all PM's older than 6 months or something.

Anybody want to place bets on whether this actually succeeds, and how long it will take?  :)  Will I win the record for the largest phpBB forum conversion ever?

EDIT: It's almost done with PM's.  We have 1.4M PM's total, and the script appears to have only about 70K left to do.  I'm impressed that it has run this long and chewed through this much data without blowing a gasket.  :)  Looks like the simple reason it's taking so long is that the script is converting only 500 PM's per pass.  That's ~2960 passes, with each pass taking ~22 seconds (including delay time between passes), which adds up to more than 18 hours just to convert PM's!  Ouch...
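For anyone who wants to redo that arithmetic with their own numbers, here is a rough sketch in Python. The function name is made up for illustration, and the row count of 1,480,000 is only what ~2960 passes of 500 rows implies, not a figure read from the database.

```python
import math

def estimate_conversion(total_rows, rows_per_pass, seconds_per_pass):
    """Return (number of passes, total hours) for a chunked conversion."""
    passes = math.ceil(total_rows / rows_per_pass)
    hours = passes * seconds_per_pass / 3600
    return passes, hours

# Figures from the post: 500 PM's per pass, ~22 seconds per pass.
# 1_480_000 rows is an assumption derived from the ~2960 passes quoted.
passes, hours = estimate_conversion(1_480_000, 500, 22)
print(f"{passes} passes, about {hours:.1f} hours")
```

The estimate scales linearly with the number of passes, so it only predicts the effect of a larger chunk size if the time per pass stays roughly constant, which has not been measured here.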

dmorris68

Well, conversion is done.  And surprisingly enough, everything looks okay.  I don't have an exact time because I went to bed at 2am and it was still running.  Based on where it was, I would estimate it ran another 1-2 hours, so that comes to around 29 hours total for the conversion.

When I get ready to do this live, I'm going to shut down our forum for a day, snap the phpBB database, copy it down to my dev box (which is much beefier than the server), hack the convert script to greatly increase if not eliminate the 500 record iteration size, and run it locally.  Then I'll zip up the smf db and upload it back to the server, to be restored there.  I think that'll be a whole lot easier and quicker than trying to do the conversion on the live server.

Kudos to you for writing such a solid script.  Aside from the excruciatingly long runtime and a few minor quirks (like not supporting the phpBB subforum mod we used, instead it broke them out into top-level forums), for such a humongous conversion it went surprisingly well.  Now I have the major task ahead of me of re-writing our custom phpBB mods for SMF, a couple of which are major, but I was prepared to do that anyway with our initial intention of moving to phpBB3.

JayBachatero


dmorris68

Quote from: JayBachatero on March 22, 2008, 12:38:32 AM
It shouldn't take so long.  Try doing the command line conversion.  http://www.simplemachines.org/community/index.php?topic=207760.msg1322067#msg1322067

I'm getting ready to do this conversion again and was about to try this CLI version of convert.php.   However I notice that it appears to be a bit stale compared to the current web version.  Much of the difference is in improved charset support, but I do see a new step inserted to "Remove all topics that have zero messages in the message table."  While these changes *probably* won't affect my conversion, are you planning to update the CLI version, or have you already and I'm missing it somewhere?

dmorris68

I guess you can disregard my previous post -- I was just looking through the source of the latest config.php and it seems CLI functionality is included.  So now there's only one config.php for both CLI and web, correct?

Oh well, I've been running the slightly older CLI config.php you linked to above, and 5 hours later it's still converting posts (~757K posts).  PM's were the chokepoint I remember from the last 29 hour conversion, and I've cleaned out about half, from 1.2M to about 650K, so I was hoping for a task of a few hours this time, especially with the CLI version.  But if the post conversion step is any indicator, the CLI version is going to take at least a full day to complete.  Doesn't seem to be a lot faster, and this is on a 2.6GHz Core 2 Duo with 4GB RAM, with SMF running on a local XAMPP Apache/MySQL instance.

SleePy

There is no config.php we release.

The convert.php should be CLI compatible. I haven't tried it recently, but from Jay's post it appears it should be, and with the work I was doing on convert.php to make it work for SMF 2.0 I see a lot of command line work being done :)
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

dmorris68

Sorry, that was a typo (a repeated one, apparently).  I meant "convert.php" obviously.

I had to cancel the CLI conversion 8 hours into it, and it was still converting posts.  I'm going to retry on another machine I can leave alone for as long as it takes.  I was just hoping that with more than halving our PM's and using the CLI version, it would be down to a few hours job, but it's not looking like it.

SleePy

How big is your site (as in members/posts/topics)?
We could bump up how many posts it processes at once as well. This would affect server performance though :|

As a suggestion if you have a test site, you could do the conversion there with the database and everything, and when you are ready push the database back to the live site.

dmorris68

That's exactly what I'm doing -- running the conversion on a faster local machine with a copy of the live site, then copying the db dump back to a couple of different test servers.  I mentioned that in the OP of this thread.  We have about 35K users, 750K posts, and about 650K PM's (down from 1.4M when I did the 29 hour conversion mentioned earlier in this thread).

As I mentioned above, I had already found the spot in the code to edit the record chunk size from 500 to a much larger number, since I don't care about server load for this conversion.  But then Jay suggested I use the CLI version, which is the track I'm on now -- I was hoping without Apache in the mix it would run an order of magnitude faster, but apparently not.  Looks like I might have to make that chunk size modification after all.

Today's attempt was on a Vista machine running XAMPP.  Now I'm about to try again on a 64-bit Ubuntu 8.04 box.

SleePy

Can you look at the process list for mysql and see if it's running the queries or if it's getting stuck doing something such as copying to a temp table?

If I see jay around tonight I will ask him if he has time to help out here quickly :)

dmorris68

The conversion finished on my Linux box overnight, running the latest script in CLI mode this time.  I intended to run it through the "time" process but forgot, and didn't want to abort it and start again, so I'm not sure how long it actually took.  But it was obviously a lot better than the first 29 hour conversion.  :)  Part of that is due to the fact I forgot how much I actually scrubbed private messages.  We had over 1.4M PM's on the first conversion.  Since then I cut them down to about 170K, not the 600K I was thinking.  So whereas before the PM conversion took the most time, this time it was the posts conversion.

SleePy

It could take a while for the conversion to take place no matter what you do, really :|
We can't help much with how long it takes to select chunks of data and insert them.

I know a few of our big board users sometimes have to plan upgrades on their servers for the quietest time and stay up late, because changes to the database for things such as an upgrade are expected to take a while.
We had one major update here ourselves when making a big upgrade of the site to 2.0, and it took a couple of hours to run. And that's with a dedicated mysql server :o

dmorris68

Well, all of my conversions have been on local boxes with local Apache and MySQL instances.  I would LOVE to have only a 2 hour conversion.  :)  My initial conversion took 29 hours.  That was on a dedicated, local server running a Quad Core @ 3.5Ghz and 4GB of RAM.  I attributed that to the Apache load relief logic and the sheer amount of data to be converted.  It would be cool to have an option for people who don't care about server load to disable or modify the chunk size and pass interval.  Just let it peg the box until it finishes.  For now the CLI is a sufficient compromise.  I'm guessing it took about 9-10 hours for this conversion, and that was on a slower Linux box (Opteron 175 + 2GB RAM).  So the CLI sure seems to be the way to go for large conversions.
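To make the "chunk size and pass interval" idea concrete, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in. It is not the converter's code: the table names `old_pm` and `new_pm`, the function names, and paging by id rather than by a start offset are all assumptions made for illustration.

```python
import sqlite3
import time

def convert_in_chunks(conn, chunk_size=500, pause_seconds=0.0):
    """Copy rows from old_pm to new_pm in batches; return the number of passes."""
    last_id = 0
    passes = 0
    while True:
        rows = conn.execute(
            "SELECT id, body FROM old_pm WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany("INSERT INTO new_pm (id, body) VALUES (?, ?)", rows)
        conn.commit()
        last_id = rows[-1][0]  # resume after the last id copied
        passes += 1
        if pause_seconds:
            time.sleep(pause_seconds)  # load-relief delay between passes
    return passes

def make_demo_db(n_rows):
    """Build a throwaway database with n_rows fake PM's (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE old_pm (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE new_pm (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO old_pm (id, body) VALUES (?, ?)",
        [(i, f"pm {i}") for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn

conn = make_demo_db(2000)
print(convert_in_chunks(conn, chunk_size=500))  # small chunks, no pause
```

With `pause_seconds=0` and a large `chunk_size`, the loop simply pegs the box until it finishes, which is the behaviour asked for above. Whether that is actually faster on a real database depends on memory and on how the inserts are batched.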

SleePy

Yes, running CLI is way faster than most other things.

You could go to the pastTime function and just do a return at the start of it.
This would make it ignore the pastTime check entirely and just continue on forever.
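As a rough illustration of that idea, here is a Python analogue. It is not the body of the real pastTime function in convert.php, which is not shown in this thread; the name `past_time`, its parameters, and the 3 second limit are hypothetical.

```python
import time

def past_time(started_at, limit_seconds=3.0, ignore_limit=False, now=None):
    """Return True if the current pass has run long enough that it should stop."""
    if ignore_limit:
        # The "return at the start" trick: never report that time is up.
        return False
    if now is None:
        now = time.monotonic()
    return (now - started_at) >= limit_seconds

started = time.monotonic()
print(past_time(started, ignore_limit=True))
```

A loop that checks `past_time(...)` between batches would then never pause, so it only makes sense when run from the command line or on a box where nothing else cares about the load.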

You might also try something that I am not 100% sure will work: start one process that converts only posts, and let another one run to convert everything else besides posts.
It doesn't look like convert.php offers this at the moment.
But I am thinking that if you have a convert.php and a modified sql file in two locations, then you could just specify the paths to SMF and the old forum in these and let it run with these modified convert scripts.

