News:

Wondering if this will always be free?  See why free is better.

Main Menu

Converting big forum from Yabb to SMF

Started by evgenydeep, May 20, 2008, 06:53:08 AM

Previous topic - Next topic

SleePy

Give this one a try.

I believe I fixed some issues in it. The good and bad karma should of be handled it looks like.
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

evgenydeep

SleePy, thanks, I saw you added block_size. Something other?
I have tried your version, but it strangely not ALTER tables sometimes, which produce errors on next steps. I'm not trying to investigate why and where... I simple return to old one version, which works almost well. Almost becasue i still not converted my forum :) But now I'm serious, and i want to fight and finish it finaly, becasue if it will be Yabb forum, my server will not survive on autmn, when many users will come to my forum...


I have other issue.  I looked on the convertion process and realize that on step 8(function convertStep8), it become unexpected slowly on the query like
UPDATE `forum_smf`.smf_log_topics\n                    SET ID_TOPIC = 38\n                    WHERE tempID = 1111015939
I have almost 4mln rows in smf_log_topics and around 30 000 topics in smf_topics.
So, to really speed up the process, is only needed to add index to the tempID column, and then drop it.
Otherwise this 8 step can lasts months :)

How do you think?

evgenydeep

Other one issue/bug:
On STEP 4 find next code:

      if ($_GET['substep'] >= -4)
      {
         $knownColumns = array(
            'temp_toName' => 'DROP temp_toName',
         );
         alterTable('personal_messages', '', $knownColumns, false, true);

         pastTime(-5);
      }

I suppose it should be 'DROP COLUMN temp_toName' instead 'DROP temp_toName'
?

evgenydeep

3) On step 8(function convertStep8), during query
         $knownKeys = array(
            'PRIMARY' => 'ADD PRIMARY KEY (ID_TOPIC, ID_MEMBER)',
         );
         $knownColumns = array(
            'tempID' => 'DROP COLUMN tempID',
         );
         alterTable('log_topics', $knownKeys, $knownColumns, '', true, true);

I got following error:


<br />
<b>Notice</b>:  Undefined variable: endl in <b>/home/forumgr/public_html/smf102/yabb21_to_smf.php</b> on line <b>800</b><br />
Duplicate entry '20706-8050' for key 1
                ALTER TABLE `forumgr_smf102`.smf_log_topics
                ADD PRIMARY KEY (ID_TOPIC, ID_MEMBER),
                DROP COLUMN tempIDhosting#


Why to not check by aditional one sql-query - is there duplicates or not?
Can You tell me correct SQL query(-ies), 1) to check for duplicates and then 2) delete duplicates, but let one recored will remain only one from duplicates.
?
Thanks.

SleePy

Well there should hopefully not be duplicate topics :|
It does happen though, the flat file converters dont make it very fun because they remove key during conversion that is supposed to help against this.
This may work to show all duplicate topics in the log_topics table. Replace smf_ with your prefix

SELECT ID_TOPIC
FROM smf_log_topics
WHERE ID_TOPIC != 0
GROUP BY ID_TOPIC, ID_MEMBER
HAVING ID_TOPIC > 1


But it should show the id of a topic that has more than 1 ID_TOPIC and ID_MEBER (both are the same).
The next part would be to find and delete those.
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

evgenydeep

Moreover, I have the same bug with poll_choices:
<b>Notice</b>:  Undefined variable: command_line in <b>/home/forumgr/public_html/smf102/convert.php</b> on line <b>1148</b><br />
Duplicate entry '275-0' for key 1
                ALTER TABLE `forumgr_smf102`.smf_poll_choices
                ADD PRIMARY KEY (ID_POLL, ID_CHOICE)hosting#

And moreover, I saw such bugs have not me only. Try to search string 'Duplicate entry' on SMF community forum and you will notice that many users have the same bug. So, it would be good before adding any PRIMARY KEY(UNIQUE KEY) check - is there duplicates or not, and if there - then delete them.

BTW:  your sql is not working, I used

SELECT ID_TOPIC, ID_MEMBER, COUNT(*)
FROM smf_log_topics
GROUP BY ID_TOPIC, ID_MEMBER
HAVING COUNT(*) > 1

but on 2mln rows table without index it working VEEEEEEEERY :-) slow

and for polls:
SELECT ID_POLL, ID_CHOICE, COUNT(*)
FROM forumgr_smf102.smf_poll_choices
GROUP BY ID_POLL, ID_CHOICE
HAVING COUNT(*) > 1

evgenydeep

18 step

      if ($_GET['substep'] >= -2)
      {
         $knownKeys = array(
            'poll' => 'ADD UNIQUE INDEX poll (ID_POLL, ID_TOPIC)',
         );
         alterTable('messages', $knownKeys, '', true);

         pastTime(-3);
      }

Is it error? Is it should be messages table? But in messages table there is no field ID_POLL

evgenydeep

it looks that on 18 step should be not messages, should be topics, becasue only in topics exist ID_POLL, ID_TOPIC
Yeah... and on clean install poll index exist in topics only.

Guys... :-) How many users already converted Yabb to SMF? :)

evgenydeep

Another thing: during Cleaning up steps(14 and 15), no indexes creating. May be problems with the substeps again?

evgenydeep

I decided to remove duplicates by following SQL

CREATE TABLE forumgr_smf102.smf_poll_choices_temp AS SELECT * FROM forumgr_smf102.smf_poll_choices WHERE 1 GROUP BY ID_POLL, ID_CHOICE;
DROP TABLE forumgr_smf102.smf_poll_choices;
RENAME TABLE forumgr_smf102.smf_poll_choices_temp TO forumgr_smf102.smf_poll_choices;

and Indexes should be added manually also. It will be good to add this into converting script. At fiest - check, is there duplicates or not, and then if yes - remove them.


evgenydeep

On step 8

         $knownKeys = array(
            'PRIMARY' => 'ADD PRIMARY KEY (ID_TOPIC, ID_MEMBER)',
         );
         $knownColumns = array(
            'tempID' => 'DROP COLUMN tempID',
         );
         alterTable('log_topics', $knownKeys, $knownColumns, '', true, true);


Here ADD PRIMARY KEY (ID_TOPIC, ID_MEMBER) , but I look, on clean installed SMF
PRIMARY KEY (ID_MEMBER, ID_TOPIC)

Is it important?

SleePy

#31
evgenydeep,

Many users have converted to SMF from Yabb :P
Just not all of them have been big boards where these minor issues come poping out.

Thanks for the info on the query for finding duplicates. Did that table have a index for                  ID_TOPIC itself? If this is missing it would explain why it took a while. Its not a unique index or anything so adding it shouldn't cause issues but would speed up that query. As well a limiter on it would help.

As for step 18, are you sure you where using the .php file I gave you?
Lots of those steps got changed because of the inequality worked better the opposite way preventing the columns from being dropped to early.

Edit,
Attaching the working copy I have from SVN.
If you find bugs let me know, The hardest part is getting these flat file converters to work right especially from big boards where we don't go through 10 steps in a second :P
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

evgenydeep

Quote from: SleePy on August 04, 2008, 12:57:18 PM
Thanks for the info on the query for finding duplicates. Did that table have a index for                  ID_TOPIC itself? If this is missing it would explain why it took a while. Its not a unique index or anything so adding it shouldn't cause issues but would speed up that query. As well a limiter on it would help.
About what table you discuss? smf_log_topics? Sorry, not understand.

Quote
As for step 18, are you sure you where using the .php file I gave you?
Lots of those steps got changed because of the inequality worked better the opposite way preventing the columns from being dropped to early.
Yes, i look into your file also, here is the same bug :-)
On 18 step should be not messages, should be topics, becasue only in topics exist ID_POLL, ID_TOPIC

SleePy

yes sorry, I was asking about the smf_log_topics. It should have a primary index and another index of id_topic.

What line you seeing that on?
In fact in that file I gave you, its slightly different.
This is what I am seeing:

        if ($_GET['substep'] <= -3)
        {
            $knownKeys = array(
                'poll' => 'ADD UNIQUE INDEX poll (ID_POLL, ID_TOPIC)',
            );
            alterTable('messages', $knownKeys, '', true);

            pastTime(-4);
        }


If you look it is now substep -3 in step 18. The inequality is going the other way as well (This seemed to fix issues with changes to the table being done before the conversion was complete).
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

evgenydeep

Should be as quoted.  Sholuld be not messages table, must be topics table. Becasue only in topics table exist fields ID_POLL, ID_TOPIC.

$knownKeys = array(
'poll' => 'ADD UNIQUE INDEX poll (ID_POLL, ID_TOPIC)',
);
alterTable('topics', $knownKeys, '', true);



Quote
If you look it is now substep -3 in step 18. The inequality is going the other way as well (This seemed to fix issues with changes to the table being done before the conversion was complete).
Aha, thanks, I see. I'll look on it.

evgenydeep

One another bug. Serious bug.
In "Converting membergroups" step function convertStep1().
I notice that NoPost Groups has not been converted.

How it now:
elseif (preg_match('~\$NoPost\[(\d+)\] = [\'|"]([^|]*)\|(\d*)\|([^|]*)\|([^|]*)~', $group, $match) != 0)

How it should be:
elseif (preg_match('~\$NoPost\[(\d+)\} = [\'|"]([^|]*)\|(\d*)\|([^|]*)\|([^|]*)~', $group, $match) != 0)

SleePy

Ahh, I thought I had fixed the id_poll index issue. I fixed it

Thanks for the suggestion in the regex. I see from the other regex in membergroup conversion it looks right :)
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

Gilrod


SleePy

Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

Gilrod

#39
The one with the error in the 2.2 thread  :P

this one

Advertisement: