Always garbage characters in forums after database restore ???

Started by JimSelleck, July 03, 2008, 01:55:44 AM

Previous topic - Next topic

JimSelleck

I did a search on this problem since I have ALWAYS had it, but it looks like its just me.  Every time I move my SMF forum to a new host, everything works fine, but the new forum is filled with odd special characters (particularly multi-character graphic substitutions for apostrophes and double quote marks).  This happens whether I let SMF to do the backup or if I use phpMyAdmin.

It looks like the default backup and restore is using a different character set and garbage comes in in the translation.

The first time this happened, 2 years ago, I got a team of admin users together and we painfully FIXED all of the errors MANUALLY over about a 2 week period.  Now, we are moving hosts again but the board is much larger and it is no longer practical to fix the garbage manually.

Isn't there a way to backup and restore an SMF forum database so that the messages look THE SAME with all special characters transferred after the move ????????

What is the trick?

JimSelleck

Problem #2 -- None of my attachments are showing up.  I know the files are copied directly over and the attachments directory has been correctly set (checked, rechecked and checked AGAIN)

Attachments FIXED.  Filezilla automatically FTP transferred them in ASCII mode instead of binary.

The special character thing is REALLY strange though.  It only affects certain posts.

JimSelleck

Okay... nobody responded, so I worked out a fix of my own.  I will share it here in case anybody else has the problem in the future.

Here's the deal:  It seems PHP is (at this time) kind of 8-bit about character sets, not terribly unicode friendly, or so I have been reading.  So sometimes you get some oddities when users put graphic characters and unexpected extended characters in posts, in either titles and/or body.

PHPMyAdmin, when it does a scripted backup of your SMF forum, will TRY like heck to translate those odd characters, but occasionally may come up with strange things like:

’ (apostrophe)
‘  (back apostrophe)
â€"   (em dash)
â€"   (en dash)
…   (elipsis: ...)
“ (open quote)
”  (close quote)

Our forum (trutopia.org) is a haven for professional writers, and we have people not only using the SMF editor, but also copy/pasting in manuscripts from Word, Word Perfect, Mac, OpenOffice, Final Draft, Movie Magic Screenwriter, not to mention pasting in web pages, newsletters, and articles from every browser from IE to Firefox to Opera to Safari.....

In other words, after I backup and restore my SMF database I generally have about 2,500 corrupted characters spread through the forum to fix, and that is NOT something you want to try to do manually one at a time.  TRUST ME!  The first time it happened (last year) it took a team of 10 admins 2 weeks to hunt them all down and fix them.

The following script can be run in PHPMyAdmin to go through your "messages" table to find and fix the aforementioned errors in the "body" and "subject" fields.  Note that I use the default table name prefix of "smf_" which is reflected in the script.  Obviously you will need to do a search and replace through the script before running it on your system if you use a different table name prefix.

WARNING:  I take NO responsibility for any munging of data that might occur from running modified versions of this script.  ALWAYS do a full backup of your SMF database BEFORE you run any script of this nature!!!!!  ONLY run SQL scripts in PHPMyAdmin if you are pretty dang sure you know what you are doing.

==== SQL follows =====

UPDATE smf_messages
SET body=replace(body, '’', ''');

UPDATE smf_messages
SET body=replace(body, '‘', ''');

UPDATE smf_messages
SET body=replace(body, 'â€"', '—');

UPDATE smf_messages
SET body=replace(body, 'â€"', '–');

UPDATE smf_messages
SET body=replace(body, '…', '...');

UPDATE smf_messages
SET body=replace(body, '“', '"');

UPDATE smf_messages
SET body=replace(body, '”', '"');

UPDATE smf_messages
SET subject=replace(subject, '’', ''');

UPDATE smf_messages
SET subject=replace(subject, '‘', ''');

UPDATE smf_messages
SET subject=replace(subject, 'â€"', '—');

UPDATE smf_messages
SET subject=replace(subject, 'â€"', '–');

UPDATE smf_messages
SET subject=replace(subject, '…', '...');

UPDATE smf_messages
SET subject=replace(subject, '“', '"');

UPDATE smf_messages
SET subject=replace(subject, '”', '"');

greyknight17

Thanks for posting the solution :)  I'm sure it will benefit others who run into a similar issue.

Topic marked as solved.

Advertisement: