SMF Support > SMF 1.1.x Support

CR + LF file corruption after FTP transfer


GigaWatt:
I don't know exactly where to post this... it's not exactly related to SMF itself, but I was Googling around the past few weeks for a solution to this problem to no avail.

The problem started after the main admin decided (for some unknown reason ::)) to just... abandon the forum. And not just abandon it, he removed it from the hosting service (he dropped the top level domain and... you get the idea, nothing loads ::)). I was a mod, but I decided to keep the forum going, along with some of the other admins. Since none of us had access to the hosting, I asked one of them to (politely ::)) ask the main admin if we could have a copy of the script and the database. I knew the main admin personally... I was surprised when I actually received a copy of the database and the script.

Anyhow, I had to update the script and the database, it was running an older version of SMF (I think 1.1.16), blah blah blah... to cut a long story short, everything turned out well... except for the attachments. Something must have happened during the FTP transfer (the main admin wrote me that he did the backup of the script via FTP). After some file analysis and hex editing, binary compares and whatnot, I learned that this is called a "CR + LF file corruption". Lucky me ::).

I have no idea if he (the main admin) did this on purpose or not... in the end, it doesn't matter. What matters to me is... is there a way to fix the files?

Now, after some online searching, I found out that the most common type of corruption is just a simple "0D 0A" binary modification. For those who have no idea what these bytes represent, 0D is a carriage return (CR) and 0A is a line feed (LF). Unix/Linux ends a line (and starts a new one) with LF (0A), classic Macs use CR (0D), and Windows uses both, CR + LF (0D 0A). Most FTP programs, when set to transfer files in ASCII mode, just make a simple binary change (change 0D to 0A or vice versa; in some cases add 00 before or after the line end), but they don't "eat" bytes. In this particular case, whole bytes were missing... which in turn means that the file downloaded via FTP was smaller than the original file.
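To make the byte loss concrete, here's a minimal Python sketch (my own illustration, not anything from the forum's files) of what an ASCII-mode transfer toward a Unix target does: every CR + LF pair is collapsed to a bare LF, so the file shrinks by one byte per pair.

```python
# Sketch: ASCII-mode FTP toward a Unix target collapses CRLF -> LF.
original = bytes([0x00, 0xFF, 0x00, 0xFF, 0x0D, 0x0A, 0x00, 0xFF, 0x00, 0xFF])
corrupted = original.replace(b"\r\n", b"\n")

print(original.hex(" "))   # 00 ff 00 ff 0d 0a 00 ff 00 ff
print(corrupted.hex(" "))  # 00 ff 00 ff 0a 00 ff 00 ff
print(len(original) - len(corrupted))  # 1 byte "eaten" per CRLF pair
```

For a binary file (image, pdf, zip), those 0D 0A pairs are arbitrary data, not line endings, so the conversion is pure destruction.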

Here's a screen shot of a binary compare of an original (left) and a corrupted (right) file.



Basically, you have to add bytes to the file wherever a certain criterion (a binary string) appears in the file.

Now, I searched and searched... I didn't find any program that can do this. I can do it manually for some of the attachments, but I can't do it for all of them (around 7000 attachments).

Basically, what I'm looking for is a program that can do this in bulk. Give it a bunch of files, set the binary criteria, patch the files.

Or... if someone has a better idea, I'm open to suggestions. In case you're thinking of suggesting another backup of the forum's script and files... I already tried to push the main admin for that... it's a no go. He's a coder... and an admin, so... you know... he's god ::)... he knows best ::)... he thinks I'm not doing something right and either won't make another backup or... has already deleted the forum from the server (I think this is the more probable scenario). Either way, I can't get another copy of the files.

Sir Osis of Liver:
If you used FileZilla to download/upload attachments and the transfer type was not set to binary, the files were transferred as ASCII and are corrupt. Several schemes for fixing the files have been posted, but none of the ones I tried ever worked. I don't know of any software that will do what you're suggesting, possibly someone else does; otherwise you would have to write it yourself.

Aleksi "Lex" Kilpinen:
Are the originals corrupt for sure? Will they not work at all if you give them a proper file type on your local hard drive? The attachments folder(s) should be transferred as binary in both directions to avoid this problem.
If they are already corrupt, I haven't heard of a good way to save them.

GigaWatt:

--- Quote from: Sir Osis of Liver on February 26, 2018, 09:37:55 PM ---If you used FileZilla to download/upload attachments, and transfer type was not set to binary, the files were transferred as ascii and are corrupt.
--- End quote ---

I have no idea which software was used. I wasn't the one who made the backup.

But there was a MACOSX folder inside the archive (the backup of the script)... I guess it could have been done on a Mac.


--- Quote from: Aleksi "Lex" Kilpinen on February 26, 2018, 11:42:36 PM ---Are the originals corrupt for sure, will they not work at all if you give them a proper filetype on your local harddrive?
--- End quote ---

No. Actually, that's how I found the corrupted file in the example: I looked up the hash in the database, searched for that string among the attachments, added the proper extension (pdf in this example) and compared it to the "original" (I had a local copy of the file).

And yes, all of them are corrupt. I have local copies of some of the files... every attachment I compared to its "original" was corrupt. I have no idea if literally all of them are, but at least those whose originals contain the "0D 0A" byte sequence are corrupt for sure.


--- Quote from: Aleksi "Lex" Kilpinen on February 26, 2018, 11:42:36 PM ---The attachments folder(s) should be transferred as binary both ways to avoid this problem.
--- End quote ---

That's what I thought ::). As it turns out, the default setting for most FTP programs is to transfer files in "Auto" mode in both directions. In that mode, they look at the file extension and decide whether to transfer the file with converted line endings (CR, LF or CR + LF) or to transfer it untouched. And get this... some of them assume that a file with no extension (like the attachments) is a text file ??? :o. I assume this is how they got corrupted.


--- Quote from: Aleksi "Lex" Kilpinen on February 26, 2018, 11:42:36 PM ---If they are corrupt already, I haven't heard of a good way to save them.
--- End quote ---

I thought that might be the case...

I'm not a coder by occupation, I work mostly with low-level stuff (microcontrollers, electronics and whatnot), but I guess I can code something like this... it shouldn't be too hard. Except for one thing :S.

Let's say you have a binary string somewhere and let's say this binary string contains the problematic string "0D 0A".

... 00 FF 00 FF 0D 0A 00 FF 00 FF ...

Now, here is what the FTP program did.

... 00 FF 00 FF 0A 00 FF 00 FF ...

Compare the upper sequence with the lower one. It's clear that 0D is missing. The FTP program literally "ate" a byte each time it encountered the string 0D 0A.

This would be easy to fix if we knew for certain that each time we encounter 0A in the corrupted file, we have to add 0D in front of it and shift all of the following bytes "to the right"... but it's not that simple.
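That "certain case" fix is a one-liner in Python. A minimal sketch (my own, and deliberately naive, it assumes every 0A in the corrupted file was once a CR + LF pair):

```python
def naive_repair(data: bytes) -> bytes:
    # Naive rule: treat every 0A as a collapsed CR+LF and reinsert
    # the missing 0D in front of it. This is exactly the assumption
    # that can be wrong for files that contained lone 0A bytes.
    return data.replace(b"\n", b"\r\n")

corrupted = bytes([0x00, 0xFF, 0x00, 0xFF, 0x0A, 0x00, 0xFF, 0x00, 0xFF])
print(naive_repair(corrupted).hex(" "))  # 00 ff 00 ff 0d 0a 00 ff 00 ff
```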

Let's say we encounter the following string in the corrupted file.

... 00 FF 00 FF 0A 00 FF 00 FF 0A 00 FF 00 FF ...

We would assume that the original should be like this.

... 00 FF 00 FF 0D 0A 00 FF 00 FF 0D 0A 00 FF 00 FF ...

But what if this string in the original file was actually like this.

... 00 FF 00 FF 0A 00 FF 00 FF 0D 0A 00 FF 00 FF ...

We would add an extra 0D and, once more, the file would be corrupt :S.

Based on the assumption that these are reserved characters and that most of the time they're used in combination, I guess my question is: what are the chances of being wrong and adding an extra 0D byte where there wasn't one in the original ???. I would code something like this, but only if I knew I could save a large portion of the attachments (let's say 20% and above). If more than 90% would still be corrupt, I wouldn't bother.
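One way to bound that risk instead of guessing: the forum database records a byte size for each attachment (you mentioned checking the hash there, and SMF keeps a size per attachment as well, though the exact column and lookup are an assumption on my part). The naive fix can then be accepted only when the repaired file's length matches the recorded size, which gives an exact count of how many files the simple rule saves. A sketch:

```python
def repair_if_size_matches(data: bytes, recorded_size: int):
    # Apply the naive CR-reinsertion, but accept the result only if
    # its length matches the size the database recorded for this
    # attachment. A matching length is strong evidence (not proof)
    # that every 0A really was a collapsed CR+LF.
    repaired = data.replace(b"\n", b"\r\n")
    if len(repaired) == recorded_size:
        return repaired   # every 0A accounted for as a lost CR
    return None           # some 0A were probably lone LFs: skip

# A file that lost exactly one CR repairs cleanly; a mismatched
# recorded size flags the file as ambiguous instead of corrupting
# it further.
print(repair_if_size_matches(b"\x00\xff\n\x00\xff", 6))  # b'\x00\xff\r\n\x00\xff'
print(repair_if_size_matches(b"\x00\xff\n\x00\xff", 5))  # None
```

Running this over all ~7000 attachments (skipping the ambiguous ones) would answer your 20% question empirically before you commit to anything.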

Aleksi "Lex" Kilpinen:
I'm not sure of this, but on a hunch I'd say you would be playing a lottery - it might work for some files by chance, might not for others. Because we can't know what the files actually contained originally, reconstructing them is not really easy - even if your assumption sounds logical.
