Started by AncientDragonfly, April 02, 2010, 11:52:18 AM
Quote from: Arantor on May 03, 2010, 06:18:13 AM
It doesn't actually hurt to transfer everything as binary, really. The whole idea of transferring text files in text mode is outdated and really isn't an issue with modern editors. It only mattered in the olden days, when editors couldn't handle both Linux and Windows line endings (like Windows Notepad still can't, heh). If you're using a modern editor like Notepad++, it's a non-issue.
Quote from: Dismal Shadow on May 02, 2010, 06:26:11 PM
OK, I haven't backed up in a week, so I'm going to back up today, but I want to make sure the setting shown in this image is correct: http://www.simplemachines.org/community/index.php?topic=377117.msg2594267#msg2594267
I want to be sure it doesn't mess up the avatars and attachments.
Quote
ASCII can be transferred as binary without corruption.
Quote from: Arantor on May 03, 2010, 06:27:38 PM
Quote
ASCII can be transferred as binary without corruption.
Not entirely true; in fact, it depends on what you define as 'corruption'. If by 'without corruption' you mean the actual file, byte for byte, then it will be corrupted: the file will be silently converted for line endings. For example, grab the SMF install package and open any SMF file in Windows Notepad. Not Notepad++, not anything fancy, just normal Notepad. You get a mess where everything is on one line.
Quote
Now download the same file over FTP in ASCII mode and open it. Now it's not a mess. Why? Because FileZilla has silently converted the line endings on your behalf. If you have ANY doubt, use binary mode. I expect the files I send and receive to be the actual files, not versions modified in ANY way.
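To check which convention a file actually uses (and so whether an ASCII-mode transfer rewrote it), a small Python sketch can count each kind of line ending in the raw bytes. The function name `count_line_endings` is made up for this example.

```python
from typing import Dict


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count Windows (CRLF), Unix/Linux (bare LF) and old Mac (bare CR) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF contains one LF and one CR, so subtract them to get the bare ones.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }
```

Run it on the bytes of the same file before and after a download (e.g. `open(path, "rb").read()`); if the counts moved from `lf` to `crlf`, the client converted the file.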
Quote from: Arantor on May 03, 2010, 06:30:01 AM
Text mode == ASCII mode; they're one and the same. When I'm using either FileZilla or WinSCP, I have it set to treat *everything* as binary.
Quote from: Dismal Shadow on May 03, 2010, 06:39:07 PM
OK, I unchecked the option from the image and set it to binary mode instead of auto. I opened the files with TextWrangler and they seem fine. No lines messed up.
Quote
RFC 959, File Transfer Protocol
[snip]
3.4.3. COMPRESSED MODE
There are three kinds of information to be sent: regular data, sent in a byte string; compressed data, consisting of replications or filler; and control information, sent in a two-byte escape sequence. If n>0 bytes (up to 127) of regular data are sent, these n bytes are preceded by a byte with the left-most bit set to 0 and the right-most 7 bits containing the number n.
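The regular-data rule quoted above is simple enough to sketch. These Python helpers (hypothetical names) only frame regular-data runs; they do not implement the replication, filler or escape-sequence parts of compressed mode, so treat them as an illustration of the framing, not a working MODE C implementation.

```python
def encode_regular_blocks(data: bytes) -> bytes:
    """Frame data as compressed-mode regular-data runs of at most 127 bytes."""
    out = bytearray()
    for start in range(0, len(data), 127):
        chunk = data[start:start + 127]
        # Header byte: left-most bit 0, right-most 7 bits = n (1..127).
        out.append(len(chunk))
        out += chunk
    return bytes(out)


def decode_regular_blocks(encoded: bytes) -> bytes:
    """Inverse of encode_regular_blocks; rejects headers that aren't regular data."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        header = encoded[i]
        # Regular-data headers have the top bit clear and a non-zero count;
        # anything else (replication, filler, escape) is out of scope here.
        if header & 0x80 or header == 0:
            raise ValueError(f"unsupported block header {header:#04x}")
        end = i + 1 + header
        if end > len(encoded):
            raise ValueError("truncated regular-data block")
        out += encoded[i + 1:end]
        i = end
    return bytes(out)
```

For example, `encode_regular_blocks(b"abc")` is `b"\x03abc"`, and a 200-byte payload becomes a 127-byte run followed by a 73-byte run.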
Quote
RFC 1635 - How to Use Anonymous FTP
- You may set BINARY mode to transfer executable programs or files of data. Type "binary" to do so. Usually FTP programs assume files use only 7 bits per byte, the norm for standard ASCII-encoded files. The BINARY command allows you to transfer files that use the full 8 bits per byte without error, but this may have implications on how the file is transferred to your local system.
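Scripted transfers can follow the same advice. Python's standard `ftplib` module has `retrbinary()` and `storbinary()`, which send `TYPE I` (binary/image) before the transfer, while `retrlines()` and `storlines()` use `TYPE A` (ASCII). A minimal sketch; the helper names `download_binary` and `upload_binary` are hypothetical.

```python
from ftplib import FTP
from typing import BinaryIO


def download_binary(ftp: FTP, remote_name: str, out: BinaryIO) -> None:
    """Copy remote_name into out byte for byte."""
    # retrbinary() sends "TYPE I" before RETR, so the server does no
    # CRLF or charset translation; each chunk is written exactly as received.
    ftp.retrbinary(f"RETR {remote_name}", out.write)


def upload_binary(ftp: FTP, local: BinaryIO, remote_name: str) -> None:
    """Store the contents of local as remote_name without any translation."""
    # storbinary() also switches to "TYPE I" before issuing STOR.
    ftp.storbinary(f"STOR {remote_name}", local)
```

Against a real server you'd open the session with `FTP(host)` and call `login(user, passwd)` first, using whatever host and credentials your provider gives you, and pass files opened in `"rb"`/`"wb"` mode.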
Quote
FTP was developed at a time when typical modem speeds were 110 to 300 bits per second (as compared with 28,000 to 56,000 today). Since ASCII only used 7 bits, long files could be transmitted more quickly by not sending all the unused bits.
The big drawback is that if a file that uses all 8 bits in each byte is accidentally sent using ASCII transfer, it will lose 1/8 of its information content. In most files, even the loss of one bit is enough to make it invalid, and losing 1/8 makes them totally unreadable. So ASCII transfer can be fatal to a file's health!
With today's higher speeds, the time lost by sending all 8 bits of an ASCII file is practically unnoticeable. But FTP has incorporated features into ASCII transfer that make it useful for other reasons, so the two modes remain.
Quote
Binary vs ASCII
The internet was developed to transfer information in 7-bit packets. Computers store data in 8-bit bytes. Normal ASCII text data is stored using only 7 bits per byte (the 8th bit is always zero). (This kind of document is created using programs like notepad.exe.) As a result, it is fairly easy to transfer this type of data over the internet. This is one reason the html pages are always written in ASCII.
Other types of data (images, programs, MS Word documents, and the like) use all 8 bits to store data. As a result, special algorithms are required to transfer 8-bit data over a 7-bit interface.
When you are using FTP, you need to be aware of the type of data being transferred - ASCII or binary (7-bit or 8-bit). If you try to transfer 8-bit data in ASCII mode, data WILL be lost (the 8th bit will be set to zero).
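What the quoted page describes (the 8th bit "set to zero") amounts to masking every byte with `0x7F`. A one-line Python model with a hypothetical function name; note that later in this thread it's argued that modern clients don't actually do this.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Model a strict 7-bit channel: clear the most significant bit of every byte."""
    return bytes(b & 0x7F for b in data)
```

Plain ASCII passes through unchanged, but `strip_eighth_bit("é".encode("utf-8"))` gives `b"C)"`, and the first byte of a PNG file (`0x89`) turns into a tab (`0x09`).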
Quote
To: Sandeep Srivastava, ietf at ietf . org
Subject: Re: [RFC 959] FTP in ASCII mode
From: John C Klensin
Date: Tue, 21 Feb 2006 03:34:27 -0500
[snip]
> I knew that a FTP transfer in ASCII mode does EOL and EOF
> conversions based on the OS of the receiving system.
No, it doesn't. That was part of the point. It does no EOF conversions at all. The command and data channels were separated for several reasons, but the desire to stay out of the EOF business was an important one. And the server is required to convert whatever line-end convention it uses to CRLF, and any characters it uses to ASCII, and transmit that over the wire. If the client then converts from CRLF and ASCII to some local convention, that is its business, not that of the protocol. In other words, there are, at most, conversions to and from CRLF and ASCII. There are no FTP-specified conversions based on the properties of the receiving system.
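The CRLF-on-the-wire model Klensin describes can be sketched in two Python functions. This is a simplified model (hypothetical names, plain byte replacement, no charset conversion) that ignores what a real server does with stray CR bytes, but it's enough to show why binary data suffers.

```python
def to_network_ascii(local: bytes, local_eol: bytes) -> bytes:
    """Sending side: rewrite the host's line-end convention as CRLF for the wire."""
    return local.replace(local_eol, b"\r\n")


def from_network_ascii(wire: bytes, local_eol: bytes) -> bytes:
    """Receiving side: optionally map CRLF back to the local convention.

    Per the message above, this step is the client's choice, not something FTP specifies.
    """
    return wire.replace(b"\r\n", local_eol)
```

The 8-byte PNG signature `b"\x89PNG\r\n\x1a\n"` sent from a Unix server (`local_eol=b"\n"`) becomes `b"\x89PNG\r\r\n\x1a\r\n"` on the wire. A Windows client keeps those 10 bytes, so the file is damaged; in this naive model a Unix client happens to map it back to the original 8 bytes.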
Quote from: Arantor on May 05, 2010, 06:07:58 PM
Originally, yes, that was the case. But 959 has been superseded multiple times since 1985, meaning that under the current specifications it's not a strict 7-bit-only environment.
Quote
Abstract
The File Transfer Protocol, as defined in RFC 959 [RFC959] and RFC 1123 Section 4 [RFC1123], is one of the oldest and widely used protocols on the Internet. The protocol's primary character set, 7 bit ASCII, has served the protocol well through the early growth years of the Internet. However, as the Internet becomes more global, there is a need to support character sets beyond 7 bit ASCII. This document addresses the internationalization (I18n) of FTP, which includes supporting the multiple character sets and languages found throughout the Internet community. This is achieved by extending the FTP specification and giving recommendations for proper internationalization support.
Quote
1 Introduction
As the Internet grows throughout the world the requirement to support character sets outside of the ASCII [ASCII] / Latin-1 [ISO-8859] character set becomes ever more urgent. For FTP, because of the large installed base, it is paramount that this is done without breaking existing clients and servers. This document addresses this need. In doing so it defines a solution which will still allow the installed base to interoperate with new clients and servers.
This document enhances the capabilities of the File Transfer Protocol by removing the 7-bit restrictions on pathnames used in client commands and server responses, RECOMMENDs the use of a Universal Character Set (UCS) ISO/IEC 10646 [ISO-10646], RECOMMENDs a UCS transformation format (UTF) UTF-8 [UTF-8], and defines a new command for language negotiation. The recommendations made in this document are consistent with the recommendations expressed by the IETF policy related to character sets and languages as defined in RFC 2277 [RFC2277].
Quote
2 Internationalization
The File Transfer Protocol was developed when the predominate character sets were 7 bit ASCII and 8 bit EBCDIC. Today these character sets cannot support the wide range of characters needed by multinational systems. Given that there are a number of character sets in current use that provide more characters than 7-bit ASCII, it makes sense to decide on a convenient way to represent the union of those possibilities. To work globally either requires support of a number of character sets and to be able to convert between them, or the use of a single preferred character set. To assure global interoperability this document RECOMMENDS the latter approach and defines a single character set, in addition to NVT ASCII and EBCDIC, which is understandable by all systems. For FTP this character set SHALL be ISO/IEC 10646:1993. For support of global compatibility it is STRONGLY RECOMMENDED that clients and servers use UTF-8 encoding when exchanging pathnames. Clients and servers are, however, under no obligation to perform any conversion on the contents of a file for operations such as STOR or RETR.
The character set used to store files SHALL remain a local decision and MAY depend on the capability of local operating systems. Prior to the exchange of pathnames they SHOULD be converted into a ISO/IEC 10646 format and UTF-8 encoded. This approach, while allowing international exchange of pathnames, will still allow backward compatibility with older systems because the code set positions for ASCII characters are identical to the one byte sequence in UTF-8.
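As a rough illustration of the pathname recommendation, here's a Python sketch that assumes the server knows its local charset (Latin-1 is just an example, and `pathname_to_wire` is a hypothetical name). It also shows the backward-compatibility point: an all-ASCII name comes out byte-for-byte identical.

```python
def pathname_to_wire(raw_name: bytes, local_charset: str) -> bytes:
    """Re-encode a pathname from the host's local charset into UTF-8 for FTP commands."""
    # Only the pathname is converted; per the quoted text, file contents are left alone.
    return raw_name.decode(local_charset).encode("utf-8")
```

So `pathname_to_wire(b"caf\xe9.txt", "latin-1")` yields `b"caf\xc3\xa9.txt"`, while `b"Subs.php"` is unchanged.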
Quote
This document also uses notation defined in STD 9, RFC 959. In particular, the terms "reply", "user", "NVFS" (Network Virtual File System), "file", "pathname", "FTP commands", "DTP" (data transfer process), "user-FTP process", "user-PI" (user protocol interpreter), "user-DTP", "server-FTP process", "server-PI", "server-DTP", "mode", "type", "NVT" (Network Virtual Terminal), "control connection", "data connection", and "ASCII", are all used here as defined there.
Quote
ASCII
The ASCII character set is as defined in the ARPA-Internet Protocol Handbook. In FTP, ASCII characters are defined to be the lower half of an eight-bit code set (i.e., the most significant bit is zero).
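Under that definition, a file is "ASCII" only if no byte has its top bit set. A quick Python check (hypothetical names) makes it easy to test a file, such as a non-English language file, before deciding how to transfer it, e.g. `is_7bit_clean(open(path, "rb").read())`.

```python
from typing import List


def high_bit_offsets(data: bytes) -> List[int]:
    """Offsets of bytes outside FTP's ASCII range, i.e. with the most significant bit set."""
    return [i for i, b in enumerate(data) if b & 0x80]


def is_7bit_clean(data: bytes) -> bool:
    """True if every byte lies in the lower half of the 8-bit code set."""
    return not high_bit_offsets(data)
```

A UTF-8 encoded `"£5"` reports offsets `[0, 1]`, because both bytes of the `£` sequence have the top bit set.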
Quote from: Arantor on May 05, 2010, 06:07:58 PM
If it were, every SMF installation would be broken by FTP clients that treat PHP files as 'text', because there is a function in Subs.php that contains 8-bit characters. So would every language file that isn't English.
Quote from: Arantor on May 05, 2010, 08:37:13 PM
OK, from here on I'm going to throw out the rulebook and the specs and go purely on what I have seen and observed, because when it comes down to it, quoting spec after spec doesn't solve the issue; it's how things implement the spec that matters.
Quote
(Heck, if everyone observed the spec right, how come IE6 is such a pile of ****?)
Quote
Anyway. There are two VERY DIFFERENT issues here. One is 8-bit sanctity; the other is line-ending sanctity.
The former is essentially a non-issue for modern FTP servers and clients because, funnily enough, the spec is ignored and they send 8-bit files as is. In today's UTF-8 world, sending 8-bit files is pretty much a necessity, because UTF-8 is a full 8-bit text format.
The latter is still very much an issue because line-ending conversions do get done. THIS is what breaks attachments, not loss of the 8th bit.
How do I know this? I've seen attachments broken this way: I compared the damaged files with the originals and viewed them in more tolerant viewers. The files get broken part way through, not totally. If 8th-bit loss were the cause, on average about 50% of the bytes would be damaged, since each byte's top bit has roughly an even chance of being set. But that's not what happens.
From experience, FTP between two Linux servers hasn't damaged such files. That isn't to say it couldn't, but that has been my experience.
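Arantor's argument can be checked with a small simulation. The Python sketch below (hypothetical names, and a deliberately naive LF-to-CRLF rule rather than any particular client's behaviour) contrasts the two damage patterns: a line-ending rewrite leaves everything before the first `0x0A` byte intact and shifts everything after it, while stripping the 8th bit from random-looking binary data alters roughly half of all bytes.

```python
def lf_to_crlf_naive(data: bytes) -> bytes:
    """Naive ASCII-mode rewrite: every LF byte becomes CRLF, whatever the file type."""
    return data.replace(b"\n", b"\r\n")


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or -1 if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common prefix: differ only if one is longer.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def high_bit_fraction(data: bytes) -> float:
    """Fraction of bytes that clearing the 8th bit would alter."""
    return sum(1 for b in data if b & 0x80) / len(data) if data else 0.0
```

For a file whose first LF byte sits at offset 8, `first_difference(original, lf_to_crlf_naive(original))` returns 8: the first eight bytes survive and the damage starts part way through, which matches what Arantor describes.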