[Change] I hope SMF all developer and other language provide change to UTF8 defa

explorer1979:
Hi all,

I am Chinese, and I have noticed something about SMF. The default language is of course English, which is an international language. But there are many non-English-speaking regions, such as Hong Kong SAR, China, Taiwan and other countries all over the world, and looking ahead, in the near future all business will be connected through the internet ...

UTF8 is the new standard that solves all of these language problems ... if I am wrong, please correct me  :)

So I am now using 1.0.5 plus the T.Chinese UTF-8 language package. (Actually, I wanted to use the S.Chinese language package, but it is only available for version 1.0.2, and it caused problems after I installed it on a 1.0.5 forum, so I chose the T.Chinese UTF-8 version.)

I just discovered that if a user switches to the English language interface, it does not use UTF8 by default, so all the Chinese words turn into something like #@$#RRETFA%R#$R etc ....
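To illustrate the symptom described above, here is a minimal Python sketch of what probably happens: the posts are stored as UTF-8 bytes, but the English interface declares a single-byte charset, so every byte is shown as its own character. The sample text and the choice of ISO-8859-1 as the declared charset are assumptions for illustration, not something confirmed for this forum.

```python
# Hypothetical sample text; any non-ASCII string behaves the same way.
text = "中文"

# The post as stored: UTF-8 uses three bytes for each of these characters.
utf8_bytes = text.encode("utf-8")

# Assumption: the page is declared as ISO-8859-1, so the browser reads
# each byte as one separate character and the text looks like garbage.
garbled = utf8_bytes.decode("iso-8859-1")

# Reading the same bytes with the correct charset restores the text.
restored = utf8_bytes.decode("utf-8")

print(len(text), len(utf8_bytes), len(garbled))
print(restored == text)
```

Nothing is lost in storage in this scenario; only the declared charset is wrong, which is why switching the interface language is enough to garble or restore the text.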

So I am wondering ... why don't the SMF developers make the English version use UTF8 by default? It would solve problems for people like me who are not English speakers and use a UTF8 language package ...

And if the providers of all the other language packages also helped by making UTF8 versions of their packages ... it would be wonderful and might fix many problems ...

Just my hope and suggestion...

And can someone teach me how to change my English forum entirely to UTF8??

CrayZ:
I think I need an explanation of this issue too. I'm having the same problem (I think).

Elmacik:
Correct me if I am wrong, but UTF-8 is not a new standard and it's the oldest :P

AzaToth:

--- Quote from: Elmacik on November 03, 2005, 10:43:08 AM ---Correct me if I am wrong, but UTF-8 is not a new standard and it's the oldest :P

--- End quote ---
UTF-8 is the optimal encoding to use for the web.
This is because of these properties (from man utf-8):

* UCS characters 0x00000000 to 0x0000007f (the classic US-ASCII characters) are encoded simply as bytes 0x00 to 0x7f (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
* All UCS characters > 0x7f are encoded as a multi-byte sequence consisting only of bytes in the range 0x80 to 0xfd, so no ASCII byte can appear as part of another character and there are no problems with e.g. '\0' or '/'.
* The lexicographic sorting order of UCS-4 strings is preserved.
* All possible 2^31 UCS codes can be encoded using UTF-8.
* The bytes 0xfe and 0xff are never used in the UTF-8 encoding.
* The first byte of a multi-byte sequence which represents a single non-ASCII UCS character is always in the range 0xc0 to 0xfd and indicates how long this multi-byte sequence is. All further bytes in a multi-byte sequence are in the range 0x80 to 0xbf. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes.
* UTF-8 encoded UCS characters may be up to six bytes long, however the Unicode standard specifies no characters above 0x10ffff, so Unicode characters can only be up to four bytes long in UTF-8.
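
Several of the properties listed above can be checked directly. The following is a small Python sketch, not part of the man page; the helper name `utf8_sequence_length` and the sample characters are made up for illustration, and the helper only handles the one- to four-byte forms that Unicode actually uses.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte (1 to 4 bytes)."""
    if lead_byte < 0x80:
        return 1  # plain ASCII byte
    if 0xC0 <= lead_byte < 0xE0:
        return 2
    if 0xE0 <= lead_byte < 0xF0:
        return 3
    if 0xF0 <= lead_byte < 0xF8:
        return 4
    # 0x80-0xBF are continuation bytes; 0xF8 and above are not used by Unicode.
    raise ValueError(f"not a supported lead byte: {lead_byte:#x}")


# Hypothetical samples needing 1, 2, 3 and 4 bytes respectively.
samples = ["A", "é", "中", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    # The first byte tells how long the whole sequence is.
    assert utf8_sequence_length(encoded[0]) == len(encoded)
    # All further bytes are in the range 0x80 to 0xbf.
    assert all(0x80 <= b <= 0xBF for b in encoded[1:])
    # The bytes 0xfe and 0xff never appear.
    assert 0xFE not in encoded and 0xFF not in encoded

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
assert "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")

# Sorting by code point and sorting the encoded bytes give the same order.
words = ["z", "é", "中", "😀", "a"]
by_bytes = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]
assert sorted(words) == by_bytes
```

Because a lead byte can never be mistaken for a continuation byte, a decoder that lands in the middle of a damaged stream can skip forward to the next lead byte, which is the resynchronization property mentioned in the list.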

Elmacik:
AzaToth, I didn't object to its being optimal. I know it ;)
I just said it's not new.
