Automatic Link Posted URLs vs URL BBCode Tag

Started by Paracelsus, April 26, 2009, 10:44:47 AM

Paracelsus

Hi,

The search wasn't helpful, so I'm asking here. This question about URLs came up on my forum:

"Automatic Link Posted URLs" doesn't behave the same way as the URL tag (BBCode). Here's an example:

With "Automatic Link Posted URLs" (which is enabled on this forum):
http://pt.wikipedia.org/wiki/Namíbia

With url bbcode tag:
http://pt.wikipedia.org/wiki/Namíbia


As you can see, the second link works fine, but the first one doesn't because of the character "í" in the link. What I would like to know is whether there's a coding tweak for "Automatic Link Posted URLs" so that these kinds of links behave the same way as they do with the BBCode url tag.

karlbenson

It's because the í is meant to be URL-encoded.
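
To illustrate what URL encoding means here, below is a minimal Python sketch (not SMF code, which is PHP). It assumes the path is encoded as UTF-8 before percent-encoding, which is what Python's standard `urllib.parse.quote` does by default; the variable names are only for illustration.

```python
from urllib.parse import quote

# The path portion of the link from the first post.
raw_path = "/wiki/Namíbia"

# quote() percent-encodes non-ASCII characters as their UTF-8 bytes
# and leaves "/" alone by default (safe="/").
encoded_path = quote(raw_path)

# Assumption for this sketch: scheme and host are already plain ASCII.
encoded_url = "http://pt.wikipedia.org" + encoded_path
print(encoded_url)  # http://pt.wikipedia.org/wiki/Nam%C3%ADbia
```

The "í" becomes the two bytes C3 AD in UTF-8, hence `%C3%AD` in the encoded link.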

Paracelsus

Quote from: regularexpression on April 26, 2009, 10:50:52 AM
It's because the í is meant to be URL-encoded.

OK, but is there a way to apply this encoding to automatically linked URLs?

karlbenson

Currently there isn't a way to change it.

We have a bug report on our tracker, but it has been set for SMF 2.1, not 2.0, because it's going to require a lot of work.
http://dev.simplemachines.org/mantis/view.php?id=2449

Technically, Wikipedia is in the wrong for using it in URLs without encoding it, according to the URL specifications.


MrPhil

Quote from: regularexpression on April 26, 2009, 06:25:49 PM
Technically, Wikipedia is in the wrong for using it in URLs without encoding it, according to the URL specifications.

Interesting. So, URIs should be ASCII only, with anything else being Latin-1 encoded, and certain reserved characters (space, /, $, etc.) within ASCII must also be encoded. This link lists what's going on. Apparently anything other than letters, digits, and a few punctuation characters needs to be encoded (%nn) in Latin-1.
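
As a side note on the choice of character set, here is a small Python sketch comparing the two encodings for the "í" from the original link. It only shows what the standard library's `urllib.parse.quote` produces for each encoding; which one a given server expects is an assumption you would have to check (the Wikipedia link above appears to use the UTF-8 form).

```python
from urllib.parse import quote

char = "í"

# Latin-1 (ISO-8859-1) stores "í" as the single byte 0xED.
as_latin1 = quote(char, encoding="latin-1")

# UTF-8 stores the same character as the two bytes 0xC3 0xAD.
as_utf8 = quote(char, encoding="utf-8")

print(as_latin1)  # %ED
print(as_utf8)    # %C3%AD

# Characters outside Western European ranges cannot be Latin-1 encoded.
try:
    quote("日", encoding="latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```

So the same character can end up as two different %nn sequences depending on the encoding, and Latin-1 cannot represent most non-Western text at all.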

Where does this leave people who want to encode arbitrary text in a character set that's not Western European (Latin-1/ISO-8859-1)? Is there an officially supported way to put arbitrary Unicode (or other character sets) into a URI (including URL Query Strings)? What is the current status of non-Latin text in domain names and directories, as well as Query Strings? When googling for information, I see entries with URIs that are apparently Unicode, but I'm not sure how official it is.
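
For what it's worth, here is a rough Python sketch of one commonly seen approach, offered as an illustration and not as a statement of what is official: the host name goes through Python's built-in "idna" codec (Punycode), while the path and query string are percent-encoded as UTF-8. The host `münchen.example` and the query parameter `q` are made up for the example.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical non-ASCII host; the "idna" codec converts each label
# to its ASCII-compatible "xn--" form.
host = "münchen.example".encode("idna").decode("ascii")

# Path and query string: percent-encode the UTF-8 bytes.
path = quote("/wiki/Намибия")
query = urlencode({"q": "ナミビア"})

url = f"http://{host}{path}?{query}"
print(url)

# Decoding the path as UTF-8 recovers the original text.
print(unquote(path))
```

The resulting URL contains only ASCII characters, yet it round-trips back to the original Cyrillic and Japanese text when decoded as UTF-8.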
