[SOLVED] German Umlaut in Hyperlinks

Started by CMBurns, March 30, 2019, 03:31:57 PM

CMBurns

I haven't found anything related to this question and I have no idea how to set up SMF to get rid of the issue: for some time now it has been possible to register domains containing some strange characters that we call umlauts (äöü).

When a URL containing an umlaut is pasted into our SMF forum, the link is truncated right before the first umlaut. Is there any chance to change that behavior?

http://www.fc-köln.de

Edit: As it isn't working here either, I assume the chances are not very high.

CMBurns

Found it  :)

Subs.php

before

if (is_string($result = preg_replace(array(
'~(?<=[\s>\.(;\'"]|^)((?:http|https)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>\.(;\'"]|^)((?:ftp|ftps)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_]+)+(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'



after

if (is_string($result = preg_replace(array(
'~(?<=[\s>\.(;\'"]|^)((?:http|https)://[\w\-_%@:|äöü]+(?:\.[\w\-_%äöü]+)*(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>\.(;\'"]|^)((?:ftp|ftps)://[\w\-_%@:|äöü]+(?:\.[\w\-_%äöü]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_äöü]+)+(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'
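
The fix above only covers ä, ö and ü. As a rough sketch of a more general approach (this is not the actual SMF code), the host part could be matched with Unicode property classes and the `u` modifier, so that any letter is accepted. The function name `find_urls` and the simplified path pattern are assumptions made purely for illustration.

```php
<?php
// Hypothetical helper, not part of SMF: finds http(s) URLs whose host
// may contain any Unicode letter, not just ä, ö and ü.
function find_urls(string $text): array
{
    // \p{L} = any letter, \p{N} = any digit. The "u" modifier makes PCRE
    // treat pattern and subject as UTF-8 instead of raw bytes.
    // The path part is deliberately simplified compared to SMF's patterns.
    $pattern = '~(?<=[\s>.(;\'"]|^)https?://[\p{L}\p{N}\-]+(?:\.[\p{L}\p{N}\-]+)*(?::\d+)?(?:/[^\s<>"]*)?~iu';

    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

print_r(find_urls('See http://www.fc-köln.de and http://кто.рф/ for details.'));
```

Both URLs in the sample sentence should be returned in full. Note that this simplified pattern would also swallow trailing punctuation after a path, which the original SMF patterns take care to avoid, so treat it as a starting point rather than a drop-in replacement.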

GigaWatt

The browser should take care of that when copying the URL from the address bar, converting non-standard Latin (or Cyrillic) characters into %XX, where each X is a hexadecimal digit. It's not the forum's job to take care of that.
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."

Sesquipedalian

Actually, this is a known issue in SMF 2.0's automatic URL linking functionality, which was created before URLs with international characters were common. But SMF 2.1's auto-linker has full support for IRIs (i.e. URLs with international characters). So the best way to solve CMBurns' issue is to upgrade to SMF 2.1. On SMF 2.0, the workaround is to manually wrap the URL in [url] BBCode tags.
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.
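
For the [url] workaround mentioned above, the wrapping itself is trivial, but a tiny sketch may help anyone who generates post text from a script. The helper name `bbc_url` is made up for this example; only the `[url]...[/url]` and `[url=...]label[/url]` tag forms are standard BBCode.

```php
<?php
// Hypothetical helper: wraps a URL in [url] BBCode so the auto-linker
// does not have to guess where the URL ends.
function bbc_url(string $url, string $label = ''): string
{
    return $label === ''
        ? '[url]' . $url . '[/url]'
        : '[url=' . $url . ']' . $label . '[/url]';
}

echo bbc_url('http://www.fc-köln.de'), "\n";
echo bbc_url('http://www.fc-köln.de', '1. FC Köln'), "\n";
```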

GigaWatt

OK... but, I don't get it. By automatic linking you're referring to the way every URL pasted in a post is automatically converted to a hyperlink, right? How come this has never happened to me or to members of my forum?

Hmmm... I see what you mean. I tried copy/pasting http://кто.рф/ here and no, the browser doesn't convert the Cyrillic characters to %XX. Hmmm... this could also be considered a bug in Pale Moon, since I know for sure that it used to convert Cyrillic characters in URLs, but only if the domain was a pure Latin one and the rest of the URL was in Cyrillic (example: http://something.tld/нешто/нешто-повеќе/уште-нешто). Yep, I can see that it doesn't convert this URL either... maybe it only works with copy/paste ???. Test: https://plusinfo.mk/%D0%B7%D0%B0%D0%B5%D0%B2-%D0%BA%D0%B0%D1%82%D0%B8%D1%86%D0%B0-%D0%B8-%D1%81%D1%98%D0%BE-%D1%98%D0%B0-%D0%B2%D1%80%D0%B0%D1%82%D0%B8%D1%98%D0%B0-%D0%BD%D0%B0%D0%B4%D0%B5%D0%B6%D1%82%D0%B0-%D0%B7%D0%B0/ ... yep, works with copy/paste, but only if the TLD is in Latin ::).
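
To make the %XX notation above a bit more concrete, here is a small sketch using PHP's built-in `rawurlencode()` and `rawurldecode()`. Each %XX is one byte of the UTF-8 encoding of a character, so one Cyrillic letter becomes two %XX groups. The sample strings are taken from the URLs in this post; nothing here is specific to SMF or Pale Moon.

```php
<?php
// Decode the first two words of the test URL's path back to Cyrillic.
$encoded = '%D0%B7%D0%B0%D0%B5%D0%B2-%D0%BA%D0%B0%D1%82%D0%B8%D1%86%D0%B0';
$decoded = rawurldecode($encoded);
echo $decoded, "\n";

// Encode one Cyrillic path segment the way a browser would for copy/paste.
$segment = 'нешто';
echo rawurlencode($segment), "\n";
```

The hyphen stays as it is because it is an unreserved character, which is why it is still readable in the encoded path.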
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."

GigaWatt

Posted on the Pale Moon forum about this as well ;).
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."

GigaWatt

Update: One of the users on the Pale Moon forum told me that this is expected behavior: nonstandard Latin or Cyrillic characters in TLDs are not translated to %XX. It's more convenient to see the TLD as it is, as opposed to seeing it as a string of %XX characters (which would be similar to typing an IP address in the address bar), so that's why most browsers don't actually convert the TLDs to %XX (though this is allowed, just not commonly used) when you copy/paste a URL. But, yes, on the other hand, if the URL contains other nonstandard Latin or Cyrillic text (don't know if Arabic, Chinese, Japanese, etc. is allowed ???), everything except the TLD is converted to %XX.
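
The split described above (host left alone, rest of the URL converted to %XX) could be sketched roughly as follows. As far as I know, the ASCII form normally used for international host names is Punycode (the `xn--` notation) rather than %XX, so the sketch uses `idn_to_ascii()` for the host, but only when PHP's intl extension happens to be installed. The function name `to_ascii_url` and the simple regex split are assumptions for illustration and do not reflect how any particular browser does it.

```php
<?php
// Hypothetical helper: converts host and the rest of a URL separately.
function to_ascii_url(string $url): string
{
    // Split into scheme, host and everything after the host.
    if (!preg_match('~^(https?://)([^/:?#]+)(.*)$~u', $url, $m)) {
        return $url; // not an http(s) URL (or invalid UTF-8): leave untouched
    }
    [, $scheme, $host, $rest] = $m;

    // Host: Punycode via the intl extension, if it is available.
    if (function_exists('idn_to_ascii')) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii !== false) {
            $host = $ascii;
        }
    }

    // Rest: percent-encode only the non-ASCII bytes, keep /, ?, = etc. as is.
    $rest = preg_replace_callback(
        '~[^\x00-\x7F]+~',
        function (array $run): string {
            return rawurlencode($run[0]);
        },
        $rest
    );

    return $scheme . $host . $rest;
}

echo to_ascii_url('http://something.tld/нешто'), "\n";
```

Without the intl extension the host is simply passed through unchanged, which is roughly the "TLD stays readable" behavior described above.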
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."
