Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: CMBurns on March 30, 2019, 03:31:57 PM

Title: [SOLVED]German Umlaut in Hyperlinks
Post by: CMBurns on March 30, 2019, 03:31:57 PM
I haven't found anything related to this question, and I have no idea how to set up SMF to get rid of the issue: for some time now it has been possible to register domains containing the special characters we call umlauts (äöü).

When a URL containing an umlaut is pasted into our SMF forum, the link is truncated right before the first umlaut. Is there any way to change that behavior?

http://www.fc-köln.de

Edit: As it isn't working here either, I assume the chances are not very high.
Title: Re: German Umlaut in Hyperlinks
Post by: CMBurns on March 30, 2019, 04:18:55 PM
Found it  :)

Subs.php

before

if (is_string($result = preg_replace(array(
'~(?<=[\s>\.(;\'"]|^)((?:http|https)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>\.(;\'"]|^)((?:ftp|ftps)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_]+)+(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'



after

if (is_string($result = preg_replace(array(
'~(?<=[\s>\.(;\'"]|^)((?:http|https)://[\w\-_%@:|äöü]+(?:\.[\w\-_%äöü]+)*(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>\.(;\'"]|^)((?:ftp|ftps)://[\w\-_%@:|äöü]+(?:\.[\w\-_%äöü]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
'~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_äöü]+)+(?::\d+)?(?:/[\w\-_\~%\.@!,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'
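
For anyone wondering why the link gets cut off: here is a minimal sketch in Python (not PHP, and not SMF's actual pattern) of the same effect. The two patterns are simplified stand-ins I made up for illustration; the assumption is that an ASCII-only \w stops matching at the first umlaut, and that adding the umlauts to each character class lets the match continue.

```python
import re

# Simplified host pattern, NOT SMF's real regex: scheme, then dot-separated labels.
# re.ASCII restricts \w to [A-Za-z0-9_], so an umlaut ends the match.
ASCII_ONLY = re.compile(r"https?://[\w\-]+(?:\.[\w\-]+)*", re.ASCII)

# Same pattern with the three lowercase umlauts added to each character class,
# mirroring the idea of the edit above.
WITH_UMLAUTS = re.compile(r"https?://[\w\-äöü]+(?:\.[\w\-äöü]+)*", re.ASCII)


def first_link(pattern, text):
    """Return the first URL-like match in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


text = "See http://www.fc-köln.de for details"
print(first_link(ASCII_ONLY, text))    # http://www.fc-k
print(first_link(WITH_UMLAUTS, text))  # http://www.fc-köln.de
```

Note that an explicit list like this only covers ä, ö and ü in the sketch; other characters (ß, é, Cyrillic and so on) would still end the match.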
Title: Re: [SOLVED]German Umlaut in Hyperlinks
Post by: GigaWatt on March 30, 2019, 06:20:15 PM
The browser should take care of that when you copy the URL from the address bar, converting non-standard Latin (or Cyrillic) characters into %XX, where each X is a hexadecimal digit. It's not the forum's job to take care of that.
Title: Re: [SOLVED]German Umlaut in Hyperlinks
Post by: Sesquipedalian on April 02, 2019, 10:33:54 AM
Actually, this is a known issue in SMF 2.0's automatic URL linking functionality, which was created before URLs with international characters were common. But SMF 2.1's auto-linker has full support for IRIs (i.e. URLs with international characters). So the best way to solve CMBurns' issue is to upgrade to SMF 2.1. On SMF 2.0, the workaround is to manually wrap the URL in [url] BBCode tags.
Title: Re: [SOLVED]German Umlaut in Hyperlinks
Post by: GigaWatt on April 03, 2019, 07:30:12 PM
OK... but, I don't get it. By automatic linking you're referring to the way every URL pasted in a post is automatically converted to a hyperlink, right? How come this has never happened to me or to members of my forum?

Hmmm... I see what you mean. I tried copy/pasting http://кто.рф/ here and no, the browser doesn't convert the Cyrillic characters to %XX. Hmmm... this could also be considered a bug in Pale Moon, since I know for sure that it used to convert Cyrillic characters in URLs, but only if the domain was a pure Latin one and the rest of the URL was in Cyrillic (example: http://something.tld/нешто/нешто-повеќе/уште-нешто). Yep, I can see that it doesn't convert this URL either... maybe it only works with copy/paste ???. Test: https://plusinfo.mk/%D0%B7%D0%B0%D0%B5%D0%B2-%D0%BA%D0%B0%D1%82%D0%B8%D1%86%D0%B0-%D0%B8-%D1%81%D1%98%D0%BE-%D1%98%D0%B0-%D0%B2%D1%80%D0%B0%D1%82%D0%B8%D1%98%D0%B0-%D0%BD%D0%B0%D0%B4%D0%B5%D0%B6%D1%82%D0%B0-%D0%B7%D0%B0/ ... yep, works with copy/paste but only if the TLD is in Latin ::).
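
To make the %XX form concrete, here is a small Python sketch using only the standard library. It takes the first part of the plusinfo.mk test URL above (shortened here for readability, which is my choice) and assumes, as is usual, that each %XX pair stands for one byte of UTF-8 text.

```python
from urllib.parse import quote, unquote

# First part of the plusinfo.mk test URL from the post above (shortened here).
encoded = "https://plusinfo.mk/%D0%B7%D0%B0%D0%B5%D0%B2-%D0%BA%D0%B0%D1%82%D0%B8%D1%86%D0%B0"

# unquote() reads each %XX pair as one byte and decodes the bytes as UTF-8.
decoded = unquote(encoded)
print(decoded)  # https://plusinfo.mk/заев-катица

# Going the other way: quote() percent-encodes the UTF-8 bytes of the path,
# leaving "/" and "-" untouched.
path = "/заев-катица"
print(quote(path))  # /%D0%B7%D0%B0%D0%B5%D0%B2-%D0%BA%D0%B0%D1%82%D0%B8%D1%86%D0%B0
```

Each Cyrillic letter takes two bytes in UTF-8, which is why every letter shows up as two %XX groups.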
Title: Re: [SOLVED]German Umlaut in Hyperlinks
Post by: GigaWatt on April 03, 2019, 07:50:19 PM
Posted on the Pale Moon forum about this as well ;).
Title: Re: [SOLVED]German Umlaut in Hyperlinks
Post by: GigaWatt on April 05, 2019, 05:50:49 PM
Update: One of the users on the Pale Moon forum told me that this is expected behavior: nonstandard Latin or Cyrillic characters in TLDs are not translated to %XX. It's more convenient to see the TLD as it is, as opposed to seeing it as a string of %XX characters (which would be similar to typing an IP address in the address bar), so that's why most browsers don't actually convert the TLDs to %XX (though this is allowed, just not commonly used) when you copy/paste a URL. But, yes, on the other hand, if the URL contains other nonstandard Latin or Cyrillic text (don't know if Arabic, Chinese, Japanese, etc. is allowed ???), everything except the TLD is converted to %XX.
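
A rough Python sketch of that split between the domain and the rest of the URL, using only the standard library. The helper name to_ascii_url is made up for this example. One assumption worth flagging: as far as I know, the ASCII form of an international domain name is normally its "xn--" (Punycode) form rather than %XX, so the sketch uses Python's built-in "idna" codec for the host and percent-encoding for the path only.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Hypothetical helper: IDNA-encode the host, percent-encode the path.

    Query and fragment are passed through unchanged and any user:password
    part is dropped, so this is a sketch, not a complete IRI-to-URI converter.
    """
    parts = urlsplit(url)
    # The built-in "idna" codec (IDNA 2003) turns each non-ASCII label
    # into its "xn--" form and leaves ASCII labels as they are.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # quote() writes the UTF-8 bytes of the path as %XX and keeps "/".
    path = quote(parts.path)
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_url("http://www.fc-köln.de"))
print(to_ascii_url("http://something.tld/нешто"))
```

The first call should give a host containing an "xn--" label, while the second leaves the Latin host alone and only turns the Cyrillic path into %XX groups.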