[2.1 RC3] "Automatically link posted URLs" setting does not work

Started by davidhs, February 23, 2021, 03:13:26 PM


davidhs

The "Automatically link posted URLs" setting (Administration Center > Configuration > Features and Options > Bulletin Board Code) does not work in SMF 2.1 RC3, nor in earlier releases back to 2.1 Beta 3. It works in 2.1 Beta 2.

In 2.1 Beta 3, the regular expression for URLs without BBC was changed, so I suppose that regular expression is the problem.

In 2.1 RC3 the file is:
Code (Sources\Subs.php (line 2340))
<?php

if (!empty($modSettings['autoLinkUrls']))
{
// Are we inside tags that should be auto linked?
$no_autolink_area = false;
if (!empty($open_tags))
{
foreach ($open_tags as $open_tag)
if (in_array($open_tag['tag'], $no_autolink_tags))
$no_autolink_area = true;
}

// Don't go backwards.
// @todo Don't think is the real solution....
$lastAutoPos = isset($lastAutoPos) ? $lastAutoPos : 0;
if ($pos < $lastAutoPos)
$no_autolink_area = true;
$lastAutoPos = $pos;

if (!$no_autolink_area)
{
// An &nbsp; right after a URL can break the autolinker
if (strpos($data, '&nbsp;') !== false)
{
$placeholders['<placeholder non-breaking-space>'] = '&nbsp;';
$data = strtr($data, array('&nbsp;' => '<placeholder non-breaking-space>'));
}

// Parse any URLs
if (!isset($disabled['url']) && strpos($data, '[url') === false)
{
// For efficiency, first define the TLD regex in a PCRE subroutine
$url_regex = '(?(DEFINE)(?<tlds>' . $modSettings['tld_regex'] . '))';

// Now build the rest of the regex
$url_regex .=
// 1. IRI scheme and domain components
'(?:' .
// 1a. IRIs with a scheme, or at least an opening "//"
'(?:' .

// URI scheme (or lack thereof for schemeless URLs)
'(?:' .
// URL scheme and colon
'\b[a-z][\w\-]+:' .
// or
'|' .
// A boundary followed by two slashes for schemeless URLs
'(?<=^|\W)(?=//)' .
')' .

// IRI "authority" chunk
'(?:' .
// 2 slashes for IRIs with an "authority"
'//' .
// then a domain name
'(?:' .
// Either the reserved "localhost" domain name
'localhost' .
// or
'|' .
// a run of IRI characters, a dot, and a TLD
'[\p{L}\p{M}\p{N}\-.:@]+\.(?P>tlds)' .
')' .
// followed by a non-domain character or end of line
'(?=[^\p{L}\p{N}\-.]|$)' .

// or, if no "authority" per se (e.g. "mailto:" URLs)...
'|' .

// a run of IRI characters
'[\p{L}\p{N}][\p{L}\p{M}\p{N}\-.:@]+[\p{L}\p{M}\p{N}]' .
// and then a dot and a closing IRI label
'\.[\p{L}\p{M}\p{N}\-]+' .
')' .
')' .

// Or
'|' .

// 1b. Naked domains (e.g. "example.com" in "Go to example.com for an example.")
'(?:' .
// Preceded by start of line or a non-domain character
'(?<=^|[^\p{L}\p{M}\p{N}\-:@])' .
// A run of Unicode domain name characters (excluding [:@])
'[\p{L}\p{N}][\p{L}\p{M}\p{N}\-.]+[\p{L}\p{M}\p{N}]' .
// and then a dot and a valid TLD
'\.(?P>tlds)' .
// Followed by either:
'(?=' .
// end of line or a non-domain character (excluding [.:@])
'$|[^\p{L}\p{N}\-]' .
// or
'|' .
// a dot followed by end of line or a non-domain character (excluding [.:@])
'\.(?=$|[^\p{L}\p{N}\-])' .
')' .
')' .
')' .

// 2. IRI path, query, and fragment components (if present)
'(?:' .

// If any of these parts exist, must start with a single "/"
'/' .

// And then optionally:
'(?:' .
// One or more of:
'(?:' .
// a run of non-space, non-()<>
'[^\s()<>]+' .
// or
'|' .
// balanced parentheses, up to 2 levels
'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' .
')+' .
// Ending with:
'(?:' .
// balanced parentheses, up to 2 levels
'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' .
// or
'|' .
// not a space or one of these punctuation characters
'[^\s`!()\[\]{};:\'".,<>?«»“”‘’/]' .
// or
'|' .
// a trailing slash (but not two in a row)
'(?<!/)/' .
')' .
')?' .
')?';

$data = preg_replace_callback('~' . $url_regex . '~i' . ($context['utf8'] ? 'u' : ''), function($matches)
{
$url = array_shift($matches);

// If this isn't a clean URL, bail out
if ($url != sanitize_iri($url))
return $url;

$scheme = parse_url($url, PHP_URL_SCHEME);

if ($scheme == 'mailto')
{
$email_address = str_replace('mailto:', '', $url);
if (!isset($disabled['email']) && filter_var($email_address, FILTER_VALIDATE_EMAIL) !== false)
return '[email=' . $email_address . ']' . $url . '[/email]';
else
return $url;
}

// Are we linking a schemeless URL or naked domain name (e.g. "example.com")?
if (empty($scheme))
$fullUrl = '//' . ltrim($url, ':/');
else
$fullUrl = $url;

// Make sure that $fullUrl really is valid
if (validate_iri((strpos($fullUrl, '//') === 0 ? 'http:' : '') . $fullUrl) === false)
return $url;

return '[url=&quot;' . str_replace(array('[', ']'), array('&#91;', '&#93;'), $fullUrl) . '&quot;]' . $url . '[/url]';
}, $data);
}

?>

(line 2463 has UTF-8 characters)

For example, if I write this:
http://www.simplemachines.org1/
https://www.simplemachines.org2/
ftp://www.simplemachines.org3/
ftps://www.simplemachines.org4/
[url]http://www.simplemachines.org5/[/url]
[url=http://www.simplemachines.org6/]Home of SMF6[/url]


I see this:
Quote from: 2.1 Beta 2 and earlier
http://www.simplemachines.org1/
https://www.simplemachines.org2/
ftp://www.simplemachines.org3/
ftps://www.simplemachines.org4/
http://www.simplemachines.org5/
Home of SMF6

Quote from: 2.1 RC3
http://www.simplemachines.org1/ -- without link
https://www.simplemachines.org2/ -- without link
ftp://www.simplemachines.org3/ -- without link
ftps://www.simplemachines.org4/ -- without link
http://www.simplemachines.org5/
Home of SMF6

shawnb61

Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

shawnb61

Two layers to the issue, I think...

One is it doesn't appear to like invalid domain extensions.  E.g., .org1 is not valid.  That is likely a good check - it's not a valid URL.

The other is that it can sometimes get confused when there are encoded special characters at the very end, as in the original bug report.  That can fail on valid URLs.
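
The first layer can be reproduced in isolation. The following is a minimal sketch, not SMF's actual code: it keeps only the scheme-plus-domain part of the regex quoted earlier in the thread and substitutes a hypothetical three-entry TLD list (`$tld_regex`) for `$modSettings['tld_regex']`. The helper name `is_autolinkable` is made up for illustration.

```php
<?php
// Hypothetical, tiny stand-in for $modSettings['tld_regex']; the real list is much longer.
$tld_regex = 'com|net|org';

// Simplified form of part 1a of the regex: scheme, "//", a run of domain
// characters, a dot, a valid TLD, then a lookahead that requires a
// non-domain character or end of line.
$pattern = '~(?(DEFINE)(?<tlds>' . $tld_regex . '))'
	. '\b[a-z][\w\-]+://[\p{L}\p{M}\p{N}\-.:@]+\.(?P>tlds)(?=[^\p{L}\p{N}\-.]|$)~iu';

// Returns true when the text contains something this simplified pattern would link.
function is_autolinkable(string $text, string $pattern): bool
{
	return preg_match($pattern, $text) === 1;
}

var_dump(is_autolinkable('http://www.simplemachines.org1/', $pattern)); // bool(false)
var_dump(is_autolinkable('http://www.simplemachines1.org/', $pattern)); // bool(true)
```

In this sketch `.org1` fails because the lookahead after the TLD rejects the trailing digit, which matches the behaviour reported above. The same host with the digit moved before `.org` passes.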

davidhs

Quote from: shawnb61 on February 23, 2021, 03:21:34 PM
Thanks for the report.

Yes, known issue, it's a dupe of:
https://www.simplemachines.org/community/index.php?topic=576638

And it is up on GitHub as:
https://github.com/SimpleMachines/SMF2.1/issues/6497

:o I am sorry, I searched before posting but did not find it.




Quote from: shawnb61 on February 23, 2021, 03:32:17 PM
Two layers to the issue, I think...

One is it doesn't appear to like invalid domain extensions.  E.g., .org1 is not valid.  That is likely a good check - it's not a valid URL.

The other is that it can sometimes get confused when there are encoded special characters at the very end, as in the original bug report.  That can fail on valid URLs.
:o Again...  :-[

I use these invalid URLs (really, invalid domains) in a test for one of my mods (the mod searches for URLs in a post and writes them all at the beginning or at the end of the post). Until now this worked, so I did not think the problem was the validity of the URL.

I have now tested with:
http://www.simplemachines.org/
https://www.simplemachines.org/
ftp://www.simplemachines.org/
ftps://www.simplemachines.org/
http://www.simplemachines1.org/
https://www.simplemachines2.org/
ftp://www.simplemachines3.org/
ftps://www.simplemachines4.org/

and it works:
Quote
http://www.simplemachines.org/
https://www.simplemachines.org/
ftp://www.simplemachines.org/
ftps://www.simplemachines.org/
http://www.simplemachines1.org/
https://www.simplemachines2.org/
ftp://www.simplemachines3.org/
ftps://www.simplemachines4.org/
(I must change my test to use valid domains!)

But... there is an inconsistency: if I use the url/iurl BBC, it always works (with valid domains and with invalid domains).

shawnb61

BBCs in general don't do edits on content...   You can do all kinds of illogical things with BBC.

But if someone put it there on purpose, SMF shouldn't second-guess them. 

OTOH, the auto-link is put there by SMF. 
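
The distinction can be illustrated with a rough sketch. This is not how SMF's parser is structured; `link_explicit`, `link_auto`, and the shortened TLD list `SKETCH_TLDS` are all hypothetical names used only to contrast the two policies.

```php
<?php
// Hypothetical, shortened TLD list standing in for $modSettings['tld_regex'].
const SKETCH_TLDS = 'com|net|org';

// Explicit BBC: the author asked for a link, so the target is used as written.
function link_explicit(string $target, string $label): string
{
	return '<a href="' . htmlspecialchars($target) . '">' . htmlspecialchars($label) . '</a>';
}

// Auto-link: the software makes the decision, so it links only when the
// host ends in a known TLD and otherwise leaves the text untouched.
function link_auto(string $text): string
{
	$host = parse_url($text, PHP_URL_HOST);
	if (!is_string($host) || !preg_match('~\.(?:' . SKETCH_TLDS . ')$~i', $host))
		return $text;

	return link_explicit($text, $text);
}

echo link_explicit('http://www.simplemachines.org6/', 'Home of SMF6'), "\n"; // linked as written
echo link_auto('http://www.simplemachines.org1/'), "\n"; // stays plain text
echo link_auto('http://www.simplemachines1.org/'), "\n"; // becomes a link
```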
