Replacing regex with preg_replace

Started by spiros, December 06, 2020, 08:37:29 AM


spiros

I am trying to change the Titled Links mod (see request here: https://www.simplemachines.org/community/index.php?topic=575738.0) so that internal links (same forum or domain) get iurl tags instead of url tags.

So I added this:

$title = preg_replace('(\[url=https:\/\/www.translatum\.gr)(.*?)(\[\/url\])', '[iurl=https://www.translatum.gr$2[/iurl]', $title);

After this line:
$title = str_replace('&amp;', '&', $title);

But the output was something like:
[iurl=https://www.translatum.gr/forum/index.php?topic=988118.0][iurl]

// Time to do some cool stuff with URLs
if (!$previewing && $modSettings['convert_urls'])
{
	// We're gonna save the old socket timeout, reduce it down to 3 seconds, so we prevent this from hanging up
	// We are suppressing the errors in case the user cannot change this value on their server.
	$timeout = ini_get('default_socket_timeout');
	@ini_set('default_socket_timeout', 3);

	// Wrap bare URLs in a temporary [iurl]...[/iurl%] placeholder
	$message = preg_replace(array('~(?<=[\s>\.(;\'"]|^)((?:https)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#+:\'\\\\]*|[\(\{][\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i', '~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_]+)+(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#+:\'\\\\]*|[\(\{][\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'), array('[iurl]$1[/iurl%]', '[iurl]$1[/iurl%]'), $message);

	// Now we find the urls that we just changed, so we can run through them and get titles
	preg_match_all("~\[iurl\](.+?)\[/iurl%\]~smi", $message, $urls);

	if (!empty($urls[0]))
	{
		// Lets make a counter so we don't exceed the settings...
		$title_counter = 0;
		foreach ($urls[1] as $uri)
		{
			// If our counter has exceeded the amount, replace the remaining urls back to what they were and get outta dodge.
			if (!empty($modSettings['title_url_count']) && $title_counter++ >= $modSettings['title_url_count'])
			{
				// str_replace, so that '?', '.' etc. in the URL are not read as regex syntax
				$message = str_replace('[iurl]' . $uri . '[/iurl%]', $uri, $message);
				continue;
			}

			// Attach https:// to the url...
			$uri_modified = strpos($uri, 'https://') === false ? 'https://' . $uri : $uri;

			// Use the @ to suppress errors from the function not finding a url, which will still return false if
			// there is an error.  This just prevents the error log from filling up without cause.
			// In 2.0, we have fetch_web_data() which makes our lives easier :)
			require_once($sourcedir . '/Subs-Package.php');
			$request = @fetch_web_data($uri_modified);

			if ($request !== false && preg_match('~<title>(.+?)</title>~smi', $request, $matches))
			{
				$title = $smcFunc['htmlspecialchars'](stripslashes($matches[1]), ENT_QUOTES);
				// Need to fix the &amp;amp;
				// $title = str_replace('&amp;amp;', '&amp;', $title);
				// Spiros, fixing ->
				$title = str_replace('&amp;', '&', $title);
				// w00t!  Changing the link to titlize it (is that a word?)
				// The search string has to match the [iurl]...[/iurl%] placeholder created above
				$message = str_replace('[iurl]' . $uri . '[/iurl%]', '[url=' . $uri_modified . ']' . $title . '[/url]', $message);
			}
			// Looks like we couldn't get the title, darn.  Back to the original we go...
			else
				$message = str_replace('[iurl]' . $uri . '[/iurl%]', $uri, $message);
		}
	}

	// Change it back to what it was.  Suppress again in case...
	@ini_set('default_socket_timeout', $timeout);
}

shawnb61

What were you expecting it to do differently?  Looks like it did what it was told...  (Missing a / at the end but I think that was a typo...)

A couple of nits, but you probably wanted to escape the first '.', and you didn't really need the 1st & 3rd capturing groups. 
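
For illustration, here is a minimal sketch of the posted replacement with those two nits applied. It also adds `~` delimiters, which PCRE patterns in PHP need, so the slashes no longer have to be escaped. The function name `internal_links_to_iurl` is hypothetical and not part of the Titled Links mod, and the sample input is made up from the URL in the question.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of the mod.
function internal_links_to_iurl(string $text): string
{
    // '~' is the delimiter, so '/' needs no escaping.
    // Both dots in the host are escaped, and only the part that is
    // reused in the replacement (path plus link text) is captured.
    $pattern = '~\[url=https://www\.translatum\.gr(.*?)\[/url\]~s';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '[iurl=https://www.translatum.gr$1[/iurl]', $text) ?? $text;
}

$in = '[url=https://www.translatum.gr/forum/index.php?topic=988118.0]Some title[/url]';
echo internal_links_to_iurl($in), "\n";
// prints [iurl=https://www.translatum.gr/forum/index.php?topic=988118.0]Some title[/iurl]
```

Links to other hosts do not match the pattern and are returned unchanged.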

Tools like this are very helpful for test-driving regex: https://regex101.com/
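
One more observation on the posted code: $title only holds the text of the fetched page's <title>, and the [url=...] wrapper is put together in the str_replace that follows it, so a regex run on $title has no [url=...] to find. A possible alternative is to pick the tag at the point where the link is built. The sketch below assumes a hypothetical helper `link_tag_for()` and a hard-coded internal host; neither exists in SMF or the mod.

```php
<?php
// Hypothetical helper; name and parameters are illustrative only.
function link_tag_for(string $url, string $internal_host): string
{
    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() gives null when there is no host, or false for a badly malformed URL.
    if (!is_string($host)) {
        return 'url';
    }

    // Host names are case-insensitive, so compare without regard to case.
    return strcasecmp($host, $internal_host) === 0 ? 'iurl' : 'url';
}

// Sample values standing in for the variables used in the mod.
$uri_modified = 'https://www.translatum.gr/forum/index.php?topic=988118.0';
$title = 'Some title';

$tag = link_tag_for($uri_modified, 'www.translatum.gr');
echo '[' . $tag . '=' . $uri_modified . ']' . $title . '[/' . $tag . ']', "\n";
// prints [iurl=https://www.translatum.gr/forum/index.php?topic=988118.0]Some title[/iurl]
```

Comparing the parsed host rather than a URL prefix means a look-alike such as www.translatum.gr.example.com is not treated as internal.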
