Where is autolinking function?

Started by blamed, September 27, 2012, 03:03:01 PM

blamed

I found a piece of code in Subs.php (in the parse_bbc function) that does autolinking, but it seems to do it only for the preview.

So I figured it might be in Post.php or Subs-Post.php, but no luck.

So where does the actual autolinking happen?

Arantor

No, it primarily does it in parse_bbc(). What exactly do you want to know about it for? (Depending on what you're trying to do, there are likely multiple better ways than editing the code directly)

blamed

I want to change it. I don't like the default regexp: it does not include URLs without www or http, it includes punctuation at the end of the URL, and it does not work with non-Latin characters.

In parse_bbc I found this:

// Parse any URLs.... have to get rid of the @ problems some things cause... stupid email addresses.
// strpos() returns 0 for a match at the very start, so compare against false strictly.
if (!isset($disabled['url']) && strpos($data, '@') === false && strpos($data, '[url') === false)
{
    // Switch out quotes really quick because they can cause problems.
    $data = strtr($data, array('&#039;' => '\'', '&nbsp;' => $context['utf8'] ? "\xC2\xA0" : "\xA0", '&quot;' => '>">', '"' => '<"<', '&lt;' => '<lt<'));

    // Only do this if the preg survives.
    if (is_string($result = preg_replace(array(
        '~(?<=[\s>\.(;\'"]|^)(?i)\b((?:https?://)(?:www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',
        '~(?<=[\s>\.(;\'"]|^)(?i)\b((?:ftps?://)(?:www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',
        '~(?<=[\s>\.(;\'"]|^)(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',
    ), array(
        '[url]$1[/url]',
        '[ftp]$1[/ftp]',
        '[url=http://$1]$1[/url]',
    ), $data)))
        $data = $result;

    $data = strtr($data, array('\'' => '&#039;', $context['utf8'] ? "\xC2\xA0" : "\xA0" => '&nbsp;', '>">' => '&quot;', '<"<' => '"', '<lt<' => '&lt;'));
}


For now I have changed the regexp, and it looks like this:

'~(?<=[\s>\.(;\'"]|^)(?i)\b((?:https?://)(?:www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',
'~(?<=[\s>\.(;\'"]|^)(?i)\b((?:ftps?://)(?:www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',
'~(?<=[\s>\.(;\'"]|^)(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~ui',


And everything looks exactly as I wanted, except that it changes only in the preview.

Actually, I even deleted this whole block and nothing changed (except for the preview, of course). So it's definitely not the code I'm looking for.

Arantor

Quote: So it's definitely not the code I'm looking for.

Which is why I asked you what you wanted it for. Because now I know that, I know that it will be handled elsewhere.

Quote: it does not include URLs without www or http

There's a reason for that - it makes the whole process so much more unreliable. Apart from potentially opening up security holes in the URL checking subsystem, you will very likely get things that aren't links being highlighted as such.
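To make the false-positive risk concrete, here is a minimal sketch. The pattern and the function name findBareDomains are made up for illustration and are not what SMF uses; the pattern is deliberately loose and accepts anything shaped like name.tld with no scheme or www required.

```php
<?php
// Hypothetical, deliberately loose pattern: anything shaped like "name.tld",
// with no scheme and no "www." required. This is NOT SMF's expression.
function findBareDomains(string $text): array
{
    preg_match_all('~\b[a-z0-9\-]+(?:\.[a-z0-9\-]+)*\.[a-z]{2,4}\b~i', $text, $matches);

    // $matches[0] holds every full match, in order of appearance.
    return $matches[0];
}

$sample = 'Edit Subs.php, then see example.com for details.';
print_r(findBareDomains($sample));
```

With this sample the file name Subs.php is picked up alongside example.com, which is the kind of thing that would end up linked when it shouldn't be.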

I don't know what your community is like, but if it's *anything* like other communities, most URLs are simply posted by people, where the browsers already include http anyway.

This makes me wonder if you're trying to do something else like introduce a different scheme (like rsps:// like one person I know)

blamed

Non-http and www is optional.

My main concern is punctuation at the end of the URL (sometimes it's included in the URL, and sometimes not)
and non-Latin characters in the URL,
like this:
http://ru.wikipedia.org/wiki/Вики

Hmm, I guess I have too few posts to post links or something)

Arantor

There are two reasons it's not optional for *automatic* linking: reliability (too many things will get linked that shouldn't be linked) and performance (you're taking the SINGLE LARGEST part of SMF, over 1500 lines in total and potentially making it an order of magnitude slower)

Quote: and non-Latin characters in the URL

SMF does not support non-Latin domains for auto linking, and changing that is complicated as it is also dependent on your server's configuration too. It might be changed in the future.

Quote: My main concern is punctuation at the end of the URL

Such as?

blamed

Quote from: Arantor on September 27, 2012, 03:56:09 PM
SMF does not support non-Latin domains for auto linking, and changing that is complicated as it is also dependent on your server's configuration too. It might be changed in the future.
I have a VDS, so I have control over the server configuration. I just need to know where the magic of autolinking happens)

Quote from: Arantor on September 27, 2012, 03:56:09 PM
Such as?
On my forum, links such as this one are "broken":
http://forum.loc/index.php/topic,187.new.html#new.
It results in this:
[url=http://forum.loc/index.php/topic,187.new.html#new.]http://forum.loc/index.php/topic,187.new.html#new.[/url]

I don't know, maybe it's a local bug caused by one of the mods, but to fix it I need to know where and how links are autolinked)

Kindred

well...   a url with .new.html like that is not actually a valid URL by standards.

So, I'd say the problem is whatever silly html re-write you are using (I am assuming that you think adding the html makes your site more SEF? - note, it does not)
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Arantor

That isn't broken. That's doing EXACTLY what it's supposed to be doing: taking the supplied URL and converting it to a URL tag. It has no way of knowing if the URL is correct or not.

Try turning off queryless URLs elsewhere in the admin panel first.

The rest of it is in Subs-Post.php, particularly in preparsecode() but also all the stuff related to fixing tags. You see, I want to help but I find it frustrating that every time I ask what you're trying to achieve, I seem to get a slightly different answer which makes it very hard to help.

@Kindred, yes it is. There is absolutely nothing in the spec about it. And the option that creates those is built into SMF itself, it's the queryless URLs option.

Kindred

well....  it's a piece of junk...   no wonder I never bothered with the sef urls (in smf or out of it)


I swear that there is some sort of w3 standard for urls which that violates...   ah well, I suspect that you are right and I'm just getting old. LOL

blamed

Quote from: Kindred on September 27, 2012, 04:20:49 PM
well...   a url with .new.html like that is not actually a valid URL by standards.

So, I'd say the problem is whatever silly html re-write you are using (I am assuming that you think adding the html makes your site more SEF? - note, it does not)

Links like this http://www.simplemachines.org/community/index.php?topic=487305.0. are broken too... so I guess the rewrites aren't causing it, but thanks, I'll look into it.

Arantor


Kindred


blamed

Quote from: Arantor on September 27, 2012, 04:24:02 PM
You see, I want to help but I find it frustrating that every time I ask what you're trying to achieve, I seem to get a slightly different answer which makes it very hard to help.

Sorry, I'm trying my best :\ Also, my English isn't that good yet.

I'm trying to achieve two things.
1) Autolinking non-Latin characters in URLs. (I get it, it is complicated or not possible at all currently.)
Maybe at least not autolinking URLs with non-Latin characters at all, then.
2) Not including punctuation in URLs. I get it, my URLs are probably not correct. I'll try turning off queryless URLs, thanks.
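For the first goal, a rough sketch of the matching side only, assuming the post text is UTF-8. Both patterns and the helper firstUrl are invented for illustration; neither is the expression SMF ships, and this says nothing about non-Latin domain names or about how SMF validates a link afterwards.

```php
<?php
// Illustrative patterns only; neither is the expression SMF ships.
// ASCII-only character class: matching stops at the first non-Latin letter.
const ASCII_URL = '~https?://[a-z0-9\-._/%]+~i';
// With the /u modifier, \p{L} and \p{N} match letters and digits in any script.
const UNICODE_URL = '~https?://[\p{L}\p{N}\-._/%]+~iu';

// Hypothetical helper: return the first match of $pattern in $text, or null.
function firstUrl(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'See http://ru.wikipedia.org/wiki/Вики for details';
echo firstUrl(ASCII_URL, $text), "\n";   // stops after "wiki/"
echo firstUrl(UNICODE_URL, $text), "\n"; // keeps the Cyrillic path segment
```

The first pattern cuts the link off at http://ru.wikipedia.org/wiki/, while the second one keeps the whole http://ru.wikipedia.org/wiki/Вики.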

Arantor

Quote: 2) Do not include punctuation in urls. I get it, my urls is probably not correct. I'll try to turn off queryless URLs, thanks.

Except that some punctuation is absolutely necessary for it to work.

blamed

Quote from: Arantor on September 27, 2012, 04:34:19 PM
Define broken.

It is not exactly broken, it just includes the trailing dot in the URL... as I said, I don't know, maybe it's a local bug.

I'll try some of the things you suggested and report back)

Quote: Except that some punctuation is absolutely necessary for it to work.

Sorry. I mean: do not include trailing punctuation.
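One way to keep trailing punctuation out of the link could look like the sketch below. The function autolinkSketch and its pattern are hypothetical, not SMF code: it matches a bare http(s) URL greedily, then moves any trailing sentence punctuation outside the [url] tag.

```php
<?php
// Hypothetical helper, not SMF code: wrap bare http(s) URLs in [url] tags,
// leaving trailing sentence punctuation outside the tag.
function autolinkSketch(string $text): string
{
    $result = preg_replace_callback(
        '~https?://[^\s<>\[\]]+~iu',
        function (array $m): string {
            // Strip only punctuation at the very end; dots and commas
            // inside the URL are left alone.
            $url = rtrim($m[0], '.,;:!?');
            $trailing = (string) substr($m[0], strlen($url));

            return '[url]' . $url . '[/url]' . $trailing;
        },
        $text
    );

    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8),
    // so fall back to the untouched text in that case.
    return $result ?? $text;
}

echo autolinkSketch('Some text http://forum.loc/index.php/topic,187.new.html#new.'), "\n";
```

For the example above this gives Some text [url]http://forum.loc/index.php/topic,187.new.html#new[/url]. with the final dot left outside the tag. The trade-off is that a URL which really does end in one of those characters would be cut short.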

emanuele

http://forum.loc/index.php/topic,187.new.html#new.

Some text http://forum.loc/index.php/topic,187.new.html#new.


Take a peek at what I'm doing! ;D




Need support in Italian?

Help us help you: explain your problem clearly. No, "it doesn't work" is not an explanation!!
1) What you do,
2) what you expect,
3) what you get.

emanuele

From quick reply direct posting: http://forum.loc/index.php/topic,187.new.html#new.



emanuele

WYSIWYG on: http://forum.loc/index.php/topic,187.new.html#new.


Using button: http://forum.loc/index.php/topic,187.new.html#new. <= here it is (ETA: but that's not autolinking)



Arantor

And if you press the button, it's NOT autolinked, is it?
