Preg_match Pattern for urls.

Started by Nathaniel, June 23, 2008, 04:01:30 AM

Nathaniel

Basically, I am working with URLs in a mod and I need a fairly simple way of checking whether a string is a URL or not. I currently require the URL to start with "http://" or "https://", because that seems like the best way of checking for a URL. I then just make sure that there is a full stop with some content before and after it, so that it has to have a domain.

I have created this preg_match call and pattern.
preg_match('/(http(s)?:\/\/)(www\.)?(.)*[\.](.)*$/i', $string);
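
Editor's note: a quick way to see what this pattern accepts is to run it against a few strings. The sketch below uses the pattern exactly as posted; the helper name `first_pattern_matches` and the sample strings are made up for the illustration. It suggests two things worth knowing: without a leading `^` the pattern matches a URL anywhere in the string, and because `(.)*` may match nothing, the "content before and after the full stop" is not actually enforced.

```php
<?php
// Pattern copied unchanged from the post above.
$firstPattern = '/(http(s)?:\/\/)(www\.)?(.)*[\.](.)*$/i';

// Hypothetical helper for this illustration: true when the pattern matches.
function first_pattern_matches(string $pattern, string $string): bool
{
    return preg_match($pattern, $string) === 1;
}

var_dump(first_pattern_matches($firstPattern, 'http://example.com'));     // bool(true)
var_dump(first_pattern_matches($firstPattern, 'example.com'));            // bool(false): no scheme
var_dump(first_pattern_matches($firstPattern, 'http://example'));         // bool(false): no full stop
var_dump(first_pattern_matches($firstPattern, 'see http://example.com')); // bool(true): no ^ anchor
var_dump(first_pattern_matches($firstPattern, 'http://.'));               // bool(true): (.)* can be empty
```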

I am new to this function, so I had a look on the web, but all of the patterns that I found for URLs are rubbish: they are either too complicated or they just don't work for the cases that I want them to.

I am interested in the opinion of PHP coders who have more experience with this function. Do you think that it needs to be more complicated, or is simple better?
SMF Friend (Former Support Specialist) | SimplePortal Developer
My SMF Mods | SimplePortal

"Quis custodiet ipsos custodes?" - Who will Guard the Guards?

Please don't send me ANY support related PMs. I will just delete them.

[SiNaN]

LHVWB, I was also planning to start such a topic.

I'd also be very glad if somebody could give detailed information about this pattern issue.
Former SMF Core Developer | My Mods | SimplePortal

[SiNaN]

Nathaniel

Hmm, I already know how to create fairly simple patterns for the preg_match() function; I learnt it from a useful tutorial that I know.

I found some uses of the preg_replace() function in relation to URLs within SMF 2 Beta, on lines 1760 and 1670 of Subs.php, but they seem hideously over-complicated.

Thank you for the reference anyway, [SiNaN]; it will be a very good guide to help me learn some more about the patterns.
karlbenson


preg_match('~^(http|ftp)(s)?\:\/\/((([a-z0-9]{1,25})(\.)?){2,7})($|/.*$)~i', $string)

Should hopefully detect
http://sub.sub.sub.sub.stupiddomains.co.cn
http://youtube.com
http://www.youtube.com/etc

and reject
http://[email protected]
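
Editor's note: the sketch below runs the pattern above, unchanged, against the URLs listed as "should detect" plus two extra cases that are my own additions for illustration (a host without a dot and a host with a hyphen). The helper name `karl_pattern_matches` is hypothetical.

```php
<?php
// Pattern copied unchanged from karlbenson's post.
$karlPattern = '~^(http|ftp)(s)?\:\/\/((([a-z0-9]{1,25})(\.)?){2,7})($|/.*$)~i';

// Hypothetical helper for this illustration: true when the pattern matches.
function karl_pattern_matches(string $pattern, string $string): bool
{
    return preg_match($pattern, $string) === 1;
}

// The three URLs from the post.
var_dump(karl_pattern_matches($karlPattern, 'http://sub.sub.sub.sub.stupiddomains.co.cn')); // bool(true)
var_dump(karl_pattern_matches($karlPattern, 'http://youtube.com'));                         // bool(true)
var_dump(karl_pattern_matches($karlPattern, 'http://www.youtube.com/etc'));                 // bool(true)

// Extra cases (assumptions added for illustration).
// The dot is optional, so "youtub" + "e" already counts as two repetitions.
var_dump(karl_pattern_matches($karlPattern, 'http://youtube'));     // bool(true)
// The character class has no hyphen, so hyphenated hosts are rejected.
var_dump(karl_pattern_matches($karlPattern, 'http://my-site.com')); // bool(false)
```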

Nathaniel

Thank you for the pattern, karlbenson. I have, however, found a few errors with it that I believe I have fixed. Here is the updated pattern:

preg_match('~^(http|ftp)(s)?\:\/\/((([a-z0-9\-]*)(\.))+[a-z0-9]*)($|/.*$)~i', $string)

I liked yours, so this is based on it. The errors which I fixed included:

  • No hyphens allowed in the domain name.
  • Possibility of "http://youtube" being valid even though it has no TLD; this happened because of the "?" in the "(\.)?".
  • Removed the character number restrictions because they caused some odd errors.

I took out the restrictions on the number of characters ("{1,25}") because I am wondering whether they are necessary. Are there actually restrictions on the number of characters that a domain name can have? I did the same thing for the "{2,7}", because I was unsure whether there is actually a limit to the number of subdomains and TLDs that you can have.
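
Editor's note on the length question: DNS does set limits, just not 25 characters or 7 parts. As far as I know, RFC 1035 caps each dot-separated label at 63 octets and the whole name at 255 octets, which is commonly cited as 253 characters in written form. There is no separate cap on the number of subdomains beyond what fits in the total length. A minimal sketch of checking those limits outside the regex follows; the function name `host_within_dns_limits` is made up for this example, and it only checks lengths, not which characters are allowed.

```php
<?php
// Hypothetical helper: checks only the DNS length limits of a host name
// (each label 1 to 63 characters, whole name at most 253 characters).
// It does not validate which characters the labels contain.
function host_within_dns_limits(string $host): bool
{
    if ($host === '' || strlen($host) > 253) {
        return false;
    }

    foreach (explode('.', $host) as $label) {
        $length = strlen($label);
        if ($length < 1 || $length > 63) {
            return false; // empty label ("a..b") or over-long label
        }
    }

    return true;
}

var_dump(host_within_dns_limits('www.youtube.com'));              // bool(true)
var_dump(host_within_dns_limits(str_repeat('a', 63) . '.com'));   // bool(true)
var_dump(host_within_dns_limits(str_repeat('a', 64) . '.com'));   // bool(false)
var_dump(host_within_dns_limits('a..b'));                         // bool(false)
```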
karlbenson

There isn't a 'perfect' regex for domains.
There are always conditions/types that might not match your regex.  That's why they start getting very, very long and complicated to fit different conditions.

In trying to fix the couple of issues with mine, you've actually added more errors.
I'll see if I can have another go later.

I can fix the repetition and lengths to make it unlimited.
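
Editor's note: karlbenson does not say which errors he means. One guess, sketched below, is that replacing "{1,25}" with "*" lets a label be empty, so strings made only of dots after the scheme now pass. The pattern is Nathaniel's revised one, unchanged; the helper name `revised_pattern_matches` and the test strings are assumptions for the illustration.

```php
<?php
// Nathaniel's revised pattern, copied unchanged from his post.
$revisedPattern = '~^(http|ftp)(s)?\:\/\/((([a-z0-9\-]*)(\.))+[a-z0-9]*)($|/.*$)~i';

// Hypothetical helper for this illustration: true when the pattern matches.
function revised_pattern_matches(string $pattern, string $string): bool
{
    return preg_match($pattern, $string) === 1;
}

// The two fixes behave as intended.
var_dump(revised_pattern_matches($revisedPattern, 'http://my-site.com')); // bool(true)
var_dump(revised_pattern_matches($revisedPattern, 'http://youtube'));     // bool(false)

// But "*" allows empty labels and an empty TLD.
var_dump(revised_pattern_matches($revisedPattern, 'http://.'));        // bool(true)
var_dump(revised_pattern_matches($revisedPattern, 'http://..'));       // bool(true)
var_dump(revised_pattern_matches($revisedPattern, 'http://youtube.')); // bool(true)
```

Changing each "*" to "+" would be one way to require at least one character per label, at the cost of other edge cases.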

Nathaniel

Yeah, I understand that there definitely isn't a perfect pattern, and there probably never will be....

I am just trying to find a slightly better one, the one from my last post seems fine though, I can't find any major errors. Thanks for the help anyway. :D
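
Editor's note: since no hand-written pattern is going to be perfect, one alternative worth considering is PHP's built-in filter_var() with FILTER_VALIDATE_URL, combined with parse_url() to restrict the scheme. Nobody in this thread proposes this, so treat it as a swapped-in technique rather than what the posters used; the function name `is_http_url` is made up for the sketch.

```php
<?php
// Hypothetical helper: accepts only syntactically valid http:// or https:// URLs.
// FILTER_VALIDATE_URL alone also accepts other schemes (ftp:, mailto:, ...),
// so the scheme is checked separately.
function is_http_url(string $string): bool
{
    if (filter_var($string, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = parse_url($string, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

var_dump(is_http_url('http://example.com'));               // bool(true)
var_dump(is_http_url('https://www.example.com/path?x=1')); // bool(true)
var_dump(is_http_url('example.com'));                      // bool(false): no scheme
var_dump(is_http_url('ftp://example.com'));                // bool(false): wrong scheme
```

Note that this validates syntax only; it does not check that the domain exists or has a real TLD.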
Nathaniel

* Pseudo Bump *

Any luck with fixing those errors karlbenson?
Artur

Sorry for bumping this topic, but because this is one of the first Google results, I want to add a correction (or an addition).

Quote from: karlbenson on June 23, 2008, 02:00:29 PM
preg_match('~^(http|ftp)(s)?\:\/\/((([a-z0-9]{1,25})(\.)?){2,7})($|/.*$)~i', $string)

The above code has a little flaw that I found: domains with a - (minus) in them aren't detected as URLs.

preg_match('~^(http|ftp)(s)?\:\/\/((([a-z|0-9|\-]{1,25})(\.)?){2,7})($|/.*$)~i', $string)

But this one works fine with - in the URL :)
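
Editor's note: one caveat with the pattern above. Inside a character class, "|" is not an "or" operator but a literal pipe, so "[a-z|0-9|\-]" also accepts "|" in the host name. The sketch below compares Artur's pattern, unchanged, with a variant that drops the pipes; the variant, the helper name `host_pattern_matches`, and the test strings are my own assumptions.

```php
<?php
// Artur's pattern, copied unchanged from the post above.
$arturPattern = '~^(http|ftp)(s)?\:\/\/((([a-z|0-9|\-]{1,25})(\.)?){2,7})($|/.*$)~i';

// Variant without the pipes (an assumption, not from the thread).
$noPipePattern = '~^(http|ftp)(s)?\:\/\/((([a-z0-9\-]{1,25})(\.)?){2,7})($|/.*$)~i';

// Hypothetical helper for this illustration: true when the pattern matches.
function host_pattern_matches(string $pattern, string $string): bool
{
    return preg_match($pattern, $string) === 1;
}

// Both accept a hyphenated host.
var_dump(host_pattern_matches($arturPattern, 'http://my-site.com'));  // bool(true)
var_dump(host_pattern_matches($noPipePattern, 'http://my-site.com')); // bool(true)

// Only the original accepts a pipe in the host.
var_dump(host_pattern_matches($arturPattern, 'http://my|site.com'));  // bool(true)
var_dump(host_pattern_matches($noPipePattern, 'http://my|site.com')); // bool(false)
```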