Need help with a Regular Expression for YouTube links....

Started by dougiefresh, February 16, 2015, 04:54:30 PM


dougiefresh

I've got a mod called Yet Another YouTube BBCode Tag that I'm having problems with. The problem is the regular expression that checks whether the link passed is a valid YouTube link. The code detects these URLs correctly:
http://youtu.be/fA4cphzsjn8
http://www.youtube.com/embed/fA4cphzsjn8
http://www.youtube.com/watch?v=fA4cphzsjn8
http://www.youtube.com/v/fA4cphzsjn8
http://www.youtube.com/e/fA4cphzsjn8
http://www.youtube.com/p/fA4cphzsjn8
http://www.youtube.com/?v=fA4cphzsjn8

but not these:
http://www.youtube.com/user/username#p/u/11/fA4cphzsjn8
http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/0/fA4cphzsjn8
http://www.youtube.com/watch?feature=player_embedded&v=fA4cphzsjn8
http://www.youtube.com/?feature=player_embedded&v=fA4cphzsjn8


This is the current code that the next version of this mod will be using, which I copied and modified from the "Youtube I.D parsing for new URL formats" page:

function parse_yturl($url)
{
$pattern = '#^(?:https?://)?';    # Optional URL scheme. Either http or https.
$pattern .= '(?:www\.)?';         #  Optional www subdomain.
$pattern .= '(?:';                #  Group host alternatives:
$pattern .=   'youtu\.be/';       #    Either youtu.be,
$pattern .=   '|youtube\.com';    #    or youtube.com
$pattern .=   '(?:';              #    Group path alternatives:
$pattern .=     '/e/';            #      Either /e/,
$pattern .=     '|/embed/';       #      or /embed/,
$pattern .=     '|/v/';           #      or /v/,
$pattern .=     '|/\?v=';         #      or /?v=,
$pattern .=     '|/watch\?v=';    #      or /watch?v=,
> $pattern .=     '|/\?.+&v=';      #      or /?other_param&v=,
> $pattern .=     '|/watch\?.+&v='; #      or /watch?other_param&v=
$pattern .=   ')';                #    End path alternatives.
$pattern .= ')';                  #  End host alternatives.
$pattern .= '([\w-]{11})';        # 11 characters (Length of Youtube video ids).
$pattern .= '(?:.+)?$#x';         # Optional other ending URL parameters.
preg_match($pattern, $url, $matches);
return (isset($matches[1])) ? $matches[1] : false;
}

The lines starting with the > character are the parts of the regular expression that I can't get to work.  I admit that I'm not very good at regular expressions, but I'm hoping someone can help me out.

Thanks in advance for your assistance....

EDIT: The parts of the expression that start with > are supposed to filter out everything between ? and &v=.  Hope this helps someone....
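
The intent described in the EDIT (skip everything between ? and &v=) could also be sketched as a single alternative that accepts an optional run of query parameters before v=, instead of separate `\?v=` and `\?.+&v=` branches. This is only a minimal illustration, not the mod's code: the function name `yt_id_from_query_sketch` is hypothetical, it covers only the youtube.com/watch?... and youtube.com/?... forms, and it assumes the URL reaches the function with plain & separators.

```php
<?php
// Hypothetical helper, not part of the mod: handles only
// youtube.com/watch?... and youtube.com/?... style URLs.
function yt_id_from_query_sketch($url)
{
    $pattern = '~^(?:https?://)?(?:www\.)?youtube\.com/(?:watch)?\?'
             . '(?:[^#]*&)?'      // optional run of other parameters ending in &
             . 'v=([\w-]{11})'    // v= followed by the 11-character video ID
             . '(?![\w-])~';      // the ID must not continue past 11 characters
    return preg_match($pattern, $url, $m) ? $m[1] : false;
}
```

Because the group before v= is optional, the same branch handles both `?v=ID` and `?feature=player_embedded&v=ID`. Nothing is anchored after the ID, so trailing parameters are tolerated as well.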

karavan2

Hello dear dougiefresh, I've been following your mod, and I posted further links that did not work in this topic: http://www.simplemachines.org/community/index.php?topic=531060.msg3786649#msg3786649 My good friend and colleague Sapozhnik has come up with an interesting idea, which, with his permission, I will send you by PM. Thank you for making such a good free mod; you make SMF simpler and nicer to work with.

kelvincool

These two already work for me   ???

http://www.youtube.com/watch?feature=player_embedded&v=fA4cphzsjn8
http://www.youtube.com/?feature=player_embedded&v=fA4cphzsjn8

There's no code to take care of these two is there?

http://www.youtube.com/user/username#p/u/11/fA4cphzsjn8
http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/0/fA4cphzsjn8

Maybe like this?

$pattern .=     '|/user/.+\#.+/'; #      or /user/username#p/u/11/
$pattern .=     '|/.+\#.+/';      #      or /sandalsResorts#p/c/54B8C800269D7C1B/0/

dougiefresh

Quote from: kelvincool on February 16, 2015, 06:54:27 PM
These two already work for me   ???

http://www.youtube.com/watch?feature=player_embedded&v=fA4cphzsjn8
http://www.youtube.com/?feature=player_embedded&v=fA4cphzsjn8

When tested on my localhost, my mod still reports that those are invalid links....  :-[  Keep in mind that the code I'm showing is the code I'm using....

Quote from: kelvincool on February 16, 2015, 06:54:27 PM
There's no code to take care of these two is there?

http://www.youtube.com/user/username#p/u/11/fA4cphzsjn8
http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/0/fA4cphzsjn8

Maybe like this?

$pattern .=     '|/user/.+\#.+/'; #      or /user/username#p/u/11/
$pattern .=     '|/.+\#.+/';      #      or /sandalsResorts#p/c/54B8C800269D7C1B/0/

That takes care of those particular situations....  Thanks!
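
The `'|/.+\#.+/'` alternative is fairly permissive, since it accepts any path that contains a # followed by a slash. As a hedged alternative, the two fragment-style URLs could be handled by reading the URL fragment with `parse_url()` and taking its last path segment. The helper name `yt_id_from_fragment_sketch` is hypothetical, and the sketch assumes the URL includes a scheme so that `parse_url()` can identify the host.

```php
<?php
// Hypothetical helper, not part of the mod: pulls the video ID from URLs
// such as http://www.youtube.com/user/username#p/u/11/<id>.
function yt_id_from_fragment_sketch($url)
{
    // Assumption: only youtube.com (with or without www.) is accepted here.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !preg_match('~^(?:www\.)?youtube\.com$~i', $host)) {
        return false;
    }

    // parse_url() returns null when the URL has no fragment.
    $fragment = parse_url($url, PHP_URL_FRAGMENT);
    if (!is_string($fragment)) {
        return false;
    }

    // The ID is taken to be an 11-character final segment of the fragment.
    return preg_match('~(?:^|/)([\w-]{11})$~', $fragment, $m) ? $m[1] : false;
}
```

Both example URLs from this thread end their fragment with the 11-character ID, which is the only thing this sketch relies on.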

dougiefresh

@karavan2: Thanks for that piece of code you sent via PM.  I've integrated the relevant sections into the function and now it works!!  Thank you very much!

karavan2

Quote from: dougiefresh on February 17, 2015, 08:06:14 PM
@karavan2: Thanks for that piece of code you sent via PM.  I've integrated the relevant sections into the function and now it works!!  Thank you very much!

I am very glad to help you. The idea of how to do it belongs to Sapozhnik; if you have questions, ask him by PM. He is a very good specialist and a humble man. Good luck!

dougiefresh

Oops...  This topic isn't quite solved yet.  karavan2 notified me of another link that this regular expression doesn't cover.  It has to do with the part of the URL that comes AFTER the video ID.

For example, this isn't a valid YouTube URL according to the function given:
https://www.youtube.com/watch?feature=player_embedded&v=WlobIDJyyOs&x-yt-cl=85114404&x-yt-ts=1422579428
Right now, my function uses the following code as a last resort detection method:
// LAST RESORT: If we still have no result, detect the video ID by parsing the URL:
if (empty($result))
{
parse_str(parse_url(str_replace('&amp;', '&', $url), PHP_URL_QUERY), $out);
$result = (isset($out['v']) ? 'v/' . $out['v'] : false);
}

which does get the v variable from the URL, but I'm wondering if there is some way to rewrite the expression to filter that out....

The line in the expression that deals with this is marked with a >:
$pattern .= '([\w-]{11})';        # 11 characters (Length of Youtube video ids).
> $pattern .= '(?:.+)?$#x';         # Optional other ending URL parameters.

Any ideas?  Assistance is much appreciated!  Thanks in advance!
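
A quick check suggests the tail `(?:.+)?$` already tolerates trailing parameters: the pattern from the first post, assembled into one string, matches the example URL when its separators are plain & characters. One possible explanation for the failure, which is an assumption and not confirmed in this thread, is that the URL reaches the function HTML-escaped (& as &amp;). In that case the literal `&v=` in the pattern can no longer be found. The variable names below are illustrative only.

```php
<?php
// The pattern from the first post, joined into a single string.
$pattern = '#^(?:https?://)?(?:www\.)?(?:youtu\.be/|youtube\.com'
         . '(?:/e/|/embed/|/v/|/\?v=|/watch\?v=|/\?.+&v=|/watch\?.+&v=))'
         . '([\w-]{11})(?:.+)?$#x';

$raw = 'https://www.youtube.com/watch?feature=player_embedded'
     . '&v=WlobIDJyyOs&x-yt-cl=85114404&x-yt-ts=1422579428';

// Assumed form of the same URL after HTML escaping upstream.
$escaped = str_replace('&', '&amp;', $raw);

// Plain separators: the /watch\?.+&v= branch matches and the ID is captured.
$raw_matches = preg_match($pattern, $raw, $m_raw);

// Escaped separators: "&v=" never occurs literally, so nothing matches.
$escaped_matches = preg_match($pattern, $escaped, $m_escaped);
```

If that assumption holds for the mod, decoding the entities before calling `preg_match()` would be one way to let the existing expression handle this URL without a separate fallback.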
