Recognizing protocols in URLs

Started by MrPhil, June 16, 2013, 01:32:41 PM


MrPhil

From time to time, people complain here (e.g., http://www.simplemachines.org/community/index.php?topic=505934.msg0) that they enter a URL with a "nonstandard" protocol such as "steam://" (whatever that is) and SMF insists on shoving an http:// in front of it. I suggest that wherever SMF checks for a full URL (protocol://domain/path), it accept any protocol explicitly given by the user, rather than checking just for http:// and https://. The regexp for this would be something like ([a-zA-Z]+)://. I don't know if any legitimate protocols include non-alphabetics. If no protocol is given, add http://. Are there any known harmful protocols that should be excluded, either always or at the discretion of the forum owner?
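As a rough illustration of the detection step, here is a minimal sketch. The function name detect_protocol is made up for this example and is not part of SMF. On the question of non-alphabetics: RFC 3986 defines a scheme as a letter followed by letters, digits, "+", "-" or ".", so the pattern below is a little wider than [a-zA-Z]+ and is anchored to the start of the string.

```php
<?php
// Hypothetical helper, not an SMF function: returns the lowercased protocol
// of a URL written as protocol://..., or null when no protocol is given.
function detect_protocol($url)
{
    // RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
    if (preg_match('~^([a-zA-Z][a-zA-Z0-9+.-]*)://~', $url, $matches) === 1)
        return strtolower($matches[1]);

    return null;
}
```

With this sketch, detect_protocol('steam://run/440') gives 'steam', while detect_protocol('www.example.com/path') gives null, which is the case where http:// would be added.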

If this is implemented, it would probably be best done as a single function, $url = check_protocol($url, 'http', array('exclude1', 'exclude2'));, rather than hard-coding the check over and over in SMF code. It would take the original URL, the default protocol to use, and any protocols to forbid (or a set of allowed protocols). Maybe the allowed/forbidden list could be written in array('A', 'http', 'https', 'ftp'), array('F', 'steam'), or array('U') style: an explicitly allowed list, an explicitly forbidden list, or a user-defined (forum admin) list of allowed or forbidden protocols stored in the database.
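One way such a function might look is sketched below. It is only an assumption about the shape of the helper, not existing SMF code: it takes a plain list of forbidden protocols instead of the 'A'/'F'/'U' style, does not read anything from the database, and returns false for a forbidden protocol (the return convention is also made up here).

```php
<?php
// Hypothetical sketch of the proposed helper; this is not an existing SMF function.
// $default:   protocol prepended when the URL names none.
// $forbidden: lowercase protocol names to reject (assumed to come from forum settings).
function check_protocol($url, $default = 'http', $forbidden = array())
{
    $url = trim($url);

    // Reject forbidden protocols, with or without "//" after the colon.
    if (preg_match('~^([a-zA-Z][a-zA-Z0-9+.-]*):~', $url, $matches) === 1
        && in_array(strtolower($matches[1]), $forbidden, true))
        return false;

    // An explicit protocol://... is accepted as given.
    if (preg_match('~^[a-zA-Z][a-zA-Z0-9+.-]*://~', $url) === 1)
        return $url;

    // No protocol given: fall back to the default.
    return $default . '://' . $url;
}
```

Under these assumptions, check_protocol('steam://run/440') returns the URL unchanged, check_protocol('www.example.com/path') returns 'http://www.example.com/path', and check_protocol('steam://run/440', 'http', array('steam')) returns false. Only the protocol:// form counts as an explicit protocol here, so anything else that is not forbidden gets the default prepended.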

Arantor

Just to answer this one, the restriction was actually introduced way back because there are URLs that are NOT safe to allow in the first place, most notably javascript: URLs, though there is also an argument that data: URLs are not safe either.

It would not be impossible to expand the list of accepted protocols, but the need is sufficiently rare that I'd rather it not be a core feature.

For now I'm going to move this to declined, but if anyone disagrees I'll re-evaluate it.
