How to use Regex to exclude website prefixes?

Julius_2000 · January 03, 2024, 02:00:04 PM

Hi, I wasn't able to respond to this topic because it's been closed, so I'm opening a new one.

I'm kind of struggling with the execution of the following idea: in the profile fields for social platforms like FB, Twitter etc., I would like to prevent people from typing in the entire URL to their profiles so that they only put in their names. While I did make this clear in the descriptions for the inputs, I would also like to use the Regex validation to prevent people from pasting the https://sitename/ prefix.

How would I have to set up the pattern code for this?

Any help is much appreciated!

Sesquipedalian · January 03, 2024, 02:21:46 PM

I'm in a "teach a man to fish" mood today, so here you go: https://www.regular-expressions.info/tutorial.html

Julius_2000 · January 04, 2024, 06:27:40 AM

Thanks, Sesquipedalian, for the effort of trying to help me help myself. I appreciate it.
However, the site is quite extensive, a bit overwhelming, and at times confusing for the uninitiated.

I found some code here that seems to work for my case.
I wanted to reject text that includes either the "https:" pattern or the "@" symbol.

^((?!.*https:.*).)*$
^((?!.*(https:|@).*).)*$

But frankly, I would never have been able to do that with this tutorial site. I understand that the terms inside the inner brackets are a group, but I have only a limited clue why the rest is structured like that.

Arantor · January 04, 2024, 06:50:57 AM

?! is a negative lookahead. In short, it means: confirm that the next thing isn't "0 or more characters followed by https: followed by 0 or more characters", which is what you asked for - that the string doesn't contain https:

The second version is much the same except it is https: or @ that it must not match. The ^ and $ mean "starting from the start of the string and up to the end of the string" respectively.
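
To see the behaviour described above, both patterns can be run against a few inputs. This is a minimal sketch in Python rather than in SMF itself: Python's `re` module takes the pattern without delimiters, and the sample inputs are made up purely for illustration.

```python
import re

# The two patterns from the post above, written without delimiters
# (Python's re module does not use them).
NO_HTTPS = re.compile(r"^((?!.*https:.*).)*$")
NO_HTTPS_OR_AT = re.compile(r"^((?!.*(https:|@).*).)*$")

# Hypothetical sample inputs, chosen only for illustration.
samples = [
    "julius",
    "https://twitter.com/julius",
    "@julius",
    "twitter.com/julius",
]

for text in samples:
    # Print whether each pattern accepts the input.
    print(text, bool(NO_HTTPS.match(text)), bool(NO_HTTPS_OR_AT.match(text)))
```

In this sketch the first pattern rejects only the input containing "https:", and the second also rejects the one containing "@". The last sample has neither, so both patterns accept it.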

Personally I'd flip the problem the other way around here and assert what a valid pattern looks like rather than an invalid one, e.g. instead of refusing https (because people can still put domain.com/username in which is just as invalid), I'd try to find the criteria for just the bit that you care about.

In most cases this is often boiled down to /^[a-z0-9]+$/i which is starting at the start followed by a-z or 0-9 some amount of times until the end of the string, and treat it case insensitively. After that, add in - or _ if the services accept that (many do) which gets you /^[a-z0-9\-_]+$/i (you need the \ before the minus to make it clear you actually want the minus and not to treat it as part of a range like a-z)
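
As a rough illustration of the allow-list approach, here is the second pattern tried out in Python. This is only a sketch: the function name `is_valid_username` is made up for the example, and `re.IGNORECASE` stands in for the i modifier because Python does not use /.../ delimiters.

```python
import re

# Allow-list pattern from the post; re.IGNORECASE plays the role of the /i modifier.
USERNAME = re.compile(r"^[a-z0-9\-_]+$", re.IGNORECASE)


def is_valid_username(text: str) -> bool:
    """Return True if text matches the allow-list pattern (hypothetical helper)."""
    return USERNAME.match(text) is not None


# Hypothetical inputs, for illustration only.
for candidate in ["test", "Julius_2000", "@julius", "domain.com/username"]:
    print(candidate, is_valid_username(candidate))
```

Because only the listed characters are allowed, inputs containing @, a dot or a slash fail without needing a separate rule for each unwanted prefix.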

Julius_2000 · January 04, 2024, 07:37:00 AM

Quote from: Arantor on January 04, 2024, 06:50:57 AM
In most cases this is often boiled down to /^[a-z0-9]+$/i which is starting at the start followed by a-z or 0-9 some amount of times until the end of the string, and treat it case insensitively. After that, add in - or _ if the services accept that (many do) which gets you /^[a-z0-9\-_]+$/i (you need the \ before the minus to make it clear you actually want the minus and not to treat it as part of a range like a-z)

Thanks! What are the "/" at the start and before the case-sensitivity "i" for?

If I understand correctly what you wrote, this should match all letters and numbers plus the special characters "-" and "_". Everything else should be rejected, for example if somebody typed in an @, which is part of the TikTok user URL.

Unfortunately, when I use your example for the TikTok user name input:

~/^[a-z0-9\-_]+$/i ~
and type in "test" in the name field, I get a match error.

Also, do I need to put a space between different ranges? In your example, the letter range and the number range are "glued" together. Is that intentional or a typo?

Arantor · January 04, 2024, 07:42:44 AM

Drop the tildes.

When you interact with SMF's feature you have to give it delimiters - one to indicate the start of the regex, one to indicate the end, and modifiers (such as case insensitive) go after the end delimiter.

In your case, you're putting tildes around it so the whole thing gets treated as the expression which definitely won't match anything useful because you're not looking to match / in the content itself.

As for the ranges, no, there's no typo. The [] block indicates "any of these characters", and you want any of a-z, 0-9 (and maybe - and _) and the syntax for regex is that you just put them all in the [] block. If you want to match a space, you also put a space in it.
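
The effect of the extra tildes can be imitated outside SMF. The sketch below is a simplified Python stand-in for how a delimited mask is read: the first character is taken as the delimiter, its last occurrence ends the pattern, and anything after that is treated as modifiers. The helper `compile_delimited` and its small flag table are assumptions made for this example, not SMF's or PHP's actual code.

```python
import re

# Modifier letters this sketch understands; real PCRE supports more.
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_delimited(mask: str) -> re.Pattern:
    """Compile a mask of the form <delimiter>pattern<delimiter>modifiers."""
    delimiter = mask[0]
    end = mask.rindex(delimiter)
    if end == 0:
        raise ValueError("missing closing delimiter")
    flags = 0
    for letter in mask[end + 1:]:
        flags |= FLAGS[letter]
    return re.compile(mask[1:end], flags)


good = compile_delimited(r"/^[a-z0-9\-_]+$/i")    # slashes are the delimiters
bad = compile_delimited(r"~/^[a-z0-9\-_]+$/i ~")  # tildes are the delimiters

print(bool(good.search("test")))
print(bool(bad.search("test")))
print(repr(bad.pattern))  # the slashes, the "i" and the space are now part of the pattern
```

With the tildes in place, the pattern requires a literal / before the start-of-string anchor, which cannot occur, so "test" does not match. Without them, the same input matches.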

Julius_2000 · January 04, 2024, 07:52:30 AM

Quote from: Arantor on January 04, 2024, 07:42:44 AM
Drop the tildes.

When you interact with SMF's feature you have to give it delimiters - one to indicate the start of the regex, one to indicate the end, and modifiers (such as case insensitive) go after the end delimiter.

In your case, you're putting tildes around it so the whole thing gets treated as the expression which definitely won't match anything useful because you're not looking to match / in the content itself.

Ah, ok. So I totally got it wrong. I thought this was required as per SMF's help popup description for input masks:

Quote: "Delimiters marking the beginning and end of the pattern are required! They are tildes (~) in the examples below."

"~[A-Za-z]+~" - Match all upper and lower case alphabet characters.
"~[0-9]+~" - Match all numeric characters.

...

Thank you so much!

Steve · January 04, 2024, 09:07:44 AM

Marking solved for now unless you have further questions.
