Programmer Gurus, need Regular Expression help

[ARCHIVED THREAD] - Programmer Gurus, need Regular Expression help

Posted: 11/2/2001 12:34:34 PM EDT

I've been writing code for quite a while but never got around to learning regular expressions, until today. Had to write some code to validate a domain name as being properly formatted, and figured I'd learn how to do it with reg exp.

Here's my pattern: [b][a-zA-Z0-9]\.com|net|org$[/b]

Which I understand as "match until you don't find a letter or number, then you must find a dot/period then com, net or org". Problem is that I can pass it stuff like a&a.com and it matches, and I can't figure it out.

Any help would be, well, helpful. [:)]

Posted: 11/2/2001 1:09:49 PM EDT

[#1]

Hi DVDTracker,

I assume you are talking about Perl regexps? The syntax differs slightly between languages and tools (various UNIX shells, xemacs, C, etc.).

The problem is that anything in brackets matches only one character by default.

So your expression matches:
a&a.com
because it is finding a match with the underlined portion:
a&[u]a.com[/u]
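
A quick way to confirm which part of the string was matched is to print the match and its offsets. This is only a minimal sketch using Python's `re` module rather than Perl (an assumption made for illustration; the pattern syntax used here is shared by both), and the variable names are made up:

```python
import re

# Original pattern from the first post, unchanged.
pattern = re.compile(r"[a-zA-Z0-9]\.com|net|org$")

m = pattern.search("a&a.com")
matched_text = m.group(0)   # the substring the engine actually matched
matched_span = m.span()     # (start, end) offsets into "a&a.com"
print(matched_text, matched_span)
```

The search is free to start anywhere in the string, so it skips "a&" and reports only the tail.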

If you are trying to match any alphanumeric for any number of characters, followed by the .net, .com, or .org domains, your expression should look like this:
[b]^[a-zA-Z0-9]+\.com|net|org$[/b]

I'm assuming the rest of your syntax is right. Offhand I forget how the logical or operator works (the |).

The first change is the ^ anchor, which ties your regexp to the beginning of the string. That is why your previous regexp was not excluding the "a&" portion in your example: nothing told it that the match had to start at the first character.

The other change is the "+" modifier. As I said earlier, anything in brackets denotes just one character. So the + modifier means that you are now matching for one or more characters that meet the criterion of [a-zA-Z0-9]. If it was okay to have zero or more characters, you could use the * modifier instead of the +, but I don't think that is what you want in your example.
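
A note on the | operator: in Perl-style engines alternation binds more loosely than everything else in the pattern, so without parentheses the ^ applies only to the first alternative and the $ only to the last. The sketch below uses Python's `re` (an assumption, not something from the thread), and `ungrouped` and `grouped` are illustrative names. It suggests that the pattern above still lets some bad input through unless the alternatives are grouped:

```python
import re

# Pattern as posted: read by the engine as  ^[a-zA-Z0-9]+\.com  OR  net  OR  org$
ungrouped = re.compile(r"^[a-zA-Z0-9]+\.com|net|org$")
# Same pattern with the three TLDs grouped, so ^ and $ apply to every alternative.
grouped = re.compile(r"^[a-zA-Z0-9]+\.(com|net|org)$")

for name in ["aa.com", "a&a.com", "a&a.net"]:
    print(name, bool(ungrouped.search(name)), bool(grouped.search(name)))
```

With the ungrouped form, "a&a.net" is accepted because the bare "net" alternative can match anywhere in the string; the grouped form rejects it.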

Hope that helps and it wasn't too terribly confusing. Let me know if you need any follow up clarification.

Dilbert

[Edited to fix UBBcode]
__
If it ain't broke, fix it till it is!

Posted: 11/2/2001 1:11:08 PM EDT

[#2]

What your regular expression says is to find within the string a pattern that has one letter/number followed by a .com, .net, or .org at the end of the string. Your string "a&a.com" meets that criterion. It has an "a", which is a valid letter/number, followed by a ".com". If you want your string to have only letters/numbers preceding the .com, etc., from the very beginning of the string, you need to write it thusly:

^[a-zA-Z0-9]+\.(com|net|org)$

I think that's right, although I didn't actually check it out myself. YMMV.
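
One thing worth checking here is the difference between square brackets and parentheses around the alternatives. Square brackets always build a character class, so [com|net|org] stands for a single character out of c, o, m, |, n, e, t, r, g, while (com|net|org) matches one of the three whole words. A minimal Python sketch (Python's `re` is an assumption; the variable names are illustrative):

```python
import re

# Square brackets: exactly ONE character from the set after the dot.
char_class = re.compile(r"^[a-zA-Z0-9]+\.[com|net|org]$")
# Parentheses: one of the three complete suffixes after the dot.
group = re.compile(r"^[a-zA-Z0-9]+\.(com|net|org)$")

for name in ["example.c", "example.com", "a&a.com"]:
    print(name, bool(char_class.search(name)), bool(group.search(name)))
```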

Posted: 11/2/2001 1:16:32 PM EDT

[#3]

Yes, thanks guys! I can see now where the missing "^" at the beginning was causing problems, as well as the "+". FYI, here's the working expression. I added a hyphen since that is valid in a domain name.

[b]^[a-zA-Z0-9\-]+\.com|net|org$[/b]

Looks like the backslash in front of the hyphen isn't necessary, but it's not hurting anything since it just forces the hyphen to be interpreted literally.
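
To check the point about escaping: inside a character class, a hyphen that comes last (or first) cannot form a range, so it should be taken literally with or without the backslash. A hedged Python sketch that tests only the character class, not the full domain pattern (the names `escaped` and `unescaped` are made up for the example):

```python
import re

escaped = re.compile(r"^[a-zA-Z0-9\-]+$")    # hyphen escaped, as in the post
unescaped = re.compile(r"^[a-zA-Z0-9-]+$")   # hyphen left bare at the end of the class

for label in ["auto-parts", "autoparts", "auto_parts"]:
    print(label, bool(escaped.search(label)), bool(unescaped.search(label)))
```

Both forms should accept and reject the same labels.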

Posted: 11/2/2001 1:28:33 PM EDT

[#4]

Actually, since we need to support all the top-level domains:

[b]^[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4}$[/b]
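
A quick sanity check of the length quantifier, sketched in Python and assuming the intended form is the unescaped {2,4} that Perl-style engines read as "between two and four repetitions" (`tld_any` is an illustrative name):

```python
import re

# Any label of letters, digits, or hyphens, then a dot, then 2 to 4 letters.
tld_any = re.compile(r"^[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4}$")

for name in ["example.com", "example.info", "example.c", "example.museum"]:
    print(name, bool(tld_any.search(name)))
```

A one-letter suffix and a longer one such as .museum are both rejected by this form.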

Posted: 11/2/2001 4:38:48 PM EDT

[#5]

DVDTracker, your last rule allows for invalid domain names that start with a - to still pass. Below is the rule I developed years ago to look for valid e-mail addresses:

^[0-9a-z]([-_.]?[0-9a-z]\.?)*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]+$
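
The leading-hyphen point can be checked with a small Python sketch. Two assumptions are baked in and labeled in the comments: the domain rule is written with an unescaped {2,4} quantifier, and the e-mail rule is written with a single-escaped final dot (\.), on the guess that a doubled backslash would only be string-literal escaping. `DOMAIN_V4` and `EMAIL` are illustrative names:

```python
import re

# Domain rule from post #4 (assumption: Perl-style {2,4} quantifier).
DOMAIN_V4 = re.compile(r"^[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4}$")
# E-mail rule from this post (assumption: final dot written as \.).
EMAIL = re.compile(
    r"^[0-9a-z]([-_.]?[0-9a-z]\.?)*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]+$"
)

print(bool(DOMAIN_V4.search("-bad.com")))         # leading hyphen slips through
print(bool(EMAIL.search("joe@-bad.com")))         # domain must start with a letter or digit
print(bool(EMAIL.search("joe@auto-parts.com")))   # an interior hyphen is still fine
```

Note that the e-mail rule lists only lowercase letters, so it would need a case-insensitive flag (or lowercased input) to accept mixed-case addresses.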

Posted: 11/2/2001 4:46:01 PM EDT

[#6]

Boy, good luck with that working 100%. Now all you have to do is include all the national domains like ".co.uk" for commercial UK sites, etc., ad nauseam. You might try Network Solutions for a list of all TLDs.

Posted: 11/2/2001 4:49:47 PM EDT

[#7]

Luckily we're not handling domains like that. This is for domains for auto repair/parts companies in the US. I'd bet 99% of them will be .com.
