Understanding Regular Expressions (Regex)

This tutorial is not meant to be exhaustive. It is merely a ‘look’ at some of the syntax used in Regular Expressions, so that it will not look quite so ‘foreign’ to you the next time you see a Regular Expression inside a RegularExpressionValidator control in ASP.NET.

A Regular Expression describes a pattern of characters, and the string being examined is tested to see whether that pattern occurs in it. For instance, let’s say the string to search comes from a text box called ‘Text1’, and the pattern (the matching characters) we are searching for is any lowercase letter. The Regular Expression would be [a-z], and the string returned from ‘Text1.Text’ would be searched and matched against it. In ASP.NET, Regular Expressions used in validator server controls act like rules applied to the text input. One detail matters here: the RegularExpressionValidator only passes input when the pattern matches the ENTIRE string, not just part of it. If the Regular Expression above were put into a validator for ‘Text1’, entering the number 6 would FAIL validation, because it is not a lowercase letter, while a single lowercase letter such as ‘q’ would PASS. Because [a-z] describes exactly one character, a word like ‘top’ would also fail; to accept it, the expression would need to be [a-z]+ (the plus sign is explained below).
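
To try this outside ASP.NET, here is a minimal sketch using Python’s standard re module. Python’s engine is not the .NET one, but it treats the basic metacharacters in this tutorial the same way. The sketch assumes whole-string validation, which re.fullmatch imitates; is_valid is a hypothetical helper name, not part of any ASP.NET API.

```python
import re

def is_valid(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if `pattern` matches ALL of `text`,
    imitating a validator that checks the entire input."""
    return re.fullmatch(pattern, text) is not None

# Searching: [a-z] finds a lowercase letter anywhere in the string.
print(re.search(r"[a-z]", "6top") is not None)  # True ('t' is found)

# Validating: the whole input must fit the pattern.
print(is_valid(r"[a-z]", "6"))     # False: not a lowercase letter
print(is_valid(r"[a-z]", "q"))     # True: exactly one lowercase letter
print(is_valid(r"[a-z]", "top"))   # False: [a-z] covers only ONE character
print(is_valid(r"[a-z]+", "top"))  # True: + allows one or more letters
```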

At this point, we need to stop and discuss several special characters, called Metacharacters, that we must understand when using Regular Expressions. Above, we saw two of them: the brackets ([]) and the dash (-). First, some terminology: any ordinary characters we search for or match are called ‘Literals’. The ‘a’ and the ‘z’ in the example above are Literals. Metacharacters, by contrast, carry a special meaning; in the example above, they are the brackets and the dash. Other Metacharacters include the caret (^, Shift-6), the dollar sign ($), the question mark (?), the asterisk (*), the plus sign (+), and the period (.). Their meanings are described below.

Metacharacter Meanings:
The brackets define a set of characters: the expression matches any ONE of the characters listed inside them. For example, [aeiou] matches a single lowercase vowel.
The dash is a ‘range separator’. Instead of listing the entire lowercase alphabet, we can simply write a-z, as above, to cover all of those characters. So using [a-z] as the Regular Expression means we search the text and match only lowercase alphabetic characters. That brings up another important point about Regular Expressions: case sensitivity. Yes, by default Regular Expressions ARE case sensitive. This means that if we wanted any alphabetic character to be matched, we would need to include both ranges inside the same brackets: [a-zA-Z]. Of course, we can use numeric ranges here too, like ‘[0-9]’ to match ONLY numeric characters. Another special identifier, ‘\d’, does nearly the same thing; the ‘d’ stands for digit. (Strictly speaking, in .NET ‘\d’ also matches digits from other writing systems, such as Arabic-Indic digits, so [0-9] is the tighter choice when only 0 through 9 are acceptable.)
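
A short sketch in Python’s re module shows ranges and case sensitivity at work. Like .NET, Python treats ‘\d’ as any Unicode decimal digit by default, which the last two lines illustrate with the Arabic-Indic digit three (U+0663). The sample strings are made up for the illustration.

```python
import re

# Ranges are case sensitive: [a-z] alone skips the capitals.
print(re.findall(r"[a-z]", "Top Cat"))     # ['o', 'p', 'a', 't']
# Put both ranges inside one pair of brackets to accept either case.
print(re.findall(r"[a-zA-Z]", "Top Cat"))  # ['T', 'o', 'p', 'C', 'a', 't']

# [0-9] and \d both pick out the ordinary digits here.
print(re.findall(r"[0-9]", "Room 42B"))    # ['4', '2']
print(re.findall(r"\d", "Room 42B"))       # ['4', '2']

# But \d also accepts digits from other scripts; [0-9] does not.
print(re.fullmatch(r"\d", "\u0663") is not None)     # True
print(re.fullmatch(r"[0-9]", "\u0663") is not None)  # False
```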

The caret (^, Shift-6) has two quite different jobs. INSIDE brackets, when it is the first character, it negates the set, so the expression matches anything EXCEPT the characters listed: [^Aa] matches any character other than a lowercase or uppercase ‘A’. OUTSIDE brackets, it anchors the match to the BEGINNING of the string, so the characters that follow it must appear right at the start. For instance, ‘^Dav’ will be found in the string ‘David is here today’, but it will not be matched in ‘We’re going to find David’, since there ‘Dav’ is not at the beginning of the string.
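
Both uses of the caret can be seen side by side in a small Python sketch (standard re module; the word ‘Banana’ is just sample input).

```python
import re

# Inside brackets, a leading ^ negates the set: anything except 'A' or 'a'.
print(re.findall(r"[^Aa]", "Banana"))  # ['B', 'n', 'n']

# Outside brackets, ^ anchors the match to the start of the string.
print(re.search(r"^Dav", "David is here today") is not None)        # True
print(re.search(r"^Dav", "We're going to find David") is not None)  # False
```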

The dollar sign ($) is the counterpart of the caret: it anchors the match to the END of the target string, so it is written after the characters it applies to. For instance, ‘fox$’ will find a match in ‘silver fox’ but not in ‘the fox jumped over the log’.
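
Here is the dollar sign in a minimal Python sketch using the standard re module. The last line combines both anchors, which demands that the string contain nothing but the pattern.

```python
import re

# $ anchors the match to the end of the string, so it goes AFTER the text.
print(re.search(r"fox$", "silver fox") is not None)                   # True
print(re.search(r"fox$", "the fox jumped over the log") is not None)  # False

# ^ and $ together require the whole string to be exactly 'fox'.
print(re.search(r"^fox$", "fox") is not None)         # True
print(re.search(r"^fox$", "silver fox") is not None)  # False
```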

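The period (.) works like a wildcard: it matches any single character (except a newline). If you have used wildcards in databases, note that it stands for exactly ONE character, like the underscore (_) in a SQL LIKE clause rather than the percent sign (%). Let’s say the Regular Expression was ‘exp.’: it would find a match in ‘expression’, ‘experience’, or ‘exponential’, because each contains ‘exp’ followed by another character.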
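
A quick Python sketch (standard re module, made-up word list) shows that the period stands for exactly one character. The last two lines assume whole-string validation, as in an ASP.NET validator, which re.fullmatch imitates; under that rule ‘exp.’ accepts only four-character words.

```python
import re

# '.' stands for exactly one character (any character except a newline).
words = ["expression", "experience", "exponential", "exp", "export"]
print([w for w in words if re.search(r"exp.", w)])
# ['expression', 'experience', 'exponential', 'export']

# Matched against the WHOLE string, 'exp.' fits only four-character words.
print(re.fullmatch(r"exp.", "expo") is not None)        # True
print(re.fullmatch(r"exp.", "expression") is not None)  # False
```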

The question mark (?) matches the preceding character 0 or 1 times; in other words, it makes that character optional.
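
As a small Python illustration (standard re module), the pattern colou?r treats the ‘u’ as optional. The spellings tested are just sample input.

```python
import re

# u? lets the 'u' appear once or not at all, but never twice.
for word in ["color", "colour", "colouur"]:
    print(word, re.fullmatch(r"colou?r", word) is not None)
# color True
# colour True
# colouur False
```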

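The asterisk (*) matches the preceding character 0 or more times. Combined with the period, as .*, it matches any run of characters, which makes it the closest thing to the familiar ‘*’ wildcard.
The plus sign (+) matches the preceding character (or bracketed set, as in [a-z]+) 1 or more times.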
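
The difference between * and + shows up clearly when the repeated character is absent. Here is a minimal Python sketch with the standard re module; the short test strings are arbitrary.

```python
import re

# * allows zero or more of the preceding character; + requires at least one.
for text in ["ac", "abc", "abbbc"]:
    star = re.fullmatch(r"ab*c", text) is not None
    plus = re.fullmatch(r"ab+c", text) is not None
    print(text, star, plus)
# ac True False
# abc True True
# abbbc True True

# '.*' (any character, any number of times) acts as a catch-all.
print(re.fullmatch(r"David.*", "David is here today") is not None)  # True
```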

If we need to use one of the metacharacters literally, we need an ‘escape’ character: we precede the metacharacter with a backslash (\). For instance, since the dash is a range separator, to include the dash itself in a bracketed set we’d write it like this: [\-]. Likewise, \. matches an actual period and \$ an actual dollar sign.
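
The sketch below uses Python’s standard re module to show escaping both inside and outside brackets. The phone-number and price formats are only illustrative assumptions, not rules from this tutorial.

```python
import re

# Inside brackets, \- is a literal dash rather than a range separator.
print(re.findall(r"[0-9\-]", "555-1234"))
# ['5', '5', '5', '-', '1', '2', '3', '4']

# Outside brackets: \$ is a literal dollar sign, \. a literal period.
price = r"\$[0-9]+\.[0-9][0-9]"
print(re.fullmatch(price, "$19.99") is not None)  # True
print(re.fullmatch(price, "$19x99") is not None)  # False: 'x' is not a period
```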

Two additional metacharacters are the parentheses and the vertical bar (|), sometimes called a pipe in the DOS/Windows world. Parentheses group sections of the expression together, while the pipe separates alternatives: the expression matches either what is on its left or what is on its right. For instance, gr(a|e)y will find ‘gray’ or ‘grey’; the parentheses limit the choice to just the ‘a’ or the ‘e’.
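
A minimal Python sketch (standard re module) shows why the parentheses matter: without them, the pipe splits the entire pattern into two alternatives.

```python
import re

# Parentheses confine the choice to 'a' or 'e'.
for word in ["gray", "grey", "groy"]:
    print(word, re.fullmatch(r"gr(a|e)y", word) is not None)
# gray True
# grey True
# groy False

# Without them, | splits the whole pattern: 'gra' OR 'ey'.
print(re.fullmatch(r"gra|ey", "ey") is not None)  # True
```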

What we’ve seen here is only a glimpse of Regular Expression syntax, but hopefully it will help the next time you come across a Regular Expression. There is much more to learn; in later tutorials, we will delve deeper into other characters and how to assemble Regular Expressions. In the meantime, if you would like to see a web site full of Regular Expressions, go to http://www.regexlib.com/.
