4Penny.net

VB.NET

Notes, Tricks and Tips on VB.NET

April 2007 - Posts

  • Introduction to Regular Expressions: Part II

    A word of introduction to the 'introduction'. This is from the magazine ASP.NET. If you are reading this and you don't have a subscription, stop and get one now - it's invaluable.

     I only copy the article here because the information is so good - I don't want to have to waste time looking for it or worry that it will get archived.

     Again - get the mag. http://www.aspnetpro.com/ 

     

    Introduction to Regular Expressions: Part II

    Validating Expressions in .NET

     

     

    This is the second part of my Regular Expressions overview. In the first article I discussed the basics of creating Regular Expressions and provided a link to test expression patterns using the .NET Framework Regex classes. In this article I’ll discuss the various actions that can be taken to match values in a string using regular expressions. I’ll also discuss how you can implement a SQL CLR UDF that brings regular expression validation to the database level.

     

    I’ll start by discussing the use of the Regex classes in standard .NET applications. My examples are all based on a C# .NET console application designed to display the results of expression testing. You can expand these examples to apply to other program types. Please note that for all examples you must add a “using System.Text.RegularExpressions” statement to your code so you can refer to the Regex classes without their full namespace. (See end of article for download details to obtain sample projects for each of my examples.)

     

    Regular Expressions: Standard .NET Application Examples

    Using bool Regex.IsMatch(string input, string pattern). The first Regular Expression match option I’ll provide is the static method Regex.IsMatch(string input, string pattern). This method provides a quick way to receive a boolean result indicating whether a regular expression matches the input string provided by the user. The signature above is the simplest overload: it uses the default regular expression options and returns a boolean value indicating the success or failure of a match.

     

    Below is the code required to receive input from the user, and to test the input value for a match based on the regular expression:

    //Prompt the user, 2 separate input items (inputRegEx and inputMatchText)
    Console.WriteLine("Welcome to the Regular Expression demonstrator!");
    Console.Write("Please enter a regular expression string:");
    string inputRegEx = Console.ReadLine();
    Console.Write("Please enter a test string for matching:");
    string inputMatchText = Console.ReadLine();

    //Perform the match test, then output the result
    bool isDirectMatch = Regex.IsMatch(inputMatchText, inputRegEx);
    Console.WriteLine("Result of Regex.IsMatch(string input, string expression): " + isDirectMatch.ToString());

    Using this method I performed a test on the expression “b\w*” (without the quotes); this is to match a string that contains the letter b followed by zero or more word characters:

     

    Value      Result
    billy      true
    billy777   true
    b          true
    Billy      false
    aaaBill    false
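    The rows above can be reproduced without typing each value at the prompt. The following is a minimal sketch; the class and method names (IsMatchTable, Test, PrintTable) are made up for illustration and are not part of the article's sample projects.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsMatchTable
{
    // Pattern from the article: a lowercase b followed by zero or more word characters.
    public const string Pattern = @"b\w*";

    // True when the pattern is found anywhere in the input; IsMatch does not
    // require the whole string to match unless the pattern is anchored.
    public static bool Test(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Prints one row per test value, mirroring the table above.
    public static void PrintTable()
    {
        string[] values = { "billy", "billy777", "b", "Billy", "aaaBill" };
        foreach (string value in values)
        {
            Console.WriteLine(value.PadRight(12) + Test(value));
        }
    }
}
```

    Calling IsMatchTable.PrintTable() from a console application's Main method writes one line per value. Note that "aaaBill" fails only because it contains no lowercase b; a value such as "aaabill" would match, since the pattern may match anywhere in the string.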

     

     

    The results reported above are to be expected. By default, Regular Expression matches are case sensitive, so the expression above ONLY matches an input string that contains a lowercase b. At times you want to validate that a string contains a particular string, but you don’t care whether a letter is uppercase or lowercase. You could modify your expression to be “[bB]\w*” to allow an uppercase or lowercase letter b; however, this can add an extra level of confusion to your expression. The .NET Framework provides a RegexOptions enumeration you can use to provide additional options when matching expressions. We’ll discuss using this next.

     

    Using bool Regex.IsMatch(string input, string pattern, RegexOptions options). This overload lets us pass a value from the RegexOptions enumeration, either a single option or several options combined as bit flags. In this article we’ll only discuss the IgnoreCase option; for the other available options, please see the following MSDN article: http://msdn2.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx. This example code performs a case-insensitive match:

    //Perform the match test, using the ignore case option
    bool isIgnoreCaseMatch = Regex.IsMatch(inputMatchText, inputRegEx, RegexOptions.IgnoreCase);
    Console.WriteLine("Result of Regex.IsMatch(string input, string expression, RegexOptions options): " + isIgnoreCaseMatch);

    The change to this example is very simple, yet the effect on the valid matches is great. Below is the truth table for the same examples used in the first test of the article. The same regular expression text was used; the only difference was that RegexOptions.IgnoreCase was provided:

    Value      Result
    billy      true
    billy777   true
    b          true
    Billy      true
    aaaBill    true
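    Because RegexOptions is a flags enumeration, several options can be combined with the bitwise OR operator. The sketch below is illustrative only: it assumes you also want ^ to match at the start of every line (RegexOptions.Multiline), and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Two flags combined with bitwise OR: ignore case, and let ^ and $
    // match at line breaks as well as at the ends of the whole string.
    public const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Multiline;

    // True when any line of the input starts with the letter b (either case).
    public static bool AnyLineStartsWithB(string input)
    {
        return Regex.IsMatch(input, @"^b\w*", Options);
    }
}
```

    With these options, the two-line input "alpha\nBilly" matches because the second line starts with B, while the single-line input "alpha Billy" does not.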

     

     

    This shows you the power of the regular expression options; they let you build more flexibility into your validation system. The Regex.IsMatch method is a great tool, but it won’t always suit your needs. There are times when you need to know how many matches a string contains, or where they occur, rather than just whether one exists. We’ll discuss some other options available in the following section.

     

    Using MatchCollection Regex.Matches(string input, string pattern). The Regex.Matches method checks a string for multiple matches of an expression. This can be helpful when validating license keys or other types of input that might have multiple occurrences of the same pattern. The inputs for this method are the same as for the IsMatch method; however, this method returns a MatchCollection object that contains one Match object for each successful match. The Match object exposes properties that describe the match. The most helpful are Index, Length, and Value. Index is the position in the input string of the first character of the match; this allows you to extract the match if needed. Length is the number of characters contained in the match, and Value is the actual matched string.
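    To make the relationship between Index, Length, and Value concrete, here is a small sketch. The license-key-style input and the pattern [A-Z0-9]{4} are made-up examples, as are the class and method names; the point is only that Substring(Index, Length) recovers the same text as Value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Returns one "index:length:text" entry for every match of the pattern.
    public static List<string> Describe(string input, string pattern)
    {
        List<string> rows = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Extracting by position yields the same text as m.Value.
            string extracted = input.Substring(m.Index, m.Length);
            rows.Add(m.Index + ":" + m.Length + ":" + extracted);
        }
        return rows;
    }
}
```

    For the input "ABCD-1234-EFGH" this returns three entries, one for each four-character block, and each entry starts with the zero-based position of that block.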

     

    Below is the code required to attempt a multiple match and then output the results of the match:

    //Perform Matches() test
    MatchCollection oMatches = Regex.Matches(inputMatchText, inputRegEx);
    Console.WriteLine("Matches found using Regex.Matches(string input, string pattern): "
                       + oMatches.Count.ToString());

    Console.WriteLine("Match Detail, if applicable");

    //Loop through the collection, this will be skipped if no match
    foreach (Match oMatch in oMatches)
    {
       Console.WriteLine("Match Index: " + oMatch.Index.ToString());
       Console.WriteLine("Match Length: " + oMatch.Length.ToString());
       Console.WriteLine("Match Value: " + oMatch.Value);
       Console.WriteLine("");
    }

    Standard Usage Summary. Using the above examples should help you get started with Regular Expression validation in .NET. The .NET Framework provides many methods and classes to validate and work with Regular Expressions and this article has only scratched the surface. Please see the below section to see how to create a CLR UDF to validate Zip Code Input!

    Regular Expressions in SQL CLR User Defined Functions

    A place where Regular Expression validation can become very handy is in SQL Server 2005 CLR User Defined Functions and Stored Procedures. Prior to the ability to use CLR functions and procedures in SQL Server it was very cumbersome to implement sophisticated string validation at the database level. Now with the SQL CLR functionality you can quickly build procedures that can be used to validate input on SQL Server. Below is an example of how to create a CLR User Defined Function to validate a zip code based on the expression we created in Part I of this article series.

     

    First, before building this example function, we must ensure that CLR Integration is enabled on your SQL Server instance. If it is not, you may run the following script to enable it:

    sp_configure 'clr enabled', 1
    GO
    RECONFIGURE
    GO
    
    
     

    When this has been completed you’ll want to create a new SqlServer project. You can create this project by selecting New Project from the File menu in Visual Studio. You’ll find the SqlServer project type under Visual C# | Database | SqlServer. (NOTE: you may also create the UDF in Visual Basic by selecting the SqlServer project type from the Visual Basic project listing.) When you create the project it will request that you provide it a link to your SQL Server. This is needed for the automatic deployment and configuration of your stored procedure.

     

    Once your project has been created you’ll want to right-click on the project and select Add | User-Defined Function; you’ll then be asked to give it a name. In our case we’ll call it ValidateZip.cs to keep the name short and simple. Visual Studio will then provide a shell to place the code for your validation method. In our case we’ll want to be sure to set the return type to “bool” as it is simply a yes or no answer. We’ll also want to ensure that an input string value was provided. Because a zip code is an all or nothing validation, we’ll use the Regex.IsMatch method with a specific validation string to provide the result to the calling user. Below is the completed code to validate the zip code (NOTE: all User-Defined functions intended for use in SQL Server must be declared as public and static!):

     

    [Microsoft.SqlServer.Server.SqlFunction]
    public static bool ValidateZip(string input)
    {
       //Ensure an input value was provided; a missing (NULL) value is treated as invalid
       if (input == null) return false;

       //Declare our expression
       string expression = @"^\d{5}(-\d{4})?$";

       //Test the input and return the result
       return Regex.IsMatch(input, expression);
    }

    Now that you have the function built you can simply right-click on your project and select Deploy. Visual Studio will then register your function with SQL Server and you can freely use this validation function in your SQL queries. Below is a sample SQL query to retrieve the validation result for a local Des Moines, Iowa zip code. If validation succeeds, a 1 is returned; if it fails, a 0 is returned.

     

    select dbo.ValidateZip('50320')

     

    Conclusion

    This article shows you the basics of Regular Expression validation in .NET, as well as how to incorporate regular expression validation into new SQL Server CLR User Defined Functions. This should serve as a great starting point for understanding the various methods to implement regular expression validation in your new and existing projects. The download file contains two sample projects and the sample code used in this article. Please feel free to review this code and let me know any questions you might have.

  • Introduction to Regular Expressions: Part I

    A word of introduction to the 'introduction'. This is from the magazine ASP.NET. If you are reading this and you don't have a subscription, stop and get one now - it's invaluable.

     I only copy the article here because the information is so good - I don't want to have to waste time looking for it or worry that it will get archived.

     Again - get the mag. http://www.aspnetpro.com/ 

    Introduction to Regular Expressions: Part I

    Creating Expressions

     

     

    This two-part article series provides a quick and practical introduction to using regular expressions. Regular expressions can be used for many things; however, they are typically used for input validation or to perform advanced searches on text in supporting applications. This first article will explain how to create a regular expression pattern; the expression defines what is considered a match. The second article will provide details on how to implement regular expressions in .NET applications.

     

    Before starting I’d like to point out I have a free regular expression tester available on my Web site (http://www.mitchelsellers.com); you can use this to test the behavior of your regular expressions. During the second article I’ll discuss the specific options available on this test page, as well as how the page was created.

     

    Regular expressions have three basic types of symbols that are used: meta characters, escape characters, and character classes. The following table lists the important meta character(s), a short description, and an example of each.

     

    Meta Characters

    Character  Example    Matches               Description
    ^          ^abc       abc, abc123, abcdefg  Indicates the start of a string; used to match a specific beginning sequence.
    $          abc$       123456789abc, 987abc  Indicates the end of a string; used to match a specific ending sequence.
    .          a.c        abc, aac, a9c         Any character excluding \n (new line).
    |          john|jane  jane, john            Or operator; used to specify one criterion or another.
    *          12c*       12, 12c, 12cc         Zero or more of the previous expression.
    +          1a+c       1ac, 1aac             One or more of the previous expression.
    ?          12?c       1c, 12c               Zero or one of the previous expression.
    \          1\*a       1*a                   Escape character; used to make any of the special characters (^, $, ., |, *, +, ?, (, [, {, etc.) literal for matching. See the next chart for other escape characters.
    {....}     12a{2}     12aa, 12aa3           Explicit quantifier notation; used to require a specific number of occurrences of a character or character class. A comma can be added to provide min/max occurrences.
    [....]     123[abc]   123a, 123b, 123c      Matches any one character from a set; you can provide collections of characters (abcdefg), as well as hyphenated ranges of characters (A-Z).
    (....)     (123){2}   123123                Groups a portion of the expression so it can be treated as a unit (for example, quantified) or captured for display.
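    A few rows of the meta character table can be checked with a short sketch. The helper below is hypothetical: it wraps each pattern in ^(?:...)$ so that the whole input has to match, which makes the quantifier behavior easier to see than an unanchored search would.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaCharacterDemo
{
    // True when the entire input matches the pattern. The non-capturing
    // group keeps alternations such as john|jane inside the anchors.
    public static bool FullMatch(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }
}
```

    For example, FullMatch("12c*", "12") is true because * allows zero occurrences of c, while FullMatch("1a+c", "1c") is false because + requires at least one a.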

     

    The characters in the table below are used to match special characters in regular expressions; we will use some of these later in this article. NOTE: This is a list of commonly used escape characters, not a complete list of escape characters.

     

    Escape Characters

    Character             Matches
    \b                    Word boundary; a zero-width position between a word character and a non-word character (such as a space), or the start or end of the string.
    \t                    Tab character.
    \n                    New line character (great for multi-line textboxes).
    \(any metacharacter)  Matches the entered meta character (\* matches *, \$ matches $).

     

    Below are character classes that represent different groups of characters to make it easier to match common groups of characters.

     

    Character Classes

    Character Class  Example      Matches        Description
    .                a.c          aac, abc, a1c  Matches any character except \n. If the Singleline option is enabled, it matches ANY character.
    [rstlne]         a[rstlne]    ar, as, al     Matches any single character in the provided list.
    [^aeiou]         a[^aeiou]    ab, ad, ah     Matches any single character NOT in the provided list.
    [0-9a-zA-Z]      123[0-9A-F]  123A, 1234     Matches any single character in the listed ranges (0 through 9, a through z, and A through Z). The hyphen indicates a range.
    \w               123\w        123a, 1234     Matches any word character; in ECMAScript mode this is the same as [a-zA-Z_0-9].
    \W               123\W        123$, 123-     Matches any NON-word character; in ECMAScript mode this is the same as [^a-zA-Z_0-9].
    \s               123\sa       123 a          Matches any whitespace character; in ECMAScript mode this matches spaces, tabs, and new lines.
    \S               1\Sa         14a, 1ba       Matches any NON-whitespace character.
    \d               \d2          12, 32         Matches any digit character; in ECMAScript mode this matches 0-9.
    \D               \D2          a2, b2         Matches any NON-digit character; in ECMAScript mode this matches anything that is not 0-9.
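    The ECMAScript notes in the table matter because, by default, the .NET classes \d and \w are Unicode-aware. The sketch below compares the two modes; the class and method names are illustrative, and the sample characters (an Arabic-Indic digit and an accented letter, written as escapes) are simply convenient non-ASCII inputs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaScriptDemo
{
    // Match using the default, Unicode-aware character classes.
    public static bool DefaultMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    // Match with the classes restricted to their ECMAScript (ASCII) definitions.
    public static bool EcmaMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.ECMAScript);
    }
}
```

    With these helpers, DefaultMatch("\u0663", @"\d") is true but EcmaMatch("\u0663", @"\d") is false, and the same split applies to "\u00E9" with \w. If input should be limited to the characters 0-9, either use the ECMAScript option or spell out [0-9] in the expression.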

     

    How to Apply this Information

    Now that we’ve explained the various characters included in matching regular expressions, let’s walk through some practical examples to illustrate how all these items are pulled together. In the following subsections I’ll walk you through a series of real-world validations and provide examples with detailed information.

     

    Before beginning the examples I want to point out that in ALL of my examples the regular expressions created start with the ^ character and end with the $ character. This is done to ensure that the expression matches the entire string, and ONLY that string. Otherwise, you can receive matches for strings that contain additional characters before or after the part that matches. You may play around with this using my expression tester to see the effects of omitting the ^ and $ characters.
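    The effect of the anchors is easy to see side by side. This is a minimal sketch with hypothetical method names, using a five-digit pattern as the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: true if five consecutive digits appear anywhere in the input.
    public static bool ContainsFiveDigits(string input)
    {
        return Regex.IsMatch(input, @"\d{5}");
    }

    // Anchored: true only if the input consists of five digits and nothing else.
    public static bool IsExactlyFiveDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d{5}$");
    }
}
```

    The input "abc12345xyz" passes the unanchored test but fails the anchored one, which is why every validation expression in the following sections begins with ^ and ends with $.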

     

    Postal Code Validation

    Postal code validation is a very common user input validation; typically, your postal code will either be five digits or nine digits, with a hyphen after the fifth digit. We can validate this input with the following expression:

     

    ^\d{5}(-\d{4})?$

     

    First we have the “\d{5}” portion of the expression, which indicates that the input must start with five digit characters (0-9). Next, the portion of the expression inside the parentheses, “-\d{4}”, indicates a hyphen (-) followed by four digit characters. This is grouped within parentheses and has a question mark appended to the end. The question mark indicates that the input should have zero or one of the preceding item, which here is the entire expression in the parentheses. Therefore, in the case of zero, the expression would simply be five digit characters; in the case of one, the expression would be five digits, a hyphen, and four more digits.
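    As a sketch of how the optional group behaves in code, the helper below uses the expression exactly as written above and inspects group 1 (the parenthesized "-dddd" part) to tell the two accepted forms apart. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipDemo
{
    public const string Pattern = @"^\d{5}(-\d{4})?$";

    // True for both the five-digit and the nine-digit form.
    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // True only for the nine-digit form: group 1 participates in the
    // match only when the optional hyphen and four digits are present.
    public static bool HasPlusFour(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success && m.Groups[1].Success;
    }
}
```

    Here "50320" and "50320-1234" are both valid, but only the second reports HasPlusFour; inputs such as "5032" or "50320-12" fail validation entirely.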

     

    Simple Date Validation

    Validation of date input is another very common occurrence. Full regular expression date validation is very involved; however, it is very easy to restrict users to a MM/DD/YYYY format with basic checking for incorrect input. Below is a regular expression to validate a date in the MM/DD/YYYY format; I’ve added parentheses for readability:

     

    ^([01]\d)/([0-3]\d)/(\d{4})$

     

    The first section of this expression, “([01]\d)”, represents the month portion of our date. Because there are only 12 months in the year, we restrict the first digit to either a zero or a one, and the second character can be any number 0-9. Note that this still accepts values such as 00 and 13 through 19. This is one portion of this example that can be improved upon; you can modify and create regular expressions that are capable of validating that the input is between 1 and 12 (however, this is outside the scope of this article).

     

    The second section of this expression, “([0-3]\d)”, represents the day portion of our date. This is separated from our first part by a / character, which is a literal requirement that the month be separated from the day by a forward slash. The first part of our day check requires that the first digit of the day is a 0, 1, 2, or 3; the second digit can be any number 0-9. Just as with the month portion, this can be expanded to ensure that the day value is appropriate for the month provided; however, it is outside the scope of this article.

     

    The final section of this expression is again separated by a / character, then it allows for four digit characters to be entered. This forms the final portion of the date.
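    Since the expression checks only the shape of the input, one possible follow-up (an assumption on my part, not something the article's sample projects do) is to hand shape-valid input to DateTime.TryParseExact for the calendar check. The class and method names below are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateDemo
{
    public const string Pattern = @"^([01]\d)/([0-3]\d)/(\d{4})$";

    // Shape check only: two digits, slash, two digits, slash, four digits,
    // with the first month digit limited to 0-1 and the first day digit to 0-3.
    public static bool HasDateShape(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Shape check followed by a real calendar check.
    public static bool IsRealDate(string input)
    {
        DateTime parsed;
        return HasDateShape(input)
            && DateTime.TryParseExact(input, "MM/dd/yyyy",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}
```

    An input such as "13/32/2007" has a valid shape but is rejected by IsRealDate, as is "02/30/2007"; "04/15/2007" passes both checks.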

     

    Phone Number Validation

    Phone numbers, including area codes and extensions, are another common input to validate. Below is a sample regular expression that validates a phone number in one of the following formats: (555) 555-1212, 555 555-1212, (555) 555-1212 x1111, or 555 555-1212 x1111. The expression is made up of three parenthesized groups, each of which is explained below:

    ^(\(\d{3}\)\s|\d{3}\s)(\d{3}[\s-]\d{4})(\sx\d+)?$

    The first group of this expression validates the area code input. Notice that it contains two alternatives separated by the or operator (|). This indicates that one of the two expressions must match. The first one validates on a left parenthesis (, three digits, a right parenthesis ), and a space; the second option validates on three digits and a space. Therefore, the phone number must begin with either (515) or 515, followed by a space; this validates the area code portion of our phone number.

    The second group of this expression validates the remaining portion of the standard phone number. The first part “\d{3}” requires three digits, then the “[\s-]” allows for either a space or a hyphen. This is then followed by the “\d{4}” portion, which indicates that an additional four digits are required. We now have validation for a standard 10-digit phone number with support for multiple formats.

    The third group of this expression validates the optional telephone extension. The expression “\sx\d+” indicates that the input string should have a space, the letter x, and then one or more digits. This is enclosed in parentheses and followed by a question mark to indicate that it is optional. This provides for validation of numbers such as (555) 555-1212 x102.
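    The three parts of the phone expression can also be read back in code. The sketch below writes the expression without any whitespace between its groups and uses group 3 to pull out the extension digits; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneDemo
{
    // Area code group, main number group, optional extension group.
    public const string Pattern = @"^(\(\d{3}\)\s|\d{3}\s)(\d{3}[\s-]\d{4})(\sx\d+)?$";

    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Returns the extension digits, or null when the input does not match
    // or carries no extension.
    public static string Extension(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success || !m.Groups[3].Success) return null;
        // Group 3 holds " x" followed by the digits; skip the first two characters.
        return m.Groups[3].Value.Substring(2);
    }
}
```

    With this pattern, "(555) 555-1212" and "555 555-1212 x1111" are valid, and Extension returns "1111" for the latter. An area code followed by a hyphen, as in "555-555-1212", is not accepted, because both area code alternatives end in \s.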

     

    This should provide a helpful overview of regular expressions. Stay tuned for Part II.

     

    Mitchel W. Sellers is a Microsoft Certified Professional Developer with multiple specializations. He’s been developing in .NET since shortly after the release of .NET 1.1. He is the Co-Founder of a startup software consulting firm, IowaComputerGurus L.L.P. He is also very active in multiple online communities, including GotDotNet and DotNetNuke. Find out more about him at http://www.mitchelsellers.com or e-mail him at mitchel.sellers@gmail.com.

     
