An Introduction to Regular Expressions and Their Use in .NET II...
Regular expressions may look a little strange at first, but they are very powerful and you will probably need to use them at some point.
In the first article in this series we introduced the background to regular expressions and began to look at the support for them in .NET. In this article we'll look at that support in more detail.
The examples presented are in VB.NET for Windows Forms. They can be easily modified for ASP.NET if preferred.
As well as allowing us to create regular expressions, the Regex class provides a number of useful methods for manipulating string data with them. We would create a Regex object if we had a regular expression we wanted to use repeatedly on different strings. However, recall that many of the class's methods are shared. If a regular expression is needed only once, there is no need to create a Regex object; we can simply call the shared method required. For example, we can use the shared Replace() method:
Regex.Replace(input_string, regular_expression, replacement_string)
The Regex class supports a number of options via the RegexOptions enumeration. Common examples are IgnoreCase, RightToLeft, Multiline and Singleline. To use an option with a shared method, pass it as a parameter:
Regex.IsMatch(MyString,"xyz", RegexOptions.IgnoreCase)
To set more than one option, combine the flags with the Or operator, e.g. RegexOptions.IgnoreCase Or RegexOptions.Multiline.
We'll return to IsMatch shortly.
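As a minimal sketch of combining options, the console program below passes two flags to the shared IsMatch() method. The module name and input string are invented for illustration, and Console.WriteLine is used in place of MessageBox.Show so that the sketch runs as a plain console application.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module OptionsSketch
    Sub Main()
        ' Hypothetical two-line input; the second line starts with "XYZ".
        Dim input As String = "abc" & ControlChars.Lf & "XYZ"

        ' Combine flags with the bitwise Or operator.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline

        ' With Multiline, ^ also matches at the start of each line,
        ' and with IgnoreCase, "xyz" matches "XYZ".
        Console.WriteLine(Regex.IsMatch(input, "^xyz", options))                  ' True

        ' With IgnoreCase alone, ^ only matches at the very start of the input.
        Console.WriteLine(Regex.IsMatch(input, "^xyz", RegexOptions.IgnoreCase)) ' False
    End Sub
End Module
```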
The two main ways of creating a Regex object are:
Regex(RegularExpressionPatternString)
Regex(RegularExpressionPatternString, OptionsEnumeration)
E.g.
Dim myRegex As New Regex("xyz")
' Shows False, as "xyz" does not occur in the input string
MessageBox.Show(myRegex.IsMatch("Match found").ToString())
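The second constructor form, and the idea of reusing one Regex object across several strings, might be sketched as follows. The input strings are hypothetical, and Console.WriteLine stands in for MessageBox.Show so that the code runs as a console application.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReuseSketch
    Sub Main()
        ' Build the Regex once, using the (pattern, options) constructor.
        Dim myRegex As New Regex("xyz", RegexOptions.IgnoreCase)

        ' Hypothetical inputs, chosen only for illustration.
        Dim inputs() As String = {"abcXYZ", "no match here", "xyzzy"}

        ' Reuse the same object for every input string.
        For Each candidate As String In inputs
            Console.WriteLine(candidate & ": " & myRegex.IsMatch(candidate).ToString())
        Next
        ' Prints:
        ' abcXYZ: True
        ' no match here: False
        ' xyzzy: True
    End Sub
End Module
```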
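Note that .NET strings are Unicode, and by default the Regex object matches against Unicode text. For example, \w and \d match Unicode letters and digits, not just their ASCII equivalents. The RegexOptions.ECMAScript option restricts these character classes to their ASCII ranges.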
The IsMatch() method indicates whether the regular expression finds a match in the input string.
Overload list:
Public Function IsMatch(strInputString) As Boolean
Public Function IsMatch(strInputString, intStartPosition) As Boolean
Public Shared Function IsMatch(strInputString, strRegExString) As Boolean
Public Shared Function IsMatch(strInputString, strRegExString, enumRegexOptions) As Boolean
If a match is found the method returns true, else it returns false.
Note that the latter two members are shared, whereas the first two operate on a Regex object. The parameters are largely self-explanatory; intStartPosition specifies the character position in the string at which matching begins.
See the last section for an example.
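The difference between the instance and shared overloads of IsMatch(), and the effect of a start position, can be sketched like this. The input string is invented for illustration, and the output goes to the console rather than a message box.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IsMatchSketch
    Sub Main()
        Dim input As String = "xyz then nothing else"
        Dim myRegex As New Regex("xyz")

        ' Instance overload: search the whole string.
        Console.WriteLine(myRegex.IsMatch(input))      ' True (match at position 0)

        ' Instance overload with a start position: the only "xyz"
        ' begins at position 0, so searching from position 1 fails.
        Console.WriteLine(myRegex.IsMatch(input, 1))   ' False

        ' Shared overload with options: no Regex object is needed.
        Console.WriteLine(Regex.IsMatch(input, "XYZ", RegexOptions.IgnoreCase)) ' True
    End Sub
End Module
```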
The Replace() method replaces all occurrences of a character pattern defined by a regular expression with a specified replacement string.
It has a number of overloaded members, both shared and non-shared. The two simplest shared options are:
Public Shared Function Replace(strInputString, strRegExString, strReplacementString) As String
Public Shared Function Replace(strInputString, strRegExString, strReplacementString, enumRegexOptions) As String
E.g.
Using a shared method to correct a common capitalisation error, replacing Cymru-web with Cymru-Web.
Dim strInputString as string = "Welcome to the Cymru-web web site"
strInputString = Regex.Replace(strInputString, "Cymru-web","Cymru-Web")
Additional functionality is available via the non-shared methods, where we have the choice of specifying a maximum number of replacements and a start index. As the pattern is held by the Regex object itself, the second parameter here is the replacement string:
Public Function Replace(strInputString, strReplacementString, intMaxReplacements) As String
Public Function Replace(strInputString, strReplacementString, intMaxReplacements, intStartPosition) As String
amongst others.
A value of -1 for intMaxReplacements means 'replace all matches'.
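A short sketch of the instance Replace() overloads follows. Note that the instance methods take the replacement string rather than the pattern, because the pattern is already held by the Regex object. The input string is hypothetical and output goes to the console.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceSketch
    Sub Main()
        Dim myRegex As New Regex("web")
        Dim input As String = "web web web web"

        ' Replace at most two matches, scanning from the start of the string.
        Console.WriteLine(myRegex.Replace(input, "Web", 2))      ' Web Web web web

        ' Replace at most one match, starting the search at position 4;
        ' the text before the start position is left untouched.
        Console.WriteLine(myRegex.Replace(input, "Web", 1, 4))   ' web Web web web

        ' -1 means "replace every match".
        Console.WriteLine(myRegex.Replace(input, "Web", -1))     ' Web Web Web Web
    End Sub
End Module
```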
The Split() method splits an input string into an array of substrings at the positions defined by a regular expression match. The two shared methods are:
Public Shared Function Split(strInputString, strRegExString) As String()
Public Shared Function Split(strInputString, strRegExString, enumRegexOptions) As String()
The non-shared members are similar to those we saw for Replace(): they allow us to specify the maximum number of substrings to return (the last substring holds the unsplit remainder of the input) and the position at which pattern matching starts.
Public Function Split(strInputString) As String()
Public Function Split(strInputString, intMaxSubstrings) As String()
Public Function Split(strInputString, intMaxSubstrings, intStartPosition) As String()
E.g.
take the example of a comma delimited string we wish to split at the commas via a shared method:
Dim strInputString As String = "string1,string2,string3,string4"
Dim SplitResults() As String
Dim strElement As String
Dim strResults As String = ""
' Split at each comma; the commas themselves are not returned
SplitResults = Regex.Split(strInputString, ",")
For Each strElement In SplitResults
    strResults += strElement & ControlChars.CrLf
Next
MessageBox.Show(strResults)
Which will split the input string to:
string1
string2
string3
string4
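The instance Split() overload with a count, and a pattern that does a little more than a literal comma, can be sketched as below. The second input string and the \s*,\s* pattern are assumptions chosen for illustration, and String.Join is used only to print each array on one line.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitSketch
    Sub Main()
        ' Limit the result to two substrings; the last one holds
        ' the unsplit remainder of the input.
        Dim commaRegex As New Regex(",")
        Dim parts() As String = commaRegex.Split("string1,string2,string3,string4", 2)
        Console.WriteLine(String.Join(" | ", parts))   ' string1 | string2,string3,string4

        ' A pattern can absorb optional whitespace around each comma,
        ' which a plain String.Split on "," would leave in the results.
        Dim tidy() As String = Regex.Split("one , two,three ,four", "\s*,\s*")
        Console.WriteLine(String.Join(" | ", tidy))    ' one | two | three | four
    End Sub
End Module
```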
The Match and MatchCollection classes allow us to obtain the details of each match made via a regular expression. The Match class represents a single match. The MatchCollection is a collection of these Match objects. The Regex object's Match() or Matches() methods are used to retrieve the matches. Looking at the overloaded Match() members:
Public Function Match(strInputString) As Match
Public Function Match(strInputString, intStartPosition) As Match
Public Function Match(strInputString, intStartPosition, intLength) As Match
Public Shared Function Match(strInputString, strRegExString) As Match
Public Shared Function Match(strInputString, strRegExString, enumRegexOptions) As Match
Note that each returns a Match object that contains details of the first match made.
E.g.
Dim strInputString As String = "Welcome to the Cymru-Web.net web site. Cymru-Web specialise in developing " & _
    ".NET solutions to your information system requirements."
Dim myMatch As Match
Dim myRegex As New Regex("Cymru-Web")
myMatch = myRegex.Match(strInputString)
Do While myMatch.Success
    MessageBox.Show(myMatch.Value)
    ' NextMatch() returns a new Match; assign it, or the loop never ends
    myMatch = myMatch.NextMatch()
Loop
As you may have guessed, the NextMatch() method searches for the next successful match in the string, starting from the end of the previous match. The first search begins at the start of the string by default, or at a specified position if an overload taking a start position is used.
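The Matches() method mentioned earlier returns every match at once as a MatchCollection, which avoids the explicit NextMatch() loop. A minimal sketch, using a shortened version of the input string above and writing to the console rather than a message box:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchesSketch
    Sub Main()
        Dim input As String = "Welcome to the Cymru-Web.net web site. " & _
            "Cymru-Web specialise in .NET solutions."

        ' The shared Matches() method returns a collection of Match objects.
        Dim allMatches As MatchCollection = Regex.Matches(input, "Cymru-Web")
        Console.WriteLine(allMatches.Count)   ' 2

        ' Each element is an ordinary Match, with the same members
        ' as the object returned by Match().
        For Each m As Match In allMatches
            Console.WriteLine(m.Value & " at position " & m.Index.ToString())
        Next
    End Sub
End Module
```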
Some of the more useful members of the Match class are:
Methods
| NextMatch | Returns a new Match with the results for the next match, starting at the position at which the last match ended (at the character beyond the last matched character). |
| Result | Returns the expansion of the passed replacement pattern for use with Groups. For example, if the replacement pattern is $1$2, Result returns the concatenation of Groups[1].Value and Groups[2].Value (Groups(1).Value and Groups(2).Value in Visual Basic). |
| Synchronized | Returns a Match instance equivalent to the one supplied that is safe to share between multiple threads. |
Properties
| Captures | Gets a collection of all the captures matched by the capturing group. The collection may have zero or more items. |
| Groups | Gets a collection of groups matched by the regular expression. |
| Index | The position in the original string where the first character of the captured substring was found. |
| Length | The length of the captured substring. |
| Success | Gets a boolean value indicating whether the match is successful. |
| Value | Gets the captured substring from the input string. |
Note that we'll return to look at some of the above concepts (e.g. groups) in the next article of this series.
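The simpler properties listed above (Success, Value, Index and Length) can be tried out with a sketch like the following. The input string and the \d+ pattern are invented for illustration, and output goes to the console.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchPropertiesSketch
    Sub Main()
        Dim input As String = "Order 12345 shipped"

        ' \d+ matches the first run of one or more digits.
        Dim m As Match = Regex.Match(input, "\d+")
        If m.Success Then
            Console.WriteLine(m.Value)    ' 12345
            Console.WriteLine(m.Index)    ' 6
            Console.WriteLine(m.Length)   ' 5

            ' Index and Length identify the same text within the input.
            Console.WriteLine(input.Substring(m.Index, m.Length) = m.Value)   ' True
        End If

        ' When nothing matches, Success is False.
        Dim miss As Match = Regex.Match(input, "xyz")
        Console.WriteLine(miss.Success)   ' False
    End Sub
End Module
```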
In this second article in the series on regular expressions we've looked in a little more detail at the support for regular expressions as provided by .NET. In the next article we'll continue this progression looking at more complex topics such as groups, substitutions and backreferences as well as introducing some common regular expression patterns you may well find useful in your day-to-day coding activities.
VB.NET Text Manipulation Handbook, Liger et al., Wrox
.NET Framework SDK v1.1