Visual Basic Scripting Edition  

Alternation and Grouping

Alternation uses the '|' character to allow a choice between two or more alternatives. By adding alternation to the chapter heading regular expression, you can make it match more than just chapter headings. However, it's not as straightforward as you might think. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following expressions for JScript and VBScript match either 'Chapter' or 'Section' followed by one or two digits, anchored to the beginning and end of a line:

/^Chapter|Section [1-9][0-9]{0,1}$/
"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' followed by one or two digits at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent, so you need a way to limit what the alternation applies to.
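
The behavior described above can be checked with a short JScript-style sketch. The helper name firstMatch and the sample input strings are illustrative assumptions, not part of the original example:

```javascript
// Ungrouped alternation: the '|' splits the whole pattern into two branches,
// '^Chapter' and 'Section [1-9][0-9]{0,1}$'.
var ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

// Hypothetical helper: returns the matched text, or null if nothing matches.
function firstMatch(pattern, input) {
  var result = pattern.exec(input);
  return result === null ? null : result[0];
}

// Only the word 'Chapter' is matched; the digits are not part of the match.
var chapterMatch = firstMatch(ungrouped, "Chapter 22");

// The second branch matches the word and the digits together.
var sectionMatch = firstMatch(ungrouped, "Section 22");

// The first branch requires no digits at all, so this also matches 'Chapter'.
var strayMatch = firstMatch(ungrouped, "Chapter heading");
```

Because the first branch carries no digit requirement, a line such as 'Chapter heading' is accepted even though it is not a numbered heading.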

You can use parentheses to limit the scope of the alternation, that is, make sure that it applies only to the two words, 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, something that's covered later in the section on subexpressions. By taking the regular expressions shown above and adding parentheses in the appropriate places, you can make the regular expression match either 'Chapter 1' or 'Section 3'.

The following regular expressions use parentheses to group 'Chapter' and 'Section' so the expression works properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work properly except that an interesting by-product occurs. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes either of the two matching words to be captured for future use. Since there's only one set of parentheses in the expression shown above, there is only one captured submatch. This submatch can be referred to using the Submatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
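
As a minimal sketch of the captured submatch in JScript, the array returned by exec holds the whole match at index 0 and the captured word at index 1. The variable names and the sample input are assumptions made for illustration:

```javascript
// Grouped alternation: the parentheses limit '|' to the two words
// and also capture whichever word matched.
var grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

var result = grouped.exec("Section 22");

var wholeMatch = result[0];   // the entire matched text
var capturedWord = result[1]; // the text captured by the parentheses

// The same submatch is also exposed through the $1 property of the
// RegExp object, as mentioned in the text.
var legacyCapture = RegExp.$1;

// With grouping in place, a heading without digits no longer matches.
var noMatch = grouped.exec("Chapter heading"); // null
```

Reading the submatch from the array returned by exec avoids depending on the shared RegExp.$1 property, which is overwritten by every later successful match.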

Sometimes capturing a submatch is desirable, sometimes it's not. In the examples shown above, all you really want to do is use the parentheses to group a choice between the words 'Chapter' and 'Section'. You don't necessarily want to refer to that match later. In fact, unless you really need to capture submatches, don't use capturing parentheses. Your regular expressions will be more efficient because they won't have to spend the time and memory to store those submatches.

You can use '?:' before the regular expression pattern inside the parentheses to prevent the match from being saved for possible later use. The following modification of the regular expressions shown above provides the same capability without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"

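One way to see the difference between the capturing and non-capturing forms is to compare the arrays returned by exec in JScript. This is a sketch with illustrative variable names and sample input, not a measurement of the efficiency gain:

```javascript
var capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
var nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

var withCapture = capturing.exec("Chapter 1");
var withoutCapture = nonCapturing.exec("Chapter 1");

// Both forms match the same text.
var sameText = withCapture[0] === withoutCapture[0];

// The capturing form returns the whole match plus one submatch;
// the non-capturing form returns only the whole match.
var capturingLength = withCapture.length;       // 2
var nonCapturingLength = withoutCapture.length; // 1
```
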
In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for what are called lookahead matches. A positive lookahead, specified using '?=', matches the search string at any point where a string matching the regular expression pattern in parentheses begins. A negative lookahead, specified using '?!', matches the search string at any point where a string not matching the regular expression pattern begins.

For example, suppose you have a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all the references to Windows 95, Windows 98, and Windows NT and changing those references to Windows 2000. You can use the following JScript regular expression, which is an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:

/Windows (?=95 |98 |NT )/

To make the same match in VBScript, use the following:

"Windows (?=95 |98 |NT )"

Once a match is found, the search for the next match begins immediately after the matched text, which does not include the characters in the lookahead. For example, if the expressions shown above match in 'Windows 98', the search resumes just before '98', not after it.
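
The resume position can be observed in JScript through the lastIndex property of a global regular expression. The sketch below assumes a pattern with a space between 'Windows' and the lookahead, so that it matches text such as 'Windows 95 '. The helper name collectMatches and the sample sentence are also assumptions made for illustration:

```javascript
// Global flag so that exec continues from lastIndex on each call.
var pattern = /Windows (?=95 |98 |NT )/g;
var text = "Windows 3.1, Windows 95 and Windows NT are mentioned here.";

// Hypothetical helper: records each match and where the next search resumes.
function collectMatches(regex, input) {
  var found = [];
  var m;
  regex.lastIndex = 0; // always start from the beginning of the input
  while ((m = regex.exec(input)) !== null) {
    found.push({ text: m[0], index: m.index, resumeAt: regex.lastIndex });
  }
  return found;
}

var matches = collectMatches(pattern, text);

// The matched text is only 'Windows ' (with its trailing space); the version
// in the lookahead is not consumed, so the search resumes right before it.
var afterFirst = text.substr(matches[0].resumeAt, 2);  // "95"
var afterSecond = text.substr(matches[1].resumeAt, 2); // "NT"
```

'Windows 3.1' is skipped because the text after 'Windows ' does not satisfy the lookahead. Since the lookahead text is not part of the match, replacing only the matched text leaves the version string in place.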