Regular Expressions

This appendix explains how to use regular expressions. It contains the following sections: Description and How to Use Regular Expressions.

Description

The components of a regular expression are:

A regular expression consists of zero or more branches, separated by pipe characters (|). It matches anything that matches one of the branches.

A branch is zero or more concatenated pieces. It matches a match for the first piece, followed by a match for the second piece, and so on.

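A piece is an atom, optionally followed by an asterisk (*), plus (+), or question mark (?). An atom followed by an asterisk matches zero or more matches of the atom; followed by a plus, it matches one or more; followed by a question mark, it matches either one match of the atom or nothing.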

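An atom is one of the following: a regular expression in parentheses, which matches a match for that regular expression; a range (described below); a period (.), which matches any single character; a caret (^), which matches the null string at the beginning of a line; a dollar sign ($), which matches the null string at the end of a line; a backslash (\) followed by a single character, which matches that character; or a single character with no other special meaning, which matches that character.

The escape sequences \t, \n, \b, \r, and \f represent tab, newline, backspace, carriage return, and form feed, respectively.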
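To experiment with this grammar, the following minimal sketch uses Python's re module. This is an assumption: re follows a broadly compatible syntax but is not the matcher used by this product, and the pattern names are hypothetical. re.fullmatch is used so that the entire string must match. Note that outside brackets Python treats \b as a word boundary rather than a backspace, so that escape is left out.

```python
import re

# Branches: "cat|dog" matches anything that matches either branch.
branches = re.compile(r"cat|dog")

# Pieces: an atom with an optional suffix.
#   b*  zero or more b    c+  one or more c    d?  zero or one d
pieces = re.compile(r"ab*c+d?")

# Atoms: a parenthesized group, an escaped period, and plain characters.
atoms = re.compile(r"(ab)\.x")

# Escape sequence: \t in the pattern stands for a tab character.
tab_sep = re.compile(r"a\tb")

print(bool(branches.fullmatch("dog")))  # True
print(bool(pieces.fullmatch("acc")))    # True: b* matches nothing, c+ matches "cc"
print(bool(pieces.fullmatch("abd")))    # False: c+ needs at least one "c"
print(bool(atoms.fullmatch("ab.x")))    # True: \. matches a literal period
print(bool(atoms.fullmatch("abzx")))    # False
print(bool(tab_sep.fullmatch("a\tb")))  # True
```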

A range is a sequence of characters enclosed in square brackets ([ ]). It normally matches any single character from the sequence. If the sequence begins with a caret (^), it matches any single character not in the rest of the sequence. If two characters in the sequence are separated by a hyphen (-), this is shorthand for the full list of ASCII characters between them, inclusive; for example, [0-9] matches any decimal digit. To include a literal closing bracket (]) in the sequence, make it the first character (after the caret, if there is one). To include a literal hyphen, make it the first or last character. A backslash (\) followed by a single character includes that character. However, backslashes are rarely needed inside a range, because only the ], -, and \ characters are treated specially there.
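The special cases for ranges can be checked with the same hedged Python setup. Python's re is assumed here; its handling of a leading ], a trailing -, and a leading caret agrees with the rules above, although re also accepts extra escapes (such as \d) that this appendix does not describe. The variable names are hypothetical.

```python
import re

digit = re.compile(r"[0-9]")            # any single decimal digit
not_digit = re.compile(r"[^0-9]")       # any single character except a digit
close_bracket = re.compile(r"[]a]")     # "]" first in the set is taken literally
hyphen = re.compile(r"[a-]")            # "-" last in the set is taken literally
negated_bracket = re.compile(r"[^]a]")  # "]" right after the caret is also literal

for pattern, text in [
    (digit, "7"), (digit, "x"),
    (not_digit, "x"),
    (close_bracket, "]"),
    (hyphen, "-"), (hyphen, "b"),
    (negated_bracket, "]"), (negated_bracket, "z"),
]:
    print(pattern.pattern, repr(text), bool(pattern.fullmatch(text)))
```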

How to Use Regular Expressions

Regular expressions are patterns used to match strings. Most people are familiar with wildcard matching from the command lines of many operating systems. For example, the following wildcard pattern typically matches anything beginning with "FRED":

FRED*

Wildcards can express only simple patterns, which limits what they can search for. Regular expressions are different: they provide many more active characters (characters with a special meaning, like the asterisk (*) in a wildcard search).

For example, the following wildcard pattern contains no active characters, so it matches only "FRED":

FRED

To emulate a wildcard search, the expression needs a way to match any character. Regular expressions use the period (.) for this. For example, the following regular expression matches "FRED" followed by any single character:

FRED.

However, this is still not equivalent to the wildcard search, because the wildcard asterisk (*) matches zero or more characters, not exactly one. Regular expressions can do this too, with a slightly different notation. For example, the following regular expression matches the same strings as the wildcard pattern FRED*:

FRED.*
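As a sketch of this correspondence, the following compares Python's fnmatch module, used here as a stand-in for operating-system wildcards (an assumption), with the regular expression FRED.* in Python's re. One caveat: by default a period in Python does not match a newline, while the fnmatch asterisk does, so the two agree only for single-line strings such as these.

```python
import fnmatch
import re

names = ["FRED", "FREDRICK", "FRED123", "BEFORE FRED"]

for name in names:
    wildcard = fnmatch.fnmatchcase(name, "FRED*")       # shell-style wildcard
    regex = re.fullmatch(r"FRED.*", name) is not None   # whole-string regex match
    print(f"{name!r:15} wildcard={wildcard} regex={regex}")

# fnmatch can also show the regular expression it builds from a wildcard.
print(fnmatch.translate("FRED*"))
```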

In a regular expression, however, an asterisk (*) means zero or more occurrences of the preceding character (atom) in the pattern.

In the example, the asterisk is preceded by a period, so .* means zero or more occurrences of any character.

The plus (+) works in a similar way to the asterisk, but it means one or more occurrences of the preceding character in the pattern.

Table D-1 shows a comparison of matches.


Table D-1: Examples of Regular Expressions

String           FRED.*   FRED.+   FRED.   FRED...
FREDRICK         yes      yes      no      no
FRED             yes      no       no      no
FRED123          yes      yes      no      yes
FRED AND BILL    yes      yes      no      no
BEFORE FRED      no       no       no      no
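The results in Table D-1 are consistent with whole-string matching, so this sketch assumes that and uses Python's re.fullmatch to reproduce the table. The choice of function matters: re.search, which finds a match anywhere, would report a match for "BEFORE FRED" with FRED.*, and re.match, which anchors only at the start, would report a match for "FREDRICK" with FRED. (matching just "FREDR").

```python
import re

patterns = ["FRED.*", "FRED.+", "FRED.", "FRED..."]
strings = ["FREDRICK", "FRED", "FRED123", "FRED AND BILL", "BEFORE FRED"]

# re.fullmatch succeeds only if the pattern matches the entire string.
print("String".ljust(15) + "".join(p.ljust(9) for p in patterns))
for s in strings:
    row = ["yes" if re.fullmatch(p, s) else "no" for p in patterns]
    print(s.ljust(15) + "".join(r.ljust(9) for r in row))
```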

You can place an asterisk or plus after any character or parenthesized group and follow it with more text. Table D-2 shows some examples of this, along with some of the other active characters in regular expressions.


Table D-2: Using the Asterisk and Plus in Regular Expressions

Example             Description
START *END          Matches a string that starts with the text "START", followed by any number of spaces (including none), followed by the text "END".
A*B*                Matches any number of A's followed by any number of B's (either count can be zero).
(Testing)+          Matches one or more repetitions of the word "Testing", for example, "Testing" or "TestingTesting".
START([0-9])+END    Matches a string that starts with the text "START", followed by one or more digits (0-9), followed by the text "END".
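Each example in Table D-2 can be checked against a few sample strings. This sketch assumes Python's re with whole-string matching (re.fullmatch); the sample strings are illustrative and are not part of the original table.

```python
import re

examples = {
    r"START *END": ["STARTEND", "START   END", "START-END"],
    r"A*B*": ["", "AAB", "BA"],
    r"(Testing)+": ["Testing", "TestingTesting", "Test"],
    r"START([0-9])+END": ["START42END", "STARTEND"],
}

for pattern, candidates in examples.items():
    for text in candidates:
        matched = re.fullmatch(pattern, text) is not None
        print(f"{pattern:18} {text!r:18} {matched}")
```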

Parentheses group their contents so that they are treated as a single unit (one atom). Therefore, in the following expression, the string "Testing" can appear one or more times:

(Testing)+

In contrast, the following expression matches the string "Testin" followed by one or more occurrences of the letter "g":

Testing+
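A short sketch, assuming Python's re and whole-string matching, makes the difference between the grouped and ungrouped forms visible:

```python
import re

grouped = re.compile(r"(Testing)+")   # the whole word "Testing" repeats
ungrouped = re.compile(r"Testing+")   # only the final "g" repeats

for text in ["TestingTesting", "Testinggg"]:
    print(text, bool(grouped.fullmatch(text)), bool(ungrouped.fullmatch(text)))
```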

Square brackets ([ ]) define a set of characters to match. A bracketed set always matches exactly one character from the set.

For example, the following expression matches a, b, c, or d:

[abcd]

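Inside brackets, parentheses have no special meaning. The following expression therefore does not match the word "thing"; it matches any single character from the set a, b, c, (, t, h, i, n, g, and ):

[abc(thing)]

To match a, b, c, or the word "thing", use the pipe character (described below) with parentheses instead:

([abc]|thing)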
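Because parentheses are ordinary characters inside a bracketed set, [abc(thing)] still matches only one character. The following sketch, assuming Python's re and whole-string matching, contrasts it with an alternation that does match the whole word; the variable names are hypothetical.

```python
import re

# A bracketed set matches exactly one character from the set.
print(bool(re.fullmatch(r"[abcd]", "c")))   # True
print(bool(re.fullmatch(r"[abcd]", "cd")))  # False: two characters

# Inside brackets, "(" and ")" are ordinary characters, so this set is
# a, b, c, (, t, h, i, n, g, ) and it still matches a single character.
in_set = re.compile(r"[abc(thing)]")
print(bool(in_set.fullmatch("thing")))  # False
print(bool(in_set.fullmatch("(")))      # True

# Alternation outside the brackets matches a, b, c, or the word "thing".
either = re.compile(r"([abc]|thing)")
print(bool(either.fullmatch("thing")), bool(either.fullmatch("b")))  # True True
```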

Use the hyphen (-) character inside the square brackets to mean all the characters from the character preceding the hyphen to the character following it. For example, [0-9] is the same as [0123456789]. Also, [a-z] is the same as [abcdefghijklmnopqrstuvwxyz].

When a caret (^) immediately follows the opening bracket ([), the meaning of the set changes to any single character except the contents of the brackets. For example, the following expression matches any character except a, b, c, or d:

[^abcd]
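The equivalence of [0-9] and [0123456789], and the effect of a leading caret, can be confirmed with a short sketch. Python's re and the standard string module are assumed; the variable names are hypothetical.

```python
import re
import string

ranged = re.compile(r"[0-9]")
listed = re.compile(r"[0123456789]")
negated = re.compile(r"[^abcd]")

# [0-9] and [0123456789] accept exactly the same single characters.
same = all(
    bool(ranged.fullmatch(ch)) == bool(listed.fullmatch(ch))
    for ch in string.printable
)
print(same)  # True

# [^abcd] accepts any single character except a, b, c, or d.
print([ch for ch in "abcdez" if negated.fullmatch(ch)])  # ['e', 'z']
```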

The caret has another use: outside square brackets, it marks the start of a line. Therefore, the following regular expression matches the string "blah" only at the start of a line:

^blah

The caret is often used together with the dollar sign ($), which marks the end of a line. For example, the following expression matches only a string that contains exactly "blah" and nothing else:

^blah$
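In Python's re (assumed here), ^ and $ anchor to the start and end of the whole string unless the re.MULTILINE flag is given, in which case they also match at every line break. This sketch uses re.search and re.findall, which look for a match anywhere in the text, so the anchors make a visible difference:

```python
import re

text = "blah at the start\nnot blah here\nblah"

# Without re.MULTILINE, ^ and $ refer to the start and end of the whole string.
print(bool(re.search(r"^blah$", "blah")))       # True
print(bool(re.search(r"^blah$", "blah blah")))  # False

# With re.MULTILINE, ^ and $ also match at each line break.
starts = re.findall(r"^blah", text, flags=re.MULTILINE)
print(len(starts))   # 2
whole_lines = re.findall(r"^blah$", text, flags=re.MULTILINE)
print(whole_lines)   # ['blah']
```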

The last active character is the pipe (|), which performs an OR operation. For example, the following expression matches "abcd" or "fghi":

abcd|fghi
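The following sketch, assuming Python's re and whole-string matching, shows that the pipe separates whole branches, and that parentheses can limit how far an alternation reaches:

```python
import re

either = re.compile(r"abcd|fghi")
for text in ["abcd", "fghi", "abcdfghi", "abci"]:
    print(text, bool(either.fullmatch(text)))

# "|" splits whole branches, not single characters. Parentheses limit its
# scope: abc(d|f)ghi matches "abcdghi" or "abcfghi".
narrow = re.compile(r"abc(d|f)ghi")
print(bool(narrow.fullmatch("abcdghi")), bool(narrow.fullmatch("abcfghi")))  # True True
```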

To match any of these active characters literally, escape it with a backslash (\). For example, to match an opening square bracket, followed by zero or more digits, followed by a closing square bracket, you could use the following regular expression:

\[[0-9]*\]
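A minimal sketch, assuming Python's re and whole-string matching, shows the escaped brackets at work. Note that the set [0-9] contains only digits, so a bracketed value with spaces does not match; adding a space to the set, as in \[[0-9 ]*\], would allow it. Python's re.escape, which inserts the backslashes automatically, is also shown.

```python
import re

bracketed_digits = re.compile(r"\[[0-9]*\]")
for text in ["[42]", "[]", "[4 2]", "42"]:
    print(text, bool(bracketed_digits.fullmatch(text)))

# re.escape adds backslashes before characters that have a special meaning,
# which is handy when the text to match comes from elsewhere.
literal = re.compile(re.escape("FRED.*"))
print(bool(literal.fullmatch("FRED.*")), bool(literal.fullmatch("FREDRICK")))  # True False
```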


Posted: Mon Sep 27 17:57:42 PDT 1999
Copyright 1989-1999 © Cisco Systems Inc.