.
A character class expression is a character group enclosed in square brackets, for example [A-Z]. Character groups are described on page 919. A character group defines a set of permitted characters; it matches a single character from the input if the character is a member of this set.
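The single-character behavior can be checked with a short sketch. It uses Python's re module as a stand-in, which is an assumption: Python's regex dialect is not the one described in this text, although a simple class such as [A-Z] should behave the same way. The names upper and results are hypothetical, chosen only for this illustration.

```python
import re

# [A-Z] is a character class expression: it consumes exactly one
# character and succeeds only if that character is in the set A..Z.
upper = re.compile(r"[A-Z]")

# fullmatch requires the whole input to be consumed by the pattern,
# so a two-character input cannot be matched by a single class.
results = {s: bool(upper.fullmatch(s)) for s in ("Q", "q", "QR")}

print(results)
```

Only "Q" is accepted: "q" is a single character outside the set, and "QR" is two characters, which one character class expression cannot consume on its own.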
Subexpressions
An atom written in the form of a regular expression within parentheses is referred to as a subexpression (or group). Subexpressions serve two main purposes:
- They allow a sequence of characters to be defined as repeated or optional. For example, the regex ([0-9],)*[0-9] matches strings such as 1,2,3 or 8,0.
- They allow the application to determine which parts of the input string were matched by particular parts of the regular expression. For example, when the string 12 September 2008 is matched by the regex ([0-9]+)\s([A-Za-z]+)\s([0-9]+), three groups are captured, corresponding to the three parenthesized subexpressions: group 1 is the string 12, group 2 is the string September, and group 3 is the string 2008. Note that when a parenthesized subexpression has a quantifier, or when it is within an enclosing construct that allows repetition, it is the last matching substring that is accessible as the content of the corresponding group.
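Both purposes of subexpressions, and the rule about repeated groups, can be tried out with the two regexes from the list above. This is a minimal sketch that assumes Python's re module is an acceptable stand-in for the dialect described here; the two differ in places, but these particular patterns use only constructs common to both. The variable names are hypothetical.

```python
import re

# Purpose 1: parentheses let a quantifier apply to a whole sequence,
# here "a digit followed by a comma", repeated zero or more times.
list_re = re.compile(r"([0-9],)*[0-9]")
list_ok = [bool(list_re.fullmatch(s)) for s in ("1,2,3", "8,0", "7", "1,,2")]

# Purpose 2: each parenthesized subexpression captures the text it matched.
date_re = re.compile(r"([0-9]+)\s([A-Za-z]+)\s([0-9]+)")
date_groups = date_re.fullmatch("12 September 2008").groups()

# A quantified group is matched several times against "1,2,3"
# ("1," and then "2,"), but only the last substring is kept.
last_iteration = list_re.fullmatch("1,2,3").group(1)

print(list_ok)
print(date_groups)
print(last_iteration)
```

In this sketch the input 7 is accepted because the parenthesized part may repeat zero times, while 1,,2 is rejected. Group 1 of the list regex holds only the final repetition, so an application that needs every item would have to split the string or match item by item instead of relying on the captured group.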