Regular Expressions

A list of all the regular expression characters (metacharacters) that Ruby supports is provided below.

The above two rules apply to metacharacters.

Back References

The notation \1, \2, ..., \n is a back reference. It matches the same string that was matched by the nth pair of parentheses (a ( ) grouping in the regular expression).

/((foo)bar)\1\2/

The above expression is the same as the following:

/((foo)bar)foobarfoo/

Example:

re = /(foo|bar|baz)\1/
p re =~ 'foofoo'   # => 0
p re =~ 'barbar'   # => 0
p re =~ 'bazbaz'   # => 0
p re =~ 'foobar'   # => nil
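Because \1 must repeat the exact text that group 1 captured, a back reference is a natural way to find a word written twice in a row. The following is a minimal sketch; the sample sentence is made up for illustration.

```ruby
# \b(\w+) captures a whole word; " \1\b" then requires a space followed by
# exactly the same word, so only doubled words match.
text = "the the cat sat sat on the mat"

# scan returns one array of captures per match; flatten gives the words.
doubled = text.scan(/\b(\w+) \1\b/).flatten
p doubled  # => ["the", "sat"]
```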

The corresponding parentheses must be to the left of the back reference.

If a back reference appears inside the very parentheses it refers to, the match always fails. The match also always fails when a single-digit back reference has no corresponding parentheses.

p /(\1)/ =~ "foofoofoo" # => nil
p /(foo)\2/ =~ "foo\2"  # => nil

Back references of 2 or more digits are allowed, but be careful not to confuse them with the backslash notation \nnn (the character with octal code nnn). A 1-digit number is always a back reference. A number of 2 or more digits is treated as an octal code when there are no corresponding parentheses.

Also, to write a 1-digit octal code in a regular expression, you must prefix it with 0 (for example, \01). (To prevent ambiguity, there is no \0 back reference.)

p   /\1/ =~ "\1"   # => nil     # back reference that doesnft use parentheses.
p  /\01/ =~ "\1"   # => 0       # octal code
p  /\11/ =~ "\11"  # => 0       # octal code

# octal code (because there are no corresponding parentheses)
p /(.)\10/ =~ "1\10" # => 0

# back reference (because there are corresponding parentheses)
p /((((((((((.))))))))))\10/ =~ "aa"  # => 0

# octal code: \08 is read as "\0" followed by "8"
# (there is no octal code \08)
p /(.)\08/ =~ "1\0008" # => 0

# To follow a back reference with literal digits, wrap the back
# reference in parentheses so the digits are not read as part of it.
p /(.)(\1)1/ =~ "111"   # => 0
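A related pitfall appears when a pattern is built from a string with Regexp.new: in a double-quoted Ruby string, "\1" is already the octal character \001, so the regular expression engine never sees a back reference. The sketch below contrasts the two spellings; the variable names are only illustrative.

```ruby
# Double quotes: "\1" becomes the single character "\x01" before
# Regexp.new runs, so this pattern is "any char, then \x01".
octal = Regexp.new("(.)\1")

# Single quotes keep the backslash, so the engine sees the back reference \1.
backref = Regexp.new('(.)\1')

p octal   =~ "xx"    # => nil
p octal   =~ "x\1"   # => 0
p backref =~ "xx"    # => 0
p backref =~ "xy"    # => nil
```

Writing "(.)\\1" in double quotes would give the same pattern as the single-quoted form.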

Character Class

The regular expression [ ] specifies a character class. It matches any one of the characters listed inside the brackets.

For example, /[abc]/ matches "a", "b" or "c". Characters that are consecutive in ASCII order can be written as a range with "-", like /[a-c]/. If the first character is "^", the class matches any one character other than those specified.
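As a quick illustration of ranges and negation, String#scan can list every character a class matches in a sample string (the string here is arbitrary):

```ruby
s = "abcxyz"

in_range  = s.scan(/[a-c]/)    # one character from the range a..c
out_range = s.scan(/[^a-c]/)   # one character NOT in the range a..c

p in_range   # => ["a", "b", "c"]
p out_range  # => ["x", "y", "z"]
```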

Any e^' not at the beginning will be matched with that character. Also, any "-" at the front or end of a line will be matched with that character.

p /[a^]/ =~ "^"   # => 0
p /[-a]/ =~ "-"   # => 0
p /[a-]/ =~ "-"   # => 0
p /[-]/ =~ "-"   # => 0

An empty character class will result in an error.

p /[]/ =~ ""
p /[^]/ =~ "^"
# => invalid regular expression; empty character class: /[^]/

A "]" at the beginning of a character class (or directly after a negating "^") does not close the class; it is an ordinary "]". Escaping this kind of "]" with a backslash is recommended.

p /[]]/ =~ "]"       # => 0
p /[^]]/ =~ "]"      # => nil

Inside a character class, "^", "-", "]" and "\\" (backslash) can be escaped with a backslash to match that character literally.

p /[\^]/ =~ "^"   # => 0
p /[\-]/ =~ "-"   # => 0
p /[\]]/ =~ "]"   # => 0
p /[\\]/ =~ "\\"  # => 0
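When the set of characters comes from data rather than being written by hand, escaping each one avoids the special meanings above. The sketch below relies on Regexp.escape, which backslash-escapes "^", "-", "]" and "\\" among other metacharacters; the helper name char_class_for is hypothetical.

```ruby
# Hypothetical helper: build a class that matches any one of the given
# characters, treating "^", "-", "]" and "\\" literally.
def char_class_for(chars)
  Regexp.new("[#{Regexp.escape(chars)}]")
end

re = char_class_for("^-]\\")          # the four characters ^ - ] \
found = "a^b-c]d\\e".scan(re)
p found  # => ["^", "-", "]", "\\"]
```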

Inside [ ] you can use the same backslash notation as in string literals, as well as the regular expressions \w, \W, \s, \S, \d and \D (shorthand character classes).
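For example, shorthand classes can be combined with literal characters and ranges inside a single [ ]. A small sketch with made-up input strings:

```ruby
# \s plus a literal comma: split on any run of whitespace and/or commas.
fields = "a, b ,c\td".split(/[\s,]+/)
p fields  # => ["a", "b", "c", "d"]

# \d plus the range A-F: pick out (uppercase) hexadecimal digits.
hex = "0x1F".scan(/[\dA-F]/)
p hex     # => ["0", "1", "F"]
```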

Note that negated character classes like the one below also match the newline character, because the newline is not among the excluded characters (the same is true of the regular expressions \W and \D).

p /[^a-z]/ =~ "\n"    # => 0
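The contrast with "." is worth seeing side by side: without the m option, "." does not match a newline, while a negated class or \D does. A minimal sketch on an arbitrary two-line string:

```ruby
s = "ab\ncd"

negated = s.scan(/[^a-z]/)   # only the newline is outside a-z
non_dig = s.scan(/\D/)       # every character here is a non-digit
dots    = s.scan(/./)        # "." skips "\n" unless /m is given

p negated       # => ["\n"]
p non_dig.size  # => 5
p dots          # => ["a", "b", "c", "d"]
```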