Regular Expressions

Back References
Character Class

A list of all the regular expression characters (metacharacters) that Ruby supports is provided below.

Alphanumeric characters without a preceding \ are not metacharacters.
Symbols with a preceding \ are not metacharacters.

All metacharacters follow the two rules above.

^

Beginning of line. Matches at the start of a character string or directly after a line feed character.
$

End of line. Matches at the end of a character string or directly before a line feed character.
```
p "\n".gsub(/$/, "o")     # => "o\no"
```
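As a rough illustration of the two line anchors, ^ and $ can match at every line of a multi-line string. The sample text and the constant name WHOLE_LINE are assumptions made for this sketch.

```ruby
# Hypothetical pattern: a line made up only of word characters.
WHOLE_LINE = /^\w+$/

text = "first\nsecond\nthird"

# ^ and $ match around each line feed, so every line is found.
p text.scan(WHOLE_LINE)        # => ["first", "second", "third"]

# $ also matches directly before a trailing line feed.
p("first\n" =~ /first$/)       # => 0
```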
.

Matches any single character except a line feed (when working with multi-byte characters, this means one character, not one byte). With the regular expression option m (multi-line mode; see Regular Expression Literals), it matches any character, including a line feed.
```
p /./e =~ "あ"[0,1]     # => nil  (the first byte alone is not a whole character)
```
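The effect of the m option on . can be sketched as follows. The constant names DOT_PLAIN and DOT_MULTI are made up for this example.

```ruby
DOT_PLAIN = /a.c/    # "." excludes a line feed
DOT_MULTI = /a.c/m   # with m, "." includes a line feed

p(DOT_PLAIN =~ "abc")    # => 0
p(DOT_PLAIN =~ "a\nc")   # => nil
p(DOT_MULTI =~ "a\nc")   # => 0

# "." stands for exactly one character, so two characters do not fit.
p(DOT_PLAIN =~ "abbc")   # => nil
```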
\w

Alphanumeric character. The same as [0-9A-Za-z_].

Also matches Japanese double-byte characters.
\W

Non-alphanumeric character. Characters besides \w.
\s

Space character. The same as [ \t\n\r\f].
\S

Non-space character. Any character other than [ \t\n\r\f].
\d

Number. The same as [0-9].
\D

Non-number.
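A short sketch of the shorthand classes \w, \W, \s and \d on an ASCII-only sample. The constant name SHORTHAND_SAMPLE and its contents are assumptions made for illustration.

```ruby
SHORTHAND_SAMPLE = "foo_bar 42\tbaz!"

p SHORTHAND_SAMPLE.scan(/\w+/)    # => ["foo_bar", "42", "baz"]
p SHORTHAND_SAMPLE.scan(/\d+/)    # => ["42"]
p SHORTHAND_SAMPLE.scan(/\W/)     # => [" ", "\t", "!"]

# \s+ treats the space and the tab alike as separators.
p SHORTHAND_SAMPLE.split(/\s+/)   # => ["foo_bar", "42", "baz!"]
```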
\A

Beginning of a character string. Unlike ^, the presence of a line feed has no effect.
\Z

End of a character string. Also matches directly before a line feed if the string ends with one.
```
p "\n".gsub(/\Z/, "o")     # => "o\no"
```
\z

End of a character string. Unlike $ or \Z, the presence of a line feed has no effect.
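The differences between the line anchors (^, $) and the string anchors (\A, \Z, \z) can be sketched on strings that contain a line feed. The constant name TRAILING_LF is an assumption made for this example.

```ruby
TRAILING_LF = "foo\n"

p(/foo$/  =~ TRAILING_LF)    # => 0    ($ matches before the final line feed)
p(/foo\Z/ =~ TRAILING_LF)    # => 0    (\Z also matches before a final line feed)
p(/foo\z/ =~ TRAILING_LF)    # => nil  (\z matches only at the very end)

p(/^bar/  =~ "foo\nbar")     # => 4    (^ matches directly after a line feed)
p(/\Abar/ =~ "foo\nbar")     # => nil  (\A matches only at the very start)
```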
\b

Word boundary when used outside a character class. (Matches between \w and \W.) Inside a character class, it is a backspace (0x08).
\B

Non-word boundary.
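A minimal sketch of \b and \B, assuming the sample sentence and the constant names below, which are not part of the original text:

```ruby
BOUNDARY_SAMPLE = "cat concat cats"

WHOLE_WORD_CAT = /\bcat\b/   # "cat" as a word of its own
INNER_CAT      = /\Bcat/     # "cat" that does not start a word

p BOUNDARY_SAMPLE.scan(WHOLE_WORD_CAT)   # => ["cat"]
p BOUNDARY_SAMPLE.scan(INNER_CAT)        # => ["cat"]  (the one inside "concat")

# Inside a character class, \b is the backspace character (0x08).
p(/[\b]/ =~ "a\bc")                      # => 1
```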
\G

Matches (with zero width) at the position directly after the previous match. On the first attempt it matches only at the start of the string (the same as \A).

Can be used with scan or gsub. Use it when each match must begin exactly where the previous match ended.
```
# Take digits three at a time from the start of the string
# (for as long as the digits continue).
str = "123456 789"
str.scan(/\G\d\d\d/) {|m| p m }   # prints "123", then "456"
```
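To make the role of \G clearer, the same scan can be compared with and without it. This is only a sketch, and the constant names are assumptions.

```ruby
G_SAMPLE = "123456 789"

ANCHORED_TRIPLE = /\G\d\d\d/   # must start where the previous match ended
FREE_TRIPLE     = /\d\d\d/     # may start anywhere

# With \G the scan stops at the space, because no match can begin there.
p G_SAMPLE.scan(ANCHORED_TRIPLE)   # => ["123", "456"]

# Without \G the scan skips the space and keeps going.
p G_SAMPLE.scan(FREE_TRIPLE)       # => ["123", "456", "789"]
```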
[ ]

Character class specification. See Character Class.
*

Quantifier. Repeats the preceding expression 0 or more times. Matches as much as possible (longest match).
*?

Quantifier. Repeats the preceding expression 0 or more times. (Shortest match.)
+

Quantifier. Repeats the preceding expression 1 or more times.
+?

Quantifier. Repeats the preceding expression 1 or more times. (Shortest match.)
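The difference between the longest-match quantifiers (*, +) and their shortest-match forms (*?, +?) can be sketched as follows. The sample string and constant name are assumptions made for illustration.

```ruby
TAG_SAMPLE = "<a><b>"

# Longest match: ".+" and ".*" run on to the last ">".
p TAG_SAMPLE[/<.+>/]    # => "<a><b>"
p TAG_SAMPLE[/<.*>/]    # => "<a><b>"

# Shortest match: ".+?" and ".*?" stop at the first ">".
p TAG_SAMPLE[/<.+?>/]   # => "<a>"
p TAG_SAMPLE[/<.*?>/]   # => "<a>"
```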
{m}
{m,}
{m,n}

Interval quantifier. Repeats the preceding regular expression as follows:
- m times
- m or more times
- m or more times, at most n times.
A match with {,n} or {,} always fails.
```
str = "foofoofoo"
p str[/(foo){1}/]   # => "foo"
p str[/(foo){2,}/]  # => "foofoofoo"
p str[/(foo){1,2}/] # => "foofoo"
```
The regular expressions ?, * and + are the same as {0,1}, {0,} and {1,}, respectively.
{m}?
{m,}?
{m,n}?

Interval quantifier (shortest match). Repeats the preceding regular expression as follows:
- m times
- m or more times
- m or more times, at most n times.
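A small sketch of the range forms {m,n}? and {m,}? next to their longest-match counterparts. The constant name REPEATED_FOO is made up for this example, and only the forms with a comma are shown.

```ruby
REPEATED_FOO = "foofoofoo"

# Longest match takes as many repetitions as allowed.
p REPEATED_FOO[/(foo){1,2}/]    # => "foofoo"
p REPEATED_FOO[/(foo){2,}/]     # => "foofoofoo"

# Shortest match stops at the minimum number of repetitions.
p REPEATED_FOO[/(foo){1,2}?/]   # => "foo"
p REPEATED_FOO[/(foo){2,}?/]    # => "foofoo"
```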
?

Quantifier. Repeats the preceding regular expression 0 or 1 times.
??

Quantifier. Repeats the preceding regular expression 0 or 1 times. (Shortest match.)
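The two optional quantifiers can be compared on a short string. This is a sketch, and the constant name OPTIONAL_SAMPLE is an assumption.

```ruby
OPTIONAL_SAMPLE = "abc"

# "?" prefers to take the optional "b".
p OPTIONAL_SAMPLE[/ab?/]     # => "ab"

# "??" prefers to skip it.
p OPTIONAL_SAMPLE[/ab??/]    # => "a"

# "??" still takes the "b" when the rest of the pattern requires it.
p OPTIONAL_SAMPLE[/ab??c/]   # => "abc"
```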
|

Alternation. Matches either the expression on its left or the expression on its right.
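As a sketch of how | combines with anchors: it splits the whole expression unless a group limits it. The constant names and sample strings are assumptions made for this example.

```ruby
LOOSE_ALT   = /^foo|bar$/       # "^foo" OR "bar$"
GROUPED_ALT = /^(?:foo|bar)$/   # a whole line that is "foo" or "bar"

p "I have a dog"[/cat|dog/]     # => "dog"

# "foobaz" starts with "foo", so the ungrouped form matches.
p(LOOSE_ALT =~ "foobaz")        # => 0

# The grouped form requires the whole line to be one of the alternatives.
p(GROUPED_ALT =~ "foobaz")      # => nil
p(GROUPED_ALT =~ "bar")         # => 0
```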
( )

Regular expression grouping. The character string matched by the regular expression in the parentheses is remembered for back referencing.
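A minimal sketch of grouping, assuming a made-up pattern PAGE_RANGE and sample text: each pair of parentheses remembers its part of the match, and a quantifier after a group applies to the whole group.

```ruby
PAGE_RANGE = /(\d+)-(\d+)/

m = PAGE_RANGE.match("pages 10-20")
p m[0]   # => "10-20"  (the whole match)
p m[1]   # => "10"     (first pair of parentheses)
p m[2]   # => "20"     (second pair of parentheses)

# The quantifier repeats the whole group, not just the last character.
p "ababab!"[/(ab)+/]   # => "ababab"
```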
\1, \2 ... \n

Back reference. See Back References.
(?# )

Comment. All characters in the parentheses are ignored.
(?: )

Grouping without back reference. In other words, the group is used only for grouping and does not become a target of \1, \2 (or $1, $2) and so on.
```
/(abc)/ =~ "abc"
p $1    # => "abc"

/(?:abc)/ =~ "abc"
p $1    # => nil
```
(?= )

Lookahead. Specifies a position by means of a pattern. (Has no width.)
```
(?=re1)re2
```
The above expression matches text that matches both re1 and re2.
```
re1(?=re2)
```
The above expression matches re1 only when it is followed by a character string that matches re2.
```
p /foo(?=bar)/ =~ "foobar"      # => 0
p $&    # => "foo"    ("bar" is not part of the match)
```
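Both lookahead forms can be sketched as follows. The sample strings and the constant names AMOUNT_IN_YEN and HAS_DIGIT are assumptions made for illustration.

```ruby
# re1(?=re2): digits that are followed by "yen"; "yen" itself is not matched.
AMOUNT_IN_YEN = /\d+(?=yen)/
p "100yen 200usd".scan(AMOUNT_IN_YEN)   # => ["100"]

# (?=re1)re2: the text must satisfy both patterns at the same position.
# Here: only lowercase letters and digits, with at least one digit.
HAS_DIGIT = /\A(?=.*\d)[a-z\d]+\z/
p(HAS_DIGIT =~ "abc1")   # => 0
p(HAS_DIGIT =~ "abcd")   # => nil
```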

(?! )

Negative lookahead. Specifies a position by means of the negation of a pattern. (Has no width.)
```
(?!re1)re2
```
The above expression matches text that matches re2 but does not match re1.

```
# A three-digit number that excludes 000
re = /(?!000)\d\d\d/
p re =~ "000"   # => nil
p re =~ "012"   # => 0
p re =~ "123"   # => 0

# C identifier (a character string starting with [A-Za-z_] and continuing with [0-9A-Za-z_])
/\b(?![0-9])\w+\b/
```
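The two negative lookahead patterns above can be tried on longer input. This is a sketch: the constant names and the sample strings are assumptions.

```ruby
C_IDENTIFIER = /\b(?![0-9])\w+\b/
NOT_ALL_ZERO = /(?!000)\d\d\d/

# "9lives" is skipped as a whole because it starts with a digit.
p "9lives foo_bar x1".scan(C_IDENTIFIER)   # => ["foo_bar", "x1"]

# "000" is rejected, so the first match is the next three-digit number.
p "000 012"[NOT_ALL_ZERO]                  # => "012"
```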

Back References

The regular expressions \1, \2, ... \n are back references. \n matches the character string matched by the nth pair of parentheses (regular expression ( ) grouping).
```
/((foo)bar)\1\2/
```
The above expression is the same as the following:
```
/((foo)bar)foobarfoo/
```

Example:
```
re = /(foo|bar|baz)\1/
p re =~ 'foofoo'   # => 0
p re =~ 'barbar'   # => 0
p re =~ 'bazbaz'   # => 0
p re =~ 'foobar'   # => nil
```
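A common use of a back reference is finding a word that is written twice in a row. The following is only a sketch, and the pattern name DOUBLED_WORD and the sample sentences are assumptions.

```ruby
# (\w+) remembers a word; \1 requires the same word again after one space.
DOUBLED_WORD = /\b(\w+) \1\b/

m = DOUBLED_WORD.match("this is is a test")
p m[0]   # => "is is"
p m[1]   # => "is"

# No word is repeated here, so there is no match.
p(DOUBLED_WORD =~ "this is a test")   # => nil
```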

The corresponding parentheses must be to the left of the back reference.

If a back reference appears inside its own corresponding parentheses, the match always fails. The match also always fails when a single-digit back reference has no corresponding parentheses.
```
p /(\1)/ =~ "foofoofoo" # => nil
p /(foo)\2/ =~ "foo\2"  # => nil
```

A back reference can have two or more digits, but take care not to confuse it with the backslash notation \nnn (the character corresponding to octal nnn). A single-digit number is always a back reference. A number of two or more digits is treated as an octal code unless corresponding parentheses exist.

Also, within a regular expression, a one-digit octal code must be written with a leading 0 (such as \01). (To prevent ambiguity, there is no \0 back reference.)

```
p   /\1/ =~ "\1"   # => nil     # back reference with no corresponding parentheses
p  /\01/ =~ "\1"   # => 0       # octal code
p  /\11/ =~ "\11"  # => 0       # octal code

# octal code (because there are no corresponding parentheses)
p /(.)\10/ =~ "1\10" # => 0

# back reference (because there are corresponding parentheses)
p /((((((((((.))))))))))\10/ =~ "aa"  # => 0

# octal code (there is no octal code \08, so this is "\0" + "8")
p /(.)\08/ =~ "1\0008" # => 0

# If you want to write digits directly after a back reference,
# you have to group the back reference in parentheses to separate it from them.
p /(.)(\1)1/ =~ "111"   # => 0
```

Character Class

The regular expression [ ] specifies a character class. It matches any one of the characters listed inside [ ].

For example, /[abc]/ matches "a", "b" or "c". When the characters are consecutive in ASCII code order, you can also write them as a range with "-", like this: /[a-c]/. If the first character is "^", the class matches any one character other than those specified.
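The three basic forms (a list, a range and a negated class) can be sketched as follows. The constant names and sample strings are assumptions made for this example.

```ruby
LISTED  = /[abc]/    # "a", "b" or "c"
RANGED  = /[a-c]/    # the same set written as a range
NEGATED = /[^a-c]/   # any one character other than "a", "b", "c"

p "xbz"[LISTED]     # => "b"
p "xbz"[RANGED]     # => "b"
p "abcd"[NEGATED]   # => "d"

# Several ranges can be combined in one class.
p "0x1F".scan(/[0-9A-F]/)   # => ["0", "1", "F"]
```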

A "^" that is not at the beginning of the class matches the character "^" itself. Likewise, a "-" at the beginning or end of the class matches the character "-" itself.
```
p /[a^]/ =~ "^"   # => 0
p /[-a]/ =~ "-"   # => 0
p /[a-]/ =~ "-"   # => 0
p /[-]/ =~ "-"   # => 0
```

An empty character class will result in an error.
```
p /[]/ =~ ""
p /[^]/ =~ "^"
# => invalid regular expression; empty character class: /[^]/
```

A "]" at the beginning of a character class (or directly after a negating "^") does not end the class. It is simply a literal "]". Escaping this kind of "]" with a backslash is recommended.
```
p /[]]/ =~ "]"       # => 0
p /[^]]/ =~ "]"      # => nil
```
"^", "-", "]" and "\\" (backslash) can each be escaped with a backslash so that they match the character itself.
```
p /[\^]/ =~ "^"   # => 0
p /[\-]/ =~ "-"   # => 0
p /[\]]/ =~ "]"   # => 0
p /[\\]/ =~ "\\"  # => 0
```

Inside [ ] you can use the same backslash notation as in character strings, as well as the regular expressions \w, \W, \s, \S, \d and \D (these are shorthand for character classes).
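A brief sketch of backslash notation and shorthand classes inside [ ]. The sample strings and the constant name LOCAL_PART are assumptions made for illustration.

```ruby
# \d and \s combined in one class: digits or whitespace.
p "a1 b_2".scan(/[\d\s]/)           # => ["1", " ", "2"]

# \w plus two literal symbols ("." and an escaped "-").
LOCAL_PART = /[\w.\-]+/
p "user.name-1@example"[LOCAL_PART]  # => "user.name-1"

# String-style backslash notation such as \t also works inside a class.
p("tab\there" =~ /[\t]/)            # => 3
```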

Please note that a negated character class such as the one below also matches a line feed character (the same is true of the regular expressions \W and \D).
```
p /[^a-z]/ =~ "\n"    # => 0
```
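If a negated class should not match a line feed, the line feed can be listed in the class as well. A minimal sketch, assuming the sample string and constant name below:

```ruby
TWO_LINES = "abc\ndef"

# The negated class and \W both pick up the line feed.
p TWO_LINES[/[^a-z]/]     # => "\n"
p TWO_LINES[/\W/]         # => "\n"

# Listing \n inside the negated class excludes it again.
p TWO_LINES[/[^a-z\n]/]   # => nil
```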