# Introduction

The goal of this tutorial is to give a basic primer on regular expressions. It does not try to cover every aspect of the subject. It is written for beginners and includes many examples to help newcomers get started. Regular expressions can be confusing at first, but they are a powerful programming tool for anyone who takes the time to learn them. Note: each regular expression pattern is written between slashes (for example, __/Matt/__), and the characters it matches in a sample sentence are shown in **bold**.

# What Is a Regular Expression?

A regular expression, often abbreviated regex, is a pattern that describes a sequence of characters to search for within a string. Its name comes from the mathematical theory on which it is based, the theory of regular languages. The American mathematician Stephen Kleene introduced regular expressions in the 1950s. The technique can be confusing at first, but it is very useful once one gets used to the syntax.

# The Basics

## Character Literals

Character literals are the simplest pattern-matching technique in regular expressions. A literal matches exactly the characters written in the pattern, in the same order, wherever they appear in the target text.

### Examples of Character Literals

1. __/o/__ matches every “o” in the text. In the following sentence, the bold characters are the ones matched by __/o/__: Hell**o** W**o**rld.
2. __/Matt/__ matches only the exact sequence “Matt” in the target string. Again, the bold characters in the following sentence are the ones matched by __/Matt/__: **Matt** is my first name.
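The same two examples can be tried in PHP, the language used in this tutorial's real-world example. This is a minimal sketch: `preg_match_all()` returns the number of matches and stores every matched substring in `$matches[0]`, while `preg_match()` stops at the first match and returns 1 (match) or 0 (no match).

```php
<?php
// preg_match_all() counts every match and collects the matched text.
$count = preg_match_all('/o/', 'Hello World', $matches);
echo $count, "\n";                     // 2
echo implode(', ', $matches[0]), "\n"; // o, o

// A multi-character literal must appear exactly as written.
$found = preg_match('/Matt/', 'Matt is my first name');
echo $found, "\n";                     // 1
```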

## Positional Characters

Two special characters anchor a pattern to a position in the target text. The “^” character matches only at the beginning of the string, and the “$” character matches only at the end of the string. Neither one matches a character itself; each matches a position.

### Example of Positional Characters

1. __/^Matt/__ matches “Matt” only at the start of the string. It matches in “**Matt** is my name”, but it finds nothing in “My name is Matt”.
2. __/Matt$/__ matches “Matt” only at the end of the string. It matches in “My name is **Matt**”, but it finds nothing in “Matt is my name”.
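A quick PHP check of both anchors, using short sample sentences chosen for illustration. Each call returns 1 if the anchored pattern matches and 0 otherwise.

```php
<?php
// "^" ties the pattern to the start of the string, "$" to the end.
$startsWithMatt = preg_match('/^Matt/', 'Matt is my name'); // 1
$notAtStart     = preg_match('/^Matt/', 'My name is Matt'); // 0
$endsWithMatt   = preg_match('/Matt$/', 'My name is Matt'); // 1
$notAtEnd       = preg_match('/Matt$/', 'Matt is my name'); // 0

printf("%d %d %d %d\n", $startsWithMatt, $notAtStart, $endsWithMatt, $notAtEnd); // 1 0 1 0
```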

## Wildcard Character

The wildcard character “.” matches any single character and can be quite useful, but if used carelessly it can match far more text than intended.

### Example of Using Wildcard Characters

1. __/.a/__ matches any character followed by “a”. In the following sentence, the bold characters are the matches: **Ma**tt is my **na**me.
2. __/.a./__ matches any three characters with an “a” in the middle. For example: **Mat**t is my **nam**e.
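The sketch below reproduces both examples with `preg_match_all()`. It also shows one detail the section does not cover: in PHP's PCRE engine, “.” does not match a newline unless the `s` modifier is added after the closing slash.

```php
<?php
$text = 'Matt is my name';

// Any character followed by "a".
preg_match_all('/.a/', $text, $twoChars);
echo implode(', ', $twoChars[0]), "\n";   // Ma, na

// Any three characters with "a" in the middle.
preg_match_all('/.a./', $text, $threeChars);
echo implode(', ', $threeChars[0]), "\n"; // Mat, nam

// By default "." skips newlines; the "s" modifier lets it match them.
$withoutS = preg_match('/a.b/', "a\nb");  // 0
$withS    = preg_match('/a.b/s', "a\nb"); // 1
```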

## Escape Characters

Most programmers have come across escape characters before. As mentioned in the last section, the dot “.” is a wildcard that can represent any character. But what if one wants to match an actual period in the text? This is where an escape character comes into play. To match the dot, or any other character that has a special meaning in regular expressions (e.g. “[”, “\”, “*”), one has to escape it by putting a “\” in front of it.

### Example of an Escape Character

1. The regular expression __/.a/__ matches any character followed by an “a”, whereas __/\.a/__ matches only a literal dot followed by an “a”.
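Here is a minimal PHP comparison of the escaped and unescaped dot on a made-up sample string. It also uses `preg_quote()`, a standard PHP function that escapes every regex special character in a string, which is handy when the text to search for comes from user input.

```php
<?php
$text = 'Visit example.app today';

// Unescaped ".": any character followed by "a".
preg_match_all('/.a/', $text, $any);
echo implode(', ', $any[0]), "\n"; // xa, .a, da

// Escaped "\.": only a literal dot followed by "a".
preg_match_all('/\.a/', $text, $dot);
echo implode(', ', $dot[0]), "\n"; // .a

// preg_quote() escapes special characters; the second argument is the delimiter.
$quoted = preg_quote('example.app', '/');
echo $quoted, "\n";                // example\.app
```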

## Character Sets

Character sets let one match any one of several characters, match a range of characters, or exclude a set of characters. A character set is enclosed in square brackets [] and always matches exactly one character. To exclude characters, put a “^” right after the opening bracket, as in __[^abc]__. To specify a range, use a hyphen, as in __[A-F]__.

### Example of Matching One of Several Different Characters

1. __/m[ae]t/__ matches an “m”, followed by either an “a” or an “e”, followed by a “t”. So this expression matches both “mat” and “met”. Note that it does not match “maet”, because a character set matches exactly one character, not the sequence “ae”.
2. __/gr[ae]y/__ matches “gr” followed by either an “a” or an “e” and then a “y”. Thus, both “grey” and “gray” are matched, but not “graay”, “graey” or “greey”.
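A short PHP sketch of both examples. `preg_grep()` keeps only the array entries that match a pattern; `array_values()` just renumbers the result. The word lists are illustrative.

```php
<?php
// Keep only the words where [ae] matches exactly one "a" or "e".
$mWords = array_values(preg_grep('/m[ae]t/', ['mat', 'met', 'mit', 'maet']));
echo implode(', ', $mWords), "\n";  // mat, met

$grWords = array_values(preg_grep('/gr[ae]y/', ['grey', 'gray', 'graay', 'graey', 'greey']));
echo implode(', ', $grWords), "\n"; // grey, gray
```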

### Example of Excluding a Character Set

1. __/M[^eiou]tt/__ matches an “M”, followed by any single character other than “e”, “i”, “o” or “u”, followed by “tt”. This matches “Matt” but not “Mott”.
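The following PHP sketch runs the excluded-set example over a few sample names. It highlights that a negated set accepts *any* character not listed, including digits, not only other vowels.

```php
<?php
$names = ['Matt', 'Mott', 'Mitt', 'M9tt'];

// [^eiou] matches one character that is not e, i, o or u.
$matched = array_values(preg_grep('/M[^eiou]tt/', $names));
echo implode(', ', $matched), "\n"; // Matt, M9tt
```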

### Example of Using a Range of Characters

1. __/[0-9]/__ matches a single digit from 0 to 9. It matches “5” or “9” but not “X”. Applied to “59”, it matches one digit at a time (“5”, then “9”), never the two-digit number as a whole; __/^[0-9]$/__ requires the entire string to be a single digit.
2. __/[a-f]/__ matches any single lowercase letter from “a” to “f”. Thus it matches “b” but not “g”.
3. __/M[0-9]/__ matches an “M” followed by a single digit from 0 to 9. Thus it matches “M5” but not “MM”.
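The range examples can be checked in PHP as sketched below. The first two lines show the difference between searching for a digit anywhere and requiring the whole string to be one digit.

```php
<?php
// Unanchored: each digit inside "59" is found separately.
preg_match_all('/[0-9]/', '59', $digits);
echo implode(', ', $digits[0]), "\n"; // 5, 9

// Anchored: the whole string must be exactly one digit.
$single = preg_match('/^[0-9]$/', '59'); // 0

$hexLetter = preg_match('/[a-f]/', 'b');   // 1
$notHex    = preg_match('/[a-f]/', 'g');   // 0
$mDigit    = preg_match('/M[0-9]/', 'M5'); // 1
$mm        = preg_match('/M[0-9]/', 'MM'); // 0
```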

## Repeating Character Sets

To match characters from a set several times in a row, rather than just once, follow the set with a repetition symbol. The two most common are “*”, which matches the preceding item zero or more times, and “+”, which matches it one or more times.

### Examples of Repeating Character Sets

1. __/^[A-Za-z][0-9]*$/__ matches a string that consists of one letter followed by zero or more digits. It matches “a7”, “G88” and “H”, but not “a7a”, “G88a” or “9”. The “^” and “$” anchors make the pattern test the whole string; without them, the pattern would still find “a7” inside “a7a”.
2. __/^[A-Za-z][0-9]+$/__ matches a string that consists of one letter followed by one or more digits. It matches “a7”, “G88” and “H1”, but not “a7a”, “G88a” or “H”. Please note that with “*” a lone “H” is allowed, because the number of digits can be zero, but with “+” the letter must be followed by at least one digit.
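This PHP sketch runs the same sample strings through both repetition operators. The patterns are anchored with “^” and “$” so that each one tests an entire string rather than searching for a match inside it.

```php
<?php
$samples = ['a7', 'G88', 'H', 'H1', 'a7a', 'G88a', '9'];

// "*" allows zero digits after the letter; "+" requires at least one.
$star = array_values(preg_grep('/^[A-Za-z][0-9]*$/', $samples));
$plus = array_values(preg_grep('/^[A-Za-z][0-9]+$/', $samples));

echo implode(', ', $star), "\n"; // a7, G88, H, H1
echo implode(', ', $plus), "\n"; // a7, G88, H1
```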

## Conditional Statements in Regular Expression

Regular expressions can contain a conditional (if-then-else) construct, written __(?(condition)then|else)__. If the condition is true, the engine tries to match the “then” part; otherwise it tries the “else” part. The condition is usually either the number of a capturing group, which is true if that group has taken part in the match, or a lookaround assertion, which tests for a match without consuming any characters. The “|else” part can be omitted. Not every regex engine supports conditionals; PCRE, the engine behind PHP's preg functions, does.

### Example of a Conditional Statement

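The regular expression __/(w)?x(?(1)y|z)/__ first tries to match an optional “w” as group 1. If group 1 matched, the conditional requires a “y” after the “x”; otherwise it requires a “z”. So it matches “xz” and “wxy” but not “xy”. In the text “wxz” it cannot match the whole string, so it matches only the “xz” part.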
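Because PHP's PCRE engine supports conditionals, the example can be run directly. This sketch records what `preg_match()` finds in each sample string, or `null` when there is no match.

```php
<?php
// Group 1 is optional; (?(1)y|z) requires "y" if group 1 matched a "w",
// and "z" otherwise.
$pattern = '/(w)?x(?(1)y|z)/';
$results = [];
foreach (['xz', 'wxy', 'xy', 'wxz'] as $text) {
    $results[$text] = preg_match($pattern, $text, $m) ? $m[0] : null;
    echo $text, ' => ', $results[$text] ?? 'no match', "\n";
}
// xz => xz, wxy => wxy, xy => no match, wxz => xz
```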

## Grouping and Backreferences

Parts of a regular expression can be grouped together by enclosing them in parentheses (). This is often done to apply an operator, such as a repetition operator, to the entire group: __/(ha)+/__ matches “ha”, “haha”, “hahaha” and so on. Parentheses also create a capturing group, which stores the text matched by the expression inside it. A backreference lets one reuse that captured text later in the same pattern: “\1” matches exactly the text that the first group matched, “\2” the text of the second group, and so on.

### Examples of Backreferences

1. __/([m-p])a\1/__ matches a letter from “m” to “p”, followed by “a”, followed by the same letter again. The “\1” repeats the text that the group captured, not the [m-p] pattern itself. Thus the only possible matches are mam, nan, oao and pap; strings such as “man” or “pam” do not match.
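A PHP sketch of grouping and backreferences on a few illustrative strings. The third argument of `preg_match()` receives the whole match in index 0 and each captured group in the following indexes.

```php
<?php
$candidates = ['mam', 'man', 'nan', 'oao', 'pap', 'pam'];

// "\1" must repeat the exact letter captured by ([m-p]).
$matched = array_values(preg_grep('/([m-p])a\1/', $candidates));
echo implode(', ', $matched), "\n"; // mam, nan, oao, pap

// The captured text is also available as $m[1].
preg_match('/([m-p])a\1/', 'pap', $m);
echo $m[1], "\n";                   // p

// Grouping lets "+" repeat the whole group "ha".
$laughs = preg_match('/^(ha)+$/', 'hahaha'); // 1
```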

## Word Boundaries

Word boundaries are used to match whole words rather than parts of words. The metacharacter __\b__ matches at a word boundary: a position between a word character (a letter, digit or underscore) and a non-word character, or at the start or end of the string. Its negated form, __\B__, matches at every position where __\b__ does not, so it acts as the opposite of __\b__.

### Example of a Word Boundary Pattern

1. __/\bis\b/__ matches “is” only as a whole word. In the following sentence, the bold characters are the match: My name **is** Matt. It would not match the “is” inside a word such as “This”.

### Example of a Negated Word Boundary Pattern

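1. __/\Bis\B/__ matches “is” only when it sits inside a larger word, with word characters on both sides. It matches nothing in “My name is Matt.”, but it matches twice in M**is**s**is**sippi.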
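The PHP sketch below counts matches with and without boundaries. `preg_match_all()` can be called with just a pattern and a subject when only the number of matches is needed.

```php
<?php
// Without boundaries, "is" is also found inside "This".
$plain = preg_match_all('/is/', 'This is Matt.');        // 2
// \b...\b accepts "is" only as a whole word.
$whole = preg_match_all('/\bis\b/', 'This is Matt.');    // 1
// \B...\B accepts "is" only with word characters on both sides.
$none  = preg_match_all('/\Bis\B/', 'My name is Matt.'); // 0
$inner = preg_match_all('/\Bis\B/', 'Mississippi');      // 2

printf("%d %d %d %d\n", $plain, $whole, $none, $inner);  // 2 1 0 2
```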

## Alternation Pattern Matching

Alternation is similar to a character set, which matches one character out of a set. But instead of choosing a single character, alternation chooses one whole regular expression out of a list of alternatives separated by “|”.

### Examples of Alternation Pattern Matching

1. __/\b(Matt|Davy)\b/__ matches the whole word “Matt” or the whole word “Davy”. In the following sentence the match is Matt: **Matt** is my name. In the next sentence the match is Davy: **Davy** is his name.
2. __/er|ay/__ matches either “er” or “ay”. In the following sentence only “er” occurs, twice: I have been programming comput**er**s for a v**er**y long time.
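Both alternation examples can be checked in PHP as sketched here. The first sample sentence is made up to show that “Mattie” is rejected because the word boundary after “Matt” fails.

```php
<?php
// Whole-word alternation: "Mattie" is not matched.
preg_match_all('/\b(Matt|Davy)\b/', 'Matt met Davy and Mattie.', $names);
echo implode(', ', $names[0]), "\n"; // Matt, Davy

// Plain alternation between two character pairs.
preg_match_all('/er|ay/', 'I have been programming computers for a very long time.', $pairs);
echo implode(', ', $pairs[0]), "\n"; // er, er
```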

## Lookaround

Lookaround constructs do not consume any characters or add them to the match; they only check whether a match is possible at the current position. For this reason they are called assertions, and they can be quite useful if used properly. There are two kinds, lookahead and lookbehind, and each comes in a positive and a negative form.

A positive lookahead, written __regex(?=regex)__, succeeds if the first expression is followed by the second. For example, __q(?=u)__ matches a “q” that is followed by a “u”; the “u” itself is not part of the match. A negative lookahead, written __regex(?!regex)__, does the opposite: __q(?!u)__ matches a “q” that is not followed by a “u”, including a “q” at the very end of the string.

Lookbehind works the same way but checks the text before the current position. A positive lookbehind takes the form __(?<=regex)regex__, and a negative lookbehind takes the form __(?<!regex)regex__. For example, __(?<=m)a__ matches an “a” that is preceded by an “m”, and __(?<!m)a__ matches an “a” that is not preceded by an “m” (including an “a” at the start of the string). Once one learns to use lookaround constructs, they can be quite useful.
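One way to see that lookarounds are zero-width is to use them with `preg_replace()`: only the “q” or “a” is replaced, while the neighbouring character that the assertion checked is left untouched. The sample strings below are illustrative.

```php
<?php
$text = 'queen Iraq quit';

// Lookahead: replace "q" depending on the character after it.
$beforeU    = preg_replace('/q(?=u)/', 'Q', $text); // Queen Iraq Quit
$notBeforeU = preg_replace('/q(?!u)/', 'Q', $text); // queen IraQ quit

$words = 'mat bat cat';

// Lookbehind: replace "a" depending on the character before it.
$afterM    = preg_replace('/(?<=m)a/', 'A', $words); // mAt bat cat
$notAfterM = preg_replace('/(?<!m)a/', 'A', $words); // mat bAt cAt

echo $beforeU, "\n", $notBeforeU, "\n", $afterM, "\n", $notAfterM, "\n";
```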

# Conclusion

This tutorial is aimed at newcomers to regular expressions and does not cover every aspect of the subject. Its goal is to give beginners a quick overview with plenty of examples, since examples are a good way for a newcomer to understand the concepts. Regular expressions are powerful and a very useful tool for any programmer.

# Real World Example

A real-world example where regular expressions are extremely useful is validating input from a submitted form. Below is a small PHP class that does some basic input validation. Please note that the regular expressions used in the class are not perfect, but they will work in most cases. I didn't want to overcomplicate them, since this is a beginner's tutorial.

```php
<?php

class Validation
{
    // Returns true if $data is a decimal number such as "42", "-3.5" or "+.75".
    public function isDecimal($data)
    {
        // Anchored so that strings like "abc1" are rejected.
        $pattern = '/^[-+]?[0-9]*\.?[0-9]+$/';
        return $this->match($data, $pattern);
    }

    // Returns true if $data looks like an email address
    // (a simple check, not a full RFC 5322 validator).
    public function isEmail($data)
    {
        $pattern = '/^([a-zA-Z0-9_.-])+@([a-zA-Z0-9_.-])+\.([a-zA-Z])+([a-zA-Z])+$/';
        return $this->match($data, $pattern);
    }

    // Returns true if $data is a date in yyyy-mm-dd form (years 1900-2099);
    // a space, "/" or "." is also accepted as the separator.
    public function isDate($data)
    {
        // "/" is escaped inside the pattern because it is also the delimiter.
        $pattern = '/^(19|20)\d\d[- \/.](0[1-9]|1[012])[- \/.](0[1-9]|[12][0-9]|3[01])$/';
        return $this->match($data, $pattern);
    }

    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // so compare with 1 to return a real boolean.
    private function match($data, $pattern)
    {
        return preg_match($pattern, $data) === 1;
    }
}
```