Introduction

The goal of this tutorial is to give a basic primer on regular expressions. It does not try to cover every aspect of the subject. It is meant for beginners and includes many examples to help the novice get started. Regular expressions can be confusing at first, but they are a powerful programming tool for anyone who takes the time to learn them. Note: in the examples, each regular expression pattern is written between forward slashes, as in /Matt/; the slashes are delimiters and are not part of the pattern itself.

What Is a Regular Expression?

A regular expression, often abbreviated regex, is a pattern that describes a sequence of characters to look for within a string. Its name comes from the mathematical theory it is based on: regular languages, from formal language theory. The American mathematician Stephen Kleene introduced regular expressions in the 1950s as a notation for describing these languages. The notation can be confusing at first but becomes very useful once one gets used to the format.

The Basics

Character Literals

Character literals are the simplest pattern-matching technique in regular expressions. A literal matches exactly the characters written in the pattern, in the same order, within the target text.

Examples of Character Literals

  1. /o/ matches any character that is an “o”. In the sentence “Hello World.” it matches the “o” in “Hello” and, when searching for all matches, the “o” in “World” as well.
  2. /Matt/ matches only the exact sequence “Matt” within the target string. In the sentence “Matt is my first name.” it matches the first word.
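To try these patterns yourself, here is a minimal sketch using Python's standard re module. Python is an assumption made for illustration; in Python the pattern is passed as a string, without the surrounding slashes.

```python
import re

# re.findall returns every non-overlapping match, scanning left to right.
hello_matches = re.findall(r"o", "Hello World.")
matt_matches = re.findall(r"Matt", "Matt is my first name.")

print(hello_matches)  # ['o', 'o']
print(matt_matches)   # ['Matt']
```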

Positional Characters

There are two characters that anchor a pattern to a position in the target text. The “^” character matches only at the start of the string, and the “$” character matches only at the end of the string. They do not pick out the first or last occurrence of a pattern; they require the match to begin at the very start or to finish at the very end.

Example of Positional Characters

  1. /^Matt/ matches in the sentence “Matt is my first name.” because the string begins with Matt. It does not match in “My name is Matt.” because that string begins with “My”.
  2. /Matt$/ matches in “My friend’s name is Matt” because the string ends with Matt. It does not match in “My friend’s name is Matt.” because the last character there is the period, not the final “t” of Matt.
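A quick check of the anchors, again sketched with Python's re module as an assumed stand-in for whatever regex engine you use. re.search looks anywhere in the string, so only the anchors restrict where the match may occur.

```python
import re

# ^ succeeds only at the start of the string, $ only at the end.
starts_with_matt = bool(re.search(r"^Matt", "Matt is my first name."))
matt_in_middle = bool(re.search(r"^Matt", "My name is Matt."))
ends_with_matt = bool(re.search(r"Matt$", "My friend's name is Matt"))
period_at_end = bool(re.search(r"Matt$", "My friend's name is Matt."))

print(starts_with_matt, matt_in_middle)  # True False
print(ends_with_matt, period_at_end)     # True False
```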

Wildcard Character

The wildcard character “.” matches any single character (except, by default, a newline). It can be quite useful if used carefully, but it can also match far more than intended if used carelessly.

Example of Using Wildcard Characters

  1. /.a/ matches any character followed by “a”. In the sentence “Matt is my name.” it matches “Ma” and “na”.
  2. /.a./ matches any three characters with an “a” in the middle. In the sentence “Matt is my name.” it matches “Mat” and “nam”.
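The wildcard examples can be reproduced with Python's re module (used here only as an illustrative engine). The last two lines show the newline exception and the re.DOTALL flag, which is Python's way of letting “.” match a newline; other engines use their own flag for this.

```python
import re

text = "Matt is my name."
# "." stands for any single character.
any_then_a = re.findall(r".a", text)
a_in_middle = re.findall(r".a.", text)

print(any_then_a)   # ['Ma', 'na']
print(a_in_middle)  # ['Mat', 'nam']

# By default "." does not match a newline; re.DOTALL lifts that restriction.
newline_default = re.findall(r".a", "\na")             # no match
newline_dotall = re.findall(r".a", "\na", re.DOTALL)   # one match: "\n" + "a"
```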

Escape Characters

If you are a programmer, you have probably heard of escape characters. As mentioned in the last section, the dot “.” is a wildcard that can represent any character. But what if one wants to match an actual period in the text? This is where escape characters come into play. To match the dot, or any other character that has a special meaning in regular expressions (e.g. “[”, “\”, “*”), one escapes it with a backslash “\”.

Example of Escape Character

  1. The regular expression /.a/ matches any character followed by an “a”, but /\.a/ matches only a literal dot followed by an “a”.
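The difference between the wildcard and the escaped dot is easy to see side by side. This sketch assumes Python's re module; re.escape is Python's helper for adding the backslashes automatically when the literal text comes from elsewhere (for example, user input).

```python
import re

text = "file.a and data"
# Unescaped, "." is a wildcard; escaped as "\.", it matches only a real period.
wildcard = re.findall(r".a", text)
literal_dot = re.findall(r"\.a", text)

print(wildcard)     # ['.a', ' a', 'da', 'ta']
print(literal_dot)  # ['.a']

# re.escape backslash-escapes special characters in a literal string.
pattern = re.escape("1.5")
print(pattern)  # 1\.5
```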

Character Sets

Character sets let one match one of several different characters, match a range of characters, or exclude a set of characters. Character sets are enclosed in square brackets [] and can be quite powerful once one gets used to them. Whatever it contains, a character set always matches exactly one character. To exclude characters, one starts the set with a caret, as in [^abc]. To specify a range of characters, one uses a hyphen, as in [A-F].

Example of Matching One of Several Different Characters

  1. /m[ae]t/ matches a pattern that starts with an “m”, is followed by either an “a” or an “e”, and ends with “t”. So this expression matches both “mat” and “met”. Note that it does not match “maet”, because a character set matches only a single character, so “ae” in the middle cannot match.
  2. /gr[ae]y/ matches “gr” followed by either an “a” or an “e” and then “y”. Thus both “grey” and “gray” match, but “graay”, “graey” and “greey” do not.
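Here is a minimal check of these sets, assuming Python's re module. re.fullmatch is used so that the pattern has to cover the whole word, which is what the examples above describe.

```python
import re

# A character set matches exactly ONE character from the brackets.
# re.fullmatch succeeds only if the pattern matches the entire string.
words = ["mat", "met", "maet", "mit"]
m_a_or_e_t = [w for w in words if re.fullmatch(r"m[ae]t", w)]

colours = ["grey", "gray", "graay", "graey", "greey"]
grey_or_gray = [c for c in colours if re.fullmatch(r"gr[ae]y", c)]

print(m_a_or_e_t)    # ['mat', 'met']
print(grey_or_gray)  # ['grey', 'gray']
```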

Example of Excluding a Character Set

  1. /M[^eiou]tt/ matches an “M”, followed by any single character other than “e”, “i”, “o” or “u”, followed by “tt”. This matches “Matt” but not “Mott”.
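A negated set allows every character that is not listed, not just other vowels. This sketch (Python's re module, used as an illustrative engine) includes “M9tt” to show that a digit passes as well.

```python
import re

# [^eiou] matches any single character EXCEPT e, i, o and u,
# so digits, punctuation and other letters are all allowed.
names = ["Matt", "Mott", "Mitt", "Mutt", "Mett", "M9tt"]
allowed = [n for n in names if re.fullmatch(r"M[^eiou]tt", n)]

print(allowed)  # ['Matt', 'M9tt']
```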

Example of Using a Range of Characters

  1. /[0-9]/ matches any single digit from 0 to 9. It matches “5” or “9” but not “X”. In the text “59” it matches each digit separately rather than the two-digit number as a whole.
  2. /[a-f]/ matches any single lowercase letter from “a” to “f”. Thus it matches “b” but not “g”.
  3. /M[0-9]/ matches an “M” followed by a single digit from 0 to 9. Thus it matches “M5” but not “MM”.
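The range examples, sketched with Python's re module (an assumed engine for illustration). Note how findall returns “5” and “9” as two separate one-character matches.

```python
import re

# [0-9] matches one digit at a time, so "59" yields two separate matches.
digits = re.findall(r"[0-9]", "59 X")
# [a-f] matches a single lowercase letter from a to f.
a_to_f = [c for c in "abfgz" if re.fullmatch(r"[a-f]", c)]
# "M" followed by exactly one digit.
m_digit = [s for s in ["M5", "MM", "M55"] if re.fullmatch(r"M[0-9]", s)]

print(digits)   # ['5', '9']
print(a_to_f)   # ['a', 'b', 'f']
print(m_digit)  # ['M5']
```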

Repeating Character Sets

To match a character set not just once but several times in a row, one uses a repetition operator. The two most common are “*”, which matches the preceding item zero or more times, and “+”, which matches it one or more times.

Examples of Repeating Character Sets

  1. /[A-Za-z][0-9]*/ matches a letter followed by zero or more digits. When the pattern must match the whole string (written /^[A-Za-z][0-9]*$/), “a7”, “G88” and “H” match, but “a7a”, “G88a” and “9” do not.
  2. /[A-Za-z][0-9]+/ matches a letter followed by one or more digits. When anchored the same way, “a7”, “G88” and “H1” match, but “a7a”, “G88a” and “H” do not. Note that with “*” the pattern accepts just “H”, because the number of digits can be zero, whereas with “+” the “H” must be followed by at least one digit.
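The sketch below, assuming Python's re module, compares “*” and “+” on the same inputs. re.fullmatch plays the role of the ^...$ anchors; the last line shows that an unanchored search still finds a match inside “a7a”.

```python
import re

candidates = ["a7", "G88", "H", "H1", "a7a", "G88a", "9"]
# fullmatch anchors the pattern to the whole string, like ^...$.
star = [s for s in candidates if re.fullmatch(r"[A-Za-z][0-9]*", s)]
plus = [s for s in candidates if re.fullmatch(r"[A-Za-z][0-9]+", s)]

print(star)  # ['a7', 'G88', 'H', 'H1']
print(plus)  # ['a7', 'G88', 'H1']

# Without anchoring, search finds the "a7" at the front of "a7a".
unanchored = re.search(r"[A-Za-z][0-9]*", "a7a").group()
print(unanchored)  # a7
```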

Conditional Statements in Regular Expressions

One can use a conditional statement (if-then-else) in a regular expression. It takes the special form (?(condition)then|else): if the condition is true, the engine tries the “then” part; if it is false, it tries the “else” part. The condition is usually the number of a capturing group, which is true if that group has taken part in the match so far. Some engines, such as PCRE (used by PHP), also accept a lookahead or lookbehind as the condition, because a lookaround returns true or false rather than consuming text. The “|else” part can be omitted, in which case nothing is matched when the condition is false.

Example of a Conditional Statement

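The regular expression /(w)?x(?(1)y|z)/ matches “xz” and “wxy”, but not “xy”. In the text “wxz” it does not match the whole string: the attempt starting at “w” fails, because group 1 matched and so “y” is required. The search then succeeds on the “xz” part instead.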
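Python's re module happens to support group-number conditionals with the same (?(1)then|else) syntax, so the example can be checked directly. This is a sketch assuming Python; note that Python, unlike PCRE, does not accept a lookaround as the condition.

```python
import re

# If group 1 (the optional "w") took part in the match, require "y"; otherwise "z".
pattern = r"(w)?x(?(1)y|z)"

full_xz = re.fullmatch(pattern, "xz") is not None
full_wxy = re.fullmatch(pattern, "wxy") is not None
in_xy = re.search(pattern, "xy")
in_wxz = re.search(pattern, "wxz").group()

print(full_xz, full_wxy)  # True True
print(in_xy)              # None
print(in_wxz)             # xz
```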

Grouping and Backreferences in Regular Expressions

One can group parts of a regular expression together, which is often used to apply an operator (such as a repetition operator) to the entire group. To group parts of a regular expression, one encloses them in parentheses (). Parentheses also create a backreference, which stores the text matched by the part of the expression inside them. A backreference lets one reuse that matched text later in the same expression: “\1” matches the exact text captured by the first group, “\2” the text captured by the second group, and so on.

Examples of Backreferences

  1. /([m-p])a\1/ matches a pattern that starts with “m”, “n”, “o” or “p”, followed by the character “a”. The metacharacter “\1” then requires the same letter that the group captured to appear again; it does not run the [m-p] match a second time. Thus the only valid matches are mam, nan, oao and pap, while strings such as man or pam do not match.
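Both uses of parentheses can be shown in a short sketch, assuming Python's re module: grouping so that “+” repeats a whole group, and a backreference that must repeat the captured text exactly. group(1) is Python's accessor for the text captured by the first group.

```python
import re

# Parentheses group "ab" so that + repeats the whole group, not just "b".
repeats_ab = bool(re.fullmatch(r"(ab)+", "ababab"))
matches_abb = bool(re.fullmatch(r"(ab)+", "abb"))

# \1 must repeat the exact letter that ([m-p]) captured.
pattern = r"([m-p])a\1"
candidates = ["mam", "man", "nan", "oao", "pap", "pam"]
matched = [s for s in candidates if re.fullmatch(pattern, s)]

m = re.search(pattern, "a papaya")

print(repeats_ab, matches_abb)  # True False
print(matched)                  # ['mam', 'nan', 'oao', 'pap']
print(m.group(), m.group(1))    # pap p
```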

Word Boundaries

Word boundaries are used to match whole words rather than parts of words. The metacharacter \b matches at a word boundary: a position between a word character (a letter, digit or underscore) and a non-word character, or at the start or end of the string next to a word character. The negated word boundary \B matches at every position where \b does not, so it acts as the opposite of \b.

Example of a Word Boundary Pattern

  1. /\bis\b/ matches “is” only as a whole word. In the sentence “My name is Matt.” it matches the word “is”, but it would not match the “is” inside “This” or “island”.

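Example of a Negated Word Boundary Pattern

  1. /\Bis\B/ matches “is” only when it sits inside a word, with word characters on both sides. In the sentence “My wish is this.” it matches the “is” in “wish”, but not the standalone word “is” or the “is” at the end of “this”.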
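The effect of \b and \B, sketched with Python's re module as an assumed engine: a plain “is” also hits “This” and “island”, while \bis\b finds only the standalone word, and \Bis\B finds only the “is” buried inside “wish”.

```python
import re

sentence = "This island is big."
plain = re.findall(r"is", sentence)           # also hits "This" and "island"
whole_word = re.findall(r"\bis\b", sentence)  # only the word "is"

inner = re.search(r"\Bis\B", "My wish is this.")

print(len(plain), len(whole_word))  # 3 1
print(inner.start())                # 4, the "is" inside "wish"
```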

Alternation Pattern Matching

Alternation is similar to a character set, where one tries to match one character out of a set. But instead of matching a single character, alternation matches one of several alternative regular expressions, separated by the vertical bar “|”.

Examples of Alternation Pattern Matching

  1. /\b(Matt|Davy)\b/ matches the whole word Matt or the whole word Davy. In the sentence “Matt is my name.” the match is Matt, while in the sentence “Davy is his name.” the match is Davy.
  2. /er|ay/ matches either “er” or “ay”. In the sentence “I have been programming computers for a very long time.” it matches the “er” in “computers” and the “er” in “very”.
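The alternation examples can be run as follows, assuming Python's re module. The extra word “Yesterday” is a hypothetical input added to show the “ay” branch matching as well.

```python
import re

# The group limits the alternation to "Matt" or "Davy"; \b keeps it to whole words.
name_pattern = r"\b(Matt|Davy)\b"
names = [re.search(name_pattern, s).group(1)
         for s in ["Matt is my name.", "Davy is his name."]]

sentence = "I have been programming computers for a very long time."
er_or_ay = re.findall(r"er|ay", sentence)
yesterday = re.findall(r"er|ay", "Yesterday")

print(names)      # ['Matt', 'Davy']
print(er_or_ay)   # ['er', 'er']
print(yesterday)  # ['er', 'ay']
```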

Lookaround

Lookaround constructs do not consume any characters or add them to the match; they only test whether a match is possible at the current position. For this reason they are called zero-width assertions. There are two kinds of lookaround: lookahead and lookbehind.

There are two lookahead varieties: positive lookahead and negative lookahead. A negative lookahead succeeds if what follows the current position does not match a given expression. It takes the form regex(?!regex). For example, q(?!u) matches a “q” that is not followed by a “u”, whether it is followed by another character or by the end of the string. A positive lookahead does the opposite: it succeeds only if what follows does match. It takes the form regex(?=regex). For example, q(?=u) matches a “q” that is followed by a “u”; the “u” itself is not part of the match.

Lookbehind works like lookahead but checks the text before the current position. A negative lookbehind takes the form (?<!regex)regex, whereas a positive lookbehind takes the form (?<=regex)regex. For example, the negative lookbehind (?<!m)a matches an “a” that is not preceded by an “m”, and the positive lookbehind (?<=m)a matches an “a” that is preceded by an “m”. Once one learns to use lookaround constructs, they can be quite useful.
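A minimal sketch of all four lookarounds, assuming Python's re module (which, like many engines, requires a lookbehind to have a fixed width). The sample strings are made up for illustration; m.start() reports where each match begins, and the matched text is only the “q” or “a”, never the character that was looked at.

```python
import re

text = "quit qat Iraq"
# Lookahead: test what follows the "q" without consuming it.
q_not_before_u = [m.start() for m in re.finditer(r"q(?!u)", text)]
q_before_u = [m.start() for m in re.finditer(r"q(?=u)", text)]
matched_text = re.search(r"q(?=u)", text).group()

word = "mama and papa"
# Lookbehind: test what precedes the "a" without consuming it.
a_after_m = [m.start() for m in re.finditer(r"(?<=m)a", word)]
a_not_after_m = [m.start() for m in re.finditer(r"(?<!m)a", word)]

print(q_not_before_u, q_before_u, matched_text)  # [5, 12] [0] q
print(a_after_m, a_not_after_m)                  # [1, 3] [5, 10, 12]
```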

Conclusion

This tutorial is aimed at newcomers to regular expressions and does not cover every aspect of the subject. Its goal is to give beginners a quick overview of regular expressions with plenty of examples, since examples are a good way to help a new person understand the concept. Regular expressions are a powerful and very useful tool for a programmer.

Real World Example

A real-world example where regular expressions are extremely useful is validating input from a submitted form. Below is a small PHP class that does some basic input validation. Please note that the regular expressions used in the class are not perfect but will work in most cases; I didn't want to overcomplicate them, since this is a beginners' tutorial.

<?php

class Validation {

	public function isDecimal($data) {
		// returns true if the data is a decimal number, e.g. "-12.5" or ".5";
		// ^ and $ make the pattern match the whole input, not just part of it
		$pattern = '/^[-+]?[0-9]*\.?[0-9]+$/';
		return $this->match($data, $pattern);
	}

	public function isEmail($data) {
		// returns true if the data looks like an email address:
		// local part, "@", domain, then a top-level domain of at least two letters
		$pattern = '/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,}$/';
		return $this->match($data, $pattern);
	}

	public function isDate($data) {
		// returns true if the data is a date in the format yyyy-mm-dd
		// ("/", "." and " " are also accepted as separators); "#" is used as the
		// pattern delimiter because the pattern itself contains "/"
		$pattern = '#^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$#';
		return $this->match($data, $pattern);
	}

	private function match($data, $pattern) {
		// preg_match returns 1 on a match, 0 on no match and false on error
		return preg_match($pattern, $data) === 1;
	}

}

?>
