PHP Regular Expression Functions

When you use a regular expression, you write a pattern in a special regular expression syntax.  That pattern is then used to search the supplied text, and your code does something if the pattern is found.  It can be used to verify and find credit card numbers, zip codes, and words that start with "t" and end with "s," for example.

PHP has nine functions that help with regular expressions.  I'm going to review each function quickly.  To make things easier, we can group the nine functions into three groups: functions used to test for a pattern, functions used to insert text, and functions used to output patterns found in the text.

FUNCTIONS USED TO TEST

These functions are meant to be used with "if" statements and yield a true value if there is a match, or a false value if there is not.

PREG_MATCH

Preg_match looks to see if there is a match between the regular expression and the string you're testing, like so:

$source = "The quick brown fox jumped over the lazy dog.";
$pattern = "/quick/";  // look for the word 'quick'

if (preg_match($pattern, $source))
{
    //do something
    jump();
}

Preg_match returns 1 if there is a match and 0 if there is not (or false if an error occurred).  It stops searching after it finds the first match.  In the above example, the word "Quick" with a capital Q would not be a match; we'd need the pattern "/quick/i" (the i for case-insensitive) for that.  Preg_match is probably the most used of the preg functions.
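
As a small sketch of the same idea, preg_match also accepts an optional third argument that is filled with the matched text.  The sentence and the variable names $found and $matches below are just for illustration.

```php
<?php
$source = "The Quick brown fox jumped over the lazy dog.";

// The "i" modifier makes the match case-insensitive.
$found = preg_match("/quick/i", $source, $matches);

// $matches[0] holds the matched text exactly as it appears in $source.
echo $found, " ", $matches[0], "\n";   // prints: 1 Quick

// Parentheses capture sub-patterns into $matches[1], $matches[2], ...
if (preg_match("/(\w+) fox/", $source, $matches)) {
    echo $matches[1], "\n";            // prints: brown
}
```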

PREG_LAST_ERROR

This function returns the error code from the last preg_ function you ran.  The error codes are a set of predefined PHP constants.  It is often used with preg_match to see if an error occurred while matching.

preg_match($pattern, $source);

if (preg_last_error() === PREG_RECURSION_LIMIT_ERROR)
{
    echo "Recursion limit was exhausted!";
}
else if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR)
{
    echo "Backtrack limit was exhausted!";
}

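PHP predefines a small set of these error constants, including PREG_NO_ERROR, PREG_INTERNAL_ERROR, PREG_BACKTRACK_LIMIT_ERROR, PREG_RECURSION_LIMIT_ERROR, PREG_BAD_UTF8_ERROR, and PREG_BAD_UTF8_OFFSET_ERROR.  If you are checking for errors, check the PHP manual for the full list of these constants.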
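
One error that is easy to reproduce is invalid UTF-8 in the subject.  The sketch below assumes a deliberately broken one-byte string; it shows that preg_match returns false (not 0) when an error occurs, which is the cue to call preg_last_error.

```php
<?php
// "\xff" is never valid in UTF-8, so matching it with the "u" modifier fails.
$result = preg_match("/./u", "\xff");

if ($result === false) {
    // false means an error occurred, not simply "no match".
    if (preg_last_error() === PREG_BAD_UTF8_ERROR) {
        echo "Subject was not valid UTF-8!\n";
    }
}
```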

FUNCTIONS USED TO INSERT TEXT

PREG_SPLIT

preg_split splits a string into an array, using the pattern as the separator.

$pattern = "//";  // an empty pattern matches between every character
$source = "123456789";

$limit = 4;  // optional - return at most 4 items
$flag = PREG_SPLIT_NO_EMPTY; // optional - a predefined PHP constant.  This one returns only non-empty items

$result = preg_split($pattern, $source, $limit, $flag);

print_r($result);

Which prints:  Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 456789 )

The reason the digits are not all split apart is that we used $limit to cap the result at four items, so the last item holds the rest of the string.  The flag is one of three PHP predefined constants you can use when splitting strings.
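
A more everyday use is splitting on a delimiter pattern.  This is only a sketch; the fruit list and the $words variable are made up for illustration.

```php
<?php
$source = "apples, oranges   grapes,bananas";

// Split on any run of commas and/or whitespace; drop empty pieces.
$words = preg_split("/[\s,]+/", $source, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
// Prints: Array ( [0] => apples [1] => oranges [2] => grapes [3] => bananas )
```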

PREG_QUOTE

Preg_quote is unique in that it does no searching at all; it places a \ in front of any regular expression special characters in a string.  It is set up differently than the other preg functions.

$delimiter =  "/"; // optional - one extra character to escape, usually your pattern delimiter
$source = "The quick brown fox cost me $600 when it bit my dog.";

$result = preg_quote($source, $delimiter);
echo $result;

The result is: The quick brown fox cost me \$600 when it bit my dog\.

The delimiter does not replace the backslash; it names one more character to escape.  Had the text contained a "/", it would have come back as "\/".  This is useful when you need to drop literal text, such as user input, into a pattern.
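
Here is a minimal sketch of that use: building a pattern from a piece of literal text.  The $needle variable is hypothetical and stands in for any text you did not write yourself.

```php
<?php
$source = "The quick brown fox cost me $600 when it bit my dog.";
$needle = '$600';   // literal text to look for (hypothetical input)

// Unescaped, "$" would mean "end of string" inside a pattern.
$pattern = "/" . preg_quote($needle, "/") . "/";

echo $pattern, "\n";                       // prints: /\$600/
echo preg_match($pattern, $source), "\n";  // prints: 1
```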

FUNCTIONS USED TO OUTPUT

PREG_REPLACE

preg_replace performs a regular expression search and replace.

// $pattern      The search pattern
// $replacement  What to use as a replacement
// $source       What to search
// $limit        optional - The maximum number of replacements to do
// $count        optional - Filled in with the number of replacements made
// $result       The string (or array, if $source is an array) that is returned

$result = preg_replace($pattern, $replacement, $source, $limit, $count);

A limit of -1 will do all the replacements.  The entire $source is returned with the replacement strings in place.

This is an interesting function because you can pass several patterns and replacements as arrays.  There is a surprisingly good example on the PHP web site.

$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';

echo preg_replace($patterns, $replacements, $string);

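The result is not what the keys suggest, because preg_replace pairs patterns and replacements in the order the items were added to the arrays, not by key: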

"The bear black slow jumped over the lazy dog."

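If you want the pairs matched up by key instead, one option is to sort both arrays by key with ksort before the call.  A minimal sketch, reusing the same sentence and replacements:

```php
<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array('/quick/', '/brown/', '/fox/');
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';

// Put both arrays in key order so pattern 0 pairs with replacement 0, etc.
ksort($patterns);
ksort($replacements);

$result = preg_replace($patterns, $replacements, $string);
echo $result;
// prints: The slow black bear jumped over the lazy dog.
```
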
The functions in this group hand back a $result, either a modified string or an array of pattern matches, that you use elsewhere in your code.

PREG_FILTER

This is nearly identical to preg_replace: it does a search and replace based on a pattern, but when you pass it an array it filters out the items that don't match and returns only the items where a replacement was made.

$result = preg_filter($pattern, $replacement, $string, $limit, $count);
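
The difference is easiest to see side by side.  In this sketch the $subjects array is made up; the same pattern and replacement are run through both functions.

```php
<?php
$subjects = array("1", "a", "2", "b");
$pattern = '/\d/';
$replacement = 'digit:$0';   // $0 stands for the whole match

// preg_filter keeps only the items that matched (original keys preserved).
$filtered = preg_filter($pattern, $replacement, $subjects);
print_r($filtered);   // Array ( [0] => digit:1 [2] => digit:2 )

// preg_replace returns every item, changed or not.
$replaced = preg_replace($pattern, $replacement, $subjects);
print_r($replaced);   // Array ( [0] => digit:1 [1] => a [2] => digit:2 [3] => b )
```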

PREG_REPLACE_CALLBACK

This works like preg_replace.  It performs a regular expression search and replace, but instead of a $replacement value a callback is specified, and whatever the callback returns is used as the replacement.

$source = "The quick brown fox jumped over the lazy dog.";
$pattern = "/quick/";  // looking for the lowercase word "quick"
$limit = -1;

function theCallBack($matches)
{
    // $matches[0] holds the text that matched; return its replacement
    return strtoupper($matches[0]);
}

$result = preg_replace_callback($pattern, 'theCallBack', $source, $limit, $count);

Every time the word "quick" matches, the callback is fired.  In this case, the word "quick" will be in $matches[0], and $result will contain "QUICK" in its place.
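
The callback earns its keep when the replacement has to be computed from the match.  A short sketch, assuming a made-up sentence and using an anonymous function in place of a named one:

```php
<?php
$source = "3 apples and 10 pears";

// Double every number found in the string.
$result = preg_replace_callback(
    "/\d+/",
    function ($matches) {
        return (string) (((int) $matches[0]) * 2);
    },
    $source,
    -1,
    $count
);

echo $result, "\n";  // prints: 6 apples and 20 pears
echo $count, "\n";   // prints: 2
```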

PREG_MATCH_ALL

Preg_match_all repeatedly matches all occurrences of a pattern in a string and outputs the results to a multidimensional array.  It does not stop after the first occurrence, as preg_match does.  This function is useful for pulling specific information out of a document.  It can be used to pull out all JavaScript source files in a web page, for example.

$source = file_get_contents("http://www.geekgumbo.com");  // Open a web page source
$pattern = "/src=[\"']?([^\"']*\\.(js))[\"']?/i";   // Start with "src=", capture everything up to and including ".js", and end with an optional single or double quote
$result = array();

preg_match_all($pattern, $source, $result);

The result might look something like this.

$result [0][0] -> src="../../js/jquery.js"
$result [0][1] -> src="../../js/script.js"
$result [1][0] -> ../../js/jquery.js
$result [1][1] -> ../../js/script.js
$result [2][0] -> js
$result [2][1] -> js
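
If you would rather have one array per match, preg_match_all accepts the PREG_SET_ORDER flag as a fourth argument.  The sketch below runs against a made-up HTML snippet rather than a live page, and uses a slightly simpler pattern with one capture group.

```php
<?php
$source = '<script src="js/jquery.js"></script><script src="js/script.js"></script>';
$pattern = '/src=["\']?([^"\']*\.js)["\']?/i';

// With PREG_SET_ORDER, $result[0] is the first match, $result[1] the second, ...
$found = preg_match_all($pattern, $source, $result, PREG_SET_ORDER);

foreach ($result as $match) {
    echo $match[1], "\n";   // the captured path
}
// prints js/jquery.js, then js/script.js
```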

If you want to play around with preg_match_all, a perfect way to see all the results easily is using my newchk utility at "http://www.newchk.com".

PREG_GREP

Preg_grep is like the grep command in Linux.  It searches through an array and returns all the matches of a particular pattern into a result array.  Let's look.

$source = array("apples", "apricots", "oranges", "grapes", "bananas");
$pattern = "/^ap/";  //begins with an "ap"
$result = array();

$result = preg_grep($pattern, $source);

print_r($result);
// Prints: "Array ( [0] => apples [1] => apricots )"

In the above example, preg_grep returned all fruits starting with the letters "ap".

There is an interesting option where you can return an array of all the items that do not match the pattern by adding the PREG_GREP_INVERT constant, like so.

$nomatch = array();
$nomatch = preg_grep($pattern, $source, PREG_GREP_INVERT);
print_r($nomatch);

// Prints "Array ( [2] => oranges [3] => grapes [4] => bananas )"

Notice that the result array maintains the original array keys.
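
If the gaps in the keys get in the way, one option is to renumber the result with array_values.  A minimal sketch using the same fruit list (the $reindexed name is just for illustration):

```php
<?php
$source = array("apples", "apricots", "oranges", "grapes", "bananas");

// Keep everything that does NOT start with "ap"; keys 2, 3, 4 are preserved.
$nomatch = preg_grep("/^ap/", $source, PREG_GREP_INVERT);

// Renumber the keys from 0.
$reindexed = array_values($nomatch);

print_r($reindexed);
// Prints: Array ( [0] => oranges [1] => grapes [2] => bananas )
```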

Preg_match is by far the most used regular expression function, followed by preg_replace.  This post is not intended to be a complete write-up on regular expression functions, but rather a quick look-up of the syntax and intent of each preg function while you are writing code.  Check the PHP manual for the definitions of any constants referred to in the article and for further explanation.
