Regular Expression Syntax

In my last post I gave a quick overview of the PHP functions that work with regular expression patterns. Let's take a closer look at the patterns themselves.

I always assign my regular expression to a variable, which makes it easy to change and easy to pass to a function, like so:

$pattern = "/quick/";

if (preg_match($pattern, $text))
{
    // ... do something
}

The above regular expression will find a lower-case quick. The pattern is then passed to a PHP regular expression function such as preg_match (see my last article).
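Here is a minimal runnable version of the snippet above. The sample $text value is made up for illustration, and the strict comparison against 1 is just one reasonable way to check the result.

```php
<?php
// $text is a sample string invented for this example.
$text = 'The quick brown fox';
$pattern = '/quick/';

// preg_match returns 1 on a match, 0 on no match, and false on error.
$found = preg_match($pattern, $text);

// The pattern is case sensitive, so a capital Q does not match.
$foundCapital = preg_match($pattern, 'The Quick brown fox');

if ($found === 1) {
    echo "Found 'quick' in the text\n";
}
```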

Let's talk about this pattern.  The quotes enclose the pattern, as they would any string in PHP. The slashes are delimiters: the first / starts the regular expression and the last / ends it. The only thing allowed after the closing / is one or more modifiers, like so:

$pattern = "/quick/i";

The "i" says ignore case, so now we would match quick, Quick, QUICK, or any other mix of upper and lower case.
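A small sketch of the difference the i modifier makes. The sample string is an assumption made up for the example.

```php
<?php
// Sample text with a capital Q, invented for this example.
$text = 'The Quick brown fox';

// Without a modifier the match is case sensitive.
$caseSensitive = preg_match('/quick/', $text);

// With the i modifier, upper and lower case are treated the same.
$caseInsensitive = preg_match('/quick/i', $text);

echo "case sensitive:   $caseSensitive\n";
echo "case insensitive: $caseInsensitive\n";
```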

If you want to include a / in the pattern, you can escape it with a backslash, \, like so:
/123\/456/  matches 123/456
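The escape can be tried out like this. The second pattern uses # as the delimiter instead of /, which PHP's preg functions also accept. That alternative is not part of the discussion above; it is shown only as a possible way to avoid the backslash. The sample text is made up.

```php
<?php
// Sample text invented for this example.
$text = 'Order 123/456 shipped';

// Escape the slash so it is not read as the closing delimiter.
$escaped = preg_match('/123\/456/', $text);

// With a different delimiter, the slash needs no escaping.
$otherDelimiter = preg_match('#123/456#', $text);

echo "escaped: $escaped, other delimiter: $otherDelimiter\n";
```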

Let's work through the syntax:

/^ar/     ^  matches at the start of the string, so this finds strings starting with ar

/ar$/   $  matches at the end of the string, so this finds strings ending in ar

/a.r/    .   is a wild card that matches any one character (except a newline, by default); this would match aar, abr, acr, adr, ...

/ab*c/   *   the asterisk means zero or more of the preceding character.  This matches ac, abc, abbc, abbbc, ...

/do(es)?/  ?  the question mark matches the preceding group zero or one time.  This matches do or does.
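The patterns above can be checked against a few sample strings. This is only a sketch: the subject strings and the expected results in the table are assumptions chosen for the example. Single quotes are used so PHP never tries to treat a $ in the pattern as a variable.

```php
<?php
// Each row: pattern, subject string, expected preg_match result.
$cases = [
    ['/^ar/',      'arch',  1], // starts with ar
    ['/^ar/',      'bar',   0], // ar is not at the start
    ['/ar$/',      'bar',   1], // ends in ar
    ['/ar$/',      'bark',  0], // ar is not at the end
    ['/a.r/',      'abr',   1], // one character between a and r
    ['/a.r/',      'ar',    0], // nothing between a and r
    ['/ab*c/',     'ac',    1], // zero b's
    ['/ab*c/',     'abbbc', 1], // three b's
    ['/do(es)?/',  'do',    1], // group absent
    ['/do(es)?/',  'does',  1], // group present
];

$results = [];
foreach ($cases as [$pattern, $subject, $expected]) {
    $ok = preg_match($pattern, $subject) === $expected;
    $results[] = $ok;
    printf("%-12s %-8s %s\n", $pattern, $subject, $ok ? 'ok' : 'unexpected');
}
```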

BRACKETS [   ]

Brackets match any one of the characters listed inside them.

/ar[ckt]/  matches arc, ark, and art

/[0-9.-]/  matches any single digit, a dot, or a - sign

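There is an or operator, |, which is used outside of brackets, usually inside a group:
/(abc|xys)/  matches abc or xys

Inside brackets the | is just a literal character, so /[abc|xys]/ matches any one of a, b, c, |, x, y, or s.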
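A quick sketch of how brackets and the | operator behave. Note that | only means "or" outside of brackets; inside brackets it is an ordinary character. The test strings are made up for the example.

```php
<?php
// Brackets: any one of c, k, or t after "ar".
$park = preg_match('/ar[ckt]/', 'park');   // contains "ark"
$arm  = preg_match('/ar[ckt]/', 'arm');    // m is not in the list

// Alternation in a group: the whole word abc or the whole word xys.
$hasAbc   = preg_match('/(abc|xys)/', 'xxabcxx');
$hasXys   = preg_match('/(abc|xys)/', 'xys');
$justAnA  = preg_match('/(abc|xys)/', 'a');

// Inside brackets the | is literal, so a single "a" or "|" matches.
$bracketA    = preg_match('/[abc|xys]/', 'a');
$bracketPipe = preg_match('/[abc|xys]/', '|');

echo "group on 'a': $justAnA, brackets on 'a': $bracketA\n";
```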

NOT CHARACTER ^

There is a negation character inside the brackets, ^, which matches anything but the characters given.  Here the ^ is the first character inside the brackets, not at the start of the pattern, so it doesn't mean "starts with."

/ar[^ckt]/  matches ara, arb, ard, ... but not arc, ark, or art

/[^A-Za-z0-9]/  matches any one character that is not a letter or a digit
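One common use of a negated bracket is stripping unwanted characters with preg_replace. The following is a sketch with made-up input strings, not a recommendation for any particular cleanup rule.

```php
<?php
// A negated bracket still has to match one character.
$ard = preg_match('/ar[^ckt]/', 'ard');  // d is allowed
$art = preg_match('/ar[^ckt]/', 'art');  // t is excluded
$ar  = preg_match('/ar[^ckt]/', 'ar');   // nothing follows ar

// Remove every character that is not a letter or a digit.
$clean = preg_replace('/[^A-Za-z0-9]/', '', 'Hello, World! 123');

echo $clean, "\n";
```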

RANGES  -

Brackets also allow for ranges.
/ar[c-e]/  matches ar followed by c, d, or e, that is: arc, ard, are, but not ara or arf
/[0-9]/    matches any single digit
/[A-Z]/    matches any single capital letter

You can combine ranges:
/[0-9A-Za-z]/  matches any single letter, upper or lower case, or digit.  In the ASCII character table capital letters come before lowercase letters and are separate characters, which is why both A-Z and a-z are listed.
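To see that a range matches one character at a time, preg_match_all can be used to collect every match. This is a small sketch; the sample strings are assumptions.

```php
<?php
// Range c-e after "ar".
$are = preg_match('/ar[c-e]/', 'are');
$arf = preg_match('/ar[c-e]/', 'arf');

// Each digit is a separate match because [0-9] is a single character.
$digitCount = preg_match_all('/[0-9]/', 'Room 42B', $digits);

// Combined ranges: letters and digits, skipping the dash, space, and !.
preg_match_all('/[0-9A-Za-z]/', 'a-B 3!', $alnum);

echo implode(',', $digits[0]), "\n";
echo implode(',', $alnum[0]), "\n";
```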

MULTIPLIERS

There are some special characters that act as multipliers (usually called quantifiers).

If you want to match one or more, use a plus, like this:
/ab+c/  matches abc, abbc, abbbc, abbbbc, ...

You can use multipliers with ranges:
/[a-z]+/  matches one or more lowercase letters.  For example, searching "This one" would first match "his" (the T is upper case, so the match starts at h and stops at the space).

If you want to match zero or one, use a question mark:
/ab?c/  matches ac, abc, and that's it.

You can repeat a group by wrapping it in ( ):
/a(bc)+d/  matches abcd, abcbcd, abcbcbcd, ...

And you can repeat a pattern an exact number of times with the {} quantifier:
/ab{3}c/ matches abbbc
/a(bc){4}d/ matches abcbcbcbcd
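The multipliers above can be tried out like this. It is a sketch only; the test strings are chosen for illustration.

```php
<?php
// + needs at least one b.
$plusTwo  = preg_match('/ab+c/', 'abbc');
$plusNone = preg_match('/ab+c/', 'ac');

// ? allows zero or one b, but not two.
$optOne = preg_match('/ab?c/', 'abc');
$optTwo = preg_match('/ab?c/', 'abbc');

// {3} needs exactly three b's between a and c.
$exact  = preg_match('/ab{3}c/', 'abbbc');
$tooFew = preg_match('/ab{3}c/', 'abbc');

// A group repeated four times.
$group = preg_match('/a(bc){4}d/', 'abcbcbcbcd');

// The third argument receives the matched text.
preg_match('/[a-z]+/', 'This one', $m);
$firstRun = $m[0];

echo "first lowercase run: $firstRun\n";
```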

CHARACTER CLASSES

There are some named character classes, groups of characters represented by a word. They are set off with [: :] and, in PHP's preg functions, they only work inside a pair of brackets, so the full form is [[:name:]].

[[:lower:]] matches lower case letters, a but not A
[[:upper:]] matches upper case letters, A but not a
[[:alpha:]] matches letters of any case, a, A
[[:alnum:]] matches alphanumerics, letters or digits, a, A, 2
[[:space:]] matches any whitespace character, such as a space, tab, or newline
[[:blank:]] matches a space or tab
[[:digit:]]{3} matches any three digits, 012, 213, ...

[[:cntrl:]] matches control characters.  Control characters include null, bell, backspace, horiz tab, line feed, form feed, carriage return, escape, and delete.

[[:punct:]] matches punctuation, such as ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

[[:xdigit:]] matches hexadecimal digits, 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
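A short sketch of the named classes in use. With PHP's preg functions the class name goes inside an outer pair of brackets, as in [[:digit:]]. The sample strings are made up for the example.

```php
<?php
// Three digits in a row.
preg_match('/[[:digit:]]{3}/', 'call 555 now', $m);
$threeDigits = $m[0];

// One capital letter followed by lowercase letters only.
$capitalized = preg_match('/^[[:upper:]][[:lower:]]+$/', 'Hello');

// [[:space:]] also matches a tab, not only a space.
$tabIsSpace = preg_match('/[[:space:]]/', "a\tb");

// Strip punctuation.
$noPunct = preg_replace('/[[:punct:]]/', '', 'Hi, there!');

// A # followed by six hexadecimal digits.
preg_match('/#[[:xdigit:]]{6}/', 'tint #ff00aa', $hex);
$color = $hex[0];

echo "$threeDigits | $noPunct | $color\n";
```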

Regular expressions can be put together into some fairly complex patterns. They'll look so complicated, you'll wonder how they ever work.

Let's do one, a simple US zip code allowing both the 5 and 9 digit zips.  Here goes:

/^([0-9]{5})(-[0-9]{4})?$/

This reads: at the start of the string, match a digit exactly 5 times, as the first group. The second group matches a - followed by a digit exactly 4 times, and the ? after it makes that whole group optional, so it matches zero or one time. Finally, the $ says the string must end there, so nothing may follow the 5 digits or the - and 4 digits.
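The zip code pattern could be wrapped in a small validation function along these lines. The function name isUsZip and the sample values are hypothetical, chosen only for this sketch.

```php
<?php
// isUsZip is a hypothetical helper name used for this example.
function isUsZip(string $zip): bool
{
    // 5 digits, optionally followed by a dash and 4 more digits.
    return preg_match('/^([0-9]{5})(-[0-9]{4})?$/', $zip) === 1;
}

$fiveDigit  = isUsZip('12345');
$nineDigit  = isUsZip('12345-6789');
$tooShort   = isUsZip('1234');
$shortPlus4 = isUsZip('12345-678');
$sixDigits  = isUsZip('123456');

// The groups are available through the third argument of preg_match.
preg_match('/^([0-9]{5})(-[0-9]{4})?$/', '12345-6789', $parts);
$base  = $parts[1];
$plus4 = $parts[2];

echo "base: $base, extension: $plus4\n";
```

One caveat: by default $ also matches just before a single trailing newline, so input read from a file or form may need trimming first, or the D modifier can be added to the pattern.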

Online Test Tool

So you don't get lost in building your regular expression, there is an online regular expression test tool, here, that I'd like to recommend.  It lets you enter a pattern and a test string and see whether your pattern works before you use it in your code.  The tester, along with a good basic knowledge of regular expression syntax, will go a long way toward making searching and validating your data a lot easier.

 
