PHP Regular Expressions

Match patterns in strings with PHP's PCRE engine. Master preg_match, preg_replace, capture groups, named groups, and common production-ready patterns.


Pattern Syntax

PHP regex patterns are wrapped in delimiters (commonly /, #, or ~) followed by optional modifiers:

/pattern/modifiers
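As a small illustration of delimiter choice, the sketch below matches the same URL prefix once with / and once with # as the delimiter. The sample URL and variable names are made up for the example.

```php
<?php
$url = "https://example.com/docs/intro";

// With / as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match('/^https:\/\/[^\/]+\/docs\//', $url);

// With # as the delimiter, the slashes can stay as they are.
$withHash = preg_match('#^https://[^/]+/docs/#', $url);

var_dump($withSlashes, $withHash);   // int(1) and int(1)

// Modifiers go after the closing delimiter: i makes the match case-insensitive.
$caseInsensitive = preg_match('#^HTTPS://#i', $url);
var_dump($caseInsensitive);          // int(1)
```

Both patterns are equivalent; the second is simply easier to read because nothing needs escaping.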

Token        Meaning
.            Any single character (except newline)
\d  \D       Digit / non-digit
\w  \W       Word character [A-Za-z0-9_] / non-word character
\s  \S       Whitespace / non-whitespace
^  $         Start / end of string (or of each line with the m modifier)
*  +  ?      Zero or more / one or more / zero or one
{n}  {n,m}   Exactly n / between n and m repetitions
[...]        Character class
(...)        Capture group
(?:...)      Non-capturing group
|            Alternation (OR)
\b           Word boundary
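
To see a few of these tokens working together, here is a short sketch. The sample strings are invented for illustration.

```php
<?php
// \b keeps "cat" from matching inside "concatenate".
$inside = preg_match('/\bcat\b/', 'concatenate');
$alone  = preg_match('/\bcat\b/', 'the cat sat');
var_dump($inside, $alone);   // int(0) and int(1)

// {n,m} bounds a repetition; ^ and $ anchor the whole string.
$threeDigits = preg_match('/^\d{2,4}$/', '123');
$fiveDigits  = preg_match('/^\d{2,4}$/', '12345');
var_dump($threeDigits, $fiveDigits);   // int(1) and int(0)

// (?:...) groups without capturing, so only the (...) group fills $m[1].
preg_match('/(?:Mr|Ms)\.?\s(\w+)/', 'Ms. Lovelace', $m);
print_r($m);   // [0] => "Ms. Lovelace", [1] => "Lovelace"
```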

preg_match - Test & Capture

PHP
<?php
$text = "The order #12345 was placed";

// Just test
if (preg_match("/order #\d+/", $text)) {
    echo "Matches!";
}

// Capture groups
if (preg_match("/order #(\d+)/", $text, $m)) {
    echo $m[0];   // "order #12345"  (whole match)
    echo $m[1];   // "12345"         (first group)
}

// Multiple groups
$date = "2024-11-15";
if (preg_match("/(\d{4})-(\d{2})-(\d{2})/", $date, $m)) {
    [$full, $year, $month, $day] = $m;
}

preg_match_all - All Hits

PHP
<?php
$html = '<a href="/a">A</a> <a href="/b">B</a>';

preg_match_all("/href=\"([^\"]+)\"/", $html, $m);
print_r($m[1]);     // ["/a", "/b"]

// PREG_SET_ORDER groups by match instead of by capture group
preg_match_all("/(\w+):(\d+)/", "a:1 b:2 c:3", $m, PREG_SET_ORDER);
// $m = [["a:1","a","1"], ["b:2","b","2"], ["c:3","c","3"]]

preg_replace - Substitute

PHP
<?php
// Replace with $1 backreferences
echo preg_replace("/(\w+)@(\w+)/", "$1 at $2", "user@host");
// "user at host"

// Multiple patterns at once
$out = preg_replace(
    ["/foo/", "/bar/"],
    ["FOO", "BAR"],
    "foo and bar"
);   // "FOO and BAR"

// Callback - transform each match
$out = preg_replace_callback("/\b(\w+)\b/", function ($m) {
    return ucfirst($m[1]);
}, "hello world");
// "Hello World"

preg_split - Split

PHP
<?php
// Split on any whitespace
$words = preg_split("/\s+/", "  hello   world  foo");
// ["", "hello", "world", "foo"]

// Skip empty pieces
$words = preg_split("/\s+/", "  hello   world  ", -1, PREG_SPLIT_NO_EMPTY);
// ["hello", "world"]

// Split on multiple delimiters
$parts = preg_split("/[,;\s]+/", "a, b; c\td");
// ["a", "b", "c", "d"]

Named Capture Groups

PHP
<?php
$pattern = "/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/";
preg_match($pattern, "2024-11-15", $m);

echo $m["year"];     // 2024
echo $m["month"];    // 11
echo $m["day"];      // 15

// Named groups are still numbered, and preg_replace() replacements accept only numeric references
echo preg_replace($pattern, '$3/$2/$1', "2024-11-15");   // 15/11/2024

Pattern Modifiers

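Modifier  Effect
i         Case-insensitive matching
m         Multiline: ^ and $ also match at the start and end of each line
s         . matches newlines too
u         UTF-8 mode: pattern and subject are treated as UTF-8 (always use for text)
x         Extended: whitespace in the pattern is ignored and # starts a comment
U         Ungreedy: quantifiers are lazy by default, and a trailing ? makes them greedy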
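
A brief sketch of the i, x, and U modifiers in action; the sample inputs and variable names are illustrative only.

```php
<?php
// i: case-insensitive
$found = preg_match('/php/i', 'I like PHP');
var_dump($found);   // int(1)

// x: whitespace and # comments inside the pattern are ignored
$datePattern = '/
    (\d{4})   # year
    -
    (\d{2})   # month
/x';
preg_match($datePattern, 'Released 2024-11', $m);
var_dump($m[1], $m[2]);   // "2024" and "11"

// U: quantifiers become lazy, so .+ stops at the first closing quote
preg_match('/"(.+)"/',  'say "a" then "b"', $greedy);
preg_match('/"(.+)"/U', 'say "a" then "b"', $lazy);
var_dump($greedy[1]);   // a" then "b
var_dump($lazy[1]);     // a
```

In practice a lazy quantifier such as .+? on the one place that needs it is often clearer than flipping the default for the whole pattern with U.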

Common Patterns

PHP
<?php
// Email (prefer filter_var for production)
$email = '/^[\w.+-]+@[\w-]+(\.[\w-]+)+$/';

// Strong password - 8+ chars, mixed case, digit, symbol
$password = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w]).{8,}$/';

// URL
$url = '#^https?://[\w.-]+(:\d+)?(/[^\s]*)?$#i';

// Phone (US: +1-555-123-4567 or variants)
$phone = '/^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/';

// IPv4 (shape only - does not check that each octet is 0-255)
$ipv4 = '/^(\d{1,3}\.){3}\d{1,3}$/';

// Hex color
$hexColor = '/^#([\da-f]{3}|[\da-f]{6})$/i';

// Slug (URL-safe)
$slug = '/^[a-z0-9]+(-[a-z0-9]+)*$/';

// HTML tag (use a real HTML parser instead!)
// Single quotes matter here: in a double-quoted string, \1 is an octal escape.
$htmlTag = '/<([a-z]+)([^>]*)>(.*?)<\/\1>/is';

Don't parse HTML with regex

HTML is not a regular language. Use DOMDocument or a library like symfony/dom-crawler; SimpleXML is an option only when the markup is well-formed XML. Regex on HTML works for trivial cases and breaks the moment markup gets nested or quoted weirdly.
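
For comparison, here is a minimal sketch of pulling links out of markup with DOMDocument rather than a pattern. It assumes the dom extension is available (it ships with most PHP builds); the sample markup is invented.

```php
<?php
$html = '<p><a href="/a">A</a> <a href="/b" class="x">B <b>bold</b></a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // textContent flattens nested tags, which a regex would trip over.
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);   // ["/a" => "A", "/b" => "B bold"]
```

The extra attribute and the nested tag in the second link are handled without any change to the code, which is the point of using a parser.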

Frequently Asked Questions

When should I use regex instead of string functions?

Use string functions (str_contains, str_replace) when matching literal text - they are usually faster and easier to read. Reach for regex only when you need a pattern: variable digits, optional parts, alternatives, or extracting groups.
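
A rough sketch of that split, using an invented log line. Note that str_contains requires PHP 8 or later.

```php
<?php
$line = "ERROR 2024-11-15 disk full on /dev/sda1";

// Literal text: plain string functions are enough.
$isError = str_contains($line, "ERROR");
$masked  = str_replace("/dev/sda1", "[device]", $line);

// A variable pattern (any ISO-style date): this is where regex earns its place.
$hasDate = preg_match('/\b(\d{4})-(\d{2})-(\d{2})\b/', $line, $m);

var_dump($isError);   // bool(true)
echo $masked, "\n";   // ERROR 2024-11-15 disk full on [device]
var_dump($m[1]);      // "2024"
```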

Why do PHP patterns need delimiters?

PHP's PCRE functions inherit Perl's syntax, which wraps the pattern in delimiters: /pattern/i. Anything after the closing delimiter is a modifier (i, m, s, u, x). Pick a delimiter that doesn't appear in the pattern (#, ~, {}) to avoid escaping it.

Should I always use the u modifier?

For any pattern that might touch UTF-8 text - yes. Without it, . matches a single byte, which can split a multibyte character. /pattern/u treats both the pattern and the subject as UTF-8.
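
The difference is easy to observe with a single accented character. This sketch builds the character with a \u{...} escape (PHP 7+) so it does not depend on how the file is saved; the variable names are illustrative.

```php
<?php
$char = "\u{E9}";   // "é", two bytes in UTF-8

// Without u, . consumes one byte, so ^.$ cannot cover a 2-byte character.
$byteMode = preg_match('/^.$/', $char);
var_dump($byteMode);    // int(0)

// With u, . consumes one whole code point.
$utf8Mode = preg_match('/^.$/u', $char);
var_dump($utf8Mode);    // int(1)

// With u, a subject that is not valid UTF-8 makes preg_match return false.
$invalid = preg_match('/./u', "\xFF");
$error   = preg_last_error();
var_dump($invalid);                          // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR);    // bool(true)
```

The last case is worth remembering: with u enabled, input of unknown encoding should be validated or the false return value checked.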

How do I match across multiple lines?

Use the s modifier so . matches newlines too. Use the m modifier so ^ and $ match at line boundaries instead of only at the start and end of the string.
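
A small sketch of both modifiers on an invented three-line string:

```php
<?php
$text = "first line\nsecond line\nthird line";

// Without s, . stops at a newline, so this cannot span two lines.
$withoutS = preg_match('/first.*second/', $text);
$withS    = preg_match('/first.*second/s', $text);
var_dump($withoutS, $withS);   // int(0) and int(1)

// Without m, ^ only matches at the very start of the string.
preg_match_all('/^\w+/', $text, $plain);
preg_match_all('/^\w+/m', $text, $multi);
print_r($plain[0]);   // ["first"]
print_r($multi[0]);   // ["first", "second", "third"]
```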

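Can a regex be a performance or security risk?

Yes - badly written patterns with nested quantifiers like (a+)+ can backtrack catastrophically on malicious input. Cap the damage with pcre.backtrack_limit and pcre.recursion_limit in php.ini, avoid nested quantifiers, and check preg_last_error() whenever a preg_* call returns false or null, since that is how a hit limit is reported.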
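
The sketch below shows how a hit limit surfaces in code. To keep the demo quick and repeatable it makes two assumptions that are not production settings: it switches the PCRE JIT off and sets a deliberately tiny backtrack limit at runtime. Real limits would normally live in php.ini and be far higher.

```php
<?php
// Demo-only settings (assumptions): JIT off, very small backtrack limit.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

$subject = str_repeat('a', 30) . 'b';

// Nested quantifiers: the engine tries a huge number of ways to split the a's
// before it can conclude that the trailing "b" prevents a match.
$risky = preg_match('/^(a+)+$/', $subject);
$error = preg_last_error();
var_dump($risky);                                  // bool(false)
var_dump($error === PREG_BACKTRACK_LIMIT_ERROR);   // bool(true)

// Same intent without nesting: fails fast and reports a normal "no match".
$safe = preg_match('/^a+$/', $subject);
var_dump($safe);                                   // int(0)
```

Note the difference between false (the engine gave up) and 0 (the pattern genuinely did not match); a loose check such as if (!preg_match(...)) treats them the same.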