Regular Expressions Cheat Sheet with Real-World Examples
Regular expressions are one of those tools that feel cryptic until they suddenly click — and then you start seeing patterns everywhere. They show up in every language and half the Unix tools you use daily. This cheat sheet covers the syntax you’ll reach for most often, with examples grounded in real tasks rather than contrived strings.
The Building Blocks
Literals and the Dot
hello matches the exact string "hello"
hel.o matches "hello", "helxo", "hel1o" — dot matches any character except newline
hel\.o matches "hel.o" literally — backslash escapes the dot
Anchors
^hello matches "hello" at the start of a line
world$ matches "world" at the end of a line
^hello world$ matches the entire line "hello world"
\bword\b matches "word" as a whole word (word boundary)
Character Classes
[aeiou] any single vowel
[^aeiou] any character that is NOT a vowel
[a-z] any lowercase letter
[A-Za-z0-9] any alphanumeric character
[a-z_][a-z0-9_]* valid Python/JS identifier start
Shorthand Classes
\d digit: [0-9]
\D non-digit: [^0-9]
\w word character: [A-Za-z0-9_]
\W non-word character
\s whitespace: space, tab, newline
\S non-whitespace
Quantifiers
a? zero or one "a" (optional)
a* zero or more "a"
a+ one or more "a"
a{3} exactly 3 "a"s
a{2,5} between 2 and 5 "a"s
a{3,} 3 or more "a"s
By default, quantifiers are greedy — they match as much as possible. Add ? to make them lazy:
<.+> greedy: matches entire "<b>bold</b> and <i>italic</i>"
<.+?> lazy: matches "<b>", then "</b>", etc.
Groups and Alternation
(cat|dog) matches "cat" or "dog"
(gr(a|e)y) matches "gray" or "grey"
(?:non-capturing) groups without capturing
Python: re Module
import re
text = "Order #12345 placed on 2024-03-15"
# Search anywhere in the string
m = re.search(r"\d{4}-\d{2}-\d{2}", text)
if m:
print(m.group()) # 2024-03-15
# Find all matches
ids = re.findall(r"#(\d+)", text)
print(ids) # ['12345']
# Substitute
result = re.sub(r"\d{4}-\d{2}-\d{2}", "[REDACTED]", text)
print(result) # Order #12345 placed on [REDACTED]
# Compile for repeated use
pattern = re.compile(r"^\d{3}-\d{2}-\d{4}$")
print(bool(pattern.match("123-45-6789"))) # True
Named groups make captures readable:
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
if m:
print(m.group("year")) # 2024
print(m.group("month")) # 03
JavaScript
const text = "Contact us at support@example.com or sales@company.org"
// Test if a pattern matches
const emailPattern = /\b[\w.+-]+@[\w-]+\.[a-z]{2,}\b/gi
console.log(emailPattern.test(text)) // true
// Find all matches
const emails = text.match(emailPattern)
console.log(emails) // ['support@example.com', 'sales@company.org']
// Replace
const redacted = text.replace(emailPattern, "[EMAIL]")
console.log(redacted) // Contact us at [EMAIL] or [EMAIL]
// Named groups (ES2018+)
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const m = "2024-03-15".match(dateRe)
console.log(m.groups.year) // 2024
grep
# Basic pattern
$ grep "error" /var/log/app.log
# Extended regex (-E) — enables +, ?, |, ()
$ grep -E "error|warn" /var/log/app.log
# Case-insensitive
$ grep -i "error" /var/log/app.log
# Show line numbers
$ grep -n "timeout" /var/log/app.log
# Invert match (lines that don't match)
$ grep -v "DEBUG" /var/log/app.log
# Count matches
$ grep -c "error" /var/log/app.log
# Show only the matching part
$ grep -oE "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" access.log
# Match whole words
$ grep -w "fail" /var/log/app.log
sed
# Substitute first match per line
$ sed 's/foo/bar/' file.txt
# Substitute all matches (global flag)
$ sed 's/foo/bar/g' file.txt
# Edit in place
$ sed -i 's/localhost/db.internal/g' config.yml
# Delete lines matching a pattern
$ sed '/^#/d' config.conf # remove comment lines
$ sed '/^$/d' file.txt # remove blank lines
# Print lines matching a pattern
$ sed -n '/ERROR/p' app.log
# Extract a capture group (GNU sed)
$ echo "Date: 2024-03-15" | sed -E 's/Date: ([0-9-]+)/\1/'
# 2024-03-15
Lookaheads and Lookbehinds
Lookarounds match a position without consuming characters — they assert what’s around the match but don’t include it in the result.
price(?=\s*USD) matches "price" only when followed by optional space then "USD"
price(?!\s*EUR) matches "price" NOT followed by "EUR"
(?<=\$)\d+ matches digits preceded by "$" — but "$" not in result
(?<!\d)\d{4} matches 4-digit number not preceded by another digit
Python example:
import re
prices = "Total: $150 USD, Tax: €20 EUR"
# Extract only USD amounts
usd = re.findall(r"\$(\d+)(?=\s*USD)", prices)
print(usd) # ['150']
Common Patterns
# Email (simplified)
r"[\w.+-]+@[\w-]+\.[a-z]{2,}"
# IPv4 address
r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
# URL
r"https?://[^\s/$.?#].[^\s]*"
# Date YYYY-MM-DD
r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"
# Indian phone number
r"(?:\+91[-\s]?)?[6-9]\d{9}"
# Hex color
r"#(?:[0-9a-fA-F]{3}){1,2}\b"
# Semantic version
r"\d+\.\d+\.\d+(?:-[\w.]+)?"
# Lines with only whitespace
r"^\s*$"
Flags Reference
| Flag | Python | JS | Effect |
|---|---|---|---|
| Case-insensitive | re.I |
i |
[a-z] matches uppercase too |
| Multiline | re.M |
m |
^/$ match line start/end |
| Dot-all | re.S |
s |
. matches newline |
| Verbose | re.X |
— | Allows whitespace and comments in pattern |
| Global | — | g |
Find all matches (not just first) |
Conclusion
The biggest payoff from learning regex is recognizing when to stop: regex is excellent for validating and extracting patterns in single-line text, but it’s the wrong tool for parsing HTML, JSON, or nested structures. For those, use a proper parser. Within its domain — log analysis, input validation, search-and-replace, text extraction — regex remains one of the most expressive tools in a developer’s kit. Keep this cheat sheet nearby, and you’ll find you reach for it more than you expect.