# Pi-hole regular expressions tutorial
We provide a short but thorough introduction to our regular expressions implementation. This may come in handy if you are designing rules to deny or allow domains (see also our cheat sheet below!). In our implementation, all characters match themselves except for the following special characters: `.[]{}()\*+?|^$`. If you want to match those, you need to escape them like `\.` for a literal period, but no rule without exception (see character groups below for further details).
## Anchors (`^` and `$`)
First of all, we look at anchors that can be used to indicate the start or the end of a domain, respectively. If you don't specify anchors, the match may be partial (see examples below).
Example | Interpretation
--- | ---
`domain` | **partial match**. Without anchors, a text may appear anywhere in the domain. This matches `some.domain.com`, `domain.com` and `verylongdomain.com` and more
`^localhost$` | **exact match** matching *only* `localhost` but neither `a.localhost` nor `localhost.com`
`^abc` | matches any domain **starting** (`^`) in "abc" like `abcdomain.com`, `abc.domain.com` but not `def.abc.com`
`com$` | matches any domain **ending** (`$`) in "com" such as `domain.com` but not `domain.com.co.uk`
## Wildcard (`.`)
An unescaped period stands for any *single* character.
Example | Interpretation
--- | ---
`^domain.$` | matches `domaina`, `domainb`, `domainc`, but not `domain`
## Bounds and multipliers (`{}`, `*`, `+`, and `?`)
With bounds, one can denote the number of times something has to occur:
Bound | Meaning
--- | ---
`ab{4}` | matches a domain that contains a single `a` followed by four `b` (matching only `abbbb`)
`ab{4,}` | matches a domain that contains a single `a` followed by *at least* four `b` (matching also `abbbbbbbb`)
`ab{3,5}` | matches a domain that contains a single `a` followed by three to five `b` (matching only `abbb`, `abbbb`, and `abbbbb`)
Multipliers are shortcuts for some of the bounds that are needed most often:
Multipliers | Bounds equivalent | Meaning
--- | --- | ---
`?` | `{0,1}` | never or once (optional)
`*` | `{0,}` | never or more (optional)
`+` | `{1,}` | once or more (mandatory)
To illustrate the usefulness of multipliers (and bounds), we provide a few examples:
Example | Interpretation
--- | ---
`^r-*movie` | matches a domain like `r------movie.com` where the number of dashes can be arbitrary (also none)
`^r-?movie` | matches only the domains `rmovie.com` and `r-movie.com` but not those with more than one dash
`^r-+movie` | matches only the domains with at least one dash, i.e., not `rmovie.com`
`^a?b+` | matches domains like `abbbb.com` (zero or one `a` at the beginning followed by one or more `b`)
## Character groups (`[]`)
With character groups, a set of characters can be matched:
Character group | Interpretation
--- | ---
`[abc]` | matches `a`, `b`, or `c` (using explicitly specified characters)
`[a-c]` | matches `a`, `b`, or `c` (using a *range*)
`[a-c]+` | matches any non-zero number of `a`, `b`, `c`
`[a-z]` | matches any single lowercase letter
`[a-zA-Z]` | matches any single letter
`[a-z0-9]` | matches any single lowercase letter or any single digit
`[^a-z]` | **Negation** matching any single character *except* lowercase letters
`abc[0-9]+` | matches the string `abc` followed by a number of arbitrary length
Bracket expressions are an exception to the character escape rule. Inside them, all special characters, including the backslash (`\`), lose their special powers, i.e. they match themselves exactly. Furthermore, to include a literal `]` in the list, make it the first character (like `[]]` or `[^]]` if negated). To include a literal `-`, make it the first or last character, or the second endpoint of a range (e.g. `[a-z-]` to match `a` to `z` and `-`).
## Groups (`()`)
Using groups, we can enclose regular expressions, they are most powerful when combined with bounds or multipliers (see also alternations below).
Example | Interpretation
--- | ---
`(abc)` | matches `abc` (trivial example)
`(abc)*` | matches zero or more copies of `abc` like `abcabc` but not `abcdefabc`
`(abc){1,3}` | matches one, two or three copies of `abc`: `abc`, `abcabc`, `abcabcabc` but nothing else
## Alternations (`|`)
Alternations can be used as an "or" operator in regular expressions.
Example | Interpretation
--- | ---
`(abc)|(def)` | matches `abc` *and* `def`
`domain(a|b)\.com` | matches `domaina.com` and `domainb.com` but not `domain.com` or `domainx.com`
`domain(a|b)*\.com` | matches `domain.com`, `domainaaaa.com` `domainbbb.com` but not `domainab.com` (any number of `a` or `b` in between `domain` and `.com`)
## Character classes (`[:class:]`)
In addition to character groups, there are also some special character classes available, such as
Character class | Equivalent | Pi-hole specific | Interpretation
--------------- | ---------------- | ---------------- | ---------------
`[[:digit:]]` | `[0-9]` | No | digits
`[[:lower:]]` | `[a-z]` | No | lowercase letters[^*]
`[[:upper:]]` | `[A-Z]` | No | uppercase letters[^*]
`[[:alpha:]]` | `[A-Za-z]` | No | alphabetic characters[^*]
`[[:alnum:]]` | `[A-Za-z0-9]` | No | alphabetic characters[^*] and digits
`[[:blank:]]` | `[ \t]` | Yes | blank characters
`[[:cntrl:]]` | N/A | Yes | control characters
`[[:graph:]]` | N/A | Yes | all printable characters except space
`[[:print:]]` | N/A | Yes | printable characters including space
`[[:punct:]]` | N/A | Yes | printable characters not space or alphanumeric
`[[:space:]]` | `[ \f\n\r\t\v]` | Yes | white-space characters
`[[:xdigit:]]` | `[0-9a-fA-F]` | Yes | hexadecimal digits
///Footnotes Go Here///
[^*]: FTL matches case-insensitive by default as case does not matter in domain names
Note that character classes are abbreviations, they need to be used in character groups, i.e., enclosed in `[]`. As such, the equivalent of `[0-9]` would be `[[:digit:]]`, *not* `[:digit:]`. It is allowed to mix character classes with classical character groups. For example, `[a-z0-9]` is identical to `[a-z[:digit:]]`.
# Advanced examples
After going through our quick tutorial, we provide some more advanced examples so you can test your knowledge.
## Block domain with only numbers
```text
^[0-9][^a-z]+\.((com)|(edu))$
```
Blocks domains containing only numbers (no letters) and ending in `.com` or `.edu`. This blocks `555661.com`, and `456.edu`, but not `555g555.com`
### Block domains without subdomains
```text
^[a-z0-9]+([-]{1}[a-z0-9]+)*\.[a-z]{2,7}$
```
A domain name shall not start or end with a dash but can contain any number of them. It must be followed by a TLD (we assume a valid TLD length of two to seven characters)
# Cheatsheet
Expression | Meaning | Example
------------ | ------------- | -----------
`^` | Beginning of string | `^client` matches strings that begin with `client`, such as `client.server.com` but not `more.client.server.com` (exception: within a character range (`[]`) `^` means negation)
`$` | End of string | `ing$` matches `exciting` but not `ingenious`
`*` | Match zero or more of the previous | `ah*` matches `ahhhhh` or `a`
`?` | Match zero or one of the previous | `ah?` matches `a` or `ah`
`+` | Match one or more of the previous | `ah+` matches `ah` or `ahhh` but not `a`
`.` | Wildcard character, matches any character | `do.*` matches `do`, `dog`, `door`, `dot`, etc.;
`do.+` matches `dog`, `door`, `dot`, etc. but not `do` (wildcard with `+` requires at least one extra character for matching)
`( )` | Group | Enclose regular expressions, see the example for `|`
`|` | Alternation | `(mon|tues)day` matches `monday` or `tuesday` but not `friday` or `mondiag`
`[ ]` | Matches a range of characters | `[cbf]ar` matches `car`, `bar`, or `far`;
`[^]`| Negation | `[^0-9]` matches any character *except* `0` to `9`
`{ }` | Matches a specified number of occurrences of the previous | `[0-9]{3}` matches any three-digit number like `315` but not `31`;
`[0-9]{2,4}` matches two- to four-digit numbers like `12`, `123`, and `1234` but not `1` or `12345`;
`[0-9]{2,}` matches any number with two or more digits like `1234567`, `123456789`, but not `1`
`\` | Used to escape a special character not inside `[]` | `google\.com` matches `google.com`