# Fossil's Internal 'grep' Command
As of Fossil 2.7, there is a `grep` command which acts something like
POSIX's `grep -E` over all historical versions of a single file name.
This document explains the commonalities and divergences between POSIX
`grep` and Fossil `grep`.
## Options
Fossil `grep` supports only a small subset of the options specified for
POSIX `grep`:
| Option | Meaning
|--------|-------------------------------------------------------------
| `-i` | ignore case in matches
| `-l` | list a checkin ID prefix for matching historical versions of the file
| `-v` | print each checkin ID considered, regardless of whether it matches
No equivalent of other POSIX `grep` options currently exist. Patches to
remove those limitations will be thoughtfully considered.
## Regular Expression Dialect
Fossil contains a built-in regular expression engine implementing a
subset of the [POSIX extended regular expression][ere] dialect:
[ere]: https://en.wikipedia.org/wiki/Regular_expression#POSIX_extended
| Atom | Meaning
|---------|-------------------------------------------------------------
| `X*` | zero or more occurrences of X
| `X+` | one or more occurrences of X
| `X?` | zero or one occurrences of X
| `X{p,q}`| between p and q occurrences of X, inclusive
| `(X)` | match X
| <tt>X\|Y</tt>| X or Y
| `^X` | X occurring at the beginning of a line
| `X$` | X occurring at the end of a line
| `.` | Match any single character
| `\c` | Character `c` where `c` is one of <tt>{}()[]\|\*+?.</tt>
| `\c` | C-language escapes for `c` in `afnrtv`. ex: `\t` or `\n`
| `\uXXXX`| Where XXXX is exactly 4 hex digits, Unicode value XXXX
| `\xXX` | Where XX is exactly 2 hex digits, Unicode value XX
| `[abc]` | Any single character from the set `abc`
| `[^abc]`| Any single character not in the set `abc`
| `[a-z]` | Any single character in the range `a-z`
| `[^a-z]`| Any single character not in the range `a-z`
| `\b` | Word boundary
| `\w` | Word character: `[A-Za-z0-9_]`
| `\W` | Non-word character
| `\d` | Digit
| `\D` | Non-digit
| `\s` | Whitespace character
| `\S` | Non-whitespace character
There are several restrictions in Fossil `grep` relative to a fully
POSIX compatible regular expression engine. Among them are:
* There is currently no support for POSIX character classes such as
`[:lower:]`.
* Fossil does not currently attempt to take your operating system's
locale settings into account when doing this match. Fossil also
currently has no way to mark a given file as having a particular
encoding.
Instead, Fossil `grep` assumes the input files are in UTF-8 format,
so it will not work correctly if the files in your repository are in
an encoding that is not backwards-compatible with ASCII, such as
UTF-16. Partially compatible encodings such as ISO 8859 should work
with Fossil `grep` as long as you stick to their ASCII-compatible
subset.
The Fossil `grep` language is not a strict subset of POSIX extended
regular expressions. Some of the features documented above are
well-understood extensions to it, such as the "word" features `\b`, `\w`
and `\W`.
Fossil `grep` is based on the Unicode engine from [SQLite's FTS5
feature][fts5]. This means it does do things like Unicode-aware case
folding. Therefore, it is usually a user error to attempt to substitute
`[a-z]` for a lack of the POSIX character class `[:lower:]` if you are
grepping over pretty much any human written language other than English.
Use `fossil grep -i` instead, which does Unicode case folding.
[fts5]: https://www.sqlite.org/fts5.html
## Algorithm Details
Fossil `grep` uses a [nondeterministic finite automaton][nfa] for
matching, so the performance is bounded by ***O(nm)*** where ***n*** is
the size of the regular expression and ***m*** is the size of the input
string.
[nfa]: https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton
In order to avoid [ReDoS attacks][redos], the Fossil regular expression
matcher was purposely written to avoid [implementation choices][rei]
which have the potential to require exponential evaluation time. This
constrains the possible feature set we can support in the Fossil `grep`
dialect. For instance, we are unlikely to ever add support for
[backtracking][bt].
[redos]: https://en.wikipedia.org/wiki/ReDoS
[rei]: https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times
[bt]: https://en.wikipedia.org/wiki/Backtracking
The `X{p,q}` operator expands to `p` copies of `X` followed by `q-p`
copies of `X?` before RE evaluation. The ***O(nm)*** performance bound
above remains true for this case, but realize that it applies to the RE
*after* this expansion, not to the form as given by the user. In other
words, as `q-p` increases, so does the RE evaluation time.