# File Name Glob Patterns

A [glob pattern][glob] is a text expression that matches one or more
file names using wild cards familiar to most users of a command line.
For example, `*` is a glob that matches any name at all and
`Readme.txt` is a glob that matches exactly one file.

Note that although both are notations for describing patterns in text,
glob patterns are not the same thing as a [regular expression or
regexp][regexp].

[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia)
[regexp]: https://en.wikipedia.org/wiki/Regular_expression

commas, it can be quoted with either single or double quotation marks.
A list is said to match if any one (or more) globs in the list
matches.

A glob pattern is a collection of characters compared to a target
text, usually a file name. The whole glob is said to match if it
successfully consumes and matches the entire target text. Glob
patterns are made up of ordinary characters and special characters.

Ordinary characters consume a single character of the target and must
match it exactly.

Special characters (and special character sequences) consume zero or
more characters from the target and describe what matches. The special
characters (and sequences) are:

* `*` Matches any sequence of zero or more characters;
* `?` Matches exactly one character;
* `[...]` Matches one character from the enclosed list of characters; and
* `[^...]` Matches one character not in the enclosed list.

Special character sequences have some additional features:

* A range of characters may be specified with `-`, so `[a-d]` matches
  exactly the same characters as `[abcd]`. Ranges reflect Unicode
  code points without any locale-specific collation sequence.
* Include `-` in a list by placing it last, just before the `]`.
* Include `]` in a list by making it the first character after the
  `[` or `[^`. At any other place, `]` ends the list.
* Include `^` in a list by placing it anywhere except first after the
  `[`.
* Beware that ranges in lists may include more than you expect:
  `[A-z]` matches `A` and `Z`, but also matches `a` and some less
  obvious characters such as `[`, `\`, and `]` with code point
  values between `Z` and `a`.
* Beware that a range must be specified from low value to high
  value: `[z-a]` does not match any character at all, preventing the
  entire glob from matching.
* Note that unlike typical Unix shell globs, wildcards (`*`, `?`,
  and character lists) are allowed to match `/` directory
  separators as well as the initial `.` in the name of a hidden
  file or directory.

Some examples of character lists:

* `[a-d]` Matches any one of `a`, `b`, `c`, or `d` but not `ä`;
* `[^a-d]` Matches exactly one character other than `a`, `b`, `c`,
  or `d`;
* `[0-9a-fA-F]` Matches exactly one hexadecimal digit;
* `[a-]` Matches either `a` or `-`;
* `[][]` Matches either `]` or `[`;
* `[^]]` Matches exactly one character other than `]`;
* `[]^]` Matches either `]` or `^`; and
* `[^-]` Matches exactly one character other than `-`.
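
The rules above can be made concrete with a small matcher. The following Python sketch is only an illustration of the rules as described in this document, not the code that fossil itself runs. The names `glob_match` and `_match_list` are hypothetical, and the naive backtracking used for `*` is chosen for readability rather than speed.

```python
def _match_list(pattern, i, ch):
    """Test ch against the character list opening at pattern[i] == '['.

    Returns (matched, index just past the closing ']').
    The index is -1 when the list is never closed.
    """
    i += 1
    negate = i < len(pattern) and pattern[i] == "^"
    if negate:
        i += 1
    found = False
    first = True
    while i < len(pattern):
        c = pattern[i]
        if c == "]" and not first:
            # Any ']' that is not the first list member ends the list.
            return (found != negate), i + 1
        first = False
        if (i + 2 < len(pattern) and pattern[i + 1] == "-"
                and pattern[i + 2] != "]"):
            # Range lo-hi, compared by code point; [z-a] matches nothing.
            if c <= ch <= pattern[i + 2]:
                found = True
            i += 3
        else:
            # Single member, including a '-' placed last before ']'.
            if c == ch:
                found = True
            i += 1
    return False, -1


def glob_match(pattern, text):
    """Return True if pattern consumes and matches all of text."""
    def match(p, t):
        while p < len(pattern):
            c = pattern[p]
            if c == "*":
                # Let '*' consume zero or more characters, '/' included.
                return any(match(p + 1, k) for k in range(t, len(text) + 1))
            if t >= len(text):
                return False
            if c == "[":
                ok, nxt = _match_list(pattern, p, text[t])
                if not ok:
                    return False
                p = nxt
            elif c == "?" or c == text[t]:
                p += 1
            else:
                return False
            t += 1
        return t == len(text)
    return match(0, 0)


print(glob_match("[a-d]", "ä"))   # False: ä lies outside the a-d range
print(glob_match("[A-z]", "["))   # True: [ sits between Z and a
print(glob_match("[z-a]", "m"))   # False: a reversed range matches nothing
print(glob_match("[][]", "]"))    # True: a leading ] is a list member
```

Ranges are compared with Python's ordinary string comparison, which orders single characters by code point, so no locale collation is involved.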

White space means the specific ASCII characters TAB, LF, VT, FF, CR,
and SPACE. Note that this does not include any of the many additional
spacing characters available in Unicode, and specifically does not
include U+00A0 NO-BREAK SPACE.

Because both LF and CR are white space and leading and trailing spaces
are stripped from each glob in a list, a list of globs may be broken
into lines between globs when the list is stored in a file (as for a
versioned setting).
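
As a rough illustration of how such a list might be split, here is a Python sketch. It assumes that globs are separated by commas or by the white space characters listed above, and that a quoted glob runs to the next matching quotation mark. The function name `split_glob_list` is hypothetical and this is not fossil's own parser.

```python
# The six ASCII white space characters: TAB, LF, VT, FF, CR, SPACE.
WHITESPACE = "\t\n\x0b\x0c\r "


def split_glob_list(text):
    """Split a glob list into individual globs (illustrative only)."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        # Skip separators, which also strips space around each glob.
        while i < n and (text[i] in WHITESPACE or text[i] == ","):
            i += 1
        if i >= n:
            break
        if text[i] in "'\"":
            # Quoted glob: keep everything up to the matching quote.
            quote = text[i]
            end = text.find(quote, i + 1)
            if end == -1:
                end = n  # assumption: an unclosed quote runs to the end
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and text[i] not in WHITESPACE and text[i] != ",":
                i += 1
            globs.append(text[start:i])
    return globs


print(split_glob_list("*.o, *.obj\n'my file.txt'\r\n\"a,b\""))
# ['*.o', '*.obj', 'my file.txt', 'a,b']
```

Because U+00A0 NO-BREAK SPACE is not in `WHITESPACE`, it stays part of whatever glob contains it.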
Similarly 'single quotes' and "double quotes" are the ASCII straight

Beware, however, that all glob matching is case sensitive. This will
not be a surprise on Unix where all file names are also case
sensitive. However, most Windows file systems are case preserving and
case insensitive. That is, on Windows, the names `ReadMe` and `README`
are names of the same file; on Unix they are different files.

Some example cases:

* The glob `README` matches only a file named `README` in the root of
  the tree. It does not match a file named `src/README` because it
  does not include any characters that consume (and match) the
  `src/` part.
* The glob `*/README` does match `src/README`. Unlike Unix file
  globs, it also matches `src/library/README`. However, it does not
  match the file `README` in the root of the tree.
* The glob `*README` matches `src/README`, the file `README` in the
  root of the tree, `foo/bar/README`, and any other file named
  `README` in the tree. However, it also
  matches `A-DIFFERENT-README` and `src/DO-NOT-README`, or any other

* [`add`][]
* [`addremove`][]
* [`changes`][]
* [`clean`][]
* [`extras`][]
* [`merge`][]
* [`settings`][]
* [`status`][]
* [`unset`][]

The commands [`tarball`][] and [`zip`][] produce compressed archives of a
specific checkin. They may be further restricted by options that
specify glob patterns that name files to include or exclude rather
than archiving the entire checkin.

With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of the features of `.gitignore` and comments on how they relate to
fossil:

* "A blank line matches no files..." is the same in fossil.
* "A line starting with # serves as a comment..." is not supported in
  fossil.
* "Trailing spaces are ignored unless they are quoted..." is similar
  in fossil. All whitespace before and after a glob is trimmed in
  fossil unless quoted with single or double quotes. Git uses
  backslash quoting instead, which fossil does not support.
* "An optional prefix "!" which negates the pattern..." is not
  supported in fossil.
* Git's globs are relative to the location of the `.gitignore` file;

The actual pattern matching is implemented in SQL, so the
documentation for `GLOB` and the other string matching operators in
[SQLite](https://sqlite.org/lang_expr.html#like) is useful. Of
course, the SQLite source code and test harnesses also make
entertaining reading:

* `src/func.c` [lines 570-768](https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768)
* `test/expr.test` [lines 586-673](https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673)
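
One way to experiment with these rules is to call SQLite's `GLOB` operator directly. The Python sketch below uses the standard `sqlite3` module. The helper name `sqlite_glob` is hypothetical, and the sketch assumes that SQLite's operator behaves the same way as the matching described here, which is what the paragraph above suggests but which is not verified against fossil itself.

```python
import sqlite3


def sqlite_glob(pattern, text):
    """Return True if SQLite's GLOB operator matches text with pattern."""
    conn = sqlite3.connect(":memory:")
    try:
        # In SQL the operand order is: <text> GLOB <pattern>.
        row = conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
        return bool(row[0])
    finally:
        conn.close()


print(sqlite_glob("*/README", "src/README"))          # True
print(sqlite_glob("*/README", "src/library/README"))  # True: * crosses /
print(sqlite_glob("*/README", "README"))              # False: no / to match
print(sqlite_glob("README", "ReadMe"))                # False: case sensitive
```

Binding the pattern and the text as parameters avoids any need to escape quotation marks inside the glob.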