Differences From Artifact [9498881004]:

To Artifact [9128657200]:


Lines 1-14:

# File Name Glob Patterns


A [glob pattern][glob] is a text expression that matches one or more
file names using wild cards familiar to most users of a command line.
For example, `*` is a glob that matches any name at all and
`Readme.txt` is a glob that matches exactly one file. 

Note that although both are notations for describing patterns in text,
glob patterns are not the same thing as a [regular expression or
regexp][regexp].

[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia)
[regexp]: https://en.wikipedia.org/wiki/Regular_expression







Lines 37-102:

commas, it can be quoted with either single or double quotation marks.
A list is said to match if any one (or more) globs in the list
matches.

A glob pattern is a collection of characters compared to a target
text, usually a file name. The whole glob is said to match if it
successfully consumes and matches the entire target text. Glob
patterns are made up of ordinary characters and special characters. 

Ordinary characters consume a single character of the target and must
match it exactly. 

Special characters (and special character sequences) consume zero or
more characters from the target and describe what matches. The special
characters (and sequences) are:

 *  `*` Matches any sequence of zero or more characters;
 *  `?` Matches exactly one character;
 *  `[...]` Matches one character from the enclosed list of characters; and
 *  `[^...]` Matches one character not in the enclosed list.

Special character sequences have some additional features: 

 *  A range of characters may be specified with `-`, so `[a-d]` matches
    exactly the same characters as `[abcd]`. Ranges reflect Unicode
    code points without any locale-specific collation sequence.
 *  Include `-` in a list by placing it last, just before the `]`.
 *  Include `]` in a list by making it the first character after the `[` or
    `[^`. At any other place, `]` ends the list.
 *  Include `^` in a list by placing it anywhere except first after the
    `[`.
 *  Beware that ranges in lists may include more than you expect: 
    `[A-z]` Matches `A` and `Z`, but also matches `a` and some less
    obvious characters such as `[`, `\`, and `]` with code point
    values between `Z` and `a`.
 *  Beware that a range must be specified from low value to high
    value: `[z-a]` does not match any character at all, preventing the
    entire glob from matching.
 *  Note that unlike typical Unix shell globs, wildcards (`*`, `?`,
    and character lists) are allowed to match `/` directory
    separators as well as the initial `.` in the name of a hidden
    file or directory.

Some examples of character lists: 

 *  `[a-d]` Matches any one of `a`, `b`, `c`, or `d` but not `ä`;
 *  `[^a-d]` Matches exactly one character other than `a`, `b`, `c`,
    or `d`; 
 *  `[0-9a-fA-F]` Matches exactly one hexadecimal digit;
 *  `[a-]` Matches either `a` or `-`;
 *  `[][]` Matches either `]` or `[`;
 *  `[^]]` Matches exactly one character other than `]`;
 *  `[]^]` Matches either `]` or `^`; and
 *  `[^-]` Matches exactly one character other than `-`.
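
The rules above can be sketched as a small matcher. This is an illustrative Python sketch written from the description in this document, not Fossil's actual implementation. The names `parse_list` and `glob_match` are hypothetical, and treating an unterminated `[` as a failed match is an assumption.

```python
def parse_list(pat, i):
    """Parse the character list that starts at pat[i] == '['.

    Returns (negated, singles, ranges, next_index), or None when the
    list is never closed (assumption: such a glob matches nothing).
    """
    i += 1
    negated = i < len(pat) and pat[i] == "^"
    if negated:
        i += 1
    singles, ranges = set(), []
    first = True
    while i < len(pat):
        ch = pat[i]
        if ch == "]" and not first:
            return negated, singles, ranges, i + 1
        first = False  # a ']' in first position is an ordinary member
        # 'x-y' is a range unless the '-' sits just before the closing ']'.
        if i + 2 < len(pat) and pat[i + 1] == "-" and pat[i + 2] != "]":
            ranges.append((ch, pat[i + 2]))
            i += 3
        else:
            singles.add(ch)
            i += 1
    return None


def glob_match(pat, text):
    """Return True when pat consumes and matches the entire text."""
    def match(p, t):
        while p < len(pat):
            ch = pat[p]
            if ch == "*":
                # '*' may consume any number of characters, '/' included.
                return any(match(p + 1, k) for k in range(t, len(text) + 1))
            if t >= len(text):
                return False
            if ch == "[":
                parsed = parse_list(pat, p)
                if parsed is None:
                    return False
                negated, singles, ranges, nxt = parsed
                c = text[t]
                # Ranges compare plain code points, with no locale collation.
                hit = c in singles or any(lo <= c <= hi for lo, hi in ranges)
                if hit == negated:
                    return False
                p, t = nxt, t + 1
                continue
            if ch != "?" and ch != text[t]:
                return False
            p, t = p + 1, t + 1
        return t == len(text)

    return match(0, 0)


print(glob_match("[0-9a-fA-F]", "c"))  # True
print(glob_match("[a-d]", "ä"))        # False
print(glob_match("[A-z]", "["))        # True
print(glob_match("[z-a]", "a"))        # False
```

The matcher backtracks on `*`, which is simple to follow but can be slow on patterns with many `*` wildcards; it is meant for reading, not for production use.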

White space means the specific ASCII characters TAB, LF, VT, FF, CR,
and SPACE.  Note that this does not include any of the many additional
spacing characters available in Unicode, and specifically does not
include U+00A0 NO-BREAK SPACE. 

Because both LF and CR are white space and leading and trailing spaces
are stripped from each glob in a list, a list of globs may be broken
into lines between globs when the list is stored in a file (as for a
versioned setting).
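
As a rough illustration of how such a list might be split, the following Python sketch separates globs on commas and on the six ASCII white space characters named above, and keeps quoted globs intact. It is not Fossil's parser: the function name `split_glob_list` is hypothetical, and letting an unterminated quote run to the end of the input is an assumption.

```python
# TAB, LF, VT, FF, CR, SPACE, plus the comma.
SEPARATORS = "\t\n\v\f\r ,"


def split_glob_list(text):
    """Split a glob list into individual globs, honoring quotes."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in SEPARATORS:
            i += 1
            continue
        if ch in "'\"":
            # A quoted glob may contain commas and white space.
            end = text.find(ch, i + 1)
            if end == -1:
                end = n  # assumption: unterminated quote runs to the end
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and text[i] not in SEPARATORS:
                i += 1
            globs.append(text[start:i])
    return globs


print(split_glob_list("*.o, *.obj\n'my file.txt'\t\"a,b\""))
# ['*.o', '*.obj', 'my file.txt', 'a,b']
```

The explicit `SEPARATORS` string is deliberate: Python's own `str.split()` also splits on Unicode spacing characters such as U+00A0 NO-BREAK SPACE, which the definition above excludes.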

Similarly 'single quotes' and "double quotes" are the ASCII straight








Lines 124-142:

Beware, however, that all glob matching is case sensitive. This will
not be a surprise on Unix where all file names are also case
sensitive. However, most Windows file systems are case preserving and
case insensitive. That is, on Windows, the names `ReadMe` and `README`
are names of the same file; on Unix they are different files.
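
One way to cope with this, sketched below, is to spell out both cases of each letter with character lists. The sketch uses Python's `fnmatch.fnmatchcase` purely as a stand-in for a case-sensitive glob matcher; it is not Fossil's matcher, and the helper `any_case_glob` is hypothetical.

```python
from fnmatch import fnmatchcase


def any_case_glob(word):
    """Expand each ASCII letter of word into a two-character list."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            parts.append("[" + ch.upper() + ch.lower() + "]")
        else:
            parts.append(ch)
    return "".join(parts)


print(fnmatchcase("ReadMe", "README"))                 # False
print(any_case_glob("README"))                         # [Rr][Ee][Aa][Dd][Mm][Ee]
print(fnmatchcase("ReadMe", any_case_glob("README")))  # True
```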

Some example cases:
 
 *  The glob `README` matches only a file named `README` in the root of
    the tree. It does not match a file named `src/README` because it
    does not include any characters that consume (and match) the
    `src/` part. 
 *  The glob `*/README` does match `src/README`. Unlike Unix file
    globs, it also matches `src/library/README`. However, it does not
    match the file `README` in the root of the tree.
 *  The glob `*README` does match `src/README`, the file `README` in
    the root of the tree, `foo/bar/README`, and any other file named
    `README` in the tree. However, it also matches
    `A-DIFFERENT-README` and `src/DO-NOT-README`, or any other








Lines 191-205:

 *  [`add`][]
 *  [`addremove`][]
 *  [`changes`][]
 *  [`clean`][]
 *  [`extras`][]
 *  [`merge`][]
 *  [`settings`][] 
 *  [`status`][]
 *  [`unset`][]

The commands [`tarball`][] and [`zip`][] produce compressed archives of a
specific checkin. They may be further restricted by options that
specify glob patterns that name files to include or exclude rather
than archiving the entire checkin.
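
The effect of include and exclude globs on an archive's file list can be pictured with the sketch below. It is only an illustration: `select_files` is a hypothetical helper, Python's `fnmatch.fnmatchcase` stands in for Fossil's matcher (its `*` also matches `/`, though it writes negated lists as `[!...]` rather than `[^...]`), and letting exclude globs win over include globs is an assumption rather than something stated here.

```python
from fnmatch import fnmatchcase


def select_files(files, include=(), exclude=()):
    """Keep names matching any include glob, minus names matching any exclude glob.

    An empty include list keeps every file (assumption).
    """
    selected = []
    for name in files:
        included = not include or any(fnmatchcase(name, g) for g in include)
        excluded = any(fnmatchcase(name, g) for g in exclude)
        if included and not excluded:
            selected.append(name)
    return selected


files = ["README", "src/main.c", "src/test/main_test.c", "doc/globs.md"]
print(select_files(files, include=["src/*"], exclude=["*_test.c"]))
# ['src/main.c']
```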








Lines 462-476:

With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of the features of `.gitignore` and comments on how they relate to
fossil:

 *  "A blank line matches no files..." is the same in fossil.
 *  "A line starting with # serves as a comment...." does not apply in
    fossil.
 *  "Trailing spaces are ignored unless they are quoted..." is similar
    in fossil. All whitespace before and after a glob is trimmed in
    fossil unless quoted with single or double quotes. Git uses
    backslash quoting instead, which fossil does not.
 *  "An optional prefix "!" which negates the pattern..." is not
    supported in fossil.
 *  Git's globs are relative to the location of the `.gitignore` file;








Lines 531-540:

The actual pattern matching is implemented in SQLite, so the
documentation for `GLOB` and the other string matching operators in
[SQLite](https://sqlite.org/lang_expr.html#like) is useful. Of
course, the SQLite source code and test harnesses also make
entertaining reading:

 *  `src/func.c` [lines 570-768](https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768)
 *  `test/expr.test` [lines 586-673](https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673)
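
SQLite's `GLOB` operator can also be tried directly. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The wrapper name `sqlite_glob` is hypothetical, and the results reflect the SQLite library bundled with Python, which may not be the same version as the one inside a given Fossil build.

```python
import sqlite3

# One in-memory database is enough; no tables are needed.
_conn = sqlite3.connect(":memory:")


def sqlite_glob(pattern, text):
    """Return True when text matches pattern under SQLite's GLOB operator."""
    # In SQL the operand order is: text GLOB pattern.
    (result,) = _conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(result)


print(sqlite_glob("*/README", "src/library/README"))  # True
print(sqlite_glob("*/README", "README"))              # False
print(sqlite_glob("README", "ReadMe"))                # False
print(sqlite_glob("[A-z]", "["))                      # True
```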






