Fossil

Artifact [f6cae0dc33]
Login

Artifact [f6cae0dc33]

Artifact f6cae0dc33d1b87f3fd69555479e2939dd0d5cd9ac9bb8d84ad3052c4c894ab6:


File Name Glob Patterns

A glob pattern is a text expression that matches one or more file names using wild cards familiar to most users of a command line. For example, * is a glob that matches any name at all and Readme.txt is a glob that matches exactly one file.

Note that although both are notations for describing patterns in text, glob patterns are not the same thing as a regular expression or regexp.

A number of fossil setting values hold one or more file glob patterns that will identify files needing special treatment. Glob patterns are also accepted in options to certain commands as well as query parameters to certain pages.

In many cases more than one glob may be specified in a setting, option, or query parameter by listing multiple globs separated by a comma or white space.

Of course, many fossil commands also accept lists of files to act on, and those also may be specified with globs. Although those glob patterns are similar to what is described here, they are not defined by fossil, but rather by the conventions of the operating system in use.

Syntax

A list of glob patterns is simply one or more glob patterns separated by white space or commas. If a glob must contain white spaces or commas, it can be quoted with either single or double quotation marks. A list is said to match if any one (or more) globs in the list matches.

A glob pattern is a collection of characters compared to a target text, usually a file name. The whole glob is said to match if it successfully consumes and matches the entire target text. Glob patterns are made up of ordinary characters and special characters.

Ordinary characters consume a single character of the target and must match it exactly.

Special characters (and special character sequences) consume zero or more characters from the target and describe what matches. The special characters (and sequences) are:

*       Matches any sequence of zero or more characters.
?       Matches exactly one character.
[...]   Matches one character from the enclosed list of characters.
[^...]  Matches one character not in the enclosed list.

Special character sequences have some additional features:

Some examples of character lists:

White space means the ASCII characters TAB, LF, VT, FF, CR, and SPACE. Note that this does not include any of the many additional spacing characters available in Unicode, and specifically does not include U+00A0 NO-BREAK SPACE.

Because both LF and CR are white space and leading and trailing spaces are stripped from each glob in a list, a list of globs may be broken into lines between globs when the list is stored in a file (as for a versioned setting).

Similarly 'single quotes' and "double quotes" are the ASCII straight quote characters, not any of the other quotation marks provided in Unicode and specifically not the "curly" quotes preferred by typesetters and word processors.

File Names to Match

Before it is compared to a glob pattern, each file name is transformed to a canonical form. The glob must match the entire canonical file name to be considered a match.

The canonical name of a file has all directory separators changed to /, redundant slashes are removed, all . path components are removed, and all .. path components are resolved. (There are additional details we won’t go into here.)

The goal is a name that is the simplest possible for each particular file, and will be the same on Windows, Unix, and any other platform where fossil is run.

Beware, however, that all glob matching is case sensitive. This will not be a surprise on Unix where all file names are also case sensitive. However, most Windows file systems are case preserving and case insensitive. On Windows, the names ReadMe and README are names of the same file; on Unix they are different files.

Some example cases:

Where Globs are Used

Settings that are Globs

These settings are all lists of glob patterns:

All may be versioned, local, or global. Use fossil settings to manage local and global settings, or a file in the repository's .fossil-settings/ folder at the root of the tree named for each for versioned setting.

Using versioned settings for these not only has the advantage that they are tracked in the repository just like the rest of your project, but you can more easily keep longer lists of more complicated glob patterns than would be practical in either local or global settings.

The ignore-glob is an example of one setting that frequently grows to be an elaborate list of files that should be ignored by most commands. This is especially true when one (or more) IDEs are used in a project because each IDE has its own ideas of how and where to cache information that speeds up its browsing and building tasks but which need not be preserved in your project's history.

Commands that Refer to Globs

Many of the commands that respect the settings containing globs have options to override some or all of the settings. These options are usually named to correspond to the setting they override, such as --ignore to override the ignore-glob setting. These commands are:

The commands tarball and zip produce compressed archives of a specific checkin. They may be further restricted by options that specify glob patterns that name files to include or exclude rather than archiving the entire checkin.

The commands http, cgi, server, and ui that implement or support with web servers provide a mechanism to name some files to serve with static content where a list of glob patterns specifies what content may be served.

Web Pages that Refer to Globs

The /timeline page supports the query parameter chng=GLOBLIST that names a list of glob patterns defining which files to focus the timeline on. It also has the query parameters t=TAG and r=TAG that names a tag to focus on, which can be configured with ms=STYLE to use a glob pattern to match tag names instead of the default exact match or a couple of other comparison styles.

The pages /tarball and /zip generate compressed archives of a specific checkin. They may be further restricted by query parameters that specify glob patterns that name files to include or exclude rather than taking the entire checkin.

Platform quirks

The versioned settings files have no platform-specific quirks. Any GLOBs that matter to your workflow belong there where they can be safely edited.

Similarly, settings made through the Web UI are platform independent.

GLOBs at the command prompt, however, may need to be protected from the quirks of the particular shell program you use to type the command.

The GLOB language is based on common features of Unix (and Linux) shells. In some cases, this will cause confusion if the shell expands the GLOB in a way that is similar to what fossil would have done.

When in doubt, the fossil test-glob command can be used to see what fossil saw and what it chose to do. The fossil test-echo command is also handy: it shows exactly what arguments fossil received.

Windows

Various versions of Windows (a phrase that covers more than just Window 7 vs Windows 10 because the actual content of MSVCRT.DLL, other DLLs, and even the specific compiler used to build fossil.exe can change the behavior) have subtle differences in how quoting works.

Even without subtle version changes, there are also differences between the interactive command prompt and .BAT or .CMD files.

The typical problem is figuring out how to get a GLOB passed on the command line into fossil.exe without it being expanded by either the shell (CMD never expands globs so that part is trivial) or by the C runtime startup (which tries hard to expand globs to act like Unix). A typical example is figuring out how to set crlf-glob to *.

One approach is

echo * | fossil setting crlf-glob --args -

which works because the built-in command echo does not expand its arguments, and the global option --args reads from - which is replaced by standard input pipe from the echo command.

Another approach is

fossil setting crlf-glob *,

which, despite including the extra comma in the stored setting value, has the desired effect. The empty GLOB after the comma matches no files at all, which has no effect since the * matches them all.

Similarly,

fossil setting crlf-glob '*'

also works. Here the single quotes are unneeded since no white space is mentioned in the pattern, but do no harm. The GLOB still matches all the files.

Converting .gitignore to ignore-glob

Many other version control systems handle the specific case of ignoring certain files differently from fossil: they have you create individual "ignore" files in each folder, which specify things ignored in that folder and below. Usually some form of glob patterns are used in those files, but the details differ from fossil.

In many simple cases, you can just store a top level "ignore" file in .fossil-settings/ignore-glob. But as usual, there will be lots of edge cases.

Git has a rich collection of ignore files which accumulate rules that affect the current command. There are global files, per-user files, per workspace unmanaged files, and fully version controlled files. Some of the files used have no set name, but are called out in configuration files.

In contrast, fossil has a global setting and a local setting, but the local setting overrides the global rather than extending it. Similarly, a fossil command's --ignore option replaces the ignore-glob setting rather than extending it.

With that in mind, translating a .gitignore file into .fossil-settings/ignore-glob may be possible in many cases. Here are some of features of .gitignore and comments on how they relate to fossil:

Example

In a project with source and documentation:

work
  +-- doc
  +-- src

The file doc/.gitignore might contain:

# Finished documents by pandoc via LaTeX
*.pdf
# Intermediate files
*.tex
*.toc
*.log
*.out
*.tmp

Entries in .fossil-settings/ignore-glob with similar effect, also limited to the doc folder:

doc/*.pdf
doc/*.tex, doc/*.toc, doc/*.log, doc/*.out, doc/*.tmp

Implementation and References

Most of the implementation of glob pattern handling in fossil is found glob.c, file.c, and each individual command and web page that uses a glob pattern. Find commands and pages in the fossil sources by looking for comments like COMMAND: add or WEBPAGE: timeline in front of the function that implements the command or page in files src/*.c. (Fossil's build system creates the tables used to dispatch commands at build time by searching the sources for those comments.) A few starting points:

The actual pattern matching is implemented in SQL, so the documentation for GLOB and the other string matching operators in SQLite is useful. Of course, the SQLite source code and test harnesses also make entertaining reading: