Fossil: branching.wiki at [cbca374c37]

File www/branching.wiki artifact 83ccc0ffb6 part of check-in cbca374c37

<title>Branching, Forking, Merging, and Tagging</title>
<h2>Background</h2>

In a simple and perfect world, the development of a project would proceed
linearly, as shown in Figure 1.

<table border=1 cellpadding=10 hspace=10 vspace=10 align="center">
<tr><td align="center">
<img src="branch01.svg"><br>
Figure 1
</td></tr></table>

Each circle represents a check-in.  For the sake of clarity, the check-ins
are given small consecutive numbers.  In a real system, of course, the
check-in numbers would be long hexadecimal hashes since it is not possible
to allocate collision-free sequential numbers in a distributed system.
But as sequential numbers are easier to read, we will substitute them for
the long hashes in this document.

The arrows in Figure 1 show the evolution of a project.  The initial
check-in is 1.  Check-in 2 is derived from 1.  In other words, check-in 2
was created by making edits to check-in 1 and then committing those edits.
We say that 2 is a <i>child</i> of 1
and that 1 is a <i>parent</i> of 2.
Check-in 3 is derived from check-in 2, making
3 a child of 2.  We say that 3 is a <i>descendant</i> of both 1 and 2 and that 1
and 2 are both <i>ancestors</i> of 3.

<h2 id="dag">DAGs</h2>

The graph of check-ins is a
[http://en.wikipedia.org/wiki/Directed_acyclic_graph | directed acyclic graph]
commonly shortened to <i>DAG</i>.  Check-in 1 is the <i>root</i> of the DAG
since it has no ancestors.  Check-in 4 is a <i>leaf</i> of the DAG since
it has no descendants.  (We will give a more precise definition later of
"leaf.")

Alas, reality often interferes with the simple linear development of a
project.  Suppose two programmers make independent modifications to check-in 2.
After both changes are committed, the check-in graph looks like Figure 2:

<table border=1 cellpadding=10 hspace=10 vspace=10 align="center">
<tr><td align="center">
<img src="branch02.svg"><br>
Figure 2
</td></tr></table>

The graph in Figure 2 has two leaves: check-ins 3 and 4.  Check-in 2 has
two children, check-ins 3 and 4.  We call this state a <i>fork</i>.

Fossil tries to prevent forks. Suppose two programmers named Alice and
Bob are each editing check-in 2 separately. Alice finishes her edits
first and commits her changes, resulting in check-in 3. Later, when Bob
attempts to commit his changes, Fossil verifies that check-in 2 is still
a leaf. Fossil sees that check-in 3 has occurred and aborts Bob's commit
attempt with a message "would fork." This allows Bob to do a "fossil
update" which pulls in Alice's changes, merging them into his own
changes. After merging, Bob commits check-in 4 as a child of check-in 3.
The result is a linear graph as shown in Figure 1. This is how CVS
works. This is also how Fossil works in [./concepts.wiki#workflow |
"autosync"] mode.

But perhaps Bob is off-network when he does his commit, so he
has no way of knowing that Alice has already committed her changes.
Or, it could be that Bob has turned off "autosync" mode in Fossil.  Or,
maybe Bob just doesn't want to merge in Alice's changes before he has
saved his own, so he forces the commit to occur using the "--allow-fork"
option to the <b>fossil commit</b> command.  For any of these reasons,
two commits against check-in 2 have occurred and now the DAG has two leaves.

So which version of the project is the "latest" in the sense of having
the most features and the most bug fixes?  When there is more than
one leaf in the graph, you don't really know, so we like to have
check-in graphs with a single leaf.

Fossil resolves such problems using the check-in time on the leaves to
decide which leaf to use as the parent of new leaves.  When a branch is
forked as in Figure 2, Fossil will choose check-in 4 as the parent for a
later check-in 5, but <i>only</i> if it has sync'd that check-in down
into the local repository. If autosync is disabled or the user is
off-network when that fifth check-in occurs, so that check-in 3 is the
latest on that branch at the time within that clone of the repository,
Fossil will make check-in 3 the parent of check-in 5!

Fossil also uses a forked branch's leaf check-in timestamps when
checking out that branch: it gives you the fork with the latest
check-in, which in turn selects which parent your next check-in will be
a child of.  This situation means development on that branch can fork
into two independent lines of development, based solely on which branch
tip is newer at the time the next user starts his work on it.  Because
of this, we strongly recommend that you do not intentionally create
forks on long-lived shared working branches with "--allow-fork".  (Prime
example: trunk.)

Let us return to Figure 2. To resolve such situations before they can
become a real problem, Alice can use the <b>fossil merge</b> command to
merge Bob's changes into her local copy of check-in 3.  Then she can
commit the results as check-in 5.  This results in a DAG as shown in
Figure 3.

<table border=1 cellpadding=10 hspace=10 vspace=10 align="center">
<tr><td align="center">
<img src="branch03.svg"><br>
Figure 3
</td></tr></table>

Check-in 5 is a child of check-in 3 because it was created by editing
check-in 3.  But check-in 5 also inherits the changes from check-in 4 by
virtue of the merge.  So we say that check-in 5 is a <i>merge child</i>
of check-in 4 and that it is a <i>direct child</i> of check-in 3.
The graph is now back to a single leaf, check-in 5.

We have already seen that if Fossil is in autosync mode then Bob would
have been warned about the potential fork the first time he tried to
commit check-in 4.  If Bob had updated his local check-out to merge in
Alice's check-in 3 changes, then committed, then the fork would have
never occurred.  The resulting graph would have been linear, as shown
in Figure 1.

Realize that the graph of Figure 1 is a subset of Figure 3.  Hold your
hand over the check-in 4 circle of Figure 3 and then Figure 3 looks
exactly like Figure 1, except that the leaf has a different check-in
number, but that is just a notational difference — the two check-ins
have exactly the same content.  In other words, Figure 3 is really a
superset of Figure 1.  The check-in 4 of Figure 3 captures additional
state which is omitted from Figure 1.  Check-in 4 of Figure 3 holds a
copy of Bob's local checkout before he merged in Alice's changes.  That
snapshot of Bob's changes, which is independent of Alice's changes, is
omitted from Figure 1.  Some people say that the approach taken in
Figure 3 is better because it preserves this extra intermediate state.
Others say that the approach taken in Figure 1 is better because it is
much easier to visualize a linear line of development and because the
merging happens automatically instead of as a separate manual step.  We
will not take sides in that debate.  We will simply point out that
Fossil enables you to do it either way.

<h2 id="branching">The Alternative to Forking: Branching</h2>

Having more than one leaf in the check-in DAG is called a "fork." This
is usually undesirable and either avoided entirely,
as in Figure 1, or else quickly resolved as shown in Figure 3.
But sometimes, one does want to have multiple leaves.  For example, a project
might have one leaf that is the latest version of the project under
development and another leaf that is the latest version that has been
tested.
When multiple leaves are desirable, we call this <i>branching</i>
instead of <i>forking</i>:

<blockquote>
<b>Key Distinction:</b> A branch is a <i>named, intentional</i> fork.
</blockquote>

Forks <i>may</i> be intentional, but most of the time, they're accidental.

Figure 4 shows an example of a project where there are two branches, one
for development work and another for testing.

<table border=1 cellpadding=10 hspace=10 vspace=10 align="center">
<tr><td align="center">
<img src="branch04.svg"><br>
Figure 4
</td></tr></table>

The hypothetical scenario of Figure 4 is this:  The project starts and
progresses to a point where (at check-in 2)
it is ready to enter testing for its first release.
In a real project, of course, there might be hundreds or thousands of
check-ins before a project reaches this point, but for simplicity of
presentation we will say that the project is ready after check-in 2.
The project then splits into two branches that are used by separate
teams.  The testing team, using the blue branch, finds and fixes a few
bugs.  This is shown by check-ins 6 and 9.  Meanwhile the development
team, working on the top uncolored branch,
is busy adding features for the second
release.  Of course, the development team would like to take advantage of
the bug fixes implemented by the testing team.  So periodically, the
changes in the test branch are merged into the dev branch.  This is
shown by the dashed merge arrows between check-ins 6 and 7 and between
check-ins 9 and 10.

In both Figures 2 and 4, check-in 2 has two children.  In Figure 2,
we call this a "fork."  In diagram 4, we call it a "branch."  What is
the difference?  As far as the internal Fossil data structures are
concerned, there is no difference.  The distinction is in the intent.
In Figure 2, the fact that check-in 2 has multiple children is an
accident that stems from concurrent development.  In Figure 4, giving
check-in 2 multiple children is a deliberate act.  So, to a good
approximation, we define forking to be by accident and branching to
be by intent.  Apart from that, they are the same.

Fossil offers two primary ways to create named, intentional forks
a.k.a. branches. First:

<pre>
    $ fossil commit --branch my-new-branch-name
</pre>

This is the method we recommend for most cases: it creates a branch as
part of a checkin using the current branch as its basis. After the
checkin, your local working directory is switched to that branch, so
that further checkins occur on that branch as well, as children of the
last checkin on that branch.

The second, more complicated option is:

<pre>
    $ fossil branch new my-new-branch-name trunk
    $ fossil update my-new-branch-name
    $ fossil commit
</pre>

Not only is the first command longer, you must give the second command
before creating any checkins, because until you do, your local working
directory remains on the same branch it was on at the time you issued
the command.

In addition to those problems, the second method is a violation of the
[https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it|YAGNI
Principle]. We recommend that you wait until you actually need the
branch and create it using the first command above.

(Keep in mind that trunk is just another branch in Fossil. It is simply
the default branch name for the first checkin and every checkin made as
one of its direct descendants. It is special only in that it is Fossil's
default when it has no better idea of which branch you mean.)


<h2 id="forking">Justifications For Forking</h2>

The primary cases where forking is justified over branching are all when
it is done purely in software in order to avoid losing information:

<ol>
    <li><p>By Fossil itself when two users check in children to the same
    leaf of a branch, as in Figure 2.  If the fork occurs because
    autosync is disabled on one or both of the repositories or because
    the user doing the check-in has no network connection at the moment
    of the commit, Fossil has no way of knowing that it is creating a
    fork until the two repositories are later sync'd.</p></li>

    <li><p>By Fossil when the cloning hierarchy is more than 2 levels
    deep. If your master repository is cloned by user A and then user B
    clones from user A's repository, check-ins to user B's repo do not
    check the master repo before allowing the check-in even with
    autosync enabled. It isn't until user A syncs her repo with the
    master repo that an inadvertent fork can be detected.
    <br><br>
    Because of this, we recommend that if you're using Fossil in a
    distributed way like this, that check-ins be made only to the master
    or its immediate child repos, and that those further down the chain
    be read-only clones. This is not to say that we repudiate Fossil's
    use as a Distributed Version Control System, just that such use is
    prone to creating forks, by the nature of distributed development.
    We hope we've made it clear in this document why forks on long-lived
    shared working branches are a problem.</p></li>

    <li><p>You've automated Fossil (e.g. with a shell script) and
    forking is a possibility, so you write <b>fossil commit
    --allow-fork</b> commands to prevent Fossil from refusing the
    check-in because it would create a fork.  It's better to write such
    a script to detect this condition and cope with it (e.g. <b>fossil
    update</b>) but if the alternative is losing information, you may
    feel justified in creating forks that an interactive user must later
    clean up with <b>fossil merge</b> commands.</p></li>
</ol>

That leaves only one case where we can recommend use of "--allow-fork"
by interactive users: when you're working on
a personal branch so that creating a dual-tipped branch isn't going to
cause any other user an inconvenience or risk forking the development.
Only one developer is involved, and the fork may be short-lived, so
there is no risk of inadvertently forking the overall development effort.
This is a good alternative to branching when you just need to
temporarily fork the branch's development. It avoids cluttering the
global branch namespace with short-lived temporary named branches.

There's a common generalization of that case: you're a solo developer,
so that the problems with branching vs forking simply don't matter. In
that case, feel free to use "--allow-fork" as much as you like.


<h2 id="fix">Fixing Forks</h2>

If your local checkout is on a forked branch, you can usually fix a fork
automatically with:

<pre>
    $ fossil merge
</pre>

Normally you need to pass arguments to <b>fossil merge</b> to tell it
what you want to merge into the current basis view of the repository,
but without arguments, the command seeks out and fixes forks.


<h2 id="tags">Tags And Properties</h2>

Tags and properties are used in Fossil to help express the intent, and
thus to distinguish between forks and branches.  Figure 5 shows the
same scenario as Figure 4 but with tags and properties added:

<table border=1 cellpadding=10 hspace=10 vspace=10 align="center">
<tr><td align="center">
<img src="branch05.svg"><br>
Figure 5
</td></tr></table>

A <i>tag</i> is a name that is attached to a check-in.  A
<i>property</i> is a name/value pair.  Internally, Fossil implements
tags as properties with a NULL value.  So, tags and properties really
are much the same thing, and henceforth we will use the word "tag"
to mean either a tag or a property.

A tag can be a one-time tag, a propagating tag or a cancellation tag.
A one-time tag only applies to the check-in to which it is attached.  A
propagating tag applies to the check-in to which it is attached and also
to all direct descendants of that check-in.  A <i>direct descendant</i>
is a descendant through direct children.  Tag propagation does not
cross merges.  Tag propagation also stops as soon
as it encounters another check-in with the same tag.  A cancellation tag
is attached to a single check-in in order to either override a one-time
tag that was previously placed on that same check-in, or to block
tag propagation from an ancestor.

The initial check-in of every repository has two propagating tags.  In
Figure 5, that initial check-in is check-in 1.  The <b>branch</b> tag
tells (by its value)  what branch the check-in is a member of.
The default branch is called "trunk."  All tags that begin with "<b>sym-</b>"
are symbolic name tags.  When a symbolic name tag is attached to a
check-in, that allows you to refer to that check-in by its symbolic
name rather than by its hexadecimal hash name.  When a symbolic name
tag propagates (as does the <b>sym-trunk</b> tag) then referring to that
name is the same as referring to the most recent check-in with that name.
Thus the two tags on check-in 1 cause all descendants to be in the
"trunk" branch and to have the symbolic name "trunk."

Check-in 4 has a <b>branch</b> tag which changes the name of the branch
to "test."  The branch tag on check-in 4 propagates to check-ins 6 and 9.
But because tag propagation does not follow merge links, the <b>branch=test</b>
tag does not propagate to check-ins 7, 8, or 10.  Note also that the
<b>branch</b> tag on check-in 4 blocks the propagation of <b>branch=trunk</b>
so that it cannot reach check-ins 6 or 9.  This causes check-ins 4, 6, and
9 to be in the "test" branch and all others to be in the "trunk" branch.

Check-in 4 also has a <b>sym-test</b> tag, which gives the symbolic name
"test" to check-ins 4, 6, and 9.  Because tags do not propagate across
merges, check-ins 7, 8, and 10 do not inherit the <b>sym-test</b> tag and
are hence not known by the name "test."
To prevent the <b>sym-trunk</b> tag from propagating from check-in 1
into check-ins 4, 6, and 9, there is a cancellation tag for
<b>sym-trunk</b> on check-in 4.  The net effect is that
check-ins on the trunk go by the symbolic name of "trunk" and check-ins
on the test branch go by the symbolic name "test."

The <b>bgcolor=blue</b> tag on check-in 4 causes the background color
of timelines to be blue for check-in 4 and its direct descendants.

Figure 5 also shows two one-time tags on check-in 9.  (The diagram does
not make a graphical distinction between one-time and propagating tags.)
The <b>sym-release-1.0</b> tag means that check-in 9 can be referred to
using the more meaningful name "release-1.0."  The <b>closed</b> tag means
that check-in 9 is a "closed leaf."  A closed leaf is a leaf that should
never have direct children.

<h2>Review Of Terminology</h2>

<blockquote><dl>
<dt><b>Branch</b></dt>
<dd><p>A branch is a set of check-ins with the same value for their
"branch" property.</p></dd>
<dt><b>Leaf</b></dt>
<dd><p>A leaf is a check-in with no children in the same branch.</p></dd>
<dt><b>Closed Leaf</b></dt>
<dd><p>A closed leaf is any leaf with the <b>closed</b> tag.  These leaves
are intended to never be extended with descendants and hence are omitted
from lists of leaves in the command-line and web interface.</p></dd>
<dt><b>Open Leaf</b></dt>
<dd><p>A open leaf is a leaf that is not closed.</p></dd>
<dt><b>Fork</b></dt>
<dd><p>A fork is when a check-in has two or more direct (non-merge)
children in the same branch.</p></dd>
<dt><b>Branch Point</b></dt>
<dd><p>A branch point occurs when a check-in has two or more direct (non-merge)
children in different branches.  A branch point is similar to a fork,
except that the children are in different branches.</p></dd>
</dl></blockquote>

Check-in 4 of Figure 3 is not a leaf because it has a child (check-in 5)
in the same branch.  Check-in 9 of Figure 5 also has a child (check-in 10)
but that child is in a different branch, so check-in 9 is a leaf.  Because
of the <b>closed</b> tag on check-in 9, it is a closed leaf.

Check-in 2 of Figure 3 is considered a "fork"
because it has two children in the same branch.  Check-in 2 of Figure 5
also has two children, but each child is in a different branch, hence in
Figure 5, check-in 2 is considered a "branch point."

<h2>Differences With Other DVCSes</h2>

<h3 id="single">Single DAG</h3>

Fossil keeps all check-ins on a single DAG.  Branches are identified with
tags.  This means that check-ins can be freely moved between branches
simply by altering their tags.

Most other DVCSes maintain a separate DAG for each branch.

<h3 id="unique">Branch Names Need Not Be Unique</h3>

Fossil does not require that branch names be unique, as in some VCSes,
most notably Git. Just as with unnamed branches (which we call forks)
Fossil resolves such ambiguities using the timestamps on the latest
checkin in each branch. If you have two branches named "foo" and you say
<b>fossil up foo</b>, you get the tip of the "foo" branch with the most
recent checkin.

This fact is helpful because it means you can reuse branch names, which
is especially useful with utility branches.  There are several of these
in the SQLite and Fossil repositories: "broken-build," "declined,"
"mistake," etc. As you might guess from these names, such branch names
are used in renaming the tip of one branch to shunt it off away from the
mainline of that branch due to some human error. (See <b>fossil
amend</b> and the Fossil UI checkin ammendment features.) This is a
workaround for Fossil's [./shunning.wiki|normal inability to forget
history]: we usually don't want to actually <i>remove</i> history, but
would like to sometimes set some of it aside under a new label.

Because some VCSes can't cope with duplicate branch names, Fossil
collapses such names down on export using the same timestamp based
arbitration logic, so that only the branch with the newest checkin gets
the branch name in the export.

All of the above is true of tags in general, not just branches.