Artifact e35729693c1342f8ce29934749b78b12e0cba91e683954612fd78d6dafc5b5e2:
- File www/blockchain.md — part of check-in [c34ca6299f] at 2021-09-09 16:19:55 on branch trunk — Edit pass on the blockchain doc: mainly clarity improvements, but also some typo and grammar fixes. (user: wyoung size: 23691) [more...]
Is Fossil A Blockchain?
The Fossil version control system shares a lot of similarities with other blockchain based technologies, but it also differs from the more common sorts of blockchains. This document will discuss the term’s applicability, so you can decide whether applying the term to Fossil makes sense to you.
The Dictionary Argument
The Wikipedia definition of "blockchain" begins:
"A blockchain…is a growing list of records, called blocks, which are linked using cryptography… Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data (generally represented as a Merkle tree)."
Point-for-point, Fossil follows this partial definition. The blocks are Fossil’s "manifest" artifacts. Each manifest has a cryptographically-strong SHA-1 or SHA-3 hash linking it to one or more “parent” blocks. The manifest also contains a timestamp and the transactional data needed to express a commit to the repository. To traverse the Fossil repository from the tips of its DAG to the root by following the parent hashes in each manifest is to traverse a Merkle tree. Every change in Fossil starts by adding one or more manifests to the repository, extending this Merkle tree.
Cryptocurrency
Because blockchain technology was first popularized as Bitcoin, many people associate the term with cryptocurrency. Fossil has nothing to do with cryptocurrency, so a claim that “Fossil is a blockchain” may fail to communicate the speaker’s concepts clearly due to conflation with cryptocurrency.
Cryptocurrency has several features and requirements that Fossil doesn’t provide, either because it doesn’t need them or because we haven’t gotten around to creating the feature. Whether these are essential to the definition of “blockchain” and thus disqualify Fossil as a blockchain is for you to decide.
Cryptocurrencies must prevent three separate types of fraud to be useful:
- Type 1 is modification of existing currency. To draw an analogy to paper money, we wish to prevent someone from using green and black markers to draw extra zeroes on a US $10 bill so that it claims to be a $100 bill. 
- Type 2 is creation of new fraudulent currency that will pass in commerce. To extend our analogy, it is the creation of new US $10 bills. There are two sub-types to this fraud. In terms of our analogy, they are: - Type 2a: copying an existing legitimate $10 bill
- Type 2b: printing a new $10 bill that is unlike an existing legitimate one, yet which will still pass in commerce
 
- Type 3 is double-spending existing legitimate cryptocurrency. There is no analogy in paper money due to its physical form; it is a problem unique to digital currency due to its infinitely-copyable nature. 
How does all of this compare to Fossil?
- Signatures. Cryptocurrencies use a chain of digital signatures to prevent Type 1 and Type 3 frauds. This chain forms an additional link between the blocks, separate from the hash chain that applies an ordering and lookup scheme to the blocks. Blockchain: Simple Explanation explains this “hash chain” vs. “block chain” distinction in more detail. - These signatures prevent modification of the face value of each transaction (Type 1 fraud) by ensuring that only the one signing a new block has the private signing key that could change an issued block after the fact. - The fact that these signatures are also chained prevents Type 3 frauds by making the prior owner of a block sign it over to the new owner. To avoid an O(n²) auditing problem as a result, cryptocurrencies add a separate chain of hashes to make checking for double-spending quick and easy. - Fossil has a disabled-by-default feature to call out to an external copy of PGP or GPG to sign commit manifests before inserting them into the repository. You can couple that with a server-side after-receive hook to reject unsigned commits. - Although there are several distinctions you can draw between the way Fossil’s commit signing scheme works and the way block signing works in cryptocurrencies, only one is of material interest for our purposes here: Fossil commit signatures apply only to a single commit. Fossil does not sign one commit over to the next “owner” of that commit in the way that a blockchain-based cryptocurrency must when transferring currency from one user to another, beacuse there is no useful analog to the double-spending problem in Fossil. The closest you can come to this is double-insert of commits into the blockchain, which we’ll address shortly. - What Fossil commit signatures actually do is provide in-tree forgery prevention, both Type 1 and Type 2. You cannot modify existing commits (Type 1 forgery) because you do not have the original committer’s private signing key, and you cannot forge new commits attesting to come from some other trusted committer (Type 2) because you don’t have any of their private signing keys, either. Cryptocurrencies use the work problem to prevent Type 2 forgeries, but the application of that to Fossil is a matter we get to later. - Although you have complete control over the contents of your local Fossil repository clone, you cannot perform Type 1 forgery on its contents short of executing a preimage attack on the hash algorithm. (SHA3-256 by default in the current version of Fossil.) Even if you could, Fossil’s sync protocol will prevent the modification from being pushed into another repository: the remote Fossil instance says, “I’ve already got that one, thanks,” and ignores the push. Thus, short of breaking into the remote server and modifying the repository in place, you couldn’t make use of a preimage attack even if you had that power. Further, that would be an attack on the server itself, not on Fossil’s data structures, so while it is useful to think through this problem, it is not helpful in answering our questions here. - The Fossil sync protocol’s duplication detection also prevents the closest analog to Type 3 frauds in Fossil: copying a commit manifest in your local repo clone won’t result in a double-commit on sync. - In the absence of digital signatures, Fossil’s RBAC system restricts Type 2 forgery to trusted committers. Thus once again we’re reduced to an infosec problem, not a data structure design question. - (Inversely, enabling commit clearsigning is a good idea if you have committers on your repo whom you don’t trust not to commit Type 2 frauds. But let us be clear: your choice of setting does not answer the question of whether Fossil is a blockchain.) - If Fossil signatures prevent Type 1 and Type 2 frauds, you may wonder why they are not enabled by default. It is because they are defense-in-depth measures, not the minimum sufficient measures needed to prevent repository fraud, unlike the equivalent protections in a cryptocurrency blockchain. Fossil provides its primary protections through other means, so it doesn’t need to mandate signatures. - Also, Fossil is not itself a PKI, and there is no way for regular users of Fossil to link it to a PKI, since doing so would likely result in an unwanted PII disclosure. There is no email address in a Fossil commit manifest that you could use to query one of the public PGP keyservers, for example. It therefore becomes a local policy matter as to whether you even want to have signatures, because they’re not without their downsides. 
- Work Contests. Cryptocurrencies prevent Type 2b forgeries by setting up some sort of contest that ensures that new coins can come into existence only by doing some difficult work task. This “mining” activity results in a coin that took considerable work to create, which thus has economic value by being a) difficult to re-create, and b) resistant to debasement. - Fossil repositories are most often used to store the work product of individuals, rather than cryptocoin mining machines. There is generally no contest in trying to produce the most commits. There may be an implicit contest to produce the “best” commits, but that is a matter of project management, not something that can be automatically mediated through objective measures. - Incentives to commit to the repository come from outside of Fossil; they are not inherent to its nature, as with cryptocurrencies. Moreover, there is no useful sense in which we could say that one commit “re-creates” another. Commits are generally products of individual human intellect, thus necessarily unique in all but trivial cases. This is foundational to copyright law. 
- Longest Chain Rule. Cryptocurrencies generally need some way to distinguish which blocks are legitimate and which not. They do this in part by identifying the linear chain with the greatest cumulative work time as the legitimate chain. All blocks not on that linear chain are considered “orphans” and are ignored by the cryptocurrency software. - Its inverse is sometimes called the “51% attack” because a single actor would have to do slightly more work than the entire rest of the community using a given cryptocurrency in order for their fork of the currency to be considered the legitimate fork. This argument soothes concerns that a single bad actor could take over the network. - The closest we can come to that notion in Fossil is the default “trunk” branch, but there’s nothing in Fossil that delegitimizes other branches just because they’re shorter, nor is there any way in Fossil to score the amount of work that went into a commit. Indeed, forks and branches are valuable and desirable things in Fossil. 
This much is certain: Fossil is definitely not a cryptocurrency. Whether this makes it “not a blockchain” is a subjective matter.
Distributed Ledgers
Cryptocurrencies are an instance of distributed ledger technology. If we can convince ourselves that Fossil is also a distributed ledger, then we might think of Fossil as a peer technology, having at least some qualifications toward being considered a blockchain.
A key tenet of DLT is that records be unmodifiable after they’re committed to the ledger, which matches quite well with Fossil’s design and everyday use cases. Fossil puts up multiple barriers to prevent modification of existing records and injection of incorrect records.
Yet, Fossil also has purge and shunning. Doesn’t that mean Fossil cannot be a distributed ledger?
These features only remove existing commits from the repository. If you want a currency analogy, they are ways to burn a paper bill or to melt a fiat coin down to slag. In a cryptocurrency, you can erase your “wallet” file, effectively destroying money in a similar way. These features do not permit forgery of either type described above: you can’t use them to change the value of existing commits (Type 1) or add new commits to the repository (Type 2).
What if we removed those features from Fossil, creating an append-only Fossil variant? Is it a DLT then? Arguably still not, because today’s Fossil is an AP-mode system, which means there can be no guaranteed consensus on the content of the ledger at any given time. An AP-mode accounts receivable system would allow different bottom-line totals at different sites, because you’ve cast away “C” to get AP-mode operation. (See the prior link or Wikipedia’s article on the CAP theorem if you aren’t following this terminology.)
By the same token, you cannot guarantee that the command
“fossil info tip” gives the same result everywhere. You would need to
recast Fossil as a CA or CP-mode system to solve that.
(Everyone not
partitioned away from the majority of the network at any rate, in the CP
case.)
What are the prospects for CA-mode or CP-mode Fossil? We don’t want CA-mode Fossil, but CP-mode could be useful. Until the latter exists, this author believes Fossil is not a distributed ledger in a technologically defensible sense.
The most common technologies answering to the label “blockchain” are all DLTs, so if Fossil is not a DLT, then it is not a blockchain in that sense.
Distributed Partial Consensus
If we can’t get DLT, can we at least get some kind of distributed consensus at the level of individual Fossil’s commits?
Many blockchain based technologies have this property: given some element of the blockchain, you can make certain proofs that it either is a legitimate part of the whole blockchain, or it is not.
Unfortunately, this author doesn’t see a way to do that with Fossil.
Given only one “block” in Fossil’s putative “blockchain” — a commit, in
Fossil terminology — all you can prove is whether it is internally
consistent, that it is not corrupt. That then points you at the parent(s) of that
commit, which you can repeat the exercise on, back to the root of the
DAG. This is what the enabled-by-default repo-cksum setting
does.
If cryptocurrencies worked this way, you wouldn’t be able to prove that a given cryptocoin was legitimate without repeating the proof-of-work calculations for the entire cryptocurrency scheme! Instead, you only need to check a certain number of signatures and proofs-of-work in order to be reasonably certain that you are looking at a legitimate section of the whole blockchain.
What would it even mean to prove that a given Fossil commit “belongs” to the repository you’ve extracted it from? For a software project, isn’t that tantamount to automatic code review, where the server would be able to reliably accept or reject a commit based solely on its content? That sounds nice, but this author believes we’ll need to invent AGI first.
A better method to provide distributed consensus for Fossil would be to rely on the natural intelligence of its users: that is, distributed commit signing, so that a commit is accepted into the blockchain only once some number of users countersign it. This amounts to a code review feature, which Fossil doesn’t currently have.
Solving that problem basically requires solving the PKI problem first, since you can’t verify the proofs of these signatures if you can’t first prove that the provided signatures belong to people you trust. This is a notoriously hard problem in its own right.
A future version of Fossil could instead provide consensus in the CAP
sense. For instance, you could say that if a quorum of servers
all have a given commit, it “belongs.” Fossil’s strong hashing tech
would mean that querying whether a given commit is part of the
“blockchain” would be as simple as going down the list of servers and
sending each an HTTP GET /info query for the artifact ID, concluding
that the commit is legitimate once you get enough HTTP 200 status codes back. All of this is
hypothetical, because Fossil doesn’t do this today.
Anonymity
Many blockchain based technologies go to extraordinary lengths to allow anonymous use of their service.
As typically configured, Fossil does not: commits synced between servers always at least have a user name associated with them, which the remote system must accept through its RBAC system. That system can run without having the user’s email address, but it’s needed if email alerts are enabled on the server. The remote server logs the IP address of the commit for security reasons. That coupled with the timestamp on the commit could sufficiently deanonymize users in many common situations.
It is possible to configure Fossil so it doesn’t do this:
- You can give Write capability to user category “nobody,” so that anyone that can reach your server can push commits into its repository. 
- You could give that capability to user category “anonymous” instead, which requires that the user log in with a CAPTCHA, but which doesn’t require that the user otherwise identify themselves. 
- You could enable the - self-registersetting and choose not to enable commit clear-signing so that anonymous users could push commits into your repository under any name they want.
On the server side, you can also scrub the logging that remembers where each commit came from.
Commit source info isn’t transmitted from the remote server on clone or pull:
the size of the rcvfrom table after initial clone is 1, containing
only the remote server’s IP address. On each pull containing new
artifacts, your local fossil instance adds another entry to this
table, likely with the same IP address unless the server has moved or
you’re using multiple remotes. This table is far more
interesting on the server side, containing the IP addresses of all
contentful pushes; thus the scrub command.
Because Fossil doesn’t remember IP addresses in commit manifests or require commit signing, it allows at least pseudonymous commits. When someone clones a remote repository, they don’t learn the email address, IP address, or any other sort of PII of prior committers, on purpose.
Some people say that private, permissioned blockchains (as you may imagine Fossil to be) are inherently problematic by the very reason that they don’t bake anonymous contribution into their core. The very existence of an RBAC is a moving piece that can break. Isn’t it better, the argument goes, to have a system that works even in the face of anonymous contribution, so that you don’t need an RBAC? Cryptocurrencies do this, for example: anyone can “mine” a new coin and push it into the blockchain, and there is no central authority restricting the transfer of cryptocurrency from one user to another.
We can draw an analogy to encryption, where an algorithm is considered inherently insecure if it depends on keeping any information from an attacker other than the key. Encryption schemes that do otherwise are derided as “security through obscurity.”
You may be wondering what any of this has to do with whether Fossil is a blockchain, but that is exactly the point: all of this is outside Fossil’s core hash-chained repository data structure. If you take the position that you don’t have a “blockchain” unless it allows anonymous contribution, with any needed restrictions provided only by the very structure of the managed data, then Fossil does not qualify.
Why do some people care about this distinction? Consider Bitcoin, wherein an anonymous user cannot spam the blockchain with bogus coins because its proof-of-work protocol allows such coins to be rejected immediately. There is no equivalent in Fossil: it has no technology that allows the receiving server to look at the content of a commit and automatically judge it to be “good.” Fossil relies on its RBAC system to provide such distinctions: if you have a commit bit, your commits are ipso facto judged “good,” insofar as any human work product can be so judged by a blob of compiled C code. This takes us back to the digital ledger question, where we can talk about what it means to later correct a bad commit that got through the RBAC check.
We may be willing to accept pseudonymity, rather than full anonymity. If we configure Fossil as above, either bypassing the RBAC or abandoning human control over it, scrubbing IP addresses, etc., is it then a public permissionless blockchain in that sense?
We think not, because there is no longest chain rule or anything like it in Fossil.
For a fair model of how a Fossil repository might behave under such conditions, consider GitHub: here one user can fork another’s repository and make an arbitrary number of commits to their public fork. Imagine this happens 10 times. How does someone come along later and automatically evaluate which of the 11 forks of the code (counting the original repository among their number) is the “best” one? For a computer software project, the best we could do to approximate this devolves to a software project cost estimation problem. These methods are rather questionable in their own right, being mathematical judgement values on human work products, but even if we accept their usefulness, then we still cannot say which fork is better based solely on their scores under these metrics. We may well prefer to use the fork of a software program that took less effort, being smaller, more self-contained, and with a smaller attack surface.
Conclusion
This author believes it is technologically indefensible to call Fossil a “blockchain” in any sense likely to be understood by a majority of those you’re communicating with. Using a term in a nonstandard way just because you can defend it means you’ve failed any goal that requires clear communication. The people you’re communicating your ideas to must have the same concept of the terms you use.
What term should you use instead? Fossil stores a DAG of hash-chained commits, so an indisputably correct term is a Merkle tree, named after its inventor. You could also use the more generic term “hash tree.”
Fossil is a technological peer to many common sorts of blockchain technology. There is a lot of overlap in concepts and implementation details, but when speaking of what most people understand as “blockchain,” Fossil is not that.