
A discussion on Mercurial’s repository format (Good or Bad?)

Keith Packard of X11 fame wrote a blog post on revision control repository formats. In it he describes why he chose git as the new system to manage the source code for Xorg, and he had some choice comments on Mercurial’s repository format:

Mercurial uses a truncated forward delta scheme where file revisions are appended to the repository file, as a string of deltas with occasional complete copies of the file (to provide a time bound on operations). This suffers from two possible problems—the first is fairly obvious where corrupted writes of new revisions can affect old revisions of the file. The second is more subtle — system failure during commit will leave the file contents half written. Mercurial has recovery techniques to detect this, but they involve truncating existing files
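To make the scheme in that quote a little more concrete, here is a toy Python sketch of a forward-delta chain with occasional full copies. It is only an illustration of the general idea and not Mercurial's actual on-disk format: the names `ToyRevlog`, `make_delta`, `apply_delta` and the `SNAPSHOT_EVERY` interval are all made up for this example, and the deltas are computed with the standard library's `difflib`.

```python
import difflib

# Hypothetical interval between full copies; not Mercurial's real heuristic.
SNAPSHOT_EVERY = 4


def make_delta(old, new):
    """Return the edits that turn `old` into `new` (both are lists of lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old, delta):
    """Apply the edits produced by make_delta to `old`."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out.extend(old[pos:i1])   # unchanged lines before this edit
        out.extend(replacement)   # lines that replace old[i1:i2]
        pos = i2
    out.extend(old[pos:])         # unchanged tail
    return out


class ToyRevlog:
    """Append-only list of revisions stored as full copies or forward deltas."""

    def __init__(self):
        self.entries = []  # each entry is ("full" | "delta", payload)

    def add(self, lines):
        rev = len(self.entries)
        if rev % SNAPSHOT_EVERY == 0:
            # An occasional full copy bounds how many deltas a read must replay.
            self.entries.append(("full", list(lines)))
        else:
            previous = self.read(rev - 1)
            self.entries.append(("delta", make_delta(previous, lines)))
        return rev

    def read(self, rev):
        # Walk back to the nearest full copy, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        lines = list(self.entries[base][1])
        for _kind, payload in self.entries[base + 1:rev + 1]:
            lines = apply_delta(lines, payload)
        return lines
```

Reading any revision never replays more than `SNAPSHOT_EVERY - 1` deltas, which is the "time bound on operations" the quote mentions. Adding a revision only appends a new entry, so earlier entries are never rewritten.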

However, some people who use Mercurial have answered back to those criticisms.
RVBurke writes:

Mercurial uses a compact representation of data with separate revlog files for each tracked file, manifest and changelog, which are all append-only. Due to the append-only nature of those writes, the changes in each new revision don’t affect previous revisions. You are that way as safe as you can be in any other system with respect to writes and the space usage is very good.

To achieve similar space efficiency git needs to pack the repository data. This is done rewriting the repo, and the operation has to be done from time to time (repack).

IF the atomic append-only writes to the manifest and revlog files in Mercurial can be considered dangerous, then repacking is even more so, as it forces a rewrite of all the repo data, multiplying the chance of a failure.

So, if any corruption can happen on a faulty write it will hit git (unpacked) or Mercurial in the same way, but anytime you pack your repo in git you’re risking your data, and if the write fails you can corrupt its repository.

and Matt Mackall, in an email to the Mercurial mailing list, has the following reply:

Mercurial files are append-only. New revisions (corrupted or not), do
not affect old revisions. If a hardware or operating system bug causes
a write to scribble over the wrong part of a file, all bets are off of
course. But such bugs could just as easily scribble over any other
part of the disk, so I think this is no more of an issue than it is
for any other possible system.
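Both sides of the argument come down to what happens when an append is interrupted. The following Python sketch shows the general technique as I understand it from the quotes above: record the file's length before appending, and if the write is cut short, recover by truncating back to that length. This is a simplified illustration under my own assumptions, not Mercurial's actual code; `append_revision`, `recover`, the journal file layout and the `fail_midway` flag are all invented for the example.

```python
import os


def append_revision(path, journal_path, data, fail_midway=False):
    """Append `data` to `path`, noting the old length in a journal first."""
    old_size = os.path.getsize(path) if os.path.exists(path) else 0
    with open(journal_path, "w") as journal:
        journal.write(str(old_size))
    with open(path, "ab") as f:
        if fail_midway:
            # Simulate a crash: only half of the new revision reaches the file
            # and the journal is left behind.
            f.write(data[: len(data) // 2])
            return
        f.write(data)
    os.remove(journal_path)  # the append completed, so the journal can go


def recover(path, journal_path):
    """Undo an interrupted append by truncating to the journaled length."""
    if not os.path.exists(journal_path):
        return False  # nothing was in progress
    with open(journal_path) as journal:
        old_size = int(journal.read())
    if os.path.exists(path):
        with open(path, "r+b") as f:
            f.truncate(old_size)  # drops only the half-written tail
    os.remove(journal_path)
    return True
```

In this sketch the truncation only removes bytes past the recorded length, so previously committed revisions are left as they were. It does not protect against the other case raised in the discussion, where a faulty write lands on the wrong part of the file.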

Personally, I’m a bit confused. I’m not an expert on repository formats, so I can’t tell whether the criticism or the defense is stronger. However, I have chosen to use Mercurial in most of my own personal projects because it has one feature that git definitely lacks: Windows compatibility. Many will denigrate native Windows compatibility, but it’s good to remember that a huge portion of the machines on this planet still run Windows. The fact that there’s little information on running git on Windows besides ‘Install cygwin’ makes it hard to evaluate whether git is ‘good enough’ on Windows or not.


{ 2 } Comments

  1. Niko | January 17, 2007 at 9:05 pm | Permalink

    what was your initial point to choose mercurial over SVN ?
    looks like most of my IDEs don’t support mercurial.

  2. hoanga | January 17, 2007 at 11:19 pm | Permalink

    The main reason I chose mercurial over SVN is for offline access to the repository. If I’m working remotely, I would really like to be able to query the revision logs on files and make small incremental commits without having to hit the network each time. This is impossible with SVN or CVS.

    I know about solutions such as SVK; however, the dependency list for installing SVK is too much of a burden for me, and I’ve found the documentation for using SVK a little lacking.