Darcs, a New Way of Source Management
Saturday, November 27th, 2004 at 5:27 am
Darcs is a decentralized source control manager (SCM) that is similar to GNU Arch, but without the complex command line syntax.
Simply put, Darcs is the best SCM I’ve ever seen. I am a fan of decentralized SCMs (Subversion and CVS, by contrast, are popular centralized SCMs), and when Arch was originally released I was quite excited. Decentralization allows developers on a project to work asynchronously, and to push and pull code changes around without anyone needing access to a central repository and without resorting to the clunky diff/patch. Decentralization also allows changes to survive the death of any single user or repository. As long as someone else has them, chances are you can recover lost work with at least some useful metadata intact.
Darcs uses a concept called atomic commits. Subversion and Arch also do atomic commits, but CVS does not; in CVS each file has its own revision history, which makes it very hard to follow fast and varied development. Darcs calls atomic commits change sets.
Darcs is very simple to use. Want to make a new code tree? Run mkdir ProjectName; cd ProjectName; darcs init and that’s it. You add files using darcs add myfile, you make a new change set using darcs record -am "Description of change.", and you display the ChangeLog using darcs changes.
You can grab changes from servers running darcs, web servers, or through email. Just use the darcs get command to get the full source tree, and then use the darcs pull command to grab new changes. To commit changes to a remote repository (such as one on disk, on a server that runs darcs, or reached through email) use the darcs push command. To publish to a web server that doesn’t run darcs, use ftp or ssh/scp to update the copy on the web server.
Now, besides being simple to use, it also tracks context as a part of the metadata. In other words, it tracks the file itself, not the name or the directory it is located in. For example, let’s say I have a complex program with hundreds of source files, and I decide to move all similar files into directories to group them together, and I additionally rename them all in the process. My fellow developer has thousands of lines of changes, but he is using the original file layout. He can push his changes into my repository, and everything is merged and preserved perfectly.
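To make the rename scenario concrete, here is a toy Python model of the idea. It is only a sketch under my own assumptions, not Darcs's actual patch format or algorithm: the tuple shapes, the function names, and the file paths are all made up for illustration. An edit recorded against the old layout is rewritten so that it lands on the file's new location.

```python
# Toy model only: these data shapes and function names are hypothetical,
# not anything Darcs exposes.

def commute_past_move(edit, move):
    """Rewrite an edit made against the old path so it applies after the move."""
    path, line_no, text = edit
    old, new = move
    return (new if path == old else path, line_no, text)

def apply_move(tree, move):
    """Return a copy of the tree with one file renamed."""
    old, new = move
    tree = dict(tree)
    tree[new] = tree.pop(old)
    return tree

def apply_edit(tree, edit):
    """Return a copy of the tree with one line of one file replaced."""
    path, line_no, text = edit
    tree = dict(tree)
    lines = list(tree[path])
    lines[line_no] = text
    tree[path] = lines
    return tree

tree = {"util.c": ["int helper(void);", "int other(void);"]}
move = ("util.c", "lib/strings/util.c")    # my reorganisation
edit = ("util.c", 1, "long other(void);")  # colleague's change, old layout

mine = apply_move(tree, move)
result = apply_edit(mine, commute_past_move(edit, move))
print(result)
```

The colleague never had to learn the new layout; the edit follows the file to wherever it now lives.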
This goes well with the next feature I want to mention (though I implied its existence above): being able to create new change sets without needing to push a change set to anyone else first. Each working copy is also a repository, so with the darcs record command I can create thousands of revisions, and my example fellow developer can grab all these revisions and merge them with his repository, and have all the metadata of my changes fully intact (i.e., his repository is now thousands of revisions bigger instead of just one revision bigger).
Both of these features together mean development can be done by multiple developers at once without conflict, and each repository is its own fork. Not only can individual developers share code, groups of developers can push and pull tons of changes around without any other developer needing the changes; and you no longer need to care about prerequisite patches, a common worry in the CVS world, as they are included in the change set.
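As a rough illustration of that last point, here is a toy Python sketch in which a repository is nothing more than an ordered list of change set names. This is my own simplification, not how Darcs stores anything, and the pull function and the names in it are hypothetical.

```python
# Toy model only: a "repository" here is just a list of change set names.

def pull(local, remote):
    """Return local plus every change set from remote that local lacks."""
    have = set(local)
    return local + [change for change in remote if change not in have]

shared = ["initial import"]
mine = shared + [f"my change {n}" for n in range(1, 4)]
theirs = shared + ["their bugfix"]

theirs_after_pull = pull(theirs, mine)
print(theirs_after_pull)
```

Each of my recorded changes arrives as its own entry with its own description, rather than being squashed into one big revision.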
Example: My project now has dozens of developers, and one group wishes to work on Popular Feature A, and another group wants to work on Popular Feature B. Both groups can finish their features, and push A and B to each other or to me, and I can merge A and B even though they were independently developed and may have even edited the same files. (Note: Darcs has very fine-grained change tracking. It tracks changes per line instead of per chunk, as CVS and other diff/patch-alikes do.)
The above example is called parallel development, something that is very difficult with CVS or Subversion, and mildly annoying with Arch. As long as no one edits the same line, conflicts don’t happen.
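The "same line" rule can be sketched with a small Python toy. It is a deliberately simplified model that I am assuming for illustration (both edits only replace lines in place, so the files keep the same length); it is not Darcs's real merge algorithm, and the function names are hypothetical.

```python
# Toy model only: edits may replace lines but never add or remove them.

def changed_lines(base, edited):
    """Return {line_index: new_text} for every line that differs from base."""
    return {i: new for i, (old, new) in enumerate(zip(base, edited)) if old != new}

def merge(base, mine, theirs):
    """Combine two edits of base; fail only if both changed the same line differently."""
    a = changed_lines(base, mine)
    b = changed_lines(base, theirs)
    conflicts = sorted(i for i in a.keys() & b.keys() if a[i] != b[i])
    if conflicts:
        raise ValueError(f"conflict on lines {conflicts}")
    merged = list(base)
    for i, text in {**a, **b}.items():
        merged[i] = text
    return merged

base = ["int a = 1;", "int b = 2;", "int c = 3;"]
feature_a = ["int a = 10;", "int b = 2;", "int c = 3;"]  # group A edits line 0
feature_b = ["int a = 1;", "int b = 2;", "int c = 30;"]  # group B edits line 2

merged = merge(base, feature_a, feature_b)
print(merged)
```

Both features touch the same file, yet they merge cleanly because they touch different lines; had both groups rewritten line 0 differently, merge would raise a conflict instead.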
Now, you’re probably asking, without a central repository, what is the main branch? The main branch, or trunk, is simply a repository hosted on the project website or by the chosen maintainer (such as the project lead). It contains only changes that are deemed safe to use and are candidates for a stable code release. This repository is simply a place for developers to start hacking on the most recent code, and it also serves as a group of changes that all developers should be using. The main trunk should only be altered by the chosen maintainer; or at the very least, by people who can be trusted with the responsibility of not introducing lots of bugs or bad code.
With all these great features, it makes you wonder why anyone would use CVS or Subversion: CVS has 15 years of inertia behind it; and Subversion, due to its CVS-like behavior, is just along for the ride. As for poor Arch, it has a much smaller user base than CVS or Subversion, and is losing the majority of its user base to the much easier to use Darcs.
Unfortunately, not everything is perfect. Currently Darcs has issues with very large repositories, and will use a lot of memory and CPU time trying to process them. This issue is top priority on the TODO list.
as an ex-Subversion user, i found this handy:
http://www.scannedinavian.org/DarcsWiki/BestPractices
what sold me is the Darcs approach to branches, and the simplicity of pushing changes to them.