Thursday, November 5, 2009

Was Our Switch to Git a Mistake?

My team is part of a larger, multi-team, multi-site organization in our company. The corporate choice for software configuration management is ClearCase. Unfortunately, my team is both "remote" and "small", and ClearCase does not handle remote teams well. I could find more polite ways to say it, but that is the simple result of our experiment. ClearCase was costing us far too much time due to its poor performance over a wide area network.

We solved the ClearCase problem by creating a bidirectional bridge between ClearCase and Subversion. My team's interface to the source master became Subversion. We saw great performance improvements, and were seeing the ClearCase updates within a few minutes of their arrival on the ClearCase master. The bridge was conceptually very simple, and it worked very, very well for our needs. Life was good.

We were then faced with a new, even more challenging problem. The new development work needed to be spread even more widely than the previous development work, with a larger number of teams involved and more dispersed geography. The new work would create multiple products, each with their own release cycle and their own development lifetime.

At the time we were making this transition, I'd been experimenting with the git version control system. Git is version control software created by Linus Torvalds when he needed to switch from BitKeeper for Linux kernel development. It runs very, very well on Linux. It is significantly faster than Subversion, which (in our environment) is significantly faster than ClearCase. Performance looked like a big winner.

Another challenge of the new environment was branch related. The new teams thought their development model would likely be "branch intensive". My prior experiences with branches have been with CVS and Perforce, where branches are globally visible and merging things between branches is a hassle. I hate branches. However, considering that the new world would be "branch intensive" and Subversion is generally not perceived favorably for branch management, we didn't want to use Subversion in a "branchy" environment.

With those two needs, distributed teams and a branch intensive environment, we skipped Subversion and went looking. My recent git experience (and recent Mercurial experience) biased me in favor of a distributed version control system. A key opinion leader in the company had also been using git in a subteam of a very large project, pushing their results back to ClearCase. They reported positive results. My experience had also been positive while I was experimenting with taking a work project off on a "tangent". Git worked well for me, sitting on my underpowered Linux box doing my personal "skunk works" project.

We chose git as the team source control system.

Unfortunately, I had failed to detect my own biases, and the biases of the other early adopters of git. Those biases were very different from the biases of my co-workers.

I'm a command line fan. I'm old enough that my first high school experience programming computers was with the newly installed terminals to the school district mainframe (thanks Davis School District and Layton High for spending the money, the time, and the pain to install those machines!). Then I moved to a university that required me to submit programs on punch cards (which makes me sound old). Before I left the university, they had upgraded to dumb terminals communicating with a DEC minicomputer.

As a command line fan, I found the git "user interface" perfectly comfortable and very similar to CVS, Subversion, and Perforce. There were a few surprises while I tried to understand distributed version control, but those surprises were related to version control concepts, not the specifics of git.

Unfortunately, many in my team and in other teams are not command line fans. They are accustomed to productivity accelerators like graphical user interfaces, integrated development environments, and mouse clicking to perform work much faster. The transition to git has been painful for them. In addition to my transition experience (centralized vs. distributed, new commands, new concepts), they've also had to deal with transitions from robust GUI tools (TortoiseSVN, Perforce Windows client, etc.) to weak and brittle GUI tools (GitSVN, gitk, git gui, etc.).

The challenge has been made worse by our decision as a management team to isolate teams on branches. Two of the managers in the team come from a large scale development organization (5-10x larger than our current organization) and they are accustomed to requiring branches as a way to isolate one team from the potentially breaking changes made by another team. The price of that branch isolation is that we now are required to perform more frequent merges of work, with the resulting complexity and frustrations which come from merging with conflicts. It gets worse when the files to be merged are coming from the Visual Studio IDE, and the meaning of the contents of the files is not always clear.

I think the branch configuration decision has done more damage than the choice of git, but that is probably biased (again) by my command line centric mindset. Unfortunately, we're far enough into the project that we aren't willing to switch SCM systems. We'll remain with git for at least the duration of this project, glad to have a source master, glad to have it connected to our continuous integration servers, and glad to not have the awful performance of remote ClearCase.

In all fairness to git, I still remember the growing pains when we switched from CVS to Perforce. I whined mightily at paying hundreds of dollars per developer for our corporate standard SCM system. Then I whined mightily at the tool changes and use model changes forced upon us by Perforce's way of thinking. After 6 months or a year, I discovered that I had changed my way of thinking, and was now very comfortable using Perforce, getting value from its way of branching, and being very grateful that it was so fast.

Maybe 6-12 months from now I'll say the same things about git. Maybe it is a part of "climbing the learning curve", and unfair to judge our experience this early. Or maybe not...

I still don't know what we should have chosen instead of git, since it is not clear to me that there were any better alternatives for my team at that time. The company was not willing to purchase another SCM system, since they were already paying for ClearCase. That excluded all the purchased SCM systems (Perforce, Microsoft Team System, AccuRev, BitKeeper, etc.). The teams were known to be widely distributed, so that pushed us towards distributed SCM. The benchmark comparisons suggested that Git was faster than Mercurial in many operations, and the Bazaar people were still not settled on their final "on disk" format. Subversion was not well perceived for handling "branchy" development, and CVS was worse than Subversion.

The Linux kernel handles massive amounts of change (averaging 2-4 changes per hour continuously for the last 4 years) from many, many developers. Git scales well for that widely distributed, branch intensive team, yet we're struggling with it. Of course, Linux kernel developers are even more likely to be command line biased than I am, and scaling the tool is not the issue that is getting in our way. The issue is more our choice to be "branchy" and the user interface weaknesses in git.
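
To put that rate in perspective, here is a rough back-of-the-envelope calculation. It assumes the 2-4 changes per hour figure holds around the clock for four 365-day years; the function name is made up purely for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def total_changes(changes_per_hour, years):
    """Changes accumulated at a constant hourly rate, around the clock."""
    return changes_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR * years


low = total_changes(2, 4)   # 2 changes per hour for 4 years
high = total_changes(4, 4)  # 4 changes per hour for 4 years
print(f"{low:,} to {high:,} changes over 4 years")
```

Under those assumptions the total lands somewhere between roughly 70,000 and 140,000 changes.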

So many things to learn, so little time...

8 comments:

Mark Waite said...

I missed a new revision to the Linux kernel change volume paper. Greg Kroah-Hartman's latest document (http://www.linuxfoundation.org/publications/whowriteslinux.pdf) now shows the change rate of the Linux kernel has increased even further, approaching 6 changes per hour, 24 hours a day, 7 days a week, for the last 12 months of kernel development.

Adam said...

I've been using GIT in anger for 18+ months now. There are tools to work with Visual Studio. Make use of "External Tools" for some of the common commands. You can tie them to shortcuts.

If you want even more integration w/ VS, see the "GIT Extensions" project.

Jakub Narebski said...

Some of the GUIs for Git (QGit, TortoiseGit, Git-Cheetah, Git Extensions, etc.) can be found on the http://git.or.cz/gitwiki/InterfacesFrontendsAndTools page.

Mark Waite said...

I found an interesting discussion comparing Git, Perforce, and SVN in various conditions by various users in a series of comments about Google's work on older versions of the Linux kernel.

I found the article fascinating, and some of the points made in the article "rang true" for me. We've made the mistake of placing large binaries in the repository (it was a mistake, I know, but the history seems to be there forever). I came from a Perforce environment and liked being able to check out only a portion of the directory tree (not mandatory, but it was a nice feature).

Adam said...

There is a way of getting the large objects out by deleting those blobs.

The other way is to get your history out as patches and delete the addition of the libraries.

Then fix the repo by introducing a submodule at the point in time that you introduced the large binaries.
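
As a rough sketch of the first approach (deleting the blobs from history), something like the following could work. It assumes the large files can be identified by path and uses `git filter-branch` with an index filter, which is only one of several possible ways to do this. The helper names and the example path are hypothetical.

```python
import shlex
import subprocess


def build_purge_command(path):
    """Build a git filter-branch command that drops `path` from every commit."""
    # The index filter runs once per commit; --ignore-unmatch keeps it from
    # failing on commits that never contained the file.
    index_filter = "git rm --cached --ignore-unmatch " + shlex.quote(path)
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",  # drop commits that become empty after the removal
        "--", "--all",    # rewrite every ref, not just the current branch
    ]


def purge_from_history(repo_dir, path):
    """Run the rewrite inside repo_dir. Destructive: try it on a fresh clone first."""
    subprocess.run(build_purge_command(path), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical path; this only prints the command instead of running it.
    command = build_purge_command("libs/big.dll")
    print(" ".join(shlex.quote(arg) for arg in command))
```

This rewrites every commit that touched the file, so all commit IDs after that point change and everyone else will most likely need a fresh clone. The old objects also stay in the repository until the backup refs under `refs/original` are removed and garbage collection runs.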

Brendan said...

Try the GUI SmartGit (Google it). It's really pretty good, and it makes Git much more accessible.

This article doesn't seem like it's about Git, just about a shortage of GUIs for Git, which is changing.

Unknown said...

Guys, you should check Plastic SCM if you're looking for branching intensive operations :)... with a nice learning curve. Not as popular as Git, but who knows what the future can bring...

Check the following tutorial on distributed development on Windows:

http://codicesoftware.blogspot.com/2010/03/distributed-development-for-windows.html

Mark Waite said...

A recent surprise on GitHub hit the Jenkins continuous integration server development community in much the same way that I suspect we were hit internally after our switch to git.

Modifying history is risky and I'm surprised that GitHub allows it on widely used repositories.