Friday, October 23, 2009

Learning and Responding

A mistake was made today. Code was merged from one branch into another, and the merge broke the destination branch for its intended use. The break was detected late in the working day of the team that caused it, after most of them had already left. It highlighted all sorts of weaknesses in how I was handling things, including:

  • Why didn't I make it clear to everyone both the purpose and the target configuration of each branch? Poor communication. I had not made clear what the purpose and expected configuration were for each branch, so the other team assumed that seeing the branch build in one configuration was sufficient.
  • Why didn't the person who performed the merge detect the broken build on the continuous integration server? Unclear information sources. We had configured three different continuous integration servers because we needed three different configurations. Unfortunately, I then "muddied the waters" by making one of the branches compatible with all three configurations and actively visible on all three servers. When the developer performed the merge, they saw that it was "green" on the screen they were watching and thought they were done. It had gone "red" on the other two servers, and those two were the most important to my team.
  • Why wasn't the team which performed the harmful merge able to repair their damage? Unavailable spare configuration. They had no machine available which matched the problem configurations and could be used for diagnosis and development. Their machines were all configured for their own needs, and the break was in an area needed by other teams.
  • Why did it take half a day to recover from the damage? Inexperience with our tools. We recently switched from Perforce to Subversion to Git, and the transition has left us less skilled at dealing with a failure of this complexity.
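The second lesson, that a merge is not done until every required configuration is green, can be sketched as a small check. This is only an illustration under assumptions: the names `BuildStatus`, `failing_configurations`, `merge_is_safe`, and the configuration labels `config-a`, `config-b`, `config-c` are all hypothetical, and the statuses are supplied by hand rather than fetched from any real continuous integration server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildStatus:
    """Result reported by one continuous integration server (hypothetical shape)."""
    configuration: str
    green: bool


def failing_configurations(statuses, required):
    """Return the required configurations that are red or did not report at all."""
    reported = {s.configuration: s.green for s in statuses}
    # A configuration with no report is treated as failing, not as passing.
    return sorted(c for c in required if not reported.get(c, False))


def merge_is_safe(statuses, required):
    """A merge counts as done only when every required configuration is green."""
    return not failing_configurations(statuses, required)


# Hypothetical labels standing in for the three server configurations.
REQUIRED = ["config-a", "config-b", "config-c"]

# The situation from the story: green on the watched screen, red on the other two.
after_merge = [
    BuildStatus("config-a", True),
    BuildStatus("config-b", False),
    BuildStatus("config-c", False),
]

if __name__ == "__main__":
    print(merge_is_safe(after_merge, REQUIRED))
    print(failing_configurations(after_merge, REQUIRED))
```

The design choice worth noting is that a missing report counts as a failure, so watching only one screen can never produce a "safe" answer.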

All told, recovering from the damage cost my team less than a day. Because we're using a distributed version control system, they were able to continue their work locally in the meantime, although they were not able to push to the central repository.

Moral of the story: communicate clearly, listen carefully, and be willing to change as better ideas arrive.
