Re: bear shaving

2010-05-10

I was going to submit the stuff below in shortened form as a comment to this fun little blog post on “bear shaving” but it sort of grew into a full-blown article, again. To summarize the original article: there’s this nice analogy of shaving bears to help them cope with global warming, and how that is not really addressing the core issue (not to mention dangerous). The analogy is applied to integration builds and people patching things up. Then the author goes off and comes up with a few arguments against git and decentralization.

While some of the criticism is valid, this of course ticked me off :-)

I see Git as a solution for increasing the amount of change and for dealing more effectively with people working in parallel. Yes, this puts a strain on integrating the resulting changes. But less change is the equivalent of bear shaving here. Change is good. Change is productivity. You want more productivity, not less. You want to move forward as fast as you possibly can. Integration builds breaking are a symptom of a larger problem. Bear shaving would be doing everything you can to make the integration builds work again, including forcing people to sit on their hands. The typical reflex to a crisis like this in the software industry is less change, complete with a process to ensure that people do less. This is how waterfall was born. Iterative or spiral development is the same idea applied more frequently, in shorter cycles. That was generally seen as an improvement, but you are still going to sit on your hands for prolonged periods of time. The real deal these days is continuous deployment, and you can’t do that if you are sitting on your hands.

Breaking integration builds have a cause: the people making the changes are piling mistake on mistake and keep bear shaving (I love the metaphor) the problem because they are under pressure to release and deliver functionality. All a faster pace of development does is make this more obvious. Along with the increased amount of change per time unit comes an increased amount of mistakes per time unit. Every quick fix and every misguided commit makes the system as a whole a little less stable. That’s why the waterfall model includes a feature freeze (a.k.a. integration), where no changes are allowed, because the system would never get finished otherwise.

A long time ago I wrote an article about design erosion. It was one of the cornerstones of my PhD thesis (check my publication page if you are interested). In a nutshell: changes are cumulative, and we take design decisions in the context of our expectations of the future. Only problem: nobody can predict the future accurately, so from time to time you will get a decision wrong and not realize it right away. You can’t just rip out a single change you made months or years ago without affecting the subsequent changes that depend on it. In other words, change is cumulative: rip one piece out and the whole sand castle collapses. Some decisions will turn out to be wrong or will have to be reconsidered at some point, and because changes are interdependent, fixing design erosion can be painful and expensive. Consequently, all software designs erode over time, because such fixes tend to be delayed until the last possible moment.

Design erosion is a serious problem. You can’t fix a system that has been eroding for years overnight. Failing to address design erosion in time can actually kill your company, product or project. But you can delay the inevitable by dealing with problems close to where they originate instead of dealing with them later. Dealing with a problem close to where it originates means fewer subsequent changes are affected, which minimizes the cost of fixing it. Breaking integration builds are a symptom of an eroding design. Delaying the fix makes it worse.

So, the solution is to refactor and rethink the broken parts of the system to be more robust, easier to test, flexible enough to meet the requirements, etc. Easier said than done, of course. However, Git is a revolutionary enabler here: you can do the more disruptive stuff on a branch and merge it back in when it is ready, instead of when you go home and break the nightly build. This way you can make big changes without destabilizing your source tree. Of course you want continuous integration on your branches too. That way you push fewer mistakes between branches, solving problems closer to their origin and without branches affecting each other. You will still have breaking builds, but they will be cheaper to fix. Decentralization is the solution here, not the problem as the blog post I linked above suggests!
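To make that concrete, here is a minimal sketch of the branch-and-merge workflow I mean. The branch name and commit message are made up for illustration; the commands themselves are plain git:

    # start the disruptive work on its own branch instead of on the main line
    git checkout -b refactor-storage-layer

    # hack, commit, repeat; the main line stays stable all the while
    git commit -a -m "rework the storage layer"

    # pick up what the rest of the team has been doing and integrate it locally
    git checkout master
    git pull
    git checkout refactor-storage-layer
    git merge master

    # only when the branch builds and passes its tests does it go back in
    git checkout master
    git merge refactor-storage-layer

The point is that the merge back into master happens when the work is done, not when the clock says it is time to go home.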

Here’s why decentralization works: testing effort grows much faster than linearly with the amount of change, because changes interact with each other. Double the amount of change and you roughly quadruple the number of potential interactions you have to test. So don’t do that; keep the amount of untested change small. In a centralized world you do this through a feature freeze: by stopping all change, you can actually find all the problems you introduced. In a decentralized world you do this by not pushing your changes until the changes you pull no longer break your local branch. Then you push your working code. Why is this better?

1) You integrate incoming changes with your changes instead of the other way around.
2) You do this continuously (every time you pull changes), so you fix problems when they happen.
3) Your changes only get pushed when they are stable, which means other people have less work doing #1 and #2 on their side.
4) By keeping changes isolated from each other, you make them easier to test. Once tested, the changes are a lot easier to integrate.
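In git terms the loop is as simple as this (a sketch; “origin”, “master” and the build command stand in for whatever your project actually uses):

    # bring in everybody else’s work and integrate it with yours locally
    git pull origin master

    # build and test locally; fix whatever the incoming changes broke
    make test

    # only when your local branch is stable again do you publish it
    git push origin master

Everything between the pull and the push is your problem to fix, on your machine, before anyone else ever sees it.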

Continuous integration can help here, but not if you only do it on the production branch: you need to do it all over the place. Serializing all change through one integration environment turns it into a bottleneck: your version control system may be decentralized, but if your integration process is not, you are still going to be in trouble. A centralized build system works OK with a centralized version control system, because the centralized version control system serializes the changes anyway (which is a problem, not something to keep bear shaving). The whole point of decentralizing version management is decentralizing change, so you need to decentralize the integration process as well.
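As a sketch of what that could look like: instead of one build server watching one branch, every shared branch gets fetched, built and tested on its own. The branch names and the build command below are made up for illustration:

    # check out and test every shared branch, not just the production branch
    git fetch origin
    for branch in master refactor-storage-layer experimental-scheduler; do
        git checkout "origin/$branch"
        make test || echo "$branch is broken; fix it there, close to the origin"
    done

A broken branch then gets fixed on that branch, close to where the mistake was made, instead of on the integration branch weeks later.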

In a nutshell, this is how the Linux kernel handles thousands of changed lines per day from hundreds of developers. And yes, it is no coincidence that those guys came up with Git. The Linux kernel deals with design erosion through continuous redevelopment. The change is not just additive; people are literally making changes all over the source tree, all the time. There is no way in hell they could deal with this in a centralized version management environment. As far as I know, the Linux kernel has no automated continuous integration, but it does have thousands of developers running all sorts of developer builds and reporting bugs against them, which is really the next best thing. Nothing gets into the mainline kernel without this taking place.