Re: bear shaving

I was going to submit the stuff below in a shortened form as a comment to this fun little blog post on “bear shaving”, but it sort of grew into a full-blown article, again. To summarize the original article: there’s a nice analogy of shaving bears to help them cope with global warming, and how that is not really addressing the core issue (not to mention dangerous). The analogy is applied to integration builds and people patching things up. Then the author sort of goes off and comes up with a few arguments against git and decentralization.

While some of the criticism is valid, this of course ticked me off 🙂

I see Git as a solution for increasing the amount of change and dealing more effectively with people working in parallel. Yes, this puts a strain on integrating the resulting changes. But less change is the equivalent of bear shaving here. Change is good. Change is productivity. You want more productivity, not less. You want to move forward as fast as you possibly can. Integration builds breaking are a symptom of a larger problem. Bear shaving would be doing everything you can to make the integration builds work again, including forcing people to sit on their hands. The typical reflex to a crisis like this in the software industry is less change, complete with the process to ensure that people do less. This is how waterfall was born. Iterative or spiral development is about the same thing, but done more frequently and in shorter cycles. This was generally seen as an improvement, but you are still going to sit on your hands for prolonged periods of time. The real deal these days is continuous deployment, and you can’t do that if you are sitting on your hands.

Breaking integration builds have a cause: the people making the changes are piling mistake on mistake and keep bear shaving (I love the metaphor) the problem because they are under pressure to release and deliver functionality. All a faster pace of development does is make this more obvious. Along with the increased amount of change per unit of time comes an increased number of mistakes per unit of time. Every quick fix and every misguided commit makes the system as a whole a little less stable. That’s why the waterfall model includes a feature freeze (a.k.a. integration), where no changes are allowed because the system would never get finished otherwise.

A long time ago I wrote an article about design erosion. It was one of the cornerstones of my PhD thesis (check my publication page if you are interested). In a nutshell: changes are cumulative, and we make design decisions in the context of our expectations of the future. The only problem: nobody can predict the future accurately, so from time to time you will get it wrong and won’t realize it right away. Because change is cumulative, you can’t just rip out a single change you made months or years ago without affecting the changes that depend on it: rip one piece out and the whole sand castle collapses. Some decisions will turn out to be wrong or will have to be reconsidered at some point, and because changes are interdependent, fixing design erosion can be painful and expensive. Consequently, all software designs erode over time, as such fixes are inevitably delayed until the last possible moment. Design erosion is a serious problem. You can’t fix a system that has been eroding for years overnight, and failing to address design erosion in time can actually kill your company, product or project. But you can delay the inevitable by dealing with problems closer to where they originate instead of dealing with them later. Dealing with a problem close to its origin means fewer subsequent changes are affected, which minimizes the cost of fixing it. Breaking integration builds are a symptom of an eroding design. Delaying the fix makes it worse.

So, the solution is to refactor and rethink the broken parts of the system to be more robust, easier to test, more flexible to meet the requirements, etc. Easier said than done, of course. However, Git is a revolutionary enabler here: you can do the more disruptive stuff on a git branch and merge it back in when it is ready, instead of when you go home and break the nightly build. This way you can make big changes without destabilizing your source tree. Of course you want continuous integration on your branches too. That way, you push fewer mistakes between branches, solving problems closer to their origin and without branches affecting each other. You will still have breaking builds, but they will be cheaper to fix. Decentralization is the solution here, not the problem, as is suggested in the blog post I linked above!
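A minimal sketch of that branch-based flow (directory, file, and branch names are all made up for illustration): the disruptive work lands on its own branch and only merges back when it is done.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
echo "old implementation" > app.txt
git add app.txt
git commit -qm "initial"
git branch -M main
# the disruptive refactoring happens on its own branch...
git checkout -qb refactor
echo "reworked implementation" > app.txt
git commit -qam "rework the broken parts"
# ...while main stays stable; merge back only when the branch is ready
git checkout -q main
git merge -q refactor
cat app.txt
```

Until that final merge, the nightly build on main never sees the half-finished refactoring.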

Here’s why decentralization works: testing effort grows roughly quadratically with the amount of change. Double the amount of change, and you quadruple the testing effort. So don’t do that: keep the testing effort low. In a centralized world you do this through a feature freeze. By stopping all change, you can actually find all the problems you introduced. In a decentralized world you do this by not pushing your changes until the changes you pull no longer break your local branch. Then you push your working code. Why is this better? 1) You integrate incoming changes with your changes instead of the other way around. 2) You do this continuously (every time you pull changes), so you fix problems when they happen. 3) Your changes only get pushed when they are stable, which means other people have less work with #1 and #2 on their side. 4) By keeping changes isolated from each other, you make them easier to test. Once tested, the changes are a lot easier to integrate.
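As a concrete sketch of points 1–3 (the repositories and names here are invented for the example): Bob pulls Alice’s incoming change into his own tree first, and only pushes once the merged result is working locally.

```shell
set -e
work=$(mktemp -d); cd "$work"
# a shared "upstream" repository and two developer clones
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/main
git clone -q upstream.git alice
git -C alice config user.email alice@example.com
git -C alice config user.name alice
git -C alice symbolic-ref HEAD refs/heads/main
# alice publishes a first version
( cd alice && echo base > f.txt && git add f.txt && git commit -qm base && git push -q origin main )
git clone -q upstream.git bob
git -C bob config user.email bob@example.com
git -C bob config user.name bob
# bob commits locally while alice pushes more work upstream
( cd bob && echo bob >> f.txt && git commit -qam "bob's change" )
( cd alice && echo alice > g.txt && git add g.txt && git commit -qm "alice's change" && git push -q origin main )
# bob integrates the incoming change into *his* tree first;
# only the working merged result gets pushed upstream
( cd bob && git pull -q --no-rebase origin main && git push -q origin main )
```

The merge happens in Bob’s clone, where it is cheap to fix; upstream only ever receives states that already worked for somebody.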

Continuous integration can help here, but not if you only do it on the production branch: you need to do it all over the place. Serializing all change through one integration environment turns it into a bottleneck: your version control system may be decentralized, but if your integration process is not, you are still going to be in trouble. A centralized build system works fine with a centralized version control system, because a centralized version control system serializes the changes anyway (which is a problem, not something to keep bear shaving). The whole point of decentralizing version management is decentralizing change. You need to decentralize the integration process as well.

In a nutshell, this is how the Linux kernel handles thousands of kloc of changes per day with hundreds of developers. And yes, it is no coincidence that those guys came up with git. The Linux kernel deals with design erosion through continuous redevelopment. The change is not additive: people are literally making changes all over the Linux source tree, all the time. There is no way in hell they could deal with this in a centralized version management environment. As far as I know, the Linux kernel has no automated continuous integration, but it does have thousands of developers running all sorts of developer builds and reporting bugs against them, which is really the next best thing. Nothing gets into the mainline kernel without this taking place.

2 Replies to “Re: bear shaving”

  1. Cool post! Mostly I agree with you that DSCM is a better way of solving the problem than using something like Subversion or similar, but I think it’s important to keep pulling from the centralised repository as often as possible, otherwise you end up having to do some crazy merging all in one go.

    We’re using Mercurial at the moment and trying to ensure that every checkin (even locally) still passes the build. Do you find you aim for that as well, or do you check in more frequently and sometimes it might be broken?

    With respect to using Git the way I described in the original post, I think it can be dangerous if you don’t update regularly, but that’s the same with any source control system where you create a branch and work on it for any length of time.

    Cheers,
    Mark

  2. Hey Mark, very sorry about completely missing your reply here. I completely agree with you on synchronizing. However, in a DSCM synchronization is bidirectional: you can choose where you pull from, where you push to, and when you do either.

    If two people are working on the same bits and pieces it makes sense that they both pull changes from each other. Likewise if there is some centralized repository where tested and integrated bits and pieces end up, it makes sense to pull from there as well (note that this is not a mutually exclusive kind of thing).

    The key decision is when to push changes. In my view this depends on the size of the team/community and on how badly the change is needed or how disruptive it is. DSCM gives you the option to delay pushing and continue to absorb the upstream changes. In a team with only a handful of active developers, they probably need to be synchronizing all the time. In a team with hundreds of developers, most won’t be interested in seeing any change land until they actually need it.

    DSCM allows for just-in-time delivery of change and for isolating the integration process. It also enables the parallelization of a massive amount of change while minimizing the time those changes need to get integrated.

    In any case, what I would like to do and what I’m actually able to do at work are two different things. I’m stuck in an environment where people are largely not interested in changing the way they work and don’t realize how many problems their current way of working is producing. Basically, I’m depending heavily on git svn and am not pushing/pulling changes with other developers directly, even though I would prefer to work that way.

    Typically we have several stories going on in a sprint, with different groups of people working on each. What I would love to see is those stories taking place on their own branches, with their own Hudson builds and their own QA tracks, and not landing in our stable code base until they are fully QAed and considered ready for sign-off. What we have instead is work in progress being committed on what should be our stable code base. We have a lot of process around this (which, btw, was co-designed with some ThoughtWorks people), but it’s not ideal. I’m actually guilty of breaking builds myself, because it’s not always practical to kick off 25-minute builds on my laptop (which is one of the many interesting problems we have).

    What I’d love to have is my own private branch on some remote Git server, properly hooked up to Hudson jobs on some fast CI server, that I can break/fix as often as I want and from which I can push whenever I know things are all good. You could even automate the pushing of stable changes and automatically roll back in case the upstream Hudson jobs start failing. Basically, with a setup like that, the decision process could be fully automated: I just blindly push, and unless my Hudson builds go green, it will never end up in stable.
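    Something like this toy end-to-end sketch of the gate (repository names and the stub build script are made-up stand-ins; the real setup would run the check as a Hudson job rather than locally):

```shell
set -e
work=$(mktemp -d); cd "$work"
# "stable.git" stands in for the shared stable repository
git init -q --bare stable.git
git -C stable.git symbolic-ref HEAD refs/heads/main
git clone -q stable.git dev
cd dev
git config user.email dev@example.com
git config user.name dev
git symbolic-ref HEAD refs/heads/main
echo ok > app.txt
git add app.txt
git commit -qm "work on a private branch"
# stand-in for the real test suite / CI build
cat > build.sh <<'EOF'
#!/bin/sh
grep -q ok app.txt
EOF
chmod +x build.sh
# the gate: the push to stable only happens when the build is green
if ./build.sh; then
  git push -q origin main
  echo "green: pushed to stable"
else
  echo "red: nothing pushed" >&2
fi
```

    The same conditional, run server-side after the CI job, is all the "decision process" there is: red builds simply never reach the stable repository.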
