Using Git and feature branches effectively

A bit of a debate has been raging over the last few weeks. It started when somebody commented on a few things that Martin Fowler wrote about git, feature branches, and feature toggles. Fowler makes a few very good points about how feature branches can cause a lot of testing and integration headaches. A lot of people seem to have picked up on this, and a bit of an anti-feature-branch movement seems to be emerging.

The main problem I have with this is that these people are confusing problems and causes: they are effectively blaming a solution they have for causing a problem they have. Roughly, the argument goes as follows: feature branches cause people to accumulate change that then becomes very disruptive when it lands on the main branch. There will be CI breakage and lots of problems for people on other feature branches who are suddenly faced with merge issues. I think Martin Fowler actually gets it, but some of his followers seem to be confused. By all means, feature toggles are a great way to get changes in early and have them exposed. They are a powerful tool for getting changes out earlier, and early is good: use it. However, it is not the case that if you use feature branches there has to be a lot of pain for everybody. In other words, feature branches are not inherently evil and don’t have to be problematic.

It’s not the practice of feature branching that is the problem but the fact that testing and continuous integration are not decentralized in a lot of organizations. In other words, until your changes land on the central branch, you are not doing the due diligence of testing. Even worse, you are not making sure you have tested your changes before you add them to the main branch.

You can’t do decentralized versioning unless you also decentralize your testing and integration. Git has value when used as an SVN replacement. Git has more value when used as a DVCS. There is no good reason why you can’t do decentralized testing and integration with git. Rather the opposite: it has been designed with exactly this in mind. The whole point of git is divide and conquer. Break the changes up: decentralize the testing and integration work and solve the vast majority of problems before change is pushed upstream. If you push your problems along with your changes, you are doing it wrong. Decentralized integration is a very clever strategy that is based on the notion that the effort involved with testing and integration scales exponentially rather than linearly with the amount of change. By decentralizing, you have many people working on smaller sets of changes that are much easier to deal with: the collective effort spent on testing decreases, and when it all comes together you have a much smaller set of problems to deal with.
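
To make that scaling argument concrete, here is a toy model in Python. It assumes, purely for illustration, that integrating a batch of n changes costs 2^n - 1 units: one unit for every non-empty combination of changes that could interact. That cost function and the names `integration_cost` and `batched_cost` are my own assumptions for this sketch, not measurements from a real project, and the model ignores the cost of later batches rebasing onto earlier ones.

```python
def integration_cost(n_changes: int) -> int:
    """Toy cost: one unit per non-empty subset of changes that might interact.

    The exponential shape is an assumption for illustration, not a measurement.
    """
    return 2 ** n_changes - 1


def batched_cost(n_changes: int, batch_size: int) -> int:
    """Total cost when the same changes are integrated in batches of at most batch_size."""
    full_batches, remainder = divmod(n_changes, batch_size)
    return full_batches * integration_cost(batch_size) + integration_cost(remainder)


if __name__ == "__main__":
    total = 12
    for size in (12, 6, 3, 1):
        print(f"{total} changes in batches of {size}: cost {batched_cost(total, size)}")
```

Under these assumptions, twelve changes integrated in one go cost 4095 units, while the same twelve changes integrated three at a time cost 28. The exact numbers mean nothing; the point is how quickly the gap opens up once cost grows faster than linearly.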

This is how the Linux kernel manages to remain stable, despite the fact that the amount of change there is measured in thousands of LOC/day. If you need proof that thousands of people can work collaboratively on millions of lines of code using git branches: look at the Linux kernel. The amount of change, integration and testing effort, etc. in whatever you are working on probably pales in comparison with that: your problems are easy.

Here are a few simple practices that will address most of the issues:

  1. No change that will break CI on the main branch is allowed on the main branch. Zero tolerance on this one. In git terms: rebase against main, run the CI tests on the feature branch, fix any problems, push. You can even automate this: Jenkins pretty much supports this out of the box. If main ever breaks, somebody doesn’t get the basics: educate them with a big clue bat. Rationale: it is vital to keep main stable at all times. That way everybody on a feature branch will know it is safe to rebase.
  2. Rebase against main frequently, especially if you do big changes. Rationale: you are doing the changes, it is you that will take the pain of doing the integration work when things go wrong, not everybody else. The earlier you know about problems, the easier it is to fix. Feature branches are not about stopping rebases. If your feature branch is way behind, you have done something very wrong. Don’t ever do that without good reason. If it is on the central branch, you will have to deal with it at some point: so get it over with ASAP.
  3. Commit frequently, keep commits as small as you can. Rationale: smaller commits are easier to analyze and fix when you have conflicts. Also, Git is really good at applying commits one by one. If you isolate the merge problems to a handful of commits, rebasing is pretty much painless.
  4. Push as early as you can. If the CI builds are green on your branch and you are confident things are fine, push and don’t wait. Don’t accumulate integration work for others. Feature branches are not about hiding change but about isolating change. There’s a difference. One is a communication problem and the other is a proven strategy of divide and conquer. If appropriate, feature toggles are indeed a great way to land experimental changes. I believe this was the main point Martin Fowler was trying to make.
  5. Communicate clearly around big code restructurings. Rationale: everybody rebasing against your changes will experience some pain. You are causing people to have to do work, so tell them it is going to happen. I always ask people to push their changes before I push my big changes. That way, I can fix the integration issues on my side before I push.
  6. Collaborate with people by pushing changes back and forth without involving the main branch. git format-patch is your friend: you can do this by email. If somebody needs a change before you land it, you can give it to them.
  7. Be aware of the cost of things. Any time people spend on things like resolving merge conflicts, doing rebases, etc. costs you. Inevitably when you branch you accept that there is going to be some cost. Keeping a branch alive means you add cost. Don’t do it needlessly. So, branch and do what you have to do and then rebase, push and delete the branch. And again, not committing is effectively the same as branching in git. It has the same cost and risk attached to it.
  8. Don’t push branches to origin unless they are going to be long lived and need to be worked on by multiple people. For simple work, keep the branches local to your machine. If you are going to be doing all of the commits and integration work, the only valid reason for pushing it upstream would be backups. There are other ways to make backups.
  9. Beyond a certain team size, the stable branch needs to be protected. Pull change rather than pushing it (this is a severely underused git feature). Junior on the third floor says his patch is ready: pull it, have a look at it and give him feedback but don’t allow him to push and bypass checks and balances. Push only works in small teams. Pull forces people to communicate.
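
The rebase, test, push cycle from the first practice can be sketched as a small Python script. Treat it as a rough outline rather than a finished tool: the names `landing_plan` and `land`, the remote name `origin`, and the `make test` command are placeholders I made up for illustration, so substitute whatever runs your CI suite locally. It describes the push workflow for a small team; on a larger team somebody would pull your branch instead.

```python
import subprocess


def landing_plan(branch, main="main", remote="origin", test_cmd=("make", "test")):
    """Return the commands needed to land `branch` on `main`, in order.

    `test_cmd` is a placeholder for whatever runs your CI tests locally.
    """
    return [
        ["git", "fetch", remote],                     # get the current state of main
        ["git", "checkout", branch],
        ["git", "rebase", f"{remote}/{main}"],        # replay local commits on top of main
        list(test_cmd),                               # test the rebased branch before pushing
        ["git", "push", remote, f"{branch}:{main}"],  # rejected if main moved since the fetch
    ]


def land(branch, **kwargs):
    """Run the plan, stopping at the first command that fails."""
    for cmd in landing_plan(branch, **kwargs):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
```

Calling `land("my-feature")` stops as soon as the rebase hits a conflict or the tests fail, so nothing reaches main unless every earlier step succeeded. If the final push is rejected because somebody else landed first, you simply run the whole cycle again.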

When will (feature) branches get you in trouble? Antipatterns:

  • You are working on a big change. You haven’t updated for days. Do I need to spell it out? That is just wrong. Update!
  • You are pushing a big change, all the CI builds go red. Oops, test before you push, you dumb idiot. You deserve all the angry looks you are going to get. If this happens a lot, consider setting up your CI environment to have a stable branch for tested commits and an unstable branch for incoming commits that need to be tested. Jenkins supports this and it will keep unstable change out of stable.
  • You are doing a big change, somebody else is also doing a big change. You find out about that when you rebase and spend hours dealing with merge conflicts. Seriously: communicate before you do anything drastic and give people an opportunity to get their changes in before you ruin their day. Pushing massive changes that you know are going to cause problems when people rebase is a very egocentric thing to do. Be a team player.
  • You are working on a big change. You haven’t updated or committed anything for several days on your local branch. You are effectively using your local branch as a feature branch: everybody who is not pushing change is effectively on a feature branch. Not committing is NOT a strategy to avoid using feature branches. Also, you are not committing? Why???? What’s your excuse for not committing to your own local branch? Seriously, consider using version management and stop treating git as a file server for code backups.
  • You are working on a branch for an extended period of time but your CI builds only run on the central branch. Congratulations, you have just tossed out CI as a good practice. Fix it. Either have the discipline to run the CI tests manually on every commit to the branch or set up a CI build for it. Either way, don’t break the feedback loop you get with CI.
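
Several of these antipatterns boil down to a branch drifting too far from main, and that is easy to measure. The sketch below uses `git rev-list --left-right --count`, which prints how many commits are on each side of a symmetric difference. The function names, the `origin/main` default, and the threshold of 20 commits are arbitrary choices of mine for illustration; pick whatever fits your team.

```python
import subprocess
from typing import Tuple


def parse_left_right_count(output: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated counts printed by
    `git rev-list --left-right --count A...B`."""
    left, right = output.split()
    return int(left), int(right)


def behind_ahead(upstream: str = "origin/main", branch: str = "HEAD") -> Tuple[int, int]:
    """Return (commits on upstream missing from branch, commits on branch not on upstream)."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{upstream}...{branch}"],
        check=True, capture_output=True, text=True,
    )
    return parse_left_right_count(result.stdout)


def needs_update(behind: int, max_behind: int = 20) -> bool:
    """True when the branch has fallen further behind than the chosen threshold."""
    return behind > max_behind
```

Run `git fetch` first so that `origin/main` is current, then call `behind_ahead()` from inside the repository. A large first number means it is time to rebase; a large second number means you are sitting on a lot of unpushed change.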

Now at this point I have to admit something: I don’t use feature branches a lot but I do tend to accumulate a few commits locally before I push them. Also, we haven’t set up stable and unstable branches in Jenkins (yet, planning to). We have the occasional breakage of our CI builds. I’m actually guilty of breaking some of the builds myself. The reason/weak excuse is: I’m having a hard time changing people’s way of working. You turn your back and people stop committing and continue treating git like they are using CVS. But I’m at least aware what the real problem is here: not the fact that I have branches but the fact that our testing needs to be decentralized.

9 Replies to “Using Git and feature branches effectively”

  1. The biggest problem with Feature Branches is that they remove your ability to refactor. You mention “big changes” in a lot of your points, but if you’re really into doing TDD’s Red/Green/Refactor cycle and “Leaving the campground cleaner than you found it”, you know that much more often what you have are small improvements, or little redesigns.

    In a feature branch situation, you end up with people fixing the same issues, most probably in different manners, or people having the codebase morph under their feet; and in both cases, big merge pains.

    More than once in a project I had to neglect my duties as a professional programmer to refactor and clean up in the small due to FEAR of someone else changing that code in a branch that lives for 2 weeks. Those same projects were infested with “big refactors”… from your points it sounds like that seems natural to you – it is not.

    By the time the merge is done and conflicts arise, the sequence of steps I took when refactoring will have vanished from my mind already, and it will be painful for me and the other programmer to sort it out.

    Some people are scared to merge “unfinished” code into the codebase, but most of the time, it doesn’t matter, or can just be disabled. In rails you could just switch it off based on environment, if you don’t wanna use Feature Toggles. Most of the time you can get away by just omitting it from the UI navigation and using a direct URL to access new functionality when you must.

    In my experience, the best programmers are those that take baby steps, incrementally building functionality all the while keeping the code compiling and tests passing the majority of the time, and certainly at each git commit. There’s no justification for the fear of working on master, other than failing to follow this discipline and having bad tests that don’t point out regressions well enough.

    1. Good points. However, big refactorings are not necessarily a technical issue here. If anything, git is actually quite helpful when it comes to this. I have in the past done big refactorings on git branches. It actually works. E.g. big package renames in Java can cause a lot of fallout. I’ve done such changes on branches and successfully rebased against upstream changes for several weeks. The challenge here is communicating that your change is going to happen and picking the right moment. With Subversion, your only option is to tell everybody to sit on their hands and not commit while you cross your fingers and do the changes and commits.

      In git you do these things on a branch. Even if you don’t call it a branch, your local repository is for all practical purposes a branch. Not pushing your commits for a few days is exactly the same as having a feature branch. You’re just not keeping it in a safe place (i.e. in a backed up location on a server), which is not a very bright thing to do.

      The real problem here is the transactional view of version control, where anyone can claim a global lock (a.k.a. commit freeze) on the repository and do their thing. This does not scale to larger projects. Git was in fact designed to solve this problem but it requires you to change the way you work.

      Instead of everybody pushing their changes commit by commit whenever they feel like it, you instead use pull requests (or the old school email based version of that as the Linux kernel guys do). A pull request is nothing but a serialized feature branch. Most large open source projects work this way by now for good reason: it works. Pull requests are very simple: either they merge cleanly without causing errors and test failures or they don’t. Only the former type gets merged typically.

      Using pull requests forces people to communicate about their changes. Suddenly a big refactoring becomes your problem and not everybody else’s. If your big refactoring pull request doesn’t apply cleanly to the master, it gets rejected and you have to do more work. You iterate on that while rebasing against upstream changes, effectively doing the integration work before your pull request is integrated. Whether you like it or not: you are on a feature branch at this point.

      If somebody else has a conflicting feature branch, you had better learn to talk to each other before you find out the hard way by having your patches rejected. Git allows you to rebase against each other so you can resolve whatever conflicts you have before you dump your changes on master.

      The key point in my blog post is that if you isolate yourself like this from the master, you have to also decentralize your testing. A key problem here is that in many organizations master is the only thing that is continuously built and tested. So everybody is anxious to have their half broken code on master so they can know the impact. If you’ve ever experienced a project with widespread commit anxiety, you’ll know that this is a very negative thing to have.

      If you practice continuous deployment it is not acceptable for the master to ever be in a broken state because it means you can’t deploy. This implies that all change has to be fully tested and integrated before landing on master. By definition, then, practicing continuous deployment means you also have to use feature branches.

      1. I think the “big refactoring feature branch/pull request” combined with other people working on other features (on related areas of code) based on the old master is what creates the problem here. However well communicated it is, after your refactoring is merged into the master, some of the other feature branches are likely broken (when they rebase), and their owners need to fix this by integrating your changes with theirs.

        (We recently had a case where both a colleague and I decided to clean up a JUnit test case for the same problem it had. Unfortunately we did it in different ways, which caused me some pain when I integrated my (the second) feature branch with the master. This could have been improved by better communication, but I guess announcing each test case fixup before committing is still a bit too much, isn’t it?)

        The fear of this “extra work” for colleagues is what causes the mentioned refactoring anxiety.
        I guess it can be reduced by having feature branches (both refactoring and “real” features and bugfixes) as short-lived as possible (and rebasing/merging master as often as possible) to avoid having to integrate many changes at once.

        1. With Git, the age of a branch is much less of an issue than the size of the per-commit diffs. A good commit creates the minimal amount of change. Even a big refactor can usually be broken down into such small commits. In git every checkout is effectively a branch. Whether you give it a nice name doesn’t actually matter a whole lot.

          The nice thing about Git is that it gives all parties all the tools to stay up to date at the moment of their choosing. So if I create a branch, work on it for two months, and then merge it to master, I will likely be crying a lot about the ton of change that happened on master that is causing me headaches trying to merge. But at least it is my problem. However, not merging upstream changes to my local branch is a choice, and a bad one I’d argue. I tend to merge master changes to my branch frequently and then when I push I know it is not going to create conflicts.

          Rebase your local master branch and merge it to your feature branch and push that to remote. People with other branches doing the same will then rebase against your gazillion commits after you merge. By that time the only potential for conflicts is the work they haven’t pushed to master yet.

          So anyone else doing the same might still need a little heads-up that, hey, I’m about to merge in a gazillion rebased commits, but fundamentally I’ve already dealt with the conflicts and all the other feature branches should not be too much out of sync with master if their maintainers did the same. The only risk is conflicts between your unintegrated changes and mine.

          Refactoring anxiety has usually more to do with the functional fallout. If you are not running integration tests on your feature branches, you are essentially sitting on a huge amount of untested change. Pushing that in all at once creates a combinatorial explosion of things that can and probably will go wrong. It’s better to trickle in small changes over time from a QA point of view.

          For larger projects like the Linux kernel this is not feasible, so they do pull instead of push for all changes. GitHub used in the right way supports this as well. Basically the responsibility is on the committer to convince his peers to pull his changes. This is a much more scalable approach to large scale software projects and it enforces a sensible aversion to big, disruptive commits. Because why would you disrupt a large group of developers unless it cannot possibly be avoided?

  2. Apart from communicating about changes, especially big ones, wouldn’t it be better to also push early, for example by using feature toggles, so that others on the team have a chance to pull? Or should we pull from each other’s feature branches?

    (Sorry if I sound dizzy that’s probably because I am. Feature branches are new to me.)

    1. The key goal with git for Linus Torvalds was to make development asynchronous: to unblock people from waiting for each other and allow them to contribute change when it is ready instead of ASAP. Feature branches are the key enabler for this. The Linux kernel is the living proof that this works at scale. I’d recommend taking a look at the activity graphs on GitHub for the kernel. It’s by far the most active software project (lines of code changed/time unit) I know of and it certainly dwarfs any kind of corporate development project that I know of. What you see there is the tip of the iceberg. Most Linux development happens downstream on feature branches (every clone is a branch effectively, whether you like it or not) that are only pulled into the mainline when ready and convenient. This merging is done by Linus or one of his people, who pull and scrutinize the change. The key difference with many other projects is that there is no push and all the pulls are done via mailing lists, so communication is baked into the process.

      Feature flags are indeed a great way to introduce experimental changes if you want to do e.g. A/B testing. In Linux, however, experimental changes tend to live outside the main repository until they are actually in a shippable state. There is no technical need to pull early since pulls work bi-directionally: you can always update your branch with the latest upstream changes. So, there is no urgency to land change early. It’s actually better to not pull early because that way you don’t burden the community with untested code. It forces more work to be done before things are added to the mainline rather than after. This is actually a good thing because testing upstream is way more expensive than testing downstream. The rationale is that integration cost scales exponentially with the amount of change. So if you have 10 half tested features landing upstream, you have a much higher integration cost than when you have 1 untested feature downstream plus 9 fully tested new features that you pull from upstream. Keeping change downstream until it is verified to be shippable keeps integration cost low. The flip-side is that if you have a centralised testing infrastructure that only tests a central branch, you have effectively no integration test until after your change lands. Long lived branches in such a situation are not a great idea unless you can ensure those branches get tested too.
