SureshJoshi.com

Git Busy Coding or Git Busy Dying


2015-06-25

I’ve spent much of my career building up development teams and training junior developers. The first major stumbling block comes straight out of The Joel Test, and it’s always the same: source control. It blows my mind every time I see a software development company that doesn’t use source control because, for the life of me, I can’t figure out how they made it as far as they did. Well, if you’re in one of those companies, I have one simple compound word for you… GitFlow…

Why source control?

This won’t be a discussion of why you should use source control in the first place. However, I will re-post this excellent summary written on StackOverflow.

Have you ever:

  • Made a change to code, realised it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Had to maintain multiple versions of a product?
  • Wanted to see the difference between two (or more) versions of your code?
  • Wanted to prove that a particular change broke or fixed a piece of code?
  • Wanted to review the history of some code?
  • Wanted to submit a change to someone else’s code?
  • Wanted to share your code, or let other people work on your code?
  • Wanted to see how much work is being done, and where, when and by whom?
  • Wanted to experiment with a new feature without interfering with working code?

If asking those questions doesn’t work, you can try my ninja tactic: take that developer’s codebase (which is naturally a zip - interestingly never a tarball), inject a couple of innocuous runtime crashes, and then go through all their files and do a ‘tabs to spaces’ conversion (I guarantee they use tabs; ‘spaces’ people just get it).

When the dev tries a diff and sees 10,000 modified lines, they’ll start to see the light (I know they could ‘space to tab’ but my point is usually made). The same issue can crop up even in source control, so I tell my developers to split code changes from code-formatting changes whenever possible. That means 2 commits, and crashes are found in seconds, not hours.
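To make the ‘tabs to spaces’ point concrete, here is a minimal sketch of filtering formatting noise out of a diff. The helper names `normalize` and `real_changes` are mine, and the tab width of 4 is an assumption. It only illustrates the idea; in a real repo, `git diff -w` gives you a whitespace-insensitive diff.

```python
import difflib


def normalize(line: str, tab_width: int = 4) -> str:
    """Expand tabs and drop trailing whitespace so formatting noise disappears."""
    return line.expandtabs(tab_width).rstrip()


def real_changes(old: str, new: str) -> list[str]:
    """Return only the diff lines that survive whitespace normalization."""
    old_lines = [normalize(line) for line in old.splitlines()]
    new_lines = [normalize(line) for line in new.splitlines()]
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
    # The first two lines are the "---"/"+++" file headers; "@@" lines mark hunks.
    return [d for d in diff[2:] if not d.startswith("@@")]


# Hypothetical before/after: every line was re-indented, but only one line really changed.
before = "def f():\n\tx = 1\n\treturn x\n"
after = "def f():\n    x = 1\n    return x + 1\n"
changes = real_changes(before, after)
```

Here `changes` holds just the removed and added `return` lines, which is roughly what the developer would have seen if the formatting change had gone into its own commit.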

Anyways, on to the heart of this post:

GitFlow and its required reading

Before you go any further in this post, jump over to this link (http://nvie.com/posts/a-successful-git-branching-model/) and once you’ve read it, read it again. I’ll wait here…

Done? You sure? Because nothing I write after this will make much sense otherwise.

… … …

Just for a little extra spice, in addition to reading Vincent’s post, I -HIGHLY- recommend using Atlassian’s SourceTree and reading through their implementation of GitFlow (and this one too.)

In fact, on my teams, I insist on using SourceTree and SourceTree’s GitFlow/HgFlow functionality for consistency across developers. I tend to get one or two people who want to do it all command-line style, and inevitably, they eff everything up.

My best practices

In addition to the Vincent and Atlassian links above, I usually tack on a couple of other conventions and best practices, just to remove as much ambiguity as possible. Why remove ambiguity? Is source control where you want your developers to exercise their creative side? Methinks not.

Branching convention?

I always follow SourceTree’s GitFlow/HgFlow branching convention (the post-name comments are mine):

GitFlow:

    master                  ← Only ever 1 branch
    develop                 ← Only ever 1 branch
    feature/nameOfFeature   ← No branch limit
    release/v1.2.3          ← No branch limit, but only 1 per release
    hotfix/v1.2.4           ← Note the version bump - no limit, but should be rare

HgFlow:

    default                 ← Only ever 1 branch
    develop                 ← Only ever 1 branch
    feature/nameOfFeature   ← No branch limit
    release/v1.2.3          ← No branch limit, but only 1 per release
    hotfix/v1.2.4           ← Note the version bump - no limit, but should be rare

What’s in a name?

When naming features, the branch names should be concise, but illustrative - and should rarely contain references to being a ‘dev_’ branch, or be numerically incremented.

Bad: ‘dev_suresh_test_2’ Yes, I have seen this, with ‘suresh’ replaced with the developer in question. This naming doesn’t tell us anything about the work contained in the branch, and only tells us the name of the developer who probably created the branch.

Slightly Better: ‘feature/social’ It is concise and gives some insight into the fact that this branch is a new feature, some idea of what is going on in the branch, and what the scope of that branch is.

Even better: ‘feature/socialAuthentication’ or ‘feature/socialAuth’ Removes any ambiguity about the high-level branch functionality. Clear, but also concise and all at a glance (no need to read commit history).
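If you want a machine to nag people about the convention, a rough sketch might look like the following. The function name `is_valid_branch` and the exact patterns are my assumptions (camelCase feature names, `vMajor.Minor.Patch` for releases and hotfixes), not something SourceTree enforces.

```python
import re

# 'master' and 'develop' are the two permanent branches in the GitFlow convention.
PERMANENT = {"master", "develop"}
# Assumed style: feature/ followed by a camelCase name, e.g. feature/socialAuth.
FEATURE = re.compile(r"feature/[a-z][A-Za-z0-9]*")
# Assumed style: release/ or hotfix/ followed by v<major>.<minor>.<patch>.
VERSIONED = re.compile(r"(release|hotfix)/v\d+\.\d+\.\d+")


def is_valid_branch(name: str) -> bool:
    """Return True when the branch name fits the naming convention."""
    if name in PERMANENT:
        return True
    return bool(FEATURE.fullmatch(name) or VERSIONED.fullmatch(name))


# The good and bad examples from this section:
good = is_valid_branch("feature/socialAuth")
bad = is_valid_branch("dev_suresh_test_2")
```

A pattern can only check the shape of a name. Whether ‘feature/social’ is illustrative enough is still a call for a human reviewer.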

What’s in a branch?

I like to consider GitFlow a form of a feature-driven development branching model. What this means is that each branch should be constrained to the scope of 1 feature (and only 1 feature).

The benefit is that any unstable or unused feature may easily be backed out by reverting the commit which merged ‘feature’ into ‘develop’. It also allows many developers to work on different features without impacting each other’s development, except slightly during merges.
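For the command-line curious, this is roughly what finishing and backing out a feature comes down to. It is a sketch that only builds the git command lines rather than running them; the function names and the commit hash are hypothetical, and SourceTree’s GitFlow button may do more than this.

```python
def finish_feature_commands(feature: str) -> list[list[str]]:
    """Commands that merge feature/<feature> into 'develop' as a single merge commit."""
    branch = f"feature/{feature}"
    return [
        ["git", "checkout", "develop"],
        # --no-ff forces a merge commit even when a fast-forward is possible,
        # so the whole feature stays visible as one unit in history.
        ["git", "merge", "--no-ff", branch],
        ["git", "branch", "-d", branch],
    ]


def back_out_feature_commands(merge_commit: str) -> list[list[str]]:
    """Commands that undo a merged feature by reverting its merge commit."""
    return [
        ["git", "checkout", "develop"],
        # -m 1 reverts relative to the first parent, i.e. the 'develop' side of the merge.
        ["git", "revert", "-m", "1", merge_commit],
    ]


# 'abc1234' is a placeholder hash, not a real commit.
finish = finish_feature_commands("socialAuth")
back_out = back_out_feature_commands("abc1234")
```

Each inner list can be handed to `subprocess.run(cmd, check=True)` if you really want to run it. The point is that one merge commit per feature means one revert per feature.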

There are no specific rules for creating feature branches, and how large they should be, but a few safe rules of thumb in my opinion:

  • A feature branch usually shouldn’t span more than 1-2 sprints, otherwise the feature is too big and should be split up
  • A feature for each JIRA (or other tracker) story will probably work nicely
  • A feature for a JIRA (or other tracker) epic could also work, if there is no good sub-feature model

The key point is that having long-running, all-inclusive features is bad. For example, the dev_someguy_test_2 branch I saw lasted for a few months, had all changes and features incorporated there, and was the source of about 20 released builds. How do I know when a build was produced? Well, I had to go through each commit log and diff until I saw a build number change… Yeah…

Never have I ever

Here are a few ‘Never’ and ‘Always’ points… Usually, I’m not so dogmatic - however, I like to keep these firm rules in place for the first few months of a developer learning source control. Afterwards, the adverbs change to ‘Rarely’ and ‘Usually’.

    • NEVER rebase a feature branch onto develop (always merge)

      • A rebase cannot be (easily) reverted or uncommitted
      • A rebase rewrites history, and could thus introduce errors into Git history, for previously stable branches
    • NEVER do cross-branch merges (e.g. merging from feature/f1 to feature/f2)

      • This makes it almost impossible to revert features, as now they are tightly coupled
    • ALWAYS create and merge features from/to ‘develop’ only (again, use SourceTree’s GitFlow)

    • ALWAYS create a release branch from ‘develop’ before an official (or unofficial) release

    • ALWAYS merge release branches to ‘master’ and ‘develop’

    • NEVER push a broken build (e.g. doesn’t compile or doesn’t run)

      • Before a push, everything needs to build and all unit tests should still pass
      • Otherwise, other devs will inherit broken code, and this reduces efficiency
    • NEVER push a merge conflict

      • These should all be handled locally
    • ALWAYS provide commit messages that others can understand - even better, mention an issue number in the commit, as often a bug tracker and source control repo can auto-link

      • Bad: “Bug fixes” ← Which bugs? What was fixed? What was not fixed?
      • Good: “Resolved SVR-32 Updated login API so that Sync Gateway cookie not returned unless user’s email is confirmed” ← Not much ambiguity
      • Easier to code review, and find later
    • ALWAYS err on the side of many smaller commits vs a single monolithic commit

      • Maybe 1-2 small bugs per commit, for example
      • Easier to track introduced bugs this way
      • Easier to code review
    • ALWAYS make sure that ‘master’ is completely and 100% stable and fully tested

      • ‘master’ is what is in production, and cannot be buggy or in constant flux
      • ‘master’ is where SDK or lib versions will be pulled from; commits to develop or feature branches will never be pulled
    • ALWAYS increment version and build numbers for any commits that leave the developer’s environment

      • A mild exception might be going into an unstable stream, where the consumer knows everything is in flux
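A couple of these rules are mechanical enough to check in code. The sketch below is one possible encoding, not an existing tool: the function names are mine, the hotfix targets are taken from the GitFlow model itself (hotfixes go back into both ‘master’ and ‘develop’), and the issue-key pattern assumes JIRA-style keys like SVR-32.

```python
import re

# Assumed JIRA-style issue key: uppercase project prefix, hyphen, number (e.g. SVR-32).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def is_allowed_merge(source: str, target: str) -> bool:
    """Check a merge against the 'Never'/'Always' rules above."""
    if source.startswith("feature/"):
        # Features only ever go back to 'develop', never into another feature.
        return target == "develop"
    if source.startswith(("release/", "hotfix/")):
        return target in {"master", "develop"}
    if source == "develop":
        # Pulling the latest 'develop' into a feature branch (merge, not rebase).
        return target.startswith("feature/")
    return False


def mentions_issue(commit_message: str) -> bool:
    """Return True when the commit message references an issue key."""
    return ISSUE_KEY.search(commit_message) is not None


cross_branch = is_allowed_merge("feature/f1", "feature/f2")
good_message = mentions_issue("Resolved SVR-32 Updated login API")
```

Checks like these could live in a pre-push hook or a CI step during those first few dogmatic months, and be loosened once the adverbs change.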

Semantics and such

Wikipedia has a good explanation of software versioning. However, I find that section 5 (political and cultural significance) tends to shape how a lot of companies handle versioning, for better or worse. Alternatively, a lot of other companies just keep incrementing numbers, and every time they hit 10, they roll over (e.g. 2.5.9 -> 2.6.0).

Build numbers

There is a distinction between versions (e.g. v1.2.3) and build numbers (e.g. build 3029), which is that build numbers are usually monotonically increasing integers. Other times, they might be the date of the build (like a manufacturing date on something you buy). Build numbers are often for the purpose of app stores (in Android and iOS land anyways) to ensure that they are always using the latest builds when they automatically update app users. A simple buildNumber > previousBuildNumber, if yes, then update kind of thing.

Build numbers can also provide a level of granularity that is good for developers to have, but you don’t want to always make public. Something like, we’re releasing build 254 of v1.2.3 (or… v1.2.3.254, but that looks ugly, so they call it v1.2.3 and track internally that it was build 254).
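As a small sketch of the two ideas above (the update check and the internally tracked build), assuming the ugly `v1.2.3.254` form with the build number as the last dotted component. The function names are hypothetical; real app stores have their own fields and rules for this.

```python
def split_build(full_version: str) -> tuple[str, int]:
    """Split 'v1.2.3.254' into the public version 'v1.2.3' and build number 254."""
    public, _, build = full_version.rpartition(".")
    return public, int(build)


def should_update(build_number: int, previous_build_number: int) -> bool:
    """The 'buildNumber > previousBuildNumber' check described above."""
    return build_number > previous_build_number


public_version, build = split_build("v1.2.3.254")
update = should_update(build, 253)
```

The public sees `public_version`, while the team keeps `build` for tracking down exactly which binary a bug report came from.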

Version numbers

Version numbering is a bit more fluid, because there is a real political side to it… “Our version 10 is sooo much better than Initech’s archaic version 9!” As a developer with handfuls of OCD, this lack of logical structure doesn’t jibe well with me, but whatever.

There are so many versioning patterns to use (I’ve read through about 20 in the past), and the point at which you increment Major/Minor changes between different philosophies, but a common theme is:

  • Major.Minor.Build
  • Major.Minor.Hotfix (or Major.Minor.Patch)

I didn’t realize it, but when I’ve versioned my code, I’ve ended up using a less-structured form of Semantic Versioning. Semver is used extensively in Node/NPM, and I really like it… And for those of you who find the language on that page to be too dry, there is a slightly spiced up explanation at the beginning of this article.

Semver just formalizes the versioning concept around public API functionality. It basically comes down to:

  • Are you releasing code?
  • Are you in production?
  • Have you released new functionality?
  • Have you broken compatibility?

I recommend that my teams use Semver (or something like it) for any/all APIs, SDKs, and libraries that any other human or machine will need to consume.
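The last two questions in that list map directly onto which number gets bumped. Here is a minimal sketch, assuming plain `Major.Minor.Patch` strings with no pre-release or build suffixes; the `bump` function and its change labels are my own naming, not part of the Semver spec.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'breaking', 'feature', or 'fix' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Broken compatibility: bump Major, reset Minor and Patch.
        return f"{major + 1}.0.0"
    if change == "feature":
        # New backwards-compatible functionality: bump Minor, reset Patch.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backwards-compatible bug fix: bump Patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


next_fix = bump("2.5.9", "fix")
next_feature = bump("2.5.9", "feature")
next_breaking = bump("2.5.9", "breaking")
```

Note that a fix on 2.5.9 gives 2.5.10, not 2.6.0: each part is just an integer, so nothing rolls over at 10. Semver also treats 0.y.z as initial development where anything may change, which this sketch ignores.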

I’ve noticed, anecdotally, that App versioning is partially a business/marketing thing (e.g. “Guess what everyone, we’re releasing Version 3.0!!! Woot!! Life is so much better for everyone! Buy our product!”), not entirely a development thing - however, when building SDKs and libraries, we need solid versioning to see when we can update libs/SDKs, and when we can’t.

Agree a lot? Or agree less?

More than many of my other articles, this one is really opinion-based (as best practices usually are). So if I got anything wrong here, or if anyone has a better methodology I should consider - I would love to hear it! I’m a process geek, so I want to keep up with the Joneses… Or Jetsons… When they use source control… Hmm, that saying really fell apart on me.