Software Development Processes and Digital Paperwork

I've briefly tackled this topic before, but I think it's worth revisiting and expanding upon.

Each client and development team I work with has their own processes and conventions for how they do things. Some of these are good, some are bad, and some are just plain ugly. Honestly, I don't really care - so long as they're internally consistent, functional, and so long as they work for the team, I'm happy.

However, for clients who don't really have a semi-formalized process, I try to put some light guidelines in place. That's led to me writing up similar documentation for multiple clients over many years, and I thought I'd finally put it all in one place as an occasionally updated reference.

I don't claim that this is complete or perfect, but it's a good starting point for anyone who's looking for some basic guidelines for their team's development process.

Note: This page will be periodically updated. Check the changelog at the bottom for any significant updates.

Source Control

Why?

Source control is a tool that allows you to track changes to your code over time. It's a great way to keep track of what you've done, and to roll back changes if you need to. It's also a great way to collaborate with other developers, and to share code with other teams.

To quote this excellent summary from StackOverflow:

Have you ever:

  • Made a change to code, realised it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Had to maintain multiple versions of a product?
  • Wanted to see the difference between two (or more) versions of your code?
  • Wanted to prove that a particular change broke or fixed a piece of code?
  • Wanted to review the history of some code?
  • Wanted to submit a change to someone else's code?
  • Wanted to share your code, or let other people work on your code?
  • Wanted to see how much work is being done, and where, when and by whom?
  • Wanted to experiment with a new feature without interfering with working code?

In these cases, and no doubt others, a version control system should make your life easier.

If there are any lingering doubts about the value of source control, I'd recommend reading this article from Atlassian: Why Use Git for Your Organization. It lightly touches on why non-developers should care about Git/Source Control as well.

Repo Naming

Naming is hard. Naming things is even harder. Naming things in a way that is consistent, descriptive, and easy to remember is really hard. Naming things in a way that is consistent, descriptive, easy to remember, and doesn't conflict with other things is really, really hard.

Repos are no exception.

Here are some guidelines and considerations for naming repos, citing some advice from this thread:

  • Use lowercase words
  • Use hyphens to separate words
  • Prefer specific over generic naming
  • Be (reasonably) consistent with other repos
  • Repo names should be (reasonably) self-explanatory

My preference is to use case-by-case variants of the following format:

{optional client}-{project/purpose}-{type}-{optional platform/specifier}
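
As a minimal sketch of the format above, here is a small Python helper that assembles a repo name from its parts. The function and parameter names (`repo_name`, `type_`, `specifier`) are my own illustration, not part of any tool. Since the format is applied case by case, this sketch assumes only the project/purpose is required and every other part can be left out.

```python
import re


def _slug(part: str) -> str:
    """Lowercase a part and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")


def repo_name(project: str, type_: str = "", client: str = "", specifier: str = "") -> str:
    """Build {optional client}-{project/purpose}-{type}-{optional specifier}.

    Hypothetical helper: only `project` is required here, which is an
    assumption made to allow case-by-case variants such as "grpc-python".
    """
    parts = [_slug(p) for p in (client, project, type_, specifier)]
    if not parts[1]:
        raise ValueError("project/purpose is required")
    # Skip any empty (omitted) parts so we never emit doubled hyphens.
    return "-".join(p for p in parts if p)
```

For example, `repo_name("awesomesauce", "app", client="myclient", specifier="ios")` gives `myclient-awesomesauce-app-ios`, and `repo_name("grpc", specifier="python")` gives `grpc-python`.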

While I don't exactly follow the advice in this GitHub Semantic Naming article from Modus on what their team does, I think it's worth a read for their rationale on choosing naming conventions.

I've worked on over 300 repos, so consistent naming is important to me. I even use this format when locally cloning and renaming repos for client or open-source projects.

Aside: OSS

I'd be remiss if I didn't mention that these suggestions don't really apply for public libraries or open source project repos. Those projects involve some amount of branding in lieu of highly-specific names, and there typically aren't LOTS of those repos per team. What I've noticed instead is that these types of projects tend to create a monorepo, and then follow a naming convention for each package or plugin.

For example, from the moshi repo:

  • moshi
  • moshi-adapters
  • moshi-kotlin
  • moshi-kotlin-codegen

Client

This applies more to contractors, consultants, and agencies - but this should be self-explanatory. If you're working for a single client (or all your projects are internal), you can probably skip this to reduce typing.

This prefix can also depend on your source control host. For example, using GitLab (or Bitbucket), you can use groups to organize repos by client. In that case, you can probably skip this prefix. An exception would be if the client's code is public-facing, where the extra branding is useful to third parties.

Example client prefixes:

  • apple
  • google
  • microsoft
  • msft
  • pantsbuild
  • robotpajamas
  • rpj
  • vicara

Project/Purpose

The "thing" this repo represents. This is the most important part of the name, and should be the most descriptive. It should be unique within the client, and should be the same across all repos for the same project/purpose.

Preferably a single word, but can be multiple hyphen-separated words. In practice, this part of the name should be very "obvious" as it will likely line up with what you call the project conversationally, in Slack, or in JIRA/Basecamp/etc.

Conveniently, these names are probably going to line up with the name of the project, or the name of a user-facing application as determined by the client.

Example project/purpose names:

  • awesome-product
  • blueteeth
  • grpc
  • hackernews
  • harvest
  • swiftyteeth

If these repos are microservices, then the purpose should be the name of the service. This should still be obvious, but maybe slightly less so than the above examples. The obviousness of the name is related to how constrained the microservice itself is.

Example microservice names:

  • authentication
  • big-ball-of-mud - Hopefully you don't have this as a microservice
  • image-processing
  • payment
  • search

Type

The type of software this repo represents. There is no hard and fast rule for this, but it's useful to have a consistent set of types to choose from and to filter across.

Here are some common types I've come across and used (including common synonyms):

  • api - A REST API, GraphQL API, gRPC API, etc (similar to service, but more likely to be external facing)
  • app - A mobile application (sometimes a web application)
  • bsp - Board support package - a collection of drivers and other software for a specific hardware platform
  • demo - A demo application or feature, intended to be used as a showcase rather than a production-centric piece of software - could be used interchangeably with prototype or example
  • docs - Standalone documentation that, for some reason, isn't kept alongside the code or in a wiki
  • fw - Software that is written for embedded systems (possibly bare metal, possibly an RTOS, possibly embedded Linux)
  • lib - A shared library containing some common code, to be consumed internally or externally - typically containing a single area of functionality (e.g. image processing library)
  • sdk - A software development kit - one or more libraries (plus supplementary material) to aid in developing code for another piece of software. While not strictly necessary, SDKs tend to be consumed by external developers, and are often language/platform specific.
  • service - A backend web service (REST API, GraphQL, gRPC, etc), unless the word "service" has a specific meaning in the context of the project
  • web - A website or, sometimes, web application. I've also seen app for web applications, and web for marketing websites, blogs, etc...
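
To illustrate what "filtering across" a consistent set of types could look like, here is a rough Python sketch. The `KNOWN_TYPES` set is simply copied from the list above, and `repos_of_type` is a hypothetical helper name. It assumes the type appears as a whole hyphen-separated segment of the repo name.

```python
from typing import Iterable, List

# Assumed type vocabulary, copied from the list above.
KNOWN_TYPES = {"api", "app", "bsp", "demo", "docs", "fw", "lib", "sdk", "service", "web"}


def repos_of_type(names: Iterable[str], type_: str) -> List[str]:
    """Return the repo names that contain `type_` as a whole hyphen-separated segment."""
    if type_ not in KNOWN_TYPES:
        raise ValueError(f"unknown type: {type_}")
    # Splitting on hyphens avoids false matches such as "webapp" for "app".
    return [name for name in names if type_ in name.split("-")]
```

This is deliberately naive: a project that happens to be called "app" or "web" would also match, so treat it as a convenience filter rather than a parser.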

Platform/Specialization

For cases where the rest of the repo name may be the same, but there is some “unique” aspect to this repo. For example, ios or android may be specifiers, or v1, v2 when there is some reason versions can’t be kept in a single repo.

Essentially, if the software is fundamentally the same, but there is an operating system, language, or other platform-specific aspect, this should be the specifier.

Programming language is usually not a great specifier, because it's rarely relevant to the repo's overarching purpose. It only matters in specific cases, e.g. when a repo is a fork of a library written in a different language, or an implementation of a standard in a specific language (grpc-python, grpc-node, grpc-java).

The exception to this rule tends to be SDKs or imported libs, where the language (or platform) is the primary purpose of the repo. In this case, the language should be the specifier.

One specifier I use often is when I'm cloning the same repo multiple times for different features that I want to work on in parallel. In this case, I'll use the feature name as the specifier. An example of this is with Pants, which has a decent build-time cost when I change branches - so I just maintain one local repo per feature.

Example specifiers:

  • android
  • ios
  • linux
  • js
  • python
  • nrf52840
  • stm32l4

Examples

Unfortunately, I can't display the repos I actually work on - so I've anonymized them. I've also tried to keep the examples as generic as possible, but I've included some real-world examples for context.

  • grpc-python - Using language because it's a language-specific implementation of a standard
  • pantsbuild-pants-pyright - Using feature name because I'm working on multiple features in parallel
  • myclient-awesomesauce-sdk-kotlin - Using language because it's a language-specific SDK implementation
  • myclient-awesomesauce-app-ios - Using platform because we have multiple apps
  • myclient-theirclient-publiclibrary-bsp-nrf52 - I have a client, they have a client, we're working on an embedded BSP for a specific platform that is external-facing. This is my local naming, but the repo is actually publicly available as theirclient-publiclibrary-bsp-nrf52.

Branch Management

Once you've got a named repo, you'll need to start working in it. My strong recommendation is to never commit directly to main/master, and instead to work in lots of short-lived branches that are PR'd back into main.

Branches should be created from the latest version of main, and should be squash-merged back into main/master (via a pull request) when complete.

There are no hard and fast rules about how many branches should be created, but my general rule of thumb is that a branch should be created for each issue, and should be deleted once the issue is complete. In order to avoid stale branches, the hope is that branches will be short-lived (a few days at most, ideally less than a day), and will be merged back into main as soon as possible. Any longer, and the branch risks becoming stale and incurring merge conflicts.

If the branch is stale, it should either pull from main and handle merge conflicts, or be deleted and recreated from main (with stashed changes re-applied).
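
The branch lifecycle described above can be sketched as an ordered list of git commands. The Python helper below only builds the command lists and does not run anything. The function name, the `origin` remote, and the `main` branch name are assumptions, and the squash-merge itself is assumed to happen through a pull request on your source control host.

```python
from typing import List


def branch_lifecycle(branch: str, main: str = "main", remote: str = "origin") -> List[List[str]]:
    """Return the git commands for one short-lived branch, in order.

    Hypothetical sketch: nothing is executed, and the squash-merge is assumed
    to happen via a pull request between the push and the cleanup steps.
    """
    return [
        # Start from the latest version of main.
        ["git", "fetch", remote],
        ["git", "switch", "-c", branch, f"{remote}/{main}"],
        # ...commit work here, then publish the branch and open a PR...
        ["git", "push", "-u", remote, branch],
        # After the PR is squash-merged, update main and clean up locally.
        ["git", "switch", main],
        ["git", "pull", "--ff-only", remote, main],
        # -D because a squash-merged branch may not register as merged.
        ["git", "branch", "-D", branch],
    ]


if __name__ == "__main__":
    for cmd in branch_lifecycle("123-google-auth"):
        print(" ".join(cmd))
```

For a branch that has gone stale, running `git fetch origin` followed by `git merge origin/main` on the branch is one way to pull in main and surface the merge conflicts early.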

Bugs

The reliability of this process depends on the quality of the CI/CD pipeline, and the quality of the tests. If the CI/CD pipeline is flaky, or the tests are flaky, then the branch management process will be unreliable - as you'll constantly be merging bugs into main.

If you've accidentally merged a bug into main, depending on the size/scope of the bug, you can either revert the commit - or create a new branch, fix the bug, and follow the process as normal.

This is a light variant of the GitHub Flow branching model.

Aside: GitFlow

Previously I recommended using GitFlow/HGFlow. I still like GitFlow, and I still use it for some projects, but I've found that it's not always the best fit for every project. I find that it depends on the size of the project, the number of people working on it, the release cadence, and the associated QA/CI processes.

It also matters if you need to support multiple versions of software/firmware in the field, where hotfix branches are used to fix bugs in older versions. This is a common use case for embedded software, where you may have a device in the field that is running a specific version of the software, and you need to fix a bug in exactly that version. This is also a common use case with large, non-SaaS, licensed enterprise software - where a customer may (by contract) run only specific versions of the software.

If you can simply always apply features and bug fixes to the latest version of software and release that, "forcing" everyone to update (i.e. backend services, sometimes mobile apps, etc), then you can get away with a simpler branching model. I'd recommend this when possible, with GitFlow being the fallback.

Branch Naming

Branches should usually be named after the issue they are working on, and should be prefixed with the issue number. If there is no issue, then the branch should be prefixed with the initials of the person working on it.

Here are some guidelines for branch naming:

  • Use lowercase words
  • Use hyphens to separate words
  • Prefix with issue identifier, then concise (2-3 words) title

Examples:

  • Bad: “dev_suresh_test_2”
  • Meh: “social”
  • Less Meh: “sj-social-wip”
  • Good: “123-google-auth”
  • Good: “124-google-login-fix”
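
As a rough sketch of these guidelines, the Python helper below builds a branch name from an issue number (or initials, when there is no issue) and a short title. The name `branch_name` and its parameters are hypothetical, and truncating the title to three words is my reading of the "concise (2-3 words)" guideline.

```python
import re
from typing import Optional


def branch_name(title: str, issue: Optional[int] = None, initials: str = "", max_words: int = 3) -> str:
    """Prefix with the issue number, or with initials when there is no issue."""
    # Lowercase words only, keeping at most `max_words` of them.
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    prefix = str(issue) if issue is not None else initials.lower()
    if not prefix or not words:
        raise ValueError("need a title plus an issue number or initials")
    return "-".join([prefix, *words])
```

For example, `branch_name("Google auth", issue=123)` gives `123-google-auth`, and `branch_name("social WIP", initials="SJ")` gives `sj-social-wip`.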

This naming convention doesn't change the fact that each of these branches should be short-lived, and should be deleted once the issue is complete. The name "sj-social-wip" doesn't mean I can leave it around for a month. It's just a way to make it easier to identify the branch, which (for some reason) doesn't have an associated ticket.

Git Commit Messages

Chris Beams has a great article on how to write good commit messages.

Personally, I don't write all of my commit messages this way - it's just too tedious. However, I do think those seven rules are a great recommendation for writing squash-merged PRs. The goal is that a developer's intermediate commits are squashed into a single commit when the PR is merged (and are thus irrelevant).

Enumerating the seven rules here for reference:

  1. Separate subject from body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalize the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how (either in paragraph form, or bullet points)
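
The mechanical rules (1-4 and 6) are easy to check automatically. Here is a minimal sketch of such a check in Python. The function name `lint_commit_message` is hypothetical, and rules 5 and 7 are left out because they need human judgement.

```python
from typing import List


def lint_commit_message(message: str) -> List[str]:
    """Return the violated rules (mechanical checks only: rules 1-4 and 6)."""
    lines = message.rstrip("\n").split("\n")
    subject, rest = lines[0], lines[1:]
    problems = []
    # Rule 1: if there is a body, the line after the subject must be blank.
    if rest and rest[0].strip():
        problems.append("1: separate subject from body with a blank line")
    if len(subject) > 50:
        problems.append("2: subject is longer than 50 characters")
    # Only flag subjects starting with a lowercase letter, so that a prefix
    # such as "[internal]" is not reported.
    if subject[:1].isalpha() and not subject[0].isupper():
        problems.append("3: subject is not capitalized")
    if subject.endswith("."):
        problems.append("4: subject ends with a period")
    if any(len(line) > 72 for line in rest):
        problems.append("6: body line is longer than 72 characters")
    return problems
```

An empty list means none of the mechanical rules were broken. In keeping with the spirit of the rules, I'd treat the output as a nudge rather than a gate.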

The more complicated the PR, the more important the guidelines above become. The goal is to make it as easy as possible for someone to understand what the PR is doing, and why it is being done without digging into the code - both during the PR review process and at some point in the far future.

Note: It's more important to keep these guidelines in mind, rather than strictly adhering to them. If the PR is simple, and the commit message is clear, then it's not necessary to follow the guidelines to the letter.

Here is an example commit in the Pants project that doesn't strictly follow the guidelines, but conveys the intent of the commit. Also, the code changes are relatively clear to anyone working in this project (without going overboard).

I've also appended the PR number to the commit message, which makes it a bit easier to find the PR that introduced the change.

[internal] Subsystems, Options, and a Process for the cc backend (#17844)

- Created a set of subsystems and options for a system-located gcc or clang
- Added ability to download a self-contained toolchain (e.g. the ARM gcc toolchain).
- Using `cxx` naming to be consistent with Make, CMake, etc
- Skipped tests if there is no clang or gcc present on other dev's computers

Changelog

  • 2023-01-04: First commit.