The older a codebase gets, the more WTF moments the people working on it will run into. Documenting architecture decisions over time is therefore crucial to keep them from pulling out all their hair.
Published on Sun, February 06, 2022
I learned the hard way multiple times that there is no perfect codebase. No matter how good the early intentions, how experienced the developers and architects, and how well laid out the plan to structure and grow the code – reality will constantly thwart that plan.
Code does not exist in a vacuum. It will always be heavily influenced by all kinds of stakeholders and environmental quirks, be it the sales department selling features that are not even specified yet, strategic changes like business model pivots, or simply people with different backgrounds and levels of experience working on the same software.
So when looking at an existing codebase, one might often wonder what the actual fuck led to its current state. Context matters. And that context, more often than not, cannot be “reverse engineered” by looking at the code. If you are lucky, some information is buried somewhere in a ticket system or some requirement spec. But even then, the actual implementation might already have diverged from it by the time the feature first shipped.
What I found to be a pretty effective tool to mitigate this problem are architecture decision records (ADRs for short). In their simplest form, they are plain text files containing information that helps your future self or anyone else understand what influenced a specific part of your codebase. An ADR does not discuss the how in detail but instead focuses on the why. In doing so, it provides the critical piece of the puzzle that is missing: context.
I found them very helpful even for (longer-running) projects I worked on alone, because after six months I often barely remember having implemented a feature at all. Being able to look at an ADR is quite helpful then.
So, what does an ADR look like? There are plenty of resources, and sophisticated tooling exists for larger teams, e.g., to build reports. But as so often, you won’t need most (or any) of it to get started. I am still happy with an adr folder located at the root of an application’s Git repository. In that folder, every decision is recorded in its own file.
That file contains some metadata, a description of the context and the problem at hand, the considered options, and the decision outcome. Let me conclude with a real-world example:
# Blobs for deleted Clients
- Status: accepted
- Deciders: A, B, C
- Date: YYYY-MM-DD
- Technical Story: XY-123
## Context and Problem Statement
When a client gets deleted, all blobs should be deleted as well. But as this must happen asynchronously because of a potentially large amount of blobs + error-prone IO processes, we need to remove the client from the database first. However, a blob record references the client it was created for.
## Considered Options
1. Delete blob records along with the client deletion action
2. Set client_id to null for all blob records of a deleted client
## Decision Outcome
Chosen option: 2.
It's not perfect as it doesn't clearly communicate what is happening here, but it's the simplest and fastest solution right now: We will set client_id for blobs to null when the corresponding client is deleted. In addition we will have scheduled clean-ups for all orphaned blobs.
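For the curious, here is roughly what option 2 could look like in code. This is only a sketch under assumptions: the record above does not state the stack, so SQLite, the table names clients and blobs, the path column, and the delete_file callback are all hypothetical stand-ins.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: the record only tells us that blobs reference a client.
    conn.executescript("""
        CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE blobs (
            id INTEGER PRIMARY KEY,
            client_id INTEGER REFERENCES clients(id),  -- nullable on purpose
            path TEXT NOT NULL
        );
    """)


def delete_client(conn, client_id):
    """Orphan the client's blobs and delete the client in one transaction."""
    with conn:
        conn.execute(
            "UPDATE blobs SET client_id = NULL WHERE client_id = ?", (client_id,)
        )
        conn.execute("DELETE FROM clients WHERE id = ?", (client_id,))


def clean_up_orphaned_blobs(conn, delete_file):
    """Scheduled job: delete the stored data, then the record, per orphaned blob.

    delete_file is a hypothetical callback that performs the actual IO.
    Returns the number of blob records removed.
    """
    rows = conn.execute(
        "SELECT id, path FROM blobs WHERE client_id IS NULL"
    ).fetchall()
    removed = 0
    for blob_id, path in rows:
        try:
            delete_file(path)
        except OSError:
            continue  # keep the record so the next scheduled run retries it
        with conn:
            conn.execute("DELETE FROM blobs WHERE id = ?", (blob_id,))
        removed += 1
    return removed
```

Note how little of the why survives in this code: nothing in it explains that the nullable client_id exists because blob deletion has to happen asynchronously. That is exactly the gap the record fills.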
Apart from keeping the record itself simple, keeping it as close as possible to the code it refers to is crucial, too. Storing it in Git makes it part of the codebase and lets you find it quickly right from the code, e.g., through a PR or simply git blame.
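Because the records are just text files in the repository, creating a new one is easy to script. The following is a minimal sketch, not established tooling: the sequential numbering, the .md extension, the file naming, and the initial "proposed" status are my own assumptions, while the headings mirror the example record above.

```python
import re
from datetime import date
from pathlib import Path

# Skeleton with the same sections as the example record.
TEMPLATE = """\
# {title}
- Status: proposed
- Deciders:
- Date: {date}
- Technical Story:
## Context and Problem Statement
## Considered Options
## Decision Outcome
"""


def slugify(title):
    """Turn a title into a lowercase, dash-separated file name part."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def new_adr(title, repo_root=".", today=None):
    """Create an empty record in <repo_root>/adr and return its path."""
    adr_dir = Path(repo_root) / "adr"
    adr_dir.mkdir(parents=True, exist_ok=True)
    # Number records sequentially so they sort in the order they were written.
    number = len(list(adr_dir.glob("*.md"))) + 1
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    day = (today or date.today()).isoformat()
    path.write_text(TEMPLATE.format(title=title, date=day), encoding="utf-8")
    return path
```

Calling new_adr("Blobs for deleted Clients") in a repository without any records yet would create adr/0001-blobs-for-deleted-clients.md, ready to be filled in and committed together with the change it explains.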
Check out the ADR GitHub organization’s documents if you want to learn more.