In evolutionary architecture, data models get overloaded – inevitably. Some new requirement comes in and we introduce some property that changes how the data is interpreted. Everybody using the data needs to agree on the implications. In software quality terms, this is a connascence of meaning: downstream usage of the data depends on understanding this property’s meaning. More connascence means more complexity; more complexity, more problems.
The data evolves with a boolean flag here, a type enum there. Next thing we know, as Sandi Metz writes in The Wrong Abstraction,
the code no longer represents a single, common abstraction, but has instead become a condition-laden procedure which interleaves a number of vaguely associated ideas.
The software becomes “hard to understand and easy to break”, making engineering costs “brutal”.
So what’s the “right” way to evolve software architecture? Well… it depends. Let’s say we’re about to overload the data. Who needs to know about this newly overloaded data? The answer suggests seams in the abstraction. If these entire modules over here only care about one usage, and those modules over there only care about the other usage, something about those modules is different. These differences in turn guide naming, software/data design, and software packaging.
Software packaging guides how code over here does or doesn’t access code over there. Scalable software is easy to understand and hard to break.
Let’s see how this might play out with a startup transaction system. A transaction could be anything really; a financial transaction, a payroll, a purchase order…
The “v-zero product” creates transactions – ahem – transactionally: when we make a transaction, it either succeeds or fails right there, aka synchronously. The saved database record represents the real-world result of the transaction. I’ll say txn for short.
```
class Transaction
    integer id
    decimal amount

function make_a_transaction(amount)
    txn = database.create_transaction()
    txn.amount = amount
    database.save(txn)
```
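For readers who want to poke at the v-zero model, here is a minimal Python sketch of it. InMemoryDatabase is a hypothetical stand-in for the article's `database` object, and returning the txn from make_a_transaction is an addition for convenience; neither comes from a real library.

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import count


@dataclass
class Transaction:
    id: int
    amount: Decimal = Decimal("0")


class InMemoryDatabase:
    """Hypothetical stand-in for the article's `database` object."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.rows: dict[int, Transaction] = {}

    def create_transaction(self) -> Transaction:
        return Transaction(id=next(self._ids))

    def save(self, txn: Transaction) -> None:
        self.rows[txn.id] = txn


database = InMemoryDatabase()


def make_a_transaction(amount: Decimal) -> Transaction:
    # v-zero is synchronous: once saved, the record *is* the real-world result.
    txn = database.create_transaction()
    txn.amount = amount
    database.save(txn)
    return txn
```

At this stage every saved row is real, so no reader needs to ask any further question about it.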
When some caller has a transaction, printing its amount is straightforward:
```
function print_amount(txn)
    print(txn.amount)
```
So what does print_amount need to know?
- concept mapping: a transaction’s amount is read off the amount field.
- data mapping: the amount field came from somewhere (like the database).
In some schools of thought, instance variables would never be accessed directly; instead all accesses go through some abstraction (like methods).
```
function print_amount(txn)
    print(txn.get_amount())
```
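To make the separation concrete, here is a small Python sketch in which the stored representation differs from what callers see. Storing the amount as integer cents is purely an assumption chosen to show that storage can change behind get_amount; format_amount is a hypothetical helper added so the output is easy to check.

```python
from decimal import Decimal


class Transaction:
    """Sketch: the amount is stored as integer cents (an illustrative
    assumption), but callers only ever go through get_amount()."""

    def __init__(self, id: int, amount_cents: int) -> None:
        self.id = id
        self._amount_cents = amount_cents  # storage detail, private by convention

    def get_amount(self) -> Decimal:
        # The storage-to-concept mapping lives here and nowhere else.
        return Decimal(self._amount_cents) / 100


def format_amount(txn: Transaction) -> str:
    # Depends only on get_amount(), never on _amount_cents.
    return f"{txn.get_amount():.2f}"


def print_amount(txn: Transaction) -> None:
    print(format_amount(txn))
```

If the amount later moves to another column, table, or service, only get_amount has to change; print_amount and format_amount stay as they are.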
Why is this useful? get_amount separates print_amount from how amount is actually obtained. For example, it enables zero-downtime database migration. But that’s another story…
Some frameworks encourage you to write these getters explicitly via a record or entity object. Other frameworks like ActiveRecord set up the abstraction implicitly using Rails magic. ✨
So how much does print_amount actually need to know about the database? It depends on how implicit your mapping into the data is. A friendly database object (like an ActiveRecord) might read from the database when you didn’t mean to… like in a loop…
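Here is a toy Python illustration of that “read in a loop” surprise. CountingDatabase and LazyTransaction are hypothetical; this is not how ActiveRecord is implemented, it only mimics the symptom: something that looks like a plain field access quietly triggers a query on every call.

```python
from decimal import Decimal


class CountingDatabase:
    """Hypothetical database stub that counts the reads it serves."""

    def __init__(self, amounts: dict[int, Decimal]) -> None:
        self._amounts = amounts
        self.reads = 0

    def read_amount(self, txn_id: int) -> Decimal:
        self.reads += 1
        return self._amounts[txn_id]


class LazyTransaction:
    """A 'friendly' record: `amount` looks like a field, but each access
    goes back to the database."""

    def __init__(self, db: CountingDatabase, txn_id: int) -> None:
        self._db = db
        self.id = txn_id

    @property
    def amount(self) -> Decimal:
        return self._db.read_amount(self.id)


def print_amount(txn: LazyTransaction) -> None:
    print(txn.amount)  # reads like a field access, costs a query
```

Printing three transactions in a loop costs three reads, even though print_amount never mentions the database.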
Alternatively, the transaction might be a passive data structure: the amount field is either there or it is not. But we still need to know something about the model; after all, we are doing something with the modeled data.
Soon enough we want asynchronous transaction processing. Maybe saving transactions takes a while and holds up whoever’s entering transactions. So we introduce the pending status: a transaction that’s about to be real … probably? Until processing succeeds, pending transactions are not yet real.
```
class Transaction
    integer id
    boolean is_pending
    decimal amount

function save_transaction(amount)
    txn = database.create_transaction()
    txn.amount = amount
    txn.is_pending = true
    start_processing(txn)
    database.save(txn)

// some asynchronous time later...
function processing_finished(txn)
    txn.is_pending = false
    database.save(txn)
```
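A runnable Python sketch of the pending lifecycle follows. The “asynchronous” part is simulated with a hypothetical in-process processing_queue drained by run_worker; a real system would use a background worker or message queue, and InMemoryDatabase is again a hypothetical stand-in.

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import count


@dataclass
class Transaction:
    id: int
    amount: Decimal = Decimal("0")
    is_pending: bool = False


class InMemoryDatabase:
    """Hypothetical stand-in for the article's `database`."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.rows: dict[int, Transaction] = {}

    def create_transaction(self) -> Transaction:
        return Transaction(id=next(self._ids))

    def save(self, txn: Transaction) -> None:
        self.rows[txn.id] = txn


database = InMemoryDatabase()
processing_queue: list[Transaction] = []  # hypothetical work queue


def start_processing(txn: Transaction) -> None:
    # A real system would hand this to a background worker.
    processing_queue.append(txn)


def save_transaction(amount: Decimal) -> Transaction:
    txn = database.create_transaction()
    txn.amount = amount
    txn.is_pending = True
    start_processing(txn)
    database.save(txn)
    return txn


def processing_finished(txn: Transaction) -> None:
    txn.is_pending = False
    database.save(txn)


def run_worker() -> None:
    # Simulates "some asynchronous time later": drain the queue in order.
    while processing_queue:
        processing_finished(processing_queue.pop(0))
```

Between save_transaction and run_worker, the row exists in the database but is not yet real, which is exactly the window every reader now has to reason about.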
All amount accesses must correctly determine which transactions are “real” from their perspective to maintain system behavior.
- the transaction processor cares very much about pending status.
- a system operator needs to see if a transaction is stuck in the pending status.
- but the reporting & bookkeeping modules only care about real aka non-pending transactions.
Of all the places we read amount, let’s work with print_amount. Our nifty little function has been reused in a few places, say:
```
// used in the operator’s panel
function print_user_txns(user_id)
    txns = fetch_txns_for_user(user_id)
    for txn in txns:
        print_amount(txn)

// used by accounting to keep the books
function print_daily_txns(date)
    txns = fetch_txns_on_date(date)
    for txn in txns:
        print_amount(txn)
```
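Left untouched, the accounting caller silently starts counting pending transactions. The Python sketch below shows that with hypothetical fixture data; the date field, TXNS, and the returned total are additions for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    id: int
    date: str  # ISO date string; a simplification for this sketch
    amount: Decimal
    is_pending: bool = False


# Hypothetical fixture data: txn 2 has not finished processing yet.
TXNS = [
    Transaction(1, "2024-05-01", Decimal("10.00")),
    Transaction(2, "2024-05-01", Decimal("99.00"), is_pending=True),
]


def fetch_txns_on_date(date: str) -> list[Transaction]:
    return [t for t in TXNS if t.date == date]


def print_amount(txn: Transaction) -> None:
    # Unchanged from v-zero: it has no idea pending exists.
    print(txn.amount)


def print_daily_txns(date: str) -> Decimal:
    # Accounting's report. Returning the total (not in the original)
    # lets us check what the books would add up to.
    total = Decimal("0")
    for txn in fetch_txns_on_date(date):
        print_amount(txn)
        total += txn.amount
    return total
```

The day's books come to 109.00, including 99.00 that may never become real.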
Here are two classic options for handling the pending condition:
1- keep the print_amount abstraction for printing amounts, and parameterize it
2- create another abstraction, aka API, for the behavior
Option 1: parameterize it!
Parameterizing the API means callers need to know about this data overloading.
```
function print_amount(txn, include_pending)
    if !txn.is_pending || include_pending:
        print(txn.amount)
```
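One way to keep the parameter honest in Python is to make include_pending keyword-only with no default, so every call site must spell out its answer. This is a design choice of the sketch, not the article's signature; amount_to_print is a hypothetical helper split out so the filtering is testable.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Transaction:
    id: int
    amount: Decimal
    is_pending: bool = False


def amount_to_print(txn: Transaction, *, include_pending: bool) -> Optional[Decimal]:
    """Return the amount to print, or None if this caller should skip it."""
    if not txn.is_pending or include_pending:
        return txn.amount
    return None


def print_amount(txn: Transaction, *, include_pending: bool) -> None:
    amount = amount_to_print(txn, include_pending=include_pending)
    if amount is not None:
        print(amount)
```

A call such as `print_amount(txn)` now fails with a TypeError instead of guessing, which makes the caller's dependency on the pending concept explicit.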
Unifying the abstractions with a parameter means each caller must know about both the abstraction and what its parameter value means.
It’s tempting to default to some value. This gives an appearance of continuity and reduces the code changes required. Since our system previously assumed any transaction was real, we’d probably default to include_pending=false. Importantly, this guarantees that any missed case won’t treat pending transactions as real!
The default is tempting because it apparently cuts the dependency between the pending concept and most usages. But it does this by just making the dependency implicit. ✨
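The Python sketch below shows where that implicit dependency bites. The operator panel, written before pending existed, never passes the flag, so it inherits the default and can no longer see a stuck transaction, even though the operator is precisely the user who needs it. amounts_to_print and operator_panel are hypothetical names for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    id: int
    amount: Decimal
    is_pending: bool = False


def amounts_to_print(txns: list[Transaction], include_pending: bool = False) -> list[Decimal]:
    # The default quietly encodes "pending transactions are not real".
    return [t.amount for t in txns if not t.is_pending or include_pending]


def operator_panel(txns: list[Transaction]) -> list[Decimal]:
    # Written before pending existed and never revisited:
    # it relies on the default without knowing it.
    return amounts_to_print(txns)
```

The default is correct for accounting and wrong for operations, and nothing at the operator's call site hints that a choice was made for it.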
Option 2: duplicate it!
Duplicating the “print the transaction” concept means creating at least two functions:
- one that just prints the amount
- one that only prints the amount for real transactions
Maybe we write something like:
```
function print_amount(txn)
    print(txn.amount)

function print_real_amount(txn)
    if !txn.is_pending:
        print(txn.amount)
```
Who knows what now?
The implementations are simpler than the parameterized version. Generally speaking, too many parameters and flags create confusion. Despite this simplification, the caller must still understand which function to call (if aware of both).
The naming choice shapes future engineers’ assumptions:
- print_amount vs print_real_amount suggests the amount field is the standard.
- print_raw_amount vs print_amount suggests real amounts are the standard.
Is it too easy to conclude that the standard value is the real one?
Remember the method approach, where we wrote txn.get_amount() instead of txn.amount? The same problem applies there, but to every usage of amount, not just print_amount.
Let’s say we made get_amount aware of pending transactions:
```
function Transaction.get_amount()
    if self.is_pending:
        return 0.0
    else:
        return self.amount
```
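Here is a Python sketch of that pending-aware getter and its two kinds of consumers. It returns Decimal("0") rather than 0.0 to match the decimal amount field; daily_total and stuck_amounts are hypothetical consumers. Bookkeeping gets what it wants for free, while the operator view asks for pending transactions and is handed zeros.

```python
from decimal import Decimal


class Transaction:
    def __init__(self, id: int, amount: Decimal, is_pending: bool = False) -> None:
        self.id = id
        self.amount = amount  # raw field, still reachable
        self.is_pending = is_pending

    def get_amount(self) -> Decimal:
        # Mirrors the pseudocode: pending transactions report zero.
        if self.is_pending:
            return Decimal("0")
        return self.amount


def daily_total(txns: list[Transaction]) -> Decimal:
    # Bookkeeping: correct, without knowing why.
    return sum((t.get_amount() for t in txns), Decimal("0"))


def stuck_amounts(txns: list[Transaction]) -> list[Decimal]:
    # Operator view: selects pending txns, but get_amount() hides their value.
    return [t.get_amount() for t in txns if t.is_pending]
```

Same getter, two meanings: “get_amount” now really means “get the real amount”, and only its name can tell a future caller that.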
This is a vital dependency on is_pending: as with duplication, naming matters!
Our ability to write high-quality software at speed and scale is directly impacted by how much we need to know. Do we still write assembly? Of course not – not unless we really need to know the machine details. The more we need to know, the more things we can get wrong.
In the above scenario, there are at least two groups of system users: those who might need to deal with unprocessed transactions, and those who never, ever need to. It turns out that most people deal only in real, not pending, data. This reveals a crucial seam in the abstraction, and we can remove the implicit dependency by simply never exposing pending data to those consumers in the first place.
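As one possible way to cut along that seam, here is a Python sketch with a separate type and reader for real transactions. RealTransaction and RealTransactionReader are hypothetical names; in a real codebase the same boundary might instead be a database view, a separate table, or a package that reporting code is allowed to import.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Full record; only processing and operator modules see this type."""
    id: int
    amount: Decimal
    is_pending: bool


@dataclass(frozen=True)
class RealTransaction:
    """What reporting and bookkeeping see. There is no is_pending field:
    every instance is real by construction."""
    id: int
    amount: Decimal


class RealTransactionReader:
    """Hypothetical read-only gateway for consumers of real data."""

    def __init__(self, rows: list[Transaction]) -> None:
        self._rows = rows

    def all(self) -> list[RealTransaction]:
        # The pending filter lives here, once, instead of in every caller.
        return [RealTransaction(r.id, r.amount) for r in self._rows if not r.is_pending]


def print_amount(txn: RealTransaction) -> None:
    print(txn.amount)  # back to the v-zero one-liner
```

Reporting code written against RealTransaction cannot mistake a pending transaction for a real one, because it never receives one; print_amount goes back to knowing nothing about pending.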
Each condition added to an abstraction is an opportunity to ask: who actually needs to know about that overload? In the real world, the question applies not only to single fields but also to functions, modules, and entire system components. Sometimes I like not knowing how it works. Sometimes, it very much matters to me… the larger the component being abstracted, the larger the impact if I don’t get what I expected.
I don’t need to understand that which I don’t need to know.
With thanks to Lynn Langit, Upeka Bee, Ishmael King, and Stephan Hagemann for review & feedback.