Ben Ng @_benng

How To Make Better Decisions

01.15.15

I recently learnt about the "Rule of Three" from this tweet by Jacob. It says that code should be copied once, and extracted into a procedure only the third time it has to be used. After further thought, I realized that this simple programming rule is a domain-specific manifestation of a more general decision-making guideline.

Let's go back to computing for a brief moment, since many of you reading this are programmers. Abstracting too early is more dangerous than it seems. When abstraction is done too early, it increases the complexity of the product before a complete understanding of the problem has been obtained. As the decision was made with insufficient information, it is more likely to be wrong than right. As more abstractions are layered on top of these bad decisions, it becomes more and more difficult to backtrack. Furthermore, as the size of a team grows, it becomes difficult to switch out core infrastructure without stalling the entire team's progress.

This idea of delaying abstraction can be generalized to all decision making.

In general, decisions that are difficult to reverse should be made as late as possible. Decisions that are easily reversible are great because they are a thinly veiled version of "heads I win, tails you lose". When you guess correctly, you win, and when you guess wrongly, you get cheap information that can inform your next decision. For example, startups frequently make easily reversible decisions as part of their search for a repeatable business model.

A Framework For Decision Making

  1. If you don't have to make the decision now, wait
  2. If you have to make a decision now, do something you can undo

The rest of this blog post takes this framework for decision making and applies it to decisions that get progressively more important. I'll begin with a harmless code quality issue developers can relate to, and end with infrastructure decisions that affect the competitiveness of a business. If you're not a programmer, I would scroll down to "Decision Making For Managers" and start from there.

Decision Making For Programmers

Functions As Comments

Functions are for defining reusable logic. With very few exceptions, if you are not going to reuse something, don't make it a function.

Some of you might be wondering why anyone would create a function that only gets called in one place. As a TA for an introductory C++ course who saw this happen quite often, the most common reason I heard was that teachers required code to look well organized. Novices then pick up this bad habit of "organizing" code and take it to the workplace.

That is how we end up with code that looks like this:

// Person, Chassis, and Key are assumed to be defined elsewhere.
class Car {

public:

  Car () {
    Person owner = createOwner();
    Chassis chassis = createChassis();
    Key key = createKey(chassis);
  }

private:

  // Each helper below is called from exactly one place: the constructor.
  Person createOwner () {
    return Person("Jimbo");
  }

  Chassis createChassis () {
    return Chassis("Honda Accord");
  }

  Key createKey (Chassis chassis) {
    return Key(chassis);
  }

};

These "do the thing" methods don't increase the robustness or maintainability of the code; they just slow down the person reading it. This is not abstraction. It is using function names to annotate blocks of code that are usually so simple they don't need any annotation to begin with.

This is a fairly benign example of knee-jerk decision making. Abstraction for the sake of abstraction is, at best, annoying. At worst, it's a source of bugs.

Recommendation: Unless you get paid per line of code you write, if your function is private and only called in one place, you probably should just use a comment. The "rule of three" is a good guideline here.
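
To make the recommendation concrete, here is a minimal sketch of the same Car with the single-use helpers folded back into the constructor and comments doing the annotating. The Person, Chassis, and Key structs and the accessor methods are stand-ins I am assuming for illustration; they are not part of the original example.

```cpp
#include <string>

// Stand-in types, assumed here only so the sketch compiles on its own.
struct Person  { std::string name; };
struct Chassis { std::string model; };
struct Key     { std::string fitsModel; };

class Car {
public:
    // Members are initialized in declaration order (owner_, chassis_, key_),
    // so chassis_ is already built by the time key_ reads from it.
    Car()
        // The owner is hard-coded for the sake of the example.
        : owner_{"Jimbo"},
          // Every car in this example is built on the same chassis.
          chassis_{"Honda Accord"},
          // A key is cut for one specific chassis.
          key_{chassis_.model} {}

    const Person& owner() const { return owner_; }
    const Chassis& chassis() const { return chassis_; }
    const Key& key() const { return key_; }

private:
    Person owner_;
    Chassis chassis_;
    Key key_;
};
```

The comments carry the same information the helper names did, and the reader no longer has to jump around the class to see what construction does. If a second and third caller for one of these steps ever shows up, that is the time to extract a function.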

Frameworks Are Glaciers

Asynchrony is a powerful way to improve the performance of code. While an individual async call is slower by nature than its sync counterpart, the total throughput of the program may be improved because of reduced blocking.

Asynchrony, like multi-threading, is not a silver bullet. It has the potential to introduce insidious bugs, and one source of them that I have been dealing with is an attempt to speed up a sync API by doing part of its work asynchronously.

I've been working with a view system that has gotten just about everything right. It's modular, has just the right amount of convention without being draconian, and keeps its API lean enough that I hardly ever have to pull up the documentation to do something.

Many months back, a small change was made to improve the speed of rendering views: elements were now appended to the DOM in batches using requestAnimationFrame to stop the browser from locking up when rendering expensive components like long list views.

This addition has caused quite a headache. Render tasks used to have a well defined beginning and end. Now, some components in the view hierarchy may not actually be on-screen when their render method returns. This is an issue in an event-driven system where renders are being triggered all the time. Without knowing whether a view is actually done rendering, you get weirdness like duplicate components being appended when two render events are triggered in quick succession. There are also more sinister problems, like memory leaks that result from nodes being detached from the DOM while hot code still retains references to them.
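
The duplicate-append problem is easier to see in a toy model. The sketch below is not the view system's code. It is a small C++ simulation in which a hypothetical FrameQueue stands in for the browser's frame callbacks and a hypothetical ListView clears its children immediately but appends them on a later frame. All names here are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the browser's queue of frame callbacks.
class FrameQueue {
public:
    void schedule(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Runs every scheduled task, as the browser would on the next frame.
    void runFrame() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            task();
        }
    }

private:
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical view whose render() returns before its children are attached.
class ListView {
public:
    explicit ListView(FrameQueue& frames) : frames_(frames) {}

    void render(const std::vector<std::string>& items) {
        children_.clear();                // synchronous: happens right away
        frames_.schedule([this, items] {  // deferred: happens on a later frame
            for (const std::string& item : items) children_.push_back(item);
        });
    }

    std::size_t childCount() const { return children_.size(); }

private:
    FrameQueue& frames_;
    std::vector<std::string> children_;
};

// Two render events arrive before the next frame. Both clear an already
// empty list, then both deferred appends run, so every item appears twice.
std::size_t childrenAfterTwoQuickRenders() {
    FrameQueue frames;
    ListView view(frames);
    const std::vector<std::string> items{"a", "b", "c"};
    view.render(items);
    view.render(items);
    frames.runFrame();
    return view.childCount();
}
```

With three items, childrenAfterTwoQuickRenders() returns 6 rather than 3. The clearing step was written on the assumption that render is synchronous, and deferring only half of the work quietly broke that assumption.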

Later on, we realized that appending wasn't even causing the slowdown in the first place. The problem was in parsing large amounts of template code repeatedly, and creating too much DOM at a time. Unfortunately, since we made a bad decision at the framework level, we now had an ecosystem of modules depending on the flawed API, often employing hacks to smooth over the problems bubbling up from the framework.

In hindsight, we should have resisted the pressure to find a quick fix for our users, and taken the time to properly understand the performance issue before we modified a critical piece of our infrastructure. Nevertheless, the lean API will make reverting the bad behavior quite easy, and the bulk of the work will be updating dependents that relied on the misbehaving methods.

Recommendations: Be extra careful when tinkering with lower levels of abstraction, and even more so if there is an ecosystem on top of it. Be as restrictive as possible with your API.

Decision Making For Managers

Organize Your Data

Poor abstraction doesn't just happen in code. It's even worse when it happens with data.

Every part of a business is affected by how your data is modeled. Poorly modeled data puts constraints on user interface design, bogs down developers with technical debt, and cripples the ability of the business to take advantage of new opportunities.

Document-oriented (NoSQL) databases have been growing in popularity lately, and for good reason: relational databases are suboptimal for managing semi-structured data. While there are good use cases for a document database, they are often used in situations where a relational database is a much better fit.

The highly abstracted nature of document databases creates an illusion of freedom and speed, and the early stages of a software project will speed by without talk of schemas and migrations. However, from my experience the vast majority of data is relational, and document databases struggle to match the performance and simplicity of an RDBMS when burdened with highly relational data.

Using a document database from the start is therefore a prime example of making a difficult-to-reverse decision too early. Before core questions like "is our data relational?" have been answered, the developer has already climbed so high up the tower of abstraction that they will encounter significant resistance trying to return to a technology that better fits their needs.

Concretely, the permissiveness of document storage leads to the team accumulating massive amounts of tacit knowledge about their data structure that cannot be easily codified as a schema. This makes moving to an RDBMS later extremely expensive, since they require well defined models and relationships. Furthermore, being able to work directly with plain objects leads to poorly defined or nonexistent interfaces between application code and the database. Since there is no clean interface between the application and the database, a prerequisite for changing the database is defining such an interface and performing a major refactor on the application. For these reasons, starting with a document database can lead to technical debt and switching cost snowballing out of control.
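
As a rough illustration of what a clean interface between application and database could look like, here is a minimal sketch. User, UserStore, InMemoryUserStore, and registerUser are all hypothetical names invented for this example, and the in-memory store is only a placeholder for whatever database sits behind the interface.

```cpp
#include <map>
#include <string>

// Hypothetical record type; the fields are assumptions for illustration.
struct User {
    std::string id;
    std::string email;
};

// The only surface the application is allowed to touch.
class UserStore {
public:
    virtual ~UserStore() = default;
    virtual void save(const User& user) = 0;
    // Copies the matching user into `out` and returns true if one exists.
    virtual bool findById(const std::string& id, User& out) const = 0;
};

// Placeholder backend. A document store or an RDBMS would each get their
// own implementation of the same interface.
class InMemoryUserStore : public UserStore {
public:
    void save(const User& user) override { users_[user.id] = user; }

    bool findById(const std::string& id, User& out) const override {
        auto it = users_.find(id);
        if (it == users_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, User> users_;
};

// Application logic depends on UserStore, never on a concrete database.
bool registerUser(UserStore& store, const User& user) {
    User existing;
    if (store.findById(user.id, existing)) return false;  // id already taken
    store.save(user);
    return true;
}
```

With this shape, swapping databases means writing one new UserStore implementation instead of hunting down every place where application code reached into raw documents.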

In contrast, it is far simpler to move from a restrictive data model to a more permissive one. Simply turn each row of an SQL database into JSON and throw it into document storage.
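
Here is a minimal sketch of that row-to-JSON step. The Row alias and both function names are hypothetical. Treating every column value as text is a simplifying assumption, so a real migration would also need to handle numbers, NULLs, and the remaining control characters.

```cpp
#include <map>
#include <string>

// One table row as column name -> value. Storing every value as text is
// an assumption made to keep the sketch short.
using Row = std::map<std::string, std::string>;

// Escapes the characters that most commonly break a JSON string.
// Other control characters are passed through unchanged in this sketch.
std::string escapeJson(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}

// Turns one row into a flat JSON object whose values are all strings.
std::string rowToJson(const Row& row) {
    std::string json = "{";
    bool first = true;
    for (const auto& column : row) {
        if (!first) json += ",";
        first = false;
        json += "\"" + escapeJson(column.first) + "\":\"" +
                escapeJson(column.second) + "\"";
    }
    json += "}";
    return json;
}
```

Going the other way has no equally mechanical recipe, because the documents carry no schema to read the columns and relationships from.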

Recommendation: Unless you are modeling data that is very well understood and clearly non-relational, resist the temptation to go with a document database, and start with an RDBMS. It is the reversible choice.

The Snowball of Technical Debt

Defensive Outsourcing

With the rise of SaaS and module ecosystems it is easier than ever to outsource everything from payroll to server operations. Outsourcing is trading money for time, so it makes sense if you expect to see a net gain in productivity.

Outsourcing comes with its own problems. The goals of your partner are seldom aligned with yours. For example, while you would prefer minimizing switching costs, it is to your partner's advantage to maximize them. Sometimes, these switching costs are not apparent until you try to break up with a provider.

We were using Cloudant as our database service for a while. To our pleasant surprise, we added a major feature in a few hours by building on top of Cloudant's integrated search. This came back to haunt us when we grew tired of the terrible performance and excessive downtime.

We went back to self-hosting our database, but this meant figuring out how to implement the search functionality ourselves. Since our app is quite portable, we did not encounter many issues moving it to a different host.

Outsourcing also happens in code. Module ecosystems like npm make it easy for anyone to publish reusable bits of code. It is tempting to rely on someone else's code, but building on top of someone else's work without doing your due diligence can come back to haunt you.

When we adopted prova last year as our test runner, we were won over by its beautiful reporting interface. We started building on top of it despite its immaturity, and all was good for a few months. Later we realized that prova had been ignoring and silencing uncaught exceptions. We even caught prova silently skipping entire test cases because of a syntax error! With thousands of assertions already written, it was demoralizing to find out that we couldn't trust our test runner.

The good news is that prova uses the same unopinionated API that tape uses. Since our test suite is portable, we will be able to switch test frameworks quite painlessly. This wouldn't have been possible if we had started with a very opinionated solution.

Recommendation: Have an exit strategy when building on top of something you've outsourced. Keep things portable, so you can undo any bad infrastructure decisions.

Wisdom Is More Valuable Than Reputation

It is not uncommon to see a new technology gain massive adoption. Regardless of the credentials of the body pushing the technology, the risk of adoption is inversely proportional to the age of the technology. This is the same reason why refactors are usually a bad idea: old solutions often look ugly because they have been painstakingly patched to handle edge cases.

In a previous example, I talked about how my experience with Cloudant has been less than ideal. I should mention that Cloudant is an IBM company, and IBM's huge name in enterprise IT was one of the reasons why we trusted them. Another tech behemoth is Microsoft, whose new Windows Azure service has been unreliable enough that companies like K2 are afraid of committing fully. At the same time, K2 has demonstrated some smart decision making by keeping some of its capacity on AWS. This makes the move to Windows Azure an easily reversible decision, should they decide that it fails to meet their needs.

For another example of early adoption gone wrong, here is Khan Academy's experience as an early adopter of Swift, a promising new technology created by brilliant people under the direction of the well respected Chris Lattner of LLVM fame. 20,000 lines of code later, here is what Andy Matuschak has to say:

In terms of problems, the tooling is not there, and that’s not a dig on the engineers. I happen to think, actually, that the Swift engineering team is one of Apple’s finest engineering teams, but it’s not done yet. It’s full of bugs, it crashes too often. It generates faulty code occasionally. Most critically for us, it’s just really slow. Really, really slow.

Since new technology is so unpredictable, it is wise to use a mature solution today, and consider migrating in the future.

Recommendation: Reputation is not a substitute for maturity. Be cautious when adopting new technology, especially if your core competencies as a business depend on it.

The Leaning Tower of Pisa

Conclusion

I've chosen these examples to show how making reversible decisions can lead to competitive advantage. Like any framework, there are exceptions. Risk usually comes with a commensurate reward; those who made their riches in the early days of the Internet certainly aren't regretting being early adopters.

Nevertheless, the next time you are faced with a decision, consider the reversibility of your choices, and see if it leads you to a better conclusion. I certainly could have benefited from this thinking a year ago.


Thank you Oscar, Jacob, Matt, Chris, and Dan for reviewing my drafts.

If you like what I write about, you should follow me on twitter 😉