Building stable applications

Let’s be clear before we start: this is about systems that are

  1. “Business software”: the conclusions here mostly concern the special kind of complexity that is faced in business problems but isn’t faced in, say, compilers or graphics engines or games; those have a different kind of complexity.
  2. “Enterprise software”: Software that is what Martin Fowler describes as “interesting”. That is, it is software that connects to other bits of software and tries to do something that has some relevance in the real world.

It is a classic truism that developers are very good at solving the wrong problem. A user presents with a problem (call it x1) and says that maybe they will have the problem x2 in the future. The developer listens and very deliberately goes off and solves the set of problems x* which contains x1, x2, x3 and any other related problems. They also try to generalise their program so that when problems y1, y2, y3... present themselves they can just change some config and have that licked too. Of course, if we just add one more layer of abstraction, one more interface, one more pattern, then we can generalise it to do anything...

I’ve made this sound ridiculous, but in some cases it can actually work. It depends on the developer doing two things:

  1. correctly interpreting the problems presented by the user
  2. casting the problem into a suitable programming problem.

1. Interpreting the problem
This can be seriously complicated by the user trying to express the problem in “computer language”. We all know that you shouldn’t give people what they ask for, but what they need. If the developer is very skilled (and it seems to be a mix of experience and talent) then they can solve the underlying problem that the user has, even when the user presents a set of symptoms that seem unrelated. If we can see past the symptoms and diagnose the underlying problem we can sometimes solve many problems at a stroke. Even better than that, it can stop the kind of low-level chatter of bugs that drives a developer nuts: the user constantly raising bugs about “the system” intermittently failing and their machine needing reboots. There are no intermittent problems, only intermittent symptoms, and maybe some of those bugs are all linked to a common cause. If only you could see through the veil.

This is a wonderful feeling when you can do it for someone. What this really requires is not a requirements capture process and a business analyst or a focus group. It requires talking to people. This is very obvious in a big company where people can’t talk to each other as the support team is in Boston and the user group is in Sydney and the desktop support people who get the call are in London. If you can ever get the right person on the phone, you can fix the problem in just one minute.

2. Solving the right programming problem
Real-world programming is not about solving the problem that someone gives to you.

My daughter has a shape-sorter.

That is a problem that she can solve by herself. However, it is not a real problem. If it were a real problem it would be possible to jam some of the shapes through the wrong holes by twisting them around or taking advantage of the materials that the thing was made out of and bending the holes or the shapes. But this is a problem that has been made to be solved. It has been made by people trying to make a problem, not by people trying to make a solution. So the problem is engaging and tricky but not impossible and there is exactly one solution.

In the computer science classroom you must solve the binary-sort problem as you are given it. In the real world the best system developers don’t solve hard problems, they work around them. The skill is in casting the problem into a simple form and drawing the boundaries around systems so that they can present consistent, stable and self-contained interfaces to the world. And, of course, unlike with the shape-sorter, you should recognise when there are exactly zero solutions to the problem, then go and solve a different, related but still useful problem.

A stable problem gives a quality solution… eventually
Some programmers have a knack for turning real-world problems into programming problems that have neat solutions. Part of that neatness is a problem that doesn’t change every five minutes. It can be coded once and coded right and it seems that life is very simple for these people!

All it means is that a person knows how to look at a whole mess of concepts and data and process and can pull it into some smaller chunks. The important thing about those problem chunks is that they are stable in some sense, so they can be solved by a system. The problem chunks need to be:

  • internally cohesive so all the stuff that is together belongs together; then the system is conceptually unified, so all of the features are related
  • well separated from other chunks so they only interact along the chosen interfaces; this means the systems are conceptually normalised, there is little or no overlap in function between systems

The person who is exceptionally good at doing this may not even know that they are doing it, just as a person with a good sense of direction simply knows where they are. It just seems to make sense to them to break up the user tasks in that way, in a way that provides a nice edge or interface to the system. I’m not talking about the actual interfaces of the object-oriented language, but system boundaries.

For instance, a relational database has a very nice system boundary. It contains literally any type of data that can be serialised into a stream of bytes – and humans have got that down, the only things we haven’t reliably serialised are smells – and it can organise that data into “lists”, then search and retrieve that data. Simple.

Early spreadsheets like VisiCalc used to have a good boundary. Anything involving tables of numbers, it did; anything else, it did not. And VisiCalc was programmed by one guy in about 8 months. Then things like Lotus 1-2-3 came along and the lines started to get blurred: graphics, charting, database, but still a coherent system based around tabular data (and the first versions of Lotus 1-2-3 were written in a year or two by a single small team).

And then you get to recent versions of Excel which is, in my opinion, everything you would want from an application development platform (except type safety, of course 😉 ) as well as being a phenomenal spreadsheet, database, graphics program, etc, etc. However, Excel has more engineering hours in it than the space shuttle; and the space shuttle didn’t have to have marketing focus groups on where the buttons would be and what the default font would look like. Solving all those problems together was hard. It has taken Microsoft more than 20 years and probably 20,000 man-years of effort; let’s think about that number for a second: 20,000 years of effort. Of course, they have solved an unstable problem (in fact, many unstable problems) that is prone to many small changes as features are added; but does anyone want them all? Well, 200,000,000 users can’t be wrong, but maybe another solution that contained 20% of the features at 20% of the cost would have captured 80% of the market.

The stuff that is being done at Google Docs (where I am writing this) or 37Signals has been done like this. Find a group of user tasks that go together, solve them together and then stop. If you play with it for 30 minutes you see that it is all very slick, and self-symbiotic (I just made that term up). Every feature complements another feature. It is complete, not because there is nothing to add, but because there is nothing you can take away. That kind of application is very stable. The cloud of functionality that is Excel can never be stable, and without huge effort that instability will really hurt the quality of the product.

What is interesting is that if you can cast the problem into a stable, well-bounded problem then you can attack it iteratively. The stability means that domain experts and application users can get a feel for what the application is doing; they have a good mental model of the problem that accurately maps to the system, and they can still navigate the application even when there are changes. What is even more interesting is that if the problem is stable then you don’t need to attack it iteratively. You can go waterfall or spiral or whatever you want because when it is solved, it is solved. OK, in five/fifty years’ time you might want to slap a web/telepathic interface on it, but your core system won’t need to change. The system is durable because the problem is durable.

My favourite example of this is double-entry accounting, which I experienced first-hand. My company is a very small financial company and we don’t mind having multiple releases (sometimes multiple releases per week) that increase functionality; but, in general, people are against refactoring because it means that you got something “wrong”. I couldn’t understand this for a long while; what is wrong with refactoring if you don’t mind the multiple releases? And what is more, they seemed to have got by without refactoring perfectly well, and in most cases the systems were durable enough to survive for years.

In one particular case, the system had been almost untouched for nearly 10 years. By any metric, to survive 10 years in production is pretty impressive, and I couldn’t understand how that had happened without any refactoring. The developers claimed that it was all down to “thinking really hard”; the implication being that people who refactor are stupid. It took me a while to realise that the stability was, in part, down to solving the right problem. The system that lasted 10 years was the double-entry accounting system (database and application tier, not the reports, they change every other minute!) and that is something that hasn’t changed a great deal in decades. Of course, compliance rules like SOX and best practices for public accounting have changed, but the fundamentals of double-entry are very, very old. Now, the system didn’t do much (it just kept a list of the balances in the different accounts) but it was sufficiently generalised to cope with any of the new situations and sufficiently specific to be useful just as it was. One of the nice things about double-entry is that you can represent any kind of asset, even types that didn’t exist when you created the system. New types of asset, new types of income, new types of anything that can be written down as an amount of money can be stored as data, and that doesn’t require changes to the application or the database schema.
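
To make that last point concrete, here is a minimal sketch in Python. It is not the code of the system I am describing; the names (Posting, Ledger, the account strings) are all made up for illustration. The only rule it enforces is the double-entry one, that every entry must sum to zero, and the account is just a piece of data.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Posting:
    """One leg of an entry (hypothetical type, for illustration only)."""
    account: str     # free-form name: a new kind of asset is just a new string
    amount: Decimal  # convention assumed here: positive = debit, negative = credit


class Ledger:
    """Keeps a balance per account and rejects unbalanced entries."""

    def __init__(self):
        self._balances = {}

    def post(self, *postings):
        # The double-entry invariant: debits and credits must cancel out.
        if sum((p.amount for p in postings), Decimal(0)) != 0:
            raise ValueError("unbalanced entry: debits must equal credits")
        for p in postings:
            current = self._balances.get(p.account, Decimal(0))
            self._balances[p.account] = current + p.amount

    def balance(self, account):
        return self._balances.get(account, Decimal(0))

    def total(self):
        # Always zero if every entry that was posted was balanced.
        return sum(self._balances.values(), Decimal(0))


ledger = Ledger()
# Put some capital into the business.
ledger.post(Posting("assets:cash", Decimal("500")),
            Posting("equity:capital", Decimal("-500")))
# Buy a type of asset nobody had thought of when the ledger was written.
ledger.post(Posting("assets:carbon-credits", Decimal("100")),
            Posting("assets:cash", Decimal("-100")))
```

Nothing in Ledger knows what a carbon credit is. The new asset type arrives as a new account name, so in this sketch neither the code nor the shape of the stored data has to change.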

Of course, the code was also pretty neat, but if the problem is neat the code can be neat, as there are no special cases. And neat code is good because it is easy to test and easy to review, and that means that the implementation quality can be very high. As you don’t have messy code you can concentrate on things that are outside the domain of user-visible features, like using reliable messaging or distributed transactions, or driving up performance by using multithreading or even assembly language. As the problem isn’t changing you can concentrate on driving up the quality to the point where quality is a feature.

A stable problem allows you to create a system with a stable design, and that stable design allows you to concentrate on making an application that has no hacks.
