Published (nearly)

If you haven’t seen the site 97 Things Every Software Architect Should Know, I suggest you check it out. There are some great articles from some fantastic software people and trainers, like:
* Gregor Hohpe (of Enterprise Integration Patterns fame)
* Kevlin Henney (who I’ve seen at DevWeek)
* Neal Ford (of ThoughtWorks, who I’ve also seen at DevWeek)

They will be published as a book in 2009 by O’Reilly.

My personal favourite is Build systems to be Zuhanden.

And there are two others that I am proud to say that I wrote that will be included in the book:
* A rose by any other name will end up as a cabbage
* Stable problems get high quality solutions

and one other that was accepted to the site, but not into the book (actually, I like this one more than the others, but as this isn’t a book about agile techniques, it wasn’t accepted):
* Quality is a feature

I’m published, don’t you know ;-). I don’t think my thesis counts, as no one ever read it…


There is no design but the code

Intention vs Implementation

When a person designs anything – a house, a car, a class – they have some intent. They may intend for the house to feel spacious, the car to be fast or the class to be an efficient string parser. The purpose of design is to fulfill your intent and the purpose of implementation is to fulfill design. If the design is “bad” it does not meet the intentions of the designer; but if the implementation exactly matches the design then the implementation has been a success.

The word “design” conjures up many things; most of the associations, for me, are visual and revolve around solving the essence of the problem. In contrast, “implementation” is a mechanical, algorithmic, boring process that converts the pure ideas of the designer into concrete reality. However, anyone who has done even the most basic piece of work in other engineering disciplines knows that this is not so. In civil engineering, the construction of buildings can never be fully specified on paper. Even with the most complete set of working drawings, the realities of implementing (mixing the concrete, selecting the timber, laying the bricks, even hammering the nails) make the difference in whether the building is well finished or even structurally sound. The recent earthquake in China has shown that even if the design is sound, if corners are cut in implementation the design will not fulfill the intentions of the designer.

Mostly, things are never even close to being finished in the design stage. The architect wants to pin the design down like a butterfly. The implementers – the builders – want the design to be practical and easily implemented. The clients – as always – want the design to change because “it isn’t like it seemed on paper!”. In manufacturing engineering, a design that fully solves the problem and is a beautiful piece of work mechanically and aesthetically will be rejected if it has not been made with mass production in mind. In chemical engineering, the most fantastic product won’t be a success unless you can get it to flow down the pipes in the factory.

Aside: this is what was so revolutionary about the sports snack PowerBar; most snacks at the time had a high fat content to allow the mixture to be pumped around factories during manufacture, the fat acting as a lubricant. The discovery with PowerBar was a recipe that allowed mass production with low fat.

Of course, the skill of the designer can extend into the realm of implementation. The most skilled designers are expert implementers. The bad designers are out-and-out fantasists who complain that the implementers ruin the purity of their designs. The worst designers can’t even convert their intent into a design (like me picking colours: I have no sense for which colours “go” together).

Where is the design?

The problem with designing code is spotting the difference between your intent and your implementation. Design should be the bridge between intention and implementation. When we make a house, we can see the difference between the plans for the house and the house itself. We hope that the implementation is like the design but how can we know if the drawings are like the design? For the architect of a house, the creation of the plans is an implementation in itself.

When we talk about code in particular, the activity of turning the code into a program running on a computer is mechanical. So for computer applications the implementation is writing the code; where, then, is the intent and where is the design?

Aside: It is very, very rare these days for people to examine the output of their compilers, and rarer still for there to be a problem with the execution of the compiled code. Of course, there are cases like games engines that depend on bugs in graphics cards for extra speed; these are fascinating examples of implementation affecting design, but I think that a discussion of those would rather clutter the ideas I’m going for.

That just pushes us one step back up the chain: a designer has intentions, and she converts them into a design, a mental model, that fulfills her intentions. That first step is entirely internal, and discussing it would take us deep into the philosophy of mind, which I don’t think is productive.
She then converts that mental model of the design – which, for designers of physical objects, may already be visual – into some design artefacts that can be passed to other people. It may be a sentence that describes the design; it may be a picture. She then may, if she wishes, enter a feedback loop where the design artefacts are used to refine her understanding of her design, to explore areas that she may not have fully or correctly imagined. Imagined is a powerful word here: we are creating something and then holding it in our heads while we explore it; we may be exploring it visually, literally turning an image over in our minds or flying down into the details to imagine it “in close up”.

Aside: many of these analogies revolve around the visual. Visualising is to designing what language is to thinking: it provides both a scaffold and a straitjacket. For areas of design which have no physical representation, such as mathematical algorithms, a symbolic language will often have to stand in for a visual one; but when we create, we may need to invent personal languages or personal visualisations that we use while creating and then discard when we come to create the design artefacts for communicating with other people who don’t share our personal viewpoint.

In that case, what is the design? Is it the original mental model or the refined design artefacts that are passed to others? I would say the analogy of a bridge is appropriate: the conversion of one thing to another, the intent to the design artefacts. Like the old joke: what crosses a river every day but never moves? A bridge. The designer may cross the “bridge” many times, passing from intent and mental model (her internal representation of the problem) over to her solution to the problem (the design artefacts, which are external) that will be used by implementers.

I think that when we write code – which, remember, is our implementation – we still pass through these stages of intent, mental model and design artefacts. For a simple problem we may move through them so swiftly and easily that they are accomplished subconsciously, like the throwing of a ball. For a large problem we may struggle even to get a clear grip on the intent, especially if the problem is so large that we require many people to be involved in the process. That means we need not only design artefacts but also intent artefacts to communicate with. All of the transitions between stages can be hampered by miscommunication.

The design is the bridge; it happens somewhere between the clear formulation of intent and the clear formulation of the design artefacts, but that bridge may be crossed many times. How that bridge is formed is a question for philosophy and neuroscience.

How precise are your design artefacts?

The key question that I posed above – is the drawing like the design? – is very important to coding. We can have architectural plans that vary from a sketch on a napkin to full working drawings that specify every nail and bolt. Those plans may or may not accurately represent the mental model of the designer. She may have only a vague impression of the shape of the building, in which case the napkin sketch is conveying her design (and possibly her intent) but will be no good to implementers. Or she may have imagined every conceivable detail but doesn’t wish to provide all that when she is trying to put across an idea quickly in a sales pitch; in that case her mental model is fully formed but her design artefact is not. As I mentioned earlier, most physical designers think visually, and many of them will iterate as their mental model interacts with their design artefacts.

So what about coding? What are my design artefacts? Well, there is a huge range, as we would expect for a non-physical design. The simplest things are sketches, like this one I did yesterday:

or they could be a list of comments that you make in the empty body of a function before you start to implement the code. Of course, they can also be things like a set of UML diagrams or a Z spec. The simplest of all is that you make no actual, physical artefacts; you just have a full understanding of the design that you have imagined, which you can convert at will to any one of the above representations. I think that full understanding must at least be expressible in some language, even if it is a language that only you fully understand; it may be symbolic, or use normal human languages in a special way (like a jargon that only you know).

These all vary in their precision, but they are all languages of a sort.
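To make the comments-first style concrete, here is a tiny C# example. The Invoice type and the task are entirely invented for illustration; the point is only that the comments were the design artefact, and the code grew out of them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invoice and the task are invented purely for illustration.
class Invoice
{
    public bool Settled;
    public decimal Amount;    // in the invoice currency
    public decimal FxToBase;  // rate to convert into the base currency
}

class Example
{
    // The comments came first, in an empty body, as the design artefact;
    // the code under each one is the implementation that grew out of it.
    static decimal TotalOutstanding(IEnumerable<Invoice> invoices)
    {
        // ignore invoices that have already been settled
        var open = invoices.Where(i => !i.Settled);

        // convert each remaining amount to the base currency and sum
        return open.Sum(i => i.Amount * i.FxToBase);
    }

    static void Main()
    {
        var invoices = new[]
        {
            new Invoice { Settled = false, Amount = 100m, FxToBase = 1.0m },
            new Invoice { Settled = true,  Amount = 50m,  FxToBase = 1.0m },
        };
        Console.WriteLine(TotalOutstanding(invoices)); // prints 100.0
    }
}
```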

Why is making computer programs not like making physical things?

Well, if you are going to have to express your design in a language, you would choose a language that has a natural fit with your design and your problem domain. I know a good one for expressing the designs of computer programs; it is human-readable, compact, can be sent over any electronic medium and is very, very precise for specifying information processing tasks; it is called C#.

That, to me, is the core of the difference between computer programming and many other engineering disciplines: the design artefact that completely expresses the design is also the implementation of the design. Of course, it doesn’t have to be that way – we can include UML, Z, a sketch and so on – but as we have a language that completely expresses the design and implements the design, why do anything else? Of course, that forces us to combine the designer and the implementer in one person. But really, as we have discussed, there is no “pure” design that can be implemented just as it is; all forms of design artefact other than code have to be interpreted (or compiled, if you will 😉 ) in such a way that they can be made into code. All design artefacts are incomplete; only the implementation is complete. Only then can we see if we have something that matches the mental model of the designer, and only then can we see if the intent has been satisfied.

I’m not, of course, arguing against all forms of documentation that aren’t compilable. What I’m saying is that design artefacts should not try to be too precise. You should stop adding detail when you have conveyed the intent. More than likely, for a large system, you will need many types of design documentation; you should not expect one type of artefact to meet all needs. Of course, the more artefacts you have, the greater the chance that they will not agree with each other, with the intent, or with the final implementation. You won’t get these things to match unless there is some process to go and check. The simplest example is finding that the comments in your code no longer match the code; worse than useless!

Of course, this is why agile processes prefer conversation over documentation: the back and forth of conversation lets you check that the other person has understood your intent.

Udi Dahan on durable messaging

Udi Dahan has written a good MSDN article on messaging and some further comments on durable messaging.

(I’m going to reply here and link to this post in the comment.)

Udi,

I particularly like your observations on when durable messages work against you. I work in the finance industry and as you note we often use a mix of durable and non-durable messaging solutions. For applications like price streams you may need to ship thousands of messages a second but don’t care if you lose a few, but for the submission of orders you must be certain that you don’t lose any.

There are several messaging middleware providers that target the finance industry specifically — e.g., TIBCO — that try to address the low-latency requirement. We have used non-transactional MSMQ and have got up to a few thousand messages per second; TIBCO claims to support up to 50,000 messages per second! However, they don’t specify if you need a z/OS mainframe to do that…
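To make that trade-off concrete, here is a minimal sketch of the two styles using .NET’s System.Messaging. This assumes MSMQ is installed and that both queues already exist, with the orders queue created as transactional; the queue names and payloads are invented.

```csharp
using System.Messaging; // .NET Framework; assumes MSMQ is installed and
                        // both queues exist (orders created as transactional)

class PricesVersusOrders
{
    static void Main()
    {
        // Price ticks: express (memory-only) messages. Fast, but lost if the
        // machine goes down - fine, since the next tick supersedes them anyway.
        using (var prices = new MessageQueue(@".\Private$\prices"))
        {
            // Recoverable = false is the default; set explicitly for clarity.
            prices.Send(new Message("EURUSD 1.5834") { Recoverable = false });
        }

        // Orders: durable, transactional messages. Slower, but never silently lost.
        using (var orders = new MessageQueue(@".\Private$\orders"))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            orders.Send(new Message("BUY 100 VOD.L"), tx);
            tx.Commit();
        }
    }
}
```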

Your comments on very large messages were also interesting. The decision to allow multi-message orders seems to have caused really far-reaching changes to the design. I wonder if you considered solving the problem “internally” by inserting a message processor – a splitter – in your inbound message stream to break up large messages. Then you have smaller messages to deal with, but you are in control of the message order and flow. If necessary you could have diverted them into another queue, maintained order that way, and allowed other messages to “overtake” in the regular queues. Of course, this is still painful as it changes the way messages flow, and you still have to parse the 50MB XML. However, in the similar case I have seen, the counterparty was such a lumbering behemoth there was no chance of them refactoring their solution to change the message format or the message choreography (which I think is a lovely way to say “the request-response pattern for the message conversation”).
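To sketch the shape of what I mean – every type here is a hypothetical stand-in, not your design or any real middleware API – the splitter tags each part with a correlation id and sequence number so a downstream resequencer can keep the parts in order.

```csharp
using System.Collections.Generic;

// All of these types are hypothetical stand-ins, just the shape of the idea.
class Order
{
    public string Symbol;
    public int Quantity;
}

class InboundMessage
{
    public string CorrelationId; // ties each part back to the original message
    public int Sequence;         // lets a downstream resequencer restore the order
    public int Total;            // so the consumer knows when it has all the parts
    public Order Body;
}

static class OrderSplitter
{
    // Break one multi-order payload into one message per order.
    public static IEnumerable<InboundMessage> Split(string messageId, IList<Order> orders)
    {
        for (int i = 0; i < orders.Count; i++)
        {
            yield return new InboundMessage
            {
                CorrelationId = messageId,
                Sequence = i,
                Total = orders.Count,
                Body = orders[i]
            };
        }
    }
}
```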

That is the power and the pain of messaging: it provides a clean interface – the message format – to work with counterparties, but if the message conversation starts to change then the asynchronous nature of messaging can make the changes pop up everywhere in the message chain.

But, great article to introduce a really interesting subject.

Building stable applications

Let’s be clear before we start: this is about systems that are

  1. “Business software”: the conclusions here are mostly concerning the special kind of complexity that is faced in business problems, but isn’t faced in, say, compilers or graphics engines or games; they have different types of complex problem
  2. “Enterprise software”: Software that is what Martin Fowler describes as “interesting”. That is, it is software that connects to other bits of software and tries to do something that has some relevance in the real world.

It is a truism that developers are very good at solving the wrong problem. A user presents a problem – call it x1 – and says that maybe they will have problem x2 in the future. The developer listens and then, very deliberately, goes off and solves the set of problems x*, which contains x1, x2, x3 and any other related problems. They also try to generalise their program so that when problems y1, y2, y3… present themselves, they can just change some config and have those licked too. Of course, if we just add one more layer of abstraction, one more interface, one more pattern, then we can generalise it to do anything…

I’ve made this sound ridiculous, but in some cases it can actually work. It depends on the developer doing two things:

  1. correctly interpreting the problems presented by the user
  2. casting the problem into a suitable programming problem.

1. Interpreting the problem
This can be seriously complicated by the user trying to express the problem in “computer language”. We all know that you shouldn’t give people what they ask for, but what they need. If the developer is very skilled – and it seems to be a mix of experience and talent – then they can solve the underlying problem that the user has, even when they present a set of symptoms that seem unrelated. If we can see past the symptoms and diagnose the underlying problem we can sometimes solve many problems at a stroke. Even better, it can stop the kind of low-level chatter of bugs that drives a developer nuts: the user is constantly raising bugs about “the system” intermittently failing and their machine needing reboots. There are no intermittent problems, only intermittent symptoms, and maybe some of those bugs are all linked to a common cause. If only you could see through the veil.

It is a wonderful feeling when you can do this for someone. What it really requires is not a requirements-capture process, a business analyst or a focus group. It requires talking to people. This is very obvious in a big company where people can’t talk to each other because the support team is in Boston, the user group is in Sydney and the desktop support people who get the call are in London. If you can ever get the right person on the phone, you can fix the problem in just one minute.

2. Solving the right programming problem
Real-world programming is not about solving the problem that someone gives to you.

My daughter has a shape-sorter.

That is a problem that she can solve by herself. However, it is not a real problem. If it were a real problem it would be possible to jam some of the shapes through the wrong holes by twisting them around or taking advantage of the materials that the thing was made out of and bending the holes or the shapes. But this is a problem that has been made to be solved. It has been made by people trying to make a problem, not by people trying to make a solution. So the problem is engaging and tricky but not impossible and there is exactly one solution.

In the computer science classroom you must solve the binary-sort problem as you are given it. In the real world, the best system developers don’t solve hard problems; they work around them. The skill is in casting the problem into a simple form and drawing the boundaries around systems so that they present consistent, stable and self-contained interfaces to the world. And, of course, unlike the shape-sorter, you should recognise when there are exactly zero solutions to the problem, then go and solve a different, related but still useful problem.

A stable problem gives a quality solution… eventually
Some programmers have a knack for turning real-world problems into programming problems that have neat solutions. Part of that neatness is a problem that doesn’t change every five minutes. It can be coded once and coded right and it seems that life is very simple for these people!

All it means is that a person knows how to look at a whole mess of concepts, data and process and pull it into some smaller chunks. The important thing about those problem chunks is that they are stable in some sense, so they can be solved by a system. The problem chunks need to be:

  • internally cohesive so all the stuff that is together belongs together; then the system is conceptually unified, so all of the features are related
  • well separated from other chunks so they only interact along the chosen interfaces; this means the systems are conceptually normalised, there is little or no overlap in function between systems

The person who is exceptionally good at doing this may not even know that they are doing it, just as a person with a good sense of direction simply knows where they are. It just seems to make sense to them to break up the user tasks in that way, in a way that provides a nice edge or interface to the system. I’m not talking about the actual interfaces of an object-oriented language, but system boundaries.

For instance, a relational database has a very nice system boundary. It contains literally any type of data that can be serialised into a stream of bytes – and humans have got that down; the only things we haven’t reliably serialised are smells – and it can organise that data into “lists”, then search and retrieve it. Simple.

Early spreadsheets like VisiCalc had a good boundary. Anything involving tables of numbers, it did; anything else, it did not. And VisiCalc was programmed by one guy in about 8 months. Then things like Lotus 1-2-3 came along and the lines started to blur: graphics, charting, database, but still a coherent system based around tabular data (and the first versions of Lotus 1-2-3 were written in a year or two by a single small team).

And then you get to recent versions of Excel, which is, in my opinion, everything you would want from an application development platform (except type safety, of course 😉 ) as well as being a phenomenal spreadsheet, database, graphics program, etc., etc. However, Excel has more engineering hours in it than the space shuttle; and the space shuttle didn’t have to have marketing focus groups on where the buttons would go and what the default font would look like. Solving all those problems together was hard. It has taken Microsoft more than 20 years and probably 20,000 man-years of effort; let’s think about that number for a second: 20,000 years of effort. Of course, they have solved an unstable problem (in fact, many unstable problems), prone to many small changes as features are added; but does anyone want them all? Well, 200,000,000 users can’t be wrong, but maybe another solution that contained 20% of the features at 20% of the cost would have captured 80% of the market.

The stuff being done at Google Docs (where I am writing this) or 37signals has been done like this: find a group of user tasks that go together, solve them together and then stop. If you play with it for 30 minutes you see that it is all very slick, and self-symbiotic (a term I just made up): every feature complements another feature. It is complete, not because there is nothing to add, but because there is nothing you can take away. That kind of application is very stable; the cloud of functionality that is Excel can never be stable, and without huge effort that instability will really hurt the quality of the product.

What is interesting is that if you can cast the problem into a stable, well-bounded problem then you can attack it iteratively, because the stability means that domain experts and application users can get a feel for what the application is doing; they have a good mental model of the problem that accurately maps to the system, and they can still navigate the application even when there are changes. What is even more interesting is that if the problem is stable then you don’t need to attack it iteratively. You can go waterfall or spiral or whatever you want, because when it is solved, it is solved. OK, in five/fifty years’ time you might want to slap a web/telepathic interface on it, but your core system won’t need to change. The system is durable because the problem is durable.

My favourite example of this is double-entry accounting, which I experienced first-hand. My company is a very small financial company and we don’t mind having multiple releases – sometimes multiple releases per week – that increase functionality; but, in general, people are against refactoring because it means that you got something “wrong”. I couldn’t understand this for a long while; what is wrong with refactoring if you don’t mind the multiple releases? And what is more, they seemed to have got by perfectly well without refactoring, and in most cases the systems were durable enough to survive for years.

In one particular case, the system had been almost untouched for nearly 10 years. By any metric, to survive 10 years in production is pretty impressive, and I couldn’t understand how that had happened without any refactoring. The developers claimed that it was all down to “thinking really hard”; the implication being that people who refactor are stupid. It took me a while to realise that the stability was, in part, down to solving the right problem. The system that lasted 10 years was the double-entry accounting system (the database and application tier, not the reports; they change every other minute!), and double-entry is something that hasn’t changed a great deal in centuries. Of course, compliance like SOX and best practices for public accounting have changed, but the fundamentals of double-entry are very, very old. Now, the system didn’t do much – it just kept a list of the balances in the different accounts – but it was sufficiently generalised to cope with any new situation and sufficiently specific to be useful just as it was. One of the nice things about double-entry is that you can represent any kind of asset, even types that don’t exist when you create the system: new types of asset, new types of income, new types of anything that can be written down as an amount of money can be stored as data, without changes to the application or the database schema.
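To show what I mean by new types being just data, here is a minimal C# sketch of the double-entry idea. This is not the actual system I described, just the core invariant; note that a new asset class is simply a new account name – a new row – not a schema change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A posting moves money into (debit) or out of (credit) a named account.
class Posting
{
    public string Account;   // e.g. "Cash:USD", "Asset:ConvertibleBond" - just data
    public decimal Amount;   // positive = debit, negative = credit
}

class JournalEntry
{
    public List<Posting> Postings = new List<Posting>();

    // The one invariant double-entry insists on: debits and credits balance.
    public bool IsBalanced()
    {
        return Postings.Sum(p => p.Amount) == 0m;
    }
}

class Ledger
{
    readonly Dictionary<string, decimal> balances = new Dictionary<string, decimal>();

    public void Post(JournalEntry entry)
    {
        if (!entry.IsBalanced())
            throw new InvalidOperationException("Entry does not balance");
        foreach (var p in entry.Postings)
        {
            decimal current;
            balances.TryGetValue(p.Account, out current);
            balances[p.Account] = current + p.Amount;
        }
    }

    public decimal BalanceOf(string account)
    {
        decimal b;
        balances.TryGetValue(account, out b);
        return b;
    }
}
```

A brand-new instrument type arrives as a new account string in a balanced entry; neither the Ledger code nor any schema has to change.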

Of course, the code was also pretty neat, but if the problem is neat the code can be neat, as there are no special cases. And neat code is good because it is easy to test and easy to review, which means the implementation quality can be very high. As you don’t have messy code, you can concentrate on things outside the domain of user-visible features, like reliable messaging, distributed transactions, or driving up performance with multithreading or even assembly language; as the problem isn’t changing, you can concentrate on driving up the quality to the point where quality is a feature.

A stable problem allows you to create a system with a stable design, and that stable design allows you to concentrate on making an application that has no hacks.

Smarts alone are not enough

Working in a company that is obsessed by hiring people on smarts alone, I can attest to the fact that being smart isn’t always helpful. The use of Google-style puzzle questions to select candidates, rather than “boring” questions like “what does this piece of code do?”, drives me nuts.

I overheard someone here saying “All the new graduates we interview seem to be better at the puzzles than the more experienced hires, I wonder what that means?”. The answer: it means they are better at puzzles, nothing more. The important thing is that software engineering is not like pure mathematics; your achievements are not 100% correlated with your IQ.

The point is that there is such a large body of knowledge – admittedly not as developed as the BOK for structural engineering – that it is not possible to work out the best way of doing things from first principles. Just because a person has an IQ of 140, it doesn’t mean they will figure out how to make an enterprise-strength messaging system – or a secure cryptosystem – because it takes tens of man-years to do such a thing.

Tinkering around with little programs doesn’t teach some of the things that an experienced programmer on large (i.e., more than 100k-line) systems takes for granted. Having spent some time trying to tell very clever people (e.g., a self-taught programmer with a PhD from Cambridge) that they should do things in a certain way because it is best practice, I know that they aren’t always responsive. They think that they can manage the complexity because they are clever, and they are immature enough to be in love with their own cleverness.

Of course, there is a point where the system outgrows them and having no structure means that point is meltdown for maintainability.

Experience actually does mean something. Raw knowledge of best practice obtained from books means slightly less on its own. But the two together are really useful. Now, if you can combine them with a person with enough IQ to do the job, and enough maturity to not want to show off their IQ… then you have a software engineer and not just a clever hacker.

Exceptionally fast

[Chart: time to throw and catch an exception against recursion depth; the time scale is in seconds]

This is the result of a test I did with some very stupid code to test the speed of exceptions. We sometimes hear that exceptions are slow and I wondered how much the stack depth affects the result. I used very simple code that recursed if the level of recursion was less than a threshold and threw an empty ApplicationException when the threshold was reached. The test was repeated one thousand times and the results are graphed above.
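The harness looked something like this (a reconstruction from memory; the depths tested here are my guesses, not the original thresholds):

```csharp
using System;
using System.Diagnostics;

class ExceptionSpeed
{
    // Recurse to the given depth, then throw.
    static void Recurse(int depth, int threshold)
    {
        if (depth < threshold)
            Recurse(depth + 1, threshold);
        else
            throw new ApplicationException();
    }

    static void Main()
    {
        foreach (int depth in new[] { 1, 10, 100, 1000 })
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 1000; i++)
            {
                try { Recurse(0, depth); }
                catch (ApplicationException) { /* swallow: we are timing throw + unwind */ }
            }
            sw.Stop();
            Console.WriteLine("depth {0}: {1}ms for 1000 exceptions", depth, sw.ElapsedMilliseconds);
        }
    }
}
```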

The verdict? Well – as you might expect – the answer is: it depends on what you mean by slow!
I think that the really slow exceptions are the ones that occur as a result of a problem with COM or P/Invoke, or the ones that are marshalled across remoting or AppDomain boundaries. I’d like to do something with nested AppDomains, as that could be really painful.
PS: apologies for the image; I am simply too lazy to figure out how to make it display properly. I just cut and pasted it from Google Docs, where I wrote this post. I have no idea how it is working!

Project thoughts: Every task takes a week

I recently went on a basic project management course, and while discussing estimating I had a minor insight.

It is a frequently cited problem in software development that estimating how long a task will take – whether design or implementation – is hard. On the project course we discussed using ranges instead of simple estimates, and using the size of the range as a measure of risk, as in the sketch below. Some people even objected to that, saying that you could only estimate what you had done before. There is a grain of truth in that but, IMHO, once you have written your first “hello, world” in a language, everything is similar to a greater or lesser extent.
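For illustration only – the course didn’t prescribe a formula, this is my addition – a PERT-style three-point estimate is one common way to turn a range into an expected duration plus a crude risk number:

```csharp
using System;

// PERT-style three-point estimate: just one common way to turn a range
// into a single number plus a risk measure.
class Estimate
{
    public double Optimistic, Likely, Pessimistic; // in days

    // Weighted mean, weighting the most likely case most heavily.
    public double Expected
    {
        get { return (Optimistic + 4 * Likely + Pessimistic) / 6; }
    }

    // The width of the range as a crude measure of risk.
    public double Risk
    {
        get { return (Pessimistic - Optimistic) / 6; }
    }
}

class Program
{
    static void Main()
    {
        var task = new Estimate { Optimistic = 3, Likely = 5, Pessimistic = 15 };
        Console.WriteLine("Expected: {0:F1} days, risk: {1:F1} days", task.Expected, task.Risk);
        // Expected: 6.3 days, risk: 2.0 days
    }
}
```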

When I have done any estimating myself, I noted how frequently I answered “how long will this take?” with “a week” or “two weeks”. My feeling is that any task that is too big to be done in a week is generally too big to estimate or even give a title to, so we split it up into chunks that generally take, well, about a week. And any task so small that it would take less than a week is combined with other tasks until they add up to, well, about a week!

If you have ever seen the movie The Money Pit, or worked with house builders at all, then you are familiar with the two-week estimate. There is only ever two weeks of work: what we do this week and what we intend to do next week!