Archive for December, 2008

Turning applications into middleware

December 19, 2008

In my post on building stable applications I referred to the RDBMS as an example of a stable problem that has a durable solution. Even though I also provided spreadsheets and double entry accounting systems as other examples, it occurs to me that this kind of “durable problem” is most often encountered in middleware. For example: network stacks, messaging systems (MQ, MSMQ, Tibco, 29West etc), object-relational mapping systems or component-hosting environments (e.g., COM+, servlet containers like Tomcat or even “web” servers like WAS/IIS7).

Of course, saying “durable” just means “boring”; these aren’t real applications, they are middleware. That means that they don’t change and don’t get any thrilling new features. If they do get new features then people complain that they are destabilizing everything. But isn’t this a good thing? Aren’t we always talking about re-use of components, and how – in the future – software will be built by slapping together existing components? Well, that concept has been in the future for at least 20 years, and it looks like it is staying in the future forever (and thank goodness, or we would all be out of work). We aren’t anywhere near clicking on some objects and making the backend for a bank, and never will be.

Isn’t middleware the example that breaks that rule? It really is reusable and ends up in all sorts of interesting stuff. Generally we don’t have to worry whether the database is going to lose our data, so we can save that testing effort and spend it on making sure we are saving the right data. And middleware is just the “right size” too: exposing enough to be flexible and hiding enough to be usable; unlike an abstract enterprise dependency-injection software-factory framework component that needs so much XML configuration that you worry about it becoming sentient. By their very existence, reliability and re-usability these components have shaped architectures. Indeed I’d go so far as to say that they have shaped architects. We have changed the way that we make software, because these stable problems have been solved and have enterprise-strength implementations that are easily available.

Sometimes we need to be super-agile: providing just that one button the user needs or those little shell scripts you knock up just for the day, knowing that if you ever need it again it will be quicker to cut and paste. Mostly we need to remember our YAGNI and not try to solve all the problems in the world by solving the generic case with an abstract class. But sometimes we encounter a problem that has a genuinely reusable core that is so stable that it will last and last. This kind of problem is nearly always rooted in the real world and connected to the way that human beings think and interact. For example: relational databases exist because humans organise information in lists that have references to other lists and indexes and so on; XML exists because even though humans can comprehend unstructured documents, computers can’t; IP networking exists because sometimes bits of the world get disconnected by physical events.

If we can turn our application into middleware, we can really start to live the reusable software dream. However, imagine you were at a retirement party and someone stood up and said “I’d like to give a massive vote of thanks to Susan because of her incredible lifetime of work on FooEngines. When they started out, they were just a solution to our problem with Foo, but we soon found that the FooEngine was solving a problem that thousands of developers had, and, in fact, the recurring nature of the problems of Foo was so wide that FooEngine became the backbone of many architectures of people working with Foo. More than that, people have begun to force their Foo problems into forms that can be solved by FooEngines, because they are so ubiquitous, well-understood and high-quality. It is no exaggeration to say that Susan has changed the way we work with Foo forever.” Wouldn’t that seem like an achievement? But you notice that the praise sounds implausible even though we know examples where that has happened. These problems and their solutions must be very, very rare.

Again, the issue comes down to identifying a problem that is so stable that it recurs thousands of times, a problem that is fundamental and intrinsic to the real world and can’t be removed. For instance, when doing business with a counterparty we use a contract, and when that contract is enforced we behave according to its rules and know that the counterparty will not respond if we don’t. We also know that if the contract represents the carrying out of a transaction – like purchasing a house – there is a well-defined moment when we are committed to the contract. Lawyers have ways of making these things work that have evolved over thousands of years; these problems are durable and highly recurrent. If we make a system that addresses them, we can be sure that the system will also be durable and reusable; if the problems are durable and recurrent enough, our system will evolve over years into middleware.

Should we try and make all our applications middleware? Of course not. Should we try and design middleware? No, I think that they are patterns and patterns are not invented, they are discovered. If the problem has a stable core, then we have an opportunity to turn our application into middleware and a golden chance to make a piece of software that will last for decades, or even generations.

Iterative development feedback can be virtuous or vicious

December 18, 2008

Iterative development can go two ways. Either the system goes from strength to strength, with users able to get more features and keep a stable system, or the system degenerates and, instead of accumulating features, accumulates bugs and bad data until it is no longer used. The latter situation will find people crying out for a “properly” analysed and designed system that they know will have the features that they need.

How do we end up in one situation rather than the other? I think that the key point is whether the system matches the business process that it is supposed to replace. That doesn’t mean that the domain model (http://en.wikipedia.org/wiki/Domain_model) needs to match the business process exactly, but you need to be able to present the users with language and entities that map exactly onto the relevant business objects. Let’s look at a 2×2 matrix to chop the world up:

2×2 matrix of stable/unstable problems

There is a feedback loop that operates when we are in the top-box “sweet spot” that constantly drives the improvement of the software in a virtuous circle where the users understand and like the system, even when there are bugs to fix. Likewise there is a “sour spot” that drives the system downwards into chaos as more changes introduce more bugs that the system is ill-equipped to handle. The quality of its output declines and drags down confidence and interest in the system. In that case, collapse and a re-write are the inevitable conclusion, but there may be irrevocable damage to your reputation and the reputation of iterative development.

Iterative feedback loops

If you are lucky, you will have a stable problem and will be able to make a stable high-quality solution, but most of us aren’t that lucky.

Aside: what do we think of Google docs presentations? It’s not quite powerpoint 2007 but for free and online, really quite impressive.

Code review checklist

December 11, 2008

As I was discussing in my previous rant, a code review should consist of 200-400 lines that are looked over in an hour or less.

The point of the checklist is to be S.M.A.R.T., that is:

  • Specific – the points are specific enough to say YES or NO.
  • Measurable – you can “sign off” and say that you did the review, and people can see if you did (you can even store the ‘results’ as a completed check-list)
  • Agreed – the whole team working on the codebase can look at the list and understand and agree
  • Relevant – relevant to the code at hand. C# specific, relevant to high performance, multi-threading, items relating to SQL statements or pooling DB connections. If the system changes focus then you need to remove the items that are irrelevant.
  • Time-boxed – the checklist should be short enough so that you can remember most of the items (actually for me, it is too long; I think I could only really remember about 5-7 items) and complete them in a reasonable time

It would be possible to create some regex-based code checks to help with things like this, but they would only be helpers. You can’t fully automate all of code-review. And note that this is for code review only; that is, the nuts-and-bolts of implementation notwithstanding previous comments on there being No design but the code. The interaction of code review and design review and agile practices like pairing.. well that is a whole other cake for a whole other day.

So, the list:
For C# only, not for design review, CODE REVIEW ONLY

(not in order of importance)

  • Are the sections of code (i.e., methods, structs, classes, enums etc) small enough to be reasonable?
  • Does the class wrap unmanaged resources? If so, does it implement IDisposable correctly?
  • Does the code contain anything “dead” – i.e., empty interfaces, methods/classes that aren’t called, inheritance hierarchy that isn’t used?
  • Correct exception handling: no empty catch, no generic exceptions, no losing of stack trace, no losing inner exception
  • If the code uses try..finally, see if it could be replaced with using blocks
  • If classes have explicit finalizers, ensure that they are necessary and do no evil
  • Goldilocks principle should be applied to logging: not too much, not too little.
  • Static members should be checked for thread safety
  • Every public method should be justifiably public, not just by accident or for testing.
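
As mentioned above, a few of these items lend themselves to regex-based helpers. The following is a minimal sketch in Python of one such helper for the “no empty catch” item. The pattern, the function name `find_empty_catches` and the sample source are all illustrative assumptions, and a regex like this will be fooled by comments and string literals, so it can only point a reviewer at candidates, not replace the review.

```python
import re

# Illustrative pattern (an assumption, not a complete C# parser): a catch
# clause, with or without an exception type, whose body is only whitespace.
EMPTY_CATCH = re.compile(r"\bcatch\b\s*(?:\([^)]*\))?\s*\{\s*\}")


def find_empty_catches(source: str) -> list[int]:
    """Return the 1-based line numbers where an empty catch block starts."""
    return [source.count("\n", 0, m.start()) + 1
            for m in EMPTY_CATCH.finditer(source)]


# Hypothetical C# fragment used only to exercise the helper.
SAMPLE = """try { Save(); }
catch (IOException) { }
try { Load(); }
catch (Exception ex) { Log(ex); throw; }
"""

print(find_empty_catches(SAMPLE))
```

The second catch in the sample is not flagged because its body is not empty; whether it handles the exception well is still a question for the human reviewer.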

Best kept secrets of peer code review: a summary

December 11, 2008

I read this book: Best Kept Secrets Of Peer Code Review so I thought I’d write about it.

Warning: the people who wrote this book sell code review software, so unsurprisingly they suggest that doing code review is worth the effort if you are using some code review software to make things easier. Also, the “waterfall” is everywhere; coding is an activity that is done after designing and before testing.

The book attempts to be an evidence-based review of code review. It looks at published literature and includes some new research carried out on code reviews that were done with their own software. The evidence, to my eyes, mostly looks like it doesn’t give any clear conclusions on how effective code review is likely to be. Mostly, unsurprisingly, the main suggestion is to do some reviews and gather lots of data on how many bugs were found for the hours taken to do the reviews. No shock there. There are a few very good things, including a review of eye movements in code reviewers.

The most interesting stuff is the concrete measurements of how much code can be reviewed effectively. This is given by two numbers:

  • How fast you can look through code: their measurements suggest that on Java code you should be aiming for no more than 400 lines per hour, and less if the module being inspected is complex or critical
  • How much time the review should take: they suggest that you should spend no more than 60 minutes reviewing a code set; after that time a person becomes “saturated” and can’t see any more bugs no matter how long they spend looking. (A code set could be a class, a function, a module, or a group of related check-ins in many files.)

They also discuss expected “defect densities”. That is, the number of defects per thousand (k) Lines Of Code (kLOC).

  • Of course, the question is: what is a defect? It can be anything that the reviewers think should be changed in the code, and will obviously vary from the trivial to the critical. There are no hard rules about what a defect is; that will depend on your system, but they include such things as not checking errors, not validating inputs, not checking for nulls, “off by one” iteration, inconsistent comments, comments with spelling mistakes etc
  • The harder and longer you look, the more “defects” you find. There is no real upper limit for the number of changes that could be requested after a very intensive review by multiple reviewers. Obviously at some point, the changes begin to be matters of taste rather than absolute “this is good engineering”.
  • If you really, really want to know some hard numbers, defect rates are between
    • 5 defects per kLOC for very mature stable code with tight controls on the changes
    • 100 defects per kLOC for new code written by “junior” developers or in a loose development environment
    • And bear in mind that this isn’t “significant” lines of code, this includes comments and this is – in theory – working code.

Also,

  • Critical code should be reviewed by more than one person to get some discussion going on best practices
  • The book emphasized code review and didn’t really look into “design” review. The examples seemed to be more about reviewing check-ins to a stable code base, rather than large commits of new code, so there was some discussion of doing a “guided” review where the author writes a document that lists the changes to various files and attempts to guide the review through the changes. The risk here, of course, is that the reviewer sees what they expect to see, not what is actually there. The advantage is that in preparing for the review, the author checks more things and will find their own errors.
  • They also suggest that author preparation is correlated with low defect rates, and that when people know that their code may be reviewed it will have fewer defects: there is a social effect that means people raise their game so they don’t keep making code with the same defects

So, combining these things they recommend that a review should ideally be around 200 lines and last no longer than an hour.
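
To make the arithmetic behind that recommendation concrete, here is a small Python sketch that plugs the figures quoted above (400 lines per hour, 60 minutes, 5 to 100 defects per kLOC) into a calculation. The function and constant names are my own, and the defect range is only the rough band quoted above, not a prediction for any particular codebase.

```python
# Figures quoted in the summary above.
MAX_LINES_PER_HOUR = 400      # fastest effective reading rate
MAX_REVIEW_MINUTES = 60       # reviewers "saturate" after this
DEFECTS_PER_KLOC_LOW = 5      # mature, tightly controlled code
DEFECTS_PER_KLOC_HIGH = 100   # new code, loose environment


def review_budget(lines: int) -> dict:
    """Estimate review time and the defect range for a change of `lines` lines."""
    minutes = lines / MAX_LINES_PER_HOUR * 60
    return {
        "minutes_at_max_rate": minutes,
        "fits_in_one_session": minutes <= MAX_REVIEW_MINUTES,
        "expected_defects_low": lines * DEFECTS_PER_KLOC_LOW / 1000,
        "expected_defects_high": lines * DEFECTS_PER_KLOC_HIGH / 1000,
    }


for size in (200, 400, 800):
    print(size, review_budget(size))
```

At the fastest rate a 200-line review takes half an hour, which leaves room to slow down for complex code and still finish inside the hour; an 800-line change does not fit in one session at any effective rate.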

I’ve made my own code review checklist that I think embodies these things.

Advice to a young agile developer written by an old one

December 5, 2008

On reading Franklin’s essay Advice to a young tradesman I thought that it had some specifics for agile developers. Of course, it is a total piece of genius that is centuries ahead of its time, and I’m adding nothing to it. Particularly in the writing. Franklin’s prose is clean and humorous with not a word wasted. Even if you disagree with him – and I don’t see how you could – you have to love the writing. So with apologies (and I’m not really old…):

Time is money

Every hour spent working on something that is not directly related to a feature request is time wasted, and time wasted is money wasted. Every feature that is made that isn’t needed also has a cost that is measured in features that are needed that don’t get made. And if they do get made, the work is rushed.

Quality is money

If work is rushed or for other reasons has low quality then you will spend more time – and therefore more money – on repairing it and making good. Of course, since quality is a feature, you should bear in mind point #1 and only deliver what is needed.

Trust is money

When you deliver something and it doesn’t work, your agile methods of rapid development will be doubted. You’ll be forced to spend more time proving that quality is high, and so you’ll have less time to develop new features, and as we know, time is money. If you always deliver features on time and with known quality, then you and your methods will be trusted. Users will engage with the process more, they will be more interested in the systems you produce and will be able to produce more focused requests and more accurate bug reports.

Having trouble automating your tests? Then don’t!

December 5, 2008

We all love automated tests, right? Unit tests, functional tests, all kinds of test are better when they are automated. Unless of course, they aren’t.

The general pattern of testing software is configure-execute-assert. That is, we attempt to set the software into some known state. That includes constructing particular objects (including parameters for methods), setting configuration and even building test databases and inserting known data. Then we poke the appropriate part of the system by calling a function or injecting a message or event. Then we write some more code that accesses the state of the system again and checks it against some expected value. That expected value might be dynamic (like today’s date) or static (method should return 0). The simplest code to test can skip step 1 as there is no stored state (barring simple parameter inputs to the method) and the assert step is as simple as checking what the method returns.
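
As a minimal sketch of that pattern in Python: the `Account` class and both test functions below are hypothetical, invented only to show the three steps. The first test needs all three; the second tests a stateless function, so the configure step disappears.

```python
class Account:
    """Hypothetical system under test: a tiny object with stored state."""

    def __init__(self, balance: int = 0):
        self.balance = balance

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


def add(a: int, b: int) -> int:
    """Hypothetical stateless function: nothing to configure."""
    return a + b


def test_deposit_increases_balance():
    account = Account(balance=100)   # configure: known starting state
    account.deposit(50)              # execute: poke the system
    assert account.balance == 150    # assert: read the state back and compare


def test_add():
    assert add(2, 2) == 4            # execute and assert in one step


test_deposit_increases_balance()
test_add()
print("all tests passed")
```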

The kind of test that won’t be squeezed to fit this pattern is the test that involves users. You might very easily have a requirement that “the users of the website can find what they need easily”. This is an important requirement to be sure, but not a testable one. What do you do, ignore it? There are often less vague requirements like “the report should show the individual transactions on the front page when there are fewer than 10 of them, but they should be on another page when there are more than 10”. That is a very reasonable – and very common – requirement for reporting applications. Unless you have hand-crafted your own reporting framework then it is also impossible to make an automated test. The *stuff* that will make that report will be a mix of data, configuration in binary files and code, and all of it will be hidden in the reporting system framework; the only input will be the data from the database, the only output will be the PDF report. Creating a test that relies on doing a diff of the actual PDF file with the expected PDF file is not really an option; unless the report is very stable the test will be brittle, constantly breaking as other parts of the report are updated.

So how do you automate? You don’t. At least, not fully.

You have to verify the requirement has been met and expecting the end-user to do all the work by clicking buttons is unreasonable. Particularly if you have to regression test. So what to do? My suggestion is that you only use people for the “assert” part and close up the rest with automation (See Scrum and XP from the trenches page 70).

So you make a test system that does the following for each test case:

  1. The configuration part. In this example, you have database entries that have 2 sets of report data: one with more than 10 transactions and one with fewer. You have the data stored in the test case so the user doesn’t need to remember it or input it.
  2. The execute part; the test system runs both reports
  3. The first part of assert is capturing the state; the system presents both reports to the tester in one screen
  4. The test case has a full description of what is expected; maybe including a diagram that shows what the tester is to check on the reports. Maybe some sample reports linked in.
  5. Some buttons to say “pass” and “fail” (and maybe “test broken”) that the user can click to store the result and move to the next test (N.B.: it would be good to store timing data too, be nice to know if some tests are ultra-time-consuming)
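
The steps above can be sketched in Python as follows. Everything here is hypothetical: `ManualTestCase`, `run_case` and the stand-in `render_report` are names invented for illustration, and a console prompt stands in for the “pass”/“fail” buttons. The point is only that configure and execute are automated, while the verdict comes from a person and is stored along with how long it took.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ManualTestCase:
    name: str
    configure: Callable[[], dict]    # step 1: build the known test data
    execute: Callable[[dict], str]   # step 2: run the system, return what to show
    expected: str                    # step 4: what the tester should see


@dataclass
class Result:
    name: str
    verdict: str
    seconds: float                   # timing data, as suggested in step 5


VERDICTS = {"p": "pass", "f": "fail", "b": "test broken"}


def run_case(case: ManualTestCase, ask=input, show=print) -> Result:
    """Automate configure and execute; leave only the assert to a human."""
    output = case.execute(case.configure())
    show(f"== {case.name} ==\nEXPECTED: {case.expected}\nACTUAL:\n{output}")  # step 3
    started = time.monotonic()
    answer = ""
    while answer not in VERDICTS:    # keep asking until we get a valid verdict
        answer = ask("[p]ass / [f]ail / [b]roken? ").strip().lower()
    return Result(case.name, VERDICTS[answer], time.monotonic() - started)


def render_report(transactions: list) -> str:
    # Stand-in for the opaque reporting framework (an assumption for the demo).
    where = "front page" if len(transactions) < 10 else "separate page"
    return f"{len(transactions)} transactions listed on the {where}"


case = ManualTestCase(
    name="few transactions",
    configure=lambda: {"transactions": list(range(3))},
    execute=lambda data: render_report(data["transactions"]),
    expected="All 3 transactions appear on the front page",
)

# A scripted answer keeps the demo non-interactive; in real use the default
# ask=input would wait for the tester.
result = run_case(case, ask=lambda prompt: "p")
print(result.name, result.verdict)
```

Passing `ask` and `show` as parameters is a design choice that makes the harness itself testable without a human, and would let the same loop sit behind a web page instead of a console.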

This means that the human testers are not wasting their time doing things the system can do and you don’t waste your time trying to program things that a computer can’t do. One example of this human computation, from a talk given at Google, is image recognition, which simply can’t be done at present by computer but which humans can do very fast. This kind of Google approach to mass-collaboration also gives us a way around systems with hundreds of test cases. One person would go mad if they had to do them all, but if you provided a nice way for dozens of people to all take a few test cases each then no one would get too demotivated or lose concentration and the cycle-time could be kept short.

Of course, minimizing human testing is important as even the smoothest human computation is hundreds of times slower than true automation. Also, these tests would generally be system integration tests and whilst they are more important than unit tests, they won’t be helpful for catching all the bugs. So it is best to save the human-driven regression tests for the big releases and the parts that other testing approaches can’t reach.

Double D problems

December 5, 2008

I’ve been trying to come up with a cool name for problems where the devil is in the detail; the best I’ve come up with so far is “double D” but that doesn’t really do it justice. Perhaps D^2 would be better. Anyway, you know the kind of thing I’m talking about: the problem seems simple but it just gets harder and harder the more work that you do. Instead of the solution covering more of the problem, it seems to get bigger and bigger without really getting the 100% coverage that you really need. First you find the edge cases where things don’t work; then, if you are lucky, you find the corner cases where two edge cases meet. If you aren’t lucky you find them only when someone tries to do something and it all crashes horribly, data lost, reputation ruined.

My current example that I’m bashing my head against is a testing framework. And right there you have the problem. It’s a framework. I didn’t want to create a framework, but I thought it would be OK as it was only one interface. So I wrote a set of tools to generate test cases from the interface. The test cases would be in an XML file so more tests could be added by hand (or by Excel sheet in the manner of FitNesse). Fine. Good. Except that it wasn’t. The simple cases got done very fast, just a day to knock something up. Then the slightly annoying cases (parameters that were arrays, etc). Then the really awkward ones: System.Int32&, out parameters that were datatables and callback objects. Tricky. So I fiddled around for 3-4 days trying to get it all automated and then finally realised that the test generation code I was sweating over was for 1% of the interface! Clearly automation wasn’t saving as much time as it was costing. So I just generated the test data once, excluded the cases that didn’t work by hand and then checked the file into source control. Done.

In cases like this, the Pareto principle is your friend. Whilst I’m a big fan of automation in general and it is very useful to automate the generation or consumption of documents like test cases or diagrams of database schemas, there is a point where you need to stop. So what are the options when the automation time-saving curve begins to bend?

Well my favourite options are
1) exclude it: you should notify loud and early when you get to a case that isn’t automated, just so you remember it
2) hardcode it: be honest: how often is that thing going to change? if changing the hardcoded value is quicker than rewriting the code to handle all the cases then hardcode it. Of course, remember that other classic of time-management: the rule of 3. If you do the same thing 3 times then find another way.
3) special case sister system: if you really have to automate the special cases, then create a different system which is designed entirely around those special cases. In my case my “one Excel row = one call to a method by reflection” wasn’t going to work for methods that took callback objects to subscribe to events, but it was easy to create a test client that only did callback functions. Then you can get back to your high-productivity 20% effort for 80% results programming.
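
To illustrate option 1, here is a minimal Python sketch of a generator that builds test cases from an interface by introspection and excludes, loudly, any method it can’t handle. It is not the C# tool described above: the `Service` class, `SAMPLE_VALUES` and `generate_cases` are hypothetical names, and the rule for what counts as “simple” is an assumption made for the example.

```python
import inspect
import warnings
from collections.abc import Callable

# Parameter types the generator knows how to fill in (an illustrative choice).
SAMPLE_VALUES = {int: 0, float: 0.0, str: "", bool: False}


class Service:
    """Hypothetical interface under test."""

    def add(self, a: int, b: int) -> int:
        return a + b

    def greet(self, name: str) -> str:
        return "hello " + name

    def subscribe(self, callback: Callable) -> None:
        pass  # callback parameters are the awkward 1%


def generate_cases(interface: type):
    """Return (cases, excluded): one (method name, args) row per simple method."""
    cases, excluded = [], []
    for name, method in inspect.getmembers(interface, inspect.isfunction):
        if name.startswith("_"):
            continue
        params = list(inspect.signature(method).parameters.values())[1:]  # skip self
        if all(p.annotation in SAMPLE_VALUES for p in params):
            cases.append((name, [SAMPLE_VALUES[p.annotation] for p in params]))
        else:
            excluded.append(name)
            # Notify loud and early so the gap is not forgotten.
            warnings.warn(f"NOT AUTOMATED: {name} needs a hand-written test")
    return cases, excluded


cases, excluded = generate_cases(Service)
service = Service()
for name, args in cases:
    print(name, args, "->", getattr(service, name)(*args))
print("excluded:", excluded)
```

The excluded list is the input to options 2 and 3: each entry either gets a hardcoded case or goes to the sister system built for callbacks.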