Having trouble automating your tests? Then don’t!

We all love automated tests, right? Unit tests, functional tests, all kinds of tests are better when they are automated. Unless of course, they aren’t.

The general pattern of testing software is configure-execute-assert. That is, we attempt to set the software into some known state: constructing particular objects (including parameters for methods), setting configuration and even building test databases and inserting known data. Then we poke the appropriate part of the system by calling a function or injecting a message or event. Finally we write some more code that accesses the state of the system again and checks it against some expected value. That expected value might be dynamic (like today’s date) or static (the method should return 0). The simplest code to test can skip the configure step as there is no stored state (barring simple parameter inputs to the method), and the assert step is as simple as checking what the method returns.
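To make the pattern concrete, here is what it looks like as an NUnit-style test. DiscountCalculator and its Apply method are invented for this example; the shape of the test is the point, not the class.

    using NUnit.Framework;

    // DiscountCalculator is made up for the example: 10% off any order at or above the threshold
    public class DiscountCalculator
    {
        private readonly decimal _threshold;
        public DiscountCalculator(decimal threshold) { _threshold = threshold; }

        public decimal Apply(decimal orderTotal)
        {
            return orderTotal >= _threshold ? orderTotal * 0.9m : orderTotal;
        }
    }

    [TestFixture]
    public class DiscountCalculatorTests
    {
        [Test]
        public void LargeOrdersGetTenPercentOff()
        {
            // configure: put the code under test into a known state
            var calculator = new DiscountCalculator(100m);

            // execute: poke the system by calling the method
            decimal discounted = calculator.Apply(200m);

            // assert: check the observed state against the expected value
            Assert.AreEqual(180m, discounted);
        }
    }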

The kind of test that won’t be squeezed to fit this pattern is the test that involves users. You might very easily have a requirement that “the users of the website can find what they need easily”. This is an important requirement to be sure, but not a testable one. What do you do, ignore it? There are often less vague requirements, like “the report should show the individual transactions on the front page when there are fewer than 10 of them, but they should be on another page when there are more than 10”. That is a very reasonable – and very common – requirement for reporting applications. Unless you have hand-crafted your own reporting framework, it is also impossible to write an automated test for it. The *stuff* that makes that report will be a mix of data, configuration in binary files and code, all of it hidden inside the reporting framework; the only input will be the data from the database, the only output will be the PDF report. Creating a test that diffs the actual PDF file against an expected PDF file is not really an option: unless the report is very stable, the test will be brittle and will constantly break as other parts of the report are updated.

So how do you automate? You don’t. At least, not fully.

You have to verify the requirement has been met, and expecting the end-user to do all the work by clicking buttons is unreasonable, particularly if you have to regression test. So what to do? My suggestion is that you only use people for the “assert” part and close up the rest with automation (see Scrum and XP from the Trenches, page 70).

So you make a test system that does the following for each test case:

  1. The configuration part. In this example, the test case holds database entries with two sets of report data: one with more than 10 transactions and one with fewer. The data is stored in the test case so the tester doesn’t need to remember it or type it in.
  2. The execute part: the test system runs both reports.
  3. The first part of assert is capturing the state: the system presents both reports to the tester on one screen.
  4. The test case has a full description of what is expected, maybe including a diagram that shows what the tester is to check on the reports, and maybe some sample reports linked in.
  5. Some buttons to say “pass” and “fail” (and maybe “test broken”) that the tester can click to store the result and move to the next test. (N.B. it would be good to store timing data too; it would be nice to know if some tests are ultra-time-consuming.)
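One way to picture such a system: each test case is a little record in which the configure and execute parts are code and the assert part is a human verdict. This is only a sketch and every name in it is invented:

    using System;

    // one semi-automated test case: steps 1 and 2 above are delegates,
    // steps 3-5 are a tester looking at the output and clicking a button
    class HumanAssertTestCase
    {
        public string Name;
        public string ExpectedDescription;   // what the tester must check: diagrams, sample reports
        public Action Configure;             // load the stored test data (e.g. both report datasets)
        public Action Execute;               // run both reports and show them on one screen
        public TestVerdict Verdict;          // set when the tester clicks pass/fail/test broken
        public TimeSpan TimeTaken;           // worth recording to spot ultra-time-consuming cases
    }

    enum TestVerdict { NotRun, Pass, Fail, TestBroken }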

This means that the human testers are not wasting their time doing things the system can do, and you don’t waste your time trying to program things that a computer can’t do. One example of this kind of human computation, from a Google talk, is image recognition, which computers simply can’t do well at present but humans can do very fast. This kind of mass-collaboration approach also gives us a way around systems with hundreds of test cases. One person would go mad if they had to do them all, but if you provided a nice way for dozens of people to each take a few test cases, then no one would get too demotivated or lose concentration and the cycle time could be kept short.

Of course, minimizing human testing is important, as even the smoothest human computation is hundreds of times slower than true automation. Also, these tests will generally be system integration tests, and whilst they are more important than unit tests, they won’t catch every bug. So it is best to save the human-driven regression tests for the big releases and for the parts that other testing approaches can’t reach.


Double D problems

I’ve been trying to come up with a cool name for problems where the devil is in the detail; the best I’ve come up with so far is “double D”, but that doesn’t really do it justice. Perhaps D^2 would be better. Anyway, you know the kind of thing I’m talking about: the problem seems simple, but it just gets harder and harder the more work you do. Instead of the solution covering more of the problem, it seems to get bigger and bigger without ever reaching the 100% coverage that you really need. First you find the edge cases where things don’t work; then, if you are lucky, you find the corner cases where two edge cases meet. If you aren’t lucky, you find them only when someone tries to do something and it all crashes horribly: data lost, reputation ruined.

My current example, the one I’m bashing my head against, is a testing framework. And right there you have the problem: it’s a framework. I didn’t want to create a framework, but I thought it would be OK as it was only one interface. So I wrote a set of tools to generate test cases from the interface. The test cases would live in an XML file so more tests could be added by hand (or by Excel sheet, in the manner of FitNesse). Fine. Good. Except that it wasn’t. The simple cases got done very fast, just a day to knock something up. Then the slightly annoying cases (parameters that were arrays, etc). Then the really awkward ones: System.Int32&, out parameters that were DataTables, and callback objects. Tricky. So I fiddled around for 3-4 days trying to get it all automated and then finally realised that the test generation code I was sweating over covered only 1% of the interface! Clearly the automation was costing more time than it was saving. So I just generated the test data once, excluded the cases that didn’t work by hand and then checked the file into source control. Done.

In cases like this, the Pareto principle is your friend. Whilst I’m a big fan of automation in general, and it is very useful to automate the generation or consumption of documents like test cases or diagrams of database schemas, there is a point where you need to stop. So what are the options when the automation time-saving curve begins to bend?

Well, my favourite options are:
1) Exclude it: notify loudly and early when you hit a case that isn’t automated, just so you remember it.
2) Hardcode it: be honest, how often is that thing going to change? If changing the hardcoded value is quicker than rewriting the code to handle every case, then hardcode it. Of course, remember that other classic of time-management, the rule of 3: if you find yourself doing the same thing 3 times, find another way.
3) Special-case sister system: if you really have to automate the special cases, then create a different system that is designed entirely around those special cases. In my case, the “one excel row = one call to a method by reflection” approach (sketched below) wasn’t going to work for methods that took callback objects to subscribe to events, but it was easy to create a separate test client that only did callbacks. Then you can get back to your high-productivity, 20% effort for 80% results programming.
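For what it’s worth, the happy-path core of “one excel row = one call to a method by reflection” is only a few lines; this sketch (names invented, and the row values are assumed to be converted to the right types already) is roughly the 20%-effort part, and the awkward cases like out parameters and callbacks are exactly what it can’t handle:

    using System;
    using System.Reflection;

    static class RowInvoker
    {
        // call a named method on the target, using one row of test data as the arguments
        public static object Invoke(object target, string methodName, object[] rowValues)
        {
            MethodInfo method = target.GetType().GetMethod(methodName);
            return method.Invoke(target, rowValues);
        }
    }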

Forcing the garbage collector (2)

OK, some updates that I have to share.

1) GC modes

If you add this setting to the app.config file and you have more than one physical CPU/core, you get different GC behaviour.

<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>

This puts the GC work onto the other core, which has some interesting effects on memory and CPU. It is similar to the collector options on Java VMs, but I hadn’t really thought about it for .NET.

[Task Manager graph for the 300000-object run with the server GC enabled]

You can see the CPU spikes as well as the different memory graph. Unsurprisingly this changes the throughput as well, and you don’t get the “bad polls” nearly so much. Overall, throughput is improved, as is latency.

2) garbage collection spy class

Add this class:

    class GCNotifier
    {
        ~GCNotifier()
        {
            // the finalizer only runs after a garbage collection has occurred
            Console.WriteLine("**GC**");
            // resurrect: allocate a replacement so the next collection triggers this finalizer again
            new GCNotifier();
        }
    }

and initialise it somewhere in Main:


static void Main(string[] args)
{
	new GCNotifier();
	. . .
}

Then it will be touched each time a GC runs and you will see a console message, although not necessarily at exactly the same moment, because the GC and the finalizer run on different threads. It is also a really cunning piece of code (not written by me).

Here is an excellent post on GC “secrets” from a blog I hadn’t seen before, but which is now on my list:

http://blogs.msdn.com/tess/archive/2007/04/10/net-garbage-collector-popquiz-followup.aspx

Forcing the garbage collector

I’m currently working on a system that is being designed to minimize problems with garbage collection. In the past I would have said that the garbage collection system is not the concern of application developers. I stand by that for most cases but this is a somewhat unusual application in that it combines a large number of operations per second with a very large number of objects in play (the “working set”).

I have written a small program to expose similar behaviour and it is below in full (short and complete). The basic operation is:
* create a large array of objects which hold some data
* select an object at random from the array and replace it with a new object; repeat this as fast as possible
* monitor progress on another, very light, thread that polls every second or so and reports the number of operations completed since the last poll.

The idea is that the large number of new objects makes for a very high memory allocation velocity, which means there is a lot of garbage to clean up. The large working set means that the garbage collector has a lot of objects to traverse, and the random nature of the deletions means that the memory becomes fragmented and will require extensive compaction in order to create contiguous blocks of memory for allocating new objects.

Each of the objects in the array is about 4kB (= 10*100*sizeof(int)) and I started to observe the effects of garbage collection at about 20,000 objects on my dual-core desktop machine.

To run the program:

  1. save it (as, say, c:\temp\Garbage.cs)
  2. open the Visual Studio command prompt
  3. change to the c:\temp folder
  4. compile it by running csc c:\temp\garbage.cs
  5. run the program, specifying the number of objects in the working set as an argument, e.g. Garbage.exe 1000

First, let’s look at what happens when the working set is small. If I run with 100 objects I see something like this:

C:\Temp>garbage.exe 100
starting main..
Filling the working set with 100 objects
starting monitor thread..
starting worker thread..
press enter to kill it
the count this iteration is:8,         ticks :0
the count this iteration is:17489,     ticks were:2127710960
the count this iteration is:17480,     ticks were:2127296384
the count this iteration is:17178,     ticks were:2127757912
the count this iteration is:17488,     ticks were:2127710808
the count this iteration is:17339,     ticks were:2127596432
the count this iteration is:17083,     ticks were:2132044856
the count this iteration is:17323,     ticks were:2127488824
the count this iteration is:17412,     ticks were:2127741720
the count this iteration is:17537,     ticks were:2127549544

So the program polls regularly; the number of updates to the data is recorded as the “count this iteration”, and the ticks elapsed since the last poll are recorded as well. If I run this for a while I get this kind of distribution of counts:

updates per iteration    # polls
===============================

15900        2
15950        0
16000        2
16050        1
16100        0
16150        7
16200        20
16250        27
16300        42
16350        49
16400        27
16450        11
16500        1
16550        0
16600        0

average=16276
standard dev = 123
std/average = 0.8%

which looks “normal”: we don’t see any spikes in memory, and the garbage collector is keeping up with no pressure on memory or CPU. This is a typical app and we don’t need to worry about garbage collection. This behaviour actually persists — on my machine — up to about 20,000 objects (20,000 * 4 kB = 80 MB). Above that, things start to happen.

When I run with 100,000 on my machine, I get something like this:

C:\Temp>garbage.exe 100000
starting main..
Filling the working set with 100000 objects
starting monitor thread..
starting worker thread..
the count this iteration is:1,           ticks were:0
press enter to kill it
the count this iteration is:12498,     ticks were:2127792200
the count this iteration is:12502,     ticks were:2127633784
the count this iteration is:12666,     ticks were:2127541688
the count this iteration is:7239,      ticks were:2127698056

the count this iteration is:12174,     ticks were:2127482736
the count this iteration is:8522,      ticks were:3823133528

the count this iteration is:12649,     ticks were:2127596272
the count this iteration is:12781,     ticks were:2186759112
the count this iteration is:12856,     ticks were:2127987848
the count this iteration is:12435,     ticks were:2127509440
the count this iteration is:6981,      ticks were:3197386008

the count this iteration is:12589,     ticks were:2148094192

we notice 2 things immediately:
1) the number of operations per poll is no longer distributed around an average; there are “bad polls” in which far fewer operations complete
2) during those “bad polls” the monitoring thread is affected as well: the poll takes 50% or so longer to complete. So these bad polls are very bad; the throughput of operations on the data drops by as much as 75%

The distribution of operations per poll looked like this for me (your mileage may vary):

updates per iteration    # polls
===============================

1000        3
2000        6
3000        2
4000        4
5000        2
6000        0
7000        0
8000        1
9000        0
10000        0
11000        6
12000        133
13000        43
14000        1

average = 10975
standard dev = 2715
std/ave = 25%

The big things to notice are:
1) the operations per poll has dropped even on the good polls; some drop is to be expected, but the decrease is quite large
2) the distribution goes nearly down to 0; the range of operation counts went from 13681 down to just 261! This “fat tail” is the result of the “bad polls”. It is what sometimes happens to stock prices, wiping out years of modest gains with a single big loss, and it is why you shouldn’t always trust salespeople who tell you that the “average” gain on an investment is 10%: if, once every 10 years, it loses 50% of its value you won’t do so well.
3) nearly 10% of the polls were “bad”; that is, they had less than 50% of the average count

A picture is worth 1k words so let’s look at task manager:

[Task Manager graph for the 100,000-object run]

The saw-tooth pattern of repeated garbage collections is quite clear. We don’t see a peak in CPU activity, as we are already running at high CPU, but I guess that the CPU has switched from doing the real work to doing the garbage collection work!

So, the question is: how do we avoid this?
The answer, mostly, is that you don’t need to: if you aren’t trying to do 10,000 operations per second on 10,000 objects of reasonably large size (I don’t regularly create data objects with 100 integers inside), the collector will keep up without any help from you. If you do have to do that, then the best thing to focus on is object lifetime. If you need to create objects, make sure that they go out of scope quickly so that they can be collected in the generation 0 garbage collection, which is much faster. If you just need to store some data, don’t create new objects; modify the data stored in an object that you already have, so the object lifetime becomes effectively infinite and it is never collected at all. Finally, be careful how you pass objects around: if references to them get stuck somewhere – for instance in a collection – you might be extending lifetimes that you don’t intend to. This kind of non-reference-passing, controlled-object-lifetime code is tough to write and maintain, so don’t do it if you don’t need it. The other alternatives – specialized hardware like Azul for Java, or buying in an in-memory enterprise database product (sometimes called an Enterprise Data Fabric or EDF) – are both even more expensive.
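To make the “modify the data in place” advice concrete, here is a hypothetical before-and-after. Quote stands in for whatever data object you are churning; none of this comes from the test program below, it is just a sketch of the two styles:

    using System;

    // hypothetical long-lived data object: the idea is to reuse it, not replace it
    class Quote
    {
        public decimal Price;
        public DateTime Stamp;
    }

    class QuoteStore
    {
        private readonly Quote[] _quotes;

        public QuoteStore(int size)
        {
            _quotes = new Quote[size];
            for (int i = 0; i < size; i++)
            {
                _quotes[i] = new Quote();
            }
        }

        // allocation-heavy: every update abandons an old object for the collector to chase,
        // which is the pattern the test program below uses to create the pressure
        public void Replace(int index, decimal price)
        {
            _quotes[index] = new Quote { Price = price, Stamp = DateTime.Now };
        }

        // lifetime-friendly: the existing object is mutated in place, so no garbage is created
        public void UpdateInPlace(int index, decimal price)
        {
            _quotes[index].Price = price;
            _quotes[index].Stamp = DateTime.Now;
        }
    }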

code follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading;

namespace GarbageCollectConsole
{
    class Program
    {
        /// <summary>
        /// the large working set that will be the garbage collector's
        /// main problem; as we update it, it has to work hard to keep it clean.
        /// </summary>
        private static ComplexObject[] _data;

        private static int counter = 0;
        private readonly static ReaderWriterLockSlim counterLock = new ReaderWriterLockSlim();
        private static int lastCount;

        /// <summary>
        /// better than using DateTime.Now.Ticks
        /// </summary>
        private static readonly Stopwatch clock = new Stopwatch();

        private volatile static bool stopNow = false;

        /// <summary>
        /// the entry point to the program
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            Console.WriteLine("starting main..");

            int arraySize = GetArraySize(args);
            Console.WriteLine("Filling the working set with {0} objects", arraySize);
            _data = new ComplexObject[arraySize];
            InitialArrayFill();

            var mon = new Thread(MonitorCounter);
            Console.WriteLine("starting monitor thread..");
            mon.Start();

            var worker = new Thread(AllocateObjects);
            Console.WriteLine("starting worker thread..");
            worker.Start();

            Console.WriteLine("press enter to kill it");
            Console.ReadLine();
            stopNow = true;
            Console.WriteLine("press enter to exit main...");

            Console.ReadLine();
            Console.WriteLine("about to exit main...");
        }

        /// <summary>
        /// parse the command line to get the size of the working set, start with
        /// at least 10,000
        /// </summary>
        /// <param name="args"></param>
        /// <returns></returns>
        private static int GetArraySize(string[] args)
        {
            if (args.Length < 1)
            {
                return 100000;
            }
            else
            {
                return int.Parse(args[0]);
            }
        }

        /// <summary>
        /// start another thread that will put messages to the console
        /// this is so slow that it won't really affect things on the
        /// worker thread
        /// </summary>
        static void MonitorCounter()
        {
            while (!stopNow)
            {
                int localcounter;
                try
                {
                    counterLock.EnterReadLock();
                    localcounter = counter;
                }
                finally
                {
                    counterLock.ExitReadLock();
                }

                clock.Stop();
                Console.WriteLine("the count this iteration is:{0},\t\t ticks were:{1}",
                    (localcounter - lastCount),
                    clock.ElapsedTicks);
                clock.Reset();
                clock.Start();

                lastCount = localcounter;

                Thread.Sleep(1000);
            }
        }

        static void InitialArrayFill()
        {
            for (int i = 0; i < _data.Length; i++)
            {
                _data[i] = new ComplexObject();
            }
        }

        /// <summary>
        /// randomly abandon a referenced object. One of the objects
        /// that is referenced from the _data array is replaced. As the _data
        /// array is the only reference to the object, it is available for
        /// garbage collection. As this loop only hits about 10,000 per second
        /// -- on my machine -- most of the objects in the array generally live
        /// for a long time, thus they get into generation 2.
        /// </summary>
        static void AllocateObjects()
        {
            var indexGenerator = new Random();

            while (!stopNow)
            {
                Interlocked.Increment(ref counter); // safe to do as this is the only thread updating
                _data[NextIndex(indexGenerator)] = new ComplexObject();
            }
        }

        /// <summary>
        /// use random, as we don't want to just go through the array as it will
        /// end up being contiguous in memory again. We want the heap memory to get
        /// as fragmented as possible
        /// </summary>
        /// <param name="indexGenerator"></param>
        /// <returns></returns>
        static private int NextIndex(Random indexGenerator)
        {
            return (int)(indexGenerator.NextDouble() * _data.Length);
        }
    }

    /// <summary>
    /// a simple object, I had to include the _data array
    /// to add some memory pressure. Without it the memory allocation
    /// velocity was so small that the memory was cleared without even
    /// noticing a CPU or memory spike
    /// </summary>
    class SimpleObject
    {
        public int Number { get; set; }
        public readonly Guid Foo;
        public readonly DateTime stamp = DateTime.Now;
        public readonly int[] _data = new int[100];

        public SimpleObject()
        {
            Number = (int)(new Random()).NextDouble();
            Foo = Guid.NewGuid();
            for (int index = 0; index < _data.Length; index++)
            {
                _data[index] = index;
            }
        }
    }

    /// <summary>
    /// create a deeper object graph, may not make a big difference
    /// </summary>
    class ComplexObject
    {
        public readonly SimpleObject[] _data = new SimpleObject[10];

        public ComplexObject()
        {
            for (int index = 0; index < _data.Length; index++)
            {
                _data[index] = new SimpleObject();
            }
        }
    }
}

Love in an elevator: the right way to do a stand-up meeting

Lunchtime: people in my team got up to go and get a sandwich. Not wanting to be left behind, even people who didn’t need a sandwich went out with the others to get some air. As soon as we got into the lift we automatically formed an inward-facing circle — a huddle — and downloaded what we had been doing that morning. By the time we got to the lobby we had done a mini, half-day scrum!

What an excellent way to enforce standing up, standing close and timeboxing! Sadly, it doesn’t work for distributed scrum teams: you can’t even use your mobiles to conference people in, as the elevator makes such a good Faraday cage.

performance of decimal

Why decimal? If you don’t know when and why you should be using decimal rather than double, you are probably, like me, a person with no formal knowledge of computer science and you shouldn’t be allowed to do anything other than a hobby project. But, just for review, here are some nice articles on the subject:

An overview of the different types of number:
http://www.extremeoptimization.com/resources/Articles/FPDotNetConceptsAndFormats.aspx

A full series from Eric Lippert on numbers and particularly floating point numbers:
http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx

For my purposes the interesting thing is how much faster double is than decimal. The reasons for the speed difference are discussed in those articles, but the essential points are that a decimal (128 bits) is much larger than a double (64 bits), and that there is direct hardware support for double arithmetic; the CPU is simply better at double than at decimal calculations. If you don’t see why the representation of numbers makes a difference to how easy arithmetic is, try doing long division in Roman numerals.
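As a side note, here is a two-line reminder of why you would pay that price at all. This isn’t part of the timing test, just an illustration of the base-10 exactness that decimal buys with its extra bits:

    using System;

    class WhyDecimal
    {
        static void Main()
        {
            // double is binary floating point, so 0.1 and 0.2 have no exact representation
            Console.WriteLine(0.1 + 0.2 == 0.3);     // prints False
            // decimal works in base 10, so the same sum is exact
            Console.WriteLine(0.1m + 0.2m == 0.3m);  // prints True
        }
    }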

Here is my test code, it is a short and complete program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DecimalVersusDouble
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                throw new ApplicationException("should only have one argument");
            }

            int count = Convert.ToInt32(args[0]);
            long start;
            long end;

            start = DateTime.Now.Ticks;
            DoDouble(count);
            end = DateTime.Now.Ticks;
            Console.WriteLine("double took :" + TicksToSecondsToString(end, start));

            start = DateTime.Now.Ticks;
            DoDecimal(count);
            end = DateTime.Now.Ticks;
            Console.WriteLine("decimal took:" + TicksToSecondsToString(end, start));
        }

        private static string TicksToSecondsToString(long end, long start)
        {
            // ticks are 100ns, so 10,000,000 ticks = 1 second
            return (((double)end - (double)start) / 10000000.0).ToString();
        }

        private static decimal[] theDecimals = { 1.2m, 2.3m, 4.5m, 6.7m, 8.9m };
        private static double[] theDoubles = { 1.2, 2.3, 4.5, 6.7, 8.9 };

        private static void DoDecimal(int totalCount)
        {
            decimal result = 1;
            for (int i = 0; i < totalCount; i++)
            {
                foreach (decimal number in theDecimals)
                {
                    result = result + number;
                }
            }
        }

        private static void DoDouble(int totalCount)
        {
            double result = 1;
            for (int i = 0; i < totalCount; i++)
            {
                foreach (double number in theDoubles)
                {
                    result = result + number;
                }
            }
        }
    }
}

I, with my liking of the command line, use it like this:

  1. save the file as decimal.cs
  2. open PowerShell and run the script that makes PowerShell work like the Visual Studio command prompt
  3. run csc decimal.cs
  4. run the program by calling decimal.exe 100000
  5. repeat your measurements with this PowerShell one-liner: for($i=0;$i -lt 10; $i++){ .\decimal.exe 10000000 }

results look like this:

PS C:\dev> for($i=0;$i -lt 10; $i++){ .\decimal.exe 10000000  }
double took :0.09376
decimal took:3.5624704
double took :0.0937472
decimal took:3.5624704
double took :0.0937472
decimal took:3.5624832
double took :0.09376
decimal took:3.5624704
double took :0.0937472
decimal took:3.5780992
double took :0.0937472
decimal took:3.5624832
double took :0.109376
decimal took:3.5624832
double took :0.0937472
decimal took:3.6718464
double took :0.109376
decimal took:3.6874752
double took :0.1093632
decimal took:3.6093568

So at 10,000,000 iterations (50 million decimal additions) you are burning up whole seconds, where the same work with doubles takes around a tenth of a second; decimal is roughly 35 times slower here.

Of course, if you need to use decimal, you probably need it no matter what. Also bear in mind that the size of decimal means that if you are passing these things around a few million times a second, you will be passing by value and doing a lot of copying. If you have a set of numbers that you do a lot of recalculation on – say, a portfolio of financial instruments that needs to be revalued every time prices move – then you can hold them in a collection and pass the collection by reference. Then you won’t be copying them, but you might be exposing your class internals in a bad way (see Eric Lippert again: “Arrays considered somewhat harmful”).

Of course, the converse is true: if you are doing fewer than, say, 100,000 arithmetic operations per second, then decimal is better and there is no measurable performance cost. From the numbers above, this machine manages over ten million decimal additions per second, so 100,000 per second is a tiny fraction of one core.

When is a unit test not a unit test

Working with people who have a good attitude to testing is great. Every new piece of code has some tests associated with it; not a huge number, but a few. And a little test is better than nothing, especially when the tests get run many times a day in CruiseControl.net: it gives you a place to put more tests when you need them. What is bad is that these tests are referred to as unit tests. They aren’t unit tests, and people don’t seem to realise a) why they aren’t genuine unit tests and b) why genuine unit tests are good.
Why aren’t they unit tests?

The biggest problem with people who think that unit tests don’t test anything is making them realise that their unit tests don’t test anything because they aren’t writing OO code. How can you tell that they aren’t doing OO? Easy: they aren’t using interfaces. If you aren’t using interfaces you aren’t doing OO; you are just doing strongly-typed coding based around groups of functions. That has a name in some languages: a module. In C# it is called a static class, but really it is the same thing (don’t start me on the evil overloading of the word “class” here). Of course, you can group a bunch of methods and test that bunch of methods and that will be testing a unit, but it won’t be OO code and it won’t be an OO unit test.

How does the lack of interfaces prevent OO unit testing? Well, nearly every class that you use has to talk to some other class. Simple classes consist of some logic working on primitive types (integer, char, maybe string, etc). Complex classes contain logic that brings together different simple classes (or maybe multiple instances of simple classes, like a collection). So a simple class has a unit test that goes through its public interface and drives its logic, but a complex class may rely on having working instances of the simple classes that it wraps. The first time you see this, you think it is not so bad: you can still use the tests to see the errors, and if the simple types pass and the complex types fail, you can still get value from the tests. True enough.

Of course, as soon as you enter the world of “enterprise” applications the wheels come off this very quickly. Your “simple” classes are often resource wrappers for messaging middleware, a database connection or some other external resource. When it is time to run the “unit” tests, you find that only one developer can run them at a time, that they need to run on a special machine, or that you can’t just check the code out and run: you need to run a script file to register all the components locally, or set up a hardcoded connection to a test database. All these things are warnings that what you have written aren’t unit tests, but integration tests that are trying to run in an inadequate unit/class test environment.

Why do you care?
Well, what is wrong with integration tests? Nothing. In fact, if only one set of tests could be written then you should write the integration tests; only full system integration tests will verify application behaviour. This assumes that you consider unit testing to be a part of testing at all. Most agile development considers unit testing to be part of development work, a way of exploring class and interface design.

So, if integration tests are better tests, why are we spending time working on unit tests? Because unit tests are the key to improving design. If your tests integrate half the application stack (database, messaging, etc.) and are in and out of all kinds of resources, can you really assess the test quality? If you are writing a GUI application that throws some data in and out of a database, how confident are you that when you make a change your “unit” tests will catch it? You are confident? Good. I’m glad that you are confident, but you could probably get that same confidence by starting the app and clicking the buttons. Now, what if your application has dozens of threads running in multiple, dynamically-loaded AppDomains in multiple processes on multiple machines, all caching and updating data and state and communicating with each other? Should I still be confident that my half-stack integration “unit” tests are catching all the bugs?

Somewhere in between these extremes is a point at which you need to make changes very carefully, and the only way to make careful changes is to have tests that operate at all levels.

Composable, flexible, testable
You often hear that you shouldn’t design for testability. There is some truth in that… if you make your systems testable by adding more public members on the classes! However, good design – and by “good” I mean a design that allows the code to be maintained and modified – is often very testable. If the units are decoupled, then the units can be tested.

This bears on the other big criticism of unit tests: “Every time I change my class, I need to change all my unit tests. This is a waste of time”. If you need to do this often, you should think about the public members on your class. A really common kind of change is adding an extra field to a GUI; then that field needs to be added to the business logic, the service layer, the database and – of course – the tests. If you are doing this often it is not an indication that unit tests are bad; it is an indication that your class is probably not well encapsulated and is exposing too much of itself.

This work, breaking an application up into really composable units, is often unnecessary in a small application: an intelligent person can mostly manage the complexity. But as soon as the application becomes large – my rule of thumb is something like 5,000 lines – you will start to see the benefits of this interface-based, testable design.

How to change
So how do you fix this?

Well, if it is a big system – and if it isn’t, then don’t bother – nothing is simple. It isn’t just a case of getting out your refactoring tool and hitting Extract Interface. But how else could you do it? There are no silver bullets and it will be hard work. “More work” might not look like the right thing to do, but you are going to change your application to be more composable; that means hiding implementation behind interfaces, and that means refactoring. For refactoring to succeed you need more testing, careful coding and baby steps towards the goal. And to do that you need to know what the goal is and how you will know when you have got there.

The point is to make the complex classes that integrate simple classes work with interfaces, not instances, so that you can replace external resources with fake resources. So instead of making a call out to the MSMQ messaging API via COM and waiting for a response, you insert a stub class that just returns “true” or whatever; you remove the dependency on the messaging system or database while you are testing. This isn’t just useful for classes that wrap external resources: replacing instances of simple classes with interfaces allows a unit test to concentrate on only one class – the system under test – and not all the classes under it. Reducing the class coupling reduces the complexity of verifying test behaviour and, as a nice side effect, can speed up your test execution by an order of magnitude.
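A minimal sketch of that idea, with made-up names rather than anything from a real system: the complex class depends on an interface, the production implementation wraps the real resource, and the test supplies a stub.

    // the seam: the complex class only knows about this interface
    public interface IMessageSender
    {
        bool Send(string message);
    }

    // the class whose logic we actually want to unit test
    public class OrderProcessor
    {
        private readonly IMessageSender _sender;

        public OrderProcessor(IMessageSender sender)
        {
            _sender = sender;
        }

        public bool Process(string order)
        {
            // real logic would live around this call; in production _sender would wrap MSMQ
            return _sender.Send("processed:" + order);
        }
    }

    // stub used only in the unit test: no MSMQ, no COM, no special machine required
    public class StubMessageSender : IMessageSender
    {
        public bool Send(string message) { return true; }
    }

In the test you construct an OrderProcessor around a StubMessageSender and assert on the result (or on what the stub recorded), without any external resource in sight.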

So…
Adding unit tests is not at all simple for a large system, and it certainly won’t be free. It might be worth doing for a complex system if you can break the system into good parts that can hide behind interfaces. If you can’t hide implementation behind interfaces then you should be afraid, very afraid: even if you are doing some testing, it isn’t unit testing, and if your system can’t be unit tested you should spend a lot of effort on automating the tests you do have; you will certainly need them!

See also:
http://feeds.feedburner.com/~r/UdiDahan-TheSoftwareSimplist/~3/407626945/
http://www.martinfowler.com/bliki/TestDouble.html
http://xunitpatterns.com/Test%20Double%20Patterns.html