So, today was a good day.
I’ve had an application in production for a couple of years. It’s a server-side framework providing a rich context for server-side work “atoms” of a type that has been written, no doubt, hundreds of times (and made obsolete by things like IIS 7 and WAS). Of course, it is heavily multithreaded and has all kinds of stuff going on like AppDomains, MSMQ, external libraries and all kinds of contributions from all kinds of developers in the company, some of whom understand unmanaged resources better than others. So you can see where this is going: over the years more and more functionality has accumulated in this beast and now the thing is running 24×6, responding to requests every minute of every day and now the bugs are coming to the surface. The worse of these is under very heavy load, the occasional request just goes missing.
Well, why didn’t you test it? I hear you cry. Well, I did. 2 years ago I had the unit tests, all the automated builds and automated functional command line test clients and so on. Of course, once it was working I stopped testing. And tweak after tweak has accumulated and when I went into the code to investigate the problem I found not only could I reproduce the bug with an embarassingly small number of concurrent requests but, even worse, over time I’d commented out all the unit tests on the critical section where all the threading issues were centred! And when I uncommented them, none of them worked.
So, I tried a little free solo refactoring, making some changes in the code to try and clean it up just using my command line client to load test the server. Obviously, that didn’t work so I bit the bullet and started on a new set of unit tests.
After I started, I realised what a lot I’ve learned in the last 2 years. How much more I understand testing and how much better my code is. I saw so many things that now made me cringe. But I’ve stayed true and written my tests first, using plain old stubs that removed all the connections to MSMQ and so on. And today, with the help of NUnit, TestDriven.Net and some old-school logging I’ve found the damn thing and bingo! concurrency up in the hundreds.
It feels even better as I was getting an ear-bashing all day about how writing unit tests isn’t very useful if you can’t tell whether the tests are failing because there is a problem with the multi-threaded test context. Actually, chaps, there is nothing wrong with the test context. It is the SUT code that is broken, as always.
So, bug fixed. Ready to refactor on Monday!