"All programs have bugs" makes programmers complacent, says David Norfolk. He takes the heretical view that bugs are there because, subconsciously, we want them there.
Heresy
There's an apocryphal story I was told back when I was a young programmer about a system, built in assembler, for a large West Coast (USA) utility company, which was essentially defect free over a long lifetime. They held a party to celebrate a significant anniversary for this system and someone asked one of the original system builders about the methodology they used to achieve this feat. No methodology, he said; you forget that this was all built years ago before programming was a career choice. We were just engineers and no-one told us we were allowed to build it wrong…
Apocryphal indeed. Engineers build things wrong - the Tacoma Narrows Bridge collapsed because wind blowing over it excited its resonant frequency. Do something new, and unexpected problems appear; there's nothing you can do about that. Who'd ever expect that stuffing more bytes into a buffer than it was designed to hold would spill the surplus bytes into parts of the computer where they can execute as valid but malicious code? Certainly no-one who was used to working on the sort of "legacy" operating system that doesn't allow that sort of thing.
However, there is a difference between an engineer's mistakes and a programmer's mistakes - no bridge ever fell down again because of the Tacoma Narrows mistake - existing bridges round the world were checked and aerofoil section bridge decks fixed the problem in new bridges. On the other hand, buffer overflow problems are still happening years after the first example was found, despite the availability of tools to find such things and even to prevent them happening - and despite the well-documented and very expensive virus and "Trojan horse" exploits which depend on buffer overflows, a simple programming mistake.
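To make the mistake concrete, here is a toy model of it in Python. This is only a sketch under stated assumptions: Python itself checks bounds, so the "memory" below is a single bytearray in which an 8-byte buffer sits directly in front of a 4-byte field the program trusts. The names `new_memory`, `unchecked_copy` and `checked_copy` are invented for the illustration and do not come from any real system.

```python
# Toy model only: one flat "memory" region holding an 8-byte buffer,
# followed immediately by a 4-byte field the program trusts.
BUF_SIZE = 8


def new_memory() -> bytearray:
    """Return the buffer (zeroed) plus the trusted field b'SAFE'."""
    return bytearray(BUF_SIZE) + bytearray(b"SAFE")


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy every input byte, never comparing its length with BUF_SIZE."""
    for i, byte in enumerate(data):
        memory[i] = byte  # surplus bytes land in the trusted field


def checked_copy(memory: bytearray, data: bytes) -> None:
    """Refuse any input that does not fit inside the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input longer than buffer")
    memory[:len(data)] = data


if __name__ == "__main__":
    attack = b"AAAAAAAAEVIL"  # 8 filler bytes plus 4 surplus bytes

    mem = new_memory()
    unchecked_copy(mem, attack)
    print(bytes(mem[BUF_SIZE:]))  # the trusted field has been overwritten

    mem = new_memory()
    try:
        checked_copy(mem, attack)
    except ValueError as err:
        print("rejected:", err)
    print(bytes(mem[BUF_SIZE:]))  # the trusted field is untouched
```

The whole fix is the length comparison in `checked_copy`, which is why the persistence of this class of mistake is so hard to excuse.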
Why do programmers tolerate such mistakes? The glib answer is that they have to because "all programs have bugs" (presumably it's something Bill Gates did when he invented the first computer <irony alert>). In support of this you can quote Fred Brooks in The Mythical Man-Month, where he explains that programs are just about the most complex artefact produced by men or women (a bridge repeats identical assemblies; in a structured program every component is different). This doesn't justify mistakes, however: when we need to make defect-free safety-critical systems we mostly can (and just as well, since the world mostly runs on computers these days).
No, we have defects, mistakes ("bugs" is such a weasel word) in our programs because, unconsciously or not, we want them there - or, at least, their existence doesn't make us embarrassed or affect our career. It only remains to ask "why?"
Well, often we are rewarded for our mistakes. I once knew someone who was almost sacked from a major bank for disloyalty - because he only worked 9 to 5. He was pretty low profile, everything he built just worked, and he never had to come in after hours to fix anything, because his systems didn't go wrong. In contrast, other teams got brownie points for delivering systems more quickly - and more brownie points for turning up at 3 am to fix the inevitable bugs. Oh, by the way, my friend wasn't sacked - possibly because his bank sold his system to another bank and the transfer was totally hassle free, which impressed someone important. However, the people whose reputation came from conspicuous delivery - and conspicuous firefighting - weren't sacked either.
Margaret Edney, a member of a worldwide network of experienced software testers, programme managers and consultants that helps organisations address IT problems, told me of a similar situation, where the "I was in working on this program until 4 o'clock" syndrome is encouraged and the developers who exploit this expect to get the largest bonus when the project is a success - and usually do. The problem comes when the tester, or customer, tries to run the code. "I've worked with a senior developer who would work until late at night, making changes to code which he would release onto the live system. He might test but was usually too tired to notice anything except major failures. He would then go home, leaving a note that he would be in late, because he'd worked so hard all night," she explains. "In the morning, we would find a part of the live system was unusable and the changes would have to be backed out and re-done, by another developer," she continues. "However, the senior developer got the kudos for dedication to the company and the moral of the story for the other developers seemed to be - work long hours and you will be rewarded, whatever the results".
A second reason for putting bugs into code is that, for some people, finding them is fun. I'm not suggesting that anybody deliberately puts bugs into a program because they enjoy testing but it's much easier to avoid breaking the flow by putting off looking up some detail of an interface or record format, isn't it? Take a guess, get the code finished and then go through fixing the details later. The trouble is, if testing only finds, say, 90% of the mistakes in your code, the more errors you have when you start testing, the more errors you'll finish up with - but the process of winkling out the errors you do find and correcting them is rather satisfying. In other words, finding your own errors makes you feel good.
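The arithmetic behind that 90% figure is worth a quick sketch. The function below assumes, purely for illustration, that testing finds a fixed fraction of whatever mistakes are present; the name `residual_defects` and the sample defect counts are hypothetical.

```python
def residual_defects(initial_defects: int, detection_rate: float) -> float:
    """Mistakes left after testing, if testing finds a fixed fraction."""
    return initial_defects * (1.0 - detection_rate)


if __name__ == "__main__":
    # Same test effectiveness (90%), different starting quality.
    for initial in (10, 100):
        remaining = residual_defects(initial, 0.9)
        print(f"{initial} mistakes going in, about {remaining:.0f} left")
```

Under that assumption, ten times as many guesses going into testing means ten times as many mistakes coming out, however satisfying the ones you did find were to fix.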
And then, there are reports of actual sabotage, of people who depend on their overtime payments to support their lifestyle or who plant "time bombs" they can activate if they ever get the sack, but let's not talk about this, it's too frightening - and people who subconsciously encourage defects are quite interesting enough.
Perhaps the most usual reason for producing a defect-prone environment is a misguided pursuit of productivity above all else - when what matters is your production of usable automated business functionality, not raw code. Another member of the same network as Margaret Edney told us about a time when he was working at a telecommunications company where management assessed programmers on their KLOC (thousands, or "kilo", of lines of code) productivity - which led first to lots of commented lines and then (when the metric algorithm got cleverer) to lots of "no operation" lines, which - people hoped - did nothing but still counted as "productivity". "The whole productivity thing finally lost all credibility," I was told, "when a bug which had been puzzling everyone for weeks was solved by REMOVING a faulty line of code. The productivity of our most senior programmer was apparently negative…."
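A rough sketch of how easily such a metric is gamed follows. It is not the telecommunications company's actual algorithm, which I never saw; the two counters and the padded sample are hypothetical, and Python's `pass` stands in for the "no operation" lines.

```python
# Hypothetical sample: one line of real work, padded two different ways.
PADDED_SOURCE = [
    "# pad the count",
    "# pad it some more",
    "total = price * quantity",
    "pass",
    "pass",
]


def raw_loc(lines: list[str]) -> int:
    """First version of the metric: count every non-blank line."""
    return sum(1 for line in lines if line.strip())


def loc_without_comments(lines: list[str]) -> int:
    """'Cleverer' version: ignore comment lines, but still count no-ops."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    )


if __name__ == "__main__":
    print("raw count:", raw_loc(PADDED_SOURCE))
    print("without comments:", loc_without_comments(PADDED_SOURCE))
```

Neither counter can tell that only one of those lines does anything, and a counter that rewards added lines will always score a one-line deletion that fixes a bug as negative output.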
Most of the examples I'm citing go back some years. Does that mean that such practices are dying out, or just that people don't like to talk about them as they're going on? Since the Standish Chaos Reports, and newspaper reports of failed projects, suggest that project failure is a continuing problem, I fear it's the latter.
Anti-heresy
There is no question in my mind that the cheapest, most cost-effective way to produce error free programs is to aim to write them without mistakes in the first place. Then, the cheapest way to remove any mistakes that do creep in is to eliminate them as early as possible. Correcting mistakes takes time, risks introducing new mistakes and, if the mistake has been in place for any time, may result in you abandoning or redoing code built on a faulty foundation.
So, how do you discourage people from putting mistakes into code without making their lives dreary and uninteresting? Well, you start with culture. If management philosophy is "don't get it right, get it writ" and talk is always of code productivity and the dangers of gold-plating solutions, then keeping mistakes out of code will be low priority. However, if you measure defect rates in aggregate (don't make them attributable to individuals) and reward/promote the people in teams associated with low defect rates, your staff will soon get the message - and peer pressure will look after individual performance. Tom DeMarco and Timothy Lister in their classic "Peopleware" include a whole chapter on the use of a theatrical Black Team of testers to make it absolutely clear, in a fun (more or less) and socially acceptable way, that developers would be much happier if they didn't make mistakes.
Next, give people good tools, so that it is easier to copy in the actual interface than take a guess and look it up later. And, consider "test driven development" methods such as XP - although allowing the architecture to just evolve, as many XP proponents recommend, seems awfully risky to me. An early mistake in the architecture of a system can blight the whole of development.
But what about those people who actually enjoy "debugging programs"? Well, give them the job of being testers - of other people's programs. Except in particular cases, don't make them into full-time testers - they probably enjoy programming too - but arrange it that each team spends some time testing some other team's programs. By all means, let people test their own programs but don't rely on this (it is likely that the test cases embody the same misconceptions, if there are any, as the code).
However, there is no silver bullet, not even Extreme Programming, with its emphasis on "test driven development". Management must put time, effort and thought into producing a culture where mistakes are seen as unacceptable, not as some sort of badge of programming honour - and calling "bugs" "mistakes" (or worse) is a good start.
-------------------------------------------------------------------------------
This article was originally published in Application Development Advisor as "Dirty Rotten Cock-Ups".
References:
CHAOS Chronicles v3.0, from the Standish Group: www.standishgroup.com/chaos/toc.php
Program Complexity: Fred Brooks, "The Mythical Man-Month", Anniversary Edition, Addison-Wesley, reprinted 1995, ISBN 0-201-83595-9, p 182.
History of the Tacoma Narrows Bridge: http://www.lib.washington.edu/specialcoll/tnb/
The sociology of making defects unacceptable (The Black Team): Tom DeMarco and Timothy Lister, "Peopleware", Dorset House, 1987, ISBN 0-932633-05-6, chapter 19.