Calling system failures "glitches" only masks the problem.
Visa prepaid debit cards recently suffered a rather public and embarrassing problem. It seems that a "small" number of users made normal purchases only to be charged $23,148,855,308,184,500: basically, $23 quadrillion and change. Visa later indicated that fewer than 13,000 transactions were affected. The company subsequently removed the charges and magnanimously waived the overdraft fee.
This was, as is quite common in these situations, described as a computer glitch. But the term "computer glitch" really frustrates me. People have been conditioned to think of a computer glitch as something that just happens when the computer goes haywire and that there is nothing that can be done to prevent it.
The truth is, computers don't make mistakes; people do. There are no late-night TV infomercials selling videos of "Computers Gone Wild" for only $19.95. Computers just don't go wild, and there is something we can do to prevent mistakes.
Everyone's conditioned acceptance of these failures may seem like a great deal for IT because it lets us off the hook, but it is really a dangerous trap. Saying something was just a glitch lulls us into complacency.
It's easy to buy into the concept of a glitch--we've all done it. However, rather than downplay or trivialize the failure by calling it a glitch, let's proclaim it to be what it really is: a human failure that must be addressed with appropriate resolve. That way we really can improve the systems we provide.
So-called glitches can arise in a number of ways:
- A simple coding error.
- Inadequate user testing.
- Reducing testing to save time and money.
- Poorly defined requirements.
- Incorrect data entered.
- Failure to apply Six Sigma concepts (error-proofing or poka-yoke) to application development.
If you've ever watched a TV show that goes into detail explaining an airplane crash, you will notice that rarely is there a single cause for the crash. Rather, it's an unusual combination of safeguard failures. If any single safeguard had worked properly, the crash would have never happened. The same is true of the Visa case and of most other so-called glitches.
As some have suggested, it appears a coding error at Visa was the original point of failure. Second, inadequate or nonexistent testing failed to identify the error before it was put into production. Finally, a lack of error-proofing allowed an obvious mistake to be posted to customer accounts.
This last safeguard, error-proofing, is one in which we are particularly weak. Simply adding a provision to kick out for review any transaction above a likely maximum amount would have prevented the error. For example, if Visa's system had been programmed to flag any transaction above, say, $1 million or some other appropriate number, the problem could have been caught before it ever reached customers.
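To make the idea concrete, here is a minimal sketch of that kind of check in Python. It is only an illustration and not a description of how Visa's systems work: the function name, the "post" and "review" labels, and the $1 million ceiling are all assumptions made for the example.

```python
from decimal import Decimal

# Hypothetical ceiling, taken from the "$1 million or some other
# appropriate number" suggestion above. A real system would tune it.
REVIEW_THRESHOLD = Decimal("1000000.00")


def route_transaction(amount: Decimal, threshold: Decimal = REVIEW_THRESHOLD) -> str:
    """Return "post" for a plausible amount, "review" for an implausible one.

    Both labels are illustrative; "review" stands for holding the
    transaction for a human to look at instead of posting it.
    """
    # Non-finite values (NaN, Infinity) are never plausible amounts.
    if not amount.is_finite():
        return "review"
    # Anything above the ceiling is held instead of reaching the account.
    if amount > threshold:
        return "review"
    return "post"


if __name__ == "__main__":
    print(route_transaction(Decimal("42.17")))
    print(route_transaction(Decimal("23148855308184500.00")))
```

The point of the sketch is that the check is independent of whatever produced the bad number. It does not need to know why the amount is wrong, only that it is implausible, which is what lets it catch errors that design and testing missed.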
When we see problems as glitches, we tend to focus on the immediate failure--the coding error or the data entry error. We fix these and declare victory, ignoring the rest of the failures of testing and error-proofing. If we see them as serious failures, we are obliged to look for all the reasons.
We in IT tend to focus mostly on design, less on testing and very little (at least in my opinion) on error-proofing. There is a strong case for changing that balance. After all, as any good engineer will tell you, it is essential to build in quality.
I'm not suggesting we reduce our focus on design, but I do feel that we cannot keep neglecting testing and error-proofing; we need to work on those processes until we have truly perfected them. This is true whether we are in the development phase or in the midst of fixing one of those nasty little glitches. Referring to them as failures, rather than trivializing them as glitches, can be the first step in that process.
"Steve looks shocked as he receives his bill" photo by Neil Crosby
This article is also posted on Forbes.com. Feel free to join in the discussion either on this site or at Forbes.com