Healthcare.gov is not the first large software project to fail, and it won’t be the last. It does, however, hold the distinction of being higher profile and better publicized than most software rollouts. The site will eventually function more or less as intended, and future systems integration classes will devote sections to Healthcare.gov and how to avoid making the same mistakes.
A couple of weeks ago, I went through some of the early reports on the problems at Healthcare.gov, and, while it’s still too early to know all the details, the consensus forming is that a lack of oversight and poor systems integration played a large part in the site’s problems. In a preview of FCW’s new CIO Perspectives feature, former DHS CIO Richard Spires sums up the best practices plan that should have been in place before the site went live:
1) Completion and testing of all subsystems six months prior to public launch.
2) Three months of end-to-end functional integration testing.
3) Concurrent performance testing that would have simulated loads up to 10 times greater than expected (especially since it was difficult to model expected peak loads; a sketch of this kind of test follows this list).
4) A subsequent three-month pilot phase in which selected groups of users would use the system to work off problems not caught in testing.
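To make point 3 concrete: a load test of the kind Spires describes can be as simple as firing concurrent requests at a staging endpoint and watching latency and error rates as the rate climbs past the expected peak. The sketch below uses only Python’s standard library; the target URL, the assumed expected peak rate, and the 10x multiplier are illustrative placeholders, not details from the actual Healthcare.gov test plan.

```python
# Minimal concurrent load-test sketch (illustrative only).
# Assumptions: TARGET_URL, EXPECTED_PEAK_RPS, and the 10x multiplier are
# hypothetical placeholders, not figures from the real Healthcare.gov plan.
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

TARGET_URL = "https://example.com/"   # placeholder staging endpoint
EXPECTED_PEAK_RPS = 50                # assumed expected peak load
MULTIPLIER = 10                       # "up to 10 times greater than expected"
TEST_SECONDS = 5

def one_request(url):
    """Issue a single GET and return (latency_seconds, ok_flag)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            ok = 200 <= resp.getcode() < 400
    except OSError:                   # URLError, HTTPError, timeouts
        ok = False
    return time.monotonic() - start, ok

def run_load_test(url, rps, seconds):
    """Fire roughly rps * seconds requests concurrently and summarize results."""
    total = rps * seconds
    latencies, failures = [], 0
    with ThreadPoolExecutor(max_workers=min(rps, 100)) as pool:
        futures = [pool.submit(one_request, url) for _ in range(total)]
        for fut in as_completed(futures):
            latency, ok = fut.result()
            latencies.append(latency)
            failures += 0 if ok else 1
    latencies.sort()
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print(f"{total} requests at ~{rps} req/s: "
          f"p95 latency {p95:.2f}s, {failures} failures")

if __name__ == "__main__":
    run_load_test(TARGET_URL, EXPECTED_PEAK_RPS * MULTIPLIER, TEST_SECONDS)
```

A real test plan would also ramp the rate gradually, exercise the full end-to-end workflow rather than a single URL, and run long enough to expose resource leaks, but even a crude version of this would have flagged problems before launch day.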
He then analyzes why he thinks best practices were not followed:
1) Not enough time scheduled to develop components in the system.
2) Lack of a strong and competent Project Management Office (PMO) to oversee the contractors and make sure the components integrated correctly.
3) An immovable launch date caused corners to be cut in testing, and possibly functionality.
4) There was no authority to enforce IT requirements, allowing a major change in functionality to be added at essentially the last minute, and not addressing warnings that there were serious problems.
The author’s analysis of the situation is persuasive and authoritative, but the problem is that the analysis, in whole or in part, applies to most failed software projects. There are good, solid lessons to be learned from the failures at Healthcare.gov, but they are the same lessons that should have been learned from other failed software projects, and if they haven’t been learned by now, it’s doubtful they will be learned in the future. The amount of time a project needs is routinely underestimated, the lost time is made up by rushing through testing, and very often no one has enough oversight and control to put their foot down and say: no, you can’t have user-selected, color-coded menus in this version, but we’ll add it to the list of things to consider for the next release, and by the way, we need at least another month of testing before the software can go live.
For my part, I’m working in an environment with pretty good IT project oversight, so the dissection of the Healthcare.gov issues serves as a reminder of just how bad things can get when software projects are not well managed. I do, however, empathize with the system admins trying to keep Healthcare.gov up and running while the integration issues are addressed, tracking down bottlenecks and trying to tell when more resources would make a difference and when they wouldn’t. The first reaction to a site slowdown is usually that there aren’t enough resources and that adding more will alleviate the problem. But given the convoluted maze of back-end connections behind the site, increasing resources at any one point increases the traffic to the next connection in line, either creating another bottleneck or, in the worst case, crashing it.
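To illustrate that last point, here is a toy model of a chain of back-end tiers, each with a fixed capacity in requests per second. The tier names and numbers are invented for illustration, not measurements of the real system; the point is only that upgrading the front of the chain hands more traffic to the next tier in line.

```python
# Toy model of a chain of back-end tiers (all names and numbers are invented).
# Each tier can handle a fixed number of requests per second; traffic a tier
# cannot absorb backs up, and traffic it does serve becomes the load on the
# next tier in the chain.

def flow_through(tiers, offered_load):
    """Walk the chain, reporting what each tier receives vs. what it can handle."""
    load = offered_load
    for name, capacity in tiers:
        status = "OK" if load <= capacity else "OVERLOADED"
        print(f"{name:<18} receives {load:>4} req/s, capacity {capacity:>4} -> {status}")
        load = min(load, capacity)   # only what the tier can serve moves on
    print(f"End-to-end throughput: {load} req/s\n")

chain = [("web front end", 200), ("identity service", 250), ("eligibility check", 220)]

# Initial state: the web front end is the visible bottleneck.
flow_through(chain, offered_load=300)

# "Fix" the front end by tripling its capacity: the extra traffic is simply
# handed to the identity service, which becomes the new overloaded tier.
chain[0] = ("web front end", 600)
flow_through(chain, offered_load=300)
```

In this made-up example, tripling the front-end capacity raises end-to-end throughput only from 200 to 220 requests per second, because the overload has simply moved to the tiers behind it, and in a real system those downstream tiers may degrade or crash rather than shed load gracefully.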