Bug estimations and velocity

Imagine a team that normally expects to do 25 points in a two-week Sprint. They normally produce high-quality, well-tested, well-reviewed work, and as a result very few immediately apparent defects make it into a release. Their releasable work is normally deployed to production a few days after the end of each Sprint.

Now put them under pressure to deliver a project with a hard deadline in one sprint. They work late, they put in a huge effort, but they also cut corners. Yes, this will happen. Worse, the project's on that horrible system that no one likes to touch. That one with very little (or no) automated test coverage. The one that has a billion user-facing features, many of them long forgotten by your team. Yes, that one.

On paper they get through 50 points of work in the Sprint. No one is entirely comfortable with this, but it's going out on time. It is, of course, a bit of a mess. Bugtastic perhaps. Some of the bugs are "fix this right now" levels of fail. Many are "fix this next sprint or the world will end."

These bugs are clearly going to seriously interfere with the next Sprint. It's also pretty inevitable that bug fixes rushed through initially will result in more production bugs too, though hopefully less urgent ones. These may well impact a second sprint as well.

You start work on the next sprint and, sure enough, reports of problems in production are pouring in. You decide to apply estimates to them, and assign developers to fixing them as they appear.

So, do you count these bugs towards velocity or not?

Let's start with no. You get through a big pile of bugs (0 points) plus a few planned stories (let's say 10 points in each of the next two sprints). Your velocity looks like 50+10+10. That's not actually too awful. Over the last 3 sprints you got through 70 points of the product backlog, averaging 23 points a sprint. You're actually still being quite consistent with the product owner's roadmap.

Excellent. Now, what if we did count those bugs towards velocity? Let's say you get 20 points' worth of them to fix in each of those two following sprints. Your velocity is now 50+30+30. This looks pretty damn awesome! Over the last 3 sprints you've just averaged almost 37 points. Pats on backs all round, we should do this more often! Next sprint we'll go for another 30, yeah?

This is clearly wrong. You should not be able to boost your velocity by doing work badly. It screws with roadmap planning because it artificially inflates your velocity, making it look like you're capable of working faster than you are. It also screws with measuring how much work has been done, since that wasn't 110 points of planned work from your product owner's backlog, it was only 70!
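
To make the arithmetic concrete, here is a minimal sketch of the two ways of counting. The point values are the ones from the example above; the data layout and the function name `velocity_per_sprint` are only illustrative assumptions, not anything a real tracking tool provides.

```python
from statistics import mean

# Hypothetical sprint records: (planned backlog points, bug-fix points).
# The numbers come from the example in the text; the structure is assumed.
sprints = [
    (50, 0),   # the rushed deadline sprint
    (10, 20),  # first clean-up sprint
    (10, 20),  # second clean-up sprint
]


def velocity_per_sprint(sprints, count_bugs):
    """Return the velocity of each sprint, optionally including bug-fix points."""
    return [planned + (bugs if count_bugs else 0) for planned, bugs in sprints]


without_bugs = velocity_per_sprint(sprints, count_bugs=False)
with_bugs = velocity_per_sprint(sprints, count_bugs=True)

# Total points and average velocity under each way of counting.
print(sum(without_bugs), round(mean(without_bugs), 1))  # 70 23.3
print(sum(with_bugs), round(mean(with_bugs), 1))        # 110 36.7
```

The 40-point gap between the two totals is exactly the bug-fix work, none of which moved the product backlog forward.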

So, what happened? You adjusted your definition of done. You cut the level of quality and attention to detail put into your work in order to meet a deadline. This may be fine, it might have been life or death for the company. You might have saved the business, or made it millions, despite the bugs that got into production!

In an ideal world you should have re-estimated when it was apparent that you were going to cut corners. A pile of "then fix it" work (preferably thought through in advance, rather than reacting to a stream of bugs) would account for the difference (plus a bit) between the new and original estimates. You'd have been able to count all of that towards your velocity. That pretty much never happens though. If you're dropping testing and reviews in the desperate rush to production then there's no hope you're going to pause for long enough to do that! You don't sit down and carefully plan through what corners you're going to cut, and what the implications are. You just get the work out the door.

This all still holds true for the general case of release bugs. It's just more obviously wrong when you consider it in the context of this kind of project, as the outcomes are so extreme. You're supposed to be aiming to produce defect-free code, and a failure to do that should not make it look like you got more work done.

So, am I saying you shouldn't count bugs towards velocity? Well, it's not that simple.

Sometimes you've inherited a huge pile of bugs from the past, with no immediate urgency for a fix. Those should be estimated and the product owner will schedule them as and when they feel they're important. Since they're now on the product backlog they should be counted towards velocity when they're completed.

As and when new bugs come up, it's up to the product owner. If the product owner wants to defer fixing the bug (and this shouldn't be the norm), then yes, it goes into the product backlog and will be counted towards velocity when it eventually gets done in a sprint. If they want it fixed in the current sprint (which should be the normal case for newly introduced defects) then it must not be counted, since it isn't progress towards the product backlog and is a result of failing to fully deliver a previous sprint.
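
The rule of thumb from the last two paragraphs can be sketched as a small function. This is only an illustration under stated assumptions: the `Bug` class, its `on_product_backlog` flag, and the function names are hypothetical, and in practice the decision belongs to the product owner rather than to code.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    points: int
    # True for inherited bugs and for bugs the product owner chose to defer;
    # False for bugs fixed straight away in the current sprint.
    on_product_backlog: bool


def counts_towards_velocity(bug: Bug) -> bool:
    """A bug fix counts only if it was scheduled through the product backlog."""
    return bug.on_product_backlog


def sprint_velocity(story_points: int, bugs: list[Bug]) -> int:
    """Planned story points plus the points of any bug fixes that count."""
    return story_points + sum(
        bug.points for bug in bugs if counts_towards_velocity(bug)
    )


# Hypothetical sprint: 10 points of stories, one deferred bug (5 points)
# and one bug fixed immediately (8 points, not counted).
bugs = [Bug(points=5, on_product_backlog=True), Bug(points=8, on_product_backlog=False)]
print(sprint_velocity(10, bugs))  # 15
```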