
Software Testing Industry Benchmarks

There is a discussion underway about “industry norms” on the Yahoo-based software testing group run by James Bach and Cem Kaner. It’s about the problems with trying to compare your company’s software testing against industry norms.
One of the comments made by James Bach was:

“But, nobody is studying industry norms except grad students in CS who
use badly designed surveys to poll non-randomly selected people who
then either make stuff up or simply don’t know the answers to the
questions on the survey. I’ve been polled that way four or five times.”

One Australian company that is trying to establish some sort of benchmark is K J Ross & Associates. They’re asking for industry participation in a survey on software testing.

My thoughts on this are that, whilst it’s something to be applauded, in the end people tend to use data to justify a decision rather than to drive one. So what exactly is the benefit of such a survey? I’d like to hear K J Ross’s thoughts on this!

Do “industry norms” have a place in business? I’ve worked in enough large and bureaucratic organisations to know that some things just don’t get done without some data to back up a decision. I don’t necessarily agree or disagree with this, I just use it to my benefit.

Perhaps this is just me: I’m not an evangelistic software tester, and I don’t have a new way to test software or a new technique to revolutionise the testing industry.

My goals are more basic, and perhaps I lack the vision and drive to see something like this change. I work with what I have, not with what I want to have.

I think that’s “good enough”.

4 replies on “Software Testing Industry Benchmarks”

Hi Anne-Marie,
Good post to get some debate going.

The goal of the industry benchmark is to collate a profile of how organisations manage software testing and quality. I have found through consulting with many large organisations that they want to know how others are doing it; they want to know where they are different. We are trying to provide some visibility into this without each organisation having to separately visit all the others. It also becomes a basis for recommendations that I may make. As a consultant, I could just say this is my idea, but if I can justify it with some practice-based evidence, then the stakeholders have confidence that they are not going it alone, on the bleeding edge, or following some harebrained idea of a crazy consultant (or new test manager!).

We could argue for decades about which metrics are useful and which are not. I agree metrics and benchmarks will have flaws, but in my view that does not make them irrelevant.

In academic research we learn that when undertaking a study you must consider the “threats to validity” in the data collection and analysis. Such threats, and I think this is what James and Cem are likely referring to, exist in even the best scientific studies, even where double-blind and control-group testing is used. It becomes very expensive to try to minimise the threats, yet even where various levels of threat exist, does this mean that empirical data is irrelevant and has no value? I don’t think so. I think that we can debate the data, and its interpretation, with the threats to validity in mind. Analysis of metrics helps us to gain a better understanding of the processes that we use, and it should lead us as managers to ask more questions regarding the implications.

As a manager I often need evidence to reinforce my proposals or justify changes that I would like to make. Without evidence it comes down to an argument dependent more on strength of personality than on insight.

As a business manager there are lots of things for which I like to obtain industry benchmarks, so that I can start to understand the health of my business. I like to consider these items as part of our balanced scorecard, things such as:
– what level of profit is considered healthy for a consulting business?
– what salary levels should we pay for key roles within the organisation to stay competitive?
– what proportion of time should I invest in R&D and training?
– how much should I spend on marketing and sales?
– how long does it take to recover payment on invoices?
– and so on.

It’s not that I want to do exactly what my peers or competitors are doing, but I want to reflect on where I am different and whether there is a strategic reason for this. It also helps me reflect on the efficiency of different parts of my business, and I can start to see relationships between metrics. It does not mean that I run my business by the numbers (or the ratios), but I use those numbers to gain insight and to make healthier management decisions.

Translating this into how I would run a testing project or department, I would want to understand similar things, but specific to testing outcomes:
– how much should I pay testing staff?
– what proportion of project budget is considered average?
– where am I finding defects in my development processes?
– how should I resource my team: from permanents, contractors or outsourcers?

Having an industry norm enables us to see what most other organisations do. It is not that we have to be average, but it means we have a way of seeing where we stand relative to the norm, and then we can consider or explain why we have chosen a particular strategy. It helps us understand where we may be different.

So that is the goal, but let’s get back to threats. I think a lot of issues will come out of the study. How can we segment the data, so that I might refer to organisations that are similar to mine in terms of industry, size, etc.? How do we ensure that the data collected is accurate and that we use similar measurement units? Also, the contributors to the survey are perhaps likely to be more sophisticated about quality and testing; those less aware are less likely to participate. So does this mean that results are further skewed? Perhaps it does.

Some of the high-level benchmarks are relatively easy to survey, such as perceptions and salaries. However, some of the lower-level benchmarks, such as process and defect statistics, are harder to collect accurately. I think we need to try though, don’t you?

At the end of the day I think starting to collect empirical data is a step towards the maturity of our industry. Our industry currently lacks empirical metrics; I can rarely find any published metrics from software testing. I am sure we will get it wrong, and studies will be (rightly) criticised, as well as perhaps incorrectly cited. However, I think debate over such data will lead to better understanding among practitioners.

If you’re going to study a social system (software projects are social systems), then you are engaging in social science. The problem Cem and I have is that the people who push these studies in the computer field so seldom appear to know anything about it.
Kelvin speaks of “evidence.” He says he wants evidence. No, I think what he wants is an excuse. That’s what bad science does: it gives reckless, willful people an excuse to believe whatever they like, while asserting with a clear conscience that they were duly diligent.

It seems to me there is plenty of evidence of a certain sort: direct experience. Unfortunately, it’s difficult to preserve and recall our direct experiences. When we are asked about them, we remember some of it, misremember other things, misinterpret it, edit it for consumption (so that we don’t look bad, or so that other people do), and redact the bits we aren’t permitted to share. And even if we were perfectly honest, our listeners wouldn’t be able to understand or accept all of what we say.

Along comes the survey. What happens next? You might as well answer randomly. That leaves us with fairy tales; mythology. It leaves us with silly ideas like the CMM and TQM and TMAP that hurt us and hinder innovation.

I speak against these because I’m dedicated to real progress. Those of us who care to have made a lot of progress in recent years, not by studying mythical norms, but by learning how to see and think; conferring with each other; and applying the lessons of qualitative research.

I’ve coached lots and lots of testers. I’ve visited a couple hundred companies. The only thing I can say about norms in testing is that it is normal for testers to be confused, untrained, and unable to explain what they do, and that they will try to APPEAR unconfused and competent, and will offer explanations that sound superficially okay, unless you probe a little. That’s normal.

For those interested in helpful science, rather than the pseudo kind, Google the following: ethnomethodology (Garfinkel), symbolic interactionism (Blumer), naturalistic inquiry (Guba), situated action (Suchman), grounded theory (Glaser and Strauss), bounded rationality and heuristics (Simon), the “mangle” of practice (Pickering), sensemaking (Weick), research programs (Lakatos), conjectures and refutations (Popper), paradigms (Kuhn), general systems thinking (Weinberg), radical constructivism (von Glasersfeld), cognitive biases (Kahneman), social life of information (Duguid and Seely Brown).

My reading in these areas, and others, combined with years of seeing people looking for cheap, easy answers and utterly failing, cured me of any urge to look for wisdom in the form of “industry norms.” Instead, let us create a norm called: vigorous self-education.
