Getting
measurements right can be devilishly difficult, but getting them wrong
can be downright dangerous. If you look underneath most self-defeating
behavior in organizations, you will often find a well-intentioned
measurement which has gone wrong. Consider the rather innocent-sounding
measurement of productivity, and its close cousin, utilization. One of
the biggest impediments to adopting Just-in-Time manufacturing was the
time-honored practice of trying to extract maximum productivity out of
every machine. The inevitable result was that mounds of inventory
collected to feed machines and additional piles of inventory stacked up
at the output side of the machines. The long queues of material slowed
everything down, as queues always do. Quality problems often took days
to surface, and customer orders often took weeks to fill. Eventually
manufacturing people learned that running machines for maximum
productivity was a sub-optimizing practice, but it was a difficult
lesson.
As software development organizations
search for productivity in today’s tight economy, we see the same lesson
being learned again. Consider a testing department that is expected
to run at 100% utilization. Mounds of code tend to accumulate at the
input side of the testing department, and piles of completed tests stack up
at the output side of the testing department. Many defects lurk in the
mountain of code, and more are being created by developers who do not
have immediate feedback on their work. When a testing department is
expected to run at full utilization, the likely result is a higher
defect level, which in turn creates more work for the testing
department.
Nucor Steel grew from a startup in 1968
into a $4 billion giant, attributing much of its success to an incentive
pay system based on productivity. Productivity? How did Nucor keep
its productivity measurement robust and honest throughout all of that
growth? How did it avoid the sub-optimization so common in most
productivity measurements?
The secret is that Nucor measures
productivity at a team level, not at an individual level. For example,
a plant manager is not rewarded on the productivity of his or her plant,
but on the productivity of all plants. The genius of Nucor’s
productivity measurement is that it avoids sub-optimization by measuring
results at one level higher than one would expect, thus encouraging
knowledge sharing and system-wide optimization.
How can this be fair? How can plant
managers be rewarded based on the productivity of plants over which they
have no control? The problem is that if we measure people solely on results
over which they have full control, they have little incentive to
collaborate beyond their own sphere of influence to optimize the overall
business. While local measurements may seem fair to individuals, they
are hardly fair to the organization as a whole.
Measure-UP, the practice of measuring
results at the team rather than the individual level, keeps measurements
honest and robust. The simple act of raising a measurement one level up
from the level over which an individual has control changes its dynamic
from a personal performance measurement to a system effectiveness
indicator.
In the book “Measuring and Managing
Performance in Organizations” (Dorset House, 1996), Cutter Consortium
consultant Robert Austin discusses the dangers of performance
measurements. The beauty of performance measurements is that “You get
what you measure.” The problem with performance measurements is that
“You get only what you measure, nothing else.” You tend to lose the
things that you can’t measure: insight, collaboration, creativity,
dedication to customer satisfaction.
Austin recommends aggregating individual
performance measurements into higher level informational measures that
hide individual results in favor of group results. As radical as this
may sound, it is not unfamiliar. W. Edwards Deming, the noted quality
expert, insisted that most quality defects are not caused by
individuals, but by management systems that make error-free performance
all but impossible. Attributing defects to individuals does little to
address the systemic causes of defects, and placing blame on individuals
when the problem is systemic perpetuates the problem.
Software defects are frequently
attributed to individual developers, but the development environment
often conspires against individual developers and makes it impossible to
write defect-free code. Instead of charting errors by developer, a
systematic effort to provide developers with immediate testing feedback,
along with a root cause analysis of remaining defects, is much more
effective at reducing the overall software defect rate.
By aggregating defect counts into an
informational measurement, and hiding individual performance
measurements, it becomes easier to address the root causes of defects.
If an entire development team, testers and developers alike, feels
responsible for the defect count, then testers will tend to become
involved earlier and provide more timely and useful feedback to
developers. Defects caused by code integration will become everyone’s
problem, not just the problem of the unlucky person who wrote the last bit of code.
It flies in the face of conventional wisdom
to suggest that the most effective way to avoid the pitfalls of
measurements is to use measurements that are outside the personal
control of the individual being measured. But conventional wisdom is
misleading. Instead of making sure that people are measured within
their span of control, it is more effective to measure people one level
above their span of control. This is the best way to encourage
teamwork, collaboration, and global, rather than local, optimization.