
“I used to have
everyone I needed in my business unit,” she told me. “But then we
reorganized in the interest of efficiency, and all of my software people
were moved into a central group. Before the reorganization, I had a
great software architect who helped design all of our new products.
Today we’re supposed to figure out exactly what we want before we get a
project approved. Then the IT people assigned to my projects don’t
know anything about my business, so the team spins its wheels for a
long time before it gets traction. Efficiency? What a joke.”
A couple of years
earlier, this rapidly growing young company (call it XYZ) saw its
revenue flattening and had regulators questioning its processes. In an
attempt to impose discipline and cut costs, XYZ centralized software
development. Executives felt that this would give them more visibility
into the development portfolio and ensure that a standard process was
used throughout the organization. They also hoped to even out the
software development workload, increase utilization of resources, and
eventually be able to outsource a good deal of development.
The problem was,
XYZ’s products were largely based on software. When the software
developers left the business units, the company’s track record for
fielding successful new products plummeted. Time-to-market stretched
out for even the most important initiatives, and market share dropped.
What went wrong?
People or Resources?
Company XYZ did very
well when new products were designed and implemented inside of a
business unit. When a key part of the product development team, the
people developing the software, was removed from the business unit, many
important features of product development were lost. For example,
funding was no longer incremental, based on stages and gates. Complete
funding for product development had to be justified and allocated in
order to get software development resources assigned. In turn, a
complete specification and estimate of spending was required before work
could begin. Software architects were no longer involved at the fuzzy
front end of new product development, and the discovery loops that used
to be part of product development were no longer acceptable.
Even as feedback
loops were removed from product development, they became more important
than ever, because the software development people on the product team
were usually not familiar with the business, and worse, they were not
even familiar with each other. In places where tacit knowledge and team
cohesiveness are important, treating people as interchangeable resources
just doesn’t work.
Scheduling
Prior to the
reorganization, when software people were embedded in divisions, some
people were regularly assigned to several projects at a time, while
others appeared to be less than fully utilized. The company decided
that it would be more effective to assign individuals to only one
project at a time, and if possible, the projects should be small.
Unfortunately, this created a scheduling nightmare, so XYZ invested in a
computerized resource scheduling system to help sort out the complex
resource assignments.
Computerized
scheduling systems have a well-known problem: they do not accommodate
variation. XYZ discovered that if projects didn’t end when they were
scheduled to end, the system’s assumptions were invalid, so the system’s
resource assignments were often out of touch with reality. The company
tried to fix such problems by keeping some teams intact and by holding
weekly management meetings to arbitrate the conflicts between the
computer’s schedule and reality. In practice, the overhead of
management intervention and idle workers waiting for teams to assemble
outweighed any efficiencies the system generated.
Company XYZ also tried
to reduce the variability of project completion by urging teams to make
reliable estimates and rewarding project managers who delivered on
schedule. Unfortunately, such attempts to reduce variability generally
don’t work. The reason for this becomes clear with a quick look at the
theory of variation.
Variation
W. Edwards Deming[1]
first popularized the theory of variation, which is now a cornerstone of
Six Sigma programs. Deming taught that there are two kinds of
variation: common variation and special variation. Common variation is
inherent in the system, and special variation is something that can be
discovered and corrected. Common variation can be measured, and control
charts can be used to keep the system within the predicted tolerances.
But it is not possible for even the most dedicated workers to reduce
common variation; the only way to reduce common variation is to change
the system. And here’s the important point: Deming felt that most
variation (95% or more)[2] is common variation, especially in systems
where people are involved.
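One common way to separate the two kinds of variation is an individuals (XmR) control chart, whose limits sit three estimated standard deviations either side of the mean. The sketch below is a minimal illustration, assuming one measurement per project (here, days past the scheduled end date); the function names and the data are hypothetical.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, center, upper) natural process limits for an XmR chart.

    Sigma is estimated as the average moving range divided by 1.128,
    the d2 bias-correction constant for ranges of two consecutive points.
    Hypothetical helper name, for illustration only.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def special_cause_signals(values):
    """Indices of points outside the limits: candidates for special causes."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical data: days each project finished past its scheduled date.
overruns = [10, 12, 11, 13, 12, 11, 12, 10, 11, 40]
print(special_cause_signals(overruns))  # [9]
```

Points inside the limits are common variation; reacting to them one at a time is what Deming called tampering. In practice the limits are usually computed from a stable baseline period and then applied to new data.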
The other kind of
variation is special variation, which is variation that can be
attributed to a cause. Once the cause is determined, action can be
taken to remove it. But there is danger here: “tampering” is taking
action to remove common variation based on the mistaken belief that it
is special variation. Deming insisted that tampering creates more
problems than it fixes.
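Deming illustrated tampering with his funnel experiment. The simulation below is a minimal sketch of it, assuming normally distributed common-cause noise with a standard deviation of 1 and a target of 0; the function name is hypothetical. Under the tampering rule the aim is moved after each drop by the opposite of the last error, so each outcome becomes the difference of two independent errors, which roughly doubles the variance.

```python
import random
from statistics import pvariance


def funnel(drops, tamper, seed=42):
    """Simulate drops through a funnel aimed at a target of 0.

    Each outcome is the current aim plus common-cause noise. With
    tamper=True the aim is moved by the negative of the last error
    (Deming's "rule 2"), i.e. treating common variation as special.
    Hypothetical helper name, for illustration only.
    """
    rng = random.Random(seed)
    aim = 0.0
    outcomes = []
    for _ in range(drops):
        outcome = aim + rng.gauss(0.0, 1.0)
        outcomes.append(outcome)
        if tamper:
            aim -= outcome  # error = outcome - target, and the target is 0
    return outcomes


calm = funnel(20_000, tamper=False)
tampered = funnel(20_000, tamper=True)
print(f"{pvariance(calm):.2f}")      # about 1
print(f"{pvariance(tampered):.2f}")  # about 2
```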
In summary: The
overwhelming majority of variation is inherent in a system, and trying
to remove that variation without changing the system only makes things
worse. We can assume that most of the variation in project completion
dates is common variation, but since computerized scheduling systems are
deterministic, they can’t really deal with any variation. The
bottom line: a computerized scheduling system will almost never work at
the level of detail at which XYZ was trying to use it. Exhorting workers to
estimate more carefully and project managers to be more diligent in
meeting deadlines is not going to remove variation from projects. We
need to change the rules of the game.
We know that
estimates for large systems and for distant timeframes have a wide
margin of uncertainty, made wider if the development team is an unknown.
We should stop trying to change that; it is inherent in the system. If
we want reliable estimates, we need to reduce the size of the work
package being estimated and limit the estimate to the near future.
Furthermore, estimates will be more accurate if the team implementing
the system already exists, is familiar with the domain and technology,
makes its own estimates covering a short period of time, and updates
these estimates based on feedback. The good news is, once such a team
establishes a track record, its variability can be measured and
predicted.
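Once a team has a track record, one common way to turn it into a forecast is to resample its past iterations (a simple Monte Carlo simulation). The sketch below assumes velocity is measured in whatever unit the team already uses and that past iterations are representative of future ones; `iterations_needed` and the sample numbers are hypothetical.

```python
import random


def iterations_needed(history, backlog, confidence=0.85, trials=10_000, seed=7):
    """Forecast how many iterations a team needs to finish `backlog` units.

    Each trial replays randomly chosen past iterations until the backlog
    is done. The result is the iteration count that covers `confidence`
    of the trials. Hypothetical helper name, for illustration only.
    """
    if not history or min(history) <= 0:
        raise ValueError("history must contain positive velocities")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        done, iterations = 0, 0
        while done < backlog:
            done += rng.choice(history)
            iterations += 1
        outcomes.append(iterations)
    outcomes.sort()
    return outcomes[int(confidence * (trials - 1))]


# Hypothetical track record: units completed in each of the last six iterations.
velocities = [8, 11, 9, 12, 10, 7]
print(iterations_needed(velocities, backlog=60))
```

The steadier the team's history, the narrower the spread of outcomes, which is one reason a stable, existing team estimates better than a newly assembled one.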
Utilization
Unfortunately,
Company XYZ believed that efficiency would be improved by increasing
resource utilization. Trying to maximize utilization can have serious
unexpected side effects, not the least of which is decreased efficiency
and reduced utilization. If this seems odd, think about how efficient
our highways are during rush hour. Most systems behave like traffic
systems; as utilization of resources passes a critical point, non-linear
effects take over, and everything slows to a crawl. Even the most
brilliant scheduling system cannot prevent delays if you insist
on 100% utilization.
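The non-linear effect is visible even in the simplest queuing model. Assuming a single server with random (Poisson) arrivals and exponential service times, an M/M/1 queue, the average time a job spends in the system is the service time divided by (1 - utilization). A minimal sketch; real development organizations are not M/M/1 queues, so treat the numbers as showing the shape of the curve, not as predictions. The function name is hypothetical.

```python
def cycle_time_multiplier(utilization):
    """Average time in an M/M/1 system, in multiples of the service time.

    Hypothetical helper name, for illustration only.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)


for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{u:.0%} busy -> {cycle_time_multiplier(u):.0f}x the work time")
```

In this model, going from 90% to 95% busy doubles the expected cycle time, and 99% is ten times worse than 90%.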
When a computer
operations manager looks at the utilization history of her equipment,
she will never say: “Look at that – we’re only using 80% of our server
capacity and 85% of our SANs. Let’s use them more efficiently!” She
knows that such high utilization is a warning that the systems are
operating on the edge of their capacity, and even now response times are
probably slowing down.
But when a
development manager takes a look at the utilization history of his
department, he will often say: “Look at that – we are only using 95% of
our available hours. We have enough free time to add another project!”
At the same time, he is probably asking himself, “I wonder why I’m
getting all these complaints about our response time?” And all too often
his solution is, “We’ll just have to set more aggressive deadlines.”
Response Time
Consider the release
manager of a software product. Assume she has a service level agreement
that calls for critical defects to be found and patched in four hours,
serious defects to be found and patched in 48 hours, and normal defects
to be fixed in the next monthly release. You can be sure that her
primary measurement is response time, and she adjusts staffing until the
service level is achieved. Because of this, there will always be people
available to attack defects, and occasionally people may have a bit of
spare time.
In one of my
classes, two teams did value stream maps for almost the same problem:
delivering a feature request that would take about two hours of coding.
One team documented an average response time of 9 hours to deployment;
the other team documented an average response time of 32 days. In the
first case, the policy was: “When a request is approved, there will
always be someone available to work on it.” In the second case, the
request got stuck twice in two-week-long queues waiting for resources.
The interesting thing is that the first organization actually did more
work with fewer people, because they did not have to manage queues,
customer queries, change requests and the like. They were more
efficient despite, or perhaps because of, a focus on response time
rather than resource utilization.
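One way to read the two value stream maps is through flow efficiency: hands-on work time divided by total elapsed time. A minimal sketch, assuming only the two hours of coding count as work in both cases and that the 32 days are calendar days; the constant and function names are hypothetical.

```python
# Assumption: the 32 days are calendar days, not working days.
HOURS_PER_DAY = 24


def flow_efficiency(work_hours, elapsed_hours):
    """Fraction of elapsed time spent actually working on the request.

    Hypothetical helper name, for illustration only.
    """
    return work_hours / elapsed_hours


fast = flow_efficiency(2, 9)
slow = flow_efficiency(2, 32 * HOURS_PER_DAY)
print(f"{fast:.0%} vs {slow:.2%}")  # 22% vs 0.26%
```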
The bottom line is
that managing response time, or time-to-market, is more efficient and
more profitable than managing utilization. You need some slack to keep
development and innovation flowing. As any good
operations manager already knows, when work flows rapidly and reliably
through an organization, its efficiency and utilization will be higher
than in an organization jammed up with too much work.
Rules of the Game
Queuing theory gives
us six rules for reducing software development cycle time:
- Limit work to capacity
- Even out the arrival of work
- Minimize the number of Things-in-Process
- Minimize the size of the Things-in-Process
- Establish a regular cadence
- Use pull scheduling
1. Limit work to capacity
The biggest favor
you can do for your organization is not to accept any more work than it
can handle. Of course, to do this, you have to know the capacity of
your organization. One way to estimate the capacity is to look at
output. If you currently complete one large system a year, deliver
about three services a quarter, and respond to about seven change
requests per week, this is a rough approximation of your capacity, and a
good limit on the amount of new work you should accept.
Next you might
calculate how much work you have already accepted. In one of my
classes, an executive did the math and discovered that he had seven
years’ worth of work in a queue that was reviewed every week. He
decided that he could toss out all but a few months of work; the rest
would never get done, but reviewing it every week consumed a lot of time.
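The executive’s arithmetic is queue length divided by completion rate. The sketch below shows that calculation and the trimming he chose, assuming the queue is already in priority order; the function names, the three-month horizon, and the numbers are hypothetical.

```python
def backlog_duration(queued_items, items_per_month):
    """Months needed to clear the queue at the demonstrated completion rate.

    Hypothetical helper name, for illustration only.
    """
    return queued_items / items_per_month


def trim_to_horizon(queue, items_per_month, horizon_months):
    """Split a prioritized queue into work that fits the horizon and the rest.

    Assumes `queue` is sorted with the most important item first.
    Hypothetical helper name, for illustration only.
    """
    keep = int(items_per_month * horizon_months)
    return queue[:keep], queue[keep:]


# Hypothetical: 420 queued requests, about 5 completed per month.
print(backlog_duration(420, 5) / 12)  # 7.0 (years)
kept, dropped = trim_to_horizon(list(range(420)), 5, 3)
print(len(kept), len(dropped))  # 15 405
```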
2. Even out the arrival of work
At Company XYZ, one
of the scheduling headaches was caused by a huge workload during the
first six months of the year, and a relatively low demand for the second
half of the year. At first this puzzled me, because the company’s
business was not seasonal, so there seemed to be no reason for the
uneven demand. I suspected that there was a sub-optimizing measurement
somewhere that might be the cause. When I asked if the annual budgeting
cycle or executive performance measurement system might be driving
uneven demand, my suspicions were confirmed. I recommended that the
organization work to change the measurement system, rather than
accommodate it.
3. Minimize the number of Things-in-Process
One of the basic
laws of queuing theory is Little’s Law[3]:
$$\text{Cycle Time} = \frac{\text{Things-in-Process}}{\text{Average Completion Rate}}$$
According to
Little’s Law there are two ways to improve response time: you can spend
money to improve the Average Completion Rate, or you can apply
intellectual fortitude to reduce the number of Things-in-Process. For
example, assume you can respond to about six feature requests per
month. If twelve requests are released for work, they will take an
average of two months to complete. If, however, only three requests are
released at a time, it will take an average of two weeks to
respond to a feature request.
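The feature-request example can be checked directly with Little’s Law, cycle time = Things-in-Process / average completion rate. A minimal sketch, assuming a month of roughly four weeks; the constant and function names are hypothetical.

```python
# Rough assumption used to convert months to weeks.
WEEKS_PER_MONTH = 4


def cycle_time(things_in_process, completion_rate):
    """Little's Law: average cycle time = Things-in-Process / completion rate.

    Hypothetical helper name, for illustration only.
    """
    return things_in_process / completion_rate


rate = 6  # feature requests completed per month
print(cycle_time(12, rate))                   # 2.0 (months)
print(cycle_time(3, rate) * WEEKS_PER_MONTH)  # 2.0 (weeks)
```

Little’s Law describes long-run averages of a stable system, so it predicts the typical wait, not the completion date of any single request.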
4. Minimize the size of the Things-in-Process
We’ve already noted
the effect of high utilization on cycle time; we should also note that
as batch size increases the effect is much more pronounced.[4]
This is shown in the graph:

So if you want high
utilization, you should develop in very small batches. For example, you
will get much faster throughput and higher utilization if you develop
ten services one at a time, rather than developing all ten at the same
time.
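The ten-services example can be made concrete with a small calculation. A minimal sketch, assuming each service takes one unit of effort, the team completes one unit per week, and “all at the same time” means effort is spread evenly across all ten so they finish together; the function name and numbers are hypothetical.

```python
def completion_times(n_services, effort_each=1.0, capacity=1.0, one_at_a_time=True):
    """Week in which each service is finished and ready to deliver.

    Hypothetical helper name, for illustration only.
    """
    if one_at_a_time:
        return [(i + 1) * effort_each / capacity for i in range(n_services)]
    total = n_services * effort_each / capacity
    return [total] * n_services  # everything finishes together at the end


seq = completion_times(10)
par = completion_times(10, one_at_a_time=False)
print(sum(seq) / len(seq), sum(par) / len(par))  # 5.5 10.0
```

The total effort is identical in both cases; what changes is how long each finished service waits for the others before it can be delivered.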
5. Establish a regular cadence
In a lean factory,
every process runs at a regular cadence called ‘takt time.’ If you
want to produce 80 cars in 8 hours, you produce 10 cars per hour, so one
car rolls off the line every 6 minutes. In software development the
recommended practice is to establish an iteration cadence of perhaps two
weeks or a month, and deliver small batches of deployment-ready software
every iteration.
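The takt-time arithmetic in the car example is available time divided by required output. A minimal sketch; the function name is hypothetical.

```python
def takt_time(available_minutes, units_required):
    """Minutes between finished units needed to meet demand.

    Hypothetical helper name, for illustration only.
    """
    return available_minutes / units_required


print(takt_time(8 * 60, 80))  # 6.0 (minutes per car)
```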
A regular cadence,
or ‘heartbeat,’ establishes the capability of a team to reliably deliver
working software at a dependable velocity. An organization that
delivers at a regular cadence has established its process capability and
can easily measure its capacity.
A regular cadence
also gives inter-dependent teams synchronization points that they can
depend on. Synchronization points are good places to get customer
feedback, they are useful for coordinating the work across multiple
feature teams, and they can help decouple hardware development from
software development.
6. Use pull scheduling
Once both batch and
queue size have been reduced and a cadence has been established, pull
scheduling is the best method to compensate for variation and limit work
to capacity. At the beginning of an iteration, the team ‘pulls’ work
from a prioritized queue. They pull only the amount of work that they
have demonstrated they can complete in an iteration. When a team is
first formed or the project is new, it may take a couple of iterations
for the team to establish its ‘velocity’ (the amount of work it can
complete in an iteration). But once the team hits its stride, it can
reliably estimate how much work can be done in an iteration and that is
the amount of work it pulls from the queue.
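The pull step can be sketched as follows. This is a minimal illustration, assuming work items carry size estimates in the same unit as velocity, the queue is already in priority order, and velocity is taken as the average of the last three iterations; `WorkItem`, `pull_for_iteration`, and the three-iteration window are hypothetical choices, not a prescribed method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WorkItem:
    """Hypothetical work item type, for illustration only."""
    name: str
    size: int  # estimate in the same unit the team uses for velocity


def velocity(completed_per_iteration, window=3):
    """Demonstrated capacity: average completed work over recent iterations."""
    return mean(completed_per_iteration[-window:])


def pull_for_iteration(queue, capacity):
    """Take items from the front of a prioritized queue until capacity is used.

    Stops at the first item that does not fit, so priority order is kept
    rather than skipping ahead to smaller, less important items.
    """
    pulled, used = [], 0
    while queue and used + queue[0].size <= capacity:
        item = queue.pop(0)
        pulled.append(item)
        used += item.size
    return pulled


backlog = [WorkItem("A", 5), WorkItem("B", 3), WorkItem("C", 4), WorkItem("D", 2)]
cap = velocity([9, 13, 11, 12])  # average of 13, 11, 12 -> 12
print([item.name for item in pull_for_iteration(backlog, cap)])  # ['A', 'B', 'C']
```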
There are other
points where queues might be established: there could be a queue
of proposed work that needs a ROM (Rough Order of Magnitude estimate).
There may be a queue of work for a preliminary architecture assessment.
(See figure below.) Note that these queues should be short, and a
team should not pull work from a queue until it has available time to do
the work.
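Short upstream queues can be enforced with an explicit limit. A minimal sketch, assuming each upstream queue (for example, ROM estimates or architecture assessments) has a fixed maximum length and that new proposals are turned away, rather than piled up, when it is full; `BoundedQueue` and the limit of three are hypothetical.

```python
from collections import deque


class BoundedQueue:
    """A FIFO queue that refuses new work once it reaches its limit.

    Hypothetical class, for illustration only.
    """

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.items = deque()

    def offer(self, item):
        """Add an item if there is room; return False if the queue is full."""
        if len(self.items) >= self.limit:
            return False
        self.items.append(item)
        return True

    def pull(self):
        """Called by a team only when it has time to start the work."""
        return self.items.popleft() if self.items else None


rom = BoundedQueue("ROM estimates", limit=3)
accepted = [rom.offer(p) for p in ["p1", "p2", "p3", "p4"]]
print(accepted)    # [True, True, True, False]
print(rom.pull())  # p1
```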

A pull system
ensures that everyone always has something to do (unless a queue is
empty), but no one is overloaded. The development process is managed by
managing the queues. Management intervention is accomplished by
changing the priority or contents of the queues. The cadence should be
fast enough that changes can wait until the next iteration; that way,
changes are accommodated at the cadence of the process.
Conclusion
Development teams
can do a lot to control their own destiny. They can make sure they have
the right information, the necessary skills, and the appropriate
processes to do a good job. But some things that impact the performance
of the development team are outside of their control. Managing the
pipeline is one of those things. If a development organization is
swamped with work, no amount of good intentions or good process can
overcome the laws of physics. If deterministic rules are applied to an
inherently variable system, no amount of exhortation, reward, or
punishment can make the system work. When the rules of the game have to
change, the six rules for reducing cycle time are a good place to
start.

[3] Usually the numerator is WIP (Work-in-Process). The term
Things-in-Process comes from Michael L. George and Stephen A. Wilson,
Conquering Complexity in Your Business, McGraw-Hill, 2004, p. 37.
[4] See Wallace Hopp and Mark Spearman, Factory Physics, McGraw-Hill,
2000.
Disclaimer:
XYZ Company is not a real company; it is an amalgamation of
companies I have worked with.