# Can We Estimate Software Development
Time?

## Chuck Connell

#

# Background

## n
This
talk is a response to a paper by J.P. Lewis titled *Large Limits to Software Estimation* in ACM Software Engineering Notes,
July 2001.

## n
Lewis’s
paper generated some good discussion within the software engineering community
and on Slashdot.org.

## n
My article on this topic appeared on Developer.com and Slashdot in
January 2002.

#

# Outline

## n Motivation for the topic

## n Summary of Lewis’s paper

## n My critique of Lewis’s paper

#

# Why estimating matters

## n
Estimating
the size of future projects is one of the most important areas of software
engineering. (Size = $$, person-days, calendar days, opportunity cost.)

## n
Estimates
are crucial to go/no-go decisions on every project. (Example: $1M project for
this group.)

## n
Questions
about the ability of SE to provide meaningful estimates call into question all
plans for future projects. (And all current projects.)

#

# SE status within CS

## n
Software
engineering’s ability to estimate is part of a broader discussion about the
overall validity of SE.

## n
If SE
cannot make good estimates about future projects, what is the whole field
doing?

## n
SE is
frequently derided by computer scientists as unscientific, non-rigorous, or
“just programmer sociology”.

#

# Lewis: what software is

## n
The
behavior of any program can be thought of as a table of inputs and outputs.

## n
There
is one row for each possible set of inputs. (Example: 0-3 x 0-3)

## n
Assumes
finite number of finite inputs.

## n
Assumes
we can represent human interaction as another finite/finite input.

## n
Assumes
finite/finite output for each set of inputs.

## n
Above
are reasonable assumptions for practical purposes.

#

# Lewis: what software is

## n Real software would have a *very* large table.

## n Something like:

### u 2^{32} raised to # of numeric inputs, times

### u 2^{8} raised to # of text input chars

## n But the table is finite and mechanically
constructible.

##

# Lewis: software spec as a string

## n A software description is a finite table.

## n Any such table can be transformed into a string.

## n The string represents what we want the software to do.

## n Example: using previous

## n So a software specification is a finite string.

#

# Lewis: estimating software tasks

## n
Estimating
a software task is the same as asking, “How hard is it to write a program to
create the spec string?”

## n
(We
assume a small driver to call program with every input.)

## n
Lewis
claims that writing a smaller program is easier/faster than writing a larger
program.

## n
So,
Lewis asserts that estimating is equivalent to asking: “What is the smallest
program (in bytes) to create spec string?”

#

# Lewis: complexity of estimating

## n
What is
the smallest program to create spec string?

## n
By
definition, this is identical to asking for the *algorithmic complexity* of the string (related to Kolmogorov
complexity).

## n
But,
there is a well-known result from complexity theory: There is no program to compute the
algorithmic complexity of arbitrary strings.

## n
Therefore,
there is no rigorous method for estimating software development!

## n
Lewis
extends this result to estimates with a given percentage accuracy.

#

# Summary of Lewis’s paper

## n
Software
specs are tables / strings.

## n
It is
faster to write small programs than large programs.

## n
Software
estimates seek the algorithmic complexity of the specification string. (What is
the smallest program to print out the spec table?)

## n
But
there is no method for finding the algorithmic complexity of arbitrary strings.

## n
Therefore,
rigorous/mechanical software estimates are not possible, even allowing % error.

#

# My critique

## n
Lewis
is “correct”, given the definitions he uses and the technical results he
obtains, but…

## n
He
mischaracterizes the programming process.

## n
He
fails to show a relationship between algorithmic complexity and software
development times.

## n
He
mischaracterizes promises made by well-known development frameworks, especially
the Software Capability Maturity Model (CMM).

#

# Programming process

## n
Do
programmers try to write the shortest possible program?

## n
Maybe,
sometimes.

## n
But
modern software engineering teaches that other aspects of software are more
important: readability, modifiability, scalability, speed…

## n
The
shortest program may not have these qualities.

## n
So
programmers may not be trying to do something directly related to the AC of the
spec table.

#

# Estimation problem

## n
Assume
Lewis’s description of software algorithmic complexity (AC) is well-defined…

## n
*Does
it take longer for people to write a program for a high AC task than for a
lower AC task?*

## n
The
answer to the above must be *YES, *or AC is unrelated to the software
estimation problem.

## n
But we
don’t know the answer to this question, a priori.

#

# Estimation problem

## n
*Does
it take longer for people to write a program for a high AC task than for a
lower AC task?*

## n
This is
an empirical question for psychology.

## n
Maybe
there are tricky low-AC tasks that are hard (long) to program…

## n
Maybe
there are high AC tasks that a human can grasp quickly…

## n
So
programming estimates may not be directly related to differences in AC.

##

# Claims about estimating

## n
Lewis
attacks a straw man: objective, mechanical estimating procedures with
guaranteed accuracy.

## n
It is
true that some writers (or book marketers) have appeared to offer such methods,
but no serious researchers are trying to achieve this.

## n
Software
engineering researchers are really looking for an estimating method that is
somewhat accurate, most of the time.

#

# Claims about estimating

## n
Imagine
an estimating technique that is 80% right 80% of the time.

## n
For the
other 20%, the estimate is worthless.

## n
Everyone
in the software business would be leaping with joy!

## n
Such *average-case algorithms* are well-studied in computer
science, and have been used for prime number tests and solving simultaneous
equations.

## n
Lewis’s
mathematical arguments do not disallow such a solution, because of the
unbounded errors.

#

# Claims about estimating

## n
In
particular, Lewis claims that CMM promises guaranteed accuracy of software
estimates, and provides a quote from CMM to show this.

## n
In
fact, CMM promises no such thing; the quote is taken out of context.

## n
CMM
does predict better estimates *on average* the more closely one follow CMM
guidelines. (A claim which may or may not be true.)

## n
Lewis
is not alone in this error; because CMM is so dominant, it is often perceived
to be more rigid than it is.

##

# How should we estimate software
projects?

## n
1.
Expert judgment based on experience with the project particulars. (Example from
Lotus NLM.)

## n
2.
Iterative development, with iterative estimates (e.g. study, POC, prototype,
R1, R2, etc.)

## n
Iterative
development accepts the fact that large, complex projects cannot be estimated
up front. Therefore don’t do large, complex projects all at once.

#

# Questions ?