Can We Estimate Software Development Time?

Chuck Connell

 

Background

n    This talk is a response to a paper by J.P. Lewis titled Large Limits to Software Estimation in ACM Software Engineering Notes, July 2001.

n    Lewis’s paper generated some good discussion within the software engineering community and on Slashdot.org.

n    My article on this topic appeared on Developer.com and Slashdot in January 2002.

 

Outline

n   Motivation for the topic

n   Summary of Lewis’s paper

n   My critique of Lewis’s paper

 

Why estimating matters

n    Estimating the size of future projects is one of the most important areas of software engineering. (Size = $$, person-days, calendar days, opportunity cost.)

n    Estimates are crucial to go/no-go decisions on every project. (Example: $1M project for this group.)

n    Questions about the ability of SE to provide meaningful estimates call into question all plans for future projects. (And all current projects.)

 

SE status within CS

n    Software engineering’s ability to estimate is part of a broader discussion about the overall validity of SE.

n    If SE cannot make good estimates about future projects, what is the whole field doing?

n    SE is frequently derided by computer scientists as unscientific, non-rigorous, or “just programmer sociology”.

n    I address this broad question in Why Software Engineering Is Not B.S., inspired by a CS professor’s remark.

 

Lewis: what software is

n    The behavior of any program can be thought of as a table of inputs and outputs.

n    There is one row for each possible set of inputs. (Example: 0-3 x 0-3)

n    Assumes finite number of finite inputs.

n    Assumes we can represent human interaction as another finite/finite input.

n    Assumes finite/finite output for each set of inputs.

n    Above are reasonable assumptions for practical purposes.

 

Lewis: what software is

n   Real software would have a very large table.

n   Something like:

u 232 raised to # of numeric inputs, times

u 28 raised to # of text input chars

n   But the table is finite and mechanically constructible.

 

Lewis: software spec as a string

n   A software description is a finite table.

n   Any such table can be transformed into a string.

n   The string represents what we want the software to do.

n   Example: using previous

n   So a software specification is a finite string.

 

Lewis: estimating software tasks

n    Estimating a software task is the same as asking, “How hard is it to write a program to create the spec string?”

n    (We assume a small driver to call program with every input.)

n    Lewis claims that writing a smaller program is easier/faster than writing a larger program.

n    So, Lewis asserts that estimating is equivalent to asking: “What is the smallest program (in bytes) to create spec string?”

 

Lewis: complexity of estimating

n    What is the smallest program to create spec string?

n    By definition, this is identical to asking for the algorithmic complexity of the string (related to Kolmogorov complexity).

n    But, there is a well-known result from complexity theory: There is no program to compute the algorithmic complexity of arbitrary strings.

n    Therefore, there is no rigorous method for estimating software development!

n    Lewis extends this result to estimates with a given percentage accuracy.

 

Summary of Lewis’s paper

n    Software specs are tables / strings.

n    It is faster to write small programs than large programs.

n    Software estimates seek the algorithmic complexity of the specification string. (What is the smallest program to print out the spec table?)

n    But there is no method for finding the algorithmic complexity of arbitrary strings.

n    Therefore, rigorous/mechanical software estimates are not possible, even allowing % error.

 

My critique

n    Lewis is “correct”, given the definitions he uses and the technical results he obtains, but…

n    He mischaracterizes the programming process.

n    He fails to show a relationship between algorithmic complexity and software development times.

n    He mischaracterizes promises made by well-known development frameworks, especially the Software Capability Maturity Model (CMM).

 

Programming process

n    Do programmers try to write the shortest possible program?

n    Maybe, sometimes.

n    But modern software engineering teaches that other aspects of software are more important: readability, modifiability, scalability, speed…

n    The shortest program may not have these qualities.

n    So programmers may not be trying to do something directly related to the AC of the spec table.

 

Estimation problem

n    Assume Lewis’s description of software algorithmic complexity (AC) is well-defined…

n    Does it take longer for people to write a program for a high AC task than for a lower AC task?

n    The answer to the above must be YES, or AC is unrelated to the software estimation problem.

n    But we don’t know the answer to this question, a priori.

 

Estimation problem

n    Does it take longer for people to write a program for a high AC task than for a lower AC task?

n    This is an empirical question for psychology.

n    Maybe there are tricky low-AC tasks that are hard (long) to program…

n    Maybe there are high AC tasks that a human can grasp quickly…

n    So programming estimates may not be directly related to differences in AC.

 

Claims about estimating

n    Lewis attacks a straw man: objective, mechanical estimating procedures with guaranteed accuracy.

n    It is true that some writers (or book marketers) have appeared to offer such methods, but no serious researchers are trying to achieve this.

n    Software engineering researchers are really looking for an estimating method that is somewhat accurate, most of the time.

 

Claims about estimating

n    Imagine an estimating technique that is 80% right 80% of the time.

n    For the other 20%, the estimate is worthless.

n    Everyone in the software business would be leaping with joy!

n    Such average-case algorithms are well-studied in computer science, and have been used for prime number tests and solving simultaneous equations.

n    Lewis’s mathematical arguments do not disallow such a solution, because of the unbounded errors.

 

Claims about estimating

n    In particular, Lewis claims that CMM promises guaranteed accuracy of software estimates, and provides a quote from CMM to show this.

n    In fact, CMM promises no such thing; the quote is taken out of context.

n    CMM does predict better estimates on average the more closely one follow CMM guidelines. (A claim which may or may not be true.)

n    Lewis is not alone in this error; because CMM is so dominant, it is often perceived to be more rigid than it is.

 

How should we estimate software  projects?

n    1. Expert judgment based on experience with the project particulars. (Example from Lotus NLM.)

n    2. Iterative development, with iterative estimates (e.g. study, POC, prototype, R1, R2, etc.)

n    Iterative development accepts the fact that large, complex projects cannot be estimated up front. Therefore don’t do large, complex projects all at once.

n    See Rapid Development by Steve McConnell.

 

Questions ?