Software Estimations Using Reference Class Forecasting

Eighteen years ago, I was sitting in my cubicle doing Java programming when my tech lead came over to chat about my next project. We discussed the details, and then she asked the question programmers dread: “How long will it take?” I stumbled through a guesstimate based on my limited experience, and she went on her merry way and plugged the number into a Gantt chart.

Even with the emergence of the Agile Manifesto, and now the current paradigm of planning projects in 1-2 week sprints, businesses and customers still ask technologists how long a project will take.

The unfortunate thing about agile is that even though it is an ideal way to run a project, financial models rarely follow that methodology. In practice, most statements of work are written around a time estimate for the project. There are exceptions, where some customers pay for work two weeks at a time, but they are rare.

Throughout my technical career, I have rarely seen a formalized software estimation model emerge that everyone uses, so I was surprised to come across a discussion of software project estimation while reading How Big Things Get Done. The opening chapters cover the challenges and successes of large architectural projects, ranging from the Sydney Opera House (a problematic project) to the Guggenheim in Bilbao (which, amazingly, came in under budget).

The book proposes using reference class forecasting, which asks you to:

  1. Collect how long all past projects in your organization that are similar to your current project actually took
  2. Take the mean value
  3. Use that mean as an anchor

For example, if I were modernizing an application from Hadoop to EMR and had no idea how long it would take, I would look for other projects of similar complexity. Let’s say I had data from 10 previous projects and the mean came out to 6 months. Then 6 months would be my anchor point.
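The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the book’s own tooling: the `PastProject` record, the `project_type` label used to decide what counts as “similar,” and all the durations are hypothetical, chosen so that the ten Hadoop-to-EMR projects average 6 months as in the example above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PastProject:
    """One completed project from a (hypothetical) organizational history."""
    name: str
    project_type: str      # hypothetical label used to decide what is "similar"
    actual_months: float   # how long the project really took, not the estimate


def reference_class_anchor(history: list[PastProject], project_type: str) -> float:
    """Return the mean actual duration of past projects of the same type."""
    # Step 1: build the reference class from similar past projects.
    reference_class = [p.actual_months for p in history if p.project_type == project_type]
    if not reference_class:
        raise ValueError(f"no past projects of type {project_type!r}")
    # Steps 2 and 3: the mean of the reference class becomes the anchor.
    return mean(reference_class)


# Made-up history: ten Hadoop-to-EMR migrations plus one unrelated project.
durations = [4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 7.0, 7.0, 6.0, 8.0]
history = [
    PastProject(f"hadoop-migration-{i}", "hadoop_to_emr", m)
    for i, m in enumerate(durations, start=1)
]
history.append(PastProject("crm-rewrite", "app_rewrite", 12.0))

anchor = reference_class_anchor(history, "hadoop_to_emr")
print(f"Anchor estimate: {anchor} months")  # Anchor estimate: 6.0 months
```

The unrelated 12-month project is filtered out, so it does not skew the anchor. How “similar” is defined (a single label here) is the hard, judgment-heavy part in practice.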

The book immediately points out that the biggest problem isn’t the approach itself but obtaining historical data on how long previous projects took. Think about it: of all the projects you have ever estimated, how many have you gone back and compared the actuals to your forecast? I bet most of us haven’t done these retros at all.

Some takeaways for me:

  1. If you are in a large organization and have done multiple projects, take the time to run retros on them and record in a spreadsheet each project, its tasks, its complexity, and the actual time it took to finish. Unfortunately, large companies have this valuable data but rarely go through the exercise of calculating it. With it, you can start applying some rudimentary reference class forecasting instead of relying on subjective software estimates.
  2. If you are a small organization, or have no history of projects and therefore no reference point, then unfortunately I think you are out of luck.
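As a rough sketch of what that retro spreadsheet could feed into, the Python below reads a CSV export and, per complexity level, reports the mean actual duration along with how far actuals landed from the original estimates. The column names, the complexity labels, and every number are hypothetical placeholders, not data from the book or from real projects.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical retro spreadsheet exported as CSV; columns and values are made up.
RETRO_CSV = """project,tasks,complexity,estimated_months,actual_months
billing-etl,ingest;transform;load,medium,3,5
hadoop-to-emr-a,migrate jobs;validate,high,4,7
hadoop-to-emr-b,migrate jobs;validate;cutover,high,5,6
reporting-dashboard,model;visualize,low,1,1.5
"""


def summarize_by_complexity(csv_text: str) -> dict[str, dict[str, float]]:
    """Group past projects by complexity and report the mean actual duration
    and the mean actual/estimate ratio (above 1.0 means the forecast was too low)."""
    actuals: dict[str, list[float]] = defaultdict(list)
    ratios: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        estimated = float(row["estimated_months"])
        actual = float(row["actual_months"])
        actuals[row["complexity"]].append(actual)
        ratios[row["complexity"]].append(actual / estimated)
    return {
        level: {
            "mean_actual_months": mean(actuals[level]),
            "mean_overrun_ratio": mean(ratios[level]),
        }
        for level in actuals
    }


for level, stats in sorted(summarize_by_complexity(RETRO_CSV).items()):
    print(level, round(stats["mean_actual_months"], 2), round(stats["mean_overrun_ratio"], 2))
```

With this made-up data, the two high-complexity projects took 6.5 months on average, which would serve as the anchor for the next high-complexity project. The overrun ratio is one simple way to capture the forecast-versus-actual comparison that most retros skip.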

At the end of the day, I think the industry needs to get better at software estimation, and the only way to do that is to develop a methodology and refine it over time.