The software that end users are creating... is riddled with errors.
Burnett & Myers (2014)
Your spreadsheets may be disasters in the making.
Caulkins, Morrison, & Weidemann (2006)
...few incidents of spreadsheet errors are made public and these are usually not revealed by choice.
Kruck & Sheetz (2001)
Research on spreadsheet errors is substantial, compelling, and unanimous.
Panko (2015)
Despite overwhelming and unanimous evidence... companies have continued to ignore spreadsheet error risks.
Panko (2014)
People tend to believe their spreadsheets are more accurate than they really are.
Caulkins, Morrison, & Weidemann (2006)
Never assume a spreadsheet is right, even your own.
Raffensperger (2001)
A lot of decisions are being made on the basis of some bad numbers.
Ross (1996)
Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment.
Abreu, et al (2015)
Spreadsheets are often hard, if not impossible, to understand.
Mireault & Gresham (2015)
Spreadsheet development must embrace extensive testing in order to be taken seriously as a profession.
Bock (2016)
Spreadsheets are dangerous to their authors and others.
Durusau & Hunting (2015)
Studies have shown that there is a high incidence of errors in spreadsheets.
Csernoch & Biro (2013)
Developing an error-free spreadsheet has been a problem since the beginning of end-user computing.
Mireault (2015)
Spreadsheet errors are pervasive, stubborn, ubiquitous and complex.
Irons (2003)
Spreadsheet shortcomings can significantly hamper an organization's business operation.
Reschenhofer & Matthes (2015)
Spreadsheets are more fault-prone than other software.
Kulesz & Ostberg (2013)
The quality and reliability of spreadsheets is known to be poor.
Bishop & McDaid (2007)
1% of all formulas in operational spreadsheets are in error.
Powell, Baker, & Lawson (2009)
Spreadsheets are notoriously error-prone.
Cunha, et al (2011)
Overconfidence is one of the most substantial causes of spreadsheet errors.
Sakal, et al (2015)
Most executives do not really check or verify the accuracy or validity of [their] spreadsheets...
Teo & Tan (1999)
Spreadsheets... pose a greater threat to your business than almost anything you can imagine.
Howard (2005)
60% of large companies feel 'Spreadsheet Hell' describes their reliance on spreadsheets.
Murphy (2007)
The issue is not whether there is an error but how many errors there are and how serious they are.
Panko (2007)
Spreadsheets are alarmingly error-prone to write.
Paine (2001)
Every study that has looked for errors has found them... in considerable abundance.
Panko & Halverson (1996)
Spreadsheets are extraordinarily and unacceptably prone to error.
Dunn (2010)
Programmers exhibit unwarranted confidence in the correctness of their spreadsheets.
Krishna, et al (2001)
It is now widely accepted that errors in spreadsheets are both common and potentially dangerous.
Nixon & O'Hara (2010)
Even obvious, elementary errors in very simple, clearly documented spreadsheets are... difficult to find.
Galletta, et al (1993)
Most large spreadsheets have dozens or even hundreds of errors.
Panko & Ordway (2005)
Spreadsheets are the most popular live programming environments, but they are also notoriously fault-prone.
Hermans & van der Storm (2015)
Spreadsheets are easy to use and very hard to check.
Chen & Chan (2000)
Spreadsheets contain errors at an alarmingly high rate.
Abraham, et al (2005)
Spreadsheets can be viewed as a highly flexible programming environment for end users.
Abreu, et al (2015)
Every study, without exception, has found error rates much higher than organizations would wish to tolerate.
Panko (1999)
Untested spreadsheets are riddled with errors.
Miller (2005)
Errors in spreadsheets are as ubiquitous as spreadsheets themselves.
Colbenz (2005)
The untested spreadsheet is as dangerous and untrustworthy as an untested program.
Price (2006)
The results given by spreadsheets are often just wrong.
Sajaniemi (1998)
Spreadsheets are commonly used and commonly flawed.
Caulkins, Morrison, & Weidemann (2008)
Errors in spreadsheets... result in incorrect decisions being made and significant losses incurred.
Beaman, et al (2005)
A significant proportion of spreadsheets have severe quality problems.
Ayalew (2007)
94% of the 88 spreadsheets audited in 7 studies have contained errors.
Panko (2008)
Spreadsheet errors have resulted in huge financial losses.
Abraham & Erwig (2007)
Spreadsheet errors... a great, often unrecognised, risk to corporate decision making & financial integrity.
Chadwick (2002)
Spreadsheet errors are still the rule rather than the exception.
Nixon & O'Hara (2010)
It is irrational to expect large error-free spreadsheets.
Panko (2013)
Spreadsheets have a notoriously high number of faults.
Rust, et al (2006)

Calculation cascade: A common cause of catastrophe

People are remarkably accurate when doing a wide range of activities, including working with spreadsheets. Even so, the bottom line result of every spreadsheet is almost certainly wrong.

For example, if you have a spreadsheet consisting of 100 used cells and a moderate 3% probability that each cell contains an error, then the probability that the spreadsheet's results are wrong is 95%.

This article explores how and why the cascading structure of spreadsheets leads to a high error rate, and then we consider what we can do about it.

A trivial spreadsheet

Sales comparison example
Trivial sales spreadsheet - with an error
This spreadsheet is intended to calculate the change in sales for this year so far compared with last year.

Suppose that you want to know how much your sales have changed so far this year compared with the same months last year.

To answer this question, you build a trivial spreadsheet that collates the sales figures for January to June of the years 2015 and 2016, sums over the months of each year, and then calculates the change between the years.

A screenshot of the spreadsheet is shown to the right. Note that this example is available for download at the bottom of this page.

The spreadsheet shows that January to June sales increased from 86 in 2015 to 98 in 2016, a change of 12. That is a good result, so you are pleased.

Unfortunately, the spreadsheet is wrong.

Before proceeding, can you see the error?

Spot the error

The problem with our trivial sales comparison spreadsheet is that the 2015 total is calculated as =SUM(B4:B8). That is, the 2015 total omits the June figure, while the 2016 total includes all the months. If the 2015 total included June too, then the change in sales would be zero — a very different result.

This trivial example includes a simple error that cascades down to the bottom line result (the change in sales). The error is not obvious because the spreadsheet displays reasonable looking values in each cell. Only when we carefully check the data and the formulae does the error become apparent.

Excel does give us a hint that something might be wrong, by adding a small green triangle to cell B10. If we select that cell, then Excel displays a warning: "The formula in this cell refers to a range that has additional numbers adjacent to it." However, the heading numbers 2015 and 2016 are considered to be adjacent numbers, so the warning also appears in cell C10 even though that formula is correct. Therefore, the warning doesn't really help in this example.

Essentially the problem here is that the formulae are hidden behind the displayed results. If the results look reasonable, then we tend to believe them.

We're 95% to 99% accurate when doing a wide range of activities

People are remarkably accurate, so it seems sensible to believe results that look reasonable. Across a wide range of non-trivial cognitive activities — such as reading a number from a chart, writing a line of computer programming code, and dialling a 10 digit number — we get things right typically between 95% and 99% of the time.

On the rare occasions when we do make an error, it is likely to be minor. Often we don't even notice that we made an error. But when we do notice an error, we usually correct it immediately. As a consequence, we tend to be very confident about our accuracy and about our ability to detect and correct the few errors that we do make.

Building a spreadsheet is no different

In a spreadsheet, the basic unit of activity is the cell. That's where we enter data or write a formula. Research has shown that when working with spreadsheet cells we are, again, between 95% and 99% accurate (after correcting errors that we notice immediately).

This means that we have a "cell error rate" (CER) of 1% to 5%. This error rate is consistent with other activities, and it seems low, so we are generally confident about the accuracy of our spreadsheets.

Beware the calculation cascade effect

But our confidence is misplaced. Given how spreadsheets work, a cell error rate of only 1% to 5% is very likely — or in larger spreadsheets, almost certain — to lead to incorrect bottom line results.

To see why this is, consider that a typical spreadsheet consists of a cascade of calculations:

  • We start with some data cells containing values.
  • Then we have some formulae that refer to the data cells.
  • There may be further formulae cells that refer to previous formulae cells.
  • Finally, we have more formulae that refer to previous cells, calculating the results we want.

If any of the data or formulae cells contains an error, then that error will cascade down to subsequent cells and through to the results. This happens because the cells are linked in a series of calculations.

The errors we make in spreadsheets are often minor, like referencing the wrong cell, typing a number incorrectly, or inserting a row without adjusting formulae that reference that range. The issue is that for a bottom line result to be correct it depends on all previous calculations being correct.

This situation is different to, for example, making an error in a document. A typographical error may change the meaning of a word or a sentence, but it is unlikely to change the meaning of the entire document because subsequent words are not linked in a cascade like they are in a spreadsheet.

Modelling the calculation cascade

Given the calculation cascade effect, how likely is it that a spreadsheet's bottom line result is correct? To find out, let's develop a simple model. We can assume that:

  • The data or formula in each cell has a small probability of containing an error.
  • The probability of an error in a cell is independent of all other cells.
  • All cells are linked to other cells, so that errors cascade through to the bottom line results.

Of course, in building a spreadsheet the independence assumption may not be true. For example, we might write a formula in a cell and then copy it to multiple cells. Consequently, we often have groups of similar cells. But we can also make errors in copying cells, such as having an incorrect relative reference, or failing to copy to all relevant cells. Cell independence is a useful approximation for most spreadsheets, so we'll use it here as an assumption.

Given our assumptions, the probability that the bottom line results contain at least one error can be modelled as follows:

  • Let's define p to be the probability of an error in a cell (i.e. the Cell Error Rate).
  • Therefore, the probability that a cell does not contain an error is 1-p.
  • Suppose that there are n cells used in the spreadsheet.
  • Let's also define k to be the number of used cells that contain an error.
  • The probability that none of the used cells contains an error is P(k = 0) = (1-p)n.
  • Finally, the probability that the spreadsheet contains at least one error is P(k >= 1) = 1 - (1-p)n.

Cascade model results

For various cell error rates (0% to 5%) and number of used cells (15, 100, and 1000), the following chart shows the probability that a spreadsheet contains at least one incorrect cell.

Our sales spreadsheet example above contains 15 used cells — twelve for data and three formulae. If we have a moderate 3% probability of making an error in any cell, then the probability that the resulting change in sales calculation is wrong is 37%. That is, even with this trivial spreadsheet, there’s only a 63% probability that the result is correct.

If we have a somewhat larger, though still small, spreadsheet consisting of 100 used cells and a moderate cell error rate of 3%, then the probability that the spreadsheet's results are wrong is 95%. That is, even a small spreadsheet is almost certain to be wrong. Such errors might be minor, but they could also be catastrophic.

This is a surprising result, so it is worth contemplating for a moment. Even though we are remarkably accurate, only rarely making errors, the linked cascading structure of a spreadsheet means that it is almost certain that the results are wrong.

How many errors is my spreadsheet likely to contain?

Given our simple model of errors, we can represent the probability distribution of the number of errors using a Binomial distribution.

For example, for a cell error rate of 3% and 100 used cells, the following chart shows the probability distribution for the number of cells (k) that contain an error.

In our spreadsheet of only 100 used cells, there are most likely 2 or 3 errors — though there is a 35% probably of there being 4 or more errors.

Note that the value for zero error cells equals the (1-p)n value that we derived previously — in this example, (1-0.03)100 = 4.8%, i.e. about 5% probability that the spreadsheet is correct (or, equivalently, a 95% probability that the spreadsheet is wrong).

Download the spreadsheet at the bottom of this page to see how these calculations are implemented, along with additional explanatory notes.

Are spreadsheet errors that common in practice?

Research shows that almost all spreadsheets contain errors. The spreadsheet error rate found by academic studies is similar to our simple error model, at about 95%. So, the short answer is yes, spreadsheet errors are that common in practice.

One leading researcher concluded that "Every study, without exception, has found error rates much higher than organizations would wish to tolerate." In other words, spreadsheet errors are a real and substantial problem.

To reduce errors: Use good practices and test your spreadsheets

Given that even a small spreadsheet is likely to contain errors, what can we do?

In creating software, programmers have learnt that they need to follow good development practices and do extensive testing. Typically, 30% to 60% of the total software development effort goes into detecting and correcting errors. Good program testing practice includes inspecting every line of code and testing modules to ensure that they behave as expected.

In contrast, spreadsheet builders typically put little or no effort into testing. Provided the spreadsheet produces a result that looks reasonable, we assume that it is correct. The outcome is, unlike computer software, spreadsheets are almost always essentially untested.

There are basically two things that we can do to reduce spreadsheet errors:

  • Use good spreadsheet development practices to help reduce the occurrence of errors and make them easier to find.
  • Learn from our software programming colleagues and thoroughly test our spreadsheets. That means inspecting every cell to look for errors and testing each part of the spreadsheet to ensure that it behaves as expected.

Further reading

There is an extensive academic and practitioner literature dedicated to improving the quality of spreadsheets. As a starting point for learning more, have a look at the following papers.

Spreadsheet errors:

Spreadsheet development good practice:

Spreadsheet testing:

Download the Sales example and error models spreadsheet

Download the spreadsheet used in this article, including the Sales example and the error cascade models: ErrorCascade.xlsx.

If you have any comments about this article, then please contact us.

Go to top