Most large spreadsheets have dozens or even hundreds of errors.
Panko & Ordway (2005)
Spreadsheet errors are still the rule rather than the exception.
Nixon & O'Hara (2010)
Spreadsheets are extraordinarily and unacceptably prone to error.
Dunn (2010)
Spreadsheets are alarmingly error-prone to write.
Paine (2001)
Spreadsheets are commonly used and commonly flawed.
Caulkins, Morrison, & Weidemann (2008)
Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment.
Abreu, et al (2015)
Programmers exhibit unwarranted confidence in the correctness of their spreadsheets.
Krishna, et al (2001)
Spreadsheets are notoriously error-prone.
Cunha, et al (2011)
People tend to believe their spreadsheets are more accurate than they really are.
Caulkins, Morrison, & Weidemann (2006)
It is irrational to expect large error-free spreadsheets.
Panko (2013)
Research on spreadsheet errors is substantial, compelling, and unanimous.
Panko (2015)
Studies have shown that there is a high incidence of errors in spreadsheets.
Csernoch & Biro (2013)
Developing an error-free spreadsheet has been a problem since the beginning of end-user computing.
Mireault (2015)
The untested spreadsheet is as dangerous and untrustworthy as an untested program.
Price (2006)
Every study, without exception, has found error rates much higher than organizations would wish to tolerate.
Panko (1999)
Most executives do not really check or verify the accuracy or validity of [their] spreadsheets...
Teo & Tan (1999)
Spreadsheets are the most popular live programming environments, but they are also notoriously fault-prone.
Hermans & van der Storm (2015)
Spreadsheets are easy to use and very hard to check.
Chen & Chan (2000)
Spreadsheets are dangerous to their authors and others.
Durusau & Hunting (2015)
Spreadsheet development must embrace extensive testing in order to be taken seriously as a profession.
Bock (2016)
Every study that has looked for errors has found them... in considerable abundance.
Panko & Halverson (1996)
The software that end users are creating... is riddled with errors.
Burnett & Myers (2014)
A lot of decisions are being made on the basis of some bad numbers.
Ross (1996)
A significant proportion of spreadsheets have severe quality problems.
Ayalew (2007)
Errors in spreadsheets... result in incorrect decisions being made and significant losses incurred.
Beaman, et al (2005)
60% of large companies feel 'Spreadsheet Hell' describes their reliance on spreadsheets.
Murphy (2007)
Even obvious, elementary errors in very simple, clearly documented spreadsheets are... difficult to find.
Galletta, et al (1993)
Spreadsheets can be viewed as a highly flexible programming environment for end users.
Abreu, et al (2015)
Untested spreadsheets are riddled with errors.
Miller (2005)
Spreadsheet errors... a great, often unrecognised, risk to corporate decision making & financial integrity.
Chadwick (2002)
Spreadsheet shortcomings can significantly hamper an organization's business operation.
Reschenhofer & Matthes (2015)
Errors in spreadsheets are as ubiquitous as spreadsheets themselves.
Colbenz (2005)
Spreadsheets contain errors at an alarmingly high rate.
Abraham, et al (2005)
Spreadsheets... pose a greater threat to your business than almost anything you can imagine.
Howard (2005)
Never assume a spreadsheet is right, even your own.
Raffensperger (2001)
The results given by spreadsheets are often just wrong.
Sajaniemi (1998)
Spreadsheet errors are pervasive, stubborn, ubiquitous and complex.
Irons (2003)
It is now widely accepted that errors in spreadsheets are both common and potentially dangerous.
Nixon & O'Hara (2010)
94% of the 88 spreadsheets audited in 7 studies have contained errors.
Panko (2008)
Spreadsheets are often hard, if not impossible, to understand.
Mireault & Gresham (2015)
Despite overwhelming and unanimous evidence... companies have continued to ignore spreadsheet error risks.
Panko (2014)
Spreadsheets have a notoriously high number of faults.
Rust, et al (2006)
The quality and reliability of spreadsheets is known to be poor.
Bishop & McDaid (2007)
1% of all formulas in operational spreadsheets are in error.
Powell, Baker, & Lawson (2009)
Overconfidence is one of the most substantial causes of spreadsheet errors.
Sakal, et al (2015)
Your spreadsheets may be disasters in the making.
Caulkins, Morrison, & Weidemann (2006)
...few incidents of spreadsheet errors are made public and these are usually not revealed by choice.
Kruck & Sheetz (2001)
Spreadsheets are more fault-prone than other software.
Kulesz & Ostberg (2013)
The issue is not whether there is an error but how many errors there are and how serious they are.
Panko (2007)
Spreadsheet errors have resulted in huge financial losses.
Abraham & Erwig (2007)

Spreadsheet bibliography

Title Spreadsheet tools for data analysts
Authors Daniel W. Barowy
Year 2017
Type Ph.D. thesis
Publication University of Massachusetts Amherst
Series Doctoral Dissertations, 1045, September
Abstract

Spreadsheets are a natural fit for data analysis, combining a simple data storage and presentation layer with a programming language and basic debugging tools.

Because spreadsheets are accessible and flexible, they are used by both novices and experts. Consequently, spreadsheets are hugely popular, with more than 750 million copies of Microsoft Excel installed worldwide. This popularity means that spreadsheets are the most popular programming language on the planet and the de facto tool for data analysis.

Nevertheless, spreadsheets do not address a number of important tasks in a typical analyst's pipeline, and their design frequently complicates them. This thesis describes three key challenges for analysts using spreadsheets:

  • Data wrangling is the process of converting or mapping data from a "raw" form into another form suitable for use with automated tools.
  • Data cleaning is the process of locating and correcting omitted or erroneous data.
  • Formula auditing is the process of finding and correcting spreadsheet program errors.

These three tasks combined are estimated to occupy more than three quarters of a data analyst's time. Furthermore, errors not caught during these steps have led to catastrophically bad decisions resulting in billions of dollars in losses. Advances in automated techniques for these tasks may result in dramatic savings in both time and money.

Three novel programming language-based techniques were created to address these key tasks:

  • The first, automatic layout transformation using examples, is a program synthesis-based technique that lets spreadsheet users perform data wrangling tasks automatically, at scale, and without programming.
  • The second, data debugging, is a technique for data cleaning that combines program analysis and statistical analysis to automatically find likely data errors.
  • The third, spatio-structural program analysis, unifies positional and dependence information and finds spreadsheet errors using a kind of anomaly analysis.

Each technique was implemented as an end-user tool (FlashRelate, CheckCell, and ExceLint, respectively) in the form of a point-and-click plugin for Microsoft Excel. Our evaluation demonstrates that these techniques substantially improve user efficiency.

Finally, because these tools build on each other in a complementary fashion, data analysts can run data wrangling, cleaning, and formula auditing tasks together in a single analysis pipeline.

Full version Available
Sample
ExceLint's regularity map

This figure shows ExceLint's regularity map visualization, which is built on top of spatio-structural analysis.

It is immediately apparent that something is unusual with the cells in column H. First, cell H57 is colored blue, which indicates that it is data like the cells found to its left. In fact, this cell should be a formula.

Cells H58, H59, and H62 stand out because they are colored orange. All of these cells exhibit an off-by-one reference error that instead computes the row total for the row one below.

Finally, cell H60 is colored yellow. This cell exhibits an off-by-two reference error: it computes the row total for the row two below.

When using the regularity map visualization, all of these problems immediately "pop out."
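
The intuition behind the figure can be shown with a minimal sketch. This is not ExceLint's actual algorithm, which is described here only as spatio-structural analysis. It is a simplified stand-in that fingerprints each cell by where its references sit relative to the cell itself, then flags every cell whose fingerprint differs from the most common one in the column. The cell contents below are hypothetical values chosen to mirror the errors described for column H, and the helper names (`fingerprint`, `flag_anomalies`) are illustrative.

```python
import re
from collections import Counter

# Matches simple A1-style references such as B57 (no sheet names, no $ anchors).
# Limitation of this sketch: a function name ending in digits, e.g. LOG10,
# would also match.
REF = re.compile(r"([A-Z]+)([0-9]+)")


def col_index(letters):
    """Convert a column name such as 'H' or 'AA' to a 1-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def fingerprint(address, content):
    """Describe a cell by its references, as (column, row) offsets from itself.

    Cells that do not hold a formula get the fingerprint "data".
    """
    if not content.startswith("="):
        return "data"
    own = REF.fullmatch(address)
    own_col, own_row = col_index(own.group(1)), int(own.group(2))
    return tuple(
        (col_index(c) - own_col, int(r) - own_row)
        for c, r in REF.findall(content)
    )


def flag_anomalies(cells):
    """Return the cells whose fingerprint differs from the most common one."""
    prints = {addr: fingerprint(addr, text) for addr, text in cells.items()}
    common, _ = Counter(prints.values()).most_common(1)[0]
    return {addr: fp for addr, fp in prints.items() if fp != common}


# Hypothetical contents for part of column H (assumed, not taken from the figure).
cells = {
    "H53": "=SUM(B53:G53)",
    "H54": "=SUM(B54:G54)",
    "H55": "=SUM(B55:G55)",
    "H56": "=SUM(B56:G56)",
    "H57": "1234",            # a constant where a formula is expected
    "H58": "=SUM(B59:G59)",   # sums the row one below
    "H59": "=SUM(B60:G60)",   # sums the row one below
    "H60": "=SUM(B62:G62)",   # sums the row two below
    "H61": "=SUM(B61:G61)",
    "H62": "=SUM(B63:G63)",   # sums the row one below
}

flagged = flag_anomalies(cells)
for addr, fp in flagged.items():
    print(addr, fp)
```

In this sketch the correct row totals share the fingerprint `((-6, 0), (-1, 0))`, meaning "from six columns left to one column left, same row." The flagged cells fall into three groups, analogous to the three colors in the figure: a data cell, a row offset of one, and a row offset of two.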
