The untested spreadsheet is as dangerous and untrustworthy as an untested program.
Price (2006)
A significant proportion of spreadsheets have severe quality problems.
Ayalew (2007)
60% of large companies feel 'Spreadsheet Hell' describes their reliance on spreadsheets.
Murphy (2007)
Spreadsheets can be viewed as a highly flexible programming environment for end users.
Abreu, et al (2015)
Despite overwhelming and unanimous evidence... companies have continued to ignore spreadsheet error risks.
Panko (2014)
Spreadsheet errors are pervasive, stubborn, ubiquitous and complex.
Irons (2003)
Spreadsheet errors have resulted in huge financial losses.
Abraham & Erwig (2007)
Spreadsheet shortcomings can significantly hamper an organization's business operation.
Reschenhofer & Matthes (2015)
It is irrational to expect large error-free spreadsheets.
Panko (2013)
People tend to believe their spreadsheets are more accurate than they really are.
Caulkins, Morrison, & Weidemann (2006)
...few incidents of spreadsheet errors are made public and these are usually not revealed by choice.
Kruck & Sheetz (2001)
Spreadsheets are easy to use and very hard to check.
Chen & Chan (2000)
Most executives do not really check or verify the accuracy or validity of [their] spreadsheets...
Teo & Tan (1999)
1% of all formulas in operational spreadsheets are in error.
Powell, Baker, & Lawson (2009)
The quality and reliability of spreadsheets is known to be poor.
Bishop & McDaid (2007)
Every study that has looked for errors has found them... in considerable abundance.
Panko & Halverson (1996)
Research on spreadsheet errors is substantial, compelling, and unanimous.
Panko (2015)
Spreadsheet development must embrace extensive testing in order to be taken seriously as a profession.
Bock (2016)
Studies have shown that there is a high incidence of errors in spreadsheets.
Csernoch & Biro (2013)
Untested spreadsheets are riddled with errors.
Miller (2005)
Spreadsheets are often hard, if not impossible, to understand.
Mireault & Gresham (2015)
Your spreadsheets may be disasters in the making.
Caulkins, Morrison, & Weidemann (2006)
Spreadsheets are dangerous to their authors and others.
Durusau & Hunting (2015)
Even obvious, elementary errors in very simple, clearly documented spreadsheets are... difficult to find.
Galletta, et al (1993)
Spreadsheets are alarmingly error-prone to write.
Paine (2001)
The software that end users are creating... is riddled with errors.
Burnett & Myers (2014)
Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment.
Abreu, et al (2015)
Spreadsheets are notoriously error-prone.
Cunha, et al (2011)
Spreadsheets contain errors at an alarmingly high rate.
Abraham, et al (2005)
Spreadsheets... pose a greater threat to your business than almost anything you can imagine.
Howard (2005)
Developing an error-free spreadsheet has been a problem since the beginning of end-user computing.
Mireault (2015)
Spreadsheets have a notoriously high number of faults.
Rust, et al (2006)
Most large spreadsheets have dozens or even hundreds of errors.
Panko & Ordway (2005)
Spreadsheet errors... a great, often unrecognised, risk to corporate decision making & financial integrity.
Chadwick (2002)
Spreadsheets are more fault-prone than other software.
Kulesz & Ostberg (2013)
Spreadsheets are extraordinarily and unacceptably prone to error.
Dunn (2010)
Never assume a spreadsheet is right, even your own.
Raffensperger (2001)
Overconfidence is one of the most substantial causes of spreadsheet errors.
Sakal, et al (2015)
Spreadsheet errors are still the rule rather than the exception.
Nixon & O'Hara (2010)
A lot of decisions are being made on the basis of some bad numbers.
Ross (1996)
The results given by spreadsheets are often just wrong.
Sajaniemi (1998)
Programmers exhibit unwarranted confidence in the correctness of their spreadsheets.
Krishna, et al (2001)
94% of the 88 spreadsheets audited in 7 studies have contained errors.
Panko (2008)
It is now widely accepted that errors in spreadsheets are both common and potentially dangerous.
Nixon & O'Hara (2010)
Every study, without exception, has found error rates much higher than organizations would wish to tolerate.
Panko (1999)
Errors in spreadsheets are as ubiquitous as spreadsheets themselves.
Colbenz (2005)
Errors in spreadsheets... result in incorrect decisions being made and significant losses incurred.
Beaman, et al (2005)
Spreadsheets are the most popular live programming environments, but they are also notoriously fault-prone.
Hermans & van der Storm (2015)
Spreadsheets are commonly used and commonly flawed.
Caulkins, Morrison, & Weidemann (2008)
The issue is not whether there is an error but how many errors there are and how serious they are.
Panko (2007)

Spreadsheet bibliography

Title Spreadsheet tools for data analysts
Authors Daniel W. Barowy
Year 2017
Type Ph.D. thesis
Publication University of Massachusetts Amherst
Series Doctoral Dissertations, 1045, September
Abstract

Spreadsheets are a natural fit for data analysis, combining a simple data storage and presentation layer with a programming language and basic debugging tools.

Because spreadsheets are accessible and flexible, they are used by both novices and experts. Consequently, spreadsheets are hugely popular, with more than 750 million copies of Microsoft Excel installed worldwide. This popularity means that spreadsheets are the most popular programming language on the planet and the de facto tool for data analysis.

Nevertheless, spreadsheets do not address a number of important tasks in a typical analyst's pipeline, and their design frequently complicates them. This thesis describes three key challenges for analysts using spreadsheets:

  1) Data wrangling is the process of converting or mapping data from a "raw" form into another form suitable for use with automated tools.
  2) Data cleaning is the process of locating and correcting omitted or erroneous data.
  3) Formula auditing is the process of finding and correcting spreadsheet program errors.

These three tasks combined are estimated to occupy more than three quarters of a data analyst's time. Furthermore, errors not caught during these steps have led to catastrophically bad decisions resulting in billions of dollars in losses. Advances in automated techniques for these tasks may result in dramatic savings in both time and money.

Three novel programming language-based techniques were created to address these key tasks:

  • The first, automatic layout transformation using examples, is a program synthesis-based technique that lets spreadsheet users perform data wrangling tasks automatically, at scale, and without programming.
  • The second, data debugging, is a technique for data cleaning that combines program analysis and statistical analysis to automatically find likely data errors.
  • The third, spatio-structural program analysis, unifies positional and dependence information and finds spreadsheet errors using a kind of anomaly analysis.

Each technique was implemented as an end-user tool (FlashRelate, CheckCell, and ExceLint, respectively) in the form of a point-and-click plugin for Microsoft Excel. Our evaluation demonstrates that these techniques substantially improve user efficiency.

Finally, because these tools build on each other in a complementary fashion, data analysts can run data wrangling, cleaning, and formula auditing tasks together in a single analysis pipeline.
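To make the idea of data debugging more concrete, the following is a minimal sketch in Python, not CheckCell's actual algorithm. It assumes a simple leave-one-out heuristic: an input is suspicious if removing it changes a formula's result far more than removing a typical input does. The function names, the use of the mean as the formula, and the threshold factor are all hypothetical choices made for illustration.

```python
from statistics import mean, median


def impact_scores(values, f=mean):
    """Leave-one-out impact: how far f moves when each input is dropped."""
    baseline = f(values)
    return [
        abs(f(values[:i] + values[i + 1:]) - baseline)
        for i in range(len(values))
    ]


def flag_suspects(values, f=mean, factor=3.0):
    """Return indices whose impact exceeds `factor` times the median impact.

    `factor` is an arbitrary illustrative threshold, not a tuned value.
    """
    scores = impact_scores(values, f)
    typical = median(scores)
    return [i for i, s in enumerate(scores) if s > 0 and s > factor * typical]


# Hypothetical inputs to an AVERAGE-style formula; 99.0 looks like a typo for 9.9.
readings = [10.2, 9.8, 10.1, 99.0, 10.0, 9.9]
print(flag_suspects(readings))  # -> [3]
```

The sketch captures only the general combination described in the abstract: the formula tells us which inputs matter (program analysis), and the distribution of impacts tells us which of them are unusual (statistical analysis).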

Full version Available
Sample
ExceLint's regularity map

This figure shows ExceLint's regularity map visualization, which is built on top of spatio-structural analysis.

It is immediately apparent that something is unusual with the cells in column H. First, cell H57 is colored blue, which indicates that it is data like the cells found to its left. In fact, this cell should be a formula.

Cells H58, H59, and H62 stand out because they are colored orange. All of these cells exhibit an off-by-one reference error that instead computes the row total for the row one below.

Finally, cell H60 is colored yellow. This cell exhibits an off-by-two reference error that computes the row total for the row two below.

When using the regularity map visualization, all of these problems immediately "pop out."
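The intuition behind the colors can be sketched in a few lines of Python. This is an illustrative simplification, not ExceLint's implementation: it assumes each cell can be summarized by a "fingerprint" (either `"data"` or the set of row offsets between the formula and the cells it references) and that cells whose fingerprint differs from the dominant one deserve a second look. The column contents below are hypothetical, loosely modeled on the figure description, and the regular expression handles only simple A1-style references.

```python
import re
from collections import Counter

# Matches simple A1-style references such as B55 or $G$60 (assumption:
# no sheet names, no function names that end in digits).
REF = re.compile(r"\$?([A-Z]+)\$?(\d+)")

# Hypothetical contents of column H; keys are row numbers.
column_h = {
    53: "=SUM(B53:G53)",
    54: "=SUM(B54:G54)",
    55: "=SUM(B55:G55)",
    56: "=SUM(B56:G56)",
    57: "1234",            # a constant where a formula is expected
    58: "=SUM(B59:G59)",   # off by one
    59: "=SUM(B60:G60)",   # off by one
    60: "=SUM(B62:G62)",   # off by two
    61: "=SUM(B61:G61)",
    62: "=SUM(B63:G63)",   # off by one
    63: "=SUM(B63:G63)",
    64: "=SUM(B64:G64)",
}


def fingerprint(row, content):
    """Summarize a cell as 'data' or as its sorted relative row offsets."""
    if not content.startswith("="):
        return "data"
    return tuple(sorted({int(r) - row for _, r in REF.findall(content)}))


def find_anomalies(column):
    """Return the cells whose fingerprint differs from the most common one."""
    prints = {row: fingerprint(row, text) for row, text in column.items()}
    dominant, _ = Counter(prints.values()).most_common(1)[0]
    return {row: fp for row, fp in prints.items() if fp != dominant}


for row, fp in find_anomalies(column_h).items():
    print(f"H{row}: {fp}")
```

In this toy example the dominant fingerprint is `(0,)`, a formula that sums its own row, so H57 is reported as data, H58, H59, and H62 as offset `(1,)`, and H60 as offset `(2,)`. Assigning one color per distinct fingerprint gives a rough analogue of the regularity map.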
