Daniel W. Barowy, Emery D. Berger, & Benjamin Zorn
Spreadsheets are among the most widely used programming environments and are heavily deployed in domains such as finance, where errors can have catastrophic consequences.
We present a static analysis specifically designed to find spreadsheet formula errors.
Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that are especially surprising disruptions to nearby rectangular regions.
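To make the information-theoretic idea concrete, here is a minimal sketch in Python. It is not ExceLint's actual algorithm. It assumes each cell in a column has already been reduced to a hypothetical "shape" label ("A" or "B"), measures the Shannon entropy of the label distribution, and looks for the single cell whose relabeling to match an adjacent cell lowers that entropy the most. The function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def most_surprising(column):
    """Find the cell whose relabeling to an adjacent cell's label
    reduces entropy the most. Returns (index, entropy_drop)."""
    base = entropy(column)
    best = (None, 0.0)
    for i, label in enumerate(column):
        for j in (i - 1, i + 1):
            if 0 <= j < len(column) and column[j] != label:
                # Hypothetical fix: make cell i look like its neighbor j.
                trial = column[:i] + [column[j]] + column[i + 1:]
                drop = base - entropy(trial)
                if drop > best[1]:
                    best = (i, drop)
    return best


# Toy column (invented): five cells share one formula shape, one does not.
column = ["A", "A", "A", "B", "A", "A"]
index, drop = most_surprising(column)
print(index, round(drop, 3))
```

In this toy column the lone "B" cell is the one reported, since changing it to "A" makes the column uniform and brings the entropy down to zero.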
We present ExceLint, an implementation of our static analysis for Microsoft Excel. We demonstrate that ExceLint is fast and effective: across a corpus of 70 spreadsheets, ExceLint takes a median of 5 seconds per spreadsheet, and it significantly outperforms the state-of-the-art analysis.
Since there are many possible layouts and because user intent is impossible to know, ExceLint uses simplicity as a proxy: the simplest layout that fits the data is most likely the intended layout.
In this setting, formula errors manifest as aberrations in the rectangular layout.
For example, the formula shown in cell J30 has an off-by-one error that omits a row.
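The following Python sketch illustrates how an off-by-one error of this kind can show up as a break in an otherwise regular layout. The cell contents are invented for illustration and are not taken from the paper's figure. It assumes a row of `SUM` totals, expresses each formula's range as offsets relative to the cell holding the formula, and flags any cell whose offsets differ from the majority. This majority vote is a simplification standing in for the paper's analysis, and the helper names are hypothetical.

```python
import re
from collections import Counter


def col_to_num(col):
    """Convert a column name such as 'J' or 'AA' to a 1-based number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell(addr):
    """Split an address such as 'J30' into (column_number, row_number)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", addr)
    return col_to_num(m.group(1)), int(m.group(2))


def relative_range(formula_cell, formula):
    """Offsets of the SUM range's two corners relative to the formula's cell."""
    m = re.fullmatch(r"=SUM\(([A-Z]+\d+):([A-Z]+\d+)\)", formula)
    fc, fr = parse_cell(formula_cell)
    (c1, r1), (c2, r2) = parse_cell(m.group(1)), parse_cell(m.group(2))
    return (c1 - fc, r1 - fr, c2 - fc, r2 - fr)


# Hypothetical row of totals; J30 stops one row short of its neighbors.
formulas = {
    "G30": "=SUM(G10:G29)",
    "H30": "=SUM(H10:H29)",
    "I30": "=SUM(I10:I29)",
    "J30": "=SUM(J10:J28)",
}

shapes = {cell: relative_range(cell, f) for cell, f in formulas.items()}
majority, _ = Counter(shapes.values()).most_common(1)[0]
outliers = [cell for cell, shape in shapes.items() if shape != majority]
print(outliers)
```

Comparing relative offsets rather than raw addresses is what lets formulas in different columns count as "the same": G30, H30, and I30 all reduce to the same tuple, so only J30 stands out.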
Proceedings of the ACM on Programming Languages, Volume 2, Issue OOPSLA, Article 148, November 2018, pages 1-26