Authors

Kyle Dewey

Abstract

Within the biological sciences, spreadsheets are commonly used as a data entry and storage medium. While this practice is simple and generally well understood, the unrestrained flexibility of the spreadsheet medium allows errors to accumulate and potentially propagate. Such errors impede accurate analysis, hindering research.

The underlying problem is that the error correction facilities of typical spreadsheet programs are lackluster at best, if they exist at all. For this reason, Error Sentinel was developed. Error Sentinel is a spreadsheet program with programmable error correction facilities. These facilities allow users to define exactly what clean data is, along with corrections for erroneous data. Such rules are specified via a custom visual programming language.

Once error correction rules are written, users inputting data need not be familiar with the rules or even have programming skills in order to utilize them. Error Sentinel can be used interactively like a typical spreadsheet program, or non-interactively as with more traditional error correction techniques.

To test its real-world capabilities, Error Sentinel was successfully applied to the correction of the mtHaplogroups data set. This application showed that Error Sentinel requires far less time and code to perform error correction than previous methods. Benchmarking showed that these gains come at only a modest cost in performance.

While Error Sentinel appears quite simple compared to typical spreadsheet programs, its error correction facilities are robust, and it can be applied to arbitrary data sets represented in the spreadsheet medium.

Sample

Example of a correction in Error Sentinel

This example shows a selected cell with an available correction. In this case, "J2A" is not valid, but "J2a" is.
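
The idea behind such a correction can be sketched in ordinary code. Error Sentinel itself expresses rules in a custom visual programming language, so the Python below is only an illustration of the concept, not the program's actual rule syntax. The names `VALID_VALUES`, `is_valid`, and `suggest_correction` are hypothetical, and apart from "J2a" the listed values are placeholders rather than entries taken from the mtHaplogroups data set.

```python
from typing import Optional

# Hypothetical set of values considered clean. Only "J2a" comes from the
# example above; the other entries are placeholders for illustration.
VALID_VALUES = frozenset({"J2a", "J2b", "H1"})


def is_valid(cell: str, valid: frozenset = VALID_VALUES) -> bool:
    """A cell is clean if it exactly matches one of the valid values."""
    return cell in valid


def suggest_correction(cell: str, valid: frozenset = VALID_VALUES) -> Optional[str]:
    """Suggest a replacement for an invalid cell, or None if there is none.

    Assumed rule: if the cell differs from exactly one valid value only by
    letter case or surrounding whitespace, offer that valid value.
    """
    if is_valid(cell, valid):
        return None  # already clean, nothing to correct
    normalized = cell.strip().lower()
    matches = [v for v in valid if v.lower() == normalized]
    # Only suggest when the match is unambiguous.
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    for value in ["J2A", "J2a", "XYZ"]:
        print(value, is_valid(value), suggest_correction(value))
```

In this sketch the validity check and the correction are kept separate, mirroring the abstract's distinction between defining what clean data is and defining corrections for erroneous data. A cell such as "XYZ" is flagged as invalid but receives no suggestion.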

Publication

Master's thesis, Rochester Institute of Technology, June 2011

Full article

Error Sentinel: A rule-based spreadsheet program for intelligent data entry, error correction, and curation