i-nth - Understanding and inferring units in spreadsheets

Authors

Jack Williams, Carina Negreanu, Andrew D. Gordon, & Advait Sarkar

Abstract

Numbers in spreadsheets often have units: metres, grams, dollars, etc. Spreadsheet cells typically cannot carry unit information, and even where they can, users may not be motivated to provide it.

However, unit information is extremely valuable: it allows us to detect and prevent an entire class of spreadsheet errors, such as accidentally adding values of different units. What if we could infer the unit of any value in a spreadsheet, with little or no work from the user?

We present a novel method for predicting units and dimensions in spreadsheets, the first such method that combines logical constraint solving and probabilistic unit labelling. Our approach identifies and formalises the critical cells in spreadsheets that bound the user cost of unit annotation.

Separately, we apply machine learning to infer probabilistic unit labels from cell text. To contextualise the accuracy of our system, we discuss the attention investment trade-off for unit inference.

Sample

The figure shows a spreadsheet with a potential unit error. The formula in cell B3 adds two quantities that appear to have different units, according to the user labels in cells A1 and A2.

How best to assist the user depends on our belief in the relative correctness of different parts of the spreadsheet, in particular the formula and the labels:

If we believe the formula is more likely to be correct than the labels, we could highlight the labels in A1 and A2 to the user as being potentially misleading.
If we believe the labels are more likely to be correct, we could highlight the formula as being potentially erroneous.
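
The choice between these two responses can be sketched in a few lines of Python. This is only an illustration of the trade-off described above, not the authors' method. The figure is not reproduced here, so the units ("metres", "grams") and the confidence values are assumptions. The names `LabelledCell` and `check_addition` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabelledCell:
    label_cell: str    # cell holding the text label, e.g. "A1"
    unit: str          # unit suggested by that label (assumed for illustration)
    confidence: float  # assumed belief that the label is correct, in [0, 1]


def check_addition(operands, formula_cell, formula_confidence):
    """Return the cells to highlight for an addition, or [] if the units agree."""
    units = {op.unit for op in operands}
    if len(units) <= 1:
        return []  # all operands share one unit: nothing to report

    # The labels are only as trustworthy as the least trusted one.
    label_confidence = min(op.confidence for op in operands)

    if formula_confidence > label_confidence:
        # Formula more likely correct: flag the labels as potentially misleading.
        return [op.label_cell for op in operands]
    # Labels at least as likely correct: flag the formula as potentially erroneous.
    return [formula_cell]


# Hypothetical operands for the formula in B3, labelled in A1 and A2.
operands = [LabelledCell("A1", "metres", 0.6), LabelledCell("A2", "grams", 0.6)]
print(check_addition(operands, "B3", formula_confidence=0.9))  # ['A1', 'A2']
print(check_addition(operands, "B3", formula_confidence=0.3))  # ['B3']
```

In this sketch a tie is resolved by flagging the formula. That is an arbitrary choice. A real system would need a principled way to estimate both confidences.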

Publication

2020, IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), August

Full article

Understanding and inferring units in spreadsheets