|Title||Detecting faulty empty cells in spreadsheets|
|Authors||Liang Xu, Shuo Wang, Wensheng Dou, Bo Yang, Chushu Gao, Jun Wei, Tao Huang|
|Publication||IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)|
Spreadsheets play an important role in various business tasks, such as financial reports and data analysis. In spreadsheets, empty cells are widely used for different purposes, e.g., separating different tables, or default value "0".
However, a user may delete a formula unintentionally, and leave a cell empty. Such ad-hoc modification may introduce a faulty empty cell that should have a formula.
We observe that the context of an empty cell can help determine whether the empty cell is faulty. For example, is the empty cell next to a cell array in which all cells share the same semantics? Does the empty cell have headers similar to other non-empty cells?
In this paper, we propose EmptyCheck, to detect faulty empty cells in spreadsheets. By analyzing the context of an empty cell, EmptyCheck validates whether the cell belong to a cell array. If yes, the empty cell is faulty since it does not contain a formula.
We evaluate EmptyCheck on 100 randomly sampled EUSES spreadsheets. The experimental result shows that EmptyCheck can detect faulty empty cells with high precision (75.00%) and recall (87.04%). Existing techniques can detect only 4.26% of the true faulty empty cells that EmptyCheck detects.
This is a real spreadsheet extracted from the EUSES corpus.
This spreadsheet contains many empty cells. We cluster them into two categories according to the degree of their harmfulness:
EmptyCheck marks the faulty empty cells with a red triangle.