i-nth logo

Authors

Bennett Kankuzi

Abstract

In this thesis, we tackled the problem of errors in spreadsheets by studying spreadsheet authors' mental models.

It is a common assertion that humans have mental models of the systems they interact with, and it is difficult to explain many aspects of human behaviour without resorting to a construct such as mental models.

We therefore argue that it is important to first of all understand what types of mental models spreadsheet authors possess when they are doing different spreadsheet tasks in order to better understand why the spreadsheet process is so error-prone and to be able to devise new tools that better correspond to the way they think.

In the first empirical study we conducted in this research work, we investigated and characterized mental models of spreadsheet authors as they are doing various spreadsheet tasks. We found that spreadsheet authors have (at least) three mental models of a spreadsheet:

  • The real world model that comprises general knowledge of the world around us.
  • The domain model that represents knowledge of the problem domain and the functionality of the spreadsheet in problem domain or application terms.
  • The spreadsheet model that codes the expressions and data relationships in the spreadsheet.

When explaining a spreadsheet, the real-world and domain mental models are prominent and the spreadsheet model is less evident, but when locating and fixing an error, one must constantly switch back and forth between the domain model and the spreadsheet model, which requires frequent use of the mapping between problem domain concepts and their spreadsheet model counterparts.

These results suggest that a tool intended to aid in comprehension and debugging of spreadsheets should make prominent real-world and problem domain concepts and map them easily to spreadsheet-specific details.

We thus developed and evaluated a spreadsheet visualization tool that demonstrates that it is possible to devise spreadsheet authoring and debugging tools that are easy to use and that correspond to spreadsheet authors' mental models of spreadsheets by relieving spreadsheet authors from spreadsheet details and letting them utilize more of their mental model of the application or problem domain. The tool translates traditional spreadsheet formulae into problem domain narratives and highlights referenced cells.

The tool was evaluated in the second empirical study of this research work and was found to be easy to learn and helped spreadsheet authors to locate more errors in spreadsheets. Furthermore, the tool increased the use of the domain mental model when spreadsheet authors were describing errors and seemed to improve the mapping between the spreadsheet model and the domain model which is crucial in spreadsheet debugging and comprehension.

We have also put forward a case for the need to shift from the traditional spreadsheet paradigm to another paradigm in which a spreadsheet author should also be able to debug a spreadsheet in problem domain terms rather than just using traditional spreadsheet cell references. In this proposed paradigm, a spreadsheet author should also ideally be allowed to create spreadsheets by writing formulae in domain terms.

We thus also developed a prototype spreadsheet visualization tool that allows spreadsheet authors to fix errors in a spreadsheet using domain terms. The tool was evaluated in the third empirical study of this research work and was found to promote spreadsheet authors to think more in domain terms in a way that overshadows the way they traditionally think when they are describing errors and even when fixing errors, hence promoting a paradigm shift from traditional spreadsheets.

Sample

An illustration of different views of a spreadsheet
An illustration of different views of a spreadsheet

A spreadsheet is usually perceived only as a two-dimensional grid of cells populated mainly with numerical values although every spreadsheet has a formula view as well as an underlying data-flow graph.

A data-flow graph represents the network-structure of cell dependencies expressed by the references in the individual formulae. However, the data-flow graph is normally "hidden" from the spreadsheet developer.

It is therefore not surprising that most spreadsheet developers superficially view a spreadsheet as a word processor for numbers and not necessarily as a complex data-flow graph that spreadsheets really are.

Publication

2015, Ph.D thesis, University of Eastern Finland, Dissertations in Forestry and Natural Sciences No. 183, August

Full article

Deficiencies in spreadsheets: A mental model perspective

Also see

A summary of my PhD thesis in not so technical language