i-nth - Combining spreadsheet smells for improved fault prediction

Authors

Patrick Koch, Konstantin Schekotihin, Dietmar Jannach, Birgit Hofer, Franz Wotawa, & Thomas Schmitz

Abstract

Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making.

Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells.

Smells can in particular be used for the task of fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited.

In this work we therefore propose a machine-learning-based approach that combines the predictions of individual smells using an AdaBoost ensemble classifier.
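To make the idea of combining smells concrete, here is a minimal sketch of discrete AdaBoost over binary smell flags. It is not the authors' implementation: the three smell flags, the toy data, and the function names (`stump_predict`, `train_adaboost`, `predict`) are all assumptions made for illustration. Each weak learner is a "stump" that votes based on one smell only, and the ensemble takes a weighted vote.

```python
import math

# Hypothetical toy data: each row holds three binary smell flags for one cell.
# A cell is labelled faulty (+1) when at least two smells fire, otherwise clean (-1).
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else -1 for x in X]


def stump_predict(stump, x):
    """A stump votes +1 (faulty) or -1 (clean) based on a single smell flag."""
    index, polarity = stump
    return polarity if x[index] == 1 else -polarity


def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost with one-smell stumps as weak learners."""
    n = len(X)
    weights = [1.0 / n] * n
    stumps = [(j, p) for j in range(len(X[0])) for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error on the current weights.
        best, best_err = None, float("inf")
        for s in stumps:
            err = sum(w for w, xi, yi in zip(weights, X, y)
                      if stump_predict(s, xi) != yi)
            if err < best_err:
                best, best_err = s, err
        if best_err >= 0.5:
            break  # no stump is better than chance; stop early
        best_err = max(best_err, 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified cells, decrease the rest.
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; a positive score means 'faulty'."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1


ensemble = train_adaboost(X, y, rounds=3)
predictions = [predict(ensemble, x) for x in X]
```

On this toy data, any single smell flag agrees with the labels on only 6 of the 8 rows, while the three-round ensemble reproduces all 8 labels. Real smell detectors produce noisier signals, so this only illustrates the mechanism, not the reported results.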

Experiments on two public datasets containing real-world spreadsheet faults show significant improvements in terms of fault prediction accuracy.

Sample

Precision-recall performance for the Enron Errors Corpus

The proposed ensemble learning approach, based on AdaBoost, significantly outperforms the baseline techniques.

Many of the individual smells have limited predictive power when used in isolation, leading to low recall and precision values.
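As a reminder of how these two measures are computed, the sketch below evaluates one smell used in isolation. Precision is TP / (TP + FP) and recall is TP / (TP + FN). The labels and smell flags are invented for illustration and are not taken from the Enron Errors Corpus; the function name `precision_recall` is likewise an assumption.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of 0/1 flags."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 1 marks a faulty cell (actual) or a flagged cell (smell).
actual_faults = [1, 1, 1, 0, 0, 0, 0, 0]
smell_flags = [1, 0, 0, 1, 1, 0, 0, 0]

precision, recall = precision_recall(smell_flags, actual_faults)
```

Here the smell flags three cells, only one of which is actually faulty, and it misses two of the three faults, so both precision and recall come out at one third.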

Overall, the use of isolated smells and their simple combinations is not very helpful for fault prediction, whereas combining them as proposed in this work leads to substantially higher predictive power.

This indicates that actual faults in spreadsheets emerge from a combination of specific deficiencies which are difficult to capture by means of simple metric thresholds.

Publication

40th International Conference on Software Engineering (ICSE), May 2018

Full article

Combining spreadsheet smells for improved fault prediction