i-nth logo
Data clones

Guidelines:

  • Each input, assumption, and data value is entered in only one location.
  • If values are copied, then clearly mark the copies.

Duplicating values, also known as cloning, can cause many problems. To avoid those problems, ensure that data, assumptions, and inputs appear only once and are then referenced.

Types of cloned data

Cloning of data can occur in many ways, including:

  • Data is copied and pasted from one part of a workbook to another.
  • Data is copied and pasted from one workbook to another workbook.
  • The same data is entered in multiple cells.
  • The same data is entered in multiple formulae.
  • The same data is entered in cells and formulae.
  • Formula cells are copied and pasted as values elsewhere.

Problems caused by data cloning

Potential problems caused by data cloning include:

  • Duplicates may have different values.
  • Updating the data may be problematic, as the user cannot be confident that all instances have been found and updated correctly.
  • Results may be incorrect, due to inconsistent data being used in different parts of the workbook.

Data clones might be in the same workbook, or in different workbooks (which may or may not be linked). Having data clones in different workbooks is especially problematic, as they are exceedingly difficult to detect.

A special case of data cloning occurs when we have data in a cell and the same data hard coded in formulae. This situation can also be difficult to detect.

But sometimes we need to copy results

Yes, we may need to copy a block of results, possibly along with their data, so we can (for example) compare scenarios. If so, then clearly mark the cells as being copies – using formatting and headings.

The idea is to make it obvious that the cells are copies, so they must not be used instead of the original values.

Example of a cloned assumption

When we're creating or changing a workbook, it is easy to forget that we already have a specific assumption or piece of data. Therefore, when we need the data in a different location, we may create it again. This duplication is even more likely to occur when different people work on a workbook.

The image shows a sales and advertising model. The model consists of two worksheets:

  • Sales. Calculates the projected number of sales, given an assumption about the number of sales prospects and the annual sales rate (packets) per person. The Sales Manager prepared this worksheet.
  • Advertising. Calculates the advertising budget, given an assumption about the market size and the annual advertising spend per person. The Marketing Manager prepared this worksheet.
Sales and advertising budgets
Clone example

Each manager made their estimates without speaking to the other manager, though the Advertising Manager used the annual sales budget from the Sales worksheet.

As is often the case, the two managers use different terminology. That is, in this workbook, "Prospects" and "Market size" are synonyms. Therefore, the managers have inadvertently created a data clone. Since they made different assumptions, the model produces incorrect results.

If the Advertising worksheet is corrected to refer to the market size assumption on the Sales worksheet, then the advertising budget would be $156,000 pa rather than $195,000 pa – a substantial error.

In this example, the data clone is obvious because the two worksheets are simple and have similar structure. But often data clones are not so obvious – the clones could be in separate workbooks, with neither manager aware of the other's assumption. Using a good structure, with all data and assumptions together in a Data worksheet, can help make data clones easier to identify.

A subtle variation of this example would occur if the two managers used the same numerical assumption, say sales of $156,000 pa, but still have separate assumption cells. If the expected sales are later revised to be $200,000 pa, there is a risk that only one of the assumptions is updated. We would then have the situation where the Sales and Advertising budgets are inconsistent.