STRUCTURAL VALIDATION TO MINIMIZE ERRORS IN SPREADSHEET MODELS

Tuesday, October 21, 2014
Poster Board # PS3-21

Stephanie L. Bailey, PhD and April D. Kimmel, PhD, Virginia Commonwealth University School of Medicine, Richmond, VA

Purpose: To improve transparency and accessibility to decision makers, mathematical models applied in resource-limited settings increasingly are implemented using spreadsheet software. However, model implementation using spreadsheets is prone to programming error. Structural validation of spreadsheet models can facilitate error identification and minimize errors. We examined the impact of using parallel model structures to reduce model error.

Method: We used Microsoft Excel (Redmond, WA) to develop alternative structures of a multi-cohort, state-transition model of HIV disease in Haiti. “Single Cell” is a compact form, with all transitions to a specific health state characterized in a single cell formula and which relies on individual cell references. “Single Name” retains the same structure and format, but relies on cell names. “Matrix” uses transition probability matrices and cell names. We performed a case study representing typical changes to operational spreadsheet models. This involved adding new health states, as well as new transitions from existing and new health states, to better reflect re-entry into care in Haiti. Outcome measures included programmer time to make changes; number of original, non-copied errors; and programmer time required to identify and fix errors, all by model structure. Main error classifications included: mistakes in logic, incorrect cell ranges, incorrect cell references, incorrectly copied formulae, and misuse of built-in functions. We preliminarily evaluated the impact of errors as the percentage of total cells containing errors and the number of errors per total programmer time.

Result: Revisions to Single Cell yielded the greatest number of detected errors but the shortest total programmer time. In contrast, revisions to Matrix yielded the fewest detected errors but the longest total programmer time (Table 1). The percentage of total cells with errors was 1.1% (Single Cell), 0.9% (Single Name), and 0.03% (Matrix), with approximately 5.3, 3.2, and 1.5 errors, respectively, for each hour of programmer time. Results were sensitive to the type of changes made. When we also linked and operationalized a user interface to each model version, we identified no errors.

Conclusion: While using parallel model structures may minimize errors in spreadsheet models, trade-offs exist across different model structures between programmer time and susceptibility to error. Spreadsheet models should use more than one model structure to reduce potential error. Next steps include evaluating the impact of errors on model predictions.