Candidate for the Lee B. Lusted Student Prize Competition
Purpose: To explore the extent to which applying a common scoring procedure improves the comparability of EQ-5D and SF-6D responses. Poor agreement between preference-based health-related quality of life instruments has been widely reported across patient and community-based samples. Between-measure discrepancies can be attributed to the descriptive systems of the respective instruments, the valuation techniques used to derive preference weights, or a combination of the two. Research comparing different valuation techniques (e.g. time trade-off (TTO) versus standard gamble (SG)) has demonstrated systematic differences in the resulting index scores. Owing to considerable methodological challenges, little research has attempted to isolate the effect of different descriptive systems on the comparability of index scores.
Method: Scoring algorithms for the EQ-5D and SF-6D were generated using the same discrete choice experiment (DCE) approach in an Australian-representative online sample. Empirical analysis of the relationship between index scores comprised descriptive statistics, assessment of agreement (Bland-Altman plots, intraclass correlation coefficient (ICC)) and exploratory ordinary least squares regressions. The comparative assessment uses the same dataset that was used to compare TTO-derived EQ-5D scores and SG-derived SF-6D scores across 7 patient/population groups, reported by Brazier and colleagues in 2004 (n=2112). This analytic framework enables direct comparison of scenarios where both the descriptive and valuation systems differ (2004 study) and where only the descriptive systems differ (current study).
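As an illustration of the agreement statistics named in the Method, the following is a minimal Python sketch, not the authors' analysis code. It assumes paired index scores held in two equal-length lists. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is an assumption. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (mean difference, lower limit, upper limit) for paired scores.

    Limits of agreement are the mean difference +/- 1.96 sample SDs of the
    paired differences (the conventional Bland-Altman definition).
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(a, b):
    """ICC(2,1) for two measures per respondent (assumed form, see lead-in)."""
    a, b = list(a), list(b)
    n, k = len(a), 2                      # n respondents, k = 2 instruments
    grand = mean(a + b)
    row_means = [(x + y) / 2 for x, y in zip(a, b)]
    col_means = [mean(a), mean(b)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in a + b)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-respondent mean square
    msc = ss_cols / (k - 1)               # between-instrument mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    eq5d = [0.90, 0.80, 0.70, 0.60]
    sf6d = [0.70, 0.60, 0.50, 0.40]
    print(bland_altman(eq5d, sf6d))
    print(icc_2_1(eq5d, sf6d))
```

Because this ICC form measures absolute agreement, a constant offset between two instruments lowers it even when the scores are perfectly correlated, which is why it is usually reported alongside the Bland-Altman mean difference.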
Result: DCE-derived EQ-5D scores were consistently higher than DCE-derived SF-6D scores, with mean differences exceeding 0.17 in every patient/population sample. The ICC for the whole sample was 0.557, indicating ‘fair’ agreement, and ranged from 0.373 to 0.638 within the subsamples. Corresponding TTO/SG results: mean scores were within 0.10 of each other in all 7 subsamples (with mean SF-6D scores greater than mean EQ-5D scores in 6 of 7 subsamples); whole-sample ICC = 0.522 (ranging from 0.352 to 0.547).
Conclusion: A common scoring procedure did not reduce the level of disagreement between EQ-5D and SF-6D responses, indicating that the instruments provide substantially different ways for respondents to describe their health state. Accordingly, poor agreement between the instruments is inevitable. Normative unknowns relating to the descriptive components of preference-based measures (e.g. conceptual framing of questions and response options, length of recall) require further attention.
Reference: Brazier J, et al. Health Econ. 2004; 13(9): 873-84.