METHODS Preferences were elicited using a titration variant of the standard gamble. Long-term test-retest reliability was measured on three sets of descriptions presented four months apart: heart failure (5 states), colorectal cancer (7 states) and eczema (3 states). Reliability coefficients were calculated for each health state description and for each panel member.
RESULTS Reliability coefficients were significant in 11 of the 15 health states and ranged from -0.060 (p=0.79) to 0.829 (p<0.01). For heart failure, all correlations were significant at p<0.01 and ranged from 0.710 to 0.829. For colorectal cancer, four of seven coefficients were significant and the range was wide: -0.060 (ns) to 0.724 (p=0.002). For eczema, two of the three coefficients were significant: range 0.279 (ns) to 0.670 (p=0.001). At the level of individual participants, reliability between episodes of preference elicitation was strikingly high. Although a small number of people showed low reliability, precision was very limited at this end of the scale, and the number of observations available per person was much smaller than in the analysis by health state. Nevertheless, reliability was generally high (mean 0.765, median 0.840); although the range was very wide (-0.060 to 0.998), values were concentrated at the high end (e.g. 60% were >0.8).
CONCLUSIONS Although it varied across health states, reliability was generally high, indicating that the panel generates values with acceptable reliability. The consistency shown in individuals' utilities, despite the long interval between measurements, is striking and may be relevant to debates about the completeness of preferences.