Candidate for the Lee B. Lusted Student Prize Competition
Purpose: Accurate biomarkers for routinely assessing the effectiveness of treatments for chronic diseases are not always available. Therapies for such conditions are often effective only in a subgroup of patients, where effectiveness is indicated by the rate of disease relapse or death and may or may not be correlated with quality of life on treatment. Medical decision makers must therefore decide which treatments to prescribe for such diseases based on noisy, subjective feedback from patients. We develop a model for choosing between two treatments with stochastic effectiveness: a safe treatment whose effectiveness distribution is known, and a risky treatment that could be either superior or inferior to the safe treatment, although which is the case is unknown a priori.
Methods: We develop a Bayesian, continuous-time, two-armed bandit model in which the alternative treatments yield Brownian rewards (health utilities). We model disease relapses as Poisson processes. If the risky treatment is superior, it leads to relapses at a lower rate than the safe treatment. Unlike classic bandit models, which maximize rewards over a deterministic time horizon that is not influenced by the decisions, our objective is to maximize the patient's cumulative discounted health utilities over the random time interval that ends with the first disease relapse or death, which depends stochastically on the chosen treatment. We apply the model to the problem of choosing between symptom management (safe treatment) and disease-modifying agents (DMA) (risky treatment) for patients with multiple sclerosis (MS), where DMA is considered superior if the patient is a treatment responder and inferior if she is a non-responder.
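The belief updating implied by this setup can be illustrated with a minimal sketch. It is a discrete-time approximation of the continuous-time model, not the closed-form analysis itself. The function name `update_belief` and all parameters (`mu_good`, `mu_bad`, `sigma`, `lam_good`, `lam_bad`, `dt`) are hypothetical and chosen only for illustration. Over a short interval of length `dt` on the risky treatment, the reward increment is assumed to be Gaussian with a drift that depends on the treatment type, and a relapse is assumed to occur according to a Poisson process with a type-dependent rate.

```python
import math


def update_belief(p, dx, dt, mu_good, mu_bad, sigma, lam_good, lam_bad,
                  relapse=False):
    """One Bayes step for p = P(risky treatment is good).

    Assumed (hypothetical) observation model over an interval of length dt:
      - reward increment dx ~ Normal(mu * dt, sigma^2 * dt)
      - relapse occurs with probability 1 - exp(-lam * dt)
    where (mu, lam) depend on whether the risky treatment is good or bad.
    """

    def likelihood(mu, lam):
        # Gaussian density of the observed reward increment.
        var = sigma ** 2 * dt
        gauss = math.exp(-(dx - mu * dt) ** 2 / (2 * var)) / math.sqrt(
            2 * math.pi * var
        )
        # Probability of the observed relapse outcome under a Poisson process.
        p_no_relapse = math.exp(-lam * dt)
        event = (1 - p_no_relapse) if relapse else p_no_relapse
        return gauss * event

    like_good = likelihood(mu_good, lam_good)
    like_bad = likelihood(mu_bad, lam_bad)
    # Bayes' rule: posterior is proportional to prior times likelihood.
    return p * like_good / (p * like_good + (1 - p) * like_bad)
```

In this sketch, a reward increment close to the good-type drift, or an interval without relapse when the good type has the lower relapse rate, moves the belief upward. An observed relapse moves it downward.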
Results: We find a closed-form analytical solution that provides a threshold probability on the current belief that the risky treatment is good; below this threshold, it is optimal to choose the safe treatment. For MS, if we assume a constant negative shift in quality of life on DMA due to side effects and maximize cumulative discounted quality-adjusted life years, the optimal threshold for acceptance is 25.8%. If we do not consider DMA side effects and only maximize the discounted time until the next relapse or death, the threshold becomes 6.3%.
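The resulting decision rule can be sketched as a simple threshold comparison. The function name `choose_treatment` is hypothetical, the two thresholds are the MS values reported above, and the handling of a belief exactly equal to the threshold (assigned here to the risky treatment) is an assumption made only for illustration.

```python
# Thresholds reported above for the MS application.
THRESHOLD_QALY = 0.258          # maximizing discounted QALYs, with DMA side effects
THRESHOLD_RELAPSE_TIME = 0.063  # maximizing discounted time to relapse or death


def choose_treatment(belief_good, threshold):
    """Return 'safe' if the belief is below the threshold, else 'risky'.

    belief_good is the current probability that the risky treatment is good.
    Assumption: a belief exactly at the threshold selects the risky treatment.
    """
    if not 0.0 <= belief_good <= 1.0:
        raise ValueError("belief_good must be a probability in [0, 1]")
    return "safe" if belief_good < threshold else "risky"
```

For example, a belief of 0.10 that the patient is a responder falls below the 25.8% threshold but above the 6.3% threshold, so the two objectives recommend different treatments.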
Conclusions: By optimally balancing the rewards that the patient receives and the amount of information acquired about the treatment, our model can inform treatment decisions for chronic diseases where patients have unknown responsiveness to treatment.