Purpose: Markov decision processes (MDPs) are increasingly used in medical decision making to optimize sequential (and embedded) decisions. However, appropriate methods for conducting cost-effectiveness analysis in the framework of MDPs have not been well described. Our purpose was to provide a systematic approach for finding the most cost-effective policy using MDPs, and to compare it with the commonly used approach of maximizing net benefit (NB-approach).
Method: The NB-approach converts QALYs to monetary benefits using a willingness-to-pay (WTP) threshold and selects, in each decision epoch, the action that maximizes net benefit, which allows the process to maximize a single outcome. We provide an alternative approach that, in each decision epoch, chooses the action with the highest QALYs such that its incremental cost-effectiveness ratio (ICER) in comparison with the baseline (defined as the non-dominated action with the lowest QALYs) does not exceed the specified WTP threshold; we call this the maximum constrained QALYs (MCQ) approach. We demonstrate our approach using a hypothetical example of a progressive disease with three health states (mild, moderate, and severe) and three treatment regimens (X, Y, and Z). Disease progression and treatment costs depend upon the state and regimen (Z being the most expensive and most effective regimen). We formulate the problem as a finite-horizon, discrete-time MDP with 10 decision epochs. Our objective was to find the optimal regimen in each decision epoch that is also the most cost-effective. We compare the mathematical structure and numerical results obtained by the two approaches.
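To make the two selection rules concrete, the following is a minimal sketch of how each might pick an action in a single decision epoch. It is not the authors' implementation: the function names, the use of simple (strong) dominance to identify non-dominated actions, and the cost and QALY figures are all illustrative assumptions, and the backward-induction step of a full MDP is omitted.

```python
from typing import Dict, Tuple

# Hypothetical per-action outcomes for one state and epoch: (cost in $, QALYs).
Outcomes = Dict[str, Tuple[float, float]]


def nb_action(outcomes: Outcomes, wtp: float) -> str:
    """NB rule: pick the action with the largest net benefit, wtp * QALYs - cost."""
    return max(outcomes, key=lambda a: wtp * outcomes[a][1] - outcomes[a][0])


def non_dominated(outcomes: Outcomes) -> Outcomes:
    """Keep actions that no other action beats on both cost and QALYs.

    Assumption: simple (strong) dominance only; extended dominance is ignored.
    """
    keep = {}
    for a, (c, q) in outcomes.items():
        dominated = any(
            c2 <= c and q2 >= q and (c2 < c or q2 > q)
            for b, (c2, q2) in outcomes.items()
            if b != a
        )
        if not dominated:
            keep[a] = (c, q)
    return keep


def mcq_action(outcomes: Outcomes, wtp: float) -> str:
    """MCQ rule: highest-QALY action whose ICER versus the baseline is <= wtp."""
    frontier = non_dominated(outcomes)
    # Baseline: the non-dominated action with the lowest QALYs.
    base = min(frontier, key=lambda a: frontier[a][1])
    c0, q0 = frontier[base]
    # ICER <= wtp is written as (c - c0) <= wtp * (q - q0) to avoid division.
    eligible = [
        a for a, (c, q) in frontier.items()
        if q > q0 and (c - c0) <= wtp * (q - q0)
    ]
    return max(eligible, key=lambda a: frontier[a][1]) if eligible else base


# Illustrative numbers only; they are not taken from the study.
example = {"X": (10_000, 5.0), "Y": (30_000, 5.8), "Z": (55_000, 6.0)}
print(nb_action(example, 50_000), mcq_action(example, 50_000))
```

With these made-up values the NB rule selects Y while the MCQ rule selects Z, because Z's ICER against X ($45,000/QALY) is still under the threshold even though its net benefit is lower than Y's. Restricting the example to X and Y makes both rules agree.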
Result: We demonstrate mathematically and empirically that the MCQ- and NB-approaches lead to the same optimal policy when there are only two actions. However, for three or more actions, the two approaches yield different optimal policies (Table). The ICERs of the optimal policies in comparison with the baseline policy (regimen X in all decision epochs) were under WTP = $50,000/QALY with both approaches; however, total QALYs obtained with the MCQ-approach were 5%–12% higher than those obtained with the NB-approach.
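The two-action equivalence can be seen from a short single-epoch argument. The sketch below is not the authors' proof; the symbols $\lambda$ (WTP threshold), $C_a$ and $Q_a$ (expected cost and QALYs of action $a$) are notation introduced here for illustration, and it assumes action $b$ is the baseline and action $a$ is both more effective and more costly.

```latex
\begin{align*}
\text{Let } \Delta Q &= Q_a - Q_b > 0, \qquad \Delta C = C_a - C_b > 0. \\
\mathrm{NB}(a) - \mathrm{NB}(b)
  &= (\lambda Q_a - C_a) - (\lambda Q_b - C_b)
   = \lambda\,\Delta Q - \Delta C. \\
\mathrm{NB}(a) \ge \mathrm{NB}(b)
  &\iff \lambda\,\Delta Q \ge \Delta C
   \iff \frac{\Delta C}{\Delta Q} \le \lambda .
\end{align*}
```

So with two actions, choosing the higher net benefit is the same as accepting the more effective action exactly when its ICER against the baseline is at most $\lambda$. With three or more actions the rules can diverge, since an action may satisfy the ICER constraint against the baseline without having the largest net benefit.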
Conclusion: We present an intuitive framework to identify the most cost-effective policy using MDPs. Our method provides policies that are cost-effective at a given WTP threshold and have higher QALYs than those obtained with the commonly used approach of maximizing net benefit. Our method of maximum constrained QALYs will result in superior optimal policies in resource-limited settings.
See more of: The 34th Annual Meeting of the Society for Medical Decision Making