MEDICAL DECISION MAKING PROBLEMS WITH LARGE POLICY SPACES: WHY MARKOV DECISION PROCESSES TRUMP SIMULATION

Tuesday, October 21, 2014
Poster Board # PS3-58

Anahita Khojandi, PhD (1), Lisa Maillart, PhD (1), Oleg Prokopyev, PhD (1), and Mark S. Roberts, MD, MPP (2); (1) University of Pittsburgh, Pittsburgh, PA; (2) University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA

Purpose: Simulation can be a valuable tool for analyzing Markov models of medical decision making problems aimed at determining treatment strategies for individual patients over time: a set of candidate treatment strategies is proposed, the implementation of each strategy is simulated, and the “best” strategy is identified. Such an approach works well for small problems in which both the patient health states and the available treatment actions are relatively few, or for larger problems with obviously structured policies, e.g., a threshold policy that prescribes one treatment below some value of patient health (e.g., do nothing below a certain MELD score) and another (e.g., transplant) above it. However, to analyze very large problems for which such structure may or may not be obvious a priori, a true optimization technique, such as Markov decision processes (MDPs), is needed. Here, we demonstrate an instance in which simulation is not a viable option.
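To make the contrast above concrete, the following is a minimal sketch on a toy problem, not the model studied here. All states, actions, probabilities, and rewards are hypothetical, and exact policy evaluation stands in for simulation. It enumerates every deterministic policy and compares the best one with the value obtained by backward induction on the same MDP.

```python
import itertools

# Hypothetical toy problem (illustrative numbers only, not from the abstract):
# 3 health states, 2 actions (0 = wait, 1 = treat), 3 decision epochs.
STATES = range(3)
ACTIONS = range(2)
HORIZON = 3

# P[a][s][s2]: probability of moving from state s to s2 under action a.
P = [
    [[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],  # wait
    [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]],  # treat
]
# R[a][s]: immediate reward for taking action a in state s.
R = [
    [1.0, 0.6, 0.1],  # wait
    [0.8, 0.5, 0.3],  # treat
]


def q_value(action, state, future):
    """Immediate reward plus expected future value."""
    return R[action][state] + sum(P[action][state][s2] * future[s2] for s2 in STATES)


def evaluate(policy):
    """Expected total reward per start state; policy[t][s] is the action at epoch t."""
    value = [0.0] * len(STATES)
    for t in reversed(range(HORIZON)):
        value = [q_value(policy[t][s], s, value) for s in STATES]
    return value


def brute_force(start_state):
    """Evaluate every deterministic policy; return (best value, policies tried)."""
    decision_rules = list(itertools.product(ACTIONS, repeat=len(STATES)))
    best, tried = float("-inf"), 0
    for policy in itertools.product(decision_rules, repeat=HORIZON):
        tried += 1
        best = max(best, evaluate(policy)[start_state])
    return best, tried


def backward_induction():
    """Optimal value per start state, computed without enumerating policies."""
    value = [0.0] * len(STATES)
    for _ in range(HORIZON):
        value = [max(q_value(a, s, value) for a in ACTIONS) for s in STATES]
    return value


best_enumerated, policies_tried = brute_force(start_state=0)
optimal_values = backward_induction()
print(policies_tried, best_enumerated, optimal_values[0])
```

In this sketch the enumeration visits (2^3)^3 = 512 policies, a count that grows exponentially with the number of states and epochs, while backward induction performs one maximization per state per epoch.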

Method: Consider the sequential decision making problem discussed by Khojandi et al. (2013), namely whether to extract or abandon cardiac leads at the time of failure. The decision is made as a function of patient age and the ages of up to five implanted leads. Figure 1 illustrates just a portion of an MDP-generated optimal extract/abandon policy for a specific single-chamber pacemaker patient. As Figure 1 shows, the decision for each implanted lead is of threshold type in lead age, patient age, the lead's age rank, and the total number of implanted leads. Despite this structure, the policy space for this problem is so large that searching for an optimal policy by simulating all possible policies of this form is close to impossible.

Results: For the problem considered, the specification of a full extract/abandon policy for any given patient requires 15 plots like the eight included in Figure 1. Considering that patient age and lead age vary between 30-100 and 1-69 years, respectively, evaluating all possible thresholds for each patient in just one of these 15 plots would require (70+69)!/(70!69!) ≃ 4.7×10^40 simulations, which, if each simulation required only one millisecond, would take approximately 1.5×10^30 years.
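The arithmetic behind these figures can be checked with a short script. This is a minimal sketch: the grid sizes (70 and 69) and the one-millisecond run time come from the text above, while the function names and the 365.25-day year are assumptions made for illustration.

```python
import math

# Grid sizes taken from the count (70+69)!/(70!69!) quoted in the text.
PATIENT_AGE_STEPS = 70
LEAD_AGE_STEPS = 69


def count_threshold_policies(m, n):
    """Evaluate (m+n)! / (m! n!), i.e. the binomial coefficient C(m+n, m)."""
    return math.comb(m + n, m)


def years_to_simulate(num_policies, seconds_per_run=1e-3):
    """Total run time in years, assuming a 365.25-day year (an assumption)."""
    seconds_per_year = 365.25 * 24 * 3600
    return num_policies * seconds_per_run / seconds_per_year


n_policies = count_threshold_policies(PATIENT_AGE_STEPS, LEAD_AGE_STEPS)
years = years_to_simulate(n_policies)
print(f"{n_policies:.1e} simulations for one plot")
print(f"{years:.1e} years at one millisecond per simulation")
```

Both printed values agree with the orders of magnitude reported above, 10^40 simulations and 10^30 years.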

Conclusion: Simulation is unable to find an optimal policy for complicated medical decision making problems even when the policy is well-behaved. As a result, powerful optimization techniques such as MDPs are needed to address these problems.