## MEDICAL DECISION MAKING PROBLEMS WITH LARGE POLICY SPACES: WHY MARKOV DECISION PROCESSES TRUMP SIMULATION

**Purpose:** Simulation can be a valuable tool for analyzing Markov models of medical decision making problems aimed at determining treatment strategies for individual patients over time: a set of candidate treatment strategies is proposed, the implementation of each policy is simulated, and the "best" strategy is identified. Such an approach works well for small problems in which both the patient health states and the available treatment actions are relatively sparse, or for larger problems with obviously structured policies, e.g., a threshold policy that prescribes one treatment below some value of patient health (e.g., do nothing below a certain MELD score) and another (e.g., transplant) above. However, to analyze very large problems for which such structure may or may not be obvious *a priori*, a true optimization technique, namely Markov decision processes (MDPs), is needed. Here, we demonstrate such an instance in which simulation is not a viable option.

**Method:** Consider the sequential decision making problem discussed by Khojandi et al. (2013), namely whether to extract or abandon cardiac leads at the time of failure. The decision is made as a function of patient age and the age of up to five implanted leads. Figure 1 illustrates just a portion of an MDP-generated optimal extract/abandon policy for a specific single-chamber pacemaker patient. As Figure 1 shows, the decision for each implanted lead is of threshold type in lead age, patient age, the lead's age rank and the total number of implanted leads. Despite this structure, the policy space for this problem is so large that searching for an optimal policy by simulating all possible policies of this form is practically impossible.

**Results:** For the problem considered, the specification of a full extract/abandon policy for any given patient requires 15 plots like the eight included in Figure 1. Considering that patient age and lead age vary between 30-100 and 1-69 years, respectively, evaluating all possible thresholds for each patient in just one of these 15 plots would require $\frac{(70+69)!}{70!\,69!} \simeq 4.7 \times 10^{40}$ simulations, which, if each simulation required only one millisecond, would take approximately $1.5 \times 10^{30}$ years.
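The arithmetic behind these figures can be checked directly. The sketch below recomputes the binomial count and the implied running time. It also reflects one possible reading of the count, which is an assumption since the abstract does not spell out the counting argument: each candidate policy is a non-decreasing sequence of thresholds across a 70-by-69 grid, and the number of such sequences is the stated binomial coefficient. The function names, the brute-force check on a small grid, and the Julian-year conversion are illustrative choices, not taken from the source.

```python
import math
from itertools import product

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumed convention)
SECONDS_PER_SIMULATION = 1e-3          # one millisecond, as stated in the text


def count_thresholds(m: int, n: int) -> int:
    """Closed-form count (m + n)! / (m! n!) used in the text."""
    return math.comb(m + n, n)


def brute_force_count(m: int, n: int) -> int:
    """Enumerate non-decreasing threshold vectors (t_1 <= ... <= t_m),
    each t_i in 0..n. Feasible only for tiny grids."""
    return sum(
        1
        for t in product(range(n + 1), repeat=m)
        if all(a <= b for a, b in zip(t, t[1:]))
    )


# Grid dimensions taken from the formula in the text.
n_policies = count_thresholds(70, 69)
years = n_policies * SECONDS_PER_SIMULATION / SECONDS_PER_YEAR

print(f"{n_policies:.2e} simulations, about {years:.2e} years")
```

On a 3-by-4 grid the enumeration and the closed form agree, which supports the lattice-path reading; at 70-by-69 only the closed form is computable, which is the point of the result.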

**Conclusion:** Simulation is
unable to find an optimal policy for complicated medical decision making
problems even when the policy is well-behaved. As a result, powerful
optimization techniques such as MDPs are needed to address these problems.
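To illustrate the contrast, the following is a minimal sketch of value iteration, a standard method for solving MDPs, on a toy four-state problem. It is not the lead extraction model of Khojandi et al.: the states, the two actions, and every probability, reward, and the discount factor are hypothetical values chosen only for demonstration. The sketch shows that value iteration obtains an optimal policy by repeatedly updating state values, without evaluating candidate policies one by one.

```python
from itertools import product

GAMMA = 0.95  # hypothetical discount factor

# Hypothetical toy model: states 0..2 are worsening health, state 3 is
# absorbing. P[a][s] lists transition probabilities to each next state;
# R[a][s] is the one-step reward for taking action a in state s.
P = {
    "wait": [[0.8, 0.2, 0.0, 0.0],
             [0.0, 0.7, 0.3, 0.0],
             [0.0, 0.0, 0.6, 0.4],
             [0.0, 0.0, 0.0, 1.0]],
    "treat": [[0.9, 0.0, 0.0, 0.1],
              [0.8, 0.0, 0.0, 0.2],
              [0.6, 0.0, 0.0, 0.4],
              [0.0, 0.0, 0.0, 1.0]],
}
R = {
    "wait": [1.0, 0.8, 0.5, 0.0],
    "treat": [0.7, 0.6, 0.3, 0.0],
}
ACTIONS = list(P)
N_STATES = 4


def q_value(V, s, a):
    """Expected discounted return of taking action a in state s, then following V."""
    return R[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], V))


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    V = [0.0] * N_STATES
    while True:
        V_new = [max(q_value(V, s, a) for a in ACTIONS) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [max(ACTIONS, key=lambda a: q_value(V_new, s, a))
                      for s in range(N_STATES)]
            return V_new, policy
        V = V_new


def evaluate_policy(policy, tol=1e-10):
    """Value of one fixed policy: the per-policy work an enumeration must repeat."""
    V = [0.0] * N_STATES
    while True:
        V_new = [q_value(V, s, policy[s]) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


optimal_values, optimal_policy = value_iteration()
print(optimal_policy)
```

In this toy problem there are only 2^4 = 16 deterministic policies, so exhaustive evaluation with `evaluate_policy` is still possible and can be used to confirm the value iteration result. The number of policies grows exponentially with the number of states, whereas each value iteration sweep costs only on the order of states times actions times successor states.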


See more of: The 36th Annual Meeting of the Society for Medical Decision Making