regimes that accommodates competing outcomes by recommending sets of
feasible treatments rather than a unique treatment at each decision
point.
Method: Dynamic treatment regimes model sequential clinical
decision-making using a sequence of decision rules, one for each
clinical decision. Each rule takes as input up-to-date patient
information and produces as output a single recommended treatment.
Existing methods for estimating optimal dynamic treatment regimes, for
example Q-learning, require the specification of a single outcome
(e.g. symptom relief) by which the quality of a dynamic treatment
regime is measured. However, this is an over-simplification of
clinical decision-making, which is informed by several potentially
competing outcomes (e.g. symptom relief and side-effect burden). Our
method is motivated by the CATIE clinical trial of patients with
schizophrenia. It is aimed at patient populations that have high outcome
preference heterogeneity, evolving outcome preferences, and/or
impediments to preference elicitation. To accommodate varying
preferences, we construct a sequence of decision rules that output a
tailored set of treatments rather than a unique treatment. The set
contains all treatments that are not dominated according to the
competing outcomes. To construct these sets, we solve a non-trivial
enumeration problem by reducing it to a mixed integer linear program.
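To make the notion of a non-dominated set concrete, the following is a minimal sketch that assumes plain Pareto dominance over two outcomes coded so that higher is better. The treatment labels, the outcome values, and the function name `non_dominated` are hypothetical, and the sketch does not reproduce the enumeration problem or the mixed integer program described above; the dominance criterion used in the method itself may differ.

```python
from typing import Dict, List, Tuple


def non_dominated(outcomes: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the treatments that no other treatment Pareto-dominates.

    `outcomes` maps a treatment label to its predicted outcomes, all coded
    so that larger values are better (an assumption of this sketch).
    """

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is at least as good on every outcome
        # and strictly better on at least one.
        at_least_as_good = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return at_least_as_good and strictly_better

    return sorted(
        t
        for t, v in outcomes.items()
        if not any(dominates(w, v) for u, w in outcomes.items() if u != t)
    )


# Hypothetical predictions for one patient: (symptom score, weight score).
example = {
    "A": (0.8, 0.2),
    "B": (0.6, 0.6),
    "C": (0.5, 0.5),  # worse than B on both outcomes
    "D": (0.3, 0.9),
}
recommended = non_dominated(example)
```

In this made-up example only C is removed, because B is better on both outcomes; A, B, and D each represent a different trade-off and all remain in the recommended set.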
Result: We illustrate the method using data from the CATIE
schizophrenia study by constructing a set-valued dynamic treatment
regime using measures of symptoms and weight gain as competing
outcomes. The sets we produce offer more choice than a standard
dynamic treatment regime while eliminating poor treatment choices.
Conclusion: Set-valued dynamic treatment regimes represent a new
paradigm for data-driven clinical decision support. They respect both
within- and between-patient preference heterogeneity, and provide more
information to decision makers. Set-valued decision rules may be used
when patients are unwilling or unable to communicate outcome
preferences. The mathematical formalization of set-valued dynamic
treatment regimes offers a new class of decision processes that
generalize Markov decision processes in that each process involves two
actors: a screener that maps each state to a subset of available
treatments, and a decision maker who chooses a treatment from this
set. We believe this work will stimulate further investigation and
application of these processes.
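The two-actor structure can be illustrated with a small sketch of a single decision point. All names here (`Screener`, `DecisionMaker`, `act`, the state labels, and the lookup table) are hypothetical and chosen only for illustration; the sketch says nothing about how the sets are estimated.

```python
from typing import Callable, Dict, FrozenSet

# A screener maps a state to a set of feasible treatments; a decision
# maker picks one treatment from that set. Both types are assumptions
# made for this sketch.
Screener = Callable[[str], FrozenSet[str]]
DecisionMaker = Callable[[str, FrozenSet[str]], str]


def act(state: str, screener: Screener, decision_maker: DecisionMaker) -> str:
    """Run one decision point: screen first, then let the decision maker choose."""
    feasible = screener(state)
    choice = decision_maker(state, feasible)
    if choice not in feasible:
        raise ValueError(f"{choice!r} is not in the feasible set for {state!r}")
    return choice


# Hypothetical set-valued rule stored as a lookup table.
table: Dict[str, FrozenSet[str]] = {
    "high_symptoms": frozenset({"A", "B"}),
    "stable": frozenset({"B"}),
}


def lookup_screener(state: str) -> FrozenSet[str]:
    return table[state]


def alphabetical_choice(state: str, feasible: FrozenSet[str]) -> str:
    # Stand-in for a clinician or patient preference: first label in order.
    return min(feasible)


first = act("high_symptoms", lookup_screener, alphabetical_choice)
second = act("stable", lookup_screener, alphabetical_choice)
```

When every set returned by the screener contains exactly one treatment, as for the `stable` state here, the decision maker has no choice to make and the rule behaves like a standard single-treatment decision rule.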