Candidate for the Lee B. Lusted Student Prize Competition

**Purpose:** Automated, evidence-based methods that use widely available electronic health records (EHRs) to understand the treatment histories of congestive heart failure (CHF) patients must be robust to variability in treatment. The goal of this paper is to test the optimal matching (OM) algorithm for finding high-"quality" representatives of groups of CHF patients.

**Method:** Optimal Matching (OM) originated in genomics as a way of aligning similar sequences, and social scientists later generalized it to arbitrary state sequences. We ran the algorithm in R [1] with the TraMineR package [2] on a fixed subset of 100 patients from UVA's Clinical Database Repository. Each patient had a treatment sequence of length 19, which is the median length in the original dataset. The algorithm's input parameters are the substitution and insertion/deletion (indel) costs. Patient groups were formed by hierarchical clustering on the percent overlap of procedures between patients, and the Dunn index was used to choose the number of groups. Representative sequences are the patient treatment histories that best represent the remaining cluster members in terms of "quality," as defined mathematically in the TraMineR documentation. The representatives reveal the procedural makeup of each cluster. This insight is useful in automated, evidence-based approaches to understanding CHF because it shows decision makers how the health system has responded to patients with similar treatment histories. To find the best input parameters, we built and plotted a Kriging response surface over 100 grid points (cost combinations).
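At its core, OM is a weighted edit distance. The dissimilarity between two treatment sequences is the cheapest way to turn one into the other using substitutions and insertions/deletions, so the substitution and indel costs directly shape every distance and, in turn, the "quality" of the representatives. The Python sketch below illustrates this, together with a Dunn index (smallest between-cluster distance divided by largest within-cluster diameter) for comparing candidate groupings. It is not TraMineR's implementation. It assumes a single constant substitution cost for every pair of distinct procedures (TraMineR also accepts state-specific substitution matrices), and the function names `om_distance` and `dunn_index` and the toy letter-coded sequences are hypothetical. The Dunn function accepts any dissimilarity, so a percent-overlap measure like the one used for clustering here could be passed in place of the OM distance.

```python
"""Minimal sketch of an Optimal Matching (OM) distance and a Dunn index.

Assumptions (not from the abstract): one constant substitution cost for
every pair of distinct states, and hypothetical letter-coded procedures.
"""
from itertools import combinations


def om_distance(a, b, sub, indel):
    """Minimum total cost of turning sequence `a` into sequence `b`.

    Each insertion or deletion costs `indel`; replacing a state with a
    different state costs `sub`; keeping an identical state is free.
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest alignment of the first i states of a with the first j of b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete all of a's first i states
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert all of b's first j states
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,     # delete a[i-1]
                d[i][j - 1] + indel,     # insert b[j-1]
                d[i - 1][j - 1] + step,  # keep or substitute
            )
    return d[n][m]


def dunn_index(points, labels, dist):
    """Smallest between-cluster distance / largest within-cluster diameter.

    Higher values indicate compact, well-separated clusters.
    """
    min_between = float("inf")
    max_within = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        dpq = dist(p, q)
        if lp == lq:
            max_within = max(max_within, dpq)
        else:
            min_between = min(min_between, dpq)
    return min_between / max_within if max_within > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical costs taken from the reported optimum (Sub, InDel).
    print(om_distance("ABC", "ABD", sub=0.722, indel=0.658))  # prints 0.722

    seqs = ["AAB", "AAA", "CCD", "CCC"]  # toy sequences
    groups = [0, 0, 1, 1]                # toy cluster labels
    om = lambda s, t: om_distance(s, t, sub=2.0, indel=1.0)
    print(dunn_index(seqs, groups, om))  # prints 3.0
```

With the reported optimal costs, a single mismatched position is resolved by one substitution (0.722) rather than by a deletion plus an insertion (2 × 0.658 = 1.316). Small shifts in the cost ratio can flip such choices across many sequence pairs at once, which is one plausible source of a non-smooth response surface.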

**Result:** The optimal input combination was (Sub, InDel) = (0.722, 0.658), with a corresponding quality of 0.503 (see figure). The Kriging output suggests that the relationship between costs and quality is nonlinear and non-smooth: small changes in the inputs produce non-smooth changes in quality.

**Conclusions:** Automated methods of analysis require predictable outputs in order to be repeatable and reliable. Because the response surface was markedly non-smooth, the relationship between OM's "quality" measure and EHR data must be explored further before this algorithm's desirable properties, and the rich body of research on it in other fields, can be exploited. Future research is needed to define the conditions under which EHRs can be used with OM to support evidence-based methods of inquiry.

This research was supported by an NSF Graduate Research Fellowship.

[1] R Development Core Team, "R," 2011.

[2] A. Gabadinho et al., "Analyzing and visualizing state sequences in R with TraMineR," 2011.