Candidate for the Lee B. Lusted Student Prize Competition
Purpose: Simulation modelers require transition probabilities between disease states that are often not directly observed. While data may be collected on timescales of years or even decades, underlying disease dynamics evolve at much shorter timescales. Accurate transition probability estimates are difficult to obtain, and may require solving complex mathematical optimization problems.
Method: We consider a cohort model over time. Disease dynamics evolve according to $x_{t+1} = A x_t$, where $x_t$ describes the proportion of the population in each of a finite number of categories, and $A$ is the transition matrix. The transition probabilities must be estimated from cross-sectional samples of the state of the cohort at a subset of time points. This gives rise to equations of the form $x_{t+L} = A^L x_t$, where $L$ is the interval between samples. In the general case, samples could be unevenly spaced and $A$ could vary across sample intervals. Our goal is to find an $A$ that best fits the observations, given the observations’ precision and assumptions about disease progression and regression. We develop an iterative algorithm using a sequence of simple optimizations. We select arbitrary initial values for $A$ and estimate the cohort states $x_0, \ldots, x_{t+L}$ (including values at unobserved time points) that minimize the sum of squared residuals $\sum_{t=0}^{L-1} (x_{t+1} - A x_t)^2$, subject to constraints. Then, we fix the cohort states $x_0, \ldots, x_{t+L}$ at their estimated values and solve for the transition matrix $A$ that again minimizes the residuals. We repeat this procedure until the estimated probabilities in $A$ converge.
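The alternating scheme described in the Method can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not in the abstract: a single sample interval with the two observed endpoints treated as exact, states and columns of $A$ constrained to the probability simplex, and each subproblem approximated by a few projected-gradient steps instead of an exact constrained solver. The function names (`project_simplex`, `residual`, `fit_transition_matrix`) and the 3-state example values are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def residual(A, X):
    """Sum of squared one-step residuals; X holds one state per column."""
    return float(np.sum((X[:, 1:] - A @ X[:, :-1]) ** 2))

def fit_transition_matrix(x_first, x_last, L, A0, outer=100, inner=5):
    """Alternate between updating unobserved states and updating A (L >= 2)."""
    A = A0.copy()
    # Columns 0..L; unobserved columns start as a linear interpolation.
    w = np.linspace(0.0, 1.0, L + 1)
    X = np.outer(x_first, 1.0 - w) + np.outer(x_last, w)
    history = [residual(A, X)]
    for _ in range(outer):
        # Step 1: hold A fixed, update the unobserved states x_1..x_{L-1}.
        step = 1.0 / (2.0 * (1.0 + np.linalg.norm(A, 2)) ** 2)
        for _ in range(inner):
            R = X[:, 1:] - A @ X[:, :-1]            # R[:, t] = x_{t+1} - A x_t
            G = 2.0 * (R[:, :-1] - A.T @ R[:, 1:])  # gradient w.r.t. x_1..x_{L-1}
            X[:, 1:-1] = np.apply_along_axis(
                project_simplex, 0, X[:, 1:-1] - step * G)
        # Step 2: hold the states fixed, update A (columns stay on the simplex).
        X0, X1 = X[:, :-1], X[:, 1:]
        step = 1.0 / (2.0 * np.linalg.norm(X0, 2) ** 2)
        for _ in range(inner):
            G = -2.0 * (X1 - A @ X0) @ X0.T
            A = np.apply_along_axis(project_simplex, 0, A - step * G)
        history.append(residual(A, X))
    return A, X, history

# Hypothetical 3-state example: monthly steps, endpoints 60 months apart.
A_true = np.array([[0.98, 0.01, 0.00],
                   [0.02, 0.96, 0.00],
                   [0.00, 0.03, 1.00]])   # columns sum to 1
x_first = np.array([0.7, 0.2, 0.1])
L = 60
x_last = np.linalg.matrix_power(A_true, L) @ x_first

rng = np.random.default_rng(0)
A0 = rng.dirichlet(np.ones(3), size=3).T  # arbitrary column-stochastic start
A_fit, X_fit, history = fit_transition_matrix(x_first, x_last, L, A0)
```

Because both step sizes are chosen from upper bounds on the relevant curvature, the residual in `history` should not increase from one outer iteration to the next. With only two observed time points the problem is underdetermined, so `A_fit` need not match `A_true`; structural assumptions about progression and regression would enter as additional constraints on `A`.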
Result: We apply our method to a previously developed model of progressive diabetic macular edema to infer monthly transition probabilities between visual acuity levels from cross-sectional data measured at 5-year intervals. We compare our iterative approach to a traditional Nelder-Mead algorithm, running both algorithms from 1,000 random starting points. While Nelder-Mead identified a slightly better best fit overall than the iterative algorithm, the iterative algorithm achieved a better mean fit with lower variability, identifying a solution within 15% of the best-fit residual for over 90% of starting points; Nelder-Mead did so for only 8% of starting points.
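A multistart Nelder-Mead baseline and the "within 15% of the best-fit residual" summary can be sketched as below. This is an illustration only, not the comparison code behind the reported figures: the softmax parametrization of $A$, the endpoint-only objective, the 3-state example values, the small number of starts, and the helper names (`columns_softmax`, `endpoint_residual`, `share_within`) are all assumptions. It relies on SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"`.

```python
import numpy as np
from scipy.optimize import minimize

def columns_softmax(theta, n):
    """Map unconstrained parameters to a column-stochastic n x n matrix."""
    Z = theta.reshape(n, n)
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def endpoint_residual(theta, x_first, x_last, L):
    """Squared error between A^L x_first and the observed x_last."""
    A = columns_softmax(theta, len(x_first))
    pred = np.linalg.matrix_power(A, L) @ x_first
    return float(np.sum((pred - x_last) ** 2))

def share_within(residuals, tol=0.15):
    """Fraction of starts whose residual is within tol of the best one."""
    r = np.asarray(residuals)
    return float(np.mean(r <= (1.0 + tol) * r.min()))

# Hypothetical 3-state example: monthly steps, endpoints 60 months apart.
A_true = np.array([[0.98, 0.01, 0.00],
                   [0.02, 0.96, 0.00],
                   [0.00, 0.03, 1.00]])   # columns sum to 1
x_first = np.array([0.7, 0.2, 0.1])
L = 60
x_last = np.linalg.matrix_power(A_true, L) @ x_first

rng = np.random.default_rng(0)
fits = []
for _ in range(20):                       # the abstract uses 1,000 starts
    theta0 = rng.normal(size=9)           # random starting point
    res = minimize(endpoint_residual, theta0,
                   args=(x_first, x_last, L),
                   method="Nelder-Mead", options={"maxiter": 500})
    fits.append(res.fun)

share = share_within(fits)                # share of starts near the best fit
```

Applying `share_within` to the final residuals of each algorithm, collected over the same set of starting points, gives the kind of consistency measure reported above.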
Conclusion: A fundamental problem faced across a range of modeling applications is how to consistently infer transition probabilities from multiple cross-sectional prevalence estimates. We describe an iterative algorithm that produces accurate and consistent solutions.