* Candidate for the Lee B. Lusted Student Prize Competition
Purpose: Evidence of effectiveness can influence the clinical choices physicians make and the value of the health care delivered, yet few studies have examined the types of evidence physicians draw upon as they make decisions in real time. This pilot study examines physicians' self-reported basis for their clinical decisions.
Method: Ten pediatric cardiologists recorded every clinically significant decision made during procedures, test interpretation, or delivery of inpatient and outpatient care over five full days and five half days of care delivery. Each physician indicated the basis for each decision by selecting from 10 predetermined categories, ranging from “arbitrary and anecdotal” to evidence from published studies to “parental preference” and “avoiding a lawsuit”. Physicians could cite a published study only if they specifically recalled it. However, decisions reported to be based on “guidelines” were classified post hoc according to the evidence upon which the guideline recommendations were based.
Result: During the 7.5 days, 1188 decisions (158/day) were made. Almost 80% of decisions were deemed by the physicians to have no basis in any prior published data, and fewer than 3% were based on a study specific to the question at hand. More than one-third were attributed to experience or personal anecdote, the most common basis for a decision. Senior physicians were more likely than residents to attribute a decision to specific experience or anecdote (41% vs. 26%) and less likely to attribute it to having been “taught to do it” (9% vs. 29%).
Conclusion: This pilot study found that a group of pediatric cardiologists was unable to cite a published evidence source for most of their real-time clinical decisions, including those that consumed significant medical resources. Yet despite the lack of a formal evidence base, there has been tremendous progress in pediatric cardiology over the past two decades, suggesting that information about effectiveness is obtained and transmitted through an alternative process. Novel approaches to building an evidence base from real-time clinical decisions may be critical to comparative effectiveness research.
Purpose: The KBIT tutorial approach presents many (>50) case examples of a target presentation, such as chest pain, to sharpen students' ability to recognize and discriminate between diagnostic categories. In a previous study, booklets highlighting particular symptoms' ability to discriminate between confusable diagnoses improved students' diagnostic accuracy. Here we compared three formats of computer-generated error feedback, varying in how strongly they focus on the symptoms that discriminate the correct case diagnosis from the student's wrong answer.
Method: 53 physician assistant (PA) students (Study 1) and 54 PA, 15 MD, and 15 other students (Study 2) completed a pretest, the tutorial, a posttest, and a 2-week follow-up test. Students studied the symptom lists for each of 9 diagnoses, then diagnosed 49 practice cases described in terms of history and physical examination, with multiple-choice responses and immediate error feedback. Students were randomized to receive different error feedback for misdiagnosed cases: a prototype of the right answer (1-column feature list); features common to both the right answer and the student's wrong answer plus features unique to the right answer (2-column); or common features plus those unique to each disease (3-column). Study 1 participants saw 3 diseases in each of the three formats, counterbalanced. Study 2 participants saw just one format. Tests presented similar cases without feedback, with 17 items repeated on all three occasions. Cases' surface details and case order were varied upon repetition.
Result: In Studies 1 and 2, participants correctly diagnosed significantly more of the 17 repeated cases on the posttest (74%/67%) and the 2-week follow-up (59%/49%) than on the pretest (43%/36%). At each time point, students correctly diagnosed more of the items considered “easier” on the basis of KBIT's underlying prototype theory of category learning. The expected differences in accuracy gain due to the format of error feedback were not observed.
Conclusion: The study demonstrated that the tutorial, with its error feedback for many cases, contributes to student learning of chest pain diagnosis under any of the three feedback formats. Contrary to expectation, students did not learn more about the diseases for which they received 2- or 3-column feedback, which highlights symptoms' ability to discriminate between diagnoses, than about those for which they received only the simple reminder of the correct diagnosis's symptoms. Possible explanations include insufficient training in the use of tutorial feedback, limited exposure (few errors), test cases insensitive to the lessons learned, or adequacy of the prototype feedback.
Purpose: Plots assessing the calibration of a prediction model applied to a validation dataset are critical for judging the adequacy of the model or comparing rival models. Traditionally, these plots apply a smoothing function to a plot of the “actual” outcome on the vertical axis against the “predicted” outcome on the horizontal axis, so the reader can compare the smoothed line to a 45-degree reference line denoting perfect calibration. While such plots are helpful, two deficiencies remain. First, the plot does not naturally indicate where the bulk of the predictions lie. Second, and related to the first, the number of predictions falling in a region of miscalibration cannot be inferred. The purpose of the present study was to introduce a plot that repairs both deficiencies of the traditional calibration plot.
Method: After several unsuccessful iterations involving manipulation of the axes, addition of shading, and similar devices, a plot was constructed that solved both deficiencies while remaining readily interpretable. The vertical axis displays the prediction error (actual value minus predicted value). The horizontal axis displays the predicted values, spaced in proportion to their frequency; in other words, the x-axis is spaced such that a histogram of predicted values would appear uniform. This approach makes it easy to see where the bulk of the predictions lie. More importantly, it immediately shows how many predictions fall in any miscalibrated region of the model.
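For concreteness, the following is a minimal sketch (not part of the original abstract) of how such a plot could be constructed in Python; the function name and plotting choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

def miscalibration_curve(y_true, y_pred, n_ticks=5):
    """Plot prediction error against predictions spaced by frequency.

    The x positions are the ranks (empirical quantiles) of the predicted
    values, so equal horizontal distance covers equal numbers of
    predictions; the tick labels show the predicted values themselves.
    """
    order = np.argsort(y_pred)
    pred = np.asarray(y_pred)[order]
    err = (np.asarray(y_true) - np.asarray(y_pred))[order]
    x = np.arange(len(pred)) / (len(pred) - 1)  # uniform spacing by rank

    fig, ax = plt.subplots()
    ax.scatter(x, err, s=8, alpha=0.4)
    ax.axhline(0.0, color="black", lw=1)  # zero error = perfect calibration

    # Tick labels: the predicted value sitting at each quantile of the axis.
    ticks = np.linspace(0.0, 1.0, n_ticks)
    labels = np.quantile(pred, ticks)
    ax.set_xticks(ticks)
    ax.set_xticklabels([f"{v:.2f}" for v in labels])
    ax.set_xlabel("predicted value (spaced by frequency)")
    ax.set_ylabel("prediction error (actual - predicted)")
    return ax
```

Spacing the x-axis by rank compresses regions containing few predictions, which is exactly the information a traditional calibration plot hides: a badly miscalibrated region occupied by almost no predictions shrinks to near invisibility.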
Result: Figure 1 presents a traditional calibration curve for a prediction model applied to a validation dataset; it suggests quite poor calibration of the prediction tool. Figure 2 is the novel “miscalibration curve,” which suggests a substantially different interpretation, indicating excellent calibration of the model for the vast majority of its predictions.
Conclusion: The miscalibration curve offers improved insight into the performance of a prediction model relative to the traditional calibration curve.
Purpose: Many factors affect the balance of true and false test results, and the interaction of two such factors – disease prevalence and the positive threshold – causes results to differ in high- versus low-prevalence settings. We used the example of testing for latent tuberculosis infection (LTBI) to demonstrate the importance of disease prevalence in decisions regarding positive thresholds and test strategies.
Method: We compared the numbers of true and false positive results obtained with two LTBI screening tests (in-tube QuantiFERON-TB Gold [QFT-IT] and T-SPOT.TB) in five countries of varying prevalence. We used estimates from test manufacturers to ascertain each test's positive threshold, published literature to determine sensitivity (81%, QFT-IT; 88%, T-SPOT.TB) and specificity (99%, QFT-IT; 88%, T-SPOT.TB), and World Health Organization data to estimate country-specific LTBI prevalence. We assumed sensitivity and specificity remained stable, so that prevalence was the only difference between settings.
Result: In switching from QFT-IT to T-SPOT.TB, the 7% increase in sensitivity affected the number of true positives more in high-prevalence settings, while the 11% decrease in specificity affected the number of false positives more in low-prevalence settings. Tradeoffs between increased case identification and decreased unnecessary treatment thus differed by orders of magnitude as prevalence varied, with lower-prevalence settings paying a “price” of more false positives for each true positive gained. For example, the number of false positives per true positive gained in the United States, with 5% LTBI prevalence, was close to 10-fold higher than in Mexico, with 29% prevalence, and 30-fold higher than in Ivory Coast, with 55% prevalence. Lower-prevalence countries may therefore determine that a 7% increase in early case detection benefits too few people to justify the high burden of false positives, while higher-prevalence countries may decide that a greater increase in early detection is worth the increased treatment of false positives, especially in settings with limited access to care.
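The underlying arithmetic is simple enough to sketch. The back-of-the-envelope Python snippet below (not part of the abstract) reproduces the direction of this tradeoff from the sensitivities, specificities, and prevalences quoted above; the exact ratios reported in the abstract may reflect additional rounding or country-specific inputs.

```python
# Extra false positives accepted per extra true positive gained when
# switching from QFT-IT (sens 0.81, spec 0.99) to T-SPOT.TB (sens 0.88,
# spec 0.88), using the figures quoted in the abstract.
def fp_per_tp_gained(prevalence, d_sens=0.88 - 0.81, d_spec=0.99 - 0.88):
    extra_tp = prevalence * d_sens        # extra cases found per person screened
    extra_fp = (1 - prevalence) * d_spec  # extra false positives per person screened
    return extra_fp / extra_tp

for country, prev in [("United States", 0.05), ("Mexico", 0.29), ("Ivory Coast", 0.55)]:
    print(f"{country} ({prev:.0%} prevalence): "
          f"{fp_per_tp_gained(prev):.1f} false positives per true positive gained")
```

Because the extra false positives scale with (1 - prevalence) and the extra true positives with prevalence, the “price” per case gained grows steeply as prevalence falls.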
Conclusion: The sensitivity and specificity of tests such as QFT-IT and T-SPOT.TB differ in large part because of positive test thresholds, which test manufacturers apply identically across settings yet which can result in markedly different outcomes between them. To optimize test performance and improve outcomes, sensitivity and specificity should be set locally, not globally, by incorporating prevalence together with other disease- and setting-specific factors into testing decisions.
Purpose: This randomized trial assessed the impact of a mediated decision support intervention on primary care patients' prostate cancer screening knowledge, decisional conflict, informed decision making (IDM), and screening use.
Method: Before a routine office visit, 313 male patients eligible for prostate cancer screening completed a baseline telephone survey and received a mailed brochure on prostate cancer screening. At the visit, participants were randomized to either an enhanced intervention (EI) or a standard intervention (SI) group. Before meeting with their physician, EI Group men had a nurse-led "decision counseling" session, while SI Group men completed a practice satisfaction survey. An endpoint survey was administered. Survey data, encounter audio recordings, and chart audit data were used to assess study outcomes.
Result: Knowledge increased in the EI Group (mean difference of +0.8 on a 10-point scale, p=0.001), but decisional conflict did not change (mean difference of -0.02 on a 4-point scale, p=0.620). The EI Group had higher IDM (rate ratio=1.30, p=0.029) and lower screening (odds ratio=0.67, p=0.102).
Conclusion: Nurse-mediated decision counseling increased participants' prostate cancer screening knowledge and informed decision making; the accompanying reduction in screening use did not reach statistical significance.
Purpose: To explore (1) whether general practitioners (GPs) are sensitive to patient preferences for survival gains when considering whether to initiate statin therapy, and (2) whether GPs have realistic expectations of the survival gain from statin therapy.
Method: Norwegian GPs (n=3,270) were invited to participate in an internet-based survey. Participants were presented with Mr. Smith, a 55-year-old non-smoker with a total cholesterol of 7.1 mmol/l, blood pressure of 158/96 mmHg, and a family history of heart attack. Mr. Smith would consider using a statin if it provided a substantial benefit, and he stated what he meant by “substantial” in terms of survival gain. This amount varied across six versions of the vignette: 3, 6, and 12 months and 2, 4, and 8 years. Each GP was randomly allocated to one version. We asked whether the GPs would recommend that Mr. Smith take a statin. Subsequently, we asked the GPs to estimate the average survival gain of lifelong simvastatin therapy for patients like Mr. Smith; possible response categories were <12, 12, 18, 24, 30, 36, 42, 48, and >48 months. We used logistic regression to evaluate the trend in the proportion recommending therapy across the levels of survival gain.
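For readers unfamiliar with this kind of trend test, the following is a minimal sketch in Python (not from the study; the data are simulated stand-ins and the variable names are hypothetical) of a logistic regression of the recommendation on the vignette level, where exp(coefficient) corresponds to the reported OR per level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulated stand-in data: one row per responding GP, with the vignette
# level (1..6, ordered by the survival gain offered) and a binary
# indicator of whether the GP recommended the statin.
df = pd.DataFrame({"level": rng.integers(1, 7, size=1296)})
df["recommend"] = rng.binomial(1, 0.8, size=len(df))

# Trend test: the level enters as a linear term, so exp(coef) is the
# odds ratio per one-level increase in the survival gain offered.
fit = smf.logit("recommend ~ level", data=df).fit(disp=0)
print(np.exp(fit.params["level"]))          # OR per level
print(np.exp(fit.conf_int().loc["level"]))  # 95% CI
```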
Result: We obtained responses from 1,296 GPs (40%). Across the six levels of survival gain (3 months to 8 years), the proportions of GPs recommending statin therapy were 87%, 79%, 81%, 78%, 76%, and 83%, respectively (OR per level 0.94, 95% CI 0.86-1.02). The average survival gain of simvastatin therapy for patients like Mr. Smith was correctly estimated at <12 months by 25% of the GPs. Female GPs, older GPs, GPs with long patient lists, and GPs working in rural areas were more likely to overestimate the survival gain. The GP's own estimate of the survival gain was a statistically significant predictor of recommending statin therapy for Mr. Smith: the OR per level across the response options, adjusted for age, sex, specialty attainment, place of residence, and workload, was 1.85 (95% CI 1.66-2.08).
Conclusion: GPs were insensitive to patient preferences for survival gains when recommending statin therapy. GPs' own estimates of the survival gain had a greater impact on their recommendations than patients' preferences did. The majority of GPs overestimated the survival gain of simvastatin therapy.