Purpose: We compare two information-theoretic algorithms for determining which mammographic features are most informative for estimating breast cancer risk.
Method: Our database consists of 9,986 consecutive mammography reports with information on thirty-three features, including individual risk factors and mammographic findings, linked to an institutional cancer registry to determine outcomes (benign or malignant). “Mutual information” quantifies the interdependence between random variables using Shannon’s entropy measure. “Relevance” is defined as the mutual information of a feature with the outcome, and “redundancy” as the mutual information of features with each other. The multidimensional mutual information (MMI) algorithm ranks features by relevance penalized for redundancy, whereas the single-dimensional mutual information (SMI) algorithm ranks features by relevance only. We investigated the predictive performance of Bayesian networks (BN) trained and tested on sequences of features ranked by each algorithm. The most informative feature set was defined as the smallest feature set whose area under the ROC curve was not statistically significantly different from that of the BN trained on the entire set of thirty-three features.
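The difference between the two rankings can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the abstract does not state the exact form of the redundancy penalty, so the sketch assumes a greedy ranking that subtracts the mean mutual information with already-ranked features from each candidate's relevance. The function names (`mutual_information`, `rank_smi`, `rank_mmi`) and the eight-record toy data set are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Shannon mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over observed (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def rank_smi(features, outcome):
    """Relevance-only ranking: sort features by I(feature; outcome)."""
    relevance = {f: mutual_information(col, outcome) for f, col in features.items()}
    return sorted(relevance, key=relevance.get, reverse=True)


def rank_mmi(features, outcome):
    """Greedy ranking by relevance minus redundancy.

    ASSUMPTION: redundancy is the mean mutual information between the
    candidate and the features already ranked; the study may use a
    different penalty.
    """
    relevance = {f: mutual_information(col, outcome) for f, col in features.items()}
    ranked, remaining = [], list(features)

    def score(f):
        if not ranked:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[g]) for g in ranked
        ) / len(ranked)
        return relevance[f] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Invented toy records (NOT data from the study): 1 = malignant, 0 = benign.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "mass_margin": ["spic", "spic", "spic", "spic", "spic", "circ", "circ", "circ"],
    "mass_shape": ["irr", "irr", "irr", "round", "irr", "round", "round", "round"],
    "age_group": ["old", "old", "old", "young", "young", "old", "old", "young"],
}

smi_order = rank_smi(features, outcome)
mmi_order = rank_mmi(features, outcome)
print("SMI:", smi_order)
print("MMI:", mmi_order)
```

On this toy data both rankings place `mass_margin` first, but the redundancy penalty moves `mass_shape` below `age_group` in the MMI-style ranking, because almost all of the information that `mass_shape` carries about the outcome is already carried by `mass_margin`.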
Result: SMI identified mass margin and mass shape as the two most informative features. MMI analysis concurred that mass margin was the most informative feature but ranked mass shape substantially lower because of its high redundancy with mass margin. This observation is consistent with clinical findings: a highly suspicious mass has an irregular shape with spiculated margins, whereas a benign mass typically has a round shape with well-circumscribed margins. The most informative feature set was smaller for MMI than for SMI (ten features versus thirteen).
Conclusion: By considering redundancy as well as relevance, MMI outperforms SMI in determining the smallest set of informative individual risk factors and mammographic findings whose performance is equivalent to that of the entire feature set. MMI-based rankings may have greater clinical utility to the extent that a smaller set of features allows clinicians to focus attention sequentially on the findings with the highest yield. Furthermore, in other applications where each added feature incurs additional time or monetary cost, MMI may help reduce the cost of diagnostic testing.