31SDM A NEURAL NETWORK-BASED DECISION MAKING PROCESS TO CLASSIFY LEUKEMIA BLOOD SAMPLES

Wednesday, October 22, 2008
Columbus A-C (Hyatt Regency Penns Landing)
Melvin Ayala, PhD, Malek Adjouadi, PhD, Mercedes Cabrerizo, PhD, Nuannuan Zong, PhD and Armando Barreto, PhD, Florida International University, Miami, FL

Purpose of the study: A novel artificial neural network (ANN) algorithm is proposed for optimizing the classification of multidimensional data, focusing on acute leukemia samples.

Methods: Flow cytometry data usually contain tens of thousands of events, commonly with 4 to 12 parameters per event. Comparing blood samples of such high dimensionality can be difficult. A hypothesis of this study was therefore that flow cytometry data can be simplified while still yielding acceptable results for the classification of leukemia samples. The behavior of parametric data clusters is analyzed using a well-configured ANN. The data set size is increased gradually to create different case scenarios and to determine how data set size affects the classification results. The programming tool built around the ANN architecture focuses on distinguishing normal blood samples from abnormal ones, namely acute lymphocytic leukemia (ALL) and acute myeloid leukemia (AML). A total of 220 blood samples were considered: 60 abnormal and 160 normal.
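
The abstract does not state how the flow cytometry data were simplified. As a minimal sketch of one possible reduction, and purely an assumption for illustration, each sample's event list can be collapsed into a short feature vector holding the mean and standard deviation of every parameter. The function name `summarize_sample` and the synthetic events below are hypothetical and are not taken from the study.

```python
import random
from statistics import mean, pstdev


def summarize_sample(events):
    """Collapse one sample into a fixed-length feature vector.

    `events` is a list of events, each a list of parameter values.
    The result holds, for every parameter, its mean followed by its
    population standard deviation (an assumed reduction, not the
    authors' documented method).
    """
    columns = list(zip(*events))  # transpose: one tuple per parameter
    features = []
    for values in columns:
        features.append(mean(values))
        features.append(pstdev(values))
    return features


# Synthetic stand-in for one sample: 10,000 events with 4 parameters each.
rng = random.Random(0)
events = [[rng.gauss(100.0, 15.0) for _ in range(4)] for _ in range(10_000)]

features = summarize_sample(events)
print(len(features))  # 4 parameters x 2 statistics = 8 values
```

A reduction of this kind turns tens of thousands of events into a handful of numbers per sample, which is the size of input a small ANN can be trained on with only 220 samples.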

Summary: The algorithm produced very high sensitivity, which improved to as much as 96.67% as the data set size increased. With this level of accuracy, the programming tool provides medical doctors with diagnostic references for the specific disease states considered in this study. The classifier performed remarkably well, with a true-positive (TP) fraction of 90% and a false-positive (FP) fraction of 2% (see Table 1 and Fig. 1).
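
For readers less familiar with these measures, the sketch below shows how the TP fraction (sensitivity) and FP fraction are computed from confusion-matrix counts. The abstract reports only percentages, so the counts used here are assumptions: 58 of 60 detected abnormal samples is simply one count that reproduces 96.67% if the figure were taken over all 60 abnormal samples, and the FP example uses toy numbers unrelated to the study.

```python
def tp_fraction(tp, fn):
    """Sensitivity: share of abnormal samples flagged as abnormal."""
    return tp / (tp + fn)


def fp_fraction(fp, tn):
    """Share of normal samples wrongly flagged as abnormal."""
    return fp / (fp + tn)


# Assumed counts: 58 detected and 2 missed out of 60 abnormal samples.
sensitivity = tp_fraction(tp=58, fn=2)
print(f"{100 * sensitivity:.2f}%")  # 96.67%

# Toy counts only (not from the study): 1 false alarm among 50 normals.
toy_fp_rate = fp_fraction(fp=1, tn=49)
print(f"{100 * toy_fp_rate:.0f}%")  # 2%
```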

Conclusions: The results show that a neural network classifier that is well configured and trained with cross-validation can perform remarkably well on this type of flow cytometry data. This is all the more significant because the experiments reveal that the larger the data set considered, the more accurate the classification results. The main contributions can be summarized as follows: (1) ANNs can be trained to classify blood samples suspected of ALL or AML using a reduced number of parameters obtained from flow cytometry; (2) a method was developed to ensure that the ANN architecture is optimally configured, yielding the best possible results given the complexity of this multidimensional problem; and (3) the algorithm can be extended to the analysis of other disease states.
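
The abstract does not say which cross-validation scheme was used. As a hedged illustration only, the sketch below builds stratified folds for a data set with the stated class sizes (60 abnormal, 160 normal), so that every fold keeps the same class ratio. The choice of 10 folds, the label coding, and the helper `stratified_folds` are assumptions made for this example.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, preserving class proportions.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Label coding is an assumption: 1 = abnormal (ALL or AML), 0 = normal.
labels = [1] * 60 + [0] * 160
folds = stratified_folds(labels, k=10)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_idx and scored on test_idx.

print([len(fold) for fold in folds])  # ten folds of 22 samples each
```

With these class sizes, each of the 10 folds holds 6 abnormal and 16 normal samples, so sensitivity and FP fraction can be estimated on every held-out fold. The same loop could be repeated for each candidate ANN configuration to compare architectures on equal terms.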