ANALYZING THE OUTBREAK SURVEILLANCE AND RESPONSE SYSTEM IN ETHIOPIA USING DATA MINING TECHNIQUES

Wednesday, October 23, 2013
Key Ballroom Foyer (Hilton Baltimore)
Poster Board # P4-44
Health Services, and Policy Research (HSP)
Candidate for the Lee B. Lusted Student Prize Competition

Adamu Addissie, MD, MPH, MA, Addis Ababa University, School of Public Health, Addis Ababa, Ethiopia
Purpose: To show the applicability of data mining techniques for developing descriptive and predictive models from disease outbreak surveillance datasets in Ethiopia.

Method: Three data mining tasks (classification, clustering, and association rule mining) were applied to explore the datasets of the PHEM sectors from different perspectives. The researcher compared two classification algorithms, the J48 decision tree and the Naïve Bayes classifier, for predicting epidemic typhus cases. The better-performing algorithm was then selected for model development.
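
The comparison step described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the labels and predictions are made-up toy values, and the names `confusion_counts`, `evaluate`, `pred_tree`, and `pred_nb` are hypothetical. It only shows how accuracy, sensitivity, and specificity might be computed for two classifiers and how the one with higher accuracy could be chosen.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy data (assumed, not from the study): 1 = outbreak case, 0 = no case.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_tree = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # hypothetical decision tree output
pred_nb = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]    # hypothetical Naive Bayes output

scores = {
    "decision_tree": evaluate(y_true, pred_tree),
    "naive_bayes": evaluate(y_true, pred_nb),
}

# Keep the classifier with the higher accuracy for model development.
best = max(scores, key=lambda name: scores[name]["accuracy"])
print(best, scores[best])
```

Selecting on accuracy alone is a simplification; in a surveillance setting, sensitivity may deserve more weight, since a missed outbreak is usually costlier than a false alarm.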

Result: The decision tree algorithm performed better at classifying disease cases by place and time. The accuracy of the J48 decision tree in classifying epidemic typhus cases was 87.44%, whereas that of the Naïve Bayes classifier was 83.70%. Sensitivity and specificity were also assessed for both classifiers. The researcher also applied association rule mining to look for correlations or patterns among disease cases in the surveillance data. The attributes were selected only from the disease cases, coded as occurrence or nonoccurrence and collected by time and place. The Apriori association rule mining algorithm was run to find interesting patterns among co-occurring disease cases. The minimum support threshold was set to 20% and the minimum confidence threshold to 90% before the algorithm was run. For cluster analysis, the researcher used the combined (integrated) dataset, a total of 8796 records with 9 attributes. The Simple K-Means clustering algorithm was applied to the combined dataset because it grouped disease cases with respect to time and place.
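
To make the support and confidence thresholds concrete, the sketch below applies the 20% minimum support and 90% minimum confidence reported above to a small made-up set of records. It is a simplified illustration restricted to single-item rules (A -> B), not a full Apriori implementation; the disease names, the records, and the helper names `support` and `pairwise_rules` are assumptions for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pairwise_rules(transactions, min_support=0.20, min_confidence=0.90):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pair: prune, as Apriori would
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support({antecedent}, transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


# Toy records (assumed): diseases reported in the same place and week.
transactions = [
    {"typhus", "relapsing_fever"},
    {"typhus", "relapsing_fever"},
    {"typhus", "relapsing_fever", "malaria"},
    {"malaria"},
    {"malaria", "typhus"},
    {"malaria"},
    {"measles"},
    {"malaria", "measles"},
    set(),
    {"malaria"},
]

rules = pairwise_rules(transactions)
for antecedent, consequent, sup, conf in rules:
    print(f"{antecedent} -> {consequent}  support={sup:.2f}  confidence={conf:.2f}")
```

In this toy data only one rule passes both thresholds. The reverse rule (typhus -> relapsing_fever) has a confidence of 0.75 and is rejected, which shows how a 90% confidence threshold keeps only strongly directional co-occurrences.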

Conclusion: In general, data mining techniques were important and applicable for developing classification, clustering, and association rule models for emerging and reemerging disease cases. However, the data must be of good quality and include the important attributes (variables) for better predictive and descriptive model development. The results of the research, apart from their educational purpose, were also used by domain experts for planning, preparedness, decision making, and disease control and prevention activities.