Document Type: Research Paper
Information Technology Management, Iran University of Science and Technology, Tehran, Iran.
Background and objective: Accurate and rapid diagnosis is essential in medicine for correct and timely treatment. This becomes more important when different diseases present with similar symptoms; thyroid disease, for example, shares symptoms with conditions such as cardiovascular disease. Data mining and machine learning techniques are reliable and valuable methods that can improve physicians' ability to diagnose and treat disease correctly. The main goal of this research is to extract diagnostic rules for thyroid disease.
Method: New features were created, and feature selection algorithms, including filter-based methods, wrapper-based methods, and a genetic algorithm, were analyzed to select the most effective features for thyroid diagnosis. Decision tree models, random forest, bagging, boosting, and stacking were then applied to diagnose the disease and to improve precision on the disease classes, namely hypothyroidism and hyperthyroidism. Models were evaluated with four metrics: accuracy, precision, recall, and F-measure.
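As an illustration of the four evaluation metrics named above, the following minimal sketch computes accuracy and per-class precision, recall, and F-measure for a three-class problem. The class labels (`normal`, `hypo`, `hyper`), the toy predictions, and the function names are assumptions made for this example only; they are not data or code from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def class_metrics(y_true, y_pred, label):
    """Precision, recall, and F-measure for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Guard against empty denominators (class never predicted or never present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical toy labels, not taken from the UCI thyroid data.
y_true = ["normal", "normal", "normal", "normal", "hypo", "hypo", "hyper", "hyper"]
y_pred = ["normal", "normal", "normal", "hypo", "hypo", "hypo", "hyper", "normal"]

acc = accuracy(y_true, y_pred)
hypo_p, hypo_r, hypo_f = class_metrics(y_true, y_pred, "hypo")
hyper_p, hyper_r, hyper_f = class_metrics(y_true, y_pred, "hyper")

print(f"accuracy={acc:.3f}")
print(f"hypo:  precision={hypo_p:.3f} recall={hypo_r:.3f} F={hypo_f:.3f}")
print(f"hyper: precision={hyper_p:.3f} recall={hyper_r:.3f} F={hyper_f:.3f}")
```

Reporting the metrics per class rather than accuracy alone matters when the disease classes are much rarer than the normal class, since a classifier can then reach high accuracy while still missing many hypothyroid or hyperthyroid cases.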
Results: This research was conducted on data from the University of California, Irvine (UCI), comprising 7200 records with 21 features. Experimental results showed that the genetic algorithm (GA) achieved the highest efficiency in feature selection, and that the boosted tree trained with the created features achieved the highest F-measure among the classifiers.