
Applying Data Mining Techniques in Healthcare

Petroleum-Gas University of Ploieşti,
39, Bucureşti Blvd., Ploieşti, 100680, Romania,

* Corresponding author

Abstract: The healthcare sector generates a huge volume of data on patients and their illnesses, on health insurance plans, on medication and treatment schedules for different diseases, on medical services and so forth. There is a growing demand for the healthcare community to transform these data into value-added knowledge by discovering unknown patterns and relations in them and by using those findings in the decision-making process, whether the decisions concern management, planning or treatment. Data mining is concerned with knowledge discovery, and techniques such as classification and regression trees, logistic regression and neural networks are adequate for predicting the health status of a patient by taking into account various medical parameters (also known as attributes) and demographic parameters. This paper presents a case study on the classification of patients with thyroid dysfunctions into three classes (1 – hypothyroidism, 2 – hyperthyroidism, 3 – normal) by using data mining algorithms, and discusses possible methods to improve the accuracy of the considered classification models.

Keywords: data mining, classification and regression trees (CART), healthcare.

Irina IONITA*, Liviu IONITA,
Applying Data Mining Techniques in Healthcare, Studies in Informatics and Control, ISSN 1220-1766, vol. 25(3), pp. 385-394, 2016.

1. Introduction

Data mining is the area of artificial intelligence concerned with knowledge discovery from real-world data sets. It is an interdisciplinary field that interacts with machine learning, statistics, pattern recognition, databases, information retrieval and others. Over time, researchers have tried to build software tools that better incorporate data mining models, such as classification and regression trees, logistic regression, neural networks, fuzzy rules and so forth. Before starting a data mining application, one needs to:

  1. understand the problem they have to solve;
  2. possess sufficient data to build and test the model;
  3. know how to prepare data before using them (cleaning, transforming, partitioning etc.);
  4. choose (build) the best model, adequate to the current problem;
  5. set the model characteristics;
  6. explain the results and give the best interpretation;
  7. test the model and increase the accuracy of the data mining model;
  8. use the obtained knowledge in the decision-making process.
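
As an illustration only, the steps above can be sketched in a few lines of Python. The sketch does not reproduce the Salford Predictive Modeler workflow used in the case study: it assumes a small synthetic data set with two made-up numeric attributes and three class labels (1, 2, 3), and it grows a simple CART-style tree by minimizing Gini impurity. All function names (`gini`, `best_split`, `build_tree`, `predict`, `make_synthetic_data`, `run_demo`) are hypothetical and introduced here for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute index, threshold) with the lowest weighted Gini impurity."""
    best_attr, best_thr, best_score = None, None, float("inf")
    for attr in range(len(rows[0])):
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[attr] <= thr]
            right = [y for row, y in zip(rows, labels) if row[attr] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_attr, best_thr, best_score = attr, thr, score
    return best_attr, best_thr


def build_tree(rows, labels, max_depth=3, depth=0):
    """Steps 4-5: grow a binary tree; max_depth is the model characteristic set here."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority  # leaf node: predict the majority class
    attr, thr = best_split(rows, labels)
    if attr is None:
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > thr]
    return (
        attr,
        thr,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth, depth + 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth, depth + 1),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the class label stored there."""
    while isinstance(tree, tuple):
        attr, thr, left, right = tree
        tree = left if row[attr] <= thr else right
    return tree


def make_synthetic_data(rng, per_class=30):
    """Step 2: SYNTHETIC data, not patient records. The first attribute
    separates the three classes by construction; the second is pure noise."""
    starts = {1: 0.0, 2: 2.0, 3: 4.0}
    data = []
    for label, start in starts.items():
        for _ in range(per_class):
            row = [rng.uniform(start, start + 1.0), rng.uniform(0.0, 10.0)]
            data.append((row, label))
    return data


def run_demo(seed=0):
    rng = random.Random(seed)
    data = make_synthetic_data(rng)
    # Step 3: partition the data into training and test sets (about 70/30).
    rng.shuffle(data)
    cut = int(0.7 * len(data))
    train, test = data[:cut], data[cut:]
    tree = build_tree([r for r, _ in train], [y for _, y in train], max_depth=3)
    # Step 7: test the model on held-out rows and report its accuracy.
    hits = sum(predict(tree, r) == y for r, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

Because the synthetic classes are separable by construction, the accuracy obtained here says nothing about real thyroid data; the sketch only shows where partitioning, model building, parameter setting and testing fit in the workflow.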

The difficulty of analyzing large amounts of data stored in databases under various formats led to the application of data mining techniques. Organizations from fields such as commerce, finance, education, medicine, industry, telecommunication and others focus on discovering knowledge from their “raw material” and decide, based on the newly discovered patterns, relations or rules, which course of action is best for them in terms of management and planning strategies.

In this paper the authors present a case study from the healthcare sector, namely the classification of patients who have thyroid disorders by means of data mining algorithms. A database on this category of patients is analyzed with predictive data mining software, namely Salford Predictive Modeler 8.0 [20], provided by Salford Systems. In the second section of the paper, the authors briefly review the current state of data mining applications in the healthcare domain and provide several examples from different healthcare sectors, such as the medical device industry, the pharmaceutical industry and hospital management. The third section details the proposed case study: after formulating the problem (i.e. the classification of patients with thyroid dysfunctions), the authors present the methods and materials used and follow the steps listed in the introductory section. The results and their interpretation are described in the last section, before the authors’ conclusions.


References

  1. BROSSETTE, S. E., A. P. SPRAGUE, M. K. HARDIN, B. WAITES, W. T. JONES, S. A. MOSER, Association Rules and Data Mining in Hospital Infection Control and Public Health Surveillance, Journal of the American Medical Informatics Association, vol. 5(4), 1998, pp. 373-381.
  2. CHANG, C. Y., M. F. TSAI, S. J. CHEN, Classification of the Thyroid Nodules Using Support Vector Machines, International Joint Conference on Neural Networks, 2008, pp. 3093-3098.
  3. CHEN, H. L., B. YANG, G. WANG, J. LIU, Y. D. CHEN, D. Y. LIU, A Three-Stage Expert System Based on Support Vector Machines for Thyroid Disease Diagnosis, Journal of Medical Systems, vol. 36(3), 2012, pp. 1953-1963.
  4. DESIKAN, P., K. W. HSU, J. SRIVASTAVA, Data Mining for Healthcare Management, SIAM International Conference on Data Mining, Arizona USA, 2011.
  5. DIWANI, S., S. MISHOL, D. S. KAYANGE, D. MACHUVE, A. SAM, Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region, International Journal of Computational Engineering Research, vol. 3(8), 2013, pp. 73-77.
  6. DURAIRAJ, M., V. RANJANI, Data Mining Applications In Healthcare Sector: A Study, International Journal of Scientific & Technology Research, vol. 2(10), October 2013.
  7. GHAREHCHOPOGH, F. S., M. MOLANY, F. D. MOKRI, Using Artificial Neural Network in Diagnosis of Thyroid Disease: A Case Study, International Journal on Computational Sciences & Applications (IJCSA), vol. 3(4), Aug. 2013.
  8. KELES, A., A. KELES, ESTDD: Expert System for Thyroid Diseases Diagnosis, Expert Systems with Applications, vol. 34(1), 2008, pp. 242-246.
  9. LOH, W. Y., Classification and Regression Trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1(1), 2011, pp. 14-23.
  10. MORGAN, J. N., J. A. SONQUIST, Problems in the Analysis of Survey Data, and a Proposal, Journal of the American Statistical Association, vol. 58, 1963, pp. 415-434.
  11. PAGE, D., M. CRAVEN, Biological Applications of Multi-Relational Data Mining, files/Page.pdf, accessed in January 2016.
  12. PRERANA, E., P. SEHGAL, K. TANEJA, Predictive Data Mining for Diagnosis of Thyroid Disease using Neural Network, International Journal of Research in Management, Science & Technology (E-ISSN: 2321-3264), vol. 3(2), April 2015.
  13. QUINLAN, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
  14. QUINLAN, J. R., Improved Use of Continuous Attributes in C4.5, Journal of Artificial Intelligence Research, vol. 4, 1996, pp. 77-90.
  15. QUINLAN, J. R., Induction of Decision Trees, Machine Learning, vol. 1(1), 1986, pp. 81-106.
  16. QUINLAN, J. R., Learning with Continuous Classes, 5th Australian Joint Conference on Artificial Intelligence, vol. 92, 1992.
  17. RANJAN, J., Application of Data Mining Techniques in Pharmaceutical Industry, Journal of Theoretical and Applied Information Technology, vol. 3(4), 2007.
  18. RIDINGER, M., American Healthways uses SAS to Improve Patient Care, DM Review, 12:139, 2002.
  19. RUBEN, D., CANLAS Jr., Data Mining in Healthcare: Current Applications and Issues, Mining_Health.pdf, accessed in July 2015.
  20. Salford Predictive Modeler 8.0, spm, accessed in January 2016.
  21. SHUKLA, A., P. KAUR, Diagnosis of Thyroid Disorders using Artificial Neural Networks, IEEE International Advance Computing Conference, Patiala, India, 2009, pp. 1016-1020.
  22. TreeNet, products/treenet, accessed in July 2015.
  23. UCI Machine Learning Repository, disease/, accessed in January 2016.
  24. UPADHAYAY, A., S. SHUKLA, S. KUMAR, Empirical Comparison by Data Mining Classification Algorithms (C 4.5 & C 5.0) for Thyroid Cancer Data Set, International Journal of Computer Science & Communication Networks, vol. 3(1), 2013, pp. 64-68.
  25. WITTEN, I., E. FRANK, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, San Francisco, 2005.
  26. WU, X., V. KUMAR, J. ROSS QUINLAN, et al., Top 10 Algorithms in Data Mining, Knowledge and Information Systems, vol. 14(1), 2008, pp. 1-37.