
Data4SCD

Risk Factors Analysis and Prediction of Mortality in Hospitalized Sickle Cell Disease Patients using Machine Learning and Artificial Intelligence Deep Learning

The objective of this project was to identify risk factors that significantly affect mortality in hospitalized sickle cell disease (SCD) patients and to develop a model that predicts the mortality risk of future patients. The analysis used data from the National Hospital Discharge Survey (NHDS) for 1997 to 2009, restricted to African-American adults aged 20 years or older with a diagnosis of SCD. The risk factors analyzed include demographic variables, hospital-related data and comorbidities, including but not limited to cardiovascular diseases, infectious diseases, mental disorders and musculoskeletal diseases. In addition, several machine learning and artificial intelligence deep learning classification systems were used to build a predictive model that combines the risk factors and protective factors to quantify the mortality risk of hospitalized SCD patients. Such a model would allow hospital resources to be allocated in proportion to each patient's mortality risk, which could lead to better care and a lower mortality rate in SCD patients.

From the data analysis, age, marital status and duration of stay were found to be statistically significant demographic risk factors for mortality in hospitalized SCD patients. Patients aged 40 to 59 years made up a share of the patients who died that was about 2 times their share of the patients who were discharged. For patients aged 60 years or older, this ratio was about 4.7. Patients who were divorced or separated made up a higher percentage of the patients who died than of the patients who were discharged. The same was true of patients whose stay lasted 1 day or less and of patients whose stay lasted 12 days or more. Therefore, the hospital and medical team should allocate more resources to, and be extra cautious with, SCD patients who are older, divorced and/or separated. In addition, all SCD patients should be monitored more closely on their first day of admission, as should SCD patients who have stayed in the hospital for 12 or more days.
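The comparisons above contrast a group's share of the patients who died with that group's share of the patients who were discharged. A minimal Python sketch of that ratio follows, assuming each patient is reduced to a (value, died) pair; the function names `age_band` and `share_ratios`, the band labels and the toy counts are hypothetical and are not the NHDS data or the author's code.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def age_band(age: int) -> str:
    """Hypothetical banding that mirrors the age cut-offs quoted above."""
    if age >= 60:
        return "60+"
    if age >= 40:
        return "40-59"
    return "20-39"


def share_ratios(
    records: Iterable[Tuple[int, bool]],
    band: Callable[[int], str],
) -> Dict[str, float]:
    """For each band: (share of deaths in band) / (share of discharges in band).

    `records` holds (value, died) pairs; `died` is True for in-hospital death.
    A ratio above 1 means the band is over-represented among deaths.
    """
    died: Counter = Counter()
    discharged: Counter = Counter()
    for value, is_dead in records:
        (died if is_dead else discharged)[band(value)] += 1
    n_died, n_discharged = sum(died.values()), sum(discharged.values())
    if n_died == 0 or n_discharged == 0:
        raise ValueError("need at least one death and one discharge")
    ratios = {}
    for key in sorted(set(died) | set(discharged)):
        share_died = died[key] / n_died
        share_discharged = discharged[key] / n_discharged
        # A band with deaths but no discharges has an unbounded ratio.
        ratios[key] = share_died / share_discharged if share_discharged else float("inf")
    return ratios


# Toy data (NOT NHDS counts): 10 deaths and 100 discharges.
toy = (
    [(25, True)] * 5 + [(45, True)] * 3 + [(65, True)] * 2
    + [(25, False)] * 70 + [(45, False)] * 20 + [(65, False)] * 10
)
print(share_ratios(toy, age_band))
```

Passing a different banding function, for example one that groups length of stay into 1 day or less, 2 to 11 days and 12 days or more, gives the analogous comparison for duration of stay.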

The following comorbidities were also found to be statistically significant risk factors for mortality in hospitalized SCD patients:

  1. Infectious and parasitic diseases
  2. Neoplasms
  3. Endocrine, nutritional and metabolic diseases, and immunity disorders
  4. Nervous system diseases
  5. Circulatory system diseases
  6. Respiratory system diseases
  7. Genitourinary system diseases

In contrast, musculoskeletal and connective tissue diseases had a protective effect against mortality.
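The write-up does not state which statistical test identified these comorbidities. One common way to express a risk or protective effect is an odds ratio from a 2x2 table of comorbidity versus outcome: an odds ratio whose confidence interval lies entirely above 1 suggests higher mortality, and one entirely below 1 suggests a protective effect. The sketch below assumes that approach with a Woolf (log-scale) 95% confidence interval; the function names `odds_ratio_ci` and `classify` and all counts are hypothetical, not results from this study.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: comorbidity present, died      b: comorbidity present, discharged
    c: comorbidity absent,  died      d: comorbidity absent,  discharged
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def classify(a: int, b: int, c: int, d: int) -> str:
    """Label a comorbidity by whether its 95% CI lies entirely above or below 1."""
    _, lower, upper = odds_ratio_ci(a, b, c, d)
    if lower > 1:
        return "risk factor"
    if upper < 1:
        return "protective"
    return "not significant"


# Hypothetical counts, NOT NHDS results.
print(classify(30, 200, 70, 1700))  # risk factor (OR about 3.6)
print(classify(5, 400, 95, 1500))   # protective (OR about 0.2)
```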

In addition, machine learning and artificial intelligence deep learning were used to predict the mortality risk of a new patient from all of the predictive variables (demographic variables, hospital-related data and comorbidities). Nine machine learning and artificial intelligence deep learning classification systems were assessed, and the four highest accuracy scores were achieved by (1) Naïve Bayes, (2) Logistic Regression, (3) Random Forest and (4) Neural Network. Naïve Bayes and Logistic Regression had AUCs of 0.83 and 0.82, respectively, indicating that they are good predictors of mortality in hospitalized SCD patients. The medical team can therefore use these models to allocate more resources to, and manage more aggressively, SCD patients at higher risk of mortality. Because the models give the medical team objective data to guide each patient's management, their use has the potential to reduce mortality in SCD patients.
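An AUC (area under the ROC curve) of 0.83 can be read as the probability that a model assigns a higher predicted risk to a randomly chosen patient who died than to a randomly chosen patient who was discharged, with 0.5 corresponding to chance. The Python sketch below computes AUC directly from that pairwise definition. The function name `roc_auc`, the labels and the scores are hypothetical illustrations, not output from the study's models; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be applied to predictions on held-out patients.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = died, 0 = discharged; scores: predicted mortality risk.
    Ties count as half a win. Runs in O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks from a classifier (NOT study output).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))  # 5 of 6 pairs ranked correctly -> 0.8333...
```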

 

 

Relevant Links

The PDF files contain a more detailed description of the solution and accompanying tables and figures.

Type of Solution

Technology

Pushkar Aggarwal aggarwpr@mail.uc.edu
