Prediction of Diabetes Mellitus using Machine Learning

Mon 22 Jun 2020

Group C

Team members:

  • Robin Ahmed (17221070)
  • Lee Zhen Lek (17219514)
  • Avinnaash Suresh (17219903)
  • Tan Hiap Li (17219269)

Introduction

Diabetes - Where Malaysia Stands

In 2019, 3.6 million Malaysians had diabetes, the highest prevalence in Asia and one of the highest in the world. An alarming 7 million adults aged 18 and above (31.3%), diagnosed and undiagnosed, are projected to have diabetes in Malaysia by 2025.

What do we want to achieve in this project?

Our core objective is to develop a free, interactive app that integrates a machine learning model, built in R, to predict whether a given person is at risk of developing diabetes. The PIMA Indians Diabetes dataset (CSV format) used in this project was downloaded from Kaggle and originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.

Sample Raw Dataset:

Data Downloaded from: https://www.kaggle.com/uciml/pima-indians-diabetes-database

Data Pre-processing, Visualization & Model Implementation

Data Pre-processing & Visualization

To predict diabetes from this dataset, we performed the following steps:

  • Data Cleaning
    • Detect missing values.
    • Ensure the letter case of continuous variables is standardized.
    • Check that numeric variables use a consistent number of decimal places.
  • Exploratory Data Analysis (EDA)
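The missing-value check can be sketched as follows. The project itself used R, so this is Python for illustration only. The sketch assumes the Kaggle file's column names. It also assumes a convention often applied to this dataset: a zero in Glucose, BloodPressure, SkinThickness, Insulin or BMI is treated as a missing measurement. The helper name `count_missing` and the sample rows are hypothetical.

```python
import csv
import io

# Columns where 0 is physiologically implausible. Treating 0 as "missing"
# here is an assumption, not something stated in the original slides.
# Pregnancies is excluded because 0 is a valid value there.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def count_missing(rows, columns):
    """Count blank, 'NA' or (for ZERO_AS_MISSING columns) zero values per column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row[col].strip()
            if value == "" or value.upper() == "NA":
                counts[col] += 1
            elif col in ZERO_AS_MISSING and float(value) == 0:
                counts[col] += 1
    return counts


# Hypothetical rows in the Kaggle file's column layout (not real records).
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
1,120,70,0,0,30.5,0.35,40,0
2,0,64,20,90,NA,0.50,25,1
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
print(count_missing(rows, reader.fieldnames))
```

In R, the same check would typically compare each column against `NA` and `0`. The design point is the same in either language: the zero-as-missing rule must be applied per column, not globally.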

After data preparation and EDA, we trained the machine learning models below on the cleaned data in R and compared their accuracy under different parameter values. The best-performing model was then incorporated into DiabetesPredictor.

Model Implementation

  • Implemented Models and Accuracies
    • Logistic Regression [75.32%]
    • K Nearest Neighbors (KNN) [74.03%]
    • Support Vector Machine (SVM) [74.68%]
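Accuracy here is typically the fraction of test records a model classifies correctly. The sketch below shows that calculation together with a minimal k-nearest-neighbours classifier. It is Python rather than the R code the project used. The toy data, the choice of k = 3 and the Euclidean distance are all assumptions. Because KNN is distance-based, real features would normally be standardized first.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label.
    distances = [(math.dist(query, x), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical two-feature toy data (e.g. scaled glucose and BMI); 1 = diabetic.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.8), (0.7, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.15, 0.15), (0.85, 0.85), (0.6, 0.2)]
test_y = [0, 1, 1]

preds = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
print(preds, f"{accuracy(test_y, preds):.2%}")
```

The third test point lies closer to the non-diabetic cluster, so it is misclassified and accuracy comes out at 2/3. The percentages on this slide are the same kind of ratio, computed on the project's own test data.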

DiabetesPredictor Shiny App Details

What the DiabetesPredictor app offers:
  • The Overview tab presents key facts about diabetes and describes the main functions of DiabetesPredictor.

  • The HeatMap tab visualizes diabetes cases and severity across the 20 countries that are International Diabetes Federation (IDF) Western Pacific members.

  • The Prediction tab is the main page, where you can enter different parameters to predict whether you have diabetes.

  • The Comparison tab lets users compare their own details with the rest of the population.

  • Selected EDA outputs are shown in the Comparison and Exploratory Data Analysis tabs.

  • Finally, the About tab describes the app.
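The Comparison tab places a user relative to the rest of the population, and one way to express that is a percentile rank. The sketch below is Python, not the app's actual R/Shiny code. The helper name `percentile_rank`, the convention of counting ties as half, and the glucose values are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(population, value):
    """Percentage of the population below `value`, counting ties as half."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)            # values strictly less than `value`
    equal = bisect_right(ordered, value) - below   # values exactly equal to `value`
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical glucose readings standing in for the dataset's population.
glucose = [85, 90, 99, 110, 120, 125, 140, 150, 160, 180]
user_glucose = 125
print(f"Percentile rank: {percentile_rank(glucose, user_glucose):.1f}")
```

Sorting once and using binary search keeps each lookup cheap if many user inputs are compared against the same population column.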

Experience & Conclusion

Experience Summary

  • Completing this assignment was enriching and empowering for each of us: we worked as a team, and the project gave us a wide range of learning opportunities.
  • As beginners, we initially found it challenging to work with Shiny, R packages, Slidify and R Presentation, but guidance and references made these tasks easier.

Conclusion

Although we chose the most accurate of the models we evaluated, this application is intended only for early-stage prediction and for raising awareness. Its results should not be considered final, so users are advised to consult a physician for an actual diagnosis.

Thank You & Enjoy the App