Stroke Risk Predictor (WQD 7001)

GROUP K

HO WEI YAN (S2116489) - Team Leader
CHARROOGRESINEE A/P RADHAKRISHNAN (17060405)
CHOONG CHE WEI (S2106183)
HII YEW HAN (S2037987)
TAN SHI LING (S2115562)

Date: 24th Monday, 2022

Stroke intro

Objective:
This data product aims to provide an early indicator that helps people, especially those with pre-existing health conditions, assess their stroke risk.

Research Questions:
(1) Which attributes are related to stroke?
(2) What is the correlation between pre-existing health conditions and stroke risk?
(3) How can stroke risk be predicted?

Data Science Process:
1. Asking the right question (Empathize)
The primary goal was determined by asking questions such as “What are the common health risks and diseases among adults?” The idea was to predict the risk of developing a stroke based on certain criteria.

2. Finding and collecting data (Empathize)
The healthcare dataset by the World Health Organization was obtained from Kaggle: https://www.kaggle.com/fedesoriano/stroke-prediction-dataset

3. Data Preprocessing (Define)
Mean imputation is applied to the missing values in the dataset. Attributes in the data are identified and classified accordingly.
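As a rough illustration of mean imputation, the sketch below replaces missing entries in a numeric column with the mean of the observed values. It is written in Python for illustration only and is not the project's own code; the `impute_mean` helper and the `bmi` sample values are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric column with two missing readings (made-up values).
bmi = [22.0, None, 30.0, 26.0, None]
bmi_imputed = impute_mean(bmi)
print(bmi_imputed)
```

Mean imputation keeps every row in the dataset, at the cost of slightly shrinking the variance of the imputed column.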

4. Data Analysis (Define)
Descriptive analysis, exploratory data analysis and predictive analysis are carried out.

5. Data Modeling (Ideate & Prototype)
Naive Bayes classification is used to develop a stroke prediction model.
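To show how a Naive Bayes classifier arrives at a risk estimate, here is a minimal Python sketch of a categorical Naive Bayes with Laplace smoothing. It is an illustration under stated assumptions, not the project's model: the class name, the two binary features (hypertension, heart disease) and the toy records are all hypothetical.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[label][j][value] = how often feature j took `value` in that class
        self.value_counts = {
            c: [Counter() for _ in range(n_features)] for c in self.classes
        }
        # distinct values seen per feature, used in the smoothing denominator
        self.feature_values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[label][j][value] += 1
                self.feature_values[j].add(value)
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            # log prior + sum of log conditional likelihoods (independence assumption)
            score = math.log(self.class_counts[c] / total)
            for j, value in enumerate(row):
                count = self.value_counts[c][j][value]
                denom = self.class_counts[c] + self.alpha * len(self.feature_values[j])
                score += math.log((count + self.alpha) / denom)
            log_scores[c] = score
        # normalise the log scores into probabilities that sum to 1
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


# Toy records (made up): (hypertension, heart_disease) -> stroke label
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1), (0, 0)]
labels = [1, 1, 0, 0, 0, 0, 1, 0]

model = CategoricalNaiveBayes().fit(rows, labels)
high_risk = model.predict_proba((1, 1))
low_risk = model.predict_proba((0, 0))
print(high_risk[1] > low_risk[1])
```

The smoothing term keeps a feature value that never co-occurred with a class from driving that class's probability to zero.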

6. Evaluation and Deployment (Test)
The model is evaluated and a Shiny application is deployed for visualisation.
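The slides do not say which evaluation metrics were used. As one possible approach, the Python sketch below computes accuracy, sensitivity and specificity from a confusion matrix; the function names and the label vectors are hypothetical and serve only as an illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted, positive=1):
    """Compute accuracy, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up true labels and model predictions (1 = stroke, 0 = no stroke).
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 0, 1, 0, 1]
metrics = evaluate(actual, predicted)
print(metrics)
```

Reporting sensitivity alongside accuracy is useful when positive cases are rare, since a model that always predicts "no stroke" can still score a high accuracy.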

Summary

Experience gained:

  1. Identifying an existing issue based on a real-life scenario.
  2. Visualizing analytic results based on the findings of the topic.
  3. Gaining hands-on experience using Shiny to create a web-based application and hosting it on RPubs.

References: https://www.kaggle.com/yassinehamdaoui1/cardiovascular-disease
GitHub: https://github.com/shilingt/StrokeShinyProject