Stroke Prediction Analysis

Jayesh Gokhale
5/16/2021

Data

  • Dataset taken from Kaggle (posted by: fedesoriano)
  • Categorical Variables
    • gender; ever_married
    • hypertension; heart_disease
    • work_type; residence_status
    • smoking_status
  • Numeric Variables
    • age
    • avg_glucose_level
    • bmi
  • Target Variable: stroke (binary: did the patient have a stroke?)

Application

  • Quick “Trial and Error” selection of predictors for a logistic regression
  • Shows the coefficients with their p-values (significance)
  • Results Sorted in increasing order of p-values
  • Significant coefficients highlighted with asterisks
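
The coefficient table described above could be produced roughly as follows. This is a minimal sketch, not the application's actual code: the data frame `toy` is synthetic, its effect sizes are made up, and the star thresholds (0.001, 0.01, 0.05) are an assumption borrowed from R's usual convention.

```r
# Synthetic stand-in for the training data (hypothetical values, not the Kaggle set).
set.seed(1)
n <- 500
toy <- data.frame(
  age = runif(n, 20, 80),
  bmi = rnorm(n, 28, 5),
  avg_glucose_level = rnorm(n, 105, 40)
)
toy$stroke <- rbinom(n, 1, plogis(-7 + 0.08 * toy$age))

# Fit a logistic regression on the chosen predictors.
fit <- glm(stroke ~ age + bmi + avg_glucose_level, data = toy, family = "binomial")

# Pull the coefficient matrix and sort rows by increasing p-value.
coefs <- as.data.frame(summary(fit)$coefficients)
coefs <- coefs[order(coefs[["Pr(>|z|)"]]), ]

# Flag significance with asterisks (assumed thresholds: 0.001, 0.01, 0.05).
coefs$signif <- as.character(cut(
  coefs[["Pr(>|z|)"]],
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("***", "**", "*", ""),
  include.lowest = TRUE
))

print(coefs)
```

Changing the right-hand side of the formula and re-running is all the “trial and error” step needs.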

Plots

  • Choice of Numeric Variables for the Scatter plot axes (X and Y)
  • “Third Dimension” option for Color on a Categorical Variable
  • Target Variable (stroke) always indicated by Size of Bubble
  • Choice of Color Palettes to support accessibility
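
A plot along these lines might be built with ggplot2 as sketched below. The data are synthetic, the axis and color variables are one arbitrary choice among those listed in the Data section, and the viridis palette is only an example of a colorblind-friendly option; the application's own palettes may differ.

```r
library(ggplot2)

# Synthetic stand-in for the training data (hypothetical values).
set.seed(2)
n <- 200
toy <- data.frame(
  age = runif(n, 20, 80),
  avg_glucose_level = rnorm(n, 105, 40),
  smoking_status = sample(c("never smoked", "formerly smoked", "smokes"),
                          n, replace = TRUE)
)
toy$stroke <- rbinom(n, 1, plogis(-7 + 0.08 * toy$age))

# X and Y are numeric, color is a categorical variable,
# and bubble size always encodes the target variable.
p <- ggplot(toy, aes(x = age, y = avg_glucose_level,
                     colour = smoking_status, size = factor(stroke))) +
  geom_point(alpha = 0.6) +
  scale_colour_viridis_d() +
  scale_size_manual(values = c("0" = 1.5, "1" = 4), name = "stroke") +
  labs(x = "age", y = "avg_glucose_level", colour = "smoking_status")

print(p)
```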

Sample Model Output

library(ggplot2)
source(paste0(getwd(),"/StrokePrediction/getDF.R"))
summary(glm(stroke~age,data=df.train,family="binomial"))$coef
             Estimate  Std. Error   z value     Pr(>|z|)
(Intercept) -7.026230 0.360701909 -19.47933 1.644287e-84
age          0.071733 0.005337112  13.44041 3.504734e-41
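
Reading the output: the estimates are on the log-odds scale. The worked example below uses the two coefficients shown above; age 60 is an arbitrary value chosen for illustration, and the results are rounded.

```latex
% Fitted model (logit link)
\log\frac{p}{1-p} = -7.026230 + 0.071733 \cdot \text{age}

% Odds ratio for one additional year of age
e^{0.071733} \approx 1.074
\quad\text{(about a 7.4\% increase in the odds of stroke per year)}

% Predicted probability at age 60
\log\frac{p}{1-p} = -7.026230 + 0.071733 \cdot 60 \approx -2.722
\qquad
p = \frac{1}{1 + e^{2.722}} \approx 0.062
```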