This RMarkdown report documents the data analysis for a project on building and deploying a stroke prediction model in R. It covers data exploration, summary statistics, and the prediction models. The final report was completed on Sat May 10 10:22:08 2025.
Data Description:
According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths.
This data set is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, existing conditions (hypertension, heart disease), and smoking status. Each row in the data describes one patient.
# Install and load necessary packages
install.packages("readr")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("shiny")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Load dataset
data <- read_csv("~/Build-deploy-stroke-prediction-model-R/healthcare-dataset-stroke-data.csv")
## Rows: 5110 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): gender, ever_married, work_type, Residence_type, bmi, smoking_status
## dbl (6): id, age, hypertension, heart_disease, avg_glucose_level, stroke
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
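The column specification above lists `bmi` as `chr`, because missing values in that column are stored as the string "N/A" (visible in the `str()` output later in this report). The following is a minimal sketch of one way to handle this at read time, using base R's `read.csv()` with `na.strings` on a small inline sample. The three rows are taken from the output in this report and restricted to a few columns; `sample_csv`, `raw`, and `demo` are illustrative names. readr's `read_csv()` accepts an analogous `na` argument.

```r
# Three rows taken from the head()/str() output in this report, a few columns only.
sample_csv <- "id,gender,age,bmi,stroke
9046,Male,67,36.6,1
51676,Female,61,N/A,1
31112,Male,80,32.5,1"

# Without na.strings, the "N/A" token forces bmi to be read as character.
raw <- read.csv(text = sample_csv, stringsAsFactors = FALSE)
class(raw$bmi)

# Declaring "N/A" as a missing-value marker lets bmi parse as numeric.
demo <- read.csv(text = sample_csv, na.strings = "N/A", stringsAsFactors = FALSE)
class(demo$bmi)
sum(is.na(demo$bmi))
```

With `bmi` numeric, `summary()` reports its quartiles and an explicit count of missing values instead of only the column length.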
# Display the first few rows of the dataset
head(data)
## # A tibble: 6 × 12
## id gender age hypertension heart_disease ever_married work_type
## <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 9046 Male 67 0 1 Yes Private
## 2 51676 Female 61 0 0 Yes Self-employed
## 3 31112 Male 80 0 1 Yes Private
## 4 60182 Female 49 0 0 Yes Private
## 5 1665 Female 79 1 0 Yes Self-employed
## 6 56669 Male 81 0 0 Yes Private
## # ℹ 5 more variables: Residence_type <chr>, avg_glucose_level <dbl>, bmi <chr>,
## # smoking_status <chr>, stroke <dbl>
# Summary statistics (missing bmi values are coded as the string "N/A", so no NA counts appear here)
summary(data)
## id gender age hypertension
## Min. : 67 Length:5110 Min. : 0.08 Min. :0.00000
## 1st Qu.:17741 Class :character 1st Qu.:25.00 1st Qu.:0.00000
## Median :36932 Mode :character Median :45.00 Median :0.00000
## Mean :36518 Mean :43.23 Mean :0.09746
## 3rd Qu.:54682 3rd Qu.:61.00 3rd Qu.:0.00000
## Max. :72940 Max. :82.00 Max. :1.00000
## heart_disease ever_married work_type Residence_type
## Min. :0.00000 Length:5110 Length:5110 Length:5110
## 1st Qu.:0.00000 Class :character Class :character Class :character
## Median :0.00000 Mode :character Mode :character Mode :character
## Mean :0.05401
## 3rd Qu.:0.00000
## Max. :1.00000
## avg_glucose_level bmi smoking_status stroke
## Min. : 55.12 Length:5110 Length:5110 Min. :0.00000
## 1st Qu.: 77.25 Class :character Class :character 1st Qu.:0.00000
## Median : 91.89 Mode :character Mode :character Median :0.00000
## Mean :106.15 Mean :0.04873
## 3rd Qu.:114.09 3rd Qu.:0.00000
## Max. :271.74 Max. :1.00000
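Because `summary()` shows no NA counts for character columns, missing values hidden behind placeholder strings need a separate check. A minimal sketch, assuming the placeholders are "N/A" (seen in `bmi`) and "Unknown" (a `smoking_status` level); the small data frame `demo` reuses the first four values shown by `str()` in this report and is illustrative only.

```r
# Illustrative data frame mirroring a few columns of the stroke data.
demo <- data.frame(
  age = c(67, 61, 80, 49),
  bmi = c("36.6", "N/A", "32.5", "34.4"),
  smoking_status = c("formerly smoked", "never smoked", "never smoked", "smokes"),
  stringsAsFactors = FALSE
)

# True NA values per column (none here, because "N/A" is still a string).
na_counts <- colSums(is.na(demo))

# Placeholder strings per column; %in% avoids NA results that == could produce.
placeholder_counts <- sapply(demo, function(col) sum(col %in% c("N/A", "Unknown")))

na_counts
placeholder_counts
```

Applied to the full data set, the same two calls would show how many `bmi` and `smoking_status` entries are effectively missing.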
# Check the structure of the data
str(data)
## spc_tbl_ [5,110 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ id : num [1:5110] 9046 51676 31112 60182 1665 ...
## $ gender : chr [1:5110] "Male" "Female" "Male" "Female" ...
## $ age : num [1:5110] 67 61 80 49 79 81 74 69 59 78 ...
## $ hypertension : num [1:5110] 0 0 0 0 1 0 1 0 0 0 ...
## $ heart_disease : num [1:5110] 1 0 1 0 0 0 1 0 0 0 ...
## $ ever_married : chr [1:5110] "Yes" "Yes" "Yes" "Yes" ...
## $ work_type : chr [1:5110] "Private" "Self-employed" "Private" "Private" ...
## $ Residence_type : chr [1:5110] "Urban" "Rural" "Rural" "Urban" ...
## $ avg_glucose_level: num [1:5110] 229 202 106 171 174 ...
## $ bmi : chr [1:5110] "36.6" "N/A" "32.5" "34.4" ...
## $ smoking_status : chr [1:5110] "formerly smoked" "never smoked" "never smoked" "smokes" ...
## $ stroke : num [1:5110] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. id = col_double(),
## .. gender = col_character(),
## .. age = col_double(),
## .. hypertension = col_double(),
## .. heart_disease = col_double(),
## .. ever_married = col_character(),
## .. work_type = col_character(),
## .. Residence_type = col_character(),
## .. avg_glucose_level = col_double(),
## .. bmi = col_character(),
## .. smoking_status = col_character(),
## .. stroke = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Explore how age is distributed among stroke and non-stroke patients
# (stroke is converted to a factor so that ggplot can use it as a discrete fill group)
ggplot(data, aes(x = age, fill = factor(stroke))) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Distribution of Age by Stroke Status", fill = "Stroke")
# Logistic Regression Model
log_model <- glm(stroke ~ age + gender + hypertension + heart_disease + smoking_status,
data = data,
family = "binomial")
summary(log_model)
##
## Call:
## glm(formula = stroke ~ age + gender + hypertension + heart_disease +
## smoking_status, family = "binomial", data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0120 -0.3253 -0.1748 -0.0806 3.7584
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.077407 0.377767 -18.735 < 2e-16 ***
## age 0.070947 0.005177 13.705 < 2e-16 ***
## genderMale 0.049169 0.140442 0.350 0.72626
## genderOther -7.333273 324.743801 -0.023 0.98198
## hypertension 0.468493 0.161955 2.893 0.00382 **
## heart_disease 0.371695 0.188613 1.971 0.04876 *
## smoking_statusnever smoked -0.212151 0.174053 -1.219 0.22289
## smoking_statussmokes 0.088088 0.213542 0.413 0.67997
## smoking_statusUnknown -0.078109 0.205146 -0.381 0.70339
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1990.4 on 5109 degrees of freedom
## Residual deviance: 1600.6 on 5101 degrees of freedom
## AIC: 1618.6
##
## Number of Fisher Scoring iterations: 11
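The coefficients above are on the log-odds scale, which is hard to read directly. A short sketch of the conversion to odds ratios, using the estimates of the three significant predictors copied from the summary output; `estimates` and `odds_ratios` are illustrative names. On the fitted object, `exp(coef(log_model))` performs the same conversion for all terms.

```r
# Coefficient estimates copied from the summary(log_model) output above.
estimates <- c(age = 0.070947, hypertension = 0.468493, heart_disease = 0.371695)

# Exponentiating a log-odds coefficient gives the odds ratio per one-unit increase.
odds_ratios <- exp(estimates)
round(odds_ratios, 3)

# Effect of a ten-year age difference on the odds scale.
exp(10 * estimates[["age"]])
```

For age, the odds ratio is about 1.07, meaning that each additional year multiplies the estimated odds of stroke by roughly 1.07 with the other predictors held fixed.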
# Random Forest Model
install.packages("randomForest")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
rf_model <- randomForest(stroke ~ age + gender + hypertension + heart_disease + smoking_status,
data = data)
## Warning in randomForest.default(m, y, ...): The response has five or fewer
## unique values. Are you sure you want to do regression?
print(rf_model)
##
## Call:
## randomForest(formula = stroke ~ age + gender + hypertension + heart_disease + smoking_status, data = data)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 0.04334627
## % Var explained: 6.49
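The warning and the line "Type of random forest: regression" show that the forest treated the numeric 0/1 `stroke` column as a continuous response. A minimal sketch of how a classification forest could be requested instead, by converting the response to a factor before fitting. The data frame `toy` is synthetic and its values are made up for illustration; the fit is guarded so the snippet still runs where the randomForest package is not installed.

```r
# Synthetic stand-in for the stroke data (values are made up for illustration).
set.seed(42)
n <- 200
toy <- data.frame(
  age = runif(n, 20, 80),
  hypertension = rbinom(n, 1, 0.1)
)
toy$stroke <- rbinom(n, 1, plogis(-4 + 0.05 * toy$age))

# A numeric 0/1 response triggers regression; a factor response triggers classification.
toy$stroke <- factor(toy$stroke, levels = c(0, 1))

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_cls <- randomForest::randomForest(stroke ~ age + hypertension, data = toy)
  print(rf_cls$type)                      # "classification" expected
  head(predict(rf_cls, newdata = toy))    # class labels rather than scores
}
```

With a factor response, `print()` on the fitted forest reports an out-of-bag error rate and a confusion matrix instead of the mean of squared residuals.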
# Predict using Logistic Regression
log_pred <- predict(log_model, newdata = data, type = "response")
log_pred_class <- ifelse(log_pred > 0.5, 1, 0)
# Confusion Matrix for Logistic Regression
table(log_pred_class, data$stroke)
##
## log_pred_class 0 1
## 0 4861 249
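The table above has a single row because no predicted probability exceeded 0.5, so every patient was assigned to class 0. A short sketch of how the same counts translate into accuracy, sensitivity, and specificity; the vectors `truth` and `predicted` are rebuilt from the counts in the table, and fixing the factor levels is one way to keep the empty "predicted 1" row visible.

```r
# Counts from the confusion matrix above: all 5110 patients were predicted as class 0.
truth <- c(rep(0, 4861), rep(1, 249))
predicted <- rep(0, length(truth))

# Fixing the factor levels keeps the empty "predicted = 1" row in the table.
cm <- table(
  predicted = factor(predicted, levels = c(0, 1)),
  actual = factor(truth, levels = c(0, 1))
)
cm

accuracy <- sum(diag(cm)) / sum(cm)           # share of correct predictions
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # share of actual strokes detected
specificity <- cm["0", "0"] / sum(cm[, "0"])  # share of non-strokes correctly identified
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

The accuracy of 4861 / 5110 is simply the share of non-stroke patients, while sensitivity is 0: none of the 249 stroke cases is detected at this threshold.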
# Predict using Random Forest
# Note: rf_model was fitted as a regression forest (see the warning above), so predict()
# returns continuous scores between 0 and 1 rather than class labels
rf_pred <- predict(rf_model, newdata = data)
# as.factor() therefore creates one factor level per distinct score instead of the classes 0 and 1
rf_pred_class <- as.factor(rf_pred)
# Confusion Matrix for Random Forest
table(rf_pred_class, data$stroke)
##
## rf_pred_class 0 1
## 0.0205814118459159 101 0
## 0.0205897405073856 30 0
## 0.0205928743717508 269 0
## ...
## 0.246826995302205 0 2
## 0.250562948195027 0 1
## (776 rows in total, one per distinct predicted score; intermediate rows omitted)
# Evaluate models' accuracy
log_accuracy <- mean(log_pred_class == data$stroke)
rf_accuracy <- mean(rf_pred_class == data$stroke)
log_accuracy
## [1] 0.951272
rf_accuracy
## [1] 0
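The Random Forest accuracy of 0 is an artifact of the comparison rather than a measure of model quality: `rf_pred_class` holds factor labels such as "0.0205814118459159", which can never equal the labels 0 or 1. A small sketch reproducing the effect and showing thresholding as one possible fix; the scores, labels, and the cutoff of 0.2 are made-up illustrations.

```r
# Illustrative continuous scores and true labels (made-up values).
scores <- c(0.02, 0.11, 0.25, 0.04)
truth <- c(0, 0, 1, 0)

# as.factor() creates one level per distinct score; the level labels ("0.02", ...)
# never equal the labels 0 or 1, so the comparison is FALSE everywhere.
mean(as.factor(scores) == truth)

# Thresholding turns scores into 0/1 labels that can be compared with the truth.
# The cutoff of 0.2 is an arbitrary illustration, not a tuned value.
pred_class <- ifelse(scores > 0.2, 1, 0)
mean(pred_class == truth)
```

Since the largest score in the Random Forest table is about 0.25, a cutoff of 0.5 would again assign every patient to class 0, so the threshold itself would need to be chosen with care.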
library(shiny)
ui <- fluidPage(
titlePanel("Stroke Prediction"),
sidebarLayout(
sidebarPanel(
numericInput("age", "Age", value = 25, min = 18, max = 100),
selectInput("gender", "Gender", choices = c("Male", "Female", "Other")),
selectInput("hypertension", "Hypertension", choices = c(0, 1)),
selectInput("heart_disease", "Heart Disease", choices = c(0, 1)),
numericInput("avg_glucose_level", "Average Glucose Level", value = 100),
selectInput("smoking_status", "Smoking Status", choices = c("never smoked", "formerly smoked", "smokes", "Unknown"))
),
mainPanel(
textOutput("prediction")
)
)
)
server <- function(input, output) {
output$prediction <- renderText({
# Ensure log_model is already loaded in your environment before running the app
pred_data <- data.frame(
age = input$age,
gender = input$gender,
hypertension = as.numeric(input$hypertension),
heart_disease = as.numeric(input$heart_disease),
avg_glucose_level = input$avg_glucose_level,
smoking_status = input$smoking_status
)
pred <- predict(log_model, newdata = pred_data, type = "response")
if (pred > 0.5) {
return("Likely to have a stroke")
} else {
return("Unlikely to have a stroke")
}
})
}
shinyApp(ui = ui, server = server)
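The comment in the server function notes that `log_model` must already be in the environment when the app starts. One way to make that explicit is to save the fitted model to disk and read it back at start-up. The sketch below is an assumption about how this could be wired, not part of the original app: it uses a small synthetic model (`toy_model`) and a temporary file path as stand-ins, and it reports the estimated probability instead of only a thresholded label.

```r
# Small stand-in model fitted on synthetic data (the real app would use log_model).
set.seed(1)
toy <- data.frame(age = runif(300, 20, 80))
toy$stroke <- rbinom(300, 1, plogis(-5 + 0.06 * toy$age))
toy_model <- glm(stroke ~ age, data = toy, family = "binomial")

# Persist the fitted model once, for example at the end of the analysis script.
model_path <- file.path(tempdir(), "log_model.rds")  # hypothetical location
saveRDS(toy_model, model_path)

# At app start-up, read the model back instead of relying on the global environment.
loaded_model <- readRDS(model_path)

# Report the estimated probability rather than only a thresholded label.
new_patient <- data.frame(age = 70)
prob <- predict(loaded_model, newdata = new_patient, type = "response")
sprintf("Estimated stroke probability: %.1f%%", 100 * prob)
```

In the Shiny app, the `readRDS()` call would sit above `server`, and the `sprintf()` result could be returned from `renderText()`.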
# Load necessary package
library(readr) # or just use read.csv if you prefer
# Load your CSV file (make sure the path and filename are correct)
stroke_data <- read.csv("~/Build-deploy-stroke-prediction-model-R/healthcare-dataset-stroke-data.csv")
# Now you can check the class balance
table(stroke_data$stroke)
##
## 0 1
## 4861 249
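The logistic regression identified age, hypertension, and heart disease as statistically significant predictors of stroke (p < 0.05); gender and smoking status were not significant in this model. Two models were trained. The logistic regression reached 95.1% accuracy, but at the 0.5 threshold it assigned every patient to the no-stroke class, so this figure only matches the share of non-stroke cases (4861 of 5110). The Random Forest was fitted as a regression rather than a classifier (6.49% of variance explained), so its reported accuracy of 0 is not a meaningful basis for comparison. Neither model can yet be regarded as a reliable stroke classifier.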
To make the predictive model usable in real-world applications, it could be deployed in a health dashboard or integrated into a clinical decision support system to help healthcare providers assess stroke risk early. Further work may involve balancing the class distribution, increasing the dataset size, or extending the Shiny app shown above into a deployed interactive web application.
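Balancing the class distribution can be approached in several ways (class weights, oversampling, downsampling). As a minimal sketch of the simplest option, random downsampling of the majority class, the code below rebuilds the label vector from the class counts reported above; `labels` and `keep_idx` are illustrative names, and the seed is arbitrary.

```r
# Class counts from table(stroke_data$stroke) above: 4861 non-stroke, 249 stroke.
labels <- c(rep(0, 4861), rep(1, 249))

set.seed(123)
minority_idx <- which(labels == 1)
majority_idx <- which(labels == 0)

# Keep every stroke case and draw an equally sized random sample of non-stroke cases.
keep_idx <- c(minority_idx, sample(majority_idx, length(minority_idx)))
table(labels[keep_idx])
```

On the real data, the same index vector could be used to subset the rows (for example `stroke_data[keep_idx, ]`) before refitting. Downsampling discards most non-stroke records, so it trades information for balance.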