About Data Analysis Report

This RMarkdown file reports the data analysis for a project on building and deploying a stroke prediction model in R. It covers data exploration, summary statistics, and building the prediction models. The final report was completed on Sat May 10 10:22:08 2025.

Data Description:

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths.

This data set is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.

Task One: Import data and data preprocessing

Load data and install packages

# Install and load necessary packages
install.packages("readr")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
install.packages("shiny")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

# Load dataset
data <- read_csv("~/Build-deploy-stroke-prediction-model-R/healthcare-dataset-stroke-data.csv")
## Rows: 5110 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): gender, ever_married, work_type, Residence_type, bmi, smoking_status
## dbl (6): id, age, hypertension, heart_disease, avg_glucose_level, stroke
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the first few rows of the dataset
head(data)
## # A tibble: 6 × 12
##      id gender   age hypertension heart_disease ever_married work_type    
##   <dbl> <chr>  <dbl>        <dbl>         <dbl> <chr>        <chr>        
## 1  9046 Male      67            0             1 Yes          Private      
## 2 51676 Female    61            0             0 Yes          Self-employed
## 3 31112 Male      80            0             1 Yes          Private      
## 4 60182 Female    49            0             0 Yes          Private      
## 5  1665 Female    79            1             0 Yes          Self-employed
## 6 56669 Male      81            0             0 Yes          Private      
## # ℹ 5 more variables: Residence_type <chr>, avg_glucose_level <dbl>, bmi <chr>,
## #   smoking_status <chr>, stroke <dbl>
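The column specification above shows `bmi` parsed as character rather than numeric, which usually means the file spells missing values as text (the `str()` output in this report shows "N/A"). The following is a minimal sketch of one way to repair such a column after import. It runs on a small made-up data frame named `toy` (hypothetical values, not the project data), and the choice to convert `gender` and `stroke` to factors is an assumption about how the later models are meant to treat them. An alternative is to declare the placeholder at import time through the `na` argument of `read_csv()`.

```r
# Toy data mimicking the column types shown above (values are illustrative only)
toy <- data.frame(
  gender = c("Male", "Female", "Female", "Male"),
  bmi    = c("36.6", "N/A", "32.5", "34.4"),
  stroke = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Recode the "N/A" placeholder as a real missing value, then convert to numeric
toy$bmi[toy$bmi == "N/A"] <- NA
toy$bmi <- as.numeric(toy$bmi)

# Treat categorical columns as factors so models and plots handle them as categories
toy$gender <- factor(toy$gender)
toy$stroke <- factor(toy$stroke, levels = c(0, 1))

str(toy)
```

Recoding before calling `as.numeric()` avoids the "NAs introduced by coercion" warning and makes the handling of the placeholder explicit.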

Describe and explore the data

# Summary statistics; note that summary() does not count missing values in character columns such as bmi
summary(data)
##        id           gender               age         hypertension    
##  Min.   :   67   Length:5110        Min.   : 0.08   Min.   :0.00000  
##  1st Qu.:17741   Class :character   1st Qu.:25.00   1st Qu.:0.00000  
##  Median :36932   Mode  :character   Median :45.00   Median :0.00000  
##  Mean   :36518                      Mean   :43.23   Mean   :0.09746  
##  3rd Qu.:54682                      3rd Qu.:61.00   3rd Qu.:0.00000  
##  Max.   :72940                      Max.   :82.00   Max.   :1.00000  
##  heart_disease     ever_married        work_type         Residence_type    
##  Min.   :0.00000   Length:5110        Length:5110        Length:5110       
##  1st Qu.:0.00000   Class :character   Class :character   Class :character  
##  Median :0.00000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.05401                                                           
##  3rd Qu.:0.00000                                                           
##  Max.   :1.00000                                                           
##  avg_glucose_level     bmi            smoking_status         stroke       
##  Min.   : 55.12    Length:5110        Length:5110        Min.   :0.00000  
##  1st Qu.: 77.25    Class :character   Class :character   1st Qu.:0.00000  
##  Median : 91.89    Mode  :character   Mode  :character   Median :0.00000  
##  Mean   :106.15                                          Mean   :0.04873  
##  3rd Qu.:114.09                                          3rd Qu.:0.00000  
##  Max.   :271.74                                          Max.   :1.00000
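The summary above reports only length, class, and mode for the character columns, so it cannot reveal how many values are missing in them. A rough sketch of a more direct check follows. It uses a made-up data frame `toy` (not the project data), and it assumes that the strings "N/A" and "Unknown" are the placeholders that stand in for missing values; both assumptions should be checked against the real file.

```r
# Toy data with one explicit NA and several placeholder strings (illustrative only)
toy <- data.frame(
  age            = c(67, 61, NA, 49),
  bmi            = c("36.6", "N/A", "32.5", "N/A"),
  smoking_status = c("smokes", "Unknown", "never smoked", "Unknown"),
  stringsAsFactors = FALSE
)

# Explicit NA values per column
na_counts <- colSums(is.na(toy))

# Placeholder strings that stand in for missing values (assumed: "N/A" and "Unknown")
placeholders <- c("N/A", "Unknown")
placeholder_counts <- sapply(toy, function(col) sum(col %in% placeholders))

print(na_counts)
print(placeholder_counts)
```

Counting both kinds of missingness per column gives a quick basis for deciding whether to impute, drop rows, or keep the placeholder as its own category.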
# Check the structure of the data
str(data)
## spc_tbl_ [5,110 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id               : num [1:5110] 9046 51676 31112 60182 1665 ...
##  $ gender           : chr [1:5110] "Male" "Female" "Male" "Female" ...
##  $ age              : num [1:5110] 67 61 80 49 79 81 74 69 59 78 ...
##  $ hypertension     : num [1:5110] 0 0 0 0 1 0 1 0 0 0 ...
##  $ heart_disease    : num [1:5110] 1 0 1 0 0 0 1 0 0 0 ...
##  $ ever_married     : chr [1:5110] "Yes" "Yes" "Yes" "Yes" ...
##  $ work_type        : chr [1:5110] "Private" "Self-employed" "Private" "Private" ...
##  $ Residence_type   : chr [1:5110] "Urban" "Rural" "Rural" "Urban" ...
##  $ avg_glucose_level: num [1:5110] 229 202 106 171 174 ...
##  $ bmi              : chr [1:5110] "36.6" "N/A" "32.5" "34.4" ...
##  $ smoking_status   : chr [1:5110] "formerly smoked" "never smoked" "never smoked" "smokes" ...
##  $ stroke           : num [1:5110] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_double(),
##   ..   gender = col_character(),
##   ..   age = col_double(),
##   ..   hypertension = col_double(),
##   ..   heart_disease = col_double(),
##   ..   ever_married = col_character(),
##   ..   work_type = col_character(),
##   ..   Residence_type = col_character(),
##   ..   avg_glucose_level = col_double(),
##   ..   bmi = col_character(),
##   ..   smoking_status = col_character(),
##   ..   stroke = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Explore how age relates to stroke (stroke is numeric here, so the fill aesthetic is dropped; see the warning below)
ggplot(data, aes(x = age, fill = stroke)) + 
  geom_histogram(position = "dodge") + 
  labs(title = "Distribution of Age and Stroke")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: The following aesthetics were dropped during statistical transformation: fill
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
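The warning above says that the `fill` aesthetic was dropped, which happens because `stroke` is a numeric column and ggplot2 cannot split the bars by a continuous variable. A possible fix, sketched here on simulated data (the data frame `toy`, the seed, and the bin width of 5 years are all assumptions made for illustration), is to wrap the variable in `factor()` and to set `binwidth` explicitly.

```r
library(ggplot2)

# Simulated stand-in for the real data (illustrative values only)
set.seed(1)
toy <- data.frame(
  age    = runif(200, min = 0, max = 82),
  stroke = rbinom(200, size = 1, prob = 0.1)
)

# factor(stroke) gives ggplot2 a discrete grouping, so each class gets its own bars
p <- ggplot(toy, aes(x = age, fill = factor(stroke))) +
  geom_histogram(binwidth = 5, position = "dodge") +
  labs(title = "Distribution of Age by Stroke Status",
       x = "Age", y = "Count", fill = "Stroke")

print(p)
```

Setting `binwidth` also silences the `stat_bin()` message about the default of 30 bins.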

Task Two: Build prediction models

# Logistic Regression Model
log_model <- glm(stroke ~ age + gender + hypertension + heart_disease + smoking_status, 
                 data = data, 
                 family = "binomial")
summary(log_model)
## 
## Call:
## glm(formula = stroke ~ age + gender + hypertension + heart_disease + 
##     smoking_status, family = "binomial", data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0120  -0.3253  -0.1748  -0.0806   3.7584  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -7.077407   0.377767 -18.735  < 2e-16 ***
## age                          0.070947   0.005177  13.705  < 2e-16 ***
## genderMale                   0.049169   0.140442   0.350  0.72626    
## genderOther                 -7.333273 324.743801  -0.023  0.98198    
## hypertension                 0.468493   0.161955   2.893  0.00382 ** 
## heart_disease                0.371695   0.188613   1.971  0.04876 *  
## smoking_statusnever smoked  -0.212151   0.174053  -1.219  0.22289    
## smoking_statussmokes         0.088088   0.213542   0.413  0.67997    
## smoking_statusUnknown       -0.078109   0.205146  -0.381  0.70339    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1990.4  on 5109  degrees of freedom
## Residual deviance: 1600.6  on 5101  degrees of freedom
## AIC: 1618.6
## 
## Number of Fisher Scoring iterations: 11
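The estimates in the summary above are on the log-odds scale, so they are easier to read after exponentiation, which turns each one into an odds ratio. The sketch below copies three estimates from the table by hand rather than refitting the model. Note also that the very large standard error for `genderOther` suggests that this category contains too few rows to estimate reliably, although that is an inference from the output rather than something the report states.

```r
# Estimates copied from the coefficient table above (log-odds scale)
coefs <- c(
  age           = 0.070947,
  hypertension  = 0.468493,
  heart_disease = 0.371695
)

# Exponentiating a log-odds coefficient gives the odds ratio per one-unit increase
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))

# Odds ratio associated with a 10-year difference in age
age_10yr <- exp(10 * coefs[["age"]])
print(round(age_10yr, 3))
```

With a fitted model object, `exp(coef(log_model))` would give the same quantities for every term at once.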
# Random Forest Model (stroke is numeric, so this fits a regression forest rather than a classifier)
install.packages("randomForest")
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:dplyr':
## 
##     combine
rf_model <- randomForest(stroke ~ age + gender + hypertension + heart_disease + smoking_status, 
                         data = data)
## Warning in randomForest.default(m, y, ...): The response has five or fewer
## unique values.  Are you sure you want to do regression?
print(rf_model)
## 
## Call:
##  randomForest(formula = stroke ~ age + gender + hypertension +      heart_disease + smoking_status, data = data) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.04334627
##                     % Var explained: 6.49
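The output above reports "Type of random forest: regression", and the warning asks whether regression was intended. `randomForest()` chooses between regression and classification from the type of the response, so a numeric 0/1 column produces a regression forest. A minimal sketch of the classification variant follows. It uses simulated data (the data frame `toy`, the seed, the coefficients in the simulation, and `ntree = 200` are assumptions for illustration), so its results say nothing about the project data.

```r
library(randomForest)

# Simulated stand-in for the real data (illustrative values only)
set.seed(42)
n <- 300
toy <- data.frame(
  age          = runif(n, min = 20, max = 80),
  hypertension = rbinom(n, size = 1, prob = 0.2)
)
stroke_prob <- plogis(-4 + 0.05 * toy$age + 0.5 * toy$hypertension)
toy$stroke  <- factor(rbinom(n, size = 1, prob = stroke_prob), levels = c(0, 1))

# A factor response makes randomForest() fit a classification forest
rf_clf <- randomForest(stroke ~ age + hypertension, data = toy, ntree = 200)
print(rf_clf$type)

# predict() now returns class labels, so the cross-tabulation is a 2 x 2 table
pred_class <- predict(rf_clf, newdata = toy)
print(table(Predicted = pred_class, Actual = toy$stroke))
```

Predictions on the training rows are optimistic; the out-of-bag confusion matrix stored in `rf_clf$confusion` or a held-out test set gives a fairer picture.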

Task Three: Evaluate and select prediction models

# Predict using Logistic Regression
log_pred <- predict(log_model, newdata = data, type = "response")
log_pred_class <- ifelse(log_pred > 0.5, 1, 0)

# Confusion Matrix for Logistic Regression
table(log_pred_class, data$stroke)
##               
## log_pred_class    0    1
##              0 4861  249
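The table above has a single row: at a 0.5 cut-off the logistic model predicts class 0 for every patient, so all 249 stroke cases are missed. The sketch below rebuilds that table from the printed counts (it does not rerun the model) and derives accuracy, sensitivity, and specificity. Fixing the factor levels on both vectors is the design choice that keeps the table 2 x 2 even when one class is never predicted.

```r
# Counts taken from the confusion matrix above: every patient is predicted as 0
actual    <- factor(c(rep(0, 4861), rep(1, 249)), levels = c(0, 1))
predicted <- factor(rep(0, 5110), levels = c(0, 1))

# Fixed levels keep the empty "predicted 1" row in the table
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # true positives / all actual strokes
specificity <- cm["0", "0"] / sum(cm[, "0"])  # true negatives / all actual non-strokes

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 4))
```

High accuracy together with zero sensitivity is typical for a rare outcome (the mean of `stroke` reported earlier is about 0.049), which is why a lower cut-off or a metric that accounts for class imbalance may be worth considering.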
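# Predict using Random Forest (predictions are continuous because the forest was fit as a regression)
rf_pred <- predict(rf_model, newdata = data)
rf_pred_class <- as.factor(rf_pred)

# Cross-tabulation for Random Forest: as.factor() on continuous predictions yields
# one row per distinct predicted value, not a 2 x 2 confusion matrix
table(rf_pred_class, data$stroke)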
##                     
## rf_pred_class          0   1
##   0.0205814118459159 101   0
##   0.0205897405073856  30   0
##   0.0205928743717508 269   0
##   0.0205971847165784  15   0
##   0.0206947196387357   4   0
##   0.0207908213689838   4   0
##   0.0208896795012945  38   0
##   0.0210261154866309  16   0
##   0.0211753937870088   3   0
##   0.0214844544849785 173   0
##   0.0215179720633247  52   0
##   0.0217046064983058  11   0
##   0.0217339088521013  16   0
##   0.0219520095928574  11   0
##   0.0220716483739329  64   0
##   0.0221791864465528  29   0
##   0.0222369343997945 155   0
##   0.0222615260467908 239   0
##   0.0223564321874499  44   0
##   0.0223658208815338   4   0
##   0.0224161879137874  18   0
##   0.0224655317553736   8   0
##   0.0224999752392876   7   0
##   0.0225315222376312   7   0
##   0.0225456433988756  13   0
##   0.0226663837354503  10   0
##   0.0226965064364372   8   0
##   0.0227708896495544  22   0
##   0.0228470575897751  13   1
##   0.0228654506779479   5   0
##   0.0229103417043489   8   0
##   0.0231125392917511  30   0
##   0.0231233435869198  13   0
##   0.0232673256677845   6   0
##   0.0233044372260228   4   0
##   0.0233324062332221   6   0
##   0.0233599609199247   6   0
##   0.0233865162655845   7   0
##   0.0234671022184801  14   0
##   0.0236065542732747   3   0
##   0.0237455922344623   2   0
##   0.0240760939088746   5   0
##   0.0241741636630337   4   1
##   0.0243066724270557 286   0
##   0.02431233886165    23   0
##   0.0244722064977393  94   0
##   0.0244736372743037   4   0
##   0.0245185600598067  11   0
##   0.0245244335563474   8   0
##   0.0245683587026783   9   0
##   0.0246177281784377   9   0
##   0.0246267177536161   4   0
##   0.0246505687208002  14   0
##   0.0246617988834076   4   0
##   0.0247494778219948   7   0
##   0.0247619005676721   9   0
##   0.0249193654523928  18   0
##   0.0249628348277335   4   1
##   0.0249681095397823  98   0
##   0.0249737759743765   2   0
##   0.0250206504157728  36   0
##   0.025100370147005    3   0
##   0.0251190150426469  58   0
##   0.0251342730861711   2   0
##   0.0252489492453018  28   0
##   0.0255315869107987   4   1
##   0.0255388087833734   7   0
##   0.0256098142633197   9   0
##   0.0256530353740866  14   1
##   0.0257754301001587  19   1
##   0.0258356840007374  19   0
##   0.0258635724371353  31   0
##   0.0260005914227887  11   0
##   0.0260701285675209   5   0
##   0.026146285552628    9   0
##   0.0262033476933919   5   0
##   0.0262129722744674   6   1
##   0.0262139746579449   9   1
##   0.0264549230719668  10   0
##   0.0265100779332514   8   0
##   0.0267109998289274  11   0
##   0.0268691585468454   9   0
##   0.0269391987192414  10   0
##   0.0269842742632042  14   0
##   0.0271335325389601  11   0
##   0.0273768162833133   9   0
##   0.0274309263156758   6   0
##   0.0276064341472714   1   0
##   0.027625363869487   24   1
##   0.0278646764018212   9   0
##   0.0281220100791988  22   0
##   0.0283369944400534   7   1
##   0.0284575306968751   5   0
##   0.0284914336360412   6   0
##   0.0286589479952608   7   0
##   0.0286807574559747   5   1
##   0.0288836831283522  12   0
##   0.0289238192022198  13   1
##   0.028936259944564    9   0
##   0.0293317949515263  12   0
##   0.0295215897126123   3   0
##   0.0296701994963137   1   0
##   0.0297059226357401   8   0
##   0.02992024817007     6   0
##   0.0300756754861963   8   1
##   0.0309992527830892  13   1
##   0.031200862817281   11   0
##   0.031804339326026   23   0
##   0.0319800280092763  12   1
##   0.0328786213082416  19   1
##   0.0329237245628946   6   0
##   0.0331328069597301  13   1
##   0.0331510445474095  16   1
##   0.0332342150184889   6   0
##   0.03325505440239     6   0
##   0.0332883509981768  10   1
##   0.0334059055418126  16   0
##   0.0335635371714425  13   1
##   0.0336392941972768   6   0
##   0.0336917341489888   5   1
##   0.0337398008284087   7   1
##   0.0339301620301872  17   0
##   0.0346123872408164   5   0
##   0.0347626414110968  37   0
##   0.0347866433843532  20   0
##   0.0351863542217918   7   0
##   0.0351971565541155   7   0
##   0.0351995546356503  63   0
##   0.0352973121054051   6   0
##   0.0353270186063983  25   0
##   0.0354112462486747   5   0
##   0.0357731536057545   6   0
##   0.0357810212174246   4   0
##   0.0358015691626301  10   0
##   0.0358631015790893   3   0
##   0.0363363630912703   4   0
##   0.0363765930730135  37   0
##   0.0364851031428697  10   0
##   0.0367562005511897   3   0
##   0.0370338961847144   7   2
##   0.0370825944417546   2   0
##   0.0371163827055502   8   0
##   0.0371714258234054   6   2
##   0.0372577102802713  14   0
##   0.0374480458664611  16   1
##   0.0376126347115207  24   1
##   0.0377569408438004  11   1
##   0.0378939727667189  20   0
##   0.0380346530293286   3   0
##   0.0383495740215055   3   0
##   0.0383877100702643   9   1
##   0.0384746348487561   7   0
##   0.0385004461442581   6   0
##   0.0386259037433543   6   0
##   0.0386963679667971   9   1
##   0.0387355526229512   4   0
##   0.0388576912398608   7   0
##   0.0388686039422764   8   0
##   0.0390401708490816   8   0
##   0.0390420558098852   2   0
##   0.0390662257451368   4   0
##   0.0391103589733686   3   1
##   0.0391734014907566   7   0
##   0.0391752972489697   7   0
##   0.039320428565706    5   0
##   0.0394569659091934  11   0
##   0.0396242027994595   7   2
##   0.0397944046786412   4   0
##   0.0399859534517698   4   0
##   0.0400609977748862   6   0
##   0.0401503611766167  14   0
##   0.0402364044229859   8   0
##   0.0405333399997097   7   1
##   0.0406254668444804   8   0
##   0.0407032852932606   8   0
##   0.040769549734647   11   1
##   0.0407982813209069   7   0
##   0.0408356161203283  12   0
##   0.0408451828648853   1   0
##   0.0409194562689563   8   0
##   0.0409714099340487   5   1
##   0.0410670595028487   6   0
##   0.0411396480931319  11   0
##   0.0411538892121746   9   0
##   0.0413067755892917  11   0
##   0.0415709044415465   9   1
##   0.0415747919945281   7   0
##   0.0416762450307245   5   0
##   0.0416842662990464  14   0
##   0.0419194638581247   3   1
##   0.0422890850186751   4   0
##   0.0424360774160354   8   0
##   0.0425762901444834  15   0
##   0.0430575524434885   8   0
##   0.0439415457494065   3   1
##   0.0440428953601926   9   0
##   0.0443216157094595   9   0
##   0.0443401753124923   1   0
##   0.0444879574612986   6   1
##   0.04473177334525     5   0
##   0.0452531876628748   2   1
##   0.0456689963838435   6   0
##   0.0456957394046022   3   0
##   0.0460202413904838   2   0
##   0.0461225264936044   4   0
##   0.0462041063060561   3   0
##   0.0467311609315142   7   2
##   0.0468065535932468   4   0
##   0.0470383456809085   4   0
##   0.0471498646834247   2   0
##   0.0477149670365726  10   0
##   0.0481500855467372   4   0
##   0.0482227561282881   3   0
##   0.04863743943815     5   0
##   0.0487892797929172   5   0
##   0.048947479899874    8   0
##   0.0491306357904341   3   0
##   0.0493697129712935   4   0
##   0.0495067504165913   4   0
##   0.0495348838020191   2   0
##   0.049887078755376    3   0
##   0.0501294239988792   2   0
##   0.0501826199070835   3   1
##   0.0503091010239179   3   0
##   0.0504445986964305   6   0
##   0.0508662171822463   4   0
##   0.0509665963736664  14   0
##   0.0510034061896173   8   0
##   0.0510909695479109   1   0
##   0.0512201939345257   1   0
##   0.0512840164822149   5   0
##   0.0514642383279523  10   0
##   0.0522022520744287   4   0
##   0.0529876872936794   5   0
##   0.0531987732764727   2   0
##   0.0536748802611936   2   0
##   0.0536987906199227   6   1
##   0.0538496740263903   3   0
##   0.0542496740263903   1   0
##   0.0542843587830824   2   1
##   0.0545110332615644   4   0
##   0.0546307890998498   3   1
##   0.054742814955692    6   2
##   0.0557125828002822   2   0
##   0.0557717478528776   6   0
##   0.0558294739723242   7   2
##   0.0559208600956076   3   0
##   0.0567367358831652   7   2
##   0.0569660015041068   1   0
##   0.057105250647004    2   0
##   0.0574740909354822   8   2
##   0.0577121585200137   1   0
##   0.0577260504286419   1   0
##   0.0580261033425725   1   0
##   0.0581306178419049   1   0
##   0.0581956933448791   3   0
##   0.0582735341407152   1   0
##   0.0584575952981177   5   3
##   0.0593481153966779   1   0
##   0.0603750103831911   2   0
##   0.0606084351471009   4   0
##   0.0608175323110555   7   1
##   0.0608956633353175   2   0
##   0.0612494974555879   1   0
##   0.0615593870146011   1   0
##   0.0619585689240699   1   0
##   0.0621930788317481   1   0
##   0.0623849493769224   1   0
##   0.0624189338603841   2   0
##   0.062562214718244    2   0
##   0.0628118957372169   1   0
##   0.0631342850354109   4   0
##   0.0634160729016437   1   0
##   0.0635213360595384   1   0
##   0.0637152423787034   1   0
##   0.0637957897553034   2   0
##   0.0640739763050673   2   0
##   0.0644514127291405   1   0
##   0.0644628314291064   1   0
##   0.0644669598578274   1   0
##   0.0647269716251439   3   0
##   0.0648285968767468   1   0
##   0.0648691328138787   3   0
##   0.0649963126077502   3   0
##   0.0650362961306467   6   0
##   0.0650927892846429  14   0
##   0.0654085787583271   4   0
##   0.0655443319075443   2   0
##   0.0657301060598322   1   0
##   0.0660277698783027   4   0
##   0.0661492037922898   1   0
##   0.0661549496721742   2   0
##   0.0661717290602579   5   0
##   0.0663932631786064   2   0
##   0.0664187382334025   1   0
##   0.0664485623947605   6   0
##   0.0666042271220297   1   0
##   0.0669097657463486   1   0
##   0.0669399148775097   1   0
##   0.0671759176272968   1   0
##   0.0674072360758514   2   0
##   0.0676797586625006   1   0
##   0.0679588687726772   5   0
##   0.0681248619229879   1   0
##   0.0683281862448099   2   0
##   0.068431484128182    1   0
##   0.068516132743349    1   0
##   0.0685585541896296   3   0
##   0.068632616942256    2   0
##   0.06910965838242     1   0
##   0.0691129502863979   3   0
##   0.0691913949325077   1   0
##   0.0692870975141733   1   0
##   0.0693305669481426   3   0
##   0.0696469784861695   1   0
##   0.0696529348996328   3   0
##   0.0696822134871933   2   0
##   0.0702794237316988   9   0
##   0.0702815794487527   2   0
##   0.070386144651902    1   0
##   0.070406467772046    2   0
##   0.0704571858226926   1   0
##   0.0709973407755614  11   0
##   0.0711060431618822   1   0
##   0.0712993684009256   1   0
##   0.0713129391738157  10   1
##   0.0713610931763978   1   0
##   0.0714078168691668   7   2
##   0.0718970161477136   2   1
##   0.0719627405518515   1   0
##   0.0721693812298733   3   0
##   0.0722552935565215   1   1
##   0.0722869298241849   3   0
##   0.0723004839503187   1   0
##   0.0724425433196701   1   1
##   0.0724963854867787   1   0
##   0.0725047444263018   6   4
##   0.0725596582297049   1   0
##   0.0730661522655888   1   0
##   0.0730882386734622   3   0
##   0.0731034602742427   4   1
##   0.0731476833984493   1   0
##   0.0731824218747989   2   0
##   0.0732317793433186   2   0
##   0.0732940227409156   3   0
##   0.0734488360436267   2   0
##   0.0737134226030593   6   0
##   0.0737538604035552   1   0
##   0.0738004413206067   1   0
##   0.0738174927831885   2   0
##   0.0739859238153916   1   0
##   0.0741582049875389   1   0
##   0.0742251046583834   4   0
##   0.0742384335961495   1   0
##   0.0742681171670203   1   0
##   0.0744660704842195   5   0
##   0.074521575603923    2   0
##   0.0745799697645069   1   0
##   0.0746497679490236   1   0
##   0.0746616025744322   6   0
##   0.0746648209704402   1   0
##   0.0748806622579666   1   0
##   0.0749712651333731   1   0
##   0.0750040167539148   2   0
##   0.0752145133286899   1   0
##   0.0752595813559008   6   0
##   0.0753264895012489   5   0
##   0.0755052719706455   1   0
##   0.0759912648603271   3   1
##   0.076039252845846    1   0
##   0.0760622531184313   1   0
##   0.0762796263979148   2   0
##   0.0763303874827774   8   2
##   0.0763819158389025   2   0
##   0.0764994671832089   1   0
##   0.0766972766504916   2   0
##   0.0771400158583021   2   2
##   0.0778065693960347   3   0
##   0.0778630731978336   3   0
##   0.0784080445222391   6   0
##   0.0786060819719882   1   0
##   0.0786657507800513   2   0
##   0.0788972642373996   1   0
##   0.0790082261325722   1   0
##   0.0793949967803872   1   1
##   0.0795519518115435   1   0
##   0.0796622670864057   5   1
##   0.0800844737986074   0   1
##   0.0802837556143813   3   1
##   0.0803478681422182   2   1
##   0.0803830577928851   3   0
##   0.0804048494233086   1   0
##   0.0807669452306843   4   1
##   0.0810641220757527   3   0
##   0.0811929778287917   3   0
##   0.0813220463422926   4   1
##   0.0821349034900405   3   0
##   0.0825590581648096   3   1
##   0.0826235418098579   2   0
##   0.0826373050895029   1   1
##   0.0827590557850883   1   0
##   0.0827752779473005   1   0
##   0.0828510149982857   1   0
##   0.0828608149894255   1   0
##   0.082950755147207    1   0
##   0.0832599713885819   2   0
##   0.0834220118673609   1   0
##   0.083474441880752    2   0
##   0.0837555475076951   2   0
##   0.0838020859840222   1   0
##   0.0838163519975045   1   0
##   0.0841909411080613   1   0
##   0.0842490668932847   1   1
##   0.0844063072195298   4   0
##   0.0845732550360218   2   0
##   0.0846178398253037   5   1
##   0.0846238993699472   5   0
##   0.0847126313101076   5   0
##   0.0848443862610594   1   0
##   0.0850519605093701   1   0
##   0.0852467270930752   1   0
##   0.0852522870881709   1   0
##   0.08535896097695     2   0
##   0.0856477254121992   5   1
##   0.08590308857801     2   0
##   0.0860071330028529   3   0
##   0.0860508658067205   1   0
##   0.0862852289305802   2   0
##   0.0862964789792504   1   1
##   0.0865321157985356   5   0
##   0.0865540799505808   2   0
##   0.0866323410606871   1   0
##   0.0866538403950929   2   0
##   0.0867818241376375   0   1
##   0.0868782755143954   1   0
##   0.0869489466976262   1   0
##   0.0874513385241278   3   0
##   0.0875649868414544   1   0
##   0.0876818906232222   5   1
##   0.0877661372037497   8   0
##   0.0879161975760443   1   0
##   0.0879585051726335   6   1
##   0.0882340557811362   1   0
##   0.0886111767545211   1   0
##   0.0888534434431579   2   0
##   0.0889758299206924   1   0
##   0.0890145551404628   4   0
##   0.0893949969166764   1   0
##   0.0897441212009338   6   1
##   0.0902160261838341   2   0
##   0.0902453470778146   3   0
##   0.090571406966282    1   0
##   0.0910399527362339   2   0
##   0.091442719387374    2   0
##   0.0916903149518897   1   0
##   0.0919519833870142   9   0
##   0.0922663671658661   1   0
##   0.0924131667963206   1   0
##   0.0930211794777631   0   1
##   0.0930573279568141   1   0
##   0.0933101325092873   2   1
##   0.0933404380956411  11   1
##   0.0935967735905208   0   1
##   0.0936162586897922   3   0
##   0.0939466126988602   2   0
##   0.0948578402308707   5   0
##   0.0959833091739709   1   0
##   0.096762479767779    1   1
##   0.096807444746975    6   0
##   0.0968358604238717   1   0
##   0.0968982600469699   1   0
##   0.0971381672955641   4   0
##   0.097479485822332    1   0
##   0.0978989666722018   4   0
##   0.097987624709211    0   1
##   0.0980908367048937   0   1
##   0.0981222222009676   5   1
##   0.0981773546213098   1   0
##   0.0987865390465185   2   0
##   0.0992696764285114   2   0
##   0.0994789523357603   4   0
##   0.0995882538896461   3   0
##   0.0999604772337578   1   0
##   0.100615423598884    1   0
##   0.10185206223229     2   0
##   0.101966130649791    2   1
##   0.102803945783586    4   1
##   0.102821456934637    5   0
##   0.102892906409374    3   1
##   0.103358996393184   10   1
##   0.1034262094223      2   0
##   0.10354190694255     0   1
##   0.10368391132143     2   0
##   0.103985200223488   11   2
##   0.104306069190366    5   2
##   0.105524784854155    4   0
##   0.105951678897387    4   0
##   0.105997411980727    2   0
##   0.106034486869555    0   1
##   0.106068096263302    1   0
##   0.106128401259137    5   0
##   0.106380670608482    2   2
##   0.106448525000482    4   0
##   0.106676828699761    3   1
##   0.107461563200269    1   0
##   0.107765266636109    1   0
##   0.10782614117063     4   2
##   0.107993252778619    2   0
##   0.108057049849981    2   0
##   0.108168185370139    1   0
##   0.108310067401828    1   0
##   0.108407292163152    8   1
##   0.108689773699967    1   0
##   0.108876420287338    8   1
##   0.109413769210498    6   2
##   0.109495700067971    1   0
##   0.109516800185708    1   0
##   0.109717267947417    1   0
##   0.109895235875835    1   0
##   0.110041194554917    2   1
##   0.11037359933417     2   1
##   0.110630314881521    3   0
##   0.111218110368392    3   0
##   0.111487051677749    1   1
##   0.111998585133632    1   0
##   0.112128242373095    2   0
##   0.112444202253736    1   0
##   0.112507888111514    1   1
##   0.112701445399503   14   3
##   0.112703741953044    4   3
##   0.112704635172852    1   0
##   0.113312667603137    1   0
##   0.113314059934193    1   0
##   0.113432627710428    2   0
##   0.113504508081532   13   3
##   0.113765287224121    2   1
##   0.113828908027592    4   2
##   0.114011016285202    1   0
##   0.11401142592151     1   0
##   0.114258537407028    5   3
##   0.114809179558829    1   0
##   0.114845335245677    2   0
##   0.11504346005049     1   0
##   0.115049308132235    4   0
##   0.115341727234747   11   2
##   0.116104220169337    1   0
##   0.116140145567487    7   1
##   0.116265614722445    3   1
##   0.11626835789138     1   0
##   0.116447728071271    2   0
##   0.116651832783201    1   0
##   0.11774504681678     1   0
##   0.1180077675249      1   0
##   0.118080863150403    0   1
##   0.118188630937915    1   0
##   0.118246330439635    5   2
##   0.11835439743159     1   0
##   0.118422267698218    2   1
##   0.118871251570194    2   1
##   0.119030450486681    1   1
##   0.119118311379526    1   0
##   0.1191534565288      1   0
##   0.119389587868947    8   4
##   0.11945299302677     1   0
##   0.120237556142623    1   0
##   0.120264612188352    2   1
##   0.120602673567245    1   0
##   0.120724224094406    1   0
##   0.120737309727524    1   0
##   0.120998373831515    1   0
##   0.121283370368841    1   0
##   0.121360560831758    3   0
##   0.121631797102317    3   0
##   0.122201270718668    1   0
##   0.122344733598933    1   0
##   0.122517768725614    0   1
##   0.122714589296575    2   0
##   0.12318186372024     2   0
##   0.123260957470056    3   0
##   0.123601786102871    3   1
##   0.123842744788525    1   0
##   0.123874068458633    1   0
##   0.124273011475117    2   1
##   0.124465460327952    0   1
##   0.124858008430374    1   0
##   0.125119683211683    1   0
##   0.125216283763934    2   1
##   0.125484263866764    5   2
##   0.125559577183586    1   0
##   0.125577642974955    1   0
##   0.125786768008674    1   0
##   0.126053098713073    1   0
##   0.126205873548163    2   0
##   0.126234349446912    9   3
##   0.126281611375807    2   0
##   0.126329414652659    2   0
##   0.126439615523229    3   1
##   0.126514179477325    3   0
##   0.126718793591659    1   0
##   0.12694722332952     1   0
##   0.127481961472294    1   0
##   0.127518202652161    6   0
##   0.1276256236785      1   0
##   0.127971966724482    4   0
##   0.128167744886127    1   0
##   0.128613460486301    3   0
##   0.128913555868772    4   0
##   0.129155321330759    1   0
##   0.129199808213653    1   0
##   0.12958224487094     3   0
##   0.129942457979577    2   0
##   0.129970287642716    1   0
##   0.130219699854291    5   1
##   0.13030824581407     4   1
##   0.130402289734361    3   1
##   0.130664188461735    1   0
##   0.130997275174191    2   0
##   0.131185320337179    1   0
##   0.13146701587247     1   0
##   0.131529268860263    2   0
##   0.131722131161801    1   0
##   0.131904400060654    2   0
##   0.134193001325316    2   0
##   0.134394966345745    1   0
##   0.134428812940268    0   1
##   0.134677772248408    1   0
##   0.134894942838599    2   0
##   0.134912306466557    1   0
##   0.135498994914924    1   0
##   0.135627090589512    1   0
##   0.136939702995192    3   3
##   0.137121605855763    1   1
##   0.137263235119684    1   0
##   0.137386780867027    1   0
##   0.137501268584777    1   0
##   0.13827932129267     1   0
##   0.138914460070938    3   1
##   0.138944491610991    0   2
##   0.139243343901622    0   1
##   0.139501587879721    1   0
##   0.139519460848187    2   0
##   0.139777640365704    1   0
##   0.140256782636234    2   0
##   0.140466841315217    2   0
##   0.140562871711683    3   0
##   0.140592341831719    0   1
##   0.140608566013821    3   1
##   0.140829609223256    0   1
##   0.141020816516916    4   0
##   0.141064480838141    1   0
##   0.141334502085235    1   0
##   0.141367589106031    0   1
##   0.141870319334466    1   0
##   0.141904178741524    2   0
##   0.142919726685339    1   0
##   0.143660178749285    2   0
##   0.144278024777501    1   0
##   0.144487708159931    1   0
##   0.144555512630733    1   0
##   0.145156444419097    1   0
##   0.145379005487671    3   0
##   0.145803897234169    0   1
##   0.145918157197922    3   0
##   0.145937867808582    1   0
##   0.146660052081396    1   1
##   0.147417572567866    1   0
##   0.147973818460768    2   0
##   0.14805301457508     2   1
##   0.148173668109372    1   2
##   0.148266385643619    1   0
##   0.148342157419959    1   0
##   0.14852882112883     2   1
##   0.148552861095915    1   0
##   0.148571541053257    3   0
##   0.148751202637181    1   1
##   0.149015797721836    1   0
##   0.149179891852525    0   1
##   0.149379256894361    1   0
##   0.150223157869111    2   0
##   0.150363924083178    0   1
##   0.151226984354987    1   0
##   0.151441020136289    1   0
##   0.151855042238753    0   1
##   0.152208523922342    3   1
##   0.152216092798746    1   1
##   0.154020274086043    1   0
##   0.154751451580948    3   1
##   0.154792305169296    1   0
##   0.154979011494625    1   0
##   0.1552932956046      4   1
##   0.156043867501233    1   0
##   0.156516134477134    1   0
##   0.156925866489848    1   0
##   0.157038167031401    1   0
##   0.157832311874327    1   1
##   0.158083025749169    3   0
##   0.158293952516626    2   0
##   0.158323034979596    2   0
##   0.15930884919523     1   1
##   0.160109220716962    1   1
##   0.160742997396161    1   0
##   0.161453177621817    2   0
##   0.161504293674519    0   1
##   0.161917659939313    1   0
##   0.162427802181794    5   2
##   0.162567447786736    1   0
##   0.162866032061394    4   0
##   0.163552741832885    3   1
##   0.163786010123717    0   1
##   0.163872821265305    0   1
##   0.163920449274815    1   1
##   0.164313130563032    2   0
##   0.165301572115015    3   1
##   0.166924639446119    4   1
##   0.166962031892079    1   0
##   0.167146095237964    1   1
##   0.167722065509279    1   0
##   0.168571900855287    1   0
##   0.168633804909267    0   1
##   0.169035946559644    1   0
##   0.169592445063799    1   0
##   0.170136421351403    2   0
##   0.170239398204269    2   0
##   0.170449768642326    1   0
##   0.170509464529203    1   1
##   0.172055049614016    0   1
##   0.172275897292654    1   0
##   0.17262702672306     6   2
##   0.173277635516595    0   1
##   0.174042808360751    1   1
##   0.174167289871859    1   1
##   0.174670224508124    0   1
##   0.175216046155091    2   1
##   0.17540891864544     0   1
##   0.175814783909893    1   0
##   0.176767836503757    5   3
##   0.177350010475546    0   1
##   0.179243181108622    0   1
##   0.179283454334779    0   1
##   0.179716718238767    1   0
##   0.179947471286936    1   0
##   0.180537977981941    1   0
##   0.182517024304881    1   1
##   0.183420392289537    2   0
##   0.183932626020439    1   1
##   0.184149394747507    1   0
##   0.184768339859242    2   3
##   0.185230986364188    2   0
##   0.185350556801895    1   0
##   0.185852358637993    2   0
##   0.186170214897696    1   0
##   0.186561929424862    1   1
##   0.186980968947163    1   1
##   0.18738123122984     1   0
##   0.189197488117086    1   0
##   0.190899032080881    1   0
##   0.192268559857588    1   0
##   0.194285534918955    1   0
##   0.194995571879594    1   1
##   0.197604924432661    1   1
##   0.199037855708288    1   1
##   0.199323042247625    0   3
##   0.203662212479786    1   1
##   0.203715870070946    0   1
##   0.204324530682032    0   1
##   0.204531153943813    1   0
##   0.205648721870398    0   1
##   0.205966310867188    1   0
##   0.206229526045779    1   0
##   0.20801132046003     0   1
##   0.208528130573514    0   1
##   0.208863934916621    2   0
##   0.224364294481757    0   1
##   0.230650539978544    0   1
##   0.246826995302205    0   2
##   0.250562948195027    0   1
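# Evaluate both models' accuracy against the labels of the full dataset.
# Note: a result of exactly 0 means no predicted class matched any label, which usually signals
# mismatched label codings (for example factor labels vs. 0/1) rather than a genuinely useless model.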
log_accuracy <- mean(log_pred_class == data$stroke)
rf_accuracy <- mean(rf_pred_class == data$stroke)

log_accuracy
## [1] 0.951272
rf_accuracy
## [1] 0
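
Accuracy on its own can mislead when one class dominates. The sketch below is an illustration only, not part of the original analysis: it rebuilds the labels from the class counts reported for this dataset (4861 non-stroke, 249 stroke) and scores a hypothetical baseline that predicts "no stroke" for every patient. The names `actual`, `baseline_pred`, and `conf_mat` are assumptions introduced for the example.

```r
# Labels rebuilt from the class counts reported for this dataset:
# 4861 patients without a stroke (0) and 249 with a stroke (1).
actual <- c(rep(0, 4861), rep(1, 249))

# Hypothetical baseline: predict "no stroke" for every patient.
baseline_pred <- rep(0, length(actual))

# Plain accuracy of the baseline.
baseline_accuracy <- mean(baseline_pred == actual)

# Confusion matrix; fixing the factor levels keeps the empty "predicted 1" row visible.
conf_mat <- table(
  predicted = factor(baseline_pred, levels = c(0, 1)),
  actual    = factor(actual, levels = c(0, 1))
)

# Sensitivity: share of true stroke cases that were predicted as stroke.
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])

round(baseline_accuracy, 6)  # 0.951272
sensitivity                  # 0
```

The baseline reaches the same 0.951272 as `log_accuracy` while detecting no stroke case at all, so a confusion matrix or sensitivity is worth reporting next to accuracy.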

Task Four: Deploy the prediction model

library(shiny)

ui <- fluidPage(
  titlePanel("Stroke Prediction"),
  sidebarLayout(
    sidebarPanel(
      numericInput("age", "Age", value = 25, min = 18, max = 100),
      selectInput("gender", "Gender", choices = c("Male", "Female", "Other")),
      selectInput("hypertension", "Hypertension", choices = c(0, 1)),
      selectInput("heart_disease", "Heart Disease", choices = c(0, 1)),
      numericInput("avg_glucose_level", "Average Glucose Level", value = 100),
      selectInput("smoking_status", "Smoking Status", choices = c("never smoked", "formerly smoked", "smokes", "Unknown"))
    ),
    mainPanel(
      textOutput("prediction")
    )
  )
)

server <- function(input, output) {
  output$prediction <- renderText({
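    # log_model must already exist in the environment (fit it earlier in the session) before the app runs.
    # req() pauses this output while a numeric field is empty, instead of raising an error.
    req(input$age, input$avg_glucose_level)
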
    pred_data <- data.frame(
      age = input$age,
      gender = input$gender,
      hypertension = as.numeric(input$hypertension),
      heart_disease = as.numeric(input$heart_disease),
      avg_glucose_level = input$avg_glucose_level,
      smoking_status = input$smoking_status
    )

    pred <- predict(log_model, newdata = pred_data, type = "response")
    if (pred > 0.5) {
      return("Likely to have a stroke")
    } else {
      return("Unlikely to have a stroke")
    }
  })
}

shinyApp(ui = ui, server = server)
Shiny applications not supported in static R Markdown documents
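
The app above relies on `log_model` already being in memory. One way to make that dependency explicit is to save the fitted model to disk and reload it when the app starts. The sketch below is a minimal illustration under stated assumptions: the training data is synthetic, and `demo_model`, `model_path`, and `new_patient` are hypothetical names that do not appear in the original analysis. It also shows a one-row input whose factor levels match the training data, which `predict()` needs.

```r
# Synthetic stand-in data; column names mirror the app inputs, values are made up.
set.seed(42)
n <- 200
smoking_levels <- c("never smoked", "formerly smoked", "smokes", "Unknown")
train <- data.frame(
  age = runif(n, 18, 90),
  hypertension = rbinom(n, 1, 0.1),
  avg_glucose_level = runif(n, 60, 250),
  smoking_status = factor(sample(smoking_levels, n, replace = TRUE),
                          levels = smoking_levels)
)
# Made-up outcome whose probability rises with age.
train$stroke <- rbinom(n, 1, plogis(-6 + 0.06 * train$age))

# Fit a logistic regression on the synthetic data.
demo_model <- glm(stroke ~ age + hypertension + avg_glucose_level + smoking_status,
                  data = train, family = binomial)

# Persist the fitted model, then reload it the way an app script could at startup.
model_path <- tempfile(fileext = ".rds")
saveRDS(demo_model, model_path)
loaded_model <- readRDS(model_path)

# One-row input using the same factor levels as the training data.
new_patient <- data.frame(
  age = 67,
  hypertension = 1,
  avg_glucose_level = 180,
  smoking_status = factor("smokes", levels = smoking_levels)
)

# Predicted probability of the positive class for this patient.
risk <- predict(loaded_model, newdata = new_patient, type = "response")
risk
```

In a deployed app, the `readRDS()` call would sit at the top of the app script, outside `server`, so the model is loaded once rather than on every input change.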
# Reload the raw CSV with base R's read.csv (no additional package is needed for this step)
stroke_data <- read.csv("~/Build-deploy-stroke-prediction-model-R/healthcare-dataset-stroke-data.csv")

# Check the class balance of the outcome variable
table(stroke_data$stroke)
## 
##    0    1 
## 4861  249

Task Five: Findings and Conclusions

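The dataset indicated that variables such as age, hypertension, heart disease, and smoking status are associated with stroke likelihood. In the accuracy check above, the logistic regression model scored 0.951 and the Random Forest scored 0. An accuracy of exactly 0 means that not a single Random Forest prediction matched a label, which points to a problem in how the predicted classes were compared with the labels, so these figures do not support ranking the Random Forest as the best model. The logistic regression accuracy equals the share of patients without a stroke (4861 of 5110), which is the score a model predicting "no stroke" for every patient would obtain, so accuracy alone does not show that strokes are being detected.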

To make this predictive model usable in real-world applications, it can be deployed in a health dashboard or integrated into a clinical decision support system to help healthcare providers assess stroke risk early. Further work may involve balancing the class distribution (only 249 of the 5110 patients had a stroke), increasing the dataset size, and extending the Shiny app from Task Four.
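
As one possible starting point for balancing the classes, the sketch below downsamples the majority class in base R. It is an illustration on synthetic data with roughly the imbalance seen here (about 5% positives). The data frame `toy` and the other names are hypothetical, and downsampling is only one of several options, not the method used in this report.

```r
set.seed(1)

# Synthetic data: 950 negatives and 50 positives, with made-up ages.
toy <- data.frame(
  age = c(rnorm(950, mean = 40, sd = 15), rnorm(50, mean = 68, sd = 10)),
  stroke = c(rep(0, 950), rep(1, 50))
)

# Split the rows by class.
minority <- toy[toy$stroke == 1, ]
majority <- toy[toy$stroke == 0, ]

# Randomly keep as many majority rows as there are minority rows.
keep <- sample(nrow(majority), size = nrow(minority))
balanced <- rbind(minority, majority[keep, ])

# Both classes now have 50 rows each.
table(balanced$stroke)
```

Downsampling would normally be applied to the training split only, so that the evaluation data keeps the class balance the model will face in practice.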