class: center, middle, inverse, title-slide

.title[
# STA 321 Logistic Regression Project: Predicting a Patient’s Odds of CHD
]
.subtitle[
##
]
.author[
### Josie Gallop
]
.date[
### 2025-02-10
]

---
class:inverse4, top

<h1 align="center"> Agenda </h1>
<BR>

- Binary Predictive Modeling
- Logistic Regression
- Introduction
- Variables
- Practical Questions
- Exploratory Data Analysis
- Three Candidate Models
  - Full Model
  - Reduced Model
  - Forward Selection Model
- Model Selection Process
  - Cross Validation
  - ROC Analysis
- Conclusion and Recommendations

---

<h1 align="center"> Introduction </h1>
<BR>

- Data found on kaggle.com (Dileep, 2019).
- Ongoing cardiovascular study in Framingham, Massachusetts.
- 4,238 observations of 16 variables.
- Various personal and medical risk factors.
- Create a logistic regression model that predicts the odds of a patient being at risk of developing CHD within a 10-year period.
- Three candidate models:
  - Full model
  - Reduced model
  - Forward selection model

---

<h1 align="center"> Variables </h1>
<BR>

.pull-left[
- gender
- age
- education
- currentSmoker
- cigsPerDay
- BPMeds
- prevalentStroke
- prevalentHyp
- diabetes
]

.pull-right[
- totChol
- sysBP
- diaBP
- BMI
- heartRate
- glucose
- TenYearCHD (binary response variable)
  - 0 = "no" and 1 = "yes"
]

---
class: inverse3 center middle

# First Few Entries of the Data Set
---
class: inverse1 center middle

## **sysBP** Distribution

<img src="PracticePresentation_files/figure-html/unnamed-chunk-2-1.png" width="600px" style="display: block; margin: auto;" />

---
class: inverse1 center middle

## **diaBP** Distribution

<img src="PracticePresentation_files/figure-html/unnamed-chunk-3-1.png" width="600px" style="display: block; margin: auto;" />

---
class: inverse1 center middle

# Complete Table of the Data Set

```
# A tibble: 4,238 × 16
    male   age education currentSmoker cigsPerDay BPMeds prevalentStroke
   <int> <int>     <int>         <int>      <int>  <int>           <int>
 1     1    39         4             0          0      0               0
 2     0    46         2             0          0      0               0
 3     1    48         1             1         20      0               0
 4     0    61         3             1         30      0               0
 5     0    46         3             1         23      0               0
 6     0    43         2             0          0      0               0
 7     0    63         1             0          0      0               0
 8     0    45         2             1         20      0               0
 9     1    52         1             0          0      0               0
10     1    43         1             1         30      0               0
# ℹ 4,228 more rows
# ℹ 9 more variables: prevalentHyp <int>, diabetes <int>, totChol <int>,
#   sysBP <dbl>, diaBP <dbl>, BMI <dbl>, heartRate <int>, glucose <int>,
#   TenYearCHD <int>
```