HOMEWORK

  1. Load your chosen dataset into Rmarkdown
  2. Select the dependent variable you are interested in, along with the independent variables that you believe influence the dependent variable
  3. Create a linear model using the “lm()” command and save it to an object
  4. Call “summary()” on your new model
  5. Interpret the model’s R-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables?
  6. Choose some significant independent variables and interpret their Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable?
  7. Does the model you created meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

#Load data (skip adjusted because the sheet's header rows are offset)
raw_data <- read_excel("2025-County-Health-Rankings-Texas-Data-v3.xlsx", sheet = "Select Measure Data", skip = 2)
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • (rename messages for the remaining unnamed columns, `...9` through `...225`, omitted)

Question 2 - Dependent and Independent Variables

#Focus: older adults (65+) and disparities in preventive care across Texas counties.
#Independent variables: "Vacc" (% of Medicare enrollees with an annual flu shot), "Hosp" (preventable hospital stays per 100,000 Medicare enrollees), and "Mammo" (% of female Medicare enrollees aged 65-74 with an annual mammogram screening).
#Dependent variable: "Health" ("Poor or Fair Health %", the percentage of adults who self-report their general health as "fair" or "poor"). It pairs well with the Medicare-focused independent variables because better preventive care should lower the share of poor health reports among older adults.

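#Keep the county name plus the four measures, selected by column position
older_data <- raw_data[, c(3, 72, 76, 99, 106)]

#Assign short, readable column names
colnames(older_data) <- c("County", "Health", "Vacc", "Hosp", "Mammo")

#Convert the measure columns to numeric (non-numeric entries become NA)
older_data[, 2:5] <- lapply(older_data[, 2:5], as.numeric)

#Drop counties with missing values in any selected column
older_data <- na.omit(older_data)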
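
A quick check after a cleaning step like this can show how many rows the numeric conversion and `na.omit()` discard. The sketch below uses a small made-up data frame (`toy`, with invented values) rather than the County Health Rankings file, so the counts are illustrative only.

```r
# Toy data frame with invented values; NOT the County Health Rankings data.
toy <- data.frame(
  County = c("A", "B", "C", "D"),
  Health = c("21.5", "18.2", NA, "25.1"),
  Mammo  = c("38", "n/a", "41", "35"),
  stringsAsFactors = FALSE
)

# Convert the measure columns to numeric; non-numeric text becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
toy[, 2:3] <- suppressWarnings(lapply(toy[, 2:3], as.numeric))

# Count missing values per column before dropping incomplete rows.
na_counts <- colSums(is.na(toy))

rows_before <- nrow(toy)
toy_clean   <- na.omit(toy)
rows_after  <- nrow(toy_clean)

print(na_counts)
cat("Rows dropped by na.omit():", rows_before - rows_after, "\n")
```

Running the same counts on `older_data` before and after `na.omit()` would show how many counties the model actually uses.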

Questions 3 and 4 - Create a Model and Summary

price_model <- lm(Health ~ Vacc + Hosp + Mammo, data = older_data)

summary(price_model)
## 
## Call:
## lm(formula = Health ~ Vacc + Hosp + Mammo, data = older_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1377  -2.7795  -0.4865   2.2723  12.6885 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.2527826  1.5533696  23.338  < 2e-16 ***
## Vacc        -0.0062198  0.0383481  -0.162    0.871    
## Hosp         0.0001748  0.0002747   0.636    0.525    
## Mammo       -0.3746138  0.0458377  -8.173 1.87e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.016 on 235 degrees of freedom
## Multiple R-squared:  0.3532, Adjusted R-squared:  0.345 
## F-statistic: 42.78 on 3 and 235 DF,  p-value: < 2.2e-16

Question 5 - Interpret the Model’s R-squared and P-values

#The model’s R-squared is 0.353 (adjusted R-squared 0.345). Flu vaccinations, preventable hospital stays, and mammogram screenings together account for about 35.3% of the variation in % fair or poor health across Texas counties. The overall model is statistically significant (F = 42.78 on 3 and 235 DF, p < 2.2e-16), but its explanatory power is modest: 64.7% of the variation is left unexplained by these three predictors.
#The individual p-values are as follows:
#Vacc: p = 0.871 (> 0.05, insignificant)
#Hosp: p = 0.525 (> 0.05, insignificant)
#Mammo: p = 1.87e-14 (< 0.001, highly significant)
#Mammo is the only statistically significant predictor of the three. Holding the other two variables constant, each 1 percentage point increase in mammogram screening is associated with a 0.375 point decrease in fair/poor health %. Flu vaccinations and preventable hospital stays show no reliable effect.
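
As a cross-check on the summary statistics, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared and degrees of freedom. This is a minimal sketch that hard-codes values copied from the `summary()` output; the variable names (`r2`, `df_resid`, `p`, `n`) are chosen here for illustration, and small differences can arise because the reported R-squared is rounded.

```r
# Values copied from the summary() output above.
r2       <- 0.3532   # Multiple R-squared
df_resid <- 235      # residual degrees of freedom
p        <- 3        # number of predictors (Vacc, Hosp, Mammo)
n        <- df_resid + p + 1   # observations used in the fit

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

round(adj_r2, 3)   # agrees with the reported Adjusted R-squared (0.345)
round(f_stat, 2)   # close to the reported F-statistic (42.78)
```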

Question 6 - Significant Independent Variable

#Significant independent variable: Mammogram Screening (p = 1.87e-14)
#Beta coefficient = -0.3746138. Holding Vacc and Hosp constant, each 1 percentage point increase in mammogram screening is associated with a 0.375 point decrease in poor health %.

#Effect: counties with higher screening rates report less poor health, which is consistent with early detection and prevention, although the model shows an association rather than proof of causation. A 10 percentage point rise in screening predicts a drop of about 3.75 points.

#Flu vaccinations and preventable hospital stays had no significant individual effects (p = 0.871 and p = 0.525, respectively)
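
To put a range around the Mammo estimate, a 95% confidence interval can be built from the reported estimate and standard error. The sketch below hard-codes those two numbers from the `summary()` output; the names `est`, `se`, and `t_crit` are illustrative. With the fitted model in memory, `confint(price_model)` should give the same interval directly.

```r
# Values copied from the summary() output above.
est      <- -0.3746138   # Mammo estimate
se       <- 0.0458377    # Mammo standard error
df_resid <- 235          # residual degrees of freedom

# Two-sided 95% critical value from the t distribution.
t_crit <- qt(0.975, df = df_resid)

# Lower and upper bounds of the 95% confidence interval.
ci <- est + c(-1, 1) * t_crit * se

# Estimate divided by its standard error reproduces the reported t value.
t_value <- est / se

# Predicted change in poor health % for a 10 percentage point rise in screening.
effect_10pt <- 10 * est

round(ci, 3)
round(t_value, 3)
round(effect_10pt, 2)
```

Because the whole interval lies below zero, it tells the same story as the small p-value: the negative association is unlikely to be a sampling artifact.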

Question 7 - Linearity Check

#The model appears to meet the linearity assumption. The Residuals vs. Fitted plot shows random scatter around the horizontal zero line (the dotted line) with no systematic pattern, and the red smoothed line shows no clear curve. Points are evenly distributed across the fitted values (roughly the 20-30% poor health range). The residuals stay centered near 0 across that range, which supports a linear specification.

plot(price_model, which = 1)
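
For comparison, it can help to see what a violation of linearity looks like in the same plot. The sketch below uses simulated data (not the county data) in which the outcome truly follows a curve; the seed, sample size, and coefficients are arbitrary choices made for illustration. It also shows that the overall mean of the residuals is zero for any least-squares fit with an intercept, so the shape of the plot, not the mean, is what the check relies on.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Simulated data: y depends on x through a curve, plus random noise.
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(200, sd = 2)

fit_line <- lm(y ~ x)            # misspecified straight-line model
fit_quad <- lm(y ~ x + I(x^2))   # model that includes the curvature

# Residuals average to (numerically) zero in both fits,
# because both models include an intercept.
mean(resid(fit_line))
mean(resid(fit_quad))

# The residuals-vs-fitted plots differ: the smoother bends for fit_line
# and stays roughly flat for fit_quad.
par(mfrow = c(1, 2))
plot(fit_line, which = 1)
plot(fit_quad, which = 1)
par(mfrow = c(1, 1))
```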