# 1. Load dataset
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
eci <- read_csv("eci.csv")
## Rows: 274 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): County, CSCS, CSFA, TS, PPSC, TPPS
## dbl (1): B3P
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Checked for missing values
summary(eci)
##     County               B3P               CSCS               CSFA
##  Length:274         Min.   :     2.0   Length:274         Length:274
##  Class :character   1st Qu.:   383.0   Class :character   Class :character
##  Mode  :character   Median :   993.5   Mode  :character   Mode  :character
##                     Mean   :  6920.7
##                     3rd Qu.:  2587.5
##                     Max.   :316834.0
##                     NA's   :20
##       TS                PPSC               TPPS
##  Length:274         Length:274         Length:274
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
# Removed rows with missing values
cleaned_data <- na.omit(eci)
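`summary()` prints an NA count only for numeric columns such as B3P; for character columns it shows just length, class, and mode. A minimal sketch of a per-column check follows, using a small made-up data frame (`toy` and its values are hypothetical, not taken from `eci.csv`):

```r
# Hypothetical data frame that mimics the mix of column types in `eci`
toy <- data.frame(
  County = c("A", "B", "C", "D"),
  B3P    = c(10, NA, 30, 40),
  TS     = c("5", "n/a", "7", "8"),   # "n/a" is text, not a real NA
  stringsAsFactors = FALSE
)

# Count true NA values in every column, whatever its type
na_counts <- colSums(is.na(toy))
na_counts

# na.omit() drops only rows that contain a true NA
rows_kept <- nrow(na.omit(toy))
rows_kept
```

In this sketch the text placeholder `"n/a"` survives `na.omit()`, which is one way missing values can reappear later when a column is converted to numeric.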
# 2. Dependent and Independent variables
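# Convert both variables from character to numeric before using them
cleaned_data$CSCS <- as.numeric(cleaned_data$CSCS)
## Warning: NAs introduced by coercion
cleaned_data$TS <- as.numeric(cleaned_data$TS)
## Warning: NAs introduced by coercion
# Dependent variable: CSCS; independent variable: TS
dep_var <- cleaned_data$CSCS
ind_var <- cleaned_data$TS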
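The coercion warnings mean that some CSCS and TS values are not plain numbers. The sketch below shows one way to find such values and to strip common formatting before converting. The vector `raw` is a hypothetical stand-in; the actual contents of `eci.csv` are not shown here, so whether its values contain commas, percent signs, or text placeholders is an assumption.

```r
# Hypothetical stand-in for a character column such as CSCS or TS
raw <- c("1,204", "87", "N/A", "45%", "310")

# Plain as.numeric() returns NA for anything that is not a bare number
plain <- suppressWarnings(as.numeric(raw))
bad_values <- raw[is.na(plain)]   # the entries that would be lost
bad_values

# Remove everything except digits, minus sign, and decimal point, then convert
cleaned <- as.numeric(gsub("[^0-9.-]", "", raw))
sum(is.na(cleaned))               # entries that are still missing
```

Inspecting `bad_values` first makes it possible to tell formatting problems, which can be repaired, from genuinely missing data, which cannot.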
# 3. Create a linear model
model<-lm(CSCS~TS, data=cleaned_data)
# 4. Summary of the model
summary(model)
##
## Call:
## lm(formula = CSCS ~ TS, data = cleaned_data)
##
## Residuals:
##       7      16      28      30      31      38
##  2.4684  3.9745 -3.7455 -1.3267 -1.7198  0.3491
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.227507   1.694585  -7.216  0.00196 **
## TS            0.956326   0.006359 150.379 1.17e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.192 on 4 degrees of freedom
## (54 observations deleted due to missingness)
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9998
## F-statistic: 2.261e+04 on 1 and 4 DF, p-value: 1.173e-08
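The line "54 observations deleted due to missingness" shows that `lm()` dropped every row with an NA in CSCS or TS, leaving 6 rows (4 residual degrees of freedom plus 2 estimated coefficients). A minimal sketch of how to check this, using R's built-in `cars` data with a few NAs injected for illustration (the object names `d` and `fit` are hypothetical):

```r
# Built-in example data with three values set to NA for illustration
d <- cars
d$speed[c(1, 5, 9)] <- NA

fit <- lm(dist ~ speed, data = d)

n_used    <- nobs(fit)            # rows actually used in the fit
n_dropped <- nrow(d) - n_used     # rows removed because of NA
c(used = n_used, dropped = n_dropped)
```

Comparing `nobs()` with `nrow()` before interpreting a model makes it clear how much of the data the estimates rest on.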
# 5. Interpret the model
# R-squared: 0.9998, meaning the model explains 99.98% of the variation in the dependent variable
# (only 6 observations were used: 4 residual degrees of freedom, 54 rows dropped for missingness)
# p-value: 1.173e-08, indicating the model is statistically significant overall
Overall Model: The F-statistic is very large (2.261e+04 on 1 and 4 degrees of freedom) with a p-value of 1.173e-08, well below 0.001. This is strong evidence against the null hypothesis that the slope is zero, so the model as a whole is statistically significant and “TS” is strongly associated with “CSCS”. The fit rests on only 6 observations, because 54 rows were dropped for missing values.
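In a regression with a single predictor, the F-statistic equals the square of the slope's t-value, and R-squared can be recovered from F and the residual degrees of freedom. The sketch below checks the reported figures against each other; the numbers are copied from the model summary, and the variable names are only illustrative.

```r
# Values copied from the summary(model) output
t_slope  <- 150.379
f_stat   <- 2.261e4
df_resid <- 4

# With one predictor, F = t^2 (up to rounding in the printed output)
t_squared <- t_slope^2
t_squared

# R-squared from F: R2 = F / (F + residual df) when there is one predictor
r2 <- f_stat / (f_stat + df_resid)
round(r2, 4)   # 0.9998, i.e. 99.98% of the variance

# Upper-tail p-value of the F distribution with 1 and 4 degrees of freedom
p_val <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
p_val
```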
Significance of Variables: The p-values for the intercept (0.00196) and for “TS” (1.17e-08) are both below 0.01, so both coefficients are statistically significant.
Beta Coefficients: The intercept (-12.23) is the predicted value of “CSCS” when “TS” is zero. It may have no practical interpretation if a “TS” value of zero lies outside the range of this data.
Slope (0.956): For each one-unit increase in “TS”, the predicted “CSCS” increases by approximately 0.956 units. This indicates a strong, positive relationship between “TS” and “CSCS”.
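The coefficients can be read as a prediction rule: predicted CSCS = -12.227507 + 0.956326 × TS. A short sketch of that arithmetic follows; the TS values 100 and 101 are hypothetical inputs chosen only to show the one-unit change, and `predict_cscs` is an illustrative helper, not part of the original analysis.

```r
# Coefficients copied from the summary(model) output
intercept <- -12.227507
slope     <- 0.956326

# Illustrative helper: predicted CSCS for a given TS
predict_cscs <- function(ts) intercept + slope * ts

pred_100 <- predict_cscs(100)   # hypothetical TS value
pred_101 <- predict_cscs(101)   # one unit higher

pred_100
pred_101 - pred_100             # equals the slope
```

With the fitted model object available, `predict(model, newdata = data.frame(TS = 100))` would give the same kind of prediction without copying coefficients by hand.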
Residual Standard Error: The residual standard error is 3.192 on 4 degrees of freedom. It estimates the standard deviation of the residuals, that is, the typical size of the prediction errors in “CSCS” units.
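The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. As a check, the sketch below recomputes it from the six residuals printed in the model summary (the variable names are illustrative):

```r
# Residuals copied from the summary(model) output
res <- c(2.4684, 3.9745, -3.7455, -1.3267, -1.7198, 0.3491)

# Residual degrees of freedom: observations minus estimated coefficients
df_resid <- length(res) - 2

# Residual standard error = sqrt(RSS / df)
rse <- sqrt(sum(res^2) / df_resid)
round(rse, 3)   # matches the reported 3.192
```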
# 6. Interpret the coefficients
# The significant independent variable is "TS" (with a very low p-value of 1.17e-08).
Beta Coefficient for TS (Estimate = 0.956): For each one-unit increase in “TS”, the predicted value of “CSCS” increases by approximately 0.956 units. The relationship is positive: as “TS” goes up, “CSCS” tends to go up as well.
Impact of TS on CSCS: The high t-value (150.379) and extremely low p-value show that “TS” is a strong predictor of “CSCS” in this model. With an R-squared of 0.9998, “TS” accounts for nearly all of the variance in “CSCS” among the 6 observations used.
In summary, “TS” is positively associated with “CSCS”, and the effect is both statistically significant and large. Each unit increase in “TS” corresponds to nearly a one-unit increase in “CSCS”, so the relationship is close to one-to-one.
# 7. Check the linearity assumption
plot(model, which = 1)
# plot(model, which = 1) draws residuals against fitted values. The residuals show no clear curved
# pattern around zero, so the linearity assumption appears to be met (a weak check with only 6 points).
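The residuals-vs-fitted plot can also be built by hand, which makes it explicit what is being checked: residuals should scatter around zero with no systematic curve. A minimal sketch follows, using the built-in `cars` data as a stand-in because `eci.csv` is not available here (`fit`, `fv`, and `res` are illustrative names):

```r
# Stand-in model on built-in data
fit <- lm(dist ~ speed, data = cars)

fv  <- fitted(fit)   # fitted values
res <- resid(fit)    # residuals

# Residuals against fitted values, with a zero reference line
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# Smoothed trend: a roughly flat line near zero supports linearity
lines(lowess(fv, res), col = "red")

# For a model with an intercept, the residuals average to zero
mean_res <- mean(res)
```

Calling `plot(fit)` with no `which` argument would additionally show the Q-Q, scale-location, and leverage plots, which cover the normality and constant-variance assumptions.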