library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
#1
project_data <- read_excel("texas federal funds.xlsx")
#2, #3
environmental_model <- lm(
  `ENVIRONMENTAL HEALTH` ~
    `MATERNAL AND CHILD HEALTH SERVICES BLOCK GRANT TO THE STATES` +
    `EVEN START - STATE EDUCATIONAL AGENCIES` +
    `STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION`,
  data = project_data
)
#4
summary(environmental_model)
##
## Call:
## lm(formula = `ENVIRONMENTAL HEALTH` ~ `MATERNAL AND CHILD HEALTH SERVICES BLOCK GRANT TO THE STATES` +
## `EVEN START - STATE EDUCATIONAL AGENCIES` + `STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION`,
## data = project_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1705532 -1212895 400060 649241 1778931
##
## Coefficients:
## Estimate
## (Intercept) 5.264e+05
## `MATERNAL AND CHILD HEALTH SERVICES BLOCK GRANT TO THE STATES` 1.927e-02
## `EVEN START - STATE EDUCATIONAL AGENCIES` -6.671e-02
## `STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION` 1.520e-01
## Std. Error
## (Intercept) 1.927e+06
## `MATERNAL AND CHILD HEALTH SERVICES BLOCK GRANT TO THE STATES` 3.954e-02
## `EVEN START - STATE EDUCATIONAL AGENCIES` 8.558e-02
## `STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION` 1.068e-01
## t value Pr(>|t|)
## (Intercept) 0.273 0.791
## `MATERNAL AND CHILD HEALTH SERVICES BLOCK GRANT TO THE STATES` 0.487 0.638
## `EVEN START - STATE EDUCATIONAL AGENCIES` -0.780 0.456
## `STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION` 1.423 0.188
##
## Residual standard error: 1346000 on 9 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1897, Adjusted R-squared: -0.08034
## F-statistic: 0.7025 on 3 and 9 DF, p-value: 0.574
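#5 The R-squared value is quite low: the chosen programs in our model explain only about 19% of the variance in ENVIRONMENTAL HEALTH funding (multiple R-squared = 0.1897), and the adjusted R-squared is negative (-0.08034), which indicates the predictors add no explanatory value once their number is accounted for. The overall F-test p-value of 0.574 does little to instill confidence either: we fail to reject the null hypothesis (the selected health programs have no impact on the funding patterns for environmental programs). Consistent with this, none of the selected programs (variables) is individually significant, with p-values well above 0.05 (0.638, 0.456, and 0.188). Failing to reject the null hypothesis is not the same as confirming it, especially with only 13 complete observations.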
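# A minimal sketch for pulling the fit statistics discussed in #5 directly from the fitted model instead of reading them off the printout. model_fit_stats is a hypothetical helper name (not part of the assignment); it assumes only base R and an lm object such as environmental_model.

```r
# Hypothetical helper: collect R-squared, adjusted R-squared, and the
# overall F-test p-value from a fitted lm object.
model_fit_stats <- function(model) {
  s <- summary(model)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  c(
    r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    # Upper-tail probability of the F statistic = overall model p-value
    f_p_value     = unname(pf(f["value"], f["numdf"], f["dendf"],
                              lower.tail = FALSE))
  )
}

# Only runs if the model from #2/#3 has already been fitted in this session
if (exists("environmental_model")) {
  print(round(model_fit_stats(environmental_model), 4))
}
```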
#6 Let’s pretend that the STATE ADMINISTRATIVE EXPENSES FOR CHILD NUTRITION (SAECN) variable is significant. The estimate (AKA beta coefficient) of 1.520e-01 indicates that for a one-unit increase in SAECN, holding the other variables constant, the dependent variable (ENVIRONMENTAL HEALTH program) is expected to increase by about 0.152 units. (The 1.423 in the output is the t value, not the estimate.) The other independent variables follow the same structure based on their estimates (MATERNAL… = about 0.019 increase, EVEN START… = about 0.067 decrease).
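# A short sketch for reading the slopes in #6 from the Estimate column programmatically, alongside 95% confidence intervals. coef_table is a hypothetical helper name; coef() and confint() are base R functions for lm objects.

```r
# Hypothetical helper: one row per model term with its estimate and
# confidence interval, so the Estimate column is not confused with t values.
coef_table <- function(model, level = 0.95) {
  ci <- confint(model, level = level)
  data.frame(
    term      = names(coef(model)),
    estimate  = unname(coef(model)),
    ci_low    = unname(ci[, 1]),
    ci_high   = unname(ci[, 2]),
    row.names = NULL
  )
}

# Only runs if the model from #2/#3 has already been fitted in this session
if (exists("environmental_model")) {
  print(coef_table(environmental_model))
}
```

# Each estimate is the expected change in the dependent variable for a one-unit increase in that predictor, holding the others constant; an interval that spans zero is consistent with the term being insignificant at that level.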
#7
plot(environmental_model, which=1)
#7 The model I have created violates the assumption of linearity. For the relationship to be linear, the red line in the residuals vs. fitted plot would need to be relatively straight and close to the dotted line at zero. What we see instead is multiple peaks and valleys, which strongly suggests a violation of the linearity assumption. Not even our in-class example of a violated model had multiple peaks and valleys (it simply curved), making my model easy to identify as non-linear.
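# A rough sketch for inspecting the smoothed residual trend from #7 as numbers. It assumes the red line drawn by plot(..., which=1) is a lowess smooth of residuals against fitted values (my understanding of the default, not confirmed in this assignment). residual_smooth is a hypothetical helper name. With only 13 complete observations the smooth can wiggle from noise alone, so the shape should be read with some caution.

```r
# Hypothetical helper: lowess smooth of residuals against fitted values.
# A roughly flat smooth near zero is what a linear relationship would show.
residual_smooth <- function(model) {
  sm <- lowess(fitted(model), resid(model))
  data.frame(fitted = sm$x, smoothed_residual = sm$y)
}

# Only runs if the model from #2/#3 has already been fitted in this session
if (exists("environmental_model")) {
  print(residual_smooth(environmental_model))
}
```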