library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(readxl)
setwd("~/Desktop/Monday Class")
district <- read_excel("district.xls")
model_simple <- lm(DA0912DR21R ~ DPSTURNR, data = district)
summary(model_simple)
##
## Call:
## lm(formula = DA0912DR21R ~ DPSTURNR, data = district)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.727 -1.143 -0.797 0.183 49.326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.72294 0.24291 2.976 0.00298 **
## DPSTURNR 0.02521 0.01065 2.366 0.01815 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.318 on 1090 degrees of freedom
## (115 observations deleted due to missingness)
## Multiple R-squared: 0.00511, Adjusted R-squared: 0.004197
## F-statistic: 5.598 on 1 and 1090 DF, p-value: 0.01815
# My dependent variable is the Grades 9–12 dropout rate (DA0912DR21R), and my independent variable is the teacher turnover rate (DPSTURNR).
# My multiple regression model with attendance and college prep shows that attendance has a stronger effect on the dropout rate than college prep, and that its effect is significant. The R-squared of .462 means that 46.2% of the variance in the dropout rate is explained by attendance and college prep courses.
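As a rough illustration of how a two-predictor model like the one described above can be fit and read, here is a minimal sketch on simulated data. The data frame `toy` and the columns `attendance`, `college_prep`, and `dropout` are made up for this example; they are not columns from `district.xls`, and the numbers it prints are not the results reported above. Because the two predictors are on different scales, the sketch also fits a standardized version, which is one common way to compare the relative strength of effects.

```r
# Toy data only: column names and values are hypothetical, not from district.xls
set.seed(42)
n <- 200
toy <- data.frame(
  attendance   = rnorm(n, mean = 95, sd = 2),
  college_prep = rnorm(n, mean = 50, sd = 15)
)
toy$dropout <- 40 - 0.4 * toy$attendance - 0.01 * toy$college_prep +
  rnorm(n, sd = 0.5)

# Multiple regression: dropout explained by both predictors at once
model_multiple <- lm(dropout ~ attendance + college_prep, data = toy)
fit <- summary(model_multiple)

fit$coefficients  # estimate, std. error, t value, and p-value per term
fit$r.squared     # share of variance in dropout explained by the model

# Standardized version: coefficients are in standard-deviation units,
# so their magnitudes can be compared across predictors
model_std <- lm(scale(dropout) ~ scale(attendance) + scale(college_prep),
                data = toy)
coef(model_std)
```

Raw coefficients depend on the units of each predictor, so "stronger effect" is easier to defend from the standardized coefficients than from the raw estimates alone.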
Regression, much like t-tests and correlations, is all about relationships. What is the relationship between X and Y? Or between X, Y and Z?
For very simple data, this is easy enough to see. You can just plot it:
ggplot(district, aes(x = DPSTURNR, y = DA0912DR21R)) + geom_point()
## Warning: Removed 115 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Graph 1 looks like there is a relationship between the variables.
ggplot(district, aes(x = DPSTURNR, y = DA0912DR21R)) + geom_point()
## Warning: Removed 115 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Graph 2 does not look like it shows a relationship.
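Judging a relationship from a bare scatter plot can be hard, so one option is to overlay the fitted regression line. The sketch below does this on simulated data. The data frame `toy` and its columns `turnover` and `dropout` are hypothetical stand-ins, not the real `district` columns, and the values are invented for illustration.

```r
library(ggplot2)

# Toy data only: made-up values standing in for turnover and dropout rates
set.seed(1)
toy <- data.frame(turnover = runif(100, min = 0, max = 50))
toy$dropout <- 0.7 + 0.03 * toy$turnover + rnorm(100, sd = 3)

# Scatter plot with a least-squares line and its confidence band
p <- ggplot(toy, aes(x = turnover, y = dropout)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Teacher turnover rate", y = "Dropout rate")
p
```

The same pattern should apply to the real data by swapping in `district`, `DPSTURNR`, and `DA0912DR21R`. A nearly flat line with a wide band would suggest a weak relationship.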