HW 6 Week 9 Lecture

Linear Regression

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
district <- read_excel("district.xls")

model_multiple <- lm(DA0912DR21R ~ DA0AT21R + DA0CT21R, data = district)
summary(model_multiple)

## 
## Call:
## lm(formula = DA0912DR21R ~ DA0AT21R + DA0CT21R, data = district)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6637 -0.9424 -0.2303  0.6698 28.0421 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60.626183   2.043604  29.666   <2e-16 ***
## DA0AT21R    -0.624078   0.021918 -28.473   <2e-16 ***
## DA0CT21R    -0.004277   0.002269  -1.885   0.0597 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.002 on 1078 degrees of freedom
##   (126 observations deleted due to missingness)
## Multiple R-squared:  0.462,  Adjusted R-squared:  0.461 
## F-statistic: 462.9 on 2 and 1078 DF,  p-value: < 2.2e-16

# My dependent variable is dropout rate (DA0912DR21R). My independent variables are attendance rate (DA0AT21R) and college prep class participation (DA0CT21R).

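# My multiple regression model shows that attendance has a much stronger association with dropout rate than college prep participation does. Each one-unit increase in attendance rate is associated with a 0.62-unit decrease in dropout rate, holding college prep participation constant, and this effect is significant (p < 2e-16). The college prep coefficient (-0.004) is not significant at the 0.05 level (p = 0.0597). The R-squared of 0.462 means that attendance and college prep participation together explain about 46.2% of the variance in dropout rate.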
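
To make the coefficient table concrete, here is a small sketch that plugs values into the fitted equation by hand. The coefficients are copied from the summary() output above. The attendance value of 95 and college prep value of 50 are hypothetical numbers chosen only for illustration, and the sketch assumes both variables are measured on the same scale as in district.xls.

```r
# Coefficients copied from the summary() output above
b0     <- 60.626183   # (Intercept)
b_att  <- -0.624078   # DA0AT21R (attendance rate)
b_prep <- -0.004277   # DA0CT21R (college prep participation)

# Hypothetical district, chosen only for illustration
attendance   <- 95
college_prep <- 50

# Predicted dropout rate = intercept + slope * value, summed over predictors
predicted_dropout <- b0 + b_att * attendance + b_prep * college_prep
print(round(predicted_dropout, 2))
# [1] 1.12
```

With the fitted model object available, predict(model_multiple, newdata = data.frame(DA0AT21R = 95, DA0CT21R = 50)) should give the same number up to rounding of the printed coefficients.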

Regression, much like t-tests and correlations, is all about relationships. What is the relationship between X and Y? Or between X, Y and Z?

For very simple data, this is easy enough to see. You can just plot it:

ggplot(district, aes(x = DA0AT21R, y = DA0912DR21R)) + geom_point()

## Warning: Removed 112 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Graph 1 looks like there is a relationship between attendance rate and dropout rate.

ggplot(district, aes(x = DA0CT21R, y = DA0912DR21R)) + geom_point()

## Warning: Removed 126 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Graph 2 does not look like there is a relationship between college prep participation and dropout rate.
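
A visual impression from a scatterplot can be backed up with a correlation coefficient. The sketch below shows one way to do that with cor() and cor.test() from base R. It runs on simulated stand-in data, not on district.xls: the data frame toy, its column names, and every value in it are hypothetical, built so that one predictor is related to the outcome and the other is not.

```r
# Simulated stand-in data. Names and values are hypothetical,
# not taken from district.xls.
set.seed(1)
n <- 200
attendance   <- rnorm(n, mean = 94, sd = 2)
college_prep <- runif(n, min = 0, max = 100)
dropout      <- 60 - 0.6 * attendance + rnorm(n, sd = 1)
toy <- data.frame(attendance, college_prep, dropout)

# Pearson correlation of each predictor with the outcome.
# use = "complete.obs" drops pairs where either value is missing.
r_att  <- cor(toy$dropout, toy$attendance,   use = "complete.obs")
r_prep <- cor(toy$dropout, toy$college_prep, use = "complete.obs")

# cor.test() adds a confidence interval and a p-value to the estimate
test_att  <- cor.test(toy$dropout, toy$attendance)
test_prep <- cor.test(toy$dropout, toy$college_prep)

print(c(attendance = r_att, college_prep = r_prep))
```

On the real data, the same calls would take district$DA0912DR21R with district$DA0AT21R or district$DA0CT21R as arguments.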

2024-09-03

HOMEWORK