library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
setwd("C:/Users/KaeRo/Desktop/R Studio/Reseach Data Selection")
district <- read_excel("district.xls")
# drop_na() with no columns named removes every row that has an NA in any column
Cleaned_district <- district %>% drop_na()

Dependent Variable: Average Teacher Salary (DPSTTOSA)

Independent Variables of Interest: DPSAMIFP (percentage of minority staff), DPETWHIP (percentage of white students), DPETALLC (total number of students), DZRVLOCP (percentage of revenue from local taxes)

model1 <- lm(DPSTTOSA ~ DPSAMIFP + DPETWHIP + DPETALLC + DZRVLOCP, data = Cleaned_district)

plot(model1,which=1)

summary(model1)
## 
## Call:
## lm(formula = DPSTTOSA ~ DPSAMIFP + DPETWHIP + DPETALLC + DZRVLOCP, 
##     data = Cleaned_district)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11208.0  -2819.4   -423.7   3176.8  13953.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.348e+04  1.729e+03  30.937  < 2e-16 ***
## DPSAMIFP     4.039e+01  2.337e+01   1.728   0.0849 .  
## DPETWHIP    -2.909e+01  2.307e+01  -1.261   0.2082    
## DPETALLC     5.137e-02  1.251e-02   4.107 5.10e-05 ***
## DZRVLOCP     5.284e+01  1.167e+01   4.527 8.47e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4226 on 318 degrees of freedom
## Multiple R-squared:  0.2506, Adjusted R-squared:  0.2412 
## F-statistic: 26.59 on 4 and 318 DF,  p-value: < 2.2e-16
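Interpret the model's R-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables?

The multiple R-squared is 0.2506 (adjusted 0.2412), so the four predictors together explain about 25% of the variation in average teacher salary. The overall F-test (p-value < 2.2e-16) indicates that the model as a whole is significant. At the 0.05 level, DPETALLC (p = 5.10e-05) and DZRVLOCP (p = 8.47e-06) are significant. DPSAMIFP (p = 0.0849) and DPETWHIP (p = 0.2082) are not significant at that level, although DPSAMIFP is significant at the 0.10 level.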
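
The R-squared and p-value questions can be checked against the numbers printed in the summary output. The sketch below is only a rough check: the values are copied by hand from that output, the 0.05 cutoff is an assumed convention, and the variable names (`p_values`, `alpha`, and so on) are made up for illustration.

```r
# p-values and R-squared copied from the summary(model1) output above
p_values <- c(DPSAMIFP = 0.0849, DPETWHIP = 0.2082,
              DPETALLC = 5.10e-05, DZRVLOCP = 8.47e-06)
r_squared <- 0.2506
alpha <- 0.05  # assumed significance cutoff

# Split the predictors by whether they clear the cutoff
significant   <- names(p_values)[p_values < alpha]
insignificant <- names(p_values)[p_values >= alpha]
print(significant)
print(insignificant)

# Consistency check: rebuild adjusted R-squared and the F-statistic
k <- length(p_values)  # number of predictors (4)
n <- 318 + k + 1       # residual df plus estimated coefficients
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat <- (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(adj_r_squared, 4))
print(round(f_stat, 2))
```

The rebuilt adjusted R-squared and F-statistic should land close to the printed 0.2412 and 26.59; small differences come from using the rounded R-squared.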
Choose some significant independent variables. Interpret their Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable?
Variable 1: Total Number of Students (DPETALLC)
Estimate: 5.137e-02
Interpretation: Holding the other variables constant, each additional student in a school district is associated with an increase of about $0.05 (5 cents) in average teacher pay.

Variable 2: Percentage of Revenue from Local Taxes (DZRVLOCP)
Estimate: 5.284e+01
Interpretation: Holding the other variables constant, a 1 percentage point increase in the share of revenue from local taxes is associated with an increase of about $52.84 in average teacher pay.
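
Because the per-unit effects are small, it can help to scale them to larger changes. The sketch below multiplies the two estimates from the summary output by hypothetical changes (1,000 extra students, 10 extra percentage points of local revenue). Those change sizes are illustrative assumptions, not values from the data.

```r
# Estimates copied from the summary(model1) output above
coefs <- c(DPETALLC = 5.137e-02, DZRVLOCP = 5.284e+01)

# Hypothetical changes, chosen only for illustration
extra_students   <- 1000  # additional students in a district
extra_local_pts  <- 10    # additional percentage points of local revenue

# Predicted change in average teacher salary, other variables held constant
salary_change_students <- coefs[["DPETALLC"]] * extra_students
salary_change_local    <- coefs[["DZRVLOCP"]] * extra_local_pts

print(salary_change_students)
print(salary_change_local)
```

Under the linear model, these products are the predicted salary differences in dollars between two otherwise identical districts.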

Does the model you created meet or violate the assumption of linearity? Show your work with "plot(x, which=1)".

plot(model1,which=1)

The smoothed line in the residuals vs. fitted plot is fairly curved rather than flat around zero, so the model violates the assumption of linearity.
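
To see what a curved residuals vs. fitted line means, the sketch below fits a straight line to simulated data that is actually quadratic. The data, the seed, and the names `toy_fit`, `res`, and `middle` are all hypothetical and unrelated to the district file. The point is only that a mis-specified linear fit leaves residuals that are systematically negative in the middle of the range and positive at the ends.

```r
# Simulated (hypothetical) data with a truly curved relationship
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- x^2 + rnorm(length(x), sd = 1)

# Fit a straight line anyway, then draw the same diagnostic plot
toy_fit <- lm(y ~ x)
plot(toy_fit, which = 1)

# Numeric version of the same pattern: average residual by region of x
res <- resid(toy_fit)
middle <- x > 3 & x < 7
print(mean(res[middle]))   # middle of the range
print(mean(res[!middle]))  # both ends of the range
```

If the relationship were truly linear, both averages would sit near zero and the smoothed line in the plot would be roughly flat.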