library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
setwd("C:/Users/KaeRo/Desktop/R Studio/Reseach Data Selection")
district <- read_excel("district.xls")
# drop_na() with no column arguments removes every row that has an NA in any column
Cleaned_district <- district %>% drop_na()
Dependent Variable: Average Teacher Salary (DPSTTOSA)

Interesting Independent Variables: DPSAMIFP (percentage of minority staff), DPETWHIP (percentage of white students), DPETALLC (total number of students), DZRVLOCP (percentage of revenue from local taxes)
model1<-lm(DPSTTOSA~DPSAMIFP+DPETWHIP+DPETALLC+DZRVLOCP, data=Cleaned_district)
plot(model1,which=1)
summary(model1)
##
## Call:
## lm(formula = DPSTTOSA ~ DPSAMIFP + DPETWHIP + DPETALLC + DZRVLOCP,
## data = Cleaned_district)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11208.0 -2819.4 -423.7 3176.8 13953.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.348e+04 1.729e+03 30.937 < 2e-16 ***
## DPSAMIFP 4.039e+01 2.337e+01 1.728 0.0849 .
## DPETWHIP -2.909e+01 2.307e+01 -1.261 0.2082
## DPETALLC 5.137e-02 1.251e-02 4.107 5.10e-05 ***
## DZRVLOCP 5.284e+01 1.167e+01 4.527 8.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4226 on 318 degrees of freedom
## Multiple R-squared: 0.2506, Adjusted R-squared: 0.2412
## F-statistic: 26.59 on 4 and 318 DF, p-value: < 2.2e-16
| Interpret the model’s r-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables? |
|---|
| The multiple R-squared is 0.2506, so the overall model explains about 25% of the variation in average teacher salary, and the F-test (p-value < 2.2e-16) shows the model as a whole is significant. Significant variables: DPETALLC and DZRVLOCP (both p < 0.001). Insignificant at the 0.05 level: DPSAMIFP (p = 0.0849, significant only at the 0.10 level) and DPETWHIP (p = 0.2082). |
| Choose some significant independent variables. Interpret its Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable? |
| Variable 1: Total Number of Students (DPETALLC) Estimate: 5.137e-02 Interpret: Holding the other variables constant, each additional student in a school district is associated with an increase of about 5 cents ($0.05) in average teacher pay. |
| Variable 2: Percentage of Revenue from Local Taxes (DZRVLOCP) Estimate: 5.284e+01 Interpret: Holding the other variables constant, a 1 percentage point increase in the share of revenue from local taxes is associated with an increase of about $53 ($52.84) in average teacher pay. |
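
Because each estimate is a per-unit change, it can help to scale it to a difference that is easier to picture. The sketch below multiplies the two estimates from the summary output by comparison sizes that are purely hypothetical (1,000 students and 10 percentage points were picked for illustration, not taken from the data). It assumes the other variables in the model are held constant.

```r
# Estimates copied from the summary(model1) output above
est_students <- 5.137e-02  # DPETALLC: dollars per additional student
est_local    <- 5.284e+01  # DZRVLOCP: dollars per percentage point of local revenue

# Hypothetical comparison sizes, chosen only for illustration
extra_students <- 1000
extra_local_pp <- 10

# Predicted difference in average teacher salary, other variables held constant
effect_students <- est_students * extra_students
effect_local    <- est_local * extra_local_pp

cat(sprintf("%.0f more students: about $%.2f\n", extra_students, effect_students))
cat(sprintf("%.0f more percentage points of local revenue: about $%.2f\n",
            extra_local_pp, effect_local))
# prints:
# 1000 more students: about $51.37
# 10 more percentage points of local revenue: about $528.40
```

On this scale the enrollment effect, which looks tiny per student, is similar in size to a one-point change in local revenue share.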
Does the model you create meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”
plot(model1,which=1)
The smoothed line in the Residuals vs Fitted plot is fairly curved instead of staying flat around zero, so the model violates the assumption of linearity.
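
A common follow-up to a curved residual plot is to try transforming a skewed predictor and then redraw the same diagnostic. The sketch below is only an illustration on simulated data, not the district file. The names `students`, `salary`, and `sim` are hypothetical, and the log relationship is built into the simulation on purpose, so it says nothing about whether a log transform would help `model1`.

```r
set.seed(42)

# Simulated, hypothetical data: right-skewed enrollment and a salary
# that depends on the log of enrollment (an assumption of this sketch)
n <- 300
students <- pmax(1, round(exp(rnorm(n, mean = 7, sd = 1.2))))
salary <- 40000 + 2500 * log(students) + rnorm(n, sd = 3000)
sim <- data.frame(students = students, salary = salary)

# Fit the same outcome with the raw predictor and with the logged predictor
fit_linear <- lm(salary ~ students, data = sim)
fit_log    <- lm(salary ~ log(students), data = sim)

# Residuals vs Fitted for each fit: a flatter smoothed line suggests
# the linearity assumption is closer to being met
plot(fit_linear, which = 1)
plot(fit_log, which = 1)

# Compare how much variation each specification explains
r2_linear <- summary(fit_linear)$r.squared
r2_log    <- summary(fit_log)$r.squared
print(c(linear = r2_linear, log = r2_log))
```

With the real data, the same comparison could be run by swapping a transformed term such as `log(DPETALLC)` into the `model1` formula, provided every value of that variable is positive.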