library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district<-read_excel("district.xls")
create a linear model using the “lm()” command, save it to some object
call a “summary()” on your new model
interpret the model’s r-squared and p-values. How much of the dependent variable does the overall model explain? What are the significant variables? What are the insignificant variables?
Choose some significant independent variables. Interpret its Estimates (or Beta Coefficients). How do the independent variables individually affect the dependent variable?
Does the model you create meet or violate the assumption of linearity? Show your work with “plot(x,which=1)”
head(district)
## # A tibble: 6 × 137
## DISTNAME DISTRICT DZCNTYNM REGION DZRATING DZCAMPUS DPETALLC DPETBLAP DPETHISP
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CAYUGA … 001902 001 AND… 07 A 3 574 4.4 11.5
## 2 ELKHART… 001903 001 AND… 07 A 4 1150 4 11.8
## 3 FRANKST… 001904 001 AND… 07 A 3 808 8.5 11.3
## 4 NECHES … 001906 001 AND… 07 A 2 342 8.2 13.5
## 5 PALESTI… 001907 001 AND… 07 B 6 3360 25.1 42.9
## 6 WESTWOO… 001908 001 AND… 07 B 4 1332 19.7 26.2
## # ℹ 128 more variables: DPETWHIP <dbl>, DPETINDP <dbl>, DPETASIP <dbl>,
## # DPETPCIP <dbl>, DPETTWOP <dbl>, DPETECOP <dbl>, DPETLEPP <dbl>,
## # DPETSPEP <dbl>, DPETBILP <dbl>, DPETVOCP <dbl>, DPETGIFP <dbl>,
## # DA0AT21R <dbl>, DA0912DR21R <dbl>, DAGC4X21R <dbl>, DAGC5X20R <dbl>,
## # DAGC6X19R <dbl>, DA0GR21N <dbl>, DA0GS21N <dbl>, DDA00A001S22R <dbl>,
## # DDA00A001222R <dbl>, DDA00A001322R <dbl>, DDA00AR01S22R <dbl>,
## # DDA00AR01222R <dbl>, DDA00AR01322R <dbl>, DDA00AM01S22R <dbl>, …
str(district)
## tibble [1,207 × 137] (S3: tbl_df/tbl/data.frame)
## $ DISTNAME : chr [1:1207] "CAYUGA ISD" "ELKHART ISD" "FRANKSTON ISD" "NECHES ISD" ...
## $ DISTRICT : chr [1:1207] "001902" "001903" "001904" "001906" ...
## $ DZCNTYNM : chr [1:1207] "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" ...
## $ REGION : chr [1:1207] "07" "07" "07" "07" ...
## $ DZRATING : chr [1:1207] "A" "A" "A" "A" ...
## $ DZCAMPUS : num [1:1207] 3 4 3 2 6 4 2 6 4 5 ...
## $ DPETALLC : num [1:1207] 574 1150 808 342 3360 ...
## $ DPETBLAP : num [1:1207] 4.4 4 8.5 8.2 25.1 19.7 0.3 0.8 15.7 7.2 ...
## $ DPETHISP : num [1:1207] 11.5 11.8 11.3 13.5 42.9 26.2 8.6 68.7 31.2 27.9 ...
## $ DPETWHIP : num [1:1207] 79.1 80.3 75.2 75.1 27.3 48 87 28.2 48.5 60.6 ...
## $ DPETINDP : num [1:1207] 0 0.3 0.4 0.3 0.2 0.7 0 0.3 0.1 0.3 ...
## $ DPETASIP : num [1:1207] 0.5 0.2 1 0.3 0.7 0.5 0.6 0.3 1 1 ...
## $ DPETPCIP : num [1:1207] 0 0 0 0 0.1 0.1 0 0 0.1 0.1 ...
## $ DPETTWOP : num [1:1207] 4.5 3.4 3.6 2.6 3.7 4.9 3.6 1.7 3.4 3 ...
## $ DPETECOP : num [1:1207] 40.8 45.4 54.2 54.1 81.6 74 46.8 49.6 57.8 50.1 ...
## $ DPETLEPP : num [1:1207] 1 2.8 4.1 2 17.7 7.1 0.6 14.2 5.1 6.9 ...
## $ DPETSPEP : num [1:1207] 14.6 12.1 13.1 10.5 13.5 14.5 14.7 10.4 11.6 11.9 ...
## $ DPETBILP : num [1:1207] 1 2.7 4.1 2 16.1 6.8 0.6 15.2 5 6 ...
## $ DPETVOCP : num [1:1207] 30.5 31.8 43.9 29.5 30.6 38.7 37.7 24.8 18.9 34.4 ...
## $ DPETGIFP : num [1:1207] 6.1 4.6 7.3 5.6 2.3 3.2 3.3 6.8 9.2 6 ...
## $ DA0AT21R : num [1:1207] 96.7 96 95.4 95.8 93.7 94.5 96.7 92.8 97.3 95.2 ...
## $ DA0912DR21R : num [1:1207] 0 0.3 0.4 0 0 0 0 0.4 0.4 0.7 ...
## $ DAGC4X21R : num [1:1207] 100 100 95.2 95.8 99 97.8 100 96.8 100 94.1 ...
## $ DAGC5X20R : num [1:1207] 100 98.9 100 97 99.6 97 100 97.2 100 95.6 ...
## $ DAGC6X19R : num [1:1207] 96 98.8 33.3 100 98.6 97.4 100 96.7 100 95.9 ...
## $ DA0GR21N : num [1:1207] 36 91 41 23 201 95 32 293 52 196 ...
## $ DA0GS21N : num [1:1207] 34 79 40 17 198 77 27 238 52 154 ...
## $ DDA00A001S22R: num [1:1207] 84 85 83 90 74 69 86 76 82 86 ...
## $ DDA00A001222R: num [1:1207] 62 59 57 64 46 40 55 47 56 60 ...
## $ DDA00A001322R: num [1:1207] 33 30 25 27 20 16 25 21 30 31 ...
## $ DDA00AR01S22R: num [1:1207] 81 85 84 87 72 70 86 75 82 84 ...
## $ DDA00AR01222R: num [1:1207] 67 64 63 67 48 45 66 50 60 62 ...
## $ DDA00AR01322R: num [1:1207] 39 34 24 30 20 19 31 22 31 31 ...
## $ DDA00AM01S22R: num [1:1207] 88 84 85 94 75 66 81 76 81 88 ...
## $ DDA00AM01222R: num [1:1207] 65 49 57 69 44 34 42 44 53 62 ...
## $ DDA00AM01322R: num [1:1207] 34 23 26 27 20 14 19 21 29 33 ...
## $ DDA00AC01S22R: num [1:1207] 85 86 81 90 78 73 96 75 83 84 ...
## $ DDA00AC01222R: num [1:1207] 54 63 49 54 48 41 45 46 57 52 ...
## $ DDA00AC01322R: num [1:1207] 22 29 21 23 22 15 16 18 27 21 ...
## $ DDA00AS01S22R: num [1:1207] 78 90 74 83 72 68 92 81 82 87 ...
## $ DDA00AS01222R: num [1:1207] 47 63 48 51 42 38 73 50 51 60 ...
## $ DDA00AS01322R: num [1:1207] 21 42 26 26 20 15 38 27 32 36 ...
## $ DDB00A001S22R: num [1:1207] 60 46 74 88 64 56 -1 71 68 71 ...
## $ DDB00A001222R: num [1:1207] 17 22 38 48 33 26 -1 41 38 37 ...
## $ DDB00A001322R: num [1:1207] 3 8 6 19 11 11 -1 13 14 14 ...
## $ DDH00A001S22R: num [1:1207] 74 85 75 91 73 69 87 72 81 81 ...
## $ DDH00A001222R: num [1:1207] 53 56 46 69 44 36 57 42 50 53 ...
## $ DDH00A001322R: num [1:1207] 24 25 19 26 19 12 20 17 24 24 ...
## $ DDW00A001S22R: num [1:1207] 87 88 85 89 83 75 86 84 88 89 ...
## $ DDW00A001222R: num [1:1207] 66 61 62 66 60 48 55 58 67 66 ...
## $ DDW00A001322R: num [1:1207] 35 32 28 29 29 21 26 29 40 35 ...
## $ DDI00A001S22R: num [1:1207] NA 100 80 -1 75 NA NA 83 -1 62 ...
## $ DDI00A001222R: num [1:1207] NA 100 20 -1 50 NA NA 28 -1 8 ...
## $ DDI00A001322R: num [1:1207] NA 100 20 -1 17 NA NA 6 -1 0 ...
## $ DD300A001S22R: num [1:1207] 33 -1 84 -1 85 100 NA 100 93 97 ...
## $ DD300A001222R: num [1:1207] 33 -1 53 -1 77 100 NA 87 73 82 ...
## $ DD300A001322R: num [1:1207] 17 -1 16 -1 44 88 NA 67 53 56 ...
## $ DD400A001S22R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD400A001222R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD400A001322R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD200A001S22R: num [1:1207] 83 77 75 -1 74 62 88 85 74 83 ...
## $ DD200A001222R: num [1:1207] 54 46 58 -1 44 38 50 58 48 50 ...
## $ DD200A001322R: num [1:1207] 34 23 28 -1 18 13 6 31 13 29 ...
## $ DDE00A001S22R: num [1:1207] 76 77 77 86 70 65 81 67 77 78 ...
## $ DDE00A001222R: num [1:1207] 50 42 49 53 40 34 45 36 48 46 ...
## $ DDE00A001322R: num [1:1207] 23 19 17 17 16 14 17 14 23 19 ...
## $ DA0CT21R : num [1:1207] 58.3 51.6 92.7 87 43.3 40 12.5 42 9.6 38.3 ...
## $ DA0CC21R : num [1:1207] 19 27.7 36.8 15 49.4 28.9 -1 35.8 60 60 ...
## $ DA0CSA21R : num [1:1207] 980 979 980 1007 1048 ...
## $ DA0CAA21R : num [1:1207] NA -1 -1 18.8 21 -1 -1 22.3 NA 23.1 ...
## $ DPSATOFC : num [1:1207] 99.9 186.6 146.7 60.1 553.4 ...
## $ DPSTTOFC : num [1:1207] 46.7 104.9 74.5 30.2 260.3 ...
## $ DPSCTOFP : num [1:1207] 1.5 1.1 1.4 3.1 2.1 1.1 4.1 1.5 4.5 0.9 ...
## $ DPSSTOFP : num [1:1207] 5 2.1 3.5 5 3.4 4.6 3.4 2.6 3.1 3.9 ...
## $ DPSUTOFP : num [1:1207] 5.4 4.9 2 1.7 8.3 4.4 3 5.8 10 6 ...
## $ DPSTTOFP : num [1:1207] 46.8 56.2 50.8 50.3 47 45.5 56.7 50.8 50 49.7 ...
## $ DPSETOFP : num [1:1207] 14.8 16.2 15 13.7 19.7 19.2 9.8 15.4 11.1 8.2 ...
## $ DPSXTOFP : num [1:1207] 26.5 19.5 27.4 26.2 19.5 25.2 23 23.9 21.4 31.3 ...
## $ DPSCTOSA : num [1:1207] 93333 100313 98293 85537 99324 ...
## $ DPSSTOSA : num [1:1207] 73300 79305 71215 81593 80415 ...
## $ DPSUTOSA : num [1:1207] 59550 60616 58022 77642 63829 ...
## $ DPSTTOSA : num [1:1207] 55570 47916 50382 55346 48825 ...
## $ DPSAMIFP : num [1:1207] 15.6 13.4 10.9 16.3 32.1 29.9 1.9 41.3 22.2 18.8 ...
## $ DPSAKIDR : num [1:1207] 5.7 6.2 5.5 5.7 6.1 5 5.2 7.3 7.4 6.5 ...
## $ DPSTKIDR : num [1:1207] 12.3 11 10.8 11.3 12.9 11 9.3 14.4 14.8 13.2 ...
## $ DPST05FP : num [1:1207] 10.4 23.8 32.7 9.7 33.8 44.8 17.9 21.5 35 21.9 ...
## $ DPSTEXPA : num [1:1207] 16.7 13.5 12.8 14.8 12.7 10.3 15.4 13.8 10.2 13.8 ...
## $ DPSTADFP : num [1:1207] 14.8 19 30.7 9.6 15.4 17.4 16.9 24.3 18.5 22.4 ...
## $ DPSTURNR : num [1:1207] 19.1 13.9 21.6 18.3 17.9 30.6 14.6 11.5 17 9.5 ...
## $ DPSTBLFP : num [1:1207] 8.3 2.9 4 6.5 9.6 11.6 0 1.4 4.4 0.5 ...
## $ DPSTHIFP : num [1:1207] 0 6.7 1.3 0 13.8 6.6 0 25.7 8.9 5.6 ...
## $ DPSTWHFP : num [1:1207] 91.7 90.5 93.3 93.5 74.6 80.9 100 69 86.7 93.9 ...
## $ DPSTINFP : num [1:1207] 0 0 0 0 0 0.8 0 0.3 0 0 ...
## $ DPSTASFP : num [1:1207] 0 0 0 0 0 0 0 0.7 0 0 ...
## $ DPSTPIFP : num [1:1207] 0 0 0 0 0 0 0 0 0 0 ...
## $ DPSTTWFP : num [1:1207] 0 0 1.3 0 1.9 0 0 2.8 0 0 ...
## $ DPSTREFP : num [1:1207] 81.6 71.5 87.6 70 71.4 71.4 61 41.7 82.7 66.4 ...
## $ DPSTSPFP : num [1:1207] 9.9 8.4 7.5 5.5 10.2 6.4 5.8 14.4 6.8 9.6 ...
## $ DPSTCOFP : num [1:1207] 0 4.9 2.7 12 5 6.1 19.2 6.5 7.4 9.2 ...
## [list output truncated]
better_district<-district %>% dplyr::select(DAGC4X21R,DPETECOP,DPETWHIP,DPETHISP) %>% na.omit(.)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
#3) create a linear model using the “lm()” command, save it to some object
#4) call a “summary()” on your new model
grad_model_2<-lm(DAGC4X21R~DPETECOP+DPETWHIP,data=better_district)
summary(grad_model_2)
##
## Call:
## lm(formula = DAGC4X21R ~ DPETECOP + DPETWHIP, data = better_district)
##
## Residuals:
## Min 1Q Median 3Q Max
## -94.932 -1.173 2.137 4.742 12.295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.97177 1.98821 49.779 < 2e-16 ***
## DPETECOP -0.11502 0.02292 -5.018 6.1e-07 ***
## DPETWHIP 0.04190 0.01755 2.387 0.0171 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.4 on 1071 degrees of freedom
## Multiple R-squared: 0.06091, Adjusted R-squared: 0.05916
## F-statistic: 34.73 on 2 and 1071 DF, p-value: 2.423e-15
grad_model_1<-lm(DAGC4X21R~DPETECOP+DPETWHIP,data=better_district)
plot(grad_model_2,which=1)
#The multiple R squared value is 0.06091, meaning that an estimated 6.1% of the varation in (DAGC4X21R) 4-year graduation rates is explained by the % of economically disadvantaged students (DPETECOP) and the % of white students (DPETWHIP). This is a relatively low value, meaning that these two variables alone do not explain much of the variation in graduation rates. #The F statistic is 34.73 with a very small P value (2.423e-15). This indicates that the model as a whole is statistically significant. Despite, the low R-squared. #Significant variables:DPETECOP(percentage(%) of economically disadvantaged students). It is highly significant with a p-value of 6.1e-07. #DPETWHIP (percentage (%)of white students) also statistically significant with a P value of 0.0171. #Insignificant variables: Both independent and dependent variables are statistically significant.
#Interpretation of Significant Independent Variables:DPETECOP (% Economically Disadvantaged Students):The coefficient estimate is -0.11502.Meaning that for each 1% increase in economically disadvantaged students in a district, the 4-year graduation rate is expected to decrease by approximately 0.115% points. Thus, holding the percentage of white students constant. This negative relationship suggests that higher levels of economic disadvantage are associated with lower graduation rates. #DPETWHIP (% White Students):The coefficient estimate is 0.04190. This means that for each 1% increase in white students in a district, the 4-year graduation rate is expected to increase by approximately 0.042% points. Thus, holding the percentage of economically disadvantaged students constant.This positive relationship suggests that districts with higher percentages of white students tend to have slightly higher graduation rates. #The intercept (98.97177) represents the predicted graduation rate when both independent variables are 0. That is, a theoretical district with no economically disadvantaged students and no white students would be predicted to have a graduation rate of approximately 99%.
#There appears to be a pattern in the residuals, with a funnel shape showing higher variance at lower fitted values. The visual shows several significant outliers, specifically in the negative direction.The red line appears to show a slight downward trend rather than being horizontal around zero.These patterns suggest that the linearity assumption is violated. Upon examination of the visual it appears the relationship between the variables and graduation rates is not entirely linear. Concluding, that although both economic disadvantage and white student percentages are significantly associated with graduation rates. The overall model provides only a small portion of the variation in graduation rates (about 6.1%). The violation of the linearity assumption suggests that a more complex modeling approach might be needed to better understand the influencing factors of graduation rates in these school districts.