      YEAR         ROLL            UNEM            HGRAD            INC
 Min.   : 1   Min.   : 5501   Min.   : 5.700   Min.   : 9552   Min.   :1923
 1st Qu.: 8   1st Qu.:10167   1st Qu.: 7.000   1st Qu.:15723   1st Qu.:2351
 Median :15   Median :14395   Median : 7.500   Median :17203   Median :2863
 Mean   :15   Mean   :12707   Mean   : 7.717   Mean   :16528   Mean   :2729
 3rd Qu.:22   3rd Qu.:14969   3rd Qu.: 8.200   3rd Qu.:18266   3rd Qu.:3127
 Max.   :29   Max.   :16081   Max.   :10.100   Max.   :19800   Max.   :3345
1. Scatterplots: Enrollment vs Predictor Variables
# ROLL vs UNEMPLOYMENT
plot(enrollment$UNEM, enrollment$ROLL, xlab = "UNEM", ylab = "ROLL", main = "ROLL vs UNEMPLOYMENT")
# ROLL vs HGRAD
plot(enrollment$HGRAD, enrollment$ROLL, xlab = "HGRAD", ylab = "ROLL", main = "ROLL vs HIGH SCHOOL GRAD")
# ROLL vs INC
plot(enrollment$INC, enrollment$ROLL, xlab = "INC", ylab = "ROLL", main = "ROLL vs INCOME")
2. Linear Model: ROLL ~ UNEM + HGRAD
# Build linear model
fit1 <- lm(ROLL ~ UNEM + HGRAD, data = enrollment)
# Model output
fit1
Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enrollment)
Coefficients:
(Intercept) UNEM HGRAD
-8255.7511 698.2681 0.9423
# Model summary
summary(fit1)
Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enrollment)
Residuals:
Min 1Q Median 3Q Max
-2102.2 -861.6 -349.4 374.5 3603.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.256e+03 2.052e+03 -4.023 0.00044 ***
UNEM 6.983e+02 2.244e+02 3.111 0.00449 **
HGRAD 9.423e-01 8.613e-02 10.941 3.16e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8373
F-statistic: 73.03 on 2 and 26 DF, p-value: 2.144e-11
# ANOVA table
anova(fit1)
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 26.349 2.366e-05 ***
HGRAD 1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26 44805568 1723291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
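The headline figures in the model summary can be cross-checked against the ANOVA table. The sketch below is a hand calculation added for illustration, not part of the original analysis. It copies the sums of squares printed above and takes the total degrees of freedom as model plus residual degrees of freedom (2 + 26 = 28). The variable names are hypothetical.

```r
# Sums of squares copied from the anova(fit1) table above
ss_unem  <- 45407767
ss_hgrad <- 206279143
ss_resid <- 44805568
df_model <- 2    # UNEM and HGRAD
df_resid <- 26   # residual degrees of freedom

ss_model <- ss_unem + ss_hgrad   # variation explained by both predictors
ss_total <- ss_model + ss_resid  # total variation in ROLL

# R-squared: share of total variation explained by the model
r_squared <- ss_model / ss_total

# Adjusted R-squared: compares mean squares instead of sums of squares
adj_r_squared <- 1 - (ss_resid / df_resid) / (ss_total / (df_model + df_resid))

# Residual standard error: square root of the residual mean square
resid_se <- sqrt(ss_resid / df_resid)

# Overall F-statistic: model mean square over residual mean square
f_stat <- (ss_model / df_model) / (ss_resid / df_resid)

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared,
        resid_se = resid_se, f_stat = f_stat), 4)
```

The four values should agree with the summary output (0.8489, 0.8373, 1313, and 73.03) up to display rounding.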
3. Which Variable is Most Closely Related to Enrollment?
summary(fit1)
Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enrollment)
Residuals:
Min 1Q Median 3Q Max
-2102.2 -861.6 -349.4 374.5 3603.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.256e+03 2.052e+03 -4.023 0.00044 ***
UNEM 6.983e+02 2.244e+02 3.111 0.00449 **
HGRAD 9.423e-01 8.613e-02 10.941 3.16e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8373
F-statistic: 73.03 on 2 and 26 DF, p-value: 2.144e-11
anova(fit1)
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 26.349 2.366e-05 ***
HGRAD 1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26 44805568 1723291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
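Most Closely Related Variable: HGRAD
Both predictors are significant (p < 0.05), but HGRAD has the larger F-value in the ANOVA table (119.701 versus 26.349 for UNEM) and the smaller p-value (3.157e-11 versus 2.366e-05). The coefficient t-tests in the model summary agree (t = 10.941 for HGRAD versus 3.111 for UNEM).
Therefore, HGRAD is the variable most closely related to enrollment.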
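One caveat when reading the ANOVA table: anova() on a single lm fit reports sequential sums of squares, so each F-value depends on the order of the terms. For the last term entered (HGRAD here), the F-value equals the square of its t-statistic from the summary. For UNEM, the F-value describes UNEM entered first, without adjusting for HGRAD. The sketch below checks this with the printed values only; the variable names are hypothetical.

```r
# Values copied from the summary(fit1) and anova(fit1) output above
t_hgrad <- 10.941
f_hgrad <- 119.701
t_unem  <- 3.111
f_unem  <- 26.349

# HGRAD is the last term, so its F-value should match t^2 up to rounding
c(hgrad_t_squared = t_hgrad^2, hgrad_f = f_hgrad)

# UNEM is entered first, so its sequential F-value differs from t^2
c(unem_t_squared = t_unem^2, unem_f = f_unem)
```

Because the t-tests adjust each predictor for the other, they give an order-independent comparison, and they lead to the same conclusion here.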
4. Residual Plot and Bias Check
# Residuals vs fitted values
plot(fit1, which = 1)
Residual Plot Assessment
There is no obvious pattern in the residual plot. Therefore, there is little evidence of model bias.
5. Predict Fall Enrollment
# New observation
newdata <- data.frame(UNEM = 9, HGRAD = 25000)
# Predicted enrollment
predict(fit1, newdata, interval = "prediction")
fit lwr upr
1 21585.58 18452.36 24718.8
Predicted Enrollment
The predicted fall enrollment is approximately 21,586 students, with a 95% prediction interval of roughly 18,452 to 24,719.
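The point prediction can be reproduced by hand from the fitted equation. The sketch below is illustrative only: it plugs the new values into the coefficients as printed in the fit1 output, so it uses rounded coefficients and hypothetical variable names.

```r
# Coefficients copied from the printed fit1 output (rounded as displayed)
b_intercept <- -8255.7511
b_unem      <- 698.2681
b_hgrad     <- 0.9423

# New observation from the prediction step
unem_new  <- 9
hgrad_new <- 25000

# Fitted equation: ROLL = intercept + b_unem * UNEM + b_hgrad * HGRAD
pred_by_hand <- b_intercept + b_unem * unem_new + b_hgrad * hgrad_new
pred_by_hand
```

The result differs from the predict() value of 21585.58 by less than one student, because the HGRAD coefficient is displayed to only four decimal places. Note also that HGRAD = 25000 lies above the largest observed value (19800), so this prediction extrapolates beyond the data.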
6. Second Model: Add Income (INC)
# Build second model
fit2 <- lm(ROLL ~ UNEM + HGRAD + INC, data = enrollment)
# Model summary
summary(fit2)
Call:
lm(formula = ROLL ~ UNEM + HGRAD + INC, data = enrollment)
Residuals:
Min 1Q Median 3Q Max
-1148.84 -489.71 -1.88 387.40 1425.75
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.153e+03 1.053e+03 -8.691 5.02e-09 ***
UNEM 4.501e+02 1.182e+02 3.809 0.000807 ***
HGRAD 4.065e-01 7.602e-02 5.347 1.52e-05 ***
INC 4.275e+00 4.947e-01 8.642 5.59e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 670.4 on 25 degrees of freedom
Multiple R-squared: 0.9621, Adjusted R-squared: 0.9576
F-statistic: 211.5 on 3 and 25 DF, p-value: < 2.2e-16
# ANOVA table
anova(fit2)
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 101.02 2.894e-10 ***
HGRAD 1 206279143 206279143 458.92 < 2.2e-16 ***
INC 1 33568255 33568255 74.68 5.594e-09 ***
Residuals 25 11237313 449493
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7. Compare the Two Models
anova(fit1, fit2)
Analysis of Variance Table
Model 1: ROLL ~ UNEM + HGRAD
Model 2: ROLL ~ UNEM + HGRAD + INC
Res.Df RSS Df Sum of Sq F Pr(>F)
1 26 44805568
2 25 11237313 1 33568255 74.68 5.594e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
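Model Comparison
The comparison F-test gives F = 74.68 with p = 5.594e-09 (p < 0.05).
Therefore, including INC significantly improves the model. The larger model also explains more variation in enrollment: adjusted R-squared rises from 0.8373 to 0.9576, and the residual standard error falls from 1313 to 670.4.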
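The comparison F-statistic can be reconstructed from the two residual sums of squares. The sketch below is a hand calculation under the usual nested-model F-test, using only the values printed in the comparison table; the variable names are hypothetical.

```r
# Residual sums of squares and degrees of freedom from anova(fit1, fit2)
rss1 <- 44805568  # Model 1: ROLL ~ UNEM + HGRAD
df1  <- 26
rss2 <- 11237313  # Model 2: ROLL ~ UNEM + HGRAD + INC
df2  <- 25

# Extra variation explained by adding INC
ss_added <- rss1 - rss2

# Partial F: extra mean square over the larger model's residual mean square
f_partial <- (ss_added / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of the F distribution with (1, 25) degrees of freedom
p_value <- pf(f_partial, df1 - df2, df2, lower.tail = FALSE)

c(ss_added = ss_added, f_partial = f_partial, p_value = p_value)
```

The values should match the Sum of Sq, F, and Pr(>F) columns of the comparison table up to rounding. The same F-value appears in the INC row of anova(fit2), because INC is the last term in that sequential table.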