m10_Quarto_2_Strange
Module 10 Exercise
Set working directory and load data (setwd("~/Desktop/geog5680/Module Deliverables/Module 10"))
Read in the data
ef = read.csv("enrollmentForecast.csv")
library(ggplot2)
Look at the data structure
str(ef)
'data.frame': 29 obs. of 5 variables:
$ YEAR : int 1 2 3 4 5 6 7 8 9 10 ...
$ ROLL : int 5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
$ UNEM : num 8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
$ HGRAD: int 9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
$ INC : int 1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
1. Make scatterplots of ROLL against the other variables
(ROLL x UNEM)
Fall Undergraduate Enrollment and January Unemployment for New Mexico
ggplot(ef, aes(x = UNEM, y = ROLL)) +
  geom_point(size = 3, alpha = .5) +
  labs(x = "January unemployment rate (%) for New Mexico (UNEM)",
       y = "Fall undergraduate enrollment (ROLL)",
       title = "Fall Undergraduate Enrollment and January Unemployment for New Mexico")
(ROLL x HGRAD)
Fall Undergraduate Enrollment and Spring High School Graduates in New Mexico
ggplot(ef, aes(x = HGRAD, y = ROLL)) +
  geom_point(size = 3, alpha = .5) +
  labs(x = "Spring high school graduates in New Mexico (HGRAD)",
       y = "Fall undergraduate enrollment (ROLL)",
       title = "Fall Undergraduate Enrollment and Spring High School Graduates in New Mexico")
(ROLL x INC)
Undergraduate Enrollment and Per-Capita Income in Albuquerque
ggplot(ef, aes(x = INC, y = ROLL)) +
  geom_point(size = 3, alpha = .5) +
  labs(x = "Per capita income in Albuquerque (1961 dollars) (INC)",
       y = "Fall undergraduate enrollment (ROLL)",
       title = "Undergraduate Enrollment and Per-Capita Income in Albuquerque")
2. Build a linear model using the unemployment rate (UNEM) and number of spring high school graduates (HGRAD) to predict the fall enrollment (ROLL), i.e. ROLL ~ UNEM + HGRAD
# Center each predictor on its sample mean
ef$UNEM.cen = ef$UNEM - mean(ef$UNEM)
ef$HGRAD.cen = ef$HGRAD - mean(ef$HGRAD)
lm(ROLL ~ UNEM.cen + HGRAD.cen, data = ef)
Call:
lm(formula = ROLL ~ UNEM.cen + HGRAD.cen, data = ef)
Coefficients:
(Intercept)     UNEM.cen    HGRAD.cen
  1.271e+04    6.983e+02    9.423e-01
future_enroll_predict = lm(ROLL ~ UNEM.cen + HGRAD.cen, data = ef)
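The effect of centering the predictors can be checked with a small sketch. The enrollment file is not reproduced here, so the built-in mtcars data set stands in as an assumption, and the names d, raw_fit, and cen_fit are illustrative only. Centering should leave the slopes unchanged and turn the intercept into the mean of the response, which is consistent with the intercept of 1.271e+04 reported above.

```r
# Stand-in data: the built-in mtcars set, NOT the enrollment data.
d <- mtcars
d$wt.cen <- d$wt - mean(d$wt)
d$hp.cen <- d$hp - mean(d$hp)

raw_fit <- lm(mpg ~ wt + hp, data = d)          # uncentered predictors
cen_fit <- lm(mpg ~ wt.cen + hp.cen, data = d)  # centered predictors

# Slopes are the same in both fits; only the intercept differs.
slopes_equal <- isTRUE(all.equal(unname(coef(raw_fit)[-1]),
                                 unname(coef(cen_fit)[-1])))

# With centered predictors the intercept is the mean of the response.
intercept_is_mean <- isTRUE(all.equal(unname(coef(cen_fit)[1]), mean(d$mpg)))

print(c(slopes_equal = slopes_equal, intercept_is_mean = intercept_is_mean))
```

Read this way, the intercept in the enrollment model is the expected enrollment in a year with average unemployment and an average graduating class.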
3. Use the summary() and anova() functions to investigate the model
summary(future_enroll_predict)
Call:
lm(formula = ROLL ~ UNEM.cen + HGRAD.cen, data = ef)
Residuals:
    Min      1Q  Median      3Q     Max
-2102.2  -861.6  -349.4   374.5  3603.5
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.271e+04  2.438e+02  52.127  < 2e-16 ***
UNEM.cen    6.983e+02  2.244e+02   3.111  0.00449 **
HGRAD.cen   9.423e-01  8.613e-02  10.941 3.16e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8373
F-statistic: 73.03 on 2 and 26 DF, p-value: 2.144e-11
anova(future_enroll_predict)
Analysis of Variance Table
Response: ROLL
          Df    Sum Sq   Mean Sq F value    Pr(>F)
UNEM.cen   1  45407767  45407767  26.349 2.366e-05 ***
HGRAD.cen  1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26  44805568   1723291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
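As a cross-check, the F values and the R-squared reported by summary() can be recomputed from the sums of squares in the ANOVA table. The sketch below only redoes that arithmetic with the numbers copied from the table; the variable names are illustrative. Note that anova() on a single model reports sequential sums of squares, so the UNEM.cen row is assessed before HGRAD.cen is added.

```r
# Sums of squares and residual degrees of freedom copied from the table above.
ss_unem  <- 45407767
ss_hgrad <- 206279143
ss_resid <- 44805568
df_resid <- 26

# Residual mean square: the denominator of every F ratio in the table.
ms_resid <- ss_resid / df_resid

# Each term has 1 degree of freedom, so its mean square equals its sum of squares.
f_unem  <- ss_unem / ms_resid
f_hgrad <- ss_hgrad / ms_resid

# R-squared is the explained share of the total sum of squares.
r_squared <- (ss_unem + ss_hgrad) / (ss_unem + ss_hgrad + ss_resid)

# Upper-tail p-value of the F distribution for the UNEM.cen row.
p_unem <- pf(f_unem, 1, df_resid, lower.tail = FALSE)

print(round(c(f_unem = f_unem, f_hgrad = f_hgrad, r_squared = r_squared), 4))
```

The recomputed values should agree with the table (26.349 and 119.701) and with the Multiple R-squared of 0.8489 up to rounding.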
4. Make a residual plot and check for any bias in the model
hist(residuals(future_enroll_predict))
plot(future_enroll_predict, which = 1)
5. Use the predict() function to estimate the expected fall enrollment, if the current year’s unemployment rate is 9% and the size of the spring high school graduating class is 25,000 students.
# Center the new values using the same means as the training data
unem_circumstance.cen = 9 - mean(ef$UNEM)
hgrad_circumstance.cen = 25000 - mean(ef$HGRAD)
circumstance_new_data_frame = data.frame(UNEM.cen = unem_circumstance.cen, HGRAD.cen = hgrad_circumstance.cen)
predict(future_enroll_predict, circumstance_new_data_frame)
1
21585.58
predicted_fall_enrollment_noINC = predict(future_enroll_predict, circumstance_new_data_frame)
If the current year’s unemployment rate is 9% and the size of the spring high school graduating class is 25,000 students, then the estimated expected fall enrollment is 21,585.58 students.
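A point prediction like the one above is just the fitted coefficients applied to the new values, and predict() can also report how uncertain that estimate is. The following is a minimal sketch on the built-in mtcars data, used as a stand-in assumption because the enrollment file is not reproduced here; fit, new_obs, and pred_int are illustrative names.

```r
# Stand-in model on mtcars, NOT the enrollment model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- data.frame(wt = 3, hp = 150)

# predict() evaluates intercept + slope * value for each predictor.
by_predict <- predict(fit, new_obs)
by_hand <- sum(coef(fit) * c(1, new_obs$wt, new_obs$hp))

# A 95% prediction interval for a single new observation
# (columns: fit, lwr, upr).
pred_int <- predict(fit, new_obs, interval = "prediction", level = 0.95)

print(all.equal(unname(by_predict), by_hand))
print(pred_int)
```

Applied to the enrollment model, the same interval argument would put a range around the estimate instead of reporting a single figure.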
6. Build a second model which includes per capita income (INC).
# Center per capita income and refit with all three predictors
ef$INC.cen = ef$INC - mean(ef$INC)
exp_fall_enroll_w_inc = lm(ROLL ~ UNEM.cen + HGRAD.cen + INC.cen, data = ef)
summary(exp_fall_enroll_w_inc)
Call:
lm(formula = ROLL ~ UNEM.cen + HGRAD.cen + INC.cen, data = ef)
Residuals:
     Min       1Q   Median       3Q      Max
-1148.84  -489.71    -1.88   387.40  1425.75
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.271e+04  1.245e+02 102.066  < 2e-16 ***
UNEM.cen    4.501e+02  1.182e+02   3.809 0.000807 ***
HGRAD.cen   4.065e-01  7.602e-02   5.347 1.52e-05 ***
INC.cen     4.275e+00  4.947e-01   8.642 5.59e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 670.4 on 25 degrees of freedom
Multiple R-squared: 0.9621, Adjusted R-squared: 0.9576
F-statistic: 211.5 on 3 and 25 DF, p-value: < 2.2e-16
# INC.cen = 0 holds per capita income at its sample mean
q6_data_frame = data.frame(UNEM.cen = 9 - mean(ef$UNEM), HGRAD.cen = 25000 - mean(ef$HGRAD), INC.cen = 0)
predict(exp_fall_enroll_w_inc, q6_data_frame)
1
16728.11
predicted_fall_enrollment_withINC = predict(exp_fall_enroll_w_inc, q6_data_frame)
If the current year’s unemployment rate is 9%, the size of the spring high school graduating class is 25,000 students, and per capita income is held at its sample mean (INC.cen = 0), then the estimated expected fall enrollment is 16,728.11 students.
7. Compare the two models with anova().
anova(future_enroll_predict, exp_fall_enroll_w_inc)
Analysis of Variance Table
Model 1: ROLL ~ UNEM.cen + HGRAD.cen
Model 2: ROLL ~ UNEM.cen + HGRAD.cen + INC.cen
  Res.Df      RSS Df Sum of Sq     F    Pr(>F)
1     26 44805568
2     25 11237313  1  33568255 74.68 5.594e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
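The F statistic in this comparison can be reproduced by hand from the two residual sums of squares, which makes clear what anova() is testing: whether the drop in RSS from adding INC.cen is large relative to the remaining residual variance. The sketch below uses only the numbers from the table above; the variable names are illustrative.

```r
# Residual sums of squares and degrees of freedom copied from the table above.
rss_reduced <- 44805568  # Model 1: ROLL ~ UNEM.cen + HGRAD.cen
df_reduced  <- 26
rss_full    <- 11237313  # Model 2: adds INC.cen
df_full     <- 25

# Nested-model F statistic: drop in RSS per added parameter,
# divided by the residual mean square of the larger model.
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)

# Upper-tail p-value on (1, 25) degrees of freedom.
p_value <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_value)
```

Because only one term was added, this F value is the square of the t value for INC.cen in the second model summary (8.642 squared is about 74.68), so the two tests agree.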