Module 10
Reading in CSV files and loading packages.
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.2
enroll = read.csv("enrollmentForecast.csv")
str(enroll)
'data.frame': 29 obs. of 5 variables:
$ YEAR : int 1 2 3 4 5 6 7 8 9 10 ...
$ ROLL : int 5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
$ UNEM : num 8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
$ HGRAD: int 9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
$ INC : int 1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
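The load step can be made a little more defensive. The following is a minimal sketch, not part of the original analysis: the helper name `read_enroll` is hypothetical, and the demonstration writes a throwaway file containing only the first three observations shown in the `str()` output rather than reading the real `enrollmentForecast.csv`.

```r
# Hypothetical helper: read the enrollment file and stop early if it does not
# have the columns the rest of the analysis relies on.
read_enroll <- function(path) {
  stopifnot(file.exists(path))
  df <- read.csv(path)
  expected <- c("YEAR", "ROLL", "UNEM", "HGRAD", "INC")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyNA(df[expected])) {
    warning("Missing values found in the model columns")
  }
  df
}

# Demonstration on a temporary file (first three observations only)
demo_path <- tempfile(fileext = ".csv")
writeLines(c("YEAR,ROLL,UNEM,HGRAD,INC",
             "1,5501,8.1,9552,1923",
             "2,5945,7.0,9680,1961",
             "3,6629,7.3,9731,1979"), demo_path)
demo <- read_enroll(demo_path)
str(demo)
```

With the real file in the working directory, the call would simply be `enroll = read_enroll("enrollmentForecast.csv")`.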
[[1]]
YEAR ROLL UNEM HGRAD INC
1 1 5501 8.1 9552 1923
2 2 5945 7.0 9680 1961
3 3 6629 7.3 9731 1979
4 4 7556 7.5 11666 2030
5 5 8716 7.0 14675 2112
6 6 9369 6.4 15265 2192
7 7 9920 6.5 15484 2235
8 8 10167 6.4 15723 2351
9 9 11084 6.3 16501 2411
10 10 12504 7.7 16890 2475
11 11 13746 8.2 17203 2524
12 12 13656 7.5 17707 2674
13 13 13850 7.4 18108 2833
14 14 14145 8.2 18266 2863
15 15 14888 10.1 19308 2839
16 16 14991 9.2 18224 2898
17 17 14836 7.7 18997 3123
18 18 14478 5.7 19505 3195
19 19 14539 6.5 19800 3239
20 20 14395 7.5 19546 3129
21 21 14599 7.3 19117 3100
22 22 14969 9.2 18774 3008
23 23 15107 10.1 17813 2983
24 24 14831 7.5 17304 3069
25 25 15081 8.8 16756 3151
26 26 15127 9.1 16749 3127
27 27 15856 8.8 16925 3179
28 28 15938 7.8 17231 3207
29 29 16081 7.0 16816 3345
summary(enroll)
YEAR ROLL UNEM HGRAD INC
Min. : 1 Min. : 5501 Min. : 5.700 Min. : 9552 Min. :1923
1st Qu.: 8 1st Qu.:10167 1st Qu.: 7.000 1st Qu.:15723 1st Qu.:2351
Median :15 Median :14395 Median : 7.500 Median :17203 Median :2863
Mean :15 Mean :12707 Mean : 7.717 Mean :16528 Mean :2729
3rd Qu.:22 3rd Qu.:14969 3rd Qu.: 8.200 3rd Qu.:18266 3rd Qu.:3127
Max. :29 Max. :16081 Max. :10.100 Max. :19800 Max. :3345
Scatterplots
ggplot(enroll, aes(UNEM, ROLL)) +
  geom_point() +
  xlab("Unemployment Rate") +
  ylab("Raw UNM Undergraduate Enrollment Values") +
  ggtitle("UNM Enrollment and Unemployment Rate")
# x = HGRAD, y = ROLL, so the mapping matches the axis labels
ggplot(enroll, aes(HGRAD, ROLL)) +
  geom_point() +
  xlab("HS Graduates") +
  ylab("Raw UNM Undergraduate Enrollment Values") +
  ggtitle("UNM Enrollment vs High School Graduates")
# x = INC, y = ROLL, so the mapping matches the axis labels
ggplot(enroll, aes(INC, ROLL)) +
  geom_point() +
  xlab("Monthly per capita Income") +
  ylab("Raw UNM Undergraduate Enrollment Values") +
  ggtitle("UNM Enrollment vs Monthly per capita Income")
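A numeric companion to the scatterplots is the correlation of ROLL with each candidate predictor. This is a rough sketch rather than part of the original analysis: the vectors are transcribed from the data frame printed earlier so the block runs without the CSV file, and the name `enroll_demo` is made up for illustration.

```r
# Values transcribed from the 29 rows printed earlier in this document
ROLL <- c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504,
          13746, 13656, 13850, 14145, 14888, 14991, 14836, 14478, 14539, 14395,
          14599, 14969, 15107, 14831, 15081, 15127, 15856, 15938, 16081)
UNEM <- c(8.1, 7.0, 7.3, 7.5, 7.0, 6.4, 6.5, 6.4, 6.3, 7.7,
          8.2, 7.5, 7.4, 8.2, 10.1, 9.2, 7.7, 5.7, 6.5, 7.5,
          7.3, 9.2, 10.1, 7.5, 8.8, 9.1, 8.8, 7.8, 7.0)
HGRAD <- c(9552, 9680, 9731, 11666, 14675, 15265, 15484, 15723, 16501, 16890,
           17203, 17707, 18108, 18266, 19308, 18224, 18997, 19505, 19800, 19546,
           19117, 18774, 17813, 17304, 16756, 16749, 16925, 17231, 16816)
INC <- c(1923, 1961, 1979, 2030, 2112, 2192, 2235, 2351, 2411, 2475,
         2524, 2674, 2833, 2863, 2839, 2898, 3123, 3195, 3239, 3129,
         3100, 3008, 2983, 3069, 3151, 3127, 3179, 3207, 3345)

enroll_demo <- data.frame(ROLL, UNEM, HGRAD, INC)

# Pearson correlation of enrollment with each predictor
cors <- cor(enroll_demo)["ROLL", c("UNEM", "HGRAD", "INC")]
print(round(cors, 3))
```

Pearson correlation only measures linear association, so it supplements the plots and does not replace them.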
Linear Models
lm1 = lm(ROLL ~ UNEM + HGRAD, data = enroll)
summary(lm1)
Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enroll)
Residuals:
Min 1Q Median 3Q Max
-2102.2 -861.6 -349.4 374.5 3603.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.256e+03 2.052e+03 -4.023 0.00044 ***
UNEM 6.983e+02 2.244e+02 3.111 0.00449 **
HGRAD 9.423e-01 8.613e-02 10.941 3.16e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8373
F-statistic: 73.03 on 2 and 26 DF, p-value: 2.144e-11
anova(lm1)
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 26.349 2.366e-05 ***
HGRAD 1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26 44805568 1723291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
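The fit statistics reported by `summary(lm1)` can be recovered from the sums of squares in the ANOVA table above. The sketch below does that arithmetic by hand; the numbers are copied from the printed output and the variable names are illustrative.

```r
# Sums of squares copied from the ANOVA table for lm1
ss_unem  <- 45407767
ss_hgrad <- 206279143
ss_resid <- 44805568

n <- 29   # observations
p <- 2    # predictors (UNEM, HGRAD)

ss_total <- ss_unem + ss_hgrad + ss_resid

r2     <- 1 - ss_resid / ss_total                      # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)         # adjusted R-squared
rse    <- sqrt(ss_resid / (n - p - 1))                 # residual standard error
f_stat <- ((ss_total - ss_resid) / p) / (ss_resid / (n - p - 1))

print(round(c(r2 = r2, adj_r2 = adj_r2, rse = rse, f_stat = f_stat), 4))
```

The results should agree, up to rounding, with the 0.8489, 0.8373, 1313, and 73.03 printed by `summary(lm1)`.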
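Yes, there is bias in the model. The residuals are not symmetric about zero: the median is -349.4, and the largest positive residual (3603.5) is much larger in magnitude than the largest negative one (-2102.2).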
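One common way to look for bias is to inspect the residuals of the two-predictor model. The sketch below is an illustration under assumptions, not the original diagnostic: the vectors are transcribed from the data frame printed earlier so that it runs without the CSV file, and the plot is only drawn in an interactive session.

```r
# Values transcribed from the 29 rows printed earlier in this document
ROLL <- c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504,
          13746, 13656, 13850, 14145, 14888, 14991, 14836, 14478, 14539, 14395,
          14599, 14969, 15107, 14831, 15081, 15127, 15856, 15938, 16081)
UNEM <- c(8.1, 7.0, 7.3, 7.5, 7.0, 6.4, 6.5, 6.4, 6.3, 7.7,
          8.2, 7.5, 7.4, 8.2, 10.1, 9.2, 7.7, 5.7, 6.5, 7.5,
          7.3, 9.2, 10.1, 7.5, 8.8, 9.1, 8.8, 7.8, 7.0)
HGRAD <- c(9552, 9680, 9731, 11666, 14675, 15265, 15484, 15723, 16501, 16890,
           17203, 17707, 18108, 18266, 19308, 18224, 18997, 19505, 19800, 19546,
           19117, 18774, 17813, 17304, 16756, 16749, 16925, 17231, 16816)

fit <- lm(ROLL ~ UNEM + HGRAD)
res <- resid(fit)

# With an intercept, least-squares residuals always average to zero,
# so the shape of the distribution is what matters here.
print(summary(res))

# Residuals vs fitted values; a visible pattern suggests a misspecified model
if (interactive()) {
  plot(fitted(fit), res,
       xlab = "Fitted enrollment", ylab = "Residual",
       main = "Residuals vs fitted, ROLL ~ UNEM + HGRAD")
  abline(h = 0, lty = 2)
}
```

A residual cloud scattered evenly around the dashed line would indicate little bias; curvature or a long tail on one side points the other way.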
newx = data.frame(UNEM = 9, HGRAD = 25000)
predict(lm1, newx, interval = "prediction")
fit lwr upr
1 21585.58 18452.36 24718.8
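The fitted value can be checked by plugging the new observation into the estimated equation. This is a small sketch using the rounded coefficients printed by `summary(lm1)`, so it will match `predict()` only approximately; the variable names are illustrative.

```r
# Coefficients as printed (rounded) in summary(lm1)
b0      <- -8.256e+03
b_unem  <-  6.983e+02
b_hgrad <-  9.423e-01

unem  <- 9
hgrad <- 25000

# Point prediction: intercept + slope * value for each predictor
fit_by_hand <- b0 + b_unem * unem + b_hgrad * hgrad
print(fit_by_hand)

# The largest HGRAD in the data is 19800 (see the summary table),
# so this prediction extrapolates beyond the observed range.
hgrad_max_observed <- 19800
extrapolating <- hgrad > hgrad_max_observed
print(extrapolating)
```

Because 25000 graduates lies well outside the observed HGRAD range of 9552 to 19800, the prediction interval should be read with extra caution.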
lm2 = lm(ROLL ~ UNEM + HGRAD + INC, data = enroll)
summary(lm2)
Call:
lm(formula = ROLL ~ UNEM + HGRAD + INC, data = enroll)
Residuals:
Min 1Q Median 3Q Max
-1148.84 -489.71 -1.88 387.40 1425.75
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.153e+03 1.053e+03 -8.691 5.02e-09 ***
UNEM 4.501e+02 1.182e+02 3.809 0.000807 ***
HGRAD 4.065e-01 7.602e-02 5.347 1.52e-05 ***
INC 4.275e+00 4.947e-01 8.642 5.59e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 670.4 on 25 degrees of freedom
Multiple R-squared: 0.9621, Adjusted R-squared: 0.9576
F-statistic: 211.5 on 3 and 25 DF, p-value: < 2.2e-16
anova(lm2)
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 101.02 2.894e-10 ***
HGRAD 1 206279143 206279143 458.92 < 2.2e-16 ***
INC 1 33568255 33568255 74.68 5.594e-09 ***
Residuals 25 11237313 449493
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm1, lm2)
Analysis of Variance Table
Model 1: ROLL ~ UNEM + HGRAD
Model 2: ROLL ~ UNEM + HGRAD + INC
Res.Df RSS Df Sum of Sq F Pr(>F)
1 26 44805568
2 25 11237313 1 33568255 74.68 5.594e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
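Yes, including the monthly per capita income improves the model. The F-test comparing the two models is highly significant (F = 74.68, p = 5.594e-09), the adjusted R-squared rises from 0.8373 to 0.9576, and the residual standard error falls from 1313 to 670.4.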
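The F statistic in the model comparison can be reproduced from the two residual sums of squares. The sketch below works through that calculation with the values printed in the comparison table; the variable names are illustrative.

```r
# Residual sums of squares and residual degrees of freedom from the table
rss1 <- 44805568; df1 <- 26   # Model 1: ROLL ~ UNEM + HGRAD
rss2 <- 11237313; df2 <- 25   # Model 2: ROLL ~ UNEM + HGRAD + INC

# Partial F: drop in RSS per added parameter, relative to the
# residual mean square of the larger model
f_stat  <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(f_stat)
print(p_value)

# With a single added predictor, F equals the square of that predictor's
# t value in the larger model (8.642 for INC in summary(lm2)).
print(sqrt(f_stat))
```

That the square root of F matches the t value for INC shows the two tests are the same test when only one term is added.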