Module 10

Author

K M

Module 10

Reading in csv files and downloading packages.

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.2
enroll=read.csv("enrollmentForecast.csv")
str(enroll)
'data.frame':   29 obs. of  5 variables:
 $ YEAR : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ROLL : int  5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
 $ UNEM : num  8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
 $ HGRAD: int  9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
 $ INC  : int  1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
list(enroll)
[[1]]
   YEAR  ROLL UNEM HGRAD  INC
1     1  5501  8.1  9552 1923
2     2  5945  7.0  9680 1961
3     3  6629  7.3  9731 1979
4     4  7556  7.5 11666 2030
5     5  8716  7.0 14675 2112
6     6  9369  6.4 15265 2192
7     7  9920  6.5 15484 2235
8     8 10167  6.4 15723 2351
9     9 11084  6.3 16501 2411
10   10 12504  7.7 16890 2475
11   11 13746  8.2 17203 2524
12   12 13656  7.5 17707 2674
13   13 13850  7.4 18108 2833
14   14 14145  8.2 18266 2863
15   15 14888 10.1 19308 2839
16   16 14991  9.2 18224 2898
17   17 14836  7.7 18997 3123
18   18 14478  5.7 19505 3195
19   19 14539  6.5 19800 3239
20   20 14395  7.5 19546 3129
21   21 14599  7.3 19117 3100
22   22 14969  9.2 18774 3008
23   23 15107 10.1 17813 2983
24   24 14831  7.5 17304 3069
25   25 15081  8.8 16756 3151
26   26 15127  9.1 16749 3127
27   27 15856  8.8 16925 3179
28   28 15938  7.8 17231 3207
29   29 16081  7.0 16816 3345
summary(enroll)
      YEAR         ROLL            UNEM            HGRAD            INC      
 Min.   : 1   Min.   : 5501   Min.   : 5.700   Min.   : 9552   Min.   :1923  
 1st Qu.: 8   1st Qu.:10167   1st Qu.: 7.000   1st Qu.:15723   1st Qu.:2351  
 Median :15   Median :14395   Median : 7.500   Median :17203   Median :2863  
 Mean   :15   Mean   :12707   Mean   : 7.717   Mean   :16528   Mean   :2729  
 3rd Qu.:22   3rd Qu.:14969   3rd Qu.: 8.200   3rd Qu.:18266   3rd Qu.:3127  
 Max.   :29   Max.   :16081   Max.   :10.100   Max.   :19800   Max.   :3345  

Scatterplots

ggplot(enroll, aes(UNEM, ROLL)) +
  geom_point() +
  xlab("Unemployment Rate") +
  ylab("Raw UNM Undergraduate Enrollment Values ") +
  ggtitle("UNM Enrollment and Unemployment Rate")

ggplot(enroll, aes(ROLL, HGRAD)) +
  geom_point() +
  xlab("HS Graduates") +
  ylab("Raw UNM Undergraduate Enrollment Values") +
  ggtitle("UNM Enrollment vs High School Graduates")

ggplot(enroll, aes(ROLL, INC)) +
  geom_point() +
  xlab("Monthly per capita Income") +
  ylab("Raw UNM Undergraduate Enrollment Values") +
  ggtitle("UNM Enrollment vs Monthly per capita Income")

Linear Models

lm1 = lm(ROLL ~ UNEM + HGRAD, data = enroll)
summary(lm1)

Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enroll)

Residuals:
    Min      1Q  Median      3Q     Max 
-2102.2  -861.6  -349.4   374.5  3603.5 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.256e+03  2.052e+03  -4.023  0.00044 ***
UNEM         6.983e+02  2.244e+02   3.111  0.00449 ** 
HGRAD        9.423e-01  8.613e-02  10.941 3.16e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared:  0.8489,    Adjusted R-squared:  0.8373 
F-statistic: 73.03 on 2 and 26 DF,  p-value: 2.144e-11
anova(lm1)
Analysis of Variance Table

Response: ROLL
          Df    Sum Sq   Mean Sq F value    Pr(>F)    
UNEM       1  45407767  45407767  26.349 2.366e-05 ***
HGRAD      1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26  44805568   1723291                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, there is bias in the model.

newx = data.frame(UNEM = 9, HGRAD = 25000)
predict(lm1, newx, interval = "prediction")
       fit      lwr     upr
1 21585.58 18452.36 24718.8
lm2 = lm(ROLL ~ UNEM + HGRAD + INC, data = enroll)
summary(lm2)

Call:
lm(formula = ROLL ~ UNEM + HGRAD + INC, data = enroll)

Residuals:
     Min       1Q   Median       3Q      Max 
-1148.84  -489.71    -1.88   387.40  1425.75 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9.153e+03  1.053e+03  -8.691 5.02e-09 ***
UNEM         4.501e+02  1.182e+02   3.809 0.000807 ***
HGRAD        4.065e-01  7.602e-02   5.347 1.52e-05 ***
INC          4.275e+00  4.947e-01   8.642 5.59e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 670.4 on 25 degrees of freedom
Multiple R-squared:  0.9621,    Adjusted R-squared:  0.9576 
F-statistic: 211.5 on 3 and 25 DF,  p-value: < 2.2e-16
anova(lm2)
Analysis of Variance Table

Response: ROLL
          Df    Sum Sq   Mean Sq F value    Pr(>F)    
UNEM       1  45407767  45407767  101.02 2.894e-10 ***
HGRAD      1 206279143 206279143  458.92 < 2.2e-16 ***
INC        1  33568255  33568255   74.68 5.594e-09 ***
Residuals 25  11237313    449493                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm1, lm2)
Analysis of Variance Table

Model 1: ROLL ~ UNEM + HGRAD
Model 2: ROLL ~ UNEM + HGRAD + INC
  Res.Df      RSS Df Sum of Sq     F    Pr(>F)    
1     26 44805568                                 
2     25 11237313  1  33568255 74.68 5.594e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, including the monthly per capita income improves the model.