module 10

Author

Rachael Berghahn

enrollment <- read.csv("enrollmentForecast.csv") ## read enrollment data for University of New Mexico undergraduate students
head(enrollment)

  YEAR ROLL UNEM HGRAD  INC
1    1 5501  8.1  9552 1923
2    2 5945  7.0  9680 1961
3    3 6629  7.3  9731 1979
4    4 7556  7.5 11666 2030
5    5 8716  7.0 14675 2112
6    6 9369  6.4 15265 2192

str(enrollment) ## looking at the structure of the data

'data.frame':   29 obs. of  5 variables:
 $ YEAR : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ROLL : int  5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
 $ UNEM : num  8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
 $ HGRAD: int  9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
 $ INC  : int  1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...

summary(enrollment)

      YEAR         ROLL            UNEM            HGRAD            INC      
 Min.   : 1   Min.   : 5501   Min.   : 5.700   Min.   : 9552   Min.   :1923  
 1st Qu.: 8   1st Qu.:10167   1st Qu.: 7.000   1st Qu.:15723   1st Qu.:2351  
 Median :15   Median :14395   Median : 7.500   Median :17203   Median :2863  
 Mean   :15   Mean   :12707   Mean   : 7.717   Mean   :16528   Mean   :2729  
 3rd Qu.:22   3rd Qu.:14969   3rd Qu.: 8.200   3rd Qu.:18266   3rd Qu.:3127  
 Max.   :29   Max.   :16081   Max.   :10.100   Max.   :19800   Max.   :3345

plot(enrollment$ROLL, enrollment$UNEM, main = "enrollment vs unemployment", xlab="enrollment", ylab="unemployment") ## scatterplots of enrollment against each of the other variables

plot(enrollment$ROLL, enrollment$HGRAD, main = "enrollment vs high school graduates", xlab="enrollment", ylab="high school graduates")

plot(enrollment$ROLL, enrollment$INC, main = "enrollment vs monthly per capita income", xlab="enrollment", ylab="monthly per capita income")
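
The scatterplots can be backed up with correlation coefficients. The sketch below is only an illustration: it uses the first 10 rows shown in the str() output rather than the full 29-row file, so the numbers will differ from those for the complete data. The name first10 is made up for this example.

```r
# First 10 rows only, copied from the str() output above (the full file has 29 rows)
first10 <- data.frame(
  ROLL  = c(5501, 5945, 6629, 7556, 8716, 9369, 9920, 10167, 11084, 12504),
  UNEM  = c(8.1, 7.0, 7.3, 7.5, 7.0, 6.4, 6.5, 6.4, 6.3, 7.7),
  HGRAD = c(9552, 9680, 9731, 11666, 14675, 15265, 15484, 15723, 16501, 16890),
  INC   = c(1923, 1961, 1979, 2030, 2112, 2192, 2235, 2351, 2411, 2475)
)

# Pearson correlation of enrollment with each candidate predictor
cors <- sapply(first10[c("UNEM", "HGRAD", "INC")],
               function(x) cor(first10$ROLL, x))
print(round(cors, 3))
```

With the real data, the same sapply() call can be run on enrollment instead of first10.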

model_one <- lm(ROLL ~ UNEM + HGRAD, data = enrollment) ##making a linear model to predict enrollment based on unemployment rate and spring high school graduates

summary(model_one)


Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enrollment)

Residuals:
    Min      1Q  Median      3Q     Max 
-2102.2  -861.6  -349.4   374.5  3603.5 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.256e+03  2.052e+03  -4.023  0.00044 ***
UNEM         6.983e+02  2.244e+02   3.111  0.00449 ** 
HGRAD        9.423e-01  8.613e-02  10.941 3.16e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared:  0.8489,    Adjusted R-squared:  0.8373 
F-statistic: 73.03 on 2 and 26 DF,  p-value: 2.144e-11

anova(model_one) ## sequential ANOVA: both unemployment and high school graduates have small p-values, but high school graduates has the much smaller one and accounts for far more of the sum of squares (about 2.06e8 vs 4.54e7), so it explains more of the variation in enrollment

Analysis of Variance Table

Response: ROLL
          Df    Sum Sq   Mean Sq F value    Pr(>F)    
UNEM       1  45407767  45407767  26.349 2.366e-05 ***
HGRAD      1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26  44805568   1723291                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
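
As a cross-check, the fit statistics reported by summary(model_one) can be recomputed from the sums of squares in this ANOVA table. This is a sketch using only the printed numbers; the variable names are chosen for the example.

```r
# Sums of squares and degrees of freedom copied from the anova(model_one) table
ss_unem  <- 45407767
ss_hgrad <- 206279143
ss_resid <- 44805568
df_model <- 2
df_resid <- 26

ss_total <- ss_unem + ss_hgrad + ss_resid

# R-squared: share of total variation explained by the model
r_squared <- 1 - ss_resid / ss_total

# Adjusted R-squared: penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (df_model + df_resid) / df_resid

# Overall F statistic: model mean square over residual mean square
f_stat <- ((ss_unem + ss_hgrad) / df_model) / (ss_resid / df_resid)

# Residual standard error: square root of the residual mean square
resid_se <- sqrt(ss_resid / df_resid)

print(round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 4))
print(round(c(f_stat = f_stat, resid_se = resid_se), 2))
```

These should agree with the 0.8489, 0.8373, 73.03, and 1313 shown in the model summary, up to rounding.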

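plot(model_one$residuals, main = "Residuals", ylab = "Residuals") ## residuals look fairly randomly scattered, but more points fall in the lower half of the plot (the median residual is -349.4), which suggests the residuals are somewhat right-skewed rather than symmetric around zero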

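new_data <- data.frame(UNEM = 9, HGRAD = 25000)
predict(model_one, newdata = new_data) ## predicted enrollment for an unemployment rate of 9 and 25000 high school graduates; HGRAD = 25000 is above the observed maximum of 19800, so this is an extrapolation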

       1 
21585.58
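
The prediction is just the fitted equation evaluated at the new values. A rough check by hand, using the coefficients as printed in summary(model_one) (rounded to four significant figures, so the result is slightly off from predict()):

```r
# Coefficients as printed in summary(model_one); rounded, so only approximate
b0      <- -8.256e+03  # intercept
b_unem  <-  6.983e+02  # slope for UNEM
b_hgrad <-  9.423e-01  # slope for HGRAD

# ROLL = b0 + b_unem * UNEM + b_hgrad * HGRAD
pred <- b0 + b_unem * 9 + b_hgrad * 25000
print(pred)
```

The small gap from 21585.58 comes from the rounding of the printed coefficients; coef(model_one) would give the unrounded values.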

model2 <- lm(ROLL ~ UNEM + HGRAD + INC, data = enrollment) ## model that also includes monthly per capita income

anova(model_one, model2) ## adding monthly per capita income improves the model: the p-value (5.594e-09) is well below 0.05, so the null hypothesis that income has no effect is rejected, and the residual sum of squares drops from 44805568 to 11237313

Analysis of Variance Table

Model 1: ROLL ~ UNEM + HGRAD
Model 2: ROLL ~ UNEM + HGRAD + INC
  Res.Df      RSS Df Sum of Sq     F    Pr(>F)    
1     26 44805568                                 
2     25 11237313  1  33568255 74.68 5.594e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
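
The F statistic in this comparison can be reproduced from the two residual sums of squares. The sketch below uses only the numbers printed in the table; the variable names are chosen for the example.

```r
# Residual sums of squares and residual degrees of freedom from the table
rss1 <- 44805568  # Model 1: ROLL ~ UNEM + HGRAD
df1  <- 26
rss2 <- 11237313  # Model 2: ROLL ~ UNEM + HGRAD + INC
df2  <- 25

# Partial F test: drop in RSS per added parameter, relative to the
# residual mean square of the larger model
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of the F distribution with (1, 25) degrees of freedom
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(round(f_stat, 2))
print(signif(p_value, 4))
```

This should match the F of 74.68 and the p-value of about 5.6e-09 reported by anova(model_one, model2).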
