'data.frame': 29 obs. of 5 variables:
$ YEAR : int 1 2 3 4 5 6 7 8 9 10 ...
$ ROLL : int 5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
$ UNEM : num 8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
$ HGRAD: int 9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
$ INC : int 1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
summary(enrollment)
YEAR ROLL UNEM HGRAD INC
Min. : 1 Min. : 5501 Min. : 5.700 Min. : 9552 Min. :1923
1st Qu.: 8 1st Qu.:10167 1st Qu.: 7.000 1st Qu.:15723 1st Qu.:2351
Median :15 Median :14395 Median : 7.500 Median :17203 Median :2863
Mean :15 Mean :12707 Mean : 7.717 Mean :16528 Mean :2729
3rd Qu.:22 3rd Qu.:14969 3rd Qu.: 8.200 3rd Qu.:18266 3rd Qu.:3127
Max. :29 Max. :16081 Max. :10.100 Max. :19800 Max. :3345
plot(enrollment$ROLL, enrollment$UNEM, main = "enrollment vs unemployment", xlab = "enrollment", ylab = "unemployment") ## scatterplots of enrollment against each of the other variables
plot(enrollment$ROLL, enrollment$HGRAD, main = "enrollment vs high school graduates", xlab = "enrollment", ylab = "high school graduates")
plot(enrollment$ROLL, enrollment$INC, main = "enrollment vs monthly per capita income", xlab = "enrollment", ylab = "monthly per capita income")
model_one <- lm(ROLL ~ UNEM + HGRAD, data = enrollment) ## linear model predicting enrollment from the unemployment rate and the number of spring high school graduates
summary(model_one)
Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = enrollment)
Residuals:
Min 1Q Median 3Q Max
-2102.2 -861.6 -349.4 374.5 3603.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.256e+03 2.052e+03 -4.023 0.00044 ***
UNEM 6.983e+02 2.244e+02 3.111 0.00449 **
HGRAD 9.423e-01 8.613e-02 10.941 3.16e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8373
F-statistic: 73.03 on 2 and 26 DF, p-value: 2.144e-11
anova(model_one) ## sequential ANOVA (terms added in order): both UNEM and HGRAD are significant, but HGRAD has a far smaller p-value (3.157e-11 vs 2.366e-05) and accounts for much more of the sum of squares (206279143 vs 45407767), so it is the stronger predictor of enrollment
Analysis of Variance Table
Response: ROLL
Df Sum Sq Mean Sq F value Pr(>F)
UNEM 1 45407767 45407767 26.349 2.366e-05 ***
HGRAD 1 206279143 206279143 119.701 3.157e-11 ***
Residuals 26 44805568 1723291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
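The ANOVA table and the model summary describe the same fit, so the summary figures can be recovered from the sums of squares. The following is a minimal sketch of that check, not part of the original analysis: it uses only numbers copied from the table above, and the variable names (`ss_unem`, `ss_hgrad`, `ss_resid`) are made up for illustration.

```r
# Sums of squares copied from the anova(model_one) table above
ss_unem  <- 45407767   # UNEM, entered first
ss_hgrad <- 206279143  # HGRAD, entered after UNEM
ss_resid <- 44805568   # Residuals, 26 degrees of freedom

# Total variation in ROLL is the sum of all three rows
ss_total <- ss_unem + ss_hgrad + ss_resid

# R-squared: share of the total variation explained by the two predictors
r_squared <- (ss_unem + ss_hgrad) / ss_total

# Residual standard error: square root of the residual mean square
rse <- sqrt(ss_resid / 26)

round(r_squared, 4)  # 0.8489, the Multiple R-squared in summary(model_one)
round(rse)           # 1313, the residual standard error in summary(model_one)
```

Because the table is sequential, the HGRAD row measures what HGRAD adds after UNEM is already in the model; fitting the terms in the other order would generally split the explained sum of squares differently.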
plot(model_one$residuals, main = "Residuals", ylab = "Residuals") ## residuals look fairly randomly scattered, but more points fall below zero than above it (the median residual is -349.4), so they are not symmetric about zero
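new_data <- data.frame(UNEM = 9, HGRAD = 25000) ## scenario: unemployment rate of 9 and 25000 spring high school graduates
predict(model_one, newdata = new_data) ## predicted enrollment (class size); HGRAD = 25000 is above the observed maximum of 19800, so this is an extrapolation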
1
21585.58
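The prediction is just the fitted equation evaluated at the new values. As a rough sketch (an illustration, not the original code), it can be reproduced by hand from the coefficients printed by `summary(model_one)`. The names `b`, `x`, and `pred_by_hand` are hypothetical, and the coefficients are the rounded values shown in the output, so the result differs slightly from what `predict()` returns.

```r
# Coefficients as printed by summary(model_one), rounded to four significant figures
b <- c(intercept = -8.256e+03, UNEM = 6.983e+02, HGRAD = 9.423e-01)

# Same scenario as new_data: a leading 1 for the intercept, then UNEM = 9, HGRAD = 25000
x <- c(1, 9, 25000)

# Linear predictor: intercept + b_UNEM * UNEM + b_HGRAD * HGRAD
pred_by_hand <- sum(b * x)

pred_by_hand  # about 21586; predict() gave 21585.58 using the unrounded coefficients
```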
model2 <- lm(ROLL ~ UNEM + HGRAD + INC, data = enrollment) ## model that also includes monthly per capita income
anova(model_one, model2) ## adding monthly per capita income improves the model: the p-value (5.594e-09) is well below 0.05, so the null hypothesis that INC has no effect is rejected, and the residual sum of squares drops from 44805568 to 11237313
Analysis of Variance Table
Model 1: ROLL ~ UNEM + HGRAD
Model 2: ROLL ~ UNEM + HGRAD + INC
Res.Df RSS Df Sum of Sq F Pr(>F)
1 26 44805568
2 25 11237313 1 33568255 74.68 5.594e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
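The F statistic in this comparison can be rebuilt from the two residual sums of squares, which makes it clearer what the test measures. This is a small sketch under the assumption that only the tabulated values are available; the variable names are hypothetical, and `pf()` is base R's F distribution function.

```r
# Values copied from the anova(model_one, model2) table above
rss1 <- 44805568  # residual sum of squares, Model 1
rss2 <- 11237313  # residual sum of squares, Model 2
df1  <- 26        # residual degrees of freedom, Model 1
df2  <- 25        # residual degrees of freedom, Model 2

# Partial F statistic: drop in RSS per added term, divided by Model 2's residual mean square
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail p-value from the F distribution with (1, 25) degrees of freedom
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(f_stat, 2)  # 74.68, matching the F column of the table
p_value           # on the order of 5.6e-09, in line with Pr(>F)
```

A large F here means that the reduction in unexplained variation from adding INC is large relative to the noise left in the bigger model.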