A Closer Look at the Higher-Ed Work Force Since 2000.
Estimated number of workers employed by institutions of higher education.
Source: https://www.chronicle.com/article/colleges-have-shed-a-tenth-of-their-employees-since-the-pandemic-began?utm_source=Iterable&utm_medium=email&utm_campaign=campaign_1706047_nl_Academe-Today_date_20201111&cid=at&source=&sourceId=
library(fpp2)
library(knitr)
highered <- read.csv("C:/Users/burtkb/Downloads/data-3qXvL.csv", skip = 1)
he.ts = ts(highered[,2], start=c(2000,1), end=c(2020,10), frequency = 12)
autoplot(he.ts)
sum(is.na(he.ts))
## [1] 0
The plot shows a sharp drop in higher-ed employment in 2020. From the data alone I cannot tell whether employment really fell that much or whether something is wrong with the data. I suspect the decline is real, and as of October 2020 there is a partial recovery.
Apart from that drop, there is an upward trend overall.
There are no missing values.
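To put a number on the drop and the partial recovery described above, a small helper can compute the percent change between two months. This is a minimal sketch using base R only; `pct_change` and the `demo` series are hypothetical names introduced here, and the `demo` values are made up for illustration, not taken from the Chronicle data.

```r
# Illustrative monthly series (made-up numbers, NOT the Chronicle data):
# flat through Feb 2020, a drop in spring, then a partial rebound.
demo <- ts(c(100, 100, 98, 90, 88, 87, 88, 89, 90, 91),
           start = c(2020, 1), frequency = 12)

# Percent change of a monthly ts between two c(year, month) points.
pct_change <- function(x, from, to) {
  a <- as.numeric(window(x, start = from, end = from))
  b <- as.numeric(window(x, start = to, end = to))
  100 * (b - a) / a
}

drop_to_trough <- pct_change(demo, c(2020, 2), c(2020, 6))   # Feb -> Jun
rebound        <- pct_change(demo, c(2020, 6), c(2020, 10))  # Jun -> Oct
```

Assuming `he.ts` is built as in this report, the same helper could be called as `pct_change(he.ts, c(2020, 2), c(2020, 10))` to measure the net change from February to October 2020.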
he.train = window(he.ts, start=c(2000,1), end=c(2019,12),frequency=12)
he.test = window(he.ts,start=c(2020,1), end=c(2020,10),frequency=12)
mean(he.train, na.rm=TRUE)
## [1] 4226148
mean(he.test)
## [1] 4423500
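# Box-Cox transform the training data (lambda chosen automatically)
he.bc = BoxCox(he.train, lambda = BoxCox.lambda(he.train))
# STL decomposition with a fixed (periodic) seasonal component
he.ts2 = stl(he.bc, s.window = "periodic")
# Seasonally adjusted series, still on the Box-Cox scale
d.hets = seasadj(he.ts2)
autoplot(d.hets)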
#model1 - ets
fit1 = ets(d.hets)
f1 = forecast(fit1, h=6)
checkresiduals(f1)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 42.778, df = 19, p-value = 0.00139
##
## Model df: 5. Total lags used: 24
autoplot(f1)
#model2 - arima
fit2 = auto.arima(d.hets, seasonal=FALSE)
f2 = forecast(fit2, h=6)
checkresiduals(f2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 27.901, df = 21, p-value = 0.143
##
## Model df: 3. Total lags used: 24
autoplot(f2)
accuracy(f1)
##                      ME        RMSE         MAE        MPE      MAPE       MASE        ACF1
## Training set 1978421806 20970309703 14417683384 0.02418679 0.1699914 0.06137129 -0.01482155
accuracy(f2)
##                      ME        RMSE         MAE           MPE      MAPE       MASE       ACF1
## Training set -191463210 20601903878 14517538213 -0.0008876154 0.1702223 0.06179634 0.04048342
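The ARIMA model looks a bit better to me: its residuals pass the Ljung-Box test (p = 0.143) while the ETS residuals do not (p = 0.00139), and its training RMSE is marginally lower. On MAE, MAPE, and MASE the two models are nearly tied, with ETS very slightly ahead.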
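The Ljung-Box figures printed by checkresiduals() can be approximated with base R's Box.test(). In the output above, the reported degrees of freedom appear to equal the lags used minus the model degrees of freedom (24 - 5 = 19 for ETS, 24 - 3 = 21 for ARIMA). The sketch below shows the same calculation on a simulated series; the data and the ARIMA(1,1,0) order are assumptions chosen for the demo, not the models fitted in this report.

```r
set.seed(1)
# Simulated stand-in series (an assumption, not the higher-ed data).
sim <- arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 240)
y <- ts(as.numeric(sim), start = c(2000, 1), frequency = 12)

# Fit the same order that generated the data (chosen by hand for the demo).
fit <- arima(y, order = c(1, 1, 0))

# Ljung-Box test on the residuals; fitdf = number of estimated
# ARMA coefficients (here one AR term), so df = 24 - 1 = 23.
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)
lb$p.value
```

A large p-value means there is no strong evidence of leftover autocorrelation in the residuals, which is the sense in which the ARIMA fit above looks better than the ETS fit.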
We know that this forecast will be inaccurate for 2020 because of the pandemic: as the first plot showed, before the split into train and test, there was a dramatic decrease in the number of jobs, which a model trained on data through 2019 cannot anticipate.
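The accuracy() calls above report training-set error only, on the Box-Cox, seasonally adjusted scale. To measure how far off the forecast is for 2020, one option is to let the forecast package handle the transformation and back-transformation, then score the forecast against the held-out months. This is a minimal sketch on a synthetic series (made-up values, not the Chronicle data); swapping in stlf() with method = "arima" is an alternative to the manual BoxCox/stl/seasadj steps, not the workflow used in this report.

```r
library(forecast)

set.seed(42)
# Synthetic stand-in for he.ts (made-up values in thousands of workers).
n <- 250                                   # Jan 2000 through Oct 2020
t <- seq_len(n)
y <- ts(4000 + 2 * t + 50 * sin(2 * pi * t / 12) + rnorm(n, sd = 10),
        start = c(2000, 1), frequency = 12)

train <- window(y, end = c(2019, 12))
test  <- window(y, start = c(2020, 1))

# stlf() applies the Box-Cox transform, STL-adjusts, forecasts the adjusted
# series with ARIMA, re-seasonalizes, and back-transforms to the original scale.
lam <- BoxCox.lambda(train)
fc  <- stlf(train, h = length(test), method = "arima", lambda = lam)

# With a second argument, accuracy() adds a "Test set" row.
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE")]
```

On the real data, a large gap between the training and test rows would quantify how badly the pre-2020 model misses the pandemic-era drop.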