A Closer Look at the Higher-Ed Work Force Since 2000.

Estimated number of workers employed by institutions of higher education. Source: https://www.chronicle.com/article/colleges-have-shed-a-tenth-of-their-employees-since-the-pandemic-began?utm_source=Iterable&utm_medium=email&utm_campaign=campaign_1706047_nl_Academe-Today_date_20201111&cid=at&source=&sourceId=

library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## -- Attaching packages ------------------------------- fpp2 2.4 --
## v ggplot2   3.3.0     v fma       2.4  
## v forecast  8.12      v expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.0.2
## Warning: package 'fma' was built under R version 4.0.3
## Warning: package 'expsmooth' was built under R version 4.0.3
## 
library(knitr)
highered <- read.csv("C:/Users/burtkb/Downloads/data-3qXvL.csv", skip = 1)
he.ts = ts(highered[,2], start=c(2000,1), end=c(2020,10), frequency = 12)
autoplot(he.ts)

sum(is.na(he.ts))
## [1] 0

I cannot tell whether employment at universities really decreased this sharply in 2020 or whether something is wrong with the data. I suspect the decrease in higher-ed employment is real, and as of October 2020 there is a slight recovery.

There is an upward trend overall.

The series has no missing values.
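
One way to check whether a late drop like this is a real change rather than a seasonal dip is to look at year-over-year changes, which compare each month with the same month a year earlier. The sketch below uses only base R and a synthetic series (demo.ts, with an artificial drop in its last seven months) as a stand-in for he.ts. Every name and number in it is made up for illustration.

```r
# Hypothetical stand-in for he.ts: synthetic monthly data, not the real series.
set.seed(1)
n <- 250                                      # Jan 2000 through Oct 2020
month.index <- seq_len(n)
x <- 3.5e6 + 4000 * month.index +             # upward trend
  5e4 * sin(2 * pi * month.index / 12) +      # repeating seasonal swing
  rnorm(n, sd = 1e4)                          # noise
x[(n - 6):n] <- x[(n - 6):n] - 4e5            # artificial shock, Apr-Oct 2020
demo.ts <- ts(x, start = c(2000, 1), frequency = 12)

# Year-over-year percent change: (x_t - x_{t-12}) / x_{t-12} * 100.
# Comparing like months cancels the regular seasonal pattern.
yoy <- 100 * diff(demo.ts, lag = 12) / stats::lag(demo.ts, -12)

# Inspect the most recent months
tail(yoy, 10)
```

Running the same two lines on he.ts instead of demo.ts would show whether the 2020 values are far outside the range of earlier year-over-year changes.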

he.train = window(he.ts, start=c(2000,1),  end=c(2019,12),frequency=12)
he.test = window(he.ts,start=c(2020,1), end=c(2020,10),frequency=12)

mean(he.train, na.rm=TRUE)
## [1] 4226148
mean(he.test)
## [1] 4423500
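# Stabilize the variance with a Box-Cox transform (lambda chosen automatically)
he.bc = BoxCox(he.train, lambda = BoxCox.lambda(he.train))
# Decompose with STL, then remove the seasonal component
he.ts2 = stl(he.bc, s.window = "periodic")
d.hets = seasadj(he.ts2)
autoplot(d.hets)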

#model1 - ets
fit1 = ets(d.hets)
f1 = forecast(fit1, h=6)
checkresiduals(f1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 42.778, df = 19, p-value = 0.00139
## 
## Model df: 5.   Total lags used: 24
autoplot(f1)

#model2 - arima
fit2 = auto.arima(d.hets, seasonal=FALSE)
f2 = forecast(fit2, h=6)
checkresiduals(f2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,2)
## Q* = 27.901, df = 21, p-value = 0.143
## 
## Model df: 3.   Total lags used: 24
autoplot(f2)

accuracy(f1)
##                      ME        RMSE         MAE        MPE      MAPE       MASE
## Training set 1978421806 20970309703 14417683384 0.02418679 0.1699914 0.06137129
##                     ACF1
## Training set -0.01482155
accuracy(f2)
##                      ME        RMSE         MAE           MPE      MAPE
## Training set -191463210 20601903878 14517538213 -0.0008876154 0.1702223
##                    MASE       ACF1
## Training set 0.06179634 0.04048342

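The ARIMA model looks a bit better to me. Its residuals pass the Ljung-Box test (p = 0.143), while the ETS residuals do not (p = 0.00139), and its RMSE is slightly lower. MAE, MAPE, and MASE are nearly identical and marginally favor ETS. Both models were fit to the Box-Cox-transformed, seasonally adjusted series, so these error measures are on the transformed scale, not in numbers of workers.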
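
For reference, the Ljung-Box test that checkresiduals() reports can also be run directly with Box.test() from base R. This is a minimal sketch on a simulated AR(1) series, not on the employment data. The series z and the model order are assumptions chosen only to make the example self-contained.

```r
# Hypothetical stand-in data: a simulated AR(1) series.
set.seed(123)
z <- arima.sim(model = list(ar = 0.6), n = 240)

# Fit an AR(1) model with base R
fit <- arima(z, order = c(1, 0, 0))

# Ljung-Box test on the residuals over 24 lags.
# fitdf = 1 removes one degree of freedom for the estimated AR coefficient.
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)
lb
```

A small p-value suggests the residuals still contain autocorrelation that the model has not captured. A large one gives no evidence against the residuals being white noise.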

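We know that this forecast is inaccurate because of the pandemic. As we saw at the beginning, before splitting the data into training and test sets, there was a dramatic decrease in the number of jobs in 2020. Both models were trained only on data through December 2019, so neither could anticipate it.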
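
To put a number on how far off the forecasts are, they can be scored against the held-out months in the original units. The sketch below is one possible way to do that, not the approach used above: it passes lambda to ets() and auto.arima() so that forecast() back-transforms automatically, and it lets the models handle seasonality themselves instead of adjusting with STL first. The series y is synthetic and stands in for he.ts, and all object names are hypothetical.

```r
library(forecast)

# Hypothetical stand-in for he.ts: synthetic monthly data, not the real series.
set.seed(42)
n <- 250
month.index <- seq_len(n)
y <- ts(3.5e6 + 4000 * month.index +
          5e4 * sin(2 * pi * month.index / 12) +
          rnorm(n, sd = 1e4),
        start = c(2000, 1), frequency = 12)

train <- window(y, end = c(2019, 12))
test  <- window(y, start = c(2020, 1))

# Supplying lambda to the model means forecasts come back on the original scale
lam <- BoxCox.lambda(train)
fit.ets   <- ets(train, lambda = lam)
fit.arima <- auto.arima(train, lambda = lam)

# Forecast exactly as many months as the test set holds
h <- length(test)
fc.ets   <- forecast(fit.ets, h = h)
fc.arima <- forecast(fit.arima, h = h)

# With a second argument, accuracy() reports training and test set errors
acc.ets   <- accuracy(fc.ets, test)
acc.arima <- accuracy(fc.arima, test)
acc.ets
acc.arima
```

Substituting he.train and he.test for train and test would give test set RMSE and MAPE in numbers of workers, which is the comparison that shows how much the pandemic broke the forecast.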