Introduction

This case study examines a time series of the total hours worked by the part-time and full-time labor force in the United States. The data set contains two variables: the date and the total hours worked, in millions of hours. Observations are recorded annually from 1948 through 2021. The data come from Federal Reserve Economic Data (FRED) and were uploaded to GitHub for universal access.

We apply four baseline forecasting methods (moving average, naive, seasonal naive, and drift) to this series and compare their forecast accuracy using three metrics: mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE).

Exploratory Data Analysis

After importing the data from GitHub, we prepare the data set for time series analysis. We first remove the date variable so that only the series of hours worked remains. We then split the series into training and testing portions: the first 64 periods are used to fit the models, and the last ten periods are held out as testing data. Holding out the most recent values lets us measure forecast accuracy on observations the models have not seen.
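The split described above can be sketched in Python as follows. This is an illustrative sketch, not the code used to produce this report; the function name `train_test_split_ts` and the placeholder series are assumptions made for the example.

```python
def train_test_split_ts(values, n_test=10):
    """Split a time-ordered sequence into training and testing portions.

    The last `n_test` observations form the test set and everything
    before them forms the training set. Order is preserved (no shuffling).
    """
    if not 0 < n_test < len(values):
        raise ValueError("n_test must be between 1 and len(values) - 1")
    return list(values[:-n_test]), list(values[-n_test:])


# Placeholder series with 74 annual observations (1948 through 2021).
# These are NOT the FRED values, only stand-ins to show the shapes.
years = list(range(1948, 2022))
hours = [float(i) for i in range(len(years))]

train, test = train_test_split_ts(hours, n_test=10)
print(len(train), len(test))  # 64 10
```

Because the split is by position rather than at random, the test set always contains the most recent observations, which is what a forecasting evaluation needs.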

Model Building Using Different Forecasting Methods

Next, we use the four baseline forecasting methods to produce 15 forecasts from the end of the training data: ten covering the test period and five extending beyond 2021.

Forecasting Table (millions of hours)
pred.mv    pred.naive  pred.snaive  pred.rwf
166327.8   226014      226014       228028.2
166327.8   226014      226014       230042.4
166327.8   226014      226014       232056.6
166327.8   226014      226014       234070.8
166327.8   226014      226014       236085.0
166327.8   226014      226014       238099.2
166327.8   226014      226014       240113.4
166327.8   226014      226014       242127.7
166327.8   226014      226014       244141.9
166327.8   226014      226014       246156.1
166327.8   226014      226014       248170.3
166327.8   226014      226014       250184.5
166327.8   226014      226014       252198.7
166327.8   226014      226014       254212.9
166327.8   226014      226014       256227.1
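To make the four methods concrete, here is a minimal Python sketch of each one. It is not the code behind the table above: the function names are hypothetical, the toy series is invented, and the moving-average `window` is an assumption, since the report does not state how many observations its moving average uses.

```python
def moving_average_forecast(train, h, window):
    """Flat forecast equal to the mean of the last `window` training values.

    `window` is an assumption of this sketch; the report does not state it.
    """
    if not 0 < window <= len(train):
        raise ValueError("window must be between 1 and len(train)")
    level = sum(train[-window:]) / window
    return [level] * h


def naive_forecast(train, h):
    """Flat forecast equal to the last observed training value."""
    return [train[-1]] * h


def seasonal_naive_forecast(train, h, period=1):
    """Repeat the last full season of length `period`.

    With period=1 (annual data) this is identical to the naive forecast.
    """
    last_season = train[-period:]
    return [last_season[i % period] for i in range(h)]


def drift_forecast(train, h):
    """Extend the straight line through the first and last training values."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * slope for step in range(1, h + 1)]


# Toy series (invented values, not the FRED data).
train = [10.0, 12.0, 13.0, 16.0]
print(moving_average_forecast(train, 3, window=2))  # [14.5, 14.5, 14.5]
print(naive_forecast(train, 3))                     # [16.0, 16.0, 16.0]
print(seasonal_naive_forecast(train, 3))            # [16.0, 16.0, 16.0]
print(drift_forecast(train, 3))                     # [18.0, 20.0, 22.0]
```

The same pattern is visible in the table: the moving average, naive, and seasonal naive columns are constant, while the drift column rises by about 2014.2 per step (228028.2 minus 226014), which is the per-period slope.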

With the four sets of forecasts generated, we can now visualize them.

Visualization

We now plot the time series together with the predicted values. Note that the forecasts are based on models fitted to the first 64 observations of the series. The following graph shows only observations #20 through #74 and the 15 forecast values.

After visualizing the data, we next compute accuracy metrics for each method to see which one most accurately forecasts the number of labor hours worked.

Accuracy Metrics

We use the mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE) to compare the performance of the four forecasting methods.

Note: computing these metrics in R raised repeated warnings of the form "longer object length is not a multiple of shorter object length" for each of the four prediction vectors (pred.mv, pred.naive, pred.snaive, and pred.rwf). This warning means that the vector of true values and the vector of predictions had different lengths, so R recycled the shorter one. That is consistent with the ten test values being compared against all 15 forecasts, and the metrics below should be read with this in mind.
Overall performance of the four forecasting methods
Method           MAPE        MAD         MSE
Moving Average   31.479461   1150610.8   5955472593
Naive             6.891091    255318.0    361170545
Seasonal Naive    6.891091    255318.0    361170545
Drift             4.125374    150620.1    126282378
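For reference, the three metrics can be sketched in Python as below. This is a hedged illustration under common textbook definitions, assuming MAD denotes the mean of the absolute errors; it is not the code that produced the table, and the function names and toy numbers are invented. The explicit length check is a design choice of this sketch, so that mismatched vectors raise an error instead of being silently recycled.

```python
def _check_lengths(actual, predicted):
    """Refuse to compare vectors of different lengths."""
    if len(actual) != len(predicted):
        raise ValueError(
            f"length mismatch: {len(actual)} actual vs {len(predicted)} predicted"
        )


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    _check_lengths(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mad(actual, predicted):
    """Mean absolute deviation (assumed here to be the mean absolute error)."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    _check_lengths(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values (invented, not the FRED data).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(actual, predicted), 4))  # 6.6667
print(mad(actual, predicted))             # 10.0
print(round(mse(actual, predicted), 4))   # 166.6667
```

With ten test values and 15 forecasts, one option under this sketch is to pass only the forecasts that overlap the test period, for example `predicted[:len(actual)]`.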

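Based on these accuracy metrics, the drift method performs best: it has the lowest MAPE, MAD, and MSE of the four methods. The naive and seasonal naive methods give identical results because the data are annual, so the seasonal period is a single observation and the seasonal naive forecast reduces to the naive forecast.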

Summary

Based on the accuracy comparison, we conclude that the drift method was the most accurate in forecasting labor hours worked in the United States. The other three methods produce flat forecasts, whereas the drift method projects an annual increase, which reflects the generally growing American economy.