This case study examines a time series of the total hours worked by the part-time and full-time labor force in the United States. The data set contains two variables: the date and the total number of hours worked, in millions of hours. Observations are recorded annually from 1948 through 2021. The data come from Federal Reserve Economic Data (FRED) and were uploaded to GitHub for universal access.
We apply four baseline forecasting methods (moving average, naive, seasonal naive, and drift) to this series and compare their accuracy. Accuracy is gauged with three metrics: mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE).
After importing the data from GitHub, we prepare the data set for time series analysis. We first remove the date variable, leaving only the series of hours worked. We then split the series into training and testing portions: the first 64 periods are used to train the models, and the last ten periods are held out for testing. Holding out the last ten values lets us measure forecast error on observations the models have not seen.
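The split described above can be sketched in a few lines. This is a minimal Python illustration, not the code used in the analysis (which appears to have been done in R); the `hours` list is a hypothetical stand-in for the 74 annual observations, and the names `train`, `test`, and `n_test` are assumptions made for this example.

```python
# Hypothetical stand-in for the 74 annual observations (1948-2021).
# The real FRED values are not reproduced here.
hours = [100_000.0 + 1_500.0 * i for i in range(74)]

n_test = 10                  # last ten periods held out for testing
train = hours[:-n_test]      # first 64 observations
test = hours[-n_test:]       # last 10 observations

print(len(train), len(test))  # 64 10
```

Splitting by position rather than at random keeps the time order intact, so the models are only ever evaluated on periods that come after the data they were fitted to.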
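Next we use the four baseline forecasting methods to produce forecasts from the 64-point training series. The table below lists 15 forecast values per method, which would cover the ten test years and the five years after 2021.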
| pred.mv | pred.naive | pred.snaive | pred.rwf |
|---|---|---|---|
| 166327.8 | 226014 | 226014 | 228028.2 |
| 166327.8 | 226014 | 226014 | 230042.4 |
| 166327.8 | 226014 | 226014 | 232056.6 |
| 166327.8 | 226014 | 226014 | 234070.8 |
| 166327.8 | 226014 | 226014 | 236085.0 |
| 166327.8 | 226014 | 226014 | 238099.2 |
| 166327.8 | 226014 | 226014 | 240113.4 |
| 166327.8 | 226014 | 226014 | 242127.7 |
| 166327.8 | 226014 | 226014 | 244141.9 |
| 166327.8 | 226014 | 226014 | 246156.1 |
| 166327.8 | 226014 | 226014 | 248170.3 |
| 166327.8 | 226014 | 226014 | 250184.5 |
| 166327.8 | 226014 | 226014 | 252198.7 |
| 166327.8 | 226014 | 226014 | 254212.9 |
| 166327.8 | 226014 | 226014 | 256227.1 |
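To make the four baseline methods concrete, here is a minimal Python sketch of their usual textbook definitions. It is an illustration under stated assumptions, not the code that produced the table: the function names, the `window` parameter of the moving average, and the toy series are all hypothetical, and the source does not say which window the moving average used.

```python
def moving_average_forecast(train, h, window):
    """Flat forecast equal to the mean of the last `window` training values."""
    recent = train[-window:]
    level = sum(recent) / len(recent)
    return [level] * h


def naive_forecast(train, h):
    """Flat forecast equal to the last observed value."""
    return [train[-1]] * h


def seasonal_naive_forecast(train, h, period):
    """Repeat the last full season; period=1 reduces to the naive forecast."""
    last_season = train[-period:]
    return [last_season[i % period] for i in range(h)]


def drift_forecast(train, h):
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * step for step in range(1, h + 1)]


# Toy series (hypothetical values, not the FRED data).
toy_train = [10.0, 12.0, 11.0, 15.0, 16.0]
print(moving_average_forecast(toy_train, 3, window=5))
print(naive_forecast(toy_train, 3))             # [16.0, 16.0, 16.0]
print(seasonal_naive_forecast(toy_train, 3, 1))  # [16.0, 16.0, 16.0]
print(drift_forecast(toy_train, 3))             # [17.5, 19.0, 20.5]
```

With a seasonal period of 1, as for annual data, the seasonal naive forecast equals the naive forecast, which matches the identical `pred.naive` and `pred.snaive` columns in the table.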
After generating the four sets of forecasts, we visualize the output of these models.
We now plot the time series together with the predicted values. Note that the forecast values are based on models fitted to the 64 historical data points in the training series. The following graph shows only observations #20 to #74 and the 15 forecast values.
After visualizing the data, we evaluate each model's accuracy metrics to see which most accurately forecasts the number of labor hours worked.
We use the mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE) to compare the performance of the four forecasting methods.
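The three metrics can be sketched as follows. This is a minimal Python illustration using their common definitions; the exact formulas behind the reported numbers are not shown in the source, so the function name `accuracy_metrics` and the toy values are assumptions. The function requires the actual and predicted vectors to have the same length, so a 15-step forecast would first be truncated to the ten test periods.

```python
def accuracy_metrics(actual, predicted):
    """Return MAPE (in percent), MAD, and MSE for equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError(
            f"length mismatch: {len(actual)} actual vs {len(predicted)} predicted"
        )
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAPE": mape, "MAD": mad, "MSE": mse}


# Toy values (hypothetical, not the FRED data).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0, 420.0, 440.0]  # longer than the test set

# Keep only the forecasts that have a matching actual value.
aligned = forecast[: len(actual)]
print(accuracy_metrics(actual, aligned))
```

Raising an error on a length mismatch is a deliberate choice here: silently recycling the shorter vector would still return numbers, but they would compare forecasts against the wrong periods.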
Computing these metrics produced repeated R warnings of the following form:
## Warning in `-.default`(true.value, pred.mv): longer object length is not a
## multiple of shorter object length
The same warning was raised for the subtraction and division steps involving each of the four forecasts (pred.mv, pred.naive, pred.snaive, and pred.rwf). It indicates that true.value and the forecast vectors differ in length, most likely because the 15 forecast values are compared against the ten test observations, so R recycles the shorter vector. The metrics below should be read with that caveat.
| Method | MAPE (%) | MAD | MSE |
|---|---|---|---|
| Moving Average | 31.479461 | 1150610.8 | 5955472593 |
| Naive | 6.891091 | 255318.0 | 361170545 |
| Seasonal Naive | 6.891091 | 255318.0 | 361170545 |
| Drift | 4.125374 | 150620.1 | 126282378 |
Based on the accuracy metrics, the drift method is the best of the four baseline methods for this series: it has the lowest MAPE, MAD, and MSE.
We conclude that the drift model was the most accurate in forecasting labor hours worked in the United States. The moving average, naive, and seasonal naive methods all produce flat forecasts, and with annual data there is no within-year season, so the seasonal naive forecast is identical to the naive forecast. The drift forecast instead increases every year, which reflects the generally growing American economy.
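The annual increase in the drift forecast can be read directly from the forecast table. Assuming the forecasts follow the standard drift formula (the source does not state the formula used), the forecast $h$ steps past the last training observation $y_T$ is

$$
\hat{y}_{T+h} = y_T + h \cdot \frac{y_T - y_1}{T - 1}.
$$

Here $T = 64$ and $y_T = 226014$, the value repeated by the naive forecast. Consecutive drift forecasts in the table differ by about $2014.2$, so the implied average annual change over the training period is roughly $2014.2$ million hours, and the 15th forecast is

$$
\hat{y}_{T+15} \approx 226014 + 15 \times 2014.2 = 256227,
$$

which agrees with the last row of the `pred.rwf` column (256227.1) up to rounding.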