Time Series Analysis

Introduction

This case study examines a time series of the number of hours worked by the part-time and full-time labor force in the United States. The data set contains two variables: the date and the total hours worked, in millions of hours. Observations are recorded annually from 1948 to 2021. The data come from Federal Reserve Economic Data (FRED) and were uploaded to GitHub for universal access.

We will apply four baseline forecasting methods (Moving Average, Naive, Seasonal Naive, and Drift) to this series and compare them using Mean Absolute Percentage Error (MAPE), Mean Absolute Deviation (MAD), and Mean Squared Error (MSE) to determine which method generates the most accurate forecast.
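For reference, the three accuracy measures named above can be written in their usual textbook form. This is a sketch under assumptions: the symbols are not from the original report, with y_t denoting an observed test value, \hat{y}_t the matching forecast, and n the number of test periods. MAPE is expressed as a percentage, which agrees with the factor of 100 in the R calls shown later in the report.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,
\qquad
\mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2
```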

Exploratory Data Analysis

After importing the data from GitHub, we prepare the data set for time series analysis. We first remove the date variable, leaving only the series of hours worked. We then split the series into training and testing portions: the first 64 periods form the training data for the models, and the last ten periods form the testing data. Holding out the last ten values lets us compare each model's forecasts against observations it was not fitted to.
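The split described above can be sketched in a few lines. The original analysis appears to have been done in R; this Python version is only an illustration, and the function name `train_test_split_series` and the placeholder series `hours` are hypothetical (the values are not the FRED data).

```python
def train_test_split_series(values, n_test=10):
    """Split an ordered series into a training part and the last n_test values."""
    if not 0 < n_test < len(values):
        raise ValueError("n_test must be between 1 and len(values) - 1")
    return values[:-n_test], values[-n_test:]


# Placeholder for the 74 annual observations (1948-2021); NOT the real data.
hours = [float(i) for i in range(1, 75)]

train, test = train_test_split_series(hours, n_test=10)
print(len(train), len(test))  # 64 10
```

Because the order of a time series matters, the split takes the final ten values rather than a random sample.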

Model Building Using Different Forecasting Methods

Next, we use the four baseline forecasting methods to forecast 15 periods beyond the training data: the ten held-out years and the five years after 2021.

Forecasting Table
pred.mv	pred.naive	pred.snaive	pred.rwf
166327.8	226014	226014	228028.2
166327.8	226014	226014	230042.4
166327.8	226014	226014	232056.6
166327.8	226014	226014	234070.8
166327.8	226014	226014	236085.0
166327.8	226014	226014	238099.2
166327.8	226014	226014	240113.4
166327.8	226014	226014	242127.7
166327.8	226014	226014	244141.9
166327.8	226014	226014	246156.1
166327.8	226014	226014	248170.3
166327.8	226014	226014	250184.5
166327.8	226014	226014	252198.7
166327.8	226014	226014	254212.9
166327.8	226014	226014	256227.1
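To make the four methods concrete, here is a minimal Python sketch of each one. It is not the code behind the table above: the function names, the `window` and `period` parameters, and the toy series are all assumptions made for illustration. In particular, the report does not say how its moving average was computed, so the version here simply averages the last `window` training values.

```python
def moving_average_forecast(train, h, window=None):
    """Mean of the last `window` training values (all of them if None), repeated h times."""
    recent = train if window is None else train[-window:]
    return [sum(recent) / len(recent)] * h


def naive_forecast(train, h):
    """Repeat the last observed value h times."""
    return [train[-1]] * h


def seasonal_naive_forecast(train, h, period=1):
    """Repeat the last full season; with period=1 this equals the naive forecast."""
    start = len(train) - period
    return [train[start + (k % period)] for k in range(h)]


def drift_forecast(train, h):
    """Extend the straight line through the first and last training values."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * k for k in range(1, h + 1)]


# Toy series for illustration only; NOT the hours-worked data.
toy_train = [10.0, 12.0, 13.0, 15.0, 16.0]

print(naive_forecast(toy_train, 3))           # [16.0, 16.0, 16.0]
print(seasonal_naive_forecast(toy_train, 3))  # [16.0, 16.0, 16.0]
print(drift_forecast(toy_train, 3))           # [17.5, 19.0, 20.5]
print(moving_average_forecast(toy_train, 3))
```

With `period=1`, as for annual data, the seasonal naive forecast reproduces the naive forecast, which matches the identical pred.naive and pred.snaive columns in the table.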

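The naive and seasonal naive columns are identical because annual data have no within-year seasonal period, so the seasonal naive method reduces to the naive method. With the four sets of forecasts generated, we can next visualize them.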

Visualization

We now plot the time series together with the predicted values. Note that the forecasts come from models fitted to the 64 training observations. The following graph shows only observations #20 to #74 and the 15 forecast values.

Having visualized the data, we next compute accuracy metrics for each model to see which one most accurately forecasts the number of labor hours worked.

Accuracy Metrics

We will use the mean absolute percentage error (MAPE), the mean absolute deviation (MAD), and the mean squared error (MSE) to compare the performance of the four forecasting methods.

## Warning in `-.default`(true.value, pred.mv): longer object length is not a
## multiple of shorter object length

## Warning in `/.default`(100 * (true.value - pred.mv), true.value): longer object
## length is not a multiple of shorter object length

R raised the same pair of warnings for pred.naive, pred.snaive, and pred.rwf, and then repeated the subtraction warning once more for each of the four prediction vectors. The warning means that true.value and the prediction vectors differ in length, most likely ten held-out values against 15 forecasts, so R recycled the shorter vector to complete the arithmetic. Some forecasts in the table below are therefore compared against the wrong observations, and the figures should be treated as approximate until the forecasts are truncated to the test period.

Overall performance of the four forecasting methods
Method	MAPE (%)	MAD	MSE
Moving Average	31.479461	1150610.8	5955472593
Naive	6.891091	255318.0	361170545
Seasonal Naive	6.891091	255318.0	361170545
Drift	4.125374	150620.1	126282378
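The length-mismatch warnings above suggest that the forecasts and the held-out values were not aligned before the metrics were computed. One way to avoid that, sketched here in Python under assumptions, is to cut the forecasts down to the test horizon and refuse to compute anything on vectors of unequal length. The function name `accuracy_metrics` and the toy numbers are hypothetical, and MAD is taken to mean the average absolute forecast error.

```python
def accuracy_metrics(actual, predicted):
    """Return MAPE (in percent), MAD, and MSE for two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
    }


# Toy values for illustration only; NOT the hours-worked data.
toy_actual = [100.0, 110.0, 120.0]                   # held-out observations
toy_forecast = [100.0, 100.0, 100.0, 100.0, 100.0]   # more forecasts than test values

# Keep only the forecasts that have a matching observation.
aligned = toy_forecast[:len(toy_actual)]
metrics = accuracy_metrics(toy_actual, aligned)
print(metrics)
```

Raising an error on unequal lengths is a deliberate choice here: it turns a silent recycling problem into a visible failure.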

The drift method has the lowest MAPE, MAD, and MSE of the four methods, so it appears to be the best fit for this time series.

Summary

After comparing the accuracy metrics, we conclude that the Drift model was the most accurate of the four at forecasting labor hours worked in the United States. The other methods produce flat forecasts, whereas the drift model increases every year, which reflects the generally growing American economy.