Chapter 3 Problems
1. Why was the data partitioned?
The data was partitioned to create a training period and a validation period. The store wanted to forecast sales for the next 12 months. Partitioning lets the data analyst build the model on the training period only and then use the validation period, which the model has not seen, to evaluate its performance. This way the forecast errors (the differences between the actual and predicted values) are measured on data that mimics the future.
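A minimal Python sketch of a chronological partition, assuming the series is held as a plain list of monthly values. The function name `partition` and the 84-value placeholder series are hypothetical illustrations, not the souvenir data or the analyst's code.

```python
from typing import List, Sequence, Tuple


def partition(series: Sequence[float], n_valid: int) -> Tuple[List[float], List[float]]:
    """Split a time series chronologically into training and validation periods.

    The last `n_valid` observations form the validation period; nothing is
    shuffled, so the model is always evaluated on data that comes after the
    data it was fitted to.
    """
    if not 0 < n_valid < len(series):
        raise ValueError("n_valid must be between 1 and len(series) - 1")
    cut = len(series) - n_valid
    return list(series[:cut]), list(series[cut:])


# Hypothetical 84-month series (placeholder values, not real sales).
monthly_sales = [float(m) for m in range(84)]
train, valid = partition(monthly_sales, n_valid=12)  # hold out the last 12 months
print(len(train), len(valid))  # 72 12
```

Setting `n_valid` to the forecast horizon (12 months here) keeps the validation period the same length as the forecasts the store actually needs.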
2. Why did the analyst choose a 12 month validation period?
The analyst chose a 12-month validation period because the store wants a sales forecast for the next 12 months. Matching the validation period to the forecast horizon lets the analyst test the model at every horizon from 1 to 12 months ahead, checking the size of the forecast errors, signs of overfitting, and differences between actual and predicted values.
3. Naive forecasts for the 12-month validation period (2001):
      Jan       Feb       Mar       Apr       May       Jun
2001  7615.03   9849.69   14558.40  11587.33  9332.56   13082.09
      Jul       Aug       Sep       Oct       Nov       Dec
2001  16732.78  19888.61  23933.38  25391.35  36024.80  80721.71
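Assuming these are seasonal naive forecasts (the training-set MASE of exactly 1 in the accuracy output is consistent with that), each 2001 value is simply the sales figure for the same month one year earlier. The Python sketch below shows that rule; `seasonal_naive` is a hypothetical helper, and its input is the last 12 training values, which under this assumption are the numbers listed above.

```python
from typing import List, Sequence


def seasonal_naive(train: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Forecast each future period with the value from the same season one cycle earlier.

    For horizons longer than one season, the last observed season is repeated.
    """
    if len(train) < season:
        raise ValueError("need at least one full season of training data")
    last_season = list(train[-season:])
    return [last_season[h % season] for h in range(horizon)]


# Last 12 training months (Jan-Dec 2000) under the seasonal naive assumption.
sales_2000 = [7615.03, 9849.69, 14558.40, 11587.33, 9332.56, 13082.09,
              16732.78, 19888.61, 23933.38, 25391.35, 36024.80, 80721.71]
forecast_2001 = seasonal_naive(sales_2000, horizon=12)
print(forecast_2001 == sales_2000)  # True
```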
4. Compute the RMSE and MAPE for the naive forecasts.
For the validation period (the "Test set" row in the accuracy output below):
RMSE = 9542.35
MAPE = 27.28%
              ME        RMSE      MAE       MPE       MAPE      MASE      ACF1       Theil's U
Training set  3401.361  6467.818  3744.801  22.39270  25.64127  1.000000  0.4140974  NA
Test set      7828.278  9542.346  7828.278  27.27926  27.27926  2.090439  0.2264895  0.7373759
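A minimal Python sketch of the main measures in the accuracy table, assuming errors are defined as actual minus forecast (the convention under which the positive ME above means the forecasts were too low). The function names and the three-period example are hypothetical, not taken from the analyst's R session.

```python
import math
from typing import List, Sequence


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Errors as actual minus forecast, so a positive error means under-forecasting."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    return [a - f for a, f in zip(actual, forecast)]


def me(actual, forecast) -> float:
    e = forecast_errors(actual, forecast)
    return sum(e) / len(e)


def mae(actual, forecast) -> float:
    e = forecast_errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def rmse(actual, forecast) -> float:
    e = forecast_errors(actual, forecast)
    return math.sqrt(sum(x * x for x in e) / len(e))


def mape(actual, forecast) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    e = forecast_errors(actual, forecast)
    return 100 * sum(abs(x / a) for x, a in zip(e, actual)) / len(e)


# Hypothetical three-period example (not the souvenir data).
actual = [100.0, 200.0, 400.0]
forecast = [90.0, 220.0, 400.0]
print(round(rmse(actual, forecast), 2), round(mape(actual, forecast), 2))  # 12.91 6.67
```

RMSE is in the units of the series (dollars of sales here), while MAPE is scale-free, which is why both are worth reporting.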
5. Plot a histogram of the forecast errors that result from the naive forecasts. Plot a time plot for the naive forecasts and the actual sales numbers in the validation period. What can you say about the behavior of the naive forecasts?
Here’s the histogram with density curve:
Here’s a time plot for the naive forecasts and the actual sales numbers in the validation period.
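The naive forecasts consistently under-forecast the actual souvenir sales. In the validation period the mean error equals the mean absolute error (both 7828.28), which means no forecast error is negative: the naive forecasts never exceed actual 2001 sales, and they miss by about 27% on average (MAPE = 27.28%). However, the forecasts do follow the same general seasonal pattern as the actual sales.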
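Whether a set of forecasts consistently under-forecasts can be checked from the errors themselves: if no error (actual minus forecast) is negative, the mean error equals the mean absolute error, as in the Test set row of the accuracy table. The Python sketch below, using only the standard library, counts error signs and prints a rough text histogram in place of a plotted one. The data and the helper names `error_summary` and `text_histogram` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def error_summary(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, object]:
    """Count error signs and report ME and MAE (errors = actual - forecast)."""
    e = [a - f for a, f in zip(actual, forecast)]
    return {
        "n_under": sum(x > 0 for x in e),  # forecast below actual
        "n_over": sum(x < 0 for x in e),   # forecast above actual
        "me": sum(e) / len(e),
        "mae": sum(abs(x) for x in e) / len(e),
        # When this is True, ME and MAE are equal.
        "never_over": all(x >= 0 for x in e),
    }


def text_histogram(values: Sequence[float], bin_width: float) -> List[str]:
    """Rough text stand-in for a histogram: one row of '*' per non-empty bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    rows = []
    for b in sorted(counts):
        lo = b * bin_width
        rows.append(f"[{lo:g}, {lo + bin_width:g}) {'*' * counts[b]}")
    return rows


# Hypothetical validation-period values (not the souvenir data).
actual = [120.0, 150.0, 90.0, 200.0]
forecast = [100.0, 130.0, 85.0, 150.0]
summary = error_summary(actual, forecast)
print(summary["n_under"], summary["n_over"], summary["never_over"])  # 4 0 True
for row in text_histogram([a - f for a, f in zip(actual, forecast)], bin_width=10):
    print(row)
```

A histogram that sits entirely to the right of zero is the visual version of the same under-forecasting signal.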
6. The analyst found a forecasting model that gives satisfactory performance on the validation set. What must she do to use the forecasting model for generating forecasts for year 2002?
To generate forecasts for 2002, she should first recombine the training and validation periods into a single series running through December 2001 and refit the chosen model to all of that data, so that the 2002 forecasts are based on the most recent observations. She can then produce the 12 monthly forecasts for January through December 2002. Before committing to the model, she could also use roll-forward validation: create multiple training-validation partitions by moving the partition point forward one period at a time, fit the model to each training period, and assess its performance on the corresponding validation period. Because 2002 immediately follows the available data, this does not require an unmanageable number of partitions, and she can keep refreshing her forecasts period by period as each new month of actual sales arrives.
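A minimal Python sketch of two steps: generating roll-forward partitions that move the partition point forward one period at a time, and, once a model is chosen, refitting it on the combined training and validation data before forecasting the next 12 months. The seasonal naive rule stands in for whatever model the analyst actually selected; `roll_forward_splits`, `seasonal_naive`, and the 36-month series are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def seasonal_naive(train: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Stand-in model (hypothetical choice): repeat the last observed season."""
    last_season = list(train[-season:])
    return [last_season[h % season] for h in range(horizon)]


def roll_forward_splits(series: Sequence[float], first_cut: int,
                        horizon: int = 1) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (train, valid) pairs, moving the partition point forward one period at a time."""
    for cut in range(first_cut, len(series) - horizon + 1):
        yield list(series[:cut]), list(series[cut:cut + horizon])


# Hypothetical 36-month series: months 1-24 train, 25-36 validation.
series = [float(m) for m in range(1, 37)]

# Roll-forward check: one-step-ahead forecast error at each partition point.
one_step_errors = [valid[0] - seasonal_naive(train, 1)[0]
                   for train, valid in roll_forward_splits(series, first_cut=24)]

# Production: refit on training + validation combined, then forecast the next 12 months.
next_year = seasonal_naive(series, horizon=12)
print(len(one_step_errors), next_year[:3])  # 12 [25.0, 26.0, 27.0]
```

For a seasonal naive model, refitting simply means using the most recent year; for a regression or smoothing model, the same recombine-then-refit step re-estimates its parameters on the full series.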
Additional helpful plots:
Distribution of the forecast errors in the training period:
Distribution of the forecast errors in the validation period:
7. If the goal is forecasting sales in future months, which of the following steps should be taken? (choose one or more)
- Partition the data into training and validation periods
- Look at MAPE and RMSE values for the validation period
- Compute naive forecasts