Which one of the following regression-based models would fit the series best?
The graph shows no seasonality in the data, which rules out both the linear trend with seasonality and the quadratic trend with seasonality models. The plotted data also has a “U” shape rather than a linear one, so the quadratic trend model would fit this series best.
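As a rough illustration of what a quadratic trend model looks like in R, the sketch below regresses a simulated U-shaped series on a time index and its square. The data and the names `t`, `y`, and `quad_fit` are assumptions made up for this example; they are not the series from the graph.

```r
# Simulated U-shaped series (hypothetical data, not the series in the question)
t <- 1:20
y <- (t - 10)^2 + 5

# Quadratic trend: regress the series on t and t^2
quad_fit <- lm(y ~ t + I(t^2))

# A positive coefficient on I(t^2) corresponds to a "U" shape
quad_coefs <- coef(quad_fit)
print(quad_coefs)
```

With the forecast package, the analogous call on a time series would be along the lines of `tslm(y.ts ~ trend + I(trend^2))`, where `y.ts` is a placeholder name.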
A) The forecaster decided that there is an exponential trend in the series. In order to fit a regression-based model that accounts for this trend, which of the following operations must be performed, either manually or by a function in R?
Take the logarithm of sales. An exponential trend in sales becomes a linear trend in log(sales), which a linear regression can then fit.
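To make the role of the logarithm concrete, here is a minimal sketch on simulated data (the starting level of 50000 and the growth rate of 0.01 are made-up values). Regressing log(y) on time recovers the growth rate as the slope, and `exp()` maps the fitted values back to the original scale.

```r
# Simulated quarterly series with exponential growth (hypothetical numbers)
t <- 1:24
growth_rate <- 0.01
y <- 50000 * exp(growth_rate * t)

# Regress log(y) on t: the slope estimates the growth rate
log_fit <- lm(log(y) ~ t)
slope <- coef(log_fit)[["t"]]
print(slope)

# Back-transform the fitted values to the original scale with exp()
fitted_original <- exp(fitted(log_fit))
```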
B) Fit a regression model with an exponential trend and seasonality, using only the first 20 quarters as the training period (remember to first partition the series into training and validation periods).
library(forecast) #provides tslm(), forecast(), and accuracy()
deptsales = read.csv("DeptStoreSales.csv", stringsAsFactors=FALSE)
deptsales.ts = ts(deptsales$Sales, start= c(1,1), frequency = 4)
#data partitioning
validlength = 4
trainlength = length(deptsales.ts) - validlength
train.ts = window(deptsales.ts, start = c(1,1), end= c(1, trainlength))
valid.ts = window(deptsales.ts, start= c(1, trainlength+1), end =c(1, trainlength + validlength))
#exponential model
dept_expo = tslm(train.ts ~ trend + season, lambda = 0)
summary(dept_expo)
##
## Call:
## tslm(formula = train.ts ~ trend + season, lambda = 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.053524 -0.013199 -0.004527 0.014387 0.062681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.748945 0.018725 574.057 < 2e-16 ***
## trend 0.011088 0.001295 8.561 3.70e-07 ***
## season2 0.024956 0.020764 1.202 0.248
## season3 0.165343 0.020884 7.917 9.79e-07 ***
## season4 0.433746 0.021084 20.572 2.10e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03277 on 15 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 7.63e+11 on 4 and 15 DF, p-value: < 2.2e-16
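C) A partial output is shown in Table 6.7. From the output, after adjusting for trend, are Q2 average sales higher, lower, or approximately equal to the average Q1 sales? Higher: the season2 coefficient (0.024956) is positive, although with a p-value of 0.248 the difference from Q1 is not statistically significant.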
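Because the model is fit on the log scale, a seasonal coefficient can be read as an approximate percentage difference from Q1. The short calculation below uses the season2 estimate from the summary output; the variable names are made up for this sketch.

```r
# season2 coefficient copied from the summary output above (log scale)
season2_coef <- 0.024956

# On the original scale, Q2 is a multiplicative factor relative to Q1
q2_factor <- exp(season2_coef)
q2_pct_higher <- (q2_factor - 1) * 100
print(round(q2_pct_higher, 2))
```

So, after adjusting for trend, the fitted Q2 sales are roughly 2.5% above Q1.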
D) Use this model to forecast sales in quarters 21 and 22.
valid_period_forecasts = forecast(dept_expo, h= validlength)
(valid_period_forecasts)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 58793.71 55790.19 61958.92 54090.84 63905.46
## 6 Q2 60951.51 57837.76 64232.89 56076.04 66250.87
## 6 Q3 70920.09 67297.09 74738.13 65247.24 77086.15
## 6 Q4 93788.66 88997.41 98837.86 86286.57 101943.01
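As a sanity check, the point forecasts for quarters 21 and 22 can be approximately reproduced by hand from the coefficients in the summary output. The values below are the rounded estimates copied from that output, so small differences from `forecast()` are expected.

```r
# Coefficients copied (rounded) from the summary output above
intercept <- 10.748945
trend     <- 0.011088
season2   <- 0.024956

# Quarter 21 falls in Q1 (the reference season), quarter 22 in Q2
log_q21 <- intercept + trend * 21
log_q22 <- intercept + trend * 22 + season2

# Back-transform from the log scale to sales
q21 <- exp(log_q21)
q22 <- exp(log_q22)
print(c(q21 = q21, q22 = q22))
```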
E) The plots shown in Figure 6.13 describe the fit (top) and forecast errors (bottom) from this regression model.
**I. Recreate these plots.**
yrange= range(deptsales.ts)
#set up plot
plot(c(1,7), yrange, type= "n", xlab="year", ylab= "Dept Sales", bty= "l", xaxt = "n", yaxt= "n")
#add time series
lines(deptsales.ts, bty= "l")
axis(1,at=seq(1,7,1),labels=format(seq(1,7,1)))
axis(2,at=seq(45000,105000,5000),labels=format(seq(45,105,5)),las=2)
lines(dept_expo$fitted,col="red")
lines(valid_period_forecasts$mean,col="blue",lty=2)
legend(1,105000,c("Actual (sales)","Exponential Trend + Season","Forecasts"),lty=c(1,1,2),col=c("black","red","blue"),bty="n")
**II. Based on these plots, what can you say about your forecasts for quarters Q21 and Q22? Are they likely to overforecast, under-forecast, or be reasonably close to the real sales values?**
valid.ts - valid_period_forecasts$mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 6 2006.290 3948.490 6076.915 9548.339
plot(valid.ts - valid_period_forecasts$mean, type= "o", bty= "l")
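The validation errors (actual minus forecast) are all positive and grow from Qtr1 to Qtr4, so the forecasts are likely to under-forecast.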
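The sign convention behind that conclusion can be checked directly. This sketch reuses the four validation errors printed above; the variable names are made up for this example.

```r
# Validation errors copied from the output above (actual minus forecast)
valid_errors <- c(Q1 = 2006.290, Q2 = 3948.490, Q3 = 6076.915, Q4 = 9548.339)

# Positive error: actual > forecast, i.e. the model under-forecasts
all_under <- all(valid_errors > 0)
mean_error <- mean(valid_errors)
print(all_under)
print(mean_error)
```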
F) Looking at the residual plot, which of the following statements appear true? The regression model fits the data well.
G) Which of the following solutions is adequate and a parsimonious solution for improving model fit? Fit a quadratic trend model to sales.
souv_sales= read.csv("SouvenirSales.csv", stringsAsFactors = FALSE)
A) Based on the two time plots in Figure 6.14, which predictors should be included in the regression model? What is the total number of predictors in the model?
Based on the figure, there is an obvious linear trend and monthly seasonality. This means that there will be 11 dummy variables, plus the trend, so 12 predictors total for the model.
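The predictor count can be verified with a small design-matrix sketch. The 24-month toy index and the names `trend`, `month`, and `X` are assumptions for illustration only; the point is that a 12-level factor expands into 11 dummy columns.

```r
# Hypothetical 24-month design: a trend index plus a month factor
trend <- 1:24
month <- factor(rep(month.abb, times = 2), levels = month.abb)

# model.matrix() expands the factor into dummies, dropping the reference month
X <- model.matrix(~ trend + month)

# Subtract 1 for the intercept column, which is not a predictor
n_predictors <- ncol(X) - 1
print(n_predictors)
```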
B) Run a regression model with Sales as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Call this Model A.
souv.ts= ts(souv_sales$Sales, start = c(1995,1), frequency = 12)
valid_length= 12
train_length= length(souv.ts) - valid_length
souv_sales_train= window(souv.ts, end = c(1995, train_length))
souv_sales_valid= window(souv.ts, start= c(1995, train_length+1))
#create model A
model_a = tslm(souv_sales_train ~ trend + season)
summary(model_a)
##
## Call:
## tslm(formula = souv_sales_train ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12592 -2359 -411 1940 33651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3065.55 2640.26 -1.161 0.25029
## trend 245.36 34.08 7.199 1.24e-09 ***
## season2 1119.38 3422.06 0.327 0.74474
## season3 4408.84 3422.56 1.288 0.20272
## season4 1462.57 3423.41 0.427 0.67077
## season5 1446.19 3424.60 0.422 0.67434
## season6 1867.98 3426.13 0.545 0.58766
## season7 2988.56 3427.99 0.872 0.38684
## season8 3227.58 3430.19 0.941 0.35058
## season9 3955.56 3432.73 1.152 0.25384
## season10 4821.66 3435.61 1.403 0.16573
## season11 11524.64 3438.82 3.351 0.00141 **
## season12 32469.55 3442.36 9.432 2.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared: 0.7903, Adjusted R-squared: 0.7476
## F-statistic: 18.53 on 12 and 59 DF, p-value: 9.435e-16
I. Examine coefficients: Which month tends to have the highest average sales during the year? Why is this reasonable? December has the highest average sales (season12 has the largest coefficient, 32469.55). This makes sense because the shop is in Australia, where the seasons are opposite to those in the Northern Hemisphere, so December falls in summer.
II. What does the trend coefficient of model A mean? The trend coefficient is positive, so sales increase over time. Specifically, after accounting for seasonality, sales increase by an average of 245.36 per month.
C) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Call this Model B.
model_b = tslm(log(souv_sales_train) ~ trend + season)
summary(model_b)
##
## Call:
## tslm(formula = log(souv_sales_train) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
I. Fitting a model to log(Sales) with a linear trend is equivalent to fitting a model to Sales (in dollars) with what type of trend?
Exponential
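The equivalence can be written out as a short derivation. The symbols ($y_t$ for Sales in month $t$, $\beta_0$, $\beta_1$, $\varepsilon_t$) are notation assumed here for illustration, and the seasonal terms are left out for brevity:

```latex
\log(y_t) = \beta_0 + \beta_1 t + \varepsilon_t
\quad\Longrightarrow\quad
y_t = e^{\beta_0}\, e^{\beta_1 t}\, e^{\varepsilon_t}
```

A linear trend in $\log(y_t)$ is therefore a multiplicative, exponential trend in $y_t$.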
II. What does the estimated trend coefficient of Model B mean? The trend coefficient of 0.02112 means that, after accounting for seasonality, sales increase by about 2.1% per month.
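Reading a log-scale coefficient as a percentage is an approximation; the exact growth factor is `exp(coefficient)`. The sketch below uses the trend estimate from the Model B summary and, as an extra step not asked for in the question, compounds it over 12 months.

```r
# Trend coefficient copied from the Model B summary above (log scale)
trend_coef <- 0.021120

# Exact month-over-month growth implied by the coefficient
monthly_pct <- (exp(trend_coef) - 1) * 100

# Compounded over 12 months, comparing the same calendar month a year apart
annual_pct <- (exp(12 * trend_coef) - 1) * 100
print(round(c(monthly = monthly_pct, annual = annual_pct), 2))
```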
III. Use this model to forecast the sales in February 2002.
feb_forecast <- model_b$coefficients["(Intercept)"] + model_b$coefficients["trend"]*86 + model_b$coefficients["season2"]
exp(feb_forecast)
## (Intercept)
## 17062.99
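The trend value of 86 used above follows from counting months since the start of the series. A minimal sketch of that arithmetic, using the rounded coefficients from the Model B summary (so the result differs slightly from the value computed from the fitted model):

```r
# The series starts in January 1995, so that month has trend = 1
start_year <- 1995
target_year <- 2002
target_month <- 2  # February

trend_index <- (target_year - start_year) * 12 + target_month

# Coefficients copied (rounded) from the Model B summary above
log_forecast <- 7.646363 + 0.021120 * trend_index + 0.282015
feb_2002 <- exp(log_forecast)
print(trend_index)
print(feb_2002)
```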
D) Compare the two regression models in terms of forecast performance. Which model is preferable for forecasting? Mention at least two reasons based on the information in the outputs. As the accuracy outputs below show, Model B is better for forecasting: on the validation period it has a much lower MAPE than Model A (15.5% versus 26.7%), as well as a lower ME, MAE, and RMSE.
Model A forecast accuracy:
model_a_forecast = forecast(model_a, h= valid_length)
accuracy(model_a_forecast$mean, souv_sales_valid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 8251.513 17451.55 10055.28 10.53397 26.66568 0.3206228 0.9075924
Model B forecast accuracy:
model_b_forecast= forecast(model_b, h= valid_length)
accuracy(exp(model_b_forecast$mean), souv_sales_valid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 4824.494 7101.444 5191.669 12.35943 15.5191 0.4245018 0.4610253
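For readers who want to see what the measures in these tables compute, here is a hand-rolled sketch of ME, RMSE, and MAPE with errors taken as actual minus forecast. The helper `forecast_measures()` and the toy numbers are hypothetical; this is not the forecast package's implementation of `accuracy()`.

```r
# Hand-rolled versions of three measures (errors are actual minus forecast)
forecast_measures <- function(actual, predicted) {
  e <- actual - predicted
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAPE = mean(abs(e / actual)) * 100)
}

# Hypothetical toy values, not the souvenir data
toy <- forecast_measures(actual = c(100, 200), predicted = c(90, 220))
print(toy)
```

Because MAPE is scale-free while RMSE penalizes large misses, comparing both gives a fuller picture than either one alone.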
E) How would you model this data differently if the goal was understanding the different components of sales in the souvenir shop between 1995 and 2001? Mention two differences.
In a purely descriptive analysis there would be no need to forecast sales, so I would skip the forecasting step. I would also focus more on data exploration, such as checking for white noise and running a Ljung-Box test.
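A Ljung-Box test is available in base R through `Box.test()`. The sketch below runs it on simulated white noise as a stand-in for model residuals; the series, the seed, and the choice of 12 lags are assumptions for illustration.

```r
# Simulated stand-in for model residuals (hypothetical white noise)
set.seed(1)
resid_sim <- rnorm(72)

# Ljung-Box test from base R's stats package; H0: no autocorrelation up to lag 12
lb <- Box.test(resid_sim, lag = 12, type = "Ljung-Box")
print(lb$p.value)
```

A large p-value means the test finds no evidence of autocorrelation, which is consistent with white noise; a small one suggests structure the model has not captured.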