Question 2

Which one of the following regression-based models would fit the series best?

The graph shows no seasonality, which eliminates both the linear trend with seasonality and the quadratic trend with seasonality models. The plotted data also has a “U” shape rather than a linear one, so the quadratic trend model would fit this series best.
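As a rough illustration of why a squared trend term captures a “U” shape, here is a minimal sketch on simulated data (the series, seed, and variable names are made up for illustration, not the textbook data). It uses base R's lm with an explicit time index so it runs without extra packages; with the forecast package, the analogous formula would be trend + I(trend^2) inside tslm.

```r
# Hypothetical U-shaped series (simulated; NOT the series from the question)
set.seed(1)
t <- 1:40
y <- 50 + 0.1 * (t - 20)^2 + rnorm(length(t), sd = 1)

linear_fit    <- lm(y ~ t)            # linear trend only
quadratic_fit <- lm(y ~ t + I(t^2))   # linear trend plus squared trend term

# Residual standard error of each model; smaller means a closer fit
c(linear = sigma(linear_fit), quadratic = sigma(quadratic_fit))
```

On a series like this the quadratic model's residual error should be far smaller, which is the same reasoning used to pick the quadratic trend above.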

Question 4

A) The forecaster decided that there is an exponential trend in the series. In order to fit a regression-based model that accounts for this trend, which of the following operations must be performed, either manually or by a function in R?

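Take the logarithm of sales. An exponential trend in sales is a linear trend in log(sales), so the regression is fit to the log-transformed series (in tslm, lambda = 0 applies this transformation).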
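To see why the log is needed, here is a small sketch on simulated data (the 3% growth rate, noise level, and names are assumptions for illustration only). Regressing log(sales) on time with base R's lm recovers the growth rate, and exponentiating the fitted values returns them to the original scale.

```r
# Hypothetical series growing about 3% per period with multiplicative noise
set.seed(42)
t <- 1:24
sales <- 1000 * exp(0.03 * t) * exp(rnorm(length(t), sd = 0.02))

# A linear trend in log(sales) corresponds to an exponential trend in sales
log_fit <- lm(log(sales) ~ t)

growth_rate  <- exp(coef(log_fit)[["t"]]) - 1   # estimated growth per period
fitted_sales <- exp(fitted(log_fit))            # back-transform to sales units
growth_rate
```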

B) Fit a regression model with an exponential trend and seasonality, using only the first 20 quarters as the training period (remember to first partition the series into training and validation periods).

library(forecast)  # provides tslm(), forecast(), and accuracy()
deptsales = read.csv("DeptStoreSales.csv", stringsAsFactors=FALSE)
deptsales.ts = ts(deptsales$Sales, start= c(1,1), frequency = 4)
#data partitioning
validlength = 4
trainlength = length(deptsales.ts) - validlength
train.ts = window(deptsales.ts, start = c(1,1), end= c(1, trainlength))
valid.ts = window(deptsales.ts, start= c(1, trainlength+1), end =c(1, trainlength + validlength))


#exponential model
dept_expo = tslm(train.ts ~ trend + season, lambda = 0)
summary(dept_expo)
## 
## Call:
## tslm(formula = train.ts ~ trend + season, lambda = 0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.053524 -0.013199 -0.004527  0.014387  0.062681 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.748945   0.018725 574.057  < 2e-16 ***
## trend        0.011088   0.001295   8.561 3.70e-07 ***
## season2      0.024956   0.020764   1.202    0.248    
## season3      0.165343   0.020884   7.917 9.79e-07 ***
## season4      0.433746   0.021084  20.572 2.10e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03277 on 15 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.63e+11 on 4 and 15 DF,  p-value: < 2.2e-16

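C) A partial output is shown in Table 6.7. From the output, after adjusting for trend, are Q2 average sales higher, lower, or approximately equal to the average Q1 sales? Higher: the season2 coefficient is positive (0.025 on the log scale, roughly 2.5% above Q1), although with p = 0.248 the difference is not statistically significant.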

D) Use this model to forecast sales in quarters 21 and 22.

valid_period_forecasts = forecast(dept_expo, h= validlength)
(valid_period_forecasts)
##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 6 Q1       58793.71 55790.19 61958.92 54090.84  63905.46
## 6 Q2       60951.51 57837.76 64232.89 56076.04  66250.87
## 6 Q3       70920.09 67297.09 74738.13 65247.24  77086.15
## 6 Q4       93788.66 88997.41 98837.86 86286.57 101943.01
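As a sanity check, the point forecasts can be reproduced by hand from the printed coefficients. The sketch below assumes the series starts in Q1 at t = 1 (so quarter 21 is a Q1) and uses coefficients rounded as in the summary output, so the results should agree with the table only up to rounding. The helper name forecast_quarter is made up for illustration.

```r
# Coefficients copied (rounded) from the summary(dept_expo) output
b0       <- 10.748945
b_trend  <- 0.011088
b_season <- c(Q1 = 0, Q2 = 0.024956, Q3 = 0.165343, Q4 = 0.433746)

# Hypothetical helper: forecast for time index t, assuming t = 1 is a Q1
forecast_quarter <- function(t) {
  q <- (t - 1) %% 4 + 1                       # quarter of the year, 1 to 4
  exp(b0 + b_trend * t + b_season[[q]])       # back-transform from log scale
}

manual <- sapply(21:24, forecast_quarter)
round(manual, 1)
```

The first two values are the forecasts for quarters 21 and 22 asked for in the question.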

E) The plots shown in figure 6.13 describe the fit (top) and forecast errors (bottom) from this regression model.

 **I. Recreate these plots.**
yrange= range(deptsales.ts)

#set up plot
plot(c(1,7), yrange, type= "n", xlab="year", ylab= "Dept Sales", bty= "l", xaxt = "n", yaxt= "n")

#add time series
lines(deptsales.ts, bty= "l")

axis(1,at=seq(1,7,1),labels=format(seq(1,7,1)))
axis(2,at=seq(45000,105000,5000),labels=format(seq(45,105,5)),las=2)
lines(dept_expo$fitted,col="red")
lines(valid_period_forecasts$mean,col="blue",lty=2)
legend(1,105000,c("Actual (sales)","Exponential Trend + Season","Forecasts"),lty=c(1,1,2),col=c("black","red","blue"),bty="n")

 **II. Based on these plots, what can you say about your forecasts for quarters Q21 and Q22? Are they likely to overforecast, under-forecast, or be reasonably close to the real sales values?**
 
valid.ts - valid_period_forecasts$mean
##       Qtr1     Qtr2     Qtr3     Qtr4
## 6 2006.290 3948.490 6076.915 9548.339
plot(valid.ts - valid_period_forecasts$mean, type= "o", bty= "l")

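They are likely to underforecast: all four validation errors (actual minus forecast) are positive and they grow with the horizon, so the forecasts fall increasingly short of the real sales values.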
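The sign convention can be checked directly on the validation errors printed above. This is just a small sketch using those four numbers; the vector name errors is made up.

```r
# Validation errors copied from the output above (actual - forecast)
errors <- c(Q1 = 2006.290, Q2 = 3948.490, Q3 = 6076.915, Q4 = 9548.339)

all(errors > 0)   # TRUE when every forecast falls below the actual value
mean(errors)      # mean error (ME); positive means under-forecasting on average
diff(errors)      # change from one quarter to the next
```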

F) Looking at the residual plot, which of the following statements appear true? The regression model fits the data well.

G) Which of the following solutions is adequate and a parsimonious solution for improving model fit? Fit a quadratic trend model to sales.

Question 5

souv_sales= read.csv("SouvenirSales.csv", stringsAsFactors = FALSE)

A) Based on the two time plots in figure 6.14, which predictors should be included in the regression model? What is the total number of predictors in the model?

Based on the figure, there is an obvious linear trend and monthly seasonality. This means that there will be 11 dummy variables, plus the trend, so 12 predictors total for the model.
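The count can be confirmed by building the design matrix for a trend plus a 12-level month factor. This is a sketch with base R's model.matrix on a made-up 72-month index, not the actual souvenir data.

```r
# Hypothetical 6 years of monthly observations
t     <- 1:72
month <- factor(rep(month.abb, times = 6), levels = month.abb)

# Columns: intercept, trend, and one dummy per month except the reference (Jan)
X <- model.matrix(~ t + month)
n_predictors <- ncol(X) - 1   # drop the intercept column
n_predictors
colnames(X)
```

January has no column of its own because it is the reference level absorbed by the intercept.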

B) Run a regression model with Sales as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Call this Model A.

souv.ts= ts(souv_sales$Sales, start = c(1995,1), frequency = 12)

valid_length= 12
train_length= length(souv.ts) - valid_length

souv_sales_train= window(souv.ts, end = c(1995, train_length))
souv_sales_valid= window(souv.ts, start= c(1995, train_length+1))

#create model A
model_a = tslm(souv_sales_train ~ trend + season)
summary(model_a)
## 
## Call:
## tslm(formula = souv_sales_train ~ trend + season)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12592  -2359   -411   1940  33651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3065.55    2640.26  -1.161  0.25029    
## trend         245.36      34.08   7.199 1.24e-09 ***
## season2      1119.38    3422.06   0.327  0.74474    
## season3      4408.84    3422.56   1.288  0.20272    
## season4      1462.57    3423.41   0.427  0.67077    
## season5      1446.19    3424.60   0.422  0.67434    
## season6      1867.98    3426.13   0.545  0.58766    
## season7      2988.56    3427.99   0.872  0.38684    
## season8      3227.58    3430.19   0.941  0.35058    
## season9      3955.56    3432.73   1.152  0.25384    
## season10     4821.66    3435.61   1.403  0.16573    
## season11    11524.64    3438.82   3.351  0.00141 ** 
## season12    32469.55    3442.36   9.432 2.19e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared:  0.7903, Adjusted R-squared:  0.7476 
## F-statistic: 18.53 on 12 and 59 DF,  p-value: 9.435e-16

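I. Examine coefficients: Which month tends to have the highest average sales during the year? Why is this reasonable? December has the highest average sales: the season12 coefficient (32469.55) is the largest, so after adjusting for trend, December sales average about 32,470 more than January's. This makes sense because their seasons are opposite from ours, so December is summer for them, and it is also the Christmas shopping month.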

II. What does the trend coefficient of model A mean? The trend coefficient is positive, so sales increase over time: on average by about 245.36 per month, holding the month-of-year effect fixed.

C) Run a regression model with log(Sales) as the output variable and with a linear trend and monthly predictors. Remember to fit only the training period. Call this Model B.

model_b = tslm(log(souv_sales_train) ~ trend + season)
summary(model_b)
## 
## Call:
## tslm(formula = log(souv_sales_train) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16

I. Fitting a model to log(Sales) with a linear trend is equivalent to fitting a model to Sales (in dollars) with what type of trend?
Exponential

II. What does the estimated trend coefficient of Model B mean? The trend coefficient of 0.02112 means that sales increase by about 2.1% per month (exp(0.02112) - 1 ≈ 0.0213), after adjusting for seasonality.
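The conversion from a log-scale trend coefficient to a percentage growth rate can be sketched as follows, using the coefficient printed in the Model B summary. The annual figure simply compounds the monthly rate over 12 months and is an illustration, not part of the original answer.

```r
# Trend coefficient copied from the summary(model_b) output
b_trend <- 0.021120

monthly_growth <- exp(b_trend) - 1        # multiplicative growth per month
annual_growth  <- exp(12 * b_trend) - 1   # same rate compounded over 12 months

round(100 * c(monthly = monthly_growth, annual = annual_growth), 2)
```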

III. Use this model to forecast the sales in February 2002.

feb_forecast <- model_b$coefficients["(Intercept)"] + model_b$coefficients["trend"]*86 + model_b$coefficients["season2"]
exp(feb_forecast)
## (Intercept) 
##    17062.99
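The trend index of 86 used above comes from counting months since January 1995. A small sketch of that arithmetic follows, with the forecast recomputed from the rounded printed coefficients, so it should match the output only up to rounding. The 72 training months are inferred from the summary output (59 residual degrees of freedom plus 13 estimated coefficients).

```r
start_year <- 1995
n_train    <- 72   # Jan 1995 to Dec 2000, inferred from the model's degrees of freedom

# February 2002 as a month index, with January 1995 = 1
t_feb2002 <- (2002 - start_year) * 12 + 2

# Steps ahead of the end of the training period
h <- t_feb2002 - n_train

# Coefficients copied (rounded) from the summary(model_b) output
b0 <- 7.646363; b_trend <- 0.021120; b_feb <- 0.282015
feb_2002 <- exp(b0 + b_trend * t_feb2002 + b_feb)
c(t = t_feb2002, h = h, forecast = feb_2002)
```

Under these assumptions, the same value would be the last element of a forecast made h steps ahead from the training period, after exponentiating.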

D) Compare the two regression models in terms of forecast performance. Which model is preferable for forecasting? Mention at least two reasons based on the information in the outputs. As you can see below, Model B is better for forecasting: on the validation period it has a substantially lower MAPE than Model A (15.5% vs. 26.7%), as well as a lower RMSE (7,101 vs. 17,452), MAE (5,192 vs. 10,055), and ME (4,824 vs. 8,252).

Model A forecast accuracy:

model_a_forecast = forecast(model_a, h= valid_length)

accuracy(model_a_forecast$mean, souv_sales_valid)
##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 8251.513 17451.55 10055.28 10.53397 26.66568 0.3206228 0.9075924

Model B forecast accuracy:

model_b_forecast= forecast(model_b, h= valid_length)

accuracy(exp(model_b_forecast$mean), souv_sales_valid)
##                ME     RMSE      MAE      MPE    MAPE      ACF1 Theil's U
## Test set 4824.494 7101.444 5191.669 12.35943 15.5191 0.4245018 0.4610253
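For reference, the main measures in these tables can be computed by hand. The sketch below uses a made-up helper, accuracy_by_hand, and toy numbers (not the souvenir data), with errors defined as actual minus forecast and percentage errors taken relative to the actual values.

```r
# Hypothetical helper mirroring the usual definitions of the measures
accuracy_by_hand <- function(actual, predicted) {
  e <- actual - predicted                   # forecast errors
  c(ME   = mean(e),                         # mean error (bias)
    RMSE = sqrt(mean(e^2)),                 # root mean squared error
    MAE  = mean(abs(e)),                    # mean absolute error
    MPE  = 100 * mean(e / actual),          # mean percentage error
    MAPE = 100 * mean(abs(e / actual)))     # mean absolute percentage error
}

# Toy values for illustration only
actual    <- c(100, 120, 150, 200)
predicted <- c(90, 125, 140, 180)
m <- accuracy_by_hand(actual, predicted)
m
```

Because Model B is fit on the log scale, its forecasts have to be exponentiated before they are compared with actual sales, which is what the exp() call above does.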

E) How would you model this data differently if the goal was understanding the different components of sales in the souvenir shop between 1995 and 2001? Mention two differences.

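There would be no need to forecast sales if this were just a descriptive analysis, so I’d skip the forecasts, and with them the training/validation partition, and fit the model to the full series. I would also focus more on data exploration, such as checking whether the residuals look like white noise with a Ljung-Box test.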
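A minimal sketch of the Ljung-Box check, using Box.test from base R's stats package on simulated white noise (the input series is a stand-in, and the lag of 12 is an assumption chosen to cover one year of monthly data). In practice the input would be the residuals of the fitted model.

```r
# Stand-in for model residuals: simulated white noise, 72 monthly values
set.seed(7)
resid_like <- rnorm(72)

# Ljung-Box test of the null hypothesis of no autocorrelation up to lag 12
lb <- Box.test(resid_like, lag = 12, type = "Ljung-Box")
lb$p.value

# Sample autocorrelations, for a closer look at individual lags
acf(resid_like, plot = FALSE)
```

A small p-value would suggest the residuals still contain autocorrelation that the trend and seasonal terms have not captured.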