Problem 2. Analysis of Canadian Manufacturing Workers Work Hours As the graph shows the data trending downward and then sharply changing direction back upwards, we wouldn’t use a linear trend model. There is no apparent seasonality it seems from the data but it is hard to tell given the closeness of the years in the graph. There is however a a U-shaped trend similar to what we have seen in the Amtrak data. Therefore, the best solution would be to try a quadratic trend model with seasonality to be able to pick up any possible trends due to season or month.
Problem 3: Toys R US
First we import the data and then graph and examine it from the head to the tail.
library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
toys<-read.csv("ToysRUsRevenues.csv", stringsAsFactors = FALSE)
str(toys)
## 'data.frame': 16 obs. of 2 variables:
## $ QuarterYear : chr "Q1-92" "Q2-92" "Q3-92" "Q4-92" ...
## $ Revenue.in.million...: int 1026 1056 1182 2861 1172 1249 1346 3402 1286 1317 ...
head(toys)
## QuarterYear Revenue.in.million...
## 1 Q1-92 1026
## 2 Q2-92 1056
## 3 Q3-92 1182
## 4 Q4-92 2861
## 5 Q1-93 1172
## 6 Q2-93 1249
tail(toys)
## QuarterYear Revenue.in.million...
## 11 Q3-94 1449
## 12 Q4-94 3893
## 13 Q1-95 1462
## 14 Q2-95 1452
## 15 Q3-95 1631
## 16 Q4-95 4200
#Create a time series object out of it
toysTS <- ts(toys$Revenue, start=c(1992, 1),frequency=4)
yrange = range(toysTS)
#set up the plot
plot(c(1992,1995), yrange, type="n",xlab="Year", ylab="Revenue(millions)", bty="l", xaxt="n",yaxt="n")
#Add the time series air
lines(toysTS, bty="l")
#add the x-axis
axis(1, at=seq(1992,1995,1), labels=format(seq(1992,1995,1)))
#add the y-axis
axis(2, at=seq(1000,4500,500), labels=format(seq(1,4.5,.5)), las=2)
3.a - Fit a regression model with a linear trend and seasonal dummy variables. Use the entire series (excluding the last two quarters ) as the training period
ToysTrain<-window(toysTS,1992,c(1995,2))
ToysTrain
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 1026 1056 1182 2861
## 1993 1172 1249 1346 3402
## 1994 1286 1317 1449 3893
## 1995 1462 1452
toysLinearSeason <- tslm(ToysTrain~trend+season)
summary(toysLinearSeason)
##
## Call:
## tslm(formula = ToysTrain ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -335.90 -54.29 18.50 63.80 319.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 906.75 115.35 7.861 2.55e-05 ***
## trend 47.11 11.26 4.185 0.00236 **
## season2 -15.11 119.66 -0.126 0.90231
## season3 89.17 128.67 0.693 0.50582
## season4 2101.73 129.17 16.272 5.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 168.5 on 9 degrees of freedom
## Multiple R-squared: 0.9774, Adjusted R-squared: 0.9673
## F-statistic: 97.18 on 4 and 9 DF, p-value: 2.129e-07
A partial output of the regression model is shown in TAble 6.6 where season2 is the Quarter 2 dummy). i. Mention two statistics ( and their values) that measure how well this model fits the training period. The t-value of -0126 for the season2 variable is a measure of how many standard deviations our coefficient estimate is far away from zero. As in this case the value is not far away from zero which indicates a relationship does not exist between season2 and the dependent variable. The p-value for trend is 2.55e-05 and being less than 5% means that it is unlikely we will see a relationship between the trend variable and the dependent variable that is due to chance. ii. Mention two statistics (and their values) that measure the predictive accuracy of this model. The MAPE for the training set which has a value of 5.006914% is the mean absolute percentage error. This measure gives a percentage score of how forecasta deviate (on average) from the actual values. The MAE for the training set which has a value of 92.53061 is the Mean absolute error/deviation and gives the magnitude of the average absolute error. iii.After adjusting for trend, what is the average difference between sales in Q3 and sales in Q1? Quarter 3 has a coefficient of 89.17 which is the difference between it sales and the base sales which is Quarter1.
4.Forecasting Department Store Sales a. In order to fit a regression-based model that accounts for an exponential trend, a logarithm of sales must be performed. b.Fit a regression model with an exponential trend and seasonality, using only the first 20 quarters as the training period.
First we import plot and summarize the data.
#first lets import the data, plot and run a summary
library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
DeptSales <- read.csv("DeptStoreSales.csv", header=TRUE, stringsAsFactors=FALSE)
str(DeptSales)
## 'data.frame': 24 obs. of 2 variables:
## $ Quarter: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 50147 49325 57048 76781 48617 50898 58517 77691 50862 53028 ...
head(DeptSales)
## Quarter Sales
## 1 1 50147
## 2 2 49325
## 3 3 57048
## 4 4 76781
## 5 5 48617
## 6 6 50898
tail(DeptSales)
## Quarter Sales
## 19 19 71486
## 20 20 92183
## 21 21 60800
## 22 22 64900
## 23 23 76997
## 24 24 103337
#time series object
DeptSales.ts <- ts(DeptSales$Sales, start=c(1,1), freq=4)
yrange=range(DeptSales.ts)
#autoplotting
autoplot(DeptSales.ts)
Then we partition the series into training and validation data.
validLength<-4
trainLength<-length(DeptSales.ts)-validLength
trainLength
## [1] 20
SalesTrain<-window(DeptSales.ts,end=c(1,trainLength))
SalesValid<-window(DeptSales.ts,start=c(1,trainLength+1))
SalesTrain
## Qtr1 Qtr2 Qtr3 Qtr4
## 1 50147 49325 57048 76781
## 2 48617 50898 58517 77691
## 3 50862 53028 58849 79660
## 4 51640 54119 65681 85175
## 5 56405 60031 71486 92183
SalesValid
## Qtr1 Qtr2 Qtr3 Qtr4
## 6 60800 64900 76997 103337
Then we
#call it modelB (as in 5b) and have the dependent variable transformed
modelB<-tslm(log(SalesTrain) ~ trend + season)
summary(modelB)
##
## Call:
## tslm(formula = log(SalesTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.053524 -0.013199 -0.004527 0.014387 0.062681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.748945 0.018725 574.057 < 2e-16 ***
## trend 0.011088 0.001295 8.561 3.70e-07 ***
## season2 0.024956 0.020764 1.202 0.248
## season3 0.165343 0.020884 7.917 9.79e-07 ***
## season4 0.433746 0.021084 20.572 2.10e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03277 on 15 degrees of freedom
## Multiple R-squared: 0.9791, Adjusted R-squared: 0.9736
## F-statistic: 175.9 on 4 and 15 DF, p-value: 2.082e-12
A partial output is shown in Table 6.7. From the output, after adjusting for trend, are Q2 average sales higher, lower or approximately equal to the average Q1 sales? Well, even with the log transform of sales, the coefficient is quite small, but it is positive so the Q2 average sales are close to the Q1 sales only slightly higher.
e this model to forecast sales in Quarter 21 and 22.
# use the named coefficients
q21Forecast <- modelB$coefficients["(Intercept)"] + modelB$coefficients["trend"]*21
q22Forecast <- modelB$coefficients["(Intercept)"] + modelB$coefficients["trend"]*22 + modelB$coefficients["season2"]
#convert them back and simpy echo them
exp(q21Forecast)
## (Intercept)
## 58793.71
exp(q22Forecast)
## (Intercept)
## 60951.51
#Helps set up the plot
yrange = range(DeptSales.ts)
#set up the plot
plot(c(1,6),c(yrange[1],yrange[2]),type="n",xlab="Year", ylab="Department Sales", bty="l",xaxt="n",yaxt="n")
#add the time series air
lines(DeptSales.ts, bty="l")
#add the x-axis
axis(1, at=seq(1,6,1), labels=format(seq(1,6,1)))
#add the y-axis
axis(2, at=seq(45000,105000,5000), labels=format(seq(45,105,5.0)), las=2)
#add the y-axis
lines(exp(modelB$fitted), col="red")
And the forecast error plot…
forecasterror<-DeptSales.ts-exp(modelB$fitted)
autoplot(forecasterror)
It seems that the forecast errors from midyear of year 4 are always positive and therefore we can expect under-forecasting of Qauters 21 and 22.
given the low value of the forecast errors when compared to the actual sales value, I would say that the regression model fits the data well.
Which of the following solutions is adequate and a aparsimonious solution for improving model fit? You could fit a quadratic trend model to Sales but with an adjusted R squared of already 97+% can we really say that fitting a quadratic trend model is parsimonious. To gain at most a 1 to 2 % increase in the adjusted R squared, aren’t we sacrificing model simplicity?
Problem 5. Souvenir Sales
#first lets import the data, plot and run a summary
library(forecast)
library(ggplot2)
setwd("~/USM_MBA_678")
SouvSales <- read.csv("SouvenirSales.csv", header=TRUE, stringsAsFactors=FALSE)
str(SouvSales)
## 'data.frame': 84 obs. of 2 variables:
## $ Date : chr "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
## $ Sales: num 1665 2398 2841 3547 3753 ...
head(SouvSales)
## Date Sales
## 1 Jan-95 1664.81
## 2 Feb-95 2397.53
## 3 Mar-95 2840.71
## 4 Apr-95 3547.29
## 5 May-95 3752.96
## 6 Jun-95 3714.74
tail(SouvSales)
## Date Sales
## 79 Jul-01 26155.15
## 80 Aug-01 28586.52
## 81 Sep-01 30505.41
## 82 Oct-01 30821.33
## 83 Nov-01 46634.38
## 84 Dec-01 104660.67
#time series object
SouvSales.ts <- ts(SouvSales$Sales, start=c(1,1), freq=12)
yrange=range(SouvSales.ts)
#autoplotting
autoplot(SouvSales.ts)
Then we partition the series into training and validation data.
validSouvLength<-12
trainSouvLength<-length(SouvSales.ts)-validSouvLength
trainSouvLength
## [1] 72
SouvTrain<-window(SouvSales.ts,end=c(1,trainSouvLength))
SouvValid<-window(SouvSales.ts,start=c(1,trainSouvLength+1))
SouvTrain
## Jan Feb Mar Apr May Jun Jul Aug
## 1 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74 4349.61 3566.34
## 2 2499.81 5198.24 7225.14 4806.03 5900.88 4951.34 6179.12 4752.15
## 3 4717.02 5702.63 9957.58 5304.78 6492.43 6630.80 7349.62 8176.62
## 4 5921.10 5814.58 12421.25 6369.77 7609.12 7224.75 8121.22 7979.25
## 5 4826.64 6470.23 9638.77 8821.17 8722.37 10209.48 11276.55 12552.22
## 6 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78 19888.61
## Sep Oct Nov Dec
## 1 5021.82 6423.48 7600.60 19756.21
## 2 5496.43 5835.10 12600.08 28541.72
## 3 8573.17 9690.50 15151.84 34061.01
## 4 8093.06 8476.70 17914.66 30114.41
## 5 11637.39 13606.89 21822.11 45060.69
## 6 23933.38 25391.35 36024.80 80721.71
SouvValid
## Jan Feb Mar Apr May Jun Jul
## 7 10243.24 11266.88 21826.84 17357.33 15997.79 18601.53 26155.15
## Aug Sep Oct Nov Dec
## 7 28586.52 30505.41 30821.33 46634.38 104660.67
#Linear trend with seasonality
SouvLinearSeason <- tslm(SouvTrain~trend+season)
summary(SouvLinearSeason)
##
## Call:
## tslm(formula = SouvTrain ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12592 -2359 -411 1940 33651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3065.55 2640.26 -1.161 0.25029
## trend 245.36 34.08 7.199 1.24e-09 ***
## season2 1119.38 3422.06 0.327 0.74474
## season3 4408.84 3422.56 1.288 0.20272
## season4 1462.57 3423.41 0.427 0.67077
## season5 1446.19 3424.60 0.422 0.67434
## season6 1867.98 3426.13 0.545 0.58766
## season7 2988.56 3427.99 0.872 0.38684
## season8 3227.58 3430.19 0.941 0.35058
## season9 3955.56 3432.73 1.152 0.25384
## season10 4821.66 3435.61 1.403 0.16573
## season11 11524.64 3438.82 3.351 0.00141 **
## season12 32469.55 3442.36 9.432 2.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared: 0.7903, Adjusted R-squared: 0.7476
## F-statistic: 18.53 on 12 and 59 DF, p-value: 9.435e-16
b.i. December appears to have the highest Souvenir sales. This is reasonable if December is the principal vacation month or has holidays where travel to Australia is likely. b.ii The trend coefficient of model A is that the coefficient represent the average sales difference between that month and January.
c Run a log(Sales) model
#Linear trend with seasonality
SouvModelB <- tslm(log(SouvTrain)~trend+season)
summary(SouvModelB)
##
## Call:
## tslm(formula = log(SouvTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
c.i. fitting a model to log(Sales) with a linear trend is equivalent to fitting a model to Sales with an exponential trend.
c.ii. That there is a .0211 increase in the log(sales) for each month calculated from the base. c.iii. Sales in February 2002 are 5568.
# use the named coefficients
janForecast <- SouvModelB$coefficients["(Intercept)"] + modelB$coefficients["trend"]*85
febForecast <- SouvModelB$coefficients["(Intercept)"] + modelB$coefficients["trend"]*86 + modelB$coefficients["season2"]
#convert them back and simpy echo them
exp(janForecast)
## (Intercept)
## 5371.333
exp(febForecast)
## (Intercept)
## 5568.468
e)Whatever these components are - which is not clear from the question, I would separate the different components into separate datasets and try to predict sales accordingly. In other words, if the components are different types of products, then I would create multiple models to better understand the trends between those years for the particular product. I could also create dummy variables for the different products which would take into account when the products became discontinued.