The data set fancy (in the fma package) concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff. Study the data set carefully.
plot(fancy,
xlab="Year",
ylab="Monthly Sales",
main="Monthly Sales of a Shop in the Fancy Data Set")
The time plot above shows a seasonality with both spikes at every last month of the year and at every march. December having the biggest spike while march with a smaller spike. This is supported by large influx of visitors to the town at Christmas and for the local surfing festival held every March. Also, there is an increase of sales volume overtime as the shop was expanding.
It is necesssary to take logarithms of these data to visualize the seasonality while setting aside the expanding changes in the data over time.
logFancy <- log(fancy)
festDummy <- rep(0, length(fancy))
festDummy[seq_along(festDummy)%%12 == 3] <- 1
festDummy[3] <- 0
festDummy <- ts(festDummy, frequency = 12, start = c(1987,1))
fancyData <- data.frame(logFancy, festDummy)
fittedData <- tslm(logFancy~trend + season + festDummy, data=fancy)
summary(fittedData)
##
## Call:
## tslm(formula = logFancy ~ trend + season + festDummy, data = fancy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## festDummy 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
plot(residuals(fittedData), type="p", ylab="Residuals")
plot(as.numeric(fitted(fittedData)), residuals(fittedData),
xlab = "Predicted Scores", ylab = "Residuals")
The plots above does not reveal any problems with the model hence both of them have shown residuals to be random.
boxplot(residuals(fittedData)~cycle(residuals(fittedData)))
The boxplots above show wider variance towards August and September which will mean that the model somehow is missing out seasonality at this period.
summary(fittedData)
##
## Call:
## tslm(formula = logFancy ~ trend + season + festDummy, data = fancy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## festDummy 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
summary(fittedData$coefficients)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02202 0.39041 0.54474 1.12051 0.72788 7.61967
From the table above, all coefficients are seen to be significant except for season 3. This is because we represented the surfing festival using the dummy variable making it non significant for the model.
checkresiduals(fittedData)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
Testing the model using Breusch-Godfrey test, we have:
Null hypothesis: There is no autocorrelation among residuals.
Alternative hypothesis: There exists autocorrelation among residuals.
Significance level: 0.005
Test Statistic: LM test = 37.95, p-value = 0.002
Decision Rule: If the p-value is less than 0.05 then we can reject the null hypothesis and conclude that autocorrelation exists among the residuals.
Decision: Since p-value = 0.002 < 0.05 = significance level, then we reject the null hypothesis.
Conclusion: Autocorrelation still exists among residuals at the 5% level of significance.