1a

Seasonal fluctuations in the data match the anecdotal description. There is a large peak in sales in December each year for Christmas and a smaller peak for the March festival starting in 1988. Additionally, sales drop considerably in January. However, in 1992 and 1993, there is an unexpected change: the months outside of March and December are no longer flat (with, perhaps a small increasing trend). Instead, there is an upward slope showing sales increasing each month, with the December peaks becoming even larger.

library(fpp)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.3.2
## Loading required package: fma
## Warning: package 'fma' was built under R version 3.3.2
## Loading required package: expsmooth
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 3.3.2
data(fancy)

myts = ts(fancy, start=1987, frequency = 12)
plot(myts)

1b

This data shows variation that increases with the level of the series. To straighten the trend so that it can be fitted by a linear model, log transformation is necessary. Looking at my first time series plot, the graph seems to show exponential growth but looking at the new plot with log transformed data, the graph shows a more linear trend, stabilizing the variance.

log.myts = ts(log(fancy), start=1987, frequency = 12)
plot(log.myts)

1c

log.sales = log(fancy)
dum_fest = rep(0, length(fancy))
dum_fest[seq_along(dum_fest)%%12 == 3] = 1
dum_fest[3] = 0
dum_fest = ts(dum_fest, freq = 12, start=c(1987,1))
fancy.new = data.frame(log.sales, dum_fest)

fit <- tslm(log.sales ~ trend + seasonaldummy(log.myts) + dum_fest, data=fancy.new)
summary(fit)
## 
## Call:
## tslm(formula = log.sales ~ trend + seasonaldummy(log.myts) + 
##     dum_fest, data = fancy.new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 9.5819082  0.0784330 122.167  < 2e-16 ***
## trend                       0.0220198  0.0008268  26.634  < 2e-16 ***
## seasonaldummy(log.myts)Jan -1.9622412  0.0961066 -20.417  < 2e-16 ***
## seasonaldummy(log.myts)Feb -1.7108244  0.0960319 -17.815  < 2e-16 ***
## seasonaldummy(log.myts)Mar -1.6961584  0.1949340  -8.701 9.35e-13 ***
## seasonaldummy(log.myts)Apr -1.5781877  0.0959037 -16.456  < 2e-16 ***
## seasonaldummy(log.myts)May -1.5527543  0.0958503 -16.200  < 2e-16 ***
## seasonaldummy(log.myts)Jun -1.5134129  0.0958039 -15.797  < 2e-16 ***
## seasonaldummy(log.myts)Jul -1.3517867  0.0957647 -14.116  < 2e-16 ***
## seasonaldummy(log.myts)Aug -1.3742768  0.0957325 -14.355  < 2e-16 ***
## seasonaldummy(log.myts)Sep -1.2929114  0.0957075 -13.509  < 2e-16 ***
## seasonaldummy(log.myts)Oct -1.2148493  0.0956897 -12.696  < 2e-16 ***
## seasonaldummy(log.myts)Nov -0.7554933  0.0956790  -7.896 2.84e-11 ***
## dum_fest                    0.5015151  0.1964273   2.553   0.0129 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

1d

Residuals against time shows a slight upward trend from 1987 to 1989, downward from 1989 to 1991, and upward from 1991 to 1994. So, a nonlinear transformation might be needed to better fit the model. Residuals against fitted doesn’t show a pattern and the variation in the residuals does not seem to change with the size of the fitted value. Thus, the residuals are unbiased.

plot(residuals(fit), xlab="Predicted scores", ylab="Time", type="p")

plot(as.numeric(fitted(fit)), residuals(fit), xlab="Predicted scores", ylab="Residuals", type="p")

## 1e There is greater variability in August, September, and October. Given that this is a shop in a beach resort town, perhaps there is a weather component in effect during these months that the model does not account for.

boxplot(residuals(fit) ~ cycle(residuals(fit)))

1f

There is an upward trend of 2.20% per month. On average, January has sales 196% lower than December, February has sales 171% lower than December, March has sales 169% lower than December, April has sales 157% lower than December, May has sales 155% lower than December, June has sales 151% lower than December, July has sales 135% lower than December, August has sales 137% lower than December, September has sales 129% lower than December, October has sales 121% lower than December, November has sales 75.5% lower than December. On average, months with the surfing festival have sales 50.2% higher than months without the surfing festival. I was surprised that the March sales were so much lower than December sales compared to the other months but I think the dummy variable for festival is capturing that peak.

summary(fit)
## 
## Call:
## tslm(formula = log.sales ~ trend + seasonaldummy(log.myts) + 
##     dum_fest, data = fancy.new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 9.5819082  0.0784330 122.167  < 2e-16 ***
## trend                       0.0220198  0.0008268  26.634  < 2e-16 ***
## seasonaldummy(log.myts)Jan -1.9622412  0.0961066 -20.417  < 2e-16 ***
## seasonaldummy(log.myts)Feb -1.7108244  0.0960319 -17.815  < 2e-16 ***
## seasonaldummy(log.myts)Mar -1.6961584  0.1949340  -8.701 9.35e-13 ***
## seasonaldummy(log.myts)Apr -1.5781877  0.0959037 -16.456  < 2e-16 ***
## seasonaldummy(log.myts)May -1.5527543  0.0958503 -16.200  < 2e-16 ***
## seasonaldummy(log.myts)Jun -1.5134129  0.0958039 -15.797  < 2e-16 ***
## seasonaldummy(log.myts)Jul -1.3517867  0.0957647 -14.116  < 2e-16 ***
## seasonaldummy(log.myts)Aug -1.3742768  0.0957325 -14.355  < 2e-16 ***
## seasonaldummy(log.myts)Sep -1.2929114  0.0957075 -13.509  < 2e-16 ***
## seasonaldummy(log.myts)Oct -1.2148493  0.0956897 -12.696  < 2e-16 ***
## seasonaldummy(log.myts)Nov -0.7554933  0.0956790  -7.896 2.84e-11 ***
## dum_fest                    0.5015151  0.1964273   2.553   0.0129 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

1g

As the p-value is very small, we reject the null hypothesis that here is no lag one autocorrelation in the residuals. Since there is autocorrelation in the residuals, there is information remaining in the residuals that can be exploited to obtain better forecasts.

dwtest(fit, alt="two.sided")
## 
##  Durbin-Watson test
## 
## data:  fit
## DW = 0.88889, p-value = 1.956e-07
## alternative hypothesis: true autocorrelation is not 0

1h

predicted = forecast(fit, h=36, newdata=fancy.new)
predicted
##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 1994       9.491352  9.238522  9.744183  9.101594  9.88111
## Feb 1994       9.764789  9.511959 10.017620  9.375031 10.15455
## Mar 1994       9.801475  9.461879 10.141071  9.277961 10.32499
## Apr 1994       9.941465  9.688635 10.194296  9.551707 10.33122
## May 1994       9.988919  9.736088 10.241749  9.599161 10.37868
## Jun 1994      10.050280  9.797449 10.303110  9.660522 10.44004
## Jul 1994      10.233926  9.981095 10.486756  9.844168 10.62368
## Aug 1994      10.233456  9.980625 10.486286  9.843698 10.62321
## Sep 1994      10.336841 10.084010 10.589671  9.947083 10.72660
## Oct 1994      10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994      10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994      11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995       9.755590  9.499844 10.011336  9.361338 10.14984
## Feb 1995      10.029027  9.773281 10.284773  9.634775 10.42328
## Mar 1995      10.567228 10.310518 10.823938 10.171489 10.96297
## Apr 1995      10.205703  9.949957 10.461449  9.811451 10.59996
## May 1995      10.253157  9.997411 10.508903  9.858904 10.64741
## Jun 1995      10.314518 10.058772 10.570264  9.920265 10.70877
## Jul 1995      10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995      10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995      10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995      10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995      11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995      11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996      10.019828  9.760564 10.279093  9.620151 10.41951
## Feb 1996      10.293265 10.034000 10.552530  9.893588 10.69294
## Mar 1996      10.831466 10.571566 11.091365 10.430810 11.23212
## Apr 1996      10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996      10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996      10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996      10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996      10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996      10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996      10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996      11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996      12.224288 11.965023 12.483552 11.824611 12.62396
## Jan 1997      10.284066 10.020703 10.547429  9.878072 10.69006
## Feb 1997      10.557503 10.294140 10.820866 10.151508 10.96350
## Mar 1997      11.095704 10.832028 11.359379 10.689227 11.50218
## Apr 1997      10.734179 10.470816 10.997542 10.328185 11.14017
## May 1997      10.781633 10.518270 11.044995 10.375638 11.18763
## Jun 1997      10.842994 10.579631 11.106357 10.436999 11.24899
## Jul 1997      11.026640 10.763277 11.290003 10.620645 11.43263
## Aug 1997      11.026169 10.762807 11.289532 10.620175 11.43216
## Sep 1997      11.129555 10.866192 11.392918 10.723560 11.53555
## Oct 1997      11.229637 10.966274 11.493000 10.823642 11.63563
## Nov 1997      11.711012 11.447650 11.974375 11.305018 12.11701
## Dec 1997      12.488526 12.225163 12.751888 12.082531 12.89452
## Jan 1998      10.548304 10.280291 10.816318 10.135140 10.96147
## Feb 1998      10.821741 10.553727 11.089754 10.408577 11.23490
## Mar 1998      11.359942 11.091928 11.627955 10.946778 11.77311
## Apr 1998      10.998417 10.730404 11.266431 10.585253 11.41158
## May 1998      11.045870 10.777857 11.313884 10.632707 11.45903
## Jun 1998      11.107232 10.839218 11.375245 10.694068 11.52040
## Jul 1998      11.290878 11.022864 11.558891 10.877714 11.70404
## Aug 1998      11.290407 11.022394 11.558421 10.877244 11.70357
## Sep 1998      11.393793 11.125779 11.661806 10.980629 11.80696
## Oct 1998      11.493875 11.225861 11.761888 11.080711 11.90704
## Nov 1998      11.975250 11.707237 12.243264 11.562086 12.38841
## Dec 1998      12.752764 12.484750 13.020777 12.339600 13.16593
## Jan 1999      10.812542 10.539354 11.085731 10.391400 11.23368
## Feb 1999      11.085979 10.812790 11.359167 10.664837 11.50712
## Mar 1999      11.624180 11.351293 11.897067 11.203503 12.04486
## Apr 1999      11.262655 10.989467 11.535844 10.841513 11.68380
## May 1999      11.310108 11.036920 11.583297 10.888967 11.73125
## Jun 1999      11.371470 11.098281 11.644658 10.950328 11.79261
## Jul 1999      11.555116 11.281927 11.828304 11.133974 11.97626
## Aug 1999      11.554645 11.281457 11.827834 11.133504 11.97579
## Sep 1999      11.658031 11.384842 11.931219 11.236889 12.07917
## Oct 1999      11.758113 11.484924 12.031301 11.336971 12.17925
## Nov 1999      12.239488 11.966300 12.512677 11.818347 12.66063
## Dec 1999      13.017001 12.743813 13.290190 12.595860 13.43814
## Jan 2000      11.076780 10.797921 11.355639 10.646897 11.50666
## Feb 2000      11.350217 11.071358 11.629076 10.920334 11.78010
## Mar 2000      11.888418 11.610150 12.166685 11.459447 12.31739
## Apr 2000      11.526893 11.248034 11.805752 11.097010 11.95678
## May 2000      11.574346 11.295487 11.853205 11.144463 12.00423
## Jun 2000      11.635708 11.356849 11.914566 11.205825 12.06559
## Jul 2000      11.819354 11.540495 12.098212 11.389471 12.24924
## Aug 2000      11.818883 11.540024 12.097742 11.389000 12.24877
## Sep 2000      11.922269 11.643410 12.201127 11.492386 12.35215
## Oct 2000      12.022350 11.743492 12.301209 11.592468 12.45223
## Nov 2000      12.503726 12.224867 12.782585 12.073843 12.93361
## Dec 2000      13.281239 13.002381 13.560098 12.851357 13.71112

1i

rawdata = as.data.frame(predicted)
rawdata = exp(rawdata)
rawdata[1:36,]
##          Point Forecast     Lo 80     Hi 80      Lo 95     Hi 95
## Jan 1994       13244.70  10285.82  17054.73   8969.583  19557.43
## Feb 1994       17409.81  13520.45  22418.00  11790.284  25707.73
## Mar 1994       18060.36  12860.02  25363.61  10699.594  30484.96
## Apr 1994       20774.16  16133.21  26750.16  14068.696  30675.62
## May 1994       21783.73  16917.24  28050.15  14752.395  32166.37
## Jun 1994       23162.27  17987.81  29825.24  15685.969  34201.95
## Jul 1994       27831.56  21613.98  35837.72  18848.111  41096.73
## Aug 1994       27818.48  21603.82  35820.87  18839.249  41077.41
## Sep 1994       30848.42  23956.87  39722.43  20891.193  45551.50
## Oct 1994       34095.57  26478.61  43903.67  23090.230  50346.32
## Nov 1994       55176.84  42850.31  71049.28  37366.903  81475.41
## Dec 1994      120067.79  93244.59 154607.08  81312.400 177294.90
## Jan 1995       17250.40  13357.65  22277.59  11629.938  25587.08
## Feb 1995       22675.20  17558.28  29283.31  15287.252  33633.55
## Mar 1995       38840.85  30046.98  50208.44  26146.972  57697.39
## Apr 1995       27057.06  20951.33  34942.16  18241.435  40133.06
## May 1995       28371.96  21969.51  36640.25  19127.918  42083.42
## Jun 1995       30167.42  23359.80  38958.95  20338.387  44746.58
## Jul 1995       36248.88  28068.91  46812.70  24438.412  53767.06
## Aug 1995       36231.84  28055.72  46790.69  24426.922  53741.78
## Sep 1995       40178.16  31111.50  51887.06  27087.467  59595.26
## Oct 1995       44407.37  34386.35  57348.77  29938.733  65868.34
## Nov 1995       71864.42  55647.40  92807.48  48449.831 106594.69
## Dec 1995      156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996       22467.57  17336.40  29117.46  15065.329  33506.86
## Feb 1996       29533.04  22788.25  38274.14  19802.984  44043.89
## Mar 1996       50587.81  39009.73  65602.25  33887.802  75517.62
## Apr 1996       35240.15  27191.96  45670.42  23629.808  52555.15
## May 1996       36952.72  28513.41  47889.88  24778.151  55109.18
## Jun 1996       39291.20  30317.82  50920.48  26346.183  58596.65
## Jul 1996       47211.93  36429.60  61185.57  31657.322  70409.18
## Aug 1996       47189.73  36412.48  61156.80  31642.439  70376.07
## Sep 1996       52329.57  40378.47  67817.91  35088.887  78041.33
## Oct 1996       57837.85  44628.77  74956.52  38782.394  86256.08
## Nov 1996       93598.96  72222.70 121302.09  62761.521 139588.15
## Dec 1996      203676.38 157160.50 263959.89 136572.460 303751.35

1j

These predictions could be improved by determining if there are additional predictors that have been left out of the model. For instance, any changes that the store makes in the way that they conduct their business due to continued increased sales growth such as an increase in the store’s marketing, a change in the product selection, or store hours could all affect future sales. Additionally, if there are any weather effects on sales that could explain some of the variability in ranges for some of the later months of the year, that information would be useful in the model.

2a

data(texasgas)
plot(texasgas$price, texasgas$consumption)

2b The slope of the fitted line should change with price because as price increases, consumption of gas generall decreases; however, consumption flattens once the price is greater than 60 rather than continuing to decrease. It would seem that consumers in these Texas towns cannot survive on less than about 40 thousand cubic feet of natural gas, which is why their demand remains the same despite the increasing price.

2c

fit.model1 = lm(consumption ~ exp(price), texasgas)
summary(fit.model1)
## 
## Call:
## lm(formula = consumption ~ exp(price), data = texasgas)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -35.86 -25.09 -13.86  20.64  65.14 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.086e+01  7.670e+00   9.238 2.98e-08 ***
## exp(price)  -1.642e-43  1.711e-43  -0.959     0.35    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.19 on 18 degrees of freedom
## Multiple R-squared:  0.04864,    Adjusted R-squared:  -0.004214 
## F-statistic: 0.9203 on 1 and 18 DF,  p-value: 0.3501
(summary(fit.model1)$sigma)^2
## [1] 1101.359
fit.model3= lm(consumption ~ poly(price,2), texasgas)
summary(fit.model3)
## 
## Call:
## lm(formula = consumption ~ poly(price, 2), data = texasgas)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5601  -5.4693   0.7502  11.0252  25.6619 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       69.000      3.213  21.472 9.35e-14 ***
## poly(price, 2)1 -114.043     14.371  -7.936 4.07e-07 ***
## poly(price, 2)2   65.737     14.371   4.574 0.000269 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.37 on 17 degrees of freedom
## Multiple R-squared:  0.8315, Adjusted R-squared:  0.8117 
## F-statistic: 41.95 on 2 and 17 DF,  p-value: 2.666e-07
(summary(fit.model3)$sigma)^2
## [1] 206.5276