===================================================================================================

sales <- read.table("sales.txt")
sales[1] <- NULL
names(sales) <- c("Sales")
sales <- ts(sales, start=1999, frequency=12)

plot(sales,main="Time Series Plot")

From the time series plot above, we can see yearly seasonality in sales as well as some trend. The sales look to decrease until 2006, then increase after. Perhaps a parabolic trend.

1 Fit a “trend only” model.

Below we fit a quadratic trend, and superimpose the trend on the time series.

t <- time(sales) # Extracting time as the explanatory variate from the time series framework of data

reg0 <- lm(sales~t + I(t^2)) # Just model quadratic trend
plot(sales, main = "Time Series with Quadratic Trend")
points(t,predict.lm(reg0),type='l',col='red') # superimpose the fit of model reg0 on the plot of the data

We will now check the diagnostics of the residials to check if this quadratic model meets the model assumptions.

par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg0$fitted, reg0$residuals,main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals") # plot of fitted values vs residuals
qqnorm(reg0$residuals) #qq-plot of residuals
qqline(reg0$residuals,col="blue") # plotting the line, along which the dots in qq-plot should lie
plot(reg0$residuals,main = "Residuals vs Time", ylab = "Residuals") # plotting the residuals vs time
abline(h=0,lty=2,col ="blue") # plotting a horizontal line at 0
acf(reg0$residuals,main="ACF plot of Residuals")#sample acf plot of residuals

shapiro.test(reg0$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  reg0$residuals
## W = 0.99161, p-value = 0.5535

The p-value in excess of 0.05 for the Shapiro-Wilk normality test supports the null hypothesis of normal distribution of residuals. We come to the same conclusion by noticing that the qq-plot displays a straight line. No pattern is apparent on the plot of residuals against the predicted values, or the risiduals over time. Therefore, we consider the model adequate.

However, the ACF plot we notice that the residuals are showing some seasonality. We will continue by attempting to capture this in our model.

2 Add a seasonal component

month <- as.factor(cycle(sales)) # Introducing month as the season
# cycle(sales) # Introducing month as the season

reg1 <- lm(sales~t + I(t^2) + month) # Just model quadratic trend

par(mfrow=c(1,1))
plot(sales, main = "Time Series with Trend and Season ")
points(t,predict.lm(reg1),type='l',col='red') # superimpose the fit of model reg0 on the plot of the data

print(paste("r-squared = ", summary(reg1)$r.squared))
## [1] "r-squared =  0.95294470661997"

The model is fitting the data very well. 95% of the variation in the observed values is explained by our model. We also observe that all of the estimates are significant at the .05 significance level

We will now check the diagnostics of the residials to check if this model meets the model assumptions.

par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg1$fitted, reg1$residuals,main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals") # plot of fitted values vs residuals
qqnorm(reg1$residuals) #qq-plot of residuals
qqline(reg1$residuals,col="blue") # plotting the line, along which the dots in qq-plot should lie
plot(reg1$residuals,main = "Residuals vs Time", ylab = "Residuals") # plotting the residuals vs time
abline(h=0,lty=2,col ="blue") # plotting a horizontal line at 0
acf(reg1$residuals,main="ACF plot of Residuals")#sample acf plot of residuals

shapiro.test(reg1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  reg1$residuals
## W = 0.9899, p-value = 0.3871

The p-value in excess of 0.05 for the Shapiro-Wilk normality test supports the null hypothesis of normal distribution of residuals. We come to the same conclusion by noticing that the qq-plot displays a straight line. No pattern is apparent on the plot of residuals against the predicted values, or the risiduals over time. Therefore, we consider the model adequate.

In the ACF plot we notice that the residuals do not contain the seasonality, as we have succesfully captured this in the model. However the residials are still showing correlation over time, at lag 1-4.

3 Compare the fit of the models

The second model clearly fits the observed data better as it accounts for both the trend and seasonal nature of the data. It has a higher \(R^2\), and all the predictors are significant.

Neither of the models satisfy all of the fundemental assumptions of least squares regression as the error terms were correlated with the past error terms.

4 Forecast 2011

t.new <- seq(2011,2012,length=13)[1:12] # Intoducing new time for forecatsting 
t2.new <- t.new^2
month.new <- factor(rep(1:12)) # Introducing the seasonal value for forecasting

new <- data.frame(t=t.new, t2=t2.new, month=month.new) # Putting the values for forecasting into a dataframe
pred <- predict.lm(reg1,new,interval='prediction') # Computing the prediction as well as prediction interval

par(mfrow=c(1,1))
plot(sales,xlim=c(1999,2012),ylim=c(0,60),main = "2011 Forecast") #plotting the data

abline(v=2011,col='blue',lty=2) # adding a vertical line at the point where prediction starts
lines(pred[,1]~t.new,type='l',col='red')# plotting the predict
lines(pred[,2]~t.new,col='green') # plotting lower limit of the prediction interval
lines(pred[,3]~t.new,col='green') # plotting upper limit of the  prediction interval

We can see the predictions and corresponding 95% prediction intervals above.