===================================================================================================
sales <- read.table("sales.txt")
sales[1] <- NULL
names(sales) <- c("Sales")
sales <- ts(sales, start=1999, frequency=12)
plot(sales,main="Time Series Plot")
From the time series plot above, we can see yearly seasonality in sales as well as some trend. The sales look to decrease until 2006, then increase after. Perhaps a parabolic trend.
Below we fit a quadratic trend, and superimpose the trend on the time series.
t <- time(sales) # Extracting time as the explanatory variate from the time series framework of data
reg0 <- lm(sales~t + I(t^2)) # Just model quadratic trend
plot(sales, main = "Time Series with Quadratic Trend")
points(t,predict.lm(reg0),type='l',col='red') # superimpose the fit of model reg0 on the plot of the data
We will now check the diagnostics of the residials to check if this quadratic model meets the model assumptions.
par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg0$fitted, reg0$residuals,main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals") # plot of fitted values vs residuals
qqnorm(reg0$residuals) #qq-plot of residuals
qqline(reg0$residuals,col="blue") # plotting the line, along which the dots in qq-plot should lie
plot(reg0$residuals,main = "Residuals vs Time", ylab = "Residuals") # plotting the residuals vs time
abline(h=0,lty=2,col ="blue") # plotting a horizontal line at 0
acf(reg0$residuals,main="ACF plot of Residuals")#sample acf plot of residuals
shapiro.test(reg0$residuals)
##
## Shapiro-Wilk normality test
##
## data: reg0$residuals
## W = 0.99161, p-value = 0.5535
The p-value in excess of 0.05 for the Shapiro-Wilk normality test supports the null hypothesis of normal distribution of residuals. We come to the same conclusion by noticing that the qq-plot displays a straight line. No pattern is apparent on the plot of residuals against the predicted values, or the risiduals over time. Therefore, we consider the model adequate.
However, the ACF plot we notice that the residuals are showing some seasonality. We will continue by attempting to capture this in our model.
month <- as.factor(cycle(sales)) # Introducing month as the season
# cycle(sales) # Introducing month as the season
reg1 <- lm(sales~t + I(t^2) + month) # Just model quadratic trend
par(mfrow=c(1,1))
plot(sales, main = "Time Series with Trend and Season ")
points(t,predict.lm(reg1),type='l',col='red') # superimpose the fit of model reg0 on the plot of the data
print(paste("r-squared = ", summary(reg1)$r.squared))
## [1] "r-squared = 0.95294470661997"
The model is fitting the data very well. 95% of the variation in the observed values is explained by our model. We also observe that all of the estimates are significant at the .05 significance level
We will now check the diagnostics of the residials to check if this model meets the model assumptions.
par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg1$fitted, reg1$residuals,main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals") # plot of fitted values vs residuals
qqnorm(reg1$residuals) #qq-plot of residuals
qqline(reg1$residuals,col="blue") # plotting the line, along which the dots in qq-plot should lie
plot(reg1$residuals,main = "Residuals vs Time", ylab = "Residuals") # plotting the residuals vs time
abline(h=0,lty=2,col ="blue") # plotting a horizontal line at 0
acf(reg1$residuals,main="ACF plot of Residuals")#sample acf plot of residuals
shapiro.test(reg1$residuals)
##
## Shapiro-Wilk normality test
##
## data: reg1$residuals
## W = 0.9899, p-value = 0.3871
The p-value in excess of 0.05 for the Shapiro-Wilk normality test supports the null hypothesis of normal distribution of residuals. We come to the same conclusion by noticing that the qq-plot displays a straight line. No pattern is apparent on the plot of residuals against the predicted values, or the risiduals over time. Therefore, we consider the model adequate.
In the ACF plot we notice that the residuals do not contain the seasonality, as we have succesfully captured this in the model. However the residials are still showing correlation over time, at lag 1-4.
The second model clearly fits the observed data better as it accounts for both the trend and seasonal nature of the data. It has a higher \(R^2\), and all the predictors are significant.
Neither of the models satisfy all of the fundemental assumptions of least squares regression as the error terms were correlated with the past error terms.
t.new <- seq(2011,2012,length=13)[1:12] # Intoducing new time for forecatsting
t2.new <- t.new^2
month.new <- factor(rep(1:12)) # Introducing the seasonal value for forecasting
new <- data.frame(t=t.new, t2=t2.new, month=month.new) # Putting the values for forecasting into a dataframe
pred <- predict.lm(reg1,new,interval='prediction') # Computing the prediction as well as prediction interval
par(mfrow=c(1,1))
plot(sales,xlim=c(1999,2012),ylim=c(0,60),main = "2011 Forecast") #plotting the data
abline(v=2011,col='blue',lty=2) # adding a vertical line at the point where prediction starts
lines(pred[,1]~t.new,type='l',col='red')# plotting the predict
lines(pred[,2]~t.new,col='green') # plotting lower limit of the prediction interval
lines(pred[,3]~t.new,col='green') # plotting upper limit of the prediction interval
We can see the predictions and corresponding 95% prediction intervals above.