Libraries

library(ggplot2)

Problem 1

set.seed(1) # fix the seed so the simulated walk is reproducible
x = c(1,-1)                                    # possible step sizes
px = c(0.5, 0.5)                               # equal step probabilities
draws = sample(x, size=500, replace=TRUE, prob=px)
St = cumsum(draws)                             # partial sums form the random walk
St = c(0, St)                                  # start the walk at 0
data = data.frame(time = 0:500, randwalk = St)
gplot <- ggplot(data, aes(x=time, y=randwalk)) + geom_line() + xlab('')
gplot

acf(St)

The sample ACF decays very slowly and remains well outside the white-noise confidence bounds rather than dying out quickly, so we conclude that this time series is not stationary.
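This agrees with a direct calculation (a short addition of our own, using only the setup above): the steps are i.i.d. with \(E[X_i] = 0\) and \(Var(X_i) = 1\), so the variance of the walk grows with \(t\),

\[ \begin{align} Var(S_t) &= Var\left(\sum_{i=1}^{t}X_i\right)\\ &= \sum_{i=1}^{t}Var(X_i)\\ &= t \end{align} \]

A variance that depends on \(t\) already rules out (weak) stationarity.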

Problem 2: Calculate the ACF of Xt

Suppose we have \(X_t = \alpha_1X_{t-1} + \alpha_2X_{t-2} + Z_t\), where \(\alpha_1 = 0.3\) and \(\alpha_2 = 0.5\) and \(Z_t \sim WN(0, \sigma^2)\).

We first show that \(E[X_t] = 0\).

To do this we first note that by successive back substitution into the model (this is the causal, or MA(\(\infty\)), representation of a stationary AR process; it applies here because the roots of \(1 - 0.3z - 0.5z^2 = 0\), approximately \(1.15\) and \(-1.75\), lie outside the unit circle) we can write the \(X_t\)s as an infinite linear combination of the purely random process \(Z_t \sim WN(0, \sigma^2)\). That is,

\[ X_t = \sum_{j=0}^\infty\beta_jZ_{t-j} \]

While there is a relationship between these new coefficients \(\beta_j\) and the original coefficients \(\alpha_1 = 0.3\) and \(\alpha_2 = 0.5\), it is unimportant for our calculation of the expectation. Hence, we have the following.

\[ \begin{align} E[X_t] &= \sum_{j=0}^\infty\beta_jE[Z_{t-j}]\\ &=\sum_{j=0}^\infty\beta_j * 0\\ &= 0 \end{align} \]
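Before moving on, the causality condition mentioned above can be checked numerically (a quick sketch of our own, not part of the derivation):

# Check causality: roots of 1 - 0.3z - 0.5z^2 must lie outside the unit circle
abs(polyroot(c(1, -0.3, -0.5))) # moduli are approximately 1.15 and 1.75, both > 1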

Now that we have shown \(E[X_t] = 0\), we can calculate the covariance of two terms in our time series. That is,

\[ \begin{align} \gamma(h) &= Cov(X_t, X_{t+h})\\ &= Cov(X_t, X_{t-h})\\ &= E[X_tX_{t-h}] - E[X_t]E[X_{t-h}]\\ &= E[X_tX_{t-h}] - 0*0\\ &= E[X_tX_{t-h}] \end{align} \]

Since we already proved \(E[X_t] = 0\), we have the nice property that \(\gamma(h) = E[X_tX_{t-h}]\). (The second equality above, \(Cov(X_t, X_{t+h}) = Cov(X_t, X_{t-h})\), holds because the autocovariance of a stationary process is an even function of the lag.)

Without loss of generality (using that evenness) we may take \(h \geq 1\). Continuing we have

\[ \begin{align} \gamma(h) &= E[X_tX_{t-h}]\\ &= E[\alpha_1X_{t-1}X_{t-h} + \alpha_2X_{t-2}X_{t-h} + Z_tX_{t-h}]\\ &= \alpha_1E[X_{t-1}X_{t-h}] + \alpha_2E[X_{t-2}X_{t-h}] + E[Z_tX_{t-h}]\\ &= \alpha_1E[X_{t-1}X_{t-h}] + \alpha_2E[X_{t-2}X_{t-h}] + 0\\ &= \alpha_1\gamma(h-1) + \alpha_2\gamma(h-2) \end{align} \]

In the above we have \(E[Z_tX_{t-h}] = 0\) because, for \(h \geq 1\), \(X_{t-h}\) is a linear combination of \(Z_s\) with \(s \leq t-h < t\), and white noise values at different times are uncorrelated.

Furthermore, we have shown that \(\gamma(h)\), the autocovariance function, satisfies a recursion in its two previous terms.

We now divide each side by \(\gamma(0)\) to begin the derivation of the autocorrelation function.

\[ \begin{align} \rho(h) &= \frac{\gamma(h)}{\gamma(0)}\\ &= \alpha_1\rho(h-1) + \alpha_2\rho(h-2) \end{align} \]

We know that \(\rho(0) = 1\) because by definition \(\rho(0) = \frac{\gamma(0)}{\gamma(0)}\). Furthermore, since \(\gamma(h)\) is an even function, it follows that \(\rho(h)\) is too. We will use these two properties to evaluate \(\rho(1)\). We set h = 1.

\[ \begin{align} \rho(1) &= \alpha_1\rho(0) + \alpha_2\rho(-1)\\ &= \alpha_1*1 + \alpha_2\rho(1) \end{align} \] Thus, we have \[ \rho(1) = \alpha_1 + \alpha_2\rho(1) \implies \rho(1) = \frac{\alpha_1}{1- \alpha_2} \]

To compute \(\rho(2)\) we set h = 2 and back-substitute the two initial values.

\[ \begin{align} \rho(2) &= \alpha_1\rho(1) + \alpha_2\rho(0)\\ &= \alpha_1(\frac{\alpha_1}{1-\alpha_2}) + \alpha_2*1\\ &= \frac{\alpha_1^2}{1-\alpha_2} + \alpha_2 \end{align} \]

Since we have \(\alpha_1 = 0.3\) and \(\alpha_2=0.5\) we can conclude the following:

\(\rho(1) = 0.6\) and \(\rho(2) = 0.68\).

With our recursive relationship \(\rho(h) = 0.3\rho(h-1) + 0.5\rho(h-2)\) and our initial values of \(\rho(0) = 1\) and \(\rho(1) = 0.6\) and \(\rho(2) = 0.68\), we have fully specified the autocorrelation function of \(X_t\).
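As a quick numerical sanity check (our own addition, not required by the problem), we can iterate the recursion in R and compare it against the theoretical ACF computed by stats::ARMAacf:

alpha1 = 0.3
alpha2 = 0.5
rho = numeric(11)                # rho[h + 1] stores rho(h)
rho[1] = 1                       # rho(0) = 1
rho[2] = alpha1 / (1 - alpha2)   # rho(1) = 0.6
for (h in 2:10) rho[h + 1] = alpha1 * rho[h] + alpha2 * rho[h - 1]
rho                              # 1.00, 0.60, 0.68, ...
ARMAacf(ar = c(alpha1, alpha2), lag.max = 10) # matches the recursion above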

Problem 3: MA(2) Process

Problem 3a.) Show the process is stationary

Consider the MA(2) process \(X_t = Z_t + \theta Z_{t-1} + \theta^2Z_{t-2}\), where \(Z_t \sim WN(0, \sigma^2_z)\). We must first show that \(E[X_t] = \mu\), where \(\mu\) is a constant independent of \(t\).

\[ \begin{align} E[X_t] &= E[Z_t] + \theta E[Z_{t-1}] + \theta^2E[Z_{t-2}]\\ &= 0 + \theta*0 + \theta^2*0\\ &= 0 \end{align} \]

We now must show that the autocovariance function is a function of only the shift \(h\) and independent of the time \(t\).

We first write an expression for \(Cov(X_t, X_{t+h})\) for a general \(h\).

\[ \begin{align} Cov(X_t, X_{t+h}) &= E[X_tX_{t+h}] - E[X_t]E[X_{t+h}]\\ &= E[(Z_t + \theta Z_{t-1} + \theta^2Z_{t-2})(Z_{t+h} + \theta Z_{t+h-1} + \theta^2Z_{t+h-2})] - 0*0\\ &= E[Z_tZ_{t+h}] + \theta E[Z_tZ_{t+h-1}] + \theta^2E[Z_tZ_{t+h-2}]\\ &\quad + \theta E[Z_{t-1}Z_{t+h}] + \theta^2E[Z_{t-1}Z_{t+h-1}] + \theta^3E[Z_{t-1}Z_{t+h-2}]\\ &\quad + \theta^2E[Z_{t-2}Z_{t+h}] + \theta^3E[Z_{t-2}Z_{t+h-1}] + \theta^4E[Z_{t-2}Z_{t+h-2}] \end{align} \]

When h = 0, all the cross terms vanish (white noise values at different times are uncorrelated), and our general expression becomes

\[ \begin{align} Cov(X_t, X_{t+h}) &= E[Z_t^2] + 0 + 0 + 0 + \theta^2E[Z_{t-1}^2] + 0 + 0 + 0 + \theta^4E[Z_{t-2}^2]\\ &= \sigma^2_z + \theta^2\sigma^2_z + \theta^4\sigma^2_z\\ &= \sigma^2_z(1 + \theta^2 + \theta^4) \end{align} \]

Similarly, when h = 1 the general expression becomes

\[ \begin{align} Cov(X_t, X_{t+h}) &= 0 + \theta E[Z_t^2] + 0 + 0 + 0 + \theta^3E[Z_{t-1}^2] + 0 + 0 + 0\\ &= \sigma^2_z(\theta + \theta^3) \end{align} \]

Lastly, when h = 2 we have the following:

\[ \begin{align} Cov(X_t, X_{t+h}) &= 0 + 0 + \theta^2E[Z_t^2] + 0 + 0 + 0 + 0 + 0 + 0\\ &= \theta^2\sigma^2_z \end{align} \]

Finally, for \(|h| \geq 3\) none of the white noise indices overlap, so \(\gamma(h) = 0\). In every case the autocovariance function depends only on \(h\) and not on the time \(t\). This, in combination with the constant mean, implies that the series is stationary.
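As an optional empirical check (our own addition; \(\theta = 0.8\) is an arbitrary illustrative value), we can simulate a long realization of this MA(2) and compare the sample ACF at lags 0 through 2 against the formulas above:

set.seed(1)
theta = 0.8
sim = arima.sim(model = list(ma = c(theta, theta^2)), n = 10000)
acf(sim, lag.max = 2, plot = FALSE)$acf # sample rho(0), rho(1), rho(2)
c(1, (theta + theta^3)/(1 + theta^2 + theta^4), theta^2/(1 + theta^2 + theta^4)) # theoretical values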

Problem 3b) Max and Min Values of the Correlation

With \(X_t\) and \(X_{t-2}\) we have a lag of \(h = 2\). Thus we will examine the autocorrelation function at h = 2.

\[ \begin{align} \rho(2) &= \frac{\gamma(2)}{\gamma(0)}\\ &= \frac{\sigma^2_z\theta^2}{\sigma^2_z(1 + \theta^2 + \theta^4)}\\ &= \frac{\theta^2}{1 + \theta^2 + \theta^4} \end{align} \]

Substituting \(u = \theta^2 \geq 0\) gives \(\rho(2) = \frac{u}{1 + u + u^2}\). Setting the first derivative to zero shows the only critical point for \(u > 0\) is \(u = 1\), so the minimum value is 0 (attained at \(\theta = 0\)) and the maximum value is \(\frac{1}{3}\) (attained at \(\theta = 1\) or \(\theta = -1\)).
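A quick numerical confirmation (our own addition):

rho2 = function(theta) theta^2 / (1 + theta^2 + theta^4)
curve(rho2, from = -3, to = 3, ylab = 'rho(2)') # symmetric, peaks at theta = 1 and theta = -1
optimize(rho2, interval = c(0, 3), maximum = TRUE) # maximum of 1/3 at theta = 1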

Problem 4a.) Comment on the stationarity of the TotalSales.txt data

data = read.table('TotalSales.txt', header=FALSE)
qtrseq <- seq(as.Date('1995-01-01'), by='quarter', length.out=nrow(data))
data.df = data.frame(Year=qtrseq, Earnings=data$V1)
p <- ggplot(data.df, aes(x = Year, y = Earnings)) + geom_line(color = "steelblue") + geom_point() + xlab("Year") + ylab("Quarterly Sales (in Thousand $)")
p

acf(data.df$Earnings, main='Sample ACF Plot of Earnings TS')

The sample ACF decays very slowly and remains well outside the white-noise confidence bounds, reflecting the clear trend and seasonality in the series, so we conclude that this time series is not stationary.

Problem 4b.) Fit Two Appropriate Polynomial Models

We will compare a linear model with seasonal component and quadratic model with seasonal component.

We first add the seasonal and time covariates to our data frame.

seasonal.cycle = rep(c(1,2,3,4), length.out=nrow(data.df)) # quarter of each observation
seasonal.cycle = as.factor(seasonal.cycle)                 # treat quarter as a categorical predictor
data.df = cbind.data.frame(data.df, seasonal.cycle)

myts = ts(data.df$Earnings, start=c(1995,1), frequency=4)  # quarterly time series object
tim = time(myts)                                           # decimal-year time index

data.df = cbind.data.frame(data.df, tim)

Model: Linear with Seasonal (AIC = 322.3951)

reg1 <- lm(Earnings~tim + seasonal.cycle, data=data.df)

plot(myts, ylab='Quarterly Earnings', main='Model Linear Fit w/Seasonal')
points(data.df$tim,predict.lm(reg1),type='l',col='red')

#AIC(reg1) 322.3951
par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg1$fitted, reg1$residuals) # plot of fitted values vs residuals 
qqnorm(reg1$residuals) #qq-plot of residuals
qqline(reg1$residuals) # plotting the line, along which the dots in qq-plot should lie 
plot(reg1$residuals) # plotting the residuals vs time
abline(h=0,lty=2) # plotting a horizontal line at 0
acf(reg1$residuals) #sample acf plot of residuals

The residuals do not look all that great in this model (linear with seasonal). We can see the residual variance decreasing as the time index increases, which violates our assumption that the residual variance is constant and independent of time.

Model: Quadratic Model with Seasonal (AIC: 319.1077)

tim2 = data.df$tim^2
data.df = cbind.data.frame(data.df, tim2)
reg2 =  lm(Earnings~tim + tim2 + seasonal.cycle, data=data.df)
plot(myts, ylab='Quarterly Earnings', main='Model Quad Fit w/Seasonal')
points(data.df$tim,predict.lm(reg2),type='l',col='red')

#AIC(reg2) 319.1077
par(mfrow=c(2,2)) # Dividing the plotting page into 4 panels
plot(reg2$fitted, reg2$residuals) # plot of fitted values vs residuals
qqnorm(reg2$residuals) #qq-plot of residuals
qqline(reg2$residuals) # plotting the line, along which the dots in qq-plot should lie 
plot(reg2$residuals) # plotting the residuals vs time
abline(h=0,lty=2) # plotting a horizontal line at 0
acf(reg2$residuals) #sample acf plot of residuals

Again the residuals do not look great: we still see nonconstant variance, with the residual spread decreasing as time increases.

Model Selection Conclusion

In conclusion, neither model's residual diagnostics provided evidence of a good fit; both had issues with nonconstant variance in the residuals. The linear model's AIC was 322.4, while the quadratic model's was 319.1. The quadratic model is preferred since it has the lower AIC, but a difference of roughly 3 AIC units is modest. It is by no means a definite winner, and more polynomial models would need to be tried (with residual diagnostics assessed) before a definitive model could be chosen.
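As a sketch of how that comparison could continue (our own addition; tim3 and reg3 are names we introduce here, and the result depends on the TotalSales.txt data loaded above), one could add a cubic term and compare all three fits by AIC:

tim3 = data.df$tim^3                      # hypothetical cubic time term
data.df = cbind.data.frame(data.df, tim3)
reg3 = lm(Earnings ~ tim + tim2 + tim3 + seasonal.cycle, data=data.df)
AIC(reg1, reg2, reg3)                     # lower AIC is better
acf(reg3$residuals)                       # residual diagnostics as before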