find the full version here https://rpubs.com/TCML/1245841
\[ r_t = -0.3 + 0.8 r_{t-1} -0.2 r_{t-2} + \epsilon_t \]
####a) the mean of the stock return
this is an AR(2) process with \(\alpha_0 = -0.3, \alpha_1 = 0.8, \alpha_2 = -0.2\), assuming this is stationary then,
\[ \mu = \frac{\alpha_0}{1-\alpha_1-\alpha_2} = \frac{-0.3}{(1-0.8+0.2)}=\frac{-0.3}{0.4}=-0.75 \]
####b) the variance of stock return
the variance \(\gamma_0\) of an AR(2) process is given by
\[ \gamma_0 = \frac{\sigma_n^2+2\alpha_1\alpha_2\gamma_1}{1-\alpha_1^2-\alpha_2^2} =\frac{4+2(0.8)(-0.2)\gamma_1}{1-0.64-0.04}= \frac{16-0.32\gamma_1}{0.32} \]
furthermore, we also have the eqution \[ \gamma_1 = \frac{\alpha_1\gamma_0}{1-\alpha_2}=\frac{0.8}{1.2}\gamma_0=\frac{2}{3}\gamma_0 \] from \(\gamma(j=1)=\alpha_2\gamma_{j-2}+\alpha_1\gamma_{j-1}, \gamma_{-1}=\gamma_1\)
substituting this into
\[ 0.32\gamma_0 -0.3= 4-0.32(\frac{2}{3}\gamma_0)\\ 0.32\gamma_0(1-\frac{2}{3}) = 4-0.32\gamma_1\\ \gamma_0 =\frac{4}{0.32}(3)=37.5 \]
both the mean and the variance exists and well behaved, therefore it is likely that it is stationary, but we need to check the poles of the system
####d) is there business ciclicality in the system?
first re-arrange the returns into the following form, and solve for the characteristic polynomial
\[ r_t +0.8 r\_{t-1} - 0.2 r\_{t-2} = 0\\ \Rightarrow (z^{2}+0.8z-0.2)R_z = 0\\ 5z^2+4z-1 = 0 \]
solving for \(z\) in this equation gives us the poles to the charateristic equation
\[ z_p=\frac{-4\pm\sqrt{16+4(5)}}{2(5)}=\frac{-4\pm\sqrt{36}}{10} = \frac{-2\pm3}{5} \\ z_p = 0.2 , z_p = -1 \]
poles are within the unit circle, and are real, therefore there is no cyclicality, and is marginally stable (hence stationary)
\[ \hat{r}_{h+1|h} = -0.3+0.8r_h-0.2r_{h-1}\\ =-0.3+0.8(0.5)-0.2(0.5)=-0.3+0.4-0.1\\ \hat{r}_{h+1|h} = 0 \]
\[ \hat{r}_{h+2|h} = -0.3+0.8\hat{r}_{h+1|h}-0.2r_{h} \\ = -0.3+0.8(0)-0.2(0.5) = -0.3-0.1\\ \hat{r}_{h+2|h} = -0.4 \]
\[ \hat{r}_{h+3|h} = -0.3+0.8\hat{r}_{h+2|h}-0.2\hat{r}_{h+1|h} \\ =-0.3+0.8(0.4)-0.2(0)=-0.3-0.32\\ \hat{r}_{h+3|h} =-0.62 \]
assuming the raw data refers to the (prices)
library("quantmod")
library("tidyverse")
library('e1071')
library('TSA')
library('tseries')
setwd("D:/RstudioTemp/HW2")
getSymbols("MMM",from='2009-01-01',to='2024-10-23')
[1] "MMM"
SLR = log(MMM)
SLR = SLR - lag(SLR)
Mean
apply(MMM,2,mean,na.rm=TRUE)
MMM.Open MMM.High MMM.Low MMM.Close MMM.Volume MMM.Adjusted
1.158938e+02 1.168100e+02 1.149263e+02 1.158998e+02 3.804437e+06 9.075854e+01
Variance
apply(MMM,2,var,na.rm=TRUE)
MMM.Open MMM.High MMM.Low MMM.Close MMM.Volume MMM.Adjusted
1.499258e+03 1.515998e+03 1.479781e+03 1.497798e+03 8.550265e+12 1.183841e+03
Excess kurtosis
apply(MMM,2,kurtosis,na.rm=TRUE)-3
MMM.Open MMM.High MMM.Low MMM.Close MMM.Volume MMM.Adjusted
-3.973097 -3.974175 -3.972540 -3.974739 110.357639 -4.098584
Skew
apply(MMM,2,skewness,na.rm=TRUE)
MMM.Open MMM.High MMM.Low MMM.Close MMM.Volume MMM.Adjusted
0.09703695 0.09594662 0.09658766 0.09651089 8.31144900 -0.10050693
from here on, we will use then adjusted log returns for our model
data <- SLR$MMM.Adjusted
data <-data %>% na.omit() %>% as.numeric()
n_ahead=15
# spliting data into working and testing datasets
wdata = head(data,length(data)-n_ahead+1)
tdata = tail(data,n_ahead)
PACF method,
sing the PACF, it suggest an AR model order of up to 7 as it seems significant,
pacf_data<-pacf(wdata)
AIC Method here for the AIC method, order p = 10 returns the lowest AIC value hence t is suggested, we will use order p = 10 from the AIC method
# fitting AR model using an AIC selection
m=ar(wdata,na.rm=TRUE,method = "mle")
m_order=m$order
m_aic=m$aic
# plotting AIC for different models
plot(m_aic, type = 'o',xlab="aic",ylab='lags',main='aic on different lags')
# training AR model using p=10 form the AIC method
m_ar =arima(wdata,order = c(m_order,0,0))
# function to plot predictions
plot_predictions <- function(model, data, len, n_ahead, p) {
# Generate predictions
pred <- predict(model, n.ahead = n_ahead)
# Indices for predictions
idx <- ((len + 1):(len + n_ahead))
# calculate the stdval given a p, this assumes twin tails
q = 1-(1-p)/2
stdval = qnorm(q)
# Determine y-axis range
ymin <- min(pred$pred - stdval * pred$se,tail(data, len),na.rm=TRUE)
ymax <- max(pred$pred + stdval * pred$se,tail(data, len),na.rm=TRUE)
# Plot the original data (last 'len' points)
plot(
tail(data, len), type = 'l',
xlab = 'Time', ylab = 'values',
main = 'Predicted values',
xlim = c(0, len + n_ahead), ylim = c(ymin, ymax)
)
# Plot predictions
lines(c(len, idx), c(tail(data, 1), pred$pred), col = 'blue', lwd = 2)
# Plot confidence intervals
lines(idx, pred$pred + stdval * pred$se, col = 'red', lwd = 2)
lines(idx, pred$pred - stdval * pred$se, col = 'red', lwd = 2)
}
# function to evaluate model adequacy
eval_model <- function(m, data,len,n_ahead,CI){
# print model
m
# LB test
lagval = 12
ans = Box.test(m$residuals,type = 'Ljung',lag=lagval)
print(ans)
if (ans$p.value>= 0.05) {
HTresult = "cannot"
HTword = "IS NOT"
}else{
HTresult = "can"
HTword = "IS"
}
text = sprintf("Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for %s lags. The p value for the test returns p = %s, we %s reject the Null Hypothesis at 5 percent level; that is there %s serial dependancies in the residual",lagval,round(ans$p.value,6),HTresult, HTword)
print(text)
# visualise residual and plot residual acf
plot(m$residuals, main = 'model residuals', ylab = 'residuals', xlab = 'time')
acf(m$residuals)
# visualise predictions
plot_predictions(m,data,len,n_ahead,CI)
}
plotting and evaluating model adequacy for 95% CI, predicted returns and actual returns seems reasonably similar
len = 40
plot_predictions(m_ar,wdata,len,n_ahead,0.95)
# comparing with actual returns
idx = ((len):(len+n_ahead))
lines(idx,c(tail(wdata,1),tdata))
For MA models we expect a significant auto correlation, before a sharp drop off at lag q, therefore we should be able to visually select q based on the ACF , by looking for a drop off in auto-correlations
looking at the ACF of our data, we see significant values at q = 1 & 7 for our candidate models.
here I propose q = 7, as after 7 we see there’s no more significant auto correlations until lag = 22
acf(wdata)
q = 7
m_ma=arima(wdata,order=c(0,0,q))
again results are reasonably good, but qualitatively indistinguishable from our AR model
plot_predictions(m_ma,wdata,len,n_ahead,0.99)
# comparing with actual returns
idx = ((len):(len+n_ahead))
lines(idx,c(tail(wdata,1),tdata))
AR model the AR model, doesn’t show any significant serial dependencies in the residuals, and the LB test confirms this. the ACF of the results indicate there might be some significant values remaining (ACF of white noise is the Dirac delta function), but only for higher lags and not significant for the LB test
overall the model seem adequate
eval_model(m_ar,wdata,len,n_ahead,0.95)
Box-Ljung test
data: m$residuals
X-squared = 1.5092, df = 12, p-value = 0.9999
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.999865, we cannot reject the Null Hypothesis at 5 percent level; that is there IS NOT serial dependancies in the residual"
MA model Again the MA model, doesn’t show any significant serial dependencies in the residuals, and the LB test confirms this.
the ACF of the results indicate there might be some significant values remaining, again not significant for the LB test, but the order of lag seems to be lower, and for more lags than the AR model
overall the model also seem adequate, but the AR model seems to be better, and the AIC values confirms this
eval_model(m_ma,wdata,len,n_ahead,0.99)
Box-Ljung test
data: m$residuals
X-squared = 6.8339, df = 12, p-value = 0.8684
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.86839, we cannot reject the Null Hypothesis at 5 percent level; that is there IS NOT serial dependancies in the residual"
print(sprintf("AIC : AR MODEL = %s; MA MODEL = %s",round(m_ar$aic,6),round(m_ma$aic,6)))
[1] "AIC : AR MODEL = -22061.696818; MA MODEL = -22059.595214"
VIX is the volatility index for the next 30 days for the S&P 500, and is calculated based on various put and call options. it can be interpreted as the level of the expectation of near term risks in the market, and have been called the “fear gauge”
getSymbols("VIX",from='2009-01-01',to= '2024-10-23')
Warning: VIX contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
[1] "VIX"
VIX = na.omit(VIX)
VIXR = log(VIX)
VIXR = VIXR - lag(VIXR)
here we uses the closing data of the VIX. Visual inspection of the raw data, and the distribution suggest it is non stationary.
to confirm this we run the ADF Test. The Null hypothesis states that the time series is non-stationary. our p-value is much greater than 5% hence we cannot reject the null hypothesis, which confirms our observations
det_lag <- function(data){
data = na.omit(data)
nobs = length(data)
p = ceiling(12*(nobs/100)^0.25)
return(p)
}
plot(VIX$VIX.Close)
wdata_vix = VIX$VIX.Close
wdata_vix <- as.numeric(wdata_vix)
lagval = det_lag(wdata_vix)
plot(density(wdata_vix))
adf.test(wdata_vix,k=lagval)
Augmented Dickey-Fuller Test
data: wdata_vix
Dickey-Fuller = -2.6982, Lag order = 21, p-value = 0.2828
alternative hypothesis: stationary
yes, again using the ADF test, here the p value is << 5% hence we can reject the null hypothesis, this means that the time series is stationary
wdata_dvix = wdata_vix - lag(wdata_vix)
wdata_dvix <- na.omit(wdata_dvix)
plot(density(wdata_dvix))
adf.test(wdata_dvix,k=lagval)
Warning in adf.test(wdata_dvix, k = lagval) :
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: wdata_dvix
Dickey-Fuller = -6.6889, Lag order = 21, p-value = 0.01
alternative hypothesis: stationary
here we used the eacf method. it gave two viable candidates ARMA(0,1) and (2,1)
eacf(wdata_dvix)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o x o x o o o o o o o
1 x x o o x o o o o o o o o o
2 x o o o x o o o o o o o o o
3 x x o o o o x x o o o o o o
4 x x x x o o o o o o o o o o
5 x x o x x o o o o o o o o o
6 x x o x x x x o o o o o o o
7 x x o x x o x o o o o o o o
m1_vix = arima(wdata_vix,order = c(0,1,1))
m2_vix = arima(wdata_vix,order = c(2,1,1))
evaluating the first model, we see that we fail to reject the null hypothesis for our ARMA(1,0) model at 5%, therefore there are likely serial dependencies in the residuals
hence the model is inadequate
eval_model(m1_vix,wdata_vix,100,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 19.17, df = 12, p-value = 0.0845
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.084498, we cannot reject the Null Hypothesis at 5 percent level; that is there IS NOT serial dependancies in the residual"
evaluating the second model, the LB test shows that there are like no serial dependency in the residuals, and hence seems to be adequate
eval_model(m2_vix,wdata_vix,100,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 22.02, df = 12, p-value = 0.0373
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.037301, we can reject the Null Hypothesis at 5 percent level; that is there IS serial dependancies in the residual"
temperature
in_data <- read.table("m-globaltemp.txt",header=T)
wdata_tp <- in_data[,2:13]
wdata_tp <- c(t(wdata_tp))
plot(wdata_tp/100,type='l',main='tempurature against time',ylab = 'temperature(degrees C)',xlab = 'time')
plot(density(wdata_tp/100))
NA
NA
1st differneced temperature
wdata_dtp <- diff(wdata_tp)
plot(wdata_dtp/100,type='l',main = 'differenced tempurature against time', xlab = 'time', ylab = 'differenced temp (degrees C)')
plot(density(wdata_dtp/100))
Here we conduct the ADF test on the temperature (wdata_tp) and 1st differenced series (wdata_dtp)
from our ADF test, we see that we cannot reject our null hypothesis (where the time series is stationary) at the 5% level for the temperature series. this suggest that there is a unit root
on the other hand we see that we can reject our null hypothesis (where the time series is stationary) at the 5% level for the 1st difference series. this suggest that there is no unit root
lag_k = det_lag(wdata_tp)
adf.test(wdata_tp,k=lag_k)
Augmented Dickey-Fuller Test
data: wdata_tp
Dickey-Fuller = -3.0256, Lag order = 25, p-value = 0.1442
alternative hypothesis: stationary
adf.test(wdata_dtp,k=lag_k)
Warning in adf.test(wdata_dtp, k = lag_k) :
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: wdata_dtp
Dickey-Fuller = -11.405, Lag order = 25, p-value = 0.01
alternative hypothesis: stationary
we use the MLE method for the differenced data, which suggest a model order of p = 11
# AR model to fit the differenced data (z_t = x_t - x_t-1)
m = ar(wdata_dtp, method='mle')
plot((0:(length(m$aic)-1)),m$aic,type='o')
print(sprintf("order p = %s",m$order))
[1] "order p = 11"
m_ar_dtp = arima(wdata_tp , order = c(m$order,1,0))
plot_predictions(m_ar_dtp,wdata_tp,100,12,0.95)
we use the MLE method for the original data, which suggest a model order of p = 12
Note think there’s a typo in the question? shouldn’t it be \(arima(x_t, order = c(p,\color{red}{\bf{0}},0))\) ? as we are working with the original data?
# AR model to fit the original data (x_t)
m = ar(wdata_tp, method='mle')
plot((0:(length(m$aic)-1)),m$aic,type='o')
print(sprintf("order p = %s",m$order))
[1] "order p = 12"
m_ar_tp = arima(wdata_tp , order = c(m$order,0,0))
plot_predictions(m_ar_tp,wdata_tp,100,12,0.95)
the eacf produced a prime candidate of ARIMA(0,1,1), yet the LB test shows the model is not adequate. we see that there are still serial dependencies every 6 and 12 periods. this could be because there exists seasonality in the data
# eacf for an arma to fit the differenced data (z_t)
eacf(wdata_dtp)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o x o o
1 x x o o o o o o o o o x o o
2 x x x o o o o o o o o o o o
3 x x x x o o o o o o o o o o
4 x x x x x o o o o o o o o o
5 x x x o x x o o o o o o o o
6 x x x x o o o o o o o o o o
7 x x x x o o o o o o o o o o
lag_k = det_lag(wdata_dtp)
m_arma1_dtp = arima(wdata_tp , order = c(0,1,1))
eval_model(m_arma1_dtp,wdata_tp,100,n_ahead,0.95)
Box-Ljung test
data: m$residuals
X-squared = 95.303, df = 12, p-value = 4.552e-15
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0, we can reject the Null Hypothesis at 5 percent level; that is there IS serial dependancies in the residual"
here the model is still inadequate as the LB test shows that there are still serial dependencies, specifically the ACF shows highly significant values at lag 12 and 24, suggesting our seasonal terms are unable to capture the seasonal effects
m_arima_s1_tp = arima(wdata_tp,order = c(1,1,2), seasonal = list(order=c(1,1,0),period=12))
eval_model(m_arima_s1_tp,wdata_tp,200,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 55.704, df = 12, p-value = 1.353e-07
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0, we can reject the Null Hypothesis at 5 percent level; that is there IS serial dependancies in the residual"
we assume that we are working on the original data (xt)
Note I think there’s also a typo here? shouldn’t it be \[ (1-\color{red}{\mathbf{sar_1}}L^{12})(1-\color{red}{\mathbf{ar_1}}L)x_t= (1-ma_1L)(1-sma_1L^{12})\epsilon_t\\ \] also according to this model, I think the code should be as follows? \[ m = arima(x_t, order = c(\color{red}{\bf{1}},\color{red}{\bf{0}},1),seasonal = list(order = c(\color{red}{\bf{1}},\color{red}{\bf{0}},1), period = 12) \] testing assuming there is a typo, the model is inadequate as the LB test shows serial dependencies in the residual
m_arima_s2_tp = arima(wdata_tp,order = c(1,0,1), seasonal = list(order=c(1,0,1),period=12))
eval_model(m_arima_s2_tp,wdata_tp,200,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 23.519, df = 12, p-value = 0.02363
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.023627, we can reject the Null Hypothesis at 5 percent level; that is there IS serial dependancies in the residual"
testing assuming there isn’t a typo in the code, the model is also inadequate as the LB test shows serial dependencies in the residual
m_arima_s3_tp = arima(wdata_tp,order = c(0,1,1), seasonal = list(order=c(0,1,1),period=12))
eval_model(m_arima_s3_tp,wdata_tp,200,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 41.874, df = 12, p-value = 3.498e-05
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 3.5e-05, we can reject the Null Hypothesis at 5 percent level; that is there IS serial dependancies in the residual"
suggestion: from part (i) we saw that the model returns good results except for highly significant non-zero auto correlations at lag 12 & 24 which suggest the annual seasonality is not sufficiently captured. instead we can try to use an more seasonal terms to capture these seasonal effects.
let’s try introducing a seasonal MA term, here we saw the results produce a more adequate model, at least in the sense that LB test shows no serial dependencies in the residual, and also a smaller AIC in all the models so far
m_arima_sa_tp = arima(wdata_tp,order = c(1,1,2), seasonal = list(order=c(1,1,1),period=12))
eval_model(m_arima_sa_tp,wdata_tp,200,15,0.95)
Box-Ljung test
data: m$residuals
X-squared = 10.56, df = 12, p-value = 0.5669
[1] "Ljung-Box test: Null Hypothesis states there is no serial dependancies (auto-correlations) in the residual for 12 lags. The p value for the test returns p = 0.56693, we cannot reject the Null Hypothesis at 5 percent level; that is there IS NOT serial dependancies in the residual"