1 Non-Stationary Variables: the random walk model for stock prices

The random walk hypothesis in Finance (Fama, 1965) states that the natural logarithm of stock prices behaves like a random walk with a drift. A random walk with a drift is a series (or variable) whose changes cannot be predicted from its own past, beyond the constant drift:

$$Y_t = \phi_0 + Y_{t-1} + \varepsilon_t$$

If we want to simulate a random walk, we need the values of the following parameters/variables:

i. $Y_0$, the first value of the series

ii. $\phi_0$, the drift of the series

iii. $\sigma_\varepsilon$, the standard deviation (volatility) of the random shock
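As a minimal sketch of how these three inputs drive a simulation, the following R snippet builds a random walk with drift from made-up values. The names `y0`, `drift`, `sigma_e`, `n` and `simulate_rw`, and the numbers assigned to them, are illustrative assumptions and not estimates from the data.

```r
# Hypothetical inputs (assumptions for illustration only)
y0      <- 6.8     # Y_0: first value of the series
drift   <- 0.0005  # phi_0: drift per period
sigma_e <- 0.007   # standard deviation of the random shock
n       <- 250     # number of periods to simulate

# Y_t = phi_0 + Y_{t-1} + e_t  is equivalent to
# Y_t = Y_0 + t * phi_0 + (e_1 + ... + e_t)
simulate_rw <- function(y0, drift, sigma_e, n) {
  shocks <- rnorm(n, mean = 0, sd = sigma_e)
  c(y0, y0 + cumsum(drift + shocks))
}

set.seed(123)  # fixed seed so the path can be reproduced
rw <- simulate_rw(y0, drift, sigma_e, n)
plot(rw, type = "l", main = "Simulated random walk with drift")
```

Changing the seed gives a different path with the same drift and volatility, which is the sense in which the series cannot be predicted.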

2 Random Walk Simulation

library(quantmod)
library(fpp2)
getSymbols("^GSPC", from="2009-01-01")
## [1] "^GSPC"
lnsp<- log(Ad(GSPC))
#I assign a name for the index:
names(lnsp)<-c("lnsp")
N<-nrow(lnsp)

Now, we create a variable for a random walk with a drift trying to model the log of the S&P500.

Reviewing the random walk equation again:

$$Y_t = \phi_0 + Y_{t-1} + \varepsilon_t$$

phi0<- (as.numeric(lnsp$lnsp[N])-as.numeric(lnsp$lnsp[1])) / N
cat("The value for phi0 is ",phi0)
## The value for phi0 is  0.0004607493
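The drift estimate in the code above can be motivated by iterating the random walk equation from the first observation $Y_1$ to the last one $Y_N$. This is a sketch that assumes the shocks have mean zero:

$$Y_N = Y_1 + (N-1)\phi_0 + \sum_{t=2}^{N}\varepsilon_t$$

$$E[Y_N - Y_1] = (N-1)\phi_0 \quad\Rightarrow\quad \hat{\phi}_0 = \frac{Y_N - Y_1}{N-1} \approx \frac{Y_N - Y_1}{N}$$

For a large N, dividing by N or by N-1 gives almost the same value, so the division by N in the code is a close approximation.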

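Then we can estimate sigma, the volatility of the shock. Because the variance of a random walk grows linearly with the number of periods (the variance after N periods is N times the variance of the shock), a rough estimate is: sigma = StDev(lnsp) / sqrt(N).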

sigma<-sd(lnsp$lnsp) / sqrt(N)
cat("The volatility of the log is = ",sd(lnsp$lnsp),"\n")
## The volatility of the log is =  0.3889621
cat("The volatility for the shock is = ",sigma)
## The volatility for the shock is =  0.00702918
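The sd(level)/sqrt(N) rule is a rough approximation. An alternative sketch, which is not the method used in this document, is to take the standard deviation of the first differences, since for a random walk the difference Y_t - Y_{t-1} equals phi0 plus the shock. The example below uses simulated data with known values (`true_phi0` and `true_sigma` are made-up numbers) so that both estimates can be compared with the truth.

```r
set.seed(42)
n          <- 3000
true_phi0  <- 0.0005   # assumed drift, for illustration
true_sigma <- 0.007    # assumed shock volatility, for illustration

# Simulated random walk with drift, starting at 0
y <- cumsum(true_phi0 + rnorm(n, mean = 0, sd = true_sigma))

# Estimate 1: rule used in the text, sd of the level divided by sqrt(N)
sigma_level <- sd(y) / sqrt(n)

# Estimate 2: sd of the first differences (Y_t - Y_{t-1} = phi0 + e_t)
sigma_diff <- sd(diff(y))

cat("sd(level)/sqrt(N):", sigma_level, "\n")
cat("sd(diff):         ", sigma_diff, "\n")
cat("true sigma:       ", true_sigma, "\n")
```

The first estimate depends heavily on the particular path and on the size of the drift, while the second one targets the shock directly.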

2.1 Simulation of the random walk

Now you are ready to start the simulation of the random walk rw1:

$$rw1_t = \phi_0 + rw1_{t-1} + \varepsilon_t$$

lnsp$rw1 = 0
lnsp$rw1[1]<-lnsp$lnsp[1]
shock <- rnorm(n=N,mean=0,sd=sigma)
lnsp$shock<-shock 

The shock over time:

plot(shock, type="l", col="blue")

Does the shock behave like a normal distribution?

hist(lnsp$shock)

As expected, the shock behaves similarly to a normally distributed variable.

We start the random walk with the first value of the log of the S&P500. Then, from day 2 we do the simulation according to the previous formula and using the random shock just created:

# I create a separate numeric vector for the random walk:
rw1<-numeric(N)

# I assign the first value of the random walk to be equal to the real log value of the S&P
rw1[1]<-as.numeric(lnsp$lnsp[1])
# Now from day 2 I generate the values of the random walk following the formula:
for (i in 2:N){
  rw1[i] <- phi0 + rw1[i-1] + shock[i] 
}
lnsp$rw1<-rw1
ts.plot(lnsp$rw1)
lines(seq(1,N),lnsp$lnsp, col="blue")

# Second version: a random walk WITHOUT drift (phi0 = 0), using the same shocks
rw1_v2<-numeric(N)
rw1_v2[1]<-as.numeric(lnsp$lnsp[1])
for (i in 2:N){
  rw1_v2[i] <- rw1_v2[i-1] + shock[i] 
}

ts.plot(lnsp$lnsp, col="blue")
# I plot both lines to compare 
lines(rw1_v2, col="green")

2.1.1 Question 1

WHAT DO YOU OBSERVE IN THIS PLOT? EXPLAIN IN YOUR OWN WORDS.

2.1.1.1 Answer 1

IN THE FIRST PLOT, THE RANDOM WALK WITH DRIFT AND THE LOG OF THE S&P500 FOLLOW A PRETTY SIMILAR PATH. IN THE SECOND PLOT, WHERE THE RANDOM WALK HAS NO DRIFT, THE DIFFERENCE BETWEEN THE TWO VARIABLES IS EVIDENT.

regmodel<-lm(lnsp$lnsp ~ lnsp$rw1)
# To print the model in the Rmd we have to do the following: 
s_regmodel <- summary(regmodel)
s_regmodel
## 
## Call:
## lm(formula = lnsp$lnsp ~ lnsp$rw1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26507 -0.07703 -0.02062  0.06304  0.32482 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.935329   0.029303   66.05   <2e-16 ***
## lnsp$rw1    0.712414   0.003722  191.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.108 on 3060 degrees of freedom
## Multiple R-squared:  0.9229, Adjusted R-squared:  0.9229 
## F-statistic: 3.665e+04 on 1 and 3060 DF,  p-value: < 2.2e-16

2.1.2 Question 2

DOES THE REGRESSION RESULT MAKE SENSE? EXPLAIN WHY OR WHY NOT.

2.1.2.1 Answer 2

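THE DEPENDENT VARIABLE IS THE LOG OF THE S&P500 INDEX AND THE INDEPENDENT VARIABLE IS THE RANDOM WALK. THERE IS ENOUGH STATISTICAL EVIDENCE TO SAY THAT THE RANDOM WALK IS POSITIVELY AND LINEARLY RELATED TO THE LOG OF THE S&P500. HOWEVER, THE RANDOM WALK WAS SIMULATED WITH SHOCKS THAT ARE INDEPENDENT OF THE INDEX, SO THE HIGH R-SQUARED REFLECTS THE COMMON DRIFT OF TWO NON-STATIONARY SERIES RATHER THAN A REAL RELATIONSHIP.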
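When two non-stationary series share an upward drift, a regression in levels can show a strong fit even if the series are unrelated. A minimal sketch of this effect follows, using two simulated random walks built from independent shocks. The values of `phi0` and `sigma` are assumptions chosen to be of the same order as the estimates in this document.

```r
set.seed(1)
n     <- 3000
phi0  <- 0.0005  # assumed drift, for illustration
sigma <- 0.007   # assumed shock volatility, for illustration

# Two random walks with drift built from INDEPENDENT shocks
a <- cumsum(phi0 + rnorm(n, 0, sigma))
b <- cumsum(phi0 + rnorm(n, 0, sigma))

# Regression in levels: both series trend upward, so the fit tends to look strong
r2_levels <- summary(lm(a ~ b))$r.squared

# Regression in first differences: the shocks are independent, so the fit vanishes
r2_diffs <- summary(lm(diff(a) ~ diff(b)))$r.squared

cat("R-squared in levels:     ", r2_levels, "\n")
cat("R-squared in differences:", r2_diffs, "\n")
```

Comparing the two R-squared values is one simple way to check whether a fit in levels comes from a shared trend rather than from a relationship between the shocks.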

2.1.3 Question 3

DOES THE LOG OF THE S&P500 LOOK LIKE A RANDOM WALK? WHY OR WHY NOT?

2.1.3.1 Answer 3

THANKS TO THE DRIFT BEING SMALLER THAN 1, THE VOLATILITY ISN'T BIG ENOUGH.

# I plot the natural log of the S&P500
ts.plot(lnsp$lnsp)
# I plot my random walk with a drift
lines(rw1, col="blue")

2.1.4 Question 4

DO YOU THINK THAT WE CAN USE THIS TYPE OF SIMULATION TO PREDICT STOCK PRICES OR INDEXES? WHY OR WHY NOT?

2.1.4.1 Answer 4

IT'S PROBABLE, DUE TO THE SIMILARITY OF THE RANDOM WALK TO THE STOCK MARKET'S PERFORMANCE, WHICH CAN HELP US GET AN IDEA TO DECIDE ON A COURSE OF ACTION.

ts.plot(lnsp$lnsp,col="blue")
lines(rw1_v2, col="green")
lines(rw1)
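One way to reason about Question 4 is to write down what the random walk model implies for a forecast made at time $T$ for $h$ periods ahead. This is a sketch that assumes independent shocks with mean zero and variance $\sigma_\varepsilon^2$:

$$Y_{T+h} = Y_T + h\phi_0 + \sum_{j=1}^{h}\varepsilon_{T+j}$$

$$E[Y_{T+h}\mid Y_T] = Y_T + h\phi_0, \qquad Var[Y_{T+h}\mid Y_T] = h\,\sigma_\varepsilon^2$$

Under these assumptions the model only predicts the drift, and the uncertainty around that prediction grows with the square root of the horizon. A single simulated path is one random draw from this distribution, not a forecast.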

3 Simulating a random walk and an AR(1) process

rm(list = ls())
obs = 1:1000
e1=rnorm(1000,0,sqrt(0.09))
N<-length(obs)
hist(e1)

3.1 Simulating an AR(1) with phi0=1 and phi1=0.7

e1=rnorm(1000,0,sqrt(0.09))

y1<-numeric(N)
phi1 <- 0.7
phi0 <- 1

# Now from day 2 I generate the values of the AR(1) process following the formula (y1 starts at 0):
for (i in 2:N){
  y1[i] <- phi0 + phi1*y1[i-1] + e1[i-1] 
}

ts.plot(y1, col="darkblue")

3.1.1 Question 5

WHAT DO YOU SEE? LOOKING AT THE GRAPH, DOES THE MEAN OF THE SERIES CONVERGE TO A VALUE? IF YES, WHICH VALUE?

3.1.1.1 Answer 5

IT'S A STATIONARY SERIES; I CAN SAY THAT THE VALUE OF THE MEAN SHOULD BE APPROXIMATELY 3.5.

3.1.2 Question 6

WHAT IS THE EXPECTED VALUE OF y1 ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE.

3.1.2.1 Answer 6

$$E[Y_t] = \frac{\phi_0}{1-\phi_1}$$

e_y1 <- phi0/(1-phi1)
e_y1
## [1] 3.333333
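The formula for the expected value follows from taking expectations on both sides of the AR(1) equation $Y_t = \phi_0 + \phi_1 Y_{t-1} + \varepsilon_t$. A short derivation, assuming $|\phi_1| < 1$ so that the mean $\mu$ is the same in every period, and $E[\varepsilon_t] = 0$:

$$E[Y_t] = \phi_0 + \phi_1 E[Y_{t-1}] + E[\varepsilon_t]$$

$$\mu = \phi_0 + \phi_1\mu \quad\Rightarrow\quad \mu = \frac{\phi_0}{1-\phi_1}$$

With $\phi_0 = 1$ and $\phi_1 = 0.7$ this gives $\mu = 1/0.3 \approx 3.33$.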

3.1.3 Question 7

IS THE EXPECTED VALUE OF y1 SIMILAR TO THE MEAN YOU SAW IN THE FIRST GRAPH?

3.1.3.1 Answer 7

YES, THE VALUE IS ALMOST THE SAME.

3.1.4 Question 8

IS THE VOLATILITY (STANDARD DEVIATION) SIMILAR IN ALL TIME PERIODS?

3.1.4.1 Answer 8

YES, THE VOLATILITY IS ALMOST THE SAME.
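The answers to Questions 7 and 8 can also be checked numerically. The sketch below simulates a longer AR(1) series with the same parameters and compares the sample mean and standard deviation with the theoretical values for a stationary AR(1), phi0/(1-phi1) and sigma/sqrt(1-phi1^2). The sample size, the seed and the 100-observation burn-in are arbitrary choices made for this illustration.

```r
set.seed(7)
n     <- 5000
phi0  <- 1
phi1  <- 0.7
sigma <- sqrt(0.09)

# Same recursion as in the text: y[i] = phi0 + phi1 * y[i-1] + e[i-1]
e <- rnorm(n, 0, sigma)
y <- numeric(n)
for (i in 2:n) {
  y[i] <- phi0 + phi1 * y[i - 1] + e[i - 1]
}

# Drop the first 100 values, which still reflect the arbitrary start at 0
y_tail <- y[-(1:100)]

theory_mean <- phi0 / (1 - phi1)          # E[Y_t]
theory_sd   <- sigma / sqrt(1 - phi1^2)   # SD[Y_t] for a stationary AR(1)

cat("mean: sample =", mean(y_tail), " theory =", theory_mean, "\n")
cat("sd:   sample =", sd(y_tail),   " theory =", theory_sd, "\n")

# Volatility in the first and in the second half of the sample
half <- length(y_tail) / 2
cat("sd first half =", sd(y_tail[1:half]),
    " sd second half =", sd(y_tail[(half + 1):length(y_tail)]), "\n")
```

If the series is stationary, the two half-sample standard deviations should be close to each other and to the theoretical value.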

3.2 Simulating an AR(1) with phi0=0 and phi1=0.7

e2=rnorm(1000,0,sqrt(0.09))

y2<-numeric(N)
phi1 <- 0.7
phi0 <- 0

# Now from day 2 I generate the values of the AR(1) process following the formula (y2 starts at 0):
for (i in 2:N){
  y2[i] <- phi0 + phi1*y2[i-1] + e2[i-1] 
}

ts.plot(y2, col="darkred")

WHAT IS THE EXPECTED VALUE OF y2 ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE.

$$E[Y_t] = \frac{\phi_0}{1-\phi_1}$$

e_y2 <- phi0/(1-phi1)
e_y2
## [1] 0

LET'S NOT FORGET THAT THE MEAN NEEDS TO BE EQUAL TO 0: AS phi0 IS 0, THERE IS NO CONSTANT TERM PUSHING THE VALUES UP, SO THERE IS NO GROWTH.

3.3 Simulating a Random walk with phi0=0

Graph y3 over time.

3.3.1 IS THE MEAN OF THE SERIES CONSTANT OVER TIME?

3.3.1.1 NO, THE MEAN CAN TAKE DIFFERENT VALUES AT ANY TIME.

3.3.2 IS THE VOLATILITY OF THE SERIES CONSTANT OVER TIME?

3.3.2.1 NO, THE VOLATILITY GROWS OVER TIME AND DOES NOT SETTLE AT A FIXED VALUE.

e3=rnorm(1000,0,sqrt(0.09))
y3<-numeric(N)
phi1_3 <- 1
phi0_3 <- 0

# Now from day 2 I generate the values of the random walk following the formula (y3 starts at 0):
for (i in 2:N){
  y3[i] <- phi0_3 + phi1_3*y3[i-1] + e3[i-1] 
}

ts.plot(y3, col="yellow")
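For a random walk without drift that starts at a fixed value, the variance after t shocks is t times the variance of the shock, so it is not constant over time. A single path cannot show this directly, so the sketch below simulates many independent paths and measures the variance across paths at two dates. The number of paths and the seed are arbitrary choices for this illustration.

```r
set.seed(99)
n_steps <- 1000
n_paths <- 500
sigma   <- sqrt(0.09)  # same shock volatility as in the text

# Each column is one random walk without drift: the cumulative sum of its shocks
paths <- replicate(n_paths, cumsum(rnorm(n_steps, 0, sigma)))

# Variance ACROSS paths at two dates, compared with t * sigma^2
var_100  <- var(paths[100, ])
var_1000 <- var(paths[1000, ])

cat("t = 100 : simulated =", var_100,  " theory =", 100  * sigma^2, "\n")
cat("t = 1000: simulated =", var_1000, " theory =", 1000 * sigma^2, "\n")
```

Here each path is the sum of t shocks at date t, whereas the loop in the text starts y3 at 0 and adds the first shock on day 2, so the two constructions differ only by one period.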

3.4 Simulating an AR(1) with phi0=0 and phi1=0.99

DOES THE MEAN CONVERGE TO A SPECIFIC VALUE? IF YES, TO WHICH ONE?

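NO, THE MEAN DOES NOT LOOK CONSTANT IN THE GRAPH. WITH phi1=0.99 THE SERIES IS STILL STATIONARY IN THEORY (ITS EXPECTED VALUE IS phi0/(1-phi1)=0), BUT THE SHOCKS DIE OUT SO SLOWLY THAT IN 1000 OBSERVATIONS IT BEHAVES ALMOST LIKE A NON-STATIONARY RANDOM WALK.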

e4=rnorm(1000,0,sqrt(0.09))

y4<-numeric(N)
phi1_4 <- 0.99
phi0_4<- 0

# Now from day 2 I generate the values of the AR(1) process following the formula (y4 starts at 0):
for (i in 2:N){
  y4[i] <- phi0_4 + phi1_4*y4[i-1] + e4[i-1] 
}

ts.plot(y4, col="orange")
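In an AR(1) process the weight of a shock after h periods is phi1^h, so the closer phi1 is to 1, the longer a shock stays in the series. A small sketch of this idea follows; `half_life` is a helper name introduced here for illustration, and it returns the number of periods after which a shock has lost half of its weight.

```r
# Weight of a shock after h periods in an AR(1): phi1^h
# Half-life: the h at which phi1^h = 0.5 (defined for 0 < phi1 < 1)
half_life <- function(phi1) {
  stopifnot(phi1 > 0, phi1 < 1)
  log(0.5) / log(phi1)
}

hl_07  <- half_life(0.7)
hl_099 <- half_life(0.99)

cat("half-life with phi1 = 0.70:", hl_07,  "periods\n")
cat("half-life with phi1 = 0.99:", hl_099, "periods\n")

# Share of a shock still present after 100 periods
cat("0.70^100 =", 0.7^100, "\n")
cat("0.99^100 =", 0.99^100, "\n")
```

With phi1 = 0.7 a shock loses half of its weight in about 2 periods, while with phi1 = 0.99 it takes roughly 69 periods. With phi1 = 1 the weight never decays, which is the random walk case.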

3.5 Checking for weak stationarity

A time series is weakly stationary if:

i. Its expected value is constant over time (about the same over time)

ii. Its variance is constant over time

iii. The covariance (or correlation) between y(t) and y(t+h) is the same for any t; it depends only on the lag h
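The three conditions can be written compactly as follows, where $\mu$, $\sigma_Y^2$ and $\gamma_h$ are symbols introduced here for the constant mean, the constant variance and the covariance at lag $h$:

$$E[Y_t] = \mu \quad \text{for all } t$$

$$Var[Y_t] = \sigma_Y^2 < \infty \quad \text{for all } t$$

$$Cov[Y_t, Y_{t+h}] = \gamma_h \quad \text{for all } t$$

The third condition says that the covariance may change with the lag $h$ but not with the date $t$.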

3.6 Checking for weak stationarity:

#install.packages("zoo")
library(zoo)
library(dplyr)
# I use the rollapply function to get the 12-period rolling mean of each series:
y1_mean<-rollapply(y1,12,mean)
y2_mean<-rollapply(y2,12,mean)
y3_mean<-rollapply(y3,12,mean)
y4_mean<-rollapply(y4,12,mean)


# One index value per rolling window (N - 12 + 1 windows)
x <- seq_along(y1_mean)

all_means <- tibble::as_tibble(data.frame(x, y1_mean, y2_mean, y3_mean, y4_mean))
# install.packages("ggplot2")
library(ggplot2)

p <- ggplot(all_means, aes(x = x))

p+ geom_line(aes(y=y1_mean), colour="darkblue")+
  geom_line(aes(y=y2_mean), color="darkred")+
  geom_line(aes(y=y3_mean), color="yellow")+
  geom_line(aes(y=y4_mean), color="orange")

3.6.1 WHICH OF THE SERIES (y1, y2, y3, y4) IS (ARE) WEAKLY STATIONARY AND WHICH IS (ARE) NOT? BRIEFLY EXPLAIN.

3.6.1.1 Y1 AND Y2 ARE STATIONARY DUE TO THEIR CONSTANT MEAN AND SIMILAR VARIANCE; Y3 AND Y4 ARE UNPREDICTABLE BECAUSE PHI1 IS EQUAL TO 1 (Y3) OR VERY CLOSE TO 1 (Y4).

y1_sd<-rollapply(y1,12,sd)
y2_sd<-rollapply(y2,12,sd)
y3_sd<-rollapply(y3,12,sd)
y4_sd<-rollapply(y4,12,sd)


all_sd <- tibble::as_tibble(data.frame(x, y1_sd, y2_sd, y3_sd, y4_sd))
pl <- ggplot(data = all_sd, aes(x = x))

# Same colors as in the previous plots: y3 in yellow and y4 in orange
pl+ geom_line(aes(y=y1_sd), color="darkblue")+
  geom_line(aes(y=y2_sd), color="darkred")+
  geom_line(aes(y=y3_sd), color="yellow")+
  geom_line(aes(y=y4_sd), color="orange")
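The visual comparison of rolling means can be complemented with a single number per series. The sketch below is a base R illustration and does not use the objects created above: it re-simulates the four processes with its own seed, computes 12-period rolling means with `embed()`, and reports how far each rolling mean wanders. The helper names `simulate_ar1` and `roll_stat` are introduced here for illustration.

```r
set.seed(2024)
n     <- 1000
sigma <- sqrt(0.09)

# Simulate y[i] = phi0 + phi1 * y[i-1] + e[i-1], starting at 0
simulate_ar1 <- function(phi0, phi1, n, sigma) {
  e <- rnorm(n, 0, sigma)
  y <- numeric(n)
  for (i in 2:n) y[i] <- phi0 + phi1 * y[i - 1] + e[i - 1]
  y
}

# Rolling statistic over windows of `width` observations, in base R:
# each row of embed(y, width) holds one window
roll_stat <- function(y, width, f) apply(embed(y, width), 1, f)

series <- list(
  y1 = simulate_ar1(1, 0.70, n, sigma),
  y2 = simulate_ar1(0, 0.70, n, sigma),
  y3 = simulate_ar1(0, 1.00, n, sigma),
  y4 = simulate_ar1(0, 0.99, n, sigma)
)

# Distance between the highest and the lowest 12-period rolling mean
mean_spread <- sapply(series, function(y) diff(range(roll_stat(y, 12, mean))))
print(round(mean_spread, 2))
```

The same helper works for the rolling standard deviation, for example `roll_stat(series$y3, 12, sd)`. A rolling mean that wanders over a wide range is a sign that the first condition for weak stationarity does not hold in the sample.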