By: Alberto Dorantes, Ph.D. Feb 23, 2022
Abstract
In this workshop we will start working with the foundations of auto-regressive models.
Non stationary variables - The Random Walk model for stock prices
-The random walk hypothesis in Finance (Fama, 1965) states that the natural logarithm of stock prices behaves like a random walk with a drift. A random walk is a series (or variable) that cannot be predicted. Imagine that Yt is the log price of a stock for today (t). The value of Y for tomorrow (Yt+1) will be equal to its today’s value (Yt) plus a constant value (φ0) plus a random shock. This shock is a pure random value that follows a normal distribution with mean=0 and a specific standard deviation σε. The process is supposed to be the same for all future periods. In mathematical terms, the random walk model is the following-
Yt=φ0+Yt−1+εt
-The εt is a random shock for each day, which is the result of the log price movement due to all news (external and internal to the stock) that influence the price. φ0 refers as the drift of the series. If |φ0| > 0 we say that the series is a random walk with a drift. If φ0 is positive, then the variable will have a positive trend over time; if it is negative, the series will have a negative trend.
If we want to simulate a random walk, we need the values of the following parameters/variables-
Y0, the first value of the series φ0, the drift of the series σε, the standard deviation (volatility) of the random shock
Q Monte Carlo simulation for the random walk model
Let’s go and run a Monte Carlo simulation for a random walk of the S&P 500. We will use real values of the S&P500 to estimate the previous 3 parameters.
Loading/installing R packages
-The following R packages need to be installed for most of the class workshops:
fpp2
fpp3
quantmod
dplyr
ggplot2
Go to the right-bottom windows of RStudio, select the Package tab, click install, and install both R packages. These packages include many other R packages for time-series data management and analysis.
These fpp2 and fpp3 packages were written by Rob J Hyndman and George Athanasopoulos, professors from Monash University at Australia. They are also business consultants with many years of experience doing both serious research in time-series and also applying their findings in the real world.
The quantmod package was written by Jeffrey A. Ryan, one of the most important contributors to R packages for Finance, and organizer of the famous R/Finance Conference held in Chicago each year since 2009.
Once you install these packages, load them in memory:-
library(quantmod)
Loading required package: xts
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Loading required package: TTR
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
library(fpp3)
── Attaching packages ─────────────────────────── fpp3 0.4.0 ──
✓ tibble 3.1.6 ✓ tsibble 1.1.1
✓ dplyr 1.0.8 ✓ tsibbledata 0.4.0
✓ tidyr 1.2.0 ✓ feasts 0.2.2
✓ lubridate 1.8.0 ✓ fable 0.3.1
✓ ggplot2 3.3.5
── Conflicts ──────────────────────────────── fpp3_conflicts ──
x lubridate::date() masks base::date()
x dplyr::filter() masks stats::filter()
x dplyr::first() masks xts::first()
x tsibble::index() masks zoo::index()
x tsibble::intersect() masks base::intersect()
x tsibble::interval() masks lubridate::interval()
x dplyr::lag() masks stats::lag()
x dplyr::last() masks xts::last()
x tsibble::setdiff() masks base::setdiff()
x tsibble::union() masks base::union()
library(dplyr)
library(ggplot2)
library(zoo)
Downloading data for the S&P500
-Download the S&P500 historical daily data from Yahoo Finance from 2009 to date.-
getSymbols("^GSPC", from="2009-01-01")
‘getSymbols’ currently uses auto.assign=TRUE by default, but will
use auto.assign=FALSE in 0.5-0. You will still be able to use
‘loadSymbols’ to automatically load data. getOption("getSymbols.env")
and getOption("getSymbols.auto.assign") will still be checked for
alternate defaults.
This message is shown once per session and may be disabled by setting
options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
[1] "^GSPC"
-Now we generate the log of the S&P index using the closing price/quotation, and create a variable N for the number of days in the dataset:-
lnsp<-log(Ad(GSPC))
names(lnsp)<-c("lnsp")
N<-nrow(lnsp)
-Now we will simulate 2 random walk series estimating the 3 parameters from this log series of the S&P500: a random walk with a drift (name it rw1), and a random walk with no drift (name it rw2).-
Estimating the parameters of the random walk model
-We have to consider the mathematical definition of a random walk and estimate its parameters (initial value, phi0, volatility of the random shock) from the real daily S&P500 data.
Now, we create a variable for a random walk with a drift trying to model the log of the S&P500.
Reviewing the random walk equation again:-
Yt=φ0+Yt−1+εt
The εt is the random shock of each day, which represents the overall average perception of all market participants after learning the news of the day (internal and external news announced to the market).
Remember that εt behaves like a random normal distributed variable with mean=0 and with a specific standard deviation σε.
For the simulation of the random walk, you need to estimate the values of
y0, the first value of the series, which is the log S&P500 index of the first day
ϕ0
σε
You have to estimate ϕ0 using the last and the first real values of the series following the equation of the random walk.
Here you can see possible values of a random walk over time:
Y0=Initialvalue Y1=ϕ0+Y0+ε1 Y2=ϕ0+Y1+ε2 Substituting Y1 with its corresponding equation: Y2=ϕ0+ϕ0+Y0+ε1+ε2 Re-arranging the terms: Y2=2∗ϕ0+Y0+ε1+ε2 If you continue doing the same until the last N value, you can get: YN=N∗ϕ0+Y0+∑t=1Nεt
This mathematical result is kind of intuitive. The value of a random walk at time N will be equal to its initial value plus N times phi0 plus the sum of ALL random shocks from 1 to N.
Since the mean of the shocks is assumed to be zero, then the expected value of the sum of the shocks will also be zero. Then:
E[YN]=N∗ϕ0+Y0 From this equation we see that phi0 can be estimated as: ϕ0=(YN−Y0)N
Then, ϕ0 = (last value - first value) / # of days.
I calculate ϕ0 following this formula:
phi0<- (as.numeric(lnsp$lnsp[N])-as.numeric(lnsp$lnsp[1])) / N
cat("The value for phi0 is ",phi0)
The value for phi0 is 0.0004626235
Remember that N is the total # of observations, so lnsp[N] has last daily value of the log of the S&P500.
Now we need to estimate sigma, which is the standard deviation of the shocks. We can start estimating its variance first. It is known that the variance of a random walk cannot be determined unless we consider a specific number of periods.
Then, let’s consider the equation of the random walk series for the last value (YN), and then estimate its variance from there:
YN=N∗ϕ0+Y0+∑t=1Nεt Using this equation, we calculate the variance of YN :
Var(YN)=Var(N∗ϕ0)+Var(Y0)+∑t=1NVar(εt) The variance of a constant is zero, so the first two terms are equal to zero.
Now analyze the variance of the shock:
Since it is supposed that the volatility (standard deviation) of the shocks is about the same over time, then:
Var(ε1)=Var(ε2)=Var(εN)=σ2ε Then the sum of the variances of all shocks is actually the variance of the shock times N. Then the variance of all the shocks is actually the variance of YN.
Then we can write the variance of YN as:
Var(YN)=N∗Var(ε)=N∗σ2ε To get the standard deviation of YN we take the square root of the variance of YN:
SD(YN)=N−−√∗SD(ε) We use sigma character for standard deviations:
σY=N−−√∗σε
Finally we express the volatility of the shock (σε) in terms of the volatility of YN (σY):
σε=σYN−−√
Then we can estimate sigma as: sigma = StDev(lnsp) / sqrt(N). Let’s do it:
sigma<-sd(lnsp$lnsp) / sqrt(N)
cat("The volatility of the log is = ",sd(lnsp$lnsp),"\n")
The volatility of the log is = 0.4360899
cat("The volatility for the shock is = ",sigma)
The volatility for the shock is = 0.007582166
Simulating the random walk with drift
Now you are ready to start the simulation of random walk using rw1: rw1t=ϕ0+rw1t−1+εt
The ϕ0 coefficient is also drift of the random walk.
We will create a new column in the lnsp R dataset for the random walk with the name rw1.
lnsp$rw1 = 0
-I assigned zero to all values before I do the simulation.
I start assigning the first value of the random walk to be equal to the first value of the log of the S&P500:-
lnsp$rw1[1]<-lnsp$lnsp[1]
Now assign random values from day 2 to the last day following the random walk. For each day, we create the random shock using the function rnorm. We create this shock with standard deviation equal to the volatility of the shock we calculated above (the sigma). We indicate that the mean =0:
shock <- rnorm(n=N,mean=0,sd=sigma)
lnsp$shock<-shock
We can see the shock over time:
plot(shock, type="l", col="blue")

We can also see whether the shock behaves like a normal distribution by doing its histogram:
hist(lnsp$shock)

-As expected, the shock behaves similiar to a normal-distributed variable.
Now we are ready to start the simulation of random walk. Then we fill the values for rw1. Remembering the formula for the random walk process:
rw1t=ϕ0+rw1t−1+εt We start the random walk with the first value of the log of the S&P500. Then, from day 2 we do the simulation according to the previous formula and using the random shock just created:-
# I create separate vectors:
rw1<-single(length=N)
# I assign the first value of the random-walk to be equal to real log value of the S&P
rw1[1]<-as.numeric(lnsp$lnsp[1])
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
rw1[i] <- phi0 + rw1[i-1] + shock[i]
}
lnsp$rw1<-rw1
I plot the simulated random walk and the real log of the S&P500:
ts.plot(lnsp$rw1)
lines(seq(1,N),lnsp$lnsp, col="blue")

Simulating a random walk with no drift
Now we can do a simulation but now without the drift. I this case, the ϕ0 coefficient must be zero.
Use another variable rw2 for this. You can follow the logic we did for rw1, but now ϕ0 will be equal to zero, so we do not include it into the equation:
rw1_v2<-single(length=N)
rw1_v2[1]<-lnsp$lnsp[1]
for (i in 2:N){
rw1_v2[i] <- rw1_v2[i-1] + shock[i]
}
ts.plot(lnsp$lnsp, col="blue")
# I plot both lines to compare
lines(rw1_v2, col="green")

WHAT DO YOU OBSERVE with this plot? EXPLAIN WITH YOUR WORDS. wE CAN SEE THE DIFFERENCE BETWEEN A RANDOM WALK MODEL WITH AND WITHOUT DRIFT, DRIFT BEING THAT INVISIBLE FORCE THAT AFFECTS THE INDEX, NO DRIFT BEHAVES ABNORMAL TO THE ECONOMY WHEREAS DRIFT EXPLAINS ITSELF.
-Now run a simple regression to check whether the rw1 is statistically related to the log of the S&P500. Use rw1 as explanatory variable. Show the regression results as comments.-
regmodel<-lm(lnsp$lnsp~lnsp$rw1)
# I see statistics on my regression model
s_regmodel <- summary(regmodel)
s_regmodel
Call:
lm(formula = lnsp$lnsp ~ lnsp$rw1)
Residuals:
Min 1Q Median 3Q Max
-0.44420 -0.06915 -0.00591 0.06378 0.26446
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.168951 0.035895 -32.57 <2e-16 ***
lnsp$rw1 1.163495 0.004759 244.46 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.09986 on 3306 degrees of freedom
Multiple R-squared: 0.9476, Adjusted R-squared: 0.9476
F-statistic: 5.976e+04 on 1 and 3306 DF, p-value: < 2.2e-16
DOES THE REGRESSION RESULT MAKE SENSE? EXPLAIN WHY YES OR WHY NOT? YES IT MAKES SENSE, WE CAN SEE HOW HISTORICAL ISSUES AFFECT THE INDEX.
DOES THE LOG OF THE S&P500 LOOKS LIKE A RANDOM WALK? WHY YES OR WHY NOT? YES, THE RANDOM WALK MAKES SENSE AND IS VERY INTERESTING TO SEE IN ACTION.
# I plot the natural log pf S&P500
ts.plot(lnsp$lnsp)
# I plot my random walk with a drift
lines(rw1, col="blue")

DO YOU THINK THAT WE CAN USE THIS TYPE OF SIMULATION TO PREDICT STOCK PRICES OR INDEXES? WHY YES OR WHY NOT? ID SAY IT WOULD TOTALLY HELP IF WE ANALYZED THE REASONS BEHIND THE BEHAVIOUR EXHIBITED IN THE GRAPHS, SUCH AS HISTORICAL EVENTS AND STOCK FLUCTUATIONS.
ts.plot(lnsp$lnsp,col="blue")
lines(rw1_v2, col="green")
lines(rw1)

3 SIMULATING A RANDOM WALK AND AN AR(1) PROCESS
3.1 Create the dataset with simulation
-Create a dataset of 1000 observations-
rm(list = ls())
obs = 1:1000
#View(obs)
-Create a Gaussian white noise variable with 1,000 values with variance 0.09. Call this variable e1. The Gaussian white noise is a normal variable with mean=0 and a specic variance. We will use Gaussian white noises to model financial returns.-
e1=rnorm(1000,0,sqrt(0.09))
N<-length(obs)
head(e1)
[1] -0.6269308 -0.3262786 -0.0312356 -0.1699208 -0.1189524
[6] -0.6073908
tail(e1)
[1] 0.24314295 0.03975799 0.39108986 0.46774953
[5] 0.56346075 -0.08817299
hist(e1)

3.2 Simulating an AR(1) with phi0=1 and phi1=0.7
-Following the formula for a Simple or First-order Autoregressive, AR(1): Use the Gaussian white noise created (e1) for this
Declare a variable for the coefficient φ1(phi1) and equal this coefficient to 0.7.
Declare a variable for the coefficient φ0(phi0) and equal this coefficient to 1
Using simple simulation, generate the variable y1 as an AR(1) model using the above terms
The variable y1 is an AR(1) process or model. Graph this variable over time
You can run the following code for the previous steps:-
e1=rnorm(1000,0,sqrt(0.09))
y1<-single(length=N)
phi1 <- 0.7
phi0 <- 1
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
y1[i] <- phi0 + phi1*y1[i-1] + e1[i-1]
}
ts.plot(y1, col="darkblue")

a). WHAT DO YOU SEE? LOOKING AT THE GRAPH, DOES THE MEAN OF THE SERIES CONVERGE TO A VALUE? IF YES, WHICH VALUE? YES THAT VALUE WUOULD SIT AROUND 3 AND 4, ID SAY 3.5
b). WHAT IS THE EXPECTED VALUE OF y1 ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE. y1[i] <- phi0 + phi1*y1[i-1] + e1[i-1] = 3.5
c). IS THE EXPECTED VALUE OF y1 SIMILAR TO THE MEAN YOU SAW IN THE FIRST GRAPH? THE VALUE FOR Y1 IS EQUAL TO THE MEAN
d). IS THE VOLATILITY (STANDARD DEVIATION) SIMILAR IN ALL TIME PERIODS? NO, AT THE BEGINNING VOLATILITY SETS ITSELF ON A PATH AND THEN IT IDELS LIKE A CAR ENGINE.
3.3 Simulating an AR(1) with phi0=0 and phi1=0.7
Now generate another series y2 with the same parameters than y1, but just change the constant phi0 from 1 to zero.
Graph y2 over time.
Is this time series also an AR(1)?
WHAT IS THE EXPECTED VALUE OF y2? ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE. y2[i] <- phi0 + phi1*y1[i-1] + e1[i-1]= 0.5
e2=rnorm(1000,0,sqrt(0.09))
y2<-single(length=N)
phi1 <- 0.7
phi0 <- 0
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
y2[i] <- phi0 + phi1*y2[i-1] + e2[i-1]
}
ts.plot(y2, col="darkred")

3.4 Simulating a Random walk with phi0=0
Now generate another series y3, but now modify the parameters to simulate a Random Walk process with phi0=0.
Graph y3 over time. Observe its behaviour. IS THE MEAN OF THE SERIES CONSTANT OVER TIME? IS THE VOLATILITY OF THE SERIES CONSTANT OVER TIME? EXPLAIN YES, BOTH VOLATILITY AND MEAN OF THE SERIES ARE CONSTANT OVER TIME, WE CAN SEE THIS VISUALLY REPRESENTED BY THE GRAPH.
e3=rnorm(1000,0,sqrt(0.09))
y3<-single(length=N)
phi1 <- 1
phi0 <- 0
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
y3[i] <- phi0 + phi1*y3[i-1] + e3[i-1]
}
ts.plot(y3, col="yellow")

3.5 Simulating an AR(1) with phi0=0 and phi1=0.99
-Now generate another series y4 as an AR(1) process with phi0=0 and phi1=0.99
Graph y4 over time.
IS THIS SERIES AN AR(1) ? EXPLAIN WHY YES OR WHY NOT
DOES THE MEAN CONVERGE TO A SPECIFIC VALUE? IF YES, TO WHICH ONE?-
e4=rnorm(1000,0,sqrt(0.09))
y4<-single(length=N)
phi1 <- 0.99
phi0 <- 0
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
y4[i] <- phi0 + phi1*y4[i-1] + e4[i-1]
}
ts.plot(y4, col="orange")

3.6 Weakly stationarity.
A time series is weakly stationary if:
its expected value is constant over time (about the same over time)
its expected variance over time is constant
the covariance (or correlation) between y and y(t+h) is the same for any t and any h
3.6.1 Checking for weakly stationary:
We will use the rollaply function from the zoo package to calculate rolling windows and estimate rolling means and rolling standard deviations.
Using the rollaply function compute the rolling means using 12 periods for the series y1, y2, y3, and y4:
y1_mean<-rollapply(y1,12,mean)
y2_mean<-rollapply(y2,12,mean)
y3_mean<-rollapply(y3,12,mean)
y4_mean<-rollapply(y4,12,mean)
-These new datasets will have 989 periods instead of 1,000 since we calculated rolling means so that we calculate the first mean until the row 12 for each variable.
Merge the rolling means into one data frame to better manage the time series:-
# Create a variable for the time period:
x <- seq(1,989,1)
# Merge the x and the rolling means datasets into one dataset:
all_means <- tbl_df(data.frame(x, y1_mean, y2_mean, y3_mean, y4_mean))
Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
-Graph the 12-period rolling means to see how the mean moves over time:-
p <- ggplot(all_means, aes(x = x))
p+ geom_line(aes(y=y1_mean), colour="darkblue")+
geom_line(aes(y=y2_mean), color="darkred")+
geom_line(aes(y=y3_mean), color="yellow")+
geom_line(aes(y=y4_mean), color="orange")

Using the rollaply function and a window of 12 time periods, compute the rolling standard deviations of the series y1, y2, y3, and y4:
y1_sd<-rollapply(y1,12,sd)
y2_sd<-rollapply(y2,12,sd)
y3_sd<-rollapply(y3,12,sd)
y4_sd<-rollapply(y4,12,sd)
-Graph the rolling standard deviations of each series to see how the standard deviation moves for each window:-
all_sd <- tbl_df(data.frame(x, y1_sd, y2_sd, y3_sd, y4_sd))
pl <- ggplot(data = all_sd, aes(x = x))
pl+ geom_line(aes(y=y1_sd), color="darkblue")+
geom_line(aes(y=y2_sd), color="darkred")+
geom_line(aes(y=y4_sd), color="yellow")+
geom_line(aes(y=y3_sd), color="orange")

WHICH OF THE SERIES (y1, y2, y3, y4) IS (ARE) WEAKLYSTATIONARY AND WHICH IS (ARE) NOT? BRIEFLY EXPLAIN BASED EXCLUSIVELY ON THE NAKED EYE, I´D SAY THAT Y1, Y2 AND MAYBE Y4 ARE STATIONARY DUE TO THEIR BEHAVIOUR. THE ORANGE Y3 ISNT STATIONARY DUE TO THE AGGRESIVE CHANGES IT EXHIBITS WHERE A CLEAR MEAN CAN´T BE DETERMINED. Y1,2,4 EXPECTED VALUES ARE CONSTANT OVER TIME, EXPECTED VARIANCE IS ALSO CONSTANT AND COVARIANCE IS THE SAME FOR ANY T AND ANY H.
---
title: "Workshop 2, Financial Econometrics II"
Author: Stefan Schweitzer A01209755
output: html_notebook
---
By: Alberto Dorantes, Ph.D.
Feb 23, 2022

# Abstract
In this workshop we will start working with the foundations of auto-regressive models.

## Non stationary variables - The Random Walk model for stock prices

-The random walk hypothesis in Finance (Fama, 1965) states that the natural logarithm of stock prices behaves like a random walk with a drift. A random walk is a series (or variable) that cannot be predicted. Imagine that Yt is the log price of a stock for today (t). The value of Y for tomorrow (Yt+1) will be equal to its today’s value (Yt) plus a constant value (φ0) plus a random shock. This shock is a pure random value that follows a normal distribution with mean=0 and a specific standard deviation σε. The process is supposed to be the same for all future periods. In mathematical terms, the random walk model is the following-

Yt=φ0+Yt−1+εt

-The εt is a random shock for each day, which is the result of the log price movement due to all news (external and internal to the stock) that influence the price. φ0 refers as the drift of the series. If |φ0| > 0 we say that the series is a random walk with a drift. If φ0 is positive, then the variable will have a positive trend over time; if it is negative, the series will have a negative trend.

If we want to simulate a random walk, we need the values of the following parameters/variables-

Y0, the first value of the series
φ0, the drift of the series
σε, the standard deviation (volatility) of the random shock

## Q Monte Carlo simulation for the random walk model

Let’s go and run a Monte Carlo simulation for a random walk of the S&P 500. We will use real values of the S&P500 to estimate the previous 3 parameters.

# Loading/installing R packages

-The following R packages need to be installed for most of the class workshops:

fpp2

fpp3

quantmod

dplyr

ggplot2

Go to the right-bottom windows of RStudio, select the Package tab, click install, and install both R packages. These packages include many other R packages for time-series data management and analysis.

These fpp2 and fpp3 packages were written by Rob J Hyndman and George Athanasopoulos, professors from Monash University at Australia. They are also business consultants with many years of experience doing both serious research in time-series and also applying their findings in the real world.

The quantmod package was written by Jeffrey A. Ryan, one of the most important contributors to R packages for Finance, and organizer of the famous R/Finance Conference held in Chicago each year since 2009.

Once you install these packages, load them in memory:-

```{r}
library(quantmod)
library(fpp3)
library(dplyr)
library(ggplot2)
library(zoo)
```

# Downloading data for the S&P500

-Download the S&P500 historical daily data from Yahoo Finance from 2009 to date.-
```{r}
getSymbols("^GSPC", from="2009-01-01")
```
-Now we generate the log of the S&P index using the closing price/quotation, and create a variable N for the number of days in the dataset:-

```{r}
lnsp<-log(Ad(GSPC))
names(lnsp)<-c("lnsp")
N<-nrow(lnsp)
```

-Now we will simulate 2 random walk series estimating the 3 parameters from this log series of the S&P500:
a random walk with a drift (name it rw1), and
a random walk with no drift (name it rw2).-

# Estimating the parameters of the random walk model

-We have to consider the mathematical definition of a random walk and estimate its parameters (initial value, phi0, volatility of the random shock) from the real daily S&P500 data.

Now, we create a variable for a random walk with a drift trying to model the log of the S&P500.

Reviewing the random walk equation again:-

Yt=φ0+Yt−1+εt

The εt is the random shock of each day, which represents the overall average perception of all market participants after learning the news of the day (internal and external news announced to the market).

Remember that εt behaves like a random normal distributed variable with mean=0 and with a specific standard deviation σε.

For the simulation of the random walk, you need to estimate the values of

y0, the first value of the series, which is the log S&P500 index of the first day

ϕ0

σε

You have to estimate ϕ0 using the last and the first real values of the series following the equation of the random walk.

Here you can see possible values of a random walk over time:

Y0=Initialvalue
Y1=ϕ0+Y0+ε1
Y2=ϕ0+Y1+ε2
Substituting Y1 with its corresponding equation:
Y2=ϕ0+ϕ0+Y0+ε1+ε2
Re-arranging the terms:
Y2=2∗ϕ0+Y0+ε1+ε2
If you continue doing the same until the last N value, you can get:
YN=N∗ϕ0+Y0+∑t=1Nεt

This mathematical result is kind of intuitive. The value of a random walk at time N will be equal to its initial value plus N times phi0 plus the sum of ALL random shocks from 1 to N.

Since the mean of the shocks is assumed to be zero, then the expected value of the sum of the shocks will also be zero. Then:

E[YN]=N∗ϕ0+Y0
From this equation we see that phi0 can be estimated as:
ϕ0=(YN−Y0)N

Then, ϕ0 = (last value - first value) / # of days.

I calculate ϕ0 following this formula:

```{r}
phi0<- (as.numeric(lnsp$lnsp[N])-as.numeric(lnsp$lnsp[1])) / N
cat("The value for phi0 is ",phi0)
```
Remember that N is the total # of observations, so lnsp[N] has last daily value of the log of the S&P500.

Now we need to estimate sigma, which is the standard deviation of the shocks. We can start estimating its variance first. It is known that the variance of a random walk cannot be determined unless we consider a specific number of periods.

Then, let’s consider the equation of the random walk series for the last value (YN), and then estimate its variance from there:

YN=N∗ϕ0+Y0+∑t=1Nεt
Using this equation, we calculate the variance of YN :

Var(YN)=Var(N∗ϕ0)+Var(Y0)+∑t=1NVar(εt)
The variance of a constant is zero, so the first two terms are equal to zero.

Now analyze the variance of the shock:

Since it is supposed that the volatility (standard deviation) of the shocks is about the same over time, then:

Var(ε1)=Var(ε2)=Var(εN)=σ2ε
Then the sum of the variances of all shocks is actually the variance of the shock times N. Then the variance of all the shocks is actually the variance of YN.

Then we can write the variance of YN as:

Var(YN)=N∗Var(ε)=N∗σ2ε
To get the standard deviation of YN we take the square root of the variance of YN:

SD(YN)=N−−√∗SD(ε)
We use sigma character for standard deviations:

σY=N−−√∗σε

Finally we express the volatility of the shock (σε) in terms of the volatility of YN (σY):

σε=σYN−−√

Then we can estimate sigma as: sigma = StDev(lnsp) / sqrt(N). Let’s do it:

```{r}
sigma<-sd(lnsp$lnsp) / sqrt(N)
cat("The volatility of the log is = ",sd(lnsp$lnsp),"\n")
```

```{r}
cat("The volatility for the shock is = ",sigma)
```
# Simulating the random walk with drift

Now you are ready to start the simulation of random walk using rw1:
rw1t=ϕ0+rw1t−1+εt

The ϕ0 coefficient is also drift of the random walk.

We will create a new column in the lnsp R dataset for the random walk with the name rw1.

```{r}
lnsp$rw1 = 0
```

-I assigned zero to all values before I do the simulation.

I start assigning the first value of the random walk to be equal to the first value of the log of the S&P500:-

```{r}
lnsp$rw1[1]<-lnsp$lnsp[1]
```

Now assign random values from day 2 to the last day following the random walk. For each day, we create the random shock using the function rnorm. We create this shock with standard deviation equal to the volatility of the shock we calculated above (the sigma). We indicate that the mean =0:

```{r}
shock <- rnorm(n=N,mean=0,sd=sigma)
lnsp$shock<-shock
```

We can see the shock over time:

```{r}
plot(shock, type="l", col="blue")
```
We can also see whether the shock behaves like a normal distribution by doing its histogram:

```{r}
hist(lnsp$shock)
```

-As expected, the shock behaves similiar to a normal-distributed variable.

Now we are ready to start the simulation of random walk. Then we fill the values for rw1. Remembering the formula for the random walk process:

rw1t=ϕ0+rw1t−1+εt
We start the random walk with the first value of the log of the S&P500. Then, from day 2 we do the simulation according to the previous formula and using the random shock just created:-

```{r}
# I create separate vectors:
rw1<-single(length=N)

# I assign the first value of the random-walk to be equal to real log value of the S&P
rw1[1]<-as.numeric(lnsp$lnsp[1])
# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
  rw1[i] <- phi0 + rw1[i-1] + shock[i] 
}
lnsp$rw1<-rw1
```

I plot the simulated random walk and the real log of the S&P500:

```{r}
ts.plot(lnsp$rw1)
lines(seq(1,N),lnsp$lnsp, col="blue")
```

# Simulating a random walk with no drift

Now we can do a simulation but now without the drift. I this case, the ϕ0 coefficient must be zero.

Use another variable rw2 for this. You can follow the logic we did for rw1, but now ϕ0 will be equal to zero, so we do not include it into the equation:

```{r}
rw1_v2<-single(length=N)
rw1_v2[1]<-lnsp$lnsp[1]
for (i in 2:N){
  rw1_v2[i] <- rw1_v2[i-1] + shock[i] 
}

ts.plot(lnsp$lnsp, col="blue")
# I plot both lines to compare 
lines(rw1_v2, col="green")
```

WHAT DO YOU OBSERVE with this plot? EXPLAIN WITH YOUR WORDS.
wE CAN SEE THE DIFFERENCE BETWEEN A RANDOM WALK MODEL WITH AND WITHOUT DRIFT, DRIFT BEING THAT INVISIBLE FORCE THAT AFFECTS THE INDEX, NO DRIFT BEHAVES ABNORMAL TO THE ECONOMY WHEREAS DRIFT EXPLAINS ITSELF.


-Now run a simple regression to check whether the rw1 is statistically related to the log of the S&P500. Use rw1 as explanatory variable. Show the regression results as comments.-

```{r}
regmodel<-lm(lnsp$lnsp~lnsp$rw1)
# I see statistics on my regression model
s_regmodel <- summary(regmodel)
s_regmodel
```
DOES THE REGRESSION RESULT MAKE SENSE? EXPLAIN WHY YES OR WHY NOT?
YES IT MAKES SENSE, WE CAN SEE HOW HISTORICAL ISSUES AFFECT THE INDEX.

DOES THE LOG OF THE S&P500 LOOKS LIKE A RANDOM WALK? WHY YES OR WHY NOT?
YES, THE RANDOM WALK MAKES SENSE AND IS VERY INTERESTING TO SEE IN ACTION.

```{r}
# I plot the natural log pf S&P500
ts.plot(lnsp$lnsp)
# I plot my random walk with a drift
lines(rw1, col="blue")
```
DO YOU THINK THAT WE CAN USE THIS TYPE OF SIMULATION TO PREDICT STOCK PRICES OR INDEXES? WHY YES OR WHY NOT?
ID SAY IT WOULD TOTALLY HELP IF WE ANALYZED THE REASONS BEHIND THE BEHAVIOUR EXHIBITED IN THE GRAPHS, SUCH AS HISTORICAL EVENTS AND STOCK FLUCTUATIONS.

```{r}
ts.plot(lnsp$lnsp,col="blue")
lines(rw1_v2, col="green")
lines(rw1)
```

## 3 SIMULATING A RANDOM WALK AND AN AR(1) PROCESS

# 3.1 Create the dataset with simulation

-Create a dataset of 1000 observations-
```{r}
rm(list = ls())
obs = 1:1000
#View(obs)
```

-Create a Gaussian white noise variable with 1,000 values with variance 0.09. Call this variable e1. The Gaussian white noise is a normal variable with mean=0 and a specic variance. We will use Gaussian white noises to model financial returns.-

```{r}
e1=rnorm(1000,0,sqrt(0.09))
N<-length(obs)
head(e1)
```
```{r}
tail(e1)
```
```{r}
hist(e1)
```

# 3.2 Simulating an AR(1) with phi0=1 and phi1=0.7

-Following the formula for a Simple or First-order Autoregressive, AR(1):
Use the Gaussian white noise created (e1) for this

Declare a variable for the coefficient φ1(phi1) and equal this coefficient to 0.7.

Declare a variable for the coefficient φ0(phi0) and equal this coefficient to 1

Using simple simulation, generate the variable y1 as an AR(1) model using the above terms

The variable y1 is an AR(1) process or model. Graph this variable over time

You can run the following code for the previous steps:-

```{r}
e1=rnorm(1000,0,sqrt(0.09))

y1<-single(length=N)
phi1 <- 0.7
phi0 <- 1

# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
  y1[i] <- phi0 + phi1*y1[i-1] + e1[i-1] 
}

ts.plot(y1, col="darkblue")
```
a). WHAT DO YOU SEE? LOOKING AT THE GRAPH, DOES THE MEAN OF THE SERIES CONVERGE TO A VALUE? IF YES, WHICH VALUE?
YES THAT VALUE WUOULD SIT AROUND 3 AND 4, ID SAY 3.5

b). WHAT IS THE EXPECTED VALUE OF y1 ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE.
 y1[i] <- phi0 + phi1*y1[i-1] + e1[i-1] = 3.5

c). IS THE EXPECTED VALUE OF y1 SIMILAR TO THE MEAN YOU SAW IN THE FIRST GRAPH?
THE VALUE FOR Y1 IS EQUAL TO THE MEAN

d). IS THE VOLATILITY (STANDARD DEVIATION) SIMILAR IN ALL TIME PERIODS?
NO, AT THE BEGINNING VOLATILITY SETS ITSELF ON A PATH AND THEN IT IDELS LIKE A CAR ENGINE.

# 3.3 Simulating an AR(1) with phi0=0 and phi1=0.7

Now generate another series y2 with the same parameters than y1, but just change the constant phi0 from 1 to zero.

Graph y2 over time.

Is this time series also an AR(1)?

WHAT IS THE EXPECTED VALUE OF y2? ACCORDING TO THE AR(1) MODEL? PROVIDE THE FORMULA AND CALCULATE THE EXPECTED VALUE.
 y2[i] <- phi0 + phi1*y1[i-1] + e1[i-1]= 0.5

```{r}
e2=rnorm(1000,0,sqrt(0.09))

y2<-single(length=N)
phi1 <- 0.7
phi0 <- 0

# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
  y2[i] <- phi0 + phi1*y2[i-1] + e2[i-1] 
}

ts.plot(y2, col="darkred")
```

# 3.4 Simulating a Random walk with phi0=0

Now generate another series y3, but now modify the parameters to simulate a Random Walk process with phi0=0.

Graph y3 over time. Observe its behaviour.
IS THE MEAN OF THE SERIES CONSTANT OVER TIME?
IS THE VOLATILITY OF THE SERIES CONSTANT OVER TIME? EXPLAIN
YES, BOTH VOLATILITY AND MEAN OF THE SERIES ARE CONSTANT OVER TIME, WE CAN SEE THIS VISUALLY REPRESENTED BY THE GRAPH.

```{r}
e3=rnorm(1000,0,sqrt(0.09))
y3<-single(length=N)
phi1 <- 1
phi0 <- 0

# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
  y3[i] <- phi0 + phi1*y3[i-1] + e3[i-1] 
}

ts.plot(y3, col="yellow")
```

# 3.5 Simulating an AR(1) with phi0=0 and phi1=0.99

-Now generate another series y4 as an AR(1) process with phi0=0 and phi1=0.99

Graph y4 over time.

IS THIS SERIES AN AR(1) ? EXPLAIN WHY YES OR WHY NOT

DOES THE MEAN CONVERGE TO A SPECIFIC VALUE? IF YES, TO WHICH ONE?-

```{r}
e4=rnorm(1000,0,sqrt(0.09))

y4<-single(length=N)
phi1 <- 0.99
phi0 <- 0

# Now from day 1 I generate the values of the random walk following the formula:
for (i in 2:N){
  y4[i] <- phi0 + phi1*y4[i-1] + e4[i-1] 
}

ts.plot(y4, col="orange")
```

# 3.6 Weakly stationarity.

A time series is weakly stationary if:

its expected value is constant over time (about the same over time)

its expected variance over time is constant

the covariance (or correlation) between y and y(t+h) is the same for any t and any h

# 3.6.1 Checking for weakly stationary:

We will use the rollaply function from the zoo package to calculate rolling windows and estimate rolling means and rolling standard deviations.

Using the rollaply function compute the rolling means using 12 periods for the series y1, y2, y3, and y4:

```{r}
y1_mean<-rollapply(y1,12,mean)
y2_mean<-rollapply(y2,12,mean)
y3_mean<-rollapply(y3,12,mean)
y4_mean<-rollapply(y4,12,mean)
```

-These new datasets will have 989 periods instead of 1,000 since we calculated rolling means so that we calculate the first mean until the row 12 for each variable.

Merge the rolling means into one data frame to better manage the time series:-

```{r}
# Create a variable for the time period:
x <- seq(1,989,1)
# Merge the x and the rolling means datasets into one dataset:
all_means <- tbl_df(data.frame(x, y1_mean, y2_mean, y3_mean, y4_mean))
```
-Graph the 12-period rolling means to see how the mean moves over time:-

```{r}
p <- ggplot(all_means, aes(x = x))

p+ geom_line(aes(y=y1_mean), colour="darkblue")+
  geom_line(aes(y=y2_mean), color="darkred")+
  geom_line(aes(y=y3_mean), color="yellow")+
  geom_line(aes(y=y4_mean), color="orange")
```
Using the rollaply function and a window of 12 time periods, compute the rolling standard deviations of the series y1, y2, y3, and y4:

```{r}
y1_sd<-rollapply(y1,12,sd)
y2_sd<-rollapply(y2,12,sd)
y3_sd<-rollapply(y3,12,sd)
y4_sd<-rollapply(y4,12,sd)
```

-Graph the rolling standard deviations of each series to see how the standard deviation moves for each window:-

```{r}
all_sd <- tbl_df(data.frame(x, y1_sd, y2_sd, y3_sd, y4_sd))
pl <- ggplot(data = all_sd, aes(x = x))

pl+ geom_line(aes(y=y1_sd), color="darkblue")+
  geom_line(aes(y=y2_sd), color="darkred")+
  geom_line(aes(y=y4_sd), color="yellow")+
  geom_line(aes(y=y3_sd), color="orange")
```

WHICH OF THE SERIES (y1, y2, y3, y4) IS (ARE) WEAKLYSTATIONARY AND WHICH IS (ARE) NOT? BRIEFLY EXPLAIN
BASED EXCLUSIVELY ON THE NAKED EYE, I´D SAY THAT Y1, Y2 AND MAYBE Y4 ARE STATIONARY DUE TO THEIR BEHAVIOUR. THE ORANGE Y3 ISNT STATIONARY DUE TO THE AGGRESIVE CHANGES IT EXHIBITS WHERE A CLEAR MEAN CAN´T BE DETERMINED. Y1,2,4 EXPECTED VALUES ARE CONSTANT OVER TIME, EXPECTED VARIANCE IS ALSO CONSTANT AND COVARIANCE IS THE SAME FOR ANY T AND ANY H.
