1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question. It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to know how to generate an html file from your .Rmd.

2 Set up the name of your R Notebook for this workshop

Setup title and name of your Workshop

Once you have created a new R Notebook, you will see a sample R Notebook document. You must DELETE all the lines of this sample document except the first lines related to title and output. As title, write the workshop # and course, and add a new line with your name. You have to end up with something like:

title: "Workshop 1, Time Series"

author: YourName

output: html_notebook

Now you are ready to continue writing your first R Notebook.

You can start writing your own notes/explanations about what we cover in this workshop. When you need to write lines of R code, click Insert at the top of the RStudio window and select R. A chunk of R code will be set up immediately so that you can start writing your R code. You can execute this piece of code by clicking the play button (green triangle).

Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.

Save your R Notebook file as W1-YourName.Rmd. Go to the File menu and select Save As.

3 Introduction to time-series variables

In this section we will download time-series variables from Yahoo Finance and explore the time-series datasets. More specifically, we explore the xts-zoo R objects. xts stands for “Extensible time series”, and zoo is an R class for general time-series datasets.

We start by clearing our R environment:

rm(list=ls())
# To avoid scientific notation for numbers: 
options(scipen=999)

3.1 Install the quantmod package

In order to import and manage financial data in R, the quantmod package must be installed. This package contains the getSymbols() function, which creates an xts (extensible time series) object in the environment with the data downloaded from the Internet. To install packages in R, go to the Packages tab in the bottom-right section of RStudio, select Install, type quantmod, and click the Install button.

Once you install a package, it stays on your computer. You only need to re-install a package when there is a new version of it.

3.2 Load the quantmod package

Now that you have installed the package, it is not necessary to install it again. It will stay on your computer. However, every time you want to use it in a new R session, you have to load it using the library() function:

library(quantmod)

3.3 Downloading time-series financial prices

The getSymbols() function lets you download up-to-date financial data, such as stock prices, market indexes, ETF prices, interest rates, exchange rates, etc. getSymbols() can download this data from several sources, such as Yahoo Finance, FRED and Oanda (Google Finance was also supported in the past, but it no longer provides this data). These sources have thousands of financial and economic data series from many market exchanges, as well as macroeconomic variables from around the world.

We download monthly data for the main market indexes of Mexico (the IPyC) and the US (the S&P500) from Yahoo Finance, starting in January 2011:

getSymbols(c("^MXX", "^GSPC"), from="2011-01-01", periodicity = "monthly", src = "yahoo")

## [1] "^MXX"  "^GSPC"

This function will create 2 xts-zoo R objects (MXX and GSPC) with the historical data of each market index in chronological order. These objects are datasets with the historical values of the index plus a time index. Each R object has a specific class; in this case, the class of these datasets is xts-zoo, which is designed to make time-series data easy to manipulate.

For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. It also keeps track of the volume traded (number of shares traded) in each period. Adjusted prices are relevant for stocks, not for market indexes, since they account for dividend payments and stock splits. For market indexes, the adjusted prices are always equal to the close prices.

Let’s see some of the benefits of using xts-zoo objects. We can, for example, select columns using any of the following functions, where x represents a generic xts-zoo object:

Op(x): Extract the opening prices of the period.
Hi(x): Extract the highest prices of the period.
Lo(x): Extract the lowest prices of the period.
Cl(x): Extract the closing prices of the period.
Vo(x): Extract the volume traded in the period.
Ad(x): Extract the adjusted prices of the period.
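As a minimal sketch of how these extractor functions behave, we can build a small made-up dataset instead of downloading real data. The object name toy, its dates and its prices are invented for this illustration; the only assumption is that the columns follow the Symbol.Open, Symbol.High, ... naming pattern that getSymbols() produces, since the extractors look for those words in the column names.

```r
library(quantmod)  # also loads the xts and zoo packages

# A made-up monthly OHLC dataset with 3 periods (all values are invented):
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))
toy <- xts(cbind(TOY.Open     = c(10, 11, 12),
                 TOY.High     = c(12, 13, 14),
                 TOY.Low      = c(9, 10, 11),
                 TOY.Close    = c(11, 12, 13),
                 TOY.Volume   = c(100, 150, 120),
                 TOY.Adjusted = c(10.5, 11.5, 12.5)),
           order.by = dates)

# Each extractor returns an xts object with only the matching column:
Cl(toy)   # closing prices
Ad(toy)   # adjusted prices
Vo(toy)   # volume traded
```

Each result keeps the time index, so it can be plotted or merged like any other xts-zoo object.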

3.4 Merging time-series datasets

We can integrate xts-zoo R objects into one xts-zoo dataset using the merge function. In this case, we will only use the adjusted price, so we can also use the Ad function:

# We merge the datasets into a new R object called prices:
prices = merge(MXX,GSPC)
# We only keep the adjusted price columns:
prices = Ad(prices)
# We rename the columns with simpler names:
names(prices) = c("MXX","GSPC")
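To see what merge does when the two series do not cover exactly the same dates, here is a small sketch with made-up data. The objects a and b, their column names and their values are hypothetical and only serve this illustration.

```r
library(quantmod)  # also loads the xts and zoo packages

dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))

# Two made-up series; b starts one month later than a:
a <- xts(cbind(AAA.Close = c(10, 11, 12), AAA.Adjusted = c(9.5, 10.5, 11.5)),
         order.by = dates)
b <- xts(cbind(BBB.Close = c(20, 21), BBB.Adjusted = c(19, 20)),
         order.by = dates[2:3])

# merge aligns the rows by date; dates missing in one series are filled with NA:
both <- merge(a, b)

# Ad keeps every column whose name contains "Adjusted":
adj <- Ad(both)
names(adj) <- c("AAA", "BBB")
adj
```

The first row of BBB is NA because b has no value for January 2021. It is worth checking for such gaps before calculating returns.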

3.5 Q Visualizing time-series variables

For each index we plot a chart to visualize how the index moves over time. We can use the chartSeries function from the quantmod package:

chartSeries(MXX, theme=("white"))

chartSeries(GSPC, theme=("white"))

Respond to the following QUESTIONS:

WHAT CAN YOU SAY ABOUT THE TREND OF BOTH MARKET INDEXES? IS IT CONSTANTLY GROWING, OR DECLINING, OR IS THERE NO CLEAR TREND? BRIEFLY EXPLAIN

Generate a new dataset with the natural log of the indexes:

lnprices = log(prices)

Now do a time plot for the log price:

plot(lnprices$MXX, main = "Log of the Mexican Index over time")

RESPOND: BRIEFLY MENTION IF YOU SEE A DIFFERENCE BETWEEN THIS PLOT OF THE LOG OF THE MXX INDEX COMPARED TO THE PLOT OF THE MXX INDEX.

3.6 Financial returns

A simple financial return for a stock ($R_{t}$) is calculated as the percentage change in price from the previous period (t-1) to the present period (t):

\[ R_{t}=\frac{\left(Adjprice_{t}-Adjprice_{t-1}\right)}{Adjprice_{t-1}}=\frac{Adjprice_{t}}{Adjprice_{t-1}}-1 \]

For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:

\[ R_{Jan2021}=\frac{Adjprice_{Jan2021}}{Adjprice_{Dec2020}}-1=\frac{100}{80}-1=0.25 \]

We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.

In Finance it is highly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models. cc returns are also called log returns.

One way to calculate cc returns is by subtracting the log of the previous adjusted price (at t-1) from the log of the current adjusted price (at t):

\[ r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1}) \]

This is also called the difference of the log of the price.

We can also calculate cc returns as the log of the current adjusted price (at t) divided by the previous adjusted price (at t-1):

\[ r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right) \]

cc returns are usually represented by small r, while simple returns are represented by capital R.
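As a quick numeric check of the formulas above, the following sketch reuses the prices of the worked example (80 and 100). The variable names p_prev, p_now, R_simple, r_cc and r_cc_alt are chosen only for this illustration.

```r
# Prices from the worked example (December 2020 and January 2021):
p_prev <- 80
p_now  <- 100

# Simple return:
R_simple <- p_now / p_prev - 1

# cc return, calculated in the two equivalent ways:
r_cc     <- log(p_now) - log(p_prev)
r_cc_alt <- log(p_now / p_prev)

cat("Simple return:", R_simple, "\n")
cat("cc return:", r_cc, "\n")

# Both formulas give the same cc return, and r = log(1 + R):
all.equal(r_cc, r_cc_alt)
all.equal(r_cc, log(1 + R_simple))
```

Note that the cc return is smaller than the simple return for the same price change. The two measures are close for small changes and diverge as the change gets larger.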

3.7 Q Calculation of financial returns

It is recommended to always use adjusted prices to calculate financial returns. In this example we work with market indexes, so the adjusted price is exactly the same as the closing price, since market indexes have neither stock splits nor dividend payments.

We can use the lag function to get past (lagged) values of a time-series dataset (or column). With this function we can get the price of the previous period to calculate the simple return. Let’s create a new dataset for the simple monthly returns of both indexes:

# For xts objects, lag() uses the argument k for the number of periods to go back:
R = prices / lag(prices, k=1) - 1

We can use the diff function to get the difference of a current value and a lagged value of a dataset (or a column). Let’s create a new dataset for the cc return of both indexes:

r = diff(log(prices))

Remember that continuously compounded returns can be calculated as the log of the current price minus the log of the price of the previous period.
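The following sketch shows what lag and diff do on a small made-up price series, so that the first missing value and the link between both types of return are easy to see. The objects p, R_toy and r_toy and their values are hypothetical. The sketch assumes that no other loaded package replaces the lag function (dplyr, for example, defines its own lag).

```r
library(xts)

# A made-up monthly price series with 4 periods:
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"))
p <- xts(c(100, 110, 99, 108.9), order.by = dates)

# Simple returns: current price divided by the previous price, minus 1
R_toy <- p / lag(p, k = 1) - 1

# cc returns: difference of the log prices
r_toy <- diff(log(p))

# The first value of both series is NA, since there is no previous price:
cbind(p, R_toy, r_toy)

# For every other period, r = log(1 + R):
all.equal(as.numeric(r_toy)[-1], log(1 + as.numeric(R_toy)[-1]))
```

Because the first period has no previous price, a return series always has one fewer usable observation than its price series.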

Now do a time graph for the cc returns of the Mexican index:

plot(r$MXX, col = "darkblue",
     main = "cc return for the MXX index")

RESPOND TO THE FOLLOWING:

(a) DOES THIS SERIES HAVE ABOUT THE SAME MEAN FOR ALL TIME PERIODS?

(b) DOES IT HAVE THE SAME STANDARD DEVIATION (VOLATILITY) FOR ALL TIME PERIODS?

4 Non-stationary variables - The Random Walk model for stock prices

The random walk hypothesis in Finance (Fama, 1965) states that the natural logarithm of stock prices behaves like a random walk with a drift. A random walk is a series (or variable) whose future movements cannot be predicted from its past. Imagine that $Y_t$ is the log price of a stock for today (t). The value of Y for tomorrow ($Y_{t+1}$) will be equal to today’s value ($Y_t$) plus a constant value ($\phi_0$) plus a random shock. This shock is a purely random value that follows a normal distribution with mean=0 and a specific standard deviation $\sigma_{\varepsilon}$. The process is assumed to be the same for all future periods. In mathematical terms, the random walk model is the following:

\[ Y_t = \phi_0 + Y_{t-1} + \varepsilon_t \]

The $\varepsilon_t$ is the random shock of each period, which is the log price movement due to all the news (external and internal to the stock) that influences the price. $\phi_0$ is referred to as the drift of the series. If $\phi_0 \neq 0$ we say that the series is a random walk with a drift. If $\phi_0$ is positive, then the variable will have a positive trend over time; if it is negative, the series will have a negative trend.

If we want to simulate a random walk, we need the values of the following parameters/variables:

$Y_0$, the first value of the series
$\phi_0$, the drift of the series
$\sigma_{\varepsilon}$, the standard deviation (volatility) of the random shock

4.1 Q Monte Carlo simulation for the random walk model

Let’s run a Monte Carlo simulation for a random walk of the S&P500. We will use real values of the S&P500 to estimate the previous 3 parameters.

4.1.1 Loading/installing R packages

The following R packages need to be installed for most of the class workshops:

fpp2
fpp3
quantmod

Go to the bottom-right pane of RStudio, select the Packages tab, click Install, and install these three R packages (quantmod is already installed if you followed section 3.1). The fpp2 and fpp3 packages include many other R packages for time-series data management and analysis.

The fpp2 and fpp3 packages were written by Rob J Hyndman and George Athanasopoulos, professors at Monash University in Australia. They are also business consultants with many years of experience doing both serious research in time series and applying their findings in the real world.

The quantmod package was written by Jeffrey A. Ryan, one of the most important contributors to R packages for Finance, and organizer of the famous R/Finance Conference held in Chicago each year since 2009.

Once you install these packages, load them in memory:

library(quantmod)
library(fpp2)

Now we generate the log of the S&P index using the adjusted price (which, for an index, is equal to the closing price), and create a variable N for the number of observations (months) in the dataset:

lnsp<-log(Ad(GSPC))
# I assign a name for the index:
names(lnsp)<-c("lnsp")
N<-nrow(lnsp)

Now we will simulate 2 random walk series estimating the 3 parameters from this log series of the S&P500:

a random walk with a drift (name it rw1), and
a random walk with no drift (name it rw2).

4.1.2 Estimating the parameters of the random walk model

We have to consider the mathematical definition of a random walk and estimate its parameters (initial value, phi0, volatility of the random shock) from the real monthly S&P500 data.

Now, we create a variable for a random walk with a drift trying to model the log of the S&P500.

Reviewing the random walk equation again:

\[ Y_t = \phi_0 + Y_{t-1} + \varepsilon_t \]

The $\varepsilon_t$ is the random shock of each period, which represents the overall average perception of all market participants after learning the news of the period (internal and external news announced to the market).

Remember that $\varepsilon_{t}$ behaves like a random normal distributed variable with mean=0 and with a specific standard deviation $\sigma_{\varepsilon}$.

For the simulation of the random walk, you need to estimate the values of

$Y_{0}$, the first value of the series, which is the log of the S&P500 index in the first period
$\phi_{0}$
$\sigma_{\varepsilon}$

You have to estimate $\phi_{0}$ using the last and the first real values of the series following the equation of the random walk. Here you can see possible values of a random walk over time:

\[Y_{0} = \text{initial value}\]

\[Y_{1} = \phi_{0} + Y_{0} + \varepsilon_{1}\]

\[Y_{2} = \phi_{0} + Y_{1} + \varepsilon_{2}\]

Substituting $Y_{1}$ with its corresponding equation:

\[Y_{2} = \phi_{0} + \phi_{0} + Y_{0} + \varepsilon_{1} + \varepsilon_{2}\]

Re-arranging the terms:

\[Y_{2} = 2*\phi_{0} + Y_{0} + \varepsilon_{1} + \varepsilon_{2}\]

If you continue doing the same until the last value N, you get:

\[Y_{N} = N*\phi_{0} + Y_{0} + \sum_{t=1}^{N}\varepsilon_{t}\]

This mathematical result is kind of intuitive. The value of a random walk at time N will be equal to its initial value plus N times phi0 plus the sum of ALL random shocks from 1 to N.
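This result can be verified numerically. The sketch below builds a random walk step by step and compares its last value with the formula above. All the values used here (n_steps, y0, phi0_toy, sigma_toy) are made up for the illustration and are not estimated from the S&P500.

```r
set.seed(123)  # to get the same random shocks every time the code runs

# Made-up parameters for the illustration:
n_steps   <- 50     # number of periods (the N of the formula)
y0        <- 7      # initial value Y_0
phi0_toy  <- 0.01   # drift
sigma_toy <- 0.03   # volatility of the shock

# One random shock per period:
eps <- rnorm(n = n_steps, mean = 0, sd = sigma_toy)

# Step-by-step construction; y[1] holds Y_0 and y[t + 1] holds Y_t:
y <- numeric(n_steps + 1)
y[1] <- y0
for (t in 1:n_steps) {
  y[t + 1] <- phi0_toy + y[t] + eps[t]
}

# Last value according to the formula: N*phi0 + Y_0 + sum of all shocks
closed_form <- n_steps * phi0_toy + y0 + sum(eps)

# Both ways give the same last value:
all.equal(y[n_steps + 1], closed_form)
```

Note that this sketch stores Y_0 plus N steps, so the vector has N+1 elements.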

Since the mean of the shocks is assumed to be zero, then the expected value of the sum of the shocks will also be zero. Then:

\[E[Y_{N}] = N*\phi_{0} + Y_{0}\]

From this equation we see that $\phi_{0}$ can be estimated as:

\[\phi_{0} = \frac{(Y_{N} - Y_{0})}{N}\]

Then, $\phi_{0}$ = (last value - first value) / number of periods.

I store these coefficients in single-value numeric variables (scalars) to use them in the simulation.

I calculate $\phi_{0}$ following this formula:

phi0<- (as.numeric(lnsp$lnsp[N])-as.numeric(lnsp$lnsp[1])) / N
cat("The value for phi0 is ",phi0)

## The value for phi0 is  0.009511628

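Remember that N is the total number of observations, so lnsp[N] holds the last monthly value of the log of the S&P500. Strictly speaking, N observations span N-1 steps, so dividing by N-1 would match the derivation exactly; with this many observations the difference is small.

Now we need to estimate sigma, which is the standard deviation of the shocks. We can start by estimating its variance first. The variance of a random walk grows with the number of periods, so it can only be determined for a specific number of periods.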

Then, let’s consider the equation of the random walk series for the last value ($Y_N$), and then estimate its variance from there:

\[Y_{N} = N*\phi_{0} + Y_{0} + \sum_{t=1}^{N}\varepsilon_{t}\]

Using this equation, we calculate the variance of $Y_N$ :

\[Var(Y_{N}) = Var(N*\phi_{0}) + Var(Y_{0}) + \sum_{t=1}^{N}Var(\varepsilon_{t})\]

The first two terms are constants, and the variance of a constant is zero, so they are equal to zero. The variance of the sum of the shocks can be written as the sum of their variances because the shocks are assumed to be independent of each other.

Now analyze the variance of the shock:

Since the volatility (standard deviation) of the shocks is assumed to be the same over time, then:

\[Var(\varepsilon_{1}) = Var(\varepsilon_{2}) = \dots = Var(\varepsilon_{N}) = \sigma_{\varepsilon}^2\]

Then the sum of the variances of all shocks is the variance of the shock times N, and this sum is the variance of $Y_N$.

Then we can write the variance of $Y_N$ as:

\[Var(Y_{N}) = N * Var(\varepsilon)= N*\sigma_{\varepsilon}^2\]

To get the standard deviation of $Y_N$ we take the square root of the variance of $Y_N$:

\[SD(Y_{N}) = \sqrt{N}*SD(\varepsilon)\]

We use the sigma symbol for standard deviations:

\[\sigma_{Y} = \sqrt{N}*\sigma_{\varepsilon}\]

Finally we express the volatility of the shock ($\sigma_{\varepsilon}$) in terms of the volatility of $Y_N$ ($\sigma_{Y}$):

\[\sigma_{\varepsilon} = \frac{\sigma_{Y}}{\sqrt{N}} \]
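The relationship between both volatilities can be illustrated with a small Monte Carlo experiment: we simulate many random walks and look at how their last values are spread. This is only a sketch with made-up parameters (n_steps, n_paths, y0, phi0_toy, sigma_toy), not values estimated from the S&P500. With these values, the formulas above give an expected last value of 7 + 100*0.01 = 8 and a standard deviation of sqrt(100)*0.03 = 0.3.

```r
set.seed(2021)  # to get the same random shocks every time the code runs

# Made-up parameters for the illustration:
n_steps   <- 100    # number of periods (the N of the formulas)
n_paths   <- 5000   # number of simulated random walks
y0        <- 7      # initial value Y_0
phi0_toy  <- 0.01   # drift
sigma_toy <- 0.03   # volatility of the shock

# One column of shocks per simulated random walk:
eps <- matrix(rnorm(n_steps * n_paths, mean = 0, sd = sigma_toy),
              nrow = n_steps)

# Last value of each random walk: Y_N = N*phi0 + Y_0 + sum of its shocks
y_N <- n_steps * phi0_toy + y0 + colSums(eps)

# Mean and standard deviation of the last values across the simulations,
# compared with what the formulas say:
cat("Mean of Y_N:", mean(y_N), " Formula:", n_steps * phi0_toy + y0, "\n")
cat("SD of Y_N:", sd(y_N), " Formula:", sqrt(n_steps) * sigma_toy, "\n")
```

The simulated values should be close to the ones from the formulas, but not identical, since they are estimated from a finite number of random paths. Here the spread is measured across many simulated paths at the last period.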

Then we can estimate sigma as: sigma = StDev(lnsp) / sqrt(N). Let’s do it:

sigma<-sd(lnsp$lnsp) / sqrt(N)
cat("The volatility of the log is = ",sd(lnsp$lnsp),"\n")

## The volatility of the log is =  0.3161905

cat("The volatility for the shock is = ",sigma)

## The volatility for the shock is =  0.02839474

4.1.3 Simulating the random walk with drift

Now you are ready to start the simulation of the random walk, which we call rw1:

\[rw1_{t} = \phi_{0} + rw1_{t-1} + \varepsilon_{t}\]

The $\phi_{0}$ coefficient is the drift of the random walk.

We will create a new column in the lnsp R dataset for the random walk with the name rw1.

lnsp$rw1 = 0

I assign zero to all values before doing the simulation.

I start by assigning the first value of the random walk to be equal to the first value of the log of the S&P500:

lnsp$rw1[1]<-lnsp$lnsp[1]

Now assign random values from period 2 to the last period following the random walk. For each period, we create the random shock using the function rnorm. We create this shock with a standard deviation equal to the volatility of the shock we calculated above (sigma), and we indicate that the mean is 0:

shock <- rnorm(n=N,mean=0,sd=sigma)
lnsp$shock<-shock
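Since rnorm generates different random numbers each time it runs, your simulated series will change every time you run the notebook. If you want results that can be replicated, one option is to fix the seed of the random number generator with set.seed before calling rnorm. The sketch below illustrates this with made-up values; the seed 42 and the names shock_a and shock_b are arbitrary choices for the illustration.

```r
# With the same seed, rnorm generates exactly the same shocks:
set.seed(42)
shock_a <- rnorm(n = 5, mean = 0, sd = 0.03)

set.seed(42)
shock_b <- rnorm(n = 5, mean = 0, sd = 0.03)

identical(shock_a, shock_b)

# Without resetting the seed, the next set of shocks is different:
shock_c <- rnorm(n = 5, mean = 0, sd = 0.03)
identical(shock_a, shock_c)
```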

We can see the shock over time:

plot(shock, type="l", col="blue")

We can also see whether the shock behaves like a normal distribution by doing its histogram:

hist(lnsp$shock)

As expected, the shock behaves similarly to a normally distributed variable.

Now we are ready to fill in the values of rw1. Remember the formula for the random walk process:

\[rw1_{t} = \phi_{0} + rw1_{t-1} + \varepsilon_{t}\]

We start the random walk with the first value of the log of the S&P500. Then, from period 2 onward, we do the simulation according to the previous formula and using the random shock just created:

# I create a numeric vector of length N to hold the random walk:
rw1<-numeric(N)

# I assign the first value of the random walk to be equal to the real log value of the S&P500:
rw1[1]<-as.numeric(lnsp$lnsp[1])

# Now, from period 2 onward, I generate the values of the random walk following the formula:
for (i in 2:N){ rw1[i] <- phi0 + rw1[i-1] + shock[i] }
# I store the simulated series as a column of the lnsp dataset:
lnsp$rw1<-rw1

I plot the simulated random walk and the real log of the S&P500:

ts.plot(lnsp$rw1)
lines(seq(1,N),lnsp$lnsp, col="blue")

4.1.4 Simulating a random walk with no drift

Now we can do another simulation, this time without the drift. In this case, the $\phi_{0}$ coefficient must be zero.

Use another variable for this (it is the rw2 series mentioned before; in the code below it is named rw1_v2). You can follow the logic we used for rw1, but now $\phi_{0}$ will be equal to zero, so we do not include it in the equation:

# Random walk with no drift, using the same shocks as rw1:
rw1_v2<-numeric(N)
rw1_v2[1]<-as.numeric(lnsp$lnsp[1])
for (i in 2:N){  rw1_v2[i] <- rw1_v2[i-1] + shock[i] }

ts.plot(lnsp$lnsp, col="blue")
# I plot both lines to compare 
lines(rw1_v2, col="green")

WHAT DO YOU OBSERVE IN THIS PLOT? EXPLAIN IN YOUR OWN WORDS.

Now run a simple regression to check whether rw1 is statistically related to the log of the S&P500. Use rw1 as the explanatory variable. Show the regression results as comments.

regmodel<-lm(lnsp$lnsp~lnsp$rw1)
# I see statistics on my regression model
s_regmodel <- summary(regmodel)
s_regmodel

## 
## Call:
## lm(formula = lnsp$lnsp ~ lnsp$rw1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15328 -0.04977 -0.01826  0.04081  0.26469 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  0.92667    0.15173   6.107         0.0000000125 ***
## lnsp$rw1     0.84346    0.01898  44.440 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07658 on 122 degrees of freedom
## Multiple R-squared:  0.9418, Adjusted R-squared:  0.9413 
## F-statistic:  1975 on 1 and 122 DF,  p-value: < 0.00000000000000022

DOES THE REGRESSION RESULT MAKE SENSE? EXPLAIN WHY OR WHY NOT.

DOES THE LOG OF THE S&P500 LOOK LIKE A RANDOM WALK? WHY OR WHY NOT?

# I plot the natural log of the S&P500
ts.plot(lnsp$lnsp)
# I plot my random walk with a drift
lines(rw1, col="blue")

DO YOU THINK THAT WE CAN USE THIS TYPE OF SIMULATION TO PREDICT STOCK PRICES OR INDEXES? WHY OR WHY NOT?

ts.plot(lnsp$lnsp,col="blue")
lines(rw1_v2, col="green")
lines(rw1)

5 Q Reading

Read/skim the note: “Introduction to time series”. In your own words:

EXPLAIN WHAT A STATIONARY SERIES IS.
WHAT CONDITIONS MUST A SERIES MEET TO BE CONSIDERED A STATIONARY SERIES?

6 W1 submission

Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions, and/or you did not do all the activities and responded to only some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop, or you completed more but parts of your work are a copy-paste from other workshops.
Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.

Workshop 1, Time Series

Alberto Dorantes, Ph.D. & Felipe A. Pérez S., Ph.D.

April 27, 2021