Abstract
This is a solution for Workshop 1. Not all of the workshop is displayed; only the sections where students needed to work on an exercise or answer questions. In this section we will download time-series variables from Yahoo Finance and explore the time-series datasets.
More specifically, we explore the xts-zoo R objects. xts stands for “eXtensible Time Series”, and zoo is an R class for general time-series datasets.
We start by clearing our R environment:
rm(list=ls())
# To avoid scientific notation for numbers:
options(scipen=999)
The quantmod package is needed to download online financial data, and it also has many functions for financial analysis.
We load the package:
library(quantmod)
The getSymbols() function lets the user download up-to-date online financial data, such as stock prices, market indexes, ETF prices, interest rates, exchange rates, etc. getSymbols() can download this data from multiple sources: Yahoo Finance, Google Finance, FRED and Oanda. These sources have thousands of financial and economic data series from many market exchanges, as well as macroeconomic variables from around the world.
We download the main market indexes of Mexico (the IPCyC) and the US (the S&P500) at monthly frequency from Yahoo, from January 2011 to February 2022:
getSymbols(c("^MXX", "^GSPC"), from="2011-01-01", to="2022-02-22", periodicity = "monthly", src = "yahoo")
## [1] "^MXX" "^GSPC"
Two xts-zoo R objects (named MXX and GSPC) are created in the environment, one per market index, with the historical data in chronological order. These objects are datasets that hold the historical index values together with a time index. Every R object has a specific class; in this case, the class of these datasets is xts-zoo. xts stands for eXtensible Time Series. An xts-zoo object makes time-series data easy to manipulate.
For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. It also keeps track of the volume traded (number of shares traded) in each period. The adjusted prices are used for stocks, not for market indexes. Adjusted prices consider dividend payments and stock splits. For the case of market indexes, the adjusted prices are always equal to the close prices.
We can apply functions to an xts dataset. For example, we can use any of the following functions on an xts-zoo object to extract specific prices stored in it:
Op(x): Extract the opening prices of the period.
Hi(x): Extract the highest price of the period.
Lo(x): Extract the lowest price of the period.
Cl(x): Extract the closing prices of the period.
Vo(x): Extract the volume traded in the period.
Ad(x): Extract the adjusted prices of the period.
We can integrate several xts-zoo R objects into one xts-zoo dataset using the merge function. In this case, we will only use the adjusted prices, so we also use the Ad function:
# We merge the datasets into a new R object called prices:
prices = merge(MXX,GSPC)
# We only keep the adjusted price columns:
prices = Ad(prices)
# We rename the columns with simpler names:
names(prices) = c("MXX","GSPC")
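The merge-then-extract step can be tried offline on a toy dataset. The following is only a sketch: the tickers AAA and BBB, the helper make_toy() and all the prices are made up for illustration, and the column names simply imitate the Open/High/Low/Close/Volume/Adjusted layout that quantmod functions such as Ad() look for.

```r
library(quantmod)  # also attaches xts and zoo

# Three hypothetical monthly dates
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))

# Hypothetical helper: builds a small OHLC + Volume + Adjusted xts object
make_toy <- function(name, close) {
  m <- cbind(close - 1, close + 2, close - 2, close, 1000, close)
  colnames(m) <- paste(name,
                       c("Open", "High", "Low", "Close", "Volume", "Adjusted"),
                       sep = ".")
  xts(m, order.by = dates)
}

AAA <- make_toy("AAA", c(10, 11, 12))
BBB <- make_toy("BBB", c(50, 49, 52))

# Same steps as above: merge, keep the adjusted columns, rename
toy_prices <- Ad(merge(AAA, BBB))
names(toy_prices) <- c("AAA", "BBB")
toy_prices
```

The result has one row per date and one adjusted-price column per toy ticker, which is the same shape as the prices object built from MXX and GSPC.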
For each index we plot a chart to visualize how the index moves over time.
We can use the chartSeries function from the quantmod package:
chartSeries(MXX, theme=("white"))
chartSeries(GSPC, theme=("white"))
Respond to the following QUESTIONS:
WHAT CAN YOU SAY ABOUT THE TREND OF BOTH MARKET INDEXES? IS IT CONSTANTLY GROWING, OR DECLINING, OR IS THERE NO CLEAR TREND? BRIEFLY EXPLAIN
R: WE CAN SEE THAT THE VARIABILITY OF THE IPC INDEX IS MUCH HIGHER COMPARED TO THE VARIABILITY OF THE S&P500. WE SEE THAT THE S&P500 HAS A CLEAR GROWING TREND SINCE 2011. IT HAD A STRONG DECLINE IN Q2 OF 2020 DUE TO THE PANDEMIC, BUT A QUICK RECOVERY IN THE LAST MONTHS OF 2020. IN JAN AND FEB 2022 IT IS DECLINING.
IN THE CASE OF THE IPC, THERE IS NO CLEAR GROWING TREND OVER THE PERIOD. WE CAN ALSO SEE THE STRONG DECLINE IN Q2 OF 2020 DUE TO THE PANDEMIC, BUT WE DO NOT SEE A QUICK RECOVERY COMPARED WITH THE S&P500. WE CAN SEE A GENERAL DECLINE OF THE INDEX FROM MID 2017 UP TO THE END OF 2020, WITH HIGH VARIABILITY (VOLATILITY). IN THE LAST 6 MONTHS THE INDEX OSCILLATES UP AND DOWN AROUND 53,000 POINTS.
Generate a new dataset with the natural log of the indexes:
lnprices = log(prices)
Now do a time plot for the log price:
plot(lnprices$MXX, main = "Log of the Mexican Index over time")
plot(lnprices$GSPC, main = "Log of the US Index over time")
RESPOND: BRIEFLY MENTION IF YOU SEE A DIFFERENCE BETWEEN THIS PLOT OF THE LOG OF THE MXX INDEX COMPARED TO THE PLOT OF THE MXX INDEX.
R: I SEE THAT THE MOVEMENT OF THE LOG OF THE IPC INDEX LOOKS VERY SIMILAR TO THE MOVEMENT OF THE IPC INDEX ITSELF. THE ONLY DIFFERENCE IS THE SCALE. THE LOG OF THE IPC MOVES FROM ABOUT 10.3 TO 10.8, WHILE THE IPC INDEX MOVES FROM ABOUT 30,000 TO 50,000. THIS MAKES SENSE SINCE THE LOGARITHM OF THE INDEX IS THE EXPONENT TO WHICH WE RAISE EULER'S NUMBER e TO GET THE INDEX.
IN THE CASE OF THE US INDEX, WE SEE THAT THE PATTERN OF THE LOG SERIES IS SIMILAR TO THE ORIGINAL US INDEX, BUT WE CAN SEE THAT THE GROWING TREND AFTER COVID IS LESS PRONOUNCED IN THE LOGARITHMIC SCALE.
Before we calculate the continuously compounded returns, let’s review the concept of logarithm.
What is a natural logarithm?
The natural logarithm of a number is the exponent to which the number e (=2.71…) must be raised to get that number. For example, let’s call x the natural logarithm of a stock price p. Then:
\[ e^x = p \]
The way to get the value of x that satisfies this equality is to take the natural log of p:
\[ x = log_e(p) \]
So we have to remember that the natural logarithm is an exponent: the power to which you need to raise the number e to get a specific number.
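As a quick check of this definition, here is a minimal sketch in R; the price p = 100 is a hypothetical value chosen only for illustration. In R, log() is the natural logarithm and exp() is its inverse.

```r
# Hypothetical stock price, used only for illustration
p <- 100

# Natural log: the exponent x such that e^x = p
x <- log(p)
x            # about 4.60517

# Raising e to that exponent recovers the price
exp(x)
all.equal(exp(x), p)
```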
The natural log is the logarithm of base \(e\) (=2.71…). The number \(e\) is an irrational number (it cannot be expressed as a ratio of two integers), and it is also called Euler's number. Leonhard Euler (1707-1783) took the idea of the logarithm from the great mathematician Jacob Bernoulli, and discovered astonishing features of the number \(e\). Euler is considered the most productive mathematician of all time. It is interesting to know that Jacob Bernoulli discovered the number \(e\) (around 1690) when he was playing with calculations of how a financial amount grows over time.
How is \(e\) related to the growth of financial amounts over time?
Here is a simple example:
If I invest $100.00 with an annual interest rate of 50%, then the end balance of my investment at the end of the first year (at the beginning of year 2) will be:
\[ I_2=100*(1+0.50)^1=150 \]
If the interest rate is 100%, then I would get:
\[ I_2=100*(1+1)^1=200 \]
Then, the general formula to get the final amount of my investment at the beginning of year 2, for any interest rate R, is:
\[ I_2=I_1*(1+R)^1 \]
The (1+R) is the growth factor of my investment.
In Finance, the investment amount is called principal. If the interest is calculated (compounded) each month instead of each year, then I would end up with a higher amount at the end of the year.
Monthly compounding means that a monthly interest rate is applied to the amount to get the interest of the month, and then the interest of the month is added to the investment (principal). Then, for month 2 the principal will be higher than the initial investment. At the end of month 2 the interest will be calculated using the updated principal amount. Putting in simple math terms, the final balance of an investment at the beginning of year 2 when doing monthly compounding will be:
\[ I_2=I_1*\left(1+\frac{R}{N}\right)^{1*N} \]
For monthly compounding, N=12, so the monthly interest rate is equal to the annual interest rate R divided by N (R/N). Then, with an annual rate of 100% and monthly compounding (N=12):
\[ I_2=100*\left(1+\frac{1}{12}\right)^{1*12}=100*(2.613..) \]
In this case, the growth factor is \((1+1/12)^{12}\), which is equal to 2.613…
If, instead of compounding each month, the compounding happens at every moment, then we are using a continuously compounded rate.
If we apply continuous compounding to the previous example, then the growth factor for one year becomes the astonishing Euler's number e:
Let’s approximate this by compounding every second (a 365-day year has 31,536,000 seconds). The investment at the end of year 1 (or at the beginning of year 2) will be:
\[ I_2=100*\left(1+\frac{1}{31536000 }\right)^{1*31536000 }=100*(2.718282..)\cong100*e^1 \]
Now we see that \(e^1\) is the GROWTH FACTOR after 1 year when interest is compounded every moment!
We can generalize to any other annual interest rate R: \(e^R\) is the growth factor for an annual nominal rate R when interest is compounded every moment.
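The convergence of the growth factor toward \(e^R\) can be checked numerically. This is a minimal sketch; the function name growth_factor and the set of compounding frequencies are choices made here for illustration.

```r
# Growth factor after one year for an annual nominal rate R
# compounded N times per year (hypothetical helper name)
growth_factor <- function(R, N) (1 + R / N)^N

# Yearly, monthly, daily and every-second compounding
N <- c(1, 12, 365, 31536000)

# Annual nominal rate of 100%: the factor approaches e = exp(1)
gf <- growth_factor(R = 1, N = N)
data.frame(N = N, growth_factor = gf, gap_to_e = exp(1) - gf)

# The same happens for another rate, e.g. 50%: the factor approaches exp(0.5)
gf_50 <- growth_factor(R = 0.5, N = N)
data.frame(N = N, growth_factor = gf_50, gap_to_limit = exp(0.5) - gf_50)
```

The gap shrinks as N grows, which is why \(e^R\) is used as the growth factor under continuous compounding.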
When compounding every instant, we use lowercase r instead of R for the interest rate. Then, the growth factor will be: \(e^r\)
Then we can relate this growth factor to an equivalent effective rate:
\[ \left(1+EffectiveRate\right)=e^{r} \]
If we apply the natural logarithm to both sides of the equation:
\[ ln\left(1+EffectiveRate\right)=ln\left(e^r\right) \]
Since the natural logarithm function is the inverse of the exponential function, then:
\[ ln\left(1+EffectiveRate\right)=r \]
In the previous example, with a nominal rate of 100% (r=1) and continuous compounding, the effective rate satisfies:
\[ \left(1+EffectiveRate\right)=e^{r}=2.7182.. \]
\[ EffectiveRate=e^{r}-1 \]
Doing the calculation of the effective rate for this example:
\[ EffectiveRate=e^{1}-1 = 2.7182.. - 1 = 1.7182 = 171.82\% \]
Then, when compounding every moment with a nominal annual interest rate of 100%, the actual effective annual rate is 171.82%!
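The conversion between a continuously compounded rate and its effective rate can be sketched in R as follows; the variable names are illustrative only.

```r
# Continuously compounded annual rate of 100%
r <- 1

# Effective annual rate: (1 + EffectiveRate) = e^r
effective <- exp(r) - 1
effective              # about 1.718282, i.e. 171.83% after rounding

# Going back: r = ln(1 + EffectiveRate)
back <- log(1 + effective)
all.equal(back, r)
```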
Now we go back to the calculation of financial returns.
A financial simple return for a stock (\(R_{t}\)) is calculated as a percentage change of price from the previous period (t-1) to the present period (t):
\[ R_{t}=\frac{\left(Adjprice_{t}-Adjprice_{t-1}\right)}{Adjprice_{t-1}}=\frac{Adjprice_{t}}{Adjprice_{t-1}}-1 \]
For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:
\[ R_{Jan2021}=\frac{Adjprice_{Jan2021}}{Adjprice_{Dec2020}}-1=\frac{100}{80}-1=0.25 \]
We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.
In Finance it is strongly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models. cc returns are also called log returns.
One way to calculate cc returns is by subtracting the log of the previous adjusted price (at t-1) from the log of the current adjusted price (at t):
\[ r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1}) \]
This is also called the difference of the log of the price.
We can also calculate cc returns as the log of the current adjusted price (at t) divided by the previous adjusted price (at t-1):
\[ r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right) \]
cc returns are usually represented by small r, while simple returns are represented by capital R.
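The two return definitions can be compared on the example prices used above ($80.00 in December 2020 and $100.00 in January 2021). This is a minimal sketch with illustrative variable names.

```r
# Adjusted prices from the example above
p_prev <- 80    # December 2020
p_now  <- 100   # January 2021

# Simple return: percentage change in decimals
R_simple <- p_now / p_prev - 1
R_simple                      # 0.25

# cc return, first formula: difference of the logs
r_diff <- log(p_now) - log(p_prev)

# cc return, second formula: log of the price ratio
r_ratio <- log(p_now / p_prev)
r_ratio                       # about 0.2231436

# Both cc formulas give the same number
all.equal(r_diff, r_ratio)
```

For the same price change, the cc return (about 22.3%) is smaller than the simple return (25%).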
It is recommended to always use adjusted prices to calculate financial returns. In this example, where we work with market indexes, the adjusted price is exactly the same as the closing price, since market indexes have neither stock splits nor dividend payments.
In R, we can use the lag function to get past (lagged) values of a time-series dataset (or column). With this function we can get the price of the previous period to calculate the simple return. Let’s create a new dataset for the simple monthly returns of both indexes:
# For xts objects the number of periods to lag is given by the argument k:
R = prices / lag(prices,k=1) - 1
We can have a quick view of the first returns of the series:
head(R)
##                     MXX         GSPC
## 2011-01-01           NA           NA
## 2011-02-01  0.001012944  0.031956564
## 2011-03-01  0.011367259 -0.001047313
## 2011-04-01 -0.012763998  0.028495380
## 2011-05-01 -0.030566881 -0.013500953
## 2011-06-01  0.020240714 -0.018257461
For the first month there is no return for either index (the value appears as NA), since there is no price before the first month.
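To see where that NA comes from, the lag can be built by hand on a plain numeric vector. This sketch uses made-up prices and base R only; it mimics what the lag function does with an xts object rather than reproducing it.

```r
# Hypothetical monthly prices, used only for illustration
p <- c(100, 110, 99, 105)

# Manual lag: shift the prices one period forward and pad with NA
p_lag <- c(NA, head(p, -1))

# Simple return: current price over previous price, minus 1
R_manual <- p / p_lag - 1

data.frame(price = p, lagged_price = p_lag, simple_return = R_manual)
```

The first row has no lagged price, so its return is NA, exactly as in the first row of the returns dataset.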
We can use the diff function to get the difference of a current value and a lagged value of a dataset (or a column). Let’s create a new dataset for the cc return of both indexes:
r = diff(log(prices))
We can view the first cc returns of the series:
head(r)
##                     MXX         GSPC
## 2011-01-01           NA           NA
## 2011-02-01  0.001012431  0.031456577
## 2011-03-01  0.011303137 -0.001047862
## 2011-04-01 -0.012846158  0.028096939
## 2011-05-01 -0.031043791 -0.013592919
## 2011-06-01  0.020038594 -0.018426186
Remember that the continuously compounded return can be calculated as the difference between the log of the current price and the log of the price of the previous period.
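Since \(1+R_t\) is the price ratio, the cc return also equals \(log(1+R_t)\). As a rough consistency check, this sketch re-types the MXX values printed in the two outputs above and compares them; small differences are expected because the printed values are rounded.

```r
# MXX simple returns, copied from the printed output (Feb to Jun 2011)
R_head <- c(0.001012944, 0.011367259, -0.012763998, -0.030566881, 0.020240714)

# MXX cc returns, copied from the printed output (Feb to Jun 2011)
r_head <- c(0.001012431, 0.011303137, -0.012846158, -0.031043791, 0.020038594)

# cc return implied by each simple return: r = log(1 + R)
r_implied <- log(1 + R_head)

# Largest absolute discrepancy (only rounding noise is expected)
max(abs(r_implied - r_head))
```

The comparison also shows that simple and cc returns are very close when returns are small, and that the cc return is always the lower of the two.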
Now do a time plot for the cc returns of the Mexican index:
plot(r$MXX, col = "darkblue",
main = "cc return for the MXX index")
RESPOND TO THE FOLLOWING:
(a) DOES THIS SERIES HAVE ABOUT THE SAME MEAN FOR ALL TIME PERIODS?
R: WE CAN SEE THAT THE MEAN OF THE cc RETURNS LOOKS ABOUT THE SAME FOR ALL PERIODS. THE MEAN MIGHT BE BETWEEN 0.00 AND 0.01.
(b) DOES IT HAVE THE SAME STANDARD DEVIATION (VOLATILITY) FOR ALL TIME PERIODS?
R: WE SEE THAT THE VOLATILITY IS HIGHER TOWARDS THE END OF THE SERIES. THIS MAKES SENSE DUE TO THE 2020 PANDEMIC CRISIS. BEFORE THE PANDEMIC, THE VOLATILITY LOOKS SIMILAR FOR ALL PERIODS.
Read/skim the note: “Introduction to time series”. With your own words:
a) EXPLAIN WHAT A STATIONARY SERIES IS.
R: A STATIONARY TIME SERIES IS A TIME SERIES VARIABLE THAT HAS ABOUT THE SAME MEAN AND STANDARD DEVIATION FOR ANY PERIOD. THEN, WE CANNOT HAVE A CLEAR GROWING OR DECLINING TREND OVER TIME IF THE SERIES IS STATIONARY, AND ALSO THE STANDARD DEVIATION OF THE SERIES LOOKS SIMILAR (HOMOGENEOUS) FOR ANY PERIOD.
b) WHAT ARE THE CONDITIONS FOR A SERIES TO BE CONSIDERED A STATIONARY SERIES?
THE 3 CONDITIONS FOR A STATIONARY SERIES ARE:
1) THE MEAN OF THE SERIES IS ABOUT THE SAME FOR ANY TIME PERIOD. IN OTHER WORDS, THE EXPECTED VALUE OF THE SERIES IS DEFINED WITH A SPECIFIC VALUE THAT IS CONSTANT.
2) THE EXPECTED VARIANCE OF THE SERIES IS THE SAME FOR ANY TIME PERIOD. IN OTHER WORDS, THE VARIANCE OF THE SERIES IS DEFINED WITH A SPECIFIC VALUE THAT IS CONSTANT.
3) THE AUTOCOVARIANCE (AND THEREFORE THE AUTOCORRELATION) BETWEEN A VALUE Y AT t AND ITS VALUE LAGGED k PERIODS IS THE SAME FOR ANY TIME PERIOD; IT DEPENDS ONLY ON THE LAG k.
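The three conditions can be written compactly in the usual notation for weak (covariance) stationarity. The symbols \(\mu\), \(\sigma^2\) and \(\gamma_k\) are introduced here only as labels for the constant mean, the constant variance and the covariance at lag k; they do not come from the workshop note.
\[ E\left[Y_{t}\right]=\mu \quad \text{for all } t \]
\[ Var\left(Y_{t}\right)=\sigma^{2}<\infty \quad \text{for all } t \]
\[ Cov\left(Y_{t},Y_{t-k}\right)=\gamma_{k} \quad \text{for all } t \text{ and any lag } k \]
None of the three right-hand sides depends on t, which is what "the same for any time period" means in the conditions above.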