Abstract
This is an INDIVIDUAL workshop. In this workshop, besides learning the basics of R, we will work on the calculation of returns, basic statistical concepts of Finance, and an introduction to hypothesis testing. In addition, we will learn to download financial data from public data sources such as Yahoo Finance, and we will work with the properties of simple and continuously compounded returns. You will work in RStudio. Create an R Notebook document (File -> New File -> R Notebook), where you have to write whatever is asked in this workshop.
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
You have to keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file. Pay attention in class to learn how to generate an html file from your .Rmd.
---
title: "Workshop 1, Econometric Models"
author: YourName (First and Last names)
output: html_notebook
---
Now you are ready to continue writing your first R Notebook.
You can start writing your own notes/explanations of the topics we cover in this workshop. When you need to write lines of R code, click Insert at the top of the RStudio window and select R. A chunk of R code will immediately be set up for you to start writing your R code. You can execute this chunk by clicking the play button (green triangle).
Note that you can open and edit several R Notebooks, which will appear as tabs at the top of the window. You can visualize the output (results) of your code in the console, located at the bottom of the window. Also, the created variables are listed in the environment, located in the top-right pane. The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.
We start by clearing our R environment:
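A minimal sketch of this step (the original chunk is not shown here): `rm(list = ls())` deletes every object in the global environment, so it belongs at the top of the notebook.

```r
# Remove all objects from the global environment to start from a clean workspace
rm(list = ls())
```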
The getSymbols() function, which belongs to the quantmod package (install it once and load it with library()), downloads online and up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() can download this data from multiple sources: Yahoo Finance, FRED, Oanda, and Tiingo. These sources have thousands of finance and economic data series from many market exchanges, as well as macroeconomic variables from around the world.
# We load quantmod, which provides getSymbols() (install it once with install.packages("quantmod")):
library(quantmod)
# We define a vector for the tickers:
tickers<-c("^GSPC","PFE","AZN","SNY","NVS","NFLX","TSLA","AAPL","MSFT")
getSymbols(Symbols=tickers, from="2017-01-01", src="yahoo", periodicity="monthly")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## pausing 1 second between requests for more than 5 symbols
## [1] "^GSPC" "PFE" "AZN" "SNY" "NVS" "NFLX" "TSLA" "AAPL" "MSFT"
This function creates an xts-zoo R object for each ticker, named after the ticker (for ^GSPC the ^ is dropped, so the object is called GSPC). Each object has the corresponding historical monthly prices. xts stands for extensible time series. An xts-zoo object is designed to easily manipulate time series data.
In the Symbols argument you can specify more than one ticker by combining them with the c() function, separated by commas. The from argument indicates the initial date from which you want to bring data. The to argument is the end date of the series you want to download. In this case we omit the to argument in order to download the most recent data. The src argument indicates the source of the data, in this case Yahoo Finance. Finally, the periodicity argument specifies the frequency of the data (daily, weekly, or monthly for Yahoo Finance).
DO THE SAME WITH THE STOCK DATASETS.
We can list the FIRST 5 rows of the S&P 500 index by using the head() function:
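The chunk that produced the listing below is not shown. A minimal sketch, assuming the GSPC object created by getSymbols() above:

```r
library(quantmod)  # needed for the xts objects returned by getSymbols()
# First 5 monthly rows of the S&P 500 object
head(GSPC, n = 5)
```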
## GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
## 2017-01-01 2251.57 2300.99 2245.13 2278.87 70483180000 2278.87
## 2017-02-01 2285.59 2371.54 2271.65 2363.64 69162420000 2363.64
## 2017-03-01 2380.13 2400.98 2322.25 2362.72 81547770000 2362.72
## 2017-04-01 2362.34 2398.16 2328.95 2384.20 65265670000 2384.20
## 2017-05-01 2388.50 2418.71 2352.72 2411.80 79607170000 2411.80
DO THE SAME WITH THE STOCK DATASETS.
Also, you can list the LAST 5 rows of any dataset with the tail() function. Note that you can change the number of rows you want to display.
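A matching sketch for the last rows, again assuming the GSPC object from getSymbols(); the n argument sets how many rows are shown:

```r
library(quantmod)
# Last 5 monthly rows of the S&P 500 object; change n to show more or fewer rows
tail(GSPC, n = 5)
```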
## GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
## 2020-10-01 3385.87 3549.85 3233.94 3269.96 89737600000 3269.96
## 2020-11-01 3296.20 3645.99 3279.74 3621.63 100977880000 3621.63
## 2020-12-01 3645.87 3760.20 3633.40 3756.07 96056410000 3756.07
## 2021-01-01 3764.61 3870.90 3662.71 3714.24 105548790000 3714.24
## 2021-02-01 3731.17 3894.56 3725.62 3886.83 25430390000 3886.83
DO THE SAME FOR ALL STOCKS.
For each period, Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices. It also keeps track of the volume traded in each period. Adjusted prices account for dividend payments and stock splits, so they matter for stocks, not for currencies. For series without dividends or splits, such as currencies, cryptocurrencies like Bitcoin, or the S&P 500 index above (where GSPC.Close and GSPC.Adjusted are identical), the close and the adjusted prices are the same, so either one can be used to calculate returns.
Let’s see some of the benefits of using xts-zoo objects. We can, for example, select columns using any of the following functions, where x represents a generic xts zoo object:
Op(x): Extract the opening prices of the period.
Hi(x): Extract the highest price of the period.
Lo(x): Extract the lowest price of the period.
Cl(x): Extract the closing prices of the period.
Vo(x): Extract the volume traded in the period.
Ad(x): Extract the adjusted prices of the period.
Save your file as W1-YourName.Rmd on your computer. Go to the File menu and select Save As.
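The return calculations further below use a dataset called adjprices with the adjusted prices of all tickers, but the code that builds it is not part of this text. A minimal sketch, assuming the nine objects downloaded by getSymbols() and the tickers vector defined above; renaming the columns to GSPC, PFE, AZN, ... is an assumption made so that expressions such as adjprices$GSPC work:

```r
library(quantmod)
# Examples of the column helpers listed above, applied to the S&P 500 object
head(Cl(GSPC), n = 3)   # closing prices
head(Vo(GSPC), n = 3)   # volume traded in each month
# Merge all downloaded objects by date and keep only the adjusted-price columns
adjprices <- Ad(merge(GSPC, PFE, AZN, SNY, NVS, NFLX, TSLA, AAPL, MSFT))
# Assumed naming: ticker symbols without the "^" (GSPC, PFE, AZN, ...)
colnames(adjprices) <- gsub("^", "", tickers, fixed = TRUE)
head(adjprices, n = 3)
```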
Visualize how our stocks have been valued over time. Use the following command to display the graph and save it.
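The command referred to here is not shown. A minimal sketch, assuming a line plot of the S&P 500 adjusted prices with the plot() method for xts objects; in RStudio, plots shown in the Plots pane can be saved with its Export button:

```r
library(quantmod)
# Line plot of the monthly adjusted prices of the S&P 500 index
plot(Ad(GSPC), main = "S&P 500 monthly adjusted prices")
```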
DO THE SAME FOR OTHER 2 STOCKS.
We can also use another function to better visualize not only price over time, but also the volume traded each month:
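The function is not named in the text; chartSeries() from quantmod is one function that draws prices together with a volume panel (its default TA argument adds volume). A minimal sketch, assuming the GSPC object; the theme argument switches from the default black theme to a white one:

```r
library(quantmod)
# Price chart with a monthly volume panel underneath;
# theme = "white" replaces the default black background
chartSeries(GSPC, theme = "white", name = "S&P 500")
```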
DO THE SAME FOR OTHER 2 STOCKS. YOU CAN CHANGE THE COLOR OF THE THEME.
A financial simple return for a stock (\(R_{t}\)) is calculated as a percentage change of price from the previous period (t-1) to the present period (t):
\[ R_{t}=\frac{\left(Adjprice_{t}-Adjprice_{t-1}\right)}{Adjprice_{t-1}}=\frac{Adjprice_{t}}{Adjprice_{t-1}}-1 \]
For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:
\[ R_{Jan2021}=\frac{Adjprice_{Jan2021}}{Adjprice_{Dec2020}}-1=\frac{100}{80}-1=0.25 \]
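The same arithmetic in R, using the prices of the example (the variable names are only illustrative):

```r
# Adjusted prices from the example above
adjprice_dec2020 <- 80
adjprice_jan2021 <- 100
# Simple return: current price over previous price, minus 1
R_jan2021 <- adjprice_jan2021 / adjprice_dec2020 - 1
R_jan2021          # 0.25, i.e. 25% when multiplied by 100
```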
We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.
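In Finance, it is highly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models. One reason is that cc returns are additive over time: the cc return over several periods is the sum of the single-period cc returns.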
One way to calculate cc returns is to subtract the log of the previous adjusted price (at t-1) from the log of the current adjusted price (at t):
\[ r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1}) \]
This is also called the difference of the log of the price.
We can also calculate cc returns as the log of the ratio of the current adjusted price (at t) to the previous adjusted price (at t-1):
\[ r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right) \]
cc returns are usually represented by a lowercase r, while simple returns are represented by an uppercase R.
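A small numerical check of the formulas above, using a hypothetical series of four monthly adjusted prices (illustration only, not market data). It shows that both cc formulas agree, that a cc return equals the log of 1 plus the simple return, and that cc returns add up over time while simple returns compound:

```r
# Hypothetical adjusted prices for 4 consecutive months (illustration only)
prices_ex <- c(80, 100, 90, 99)
n_ex <- length(prices_ex)
# Simple returns: R_t = Adjprice_t / Adjprice_{t-1} - 1
R_ex <- prices_ex[-1] / prices_ex[-n_ex] - 1
# cc returns, first formula: log(Adjprice_t) - log(Adjprice_{t-1})
r_ex1 <- diff(log(prices_ex))
# cc returns, second formula: log(Adjprice_t / Adjprice_{t-1})
r_ex2 <- log(prices_ex[-1] / prices_ex[-n_ex])
all.equal(r_ex1, r_ex2)           # TRUE: both formulas agree
all.equal(r_ex1, log(1 + R_ex))   # TRUE: r = log(1 + R)
# Multi-period return from the first to the last month
sum(r_ex1)                        # equals log(99 / 80): cc returns add up
prod(1 + R_ex) - 1                # equals 99 / 80 - 1: simple returns compound
```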
We have historical monthly adjusted prices for each stock in the adjprices dataset. We start by calculating returns for the S&P 500 index. We can use the log() function to calculate the natural logarithm, and the lag() function to get the previous value of the adjusted prices. Let's calculate the monthly simple and cc returns for the S&P 500 in new datasets:
# We calculate the simple returns of the S&P 500
# (for xts objects, lag() takes the number of periods in its k argument):
SP500_R = adjprices$GSPC / lag(adjprices$GSPC, k=1) - 1
# We calculate the cc returns of the S&P 500
SP500_r = log(adjprices$GSPC) - log(lag(adjprices$GSPC, k=1))
# We can also do the same calculation of cc returns using the
# second formula:
SP500_r_2 = log(adjprices$GSPC / lag(adjprices$GSPC, k=1))
We calculated cc returns using two formulas and we got exactly the same result. The first formula gets the first difference of the log of the price. The first difference refers to the current value of a time series minus its value in the previous period.
We can simplify the calculation of cc returns by using the function diff(), which calculates the first difference of any time series xts dataset.
As you can see in the code below, we also apply the na.omit() function to drop the first row, since the return for the first month cannot be calculated (it is NA).
# We calculate cc returns for the integrated adjprices dataset:
ccr = na.omit(diff(log(adjprices)))
View(ccr)
# We calculate simple returns for the integrated dataset:
R = adjprices / lag(adjprices, k=1) - 1
Since all returns are integrated in one dataset, we can also see all returns in one plot:
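The plotting and summary chunks are not shown. A minimal sketch, assuming the ccr dataset created above: plot() draws one line per column of the xts object, and summary() gives the statistics listed below for the S&P 500 column:

```r
library(quantmod)  # loads xts, which provides the plot() method for xts objects
# All monthly cc returns in one plot, with a legend in the top-left corner
plot(ccr, main = "Monthly cc returns", legend.loc = "topleft")
# Descriptive statistics of the S&P 500 cc returns
summary(ccr$GSPC)
```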
## Index GSPC
## Min. :2017-02-01 Min. :-0.1336677
## 1st Qu.:2018-02-01 1st Qu.:-0.0003893
## Median :2019-02-01 Median : 0.0177655
## Mean :2019-01-30 Mean : 0.0108962
## 3rd Qu.:2020-02-01 3rd Qu.: 0.0353880
## Max. :2021-02-01 Max. : 0.1194208
As you can see, summary() does not show the standard deviation or the variance. You can also try the table.Stats() function. However, you must install and load the PerformanceAnalytics package first, since table.Stats() belongs to that package.
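A minimal sketch of that step, assuming the ccr dataset; install.packages() only needs to run once per computer, so it is commented out:

```r
# install.packages("PerformanceAnalytics")   # run once if the package is not installed
library(PerformanceAnalytics)
# Descriptive statistics of the S&P 500 monthly cc returns, including Variance and Stdev
table.Stats(ccr$GSPC)
```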
## GSPC
## Observations 49.0000
## NAs 0.0000
## Minimum -0.1337
## Quartile 1 -0.0004
## Median 0.0178
## Arithmetic Mean 0.0109
## Geometric Mean 0.0098
## Quartile 3 0.0354
## Maximum 0.1194
## SE Mean 0.0067
## LCL Mean (0.95) -0.0027
## UCL Mean (0.95) 0.0245
## Variance 0.0022
## Stdev 0.0472
## Skewness -0.7343
## Kurtosis 1.3865
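This function calculates the most common measures of central tendency and dispersion. As central tendency measures it calculates the median and the arithmetic and geometric means. As dispersion measures it calculates the minimum and maximum values, quartiles 1 and 3, the standard deviation, and the variance. It also reports the standard error of the mean (SE Mean) with the lower and upper limits of its 95% confidence interval (LCL and UCL), and the skewness and kurtosis of the returns.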
hist(ccr$GSPC, main="Histogram of S&P 500 monthly returns",
xlab="Continuously Compounded returns", col="darkgreen")
You have to:
WRITE THE NULL AND THE ALTERNATIVE HYPOTHESES
Calculate the standard error, which is the standard deviation of the MEAN of returns.
Calculate the t-statistic. EXPLAIN/INTERPRET THE VALUE OF t YOU GOT.
WRITE YOUR CONCLUSION OF THE t-TEST
Here is an example of a t-test to check whether the S&P 500 has an average monthly return significantly greater than zero:
# a)
# H0: mean(ccr$GSPC) = 0
# Ha: mean(ccr$GSPC) > 0
# b)
se_GSPC.r <- sd(ccr$GSPC) / sqrt(nrow(ccr$GSPC) )
print(paste("Standard error S&P 500 =" , se_GSPC.r))
## [1] "Standard error S&P 500 = 0.00674725047827303"
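Step c) of the list is not shown in code. A minimal sketch, assuming the ccr dataset and the se_GSPC.r value computed above; t_GSPC.r is a hypothetical name:

```r
# c) t-statistic: number of standard errors the mean return is away from 0 (the value under H0)
t_GSPC.r <- (mean(ccr$GSPC) - 0) / se_GSPC.r   # t_GSPC.r: hypothetical name
print(paste("t-value S&P 500 =", t_GSPC.r))
```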
## [1] "t-value S&P 500 = 1.61491080702246"
Since the t-value of the mean return of the S&P 500 is lower than 2 (and the one-sided p-value reported below is 0.056, above 0.05), I can't reject the null hypothesis. Therefore, the S&P 500 mean return is not significantly greater than 0 at the 5% significance level.
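The t.test() output below can be reproduced with a call like the following (a sketch, assuming the ccr dataset and se_GSPC.r; ttest_GSPC is a hypothetical name). alternative = "greater" matches Ha: mean > 0, and the last line checks that t.test() gives the same t as the manual calculation:

```r
# One-sample, one-sided t-test of H0: mean = 0 against Ha: mean > 0
ttest_GSPC <- t.test(as.numeric(ccr$GSPC), mu = 0, alternative = "greater")
ttest_GSPC
# t-statistic reported by t.test()
ttest_GSPC$statistic
# Same as the manual t = mean / standard error? (all.equal allows for rounding error)
isTRUE(all.equal(unname(ttest_GSPC$statistic), mean(ccr$GSPC) / se_GSPC.r))
```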
##
## One Sample t-test
##
## data: as.numeric(ccr$GSPC)
## t = 1.6149, df = 48, p-value = 0.05644
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## -0.000420444 Inf
## sample estimates:
## mean of x
## 0.01089621
## t
## 1.614911
## t
## TRUE
Here is an example comparing the mean returns of PFE and AZN:
We start by calculating the mean of returns for both stocks:
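A minimal sketch of this step, assuming the ccr dataset; the names mean_PFE.r and mean_AZN.r are the ones used in the manual t calculation further below:

```r
# Mean monthly cc return of Pfizer and AstraZeneca
mean_PFE.r <- mean(ccr$PFE)
mean_AZN.r <- mean(ccr$AZN)
mean_PFE.r
mean_AZN.r
```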
## [1] 0.006314097
## [1] 0.01532175
Now we set up the hypotheses. Since the mean return of AstraZeneca is higher than the mean return of Pfizer, our initial belief is that AstraZeneca offers significantly higher average monthly returns than Pfizer. The alternative hypothesis states this initial belief:
#' H0: mean(ccr$AZN) = mean(ccr$PFE)
#' Ha: mean(ccr$AZN) > mean(ccr$PFE)
#' I have to re-arrange the equality to leave a number on the right:
#' H0: mean(ccr$AZN) - mean(ccr$PFE) = 0
#' Ha: mean(ccr$AZN) - mean(ccr$PFE) > 0
# We can set up this hypotheses as follows:
# meandif = mean(ccr$AZN) - mean(ccr$PFE)
# H0: meandif = 0
# Ha: meandif > 0
In this case, the random variable of this test is meandif, which is the difference of 2 means. The mean returns of PFE and AZN are random variables, so the variable of this test is the difference of 2 random variables. To calculate the t-value of this test, we have to know how to estimate the standard deviation of meandif, which is the difference of 2 random variables.
From basic probability theory, if both random variables are independent, then the variance of their difference is the SUM of the variances! This sounds counter-intuitive. WHY IS THE VARIANCE OF THE DIFFERENCE OF 2 RANDOM VARIABLES THE SUM OF THE 2 VARIANCES INSTEAD OF THE DIFFERENCE OF BOTH VARIANCES? DO YOUR OWN RESEARCH AND BRIEFLY EXPLAIN.
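This is not the explanation being asked for, but a quick simulation can confirm the fact numerically. A sketch with two hypothetical independent normal variables with variances 4 and 9:

```r
set.seed(123)
n_sim <- 100000
# Two independent random variables: Var(sim_x) = 2^2 = 4, Var(sim_y) = 3^2 = 9
sim_x <- rnorm(n_sim, mean = 0, sd = 2)
sim_y <- rnorm(n_sim, mean = 0, sd = 3)
var(sim_x - sim_y)       # close to 4 + 9 = 13
var(sim_x) + var(sim_y)  # also close to 13
```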
Here we do the calculation of t-value manually in R:
#' t = (mean(ccr$AZN) - mean(ccr$PFE) - 0) / sqrt( (1/N)* (Var(ccr$AZN) + Var(ccr$PFE)) )
N <- nrow(ccr$AZN)
t <- (mean_AZN.r - mean_PFE.r - 0) / sqrt( (1/N) * (var(ccr$AZN) + var(ccr$PFE) ))
cat("t-value = ", t)
## t-value = 0.712505
Now we do the t-test using the t.test function and check whether we got the same calculation for t:
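A minimal sketch of that call, assuming the ccr dataset; ttest_AZN_PFE is a hypothetical name. By default t.test() runs a two-sided Welch test (var.equal = FALSE), which is what the output below reports:

```r
# Welch two-sample t-test: AZN returns (x) versus PFE returns (y)
ttest_AZN_PFE <- t.test(as.numeric(ccr$AZN), as.numeric(ccr$PFE))
ttest_AZN_PFE
cat("t-value from t.test =", ttest_AZN_PFE$statistic, "\n")
cat("p-value =", ttest_AZN_PFE$p.value, "\n")
```

For the one-sided alternative Ha: meandif > 0 written above, add alternative = "greater" to the call; since t is positive here, that halves the p-value.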
##
## Welch Two Sample t-test
##
## data: as.numeric(ccr$AZN) and as.numeric(ccr$PFE)
## t = 0.71251, df = 95.73, p-value = 0.4779
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.01608788 0.03410319
## sample estimates:
## mean of x mean of y
## 0.015321750 0.006314097
## t-value from t.test = 0.712505
## p-value = 0.4778851
We got the same t-value with the t.test function and our manual calculation.
Conclusion of the test: Since the absolute value of t is much less than 2 and the p-value of the test is 0.47, we do NOT have statistical evidence to reject the null hypothesis. (The t.test above is two-sided; for the one-sided alternative Ha: meandif > 0 the p-value is half of that, about 0.24, which leads to the same conclusion.) In other words, we do not have enough statistical evidence at the 95% confidence level to say that the average monthly return of AstraZeneca is significantly higher than the average monthly return of Pfizer. We can only say that the sample average monthly return of AstraZeneca is higher than that of Pfizer, but this difference is not large enough, relative to its standard error, to be statistically significant.