Workshop 1, Econometric Models

Author

Alberto Dorantes

Published

February 19, 2024

Abstract
This is an INDIVIDUAL workshop. In this workshop we learn about 1) calculation of returns and 2) descriptive statistics for Finance. In addition, we learn the basics of R computing language, more specifically data management.

1 Introduction to data management in Finance

1.1 Install the quantmod package

Google Colab already has many R packages installed for you, but quantmod is not installed. In order to import and manage financial data in R, you need to install the quantmod package:

#install.packages("quantmod")
# Once you run this chunk, put a # before the command since it is already installed. This will speed up your notebook when you run it!

This package contains several functions for Financial analysis. The getSymbols() function pulls data from online sources such as Yahoo Finance.

1.2 Load the quantmod package

Once installed, you must load the package in memory:

library(quantmod)

1.3 Downloading real financial prices

The getSymbols function downloads prices/quotations of stocks, market indexes, currencies, crypto-currencies, ETF’s and more.

Let’s download monthly prices for Nvidia and Microsoft from YahooFiance from January 2019 to date:

tickers = c("NVDA", "MSFT")
getSymbols(Symbols=tickers, from="2019-01-01", periodicity="monthly", to="2024-02-12")
[1] "NVDA" "MSFT"

In the Symbols argument you can specify on or more than one ticker by using the container c() operator and separate each ticker by commas.

In this case the getSymbols will bring 2 dataset, one for each ticker.

1.4 Show the content of datasets

You can have a look of the content of each dataset. You can see the first of the last prices:

head(NVDA)
           NVDA.Open NVDA.High NVDA.Low NVDA.Close NVDA.Volume NVDA.Adjusted
2019-01-01   32.6600   40.2200  31.9225    35.9375  1687099200      35.65836
2019-02-01   36.1250   41.3200  35.6450    38.5650  1144572800      38.26545
2019-03-01   39.0675   46.2500  36.2000    44.8900  1206854000      44.58722
2019-04-01   45.8150   48.3675  43.3250    45.2500   943778000      44.94480
2019-05-01   45.7775   46.2175  33.8475    33.8650  1118075200      33.63659
2019-06-01   33.9775   41.3425  33.1500    41.0575   823334000      40.82711
tail(NVDA)
           NVDA.Open NVDA.High NVDA.Low NVDA.Close NVDA.Volume NVDA.Adjusted
2023-09-01    497.62    498.00   409.80     434.99   857510100      434.9159
2023-10-01    440.30    476.09   392.30     407.80  1013917700      407.7641
2023-11-01    408.84    505.48   408.69     467.70   914386300      467.6589
2023-12-01    465.25    504.33   450.10     495.22   740951700      495.1765
2024-01-01    492.44    634.93   473.20     615.27   970385300      615.2700
2024-02-01    621.00    823.94   616.50     791.12  1107161100      791.1200

You can visualize the whole monthly prices with a plot:

plot(NVDA)

Or you can do a nicer plot:

chartSeries(NVDA, theme=("black"))

1.5 Data management: merging and cleaning datasets

The datasets that getSymbols download are xts datasets. xts stands for eXtensible Time Series dataset.

An xts dataset has an index, which in this case is the date of the prices. We can merge 2 or more xts datasets according to their indexes:

prices<-merge(NVDA,MSFT)

prices will have now 12 columns since each dataset has 6 columns.Each stock has 5 columns for prices: Open, High, Low, Close and AdjClose, and 1 column for trading Volume in the period.

To calculate financial returns of stocks we MUST use adjusted prices. Adjusted prices already consider stock splits and dividend payments.
To get ONLY adjusted columns we can use Ad function to prices dataset:

adjprices<-Ad(prices)

We can rename the columns with the ticker names:

colnames(adjprices) <- c("NVDA","MSFT")

We can see the first rows of the adjusted prices:

head(adjprices)
               NVDA      MSFT
2019-01-01 35.65836  98.98339
2019-02-01 38.26545 106.18700
2019-03-01 44.58722 112.26620
2019-04-01 44.94480 124.31713
2019-05-01 33.63659 117.73004
2019-06-01 40.82711 127.98755

2 Return calculation

2.1 Simple and continuously compounded (cc) return

A financial simple return for a stock in period t (R_{t}) is usually calculated as the closing stock price in t plus any dividend payment at t, and then divide this sum by the previous closing price. It is calculated as a percentage change from the previous period (t-1) to the present period (t):

R_{t}=\frac{\left(price_{t}-price_{t-1}+ dividend_t\right)}{price_{t-1} }=\frac{price_{t}+dividend_t}{price_{t-1}}-1

When a stock pays dividends of do a stock split, the financial exchange make an adjustment to the historical stock prices. This adjustment to the stock prices is made so that we do not need to use dividends nor splits to calculate simple stock returns. Then, it is always recommended to use adjusted prices to calculate stock returns, unless you have information about all dividends payed in the past.

Then, with adjusted prices the formula for simple returns is easier:

R_{t}=\frac{\left(Adjprice_{t}-Adjprice_{t-1}\right)}{Adjprice_{t-1} }=\frac{Adjprice_{t}}{price_{t-1}}-1

For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:

R_{Jan2021}=\frac{Adprice_{Jan2021}}{Adprice_{Dec2020}}-1=\frac{100}{80}-1=0.25

We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.

Although the arithmetic mean of simple returns R gives us an idea of average past return, in the case of multi-period average return, this method of calculation can be misleading. Let’s see why this is the case.

Imagine you have only 2 periods and you want to calculate the average return of an investment per period:

Returns over time
Period Investment value (at the end of the period) Simple period Return (R)
0 $100 NA
1 $50 -0.50
2 $75 +0.50

Calculating the average simple return of this investment:

\bar{R}=\frac{-0.5+0.5}{2}=0%

Then, the simple average return gives me 0%, while I end up with $75, losing 25% of my initial investment ($100) over the first 2 periods. If I lost 25% of my initial investment over 2 periods, then the average mean return per period might a midpoint between 0 and 25%. The accurate mean return of an investment over time (multi-periods) is the “Geometric Mean” return.

The total return of the investment in the whole period -also called the holding-period return (HPR)- can be calculated as:

HPR=\left(1+R_{1}\right)\left(1+R_{2}\right)...\left(1+R_{N}\right)-1

Using the example, the HPR for this investment is:

HPR=\left(1-0.50\right)\left(1+0.50\right)-1=0.75 - 1 = -0.25

And the formula for the geometric average of returns will be:

\bar{R_{g}}=\sqrt{\left(1+R_{1}\right)\left(1+R_{2}\right)...\left(1+R_{N}\right)}-1

Caculating the geometric average for this investment:

\bar{R_{g}}=\sqrt{\left(1-0.5\right)\left(1+0.5\right)}-1= -0.13397

Then, the right average return per year is about -13.4% and the HPR for the 2 years is -25%.

However, if we use continuosuly compounded returns (r) instead of simple returns (R), then the arithmetic mean of r is an accurate measure that can be converted to simple returns to get the geometric mean, which is the accurate mean return. Let’s do the same example using continuously compounded returns:

Continuously compounded returns
Period Investment value (at the end) Continuously compounded return (r)
0 $100 NA
1 $50 =log(50)-log(100)=-0.6931
2 $75 =log(75)-log(50)=+0.4054

In Finance it is very recommended to calculate continuously compounded returns (cc returns) and using cc returns instead of simple returns for data analysis, statistics and econometric models.

One way to calculate cc returns is by subtracting the natural log of the current adjusted price (at t) minus the natural log of the previous adjusted price (at t-1):

r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1})

This is also called as the difference of the log of the price.

We can also calculate cc returns as the log of the current adjusted price (at t) divided by the previous adjusted price (at t-1):

r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right)

cc returns are usually represented by small r, while simple returns are represented by capital R.

But why we use natural logarithm to calculate cc returns? First we need to remember what is a natural logarithm.

2.2 Reviewing the concept of natural logarithm

Generate a new dataset with the natural logarithm (log) of the indexes:

lnprices = log(adjprices)

Now do a time plot for the natural log price of the MXX:

plot(adjprices, main = "Stock prices over time")

addLegend("topleft",legend.names=tickers,lty=c(1,1),lwd=c(2,1),col=c("black","red"))

plot(lnprices, main = "Log of the Stock prices over time")

addLegend("topleft",legend.names=tickers,lty=c(1,1),lwd=c(2,1),col=c("black","red"))

What is a natural logarithm?

The natural logarithm of a number is the exponent that the number e (=2.71…) needs to be raised to get another number. For example, let’s name x=natural logarithm of a stock price p. Then:

e^x = p The way to get the value of x that satisfies this equality is actually getting the natural log of p:

x = log_e(p) Then, we have to remember that the natural logarithm is actually an exponent that you need to raise the number e to get a specific number.

The natural log is the logarithm of base e (=2.71…). The number e is an irrational number (it cannot be expressed as a division of 2 natural numbers), and it is also called the Euler constant. Leonard Euler (1707-1783) took the idea of the logarithm from the great mathematician Jacob Bernoulli, and discovered very astonishing features of the e number. Euler is considered the most productive mathematician of all times. Some historians believe that Jacob Bernoulli discovered the number e around 1690 when he was playing with calculations to know how an amount of money grows over time with an interest rate.

How e is related to the grow of financial amounts over time?

Here is a simple example:

If I invest $100.00 with an annual interest rate of 50%, then the end balance of my investment at the end of the first year (at the beginning of year 2) will be:

I_2=100*(1+0.50)^1

If the interest rate is 100%, then I would get:

I_2=100*(1+1)^1=200 Then, the general formula to get the final amount of my investment at the beginning of year 2, for any interest rate R can be:

I_2=I_1*(1+R)^1 The (1+R) is the growth factor of my investment.

In Finance, the investment amount is called principal. If the interests are calculated (compounded) each month instead of each year, then I would end up with a higher amount at the end of the year.

Monthly compounding means that a monthly interest rate is applied to the amount to get the interest of the month, and then the interest of the month is added to the investment (principal). Then, for month 2 the principal will be higher than the initial investment. At the end of month 2 the interest will be calculated using the updated principal amount. Putting in simple math terms, the final balance of an investment at the beginning of year 2 when doing monthly compounding will be:

I_2=I_1*\left(1+\frac{R}{N}\right)^{1*N}

For monthly compounding, N=12, so the monthly interest rate is equal to the annual interest rate R divided by N (R/N). Then, with an annual rate of 100% and monthly compounding (N=12):

I_2=100*\left(1+\frac{1}{12}\right)^{1*12}=100*(2.613..)

In this case, the growth factor is (1+1/12)^{12}, which is equal to 2.613.

Instead of compounding each month, if the compounding is every moment, then we are doing a continuously compounded rate.

If we do a continuously compounding for the previous example, then the growth factor for one year becomes the astonishing Euler constant e:

Let’s do an example for a compounding of each second (1 year has 31,536,000 seconds). The investment at the end of the year 1 (or at the beginning of year 2) will be:

I_2=100*\left(1+\frac{1}{31536000}\right)^{1*31536000}=100*(2.718282..)\cong100*e^1

Now we see that e^1 is the GROWTH FACTOR after 1 year if we do the compounding of the interests every moment!

We can generalize to any other annual interest rate R, so that e^R is the growth factor for an annual nominal rate R when the interests are compounded every moment.

When compounding every instant, we use small r instead of R for the interest rate. Then, the growth factor will be: e^r

Then we can do a relationship between this growth rate and an effective equivalent rate:

\left(1+EffectiveRate\right)=e^{r}

If we apply the natural logarithm to both sides of the equation:

ln\left(1+EffectiveRate\right)=ln\left(e^r\right)

Since the natural logarithm function is the inverse of the exponential function, then:

ln\left(1+EffectiveRate\right)=r In the previous example with a nominal rate of 100%, when doing a continuously compounding, then the effective rate will be:

\left(1+EffectiveRate\right)=e^{r}=2.7182

EffectiveRate=e^{r}-1 Doing the calculation of the effective rate for this example:

EffectiveRate=e^{1}-1 = 2.7182.. - 1 = 1.7182 = 171.82\%

Then, when compounding every moment, starting with a nominal rate of 100% annual interest rate, the actual effective annual rate would be 171.82%!

2.3 Return calculation

We have historical monthly adjusted prices for each stock. Since we have a dataset with the adjusted prices for 2 stocks, we can easily calculate returns for both stocks:

# We calculate the simple returns for both stocks 
R = adjprices / lag(adjprices) - 1 
# We calculate the cc returns:
r = log(adjprices) - log(lag(adjprices))
# also, we can use the diff function to do the same:
r2 = diff(log(adjprices))

We calculated cc returns using two formulas and we got exactly the same result. The diff function gets the difference between each log price minus its own log price of the previous period (in this case, previous month).

We created 3 datasets: R, r and r2. r and r2 are identical. These datasets have missing values (NA values) in the first month since we cannot calculate the returns for the first month. We can delete this row using the na.omit function:

R = na.omit(R)
r = na.omit(r)

Now visualize the monthly prices and returns of the S&P500 over time

plot(adjprices)

addLegend("topleft",legend.names=tickers,lty=c(1,1),lwd=c(2,1),col=c("black","red"))

plot(R)

addLegend("topleft",legend.names=tickers,lty=c(1,1),lwd=c(2,1),col=c("black","red"))

3 Descriptive statistics for Finance

Descriptive statistics are mainly used to summarize information about any phenomenon. In Finance, we are interested in knowing the average past return on an investment or how much investment returns usually move up and down over time. We can use descriptive statistics to a) learn about past return and risk of investments, and 2) make inferences about average return and risk for estimating expected values for the future.

The most important descriptive statistics are measures of central tendency and measures of dispersion. Another important measure is the calculation of return over several periods, usually called “Buy-and-Hold” return.

3.0.1 Mean, standard deviation and variance of cc returns

Calculate the mean, standard deviation and variance of continuously compounded (cc) monthly returns:

library(PerformanceAnalytics)
table.Stats(r)
                   NVDA    MSFT
Observations    61.0000 61.0000
NAs              0.0000  0.0000
Minimum         -0.3859 -0.1136
Quartile 1      -0.0237 -0.0206
Median           0.0678  0.0307
Arithmetic Mean  0.0508  0.0234
Geometric Mean   0.0402  0.0216
Quartile 3       0.1439  0.0665
Maximum          0.3100  0.1624
SE Mean          0.0184  0.0079
LCL Mean (0.95)  0.0140  0.0076
UCL Mean (0.95)  0.0877  0.0392
Variance         0.0207  0.0038
Stdev            0.1439  0.0617
Skewness        -0.6964 -0.0958
Kurtosis         0.4241 -0.4573

The table.Stats function belongs to the PerformanceAnalytics package.

You can also get these descriptive statistics for a stock:

mean_r_nvda = mean(r$NVDA)
sd_r_nvda = sd(r$NVDA)
var_r_nvda <- var(r$NVDA, na.rm=TRUE) # variance
# Note that the na.rm argument is set to TRUE. This means that NA values will be removed.
# The variables are kept in the environment, so we have to print them to see them in console.
cat("Mean return of NVIDIA =", mean_r_nvda, "\n")
Mean return of NVIDIA = 0.05081092 
cat("Standard deviation of NVIDIA returns = ", sd_r_nvda, "\n")
Standard deviation of NVIDIA returns =  0.1438708 
cat("Variance of NVIDIA returns = ", var_r_nvda, "\n")
Variance of NVIDIA returns =  0.02069881 

3.0.2 Q Mean, standard deviation and variance of simple returns

Calculate the mean, standard deviation and variance of simple monthly returns for Starbucks:

table.Stats(R)
                   NVDA    MSFT
Observations    61.0000 61.0000
NAs              0.0000  0.0000
Minimum         -0.3202 -0.1074
Quartile 1      -0.0234 -0.0204
Median           0.0701  0.0312
Arithmetic Mean  0.0625  0.0256
Geometric Mean   0.0521  0.0237
Quartile 3       0.1548  0.0688
Maximum          0.3634  0.1763
SE Mean          0.0187  0.0081
LCL Mean (0.95)  0.0251  0.0094
UCL Mean (0.95)  0.1000  0.0418
Variance         0.0214  0.0040
Stdev            0.1462  0.0631
Skewness        -0.3030  0.0463
Kurtosis        -0.1685 -0.4217
print("Mean, standard deviation and variance of Nvidia simple returns:")
[1] "Mean, standard deviation and variance of Nvidia simple returns:"
mean(R$NVDA, na.rm=TRUE)
[1] 0.06253886
sd(R$NVDA, na.rm=TRUE)
[1] 0.1461937
var(R$NVDA, na.rm=TRUE)
          NVDA
NVDA 0.0213726

QUESTION: DO YOU SEE A DIFFERENCE BETWEEN THE SIMPLE AND CONTINUOUSLY COMPOUNDED RETURNS? BRIEFLY EXPLAIN.

4 Q The Histrogram

You have to remember what is a histogram. Read the Note Basics of Statistics for Finance.

4.1 Histogram using historical data

Do a histogram of Nvidia cc returns:

hist(r$NVDA, breaks=16,main="Histogram of NVIDIA monthly returns", 
     xlab="Continuously Compounded returns", col="dark green")

QUESTIONS:

A) INTERPRET THIS HISTOGRAM WITH YOUR OWN WORDS

** I CAN SEE THAT THERE ARE MORE POSITIVE RETURNS THAN NEGATIVE RETURNS. THIS IS NOT QUITE USUAL IN RETURNS OF STOCKS. IT MAY BE THAT NVIDIA PRICE HAS BEEN GROWING FOR A LONG TIME. **

** HOWEVER, LOOKING AT THE LEFT, THERE IS ONE MONTH THAT NVIDIA LOST ABOUT 40% OF ITS VALUE, AND OTHER 3 MONTHS THAT LOST MORE THAN 20%!**

** LOOKING TO THE RIGHT, I CAN SEE THAT THERE ARE ABOUT 8 MONTHS THAT NVIDIA OFFERED MORE THAN 20% OF RETURN! **

** THE MEAN RETURN LOOKS BETWEEN 5% AND 10% MONTHLY! THAT IS VERY GOOD FOR A STOCK. IF YOU CONVERT TO ANNUAL RATE, 5% IS ABOUT 60% ANNUAL! **

B) HOW MEAN AND STANDARD DEVIATION IS RELATED WITH THE HISTOGRAM?

FOR A NORMAL DISTRIBUTED VARIABLE, THE MID POINT OF THE HISTOGRAM IS WHERE ARITHMETIC MEAN IS LOCATED, AND ALSO THE MEDIAN. FOR NOT-NORMAL DISTRIBUTED VARIABLE, THE MID POINT WILL BE ONLY THE MEDIAN.

I CAN APPRECIATE THE STANDARD DEVIATION BY LOOKING OF HOW SPREAD IS THE HISTOGRAM TO THE LEFT AND RIGHT FROM THE MID-POINT. FOR A NORMAL-DISTRIBUTED VARIABLE

4.2 Q Histogram using simulated data for returns

With the real mean, and standard deviation of monthly cc returns of Starbucks, create (simulate) a random variable with that mean and standard deviation for the same time period. Use the rnorm function for this:

rNVDA_sim <- rnorm(n=nrow(r), mean = mean_r_nvda, sd=sd_r_nvda)
# We will use the same number of observations as returns.df
# The nrow function gets the number of rows of an R object

Do a histogram of the simulated returns :

# First, omit NA's. This will make your analysis more accurate 
    # and coding easier since many functions throw errors while working with NA's
rNVDA <- na.omit(r$NVDA)

# Calculate the histograms and store their information in variables (don't plot yet)
hist_sim_NVDA<- hist(rNVDA_sim,breaks=16,plot = FALSE)
hist_NVDA <- hist(rNVDA,breaks=16,plot = FALSE)

# Calculate the range of the graph
xlim <- range(hist_NVDA$breaks,hist_sim_NVDA$breaks)
ylim <- range(0,hist_NVDA$density,
              hist_sim_NVDA$density)

# Plot the first histogram
plot(hist_sim_NVDA,xlim = xlim, ylim = ylim,
     col = rgb(1,0,0,0.4),xlab = 'Lengths',
     freq = FALSE, ## relative, not absolute frequency
     main = 'Distribution of simulated and real Starbucks Returns')

# Plot the second histogram on top of the 1st one
opar <- par(new = FALSE)
plot(hist_NVDA,xlim = xlim, ylim = ylim,
     xaxt = 'n', yaxt = 'n', ## don't add axes
     col = rgb(0,0,1,0.4), add = TRUE,
     freq = FALSE) ## relative, not absolute frequency

# Add a legend in the corner
legend('topleft',c('Simulated Returns','Real Returns'),
       fill = rgb(1:0,0,0:1,0.4), bty = 'n')

par(opar)

As you can see, the peach color represents the normally simulated returns, while the light purple bars represent the real returns of Starbucks. The dark purple color appears when both real and simulated returns meet.

QUESTIONS:

A) WHAT DIFFERENCE DO YOU SEE IN THE HISTOGRAMS? HOW REAL RETURNS ARE DIFFERENT FROM THE THEORETICAL NORMAL DISTRIBUTION OF RETURNS? BRIEFLY EXPLAIN.

R: I SEE THAT REAL RETURNS HAVE MORE EXTREME VALUES. IN OTHER WORDS, REAL RETURNS HAVE MORE NEGATIVE RETURNS AND MORE POSITIVE RETURNS COMPARED TO THE SIMULATED RETURNS THAT FOLLOW A NORMAL DISTRIBUTION. REMEMBER THAT THE HISTOGRAM SHOWS HOW OFTEN SPECIFIC RANGES OF VALUES OF THE VARIABLE APPEAR IN THE DATA.

B) Assuming that the monthly returns of Nvidia follow a normal distribution, WHAT WOULD BE THE 95% CONFIDENCE INTERVAL? WHAT IS THE INTERPRETATION OF THIS INTERVAL? EXPLAIN.

R: SINCE THE STANDARD DEVIATION OF NVIDIA RETURNS (CC RETURNS) IS ABOUT 0.14 (14%), THIS MEANS THAT IF YOU MOVE FROM THE MEAN 2 STANDARD DEVIATIONS TO THE LEFT AND 2 STANDARD DEVIATION TO THE RIGHT, YOU WILL BE COVERING ABOUT 95% OF THE MONTHS. THEN, IN THIS CASE THE MEAN RETURN IS 0.0485 (4.85%). THEN, THE 95% CONFIDENCE INTERVAL WILL BE ABOUT:

[ (0.0485 - 2(0.14) .. (0.0485 + 2(0.14)]

[ - 0.2315 .. + 0.3285]

[ -23.15% .. +32.85% ]

IT IS A VERY WIDE RANGE! THIS MEANS THAT NVIDIA IN THE FUTURE, ABOUT 95% OF THE MONTHS WILL OFFER A RETURN BETWEEN -23.15% MONTHLY TO +32.85% MONTHLY!