Abstract
This is a solution for Workshop 1. Not all of the workshop is shown; only the sections where students needed to work on an exercise or answer questions.
A simple financial return for a stock (\(R_{t}\)) is calculated as the percentage change in price from the previous period (t-1) to the present period (t):
\[ R_{t}=\frac{\left(Adjprice_{t}-Adjprice_{t-1}\right)}{Adjprice_{t-1}}=\frac{Adjprice_{t}}{Adjprice_{t-1}}-1 \] For example, if the adjusted price of a stock at the end of January 2021 was $100.00, and its previous (December 2020) adjusted price was $80.00, then the monthly simple return of the stock in January 2021 will be:
\[ R_{Jan2021}=\frac{Adjprice_{Jan2021}}{Adjprice_{Dec2020}}-1=\frac{100}{80}-1=0.25 \]
We can use returns in decimal or in percentage (multiplying by 100). We will keep using decimals.
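As a minimal sketch of the formula above, the January 2021 example can be reproduced in base R. The variable names (price_prev, price_now, R_simple) are illustrative and are not part of the workshop code.

```r
# Adjusted prices taken from the worked example above
price_prev <- 80    # December 2020
price_now  <- 100   # January 2021

# Simple return: percentage change from the previous period
R_simple <- price_now / price_prev - 1

R_simple          # return in decimal form
R_simple * 100    # the same return expressed as a percentage
```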
In finance it is strongly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models. cc returns are also called log returns.
One way to calculate cc returns is to subtract the log of the previous adjusted price (at t-1) from the log of the current adjusted price (at t):
\[ r_{t}=log(Adjprice_{t})-log(Adjprice_{t-1}) \] This is also called the first difference of the log price.
We can also calculate cc returns as the log of the current adjusted price (at t) divided by the previous adjusted price (at t-1):
\[ r_{t}=log\left(\frac{Adjprice_{t}}{Adjprice_{t-1}}\right) \]
cc returns are usually represented by small r, while simple returns are represented by capital R.
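The two formulas for cc returns give the same number. A short sketch in base R, reusing the $80 and $100 prices from the simple-return example (the variable names are illustrative):

```r
# Adjusted prices from the earlier worked example
price_prev <- 80
price_now  <- 100

# cc return as the difference of log prices
r_diff <- log(price_now) - log(price_prev)

# cc return as the log of the price ratio
r_ratio <- log(price_now / price_prev)

# Both approaches agree up to floating-point precision
all.equal(r_diff, r_ratio)

# The cc return (about 0.2231) is smaller than the simple return of 0.25
round(r_ratio, 4)
```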
Start by clearing your environment (this will erase all the variables and objects that are in your environment):
rm(list = ls())
Use the getSymbols command to download monthly data from Yahoo Finance for Starbucks from January 2008 to date. Type the following command:
library(quantmod)
getSymbols(Symbols = "SBUX", from="2008-01-01",
to="2021-08-17", periodicity="monthly", src = "yahoo")
## [1] "SBUX"
Calculate continuously compounded (cc) monthly returns
returns.zoo <- diff(log(Ad(SBUX)))
returns.df <- as.data.frame(diff(log(Ad(SBUX))))
# change the name of the column in returns.df
colnames(returns.df) <- "r_SBUX"
A data frame is a basic R object that stores tabular data (rows and columns). Data frames look like matrices, but each column of a data frame can hold a different type of data, whereas a matrix can store only one type.
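The difference between a data frame and a matrix can be seen with a small made-up example. The tickers and return values below are hypothetical and serve only to illustrate column types.

```r
# A data frame keeps each column's own type
df <- data.frame(ticker = c("SBUX", "XYZ"),
                 ret    = c(0.02, -0.01),
                 stringsAsFactors = FALSE)
sapply(df, class)   # one character column, one numeric column

# A matrix holds a single type, so the numbers are coerced to text
m <- cbind(c("SBUX", "XYZ"), c(0.02, -0.01))
class(m[1, 2])      # the return is now stored as a character string
```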
Calculate the mean, standard deviation and variance of continuously compounded (cc) monthly returns using the summary function:
summary(returns.df)
## r_SBUX
## Min. :-0.38548
## 1st Qu.:-0.02651
## Median : 0.01956
## Mean : 0.01661
## 3rd Qu.: 0.05766
## Max. : 0.26354
## NA's :1
summary(returns.zoo)
## Index SBUX.Adjusted
## Min. :2008-01-01 Min. :-0.38548
## 1st Qu.:2011-05-24 1st Qu.:-0.02651
## Median :2014-10-16 Median : 0.01956
## Mean :2014-10-16 Mean : 0.01661
## 3rd Qu.:2018-03-08 3rd Qu.: 0.05766
## Max. :2021-08-01 Max. : 0.26354
## NA's :1
As you can see, summary() does not show the standard deviation or the variance. You can also try the table.Stats() function, which requires the PerformanceAnalytics package to be installed and loaded. You can load this package into your R session:
library(PerformanceAnalytics)
Now we use the table.Stats function from this package to estimate basic descriptive statistics of returns:
table.Stats(returns.df$r_SBUX)
##
## Observations 163.0000
## NAs 1.0000
## Minimum -0.3855
## Quartile 1 -0.0265
## Median 0.0196
## Arithmetic Mean 0.0166
## Geometric Mean 0.0136
## Quartile 3 0.0577
## Maximum 0.2635
## SE Mean 0.0060
## LCL Mean (0.95) 0.0048
## UCL Mean (0.95) 0.0284
## Variance 0.0058
## Stdev 0.0761
## Skewness -0.6924
## Kurtosis 4.6967
This function shows several statistical measures and indicators that may be useful. You can also obtain the specific measures you were asked for by using the following functions:
mean_r_SBUX <- mean(returns.df$r_SBUX, na.rm=TRUE) # arithmetic mean
sd_r_SBUX <- sd(returns.df$r_SBUX, na.rm=TRUE) # standard deviation
var_r_SBUX <- var(returns.df$r_SBUX, na.rm=TRUE) # variance
# Note that the na.rm argument is set to TRUE. This means that NA values will be removed.
# The variables are kept in the environment, so we have to print them to see them in console.
cat("Mean =", mean_r_SBUX)
## Mean = 0.01661223
cat("Standard deviation = ", sd_r_SBUX)
## Standard deviation = 0.07607764
cat("Variance = ", var_r_SBUX)
## Variance = 0.005787807
Calculate the mean, standard deviation and variance of simple monthly returns for Starbucks:
# First, calculate simple returns as before
# lag() on an xts object takes k (the number of periods to shift)
returns.df$R_SBUX <- SBUX$SBUX.Adjusted / lag(SBUX$SBUX.Adjusted, k=1) - 1
# Then, apply the previous functions
table.Stats(returns.df$R_SBUX)
##
## Observations 163.0000
## NAs 1.0000
## Minimum -0.3199
## Quartile 1 -0.0262
## Median 0.0197
## Arithmetic Mean 0.0196
## Geometric Mean 0.0168
## Quartile 3 0.0594
## Maximum 0.3015
## SE Mean 0.0060
## LCL Mean (0.95) 0.0079
## UCL Mean (0.95) 0.0314
## Variance 0.0058
## Stdev 0.0762
## Skewness -0.0497
## Kurtosis 3.0995
mean(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.01963446
sd(returns.df$R_SBUX, na.rm=TRUE)
## [1] 0.07617523
var(returns.df$R_SBUX, na.rm=TRUE)
## [,1]
## [1,] 0.005802666
DO YOU SEE A DIFFERENCE BETWEEN THE SIMPLE AND CONTINUOUSLY COMPOUNDED RETURNS? BRIEFLY EXPLAIN.
LET’S SEE THE DIFFERENCE OF MEAN RETURNS BETWEEN THE CC RETURNS AND SIMPLE RETURNS:
mean_R_SBUX = mean(returns.df$R_SBUX, na.rm=TRUE)
cat("Mean of simple returns: ", mean_R_SBUX,"\n")
## Mean of simple returns: 0.01963446
mean_r_SBUX = mean(returns.df$r_SBUX, na.rm=TRUE)
cat("Mean cc return: ", mean_r_SBUX)
## Mean cc return: 0.01661223
R: WE CAN SEE THAT THE MEAN OF SIMPLE MONTHLY RETURNS IS HIGHER THAN THE MEAN OF CC RETURNS (1.9634458% VS 1.661223%). POSITIVE SIMPLE RETURNS ARE ALWAYS HIGHER THAN THE CORRESPONDING CC RETURNS, AND NEGATIVE SIMPLE RETURNS ARE ALWAYS LESS NEGATIVE THAN THE CORRESPONDING CC RETURNS.
IN THE RANGE BETWEEN -5% AND +5%, THE DIFFERENCE BETWEEN SIMPLE AND CC RETURNS IS SMALL, BUT OUTSIDE THIS RANGE YOU CAN SEE BIG DIFFERENCES. THIS IS BECAUSE THE RELATIONSHIP BETWEEN SIMPLE AND CC RETURNS IS EXPONENTIAL: THE SIMPLE RETURN IS EQUAL TO THE EXPONENTIAL OF THE CC RETURN MINUS ONE.
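The exponential relationship can be checked numerically. The sketch below uses a few illustrative cc returns plus the minimum and maximum cc returns reported by table.Stats above (-0.3855 and 0.2635); the variable names are hypothetical.

```r
# Illustrative cc returns, from small to large in absolute value
r_cc <- c(-0.40, -0.05, -0.01, 0.01, 0.05, 0.25)

# Simple return implied by each cc return: R = exp(r) - 1
R_simple <- exp(r_cc) - 1

# The gap is tiny near zero and grows quickly for large moves
data.frame(r_cc, R_simple, gap = R_simple - r_cc)

# Converting the reported min and max cc returns should give values
# close to the min and max simple returns in the second table
# (-0.3199 and 0.3015)
extremes <- round(exp(c(-0.3855, 0.2635)) - 1, 4)
extremes
```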
You have to remember what a histogram is. Read the Note Basics of Statistics for Finance.
Do a histogram of Starbucks cc returns:
hist(returns.df$r_SBUX, main="Histogram of SBUX monthly returns",
xlab="Continuously Compounded returns", col="dark green")
INTERPRET THIS HISTOGRAM IN YOUR OWN WORDS
R: WE CAN SEE THAT THE MOST FREQUENT RETURNS OF STARBUCKS ARE BETWEEN -10% AND +10%. IN AROUND 80 MONTHS SINCE 2008, STARBUCKS HAS OFFERED MONTHLY RETURNS FROM 0 TO 10%, BUT IN AROUND 57 MONTHS IT HAS OFFERED NEGATIVE RETURNS BETWEEN -10% AND 0%.
IN ADDITION, WE CAN SEE THAT STARBUCKS HAS HAD VERY BAD MONTHS AND VERY GOOD MONTHS IN TERMS OF RETURNS. IF WE LOOK AT THE LEFT OF THE HISTOGRAM, WE CAN SEE THAT IN A FEW MONTHS (AROUND 2) STARBUCKS HAS OFFERED MONTHLY RETURNS FROM -40% TO -30%! HOWEVER, LOOKING AT THE RIGHT OF THE HISTOGRAM, WE CAN SEE THAT IN AROUND 4 MONTHS STARBUCKS HAS OFFERED VERY GOOD RETURNS FROM 20% TO 30%.
IT SEEMS THAT A LITTLE MORE THAN HALF OF THE MONTHS HAD POSITIVE RETURNS, SO THE AVERAGE MONTHLY RETURN SHOULD BE POSITIVE.
HOW ARE THE MEAN AND STANDARD DEVIATION RELATED TO THE HISTOGRAM?
R: THE MEAN MARKS THE CENTER OF THE HISTOGRAM, WHILE STANDARD DEVIATION IS A MEASURE OF DISPERSION OF THE VALUES OF A VARIABLE. STANDARD DEVIATION CAN BE THOUGHT OF AS THE TYPICAL DISTANCE OF THE VALUES FROM THE MEAN (STRICTLY, IT IS THE SQUARE ROOT OF THE AVERAGE SQUARED DEVIATION FROM THE MEAN). LOOKING AT THE HISTOGRAM, IF THE BARS THAT REPRESENT FREQUENCIES ARE MORE SPREAD OUT OVER THE RANGE OF THE VARIABLE, AND IF THE SCALE OF THE X AXIS IS WIDE, THEN WE CAN CONCLUDE THAT STANDARD DEVIATION IS HIGH. THE TIGHTER THE BARS OF THE HISTOGRAM ARE AROUND THE AVERAGE POINT, THE LOWER THE STANDARD DEVIATION.
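To make the definition of standard deviation concrete, the following sketch computes it by hand for a handful of made-up returns and compares it with the average absolute deviation. The values and variable names are illustrative only.

```r
# Made-up monthly returns (their mean is zero, which keeps the arithmetic simple)
x <- c(-0.10, -0.02, 0.01, 0.03, 0.08)

# Deviations of each value from the mean
dev <- x - mean(x)

# Average absolute deviation: an intuitive "typical distance" from the mean
mean_abs_dev <- mean(abs(dev))

# Sample standard deviation: square root of the average squared deviation,
# dividing by n - 1 as R's sd() does
sd_manual <- sqrt(sum(dev^2) / (length(x) - 1))

all.equal(sd_manual, sd(x))   # the manual calculation matches sd()
c(mean_abs_dev = mean_abs_dev, sd = sd(x))
```

Squaring gives more weight to large deviations, so the standard deviation comes out larger than the average absolute deviation here.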
With the real mean, and standard deviation of monthly cc returns of Starbucks, create (simulate) a random variable with that mean and standard deviation for the same time period. Use the rnorm function for this:
rSBUX_sim <- rnorm(n=nrow(returns.df)-1, mean = mean_r_SBUX, sd=sd_r_SBUX)
# We use the number of non-missing observations in returns.df (the first row is NA)
# The nrow function gets the number of rows of an R object
Do a histogram of the simulated returns :
# First, omit NA's. This will make your analysis more accurate
# and coding easier since many functions throw errors while working with NA's
rSBUX <- na.omit(returns.df$r_SBUX)
# Calculate the histograms and store their information in variables (don't plot yet)
hist_sim_SBUX <- hist(rSBUX_sim,plot = FALSE,
                      breaks=seq(-0.4,0.3,by=0.05))
hist_SBUX <- hist(rSBUX,plot = FALSE,
                  breaks=seq(-0.4,0.3,by=0.05))
# Calculate the range of the graph
xlim <- range(hist_SBUX$breaks,hist_sim_SBUX$breaks)
ylim <- range(0,hist_SBUX$counts,
              hist_sim_SBUX$counts)
# Plot the first histogram
plot(hist_sim_SBUX,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Monthly cc returns',
#freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of simulated and real Starbucks Returns')
# Plot the second histogram on top of the 1st one
opar <- par(new = FALSE)
plot(hist_SBUX,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE )
#freq = FALSE) ## relative, not absolute frequency
# Add a legend in the corner
legend('topleft',c('Simulated Returns','Real Returns'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n')
par(opar)
As you can see, the peach color represents the normally simulated returns, while the light purple bars represent the real returns of Starbucks. The dark purple color appears when both real and simulated returns meet.
WHAT DIFFERENCE DO YOU SEE IN THE HISTOGRAMS? HOW ARE REAL RETURNS DIFFERENT FROM THE THEORETICAL NORMAL DISTRIBUTION OF RETURNS? BRIEFLY EXPLAIN.
R: WE CAN SEE THAT SIMULATED RETURNS HAVE FAR FEWER EXTREME NEGATIVE VALUES COMPARED WITH THE REAL STARBUCKS MONTHLY RETURNS. THIS IS USUALLY THE CASE FOR FINANCIAL RETURNS OF STOCKS. FINANCIAL RETURNS USUALLY HAVE MORE FREQUENT EXTREME VALUES, BOTH TO THE LEFT (NEGATIVE RETURNS) AND TO THE RIGHT (POSITIVE RETURNS). THIS PATTERN IN THE HISTOGRAM OR DISTRIBUTION OF VALUES IS CALLED A FAT-TAILED DISTRIBUTION, WHICH IS CLOSE TO A NORMAL DISTRIBUTION BUT WITH MORE EXTREME VALUES.
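One way to quantify the fat left tail is to ask how often a normal distribution with the estimated mean and standard deviation would produce a monthly cc return below -30%. The sketch below hard-codes the mean, standard deviation and number of observations reported earlier; the variable names are illustrative.

```r
# Values reported earlier for Starbucks monthly cc returns
mean_r   <- 0.01661223
sd_r     <- 0.07607764
n_months <- 163

# Probability of a return below -30% under the normal assumption
p_tail <- pnorm(-0.30, mean = mean_r, sd = sd_r)

# Expected number of such months in a sample of this size
expected_months <- n_months * p_tail

p_tail
expected_months
```

Under normality, far less than one such month would be expected in this sample, whereas the histogram of real returns shows around 2 months between -40% and -30%.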
Assuming that the monthly returns of Starbucks follow a normal distribution, WHAT WOULD BE THE 95% CONFIDENCE INTERVAL? WHAT IS THE INTERPRETATION OF THIS INTERVAL? EXPLAIN.
R: IF WE ASSUME THAT STARBUCKS RETURNS FOLLOW A PERFECT NORMAL DISTRIBUTION, WE CAN CALCULATE THE 95% CONFIDENCE INTERVAL JUST BY ADDING AND SUBTRACTING 1.96 STANDARD DEVIATIONS FROM THE MEAN RETURN. NOTE THAT THIS IS AN INTERVAL FOR INDIVIDUAL MONTHLY RETURNS, NOT FOR THE MEAN RETURN. IN THIS CASE, WE CAN DO THIS ROUGH ESTIMATE AS FOLLOWS:
min_95CI = mean_r_SBUX - 1.96*sd_r_SBUX
max_95CI = mean_r_SBUX + 1.96*sd_r_SBUX
cat("The 95% confidence interval of Starbucks returns would be: [",min_95CI,"..",max_95CI,"]")
## The 95% confidence interval of Starbucks returns would be: [ -0.1324999 .. 0.1657244 ]
THEN WE CAN SAY THAT IF STARBUCKS RETURNS BEHAVE LIKE A NORMALLY DISTRIBUTED VARIABLE IN THE FUTURE, 95% OF THE TIME STARBUCKS WILL OFFER RETURNS BETWEEN -13.2499944% AND 16.5724404%.
WE CAN ALSO USE A STUDENT'S T DISTRIBUTION INSTEAD OF A NORMAL DISTRIBUTION TO CALCULATE THE 95% CONFIDENCE INTERVAL. MOST STATISTICAL SOFTWARE USES THE T DISTRIBUTION INSTEAD OF THE NORMAL DISTRIBUTION SINCE IT IS CLOSER TO THE REAL VALUES OF VARIABLES FOR BOTH SMALL AND LARGE SAMPLES. THE T DISTRIBUTION HAS SLIGHTLY MORE EXTREME VALUES THAN THE NORMAL DISTRIBUTION. HOWEVER, WHEN THE SAMPLE IS BIGGER THAN 30 OBSERVATIONS, THE T DISTRIBUTION IS ALMOST THE SAME AS THE NORMAL DISTRIBUTION. AS N TENDS TO INFINITY, THE T DISTRIBUTION CONVERGES TO THE NORMAL DISTRIBUTION.
FOR THE T DISTRIBUTION, THE CRITICAL VALUE TO GET THE 95% CONFIDENCE INTERVAL IS NOT EXACTLY 1.96. THIS CRITICAL VALUE DEPENDS ON THE “DEGREES OF FREEDOM”, WHICH IS USUALLY THE NUMBER OF OBSERVATIONS MINUS THE NUMBER OF VARIABLES TO BE ANALYZED. WE CAN USE TABLES FOR THE T DISTRIBUTION, OR USE AN R FUNCTION TO GET THIS VALUE. IN R, WE CAN USE THE FUNCTION qt TO GET THE CRITICAL T VALUE:
t_critical_value <- abs(qt(0.025,nrow(returns.df)-1) )
t_critical_value
## [1] 1.974625
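The way the t critical value approaches the normal value of about 1.96 as the degrees of freedom grow can be checked directly with qt and qnorm. The degrees of freedom below are chosen for illustration; 163 corresponds to the value used for Starbucks.

```r
# Degrees of freedom to compare (163 matches the Starbucks calculation)
dfs <- c(5, 30, 163, 1e6)

# Two-sided 95% critical values from the t distribution
t_crit <- qt(0.975, df = dfs)

# Corresponding critical value from the normal distribution
z_crit <- qnorm(0.975)

# The gap shrinks toward zero as the degrees of freedom increase
round(rbind(df = dfs, t_crit = t_crit, gap = t_crit - z_crit), 4)
```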
IN THIS CASE, THE qt FUNCTION RETURNS THE CRITICAL T VALUE FOR A PROBABILITY OF 2.5%. THE T VALUE IS THE NUMBER OF STANDARD DEVIATIONS YOU HAVE TO GO TO THE LEFT OF THE MEAN SO THAT 2.5% OF THE DISTRIBUTION LIES BETWEEN THERE AND MINUS INFINITY. THE abs FUNCTION JUST GETS THE ABSOLUTE VALUE, SINCE THE RESULT OF qt FOR PROBABILITIES LESS THAN 50% IS NEGATIVE.
NOW WE USE THIS CRITICAL VALUE INSTEAD OF 1.96 TO ESTIMATE THE 95% CONFIDENCE INTERVAL FOR STARBUCKS RETURNS:
min_95CI = mean_r_SBUX - t_critical_value * sd_r_SBUX
max_95CI = mean_r_SBUX + t_critical_value * sd_r_SBUX
cat("The 95% confidence interval of Starbucks returns would be: [",min_95CI,"..",max_95CI,"]")
## The 95% confidence interval of Starbucks returns would be: [ -0.1336126 .. 0.166837 ]
THEN WE CAN SAY THAT IF STARBUCKS RETURNS FOLLOW A STUDENT'S T DISTRIBUTION IN THE FUTURE, 95% OF THE TIME STARBUCKS WILL OFFER RETURNS BETWEEN -13.3612551% AND 16.6837011%.
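The intervals above describe the range of individual monthly returns. They differ from the LCL Mean and UCL Mean rows reported by table.Stats, which appear to describe the uncertainty of the mean itself. The following is a minimal sketch, assuming those rows are computed as the mean plus or minus a t critical value times the standard error of the mean (an assumption about the package, not something the workshop states). The inputs are the values reported earlier and the variable names are illustrative.

```r
# Values reported earlier for Starbucks monthly cc returns
mean_r <- 0.01661223
sd_r   <- 0.07607764
n_obs  <- 163          # non-missing observations

# Standard error of the mean: sd divided by the square root of n
se_mean <- sd_r / sqrt(n_obs)

# t critical value with n - 1 degrees of freedom (assumed convention)
t_crit <- qt(0.975, df = n_obs - 1)

# 95% confidence interval for the mean return
lcl_mean <- mean_r - t_crit * se_mean
ucl_mean <- mean_r + t_crit * se_mean

round(c(se_mean = se_mean, lcl_mean = lcl_mean, ucl_mean = ucl_mean), 4)
```

Under these assumptions the results should be close to the SE Mean, LCL Mean and UCL Mean rows of the first table.Stats output (0.0060, 0.0048 and 0.0284). This interval is much narrower than the interval for individual returns because the standard error shrinks with the square root of the sample size.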