Project Objectives

Volatility is a statistical measure of the dispersion of returns for a given security or market-index. Commonly, the higher the volatility, the riskier the security. The volatility itself is affected by multiple factors such as “Beta”, “Standard Deviation”, “Earnings”, “Balance Sheet”, “Analyst Recommendations” and “Social Sentiment”. We will start with the concept of Beta as a measure of volatility of a stock relative to the benchmark (like S&P 500) and analyze how it affects the returns of a stock relative to the benchmark. In particular, we will run a simple case study by choosing three stocks with widely differing returns relative to the benchmark (S&P 500) to emphasize the effect of beta on such classes of stocks. We will then generalize this a bit more, by comparing Beta (risk) for a universe of stocks against their respective returns for a chosen period and more important, we want to understand if a portfolio of low-volatility stocks earn returns different from a portfolio of high-volatility stocks. This is one of the fundamental questions that this research ponders upon by providing a statistical insight into the results. We will then repeat the analysis using Standard Deviation as the volatility measurement. The way we present this part of the material is by running a multiple regression model on stock’s return using beta and standard deviation as predictors. While the main focus of this research paper is on stock’s beta and standard deviation, we will also introduce other significant measures of volatility such as Twitter/Stocktwits feeds that is gaining momentum since the advent of Social Media Revolution. This is primarily done to make the investigation complete. .

## put your code here
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(knitr)
opts_chunk$set(cache = TRUE, message = FALSE)
library(readr)
library(ggplot2)
theme_set(theme_bw(base_size = 16))
library(gganimate)
library(broom)

Load beta of a universe of stocks

url <- "https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/2500%20Stocks%20Financial%20Data.csv"
financial_data <- read_csv(url)

# data wrangling - rename columns etc
names(financial_data)[names(financial_data) == 'Beta (5 Yr Annualized)'] <- 'Beta_5Year'
names(financial_data)[names(financial_data) == 'Beta (1 Year Annualized)'] <- 'Beta_1Year'
names(financial_data)[names(financial_data) == 'Standard Deviation (5 Yr Annualized)'] <- 'SD_5Year'
names(financial_data)[names(financial_data) == 'P/E (Next Year\'s Estimate)'] <- 'PE_NextYear'
names(financial_data)[names(financial_data) == 'Total Return (5 Yr Annualized)'] <- 'Annualized_Return_5Year'
names(financial_data)[names(financial_data) == 'Sector/Industry'] <- 'Sector'
names(financial_data)[names(financial_data) == 'Security Type'] <- 'SecurityType'

select(financial_data, Symbol, SecurityType, Sector, Beta_5Year, Beta_1Year, SD_5Year, PE_NextYear, Annualized_Return_5Year) %>% head() %>% kable

Symbol	SecurityType	Sector	Beta_5Year	Beta_1Year	SD_5Year	PE_NextYear	Annualized_Return_5Year
TALN	CS	Consumer Discretionary	0.0066691	0.7587322	2.21670	15.00000	10.75663
ED	CS	Utilities	0.0173725	0.2749312	0.15303	15.28586	9.82558
IIJI	DR	Information Technology	0.0184750	1.1182022	0.40247	22.30952	10.56512
NEM	CS	Materials	0.0252339	0.6348882	0.36968	21.13001	-19.88953
APPS	CS	Information Technology	0.0275968	-0.2365876	0.86178	19.10256	-2.60258
ITC	CS	Utilities	0.0302839	0.3943955	0.16944	16.26724	12.80138

ggplot(financial_data, aes(x=Beta_5Year)) + 
  geom_histogram(aes(y=..density..), bins=20, colour="black", fill="white") + 
  geom_density(alpha=0.2, fill="#FF6666") +
  geom_vline(aes(xintercept=median(Beta_5Year)), color="red", linetype="dashed", size=1) +
  ggtitle("Beta distribution")

# Plot beta group_by sector to understand patterns
sectors<-c('Utilities', 'Consumer Staples', 'Health Care', 'Information Technology', 'Energy', 'Industrials', 'Consumer Discretionary' , 'Materials', 'Financials')
filter(financial_data, Sector %in% sectors) %>%
  ggplot(aes(x=Beta_5Year)) + 
  geom_histogram(aes(y=..density..), bins=20, colour="black", fill="white") + 
  geom_density(alpha=0.2, fill="#FF6666") +
  facet_wrap(~Sector)

# Summarize beta per sector
sectors_beta<-financial_data %>% filter(Sector %in% sectors) %>%
  group_by(Sector) %>%
  summarize(Beta_Median=round(median(Beta_5Year), 3), 
            Beta_Mean=round(mean(Beta_5Year), 3), 
            Beta_SD=round(sd(Beta_5Year), 3),
            Returns_Mean=round(mean(Annualized_Return_5Year), 3),
            Returns_SD=round(sd(Annualized_Return_5Year), 3)) %>% 
  arrange(Beta_Median)

all_beta<-financial_data %>% 
  summarize(Beta_Median=round(median(Beta_5Year), 3), 
            Beta_Mean=round(mean(Beta_5Year), 3), 
            Beta_SD=round(sd(Beta_5Year), 3),
            Returns_Mean=round(mean(Annualized_Return_5Year), 3),
            Returns_SD=round(sd(Annualized_Return_5Year), 3))
all_beta$Sector<-"ALL"
all_beta<-all_beta[,c("Sector", "Beta_Median", "Beta_Mean", "Beta_SD", "Returns_Mean", "Returns_SD")]

sectors_beta<-rbind(sectors_beta, all_beta)
sectors_beta %>% kable

Sector	Beta_Median	Beta_Mean	Beta_SD	Returns_Mean	Returns_SD
Utilities	0.439	0.568	0.438	8.121	11.119
Consumer Staples	0.770	0.805	0.412	15.474	12.483
Financials	0.990	1.078	0.510	13.707	11.086
Health Care	1.011	1.084	0.480	18.709	16.902
Consumer Discretionary	1.194	1.278	0.583	12.253	17.247
Information Technology	1.284	1.361	0.569	8.020	16.070
Energy	1.300	1.365	0.577	-0.969	17.245
Industrials	1.313	1.373	0.565	10.186	15.397
Materials	1.420	1.386	0.556	3.334	20.459
ALL	1.145	1.201	0.575	10.578	16.064

# bar plot showing mean beta and stock-returns across sectors
ggplot(sectors_beta, aes(x=factor(Sector), y=Beta_Mean)) + 
  geom_bar(stat="identity", width=0.5, colour="black", fill="#FF6666", alpha=0.7) + 
  geom_bar(data=sectors_beta, aes(x=factor(Sector), y=Returns_Mean), stat="identity", width=0.5, colour="black", fill="red", alpha=0.2) + 
  xlab("Sector") +
  ylab("%") +
  scale_y_log10() +
  scale_x_discrete(labels = abbreviate) +
  ggtitle("Mean beta and returns across sectors")

# It is clear that beta for different industry sectors are different with utilities and consumer staples having very low beta values as expected. Utilities and consumer staples tend to fluctuate less with the overall market sentiment as they are basic necessities. On the other hand, industrials and materials are cyclical in nature and their beta tends to be very high. Financials have almost perfect correlation with S&P500 with beta almost 1.

# Also it is clear that returns are different for different beta. We will later see if there is a clear relation between returns and beta for a universe of 2500 stocks.

Data wranging with historical prices - SPY and 3 chosen stocks

SPY_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/SPY%20weekly%2010%20years.csv"
SPY_history<-read_csv(SPY_url)

ED_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/ED%20weekly%2010%20years.csv"
ED_history<-read_csv(ED_url)

ADSK_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/ADSK%20weekly%2010%20years.csv"
ADSK_history<-read_csv(ADSK_url)

IBM_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/IBM%20weekly%2010%20years.csv"
IBM_history<-read_csv(IBM_url)

# build a consolidated frame with weekly returns (%) for SPY/ED/ADSK/IBM
stocks<-data.frame(Date=SPY_history$Date, 
                   SPY.Close=SPY_history$Close, SPY.Volume=SPY_history$Volume, SPY.LastClose=SPY_history$Close, 
                   ED.Close=ED_history$Close, ED.Volume=ED_history$Volume, ED.LastClose=ED_history$Close, 
                   ADSK.Close=ADSK_history$Close, ADSK.Volume=ADSK_history$Volume, ADSK.LastClose=ADSK_history$Close,
                   IBM.Close=IBM_history$Close, IBM.Volume=IBM_history$Volume, IBM.LastClose=IBM_history$Close)

numRows<-nrow(SPY_history)

# populate last week return to facilitate calculating weekly % returns
stocks$SPY.LastClose[1:numRows-1]<-SPY_history$Close[2:numRows]
stocks$ED.LastClose[1:numRows-1]<-ED_history$Close[2:numRows]
stocks$ADSK.LastClose[1:numRows-1]<-ADSK_history$Close[2:numRows]
stocks$IBM.LastClose[1:numRows-1]<-IBM_history$Close[2:numRows]


# compuate weekly % returns
stocks<-mutate(stocks, SPY.Returns=round((SPY.Close-SPY.LastClose)*100/SPY.LastClose, 3),
               ED.Returns=round((ED.Close-ED.LastClose)*100/ED.LastClose, 3),
               ADSK.Returns=round((ADSK.Close-ADSK.LastClose)*100/ADSK.LastClose, 3),
               IBM.Returns=round((IBM.Close-IBM.LastClose)*100/IBM.LastClose, 3))


head(stocks, 2)

##        Date SPY.Close SPY.Volume SPY.LastClose ED.Close ED.Volume
## 1 4/18/2016    209.24   75277700        207.78    71.10   2222300
## 2 4/11/2016    207.78   83455600        204.50    75.85   1273700
##   ED.LastClose ADSK.Close ADSK.Volume ADSK.LastClose IBM.Close IBM.Volume
## 1        75.85      60.07     1365700          58.42    149.30    7887200
## 2        76.04      58.42     1607000          56.76    151.72    3428700
##   IBM.LastClose SPY.Returns ED.Returns ADSK.Returns IBM.Returns
## 1        151.72       0.703     -6.262        2.824      -1.595
## 2        149.35       1.604     -0.250        2.925       1.587

# Summarize weekly returns
SPX_Weekly_Returns<-summary(stocks$SPY.Returns)
ED_Weekly_Returns<-summary(stocks$ED.Returns)
ADSK_Weekly_Returns<-summary(stocks$ADSK.Returns)
IBM_Weekly_Returns<-summary(stocks$IBM.Returns)
Weekly_Returns<-rbind(SPX_Weekly_Returns, ED_Weekly_Returns)
Weekly_Returns<-rbind(Weekly_Returns, ADSK_Weekly_Returns)
Weekly_Returns<-rbind(Weekly_Returns, IBM_Weekly_Returns)
Weekly_Returns %>% kable

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
SPX_Weekly_Returns	-19.79	-0.9742	0.2685	0.1236	1.492	13.290
ED_Weekly_Returns	-12.86	-1.1780	0.2165	0.1213	1.412	8.115
ADSK_Weekly_Returns	-26.42	-2.7590	0.3265	0.2244	3.256	20.020
IBM_Weekly_Returns	-15.17	-1.3650	0.2075	0.1633	1.777	14.470

Betacomputation: case Study of 3 stocks

# Correlation
ED_cor<-cor(stocks$ED.Returns, stocks$SPY.Returns)
ADSK_cor<-cor(stocks$ADSK.Returns, stocks$SPY.Returns)
IBM_cor<-cor(stocks$IBM.Returns, stocks$SPY.Returns)

# Beta is the slope of the regression line between SPY returns and the stock returns
# Use SPY returns to predict individual stock returns
ED_beta<-tidy(lm(stocks$ED.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>% 
  filter(term == "stocks$SPY.Returns")
ADSK_beta<-tidy(lm(stocks$ADSK.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>% 
  filter(term == "stocks$SPY.Returns")
IBM_beta<-tidy(lm(stocks$IBM.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>% 
  filter(term == "stocks$SPY.Returns")

# use kable to show beta estimates, p values, r-squared and CI values
ED_beta$term[1]<-"Beta (ED)"
ADSK_beta$term[1]<-"Beta (ADSK)"
IBM_beta$term[1]<-"Beta (IBM)"
beta_est<-union(IBM_beta, union(ED_beta, ADSK_beta))
beta_est<-mutate(beta_est, r=c(ED_cor, ADSK_cor, IBM_cor))
beta_est<-mutate(beta_est, r.squared_perc=round(r*r*100, 3))
beta_est<-mutate(beta_est, estimate=round(as.numeric(estimate), 2))
beta_est<-rbind(beta_est, c('Beta (SPY)', 1.0, 0, 0, 0, 1, 1, 0, 0))
beta_est %>% kable

term	estimate	std.error	statistic	p.value	conf.low	conf.high	r	r.squared_perc
Beta (IBM)	0.81	0.0386624990621484	21.0047585679583	2.19355547257144e-71	0.736142567805647	0.888050349063038	0.480272895860774	23.066
Beta (ADSK)	1.41	0.0697787761685322	20.2732056560925	8.9916796773494e-68	1.27755652691469	1.5517224324755	0.664426455822036	44.146
Beta (ED)	0.41	0.0331447254388387	12.4862254750735	1.77815348658691e-31	0.348738492197471	0.478966538080018	0.677502402383787	45.901
Beta (SPY)	1	0	0	0	1	1	0	0

# bar plot showing estimated beta and calculated returns
ggplot(beta_est, aes(x=factor(term), y=estimate)) + 
  geom_bar(stat="identity", width=0.5, colour="black", fill="#FF6666", alpha=0.3) + 
  xlab("Case Study") +
  ylab("Beta Estimate") +
  ggtitle("Case Study: Estimated Beta")

# plot using smoothing
ggplot_lm<-function(stock_col) {
  stocks %>% filter(SPY.Returns >= -10) %>% 
    ggplot(aes_string(x="SPY.Returns", y=stock_col)) + 
    geom_point() +
    geom_smooth(method="lm", se = TRUE) +
    ggtitle(paste(stock_col, " returns against SPY"))
}

ggplot_lm("ED.Returns")

ggplot_lm("ADSK.Returns")

ggplot_lm("IBM.Returns")

We now want to study if a portfolio of low-volatility stocks earn returns different from a portfolio of high-volatility stocks. We will do this using regression as well as simulation

Regression of stock returns using beta (predictor)

beta_cor<-cor(financial_data$Annualized_Return_5Year, financial_data$Beta_5Year)
beta_returns<-tidy(lm(Annualized_Return_5Year ~ Beta_5Year, financial_data), conf.int=TRUE) %>% 
  filter(term == "Beta_5Year")
beta_returns$r<-beta_cor
beta_returns$r.squared_perc<-beta_cor * beta_cor * 100

beta_returns %>% kable

term	estimate	std.error	statistic	p.value	conf.low	conf.high	r	r.squared_perc
Beta_5Year	-5.475609	0.5409108	-10.12294	0	-6.536275	-4.414944	-0.1958902	3.837296

financial_data %>%  
  ggplot(aes_string(x="Beta_5Year", y="Annualized_Return_5Year")) + 
  geom_point() +
  geom_smooth(method="lm", se = TRUE) +
  ggtitle("Beta vs Annualized_Returns (5 Year)")

# From the regression, it appears that annualized returns goes by about 5.47% for every one unit increase in beta.

MonteCarlo simulation

beta_sorted<-financial_data %>% arrange(Beta_5Year) %>% select(Symbol, Beta_5Year, Annualized_Return_5Year)

# Monte Carlo simulation parameters
set.seed(131)
num_bins<-5                     # distinct bins after sorting betas
num_stocks<-nrow(beta_sorted)   # total number of stocks in our universe
bin_size<-num_stocks/num_bins   # number of stocks in each bin
low_beta_bin<-1:bin_size        # low beta stocks position
high_beta_bin<-(num_stocks-bin_size+1):num_stocks  # high beta stocks position
num_samples<-50                 # number of samples randomly chosen from each bin for each run of the simulation

run_expt<-function(bin) {
  samples<-beta_sorted[sample(bin, size=num_samples), ]$Annualized_Return_5Year
  mean_returns<-mean(samples)
  mean_returns
}

# run the simulation to compare returns of low beta stocks vs high beta stocks
B<-10^5
returns_low_beta<-replicate(B, run_expt(low_beta_bin))
mean_returns_low_beta<-mean(returns_low_beta)
sd_returns_low_beta<-sd(returns_low_beta)

returns_high_beta<-replicate(B, run_expt(high_beta_bin))
mean_returns_high_beta<-mean(returns_high_beta)
sd_returns_high_beta<-sd(returns_high_beta)

# tabulate the results
returns_results<-data_frame(Classification="Low Beta", 
                         Mean_Returns=round(mean_returns_low_beta, 3), SD_Returns=round(sd_returns_low_beta, 3))
returns_results<-bind_rows(returns_results, 
                    data_frame(Classification="High Beta", 
                               Mean_Returns=round(mean_returns_high_beta, 3), SD_Returns= round(sd_returns_high_beta, 3)))
returns_results %>% kable

Classification	Mean_Returns	SD_Returns
Low Beta	13.426	1.778
High Beta	5.013	2.550

# Monte Carlo simulation appears to be in agreement with the regression model.

Statistical Analysis of Stock Volatility

April 23, 2016

Background and Motivation

Project Objectives

Data wranging with historical prices - SPY and 3 chosen stocks