The primary motivation of the project is to highlight the “risk” associated with Stocks by applying various statistical methods to measure risk factors such as “Market Sentiment” (“Beta”, “Standard Deviation”) and “Social Sentiment” (“Twitter feeds”). The fundamental reason to choose a topic in finance is partly current experience working in the industry and partly to understand the significance of “risk” being an end investor. Many of us manage different types of brokerage accounts (stock trading, 401k and IRA accounts) without clearly understanding the differing risk associated with the performance of these accounts. The risk appetite itself differs widely with different classes of investors depending on their personal situation and other factors. Hence it is prudent for such investors to understand the risk factors associated with their individual stocks or funds. Further, the recent advances in the social media has provided a wealth of information in the form of Twitter feeds or Google trends that can further provide great insight on the sentiment of investors influenced by market and other events. This research is an attempt to highlight concepts around “risk” factors by performing a thorough statistical analysis backed by appropriate models.
Volatility is a statistical measure of the dispersion of returns for a given security or market-index. Commonly, the higher the volatility, the riskier the security. The volatility itself is affected by multiple factors such as “Beta”, “Standard Deviation”, “Earnings”, “Balance Sheet”, “Analyst Recommendations” and “Social Sentiment”. We will start with the concept of Beta as a measure of volatility of a stock relative to the benchmark (like S&P 500) and analyze how it affects the returns of a stock relative to the benchmark. In particular, we will run a simple case study by choosing three stocks with widely differing returns relative to the benchmark (S&P 500) to emphasize the effect of beta on such classes of stocks. We will then generalize this a bit more, by comparing Beta (risk) for a universe of stocks against their respective returns for a chosen period and more important, we want to understand if a portfolio of low-volatility stocks earn returns different from a portfolio of high-volatility stocks. This is one of the fundamental questions that this research ponders upon by providing a statistical insight into the results. We will then repeat the analysis using Standard Deviation as the volatility measurement. The way we present this part of the material is by running a multiple regression model on stock’s return using beta and standard deviation as predictors. While the main focus of this research paper is on stock’s beta and standard deviation, we will also introduce other significant measures of volatility such as Twitter/Stocktwits feeds that is gaining momentum since the advent of Social Media Revolution. This is primarily done to make the investigation complete. .
## put your code here
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
opts_chunk$set(cache = TRUE, message = FALSE)
library(readr)
library(ggplot2)
theme_set(theme_bw(base_size = 16))
library(gganimate)
library(broom)
Load beta of a universe of stocks
url <- "https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/2500%20Stocks%20Financial%20Data.csv"
financial_data <- read_csv(url)
# data wrangling - rename columns etc
names(financial_data)[names(financial_data) == 'Beta (5 Yr Annualized)'] <- 'Beta_5Year'
names(financial_data)[names(financial_data) == 'Beta (1 Year Annualized)'] <- 'Beta_1Year'
names(financial_data)[names(financial_data) == 'Standard Deviation (5 Yr Annualized)'] <- 'SD_5Year'
names(financial_data)[names(financial_data) == 'P/E (Next Year\'s Estimate)'] <- 'PE_NextYear'
names(financial_data)[names(financial_data) == 'Total Return (5 Yr Annualized)'] <- 'Annualized_Return_5Year'
names(financial_data)[names(financial_data) == 'Sector/Industry'] <- 'Sector'
names(financial_data)[names(financial_data) == 'Security Type'] <- 'SecurityType'
select(financial_data, Symbol, SecurityType, Sector, Beta_5Year, Beta_1Year, SD_5Year, PE_NextYear, Annualized_Return_5Year) %>% head() %>% kable
| Symbol | SecurityType | Sector | Beta_5Year | Beta_1Year | SD_5Year | PE_NextYear | Annualized_Return_5Year |
|---|---|---|---|---|---|---|---|
| TALN | CS | Consumer Discretionary | 0.0066691 | 0.7587322 | 2.21670 | 15.00000 | 10.75663 |
| ED | CS | Utilities | 0.0173725 | 0.2749312 | 0.15303 | 15.28586 | 9.82558 |
| IIJI | DR | Information Technology | 0.0184750 | 1.1182022 | 0.40247 | 22.30952 | 10.56512 |
| NEM | CS | Materials | 0.0252339 | 0.6348882 | 0.36968 | 21.13001 | -19.88953 |
| APPS | CS | Information Technology | 0.0275968 | -0.2365876 | 0.86178 | 19.10256 | -2.60258 |
| ITC | CS | Utilities | 0.0302839 | 0.3943955 | 0.16944 | 16.26724 | 12.80138 |
ggplot(financial_data, aes(x=Beta_5Year)) +
geom_histogram(aes(y=..density..), bins=20, colour="black", fill="white") +
geom_density(alpha=0.2, fill="#FF6666") +
geom_vline(aes(xintercept=median(Beta_5Year)), color="red", linetype="dashed", size=1) +
ggtitle("Beta distribution")
# Plot beta group_by sector to understand patterns
sectors<-c('Utilities', 'Consumer Staples', 'Health Care', 'Information Technology', 'Energy', 'Industrials', 'Consumer Discretionary' , 'Materials', 'Financials')
filter(financial_data, Sector %in% sectors) %>%
ggplot(aes(x=Beta_5Year)) +
geom_histogram(aes(y=..density..), bins=20, colour="black", fill="white") +
geom_density(alpha=0.2, fill="#FF6666") +
facet_wrap(~Sector)
# Summarize beta per sector
sectors_beta<-financial_data %>% filter(Sector %in% sectors) %>%
group_by(Sector) %>%
summarize(Beta_Median=round(median(Beta_5Year), 3),
Beta_Mean=round(mean(Beta_5Year), 3),
Beta_SD=round(sd(Beta_5Year), 3),
Returns_Mean=round(mean(Annualized_Return_5Year), 3),
Returns_SD=round(sd(Annualized_Return_5Year), 3)) %>%
arrange(Beta_Median)
all_beta<-financial_data %>%
summarize(Beta_Median=round(median(Beta_5Year), 3),
Beta_Mean=round(mean(Beta_5Year), 3),
Beta_SD=round(sd(Beta_5Year), 3),
Returns_Mean=round(mean(Annualized_Return_5Year), 3),
Returns_SD=round(sd(Annualized_Return_5Year), 3))
all_beta$Sector<-"ALL"
all_beta<-all_beta[,c("Sector", "Beta_Median", "Beta_Mean", "Beta_SD", "Returns_Mean", "Returns_SD")]
sectors_beta<-rbind(sectors_beta, all_beta)
sectors_beta %>% kable
| Sector | Beta_Median | Beta_Mean | Beta_SD | Returns_Mean | Returns_SD |
|---|---|---|---|---|---|
| Utilities | 0.439 | 0.568 | 0.438 | 8.121 | 11.119 |
| Consumer Staples | 0.770 | 0.805 | 0.412 | 15.474 | 12.483 |
| Financials | 0.990 | 1.078 | 0.510 | 13.707 | 11.086 |
| Health Care | 1.011 | 1.084 | 0.480 | 18.709 | 16.902 |
| Consumer Discretionary | 1.194 | 1.278 | 0.583 | 12.253 | 17.247 |
| Information Technology | 1.284 | 1.361 | 0.569 | 8.020 | 16.070 |
| Energy | 1.300 | 1.365 | 0.577 | -0.969 | 17.245 |
| Industrials | 1.313 | 1.373 | 0.565 | 10.186 | 15.397 |
| Materials | 1.420 | 1.386 | 0.556 | 3.334 | 20.459 |
| ALL | 1.145 | 1.201 | 0.575 | 10.578 | 16.064 |
# bar plot showing mean beta and stock-returns across sectors
ggplot(sectors_beta, aes(x=factor(Sector), y=Beta_Mean)) +
geom_bar(stat="identity", width=0.5, colour="black", fill="#FF6666", alpha=0.7) +
geom_bar(data=sectors_beta, aes(x=factor(Sector), y=Returns_Mean), stat="identity", width=0.5, colour="black", fill="red", alpha=0.2) +
xlab("Sector") +
ylab("%") +
scale_y_log10() +
scale_x_discrete(labels = abbreviate) +
ggtitle("Mean beta and returns across sectors")
# It is clear that beta for different industry sectors are different with utilities and consumer staples having very low beta values as expected. Utilities and consumer staples tend to fluctuate less with the overall market sentiment as they are basic necessities. On the other hand, industrials and materials are cyclical in nature and their beta tends to be very high. Financials have almost perfect correlation with S&P500 with beta almost 1.
# Also it is clear that returns are different for different beta. We will later see if there is a clear relation between returns and beta for a universe of 2500 stocks.
SPY_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/SPY%20weekly%2010%20years.csv"
SPY_history<-read_csv(SPY_url)
ED_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/ED%20weekly%2010%20years.csv"
ED_history<-read_csv(ED_url)
ADSK_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/ADSK%20weekly%2010%20years.csv"
ADSK_history<-read_csv(ADSK_url)
IBM_url<-"https://raw.githubusercontent.com/goodwillyoga/E107project/master/Mohan/data/IBM%20weekly%2010%20years.csv"
IBM_history<-read_csv(IBM_url)
# build a consolidated frame with weekly returns (%) for SPY/ED/ADSK/IBM
stocks<-data.frame(Date=SPY_history$Date,
SPY.Close=SPY_history$Close, SPY.Volume=SPY_history$Volume, SPY.LastClose=SPY_history$Close,
ED.Close=ED_history$Close, ED.Volume=ED_history$Volume, ED.LastClose=ED_history$Close,
ADSK.Close=ADSK_history$Close, ADSK.Volume=ADSK_history$Volume, ADSK.LastClose=ADSK_history$Close,
IBM.Close=IBM_history$Close, IBM.Volume=IBM_history$Volume, IBM.LastClose=IBM_history$Close)
numRows<-nrow(SPY_history)
# populate last week return to facilitate calculating weekly % returns
stocks$SPY.LastClose[1:numRows-1]<-SPY_history$Close[2:numRows]
stocks$ED.LastClose[1:numRows-1]<-ED_history$Close[2:numRows]
stocks$ADSK.LastClose[1:numRows-1]<-ADSK_history$Close[2:numRows]
stocks$IBM.LastClose[1:numRows-1]<-IBM_history$Close[2:numRows]
# compuate weekly % returns
stocks<-mutate(stocks, SPY.Returns=round((SPY.Close-SPY.LastClose)*100/SPY.LastClose, 3),
ED.Returns=round((ED.Close-ED.LastClose)*100/ED.LastClose, 3),
ADSK.Returns=round((ADSK.Close-ADSK.LastClose)*100/ADSK.LastClose, 3),
IBM.Returns=round((IBM.Close-IBM.LastClose)*100/IBM.LastClose, 3))
head(stocks, 2)
## Date SPY.Close SPY.Volume SPY.LastClose ED.Close ED.Volume
## 1 4/18/2016 209.24 75277700 207.78 71.10 2222300
## 2 4/11/2016 207.78 83455600 204.50 75.85 1273700
## ED.LastClose ADSK.Close ADSK.Volume ADSK.LastClose IBM.Close IBM.Volume
## 1 75.85 60.07 1365700 58.42 149.30 7887200
## 2 76.04 58.42 1607000 56.76 151.72 3428700
## IBM.LastClose SPY.Returns ED.Returns ADSK.Returns IBM.Returns
## 1 151.72 0.703 -6.262 2.824 -1.595
## 2 149.35 1.604 -0.250 2.925 1.587
# Summarize weekly returns
SPX_Weekly_Returns<-summary(stocks$SPY.Returns)
ED_Weekly_Returns<-summary(stocks$ED.Returns)
ADSK_Weekly_Returns<-summary(stocks$ADSK.Returns)
IBM_Weekly_Returns<-summary(stocks$IBM.Returns)
Weekly_Returns<-rbind(SPX_Weekly_Returns, ED_Weekly_Returns)
Weekly_Returns<-rbind(Weekly_Returns, ADSK_Weekly_Returns)
Weekly_Returns<-rbind(Weekly_Returns, IBM_Weekly_Returns)
Weekly_Returns %>% kable
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | |
|---|---|---|---|---|---|---|
| SPX_Weekly_Returns | -19.79 | -0.9742 | 0.2685 | 0.1236 | 1.492 | 13.290 |
| ED_Weekly_Returns | -12.86 | -1.1780 | 0.2165 | 0.1213 | 1.412 | 8.115 |
| ADSK_Weekly_Returns | -26.42 | -2.7590 | 0.3265 | 0.2244 | 3.256 | 20.020 |
| IBM_Weekly_Returns | -15.17 | -1.3650 | 0.2075 | 0.1633 | 1.777 | 14.470 |
Betacomputation: case Study of 3 stocks
# Correlation
ED_cor<-cor(stocks$ED.Returns, stocks$SPY.Returns)
ADSK_cor<-cor(stocks$ADSK.Returns, stocks$SPY.Returns)
IBM_cor<-cor(stocks$IBM.Returns, stocks$SPY.Returns)
# Beta is the slope of the regression line between SPY returns and the stock returns
# Use SPY returns to predict individual stock returns
ED_beta<-tidy(lm(stocks$ED.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>%
filter(term == "stocks$SPY.Returns")
ADSK_beta<-tidy(lm(stocks$ADSK.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>%
filter(term == "stocks$SPY.Returns")
IBM_beta<-tidy(lm(stocks$IBM.Returns ~ stocks$SPY.Returns), conf.int=TRUE) %>%
filter(term == "stocks$SPY.Returns")
# use kable to show beta estimates, p values, r-squared and CI values
ED_beta$term[1]<-"Beta (ED)"
ADSK_beta$term[1]<-"Beta (ADSK)"
IBM_beta$term[1]<-"Beta (IBM)"
beta_est<-union(IBM_beta, union(ED_beta, ADSK_beta))
beta_est<-mutate(beta_est, r=c(ED_cor, ADSK_cor, IBM_cor))
beta_est<-mutate(beta_est, r.squared_perc=round(r*r*100, 3))
beta_est<-mutate(beta_est, estimate=round(as.numeric(estimate), 2))
beta_est<-rbind(beta_est, c('Beta (SPY)', 1.0, 0, 0, 0, 1, 1, 0, 0))
beta_est %>% kable
| term | estimate | std.error | statistic | p.value | conf.low | conf.high | r | r.squared_perc |
|---|---|---|---|---|---|---|---|---|
| Beta (IBM) | 0.81 | 0.0386624990621484 | 21.0047585679583 | 2.19355547257144e-71 | 0.736142567805647 | 0.888050349063038 | 0.480272895860774 | 23.066 |
| Beta (ADSK) | 1.41 | 0.0697787761685322 | 20.2732056560925 | 8.9916796773494e-68 | 1.27755652691469 | 1.5517224324755 | 0.664426455822036 | 44.146 |
| Beta (ED) | 0.41 | 0.0331447254388387 | 12.4862254750735 | 1.77815348658691e-31 | 0.348738492197471 | 0.478966538080018 | 0.677502402383787 | 45.901 |
| Beta (SPY) | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
# bar plot showing estimated beta and calculated returns
ggplot(beta_est, aes(x=factor(term), y=estimate)) +
geom_bar(stat="identity", width=0.5, colour="black", fill="#FF6666", alpha=0.3) +
xlab("Case Study") +
ylab("Beta Estimate") +
ggtitle("Case Study: Estimated Beta")
# plot using smoothing
ggplot_lm<-function(stock_col) {
stocks %>% filter(SPY.Returns >= -10) %>%
ggplot(aes_string(x="SPY.Returns", y=stock_col)) +
geom_point() +
geom_smooth(method="lm", se = TRUE) +
ggtitle(paste(stock_col, " returns against SPY"))
}
ggplot_lm("ED.Returns")
ggplot_lm("ADSK.Returns")
ggplot_lm("IBM.Returns")
We now want to study if a portfolio of low-volatility stocks earn returns different from a portfolio of high-volatility stocks. We will do this using regression as well as simulation
Regression of stock returns using beta (predictor)
beta_cor<-cor(financial_data$Annualized_Return_5Year, financial_data$Beta_5Year)
beta_returns<-tidy(lm(Annualized_Return_5Year ~ Beta_5Year, financial_data), conf.int=TRUE) %>%
filter(term == "Beta_5Year")
beta_returns$r<-beta_cor
beta_returns$r.squared_perc<-beta_cor * beta_cor * 100
beta_returns %>% kable
| term | estimate | std.error | statistic | p.value | conf.low | conf.high | r | r.squared_perc |
|---|---|---|---|---|---|---|---|---|
| Beta_5Year | -5.475609 | 0.5409108 | -10.12294 | 0 | -6.536275 | -4.414944 | -0.1958902 | 3.837296 |
financial_data %>%
ggplot(aes_string(x="Beta_5Year", y="Annualized_Return_5Year")) +
geom_point() +
geom_smooth(method="lm", se = TRUE) +
ggtitle("Beta vs Annualized_Returns (5 Year)")
# From the regression, it appears that annualized returns goes by about 5.47% for every one unit increase in beta.
MonteCarlo simulation
beta_sorted<-financial_data %>% arrange(Beta_5Year) %>% select(Symbol, Beta_5Year, Annualized_Return_5Year)
# Monte Carlo simulation parameters
set.seed(131)
num_bins<-5 # distinct bins after sorting betas
num_stocks<-nrow(beta_sorted) # total number of stocks in our universe
bin_size<-num_stocks/num_bins # number of stocks in each bin
low_beta_bin<-1:bin_size # low beta stocks position
high_beta_bin<-(num_stocks-bin_size+1):num_stocks # high beta stocks position
num_samples<-50 # number of samples randomly chosen from each bin for each run of the simulation
run_expt<-function(bin) {
samples<-beta_sorted[sample(bin, size=num_samples), ]$Annualized_Return_5Year
mean_returns<-mean(samples)
mean_returns
}
# run the simulation to compare returns of low beta stocks vs high beta stocks
B<-10^5
returns_low_beta<-replicate(B, run_expt(low_beta_bin))
mean_returns_low_beta<-mean(returns_low_beta)
sd_returns_low_beta<-sd(returns_low_beta)
returns_high_beta<-replicate(B, run_expt(high_beta_bin))
mean_returns_high_beta<-mean(returns_high_beta)
sd_returns_high_beta<-sd(returns_high_beta)
# tabulate the results
returns_results<-data_frame(Classification="Low Beta",
Mean_Returns=round(mean_returns_low_beta, 3), SD_Returns=round(sd_returns_low_beta, 3))
returns_results<-bind_rows(returns_results,
data_frame(Classification="High Beta",
Mean_Returns=round(mean_returns_high_beta, 3), SD_Returns= round(sd_returns_high_beta, 3)))
returns_results %>% kable
| Classification | Mean_Returns | SD_Returns |
|---|---|---|
| Low Beta | 13.426 | 1.778 |
| High Beta | 5.013 | 2.550 |
# Monte Carlo simulation appears to be in agreement with the regression model.