The purpose of the project is to apply the concept of ANOVA (Analysis of Variance) by utilizing dummby variables in linear regression. Since ANOVA measures whether the mean of different groups is identical or not, we don’t have to calculate MSB or MSE if we use the features of dummy variables in regression. Specifically, if the p-value of F-statistic of the dummy regression is above 0.05, we can accept the null hypothesis which is u1=u2=u3…=ux. On the other hand, If the p-value of F-statistic is below 0.05, then we accept the null hypthesis which indicates that at least, one of the coefficient has different mean.

S&P 500 vs Nasdaq vs Portfolio

Data Preperation

From the previous project, we constructed a tangency point portfolio as well as box-constraint GMV portfolio. Thus, we will compare a total of 4 portfolio to find out whether they all have same mean return or not.

#Import necessary libraries
library(tidyquant)
## 필요한 패키지를 로딩중입니다: lubridate
## 
## 다음의 패키지를 부착합니다: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## 필요한 패키지를 로딩중입니다: PerformanceAnalytics
## 필요한 패키지를 로딩중입니다: xts
## 필요한 패키지를 로딩중입니다: zoo
## 
## 다음의 패키지를 부착합니다: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 다음의 패키지를 부착합니다: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
## 필요한 패키지를 로딩중입니다: quantmod
## 필요한 패키지를 로딩중입니다: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## == Need to Learn tidyquant? ====================================================
## Business Science offers a 1-hour course - Learning Lab #9: Performance Analysis & Portfolio Optimization with tidyquant!
## </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x dplyr::first()           masks xts::first()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x dplyr::last()            masks xts::last()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()
library(DT)
#Define the assets of portfolio
symbols <- c('AMD','AMZN','MSFT', 'NVDA', 'PEP')

#Define the weights of each asset for tangecy portfolio
wts.tp <- c(0.12,0.49,0.08,0.29,0.02)
wts.gmv <- c(0.1,0.1,0.3,0.1,0.4)

#Import Stock prices
stocks.prices <-  tq_get(symbols,get  = "stock.prices",
                         from = "2012-01-01",
                         to = "2022-11-30") %>%
group_by(symbol)

head(stocks.prices)
## # A tibble: 6 x 8
## # Groups:   symbol [1]
##   symbol date        open  high   low close   volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1 AMD    2012-01-03  5.53  5.59  5.44  5.48 12675100     5.48
## 2 AMD    2012-01-04  5.47  5.49  5.41  5.46  8034900     5.46
## 3 AMD    2012-01-05  5.45  5.57  5.35  5.46 11476900     5.46
## 4 AMD    2012-01-06  5.44  5.52  5.39  5.43 12938600     5.43
## 5 AMD    2012-01-09  5.42  5.6   5.38  5.59 12585400     5.59
## 6 AMD    2012-01-10  5.66  5.75  5.64  5.71 10129600     5.71
#Calculating monthly returns 
multpl_stock_monthly_ret <- stocks.prices %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'monthly',
               col_rename = 'returns')
head(multpl_stock_monthly_ret)
## # A tibble: 6 x 3
## # Groups:   symbol [1]
##   symbol date       returns
##   <chr>  <date>       <dbl>
## 1 AMD    2012-01-31  0.224 
## 2 AMD    2012-02-29  0.0954
## 3 AMD    2012-03-30  0.0912
## 4 AMD    2012-04-30 -0.0823
## 5 AMD    2012-05-31 -0.174 
## 6 AMD    2012-06-29 -0.0576
#Computing returns for Tangency point portfolio
port_ret_tp <- multpl_stock_monthly_ret %>%
  tq_portfolio(assets_col = symbol,
               returns_col = returns,
               weights = wts.tp,
               geometric = FALSE,
               col_rename = 'returns')
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
## i Please use `spread()` instead.
## i The deprecated feature was likely used in the tidyquant package.
##   Please report the issue at
##   <https://github.com/business-science/tidyquant/issues>.
#Add symbol colums
port_ret_tp$symbol = c('Tangency')
head(port_ret_tp)
## # A tibble: 6 x 3
##   date       returns symbol  
##   <date>       <dbl> <chr>   
## 1 2012-01-31  0.0922 Tangency
## 2 2012-02-29 -0.0124 Tangency
## 3 2012-03-30  0.0803 Tangency
## 4 2012-04-30  0.0153 Tangency
## 5 2012-05-31 -0.0796 Tangency
## 6 2012-06-29  0.0657 Tangency
#Computing returns for Box-constraint GMV portfolio
port_ret_gmv <- multpl_stock_monthly_ret %>%
  tq_portfolio(assets_col = symbol,
               returns_col = returns,
               weights = wts.gmv,
               geometric = FALSE,
               col_rename = 'returns')

#Add symbol colums
port_ret_gmv$symbol = c('GMV')
head(port_ret_gmv)
## # A tibble: 6 x 3
##   date       returns symbol
##   <date>       <dbl> <chr> 
## 1 2012-01-31  0.0628 GMV   
## 2 2012-02-29  0.0156 GMV   
## 3 2012-03-30  0.0501 GMV   
## 4 2012-04-30 -0.0136 GMV   
## 5 2012-05-31 -0.0403 GMV   
## 6 2012-06-29  0.0436 GMV
#Import S&P 500 data and compute the returns
spy.prices <-  tq_get('SPY',get  = "stock.prices",
                         from = "2012-01-01",
                         to = "2022-11-30")
spy_ret <- spy.prices %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'monthly',
               col_rename = 'returns')
head(spy_ret)
## # A tibble: 6 x 3
## # Groups:   symbol [1]
##   symbol date        returns
##   <chr>  <date>        <dbl>
## 1 SPY    2012-01-31  0.0300 
## 2 SPY    2012-02-29  0.0434 
## 3 SPY    2012-03-30  0.0322 
## 4 SPY    2012-04-30 -0.00668
## 5 SPY    2012-05-31 -0.0601 
## 6 SPY    2012-06-29  0.0406
#Import Nasdaq data and compute the returns
qqq.prices <-  tq_get('QQQ',get  = "stock.prices",
                         from = "2012-01-01",
                         to = "2022-11-30")
qqq_ret <- qqq.prices %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'monthly',
               col_rename = 'returns')
qqq_ret
## # A tibble: 131 x 3
## # Groups:   symbol [1]
##    symbol date        returns
##    <chr>  <date>        <dbl>
##  1 QQQ    2012-01-31  0.0638 
##  2 QQQ    2012-02-29  0.0641 
##  3 QQQ    2012-03-30  0.0505 
##  4 QQQ    2012-04-30 -0.0117 
##  5 QQQ    2012-05-31 -0.0704 
##  6 QQQ    2012-06-29  0.0362 
##  7 QQQ    2012-07-31  0.00998
##  8 QQQ    2012-08-31  0.0519 
##  9 QQQ    2012-09-28  0.00888
## 10 QQQ    2012-10-31 -0.0528 
## # ... with 121 more rows
#Merge all the 4 portfolios in to 1 data set
Data <- rbind(qqq_ret,spy_ret,port_ret_gmv,port_ret_tp)
datatable(Data)

Visualize the return of each asset

#Visualize the return 
Data %>%
  mutate(year=year(date)) %>%
  group_by(symbol,year) %>%
  summarise(mean=mean(returns)) %>%
  ggplot(aes(x=year,y=mean,fill=symbol))+
  geom_bar(stat = 'identity', position = 'dodge', width = 0.7, size=1) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(2012,2022,2)) +
  labs(x='Year',y='Mean Returns')+
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")+
  theme(legend.position = 'top')+
  facet_wrap(~symbol) +
  ggtitle('Mean Return of Each Portfolio')
## `summarise()` has grouped output by 'symbol'. You can override using the
## `.groups` argument.

Dummy Regression & ANOVA

Now, we see that all the necessary information are integrated into one dataframe. From now, we will going to create dummby variables for each symbol.

#Creating Dummby variable
#For each portfolio, there are 131 observations. 
#Thus there should be 131 - '1' and 393 - '0' for each dummby variable.
#Exclude qqq dummy variable to avoid dummy trap
#Creating dummies for SPY-GMV-Tangency  

Dspy <- rep(0,131)
Dspy[132:262] <- 1
Dspy[263:524] <- 0
Dgmv <- rep(0,262)
Dgmv[263:393] <- 1
Dgmv[394:524] <- 0
Dtp <- rep(0,393)
Dtp[394:524] <- 1

Data2 <- cbind(Data,Dspy,Dgmv,Dtp)
## New names:
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
colnames(Data2)[4] <- 'Dspy'
colnames(Data2)[5] <- 'Dgmv'
colnames(Data2)[6] <- 'Dtp'

datatable(Data2)

Since we have 4 different types of assets, the dummies should be no more than 3 due dumy trap

#Run regression
reg <- lm(data = Data2, returns ~ Dspy + Dgmv +Dtp)
summary(reg)
## 
## Call:
## lm(formula = returns ~ Dspy + Dgmv + Dtp, data = Data2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.270035 -0.030297  0.005216  0.032355  0.218451 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.014243   0.005048   2.821  0.00496 **
## Dspy        -0.003165   0.007139  -0.443  0.65767   
## Dgmv         0.005449   0.007139   0.763  0.44561   
## Dtp          0.012911   0.007139   1.809  0.07109 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05778 on 520 degrees of freedom
## Multiple R-squared:  0.0111, Adjusted R-squared:  0.005392 
## F-statistic: 1.945 on 3 and 520 DF,  p-value: 0.1213

Since the p-value is over 0.05 (0.12) it’s not statistically significant at a 95% confidence level. So we could conclude that the mean return of the 4 portfolios are same, accepting the null hypthesis.

However, being more generous regarding confidence level, we could reject the null hypothesis and say that the portfolio has a different returns at a 88% confidence level. If we adopt this perspective, then we could say that while the portfolio offered an average of 1.4% monthly return, tangency point portfolio offered 2.6% (0.014 + 0.012) monthly return at an average.