CAPM, regression and descriptive statistics

Calculating returns and regression analysis

da <- read.csv('../Data/bac.csv', stringsAsFactors = FALSE)
da$Date <- as.Date(da$Date, format = "%d/%m/%Y")
plot(da$Date, da$BAC, xlab = "Year", ylab = "Price", type = 'l', main = "BAC price")

The scatter plot

First calculate the returns and then plot the relationship between the returns.

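# Simple returns: each price divided by the price in the next row, minus one.
# This is the usual period return when rows are ordered newest first.
# The trailing NA keeps each vector the same length as the data frame.
n <- nrow(da)
da$BACR <- c(da$BAC[1:(n - 1)] / da$BAC[2:n] - 1, NA)
da$SPYR <- c(da$SPY[1:(n - 1)] / da$SPY[2:n] - 1, NA)
plot(da$SPYR, da$BACR, main = "BAC and market returns", xlab = "Market", 
     ylab = "Bank of America")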

The returns move together as expected.
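As an aside on the return calculation, the sketch below works through it on a small made-up price vector. The prices are hypothetical, and the newest-first ordering is an assumption inferred from the `price[t] / price[t + 1]` pattern. It also shows why an index written as `1:length(x) - 1` still works: R parses it as `(1:n) - 1`, and the resulting zero index is silently dropped.

```r
# Hypothetical month-end prices, assumed to be ordered newest first.
prices <- c(110, 100, 80, 100)

# Simple return: price in a row divided by the price in the next (older) row.
simple_ret <- head(prices, -1) / tail(prices, -1) - 1

# Log return for comparison; close to the simple return for small changes.
log_ret <- log(head(prices, -1) / tail(prices, -1))

# Pad with NA so the result lines up with the original rows.
ret_col <- c(simple_ret, NA)

# Operator precedence: this is (1:4) - 1, i.e. 0, 1, 2, 3.
idx <- 1:length(prices) - 1
same_subset <- identical(prices[idx], head(prices, -1))

print(round(simple_ret, 4))
print(round(log_ret, 4))
print(same_subset)
```

Writing the index as `1:(n - 1)`, or using `head()` and `tail()`, makes the intent explicit and avoids relying on the dropped zero index.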

Now estimate the linear relationship between the two. Use the lm function to do that.

eq1 <- lm(da$BACR ~ da$SPYR)
summary(eq1)

## 
## Call:
## lm(formula = da$BACR ~ da$SPYR)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40661 -0.05099 -0.00938  0.04927  0.59819 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.002538   0.008828   0.287    0.774    
## da$SPYR     1.565524   0.192469   8.134 1.33e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1097 on 153 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3019, Adjusted R-squared:  0.2973 
## F-statistic: 66.16 on 1 and 153 DF,  p-value: 1.328e-13

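The estimated beta, which measures how strongly Bank of America returns respond to market returns, is about 1.57 and is statistically significant (t = 8.13, p < 0.001). The intercept is not significantly different from zero, and the market explains roughly 30% of the variation in Bank of America returns.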
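To make the link between the regression slope and beta concrete, the sketch below checks on simulated data that the `lm` slope equals the covariance of the two return series divided by the variance of the market return. The simulated series and their parameters are hypothetical. The last lines compute an approximate 95% confidence interval from the estimate and standard error reported in the output above.

```r
set.seed(1)
n <- 155

# Hypothetical monthly returns with a true beta of 1.5 (illustration only).
market <- rnorm(n, mean = 0.005, sd = 0.05)
stock <- 0.002 + 1.5 * market + rnorm(n, sd = 0.1)

fit <- lm(stock ~ market)
beta_lm <- unname(coef(fit)["market"])

# The OLS slope is the sample covariance over the sample variance.
beta_cov <- cov(stock, market) / var(market)

# 95% confidence interval for the simulated slope.
ci_sim <- confint(fit, "market", level = 0.95)

# Interval implied by the reported estimate and standard error (153 df).
ci_reported <- 1.565524 + c(-1, 1) * qt(0.975, df = 153) * 0.192469

print(c(beta_lm = beta_lm, beta_cov = beta_cov))
print(ci_sim)
print(round(ci_reported, 2))
```

With the reported figures the interval runs from roughly 1.19 to 1.95, which suggests that beta is greater than one as well as different from zero.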

We can check if the key assumptions about linear regression hold by taking a look at the residuals and the auto-correlation function.

plot(da$Date[1:(length(da$Date) - 1)], eq1$residuals, main = "Residuals from regression", xlab = "Year",  ylab = "Residuals", type = 'l')

acf(eq1$residuals, main = "Autocorrelation of residuals")
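A formal test can complement the autocorrelation plot. The sketch below uses the Ljung-Box test from base R (`Box.test`) on two simulated series, one white noise and one autocorrelated, to show how the p-value behaves. The series, the AR coefficient of 0.6, and the choice of 8 lags are assumptions made for illustration.

```r
set.seed(42)
n <- 155

# Hypothetical series: white noise, and an AR(1) process with coefficient 0.6.
white <- rnorm(n)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))

# Ljung-Box test of no autocorrelation up to lag 8.
lb_white <- Box.test(white, lag = 8, type = "Ljung-Box")
lb_ar1 <- Box.test(ar1, lag = 8, type = "Ljung-Box")

print(lb_white$p.value)
print(lb_ar1$p.value)
```

A small p-value is evidence against the hypothesis of no autocorrelation. Applied to the regression here, the call would be `Box.test(eq1$residuals, lag = 8, type = "Ljung-Box")`.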

The residuals seem to be reasonable. There is heteroscedasticity around 2007 and 2008, which is not surprising because financial assets are known to exhibit volatility clustering. There is also some evidence of autocorrelation in the residuals at lags of 4 and 8 months. This may have to be investigated further. However, for now, we add this regression to the scatter plot.

plot(da$SPYR, da$BACR, main = "BAC and market returns", xlab = "Market", 
     ylab = "Bank of America")
abline(eq1, col = 'red')

The slope of the red line is the beta.

Histogram

Take a look at the histograms of the two return series.

mybreaks <- seq(from = -0.6, to = 0.8, by = 0.1)
hist(da$BACR, breaks = mybreaks, prob = TRUE, main = "BAC returns", col = 'light blue', 
     xlab = "BAC returns")

hist(da$SPYR, breaks = mybreaks,  prob = TRUE, main = "SPY returns", col = 'light blue', xlab = "SPY returns")
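Two details of these histograms are worth checking: `hist` stops with an error if the supplied breaks do not span every observation, and with density scaling the bar areas sum to one. The sketch below illustrates both on simulated returns. The data and the 0.1 bin width are assumptions for illustration, and the breaks are derived from the data rather than fixed in advance.

```r
set.seed(7)

# Hypothetical monthly returns, used only for illustration.
x <- rnorm(155, mean = 0.005, sd = 0.1)

# Breaks on a 0.1 grid that are guaranteed to cover the whole sample.
breaks <- seq(floor(min(x) * 10), ceiling(max(x) * 10)) / 10

# Compute the histogram without drawing it.
h <- hist(x, breaks = breaks, plot = FALSE)

# With density scaling, bar areas (height times width) sum to one.
total_area <- sum(h$density * diff(h$breaks))

print(range(breaks))
print(total_area)
```

Fixed breaks such as -0.6 to 0.8 are convenient for comparing two series on the same scale, provided that both series fall inside that range.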

Now calculate some key statistics

First we want to create a function to calculate the key statistics. There are three steps:

1. Set up the statistics that you want to calculate
2. Write the function
3. Run the function

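Calculate the mean, median, standard deviation, skew, kurtosis, maximum, minimum and count. The mean and median are annualised and the maximum and minimum are reported in percent. For the mean, skew and kurtosis we also want to test whether the sample statistic is different from zero. The function below reports a t-statistic for the mean only; the standard errors for skew and kurtosis are left commented out.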

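mystats <- function(x) {
  x <- x[!is.na(x)]                      # drop missing returns
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  t_stat <- m / (s / sqrt(n))            # t-statistic for H0: mean = 0
  med <- median(x)
  skew <- sum((x - m)^3 / s^3) / n
  # Standard error of skewness (not used):
  # ses <- sqrt((6 * n * (n - 1)) / ((n - 2) * (n + 1) * (n + 3)))
  kurt <- sum((x - m)^4 / s^4) / n - 3   # excess kurtosis
  # Standard error of kurtosis (not used):
  # sek <- 2 * ses * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))
  x_max <- max(x)
  x_min <- min(x)
  # Mean and median are annualised by compounding over 12 months, in percent.
  return(c("Number" = n,
           "Ann Mean" = 100 * ((1 + m)^12 - 1),
           "T-stat on Mean" = t_stat,
           "Ann Median" = 100 * ((1 + med)^12 - 1),
           "Standard Deviation" = s,
           "Skew" = skew,
           "Kurtosis" = kurt,
           "Monthly Max" = 100 * x_max,
           "Monthly Min" = 100 * x_min))
}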

df <- round(data.frame("BAC stats" = mystats(da$BACR), "SPY stats" = 
                         mystats(da$SPYR)), 2)
df

##                    BAC.stats SPY.stats
## Number                155.00    155.00
## Ann Mean                8.47      3.31
## T-stat on Mean          0.65      0.74
## Ann Median              5.50      9.99
## Standard Deviation      0.13      0.05
## Skew                    0.48     -0.45
## Kurtosis                7.53      0.63
## Monthly Max            73.13     10.92
## Monthly Min           -53.26    -16.52

This tells us that we have 155 monthly observations and that (not surprisingly) Bank of America is riskier than the overall market: its monthly standard deviation is 0.13 against 0.05 for SPY. The two key figures here are the minimum and the kurtosis. The minimum shows the worst month that was experienced, a loss of about 53% for Bank of America. If you lose 50% of your wealth, there is a good chance that you will panic and sell the rest of your holding. The kurtosis measures fat tails. Positive excess kurtosis (7.53 for Bank of America) means that very good and very bad outcomes happen more often than would be expected if returns were normally distributed.
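The skew and kurtosis figures can be given a rough significance check. The sketch below assumes the common large-sample approximation under normality, in which the standard error of skewness is about `sqrt(6 / n)` and that of excess kurtosis is about `sqrt(24 / n)`. The helper `shape_tests` is a hypothetical name, not part of the original analysis, and the resulting z-statistics are indicative only because monthly returns are unlikely to be independent and identically distributed.

```r
# Approximate z-statistics for skewness and excess kurtosis.
# Assumption: standard errors of sqrt(6 / n) and sqrt(24 / n), which hold
# only approximately, for large samples drawn from a normal distribution.
shape_tests <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  skew <- sum((x - m)^3) / (n * s^3)
  kurt <- sum((x - m)^4) / (n * s^4) - 3
  c(skew = skew, z_skew = skew / sqrt(6 / n),
    kurt = kurt, z_kurt = kurt / sqrt(24 / n))
}

# Hypothetical fat-tailed sample (Student t with 5 degrees of freedom).
set.seed(123)
fat_tailed <- rt(5000, df = 5)
res <- shape_tests(fat_tailed)
print(round(res, 2))

# The same approximation applied to the figures reported in the table above.
n_obs <- 155
reported <- c(BAC_skew = 0.48, SPY_skew = -0.45, BAC_kurt = 7.53, SPY_kurt = 0.63)
se <- c(sqrt(6 / n_obs), sqrt(6 / n_obs), sqrt(24 / n_obs), sqrt(24 / n_obs))
z_reported <- reported / se
print(round(z_reported, 2))
```

On this approximation the excess kurtosis of Bank of America is many standard errors from zero, while that of SPY is within two standard errors.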

Rob Hayward

March 2021
