Market Factor

My R Code

Load packages

# options(scipen=999) # shut down scientific notation
library(tidyverse);library(Hmisc);#library(roll) # roll_prod
rm(list=ls())

Load data

CRSP

You will need to download the data from CRSP, to which WRDS has access. The data items used here are given below.

CRSP Permanent Number (PERMNO) (select PERMNO as company codes)
CRSP Permanent Company Number (permco)
Share Code (shrcd)
Exchange Code (exchcd)
Primary Exchange (primexch)
Price (prc)
Holding Period Return (ret)
Number of Shares Outstanding (shrout)
Delisting Return (dlret)

Note the difference between PERMNO and PERMCO, where PERMNO is security-specific and PERMCO is company-specific. For example, Google has two class of shares, therefore Google has one PERMCO but two PERMNO.

Code

# df <- read.csv("Data/CRSP_2022.csv")
# saveRDS(df, "Data/CRSP_2022.rds")
crsp <- readRDS("Data/CRSP_2022.rds")

# convert all column names to lower case to keep consistent
names(crsp) <- tolower(names(crsp))

Process CRSP

This section deals with the raw CRSP data. It includes converting date format, computing firm size, and filling missing returns with zeros.

One can argue about whether filling missing returns with zeros is reasonable, as there can be many different reasons for a return to be missing. If a stock is suspended from trading within month t while it is selected in your portfolio (long or short) at the end of month t-1, it will have a missing return but it takes a non-zero weight in your portfolio in month t. Without further information, taking a missing return as zero would make more sense than ignoring it in your portfolio in this case.

You can un-comment the three-line code to take delisting returns into account, though I do not find it influence any results.

Code

df <- crsp

df$date.o <- df$date
df$date <- as.integer(substr(df$date, 1, 6))
# a minus sign before PRC means it is the average of bid and ask prices
df$prc <- abs(df$prc) # absolute value of price 
df$me <- df$prc * df$shrout # firm ME (ME--market equity)
df$ret <- as.numeric(df$ret) # NAs will be introduced by coercion
# df$dlret <- as.numeric(df$dlret)

## Deal with Delist returns and NA returns
df$ret.o <- df$ret # keep the original return data
df[is.na(df$ret), 'ret'] <- 0
# df[is.na(df$dlret), 'dlret'] <- 0
# # Take the delist return into account
# df$ret <- (1+df$ret)*(1+df$dlret) - 1

# check number of obs/row (192512 ~ 2021/12)
nrow(df) # 4810041

## [1] 4810041

# lag me and prior-return
df <- df %>%
  group_by(permno) %>%
  arrange(date, .by_group=T) %>%
  mutate(
    me.lag = dplyr::lag(me, 1)
  )

# screening stocks following conventions
df <- df[(df$shrcd%in%c(10,11)) & (df$primexch%in%c('N','A','Q')),]

Form market portfolios

Code

# Value weighted
res <- df %>% 
  filter(!is.na(me.lag)) %>%
  group_by(date) %>%
  summarise(ret = weighted.mean(ret, me.lag, na.rm = T))

res$ret <- res$ret * 100

Verify the results

This section verifies the results with the Fama-French results: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. You need to download the three-factor data from their website and name the first column “date” and save it as “FF_All_Factors.csv” in your “Data” folder.

My result is very close to their market factor with very close mean and t-stat and a correlation of 100% in the sample from 1926 to 2021.

Code

# load Fama-French factor
ff <- read.csv("Data/FF_All_Factors.csv")
names(ff) <- tolower(names(ff))

# merge ff and res
res <- merge(res, ff, by='date')
res$ret.rf <- res$ret - res$rf

min(res$date) # 192607

## [1] 192607

max(res$date) # 202112

## [1] 202112

# full sample
t.test(res$ret.rf) # t = 4.4069, mean of x = 0.6926623

## 
##  One Sample t-test
## 
## data:  res$ret.rf
## t = 4.4069, df = 1145, p-value = 1.147e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.3842732 1.0010514
## sample estimates:
## mean of x 
## 0.6926623

t.test(res$mkt.rf) # t = 4.3992, mean of x = 0.6933246

## 
##  One Sample t-test
## 
## data:  res$mkt.rf
## t = 4.3992, df = 1145, p-value = 1.188e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.3841018 1.0025474
## sample estimates:
## mean of x 
## 0.6933246

cor(res$ret.rf, res$mkt.rf) # 0.9999927

## [1] 0.9999927

# # sub-sample 1
# t.test(res$ret.rf[res$date>196306]) # t = 3.5051, mean of x = 0.5869876  
# t.test(ff$mkt.rf[ff$date>196306]) # t = 3.5099, mean of x = 0.5892165  
# cor(res$ret.rf[res$date>196306], ff$mkt.rf[ff$date>196306]) # 0.9999972 
# 
# # sub-sample 2
# t.test(res$ret.rf[res$date>199912]) # t = 2.1845, mean of x = 0.6054074  
# t.test(ff$mkt.rf[ff$date>199912]) # t = 2.1809, mean of x = 0.6057576  
# cor(res$ret.rf[res$date>199912], ff$mkt.rf[ff$date>199912]) # 0.9999972

Plot the cumulative returns

This section plots the cumulative returns of my result (ret.rf) and Fama-French market factor (mkt.rf). It helps visualize the consistency and it concludes my replication work for market factor.

Code

res.p <- res[c('date','ret.rf','mkt.rf')]
# convert to R date format
res.p$date <- as.Date(paste0(res.p$date,'01'), '%Y%m%d')
res.p$date <- res.p$date + months(1) - 1

# transform wide table into long table
res.p <- reshape2::melt(res.p, id='date')
names(res.p) <- c("Date","Factor","Return")

# compute cumulative return
res.p <- res.p %>%
  group_by(Factor) %>% 
  mutate(
    Return.Cum = cumprod(1+Return/100)
  )

# cut 1950-2021 into 4 subsamples
# library(Hmisc) # func cut2()
res.p$Year <- as.integer(substr(res.p$Date,1,4))
res.p$Subsample <- cut2(res.p$Year, g=4)

# ggplot
fig <- ggplot(data = res.p, aes(x=Date, y=Return.Cum, group=Factor)) +
  geom_line(aes(linetype=Factor, color=Factor)) +
  scale_linetype_manual(values=c('solid',"longdash","dotted","dashed","twodash"))+
  scale_color_manual(values=c('blue','brown','black','black','coral4'))+
  ylab('Portfolio worth if invest $1 at the beginning') +
  scale_x_date(date_labels = "%m-%Y", date_breaks = "4 year") +
  theme_bw() +
  theme(legend.title = element_blank()
  ) + 
  facet_wrap(vars(Subsample),
             scales = "free") + 
  scale_y_continuous() + 
  labs(title=paste('Cumulative return of factors from',min(res$date)))
print(fig)

# ggsave('Cumulative_Return_Market_Factor.pdf', width = 12, height = 7)

Market Factor

Kai Wang

2/6/2022

My R Code

Load packages

Load data

Process CRSP

Form market portfolios

Verify the results

Plot the cumulative returns