The Fama-French Procedure

You can find a detailed description of the Fama-French method for constructing the momentum factor on their website: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_mom_factor.html.

They construct 2-by-3 size and prior-return portfolios, where size is measured by the firm's market equity (price times shares outstanding) and prior return is the firm's cumulative return from month t-12 to month t-2 (month t-1 is skipped because of the one-month short-term reversal phenomenon).

The momentum factor is the average return of the past winners (across small and big), i.e. the firms with the highest cumulative returns from t-12 to t-2, minus the average return of the past losers (across small and big).
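In Fama-French's own notation (they label the prior-return groups "High" and "Low" where I say winner and loser), the factor is

Mom = 1/2 (Small Winner + Big Winner) - 1/2 (Small Loser + Big Loser)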

My R Code

Load packages

# options(scipen=999, warn = 1) # suppress scientific notation
library(tidyverse)  # dplyr, ggplot2, %>%, etc.
library(Hmisc)      # cut2()
library(roll)       # roll_prod()
library(lubridate)  # months(); needed for the month-end dates in the plotting section
rm(list=ls()) # clear the global environment

Load data

CRSP

You will need to download the data from CRSP, which is accessible through WRDS. The data items used here are listed below.

  • CRSP Permanent Number (PERMNO) (select PERMNO as the company code on the WRDS query form)
  • CRSP Permanent Company Number (permco)
  • Share Code (shrcd)
  • Exchange Code (exchcd)
  • Primary Exchange (primexch)
  • Price (prc)
  • Holding Period Return (ret)
  • Number of Shares Outstanding (shrout)
  • Delisting Return (dlret)

Note the difference between PERMNO and PERMCO: PERMNO is security-specific while PERMCO is company-specific. For example, Google has two classes of shares, so it has one PERMCO but two PERMNOs. A quick way to spot such firms in the data is shown after the loading code below.

CCM

In addition to CRSP, I use the CRSP-COMPUSTAT-Merged link table (CCM) to connect CRSP to COMPUSTAT. CCM is accessible under CRSP in WRDS. I download the entire link table, as it is not a big file. Even though constructing the momentum factor does not require combining CRSP and COMPUSTAT, I apply the merge to stay consistent with my other factor constructions.

Code

# df <- read.csv("Data/CRSP_2022.csv")
# saveRDS(df, "Data/CRSP_2022.rds")
crsp <- readRDS("Data/CRSP_2022.rds")
ccm <- read.csv("Data/CCM.csv")

# convert all column names to lower case to keep consistent
names(crsp) <- tolower(names(crsp))
names(ccm) <- tolower(names(ccm))
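
As a quick, optional sanity check on the PERMNO/PERMCO distinction discussed above, the short snippet below (my addition, not part of the replication pipeline) lists company-months that carry more than one security, i.e. firms with multiple share classes. It assumes the lower-cased column names produced just above.

# optional check: company-months with more than one PERMNO (e.g., dual share classes)
crsp %>%
  group_by(permco, date) %>%
  summarise(n_permno = n_distinct(permno), .groups = 'drop') %>%
  filter(n_permno > 1) %>%
  head()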

Process CRSP

This section deals with the raw CRSP data. It includes converting date format, computing firm size, and filling missing returns with zeros.

One can argue about whether filling missing returns with zeros is reasonable, as there are many possible reasons for a return to be missing. For example, if a stock is suspended from trading during month t after being selected into your portfolio (long or short) at the end of month t-1, it has a missing return yet carries a non-zero weight in your month-t portfolio. Without further information, treating the missing return as zero makes more sense than simply dropping the stock from the portfolio.

You can un-comment the three commented-out lines in the code below to take delisting returns into account, though I do not find that doing so changes any results.

Code

df <- crsp

df$date.o <- df$date # keep the original yyyymmdd date
df$date <- as.integer(substr(df$date, 1, 6)) # yyyymm month identifier
# a minus sign before PRC means it is the average of bid and ask prices
df$prc <- abs(df$prc) # take the absolute value of price
df$me <- df$prc * df$shrout # firm ME (ME--market equity)
df$ret <- as.numeric(df$ret) # NAs will be introduced by coercion
# df$dlret <- as.numeric(df$dlret)

## Deal with Delist returns and NA returns
df$ret.o <- df$ret # keep the original return data
df[is.na(df$ret), 'ret'] <- 0
# df[is.na(df$dlret), 'dlret'] <- 0
# # Take the delist return into account
# df$ret <- (1+df$ret)*(1+df$dlret) - 1

# check the number of rows (the sample runs from 1925/12 through 2021/12)
nrow(df) # 4810041      
## [1] 4810041
# lag ME and compute the prior return (t-12 to t-2)
df <- df %>%
  group_by(permno) %>%
  arrange(date, .by_group=T) %>%
  mutate(
    me.lag = dplyr::lag(me, 1), # ME at the end of month t-1
    ret.cum.0.10 = roll_prod(1+ret, width = 11)-1, # cumulative return over months t-10 to t
    ret.cum.2.12 = dplyr::lag(ret.cum.0.10, n=2) # lag by 2: cumulative return over months t-12 to t-2
  )

# screen stocks following convention: common shares (share codes 10, 11) on NYSE, AMEX, and NASDAQ
df <- df[(df$shrcd%in%c(10,11)) & (df$primexch%in%c('N','A','Q')),]
crsp.ready <- df
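
If you want to convince yourself that the rolling window above lines up with the t-12 to t-2 convention, here is a minimal toy check (the thirteen monthly returns are made up for illustration): in month 13, ret.cum.2.12 should equal the compounded return over months 1 through 11, i.e. months t-12 to t-2.

# toy check of the prior-return window (made-up returns)
toy <- data.frame(ret = c(0.01, -0.02, 0.03, 0.00, 0.01, 0.02,
                          -0.01, 0.01, 0.00, 0.02, 0.01, 0.05, -0.03))
toy$ret.cum.0.10 <- roll_prod(1 + toy$ret, width = 11) - 1
toy$ret.cum.2.12 <- dplyr::lag(toy$ret.cum.0.10, n = 2)
all.equal(toy$ret.cum.2.12[13], prod(1 + toy$ret[1:11]) - 1) # TRUE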

Process CCM and merge CRSP and CCM

This section merges CRSP with CCM. You can find more information about how to use CCM on WRDS: https://wrds-www.wharton.upenn.edu/pages/classroom/using-crspcompustat-merged-database/.

After combining the two datasets, I use gvkey as the firm identifier to construct portfolios. This keeps the procedure consistent with the other factor constructions.

Code

df <- ccm

# unique(df$linktype) 
# # "NU" "LU" "NR" "LC" "LX" "LD" "LN" "LS" "NP"
df <- df[substr(df$linktype,1,1) == "L", ] # keep valid links only (linktype starting with "L")
df <- df[df$linkprim %in% c('P','C'),] # keep primary links only
df <- df[c('lpermno','gvkey','cik','linkdt','linkenddt')]
colnames(df)[1] <- 'permno' # rename lpermno for merging with CRSP
df$linkenddt <- as.integer(df$linkenddt)
df$linkenddt <- ifelse(is.na(df$linkenddt), Inf, df$linkenddt) # a missing end date means the link is still active
# merge
df <- merge(df, crsp.ready, by='permno')
# the ccm link is only valid between linkdt and linkenddt
df <- df[df$date.o >= df$linkdt & df$date.o <= df$linkenddt, ]
df$linkdt <- NULL
df$linkenddt <- NULL

# sum up the size (me) of different permno under the same gvkey
df <- df %>%
  group_by(gvkey, date) %>%
  mutate(
    me.sum = sum(me),
    me.sum.lag = sum(me.lag)
  ) %>%
  filter(
    me.lag == max(me.lag) # keep the permno with the largest lagged ME as the firm's record
  )

# # check whether there is duplicated gvkey in a month
# #   you only need to confirm this once
# nrow(df) # 3120553      
# df <- df[!duplicated(df[,c('gvkey','date')]),]
nrow(df) # 3120553
## [1] 3120553

Form 2-by-3 me and prior-return portfolios and compute momentum factor

This section sorts firms into 2 size groups (small and big) and 3 prior-return groups (loser, neutral, and winner) independently at the end of each month (t-1). The (Fama-French) momentum portfolio in month t then takes equal long positions in small-winner and big-winner and equal short positions in small-loser and big-loser. Note that Fama-French always use NYSE breakpoints to form portfolios, hence I condition on [primexch=="N"] when computing the cut-points to follow this convention.
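
Before the actual code, here is a standalone toy illustration of the ranking trick used below (the five prior returns are made up): findInterval() counts how many breakpoints each value is above, so adding 1 turns the 30th/70th-percentile breakpoints into ranks 1 (loser), 2 (neutral), and 3 (winner).

# toy illustration of the ranking logic (made-up prior returns)
x <- c(0.05, 0.20, 0.40, 0.60, 0.90)
bp <- quantile(x, probs = c(0.3, 0.7)) # 30th/70th percentile breakpoints
findInterval(x, bp) + 1 # 1 1 2 3 3, i.e. loser/loser/neutral/winner/winner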

Code

## Rank stocks based on size and prior-return information
df <- df %>%
  filter(!is.na(me.sum.lag) & !is.na(ret.cum.2.12) & date>=192701) %>%
  group_by(date) %>% 
  mutate(
    # size rank: 1 = small, 2 = big (NYSE median breakpoint)
    rank.me = findInterval(me.sum.lag, quantile(me.sum.lag[primexch=="N"], probs=0.5))+1,
    # prior-return rank: 1 = loser, 2 = neutral, 3 = winner (NYSE 30th/70th percentiles)
    rank.ret = findInterval(ret.cum.2.12, quantile(ret.cum.2.12[primexch=="N"], probs=c(0.3,0.7)))+1
  )

# Value weighted
res.vw <- df %>%
  group_by(date, rank.me, rank.ret) %>%
  summarise(ret = weighted.mean(ret, me.sum.lag, na.rm = T))

# # check the rank
# unique(res.vw$rank.me) # 1 2
# unique(res.vw$rank.ret) # 1 2 3

# for each month, compute the winner-minus-loser (WML) return
res <- res.vw %>%
  group_by(date) %>%
  summarise(
    wml = 0.5*(ret[rank.me==1&rank.ret==3]+ret[rank.me==2&rank.ret==3]) - 
      0.5*(ret[rank.me==1&rank.ret==1]+ret[rank.me==2&rank.ret==1])
  )
res$wml <- res$wml * 100 # convert to percent to match the Fama-French data

Verify the results

This section verifies the results against the Fama-French momentum factor: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. You need to download the factor from their website, rename the first column to "date", and save the file as "Momentum_2_3.csv" in your "Data" folder.

My result closely matches their momentum factor, with nearly identical means and t-statistics and a correlation of 99.4% over the sample from 1950 to 2021.

Code

# load Fama-French momentum factor
ff <- read.csv("Data/Momentum_2_3.csv")
names(ff) <- tolower(names(ff))
# names(ff) # "date" "mom" 

# # check the date range
# min(ff$date) # 192701
# max(ff$date) # 202112
# min(res$date) # 194907
# max(res$date) # 202112

# merge ff and res
res <- merge(res, ff, by='date')

# full sample
t.test(res$wml) # t = 5.0691, mean of x = 0.6746326
## 
##  One Sample t-test
## 
## data:  res$wml
## t = 5.0691, df = 869, p-value = 4.884e-07
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.4134244 0.9358408
## sample estimates:
## mean of x 
## 0.6746326
t.test(res$mom) # t = 5.0871, mean of x = 0.6754598
## 
##  One Sample t-test
## 
## data:  res$mom
## t = 5.0871, df = 869, p-value = 4.456e-07
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.4148539 0.9360656
## sample estimates:
## mean of x 
## 0.6754598
cor(res$wml, res$mom) # 0.9941092
## [1] 0.9941092
# # sub-sample 1
# t.test(res$wml[res$date>196306]) # t = 3.8336, mean of x = 0.6087843 
# t.test(ff$mom[ff$date>196306]) # t = 3.9346, mean of x = 0.6247009 
# cor(res$wml[res$date>196306], ff$mom[ff$date>196306]) # 0.9983856 
# 
# # sub-sample 2
# t.test(res$wml[res$date>199912]) # t = 0.52527, mean of x = 0.1703845 
# t.test(ff$mom[ff$date>199912]) # t = 0.50273, mean of x = 0.162803 
# cor(res$wml[res$date>199912], ff$mom[ff$date>199912]) # 0.9992226

Plot the cumulative returns

This section plots the cumulative returns of my result (wml) and the Fama-French momentum factor (mom). It helps visualize the consistency between the two series and concludes my replication of the momentum factor.

Code

res.p <- res
# convert yyyymm to R Date format (first day of the month)
res.p$date <- as.Date(paste0(res.p$date,'01'), '%Y%m%d')
res.p$date <- res.p$date + months(1) - 1 # shift to the last day of the month (lubridate months())

# transform wide table into long table
res.p <- reshape2::melt(res.p, id='date')
names(res.p) <- c("Date","Factor","Return")

# compute cumulative return
res.p <- res.p %>%
  group_by(Factor) %>% 
  mutate(
    Return.Cum = cumprod(1+Return/100)
  )

# cut the sample into 4 equal-sized subsamples by year
# library(Hmisc) # cut2() (attached above)
res.p$Year <- as.integer(substr(res.p$Date,1,4))
res.p$Subsample <- cut2(res.p$Year, g=4)

# ggplot
fig <- ggplot(data = res.p, aes(x=Date, y=Return.Cum, group=Factor)) +
  geom_line(aes(linetype=Factor, color=Factor)) +
  scale_linetype_manual(values=c('solid',"longdash","dotted","dashed","twodash"))+
  scale_color_manual(values=c('blue','brown','black','black','coral4'))+
  ylab('Portfolio value if $1 is invested at the beginning') +
  scale_x_date(date_labels = "%m-%Y", date_breaks = "4 year") +
  theme_bw() +
  theme(legend.title = element_blank()
  ) + 
  facet_wrap(vars(Subsample),
             scales = "free") + 
  scale_y_continuous() + 
  labs(title=paste('Cumulative return of factors from',min(res$date)))
print(fig)

# ggsave('Cumulative_Return_Momentum_Factor.pdf', width = 12, height = 7)