The idea for this project is to give examples on how to calculate factors based on financial raw data and using these factors to construct portfolios which share similar characteristics and then evaluate the performance through monthly rebalancing.
Let’s load the libraries first
library(tidyverse)
library(PerformanceAnalytics)
## Warning: package 'PerformanceAnalytics' was built under R version 3.5.1
Let’s import the data now. The data is monthly time series (ranging from 197001 till 201606) of raw variables required to construct the factors for each constituent in S&P500 prevalent at the point of time.
Now, load the data and view the summary
Data <- read_csv("Data.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## gvkey = col_character(),
## tic = col_character(),
## sp500 = col_integer(),
## date = col_integer()
## )
## See spec(...) for full column specifications.
Data
## # A tibble: 483,208 x 23
## gvkey tic prccm prchm prclm trfm trt1m cshom fyearq fqtr ceqq
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 001010 4165A 46.4 49.9 46.1 1.38 -5.36 NaN 1969 4 192.
## 2 001010 4165A 48.1 48.4 44.4 1.40 5.07 NaN 1969 4 192.
## 3 001010 4165A 49.2 50.2 47.5 1.40 2.34 NaN 1969 4 192.
## 4 001010 4165A 45.5 51.5 42.5 1.40 -7.61 NaN 1970 1 NaN
## 5 001010 4165A 38.5 45.1 35.6 1.42 -14.1 NaN 1970 1 NaN
## 6 001010 4165A 38.5 42.6 38.5 1.42 0 NaN 1970 1 NaN
## 7 001010 4165A 38.0 39.7 36.7 1.42 -1.30 NaN 1970 2 NaN
## 8 001010 4165A 39.5 39.7 36.6 1.44 5.53 NaN 1970 2 NaN
## 9 001010 4165A 43.2 43.5 39.0 1.44 9.49 NaN 1970 2 NaN
## 10 001010 4165A 41.5 44.2 41.0 1.44 -4.05 NaN 1970 3 NaN
## # ... with 483,198 more rows, and 12 more variables: cshoq <dbl>,
## # dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## # seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>, sp500 <int>,
## # date <int>
summary(Data)
## gvkey tic prccm prchm
## Length:483208 Length:483208 Min. : 0.00 Min. : 0.00
## Class :character Class :character 1st Qu.: 18.62 1st Qu.: 20.00
## Mode :character Mode :character Median : 30.12 Median : 32.25
## Mean : 38.02 Mean : 40.38
## 3rd Qu.: 46.00 3rd Qu.: 48.68
## Max. :4736.00 Max. :5059.00
## NA's :2722 NA's :3955
## prclm trfm trt1m
## Min. : 0.00 Min. : 1.000 Min. : -99.866
## 1st Qu.: 17.12 1st Qu.: 1.081 1st Qu.: -4.412
## Median : 28.00 Median : 1.537 Median : 0.944
## Mean : 35.34 Mean : 2.989 Mean : 1.574
## 3rd Qu.: 42.88 3rd Qu.: 2.683 3rd Qu.: 6.647
## Max. :4467.00 Max. :265.278 Max. :9900.000
## NA's :3947 NA's :1655 NA's :2330
## cshom fyearq fqtr ceqq
## Min. :0.000e+00 Min. :1969 Min. :1.000 Min. :-136332.0
## 1st Qu.:8.616e+07 1st Qu.:1980 1st Qu.:1.000 1st Qu.: 233.1
## Median :1.747e+08 Median :1992 Median :2.000 Median : 826.8
## Mean :4.185e+08 Mean :1992 Mean :2.497 Mean : 3142.2
## 3rd Qu.:3.818e+08 3rd Qu.:2003 3rd Qu.:3.000 3rd Qu.: 2444.8
## Max. :2.906e+10 Max. :2017 Max. :4.000 Max. : 263025.0
## NA's :309953 NA's :12011 NA's :12011 NA's :35017
## cshoq dlttq epsfxq
## Min. : 0.00 Min. : 0.0 Min. :-990.000
## 1st Qu.: 17.30 1st Qu.: 71.6 1st Qu.: 0.240
## Median : 54.61 Median : 400.5 Median : 0.510
## Mean : 199.03 Mean : 3936.1 Mean : 0.917
## 3rd Qu.: 161.90 3rd Qu.: 1686.1 3rd Qu.: 0.888
## Max. :29206.44 Max. :3160074.0 Max. :3330.000
## NA's :29333 NA's :36312 NA's :26967
## ibcomq ibq saleq
## Min. :-62059.00 Min. :-61659.00 Min. :-25623.0
## 1st Qu.: 4.00 1st Qu.: 4.20 1st Qu.: 141.4
## Median : 21.03 Median : 21.93 Median : 459.6
## Mean : 97.22 Mean : 100.67 Mean : 1607.6
## 3rd Qu.: 79.00 3rd Qu.: 80.97 3rd Qu.: 1329.8
## Max. :127090.00 Max. :127140.00 Max. :131565.0
## NA's :14323 NA's :14302 NA's :15674
## seqq dvpsxq FY_1 LTG
## Min. :-91142.0 Min. : 0.000 Min. :-94.12 Min. :-708.10
## 1st Qu.: 243.2 1st Qu.: 0.000 1st Qu.: 0.12 1st Qu.: 9.47
## Median : 857.8 Median : 0.160 Median : 0.32 Median : 12.25
## Mean : 3253.2 Mean : 0.228 Mean : 0.44 Mean : 13.46
## 3rd Qu.: 2511.0 3rd Qu.: 0.340 3rd Qu.: 0.63 3rd Qu.: 15.86
## Max. :270083.0 Max. :216.350 Max. : 29.88 Max. : 310.90
## NA's :28793 NA's :12993 NA's :263543 NA's :273410
## sp500 date
## Min. :0.0000 Min. :19700131
## 1st Qu.:0.0000 1st Qu.:19800831
## Median :1.0000 Median :19920131
## Mean :0.5815 Mean :19918209
## 3rd Qu.:1.0000 3rd Qu.:20030531
## Max. :1.0000 Max. :20161130
##
Defining the columns of the dataset.
Let’s do some data pre-processing. From the summary we see that the min of “sp500” variable is 0. This suggests that the data has entities which were never part of the index. Let’s take those out.
Data <- Data[-which(Data$sp500 == 0e-7), ]
summary(Data)
## gvkey tic prccm
## Length:280977 Length:280977 Min. : 0.187
## Class :character Class :character 1st Qu.: 23.120
## Mode :character Mode :character Median : 35.250
## Mean : 42.816
## 3rd Qu.: 52.330
## Max. :1474.230
## NA's :77
## prchm prclm trfm
## Min. : 0.234 Min. : 0.125 Min. : 1.000
## 1st Qu.: 24.810 1st Qu.: 21.375 1st Qu.: 1.238
## Median : 37.490 Median : 32.875 Median : 1.787
## Mean : 45.358 Mean : 39.932 Mean : 3.040
## 3rd Qu.: 55.250 3rd Qu.: 48.875 3rd Qu.: 2.991
## Max. :1501.791 Max. :1429.580 Max. :145.620
## NA's :79 NA's :79 NA's :13
## trt1m cshom fyearq fqtr
## Min. :-86.8645 Min. :4.624e+06 Min. :1969 Min. :1.000
## 1st Qu.: -4.1807 1st Qu.:1.460e+08 1st Qu.:1981 1st Qu.:1.000
## Median : 0.9091 Median :2.693e+08 Median :1993 Median :2.000
## Mean : 1.1241 Mean :5.704e+08 Mean :1993 Mean :2.498
## 3rd Qu.: 6.1333 3rd Qu.:5.174e+08 3rd Qu.:2004 3rd Qu.:4.000
## Max. :262.6632 Max. :2.906e+10 Max. :2017 Max. :4.000
## NA's :51 NA's :169484 NA's :1633 NA's :1633
## ceqq cshoq dlttq epsfxq
## Min. :-91142 Min. : 0.00 Min. : 0.0 Min. : -75.120
## 1st Qu.: 582 1st Qu.: 31.12 1st Qu.: 172.6 1st Qu.: 0.310
## Median : 1578 Median : 99.16 Median : 749.7 Median : 0.600
## Mean : 4816 Mean : 286.98 Mean : 4473.6 Mean : 0.682
## 3rd Qu.: 4377 3rd Qu.: 259.04 3rd Qu.: 2956.0 3rd Qu.: 1.000
## Max. :263025 Max. :29206.44 Max. :628148.0 Max. :1969.000
## NA's :14754 NA's :10093 NA's :12770 NA's :9033
## ibcomq ibq saleq
## Min. :-62059.0 Min. :-61659.0 Min. :-25623.0
## 1st Qu.: 10.5 1st Qu.: 11.0 1st Qu.: 326.7
## Median : 44.9 Median : 46.0 Median : 854.1
## Mean : 153.9 Mean : 156.5 Mean : 2373.5
## 3rd Qu.: 145.0 3rd Qu.: 147.0 3rd Qu.: 2181.0
## Max. : 21386.0 Max. : 21386.0 Max. :131565.0
## NA's :2398 NA's :2380 NA's :3013
## seqq dvpsxq FY_1 LTG
## Min. :-91142.0 Min. : 0.0000 Min. :-22.86 Min. :-92.50
## 1st Qu.: 600.2 1st Qu.: 0.0741 1st Qu.: 0.20 1st Qu.: 9.18
## Median : 1591.6 Median : 0.2250 Median : 0.41 Median : 11.67
## Mean : 4888.7 Mean : 0.2714 Mean : 0.54 Mean : 12.21
## 3rd Qu.: 4461.0 3rd Qu.: 0.3800 3rd Qu.: 0.73 3rd Qu.: 14.55
## Max. :270083.0 Max. :65.0000 Max. : 29.88 Max. :277.51
## NA's :10172 NA's :1704 NA's :132174 NA's :134865
## sp500 date
## Min. :1 Min. :19700131
## 1st Qu.:1 1st Qu.:19810930
## Median :1 Median :19930630
## Mean :1 Mean :19929874
## 3rd Qu.:1 3rd Qu.:20050228
## Max. :1 Max. :20161031
##
Date_dat <- round(Data[, "date"]/100)
Data <- Data %>% select(- tic, - sp500, - date) %>%
bind_cols(Date_dat) %>%
select(date, gvkey:LTG)
Data
## # A tibble: 280,977 x 21
## date gvkey prccm prchm prclm trfm trt1m cshom fyearq fqtr ceqq
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 197001 001010 46.4 49.9 46.1 1.38 -5.36 NaN 1969 4 192.
## 2 197002 001010 48.1 48.4 44.4 1.40 5.07 NaN 1969 4 192.
## 3 197003 001010 49.2 50.2 47.5 1.40 2.34 NaN 1969 4 192.
## 4 197004 001010 45.5 51.5 42.5 1.40 -7.61 NaN 1970 1 NaN
## 5 197005 001010 38.5 45.1 35.6 1.42 -14.1 NaN 1970 1 NaN
## 6 197006 001010 38.5 42.6 38.5 1.42 0 NaN 1970 1 NaN
## 7 197007 001010 38.0 39.7 36.7 1.42 -1.30 NaN 1970 2 NaN
## 8 197008 001010 39.5 39.7 36.6 1.44 5.53 NaN 1970 2 NaN
## 9 197009 001010 43.2 43.5 39.0 1.44 9.49 NaN 1970 2 NaN
## 10 197010 001010 41.5 44.2 41.0 1.44 -4.05 NaN 1970 3 NaN
## # ... with 280,967 more rows, and 10 more variables: cshoq <dbl>,
## # dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## # seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>
summary(Data)
## date gvkey prccm prchm
## Min. :197001 Length:280977 Min. : 0.187 Min. : 0.234
## 1st Qu.:198109 Class :character 1st Qu.: 23.120 1st Qu.: 24.810
## Median :199306 Mode :character Median : 35.250 Median : 37.490
## Mean :199298 Mean : 42.816 Mean : 45.358
## 3rd Qu.:200502 3rd Qu.: 52.330 3rd Qu.: 55.250
## Max. :201610 Max. :1474.230 Max. :1501.791
## NA's :77 NA's :79
## prclm trfm trt1m
## Min. : 0.125 Min. : 1.000 Min. :-86.8645
## 1st Qu.: 21.375 1st Qu.: 1.238 1st Qu.: -4.1807
## Median : 32.875 Median : 1.787 Median : 0.9091
## Mean : 39.932 Mean : 3.040 Mean : 1.1241
## 3rd Qu.: 48.875 3rd Qu.: 2.991 3rd Qu.: 6.1333
## Max. :1429.580 Max. :145.620 Max. :262.6632
## NA's :79 NA's :13 NA's :51
## cshom fyearq fqtr ceqq
## Min. :4.624e+06 Min. :1969 Min. :1.000 Min. :-91142
## 1st Qu.:1.460e+08 1st Qu.:1981 1st Qu.:1.000 1st Qu.: 582
## Median :2.693e+08 Median :1993 Median :2.000 Median : 1578
## Mean :5.704e+08 Mean :1993 Mean :2.498 Mean : 4816
## 3rd Qu.:5.174e+08 3rd Qu.:2004 3rd Qu.:4.000 3rd Qu.: 4377
## Max. :2.906e+10 Max. :2017 Max. :4.000 Max. :263025
## NA's :169484 NA's :1633 NA's :1633 NA's :14754
## cshoq dlttq epsfxq
## Min. : 0.00 Min. : 0.0 Min. : -75.120
## 1st Qu.: 31.12 1st Qu.: 172.6 1st Qu.: 0.310
## Median : 99.16 Median : 749.7 Median : 0.600
## Mean : 286.98 Mean : 4473.6 Mean : 0.682
## 3rd Qu.: 259.04 3rd Qu.: 2956.0 3rd Qu.: 1.000
## Max. :29206.44 Max. :628148.0 Max. :1969.000
## NA's :10093 NA's :12770 NA's :9033
## ibcomq ibq saleq
## Min. :-62059.0 Min. :-61659.0 Min. :-25623.0
## 1st Qu.: 10.5 1st Qu.: 11.0 1st Qu.: 326.7
## Median : 44.9 Median : 46.0 Median : 854.1
## Mean : 153.9 Mean : 156.5 Mean : 2373.5
## 3rd Qu.: 145.0 3rd Qu.: 147.0 3rd Qu.: 2181.0
## Max. : 21386.0 Max. : 21386.0 Max. :131565.0
## NA's :2398 NA's :2380 NA's :3013
## seqq dvpsxq FY_1 LTG
## Min. :-91142.0 Min. : 0.0000 Min. :-22.86 Min. :-92.50
## 1st Qu.: 600.2 1st Qu.: 0.0741 1st Qu.: 0.20 1st Qu.: 9.18
## Median : 1591.6 Median : 0.2250 Median : 0.41 Median : 11.67
## Mean : 4888.7 Mean : 0.2714 Mean : 0.54 Mean : 12.21
## 3rd Qu.: 4461.0 3rd Qu.: 0.3800 3rd Qu.: 0.73 3rd Qu.: 14.55
## Max. :270083.0 Max. :65.0000 Max. : 29.88 Max. :277.51
## NA's :10172 NA's :1704 NA's :132174 NA's :134865
Next, we move on to calculate the factors. The factors are inspired from CapitalIQ’s Alpha Factor Library. Let’s list them out,
Before we can start the calculation process, it might be feasible to create panel data tables for each raw variable which will be used to calculate these factors. This way we can directly call the variables from the local environment. We can use the “spread” function from the “tidyr” package by taking our initial data and create variable tables based on the gvkey.
for (i in 3:ncol(Data)) {
samp <- Data[, c(1, 2, i)]
colnames(samp)[3] <- c("x")
assign(colnames(Data)[i], as.matrix(spread(samp, gvkey, x)))
}
Let’s call some of the variable tables
as.tibble(prccm)
## # A tibble: 562 x 1,503
## date `001010` `001013` `001040` `001043` `001045` `001062` `001075`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 197001 46.4 NA 17.6 50.0 23.8 29.9 NA
## 2 197002 48.1 NA 21 44.2 29.9 33.1 NA
## 3 197003 49.2 NA 21.5 46.9 26.9 41.4 NA
## 4 197004 45.5 NA 19.9 34.5 23.9 38.5 NA
## 5 197005 38.5 NA 19.2 27.5 24.1 37.6 NA
## 6 197006 38.5 NA 21 24.0 16.2 43.2 NA
## 7 197007 38.0 NA 23.6 26.9 18.9 41.1 NA
## 8 197008 39.5 NA 25.8 31.9 21.8 44.4 NA
## 9 197009 43.2 NA 26.9 31.7 21 42.2 NA
## 10 197010 41.5 NA 25.9 26.9 19.4 44.8 NA
## # ... with 552 more rows, and 1,495 more variables: `001078` <dbl>,
## # `001098` <dbl>, `001133` <dbl>, `001161` <dbl>, `001164` <dbl>,
## # `001177` <dbl>, `001194` <dbl>, `001209` <dbl>, `001215` <dbl>,
## # `001221` <dbl>, `001230` <dbl>, `001239` <dbl>, `001240` <dbl>,
## # `001243` <dbl>, `001246` <dbl>, `001253` <dbl>, `001279` <dbl>,
## # `001300` <dbl>, `001301` <dbl>, `001308` <dbl>, `001318` <dbl>,
## # `001327` <dbl>, `001356` <dbl>, `001359` <dbl>, `001362` <dbl>,
## # `001365` <dbl>, `001380` <dbl>, `001387` <dbl>, `001392` <dbl>,
## # `001394` <dbl>, `001408` <dbl>, `001409` <dbl>, `001414` <dbl>,
## # `001420` <dbl>, `001428` <dbl>, `001429` <dbl>, `001430` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001465` <dbl>,
## # `001468` <dbl>, `001478` <dbl>, `001481` <dbl>, `001485` <dbl>,
## # `001487` <dbl>, `001489` <dbl>, `001517` <dbl>, `001528` <dbl>,
## # `001567` <dbl>, `001572` <dbl>, `001573` <dbl>, `001581` <dbl>,
## # `001596` <dbl>, `001598` <dbl>, `001602` <dbl>, `001608` <dbl>,
## # `001609` <dbl>, `001619` <dbl>, `001620` <dbl>, `001623` <dbl>,
## # `001628` <dbl>, `001632` <dbl>, `001651` <dbl>, `001661` <dbl>,
## # `001663` <dbl>, `001678` <dbl>, `001690` <dbl>, `001704` <dbl>,
## # `001722` <dbl>, `001738` <dbl>, `001746` <dbl>, `001753` <dbl>,
## # `001755` <dbl>, `001758` <dbl>, `001762` <dbl>, `001789` <dbl>,
## # `001794` <dbl>, `001803` <dbl>, `001831` <dbl>, `001837` <dbl>,
## # `001848` <dbl>, `001878` <dbl>, `001891` <dbl>, `001913` <dbl>,
## # `001920` <dbl>, `001925` <dbl>, `001965` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001990` <dbl>, `001995` <dbl>, `001998` <dbl>,
## # `002014` <dbl>, `002019` <dbl>, `002024` <dbl>, `002029` <dbl>,
## # `002044` <dbl>, `002051` <dbl>, `002055` <dbl>, ...
as.tibble(trt1m)
## # A tibble: 562 x 1,503
## date `001010` `001013` `001040` `001043` `001045` `001062` `001075`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 197001 -5.36 NA -4.73 -18.0 -22.8 8.64 NA
## 2 197002 5.07 NA 20.4 -10.8 26.6 10.9 NA
## 3 197003 2.34 NA 2.38 5.93 -10.0 24.9 NA
## 4 197004 -7.61 NA -7.56 -26.4 -11.2 -6.95 NA
## 5 197005 -14.1 NA -2.01 -19.3 1.88 -1.36 NA
## 6 197006 0 NA 9.09 -12.7 -32.6 15.0 NA
## 7 197007 -1.30 NA 12.5 12.0 16.2 -4.91 NA
## 8 197008 5.53 NA 9.95 18.6 16.3 7.90 NA
## 9 197009 9.49 NA 4.37 0.706 -3.45 -4.79 NA
## 10 197010 -4.05 NA -3.72 -15.4 -7.74 6.75 NA
## # ... with 552 more rows, and 1,495 more variables: `001078` <dbl>,
## # `001098` <dbl>, `001133` <dbl>, `001161` <dbl>, `001164` <dbl>,
## # `001177` <dbl>, `001194` <dbl>, `001209` <dbl>, `001215` <dbl>,
## # `001221` <dbl>, `001230` <dbl>, `001239` <dbl>, `001240` <dbl>,
## # `001243` <dbl>, `001246` <dbl>, `001253` <dbl>, `001279` <dbl>,
## # `001300` <dbl>, `001301` <dbl>, `001308` <dbl>, `001318` <dbl>,
## # `001327` <dbl>, `001356` <dbl>, `001359` <dbl>, `001362` <dbl>,
## # `001365` <dbl>, `001380` <dbl>, `001387` <dbl>, `001392` <dbl>,
## # `001394` <dbl>, `001408` <dbl>, `001409` <dbl>, `001414` <dbl>,
## # `001420` <dbl>, `001428` <dbl>, `001429` <dbl>, `001430` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001465` <dbl>,
## # `001468` <dbl>, `001478` <dbl>, `001481` <dbl>, `001485` <dbl>,
## # `001487` <dbl>, `001489` <dbl>, `001517` <dbl>, `001528` <dbl>,
## # `001567` <dbl>, `001572` <dbl>, `001573` <dbl>, `001581` <dbl>,
## # `001596` <dbl>, `001598` <dbl>, `001602` <dbl>, `001608` <dbl>,
## # `001609` <dbl>, `001619` <dbl>, `001620` <dbl>, `001623` <dbl>,
## # `001628` <dbl>, `001632` <dbl>, `001651` <dbl>, `001661` <dbl>,
## # `001663` <dbl>, `001678` <dbl>, `001690` <dbl>, `001704` <dbl>,
## # `001722` <dbl>, `001738` <dbl>, `001746` <dbl>, `001753` <dbl>,
## # `001755` <dbl>, `001758` <dbl>, `001762` <dbl>, `001789` <dbl>,
## # `001794` <dbl>, `001803` <dbl>, `001831` <dbl>, `001837` <dbl>,
## # `001848` <dbl>, `001878` <dbl>, `001891` <dbl>, `001913` <dbl>,
## # `001920` <dbl>, `001925` <dbl>, `001965` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001990` <dbl>, `001995` <dbl>, `001998` <dbl>,
## # `002014` <dbl>, `002019` <dbl>, `002024` <dbl>, `002029` <dbl>,
## # `002044` <dbl>, `002051` <dbl>, `002055` <dbl>, ...
as.tibble(saleq)
## # A tibble: 562 x 1,503
## date `001010` `001013` `001040` `001043` `001045` `001062` `001075`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 197001 83.6 NA 170. 96.1 284. 0.809 NA
## 2 197002 83.6 NA 170. 102. 284. 0.809 NA
## 3 197003 83.6 NA 170. 102. 284. 1.45 NA
## 4 197004 86.8 NA 151. 102. 266. 1.45 NA
## 5 197005 86.8 NA 151. 109. 266. 1.45 NA
## 6 197006 86.8 NA 151. 109. 266. 0.801 NA
## 7 197007 90.3 NA 163. 109. 288. 0.801 NA
## 8 197008 90.3 NA 163. 113. 288. 0.801 NA
## 9 197009 90.3 NA 163. 113. 288. 1.30 NA
## 10 197010 86.1 NA 158. 113. 310. 1.30 NA
## # ... with 552 more rows, and 1,495 more variables: `001078` <dbl>,
## # `001098` <dbl>, `001133` <dbl>, `001161` <dbl>, `001164` <dbl>,
## # `001177` <dbl>, `001194` <dbl>, `001209` <dbl>, `001215` <dbl>,
## # `001221` <dbl>, `001230` <dbl>, `001239` <dbl>, `001240` <dbl>,
## # `001243` <dbl>, `001246` <dbl>, `001253` <dbl>, `001279` <dbl>,
## # `001300` <dbl>, `001301` <dbl>, `001308` <dbl>, `001318` <dbl>,
## # `001327` <dbl>, `001356` <dbl>, `001359` <dbl>, `001362` <dbl>,
## # `001365` <dbl>, `001380` <dbl>, `001387` <dbl>, `001392` <dbl>,
## # `001394` <dbl>, `001408` <dbl>, `001409` <dbl>, `001414` <dbl>,
## # `001420` <dbl>, `001428` <dbl>, `001429` <dbl>, `001430` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001465` <dbl>,
## # `001468` <dbl>, `001478` <dbl>, `001481` <dbl>, `001485` <dbl>,
## # `001487` <dbl>, `001489` <dbl>, `001517` <dbl>, `001528` <dbl>,
## # `001567` <dbl>, `001572` <dbl>, `001573` <dbl>, `001581` <dbl>,
## # `001596` <dbl>, `001598` <dbl>, `001602` <dbl>, `001608` <dbl>,
## # `001609` <dbl>, `001619` <dbl>, `001620` <dbl>, `001623` <dbl>,
## # `001628` <dbl>, `001632` <dbl>, `001651` <dbl>, `001661` <dbl>,
## # `001663` <dbl>, `001678` <dbl>, `001690` <dbl>, `001704` <dbl>,
## # `001722` <dbl>, `001738` <dbl>, `001746` <dbl>, `001753` <dbl>,
## # `001755` <dbl>, `001758` <dbl>, `001762` <dbl>, `001789` <dbl>,
## # `001794` <dbl>, `001803` <dbl>, `001831` <dbl>, `001837` <dbl>,
## # `001848` <dbl>, `001878` <dbl>, `001891` <dbl>, `001913` <dbl>,
## # `001920` <dbl>, `001925` <dbl>, `001965` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001990` <dbl>, `001995` <dbl>, `001998` <dbl>,
## # `002014` <dbl>, `002019` <dbl>, `002024` <dbl>, `002029` <dbl>,
## # `002044` <dbl>, `002051` <dbl>, `002055` <dbl>, ...
As we can see, the tables are organized such that each column corresponds to one constituent of the index and each row contains the variable values at a point in time. This will ease our factor calculation process.
Let’s move on to calculating the factors…
t <- nrow(ceqq)
n <- ncol(ceqq)
n_id <- colnames(ceqq)[2:n]
t_id <- ceqq[,1]
hl1m <- function(prchm, prccm, prclm){
t <- nrow(prchm)
HL1M <- (prchm - prccm)/(prccm - prclm)
rownames(HL1M) <- prchm[,1]
return(HL1M[13:t,])
}
HL1M <- hl1m(prchm = prchm[, 2:n], prccm = prccm[,2:n], prclm = prclm[,2:n])
as.tibble(HL1M)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.789 NA 0.111 0.130 0.0784 0.786 NA 0.0833
## 2 4.40 NA 1 2.55 0.583 0.0488 NA 2
## 3 0.657 NA 0.132 0.133 0.294 0.438 NA 2.5
## 4 0.0882 NA 0.115 0.0333 1.03 0.615 NA 1.44
## 5 3.44 NA 1.07 0.159 0.529 7.88 NA 3.3
## 6 Inf NA 0.882 1.45 9.43 1.46 NA 4.91
## 7 51.0 NA 9 9.13 2.67 0.394 NA 2.15
## 8 0.0465 NA 0.0851 1.77 0.197 5.36 NA 0.216
## 9 1.00 NA 0.333 2.45 0.3 Inf NA 0.455
## 10 10.3 NA 4.8 6.20 1.79 10 NA 1.36
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Calculating Retention Ratio
ret_ratio <- function(dvpsxq, epsfxq){
t <- length(dvpsxq)
RetentionRatio <- 1 - rowSums(cbind(dvpsxq[13:t],
lag(dvpsxq, 3)[13:t],
lag(dvpsxq, 6)[13:t],
lag(dvpsxq, 9)[13:t]),
na.rm = TRUE)/
rowSums(cbind(epsfxq[13:t],
lag(epsfxq, 3)[13:t],
lag(epsfxq, 6)[13:t],
lag(epsfxq, 9)[13:t]),
na.rm = TRUE)
return(RetentionRatio)
}
RetentionRatio <- matrix(data = NA, nrow = t-12, ncol = n-1)
colnames(RetentionRatio) <- n_id
rownames(RetentionRatio) <- t_id[13:t]
for (i in 2:n){
RetentionRatio[, i-1] <- ret_ratio(dvpsxq = dvpsxq[,i],
epsfxq = epsfxq[,i])
}
as.tibble(RetentionRatio)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.273 NaN 0.561 0.125 -1.96 0.552 NaN 0.623
## 2 0.273 NaN 0.561 0.0551 -1.96 0.552 NaN 0.623
## 3 0.273 NaN 0.561 0.0551 -1.96 0.534 NaN 0.623
## 4 0.226 NaN 0.563 0.0551 -1.96 0.534 NaN 0.572
## 5 0.226 NaN 0.563 0.213 -1.96 0.534 NaN 0.572
## 6 0.226 NaN 0.563 0.213 -1.96 0.521 NaN 0.572
## 7 0.101 NaN 0.577 0.213 -3.21 0.521 NaN 0.495
## 8 0.101 NaN 0.577 -0.127 -3.21 0.521 NaN 0.495
## 9 0.101 NaN 0.577 -0.127 -3.21 0.502 NaN 0.495
## 10 0.0769 NaN 0.576 -0.127 0.439 0.502 NaN 0.418
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Calculating Trailing 12 Month EPS
ttm_ceps <- function(ceqq, cshoq){
t <- length(cshoq)
ttm_ceps <- rowSums(cbind(ceqq[13:t],
lag(ceqq, 3)[13:t],
lag(ceqq, 6)[13:t],
lag(ceqq, 9)[13:t]),
na.rm = TRUE)/
rowSums(cbind(cshoq[13:t],
lag(cshoq, 3)[13:t],
lag(cshoq, 6)[13:t],
lag(cshoq, 9)[13:t]),
na.rm = TRUE)
return(ttm_ceps)
}
TTMCEPS <- matrix(data = NA, nrow = t-12, ncol = n-1)
colnames(RetentionRatio) <- n_id
rownames(RetentionRatio) <- t_id[13:t]
for (i in 2:n){
TTMCEPS[, i-1] <- ttm_ceps(ceqq = ceqq[,i],
cshoq = cshoq[,i])
}
as.tibble(TTMCEPS)
## # A tibble: 550 x 1,502
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 35.0 NaN 11.2 12.5 18.7 31.5 NaN 19.2 8.73 13.5 NaN NaN
## 2 35.0 NaN 11.2 8.30 18.7 31.5 NaN 19.2 8.73 13.5 NaN NaN
## 3 35.0 NaN 11.2 8.30 18.7 32.4 NaN 19.2 8.73 13.5 NaN NaN
## 4 35.2 NaN 5.52 8.30 18.7 32.4 NaN 9.58 5.82 13.5 NaN NaN
## 5 35.2 NaN 5.52 6.23 18.7 32.4 NaN 9.58 5.82 13.5 NaN NaN
## 6 35.2 NaN 5.52 6.23 18.7 33.3 NaN 9.58 5.82 13.5 NaN NaN
## 7 35.2 NaN 3.66 6.23 18.7 33.3 NaN 6.38 4.37 13.5 NaN NaN
## 8 35.2 NaN 3.66 6.10 18.7 33.3 NaN 6.38 4.37 13.5 NaN NaN
## 9 35.2 NaN 3.66 6.10 18.7 32.9 NaN 6.38 4.37 13.5 NaN NaN
## 10 35.2 NaN 2.76 6.10 18.7 32.9 NaN 4.79 4.13 13.6 NaN NaN
## # ... with 540 more rows, and 1,490 more variables: V13 <dbl>, V14 <dbl>,
## # V15 <dbl>, V16 <dbl>, V17 <dbl>, V18 <dbl>, V19 <dbl>, V20 <dbl>,
## # V21 <dbl>, V22 <dbl>, V23 <dbl>, V24 <dbl>, V25 <dbl>, V26 <dbl>,
## # V27 <dbl>, V28 <dbl>, V29 <dbl>, V30 <dbl>, V31 <dbl>, V32 <dbl>,
## # V33 <dbl>, V34 <dbl>, V35 <dbl>, V36 <dbl>, V37 <dbl>, V38 <dbl>,
## # V39 <dbl>, V40 <dbl>, V41 <dbl>, V42 <dbl>, V43 <dbl>, V44 <dbl>,
## # V45 <dbl>, V46 <dbl>, V47 <dbl>, V48 <dbl>, V49 <dbl>, V50 <dbl>,
## # V51 <dbl>, V52 <dbl>, V53 <dbl>, V54 <dbl>, V55 <dbl>, V56 <dbl>,
## # V57 <dbl>, V58 <dbl>, V59 <dbl>, V60 <dbl>, V61 <dbl>, V62 <dbl>,
## # V63 <dbl>, V64 <dbl>, V65 <dbl>, V66 <dbl>, V67 <dbl>, V68 <dbl>,
## # V69 <dbl>, V70 <dbl>, V71 <dbl>, V72 <dbl>, V73 <dbl>, V74 <dbl>,
## # V75 <dbl>, V76 <dbl>, V77 <dbl>, V78 <dbl>, V79 <dbl>, V80 <dbl>,
## # V81 <dbl>, V82 <dbl>, V83 <dbl>, V84 <dbl>, V85 <dbl>, V86 <dbl>,
## # V87 <dbl>, V88 <dbl>, V89 <dbl>, V90 <dbl>, V91 <dbl>, V92 <dbl>,
## # V93 <dbl>, V94 <dbl>, V95 <dbl>, V96 <dbl>, V97 <dbl>, V98 <dbl>,
## # V99 <dbl>, V100 <dbl>, V101 <dbl>, V102 <dbl>, V103 <dbl>, V104 <dbl>,
## # V105 <dbl>, V106 <dbl>, V107 <dbl>, V108 <dbl>, V109 <dbl>,
## # V110 <dbl>, V111 <dbl>, V112 <dbl>, ...
# Calculating Sustainable Growth Rate
sus_grw_rate <- function(epsfxq, RetentionRatio, TTMCEPS){
t <- length(epsfxq)
sum_EPSQ <- rowSums(cbind(epsfxq[13:t],
lag(epsfxq, 3)[13:t],
lag(epsfxq, 6)[13:t],
lag(epsfxq, 9)[13:t]), na.rm = T)
SusGrwRate <- RetentionRatio*sum_EPSQ/TTMCEPS
return(SusGrwRate)
}
SusGrwRate <- matrix(data = NA, nrow = t-12, ncol = n-1)
colnames(SusGrwRate) <- n_id
rownames(SusGrwRate) <- t_id[13:t]
for (i in 2:n){
SusGrwRate[, i-1] <- sus_grw_rate(epsfxq = epsfxq[,i],
RetentionRatio = RetentionRatio[,(i-1)],
TTMCEPS = TTMCEPS[,(i-1)])
}
as.tibble(SusGrwRate)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0257 NaN 0.102 0.0161 -0.0284 0.0274 NaN 0.0950
## 2 0.0257 NaN 0.102 0.00843 -0.0284 0.0274 NaN 0.0950
## 3 0.0257 NaN 0.102 0.00843 -0.0284 0.0247 NaN 0.0950
## 4 0.0199 NaN 0.210 0.00843 -0.0284 0.0247 NaN 0.154
## 5 0.0199 NaN 0.210 0.0434 -0.0284 0.0247 NaN 0.154
## 6 0.0199 NaN 0.210 0.0434 -0.0284 0.0229 NaN 0.154
## 7 0.00767 NaN 0.336 0.0434 -0.0326 0.0229 NaN 0.169
## 8 0.00767 NaN 0.336 -0.0147 -0.0326 0.0229 NaN 0.169
## 9 0.00767 NaN 0.336 -0.0147 -0.0326 0.0214 NaN 0.169
## 10 0.00568 NaN 0.455 -0.0147 0.0251 0.0214 NaN 0.165
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Earnings Quality <- Net Profit Margin
net_profit_margin <- function(ibq, saleq){
t <- length(ibq)
NetProfitMargin <- rowSums(cbind(ibq[13:t],
lag(ibq, 3)[13:t],
lag(ibq, 6)[13:t],
lag(ibq, 9)[13:t]),
na.rm = TRUE)/
rowSums(cbind(saleq[13:t],
lag(saleq, 3)[13:t],
lag(saleq, 6)[13:t],
lag(saleq, 9)[13:t]),
na.rm = TRUE)
return(NetProfitMargin)
}
NetProfitMargin <- matrix(data = NA, nrow = t-12, ncol = n-1)
colnames(NetProfitMargin) <- n_id
rownames(NetProfitMargin) <- t_id[13:t]
for (i in 2:n){
NetProfitMargin[, i-1] <- net_profit_margin(ibq = ibq[,i],
saleq = saleq[,i])
}
as.tibble(NetProfitMargin)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0553 NaN 0.0577 0.0305 -0.0233 0.848 NaN 0.0875
## 2 0.0553 NaN 0.0577 0.0245 -0.0233 0.848 NaN 0.0875
## 3 0.0553 NaN 0.0577 0.0245 -0.0233 0.845 NaN 0.0875
## 4 0.0525 NaN 0.0590 0.0245 -0.0409 0.845 NaN 0.0767
## 5 0.0525 NaN 0.0590 0.0246 -0.0409 0.845 NaN 0.0767
## 6 0.0525 NaN 0.0590 0.0246 -0.0409 0.832 NaN 0.0767
## 7 0.0464 NaN 0.0601 0.0246 -0.0415 0.832 NaN 0.0650
## 8 0.0464 NaN 0.0601 0.0121 -0.0415 0.832 NaN 0.0650
## 9 0.0464 NaN 0.0601 0.0121 -0.0415 0.825 NaN 0.0650
## 10 0.0459 NaN 0.0595 0.0121 -0.0215 0.825 NaN 0.0565
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Valuation: Book to Price
bp_ratio <- function(ceqq, cshoq, prccm){
t <- nrow(ceqq)
BP <- ceqq/(cshoq*prccm)
return(BP[13:t,])
}
BP <- bp_ratio(ceqq = ceqq[, 2:n], cshoq = cshoq[, 2:n], prccm = prccm[, 2:n])
rownames(BP) <- rownames(HL1M)
as.tibble(BP)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.735 NA 0.373 NaN 0.662 0.745 NA 0.242
## 2 0.749 NA 0.368 NaN 0.677 0.672 NA 0.239
## 3 0.687 NA 0.320 NaN 0.656 0.699 NA 0.258
## 4 0.596 NA NaN NaN NaN 0.695 NA NaN
## 5 0.635 NA NaN NaN NaN 0.771 NA NaN
## 6 0.639 NA NaN NaN NaN 0.768 NA NaN
## 7 0.703 NA NaN NaN NaN 0.706 NA NaN
## 8 0.610 NA NaN 0.692 NaN 0.771 NA NaN
## 9 0.614 NA NaN 0.669 NaN 0.692 NA NaN
## 10 0.690 NA NaN 0.781 NaN 0.901 NA NaN
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Capital Efficiency: Return on Equity
roe <- function(ibcomq, ceqq){
t <- length(ibcomq)
ROE <- 4*rowSums(cbind(ibcomq[13:t],
lag(ibcomq, 3)[13:t],
lag(ibcomq, 6)[13:t],
lag(ibcomq, 9)[13:t]),
na.rm = TRUE)/
rowSums(cbind(ceqq[13:t],
lag(ceqq, 3)[13:t],
lag(ceqq, 6)[13:t],
lag(ceqq, 9)[13:t]),
na.rm = TRUE)
return(ROE)
}
ROE <- matrix(data = NA, nrow = t-12, ncol = n-1)
colnames(ROE) <- n_id
rownames(ROE) <- t_id[13:t]
for (i in 2:n){
ROE[, i-1] <- roe(ibcomq = ibcomq[,i],
ceqq = ceqq[,i])
}
as.tibble(ROE)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.377 NaN 0.730 0.256 -0.279 0.0496 NaN 0.614
## 2 0.377 NaN 0.730 0.204 -0.279 0.0496 NaN 0.614
## 3 0.377 NaN 0.730 0.204 -0.279 0.0464 NaN 0.614
## 4 0.177 NaN 0.748 0.204 -0.488 0.0464 NaN 0.538
## 5 0.177 NaN 0.748 0.205 -0.488 0.0464 NaN 0.538
## 6 0.177 NaN 0.748 0.205 -0.488 0.0439 NaN 0.538
## 7 0.101 NaN 0.781 0.205 -0.506 0.0439 NaN 0.455
## 8 0.101 NaN 0.781 0.102 -0.506 0.0439 NaN 0.455
## 9 0.101 NaN 0.781 0.102 -0.506 0.0427 NaN 0.455
## 10 0.0739 NaN 0.809 0.102 -0.272 0.0427 NaN 0.395
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Log Market Cap
logmcap <- function(cshom, prccm){
LogMCap <- log(cshom*prccm)
return(LogMCap)
}
LogMCap <- logmcap(cshom = cshom[, 2:n], prccm = prccm[, 2:n])
LogMCap <- LogMCap[13:t,]
as.tibble(LogMCap)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NaN NA NaN NaN NaN NaN NA NaN
## 2 NaN NA NaN NaN NaN NaN NA NaN
## 3 NaN NA NaN NaN NaN NaN NA NaN
## 4 NaN NA NaN NaN NaN NaN NA NaN
## 5 NaN NA NaN NaN NaN NaN NA NaN
## 6 NaN NA NaN NaN NaN NaN NA NaN
## 7 NaN NA NaN NaN NaN NaN NA NaN
## 8 NaN NA NaN NaN NaN NaN NA NaN
## 9 NaN NA NaN NaN NaN NaN NA NaN
## 10 NaN NA NaN NaN NaN NaN NA NaN
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Expected Long term Growth
LTG <- LTG[13:t,]
rownames(LTG) <- LTG[,1]
LTG <- LTG[, -1]
as.tibble(LTG)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NaN NA NaN NaN NaN NaN NA NaN
## 2 NaN NA NaN NaN NaN NaN NA NaN
## 3 NaN NA NaN NaN NaN NaN NA NaN
## 4 NaN NA NaN NaN NaN NaN NA NaN
## 5 NaN NA NaN NaN NaN NaN NA NaN
## 6 NaN NA NaN NaN NaN NaN NA NaN
## 7 NaN NA NaN NaN NaN NaN NA NaN
## 8 NaN NA NaN NaN NaN NaN NA NaN
## 9 NaN NA NaN NaN NaN NaN NA NaN
## 10 NaN NA NaN NaN NaN NaN NA NaN
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# Volatility: 12M Realized Price Volatility
r_i <- matrix(nrow = t-1, ncol = n-1)
colnames(r_i) <- n_id
rownames(r_i) <- t_id[2:t]
ret_i <- function(prccm){
r_i <- log(prccm/lag(prccm, 1))
return(r_i)
}
for (k in 2:n){
r_i[, k-1] <- ret_i(prccm = prccm[, k])[2:t]
}
AnnVol12M <- matrix(nrow = t-12, ncol = n-1)
colnames(AnnVol12M) <- n_id
rownames(AnnVol12M) <- t_id[13:t]
Ann_Vol12M <- function(r_i, n = 12){
t <- length(r_i)
ret_lags <- r_i[12:t]
for (i in 1:(n-1)){
ret_lags <- cbind(ret_lags,
lag(r_i, i)[n:t])
}
AnnVol12M <- sqrt(n)*sqrt(rowSums(ret_lags^2)/11)
return(AnnVol12M)
}
for (k in 1:(n-1)){
AnnVol12M[, k] <- Ann_Vol12M(r_i = r_i[,k])
}
as.tibble(AnnVol12M)
## # A tibble: 550 x 1,502
## `001010` `001013` `001040` `001043` `001045` `001062` `001075` `001078`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.246 NA 0.310 0.569 0.621 0.330 NA 0.191
## 2 0.244 NA 0.251 0.555 0.573 0.330 NA 0.189
## 3 0.259 NA 0.290 0.580 0.564 0.240 NA 0.203
## 4 0.291 NA 0.279 0.495 0.569 0.228 NA 0.196
## 5 0.242 NA 0.279 0.473 0.569 0.251 NA 0.168
## 6 0.243 NA 0.265 0.456 0.459 0.211 NA 0.182
## 7 0.263 NA 0.235 0.501 0.432 0.223 NA 0.212
## 8 0.299 NA 0.223 0.469 0.469 0.227 NA 0.228
## 9 0.283 NA 0.219 0.470 0.471 0.221 NA 0.211
## 10 0.306 NA 0.217 0.466 0.464 0.348 NA 0.225
## # ... with 540 more rows, and 1,494 more variables: `001098` <dbl>,
## # `001133` <dbl>, `001161` <dbl>, `001164` <dbl>, `001177` <dbl>,
## # `001194` <dbl>, `001209` <dbl>, `001215` <dbl>, `001221` <dbl>,
## # `001230` <dbl>, `001239` <dbl>, `001240` <dbl>, `001243` <dbl>,
## # `001246` <dbl>, `001253` <dbl>, `001279` <dbl>, `001300` <dbl>,
## # `001301` <dbl>, `001308` <dbl>, `001318` <dbl>, `001327` <dbl>,
## # `001356` <dbl>, `001359` <dbl>, `001362` <dbl>, `001365` <dbl>,
## # `001380` <dbl>, `001387` <dbl>, `001392` <dbl>, `001394` <dbl>,
## # `001408` <dbl>, `001409` <dbl>, `001414` <dbl>, `001420` <dbl>,
## # `001428` <dbl>, `001429` <dbl>, `001430` <dbl>, `001440` <dbl>,
## # `001447` <dbl>, `001449` <dbl>, `001465` <dbl>, `001468` <dbl>,
## # `001478` <dbl>, `001481` <dbl>, `001485` <dbl>, `001487` <dbl>,
## # `001489` <dbl>, `001517` <dbl>, `001528` <dbl>, `001567` <dbl>,
## # `001572` <dbl>, `001573` <dbl>, `001581` <dbl>, `001596` <dbl>,
## # `001598` <dbl>, `001602` <dbl>, `001608` <dbl>, `001609` <dbl>,
## # `001619` <dbl>, `001620` <dbl>, `001623` <dbl>, `001628` <dbl>,
## # `001632` <dbl>, `001651` <dbl>, `001661` <dbl>, `001663` <dbl>,
## # `001678` <dbl>, `001690` <dbl>, `001704` <dbl>, `001722` <dbl>,
## # `001738` <dbl>, `001746` <dbl>, `001753` <dbl>, `001755` <dbl>,
## # `001758` <dbl>, `001762` <dbl>, `001789` <dbl>, `001794` <dbl>,
## # `001803` <dbl>, `001831` <dbl>, `001837` <dbl>, `001848` <dbl>,
## # `001878` <dbl>, `001891` <dbl>, `001913` <dbl>, `001920` <dbl>,
## # `001925` <dbl>, `001965` <dbl>, `001976` <dbl>, `001988` <dbl>,
## # `001990` <dbl>, `001995` <dbl>, `001998` <dbl>, `002014` <dbl>,
## # `002019` <dbl>, `002024` <dbl>, `002029` <dbl>, `002044` <dbl>,
## # `002051` <dbl>, `002055` <dbl>, `002080` <dbl>, ...
# total return
trt1m <- trt1m[13:t,]
trt1m <- trt1m[, -1]
Next, the stocks are sorted into quintiles (i.e., 5 bins) based on the values of various stock characteristics. Portfolios are rebalanced once a month. All portfolios are equal weighted. To calculate the returns of portfolios, we only take the average returns of the stocks in each of the quintile portfolio. The monthly total returns of stocks (including both capital gain and dividends, adjusted for corporate events such as stock splits) are provided in the dataset (trt1m). Next, we calculate the Q-Spread. For consistency, the quantile spread can be defined as the difference of the returns between the top and bottom portfolio in each month. \[QSpread = Top - Bottom\]
However, the definition of QSpread may change based on the underlying characteristics of the factors. For example, for Book to Price ratios, it has been investigated (Basu 1977, Fama & French 1993) that stocks with high book to price ratios tend to outperform stocks with low book to price ratios. In this case the aforementioned formula for Qspread will suffice. However, say, for annual 12M vol, research suggests (Ang et al. (2006, 2009)) that low volatility stocks tend to outperform high volatility stocks. In this case, we need to long the bottom quintiles and short the top quintiles. However, this may be debatable since the relationship across stocks with respect to volatility may invert given the nature of persistence and state of the economy we are in. Below is the list fundamental rationale for each factor:
# Sorting stocks into quintiles based on the values
# of various stock characteristics
# This function sorts returns based on factors and
# also calculates the Q spread
sort_q <- function(factor_data, trt1m, flip = F){
t <- nrow(factor_data)
k <- ncol(factor_data)
top20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(top20) <- colnames(factor_data)
bot20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(bot20) <- colnames(factor_data)
for (i in 1:(t-1)){
sort_rank <- sort(rank(factor_data[i, is.finite(factor_data[i,])],
na.last = NA, ties.method = "min"))
ID_bot20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank,
.20))])
ID_top20 <- names(sort_rank[which(sort_rank >= quantile(sort_rank,
.80))])
for (q in 1:length(ID_bot20)) {
bot20[i+1, ID_bot20[q]] <- trt1m[i+1, ID_bot20[q]]
}
for (j in 1:length(ID_top20)){
top20[i+1, ID_top20[j]] <- trt1m[i+1, ID_top20[j]]
}
}
# Evaluate the portfolio performance and calcualte Q-spread
PP_QS <- data.frame(matrix(data = NA, nrow = t, ncol = 3))
colnames(PP_QS) <- c("top20", "bot20", "Q-spread")
for (i in 1:(t-1)){
PP_QS[i, 1] <- mean(as.numeric(top20[i, 1:k]), na.rm = TRUE)
PP_QS[i, 2] <- mean(as.numeric(bot20[i, 1:k]), na.rm = TRUE)
}
if (flip == T){
PP_QS[, 3] <- PP_QS[, 2] - PP_QS[, 1]
} else {
# Q-spread
PP_QS[, 3] <- PP_QS[, 1] - PP_QS[, 2]
}
return (PP_QS)
}
# calculate q spread for each factors
Q_SusG <- sort_q(SusGrwRate, trt1m, flip = F)
Q_LTG <- sort_q(LTG, trt1m, flip = F)
Q_HL1M <- sort_q(HL1M, trt1m, flip = F)
Q_BP <- sort_q(BP, trt1m, flip = F)
Q_NetProfitMargin <- sort_q(NetProfitMargin, trt1m, flip = F)
Q_ROE <- sort_q(ROE, trt1m, flip = F)
Q_logmc <- sort_q(LogMCap, trt1m, flip = T)
Q_annvol <- sort_q(AnnVol12M, trt1m, flip = T)
Let’s generate some summary stats and correlation matrix
QSpreads <- cbind(Q_HL1M[,"Q-spread"], Q_SusG[,"Q-spread"],
Q_BP[,"Q-spread"], Q_ROE[,"Q-spread"],
Q_logmc[,"Q-spread"], Q_LTG[,"Q-spread"],
Q_NetProfitMargin[,"Q-spread"], Q_annvol[,"Q-spread"])
factorID <- c("HL1M", "SusG", "BP", "ROE", "logmc",
"LTG", "NetProfitMargin","AnnVol12M")
colnames(QSpreads) <- factorID
After obtaining the QSpreads, let’s build a function which calculates summary stats and a correlation matrix.
# Calculating Statistics for all factors
statistics <- function(QS_dat){
mean <- mean(QS_dat, na.rm = TRUE)
std <- sqrt(var(QS_dat, na.rm = TRUE))
sharpR <- mean*sqrt(12)/std
skew <- PerformanceAnalytics::skewness(QS_dat,na.rm = TRUE)
max <- max(QS_dat, na.rm = TRUE)
min <- min(QS_dat, na.rm = TRUE)
autocorr <- acf(na.omit(QS_dat), lag.max = 24, plot = F)
statistics <- c(mean, std, sharpR, skew, max, min,
autocorr$acf[2], autocorr$acf[13], autocorr$acf[25])
return (statistics)
}
stats_table <- t(data.frame(apply(QSpreads, 2, statistics)))
colnames(stats_table) <- c("Mean","Std","Sharpe Ratio","Skewness","Max","Min",
"Autocorrelation lag1","Autocorrelation lag12",
"Autocorrelation lag24")
rownames(stats_table) <- colnames(QSpreads)
stats_table
## Mean Std Sharpe Ratio Skewness Max
## HL1M 0.95001333 3.441828 0.95616137 1.3515607 22.67728
## SusG 0.19477150 3.227128 0.20907389 -0.8175633 14.05274
## BP 0.43441166 4.152452 0.36239939 1.0434139 28.10853
## ROE 0.16625780 3.601559 0.15991237 -1.2514105 12.09618
## logmc 0.74553782 4.380265 0.58960325 1.5559174 28.16044
## LTG -0.05537362 4.463583 -0.04297442 0.1689935 21.88016
## NetProfitMargin -0.05247470 3.812840 -0.04767515 -0.9982104 12.95718
## AnnVol12M -0.16897385 4.971579 -0.11773776 -0.9775217 17.22831
## Min Autocorrelation lag1 Autocorrelation lag12
## HL1M -10.37731 -0.086130236 0.04715092
## SusG -20.49229 0.059542790 0.13801354
## BP -15.65564 0.108537385 0.14658415
## ROE -24.43695 0.072678997 0.20432987
## logmc -10.12893 0.220105646 0.11643221
## LTG -18.68106 0.006124222 -0.04014567
## NetProfitMargin -22.21734 0.135769176 0.17989634
## AnnVol12M -32.19983 0.095273184 0.05120257
## Autocorrelation lag24
## HL1M 0.059028400
## SusG 0.051489485
## BP 0.092046046
## ROE 0.095067686
## logmc -0.009210851
## LTG -0.094533252
## NetProfitMargin 0.097132742
## AnnVol12M -0.032035171
# Inspecting the correlation plot
cor_Q <- cor(x = QSpreads, use = "pairwise.complete.obs")
corrplot::corrplot(corr = cor_Q)
From the table above, we can see that the Price Momentum (HL1M) factor performs the best out of all the factors just based on average return. The Sharpe ratio of the Price Momentum validates the performanceon a risk adjusted basis. Following Price Momentum, value (BP) and size factor have the highest expected returns and Sharpe ratios. However, out of the 8 factors, 4 of the factors including Analyst Expectation (LTG), Earnings Quality (NPM), Annual 12M Volatility and Capital Efficiency (ROE) have negative average returns.
Focusing on the tails, HL1M and BP again have the highest maximum return and the lowest minimum return while their returns are positively skewed. While ROE, size, NPM and historical growth (SusGrwRate) factors are all negatively skewed with a low minimum and maximum returns. This suggests that those factors have a relatively more tail exposure.
Let’s plot the QSpreads…
# plotting the q spreads
for(i in 1:ncol(QSpreads)){
barplot(height = QSpreads[,i], names.arg = ceqq[13:t, 1],
main = colnames(QSpreads)[i],
xlab = "Time", ylab = "QSpread", col = i)
}
Portfolio turnover is defined as the total number of transactions (both buying and selling) each month divided by the total number of stocks in the long-short portfolio. Since the portfolios are equal weighted, the key is to compute the overlap between the long (resp. short) positions in one month and the next month.
turnover <- function(factor_data){
t <- nrow(factor_data)
k <- ncol(factor_data)
top20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(top20) <- colnames(factor_data)
bot20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(bot20) <- colnames(factor_data)
# finding bottom 20
for (i in 1:(t-1)){
sort_rank <- sort(rank(factor_data[i,is.finite(factor_data[i, ])],
na.last = NA,ties.method = "min"))
ID_bot20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank,
.20))])
for (k in 1:length(ID_bot20)) {
bot20[i+1, ID_bot20[k]] <- trt1m[i+1, ID_bot20[k]]
}
}
# finding top 20
for (i in 1:(t-1)){
sort_rank <- sort(rank(factor_data[i, is.finite(factor_data[i,])],
na.last = NA, ties.method = "min"))
ID_top20 <- names(sort_rank[which(sort_rank>= quantile(sort_rank,
.80))])
for (j in 1:length(ID_top20)){
top20[i+1, ID_top20[j]] <- trt1m[i+1, ID_top20[j]]
}
}
# calculating turnover ratio
turnover_ratio <- sapply(data.frame(matrix(data = NA,
nrow = t,
ncol = 4)), as.numeric)
colnames(turnover_ratio) <- c("Top20 Transactions",
"Bot20 Transactions",
"# of stocks","Turnover Ratio")
for (i in 3:t){
common_top20 <- length(intersect(names(top20[i,- which(is.na(top20[i,]))]),
names(top20[i-1,-which(is.na(top20[i-1,]))])))
turnover_ratio[i, 1] <- length(top20[i,- which(is.na(top20[i,]))]) -
2*common_top20 +
length(top20[i-1,-which(is.na(top20[i-1,]))])
common_bot20 <- length(intersect(names(bot20[i,- which(is.na(bot20[i,]))]), names(bot20[i-1,-which(is.na(bot20[i-1,]))])))
turnover_ratio[i, 2] <- length(bot20[i, -which(is.na(bot20[i,]))]) - 2*common_bot20 +
length(bot20[i-1,- which(is.na(bot20[i-1,]))])
turnover_ratio[i, 3] <- length(top20[i-1,- which(is.na(top20[i-1, ]))]) + length(bot20[i-1,- which(is.na(bot20[i-1, ]))])
turnover_ratio[i, 4] <- (turnover_ratio[i, 1] +
turnover_ratio[i,2])/turnover_ratio[i, 3]
}
return(turnover_ratio)
}
turnover_HL1M <- turnover(HL1M)
turnover_SusG <- turnover(SusGrwRate)
turnover_BP <- turnover(BP)
turnover_ROE <- turnover(ROE)
turnover_logmc <- turnover(LogMCap)
turnover_LTG <- turnover(LTG)
turnover_NPM <- turnover(NetProfitMargin)
turnover_annvol <- turnover(AnnVol12M)
turnover_series <- cbind(turnover_HL1M[, "Turnover Ratio"],
turnover_SusG[, "Turnover Ratio"],
turnover_BP[, "Turnover Ratio"],
turnover_ROE[, "Turnover Ratio"],
turnover_logmc[, "Turnover Ratio"],
turnover_LTG[, "Turnover Ratio"],
turnover_NPM[, "Turnover Ratio"],
turnover_annvol[, "Turnover Ratio"])
colnames(turnover_series) <- c("HL1M", "SusG", "BP", "ROE", "logmc",
"LTG", "NetProfitMargin","AnnVol12M")
Plotting the turnover series for each factor…
for(i in 1:ncol(turnover_series)){
plot(y = turnover_series[,i], x = ceqq[13:t, 1],
main = paste("Turnover Ratio - ",colnames(turnover_series)[i]), type = "l",
xlab = "Time", ylab = "Turnover Ratio", col = i)
}
let’s calculate the average turnover took place for each factor throughout time.
mean_turnover_ratio <- data.frame(HL1M = mean(turnover_HL1M[, 4], na.rm = TRUE),
SusG = mean(turnover_SusG[, 4], na.rm = TRUE),
LTG = mean(turnover_LTG[is.finite(turnover_LTG[, 4]), 4],
na.rm = TRUE),
NPM = mean(turnover_NPM[, 4], na.rm = TRUE),
BP = mean(turnover_BP[, 4], na.rm = TRUE),
ROE = mean(turnover_ROE[, 4], na.rm = TRUE),
logmc = mean(turnover_logmc[is.finite(turnover_logmc[, 4]), 4],
na.rm = TRUE),
ann12Mvol = mean(turnover_annvol[, 4], na.rm = TRUE))
mean_turnover_ratio
## HL1M SusG LTG NPM BP ROE logmc
## 1 1.616903 0.1286362 0.2103071 0.08952728 0.1792418 0.1107698 0.08992325
## ann12Mvol
## 1 0.2399072
In terms of turnover ratio, Price Momentum has the highest turnover and NetProfitMargin shows the lowest turnover among the 8 factors. This shows that besides the higher risk adjusted return, price momentum portfolio will also have high transaction costs.
In order to check whether our calculated Q Spreads for each factor is statistically significant, t-tests are conducted. Using the time-series of the factor returns (QSpread), the t-test shows whether the mean of each factor significantly different from zero.
# T-tests
T_QS <- data.frame(t(sapply(apply(QSpreads, 2, t.test), c)))
Significance_factors <- data.frame(matrix(data = NA, nrow = 8, ncol = 3))
rownames(Significance_factors) <- factorID
colnames(Significance_factors) <- c("Avg_spread", "t_stats", "p_val")
Significance_factors[, "Avg_spread"] <- unlist(T_QS[,"estimate"])
Significance_factors[, "t_stats"] <- unlist(T_QS[,"statistic"])
Significance_factors[, "p_val"] <- unlist(T_QS[,"p.value"])
Significance_factors
## Avg_spread t_stats p_val
## HL1M 0.95001333 6.4614628 2.295610e-10
## SusG 0.19477150 1.4128611 1.582653e-01
## BP 0.43441166 2.4489906 1.463846e-02
## ROE 0.16625780 1.0806417 2.803327e-01
## logmc 0.74553782 2.5302614 1.209736e-02
## LTG -0.05537362 -0.2449919 8.065919e-01
## NetProfitMargin -0.05247470 -0.3221749 7.474433e-01
## AnnVol12M -0.16897385 -0.7956379 4.265877e-01
Here, based on the results obtained, we can see that assuming a 95% confidence level, the time series of QSpreads generated for Momentum, Value (BP) and Size are statistically significant.
The project outlines an overview of how to use financial data for publicly listed entities within S&P 500 and create long short portfolio based on CapitalIQ Factors calculated. The QSpreads calculated were analyzed from a statistical point of view and the significance of expected returns was evaluated based on a t-test. As we see, Price Momentum, Value and Size ends as more statistically significant factors while providing positive risk adjusted expected returns.