Financial Performance Analytics Made Easy

Author

Teal Emery

Does ESG Offer Better Risk-Adjusted Returns?

There are two key motivators for ESG investing:

  1. Aligning investors’ portfolios with their values.
  2. Better pricing extra-financial risks from environmental, social, and governance factors that were not previously widely considered in investment analysis.

The latter claim is complicated. The time horizons at which these risks manifest may be much longer than the horizons we’re looking at here. But we’ll go ahead and see what we can learn about the risk-adjusted returns of ESG using traditional financial performance analytics tools.

We will do this in two ways.

  1. We’ll examine the toolkit for financial performance analysis using tidyquant to look at the performance of the ESG ETF versus its traditional counterpart as a benchmark.
  2. We’ll then learn how to use performance analysis metrics as a tool for exploratory analysis with the returns of all of the stocks in the index (1.3 million observations!).

First, let’s load our packages:

library(tidyverse) 
library(tidyquant)

ESG Performance Analysis

In this section we will rely heavily on the tidyquant R package. This package is a gem. It can:

  • Retrieve financial and economic data
  • Calculate asset and portfolio returns
  • Calculate performance analysis measures

And a lot more. Take the time to go through the tutorials.

Our Dataset

From the ETF screener we used in part one, we picked the following funds to look at. ESGU is the largest ESG fund, and the one we’ve been looking at. SUSA is another ESG fund with a much longer history (ESGU only has approximately 5 years of history). IVV is the largest regular ETF, and it’s a plain-vanilla S&P 500 tracker, so we’re using it as our equity benchmark. SHV is a short-term US Treasury ETF, and we’re using it as the risk-free rate.

assets_tbl <- tribble(~ticker, ~asset_type,
        "ESGU", "ESG Fund",
        "SUSA", "ESG Fund - Long History",
        "IVV", "Benchmark",
        "SHV", "Risk Free")

assets_tbl
# A tibble: 4 × 2
  ticker asset_type             
  <chr>  <chr>                  
1 ESGU   ESG Fund               
2 SUSA   ESG Fund - Long History
3 IVV    Benchmark              
4 SHV    Risk Free              

Getting the Data

It’s this easy. Use tidyquant::tq_get().

asset_prices_tbl <- assets_tbl |>
  tq_get()

asset_prices_tbl
# A tibble: 9,222 × 9
   ticker asset_type date        open  high   low close volume adjusted
   <chr>  <chr>      <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 ESGU   ESG Fund   2016-12-06  48.3  48.3  48.3  48.3    100     44.0
 2 ESGU   ESG Fund   2016-12-07  48.3  48.3  48.3  48.3      0     44.0
 3 ESGU   ESG Fund   2016-12-08  48.3  48.3  48.3  48.3      0     44.0
 4 ESGU   ESG Fund   2016-12-09  48.3  48.3  48.3  48.3      0     44.0
 5 ESGU   ESG Fund   2016-12-12  48.3  48.3  48.3  48.3      0     44.0
 6 ESGU   ESG Fund   2016-12-13  49.7  49.7  49.7  49.7    200     45.3
 7 ESGU   ESG Fund   2016-12-14  49.7  49.7  49.7  49.7      0     45.3
 8 ESGU   ESG Fund   2016-12-15  49.7  49.7  49.7  49.7      0     45.3
 9 ESGU   ESG Fund   2016-12-16  49.7  49.7  49.7  49.7      0     45.3
10 ESGU   ESG Fund   2016-12-19  49.7  49.7  49.7  49.7      0     45.3
# … with 9,212 more rows

The default data source is Yahoo Finance. The adjusted column on the right is useful because it adjusts for stock splits and dividends, so it can be used to calculate total returns.

If we were doing something beyond exploratory analysis, we might want to also adjust for fees. But that’s for later.
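
If you’re curious what that might look like, here’s a minimal sketch: convert an annual expense ratio into an approximate weekly drag that could be subtracted from the weekly returns we calculate below. The expense ratios (and the fund_fees_tbl name) are illustrative placeholders, not the funds’ actual fees; check each fund’s prospectus for the real numbers.

fund_fees_tbl <- tribble(~ticker, ~expense_ratio,
        "ESGU", 0.0015,  # placeholder annual expense ratio, not the actual fee
        "IVV",  0.0003)  # placeholder annual expense ratio, not the actual fee

fund_fees_tbl |>
  # rough approximation: spread the annual fee evenly across 52 weeks;
  # this weekly_fee_drag could then be subtracted from weekly returns
  mutate(weekly_fee_drag = expense_ratio / 52)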

Let’s look at our data.

asset_prices_tbl |>
  # for each asset type
  group_by(asset_type) |>
  # what is the first date we have data?
  summarize(first_date = first(date),
            # what is the last date we have data?
            last_date = last(date),
            # how many observations do we have?
            observations = n())
# A tibble: 4 × 4
  asset_type              first_date last_date  observations
  <chr>                   <date>     <date>            <int>
1 Benchmark               2013-01-02 2023-02-22         2553
2 ESG Fund                2016-12-06 2023-02-22         1563
3 ESG Fund - Long History 2013-01-02 2023-02-22         2553
4 Risk Free               2013-01-02 2023-02-22         2553

We see that it pulls approximately 10 years of data by default. ESGU was launched in 2016 and has data since then.
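
If you want a different window, tq_get() also accepts from and to arguments. A quick sketch (the dates below are just an example):

assets_tbl |>
  tq_get(get  = "stock.prices",
         from = "2018-01-01",
         to   = "2022-12-31")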

Looking At Our Data

Indexing data can be a great way to visualize relative financial performance over time. The adjusted column accounts for dividends and stock splits, and we can use that to calculate total returns. We’re going to look at gross returns and ignore management fees for the moment to keep the analysis straightforward.

p1 <- asset_prices_tbl |>
  # Filter to a date where all funds have data
  filter(date >= ymd("2016-12-31")) |>
  # group by asset type
  group_by(asset_type) |>
  # construct a total return index where 100 = the first date 
  mutate(index_100 = adjusted/first(adjusted) * 100) |>
  # make a simple line chart
  ggplot(aes(x = date, y = index_100, color = fct_reorder2(.f = asset_type, .x = date, .y = index_100))) +
  geom_line() +
  # the color label is long, so we're going to get rid of it to make the chart look better
  labs(color = "")

p1

You can read index charts like this as “If you invested $100 on day 1, how valuable would your investment be at day x?”

You can use all the tools that we’ve learned before to make this chart look prettier, but for now we’re just exploring so it just has to be functional.
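
For example, here’s one quick way you might dress it up later (a sketch, not the definitive version; the title and axis labels are just examples):

p1 +
  theme_minimal() +
  labs(title = "Total Return Index (100 = first common date)",
       x = NULL,
       y = "Index (100 = start)")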

What Does This Chart Tell You?

What information does this chart tell you? Take a minute and write down some bullet points of the key takeaways.

Let’s look at the longer horizon.

p2 <- asset_prices_tbl |>
  # filtering out ESGU since it has a shorter history
  filter(ticker != "ESGU") |>
  group_by(asset_type) |>
  mutate(index_100 = adjusted/first(adjusted) * 100) |>
  ggplot(aes(x = date, y = index_100, color = fct_reorder2(.f = asset_type, .x = date, .y = index_100))) +
  geom_line() +
  labs(color = "")

p2

What Does This Chart Tell You?

What information does this chart tell you?

  • What is different from the chart above?

  • What hypotheses could this bring up that you might want to test in the future?

Calculating Returns using tq_transmute

We will calculate weekly returns using tidyquant::tq_transmute(), which allows us to apply quantmod::periodReturn() to tidy data.

asset_returns <- asset_prices_tbl |>
  #group by the ticker and asset_type columns
  group_by(ticker, asset_type) |>
  # we want to calculate returns based on the `adjusted` column
  # which accounts for stock splits & dividends
  tq_transmute(select     = adjusted,
               # we want to use the `periodReturn` function
               # look at ?periodReturn for more info
               mutate_fun = periodReturn,
               # on weekly returns
               period = "weekly") |>
  # always ungroup after grouping
  ungroup()

asset_returns
# A tibble: 1,915 × 4
   ticker asset_type date       weekly.returns
   <chr>  <chr>      <date>              <dbl>
 1 ESGU   ESG Fund   2016-12-09        0      
 2 ESGU   ESG Fund   2016-12-16        0.0292 
 3 ESGU   ESG Fund   2016-12-23       -0.00725
 4 ESGU   ESG Fund   2016-12-30       -0.00373
 5 ESGU   ESG Fund   2017-01-06        0.0149 
 6 ESGU   ESG Fund   2017-01-13       -0.00221
 7 ESGU   ESG Fund   2017-01-20       -0.00382
 8 ESGU   ESG Fund   2017-01-27        0.0117 
 9 ESGU   ESG Fund   2017-02-03        0      
10 ESGU   ESG Fund   2017-02-10        0.00718
# … with 1,905 more rows

Again, let’s work hard to be lazy and make a function:

calculate_weekly_returns <- function(data, col_rename = NULL) {
  
  data |>
  tq_transmute(select     = adjusted,
               mutate_fun = periodReturn,
               period = "weekly",
               # allows you to rename the resulting column
               col_rename = col_rename)
}
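
As a quick sanity check, the helper should reproduce the tq_transmute() call above, here with the returns column renamed to asset_returns:

asset_prices_tbl |>
  group_by(ticker, asset_type) |>
  calculate_weekly_returns(col_rename = "asset_returns") |>
  ungroup()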

Should you use monthly, weekly, or daily returns?

You should use returns that fit the context of the analysis you are doing. But here are some things to think about:

  • Yearly returns could make sense if you’re looking at long time horizons (decades), especially if you’re going to be examining those returns against data that you only have on a yearly basis.
  • Monthly returns are commonly used for investment analysis, but only if you have enough datapoints. A rule of thumb might be to only use monthly returns once you have something close to 10 years of data.
  • Weekly returns are useful when you have shorter time series. They strip out some of the noise and logistical issues (e.g. time zones) with daily returns.
  • Daily returns are useful if you’re trying to analyze daily activity. But they have some challenges. First, how do you deal with returns from different time zones? Asian markets close before US markets open, and each reacts to the other’s movements. Second, daily returns have a lot of noise that can obscure the “signal” we’re looking for.

We’re using weekly data because it strips out some of the noise, but still gives us enough datapoints to work with given our short time history.
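
For reference, here’s what the same calculation would look like with monthly returns; it’s just a sketch, since with ESGU’s short history this would leave far fewer observations to work with:

asset_prices_tbl |>
  group_by(ticker, asset_type) |>
  tq_transmute(select     = adjusted,
               mutate_fun = periodReturn,
               # switching the period is the only change needed
               period     = "monthly") |>
  ungroup()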

Excess Returns

We’d all like to have excess returns, right?

But really. When we’re looking at risk-adjusted financial performance, at heart we’re asking two related questions:

  1. What were the returns of a given financial asset versus the risk-free alternative?
  2. What were the returns of a given financial asset versus a benchmark?

US Treasury bill returns are a commonly used risk-free rate, and they are what we will use here. They have minimal (though not zero) credit risk, and minimal duration risk (interest rate risk). We will use the BlackRock SHV ETF, which holds US Treasuries that mature within 1 year.

Since the Global Financial Crisis, short-term interest rates have been persistently low. So in practice, a lot of people didn’t subtract out the risk-free return, because it was pretty close to zero. But now, with a steep central bank hiking cycle, you finally get paid real money to invest in the risk-free asset.

ust_3m_yield <- "DGS3MO" |> 
  # Gets data from FRED -- the Federal Reserve's data API
  tq_get(get = "economic.data",
         from = "2008-01-01")

ust_3m_yield |> 
  ggplot(aes(x = date, y = price)) +
  geom_line(linewidth = 3, color = "blue", alpha = .7) +
  scale_y_continuous(labels = scales::label_percent(scale = 1)) +
  theme_minimal() +
  labs(title = "You Finally Get Paid To Buy T-Bills Again!",
       x = "", 
       y = "Market Yield on 3-month T-Bills",
       caption = "Data: FRED")
Warning: Removed 1 row containing missing values (`geom_line()`).

When you got paid close to 0% on T-bills, a 7% return on a financial portfolio seemed great. When you can get paid a risk-free 5% to sit in T-bills, 7% looks less impressive. Better to measure the 2% return you get in excess of the risk-free rate.

Steps for Calculating Risk-Adjusted Return Metrics

First, create a tibble with the risk-free returns and the dates.

risk_free_returns <- asset_returns |>
  filter(asset_type == "Risk Free") |>
  select(date, risk_free_returns = weekly.returns)

risk_free_returns
# A tibble: 530 × 2
   date       risk_free_returns
   <date>                 <dbl>
 1 2013-01-04        0         
 2 2013-01-11        0         
 3 2013-01-18       -0.0000906 
 4 2013-01-25       -0.0000911 
 5 2013-02-01        0.000119  
 6 2013-02-08       -0.0000910 
 7 2013-02-15        0         
 8 2013-02-22        0         
 9 2013-03-01        0.00000916
10 2013-03-08       -0.0000907 
# … with 520 more rows

Next, we’ll attach the risk free rate on the right so we can calculate the returns of all of our assets in excess of the risk-free rate.

excess_returns_tbl <- asset_returns |> 
  # filter out the risk free returns
  filter(asset_type != "Risk Free") |> 
  left_join(risk_free_returns, by = "date") |> 
  # calculate the excess returns:  How much more did the investment return versus
  # a "risk free" investment (most often short duration government t-bills)
  mutate(excess_returns = weekly.returns - risk_free_returns) |> 
  select(ticker:date, excess_returns)

excess_returns_tbl
# A tibble: 1,385 × 4
   ticker asset_type date       excess_returns
   <chr>  <chr>      <date>              <dbl>
 1 ESGU   ESG Fund   2016-12-09       0.000362
 2 ESGU   ESG Fund   2016-12-16       0.0288  
 3 ESGU   ESG Fund   2016-12-23      -0.00743 
 4 ESGU   ESG Fund   2016-12-30      -0.00373 
 5 ESGU   ESG Fund   2017-01-06       0.0146  
 6 ESGU   ESG Fund   2017-01-13      -0.00239 
 7 ESGU   ESG Fund   2017-01-20      -0.00391 
 8 ESGU   ESG Fund   2017-01-27       0.0118  
 9 ESGU   ESG Fund   2017-02-03      -0.000181
10 ESGU   ESG Fund   2017-02-10       0.00700 
# … with 1,375 more rows

We’ll create a tibble of the excess returns of the benchmark over the risk free rate.

benchmark_excess_returns <- excess_returns_tbl |> 
  filter(asset_type == "Benchmark") |> 
  select(date, benchmark_excess_returns = excess_returns)

benchmark_excess_returns
# A tibble: 530 × 2
   date       benchmark_excess_returns
   <date>                        <dbl>
 1 2013-01-04                  0.00361
 2 2013-01-11                  0.00482
 3 2013-01-18                  0.00861
 4 2013-01-25                  0.0121 
 5 2013-02-01                  0.00724
 6 2013-02-08                  0.00351
 7 2013-02-15                  0.00262
 8 2013-02-22                 -0.00209
 9 2013-03-01                  0.00163
10 2013-03-08                  0.0223 
# … with 520 more rows

Now, we have a tibble that has the inputs we need for calculating risk-adjusted return statistics using tidyquant::tq_performance():

  • Asset Returns

  • Benchmark Returns

esg_asset_excess_returns <- excess_returns_tbl |> 
  filter(str_detect(asset_type, "ESG")) |> 
  rename(asset_excess_returns = excess_returns) |> 
  left_join(benchmark_excess_returns, by = "date")

esg_asset_excess_returns
# A tibble: 855 × 5
   ticker asset_type date       asset_excess_returns benchmark_excess_returns
   <chr>  <chr>      <date>                    <dbl>                    <dbl>
 1 ESGU   ESG Fund   2016-12-09             0.000362                 0.0318  
 2 ESGU   ESG Fund   2016-12-16             0.0288                  -0.000980
 3 ESGU   ESG Fund   2016-12-23            -0.00743                  0.00232 
 4 ESGU   ESG Fund   2016-12-30            -0.00373                 -0.00885 
 5 ESGU   ESG Fund   2017-01-06             0.0146                   0.0159  
 6 ESGU   ESG Fund   2017-01-13            -0.00239                 -0.00127 
 7 ESGU   ESG Fund   2017-01-20            -0.00391                 -0.00136 
 8 ESGU   ESG Fund   2017-01-27             0.0118                   0.0102  
 9 ESGU   ESG Fund   2017-02-03            -0.000181                 0.00151 
10 ESGU   ESG Fund   2017-02-10             0.00700                  0.00900 
# … with 845 more rows

Performance Analysis

TidyQuant: tq_performance()

tq_performance() allows you to use all of the awesome functions included in the PerformanceAnalytics R package in a tidyverse setting. Read about tq_performance() in the tidyquant documentation.

esg_asset_excess_returns |> 
  # we want to make sure we're comparing both over the same time period
  filter(date >= ymd("2017-01-01")) |> 
  group_by(ticker, asset_type) |> 
  # Ra = the excess returns of the asset we're analyzing
   tq_performance(Ra = asset_excess_returns,
                 # Rb = the excess returns of the benchmark we're comparing it against
                 Rb = benchmark_excess_returns,
                 # The function we are using see tq_performance_fun_options()
                 performance_fun = InformationRatio,
                 scale = 52) |> 
  ungroup()
# A tibble: 2 × 3
  ticker asset_type              InformationRatio.1
  <chr>  <chr>                                <dbl>
1 ESGU   ESG Fund                           -0.0376
2 SUSA   ESG Fund - Long History             0.174 
esg_asset_excess_returns |> 
  # we want to make sure we're comparing both over the same time period
  filter(date >= ymd("2017-01-01")) |> 
  group_by(ticker, asset_type) |> 
   tq_performance(Ra = asset_excess_returns,
                 Rb = benchmark_excess_returns,
                 performance_fun = table.CAPM,
                 scale = 52) |> 
  ungroup()
# A tibble: 2 × 14
  ticker asset_type  Activ…¹ Alpha Annua…²  Beta `Beta-` `Beta+` Corre…³ Corre…⁴
  <chr>  <chr>         <dbl> <dbl>   <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 ESGU   ESG Fund    -0.0009  0    -0.0005 0.999   1.00     1.02   0.992       0
2 SUSA   ESG Fund -…  0.0045  1e-4  0.0042 1.00    0.964    1.01   0.991       0
# … with 4 more variables: InformationRatio <dbl>, `R-squared` <dbl>,
#   TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated variable names
#   ¹​ActivePremium, ²​AnnualizedAlpha, ³​Correlation, ⁴​`Correlationp-value`

Performance by year

Does ESG do well in some markets, and not well in others? Here’s one way to look at it.

esg_asset_returns_capm_by_year <- esg_asset_excess_returns |> 
  # we want to make sure we're comparing both over the same time period
  #filter(date >= ymd("2017-01-01")) |> 
  mutate(year = year(date)) |> 
  # group by ticker, asset_type, and by year
  group_by(ticker, asset_type, year) |> 
  # calculate the same way as before
   tq_performance(Ra = asset_excess_returns,
                 Rb = benchmark_excess_returns,
                 performance_fun = table.CAPM,
                 scale = 52) |> 
  ungroup()

esg_asset_returns_capm_by_year
# A tibble: 19 × 15
   ticker asset_t…¹  year Activ…²   Alpha Annua…³   Beta `Beta-` `Beta+` Corre…⁴
   <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1 ESGU   ESG Fund   2016 -0.105   0.0052  0.309  -0.116   4.13    0.264  -0.125
 2 ESGU   ESG Fund   2017 -0.0068  0.0013  0.0699  0.618   0.279   0.835   0.617
 3 ESGU   ESG Fund   2018 -0.0005  0       0.0004  1.01    1.03    1.06    0.991
 4 ESGU   ESG Fund   2019  0.0097  0.0001  0.0037  1.02    1.09    0.992   0.993
 5 ESGU   ESG Fund   2020  0.0391  0.0007  0.0349  0.988   0.984   0.998   0.999
 6 ESGU   ESG Fund   2021 -0.0187 -0.0004 -0.0198  1.02    1.03    1.03    0.996
 7 ESGU   ESG Fund   2022 -0.0209 -0.0004 -0.0207  1.02    0.978   1.02    0.999
 8 ESGU   ESG Fund   2023  0.0065 -0.0001 -0.0057  1.05    1.03    1.24    0.997
 9 SUSA   ESG Fund…  2013 -0.0158 -0.0003 -0.0154  1.01    0.894   0.982   0.969
10 SUSA   ESG Fund…  2014  0.0035  0.0001  0.0078  0.968   1.07    0.955   0.980
11 SUSA   ESG Fund…  2015 -0.0315 -0.0006 -0.0317  0.959   0.860   1.03    0.982
12 SUSA   ESG Fund…  2016  0.006   0.0001  0.0077  0.980   0.991   0.954   0.974
13 SUSA   ESG Fund…  2017  0.0077  0.0001  0.0035  1.02    0.818   1.08    0.920
14 SUSA   ESG Fund…  2018 -0.011  -0.0003 -0.0153  0.956   0.946   0.923   0.989
15 SUSA   ESG Fund…  2019  0.0082 -0.0002 -0.008   1.06    0.952   1.06    0.988
16 SUSA   ESG Fund…  2020  0.0601  0.001   0.0552  0.969   0.956   0.972   0.997
17 SUSA   ESG Fund…  2021  0.0169  0       0.0025  1.04    1.23    1.01    0.989
18 SUSA   ESG Fund…  2022 -0.0319 -0.0005 -0.026   1.06    0.967   1.09    0.992
19 SUSA   ESG Fund…  2023  0.0002 -0.0003 -0.017   1.08    1.24    0.909   0.985
# … with 5 more variables: `Correlationp-value` <dbl>, InformationRatio <dbl>,
#   `R-squared` <dbl>, TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated
#   variable names ¹​asset_type, ²​ActivePremium, ³​AnnualizedAlpha, ⁴​Correlation

Now you have a new time series that you can use to compare the performance over different time periods.

esg_asset_returns_capm_by_year |> 
  # filter out 2016 for ESGU because it only has a few weeks of data for that year
  filter(!(year == 2016 & ticker == "ESGU")) |> 
  ggplot(aes(x = year, y = AnnualizedAlpha, color = fct_reorder2(.f = ticker, .x = year, .y = AnnualizedAlpha))) +
  geom_line() +
  labs(color = "") +
  geom_hline(yintercept = 0, lty = "dashed")

Learning More About tq_performance

?tq_performance
tq_performance_fun_options()
$table.funs
 [1] "table.AnnualizedReturns" "table.Arbitrary"        
 [3] "table.Autocorrelation"   "table.CAPM"             
 [5] "table.CaptureRatios"     "table.Correlation"      
 [7] "table.Distributions"     "table.DownsideRisk"     
 [9] "table.DownsideRiskRatio" "table.DrawdownsRatio"   
[11] "table.HigherMoments"     "table.InformationRatio" 
[13] "table.RollingPeriods"    "table.SFM"              
[15] "table.SpecificRisk"      "table.Stats"            
[17] "table.TrailingPeriods"   "table.UpDownRatios"     
[19] "table.Variability"      

$CAPM.funs
 [1] "CAPM.alpha"       "CAPM.beta"        "CAPM.beta.bear"   "CAPM.beta.bull"  
 [5] "CAPM.CML"         "CAPM.CML.slope"   "CAPM.dynamic"     "CAPM.epsilon"    
 [9] "CAPM.jensenAlpha" "CAPM.RiskPremium" "CAPM.SML.slope"   "TimingRatio"     
[13] "MarketTiming"    

$SFM.funs
[1] "SFM.alpha"       "SFM.beta"        "SFM.CML"         "SFM.CML.slope"  
[5] "SFM.dynamic"     "SFM.epsilon"     "SFM.jensenAlpha"

$descriptive.funs
[1] "mean"           "sd"             "min"            "max"           
[5] "cor"            "mean.geometric" "mean.stderr"    "mean.LCL"      
[9] "mean.UCL"      

$annualized.funs
[1] "Return.annualized"        "Return.annualized.excess"
[3] "sd.annualized"            "SharpeRatio.annualized"  

$VaR.funs
[1] "VaR"  "ES"   "ETL"  "CDD"  "CVaR"

$moment.funs
 [1] "var"              "cov"              "skewness"         "kurtosis"        
 [5] "CoVariance"       "CoSkewness"       "CoSkewnessMatrix" "CoKurtosis"      
 [9] "CoKurtosisMatrix" "M3.MM"            "M4.MM"            "BetaCoVariance"  
[13] "BetaCoSkewness"   "BetaCoKurtosis"  

$drawdown.funs
[1] "AverageDrawdown"   "AverageLength"     "AverageRecovery"  
[4] "DrawdownDeviation" "DrawdownPeak"      "maxDrawdown"      

$Bacon.risk.funs
[1] "MeanAbsoluteDeviation" "Frequency"             "SharpeRatio"          
[4] "MSquared"              "MSquaredExcess"        "HurstIndex"           

$Bacon.regression.funs
 [1] "CAPM.alpha"       "CAPM.beta"        "CAPM.epsilon"     "CAPM.jensenAlpha"
 [5] "SystematicRisk"   "SpecificRisk"     "TotalRisk"        "TreynorRatio"    
 [9] "AppraisalRatio"   "FamaBeta"         "Selectivity"      "NetSelectivity"  

$Bacon.relative.risk.funs
[1] "ActivePremium"    "ActiveReturn"     "TrackingError"    "InformationRatio"

$Bacon.drawdown.funs
[1] "PainIndex"     "PainRatio"     "CalmarRatio"   "SterlingRatio"
[5] "BurkeRatio"    "MartinRatio"   "UlcerIndex"   

$Bacon.downside.risk.funs
 [1] "DownsideDeviation"     "DownsidePotential"     "DownsideFrequency"    
 [4] "SemiDeviation"         "SemiVariance"          "UpsideRisk"           
 [7] "UpsidePotentialRatio"  "UpsideFrequency"       "BernardoLedoitRatio"  
[10] "DRatio"                "Omega"                 "OmegaSharpeRatio"     
[13] "OmegaExcessReturn"     "SortinoRatio"          "M2Sortino"            
[16] "Kappa"                 "VolatilitySkewness"    "AdjustedSharpeRatio"  
[19] "SkewnessKurtosisRatio" "ProspectRatio"        

$misc.funs
[1] "KellyRatio"   "Modigliani"   "UpDownRatios"

tq_performance() has a lot of great functions you can use out of the box. It’s worth eventually going through the documentation for the PerformanceAnalytics R package, where these functions come from. They have been used by quants for many years and are well-tested. The formulas all come from Carl Bacon’s books Practical Portfolio Performance Measurement and Practical Risk-Adjusted Performance Measurement. If you’re going to use this package a lot, it’s worth going through the books to understand the details of each measure and how to interpret it.
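
As one example, here’s a sketch of trying a different function from the list above, table.AnnualizedReturns, which reports annualized return, standard deviation, and Sharpe ratio. Because we’re feeding it excess returns, we leave Rf at its default of 0:

esg_asset_excess_returns |>
  filter(date >= ymd("2017-01-01")) |>
  group_by(ticker, asset_type) |>
  tq_performance(Ra = asset_excess_returns,
                 # this table doesn't need a benchmark
                 Rb = NULL,
                 performance_fun = table.AnnualizedReturns,
                 # weekly data, so 52 periods per year
                 scale = 52) |>
  ungroup()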

But first, go through all the tutorials for the tidyquant package. They are straightforward, and will allow you to do a lot of cool analysis quickly.

Now make it BIGGER!

Let’s Get More Data

Let’s download our ESG ETF comparison dataset we’ve worked with before.

esg_etf_comparison_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"

esg_etf_comparison <- esg_etf_comparison_url |> 
  read_csv()
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
esg_etf_comparison 
# A tibble: 537 × 14
   ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
   <chr>  <chr>   <chr>    <dbl>   <dbl> <lgl>   <lgl>   <lgl>     <dbl> <chr>  
 1 PRU    PRUDEN… Finan…   0.537  0.106  FALSE   FALSE   FALSE     0.431 Overwe…
 2 GIS    GENERA… Consu…   0.552  0.151  FALSE   FALSE   FALSE     0.401 Overwe…
 3 K      KELLOGG Consu…   0.453  0.0592 FALSE   FALSE   FALSE     0.394 Overwe…
 4 ADP    AUTOMA… Infor…   0.649  0.312  FALSE   FALSE   FALSE     0.337 Overwe…
 5 ECL    ECOLAB… Mater…   0.441  0.118  FALSE   FALSE   FALSE     0.322 Overwe…
 6 JCI    JOHNSO… Indus…   0.416  0.112  FALSE   FALSE   FALSE     0.304 Overwe…
 7 ES     EVERSO… Utili…   0.392  0.0896 FALSE   FALSE   FALSE     0.302 Overwe…
 8 PEG    PUBLIC… Utili…   0.376  0.0929 FALSE   FALSE   FALSE     0.284 Overwe…
 9 RTX    RAYTHE… Indus…   0.677  0.401  FALSE   FALSE   FALSE     0.277 Overwe…
10 LNG    CHENIE… Energy   0.274  0      TRUE    FALSE   TRUE      0.274 Overwe…
# … with 527 more rows, 4 more variables: esg_tilt_z_score <dbl>,
#   esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
#   and abbreviated variable names ¹​company_name, ²​standard_etf, ³​in_esg_only,
#   ⁴​in_standard_only, ⁵​in_on_index_only, ⁶​esg_tilt, ⁷​esg_uw_ow

Next, we’ll read in a LOT of data.

It’s really easy to pull this data using tidyquant, but it takes a little while, so I’ve commented it out here. Run it after class when you have 10-15 minutes to spare.

# esg_data <- esg_etf_comparison |> 
#   select(ticker, company_name) |> 
#   tq_get()
# 
# esg_data

To avoid a delay, we’re going to read it in from the class GitHub data repository. It’s saved as a compressed .rds file, which is a space-efficient file format.

read_github_rds <- function(github_rds_url) {
  
  github_rds_url |> 
    url() |> 
    # decompress the gzip-compressed connection
    gzcon() |> 
    # read the R object
    readRDS()
}
equities_github_rds_url <- "https://github.com/t-emery/sais-susfin_data/raw/main/datasets/esg_equity_data_2023-02-22.rds"
equities_data <- equities_github_rds_url |> 
  read_github_rds()

equities_data
# A tibble: 1,308,557 × 4
   ticker company_name             date       adjusted
   <chr>  <chr>                    <date>        <dbl>
 1 PRU    PRUDENTIAL FINANCIAL INC 2013-01-02     38.0
 2 PRU    PRUDENTIAL FINANCIAL INC 2013-01-03     37.7
 3 PRU    PRUDENTIAL FINANCIAL INC 2013-01-04     38.3
 4 PRU    PRUDENTIAL FINANCIAL INC 2013-01-07     37.8
 5 PRU    PRUDENTIAL FINANCIAL INC 2013-01-08     37.7
 6 PRU    PRUDENTIAL FINANCIAL INC 2013-01-09     38.1
 7 PRU    PRUDENTIAL FINANCIAL INC 2013-01-10     39.2
 8 PRU    PRUDENTIAL FINANCIAL INC 2013-01-11     39.0
 9 PRU    PRUDENTIAL FINANCIAL INC 2013-01-14     38.8
10 PRU    PRUDENTIAL FINANCIAL INC 2013-01-15     38.7
# … with 1,308,547 more rows

Ooh, look at that. We’ve got 1.3 million rows of data. You can’t do that in Excel. And this is where the tidyverse really excels (no pun intended): the ease of group-wise operations. Analyzing 500 assets isn’t much harder than analyzing 2.

We don’t have data for everything, so let’s take a look at what we’re missing.

Missing Data?

The performance measures we’re running here are only meaningful if they are run on data covering the same time period. So we need to do a little bit of data cleaning first.

First, we’ll calculate some summary statistics about our data availability.

equities_data_data_availability <- equities_data |> 
  # filter out NA values
  filter(!is.na(adjusted)) |> 
  group_by(company_name) |> 
  summarize(first_date = first(date),
            last_date = last(date),
            observations = n()) |> 
  arrange(observations)

equities_data_data_availability
# A tibble: 531 × 4
   company_name              first_date last_date  observations
   <chr>                     <date>     <date>            <int>
 1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21          274
 2 RIVIAN AUTOMOTIVE INC     2021-11-10 2023-02-21          321
 3 ORGANON                   2021-05-14 2023-02-21          446
 4 SNOWFLAKE                 2020-09-16 2023-02-21          612
 5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21          684
 6 CARRIER GLOBAL CORP       2020-03-19 2023-02-21          737
 7 OTIS WORLDWIDE CORP       2020-03-19 2023-02-21          737
 8 CROWDSTRIKE HOLDINGS INC  2019-06-12 2023-02-21          931
 9 CORTEVA INC               2019-05-24 2023-02-21          943
10 UBER TECHNOLOGIES INC     2019-05-10 2023-02-21          953
# … with 521 more rows

From this we can figure out the stock tickers where we have all observations.

companies_with_all_observations <- equities_data_data_availability |> 
  # it's always better to use a construction like max(observations) instead of
  # hard-coding a number like 2552. This way, if you change your code or run it
  # again later, it doesn't need to be manually updated.
  filter(observations == max(observations))

companies_with_all_observations
# A tibble: 479 × 4
   company_name               first_date last_date  observations
   <chr>                      <date>     <date>            <int>
 1 3M                         2013-01-02 2023-02-21         2552
 2 A O SMITH CORP             2013-01-02 2023-02-21         2552
 3 ABBOTT LABORATORIES        2013-01-02 2023-02-21         2552
 4 ABBVIE INC                 2013-01-02 2023-02-21         2552
 5 ACCENTURE PLC              2013-01-02 2023-02-21         2552
 6 ACTIVISION BLIZZARD INC    2013-01-02 2023-02-21         2552
 7 ADOBE INC                  2013-01-02 2023-02-21         2552
 8 ADVANCE AUTO PARTS INC     2013-01-02 2023-02-21         2552
 9 ADVANCED MICRO DEVICES INC 2013-01-02 2023-02-21         2552
10 AES CORP                   2013-01-02 2023-02-21         2552
# … with 469 more rows

We can change the code a bit to see what companies don’t have all observations.

companies_without_all_observations <- equities_data_data_availability |> 
  filter(observations != max(observations))

companies_without_all_observations
# A tibble: 52 × 4
   company_name              first_date last_date  observations
   <chr>                     <date>     <date>            <int>
 1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21          274
 2 RIVIAN AUTOMOTIVE INC     2021-11-10 2023-02-21          321
 3 ORGANON                   2021-05-14 2023-02-21          446
 4 SNOWFLAKE                 2020-09-16 2023-02-21          612
 5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21          684
 6 CARRIER GLOBAL CORP       2020-03-19 2023-02-21          737
 7 OTIS WORLDWIDE CORP       2020-03-19 2023-02-21          737
 8 CROWDSTRIKE HOLDINGS INC  2019-06-12 2023-02-21          931
 9 CORTEVA INC               2019-05-24 2023-02-21          943
10 UBER TECHNOLOGIES INC     2019-05-10 2023-02-21          953
# … with 42 more rows

Looking at these lists, a lot of tech companies entered the index after 2013. We need to calculate these performance statistics over the same time sample, but excluding many companies from a particular industry would bias our exploratory analysis. So let’s look at results from 2020 to the present. That period is interesting because it includes the COVID shock, the ensuing Everything Rally, and the central bank tightening cycle. We are using weekly data, so this window still leaves us sufficient observations.

companies_to_analyze <- equities_data_data_availability |> 
  # filter out companies that don't have data up to the end-date
  filter(last_date == max(last_date)) |> 
  # filter out companies that have start dates after the beginning of 2020
  filter(first_date <= ymd("2020-01-01")) |> 
  select(company_name)

companies_to_analyze
# A tibble: 522 × 1
   company_name            
   <chr>                   
 1 CROWDSTRIKE HOLDINGS INC
 2 CORTEVA INC             
 3 UBER TECHNOLOGIES INC   
 4 DOW INC                 
 5 FOX CORP                
 6 MODERNA INC             
 7 EQUITABLE HOLDINGS INC  
 8 CERIDIAN HCM HOLDING INC
 9 VICI PPTYS INC          
10 MONGODB INC             
# … with 512 more rows

What are we excluding?

equities_data_data_availability |> 
  # filter for companies that have different end dates
  filter(last_date != max(last_date) |
           # or have start dates after 2020
           first_date > ymd("2020-01-01")) 
# A tibble: 9 × 4
  company_name              first_date last_date  observations
  <chr>                     <date>     <date>            <int>
1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21          274
2 RIVIAN AUTOMOTIVE INC     2021-11-10 2023-02-21          321
3 ORGANON                   2021-05-14 2023-02-21          446
4 SNOWFLAKE                 2020-09-16 2023-02-21          612
5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21          684
6 CARRIER GLOBAL CORP       2020-03-19 2023-02-21          737
7 OTIS WORLDWIDE CORP       2020-03-19 2023-02-21          737
8 TWITTER INC               2013-11-07 2022-10-27         2259
9 ABIOMED INC               2013-01-02 2022-12-23         2514

Because we’re doing exploratory analysis, and none of these are large index weights, we’re going to drop the NAs and not do much else for the moment. But keep these names in mind: if one of them stands out somewhere later, remember that it comes with data challenges.

equities_data_returns <- equities_data |>
  # filter out NA values
  filter(!is.na(adjusted)) |>
  # use a right_join() to select only the companies in our companies_to_analyze tibble we created above
  right_join(companies_to_analyze, by = join_by(company_name)) |> 
  group_by(ticker, company_name)  |> 
  # we use the function we created above, and rename the column to asset_returns
  calculate_weekly_returns(col_rename = "asset_returns") |>
  ungroup()

equities_data_returns
# A tibble: 269,977 × 4
   ticker company_name             date       asset_returns
   <chr>  <chr>                    <date>             <dbl>
 1 PRU    PRUDENTIAL FINANCIAL INC 2013-01-04       0.00893
 2 PRU    PRUDENTIAL FINANCIAL INC 2013-01-11       0.0191 
 3 PRU    PRUDENTIAL FINANCIAL INC 2013-01-18       0.00191
 4 PRU    PRUDENTIAL FINANCIAL INC 2013-01-25       0.0239 
 5 PRU    PRUDENTIAL FINANCIAL INC 2013-02-01      -0.00423
 6 PRU    PRUDENTIAL FINANCIAL INC 2013-02-08      -0.0211 
 7 PRU    PRUDENTIAL FINANCIAL INC 2013-02-15      -0.00555
 8 PRU    PRUDENTIAL FINANCIAL INC 2013-02-22      -0.0138 
 9 PRU    PRUDENTIAL FINANCIAL INC 2013-03-01      -0.0196 
10 PRU    PRUDENTIAL FINANCIAL INC 2013-03-08       0.0840 
# … with 269,967 more rows
equities_data_excess_returns <- equities_data_returns |> 
  # same as above
  left_join(risk_free_returns, by = "date") |> 
  # same as above - calculate excess returns above risk free rate
  mutate(asset_excess_returns = asset_returns - risk_free_returns) |> 
  # we don't need these columns anymore
  select(-asset_returns, -risk_free_returns) |> 
  # we already calculated benchmark excess returns above.
  left_join(benchmark_excess_returns, by = "date")

equities_data_excess_returns
# A tibble: 269,977 × 5
   ticker company_name             date       asset_excess_returns benchmark_e…¹
   <chr>  <chr>                    <date>                    <dbl>         <dbl>
 1 PRU    PRUDENTIAL FINANCIAL INC 2013-01-04              0.00893       0.00361
 2 PRU    PRUDENTIAL FINANCIAL INC 2013-01-11              0.0191        0.00482
 3 PRU    PRUDENTIAL FINANCIAL INC 2013-01-18              0.00200       0.00861
 4 PRU    PRUDENTIAL FINANCIAL INC 2013-01-25              0.0240        0.0121 
 5 PRU    PRUDENTIAL FINANCIAL INC 2013-02-01             -0.00435       0.00724
 6 PRU    PRUDENTIAL FINANCIAL INC 2013-02-08             -0.0210        0.00351
 7 PRU    PRUDENTIAL FINANCIAL INC 2013-02-15             -0.00555       0.00262
 8 PRU    PRUDENTIAL FINANCIAL INC 2013-02-22             -0.0138       -0.00209
 9 PRU    PRUDENTIAL FINANCIAL INC 2013-03-01             -0.0196        0.00163
10 PRU    PRUDENTIAL FINANCIAL INC 2013-03-08              0.0841        0.0223 
# … with 269,967 more rows, and abbreviated variable name
#   ¹​benchmark_excess_returns

table.CAPM by ticker

Now we will perform magic. In just a few lines of code, we’ll calculate a wide variety of performance analysis figures for 522 stocks. It may take a few minutes to run.

table_capm_by_ticker <- equities_data_excess_returns |> 
  group_by(ticker, company_name) |> 
  tq_performance(Ra = asset_excess_returns,
                 Rb = benchmark_excess_returns,
                 performance_fun = table.CAPM) |> 
  ungroup()

table_capm_by_ticker   
# A tibble: 522 × 14
   ticker compan…¹ Activ…²   Alpha Annua…³  Beta `Beta-` `Beta+` Corre…⁴ Corre…⁵
   <chr>  <chr>      <dbl>   <dbl>   <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 PRU    PRUDENT… -0.0249 -0.0009 -0.0439 1.46    1.49    1.62    0.766       0
 2 GIS    GENERAL… -0.0273  0.0011  0.0616 0.366   0.435   0.208   0.313       0
 3 K      KELLOGG  -0.0749  0.0002  0.0084 0.417   0.587   0.272   0.361       0
 4 ADP    AUTOMAT…  0.057   0.0012  0.0616 1.02    1.15    1.09    0.743       0
 5 ECL    ECOLAB … -0.0346 -0.0006 -0.0329 1.14    1.22    1.26    0.780       0
 6 JCI    JOHNSON… -0.0175 -0.0002 -0.0096 1.12    1.15    1.01    0.702       0
 7 ES     EVERSOU… -0.0215  0.0005  0.0279 0.708   0.894   0.883   0.534       0
 8 PEG    PUBLIC … -0.016   0.0005  0.0275 0.761   0.991   0.886   0.558       0
 9 RTX    RAYTHEO… -0.035  -0.0006 -0.0319 1.19    1.25    1.33    0.719       0
10 LNG    CHENIER…  0.0925  0.0025  0.138  0.986   0.834   0.846   0.459       0
# … with 512 more rows, 4 more variables: InformationRatio <dbl>,
#   `R-squared` <dbl>, TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated
#   variable names ¹​company_name, ²​ActivePremium, ³​AnnualizedAlpha,
#   ⁴​Correlation, ⁵​`Correlationp-value`

Connect The Datasets and Analyze

Let’s connect the per-equity table.CAPM dataset we just created with the ETF comparison dataset we’ve used previously.

etf_comparison_w_capm_table <- esg_etf_comparison |> 
  left_join(table_capm_by_ticker, by = join_by(ticker, company_name))

etf_comparison_w_capm_table 
# A tibble: 537 × 26
   ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
   <chr>  <chr>   <chr>    <dbl>   <dbl> <lgl>   <lgl>   <lgl>     <dbl> <chr>  
 1 PRU    PRUDEN… Finan…   0.537  0.106  FALSE   FALSE   FALSE     0.431 Overwe…
 2 GIS    GENERA… Consu…   0.552  0.151  FALSE   FALSE   FALSE     0.401 Overwe…
 3 K      KELLOGG Consu…   0.453  0.0592 FALSE   FALSE   FALSE     0.394 Overwe…
 4 ADP    AUTOMA… Infor…   0.649  0.312  FALSE   FALSE   FALSE     0.337 Overwe…
 5 ECL    ECOLAB… Mater…   0.441  0.118  FALSE   FALSE   FALSE     0.322 Overwe…
 6 JCI    JOHNSO… Indus…   0.416  0.112  FALSE   FALSE   FALSE     0.304 Overwe…
 7 ES     EVERSO… Utili…   0.392  0.0896 FALSE   FALSE   FALSE     0.302 Overwe…
 8 PEG    PUBLIC… Utili…   0.376  0.0929 FALSE   FALSE   FALSE     0.284 Overwe…
 9 RTX    RAYTHE… Indus…   0.677  0.401  FALSE   FALSE   FALSE     0.277 Overwe…
10 LNG    CHENIE… Energy   0.274  0      TRUE    FALSE   TRUE      0.274 Overwe…
# … with 527 more rows, 16 more variables: esg_tilt_z_score <dbl>,
#   esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
#   ActivePremium <dbl>, Alpha <dbl>, AnnualizedAlpha <dbl>, Beta <dbl>,
#   `Beta-` <dbl>, `Beta+` <dbl>, Correlation <dbl>,
#   `Correlationp-value` <dbl>, InformationRatio <dbl>, `R-squared` <dbl>,
#   TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated variable names
#   ¹​company_name, ²​standard_etf, ³​in_esg_only, ⁴​in_standard_only, …

Remember, we’re doing exploratory data analysis. We’re trying to poke & prod at the data to find interesting patterns and generate hypotheses that we can explore with more rigor at a later stage. This combined dataset gives us some fun stuff to work with.

Initial Exploration Examples

Here are two small examples of the kind of exploratory questions we can ask.

Question: Does the highest quintile of ESG-rated stocks have higher risk-adjusted returns?

table.CAPM gives us a few ways to quantify this. Make sure you look up how these metrics are calculated, and what they represent intuitively.
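
For instance, the information ratio reported by table.CAPM should be (approximately, given rounding) the active premium divided by the tracking error. Here’s a quick sketch that checks that intuition against our data:

etf_comparison_w_capm_table |>
  filter(!is.na(InformationRatio)) |>
  # InformationRatio = ActivePremium / TrackingError (up to rounding)
  mutate(ir_check = ActivePremium / TrackingError) |>
  select(ticker, ActivePremium, TrackingError, InformationRatio, ir_check) |>
  head()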

First, let’s look at AnnualizedAlpha by quintile:

etf_comparison_w_capm_table |> 
  ggplot(aes( y = fct_reorder(as.factor(esg_tilt_quantile_5), AnnualizedAlpha), 
              x = AnnualizedAlpha)) +
  geom_boxplot()
Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).

Next, let’s use InformationRatio.

etf_comparison_w_capm_table |>
  ggplot(aes( y = fct_reorder(as.factor(esg_tilt_quantile_5), InformationRatio), 
              x = InformationRatio)) +
  geom_boxplot()
Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).

What Do We Learn from This?

Remember, quintile 1 represents the most ESG-tilted stocks, and quintile 5 represents the least ESG-tilted stocks. At least during the time period we’re looking at here, it looks like the least ESG-tilted stocks have the highest risk-adjusted returns. Does this mean that ESG is worthless? It could, but not necessarily. These findings are VERY context dependent. First, remember what we are looking at: the performance of individual stocks over a specific time period. If you change the time period, even by a little, the results could change significantly. This analysis includes a huge commodity price rally in which a lot of oil companies saw record profits. Maybe that’s what we’re seeing here? Or maybe it’s the collapse of the tech bubble?

Coming up with interesting, non-obvious questions is exactly what we’re trying to do here. We could go on to test the hypothesis about oil companies or tech driving these results in further analysis.
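
One way to start pulling on that thread, sketched below, is to break the risk-adjusted returns out by sector and see whether a sector like Energy or Information Technology is doing the heavy lifting:

etf_comparison_w_capm_table |>
  filter(!is.na(AnnualizedAlpha)) |>
  group_by(sector) |>
  summarize(median_annualized_alpha = median(AnnualizedAlpha),
            companies = n()) |>
  arrange(desc(median_annualized_alpha))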

In sum: Common Sense + Context Knowledge + Quantitative Tools = Interesting Non-Obvious Insights

Conclusion

In this lesson:

  1. We learned how to use tidyquant to get financial data using tq_get().

  2. We learned how to calculate financial returns using tq_transmute().

  3. We learned the basic intuitions & recipe for doing performance analysis using tq_performance().

  4. We learned how to scale that analysis and use it to come up with interesting hypotheses using exploratory data analysis.

Now It’s Your Turn

We just scratched the surface.

  • Play around with the data and use the techniques we learned here to explore this data even further.

  • Can you think of other questions you could look into using these tools?