Financial Performance Analytics Made Easy
Does ESG Offer Better Risk-Adjusted Returns?
There are two key motivators for ESG investing:
- Aligning investors' portfolios with their values.
- Better pricing of extra-financial risks from environmental, social, and governance factors not previously widely considered in investment analysis.
The latter claim is complicated. The time horizons at which these risks manifest may be much longer than the horizons we're looking at. But we'll go ahead and see what we can learn about the risk-adjusted returns of ESG using traditional financial performance analytics tools.
We will do this two ways.
- We'll examine the toolkit for financial performance analysis using tidyquant to look at the performance of the ESG ETF versus its traditional counterpart as a benchmark.
- We'll then learn how to use performance analysis metrics as a tool for exploratory analysis with the returns of all of the stocks in the index (1.3 million observations!)
First, let's load our packages:
library(tidyverse)
library(tidyquant)
ESG Performance Analysis
In this section we will rely heavily on the tidyquant R package. This package is a gem. It can:
- Retrieve financial and economic data
- Calculate asset and portfolio returns
- Calculate performance analysis measures
And a lot more. Take the time to go through the tutorials.
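To get a quick taste of how these pieces fit together, here's a minimal sketch that chains all three steps for a single, arbitrary ticker (AAPL is just an illustration, not part of our analysis):
# prices -> monthly returns -> annualized Sharpe ratio
"AAPL" |>
  tq_get(from = "2020-01-01") |>
  select(date, adjusted) |>
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "monthly") |>
  tq_performance(Ra = monthly.returns,
                 performance_fun = SharpeRatio.annualized)
We'll walk through each of these steps properly below.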
Our Dataset
From the ETF screener we used in part one, we picked the following funds to look at. ESGU is the largest ESG fund, and the one we’ve been looking at. SUSA is another ESG fund that has a much longer history (ESGU only has approx 5 years of history). IVV is the largest regular ETF, and it’s a plain-vanilla S&P 500 tracker, so we’re using it as our equity benchmark. SHV is a short-term US Treasury ETF, and we’re using it as the risk-free rate.
assets_tbl <- tribble(~ticker, ~asset_type,
"ESGU", "ESG Fund",
"SUSA", "ESG Fund - Long History",
"IVV", "Benchmark",
"SHV", "Risk Free")
assets_tbl
# A tibble: 4 × 2
ticker asset_type
<chr> <chr>
1 ESGU ESG Fund
2 SUSA ESG Fund - Long History
3 IVV Benchmark
4 SHV Risk Free
Getting the Data
It’s this easy. Use tidyquant::tq_get().
asset_prices_tbl <- assets_tbl |>
tq_get()
asset_prices_tbl
# A tibble: 9,222 × 9
ticker asset_type date open high low close volume adjusted
<chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ESGU ESG Fund 2016-12-06 48.3 48.3 48.3 48.3 100 44.0
2 ESGU ESG Fund 2016-12-07 48.3 48.3 48.3 48.3 0 44.0
3 ESGU ESG Fund 2016-12-08 48.3 48.3 48.3 48.3 0 44.0
4 ESGU ESG Fund 2016-12-09 48.3 48.3 48.3 48.3 0 44.0
5 ESGU ESG Fund 2016-12-12 48.3 48.3 48.3 48.3 0 44.0
6 ESGU ESG Fund 2016-12-13 49.7 49.7 49.7 49.7 200 45.3
7 ESGU ESG Fund 2016-12-14 49.7 49.7 49.7 49.7 0 45.3
8 ESGU ESG Fund 2016-12-15 49.7 49.7 49.7 49.7 0 45.3
9 ESGU ESG Fund 2016-12-16 49.7 49.7 49.7 49.7 0 45.3
10 ESGU ESG Fund 2016-12-19 49.7 49.7 49.7 49.7 0 45.3
# … with 9,212 more rows
The default data source is Yahoo Finance. The adjusted column on the right is useful because it adjusts for stock splits and dividends, so it can be used to calculate total returns.
If we were doing something beyond exploratory analysis, we might want to also adjust for fees. But that’s for later.
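For instance, one rough way to net out fees later would be to subtract a pro-rated expense ratio from each period's return. A minimal sketch, using a made-up 0.20% annual fee (not any fund's actual expense ratio):
# hypothetical annual expense ratio, purely illustrative
annual_fee <- 0.0020
asset_prices_tbl |>
  group_by(ticker) |>
  # simple daily return from the adjusted price
  mutate(daily_return = adjusted / lag(adjusted) - 1,
         # deduct the fee pro-rated over ~252 trading days
         net_return = daily_return - annual_fee / 252) |>
  ungroup()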
Let’s look at our data.
asset_prices_tbl |>
# for each asset type
group_by(asset_type) |>
# what is the first date we have data?
summarize(first_date = first(date),
# what is the last date we have data?
last_date = last(date),
#how many observations do we have
observations = n())
# A tibble: 4 × 4
asset_type first_date last_date observations
<chr> <date> <date> <int>
1 Benchmark 2013-01-02 2023-02-22 2553
2 ESG Fund 2016-12-06 2023-02-22 1563
3 ESG Fund - Long History 2013-01-02 2023-02-22 2553
4 Risk Free 2013-01-02 2023-02-22 2553
We see that it pulls approximately 10 years of data by default. ESGU was launched in 2016 and has data since then.
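If you want a different window than the default, tq_get() lets you pass from and to dates. For example (a quick sketch):
assets_tbl |>
  tq_get(from = "2010-01-01", to = "2023-02-22")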
Looking At Our Data
Indexing data can be a great way to visualize relative financial performance over time. The adjusted column accounts for dividends and stock splits, and we can use that to calculate total returns. We’re going to look at gross returns and ignore management fees for the moment to keep the analysis straightforward.
p1 <- asset_prices_tbl |>
# Filter to a date where all funds have data
filter(date >= ymd("2016-12-31")) |>
# group by asset type
group_by(asset_type) |>
# construct a total return index where 100 = the first date
mutate(index_100 = adjusted/first(adjusted) * 100) |>
# make a simple line chart
ggplot(aes(x = date, y = index_100, color = fct_reorder2(.f = asset_type, .x = date, .y = index_100))) +
geom_line() +
# the color label is long, so we're going to get rid of it to make the chart look better
labs(color = "")
p1
You can read index charts like this as "If you invested $100 on day 1, how much would your investment be worth on day x?"
You can use all the tools that we’ve learned before to make this chart look prettier, but for now we’re just exploring so it just has to be functional.
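For example, a quick polish pass might look something like this (just a sketch using the standard ggplot2 helpers):
p1 +
  labs(title = "Growth of $100 Invested",
       x = "",
       y = "Total return index (100 = start date)") +
  theme_minimal()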
What information does this chart tell you? Take a minute and write down some bullet points of the key takeaways.
Let's look at the longer horizon.
p2 <- asset_prices_tbl |>
# filtering out ESGU since it has a shorter history
filter(ticker != "ESGU") |>
group_by(asset_type) |>
mutate(index_100 = adjusted/first(adjusted) * 100) |>
ggplot(aes(x = date, y = index_100, color = fct_reorder2(.f = asset_type, .x = date, .y = index_100))) +
geom_line() +
labs(color = "")
p2
What information does this chart tell you?
What is different from the chart above?
What hypotheses could this bring up that you might want to test in the future?
Calculating Returns using tq_transmute
We will calculate weekly returns using tidyquant::tq_transmute(), which allows us to use quantmod::periodReturn() with tidy data.
asset_returns <- asset_prices_tbl |>
#group by the ticker and asset_type columns
group_by(ticker, asset_type) |>
# we want to calculate returns based on the `adjusted` column
# which accounts for stock splits & dividends
tq_transmute(select = adjusted,
# we want to use the `periodReturn` function
# look at ?periodReturn for more info
mutate_fun = periodReturn,
# on weekly returns
period = "weekly") |>
# always ungroup after grouping
ungroup()
asset_returns
# A tibble: 1,915 × 4
ticker asset_type date weekly.returns
<chr> <chr> <date> <dbl>
1 ESGU ESG Fund 2016-12-09 0
2 ESGU ESG Fund 2016-12-16 0.0292
3 ESGU ESG Fund 2016-12-23 -0.00725
4 ESGU ESG Fund 2016-12-30 -0.00373
5 ESGU ESG Fund 2017-01-06 0.0149
6 ESGU ESG Fund 2017-01-13 -0.00221
7 ESGU ESG Fund 2017-01-20 -0.00382
8 ESGU ESG Fund 2017-01-27 0.0117
9 ESGU ESG Fund 2017-02-03 0
10 ESGU ESG Fund 2017-02-10 0.00718
# … with 1,905 more rows
Again, let’s work hard to be lazy and make a function:
calculate_weekly_returns <- function(data, col_rename = NULL) {
data |>
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "weekly",
# allows you to rename the resulting column
col_rename = col_rename)
}
Should you use monthly, weekly, or daily returns?
You should use returns that fit the context of the analysis you are doing. But here are some things to think about:
- Yearly returns could make sense if you're looking at long time horizons (decades), especially if you're going to be examining those returns against data that you only have on a yearly basis.
- Monthly returns are commonly used for investment analysis, but only if you have enough datapoints. A rule of thumb might be to only use monthly returns once you have something close to 10 years of data.
- Weekly returns are useful when you have shorter time series. They strip out some of the noise and logistical issues (e.g. time zones) with daily returns.
- Daily returns are useful if you're trying to analyze daily activity. But they have some challenges. First, how do you deal with returns from different time zones? Asian markets close by the time US markets open, and each reacts to the movements of the other. Second, daily returns have a lot of noise that can obscure the "signal" we're looking for.
We’re using weekly data because it strips out some of the noise, but still gives us enough datapoints to work with given our short time history.
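If you want to experiment with other frequencies later, the helper above generalizes easily. A sketch:
calculate_returns <- function(data, period = "weekly", col_rename = NULL) {
  data |>
    tq_transmute(select = adjusted,
                 mutate_fun = periodReturn,
                 # "daily", "weekly", "monthly", "quarterly", or "yearly"
                 period = period,
                 col_rename = col_rename)
}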
Excess Returns
We’d all like to have excess returns, right?
But really. When we’re looking at risk-adjusted financial performance, at heart we’re asking two related questions:
- What were the returns of a given financial asset versus the risk-free alternative?
- What were the returns of a given financial asset versus a benchmark?
US Treasury bill returns are a commonly used risk-free rate, and that's what we will use here. They have minimal (though not zero) credit risk, and minimal duration risk (interest rate risk). We will use the BlackRock SHV ETF, which holds US Treasuries that mature within 1 year.
Since the Global Financial Crisis, short-term interest rates have been persistently low. So in practice, a lot of people didn't subtract out the risk-free return, because it was pretty close to zero. But now, with a steep central bank hiking cycle, you finally get paid real money to invest in the risk-free asset.
ust_3m_yield <- "DGS3MO" |>
# Gets data from FRED, the Federal Reserve Economic Data API
tq_get(get = "economic.data",
from = "2008-01-01")
ust_3m_yield |>
ggplot(aes(x = date, y = price)) +
geom_line(linewidth = 3, color = "blue", alpha = .7) +
scale_y_continuous(labels = scales::label_percent(scale = 1)) +
theme_minimal() +
labs(title = "You Finally Get Paid To Buy T-Bills Again!",
x = "",
y = "Market Yield on 3-month T-Bills",
caption = "Data: FRED")
Warning: Removed 1 row containing missing values (`geom_line()`).
When you got paid close to 0% on T-bills, a 7% return on a financial portfolio seemed great. When you can get paid a risk-free 5% to sit in T-bills, 7% looks less impressive. Better to measure the 2% return you get in excess of the risk-free rate.
Steps for Calculating Risk-Adjusted Return Metrics
First, create a tibble with the risk-free returns and their dates.
risk_free_returns <- asset_returns |>
filter(asset_type == "Risk Free") |>
select(date, risk_free_returns = weekly.returns)
risk_free_returns
# A tibble: 530 × 2
date risk_free_returns
<date> <dbl>
1 2013-01-04 0
2 2013-01-11 0
3 2013-01-18 -0.0000906
4 2013-01-25 -0.0000911
5 2013-02-01 0.000119
6 2013-02-08 -0.0000910
7 2013-02-15 0
8 2013-02-22 0
9 2013-03-01 0.00000916
10 2013-03-08 -0.0000907
# … with 520 more rows
Next, we’ll attach the risk free rate on the right so we can calculate the returns of all of our assets in excess of the risk-free rate.
excess_returns_tbl <- asset_returns |>
# filter out the risk free returns
filter(asset_type != "Risk Free") |>
left_join(risk_free_returns, by = "date") |>
# calculate the excess returns: How much more did the investment return versus
# a "risk free" investment (most often short duration government t-bills)
mutate(excess_returns = weekly.returns - risk_free_returns) |>
select(ticker:date, excess_returns)
excess_returns_tbl
# A tibble: 1,385 × 4
ticker asset_type date excess_returns
<chr> <chr> <date> <dbl>
1 ESGU ESG Fund 2016-12-09 0.000362
2 ESGU ESG Fund 2016-12-16 0.0288
3 ESGU ESG Fund 2016-12-23 -0.00743
4 ESGU ESG Fund 2016-12-30 -0.00373
5 ESGU ESG Fund 2017-01-06 0.0146
6 ESGU ESG Fund 2017-01-13 -0.00239
7 ESGU ESG Fund 2017-01-20 -0.00391
8 ESGU ESG Fund 2017-01-27 0.0118
9 ESGU ESG Fund 2017-02-03 -0.000181
10 ESGU ESG Fund 2017-02-10 0.00700
# … with 1,375 more rows
We’ll create a tibble of the excess returns of the benchmark over the risk free rate.
benchmark_excess_returns <- excess_returns_tbl |>
filter(asset_type == "Benchmark") |>
select(date, benchmark_excess_returns = excess_returns)
benchmark_excess_returns
# A tibble: 530 × 2
date benchmark_excess_returns
<date> <dbl>
1 2013-01-04 0.00361
2 2013-01-11 0.00482
3 2013-01-18 0.00861
4 2013-01-25 0.0121
5 2013-02-01 0.00724
6 2013-02-08 0.00351
7 2013-02-15 0.00262
8 2013-02-22 -0.00209
9 2013-03-01 0.00163
10 2013-03-08 0.0223
# … with 520 more rows
Now, we have a tibble that has the inputs we need for calculating risk-adjusted return statistics using tidyquant::tq_performance():
- Asset returns
- Benchmark returns
esg_asset_excess_returns <- excess_returns_tbl |>
filter(str_detect(asset_type, "ESG")) |>
rename(asset_excess_returns = excess_returns) |>
left_join(benchmark_excess_returns, by = "date")
esg_asset_excess_returns
# A tibble: 855 × 5
ticker asset_type date asset_excess_returns benchmark_excess_returns
<chr> <chr> <date> <dbl> <dbl>
1 ESGU ESG Fund 2016-12-09 0.000362 0.0318
2 ESGU ESG Fund 2016-12-16 0.0288 -0.000980
3 ESGU ESG Fund 2016-12-23 -0.00743 0.00232
4 ESGU ESG Fund 2016-12-30 -0.00373 -0.00885
5 ESGU ESG Fund 2017-01-06 0.0146 0.0159
6 ESGU ESG Fund 2017-01-13 -0.00239 -0.00127
7 ESGU ESG Fund 2017-01-20 -0.00391 -0.00136
8 ESGU ESG Fund 2017-01-27 0.0118 0.0102
9 ESGU ESG Fund 2017-02-03 -0.000181 0.00151
10 ESGU ESG Fund 2017-02-10 0.00700 0.00900
# … with 845 more rows
Performance Analysis
TidyQuant: tq_performance()
tq_performance() allows you to use all of the awesome functions included in the PerformanceAnalytics R package in a tidyverse setting. Read about tq_performance() in the tidyquant documentation.
esg_asset_excess_returns |>
# we want to make sure we're comparing both over the same time period
filter(date >= ymd("2017-01-01")) |>
group_by(ticker, asset_type) |>
# Ra = the excess returns of the asset we're analyzing
tq_performance(Ra = asset_excess_returns,
# Rb = the excess returns of the benchmark we're comparing it against
Rb = benchmark_excess_returns,
# The function we are using see tq_performance_fun_options()
performance_fun = InformationRatio,
scale = 52) |>
ungroup()
# A tibble: 2 × 3
ticker asset_type InformationRatio.1
<chr> <chr> <dbl>
1 ESGU ESG Fund -0.0376
2 SUSA ESG Fund - Long History 0.174
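The same recipe works for other single-value measures. For example, here's a sketch of an annualized Sharpe ratio on the same excess returns (because Ra is already an excess return, the function's default risk-free rate of 0 is what we want):
esg_asset_excess_returns |>
  filter(date >= ymd("2017-01-01")) |>
  group_by(ticker, asset_type) |>
  tq_performance(Ra = asset_excess_returns,
                 performance_fun = SharpeRatio.annualized,
                 scale = 52) |>
  ungroup()
Next, table.CAPM gives us a whole table of CAPM-style statistics in one call: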
esg_asset_excess_returns |>
# we want to make sure we're comparing both over the same time period
filter(date >= ymd("2017-01-01")) |>
group_by(ticker, asset_type) |>
tq_performance(Ra = asset_excess_returns,
Rb = benchmark_excess_returns,
performance_fun = table.CAPM,
scale = 52) |>
ungroup()
# A tibble: 2 × 14
ticker asset_type Activ…¹ Alpha Annua…² Beta `Beta-` `Beta+` Corre…³ Corre…⁴
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ESGU ESG Fund -0.0009 0 -0.0005 0.999 1.00 1.02 0.992 0
2 SUSA ESG Fund -… 0.0045 1e-4 0.0042 1.00 0.964 1.01 0.991 0
# … with 4 more variables: InformationRatio <dbl>, `R-squared` <dbl>,
# TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated variable names
# ¹ActivePremium, ²AnnualizedAlpha, ³Correlation, ⁴`Correlationp-value`
Performance by year
Does ESG do well in some markets, and not well in others? Here’s one way to look at it.
esg_asset_returns_capm_by_year <- esg_asset_excess_returns |>
# we want to make sure we're comparing both over the same time period
#filter(date >= ymd("2017-01-01")) |>
mutate(year = year(date)) |>
# group by ticker, asset_type, and by year
group_by(ticker, asset_type, year) |>
# calculate the same way as before
tq_performance(Ra = asset_excess_returns,
Rb = benchmark_excess_returns,
performance_fun = table.CAPM,
scale = 52) |>
ungroup()
esg_asset_returns_capm_by_year
# A tibble: 19 × 15
ticker asset_t…¹ year Activ…² Alpha Annua…³ Beta `Beta-` `Beta+` Corre…⁴
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ESGU ESG Fund 2016 -0.105 0.0052 0.309 -0.116 4.13 0.264 -0.125
2 ESGU ESG Fund 2017 -0.0068 0.0013 0.0699 0.618 0.279 0.835 0.617
3 ESGU ESG Fund 2018 -0.0005 0 0.0004 1.01 1.03 1.06 0.991
4 ESGU ESG Fund 2019 0.0097 0.0001 0.0037 1.02 1.09 0.992 0.993
5 ESGU ESG Fund 2020 0.0391 0.0007 0.0349 0.988 0.984 0.998 0.999
6 ESGU ESG Fund 2021 -0.0187 -0.0004 -0.0198 1.02 1.03 1.03 0.996
7 ESGU ESG Fund 2022 -0.0209 -0.0004 -0.0207 1.02 0.978 1.02 0.999
8 ESGU ESG Fund 2023 0.0065 -0.0001 -0.0057 1.05 1.03 1.24 0.997
9 SUSA ESG Fund… 2013 -0.0158 -0.0003 -0.0154 1.01 0.894 0.982 0.969
10 SUSA ESG Fund… 2014 0.0035 0.0001 0.0078 0.968 1.07 0.955 0.980
11 SUSA ESG Fund… 2015 -0.0315 -0.0006 -0.0317 0.959 0.860 1.03 0.982
12 SUSA ESG Fund… 2016 0.006 0.0001 0.0077 0.980 0.991 0.954 0.974
13 SUSA ESG Fund… 2017 0.0077 0.0001 0.0035 1.02 0.818 1.08 0.920
14 SUSA ESG Fund… 2018 -0.011 -0.0003 -0.0153 0.956 0.946 0.923 0.989
15 SUSA ESG Fund… 2019 0.0082 -0.0002 -0.008 1.06 0.952 1.06 0.988
16 SUSA ESG Fund… 2020 0.0601 0.001 0.0552 0.969 0.956 0.972 0.997
17 SUSA ESG Fund… 2021 0.0169 0 0.0025 1.04 1.23 1.01 0.989
18 SUSA ESG Fund… 2022 -0.0319 -0.0005 -0.026 1.06 0.967 1.09 0.992
19 SUSA ESG Fund… 2023 0.0002 -0.0003 -0.017 1.08 1.24 0.909 0.985
# … with 5 more variables: `Correlationp-value` <dbl>, InformationRatio <dbl>,
# `R-squared` <dbl>, TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated
# variable names ¹asset_type, ²ActivePremium, ³AnnualizedAlpha, ⁴Correlation
Now you have a new time series that you can use to compare the performance over different time periods.
esg_asset_returns_capm_by_year |>
# filter out 2016 for ESGU because it only has a few weeks of data for that year
filter(!(year == 2016 & ticker == "ESGU")) |>
ggplot(aes(x = year, y = AnnualizedAlpha, color = fct_reorder2(.f = ticker, .x = year, .y = AnnualizedAlpha))) +
geom_line() +
labs(color = "") +
geom_hline(yintercept = 0, lty = "dashed")
Learning More About tq_performance
?tq_performance
tq_performance_fun_options()
$table.funs
[1] "table.AnnualizedReturns" "table.Arbitrary"
[3] "table.Autocorrelation" "table.CAPM"
[5] "table.CaptureRatios" "table.Correlation"
[7] "table.Distributions" "table.DownsideRisk"
[9] "table.DownsideRiskRatio" "table.DrawdownsRatio"
[11] "table.HigherMoments" "table.InformationRatio"
[13] "table.RollingPeriods" "table.SFM"
[15] "table.SpecificRisk" "table.Stats"
[17] "table.TrailingPeriods" "table.UpDownRatios"
[19] "table.Variability"
$CAPM.funs
[1] "CAPM.alpha" "CAPM.beta" "CAPM.beta.bear" "CAPM.beta.bull"
[5] "CAPM.CML" "CAPM.CML.slope" "CAPM.dynamic" "CAPM.epsilon"
[9] "CAPM.jensenAlpha" "CAPM.RiskPremium" "CAPM.SML.slope" "TimingRatio"
[13] "MarketTiming"
$SFM.funs
[1] "SFM.alpha" "SFM.beta" "SFM.CML" "SFM.CML.slope"
[5] "SFM.dynamic" "SFM.epsilon" "SFM.jensenAlpha"
$descriptive.funs
[1] "mean" "sd" "min" "max"
[5] "cor" "mean.geometric" "mean.stderr" "mean.LCL"
[9] "mean.UCL"
$annualized.funs
[1] "Return.annualized" "Return.annualized.excess"
[3] "sd.annualized" "SharpeRatio.annualized"
$VaR.funs
[1] "VaR" "ES" "ETL" "CDD" "CVaR"
$moment.funs
[1] "var" "cov" "skewness" "kurtosis"
[5] "CoVariance" "CoSkewness" "CoSkewnessMatrix" "CoKurtosis"
[9] "CoKurtosisMatrix" "M3.MM" "M4.MM" "BetaCoVariance"
[13] "BetaCoSkewness" "BetaCoKurtosis"
$drawdown.funs
[1] "AverageDrawdown" "AverageLength" "AverageRecovery"
[4] "DrawdownDeviation" "DrawdownPeak" "maxDrawdown"
$Bacon.risk.funs
[1] "MeanAbsoluteDeviation" "Frequency" "SharpeRatio"
[4] "MSquared" "MSquaredExcess" "HurstIndex"
$Bacon.regression.funs
[1] "CAPM.alpha" "CAPM.beta" "CAPM.epsilon" "CAPM.jensenAlpha"
[5] "SystematicRisk" "SpecificRisk" "TotalRisk" "TreynorRatio"
[9] "AppraisalRatio" "FamaBeta" "Selectivity" "NetSelectivity"
$Bacon.relative.risk.funs
[1] "ActivePremium" "ActiveReturn" "TrackingError" "InformationRatio"
$Bacon.drawdown.funs
[1] "PainIndex" "PainRatio" "CalmarRatio" "SterlingRatio"
[5] "BurkeRatio" "MartinRatio" "UlcerIndex"
$Bacon.downside.risk.funs
[1] "DownsideDeviation" "DownsidePotential" "DownsideFrequency"
[4] "SemiDeviation" "SemiVariance" "UpsideRisk"
[7] "UpsidePotentialRatio" "UpsideFrequency" "BernardoLedoitRatio"
[10] "DRatio" "Omega" "OmegaSharpeRatio"
[13] "OmegaExcessReturn" "SortinoRatio" "M2Sortino"
[16] "Kappa" "VolatilitySkewness" "AdjustedSharpeRatio"
[19] "SkewnessKurtosisRatio" "ProspectRatio"
$misc.funs
[1] "KellyRatio" "Modigliani" "UpDownRatios"
tq_performance() has a lot of great functions you can use out of the box. It's worth eventually going through the documentation for the PerformanceAnalytics R package, where these functions come from. They have been used by quants for many years and are well-tested. The formulas all come from Carl Bacon's books Practical Portfolio Performance Measurement and Practical Risk-Adjusted Performance Measurement. If you're going to use this package a lot, it's worth going through the books to understand the details of each measure, and how you should interpret it.
But first, go through all the tutorials for the tidyquant package. They are straightforward, and will allow you to do a lot of cool analysis quickly.
Now make it BIGGER!
Let’s Get More Data
Let’s download our ESG ETF comparison dataset we’ve worked with before.
esg_etf_comparison_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"
esg_etf_comparison <- esg_etf_comparison_url |>
read_csv()
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
esg_etf_comparison
# A tibble: 537 × 14
ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
<chr> <chr> <chr> <dbl> <dbl> <lgl> <lgl> <lgl> <dbl> <chr>
1 PRU PRUDEN… Finan… 0.537 0.106 FALSE FALSE FALSE 0.431 Overwe…
2 GIS GENERA… Consu… 0.552 0.151 FALSE FALSE FALSE 0.401 Overwe…
3 K KELLOGG Consu… 0.453 0.0592 FALSE FALSE FALSE 0.394 Overwe…
4 ADP AUTOMA… Infor… 0.649 0.312 FALSE FALSE FALSE 0.337 Overwe…
5 ECL ECOLAB… Mater… 0.441 0.118 FALSE FALSE FALSE 0.322 Overwe…
6 JCI JOHNSO… Indus… 0.416 0.112 FALSE FALSE FALSE 0.304 Overwe…
7 ES EVERSO… Utili… 0.392 0.0896 FALSE FALSE FALSE 0.302 Overwe…
8 PEG PUBLIC… Utili… 0.376 0.0929 FALSE FALSE FALSE 0.284 Overwe…
9 RTX RAYTHE… Indus… 0.677 0.401 FALSE FALSE FALSE 0.277 Overwe…
10 LNG CHENIE… Energy 0.274 0 TRUE FALSE TRUE 0.274 Overwe…
# … with 527 more rows, 4 more variables: esg_tilt_z_score <dbl>,
# esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
# and abbreviated variable names ¹company_name, ²standard_etf, ³in_esg_only,
# ⁴in_standard_only, ⁵in_on_index_only, ⁶esg_tilt, ⁷esg_uw_ow
Next, we’ll read in a LOT of data.
It’s really easy to pull this data using tidyquant, but it takes a little while, so I’ve commented it out here. Run this after class when you have 10-15 minutes for it to run.
# esg_data <- esg_etf_comparison |>
# select(ticker, company_name) |>
# tq_get()
#
# esg_data
To avoid a delay, we're going to read it in from the class GitHub data repository. It's saved as a compressed .rds file, which is a space-efficient file format.
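As an aside, a compressed .rds file like this can be written with base R's saveRDS(), which gzip-compresses by default. A sketch (not the actual script used to create the class file):
# after running the commented-out download above:
# saveRDS(esg_data, "esg_equity_data.rds")
And here's a small helper to read it straight from GitHub: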
read_github_rds <- function(github_rds_url) {
github_rds_url |>
url() |>
#unzips it
gzcon() |>
#reads it
readRDS()
}
equities_github_rds_url <- "https://github.com/t-emery/sais-susfin_data/raw/main/datasets/esg_equity_data_2023-02-22.rds"
equities_data <- equities_github_rds_url |>
read_github_rds()
equities_data
# A tibble: 1,308,557 × 4
ticker company_name date adjusted
<chr> <chr> <date> <dbl>
1 PRU PRUDENTIAL FINANCIAL INC 2013-01-02 38.0
2 PRU PRUDENTIAL FINANCIAL INC 2013-01-03 37.7
3 PRU PRUDENTIAL FINANCIAL INC 2013-01-04 38.3
4 PRU PRUDENTIAL FINANCIAL INC 2013-01-07 37.8
5 PRU PRUDENTIAL FINANCIAL INC 2013-01-08 37.7
6 PRU PRUDENTIAL FINANCIAL INC 2013-01-09 38.1
7 PRU PRUDENTIAL FINANCIAL INC 2013-01-10 39.2
8 PRU PRUDENTIAL FINANCIAL INC 2013-01-11 39.0
9 PRU PRUDENTIAL FINANCIAL INC 2013-01-14 38.8
10 PRU PRUDENTIAL FINANCIAL INC 2013-01-15 38.7
# … with 1,308,547 more rows
Ooh, look at that. We've got 1.3 million rows of data. You can't do that in Excel. And this is where the tidyverse really excels (no pun intended): the ease of group-wise operations. Analyzing 500 assets isn't much harder than analyzing 2.
We don't have data for everything, so let's take a look at what data we're missing.
Missing Data?
The performance measures we’re running here are only meaningful if they are run on data covering the same time period. So we need to do a little bit of data cleaning first.
First, we’ll calculate some summary statistics about our data availability.
equities_data_data_availability <- equities_data |>
# filter out NA values
filter(!is.na(adjusted)) |>
group_by(company_name) |>
summarize(first_date = first(date),
last_date = last(date),
observations = n()) |>
arrange(observations)
equities_data_data_availability
# A tibble: 531 × 4
company_name first_date last_date observations
<chr> <date> <date> <int>
1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21 274
2 RIVIAN AUTOMOTIVE INC 2021-11-10 2023-02-21 321
3 ORGANON 2021-05-14 2023-02-21 446
4 SNOWFLAKE 2020-09-16 2023-02-21 612
5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21 684
6 CARRIER GLOBAL CORP 2020-03-19 2023-02-21 737
7 OTIS WORLDWIDE CORP 2020-03-19 2023-02-21 737
8 CROWDSTRIKE HOLDINGS INC 2019-06-12 2023-02-21 931
9 CORTEVA INC 2019-05-24 2023-02-21 943
10 UBER TECHNOLOGIES INC 2019-05-10 2023-02-21 953
# … with 521 more rows
From this we can figure out the stock tickers where we have all observations.
companies_with_all_observations <- equities_data_data_availability |>
# it's always better to use a construction like max(observations) instead of hard-coding a number like 2552. This way if you change your code, or run it again later, it doesn't need to be manually changed.
filter(observations == max(observations))
companies_with_all_observations
# A tibble: 479 × 4
company_name first_date last_date observations
<chr> <date> <date> <int>
1 3M 2013-01-02 2023-02-21 2552
2 A O SMITH CORP 2013-01-02 2023-02-21 2552
3 ABBOTT LABORATORIES 2013-01-02 2023-02-21 2552
4 ABBVIE INC 2013-01-02 2023-02-21 2552
5 ACCENTURE PLC 2013-01-02 2023-02-21 2552
6 ACTIVISION BLIZZARD INC 2013-01-02 2023-02-21 2552
7 ADOBE INC 2013-01-02 2023-02-21 2552
8 ADVANCE AUTO PARTS INC 2013-01-02 2023-02-21 2552
9 ADVANCED MICRO DEVICES INC 2013-01-02 2023-02-21 2552
10 AES CORP 2013-01-02 2023-02-21 2552
# … with 469 more rows
We can change the code a bit to see what companies don’t have all observations.
companies_without_all_observations <- equities_data_data_availability |>
filter(observations != max(observations))
companies_without_all_observations
# A tibble: 52 × 4
company_name first_date last_date observations
<chr> <date> <date> <int>
1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21 274
2 RIVIAN AUTOMOTIVE INC 2021-11-10 2023-02-21 321
3 ORGANON 2021-05-14 2023-02-21 446
4 SNOWFLAKE 2020-09-16 2023-02-21 612
5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21 684
6 CARRIER GLOBAL CORP 2020-03-19 2023-02-21 737
7 OTIS WORLDWIDE CORP 2020-03-19 2023-02-21 737
8 CROWDSTRIKE HOLDINGS INC 2019-06-12 2023-02-21 931
9 CORTEVA INC 2019-05-24 2023-02-21 943
10 UBER TECHNOLOGIES INC 2019-05-10 2023-02-21 953
# … with 42 more rows
Looking at these lists, we see that a lot of tech companies entered the index after 2013. We need to calculate these performance statistics over the same time sample, but excluding a lot of companies from a particular industry would bias our exploratory analysis. So let's look at results from 2020 to the present. It's an interesting period because it includes the COVID shock, the ensuing Everything Rally, and central bank tightening. We are using weekly data, so that leaves sufficient observations.
companies_to_analyze <- equities_data_data_availability |>
# filter out companies that don't have data up to the end-date
filter(last_date == max(last_date)) |>
# filter out companies that have start dates after the beginning of 2020
filter(first_date <= ymd("2020-01-01")) |>
select(company_name)
companies_to_analyze
# A tibble: 522 × 1
company_name
<chr>
1 CROWDSTRIKE HOLDINGS INC
2 CORTEVA INC
3 UBER TECHNOLOGIES INC
4 DOW INC
5 FOX CORP
6 MODERNA INC
7 EQUITABLE HOLDINGS INC
8 CERIDIAN HCM HOLDING INC
9 VICI PPTYS INC
10 MONGODB INC
# … with 512 more rows
What are we excluding?
equities_data_data_availability |>
# filter for companies that have different end dates
filter(last_date != max(last_date) |
# or have start dates after 2020
first_date > ymd("2020-01-01"))
# A tibble: 9 × 4
company_name first_date last_date observations
<chr> <date> <date> <int>
1 CONSTELLATION ENERGY CORP 2022-01-19 2023-02-21 274
2 RIVIAN AUTOMOTIVE INC 2021-11-10 2023-02-21 321
3 ORGANON 2021-05-14 2023-02-21 446
4 SNOWFLAKE 2020-09-16 2023-02-21 612
5 ZOOMINFO TECHNOLOGIES INC 2020-06-04 2023-02-21 684
6 CARRIER GLOBAL CORP 2020-03-19 2023-02-21 737
7 OTIS WORLDWIDE CORP 2020-03-19 2023-02-21 737
8 TWITTER INC 2013-11-07 2022-10-27 2259
9 ABIOMED INC 2013-01-02 2022-12-23 2514
Because we're doing exploratory analysis, and none of these are large index weights, we're going to drop the NAs but not do much else for the moment. Just make a note of these names: if they stick out somewhere later, remember that there are data challenges with them.
equities_data_returns <- equities_data |>
# filter out NA values
filter(!is.na(adjusted)) |>
# use a right_join() to select only the companies in our companies_to_analyze tibble we created above
right_join(companies_to_analyze, by = join_by(company_name)) |>
group_by(ticker, company_name) |>
# we use the function we created above, and rename the column to asset_returns
calculate_weekly_returns(col_rename = "asset_returns") |>
ungroup()
equities_data_returns
# A tibble: 269,977 × 4
ticker company_name date asset_returns
<chr> <chr> <date> <dbl>
1 PRU PRUDENTIAL FINANCIAL INC 2013-01-04 0.00893
2 PRU PRUDENTIAL FINANCIAL INC 2013-01-11 0.0191
3 PRU PRUDENTIAL FINANCIAL INC 2013-01-18 0.00191
4 PRU PRUDENTIAL FINANCIAL INC 2013-01-25 0.0239
5 PRU PRUDENTIAL FINANCIAL INC 2013-02-01 -0.00423
6 PRU PRUDENTIAL FINANCIAL INC 2013-02-08 -0.0211
7 PRU PRUDENTIAL FINANCIAL INC 2013-02-15 -0.00555
8 PRU PRUDENTIAL FINANCIAL INC 2013-02-22 -0.0138
9 PRU PRUDENTIAL FINANCIAL INC 2013-03-01 -0.0196
10 PRU PRUDENTIAL FINANCIAL INC 2013-03-08 0.0840
# … with 269,967 more rows
equities_data_excess_returns <- equities_data_returns |>
# same as above
left_join(risk_free_returns, by = "date") |>
# same as above - calculate excess returns above risk free rate
mutate(asset_excess_returns = asset_returns - risk_free_returns) |>
# we don't need these columns anymore
select(-asset_returns, -risk_free_returns) |>
# we already calculated benchmark excess returns above.
left_join(benchmark_excess_returns, by = "date")
equities_data_excess_returns
# A tibble: 269,977 × 5
ticker company_name date asset_excess_returns benchmark_e…¹
<chr> <chr> <date> <dbl> <dbl>
1 PRU PRUDENTIAL FINANCIAL INC 2013-01-04 0.00893 0.00361
2 PRU PRUDENTIAL FINANCIAL INC 2013-01-11 0.0191 0.00482
3 PRU PRUDENTIAL FINANCIAL INC 2013-01-18 0.00200 0.00861
4 PRU PRUDENTIAL FINANCIAL INC 2013-01-25 0.0240 0.0121
5 PRU PRUDENTIAL FINANCIAL INC 2013-02-01 -0.00435 0.00724
6 PRU PRUDENTIAL FINANCIAL INC 2013-02-08 -0.0210 0.00351
7 PRU PRUDENTIAL FINANCIAL INC 2013-02-15 -0.00555 0.00262
8 PRU PRUDENTIAL FINANCIAL INC 2013-02-22 -0.0138 -0.00209
9 PRU PRUDENTIAL FINANCIAL INC 2013-03-01 -0.0196 0.00163
10 PRU PRUDENTIAL FINANCIAL INC 2013-03-08 0.0841 0.0223
# … with 269,967 more rows, and abbreviated variable name
# ¹benchmark_excess_returns
table.CAPM by ticker
Now we will perform magic. In just a few lines of code, we’ll calculate a wide variety of performance analysis figures for 522 stocks. It may take a few minutes to run.
table_capm_by_ticker <- equities_data_excess_returns |>
group_by(ticker, company_name) |>
tq_performance(Ra = asset_excess_returns,
Rb = benchmark_excess_returns,
performance_fun = table.CAPM) |>
ungroup()
table_capm_by_ticker
# A tibble: 522 × 14
ticker compan…¹ Activ…² Alpha Annua…³ Beta `Beta-` `Beta+` Corre…⁴ Corre…⁵
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 PRU PRUDENT… -0.0249 -0.0009 -0.0439 1.46 1.49 1.62 0.766 0
2 GIS GENERAL… -0.0273 0.0011 0.0616 0.366 0.435 0.208 0.313 0
3 K KELLOGG -0.0749 0.0002 0.0084 0.417 0.587 0.272 0.361 0
4 ADP AUTOMAT… 0.057 0.0012 0.0616 1.02 1.15 1.09 0.743 0
5 ECL ECOLAB … -0.0346 -0.0006 -0.0329 1.14 1.22 1.26 0.780 0
6 JCI JOHNSON… -0.0175 -0.0002 -0.0096 1.12 1.15 1.01 0.702 0
7 ES EVERSOU… -0.0215 0.0005 0.0279 0.708 0.894 0.883 0.534 0
8 PEG PUBLIC … -0.016 0.0005 0.0275 0.761 0.991 0.886 0.558 0
9 RTX RAYTHEO… -0.035 -0.0006 -0.0319 1.19 1.25 1.33 0.719 0
10 LNG CHENIER… 0.0925 0.0025 0.138 0.986 0.834 0.846 0.459 0
# … with 512 more rows, 4 more variables: InformationRatio <dbl>,
# `R-squared` <dbl>, TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated
# variable names ¹company_name, ²ActivePremium, ³AnnualizedAlpha,
# ⁴Correlation, ⁵`Correlationp-value`
Connect The Datasets and Analyze
Let’s connect our new dataset that we created with table.CAPM per equity with the ETF comparison dataset we’ve used previously.
etf_comparison_w_capm_table <- esg_etf_comparison |>
left_join(table_capm_by_ticker, by = join_by(ticker, company_name))
etf_comparison_w_capm_table
# A tibble: 537 × 26
ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
<chr> <chr> <chr> <dbl> <dbl> <lgl> <lgl> <lgl> <dbl> <chr>
1 PRU PRUDEN… Finan… 0.537 0.106 FALSE FALSE FALSE 0.431 Overwe…
2 GIS GENERA… Consu… 0.552 0.151 FALSE FALSE FALSE 0.401 Overwe…
3 K KELLOGG Consu… 0.453 0.0592 FALSE FALSE FALSE 0.394 Overwe…
4 ADP AUTOMA… Infor… 0.649 0.312 FALSE FALSE FALSE 0.337 Overwe…
5 ECL ECOLAB… Mater… 0.441 0.118 FALSE FALSE FALSE 0.322 Overwe…
6 JCI JOHNSO… Indus… 0.416 0.112 FALSE FALSE FALSE 0.304 Overwe…
7 ES EVERSO… Utili… 0.392 0.0896 FALSE FALSE FALSE 0.302 Overwe…
8 PEG PUBLIC… Utili… 0.376 0.0929 FALSE FALSE FALSE 0.284 Overwe…
9 RTX RAYTHE… Indus… 0.677 0.401 FALSE FALSE FALSE 0.277 Overwe…
10 LNG CHENIE… Energy 0.274 0 TRUE FALSE TRUE 0.274 Overwe…
# … with 527 more rows, 16 more variables: esg_tilt_z_score <dbl>,
# esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
# ActivePremium <dbl>, Alpha <dbl>, AnnualizedAlpha <dbl>, Beta <dbl>,
# `Beta-` <dbl>, `Beta+` <dbl>, Correlation <dbl>,
# `Correlationp-value` <dbl>, InformationRatio <dbl>, `R-squared` <dbl>,
# TrackingError <dbl>, TreynorRatio <dbl>, and abbreviated variable names
# ¹company_name, ²standard_etf, ³in_esg_only, ⁴in_standard_only, …
Remember, we're doing exploratory data analysis. We're trying to poke & prod at the data to find interesting patterns and generate hypotheses that we can explore with more rigor at a later stage. This combined dataset gives us some fun stuff to work with.
Initial Exploration Examples
Here are two small examples of the kind of exploratory questions we can ask.
Question: Does the highest quintile of ESG-rated stocks have higher risk-adjusted returns?
table.CAPM gives us a few ways to quantify this. Make sure you look up how these metrics are calculated, and what they represent intuitively.
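To build some intuition for one of them: the information ratio is (roughly) the annualized active return divided by the annualized tracking error. Here's a back-of-the-envelope version computed by hand; it won't match PerformanceAnalytics exactly because the package annualizes geometrically, but it should be in the same ballpark:
equities_data_excess_returns |>
  group_by(ticker, company_name) |>
  summarize(active_return = mean(asset_excess_returns - benchmark_excess_returns, na.rm = TRUE) * 52,
            tracking_error = sd(asset_excess_returns - benchmark_excess_returns, na.rm = TRUE) * sqrt(52),
            information_ratio = active_return / tracking_error,
            .groups = "drop")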
First, let’s look at AnnualizedAlpha by quintile:
etf_comparison_w_capm_table |>
ggplot(aes( y = fct_reorder(as.factor(esg_tilt_quantile_5), AnnualizedAlpha),
x = AnnualizedAlpha)) +
geom_boxplot()
Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).
Next, let’s use InformationRatio.
etf_comparison_w_capm_table |>
ggplot(aes( y = fct_reorder(as.factor(esg_tilt_quantile_5), InformationRatio),
x = InformationRatio)) +
geom_boxplot()
Warning: Removed 15 rows containing non-finite values (`stat_boxplot()`).
What Do We Learn from This?
Remember, quintile 1 represents the most ESG stocks, and quintile 5 represents the least ESG stocks. At least during the time period that we're looking at here, it looks like the least ESG stocks have the highest risk-adjusted returns. Does this mean that ESG is worthless? It could, but not necessarily. These findings are VERY context dependent. First, remember what we are looking at: the performance of individual stocks over a specific time period. If you change the time period, even by a little, the results could change significantly. This analysis includes a huge commodity price rally where a lot of oil companies saw record profits. Maybe that's what we're seeing here? Or maybe it's the collapse of the tech bubble?
Coming up with interesting, non-obvious questions is exactly what we’re trying to do here. We could go on to test the hypothesis about oil companies or tech driving these results in further analysis.
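One quick way to start poking at that hypothesis is to slice the same combined table by sector. A sketch:
etf_comparison_w_capm_table |>
  group_by(sector) |>
  summarize(median_annualized_alpha = median(AnnualizedAlpha, na.rm = TRUE),
            n_companies = n()) |>
  arrange(desc(median_annualized_alpha))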
In sum: Common Sense + Context Knowledge + Quantitative Tools = Interesting Non-Obvious Insights
Conclusion
In this lesson:
- We learned how to use tidyquant to get financial data using tq_get().
- We learned how to calculate financial returns using tq_transmute().
- We learned the basic intuitions & recipe for doing performance analysis using tq_performance().
- We learned how to scale that analysis and use it to come up with interesting hypotheses using exploratory data analysis.
We just scratched the surface.
Play around with the data and use the techniques we learned here to explore this data even further.
Can you think of other questions you could look into using these tools?
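One idea to get you started: plot each company's ESG tilt against its risk-adjusted performance and see whether any relationship jumps out. A sketch:
etf_comparison_w_capm_table |>
  ggplot(aes(x = esg_tilt_z_score, y = AnnualizedAlpha)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "ESG tilt (z-score)", y = "Annualized alpha vs. benchmark")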