BlackRock ESG ETF Data, Part 3

Author

Okung Obang

Part I: Setup

Load R Package

Code

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.2.2

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.2     ✔ forcats 0.5.2

Warning: package 'ggplot2' was built under R version 4.2.2

Warning: package 'tidyr' was built under R version 4.2.2

Warning: package 'purrr' was built under R version 4.2.2

Warning: package 'dplyr' was built under R version 4.2.2

Warning: package 'stringr' was built under R version 4.2.2

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Load Data

Code

# assign the url to `github_raw_csv_url`
github_raw_csv_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/blackrock_etf_screener_2022-08-30.csv"

# read in the data, and assign it to the object `blackrock_etf_data`
blackrock_etf_data <- read_csv(github_raw_csv_url)

Rows: 393 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): ticker, name, incept_date, net_assets_as_of, asset_class, sub_asse...
dbl  (8): gross_expense_ratio_percent, net_expense_ratio_percent, net_assets...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

blackrock_etf_data <- blackrock_etf_data |> 
  # we are transforming both date columns (currently character strings) into date objects
  # so we can work with them.
  # this syntax is a bit confusing, but selects all columns containing `date` and applies
  # lubridate::mdy() function to them to turn them into date objects. 
  mutate(across(contains("date"), lubridate::mdy)) |>
  # Billions is a more useful magnitude than millions, so we'll create a column with 
  # the assets in billions by dividing by `net_assets_millions` by 1,000 (10^3)
  # If we wanted trillions, we could divide by 1,000,000 (10^6)
  mutate(net_assets_usd_bn = net_assets_usd_mn/10^3) |> 
  # this column doesn't add anything to our analysis - it says that the data is from 8/30/22
  select(-net_assets_as_of)

mini_blackrock_data <- blackrock_etf_data |> 
  # group by whether the fund is an ESG fund or not
  group_by(is_esg) |> 
  # take the top 5 from each group, by net assets
  slice_max(order_by = net_assets_usd_bn, n = 5) |> 
  # select the following columns 
  select(ticker, fund_name = name_wo_ishares_etf, asset_class, sub_asset_class, region, incept_date, net_assets_usd_bn,
         msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |> 
  # rename to `co2_intensity` because the full name is a mouthful, if descriptive.
  rename(co2_intensity = msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |> 
  # always good to ungroup() if you've used a group_by().  We'll discuss later.
  ungroup()

Adding missing grouping variables: `is_esg`

Part II: Finding Insights Through Aggregation

Looking at Mini Dataset

Code

mini_blackrock_data |>
  glimpse()

Rows: 10
Columns: 9
$ is_esg            <chr> "ESG Fund", "ESG Fund", "ESG Fund", "ESG Fund", "ESG…
$ ticker            <chr> "ESGU", "ESGD", "ICLN", "ESGE", "DSI", "IVV", "IEFA"…
$ fund_name         <chr> "ESG Aware MSCI USA", "ESG Aware MSCI EAFE", "Global…
$ asset_class       <chr> "Equity", "Equity", "Equity", "Equity", "Equity", "E…
$ sub_asset_class   <chr> "Large/Mid Cap", "Large/Mid Cap", "All Cap", "Large/…
$ region            <chr> "North America", "Global", "Global", "Global", "Nort…
$ incept_date       <date> 2016-12-01, 2016-06-28, 2008-06-24, 2016-06-28, 200…
$ net_assets_usd_bn <dbl> 22.376688, 6.425673, 5.628011, 4.233544, 3.688930, 2…
$ co2_intensity     <dbl> 103.60, 103.83, 265.82, 167.71, 72.73, 148.34, 126.6…

Code

funds_by_assets_bn <- mini_blackrock_data |> 
  select(fund_name, is_esg, region, net_assets_usd_bn) 

funds_by_assets_bn

# A tibble: 10 × 4
   fund_name                  is_esg       region        net_assets_usd_bn
   <chr>                      <chr>        <chr>                     <dbl>
 1 ESG Aware MSCI USA         ESG Fund     North America             22.4 
 2 ESG Aware MSCI EAFE        ESG Fund     Global                     6.43
 3 Global Clean Energy        ESG Fund     Global                     5.63
 4 ESG Aware MSCI EM          ESG Fund     Global                     4.23
 5 MSCI KLD 400 Social        ESG Fund     North America              3.69
 6 Core S&P 500               Regular Fund North America            298.  
 7 Core MSCI EAFE             Regular Fund Global                    84.2 
 8 Core U.S. Aggregate Bond   Regular Fund North America             82.3 
 9 Core S&P Small-Cap         Regular Fund North America             66.5 
10 Core MSCI Emerging Markets Regular Fund Global                    64.9

Feature Engineering

Code

funds_by_assets_bn |> 
  mutate(value_z_score = (net_assets_usd_bn-mean(net_assets_usd_bn))/sd(net_assets_usd_bn),
         value_rank = rank(net_assets_usd_bn),
         value_percentile = percent_rank(net_assets_usd_bn),
         value_ntile_5 = ntile(net_assets_usd_bn, 5),
         value_pct_total = net_assets_usd_bn/sum(net_assets_usd_bn)) |> 
  arrange(value_rank)

# A tibble: 10 × 9
   fund_name       is_esg region net_a…¹ value…² value…³ value…⁴ value…⁵ value…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 MSCI KLD 400 S… ESG F… North…    3.69 -0.677        1   0           1 0.00578
 2 ESG Aware MSCI… ESG F… Global    4.23 -0.671        2   0.111       1 0.00664
 3 Global Clean E… ESG F… Global    5.63 -0.655        3   0.222       2 0.00882
 4 ESG Aware MSCI… ESG F… Global    6.43 -0.646        4   0.333       2 0.0101 
 5 ESG Aware MSCI… ESG F… North…   22.4  -0.466        5   0.444       3 0.0351 
 6 Core MSCI Emer… Regul… Global   64.9   0.0126       6   0.556       3 0.102  
 7 Core S&P Small… Regul… North…   66.5   0.0307       7   0.667       4 0.104  
 8 Core U.S. Aggr… Regul… North…   82.3   0.209        8   0.778       4 0.129  
 9 Core MSCI EAFE  Regul… Global   84.2   0.230        9   0.889       5 0.132  
10 Core S&P 500    Regul… North…  298.    2.63        10   1           5 0.467  
# … with abbreviated variable names ¹net_assets_usd_bn, ²value_z_score,
#   ³value_rank, ⁴value_percentile, ⁵value_ntile_5, ⁶value_pct_total

Code

funds_by_assets_bn |> 
  mutate(value_z_score = (net_assets_usd_bn-mean(net_assets_usd_bn))/sd(net_assets_usd_bn) 
         * -1, # multiply by negative 1
         value_rank = rank(net_assets_usd_bn * -1), # multiply by negative 1
         value_percentile = percent_rank(net_assets_usd_bn * -1), # multiply by negative 1
         value_ntile_5 = ntile(net_assets_usd_bn * -1, 5), # multiply by negative 1
         value_pct_total = net_assets_usd_bn/sum(net_assets_usd_bn) # not relevant
         ) |> 
  arrange(value_rank)

# A tibble: 10 × 9
   fund_name       is_esg region net_a…¹ value…² value…³ value…⁴ value…⁵ value…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 Core S&P 500    Regul… North…  298.   -2.63         1   0           1 0.467  
 2 Core MSCI EAFE  Regul… Global   84.2  -0.230        2   0.111       1 0.132  
 3 Core U.S. Aggr… Regul… North…   82.3  -0.209        3   0.222       2 0.129  
 4 Core S&P Small… Regul… North…   66.5  -0.0307       4   0.333       2 0.104  
 5 Core MSCI Emer… Regul… Global   64.9  -0.0126       5   0.444       3 0.102  
 6 ESG Aware MSCI… ESG F… North…   22.4   0.466        6   0.556       3 0.0351 
 7 ESG Aware MSCI… ESG F… Global    6.43  0.646        7   0.667       4 0.0101 
 8 Global Clean E… ESG F… Global    5.63  0.655        8   0.778       4 0.00882
 9 ESG Aware MSCI… ESG F… Global    4.23  0.671        9   0.889       5 0.00664
10 MSCI KLD 400 S… ESG F… North…    3.69  0.677       10   1           5 0.00578
# … with abbreviated variable names ¹net_assets_usd_bn, ²value_z_score,
#   ³value_rank, ⁴value_percentile, ⁵value_ntile_5, ⁶value_pct_total

Creating Functions

Code

# add the argument
plus_one <- function(x) {
  # define what you want it to do
  x + 1
}

plus_one(7)

[1] 8

Function I: `add_rank_features()`

Code

add_rank_features <- function(data, value_var = value, rank_by_lowest = TRUE, 
                              quantile_n = 5) {
  

  # If rank_by_lowest is set to TRUE, as it is by default, then it will use this first code chunk 
if (rank_by_lowest)
  data |> 
    mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
             sd({{ value_var }}),
         "{{value_var}}_rank" := rank({{ value_var }}), 
         "{{value_var}}_percentile" := percent_rank({{ value_var }}),
         "{{value_var}}_quantile_{{quantile_n}}" := ntile({{ value_var }}, 
                                                          {{ quantile_n }}),
         "{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }})) |> 
    arrange({{ value_var }})
  
   # If rank_by_lowest is set to FALSE, as it is by default, then it will use this second code chunk
else
  
  data |> 
    mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
             sd({{ value_var }}) * -1, # multiply by -1
           # put negative sign
           "{{value_var}}_rank" := rank(-{{ value_var }}), 
           # put negative sign
           "{{value_var}}_percentile" := percent_rank(-{{ value_var }}),
           # put negative sign
           "{{value_var}}_quantile_{{quantile_n}}" := ntile(-{{ value_var }}, 
                                                            {{ quantile_n }}), 
           # no change, not relevant for pct_total
           "{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }})) |> 
    arrange({{ value_var }}*-1)
    
}

Code

funds_by_assets_bn |> 
  add_rank_features(value_var = net_assets_usd_bn)

# A tibble: 10 × 9
   fund_name       is_esg region net_a…¹ net_a…² net_a…³ net_a…⁴ net_a…⁵ net_a…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 MSCI KLD 400 S… ESG F… North…    3.69 -0.677        1   0           1 0.00578
 2 ESG Aware MSCI… ESG F… Global    4.23 -0.671        2   0.111       1 0.00664
 3 Global Clean E… ESG F… Global    5.63 -0.655        3   0.222       2 0.00882
 4 ESG Aware MSCI… ESG F… Global    6.43 -0.646        4   0.333       2 0.0101 
 5 ESG Aware MSCI… ESG F… North…   22.4  -0.466        5   0.444       3 0.0351 
 6 Core MSCI Emer… Regul… Global   64.9   0.0126       6   0.556       3 0.102  
 7 Core S&P Small… Regul… North…   66.5   0.0307       7   0.667       4 0.104  
 8 Core U.S. Aggr… Regul… North…   82.3   0.209        8   0.778       4 0.129  
 9 Core MSCI EAFE  Regul… Global   84.2   0.230        9   0.889       5 0.132  
10 Core S&P 500    Regul… North…  298.    2.63        10   1           5 0.467  
# … with abbreviated variable names ¹net_assets_usd_bn,
#   ²net_assets_usd_bn_z_score, ³net_assets_usd_bn_rank,
#   ⁴net_assets_usd_bn_percentile, ⁵net_assets_usd_bn_quantile_5,
#   ⁶net_assets_usd_bn_pct_total

Code

funds_by_assets_bn |> 
  add_rank_features(value_var = net_assets_usd_bn, 
                    # Now it will rank by highest value equals 1
                    rank_by_lowest = FALSE, 
                    # let's divide it into 10 quantiles (deciles)
                    quantile_n = 10)

# A tibble: 10 × 9
   fund_name       is_esg region net_a…¹ net_a…² net_a…³ net_a…⁴ net_a…⁵ net_a…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 Core S&P 500    Regul… North…  298.   -2.63         1   0           1 0.467  
 2 Core MSCI EAFE  Regul… Global   84.2  -0.230        2   0.111       2 0.132  
 3 Core U.S. Aggr… Regul… North…   82.3  -0.209        3   0.222       3 0.129  
 4 Core S&P Small… Regul… North…   66.5  -0.0307       4   0.333       4 0.104  
 5 Core MSCI Emer… Regul… Global   64.9  -0.0126       5   0.444       5 0.102  
 6 ESG Aware MSCI… ESG F… North…   22.4   0.466        6   0.556       6 0.0351 
 7 ESG Aware MSCI… ESG F… Global    6.43  0.646        7   0.667       7 0.0101 
 8 Global Clean E… ESG F… Global    5.63  0.655        8   0.778       8 0.00882
 9 ESG Aware MSCI… ESG F… Global    4.23  0.671        9   0.889       9 0.00664
10 MSCI KLD 400 S… ESG F… North…    3.69  0.677       10   1          10 0.00578
# … with abbreviated variable names ¹net_assets_usd_bn,
#   ²net_assets_usd_bn_z_score, ³net_assets_usd_bn_rank,
#   ⁴net_assets_usd_bn_percentile, ⁵net_assets_usd_bn_quantile_10,
#   ⁶net_assets_usd_bn_pct_total

Creating Features by Group

Code

funds_by_assets_bn |> 
  group_by(is_esg) |> 
  add_rank_features(value_var = net_assets_usd_bn, 
                    # Now it will rank by highest value equals 1
                    rank_by_lowest = FALSE)

# A tibble: 10 × 9
# Groups:   is_esg [2]
   fund_name       is_esg region net_a…¹ net_a…² net_a…³ net_a…⁴ net_a…⁵ net_a…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 Core S&P 500    Regul… North…  298.    -1.78        1    0          1  0.500 
 2 Core MSCI EAFE  Regul… Global   84.2    0.348       2    0.25       2  0.141 
 3 Core U.S. Aggr… Regul… North…   82.3    0.367       3    0.5        3  0.138 
 4 Core S&P Small… Regul… North…   66.5    0.525       4    0.75       4  0.112 
 5 Core MSCI Emer… Regul… Global   64.9    0.541       5    1          5  0.109 
 6 ESG Aware MSCI… ESG F… North…   22.4   -1.77        1    0          1  0.528 
 7 ESG Aware MSCI… ESG F… Global    6.43   0.261       2    0.25       2  0.152 
 8 Global Clean E… ESG F… Global    5.63   0.362       3    0.5        3  0.133 
 9 ESG Aware MSCI… ESG F… Global    4.23   0.540       4    0.75       4  0.100 
10 MSCI KLD 400 S… ESG F… North…    3.69   0.609       5    1          5  0.0871
# … with abbreviated variable names ¹net_assets_usd_bn,
#   ²net_assets_usd_bn_z_score, ³net_assets_usd_bn_rank,
#   ⁴net_assets_usd_bn_percentile, ⁵net_assets_usd_bn_quantile_5,
#   ⁶net_assets_usd_bn_pct_total

Understanding the Output

What is the percent of the grouped total for the largest ESG fund and for the largest non-ESG fund?

Code

funds_by_assets_bn |> 
  group_by(is_esg, region) |> 
  add_rank_features(value_var = net_assets_usd_bn, 
                    # Now it will rank by highest value equals 1
                    rank_by_lowest = FALSE) |> 
  # arrange first by is_esg, second by region, and third by rank so you can easily
  # see the ranking within each sub-group
  arrange(is_esg, region, net_assets_usd_bn_rank)

# A tibble: 10 × 9
# Groups:   is_esg, region [4]
   fund_name       is_esg region net_a…¹ net_a…² net_a…³ net_a…⁴ net_a…⁵ net_a…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 ESG Aware MSCI… ESG F… Global    6.43  -0.898       1     0         1   0.395
 2 Global Clean E… ESG F… Global    5.63  -0.179       2     0.5       2   0.346
 3 ESG Aware MSCI… ESG F… Global    4.23   1.08        3     1         3   0.260
 4 ESG Aware MSCI… ESG F… North…   22.4   -0.707       1     0         1   0.858
 5 MSCI KLD 400 S… ESG F… North…    3.69   0.707       2     1         2   0.142
 6 Core MSCI EAFE  Regul… Global   84.2   -0.707       1     0         1   0.565
 7 Core MSCI Emer… Regul… Global   64.9    0.707       2     1         2   0.435
 8 Core S&P 500    Regul… North…  298.    -1.15        1     0         1   0.667
 9 Core U.S. Aggr… Regul… North…   82.3    0.515       2     0.5       2   0.184
10 Core S&P Small… Regul… North…   66.5    0.637       3     1         3   0.149
# … with abbreviated variable names ¹net_assets_usd_bn,
#   ²net_assets_usd_bn_z_score, ³net_assets_usd_bn_rank,
#   ⁴net_assets_usd_bn_percentile, ⁵net_assets_usd_bn_quantile_5,
#   ⁶net_assets_usd_bn_pct_total

Understanding the Output

1) What is the second largest ESG fund with a global focus?

The second largest ESG fund with a global focus is Global Clean Energy.

2) What other grouping might be useful to try with this data? Try it.

It might be useful to group the funds by carbon intensity.

Code

funds_by_co2 <- mini_blackrock_data |>
   select(fund_name, is_esg, region, co2_intensity)
funds_by_co2

# A tibble: 10 × 4
   fund_name                  is_esg       region        co2_intensity
   <chr>                      <chr>        <chr>                 <dbl>
 1 ESG Aware MSCI USA         ESG Fund     North America         104. 
 2 ESG Aware MSCI EAFE        ESG Fund     Global                104. 
 3 Global Clean Energy        ESG Fund     Global                266. 
 4 ESG Aware MSCI EM          ESG Fund     Global                168. 
 5 MSCI KLD 400 Social        ESG Fund     North America          72.7
 6 Core S&P 500               Regular Fund North America         148. 
 7 Core MSCI EAFE             Regular Fund Global                127. 
 8 Core U.S. Aggregate Bond   Regular Fund North America         283. 
 9 Core S&P Small-Cap         Regular Fund North America         133. 
10 Core MSCI Emerging Markets Regular Fund Global                369.

Code

funds_by_co2 |>
 add_rank_features(value_var = co2_intensity,
                    rank_by_lowest = FALSE) |>
   arrange(is_esg, region, co2_intensity_rank)

# A tibble: 10 × 9
   fund_name       is_esg region co2_i…¹ co2_i…² co2_i…³ co2_i…⁴ co2_i…⁵ co2_i…⁶
   <chr>           <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <int>   <dbl>
 1 Global Clean E… ESG F… Global   266.  -0.923        3   0.222       2  0.150 
 2 ESG Aware MSCI… ESG F… Global   168.   0.0998       4   0.333       2  0.0946
 3 ESG Aware MSCI… ESG F… Global   104.   0.766        8   0.778       4  0.0586
 4 ESG Aware MSCI… ESG F… North…   104.   0.768        9   0.889       5  0.0584
 5 MSCI KLD 400 S… ESG F… North…    72.7  1.09        10   1           5  0.0410
 6 Core MSCI Emer… Regul… Global   369.  -2.00         1   0           1  0.208 
 7 Core MSCI EAFE  Regul… Global   127.   0.528        7   0.667       4  0.0714
 8 Core U.S. Aggr… Regul… North…   283.  -1.10         2   0.111       1  0.159 
 9 Core S&P 500    Regul… North…   148.   0.302        5   0.444       3  0.0837
10 Core S&P Small… Regul… North…   133.   0.464        6   0.556       3  0.0749
# … with abbreviated variable names ¹co2_intensity, ²co2_intensity_z_score,
#   ³co2_intensity_rank, ⁴co2_intensity_percentile, ⁵co2_intensity_quantile_5,
#   ⁶co2_intensity_pct_total

Looking at this tidied data set, it seems like Global Clean Energy, despite being the largest ESG Fund, is ranked third in carbon intensity which is surprising.

Aggregation: Summarizing Data By Group

Code

funds_by_assets_bn |> 
  group_by(is_esg) |> 
  # always default to using na.rm = TRUE, otherwise NA values 
  # (very common in the wild) will trip up your calculations
  summarize(avg_assets = mean(net_assets_usd_bn, na.rm = TRUE))

# A tibble: 2 × 2
  is_esg       avg_assets
  <chr>             <dbl>
1 ESG Fund           8.47
2 Regular Fund     119.

Code

funds_by_assets_bn |> 
  group_by(is_esg) |> 
  summarize(avg = mean(net_assets_usd_bn, na.rm = TRUE),
            std_dev = sd(net_assets_usd_bn, na.rm = TRUE),
            min = min(net_assets_usd_bn, na.rm = TRUE),
            max = max(net_assets_usd_bn, na.rm = TRUE),
            pctile_25 = quantile(net_assets_usd_bn, .25, na.rm = TRUE),
            # median is the same as the 50th percentile above
            median = median(net_assets_usd_bn, na.rm = TRUE),
            pctile_75 = quantile(net_assets_usd_bn, .75, na.rm = TRUE))

# A tibble: 2 × 8
  is_esg          avg std_dev   min   max pctile_25 median pctile_75
  <chr>         <dbl>   <dbl> <dbl> <dbl>     <dbl>  <dbl>     <dbl>
1 ESG Fund       8.47    7.85  3.69  22.4      4.23   5.63      6.43
2 Regular Fund 119.    100.   64.9  298.      66.5   82.3      84.2

Function 2: `calculate_summary_stats()`

Code

calculate_summary_stats <- function(data, value_var = value) {
  data |> 
    summarize(avg = mean({{ value_var }}, na.rm = TRUE),
              std_dev = sd({{ value_var }}, na.rm = TRUE),
              min = min({{ value_var }}, na.rm = TRUE),
              max = max({{ value_var }}, na.rm = TRUE),
              pctile_25 = quantile({{ value_var }}, .25, na.rm = TRUE),
              median = median({{ value_var }}, na.rm = TRUE),
              pctile_75 = quantile({{ value_var }}, .75, na.rm = TRUE)) |> 
  # ungrouping is helpful so we don't have to do so afterwards. Save time. 
  ungroup()
}

Code

funds_by_assets_bn |> 
  group_by(is_esg) |> 
  calculate_summary_stats(value_var = net_assets_usd_bn)

# A tibble: 2 × 8
  is_esg          avg std_dev   min   max pctile_25 median pctile_75
  <chr>         <dbl>   <dbl> <dbl> <dbl>     <dbl>  <dbl>     <dbl>
1 ESG Fund       8.47    7.85  3.69  22.4      4.23   5.63      6.43
2 Regular Fund 119.    100.   64.9  298.      66.5   82.3      84.2

Code

funds_by_assets_bn |> 
  group_by(is_esg, region) |> 
  calculate_summary_stats(value_var = net_assets_usd_bn)

`summarise()` has grouped output by 'is_esg'. You can override using the
`.groups` argument.

# A tibble: 4 × 9
  is_esg       region           avg std_dev   min    max pctile…¹ median pctil…²
  <chr>        <chr>          <dbl>   <dbl> <dbl>  <dbl>    <dbl>  <dbl>   <dbl>
1 ESG Fund     Global          5.43    1.11  4.23   6.43     4.93   5.63    6.03
2 ESG Fund     North America  13.0    13.2   3.69  22.4      8.36  13.0    17.7 
3 Regular Fund Global         74.6    13.6  64.9   84.2     69.7   74.6    79.4 
4 Regular Fund North America 149.    129.   66.5  298.      74.4   82.3   190.  
# … with abbreviated variable names ¹pctile_25, ²pctile_75

Understanding the Output

1) What is the size of the median North American ESG fund in our sample?

The median size of the NA ESG fund is 13.02 billion USD in assets.

Comparing Data Aggregates

pivot_longer(): column names -> names; values in column called values.

pivot_wider(): specify column where column names are going to come from using names_from, and the column where values comes from values_from

Code

vignette("pivot")

starting httpd help server ... done

Step 1: Calculate Summary Statistics

Code

funds_by_assets_bn |> 
  group_by(is_esg) |> 
  calculate_summary_stats(value_var = net_assets_usd_bn)

# A tibble: 2 × 8
  is_esg          avg std_dev   min   max pctile_25 median pctile_75
  <chr>         <dbl>   <dbl> <dbl> <dbl>     <dbl>  <dbl>     <dbl>
1 ESG Fund       8.47    7.85  3.69  22.4      4.23   5.63      6.43
2 Regular Fund 119.    100.   64.9  298.      66.5   82.3      84.2

Step 2: Pivot Data with `pivot_longer()`

Compare the two groups of ESG and Regular Funds

Code

funds_by_assets_bn |> 
  # group by the feature(s) you want summary statistics for
  group_by(is_esg) |> 
  # use the function we built to calculate those summary stats.
  calculate_summary_stats(value_var = net_assets_usd_bn) |> 
  # pivot the data longer (tidy format) so that all of the summary stats names are in one and the values are in another.
  pivot_longer(cols = avg:pctile_75, names_to = "aggregates")

# A tibble: 14 × 3
   is_esg       aggregates  value
   <chr>        <chr>       <dbl>
 1 ESG Fund     avg          8.47
 2 ESG Fund     std_dev      7.85
 3 ESG Fund     min          3.69
 4 ESG Fund     max         22.4 
 5 ESG Fund     pctile_25    4.23
 6 ESG Fund     median       5.63
 7 ESG Fund     pctile_75    6.43
 8 Regular Fund avg        119.  
 9 Regular Fund std_dev    100.  
10 Regular Fund min         64.9 
11 Regular Fund max        298.  
12 Regular Fund pctile_25   66.5 
13 Regular Fund median      82.3 
14 Regular Fund pctile_75   84.2

Step 3: Pivot with `pivot_wider()`

Code

funds_by_assets_bn |> 
  # group by the feature(s) you want summary statistics for
  group_by(is_esg) |> 
  # use the function we built to calculate those summary stats.
  calculate_summary_stats(value_var = net_assets_usd_bn) |> 
  # pivot the data longer (tidy format) so that all of the summary stats names are in one and the values are in another.
  pivot_longer(cols = avg:pctile_75, names_to = "aggregates") |> 
  # then pivot the data into wide format where the two groups you are comparing are columns
  pivot_wider(names_from = is_esg, values_from = value)

# A tibble: 7 × 3
  aggregates `ESG Fund` `Regular Fund`
  <chr>           <dbl>          <dbl>
1 avg              8.47          119. 
2 std_dev          7.85          100. 
3 min              3.69           64.9
4 max             22.4           298. 
5 pctile_25        4.23           66.5
6 median           5.63           82.3
7 pctile_75        6.43           84.2

Step 4: Use janitor::clean_names()

Code

 # install.packages("janitor")

Code

funds_by_assets_bn |> 
  # group by the feature(s) you want summary statistics for
  group_by(is_esg) |> 
  # use the function we built to calculate those summary stats.
  calculate_summary_stats(value_var = net_assets_usd_bn) |> 
  # pivot the data longer (tidy format) so that all of the summary stats names are in one and the values are in another.
  pivot_longer(cols = avg:pctile_75, names_to = "aggregates") |> 
  # then pivot the data into wide format where the two groups you are comparing are columns
  pivot_wider(names_from = is_esg, values_from = value) |> 
   # make names snake_case which is easier to work with in the mutate function
  janitor::clean_names()

# A tibble: 7 × 3
  aggregates esg_fund regular_fund
  <chr>         <dbl>        <dbl>
1 avg            8.47        119. 
2 std_dev        7.85        100. 
3 min            3.69         64.9
4 max           22.4         298. 
5 pctile_25      4.23         66.5
6 median         5.63         82.3
7 pctile_75      6.43         84.2

Step 5: Use `mutate()` to make meaningful comparisons

Code

funds_by_assets_bn |> 
  # group by the feature(s) you want summary statistics for
  group_by(is_esg) |> 
  # use the function we built to calculate those summary stats.
  calculate_summary_stats(value_var = net_assets_usd_bn) |> 
  # pivot the data longer (tidy format) so that all of the summary stats names are in one and the values are in another.
  pivot_longer(cols = avg:pctile_75, names_to = "aggregates") |> 
  # then pivot the data into wide format where the two groups you are comparing are columns
  pivot_wider(names_from = is_esg, values_from = value) |> 
   # make names snake_case which is easier to work with in the mutate function
  janitor::clean_names() |> 
  # use your judgement to figure out the best way to compare the two groups
  mutate(esg_fund_pct_of_regular = esg_fund/regular_fund * 100,
         regular_times_larger_than_esg = regular_fund/esg_fund)

# A tibble: 7 × 5
  aggregates esg_fund regular_fund esg_fund_pct_of_regular regular_times_large…¹
  <chr>         <dbl>        <dbl>                   <dbl>                 <dbl>
1 avg            8.47        119.                     7.11                  14.1
2 std_dev        7.85        100.                     7.83                  12.8
3 min            3.69         64.9                    5.68                  17.6
4 max           22.4         298.                     7.52                  13.3
5 pctile_25      4.23         66.5                    6.36                  15.7
6 median         5.63         82.3                    6.83                  14.6
7 pctile_75      6.43         84.2                    7.63                  13.1
# … with abbreviated variable name ¹regular_times_larger_than_esg

Understanding the Output

1) How much larger is the largest regular fund than the largest ESG fund?

The largest regular fund is about 13.3 times larger than the largest ESG fund.

2) How much bigger is the average regular fund in our sample than the largest ESG fund?

The average regular fund in the sample is about 5.3 times larger than the largest ESG fund.

3) Which ones seem the most relevant to you?

Looking at the average regular fund to the largest ESG fund seems more relevant to me because if we want to see trends towards ESG objectives, we should hope that the average regular fund’s size to decrease relative to ESG funds.

Part III: Your Turn

Challenge 1: Use Tools on the Full Data-set

Code

blackrock_etf_data |>
  glimpse()

Rows: 393
Columns: 22
$ ticker                                                                            <chr> …
$ name                                                                              <chr> …
$ incept_date                                                                       <date> …
$ gross_expense_ratio_percent                                                       <dbl> …
$ net_expense_ratio_percent                                                         <dbl> …
$ net_assets_usd_mn                                                                 <dbl> …
$ asset_class                                                                       <chr> …
$ sub_asset_class                                                                   <chr> …
$ region                                                                            <chr> …
$ market                                                                            <chr> …
$ location                                                                          <chr> …
$ investment_style                                                                  <chr> …
$ sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc <chr> …
$ msci_esg_quality_score_0_10                                                       <dbl> …
$ msci_weighted_average_carbon_intensity_tons_co2e_m_sales                          <dbl> …
$ msci_esg_percent_coverage                                                         <dbl> …
$ sustainable_classification                                                        <chr> …
$ name_wo_ishares_etf                                                               <chr> …
$ is_esg                                                                            <chr> …
$ years_from_inception                                                              <dbl> …
$ year_launched                                                                     <dbl> …
$ net_assets_usd_bn                                                                 <dbl> …

Code

funds_by_assets_bn <- blackrock_etf_data |>
  select(name, is_esg, region, net_assets_usd_mn, asset_class) |>
  mutate(net_assets_usd_bn = net_assets_usd_mn/100) |>
  group_by(asset_class, is_esg)
funds_by_assets_bn

# A tibble: 393 × 6
# Groups:   asset_class, is_esg [8]
   name                                   is_esg  region net_a…¹ asset…² net_a…³
   <chr>                                  <chr>   <chr>    <dbl> <chr>     <dbl>
 1 iShares Core S&P 500 ETF               Regula… North… 297663. Equity    2977.
 2 iShares Core MSCI EAFE ETF             Regula… Global  84222. Equity     842.
 3 iShares Core U.S. Aggregate Bond ETF   Regula… North…  82344. Fixed …    823.
 4 iShares Core S&P Small-Cap ETF         Regula… North…  66533. Equity     665.
 5 iShares Core MSCI Emerging Markets ETF Regula… Global  64920. Equity     649.
 6 iShares Russell 1000 Growth ETF        Regula… North…  61831. Equity     618.
 7 iShares Core S&P Mid-Cap ETF           Regula… North…  61424. Equity     614.
 8 iShares Russell 2000 ETF               Regula… North…  53048. Equity     530.
 9 iShares Russell 1000 Value ETF         Regula… North…  51913. Equity     519.
10 iShares MSCI EAFE ETF                  Regula… Global  44144. Equity     441.
# … with 383 more rows, and abbreviated variable names ¹net_assets_usd_mn,
#   ²asset_class, ³net_assets_usd_bn

Code

funds_by_assets_bn |>
  ggplot(mapping=aes(x=asset_class, y=net_assets_usd_bn, fill=is_esg)) + geom_col()

Code

funds_by_year <- blackrock_etf_data |>
  select(name, is_esg, year_launched, net_assets_usd_mn) |>
  mutate(net_assets_usd_bn = net_assets_usd_mn/100)
funds_by_year

# A tibble: 393 × 5
   name                                   is_esg       year_la…¹ net_a…² net_a…³
   <chr>                                  <chr>            <dbl>   <dbl>   <dbl>
 1 iShares Core S&P 500 ETF               Regular Fund      2000 297663.   2977.
 2 iShares Core MSCI EAFE ETF             Regular Fund      2012  84222.    842.
 3 iShares Core U.S. Aggregate Bond ETF   Regular Fund      2003  82344.    823.
 4 iShares Core S&P Small-Cap ETF         Regular Fund      2000  66533.    665.
 5 iShares Core MSCI Emerging Markets ETF Regular Fund      2012  64920.    649.
 6 iShares Russell 1000 Growth ETF        Regular Fund      2000  61831.    618.
 7 iShares Core S&P Mid-Cap ETF           Regular Fund      2000  61424.    614.
 8 iShares Russell 2000 ETF               Regular Fund      2000  53048.    530.
 9 iShares Russell 1000 Value ETF         Regular Fund      2000  51913.    519.
10 iShares MSCI EAFE ETF                  Regular Fund      2001  44144.    441.
# … with 383 more rows, and abbreviated variable names ¹year_launched,
#   ²net_assets_usd_mn, ³net_assets_usd_bn

Code

funds_by_year |>
  ggplot(mapping = aes(x=year_launched, y=net_assets_usd_bn, fill = is_esg)) + geom_col()

)

Challenge 2: Comparing holdings of largest ESG Funds vs. largest Regular Fund

Code

etf_comparison_data_github_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"

etf_comparison <- read_csv(etf_comparison_data_github_url)

Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

etf_comparison

# A tibble: 537 × 14
   ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
   <chr>  <chr>   <chr>    <dbl>   <dbl> <lgl>   <lgl>   <lgl>     <dbl> <chr>  
 1 PRU    PRUDEN… Finan…   0.537  0.106  FALSE   FALSE   FALSE     0.431 Overwe…
 2 GIS    GENERA… Consu…   0.552  0.151  FALSE   FALSE   FALSE     0.401 Overwe…
 3 K      KELLOGG Consu…   0.453  0.0592 FALSE   FALSE   FALSE     0.394 Overwe…
 4 ADP    AUTOMA… Infor…   0.649  0.312  FALSE   FALSE   FALSE     0.337 Overwe…
 5 ECL    ECOLAB… Mater…   0.441  0.118  FALSE   FALSE   FALSE     0.322 Overwe…
 6 JCI    JOHNSO… Indus…   0.416  0.112  FALSE   FALSE   FALSE     0.304 Overwe…
 7 ES     EVERSO… Utili…   0.392  0.0896 FALSE   FALSE   FALSE     0.302 Overwe…
 8 PEG    PUBLIC… Utili…   0.376  0.0929 FALSE   FALSE   FALSE     0.284 Overwe…
 9 RTX    RAYTHE… Indus…   0.677  0.401  FALSE   FALSE   FALSE     0.277 Overwe…
10 LNG    CHENIE… Energy   0.274  0      TRUE    FALSE   TRUE      0.274 Overwe…
# … with 527 more rows, 4 more variables: esg_tilt_z_score <dbl>,
#   esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
#   and abbreviated variable names ¹company_name, ²standard_etf, ³in_esg_only,
#   ⁴in_standard_only, ⁵in_on_index_only, ⁶esg_tilt, ⁷esg_uw_ow

Code

glimpse(etf_comparison)

Rows: 537
Columns: 14
$ ticker              <chr> "PRU", "GIS", "K", "ADP", "ECL", "JCI", "ES", "PEG…
$ company_name        <chr> "PRUDENTIAL FINANCIAL INC", "GENERAL MILLS INC", "…
$ sector              <chr> "Financials", "Consumer Staples", "Consumer Staple…
$ esg_etf             <dbl> 0.5366803, 0.5522180, 0.4534279, 0.6486836, 0.4407…
$ standard_etf        <dbl> 0.10574313, 0.15134370, 0.05920732, 0.31168123, 0.…
$ in_esg_only         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ in_standard_only    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ in_on_index_only    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ esg_tilt            <dbl> 0.4309371, 0.4008743, 0.3942206, 0.3370024, 0.3222…
$ esg_uw_ow           <chr> "Overweight", "Overweight", "Overweight", "Overwei…
$ esg_tilt_z_score    <dbl> -3.396231, -3.159305, -3.106867, -2.655928, -2.539…
$ esg_tilt_rank       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ esg_tilt_percentile <dbl> 0.000000000, 0.001865672, 0.003731343, 0.005597015…
$ esg_tilt_quantile_5 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

Code

etf <- etf_comparison |>
  select(sector, esg_etf, standard_etf, in_esg_only, in_standard_only) |>
  group_by(sector) |>
  summarize(avg_esg_etf = mean(esg_etf),
            avg_stand_etf =mean(standard_etf))
  etf

# A tibble: 11 × 3
   sector                 avg_esg_etf avg_stand_etf
   <chr>                        <dbl>         <dbl>
 1 Communication               0.289         0.323 
 2 Consumer Discretionary      0.184         0.192 
 3 Consumer Staples            0.205         0.202 
 4 Energy                      0.209         0.200 
 5 Financials                  0.153         0.162 
 6 Health Care                 0.219         0.225 
 7 Industrials                 0.110         0.105 
 8 Information Technology      0.307         0.290 
 9 Materials                   0.0918        0.0865
10 Real Estate                 0.0959        0.0875
11 Utilities                   0.0866        0.0979

Code

etf |>
  ggplot(mapping=aes(x =avg_esg_etf, y=avg_stand_etf, color=sector)) + geom_point()

Code

etf_tilt <- etf_comparison |>
  select(company_name, sector, esg_tilt, esg_tilt_z_score, in_esg_only, in_standard_only) |>
  group_by(sector, in_esg_only) |>
  summarize(avg_esg_tilt_z_score = mean(esg_tilt_z_score))

`summarise()` has grouped output by 'sector'. You can override using the
`.groups` argument.

Code

etf_tilt

# A tibble: 21 × 3
# Groups:   sector [11]
   sector                 in_esg_only avg_esg_tilt_z_score
   <chr>                  <lgl>                      <dbl>
 1 Communication          FALSE                     0.353 
 2 Communication          TRUE                     -0.774 
 3 Consumer Discretionary FALSE                     0.153 
 4 Consumer Discretionary TRUE                     -0.993 
 5 Consumer Staples       FALSE                     0.0274
 6 Consumer Staples       TRUE                     -1.57  
 7 Energy                 FALSE                     0.0218
 8 Energy                 TRUE                     -2.16  
 9 Financials             FALSE                     0.0941
10 Financials             TRUE                     -0.846 
# … with 11 more rows

Code

etf_tilt |>
  ggplot(mapping=aes(x=sector, y=avg_esg_tilt_z_score, color=sector)) + geom_boxplot() + scale_x_discrete(guide = guide_axis(n.dodge = 3))

Part I: Setup

Load R Package

Load Data

Part II: Finding Insights Through Aggregation

Looking at Mini Dataset

Feature Engineering

Creating Functions

Function I: add_rank_features()

Creating Features by Group

Understanding the Output

Understanding the Output

Aggregation: Summarizing Data By Group

Function 2: calculate_summary_stats()

Understanding the Output

Comparing Data Aggregates

Step 1: Calculate Summary Statistics

Step 2: Pivot Data with pivot_longer()

Step 3: Pivot with pivot_wider()

Step 4: Use janitor::clean_names()

Step 5: Use mutate() to make meaningful comparisons

Understanding the Output

Part III: Your Turn

Challenge 1: Use Tools on the Full Data-set

Challenge 2: Comparing holdings of largest ESG Funds vs. largest Regular Fund

Function I: `add_rank_features()`

Function 2: `calculate_summary_stats()`

Step 2: Pivot Data with `pivot_longer()`

Step 3: Pivot with `pivot_wider()`

Step 5: Use `mutate()` to make meaningful comparisons