Examining BlackRock’s ESG ETF Business: An Applied Introduction to Data Analysis in R
Author
Teal Emery
Introduction
BlackRock is the world’s largest asset manager. They have been an enthusiastic proponent of ESG investing. BlackRock caters to both institutional and retail investors. Their iShares platform provides Exchange Traded Funds (ETFs) and mutual funds targeted predominantly at retail investors. This is meaningful because this is how many ordinary investors will invest in ESG funds.
Our objective is to start to build up your toolkit for analyzing data in R using data relevant to understanding sustainable finance. This complements the work you do in R for Data Science 2e, and introduces you to a relevant dataset that you can use to practice what you learn in your textbook.
This exercise has two parts.
Setup: You’ll learn how to set up your R session and load the BlackRock ESG ETF dataset from GitHub.
Making Sense of Your Data: You’ll learn six core functions that will enable you to start transforming your data into actionable insights.
Part I: Setup
Load R Packages
At the start of any R session, load the packages you are going to use. Here, we are just using the tidyverse, which automatically loads a set of packages useful for working with your data.
If you haven’t installed the package already, you must install it before loading it.
# if you haven't installed the tidyverse yet, uncomment the next line and install it before loading the library: # install.packages("tidyverse")library(tidyverse)
We can read the .csv file directly from GitHub using read_csv().
# assign the url to `github_raw_csv_url`github_raw_csv_url <-"https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/blackrock_etf_screener_2022-08-30.csv"# read in the data, and assign it to the object `blackrock_etf_data`blackrock_etf_data <-read_csv(github_raw_csv_url)
Rows: 393 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): ticker, name, incept_date, net_assets_as_of, asset_class, sub_asse...
dbl (8): gross_expense_ratio_percent, net_expense_ratio_percent, net_assets...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# print out the tibble to take a look at the datablackrock_etf_data
When we print the data in our markdown document, we can see basic information, and scroll through the rows and columns of the data.
View()
If you’re used to using spreadsheet programs like Excel, View() will provide you a familiar way to look at the data. It will open up a spreadsheet-like interface that you can scroll through with your mouse.
# Uncomment (delete the #) to run the code below#View(blackrock_etf_data)
dplyr::glimpse()
package::function() notation
When you see ::, as above, it means that the function glimpse() is from the dplyr package. You can run the function just using the glimpse() command if you’ve already loaded the tidyverse packages (dplyr is part of the tidyverse) using library(tidyverse) as we’ve done above. If not, you can run it by specifying the whole thing dplyr::glimpse()
glimpse() is a great way to see all the variables and their data types in a compact manner.
Our dataset has 393 observations (rows), and 22 variables (columns).
Part II: The Secret to Making Sense of Your Data
The Tidyverse ecosystem provides a coherent set of R packages that make it easy to go from raw data to actionable insights. When you load library(tidyverse) at the beginning of a session in R, it automatically loads eight packages. Today, we’re starting with dplyr.
dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
mutate() adds new variables that are functions of existing variables
These all combine naturally with group_by() which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
Learn these six verbs of data manipulation in R, and you’re well on your way to becoming a competent data analyst.
Take a minute and look at the Tidyverse documentation
The Tidyverse documentation is excellent. Click on the links above and explore. Getting to know these functions well early in this class will enable you to do cool stuff faster. Well worth your time.
Creating a Mini Dataset for Practice
Below, we will use a dplyrpipeline to make a mini dataset to practice on. Don’t worry about understanding all the code right now. The code is annotated, so you can get a sense of what each line is doing.
The Pipe Operators `|>` and `%>%`
Have you ever had to navigate nested functions in Excel? It’s annoying and causes errors.
Without the pipe operator, the first three lines of the code below would be:
slice_max(group_by(blackrock_etf_data, is_esg), order_by = net_assets_usd_mn, n = 5)
Do you know what’s happening there? It’s confusing.
Each line below is connected by a pipe operator |>, and can be read as “and then.”. It feeds each line into the first argument of the next function. The first argument in all tidyverse functions is the data argument.
So glimpse(blackrock_etf_data) is the same as blackrock_etf_data |> glimpse() . Try it out!.
What’s the difference between |> and %>%? R is an evolving language. %>% was introduced as part of the tidyverse in 2014, but was not part of base R – you needed to load library(tidyverse) to use it. In 2022 it was introduced into Base R as |>, and you can use it without loading any package.
Let’s see it in action. Try to read the following code chunk like a story, with the pipe operator being read as “and then”.
mini_blackrock_data <- blackrock_etf_data |># group by whether the fund is an ESG fund or notgroup_by(is_esg) |># take the top 5 from each group, by net assetsslice_max(order_by = net_assets_usd_mn, n =5) |># select the following columns select(ticker, name, asset_class, sub_asset_class, region, incept_date, net_assets_usd_mn, msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |># rename to `co2_intensity` because the full name is a mouthful, if descriptive.rename(co2_intensity = msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |># always good to ungroup() if you've used a group_by(). We'll discuss later.ungroup()
Adding missing grouping variables: `is_esg`
mini_blackrock_data
# A tibble: 10 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGU iShares E… Equity Large/… North… 12/1/16 22377. 104.
2 ESG Fund ESGD iShares E… Equity Large/… Global 6/28/16 6426. 104.
3 ESG Fund ICLN iShares G… Equity All Cap Global 6/24/08 5628. 266.
4 ESG Fund ESGE iShares E… Equity Large/… Global 6/28/16 4234. 168.
5 ESG Fund DSI iShares M… Equity Large/… North… 11/14/… 3689. 72.7
6 Regular Fund IVV iShares C… Equity Large … North… 5/15/00 297663. 148.
7 Regular Fund IEFA iShares C… Equity All Cap Global 10/18/… 84222. 127.
8 Regular Fund AGG iShares C… Fixed … Multi … North… 9/22/03 82344. 283.
9 Regular Fund IJR iShares C… Equity Small … North… 5/22/00 66533. 133.
10 Regular Fund IEMG iShares C… Equity All Cap Global 10/18/… 64920. 369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
# A tibble: 10 × 3
is_esg name net_assets_usd_mn
<chr> <chr> <dbl>
1 ESG Fund iShares ESG Aware MSCI USA ETF 22377.
2 ESG Fund iShares ESG Aware MSCI EAFE ETF 6426.
3 ESG Fund iShares Global Clean Energy ETF 5628.
4 ESG Fund iShares ESG Aware MSCI EM ETF 4234.
5 ESG Fund iShares MSCI KLD 400 Social ETF 3689.
6 Regular Fund iShares Core S&P 500 ETF 297663.
7 Regular Fund iShares Core MSCI EAFE ETF 84222.
8 Regular Fund iShares Core U.S. Aggregate Bond ETF 82344.
9 Regular Fund iShares Core S&P Small-Cap ETF 66533.
10 Regular Fund iShares Core MSCI Emerging Markets ETF 64920.
Look at the Help page for select() by running ?select()
?select()
you’ll see that there are a lot of helper functions that can assist you with selecting columns. For example, we can select all three columns that contain the word “asset” using the following:
mini_blackrock_data |>select(contains("asset"))
# A tibble: 10 × 3
asset_class sub_asset_class net_assets_usd_mn
<chr> <chr> <dbl>
1 Equity Large/Mid Cap 22377.
2 Equity Large/Mid Cap 6426.
3 Equity All Cap 5628.
4 Equity Large/Mid Cap 4234.
5 Equity Large/Mid Cap 3689.
6 Equity Large Cap 297663.
7 Equity All Cap 84222.
8 Fixed Income Multi Sectors 82344.
9 Equity Small Cap 66533.
10 Equity All Cap 64920.
Now You Try
Select the columns with 1) the name of the fund, 2) whether the fund is an ESG fund or not, and 3) the asset class.
Verb 2: filter()
filter() allows you to select the rows (observations) by any criteria that you provide.
Three operators to start with:
Equals: use two equals signs, not one (because, history).
mini_blackrock_data |># note that it uses `==` not `=`filter(is_esg =="ESG Fund")
# A tibble: 5 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGU iShares ESG Aw… Equity Large/… North… 12/1/16 22377. 104.
2 ESG Fund ESGD iShares ESG Aw… Equity Large/… Global 6/28/16 6426. 104.
3 ESG Fund ICLN iShares Global… Equity All Cap Global 6/24/08 5628. 266.
4 ESG Fund ESGE iShares ESG Aw… Equity Large/… Global 6/28/16 4234. 168.
5 ESG Fund DSI iShares MSCI K… Equity Large/… North… 11/14/… 3689. 72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
Greater/Less Than: use > or <
If we want to find the funds with greater than $10 billion USD in assets (10,000 million)
mini_blackrock_data |># column name goes on the leftfilter(net_assets_usd_mn >10000)
# A tibble: 6 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGU iShares ES… Equity Large/… North… 12/1/16 22377. 104.
2 Regular Fund IVV iShares Co… Equity Large … North… 5/15/00 297663. 148.
3 Regular Fund IEFA iShares Co… Equity All Cap Global 10/18/… 84222. 127.
4 Regular Fund AGG iShares Co… Fixed … Multi … North… 9/22/03 82344. 283.
5 Regular Fund IJR iShares Co… Equity Small … North… 5/22/00 66533. 133.
6 Regular Fund IEMG iShares Co… Equity All Cap Global 10/18/… 64920. 369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
And just as easily, we can find the funds with less than $10 billion in assets by using the less-than symbol.
mini_blackrock_data |># column name goes on the leftfilter(net_assets_usd_mn <10000)
# A tibble: 4 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGD iShares ESG Aw… Equity Large/… Global 6/28/16 6426. 104.
2 ESG Fund ICLN iShares Global… Equity All Cap Global 6/24/08 5628. 266.
3 ESG Fund ESGE iShares ESG Aw… Equity Large/… Global 6/28/16 4234. 168.
4 ESG Fund DSI iShares MSCI K… Equity Large/… North… 11/14/… 3689. 72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
Greater/Less than or Equal to: uses the same logic as above, but uses >= for greater than, and <= for less than. Just remember the greater/less than operator on the left, and the = goes on the right side.
mini_blackrock_data |>filter(co2_intensity >=100)
# A tibble: 9 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGU iShares ES… Equity Large/… North… 12/1/16 22377. 104.
2 ESG Fund ESGD iShares ES… Equity Large/… Global 6/28/16 6426. 104.
3 ESG Fund ICLN iShares Gl… Equity All Cap Global 6/24/08 5628. 266.
4 ESG Fund ESGE iShares ES… Equity Large/… Global 6/28/16 4234. 168.
5 Regular Fund IVV iShares Co… Equity Large … North… 5/15/00 297663. 148.
6 Regular Fund IEFA iShares Co… Equity All Cap Global 10/18/… 84222. 127.
7 Regular Fund AGG iShares Co… Fixed … Multi … North… 9/22/03 82344. 283.
8 Regular Fund IJR iShares Co… Equity Small … North… 5/22/00 66533. 133.
9 Regular Fund IEMG iShares Co… Equity All Cap Global 10/18/… 64920. 369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
mini_blackrock_data |>filter(co2_intensity <=100)
# A tibble: 1 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund DSI iShares MSCI K… Equity Large/… North… 11/14/… 3689. 72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
Selecting more than one variable: the syntax here isn’t intuitive, so just remember that this exists for now.
Let’s say we want to select three funds: ESGU, IVV, and DSI. Here’s how we do it:
# A tibble: 3 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund ESGU iShares ES… Equity Large/… North… 12/1/16 22377. 104.
2 ESG Fund DSI iShares MS… Equity Large/… North… 11/14/… 3689. 72.7
3 Regular Fund IVV iShares Co… Equity Large … North… 5/15/00 297663. 148.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
Take a look at the help page for filter()
?filter()
Help on topic 'filter' was found in the following packages:
Package Library
stats /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
dplyr /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
Using the first match ...
Now You Try:
Try filtering for the funds that are 1) ESG Funds, that 2) are fixed income funds.
Verb 3: mutate()
mutate() allows you to add new variables that are functions of existing variables
We’ll show two use cases here.
First, mutate can be helpful for coercing your data into the right data type that allows you to do cool stuff in R. For example, the inception date in our dataset was read in as a character vector (or a “string”). Once we turn it into a date object, we can do some cool stuff with it that will help our analysis
mini_blackrock_data |>#select the columns we wantselect(name, is_esg, incept_date)
# A tibble: 10 × 3
name is_esg incept_date
<chr> <chr> <chr>
1 iShares ESG Aware MSCI USA ETF ESG Fund 12/1/16
2 iShares ESG Aware MSCI EAFE ETF ESG Fund 6/28/16
3 iShares Global Clean Energy ETF ESG Fund 6/24/08
4 iShares ESG Aware MSCI EM ETF ESG Fund 6/28/16
5 iShares MSCI KLD 400 Social ETF ESG Fund 11/14/06
6 iShares Core S&P 500 ETF Regular Fund 5/15/00
7 iShares Core MSCI EAFE ETF Regular Fund 10/18/12
8 iShares Core U.S. Aggregate Bond ETF Regular Fund 9/22/03
9 iShares Core S&P Small-Cap ETF Regular Fund 5/22/00
10 iShares Core MSCI Emerging Markets ETF Regular Fund 10/18/12
mini_blackrock_data |>#select the columns we wantselect(name, is_esg, incept_date) |># use mdy() from the lubridate package to turn the character string into a date objectmutate(incept_date = lubridate::mdy(incept_date))
# A tibble: 10 × 3
name is_esg incept_date
<chr> <chr> <date>
1 iShares ESG Aware MSCI USA ETF ESG Fund 2016-12-01
2 iShares ESG Aware MSCI EAFE ETF ESG Fund 2016-06-28
3 iShares Global Clean Energy ETF ESG Fund 2008-06-24
4 iShares ESG Aware MSCI EM ETF ESG Fund 2016-06-28
5 iShares MSCI KLD 400 Social ETF ESG Fund 2006-11-14
6 iShares Core S&P 500 ETF Regular Fund 2000-05-15
7 iShares Core MSCI EAFE ETF Regular Fund 2012-10-18
8 iShares Core U.S. Aggregate Bond ETF Regular Fund 2003-09-22
9 iShares Core S&P Small-Cap ETF Regular Fund 2000-05-22
10 iShares Core MSCI Emerging Markets ETF Regular Fund 2012-10-18
Now we can calculate:
The year the fund was launched: This can be useful so you can aggregate how many funds have been launched each year.
Years since the fund was launched: Are differences in assets related to how long a fund has been around? This will help you answer that question.
mini_blackrock_data |>#select the columns we wantselect(name, is_esg, incept_date) |># use mdy() from the lubridate package to turn the character string into a date objectmutate(incept_date = lubridate::mdy(incept_date),# extract the year from the dateincept_year = lubridate::year(incept_date),# calculate how many years since the fund was launched.years_since_incept = lubridate::interval(incept_date, Sys.Date())/lubridate::years(1))
# A tibble: 10 × 5
name is_esg incept_d…¹ incep…² years…³
<chr> <chr> <date> <dbl> <dbl>
1 iShares ESG Aware MSCI USA ETF ESG Fund 2016-12-01 2016 6.17
2 iShares ESG Aware MSCI EAFE ETF ESG Fund 2016-06-28 2016 6.60
3 iShares Global Clean Energy ETF ESG Fund 2008-06-24 2008 14.6
4 iShares ESG Aware MSCI EM ETF ESG Fund 2016-06-28 2016 6.60
5 iShares MSCI KLD 400 Social ETF ESG Fund 2006-11-14 2006 16.2
6 iShares Core S&P 500 ETF Regular Fu… 2000-05-15 2000 22.7
7 iShares Core MSCI EAFE ETF Regular Fu… 2012-10-18 2012 10.3
8 iShares Core U.S. Aggregate Bond ETF Regular Fu… 2003-09-22 2003 19.4
9 iShares Core S&P Small-Cap ETF Regular Fu… 2000-05-22 2000 22.7
10 iShares Core MSCI Emerging Markets ETF Regular Fu… 2012-10-18 2012 10.3
# … with abbreviated variable names ¹incept_date, ²incept_year,
# ³years_since_incept
Take a look at the help page for mutate()
?mutate()
Now You Try:
Since the numbers are so large, it would be a lot more useful to have the net assets number in billions. Create net_assets_usd_bn from net_assets_usd_mn.
Verb 4: arrange()
arrange() allows you to order your data by one or more columns.
Let’s order our funds by their carbon intensity:
mini_blackrock_data |>arrange(co2_intensity)
# A tibble: 10 × 9
is_esg ticker name asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ESG Fund DSI iShares M… Equity Large/… North… 11/14/… 3689. 72.7
2 ESG Fund ESGU iShares E… Equity Large/… North… 12/1/16 22377. 104.
3 ESG Fund ESGD iShares E… Equity Large/… Global 6/28/16 6426. 104.
4 Regular Fund IEFA iShares C… Equity All Cap Global 10/18/… 84222. 127.
5 Regular Fund IJR iShares C… Equity Small … North… 5/22/00 66533. 133.
6 Regular Fund IVV iShares C… Equity Large … North… 5/15/00 297663. 148.
7 ESG Fund ESGE iShares E… Equity Large/… Global 6/28/16 4234. 168.
8 ESG Fund ICLN iShares G… Equity All Cap Global 6/24/08 5628. 266.
9 Regular Fund AGG iShares C… Fixed … Multi … North… 9/22/03 82344. 283.
10 Regular Fund IEMG iShares C… Equity All Cap Global 10/18/… 64920. 369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
# ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity
By default, it sorts from low to high. What if we want to sort by the most carbon intensive funds (high to low)?
What is the median (hint: median()) carbon intensity of the funds?
Verb 6: group_by()
group_by() pairs well with summarize(). What if we want to know the sum & average size of funds grouped by whether or not they are ESG funds? Here’s how we do it.
# A tibble: 2 × 3
is_esg sum_assets_mn avg_assets_mn
<chr> <dbl> <dbl>
1 ESG Fund 42353. 8471.
2 Regular Fund 595683. 119137.
Now that’s a lot more interesting!
Look at the help page for group_by()
?group_by()
Now You Try
Calculate the average carbon intensity of funds by asset class and sub asset class.
Homework
Make 3 data visualizations using this data (either the mini dataset or the full one) using the ggplot2 skills you learn in Chapter 3 of R for Data Science 2e.