Examining BlackRock’s ESG ETF Business: An Applied Introduction to Data Analysis in R

Author

Teal Emery

Introduction

BlackRock is the world’s largest asset manager. They have been an enthusiastic proponent of ESG investing. BlackRock caters to both institutional and retail investors. Their iShares platform provides Exchange Traded Funds (ETFs) and mutual funds targeted predominantly at retail investors. This is meaningful because this is how many ordinary investors will invest in ESG funds.

The dataset we are using here is a cleaned-up version of BlackRock’s publicly available ETF screener.

Objective

Our objective is to start to build up your toolkit for analyzing data in R using data relevant to understanding sustainable finance. This complements the work you do in R for Data Science 2e, and introduces you to a relevant dataset that you can use to practice what you learn in your textbook.

This exercise has two parts.

Setup: You’ll learn how to set up your R session and load the BlackRock ESG ETF dataset from GitHub.
Making Sense of Your Data: You’ll learn six core functions that will enable you to start transforming your data into actionable insights.

Part I: Setup

Load R Packages

At the start of any R session, load the packages you are going to use. Here, we are just using the tidyverse, which automatically loads a set of packages useful for working with your data.

If you haven’t installed the package already, you must install it before loading it.

# if you haven't installed the tidyverse yet, uncomment the next line and install it before loading the library: 
# install.packages("tidyverse")

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.1 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Load our data

We can read the .csv file directly from GitHub using read_csv().

# assign the url to `github_raw_csv_url`
github_raw_csv_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/blackrock_etf_screener_2022-08-30.csv"

# read in the data, and assign it to the object `blackrock_etf_data`
blackrock_etf_data <- read_csv(github_raw_csv_url)

Rows: 393 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): ticker, name, incept_date, net_assets_as_of, asset_class, sub_asse...
dbl  (8): gross_expense_ratio_percent, net_expense_ratio_percent, net_assets...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# print out the tibble to take a look at the data
blackrock_etf_data

# A tibble: 393 × 22
   ticker name    incep…¹ gross…² net_e…³ net_a…⁴ net_a…⁵ asset…⁶ sub_a…⁷ region
   <chr>  <chr>   <chr>     <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr> 
 1 IVV    iShare… 5/15/00    0.03    0.03 297663. 8/30/22 Equity  Large … North…
 2 IEFA   iShare… 10/18/…    0.07    0.07  84222. 8/30/22 Equity  All Cap Global
 3 AGG    iShare… 9/22/03    0.04    0.03  82344. 8/30/22 Fixed … Multi … North…
 4 IJR    iShare… 5/22/00    0.06    0.06  66533. 8/30/22 Equity  Small … North…
 5 IEMG   iShare… 10/18/…    0.09    0.09  64920. 8/30/22 Equity  All Cap Global
 6 IWF    iShare… 5/22/00    0.18    0.18  61831. 8/30/22 Equity  Large/… North…
 7 IJH    iShare… 5/22/00    0.05    0.05  61424. 8/30/22 Equity  Mid Cap North…
 8 IWM    iShare… 5/22/00    0.19    0.19  53048. 8/30/22 Equity  Small … North…
 9 IWD    iShare… 5/22/00    0.18    0.18  51913. 8/30/22 Equity  Large/… North…
10 EFA    iShare… 8/14/01    0.32    0.32  44144. 8/30/22 Equity  Large/… Global
# … with 383 more rows, 12 more variables: market <chr>, location <chr>,
#   investment_style <chr>,
#   sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc <chr>,
#   msci_esg_quality_score_0_10 <dbl>,
#   msci_weighted_average_carbon_intensity_tons_co2e_m_sales <dbl>,
#   msci_esg_percent_coverage <dbl>, sustainable_classification <chr>,
#   name_wo_ishares_etf <chr>, is_esg <chr>, years_from_inception <dbl>, …

Looking at our data in more detail

When we print the data in our markdown document, we can see basic information, and scroll through the rows and columns of the data.

`View()`

If you’re used to using spreadsheet programs like Excel, View() will provide you a familiar way to look at the data. It will open up a spreadsheet-like interface that you can scroll through with your mouse.

# Uncomment (delete the #) to run the code below
#View(blackrock_etf_data)

`dplyr::glimpse()`

package::function() notation

When you see ::, as above, it means that the function glimpse() is from the dplyr package. You can run the function just using the glimpse() command if you’ve already loaded the tidyverse packages (dplyr is part of the tidyverse) using library(tidyverse) as we’ve done above. If not, you can run it by specifying the whole thing dplyr::glimpse()

glimpse() is a great way to see all the variables and their data types in a compact manner.

glimpse(blackrock_etf_data)

Rows: 393
Columns: 22
$ ticker                                                                            <chr> …
$ name                                                                              <chr> …
$ incept_date                                                                       <chr> …
$ gross_expense_ratio_percent                                                       <dbl> …
$ net_expense_ratio_percent                                                         <dbl> …
$ net_assets_usd_mn                                                                 <dbl> …
$ net_assets_as_of                                                                  <chr> …
$ asset_class                                                                       <chr> …
$ sub_asset_class                                                                   <chr> …
$ region                                                                            <chr> …
$ market                                                                            <chr> …
$ location                                                                          <chr> …
$ investment_style                                                                  <chr> …
$ sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc <chr> …
$ msci_esg_quality_score_0_10                                                       <dbl> …
$ msci_weighted_average_carbon_intensity_tons_co2e_m_sales                          <dbl> …
$ msci_esg_percent_coverage                                                         <dbl> …
$ sustainable_classification                                                        <chr> …
$ name_wo_ishares_etf                                                               <chr> …
$ is_esg                                                                            <chr> …
$ years_from_inception                                                              <dbl> …
$ year_launched                                                                     <dbl> …

Our dataset has 393 observations (rows), and 22 variables (columns).

Part II: The Secret to Making Sense of Your Data

The Tidyverse ecosystem provides a coherent set of R packages that make it easy to go from raw data to actionable insights. When you load library(tidyverse) at the beginning of a session in R, it automatically loads eight packages. Today, we’re starting with dplyr.

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

mutate() adds new variables that are functions of existing variables

select() picks variables based on their names.

filter() picks cases based on their values.

summarise() reduces multiple values down to a single summary.

arrange() changes the ordering of the rows.

These all combine naturally with group_by() which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").

Learn these six verbs of data manipulation in R, and you’re well on your way to becoming a competent data analyst.

Take a minute and look at the Tidyverse documentation

The Tidyverse documentation is excellent. Click on the links above and explore. Getting to know these functions well early in this class will enable you to do cool stuff faster. Well worth your time.

Creating a Mini Dataset for Practice

Below, we will use a dplyr pipeline to make a mini dataset to practice on. Don’t worry about understanding all the code right now. The code is annotated, so you can get a sense of what each line is doing.

The Pipe Operators `|>` and `%>%`

Have you ever had to navigate nested functions in Excel? It’s annoying and causes errors.

Without the pipe operator, the first three lines of the code below would be:

slice_max(group_by(blackrock_etf_data, is_esg), order_by = net_assets_usd_mn, n = 5)

Do you know what’s happening there? It’s confusing.

Each line below is connected by a pipe operator |>, and can be read as “and then.”. It feeds each line into the first argument of the next function. The first argument in all tidyverse functions is the data argument.

So glimpse(blackrock_etf_data) is the same as blackrock_etf_data |> glimpse() . Try it out!.

What’s the difference between |> and %>%? R is an evolving language. %>% was introduced as part of the tidyverse in 2014, but was not part of base R – you needed to load library(tidyverse) to use it. In 2022 it was introduced into Base R as |>, and you can use it without loading any package.

Let’s see it in action. Try to read the following code chunk like a story, with the pipe operator being read as “and then”.

mini_blackrock_data <- blackrock_etf_data |> 
  # group by whether the fund is an ESG fund or not
  group_by(is_esg) |> 
  # take the top 5 from each group, by net assets
  slice_max(order_by = net_assets_usd_mn, n = 5) |> 
  # select the following columns 
  select(ticker, name, asset_class, sub_asset_class, region, incept_date, net_assets_usd_mn,
         msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |> 
  # rename to `co2_intensity` because the full name is a mouthful, if descriptive.
  rename(co2_intensity = msci_weighted_average_carbon_intensity_tons_co2e_m_sales) |> 
  # always good to ungroup() if you've used a group_by().  We'll discuss later.
  ungroup()

Adding missing grouping variables: `is_esg`

mini_blackrock_data

# A tibble: 10 × 9
   is_esg       ticker name       asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
   <chr>        <chr>  <chr>      <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
 1 ESG Fund     ESGU   iShares E… Equity  Large/… North… 12/1/16  22377.   104. 
 2 ESG Fund     ESGD   iShares E… Equity  Large/… Global 6/28/16   6426.   104. 
 3 ESG Fund     ICLN   iShares G… Equity  All Cap Global 6/24/08   5628.   266. 
 4 ESG Fund     ESGE   iShares E… Equity  Large/… Global 6/28/16   4234.   168. 
 5 ESG Fund     DSI    iShares M… Equity  Large/… North… 11/14/…   3689.    72.7
 6 Regular Fund IVV    iShares C… Equity  Large … North… 5/15/00 297663.   148. 
 7 Regular Fund IEFA   iShares C… Equity  All Cap Global 10/18/…  84222.   127. 
 8 Regular Fund AGG    iShares C… Fixed … Multi … North… 9/22/03  82344.   283. 
 9 Regular Fund IJR    iShares C… Equity  Small … North… 5/22/00  66533.   133. 
10 Regular Fund IEMG   iShares C… Equity  All Cap Global 10/18/…  64920.   369. 
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Let’s look at our data using glimpse()

mini_blackrock_data |> glimpse()

Rows: 10
Columns: 9
$ is_esg            <chr> "ESG Fund", "ESG Fund", "ESG Fund", "ESG Fund", "ESG…
$ ticker            <chr> "ESGU", "ESGD", "ICLN", "ESGE", "DSI", "IVV", "IEFA"…
$ name              <chr> "iShares ESG Aware MSCI USA ETF", "iShares ESG Aware…
$ asset_class       <chr> "Equity", "Equity", "Equity", "Equity", "Equity", "E…
$ sub_asset_class   <chr> "Large/Mid Cap", "Large/Mid Cap", "All Cap", "Large/…
$ region            <chr> "North America", "Global", "Global", "Global", "Nort…
$ incept_date       <chr> "12/1/16", "6/28/16", "6/24/08", "6/28/16", "11/14/0…
$ net_assets_usd_mn <dbl> 22376.688, 6425.673, 5628.011, 4233.544, 3688.930, 2…
$ co2_intensity     <dbl> 103.60, 103.83, 265.82, 167.71, 72.73, 148.34, 126.6…

Verb 1: `select()`

select() allows you to choose and re-order columns (variables).

If you want to compare the assets of ESG funds versus non-ESG funds, you might want to just select a few variables.

mini_blackrock_data |> 
  select(is_esg, name, net_assets_usd_mn)

# A tibble: 10 × 3
   is_esg       name                                   net_assets_usd_mn
   <chr>        <chr>                                              <dbl>
 1 ESG Fund     iShares ESG Aware MSCI USA ETF                    22377.
 2 ESG Fund     iShares ESG Aware MSCI EAFE ETF                    6426.
 3 ESG Fund     iShares Global Clean Energy ETF                    5628.
 4 ESG Fund     iShares ESG Aware MSCI EM ETF                      4234.
 5 ESG Fund     iShares MSCI KLD 400 Social ETF                    3689.
 6 Regular Fund iShares Core S&P 500 ETF                         297663.
 7 Regular Fund iShares Core MSCI EAFE ETF                        84222.
 8 Regular Fund iShares Core U.S. Aggregate Bond ETF              82344.
 9 Regular Fund iShares Core S&P Small-Cap ETF                    66533.
10 Regular Fund iShares Core MSCI Emerging Markets ETF            64920.

Look at the Help page for select() by running ?select()

?select()

you’ll see that there are a lot of helper functions that can assist you with selecting columns. For example, we can select all three columns that contain the word “asset” using the following:

mini_blackrock_data |> 
  select(contains("asset"))

# A tibble: 10 × 3
   asset_class  sub_asset_class net_assets_usd_mn
   <chr>        <chr>                       <dbl>
 1 Equity       Large/Mid Cap              22377.
 2 Equity       Large/Mid Cap               6426.
 3 Equity       All Cap                     5628.
 4 Equity       Large/Mid Cap               4234.
 5 Equity       Large/Mid Cap               3689.
 6 Equity       Large Cap                 297663.
 7 Equity       All Cap                    84222.
 8 Fixed Income Multi Sectors              82344.
 9 Equity       Small Cap                  66533.
10 Equity       All Cap                    64920.

Now You Try

Select the columns with 1) the name of the fund, 2) whether the fund is an ESG fund or not, and 3) the asset class.

Verb 2: `filter()`

filter() allows you to select the rows (observations) by any criteria that you provide.

Three operators to start with:

Equals: use two equals signs, not one (because, history).

mini_blackrock_data |>
  # note that it uses `==` not `=`
  filter(is_esg == "ESG Fund")

# A tibble: 5 × 9
  is_esg   ticker name            asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>    <chr>  <chr>           <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund ESGU   iShares ESG Aw… Equity  Large/… North… 12/1/16  22377.   104. 
2 ESG Fund ESGD   iShares ESG Aw… Equity  Large/… Global 6/28/16   6426.   104. 
3 ESG Fund ICLN   iShares Global… Equity  All Cap Global 6/24/08   5628.   266. 
4 ESG Fund ESGE   iShares ESG Aw… Equity  Large/… Global 6/28/16   4234.   168. 
5 ESG Fund DSI    iShares MSCI K… Equity  Large/… North… 11/14/…   3689.    72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Greater/Less Than: use > or <

If we want to find the funds with greater than $10 billion USD in assets (10,000 million)

mini_blackrock_data |>
  # column name goes on the left
  filter(net_assets_usd_mn > 10000)

# A tibble: 6 × 9
  is_esg       ticker name        asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>        <chr>  <chr>       <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund     ESGU   iShares ES… Equity  Large/… North… 12/1/16  22377.    104.
2 Regular Fund IVV    iShares Co… Equity  Large … North… 5/15/00 297663.    148.
3 Regular Fund IEFA   iShares Co… Equity  All Cap Global 10/18/…  84222.    127.
4 Regular Fund AGG    iShares Co… Fixed … Multi … North… 9/22/03  82344.    283.
5 Regular Fund IJR    iShares Co… Equity  Small … North… 5/22/00  66533.    133.
6 Regular Fund IEMG   iShares Co… Equity  All Cap Global 10/18/…  64920.    369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

And just as easily, we can find the funds with less than $10 billion in assets by using the less-than symbol.

mini_blackrock_data |>
  # column name goes on the left
  filter(net_assets_usd_mn < 10000)

# A tibble: 4 × 9
  is_esg   ticker name            asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>    <chr>  <chr>           <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund ESGD   iShares ESG Aw… Equity  Large/… Global 6/28/16   6426.   104. 
2 ESG Fund ICLN   iShares Global… Equity  All Cap Global 6/24/08   5628.   266. 
3 ESG Fund ESGE   iShares ESG Aw… Equity  Large/… Global 6/28/16   4234.   168. 
4 ESG Fund DSI    iShares MSCI K… Equity  Large/… North… 11/14/…   3689.    72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Greater/Less than or Equal to: uses the same logic as above, but uses >= for greater than, and <= for less than. Just remember the greater/less than operator on the left, and the = goes on the right side.

mini_blackrock_data |>
  filter(co2_intensity >= 100)

# A tibble: 9 × 9
  is_esg       ticker name        asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>        <chr>  <chr>       <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund     ESGU   iShares ES… Equity  Large/… North… 12/1/16  22377.    104.
2 ESG Fund     ESGD   iShares ES… Equity  Large/… Global 6/28/16   6426.    104.
3 ESG Fund     ICLN   iShares Gl… Equity  All Cap Global 6/24/08   5628.    266.
4 ESG Fund     ESGE   iShares ES… Equity  Large/… Global 6/28/16   4234.    168.
5 Regular Fund IVV    iShares Co… Equity  Large … North… 5/15/00 297663.    148.
6 Regular Fund IEFA   iShares Co… Equity  All Cap Global 10/18/…  84222.    127.
7 Regular Fund AGG    iShares Co… Fixed … Multi … North… 9/22/03  82344.    283.
8 Regular Fund IJR    iShares Co… Equity  Small … North… 5/22/00  66533.    133.
9 Regular Fund IEMG   iShares Co… Equity  All Cap Global 10/18/…  64920.    369.
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

mini_blackrock_data |>
  filter(co2_intensity <= 100)

# A tibble: 1 × 9
  is_esg   ticker name            asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>    <chr>  <chr>           <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund DSI    iShares MSCI K… Equity  Large/… North… 11/14/…   3689.    72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Selecting more than one variable: the syntax here isn’t intuitive, so just remember that this exists for now.

Let’s say we want to select three funds: ESGU, IVV, and DSI. Here’s how we do it:

mini_blackrock_data |> 
  filter(ticker %in% c("ESGU", "IVV", "DSI"))

# A tibble: 3 × 9
  is_esg       ticker name        asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>        <chr>  <chr>       <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund     ESGU   iShares ES… Equity  Large/… North… 12/1/16  22377.   104. 
2 ESG Fund     DSI    iShares MS… Equity  Large/… North… 11/14/…   3689.    72.7
3 Regular Fund IVV    iShares Co… Equity  Large … North… 5/15/00 297663.   148. 
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Sometimes it can be easier to define the vector of things you want to select for first. This is equivalent to the code above:

tickers_i_want <- c("ESGU", "IVV", "DSI")

mini_blackrock_data |> 
  filter(ticker %in% tickers_i_want)

# A tibble: 3 × 9
  is_esg       ticker name        asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
  <chr>        <chr>  <chr>       <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
1 ESG Fund     ESGU   iShares ES… Equity  Large/… North… 12/1/16  22377.   104. 
2 ESG Fund     DSI    iShares MS… Equity  Large/… North… 11/14/…   3689.    72.7
3 Regular Fund IVV    iShares Co… Equity  Large … North… 5/15/00 297663.   148. 
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Take a look at the help page for filter()

?filter()

Help on topic 'filter' was found in the following packages:

  Package               Library
  stats                 /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
  dplyr                 /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library


Using the first match ...

Now You Try:

Try filtering for the funds that are 1) ESG Funds, that 2) are fixed income funds.

Verb 3: `mutate()`

mutate() allows you to add new variables that are functions of existing variables

We’ll show two use cases here.

First, mutate can be helpful for coercing your data into the right data type that allows you to do cool stuff in R. For example, the inception date in our dataset was read in as a character vector (or a “string”). Once we turn it into a date object, we can do some cool stuff with it that will help our analysis

mini_blackrock_data |> 
  #select the columns we want
  select(name, is_esg, incept_date)

# A tibble: 10 × 3
   name                                   is_esg       incept_date
   <chr>                                  <chr>        <chr>      
 1 iShares ESG Aware MSCI USA ETF         ESG Fund     12/1/16    
 2 iShares ESG Aware MSCI EAFE ETF        ESG Fund     6/28/16    
 3 iShares Global Clean Energy ETF        ESG Fund     6/24/08    
 4 iShares ESG Aware MSCI EM ETF          ESG Fund     6/28/16    
 5 iShares MSCI KLD 400 Social ETF        ESG Fund     11/14/06   
 6 iShares Core S&P 500 ETF               Regular Fund 5/15/00    
 7 iShares Core MSCI EAFE ETF             Regular Fund 10/18/12   
 8 iShares Core U.S. Aggregate Bond ETF   Regular Fund 9/22/03    
 9 iShares Core S&P Small-Cap ETF         Regular Fund 5/22/00    
10 iShares Core MSCI Emerging Markets ETF Regular Fund 10/18/12

mini_blackrock_data |> 
  #select the columns we want
  select(name, is_esg, incept_date) |> 
  # use mdy() from the lubridate package to turn the character string into a date object
  mutate(incept_date = lubridate::mdy(incept_date))

# A tibble: 10 × 3
   name                                   is_esg       incept_date
   <chr>                                  <chr>        <date>     
 1 iShares ESG Aware MSCI USA ETF         ESG Fund     2016-12-01 
 2 iShares ESG Aware MSCI EAFE ETF        ESG Fund     2016-06-28 
 3 iShares Global Clean Energy ETF        ESG Fund     2008-06-24 
 4 iShares ESG Aware MSCI EM ETF          ESG Fund     2016-06-28 
 5 iShares MSCI KLD 400 Social ETF        ESG Fund     2006-11-14 
 6 iShares Core S&P 500 ETF               Regular Fund 2000-05-15 
 7 iShares Core MSCI EAFE ETF             Regular Fund 2012-10-18 
 8 iShares Core U.S. Aggregate Bond ETF   Regular Fund 2003-09-22 
 9 iShares Core S&P Small-Cap ETF         Regular Fund 2000-05-22 
10 iShares Core MSCI Emerging Markets ETF Regular Fund 2012-10-18

Now we can calculate:

The year the fund was launched: This can be useful so you can aggregate how many funds have been launched each year.
Years since the fund was launched: Are differences in assets related to how long a fund has been around? This will help you answer that question.

mini_blackrock_data |> 
  #select the columns we want
  select(name, is_esg, incept_date) |> 
  # use mdy() from the lubridate package to turn the character string into a date object
  mutate(incept_date = lubridate::mdy(incept_date),
         # extract the year from the date
         incept_year = lubridate::year(incept_date),
         # calculate how many years since the fund was launched.
         years_since_incept = lubridate::interval(incept_date, Sys.Date())/lubridate::years(1))

# A tibble: 10 × 5
   name                                   is_esg      incept_d…¹ incep…² years…³
   <chr>                                  <chr>       <date>       <dbl>   <dbl>
 1 iShares ESG Aware MSCI USA ETF         ESG Fund    2016-12-01    2016    6.17
 2 iShares ESG Aware MSCI EAFE ETF        ESG Fund    2016-06-28    2016    6.60
 3 iShares Global Clean Energy ETF        ESG Fund    2008-06-24    2008   14.6 
 4 iShares ESG Aware MSCI EM ETF          ESG Fund    2016-06-28    2016    6.60
 5 iShares MSCI KLD 400 Social ETF        ESG Fund    2006-11-14    2006   16.2 
 6 iShares Core S&P 500 ETF               Regular Fu… 2000-05-15    2000   22.7 
 7 iShares Core MSCI EAFE ETF             Regular Fu… 2012-10-18    2012   10.3 
 8 iShares Core U.S. Aggregate Bond ETF   Regular Fu… 2003-09-22    2003   19.4 
 9 iShares Core S&P Small-Cap ETF         Regular Fu… 2000-05-22    2000   22.7 
10 iShares Core MSCI Emerging Markets ETF Regular Fu… 2012-10-18    2012   10.3 
# … with abbreviated variable names ¹incept_date, ²incept_year,
#   ³years_since_incept

Take a look at the help page for mutate()

?mutate()

Now You Try:

Since the numbers are so large, it would be a lot more useful to have the net assets number in billions. Create net_assets_usd_bn from net_assets_usd_mn.

Verb 4: `arrange()`

arrange() allows you to order your data by one or more columns.

Let’s order our funds by their carbon intensity:

mini_blackrock_data |> 
  arrange(co2_intensity)

# A tibble: 10 × 9
   is_esg       ticker name       asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
   <chr>        <chr>  <chr>      <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
 1 ESG Fund     DSI    iShares M… Equity  Large/… North… 11/14/…   3689.    72.7
 2 ESG Fund     ESGU   iShares E… Equity  Large/… North… 12/1/16  22377.   104. 
 3 ESG Fund     ESGD   iShares E… Equity  Large/… Global 6/28/16   6426.   104. 
 4 Regular Fund IEFA   iShares C… Equity  All Cap Global 10/18/…  84222.   127. 
 5 Regular Fund IJR    iShares C… Equity  Small … North… 5/22/00  66533.   133. 
 6 Regular Fund IVV    iShares C… Equity  Large … North… 5/15/00 297663.   148. 
 7 ESG Fund     ESGE   iShares E… Equity  Large/… Global 6/28/16   4234.   168. 
 8 ESG Fund     ICLN   iShares G… Equity  All Cap Global 6/24/08   5628.   266. 
 9 Regular Fund AGG    iShares C… Fixed … Multi … North… 9/22/03  82344.   283. 
10 Regular Fund IEMG   iShares C… Equity  All Cap Global 10/18/…  64920.   369. 
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

By default, it sorts from low to high. What if we want to sort by the most carbon intensive funds (high to low)?

mini_blackrock_data |> 
  arrange(co2_intensity |> desc())

# A tibble: 10 × 9
   is_esg       ticker name       asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
   <chr>        <chr>  <chr>      <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
 1 Regular Fund IEMG   iShares C… Equity  All Cap Global 10/18/…  64920.   369. 
 2 Regular Fund AGG    iShares C… Fixed … Multi … North… 9/22/03  82344.   283. 
 3 ESG Fund     ICLN   iShares G… Equity  All Cap Global 6/24/08   5628.   266. 
 4 ESG Fund     ESGE   iShares E… Equity  Large/… Global 6/28/16   4234.   168. 
 5 Regular Fund IVV    iShares C… Equity  Large … North… 5/15/00 297663.   148. 
 6 Regular Fund IJR    iShares C… Equity  Small … North… 5/22/00  66533.   133. 
 7 Regular Fund IEFA   iShares C… Equity  All Cap Global 10/18/…  84222.   127. 
 8 ESG Fund     ESGD   iShares E… Equity  Large/… Global 6/28/16   6426.   104. 
 9 ESG Fund     ESGU   iShares E… Equity  Large/… North… 12/1/16  22377.   104. 
10 ESG Fund     DSI    iShares M… Equity  Large/… North… 11/14/…   3689.    72.7
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

We can also sort by more than one column. Let’s sort carbon intensity for ESG funds and for non-ESG funds:

mini_blackrock_data |> 
  arrange(is_esg,co2_intensity |> desc())

# A tibble: 10 × 9
   is_esg       ticker name       asset…¹ sub_a…² region incep…³ net_a…⁴ co2_i…⁵
   <chr>        <chr>  <chr>      <chr>   <chr>   <chr>  <chr>     <dbl>   <dbl>
 1 ESG Fund     ICLN   iShares G… Equity  All Cap Global 6/24/08   5628.   266. 
 2 ESG Fund     ESGE   iShares E… Equity  Large/… Global 6/28/16   4234.   168. 
 3 ESG Fund     ESGD   iShares E… Equity  Large/… Global 6/28/16   6426.   104. 
 4 ESG Fund     ESGU   iShares E… Equity  Large/… North… 12/1/16  22377.   104. 
 5 ESG Fund     DSI    iShares M… Equity  Large/… North… 11/14/…   3689.    72.7
 6 Regular Fund IEMG   iShares C… Equity  All Cap Global 10/18/…  64920.   369. 
 7 Regular Fund AGG    iShares C… Fixed … Multi … North… 9/22/03  82344.   283. 
 8 Regular Fund IVV    iShares C… Equity  Large … North… 5/15/00 297663.   148. 
 9 Regular Fund IJR    iShares C… Equity  Small … North… 5/22/00  66533.   133. 
10 Regular Fund IEFA   iShares C… Equity  All Cap Global 10/18/…  84222.   127. 
# … with abbreviated variable names ¹asset_class, ²sub_asset_class,
#   ³incept_date, ⁴net_assets_usd_mn, ⁵co2_intensity

Interesting fact: The clean energy fund is the most carbon intensive ESG fund. That’s something to look into!

Look at the help page for arrange()

?arrange()

Now You Try

Arrange the funds by asset class and the size of assets. What is the second largest fixed income fund?

Verb 5: `summarize()`

summarize() allows you to calculate summary statistics from your data.

What is the sum and the average size of our funds?

mini_blackrock_data |> 
  summarize(sum_assets_mn = sum(net_assets_usd_mn),
            avg_assets_mn = mean(net_assets_usd_mn))

# A tibble: 1 × 2
  sum_assets_mn avg_assets_mn
          <dbl>         <dbl>
1       638036.        63804.

Look at the help page for summarize()

?summarize()

Now You Try

What is the median (hint: median()) carbon intensity of the funds?

Verb 6: `group_by()`

group_by() pairs well with summarize(). What if we want to know the sum & average size of funds grouped by whether or not they are ESG funds? Here’s how we do it.

mini_blackrock_data |> 
  group_by(is_esg) |> 
  summarize(sum_assets_mn = sum(net_assets_usd_mn),
            avg_assets_mn = mean(net_assets_usd_mn))

# A tibble: 2 × 3
  is_esg       sum_assets_mn avg_assets_mn
  <chr>                <dbl>         <dbl>
1 ESG Fund            42353.         8471.
2 Regular Fund       595683.       119137.

Now that’s a lot more interesting!

Look at the help page for group_by()

?group_by()

Now You Try

Calculate the average carbon intensity of funds by asset class and sub asset class.

Homework

Make 3 data visualizations using this data (either the mini dataset or the full one) using the ggplot2 skills you learn in Chapter 3 of R for Data Science 2e.

Introduction

Objective

Part I: Setup

Load R Packages

Load our data

Looking at our data in more detail

View()

dplyr::glimpse()

Part II: The Secret to Making Sense of Your Data

Creating a Mini Dataset for Practice

Verb 1: select()

Verb 2: filter()

Verb 3: mutate()

Verb 4: arrange()

Verb 5: summarize()

Verb 6: group_by()

Homework

`View()`

`dplyr::glimpse()`

Verb 1: `select()`

Verb 2: `filter()`

Verb 3: `mutate()`

Verb 4: `arrange()`

Verb 5: `summarize()`

Verb 6: `group_by()`