week 3 data assignment

Author

Jenny Park

Week 3 Data Assignment

#Part 1

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
# assign the url 
github_raw_csv_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/blackrock_etf_screener_2022-08-30.csv"

# read data 
blackrock_etf_data <- read_csv(github_raw_csv_url)
Rows: 393 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): ticker, name, incept_date, net_assets_as_of, asset_class, sub_asse...
dbl  (8): gross_expense_ratio_percent, net_expense_ratio_percent, net_assets...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
blackrock_etf_data
# A tibble: 393 × 22
   ticker name    incep…¹ gross…² net_e…³ net_a…⁴ net_a…⁵ asset…⁶ sub_a…⁷ region
   <chr>  <chr>   <chr>     <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr> 
 1 IVV    iShare… 5/15/00    0.03    0.03 297663. 8/30/22 Equity  Large … North…
 2 IEFA   iShare… 10/18/…    0.07    0.07  84222. 8/30/22 Equity  All Cap Global
 3 AGG    iShare… 9/22/03    0.04    0.03  82344. 8/30/22 Fixed … Multi … North…
 4 IJR    iShare… 5/22/00    0.06    0.06  66533. 8/30/22 Equity  Small … North…
 5 IEMG   iShare… 10/18/…    0.09    0.09  64920. 8/30/22 Equity  All Cap Global
 6 IWF    iShare… 5/22/00    0.18    0.18  61831. 8/30/22 Equity  Large/… North…
 7 IJH    iShare… 5/22/00    0.05    0.05  61424. 8/30/22 Equity  Mid Cap North…
 8 IWM    iShare… 5/22/00    0.19    0.19  53048. 8/30/22 Equity  Small … North…
 9 IWD    iShare… 5/22/00    0.18    0.18  51913. 8/30/22 Equity  Large/… North…
10 EFA    iShare… 8/14/01    0.32    0.32  44144. 8/30/22 Equity  Large/… Global
# … with 383 more rows, 12 more variables: market <chr>, location <chr>,
#   investment_style <chr>,
#   sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc <chr>,
#   msci_esg_quality_score_0_10 <dbl>,
#   msci_weighted_average_carbon_intensity_tons_co2e_m_sales <dbl>,
#   msci_esg_percent_coverage <dbl>, sustainable_classification <chr>,
#   name_wo_ishares_etf <chr>, is_esg <chr>, years_from_inception <dbl>, …
#transforming both data columns (currently character columns)
# lubridate::mdy() function to turn them into date objects.
blackrock_etf_data <- blackrock_etf_data %>% 
  mutate(across(contains("date"), lubridate::mdy)) %>%
# turn assets into billions from millions by dividing `net_assets_millions` by 1,000 (10^3)
  mutate(net_assets_usd_bn = net_assets_usd_mn/10^3) %>%
# getting rid of net_assets_as_of since it is useless
  select(-net_assets_as_of)
mini_blackrock_data <- blackrock_etf_data %>% 
  # group by whether the fund is an ESG fund or not
  group_by(is_esg) %>% 
  # take the top 5 from each group, by net assets
  slice_max(order_by = net_assets_usd_bn, n = 5) %>% 
  select(ticker, fund_name = name_wo_ishares_etf, asset_class, sub_asset_class, region, incept_date, net_assets_usd_bn,
         msci_weighted_average_carbon_intensity_tons_co2e_m_sales) %>% 
  # rename to `co2_intensity` instead of long name
  rename(co2_intensity = msci_weighted_average_carbon_intensity_tons_co2e_m_sales) %>% 
  # always good to ungroup() if you've used a group_by(). 
  ungroup()
Adding missing grouping variables: `is_esg`
mini_blackrock_data
# A tibble: 10 × 9
   is_esg       ticker fund_…¹ asset…² sub_a…³ region incept_d…⁴ net_a…⁵ co2_i…⁶
   <chr>        <chr>  <chr>   <chr>   <chr>   <chr>  <date>       <dbl>   <dbl>
 1 ESG Fund     ESGU   ESG Aw… Equity  Large/… North… 2016-12-01   22.4    104. 
 2 ESG Fund     ESGD   ESG Aw… Equity  Large/… Global 2016-06-28    6.43   104. 
 3 ESG Fund     ICLN   Global… Equity  All Cap Global 2008-06-24    5.63   266. 
 4 ESG Fund     ESGE   ESG Aw… Equity  Large/… Global 2016-06-28    4.23   168. 
 5 ESG Fund     DSI    MSCI K… Equity  Large/… North… 2006-11-14    3.69    72.7
 6 Regular Fund IVV    Core S… Equity  Large … North… 2000-05-15  298.     148. 
 7 Regular Fund IEFA   Core M… Equity  All Cap Global 2012-10-18   84.2    127. 
 8 Regular Fund AGG    Core U… Fixed … Multi … North… 2003-09-22   82.3    283. 
 9 Regular Fund IJR    Core S… Equity  Small … North… 2000-05-22   66.5    133. 
10 Regular Fund IEMG   Core M… Equity  All Cap Global 2012-10-18   64.9    369. 
# … with abbreviated variable names ¹​fund_name, ²​asset_class, ³​sub_asset_class,
#   ⁴​incept_date, ⁵​net_assets_usd_bn, ⁶​co2_intensity
blackrock_etf_data %>%
  glimpse()
Rows: 393
Columns: 22
$ ticker                                                                            <chr> …
$ name                                                                              <chr> …
$ incept_date                                                                       <date> …
$ gross_expense_ratio_percent                                                       <dbl> …
$ net_expense_ratio_percent                                                         <dbl> …
$ net_assets_usd_mn                                                                 <dbl> …
$ asset_class                                                                       <chr> …
$ sub_asset_class                                                                   <chr> …
$ region                                                                            <chr> …
$ market                                                                            <chr> …
$ location                                                                          <chr> …
$ investment_style                                                                  <chr> …
$ sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc <chr> …
$ msci_esg_quality_score_0_10                                                       <dbl> …
$ msci_weighted_average_carbon_intensity_tons_co2e_m_sales                          <dbl> …
$ msci_esg_percent_coverage                                                         <dbl> …
$ sustainable_classification                                                        <chr> …
$ name_wo_ishares_etf                                                               <chr> …
$ is_esg                                                                            <chr> …
$ years_from_inception                                                              <dbl> …
$ year_launched                                                                     <dbl> …
$ net_assets_usd_bn                                                                 <dbl> …
mini_blackrock_data %>% 
  glimpse()
Rows: 10
Columns: 9
$ is_esg            <chr> "ESG Fund", "ESG Fund", "ESG Fund", "ESG Fund", "ESG…
$ ticker            <chr> "ESGU", "ESGD", "ICLN", "ESGE", "DSI", "IVV", "IEFA"…
$ fund_name         <chr> "ESG Aware MSCI USA", "ESG Aware MSCI EAFE", "Global…
$ asset_class       <chr> "Equity", "Equity", "Equity", "Equity", "Equity", "E…
$ sub_asset_class   <chr> "Large/Mid Cap", "Large/Mid Cap", "All Cap", "Large/…
$ region            <chr> "North America", "Global", "Global", "Global", "Nort…
$ incept_date       <date> 2016-12-01, 2016-06-28, 2008-06-24, 2016-06-28, 200…
$ net_assets_usd_bn <dbl> 22.376688, 6.425673, 5.628011, 4.233544, 3.688930, 2…
$ co2_intensity     <dbl> 103.60, 103.83, 265.82, 167.71, 72.73, 148.34, 126.6…
blackrock_etf_data %>%
  # the column you want to examine
  select(asset_class) %>%  
  # selects the unique values
  distinct()
# A tibble: 5 × 1
  asset_class 
  <chr>       
1 Equity      
2 Fixed Income
3 Commodity   
4 Real Estate 
5 Multi Asset 
## Now Your Try

blackrock_etf_data %>% 
  # the column you want to examine
  select(region)  %>% 
  # selects the unique values
  distinct()
# A tibble: 7 × 1
  region                
  <chr>                 
1 North America         
2 Global                
3 Asia Pacific          
4 Latin America         
5 Europe                
6 Middle East and Africa
7 Kuwait                
blackrock_etf_data %>% 
  # the column you want to examine
  select(sub_asset_class) %>%  
  # selects the unique values
  distinct()
# A tibble: 18 × 1
   sub_asset_class       
   <chr>                 
 1 Large Cap             
 2 All Cap               
 3 Multi Sectors         
 4 Small Cap             
 5 Large/Mid Cap         
 6 Mid Cap               
 7 Credit                
 8 Inflation             
 9 Municipals            
10 Precious Metals       
11 Government            
12 Mortgages             
13 High Yield            
14 Real Estate Securities
15 Multi Commodity       
16 Static Allocation     
17 Mid/Small Cap         
18 Multi Strategy        
blackrock_etf_data %>% 
  # selecting all character columns and finding distinct numbers
  summarize(across(where(is.character), n_distinct)) %>% 
  # rename column 
  rename(msci_esg_rating = sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc) %>% 
  # pivot into long format 
  pivot_longer(cols = everything()) %>% 
  #arrange from highest to lowest
  arrange(value %>% desc())
# A tibble: 12 × 2
   name                       value
   <chr>                      <int>
 1 ticker                       393
 2 name                         393
 3 name_wo_ishares_etf          393
 4 location                      41
 5 sub_asset_class               18
 6 region                         7
 7 msci_esg_rating                7
 8 asset_class                    5
 9 sustainable_classification     5
10 market                         3
11 investment_style               2
12 is_esg                         2
blackrock_etf_data %>% 
  # select columns that are character data types and take out the columns discussed above
  select(where(is.character), -ticker, -name, -name_wo_ishares_etf) %>% 
  # for each column, find the unique values 
  map(unique)
$asset_class
[1] "Equity"       "Fixed Income" "Commodity"    "Real Estate"  "Multi Asset" 

$sub_asset_class
 [1] "Large Cap"              "All Cap"                "Multi Sectors"         
 [4] "Small Cap"              "Large/Mid Cap"          "Mid Cap"               
 [7] "Credit"                 "Inflation"              "Municipals"            
[10] "Precious Metals"        "Government"             "Mortgages"             
[13] "High Yield"             "Real Estate Securities" "Multi Commodity"       
[16] "Static Allocation"      "Mid/Small Cap"          "Multi Strategy"        

$region
[1] "North America"          "Global"                 "Asia Pacific"          
[4] "Latin America"          "Europe"                 "Middle East and Africa"
[7] "Kuwait"                

$market
[1] "Developed" "Emerging"  "Frontier" 

$location
 [1] "United States"        "Broad"                "Japan"               
 [4] "China"                "Brazil"               "India"               
 [7] "Taiwan"               "Canada"               "United Kingdom"      
[10] "South Korea"          "Australia"            "Switzerland"         
[13] "Germany"              "Saudi Arabia"         "Mexico"              
[16] "France"               "Hong Kong"            "Singapore"           
[19] "Chile"                "Indonesia"            "Spain"               
[22] "Sweden"               "Thailand"             "Netherlands"         
[25] "South Africa"         "Turkey"               "Malaysia"            
[28] "Italy"                "Denmark"              "Israel"              
[31] "Poland"               "New Zealand"          "Philippines"         
[34] "Qatar"                "Austria"              "United Arab Emirates"
[37] "Norway"               "Kuwait"               "Finland"             
[40] "Belgium"              "Russia"              

$investment_style
[1] "Index"  "Active"

$sustainability_characteristics_msci_esg_fund_ratings_msci_esg_fund_rating_aaa_ccc
[1] "AA"  "AAA" "A"   "BBB" NA    "BB"  "B"  

$sustainable_classification
[1] NA             "Broad ESG"    "Thematic ESG" "Impact"       "Screened"    

$is_esg
[1] "Regular Fund" "ESG Fund"    

Week 3 Part 1 HW

blackrock_etf_data %>% 
  ggplot(aes(x = is_esg, fill = region)) +
  geom_bar()

This visualization shows the data set separated by the types of fund. It is colored (or filled) by the regions. Through this bar graph, we can see that North America holds the most counts of regular fund and ESG fund, and the total count of Regular Fund overall greatly outnumbers the total count of ESG fund.

blackrock_etf_data %>% 
  ggplot(aes(x = net_assets_usd_bn)) +
  geom_histogram(binwidth = 3) +
  facet_wrap(~region, ncol = 3)

This graph compares the distributions by region, and thereby shows North America as the largest net assets. This helps show which market may be the most robust.

blackrock_etf_data %>% 
  ggplot (aes(x = gross_expense_ratio_percent, fill= region)) + geom_histogram(position = "identity", alpha =.5)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This graph illustrates the gross expense ratios of reach region. There is not a clear similarity between the spread of each region, with North America clustered towards the left, and other funds towards the right. Most are clustering below .5%, which is a good sign for potential investors.

Part 2

mini_blackrock_data %>% 
  glimpse()
Rows: 10
Columns: 9
$ is_esg            <chr> "ESG Fund", "ESG Fund", "ESG Fund", "ESG Fund", "ESG…
$ ticker            <chr> "ESGU", "ESGD", "ICLN", "ESGE", "DSI", "IVV", "IEFA"…
$ fund_name         <chr> "ESG Aware MSCI USA", "ESG Aware MSCI EAFE", "Global…
$ asset_class       <chr> "Equity", "Equity", "Equity", "Equity", "Equity", "E…
$ sub_asset_class   <chr> "Large/Mid Cap", "Large/Mid Cap", "All Cap", "Large/…
$ region            <chr> "North America", "Global", "Global", "Global", "Nort…
$ incept_date       <date> 2016-12-01, 2016-06-28, 2008-06-24, 2016-06-28, 200…
$ net_assets_usd_bn <dbl> 22.376688, 6.425673, 5.628011, 4.233544, 3.688930, 2…
$ co2_intensity     <dbl> 103.60, 103.83, 265.82, 167.71, 72.73, 148.34, 126.6…
etf_comparison_data_github_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"

etf_comparison <- read_csv(etf_comparison_data_github_url)
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
etf_comparison 
# A tibble: 537 × 14
   ticker compa…¹ sector esg_etf stand…² in_es…³ in_st…⁴ in_on…⁵ esg_t…⁶ esg_u…⁷
   <chr>  <chr>   <chr>    <dbl>   <dbl> <lgl>   <lgl>   <lgl>     <dbl> <chr>  
 1 PRU    PRUDEN… Finan…   0.537  0.106  FALSE   FALSE   FALSE     0.431 Overwe…
 2 GIS    GENERA… Consu…   0.552  0.151  FALSE   FALSE   FALSE     0.401 Overwe…
 3 K      KELLOGG Consu…   0.453  0.0592 FALSE   FALSE   FALSE     0.394 Overwe…
 4 ADP    AUTOMA… Infor…   0.649  0.312  FALSE   FALSE   FALSE     0.337 Overwe…
 5 ECL    ECOLAB… Mater…   0.441  0.118  FALSE   FALSE   FALSE     0.322 Overwe…
 6 JCI    JOHNSO… Indus…   0.416  0.112  FALSE   FALSE   FALSE     0.304 Overwe…
 7 ES     EVERSO… Utili…   0.392  0.0896 FALSE   FALSE   FALSE     0.302 Overwe…
 8 PEG    PUBLIC… Utili…   0.376  0.0929 FALSE   FALSE   FALSE     0.284 Overwe…
 9 RTX    RAYTHE… Indus…   0.677  0.401  FALSE   FALSE   FALSE     0.277 Overwe…
10 LNG    CHENIE… Energy   0.274  0      TRUE    FALSE   TRUE      0.274 Overwe…
# … with 527 more rows, 4 more variables: esg_tilt_z_score <dbl>,
#   esg_tilt_rank <dbl>, esg_tilt_percentile <dbl>, esg_tilt_quantile_5 <dbl>,
#   and abbreviated variable names ¹​company_name, ²​standard_etf, ³​in_esg_only,
#   ⁴​in_standard_only, ⁵​in_on_index_only, ⁶​esg_tilt, ⁷​esg_uw_ow
 etf_comparison %>% 
  ggplot(aes(x=esg_tilt,y=standard_etf, label= company_name)) +geom_point() +geom_text(aes(label=ifelse(esg_etf>2,as.character(company_name),'')),hjust=0,vjust=0)

This graph highlights the outliers in ESG_tilt and standard ETF. Tech companies seem to be outliers in both esg_etf, standard_etf, but not esg_tilt. This makes us question if tech companies performances will be if they are outliers in ESG_tilt also.

 etf_comparison %>% 
  ggplot(aes(x=sector, fill=in_standard_only)) +geom_bar()

This graph breaks down by sector within these two largest funds. Generally, a greater count of technology companies are not included in standard only, although certain sectors look very even. This indicates that ESG funds are included in most sectors.

 etf_comparison %>% 
  ggplot(aes(y = esg_etf)) +
  geom_boxplot()

This box is flattened, which indicates that even the 75th percentile of funds is very small. We can therefore say that almost all of the ESG ETFs are in a few large outlier funds. Therefore, we can confidently focus on a few large funds, most likely tech funds, instead of looking at all of them.