1.5.1.1 Setup Esquisse and read the documents: Load Libraries & Data & Get a Glimpse
library(tidyverse)
Warning: package 'tidyr' was built under R version 4.2.3
Warning: package 'readr' was built under R version 4.2.3
Warning: package 'dplyr' was built under R version 4.2.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)library(ggplot2)library(esquisse)
Warning: package 'esquisse' was built under R version 4.2.3
# the URL of our data on GitHubgithub_url <-"https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"# read the data from GitHubblackrock_esg_vs_non_esg_etf <- github_url |>read_csv() |># select the four columns we will use in our anlaysis hereselect(company_name:standard_etf)
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# use dplyr::glimpse() to get an overview of our datablackrock_esg_vs_non_esg_etf |>glimpse()
1.5.1.2 Homework problem 1: Recreate the chart above in Esquisse
I didn’t add the legend as each graph has a Title on it explaining which industry it is. Adding the legend presented a big issue in terms of formatting, either the graphs would be flattened top down if legend was on the top or bottom, or the Full titles for each graph would not be readable if the legend was left or right.
ggplot(blackrock_esg_vs_non_esg_etf) +aes(x = esg_etf, y = standard_etf, colour = sector) +geom_point(shape ="circle", size =1.5) +geom_smooth(span =0.75) +scale_x_continuous(trans ="log10") +scale_y_continuous(trans ="log10") +scale_color_viridis_d(option ="magma", direction =1) +labs(x ="ESG ETF(ESGU)",y ="Standard ETF (IVV)",title ="We made this chart using Esquisse!",subtitle ="It's a great tool for learning ggplot2. Even if it has limitations.",caption ="Ann-Alice made this!" ) +theme_minimal() +theme(legend.position ="none") +facet_wrap(vars(sector))
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
1.5.1.3 Homework problem 2: exploring the outliers
blackrock_esg_vs_non_esg_etf_long <- blackrock_esg_vs_non_esg_etf |># we'll learn a lot more about long data & pivot_longer() in future weeks. pivot_longer(cols =contains("etf"), names_to ="fund_type", values_to ="weight") |># case_when() is like an extended "if else"mutate(fund_type =case_when(fund_type =="esg_etf"~"ESG ETF (ESGU)", fund_type =="standard_etf"~"Standard ETF (IVV)"))blackrock_esg_vs_non_esg_etf_long
# A tibble: 1,074 × 4
company_name sector fund_type weight
<chr> <chr> <chr> <dbl>
1 PRUDENTIAL FINANCIAL INC Financials ESG ETF (ESGU) 0.537
2 PRUDENTIAL FINANCIAL INC Financials Standard ETF (IV… 0.106
3 GENERAL MILLS INC Consumer Staples ESG ETF (ESGU) 0.552
4 GENERAL MILLS INC Consumer Staples Standard ETF (IV… 0.151
5 KELLOGG Consumer Staples ESG ETF (ESGU) 0.453
6 KELLOGG Consumer Staples Standard ETF (IV… 0.0592
7 AUTOMATIC DATA PROCESSING INC Information Technology ESG ETF (ESGU) 0.649
8 AUTOMATIC DATA PROCESSING INC Information Technology Standard ETF (IV… 0.312
9 ECOLAB INC Materials ESG ETF (ESGU) 0.441
10 ECOLAB INC Materials Standard ETF (IV… 0.118
# ℹ 1,064 more rows
blackrock_esg_vs_non_esg_etf_long %>%filter(weight >= 1L & weight <= 7L) %>%#this filters for funds above 1% weightggplot() +aes(x = weight, # This determines x axis valuesy = company_name, , # This determines y axis valuescolour = fund_type, # This determines what different colors are generated bysize = weight ) +geom_point(shape ="circle") +# This determines the shape of the pointsscale_color_manual(values =c(`ESG ETF (ESGU)`="#62E859", # These two lines determine the colors of points`Standard ETF (IVV)`="#BFBFBF") # These two lines determine the colors of points ) +labs( # This part names all different labels and (sub)titles.x ="Weight(%)", # X label namey ="Company Name", # Y label name title ="Largest ESG funds and their ESG vs non-ESG Portfolio", #title namesubtitle ="1% Weight cut off for ESG vs standard funds", # subtitle namecaption ="Ann-Alice", # caption textcolor ="Fund Type", # title for color legendsize ="Weight"# title for size legend ) +theme_minimal() # sets overall theme of the graph
We can see that biggest investor in ETFs is Apple Inc, followed by Microsoft CORP, Alphabet Inc, and Amazon Com Inc. Those are also known to be largest companies in the world. All of their weights are below 7%. We can see that for most of the outliers on the graph, their ESG and standard weights are very similar. With Standard ETFs largely outweighing ESG ETFs. Apple Inc is the outlier as its ESG ETF outweighs its Standard ETF.
1.5.1.4 Homework problem 3: Make your own charts with Esquisse
Graph 1: Box Plot of ESG Weights Across Industries
From this graph we can see that average weights of ESG ETF investments across industries are very low, close to 0%. We can also see that Consumer Discretionary, Communication, and IT have the largest outliers, which are very probably due to the existence of large information and communication companies in this area. I chose the free y so that we could read the details IQR for each of the box plots.
# Graph 1: Box Plot of ESG Weights Across Industries ggplot(blackrock_esg_vs_non_esg_etf) +aes(x ="", y = esg_etf, colour = sector) +# picking x and y axes and sort by sectorgeom_boxplot() +# creating box plotsscale_colour_viridis_d(option ="plasma", direction =1) +# choosing color filllabs( #annotating labelsx ="Count",y ="ESG ETF",title ="Distribution of ESG ETF Weights Across Sectors",caption ="Ann-Alice made this" ) +theme_minimal() +facet_wrap(vars(sector), scales ="free_y") # create separate graph for each sector
Graph 2: Log-Scaled Distribution of Standard ETF Investments Across Sectors
From the histogram, it’s evident that the distribution of standard ETF investments spans a broad range on a log scale. Most investments are under 0.11. Investment sizes are most commonly 0.08, highlighting the utility of the log scale for a wide range of ETF investment values. The histogram’s distribution being right-skewed indicates that most standard ETF investments are concentrated towards the lower end of the scale, with a long tail extending towards higher values. This skewness suggests a prevalence of smaller investments within the dataset, with fewer larger investments.
ggplot(blackrock_esg_vs_non_esg_etf) +aes(x = standard_etf, fill = sector) +# picking x and y axes and sort by sectorgeom_histogram(bins = 30L) +# generating a histogramscale_fill_hue(direction =1) +scale_x_continuous(trans ="log10") +# creating logged valueslabs( # labelling axes and titlesx ="Log10(Standard ETF Investment)",y ="Frequency",title ="Log-Scaled Distribution of Standard ETF Investments Across Sectors",caption ="Ann-Alice made this" ) +theme_minimal()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Chart 2: Sorting sectors through colors here. Smoothed line is traced for each sector separately instead of for the data as a whole.
1.5.3.1 Homework problem 5: make a new chart from the R Graph Gallery
My Chart 1: Total Weight of ESG ETFs by sector
We can see that the IT industry is a disproportionatelly large investor in ESG funds as compared to other industries, accounting for close to 30% of investments, whereas materials, utilities, and real estate are the smallest investors in ESG funds. This could be due to two reasons: currently, IT companies are globally leading by their asset size, making them more able to invest their funds in general. Secondly, this could be because the IT sector is more attuned to invest ESG funds as it is less reliant on resources that are represented in the standard funds.
# Calculate Total Weights By Sector and then Plot Graphp1<-ggplot(blackrock_esg_vs_non_esg_etf %>%group_by(sector) %>%mutate(total_weight =sum(esg_etf)) %>%ungroup() %>%mutate(sector =reorder(sector, total_weight)))+aes(x = sector,fill = sector,weight = esg_etf ) +geom_bar() +scale_fill_hue(direction =1) +labs(x ="Sector",y ="Total Weight (%)",caption ="Ann-Alice made this",fill ="Sector",title ="Total Weight of ESG ETFs by sector" ) +coord_flip() +theme_minimal()print(p1)
My Chart 2: We can see here that the IT sector is again the biggest investor in Standard funds, which indicates that it is perhaps company size or overall investment sum that determine their leading position in both ESG and standard ETFs –> it is not a result of ESG-focused or standard-ETF-focused investment strategy. We can see that Materials, Utilities, and the Real Estate sectors have, again the smallest weights, indicating they may be perceived as more risky or less profitable.
pie_data <- blackrock_esg_vs_non_esg_etf %>%group_by(sector) %>%summarise(total_weight =sum(standard_etf), .groups ='drop') %>%mutate(proportion = total_weight /sum(total_weight),sector =factor(sector, levels = sector[order(-total_weight)])) %>%arrange(desc(total_weight))ggplot(pie_data, aes(x ="", y = proportion, fill = sector)) +geom_bar(stat ="identity", width =1) +coord_polar(theta ="y") +# This transforms the bar chart into a pie chartscale_fill_hue(direction =1) +labs(x =NULL,y =NULL,caption ="Ann-Alice made this",title ="Total Weight of Standard ETFs by Sector",fill ="Sector", ) +theme_minimal() +geom_text(aes(label = scales::percent(proportion)), position =position_stack(vjust =0.5),hjust =0.5, vjust =-0.5, size =2)
1.5.3.2 Homework problem 6: make a new chart from the ggplot2 Extensions Gallery
p2 <-ggplot(blackrock_esg_vs_non_esg_etf %>%group_by(sector) %>%mutate(total_weight =sum(standard_etf)) %>%ungroup() %>%mutate(sector =reorder(sector, total_weight)))+aes(x = sector,fill = sector,weight = standard_etf ) +geom_bar() +scale_fill_hue(direction =1) +labs(x ="Sector",y ="Total Weight (%)",caption ="Ann-Alice made this",fill ="Sector",title ="Total Weight of Standard ETFs by sector" ) +coord_flip() +theme_minimal()print(p2)