HW2

Week 2 Data Visualization

Homework problem 1

Preparations

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(esquisse)

github_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"


# read the data from GitHub
blackrock_esg_vs_non_esg_etf <- github_url |> 
  read_csv() |> 
  # select the four columns we will use in our analysis here
  select(company_name:standard_etf)
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Graphing in Esquisse:

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf, colour = sector) +
  geom_point(shape = "circle", size = 1.2) +
  geom_smooth(span = 0.75) +
  scale_color_brewer(palette = "Set2", direction = 1) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  labs(
    x = "Weight in ESG ETF (ESGU)",
    y = "Weight in Standard ETF (IVV)",
    title = "Large Cap American Equities ETFs: ESG vs. Non-ESG",
    subtitle = "A comparison of the holdings of BlackRock iShares ESGU and IVV",
    caption = "Dandan Gong"
  ) +
  theme_gray() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 22L,
    face = "bold"),
    plot.subtitle = element_text(size = 18L),
    plot.caption = element_text(size = 14L),
    axis.title.y = element_text(size = 14L),
    axis.title.x = element_text(size = 14L)
  ) +
  facet_wrap(vars(sector), ncol = 4L)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Homework problem 2

For this section, we compare companies that carry an unusually high weight (>1%) in either of the two funds. These turn out to be major, influential companies such as Tesla and Google. We then check whether each one appears in both the ESG and non-ESG funds and, if so, which fund gives it more weight.

The results are illuminating. First, among these large holdings, P&G, Meta, Johnson & Johnson, and Berkshire Hathaway appear only in the non-ESG fund. This suggests either that these companies did not satisfy the ESG criteria, or that their weight in the ESG fund was smaller and so is not captured in this graph.

Conversely, Pepsi and Home Depot appear as large holdings only in the ESG fund.

The weight of the same company can differ substantially between the two funds. For example, Google has a higher weight in the non-ESG fund than in the ESG fund.
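
The comparison described above can also be checked numerically rather than read off the chart. The sketch below is only an illustration: `toy_weights` is a small made-up tibble (its company names and numbers are hypothetical, not taken from the dataset) with the same column names as `blackrock_esg_vs_non_esg_etf`. It keeps companies whose weight exceeds 1% in at least one fund and labels which fund weights each one more heavily.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same column names as the real dataset;
# the numbers are illustrative only.
toy_weights <- tribble(
  ~company_name, ~sector,      ~esg_etf, ~standard_etf,
  "COMPANY A",   "Technology", 2.5,      3.0,
  "COMPANY B",   "Staples",    1.2,      0.8,
  "COMPANY C",   "Financials", 0.4,      1.5,
  "COMPANY D",   "Utilities",  0.3,      0.2
)

large_holdings <- toy_weights |>
  # keep companies above 1% in at least one of the two funds
  filter(esg_etf > 1 | standard_etf > 1) |>
  mutate(
    # positive values mean the ESG fund gives the company more weight
    weight_diff = esg_etf - standard_etf,
    heavier_in = case_when(
      weight_diff > 0 ~ "ESG ETF (ESGU)",
      weight_diff < 0 ~ "Standard ETF (IVV)",
      TRUE ~ "Equal"
    )
  )

large_holdings
```

Swapping `toy_weights` for the wide `blackrock_esg_vs_non_esg_etf` data frame should give the same kind of table for the real holdings, assuming the weights are expressed in percent as in the chart.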

#changing dataframe to long format
blackrock_esg_vs_non_esg_etf_long <- blackrock_esg_vs_non_esg_etf |> 
  # we'll learn a lot more about long data & pivot_longer() in future weeks. 
  pivot_longer(cols = contains("etf"), names_to = "fund_type", values_to = "weight") |> 
  # case_when() is like an extended "if else"
  mutate(fund_type = case_when(fund_type == "esg_etf" ~ "ESG ETF (ESGU)",
                               fund_type == "standard_etf" ~ "Standard ETF (IVV)"))

blackrock_esg_vs_non_esg_etf_long
# A tibble: 1,074 × 4
   company_name                  sector                 fund_type         weight
   <chr>                         <chr>                  <chr>              <dbl>
 1 PRUDENTIAL FINANCIAL INC      Financials             ESG ETF (ESGU)    0.537 
 2 PRUDENTIAL FINANCIAL INC      Financials             Standard ETF (IV… 0.106 
 3 GENERAL MILLS INC             Consumer Staples       ESG ETF (ESGU)    0.552 
 4 GENERAL MILLS INC             Consumer Staples       Standard ETF (IV… 0.151 
 5 KELLOGG                       Consumer Staples       ESG ETF (ESGU)    0.453 
 6 KELLOGG                       Consumer Staples       Standard ETF (IV… 0.0592
 7 AUTOMATIC DATA PROCESSING INC Information Technology ESG ETF (ESGU)    0.649 
 8 AUTOMATIC DATA PROCESSING INC Information Technology Standard ETF (IV… 0.312 
 9 ECOLAB INC                    Materials              ESG ETF (ESGU)    0.441 
10 ECOLAB INC                    Materials              Standard ETF (IV… 0.118 
# ℹ 1,064 more rows
#Loading libraries
library(dplyr)
library(ggplot2)

#Creating graphic in Esquisse
blackrock_esg_vs_non_esg_etf_long %>%
  # Select large holdings with a weight of at least 1% (and at most 7%) in a fund
  filter(weight >= 1L & weight <= 7L) %>%
  ggplot() +
  # Weight on the x-axis, company name on the y-axis; colour by fund type, size by weight
  aes(x = weight, y = company_name, colour = fund_type, size = weight) +
  geom_point(shape = "circle open") +
  # Green for the ESG fund, gray for the non-ESG fund
  scale_color_manual(values = c(`ESG ETF (ESGU)` = "#03754F", `Standard ETF (IVV)` = "#A8A2A6")) +
  # Labels for the chart
  labs(
    x = "Weight in fund (%)", y = "Company name",
    title = "ESG vs Non-ESG weight",
    subtitle = "Comparison of weight among large component stocks",
    caption = "Dandan Gong", color = "Fund type", size = "Weight (%)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(size = 18L, face = "bold"),
    plot.subtitle = element_text(size = 14L),
    plot.caption = element_text(size = 12L),
    axis.title.y = element_text(size = 12L),
    axis.title.x = element_text(size = 12L)
  )

Homework problem 3

For extra exploration, I wanted to see how energy sector companies fared across the ESG and non-ESG funds. My hypothesis was that energy companies would have much less weight in the ESG fund, and might not be included at all.

Surprisingly, within the energy sector itself, I did not see a dramatic difference in weightings between the two funds, and some energy companies even had a higher weight in the ESG fund.

I then looked separately at the utilities sector and found the expected pattern there. The data show that most utilities companies are not part of the ESG fund. The ones that are included tend not to be coal- or gas-powered utilities.
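
One way to put numbers on this sector comparison is to count, per sector, how many companies the ESG fund holds at all. The sketch below is a minimal illustration on made-up data: `toy_weights` and its values are hypothetical, and it assumes that a weight of 0 (or `NA`) means the company is not held by that fund, which should be checked against the real dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; a weight of 0 is assumed to mean "not held by the fund".
toy_weights <- tribble(
  ~company_name, ~sector,     ~esg_etf, ~standard_etf,
  "ENERGY CO 1", "Energy",    0.9,      0.7,
  "ENERGY CO 2", "Energy",    0.2,      0.3,
  "UTILITY 1",   "Utilities", 0,        0.4,
  "UTILITY 2",   "Utilities", 0,        0.2,
  "UTILITY 3",   "Utilities", 0.3,      0.1
)

sector_summary <- toy_weights |>
  filter(sector %in% c("Utilities", "Energy")) |>
  group_by(sector) |>
  summarise(
    n_companies = n(),
    # companies with a positive weight in the ESG fund
    n_in_esg = sum(esg_etf > 0, na.rm = TRUE),
    share_in_esg = n_in_esg / n_companies,
    # companies weighted more heavily in the ESG fund than in the standard fund
    n_heavier_in_esg = sum(esg_etf > standard_etf, na.rm = TRUE)
  )

sector_summary
```

A low `share_in_esg` for a sector would support the observation that most of its companies are left out of the ESG fund.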

#inspecting the long-format data and loading packages
blackrock_esg_vs_non_esg_etf_long 
# A tibble: 1,074 × 4
   company_name                  sector                 fund_type         weight
   <chr>                         <chr>                  <chr>              <dbl>
 1 PRUDENTIAL FINANCIAL INC      Financials             ESG ETF (ESGU)    0.537 
 2 PRUDENTIAL FINANCIAL INC      Financials             Standard ETF (IV… 0.106 
 3 GENERAL MILLS INC             Consumer Staples       ESG ETF (ESGU)    0.552 
 4 GENERAL MILLS INC             Consumer Staples       Standard ETF (IV… 0.151 
 5 KELLOGG                       Consumer Staples       ESG ETF (ESGU)    0.453 
 6 KELLOGG                       Consumer Staples       Standard ETF (IV… 0.0592
 7 AUTOMATIC DATA PROCESSING INC Information Technology ESG ETF (ESGU)    0.649 
 8 AUTOMATIC DATA PROCESSING INC Information Technology Standard ETF (IV… 0.312 
 9 ECOLAB INC                    Materials              ESG ETF (ESGU)    0.441 
10 ECOLAB INC                    Materials              Standard ETF (IV… 0.118 
# ℹ 1,064 more rows
library(dplyr)
library(ggplot2)

#Creating chart
blackrock_esg_vs_non_esg_etf_long %>%
 filter(sector %in% c("Utilities", "Energy")) %>% #selecting only utilities and energy companies
 ggplot() + 
 aes(x = weight, y = company_name, colour = fund_type, size = weight, 
 group = fund_type) + #mapping aesthetics
 geom_point(shape = "circle") +
 scale_color_manual(values = c(`ESG ETF (ESGU)` = "#038E4F", 
`Standard ETF (IVV)` = "#999296")) +
 theme_minimal() +
 facet_wrap(vars(sector)) #created two charts, comparing sectors side by side 

Homework problem 4

Re-creating the three charts

# Begin a ggplot object using the blackrock_esg_vs_non_esg_etf dataset
ggplot(blackrock_esg_vs_non_esg_etf) +  
  # Define aesthetics: x-axis, y-axis, and color
  aes(x = esg_etf, y = standard_etf, colour = sector) +  
  # Add points to the plot with circle shape and size 1.5
  geom_point(shape = "circle", size = 1.5) +  
  # Add a local regression smooth line to the plot with specified span and color
  geom_smooth(method = "loess", span = 0.71, color = "blue") +  
  # Use the default hue color scale for the sectors (direction = 1 is the default)
  scale_color_hue(direction = 1) +  
  # Apply log10 transformation to the x-axis and y-axis
  scale_x_continuous(trans = "log10") +  
  scale_y_continuous(trans = "log10") +  
  # Label the axes and the legend, and add a title and caption
  labs(
    x = "ESG Fund",  
    y = "Non-ESG Fund",  
    title = "Correlation between ESG Fund and Non-ESG Fund by Sector",  
    caption = "Dandan Gong",  
    color = "Sector"  
  ) +
  # Apply a gray theme to the plot
  theme_gray()  
`geom_smooth()` using formula = 'y ~ x'

# Begin a ggplot object using the blackrock_esg_vs_non_esg_etf dataset
ggplot(blackrock_esg_vs_non_esg_etf) +
  # Define aesthetics: x-axis, y-axis, and color
  aes(x = esg_etf, y = standard_etf, colour = sector) +
  # Add points to the plot with circle shape and size 1.5
  geom_point(shape = "circle", size = 1.5) +
  # Add a smooth line to the plot with specified span
  geom_smooth(span = 0.75) +
  # Adjust the color scale
  scale_color_hue(direction = 1) +
  # Apply log10 transformation to the x-axis and y-axis
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  # Apply a gray theme to the plot
  theme_gray()  
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Begin a ggplot object using the blackrock_esg_vs_non_esg_etf dataset
ggplot(blackrock_esg_vs_non_esg_etf) +
  # Define aesthetics: x-axis and y-axis
  aes(x = esg_etf, y = standard_etf) +
  # Add points to the plot with circle shape, size 1.5, and purple colour
  geom_point(shape = "circle", size = 1.5, colour = "#8D4BB5") +
  # Add a local regression smooth line to the plot with specified span and color
  geom_smooth(method = "loess", span = 0.71, color = "blue") + 
  # Apply log10 transformation to the x-axis and y-axis
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  # Apply a gray theme to the plot
  theme_gray() 
`geom_smooth()` using formula = 'y ~ x'

Homework problem 5

I found the Hexbin 2D chart really interesting and wanted to see what it would look like. The graphic doesn’t look great for our data, but it does show a very high concentration of companies that are not listed in the ESG fund, evident from the light blue corner in the bottom left.

library(dplyr)
library(ggplot2)


blackrock_esg_vs_non_esg_etf %>%
  filter(esg_etf <= 1 & standard_etf <= 1) %>%
  ggplot(aes(x = esg_etf, y = standard_etf)) +
  geom_bin2d() +
  theme_bw()

Homework problem 6

Not all of the new extensions work with our data, so I selected the theme package hrbrthemes and gave our chart a new FT look!

options(repos = c(CRAN = "https://cran.rstudio.com/"))
install.packages("hrbrthemes")
Installing package into 'C:/Users/dgong6/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'hrbrthemes' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\dgong6\AppData\Local\Temp\RtmpILCC5g\downloaded_packages
library(hrbrthemes)
Warning: package 'hrbrthemes' was built under R version 4.2.3
NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
      Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
      if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(dplyr)
library(ggplot2)


ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf, colour = sector) +
  geom_point(shape = "circle", size = 1.2) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  theme_ft_rc()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
not found in Windows font database

Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
family not found in Windows font database
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
font family not found in Windows font database