My First Quarto Document

1.5 Practise Problems

1.5.1 Using esquisse to practice ggplot2basics

1.5.1.1 Setup Esquisse and read the documents: Load Libraries & Data & Get a Glimpse

library(tidyverse)
Warning: package 'tidyr' was built under R version 4.2.3
Warning: package 'readr' was built under R version 4.2.3
Warning: package 'dplyr' was built under R version 4.2.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(esquisse)
Warning: package 'esquisse' was built under R version 4.2.3
# the URL of our data on GitHub
github_url <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv"


# read the data from GitHub
blackrock_esg_vs_non_esg_etf <- github_url |> 
  read_csv() |> 
  # select the four columns we will use in our anlaysis here
  select(company_name:standard_etf)
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# use dplyr::glimpse() to get an overview of our data
blackrock_esg_vs_non_esg_etf |> 
  glimpse()
Rows: 537
Columns: 4
$ company_name <chr> "PRUDENTIAL FINANCIAL INC", "GENERAL MILLS INC", "KELLOGG…
$ sector       <chr> "Financials", "Consumer Staples", "Consumer Staples", "In…
$ esg_etf      <dbl> 0.5366803, 0.5522180, 0.4534279, 0.6486836, 0.4407025, 0.…
$ standard_etf <dbl> 0.10574313, 0.15134370, 0.05920732, 0.31168123, 0.1184507…

1.5.1.2 Homework problem 1: Recreate the chart above in Esquisse

I didn’t add the legend as each graph has a Title on it explaining which industry it is. Adding the legend presented a big issue in terms of formatting, either the graphs would be flattened top down if legend was on the top or bottom, or the Full titles for each graph would not be readable if the legend was left or right.

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf, colour = sector) +
  geom_point(shape = "circle", size = 1.5) +
  geom_smooth(span = 0.75) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  scale_color_viridis_d(option = "magma", direction = 1) +
  labs(
    x = "ESG ETF(ESGU)",
    y = "Standard ETF (IVV)",
    title = "We made this chart using Esquisse!",
    subtitle = "It's a great tool for learning ggplot2. Even if it has limitations.",
    caption = "Ann-Alice made this!"
  ) +
  theme_minimal() +
  theme(legend.position = "none") +
  facet_wrap(vars(sector))
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).

1.5.1.3 Homework problem 2: exploring the outliers

blackrock_esg_vs_non_esg_etf_long <- blackrock_esg_vs_non_esg_etf |> 
  # we'll learn a lot more about long data & pivot_longer() in future weeks. 
  pivot_longer(cols = contains("etf"), names_to = "fund_type", values_to = "weight") |> 
  # case_when() is like an extended "if else"
  mutate(fund_type = case_when(fund_type == "esg_etf" ~ "ESG ETF (ESGU)",
                               fund_type == "standard_etf" ~ "Standard ETF (IVV)"))
blackrock_esg_vs_non_esg_etf_long
# A tibble: 1,074 × 4
   company_name                  sector                 fund_type         weight
   <chr>                         <chr>                  <chr>              <dbl>
 1 PRUDENTIAL FINANCIAL INC      Financials             ESG ETF (ESGU)    0.537 
 2 PRUDENTIAL FINANCIAL INC      Financials             Standard ETF (IV… 0.106 
 3 GENERAL MILLS INC             Consumer Staples       ESG ETF (ESGU)    0.552 
 4 GENERAL MILLS INC             Consumer Staples       Standard ETF (IV… 0.151 
 5 KELLOGG                       Consumer Staples       ESG ETF (ESGU)    0.453 
 6 KELLOGG                       Consumer Staples       Standard ETF (IV… 0.0592
 7 AUTOMATIC DATA PROCESSING INC Information Technology ESG ETF (ESGU)    0.649 
 8 AUTOMATIC DATA PROCESSING INC Information Technology Standard ETF (IV… 0.312 
 9 ECOLAB INC                    Materials              ESG ETF (ESGU)    0.441 
10 ECOLAB INC                    Materials              Standard ETF (IV… 0.118 
# ℹ 1,064 more rows
blackrock_esg_vs_non_esg_etf_long %>%
  filter(weight >= 1L & weight <= 7L) %>% #this filters for funds above 1% weight
  ggplot() +
  aes(
    x = weight, # This determines x axis values
    y = company_name, , # This determines y axis values
    colour = fund_type, # This determines what different colors are generated by
    size = weight
  ) +
  geom_point(shape = "circle") + # This determines the shape of the points
  scale_color_manual(
    values = c(`ESG ETF (ESGU)` = "#62E859", # These two lines determine the colors of points
               `Standard ETF (IVV)` = "#BFBFBF") # These two lines determine the colors of points
  ) +
  labs( # This part names all different labels and (sub)titles.
    x = "Weight(%)", # X label name
    y = "Company Name", # Y label name 
    title = "Largest ESG funds and their ESG vs non-ESG Portfolio", #title name
    subtitle = "1% Weight cut off for ESG vs standard funds", # subtitle name
    caption = "Ann-Alice", # caption text
    color = "Fund Type", # title for color legend
    size = "Weight" # title for size legend
  ) +
  theme_minimal() # sets overall theme of the graph

We can see that biggest investor in ETFs is Apple Inc, followed by Microsoft CORP, Alphabet Inc, and Amazon Com Inc. Those are also known to be largest companies in the world. All of their weights are below 7%. We can see that for most of the outliers on the graph, their ESG and standard weights are very similar. With Standard ETFs largely outweighing ESG ETFs. Apple Inc is the outlier as its ESG ETF outweighs its Standard ETF.

1.5.1.4 Homework problem 3: Make your own charts with Esquisse

Graph 1: Box Plot of ESG Weights Across Industries

From this graph we can see that average weights of ESG ETF investments across industries are very low, close to 0%. We can also see that Consumer Discretionary, Communication, and IT have the largest outliers, which are very probably due to the existence of large information and communication companies in this area. I chose the free y so that we could read the details IQR for each of the box plots.

# Graph 1: Box Plot of ESG Weights Across Industries 
ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = "", y = esg_etf, colour = sector) + # picking x and y axes and sort by sector
  geom_boxplot() + # creating box plots
  scale_colour_viridis_d(option = "plasma", direction = 1) + # choosing color fill
  labs( #annotating labels
    x = "Count",
    y = "ESG ETF",
    title = "Distribution of ESG ETF Weights Across Sectors",
    caption = "Ann-Alice made this"
  ) +
  theme_minimal() +
  facet_wrap(vars(sector), scales = "free_y") # create separate graph for each sector

Graph 2: Log-Scaled Distribution of Standard ETF Investments Across Sectors

From the histogram, it’s evident that the distribution of standard ETF investments spans a broad range on a log scale. Most investments are under 0.11. Investment sizes are most commonly 0.08, highlighting the utility of the log scale for a wide range of ETF investment values. The histogram’s distribution being right-skewed indicates that most standard ETF investments are concentrated towards the lower end of the scale, with a long tail extending towards higher values. This skewness suggests a prevalence of smaller investments within the dataset, with fewer larger investments.

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = standard_etf, fill = sector) + # picking x and y axes and sort by sector
  geom_histogram(bins = 30L) + # generating a histogram
  scale_fill_hue(direction = 1) + 
  scale_x_continuous(trans = "log10") + # creating logged values
  labs( # labelling axes and titles
    x = "Log10(Standard ETF Investment)",
    y = "Frequency",
    title = "Log-Scaled Distribution of Standard ETF Investments Across Sectors",
    caption = "Ann-Alice made this"
  ) +
  theme_minimal()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Removed 35 rows containing non-finite values (`stat_bin()`).

1.5.2.1 Homework problem 4: Understanding aes()

Chart 1: Sorting sectors through colors here. However, one smoothed line is traced for one entire group of all sectors.

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf, colour = sector) +
  geom_point(shape = "circle", size = 1.65) +
  scale_color_hue(direction = 1) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  labs(
    x = "ESG ETF Weights",
    y = "Standard ETF Weights",
    title = "ESG vs Standard Funds Weights Correlation",
    caption = "Ann-Alice",
    color = "Sector",) +
  geom_smooth(aes(group = 1), method = "auto") +
  theme_minimal()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).
Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Chart 2: Sorting sectors through colors here. Smoothed line is traced for each sector separately instead of for the data as a whole.

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf, colour = sector) + # sectors sorted using color
  geom_point(shape = "circle", size = 1.65) +
  scale_color_hue(direction = 1) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  labs(
    x = "ESG ETF Weights",
    y = "Standard ETF Weights",
    title = "ESG vs Standard Funds Weights Correlation",
    caption = "Ann-Alice",
    color = "Sector",) +
  geom_smooth() +
  theme_minimal()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).

Chart 3: Sort sector through colors deleted. Smoothed line is traced for the data as a whole instead of by sector.

ggplot(blackrock_esg_vs_non_esg_etf) +
  aes(x = esg_etf, y = standard_etf) + # sort color by sector deleted
  geom_point(shape = "circle", size = 1.65) +
  scale_color_hue(direction = 1) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  labs(
    x = "ESG ETF Weights",
    y = "Standard ETF Weights",
    title = "ESG vs Standard Funds Weights Correlation",
    caption = "Ann-Alice",
    color = "Sector",) +
  geom_smooth() + # smooth by one group
  theme_minimal()
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).