Jue Wang-HW2

HW2

library(esquisse)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggpubr)

Import data:

blackrock_esg_vs_non_esg_etf <- "https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/etf_comparison-2022-10-03.csv" |> 
  read_csv() |> 
  select(company_name:standard_etf)
Rows: 537 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ticker, company_name, sector, esg_uw_ow
dbl (7): esg_etf, standard_etf, esg_tilt, esg_tilt_z_score, esg_tilt_rank, e...
lgl (3): in_esg_only, in_standard_only, in_on_index_only

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Problem 1:

To open esquisse, use esquisser(blackrock_esg_vs_non_esg_etf).

ggplot(blackrock_esg_vs_non_esg_etf) +
 aes(x = esg_etf, y = standard_etf, colour = sector) +
 geom_point(shape = "circle", 
 size = 1.5) +
 geom_smooth(span = 0.75) +
 scale_color_hue(direction = 1) +
 scale_x_continuous(trans = "log10") +
 scale_y_continuous(trans = "log10") +
 labs(x = "ESG ETF(ESGU)", y = "Standard ETF(IVV)", title = "We made this chart using esquisse!", 
 subtitle = "It's a great tool for learning ggplot2. Even if it has limitations", caption = "I made this!") +
 theme_minimal() +
 facet_wrap(vars(sector))
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).
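
The warnings above come from the log10 axes: a weight of 0 becomes -Inf under log10, and ggplot2 drops those rows. Presumably these are companies held in only one of the two funds. A minimal sketch of filtering them out beforehand, using a made-up toy table (the `toy` and `in_both` names and all values are assumptions for illustration, not the real data):

```r
library(dplyr)

# Made-up weights; a 0 stands in for a company absent from one fund.
toy <- tibble(
  company_name = c("A", "B", "C"),
  esg_etf      = c(0.5, 0, 0.2),
  standard_etf = c(0.1, 0.3, 0)
)

# The zero becomes -Inf, which ggplot2 then removes with a warning.
log10(toy$esg_etf)

# Keep only companies with a positive weight in both funds
# before plotting on log axes.
in_both <- toy |>
  filter(esg_etf > 0, standard_etf > 0)
```

Filtering first makes the dropped rows an explicit choice rather than a side effect of the axis transformation.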

Problem 2:

blackrock_esg_vs_non_esg_etf_long <- blackrock_esg_vs_non_esg_etf |> 
  pivot_longer(cols = contains("etf"), names_to = "fund_type", values_to = "weight") |> 
  mutate(fund_type = case_when(fund_type == "esg_etf" ~ "ESG ETF (ESGU)",
                               fund_type == "standard_etf" ~ "Standard ETF (IVV)"))
blackrock_esg_vs_non_esg_etf_long
# A tibble: 1,074 × 4
   company_name                  sector                 fund_type         weight
   <chr>                         <chr>                  <chr>              <dbl>
 1 PRUDENTIAL FINANCIAL INC      Financials             ESG ETF (ESGU)    0.537 
 2 PRUDENTIAL FINANCIAL INC      Financials             Standard ETF (IV… 0.106 
 3 GENERAL MILLS INC             Consumer Staples       ESG ETF (ESGU)    0.552 
 4 GENERAL MILLS INC             Consumer Staples       Standard ETF (IV… 0.151 
 5 KELLOGG                       Consumer Staples       ESG ETF (ESGU)    0.453 
 6 KELLOGG                       Consumer Staples       Standard ETF (IV… 0.0592
 7 AUTOMATIC DATA PROCESSING INC Information Technology ESG ETF (ESGU)    0.649 
 8 AUTOMATIC DATA PROCESSING INC Information Technology Standard ETF (IV… 0.312 
 9 ECOLAB INC                    Materials              ESG ETF (ESGU)    0.441 
10 ECOLAB INC                    Materials              Standard ETF (IV… 0.118 
# ℹ 1,064 more rows
library(dplyr)
library(ggplot2)

blackrock_esg_vs_non_esg_etf_long %>%
  # Keep rows with a weight between 1 and 7
 filter(weight >= 1L & weight <= 7L) %>%
  # Scatter plot with weight on the x-axis and company name on the y-axis;
  # fund type is differentiated by point colour, and points with a higher
  # weight are drawn larger.
 ggplot() +
 aes(x = weight, y = company_name, colour = fund_type, size = weight) +
 geom_point(shape = "circle") +
  # Fund types are separated by assigning green to the ESG fund and grey to the non-ESG fund
 scale_color_manual(values = c(`ESG ETF (ESGU)` = "#55E368", `Standard ETF (IVV)` = "#989898"
)) +
  # Add titles and other descriptive information to the graph
 labs(title = "Blackrock ETF", subtitle = "ESG vs. non-ESG", caption = "Jue Wang") +
 theme_minimal()

Problem 3:

library(ggplot2)
# Use a boxplot to assess the distribution of weight within each sector,
# placing the sectors on one horizontal axis to compare them
ggplot(blackrock_esg_vs_non_esg_etf) +
 aes(x = sector, y = esg_etf, fill = sector) +
 geom_boxplot() +
 scale_fill_hue(direction = 1) +
  #y-axis is scaled to better reflect weights 
 scale_y_continuous(trans = "log10") +
  #adding title to describe the purpose of the graph
 labs(title = "ESG ETF Among Different Sectors ") +
 theme_minimal()
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 226 rows containing non-finite values (`stat_boxplot()`).

# Using the non-ESG fund data, assess the weight distribution within each sector,
# placing the sectors on one horizontal axis to compare them
ggplot(blackrock_esg_vs_non_esg_etf) +
 aes(x = sector, y = standard_etf, fill = sector) +
 geom_boxplot() +
 scale_fill_hue(direction = 1) +
  #scale the y-axis to better reflect data values
 scale_y_continuous(trans = "log10") +
 labs(title = "Non-ESG ETF Among Different Sectors ") +
 theme_minimal()
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 35 rows containing non-finite values (`stat_boxplot()`).

From the two graphs, we can see that the financial sector has the highest median weight in the ESG ETF, while the materials sector has the lowest. Across sectors, the median weight is around 0.2. More than half of the sectors have a third quartile of ESG ETF weight above 0.3.

For the non-ESG ETF, the two sectors with the highest weights are communication and information technology. The consumer sector covers the widest range of weights, while the utilities sector covers the narrowest. The graph shows some outliers, mainly in the consumer discretionary, energy, and IT sectors. We may therefore trim those outliers to better estimate the relationship between companies' ESG and non-ESG ETF weights.
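
One possible way to trim outliers is the 1.5 × IQR rule applied within each sector. This is the rule geom_boxplot() uses to draw outlier points, although there it is applied after the log10 transform, so flags computed on raw weights will not match the plotted points exactly. The sketch below uses a made-up toy table; `toy`, `trimmed`, and all values are assumptions for illustration:

```r
library(dplyr)

# Toy stand-in for blackrock_esg_vs_non_esg_etf; values are made up.
toy <- tibble(
  sector = rep(c("Energy", "Utilities"), each = 5),
  standard_etf = c(0.10, 0.12, 0.15, 0.11, 1.50,
                   0.20, 0.22, 0.21, 0.19, 0.23)
)

# Within each sector, flag values more than 1.5 * IQR beyond the quartiles,
# then drop the flagged rows.
trimmed <- toy |>
  group_by(sector) |>
  mutate(
    q1 = quantile(standard_etf, 0.25),
    q3 = quantile(standard_etf, 0.75),
    is_outlier = standard_etf < q1 - 1.5 * (q3 - q1) |
      standard_etf > q3 + 1.5 * (q3 - q1)
  ) |>
  ungroup() |>
  filter(!is_outlier) |>
  select(sector, standard_etf)
```

In the toy data only the 1.50 weight in Energy is removed; whether trimming is appropriate for the real funds is a separate judgment, since large weights may be genuine holdings rather than errors.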

Problem 4:

ggplot(
  data = blackrock_esg_vs_non_esg_etf,
  mapping = aes(x = esg_etf, y = standard_etf)
) +
  geom_point(mapping = aes(color = sector)) +
  geom_smooth() +
  scale_x_log10(limits = c(.01,10)) +  
  scale_y_log10(limits = c(.01,10)) 
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).

Graph 1: A scatterplot with sectors differentiated by colour. ESG is on the x-axis and non-ESG on the y-axis, with both axes on a log10 scale. A smoothed trend line is added.

We can use ggplot(blackr…, aes(x=, y=,..)) for ggplot2

ggplot(
  data = blackrock_esg_vs_non_esg_etf,
  mapping = aes(x = esg_etf, y = standard_etf, color = sector))+
  geom_point() +
  geom_smooth() +
  scale_x_log10(limits = c(.01,10)) +  
  scale_y_log10(limits = c(.01,10)) 
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).

Graph 2: Another scatterplot with sectors differentiated by colour. However, instead of a single smooth curve, a separate line is fitted for each sector, because the color mapping is placed in the global aes() of ggplot().
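
The difference between Graph 1 and Graph 2 can be checked on a small example. The sketch below uses made-up data and swaps the default loess smoother for method = "lm" so that it works on a handful of points; `toy`, `p_per_sector`, and `p_single` are names assumed for illustration:

```r
library(ggplot2)

# Made-up data with two sectors, for illustration only.
toy <- data.frame(
  esg_etf      = c(0.1, 0.2, 0.4, 0.8, 0.1, 0.3, 0.5, 0.9),
  standard_etf = c(0.1, 0.3, 0.3, 0.7, 0.2, 0.2, 0.6, 0.8),
  sector       = rep(c("Energy", "Utilities"), each = 4)
)

# colour in the global mapping: every layer inherits it,
# so geom_smooth() fits one line per sector.
p_per_sector <- ggplot(toy, aes(esg_etf, standard_etf, colour = sector)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour only in geom_point(): geom_smooth() sees no grouping
# and fits a single line through all points.
p_single <- ggplot(toy, aes(esg_etf, standard_etf)) +
  geom_point(aes(colour = sector)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the fitted lines from the built data of the second layer.
length(unique(layer_data(p_per_sector, 2)$group))
length(unique(layer_data(p_single, 2)$group))
```

The first count is 2 (one line per sector) and the second is 1, which matches what the two graphs show.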

Use ggplot(blackr…, aes(x=, y=,..)) for ggplot2

ggplot(
  data = blackrock_esg_vs_non_esg_etf,
  mapping = aes(x = esg_etf, y = standard_etf)
) +
  geom_point(colour = "purple") +
  geom_smooth(colour = "yellow") +
  scale_x_log10(limits = c(.01,10)) +  
  scale_y_log10(limits = c(.01,10)) 
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Transformation introduced infinite values in continuous y-axis
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 261 rows containing non-finite values (`stat_smooth()`).

Graph 3: All points are given a single colour by setting “colour =” in geom_point(). The smooth line is drawn in yellow, also with the colour argument.
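
A related point is that a fixed colour has to be set outside aes(). The sketch below, on made-up points (`toy`, `p_constant`, and `p_mapped` are assumed names), compares the two placements:

```r
library(ggplot2)

# Made-up points, for illustration only.
toy <- data.frame(x = c(1, 2, 3), y = c(2, 1, 3))

# Outside aes(): "purple" is a literal colour applied to every point.
p_constant <- ggplot(toy, aes(x, y)) +
  geom_point(colour = "purple")

# Inside aes(): "purple" is treated as a data value, so ggplot2 picks a
# colour from its default palette and adds a legend.
p_mapped <- ggplot(toy, aes(x, y)) +
  geom_point(aes(colour = "purple"))

# Colours actually used when each plot is built.
unique(layer_data(p_constant)$colour)
unique(layer_data(p_mapped)$colour)
```

Only the first plot draws purple points, which is why Graph 3 passes colour directly to geom_point() and geom_smooth().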

Use ggplot(blackr…, aes(x=, y=,..)) for ggplot2

Problem 5:

Basic Lollipop plot

This is very close to both a barplot and a scatterplot, and it can replace a barplot because its x-axis can hold either a numerical or a categorical variable.

Example:

# Libraries
library(ggplot2)

# Create data
data <- data.frame(
  x=LETTERS[1:26], 
  y=abs(rnorm(26))
)

# Plot
ggplot(data, aes(x=x, y=y)) +
  geom_point() + 
  geom_segment( aes(x=x, xend=x, y=0, yend=y))

In our example:

ggplot(blackrock_esg_vs_non_esg_etf_long, aes(x=company_name, y=weight)) +
  geom_point() + 
  geom_segment( aes(x=company_name, xend=company_name, y=0, yend=weight))

This shows the weight of each company, and it’s easy to spot the highest weight among them.
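
With several hundred companies on one axis the labels overlap, so one option is to plot only the largest weights. A minimal sketch on made-up data follows; `toy_long`, `top_n_companies`, and `top_weights` are assumed names, and the cutoff of 3 is arbitrary:

```r
library(dplyr)
library(ggplot2)

# Made-up weights standing in for blackrock_esg_vs_non_esg_etf_long.
toy_long <- tibble(
  company_name = c("A", "B", "C", "D", "E"),
  weight       = c(0.4, 6.1, 1.2, 0.1, 3.3)
)

top_n_companies <- 3  # hypothetical cutoff

# Keep the rows with the largest weights, sorted in descending order.
top_weights <- toy_long |>
  slice_max(weight, n = top_n_companies)

# Horizontal lollipop: companies on the y-axis, ordered by weight.
ggplot(top_weights, aes(x = weight, y = reorder(company_name, weight))) +
  geom_segment(aes(x = 0, xend = weight,
                   yend = reorder(company_name, weight))) +
  geom_point() +
  labs(x = "weight", y = NULL)
```

Putting the company names on the y-axis keeps them readable without rotating the labels. In the real long-format table each company has one row per fund, so mapping colour to fund_type would keep the two funds apart.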

Problem 6:

Violin plot: can reflect the statistics of each category in a more direct and interesting manner.

Example:

data("ToothGrowth")
df <- ToothGrowth
ggviolin(df, x = "dose", y = "len", fill = "dose",
         palette = c("#00AFBB", "#E7B800", "#FC4E07"),
         add = "boxplot", add.params = list(fill = "white"))

In our example:

ggviolin(blackrock_esg_vs_non_esg_etf_long, x = "sector", y = "weight", fill = "sector",
         palette = c("#00AFBB", "#E7B800", "#FC4E07", "#10AFBB", "#E7A002", 
         "#FCFF07","#99AFBB", "#A7B800", "#AC4E07","#60AC1B", "#FC8E07"),
         add = "boxplot", add.params = list(fill = "white"))

This shows the weight distribution of each sector through the shape of each violin.