ESG Investing is increasingly becoming a crucial part of investment strategies.This analysis leverages S&P 500 ESG Scores to uncover trends and insights, guiding the creation of investment portfolios that not only promise returns but also reflect responsible stewardship of environmental, social, and governance values.”
The core mission of of this project is to harness the power of data analytics to forge a path for ESG investment strategies. By meticulously analyzing S&P 500 ESG Scores, this project aims to pinpoint key trends and actionable insights that will empower investors to build portfolios with a conscience, aligning financial goals with environmental, social, and governance benchmarks.
This dataset exclusively showcases companies from the S&P 500 index. Researchers, investors, analysts, and policy-makers can utilize this dataset to gain insights into the ESG performance and risk profiles of these major corporations. Whether exploring trends, conducting ESG assessments, or making informed investment decisions, this dataset serves as a valuable resource for comprehending the sustainability and governance practices of S&P 500 companies. Fore more information on the dataset !click here(https://www.kaggle.com/datasets/pritish509/s-and-p-500-esg-risk-ratings)
# Loading required libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(knitr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(viridis) # For a vibrant color palette
## Loading required package: viridisLite
library(RColorBrewer)
library(forcats) # For factor releveling
library(cluster) # For cluster analysis
library(reshape2) # for melt function
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
esg_data <- read_csv("SP 500 ESG Risk Ratings.csv")
## Rows: 503 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): Symbol, Name, Address, Sector, Industry, Description, Controversy L...
## dbl (5): Total ESG Risk score, Environment Risk Score, Governance Risk Score...
## num (1): Full Time Employees
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(esg_data)
## # A tibble: 6 × 15
## Symbol Name Address Sector Industry `Full Time Employees` Description
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 A Agilent Tech… "5301 … Healt… Diagnos… 18000 Agilent Te…
## 2 AAL American Air… "1 Sky… Indus… Airlines 132500 American A…
## 3 AAP Advance Auto… "4200 … Consu… Special… 40000 Advance Au…
## 4 AAPL Apple Inc "One A… Techn… Consume… 164000 Apple Inc.…
## 5 ABBV Abbvie Inc "1 Nor… Healt… Drug Ma… 50000 AbbVie Inc…
## 6 ABC Amerisourceb… "1 Wes… Healt… Medical… 46000 Amerisourc…
## # ℹ 8 more variables: `Total ESG Risk score` <dbl>,
## # `Environment Risk Score` <dbl>, `Governance Risk Score` <dbl>,
## # `Social Risk Score` <dbl>, `Controversy Level` <chr>,
## # `Controversy Score` <dbl>, `ESG Risk Percentile` <chr>,
## # `ESG Risk Level` <chr>
colnames(esg_data)
## [1] "Symbol" "Name" "Address"
## [4] "Sector" "Industry" "Full Time Employees"
## [7] "Description" "Total ESG Risk score" "Environment Risk Score"
## [10] "Governance Risk Score" "Social Risk Score" "Controversy Level"
## [13] "Controversy Score" "ESG Risk Percentile" "ESG Risk Level"
glimpse(esg_data)
## Rows: 503
## Columns: 15
## $ Symbol <chr> "A", "AAL", "AAP", "AAPL", "ABBV", "ABC", "AB…
## $ Name <chr> "Agilent Technologies Inc", "American Airline…
## $ Address <chr> "5301 Stevens Creek Boulevard\nSanta Clara, C…
## $ Sector <chr> "Healthcare", "Industrials", "Consumer Cyclic…
## $ Industry <chr> "Diagnostics & Research", "Airlines", "Specia…
## $ `Full Time Employees` <dbl> 18000, 132500, 40000, 164000, 50000, 46000, 1…
## $ Description <chr> "Agilent Technologies, Inc. provides applicat…
## $ `Total ESG Risk score` <dbl> 15, 29, 12, 17, 28, 12, 25, 21, 10, 12, 24, 3…
## $ `Environment Risk Score` <dbl> 0.3, 12.0, 0.0, 0.6, 1.1, 1.3, 3.0, 1.0, 0.3,…
## $ `Governance Risk Score` <dbl> 6.3, 5.0, 3.0, 9.2, 9.9, 5.2, 8.4, 12.0, 4.8,…
## $ `Social Risk Score` <dbl> 8.6, 12.0, 8.0, 6.9, 16.8, 5.6, 13.6, 7.0, 4.…
## $ `Controversy Level` <chr> "Low", "Moderate", "Moderate", "Significant",…
## $ `Controversy Score` <dbl> 1, 2, 2, 3, 3, 3, 3, 2, 2, 1, 0, 3, 1, 0, 2, …
## $ `ESG Risk Percentile` <chr> "11th percentile", "62nd percentile", "4th pe…
## $ `ESG Risk Level` <chr> "Low", NA, "Negligible", "Low", "Medium", "Lo…
# Data Cleaning Steps
# 1. Remove rows with any missing values
esg_data_clean <- na.omit(esg_data)
# 2. Remove unnecessary columns, e.g., 'Address'
esg_data_clean <- select(esg_data_clean, -Address)
# 3. Convert 'Full Time Employees' to integer
esg_data_clean$`Full Time Employees` <- as.integer(esg_data_clean$`Full Time Employees`)
# 4. Rename 'Symbol' to 'Ticker'
esg_data_clean <- rename(esg_data_clean, Ticker = Symbol)
# 5. Address outliers in 'Total ESG Risk score'
esg_outliers <- boxplot.stats(esg_data_clean$`Total ESG Risk score`)$out
esg_data_clean <- esg_data_clean[!esg_data_clean$`Total ESG Risk score` %in% esg_outliers, ]
# 6. Normalize 'Total ESG Risk score'
esg_data_clean$`Total ESG Risk score` <- scale(esg_data_clean$`Total ESG Risk score`)
# Summary Statistics
# Create a summary statistics table, excluding the 'count' equivalent row
esg_data_summary <- summary(esg_data)
# Convert the summary to a data frame for better control over styling
esg_data_summary_df <- as.data.frame(t(esg_data_summary))
# Use kable from knitr to create a basic table
esg_data_describe <- kable(esg_data_summary_df[-1, ], "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
# Add a header above with the correct number of columns
# The number '4' should match the actual number of columns in your table
esg_data_describe <- esg_data_describe %>%
add_header_above(c("Summary Statistics" = 4)) %>%
row_spec(0, background = "white", color = "black") %>%
column_spec(1, bold = TRUE, border_right = TRUE) %>%
scroll_box(width = "100%", height = "500px")
# Display the table
esg_data_describe
| Var1 | Var2 | Freq | |
|---|---|---|---|
| 2 | Name | Length:503 | |
| 3 | Address | Length:503 | |
| 4 | Sector | Length:503 | |
| 5 | Industry | Length:503 | |
| 6 | Full Time Employees | Min. : 23 | |
| 7 | Description | Length:503 | |
| 8 | Total ESG Risk score | Min. : 7.00 | |
| 9 | Environment Risk Score | Min. : 0.000 | |
| 10 | Governance Risk Score | Min. : 3.000 | |
| 11 | Social Risk Score | Min. : 1.100 | |
| 12 | Controversy Level | Length:503 | |
| 13 | Controversy Score | Min. :0.000 | |
| 14 | ESG Risk Percentile | Length:503 | |
| 15 | ESG Risk Level | Length:503 | |
| 16 | Symbol | Class :character | |
| 17 | Name | Class :character | |
| 18 | Address | Class :character | |
| 19 | Sector | Class :character | |
| 20 | Industry | Class :character | |
| 21 | Full Time Employees | 1st Qu.: 10076 | |
| 22 | Description | Class :character | |
| 23 | Total ESG Risk score | 1st Qu.:16.00 | |
| 24 | Environment Risk Score | 1st Qu.: 1.500 | |
| 25 | Governance Risk Score | 1st Qu.: 5.000 | |
| 26 | Social Risk Score | 1st Qu.: 6.600 | |
| 27 | Controversy Level | Class :character | |
| 28 | Controversy Score | 1st Qu.:1.000 | |
| 29 | ESG Risk Percentile | Class :character | |
| 30 | ESG Risk Level | Class :character | |
| 31 | Symbol | Mode :character | |
| 32 | Name | Mode :character | |
| 33 | Address | Mode :character | |
| 34 | Sector | Mode :character | |
| 35 | Industry | Mode :character | |
| 36 | Full Time Employees | Median : 20000 | |
| 37 | Description | Mode :character | |
| 38 | Total ESG Risk score | Median :21.00 | |
| 39 | Environment Risk Score | Median : 3.800 | |
| 40 | Governance Risk Score | Median : 6.000 | |
| 41 | Social Risk Score | Median : 8.700 | |
| 42 | Controversy Level | Mode :character | |
| 43 | Controversy Score | Median :2.000 | |
| 44 | ESG Risk Percentile | Mode :character | |
| 45 | ESG Risk Level | Mode :character | |
| 46 | Symbol | NA | |
| 47 | Name | NA | |
| 48 | Address | NA | |
| 49 | Sector | NA | |
| 50 | Industry | NA | |
| 51 | Full Time Employees | Mean : 57060 | |
| 52 | Description | NA | |
| 53 | Total ESG Risk score | Mean :21.42 | |
| 54 | Environment Risk Score | Mean : 5.679 | |
| 55 | Governance Risk Score | Mean : 6.674 | |
| 56 | Social Risk Score | Mean : 9.045 | |
| 57 | Controversy Level | NA | |
| 58 | Controversy Score | Mean :1.896 | |
| 59 | ESG Risk Percentile | NA | |
| 60 | ESG Risk Level | NA | |
| 61 | Symbol | NA | |
| 62 | Name | NA | |
| 63 | Address | NA | |
| 64 | Sector | NA | |
| 65 | Industry | NA | |
| 66 | Full Time Employees | 3rd Qu.: 53400 | |
| 67 | Description | NA | |
| 68 | Total ESG Risk score | 3rd Qu.:26.00 | |
| 69 | Environment Risk Score | 3rd Qu.: 8.900 | |
| 70 | Governance Risk Score | 3rd Qu.: 7.700 | |
| 71 | Social Risk Score | 3rd Qu.:11.600 | |
| 72 | Controversy Level | NA | |
| 73 | Controversy Score | 3rd Qu.:2.000 | |
| 74 | ESG Risk Percentile | NA | |
| 75 | ESG Risk Level | NA | |
| 76 | Symbol | NA | |
| 77 | Name | NA | |
| 78 | Address | NA | |
| 79 | Sector | NA | |
| 80 | Industry | NA | |
| 81 | Full Time Employees | Max. :2100000 | |
| 82 | Description | NA | |
| 83 | Total ESG Risk score | Max. :46.00 | |
| 84 | Environment Risk Score | Max. :25.000 | |
| 85 | Governance Risk Score | Max. :15.500 | |
| 86 | Social Risk Score | Max. :21.000 | |
| 87 | Controversy Level | NA | |
| 88 | Controversy Score | Max. :5.000 | |
| 89 | ESG Risk Percentile | NA | |
| 90 | ESG Risk Level | NA | |
| 91 | Symbol | NA | |
| 92 | Name | NA | |
| 93 | Address | NA | |
| 94 | Sector | NA | |
| 95 | Industry | NA | |
| 96 | Full Time Employees | NA’s :7 | |
| 97 | Description | NA | |
| 98 | Total ESG Risk score | NA’s :70 | |
| 99 | Environment Risk Score | NA’s :70 | |
| 100 | Governance Risk Score | NA’s :70 | |
| 101 | Social Risk Score | NA’s :70 | |
| 102 | Controversy Level | NA | |
| 103 | Controversy Score | NA’s :70 | |
| 104 | ESG Risk Percentile | NA | |
| 105 | ESG Risk Level | NA |
## ESG Score Distribution Analysis
# Select the ESG score columns
esg_scores <- esg_data %>%
select(`Total ESG Risk score`, `Environment Risk Score`, `Governance Risk Score`, `Social Risk Score`)
# Define a vibrant color palette
colors <- viridis::viridis(4)
# Create histograms and box plots for each ESG score column
plots <- list()
for (col in c("Total ESG Risk score", "Environment Risk Score", "Governance Risk Score", "Social Risk Score")) {
# Histogram with KDE
p1 <- ggplot(esg_data, aes(x = .data[[col]], fill = I(colors[which(names(esg_scores) == col)]))) +
geom_histogram(bins=20, alpha=0.6) +
geom_density(alpha=0.7, color="white") +
theme_minimal() +
labs(title=paste("Distribution of", col), y="Frequency") +
theme(legend.position="none", plot.title = element_text(hjust = 0.5))
# Box plot
p2 <- ggplot(esg_data, aes(y = .data[[col]], fill = I(colors[which(names(esg_scores) == col)]))) +
geom_boxplot(notch=TRUE, outlier.shape=4, outlier.colour="coral") +
theme_minimal() +
labs(title=paste("Box Plot of", col)) +
theme(legend.position="none", plot.title = element_text(hjust = 0.5))
plots[[paste("histogram", col)]] <- p1
plots[[paste("boxplot", col)]] <- p2
}
# Reshape the data using gather() before plotting
esg_scores_long <- gather(esg_scores, key = "score_type", value = "score_value", factor_key=TRUE)
# Combined KDE plot for all ESG scores
p3 <- ggplot(esg_scores_long, aes(x = score_value, fill = score_type)) +
geom_density(alpha = 0.7) +
scale_fill_viridis(discrete = TRUE) +
labs(x = "ESG Scores", y = "Density", title = "Combined Distribution of ESG Scores") +
theme_minimal() +
theme(legend.position = "right", plot.title = element_text(hjust = 0.5))
plots[["combined_kde"]] <- p3
# Display all the plots
plots
## $`histogram Total ESG Risk score`
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
##
## $`boxplot Total ESG Risk score`
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
##
## $`histogram Environment Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
##
## $`boxplot Environment Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
##
## $`histogram Governance Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
##
## $`boxplot Governance Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
##
## $`histogram Social Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range (`stat_bin()`).
## Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
##
## $`boxplot Social Risk Score`
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
##
## $combined_kde
## Warning: Removed 280 rows containing non-finite outside the scale range
## (`stat_density()`).
# Create a copy of the original dataset without missing values
esg_data_no_na <- na.omit(esg_data)
# Identify and remove outliers
esg_data_no_outliers <- esg_data_no_na
for (col in c("Total ESG Risk score", "Environment Risk Score", "Governance Risk Score", "Social Risk Score")) {
Q1 <- quantile(esg_data_no_na[[col]], 0.25, na.rm = TRUE)
Q3 <- quantile(esg_data_no_na[[col]], 0.75, na.rm = TRUE)
IQR <- Q3 - Q1
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
# Remove outliers from the copy of the dataset
esg_data_no_outliers <- esg_data_no_outliers %>%
filter(.data[[col]] >= lower_bound & .data[[col]] <= upper_bound)
}
# Group by 'ESG Risk Level' and calculate the mean 'Total ESG Risk Score' for each level
grouped_data <- esg_data_no_outliers %>%
group_by(`ESG Risk Level`) %>%
summarise(Mean_Total_ESG_Risk_Score = mean(`Total ESG Risk score`, na.rm = TRUE)) %>%
ungroup()
# Define the ESG risk levels
risk_levels <- c('Negligible', 'Low', 'Medium', 'High', 'Severe')
# Reorder the factor levels of 'ESG Risk Level' based on the defined risk levels
grouped_data <- grouped_data %>%
mutate(`ESG Risk Level` = factor(`ESG Risk Level`, levels = risk_levels))
# Create a bar chart for 'Total ESG Risk Score' for each risk level
ggplot(grouped_data, aes(x = `ESG Risk Level`, y = Mean_Total_ESG_Risk_Score, fill = `ESG Risk Level`)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Set3") + # Using a Color Brewer palette
labs(x = 'ESG Risk Level', y = 'Mean Total ESG Risk Score', title = 'Total ESG Risk Score by ESG Risk Level') +
theme_light(base_size = 14) + # Using a light theme with base font size set
theme(legend.position = "none") + # Remove legend
geom_text(aes(label = sprintf("%.2f", Mean_Total_ESG_Risk_Score)), vjust = -0.5, size = 3.5) # Add data labels
# Convert the ESG Risk Level column to a factor if it's not already
esg_data$`ESG Risk Level` <- factor(esg_data$`ESG Risk Level`)
# Calculate the count of companies at each ESG risk level
risk_level_counts <- table(esg_data$`ESG Risk Level`)
# Convert the table to a dataframe for plotting
risk_level_df <- data.frame(Risk_Level = names(risk_level_counts), Count = as.numeric(risk_level_counts))
# Create a donut chart
ggplot(risk_level_df, aes(x = 2, y = Count, fill = Risk_Level)) +
geom_bar(stat = "identity", width = 1) +
coord_polar(theta = "y") +
theme_void() +
theme(legend.position = "bottom") +
labs(fill = "ESG Risk Level") +
# Use geom_text to add labels to each slice of the donut
geom_text(aes(label = Count), position = position_stack(vjust = 0.5))
## Sector & Industry Analysis
# Sector & Industry Analysis
# Count the number of unique sectors
sectors_count <- length(unique(esg_data$Sector))
cat("Number of Unique Sectors:", sectors_count, "\n")
## Number of Unique Sectors: 12
# Count the number of unique industries
industries_count <- length(unique(esg_data$Industry))
cat("Number of Unique Industries:", industries_count, "\n")
## Number of Unique Industries: 115
# Get unique sectors and their frequencies
sector_counts <- esg_data %>%
count(Sector, name = "Frequency") %>%
arrange(desc(Frequency))
# Create a bar chart using ggplot2
ggplot(sector_counts, aes(x = reorder(Sector, Frequency), y = Frequency, fill = Frequency)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip coordinates for horizontal bar chart
scale_fill_viridis_c(option = "viridis", direction = -1) + # Use viridis color scale
labs(title = 'S&P 500 Sectors', x = 'Sector', y = 'Frequency Count') +
theme_minimal() + # Use a minimal theme
theme(legend.position = "none") + # Remove legend
geom_text(aes(label = Frequency), hjust = -0.1) # Add text labels
# Calculate sector-wise average ESG scores
sector_avg_scores <- esg_data %>%
group_by(Sector) %>%
summarise(
Average_Environment_Risk_Score = mean(`Environment Risk Score`, na.rm = TRUE),
Average_Governance_Risk_Score = mean(`Governance Risk Score`, na.rm = TRUE),
Average_Social_Risk_Score = mean(`Social Risk Score`, na.rm = TRUE)
) %>%
ungroup()
# Calculate industry-wise average ESG scores
industry_avg_scores <- esg_data %>%
group_by(Industry) %>%
summarise(
Average_Environment_Risk_Score = mean(`Environment Risk Score`, na.rm = TRUE),
Average_Governance_Risk_Score = mean(`Governance Risk Score`, na.rm = TRUE),
Average_Social_Risk_Score = mean(`Social Risk Score`, na.rm = TRUE)
) %>%
ungroup()
# Define custom colors
custom_colors <- c("Environment Risk Score" = "#008080",
"Governance Risk Score" = "#FF00FF",
"Social Risk Score" = "#FFD700")
# Create a visual plot for sector-wise average ESG scores with custom colors
ggplot(sector_avg_scores, aes(x = Sector)) +
geom_bar(aes(y = Average_Environment_Risk_Score, fill = "Environment Risk Score"), stat = "identity") +
geom_bar(aes(y = Average_Governance_Risk_Score, fill = "Governance Risk Score"), stat = "identity") +
geom_bar(aes(y = Average_Social_Risk_Score, fill = "Social Risk Score"), stat = "identity") +
scale_fill_manual(values = custom_colors) +
labs(title = 'Sector-wise Average ESG Scores', x = 'Sector', y = 'Average Score') +
theme_minimal() +
theme(legend.position = "bottom")
# Create a visual plot for industry-wise average ESG scores with custom colors
ggplot(industry_avg_scores, aes(x = Industry)) +
geom_bar(aes(y = Average_Environment_Risk_Score, fill = "Environment Risk Score"), stat = "identity") +
geom_bar(aes(y = Average_Governance_Risk_Score, fill = "Governance Risk Score"), stat = "identity") +
geom_bar(aes(y = Average_Social_Risk_Score, fill = "Social Risk Score"), stat = "identity") +
scale_fill_manual(values = custom_colors) +
labs(title = 'Industry-wise Average ESG Scores', x = 'Industry', y = 'Average Score') +
theme_minimal() +
theme(legend.position = "bottom")
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_bar()`).
## Removed 6 rows containing missing values or values outside the scale range
## (`geom_bar()`).
## Removed 6 rows containing missing values or values outside the scale range
## (`geom_bar()`).
# Create a pair plot with a vibrant color palette
p <- ggpairs(esg_scores,
upper = list(continuous = wrap("points", color = I("#E0B0FF"), size = 3)),
lower = list(continuous = wrap("points", color = I("#CBC3E3"), size = 3)),
diag = list(continuous = wrap("densityDiag", fill = I("#7F00FF"))),
aes(color = Symbol)
)
# Customize the color palette
p <- p + scale_color_brewer(palette = "Set3")
# Add title and adjust theme
p <- p + ggtitle('Pairwise Scatter Plots of ESG Scores') +
theme(
plot.title = element_text(hjust = 0.5), # Center the plot title
strip.background = element_blank(), # Remove background of facet labels
strip.text = element_text(face = "bold") # Bold facet labels
)
# Print the plot
print(p)
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning: Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning: Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning: Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 70 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_density()`).
## Cluster Analysis
# Prepare data for clustering
esg_for_clustering <- esg_data %>%
select(`Environment Risk Score`, `Governance Risk Score`, `Social Risk Score`) %>%
na.omit() # Remove NA values
# Perform k-means clustering
set.seed(123) # Set seed for reproducibility
esg_clusters <- kmeans(scale(esg_for_clustering), centers = 5)
# Create a cluster assignment column in the original data
# Initialize the column with NA to account for the omitted rows
esg_data$Cluster <- NA
# Assign the cluster results to the rows that were not omitted
esg_data$Cluster[!is.na(esg_data$`Environment Risk Score`) &
!is.na(esg_data$`Governance Risk Score`) &
!is.na(esg_data$`Social Risk Score`)] <- esg_clusters$cluster
# Display the data with clusters
print(esg_data)
## # A tibble: 503 × 16
## Symbol Name Address Sector Industry `Full Time Employees` Description
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 A Agilent Tec… "5301 … Healt… Diagnos… 18000 Agilent Te…
## 2 AAL American Ai… "1 Sky… Indus… Airlines 132500 American A…
## 3 AAP Advance Aut… "4200 … Consu… Special… 40000 Advance Au…
## 4 AAPL Apple Inc "One A… Techn… Consume… 164000 Apple Inc.…
## 5 ABBV Abbvie Inc "1 Nor… Healt… Drug Ma… 50000 AbbVie Inc…
## 6 ABC Amerisource… "1 Wes… Healt… Medical… 46000 Amerisourc…
## 7 ABT Abbott Labo… "100 A… Healt… Medical… 115000 Abbott Lab…
## 8 ACGL Arch Capita… "Water… Finan… Insuran… 5800 Arch Capit…
## 9 ACN Accenture P… "1 Gra… Techn… Informa… 732000 Accenture …
## 10 ADBE Adobe Inc "345 P… Techn… Softwar… 29239 Adobe Inc.…
## # ℹ 493 more rows
## # ℹ 9 more variables: `Total ESG Risk score` <dbl>,
## # `Environment Risk Score` <dbl>, `Governance Risk Score` <dbl>,
## # `Social Risk Score` <dbl>, `Controversy Level` <chr>,
## # `Controversy Score` <dbl>, `ESG Risk Percentile` <chr>,
## # `ESG Risk Level` <fct>, Cluster <int>
# Create scatter plots for each ESG score against the number of full-time employees
# Total ESG Risk score vs Full Time Employees
ggplot(esg_data, aes(x = `Full Time Employees`, y = `Total ESG Risk score`)) +
geom_point(aes(color = `Total ESG Risk score`), alpha = 0.6) +
scale_color_gradient(low = "blue", high = "red") +
labs(title = "Total ESG Risk score vs Full Time Employees",
x = "Number of Full Time Employees",
y = "Total ESG Risk score") +
theme_minimal()
## Warning: Removed 76 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Repeat the plot for Environment, Governance, and Social Risk Scores
# Environment Risk Score vs Full Time Employees
ggplot(esg_data, aes(x = `Full Time Employees`, y = `Environment Risk Score`)) +
geom_point(aes(color = `Environment Risk Score`), alpha = 0.6) +
scale_color_gradient(low = "blue", high = "red") +
labs(title = "Environment Risk Score vs Full Time Employees",
x = "Number of Full Time Employees",
y = "Environment Risk Score") +
theme_minimal()
## Warning: Removed 76 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Governance Risk Score vs Full Time Employees
ggplot(esg_data, aes(x = `Full Time Employees`, y = `Governance Risk Score`)) +
geom_point(aes(color = `Governance Risk Score`), alpha = 0.6) +
scale_color_gradient(low = "blue", high = "red") +
labs(title = "Governance Risk Score vs Full Time Employees",
x = "Number of Full Time Employees",
y = "Governance Risk Score") +
theme_minimal()
## Warning: Removed 76 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Social Risk Score vs Full Time Employees
ggplot(esg_data, aes(x = `Full Time Employees`, y = `Social Risk Score`)) +
geom_point(aes(color = `Social Risk Score`), alpha = 0.6) +
scale_color_gradient(low = "blue", high = "red") +
labs(title = "Social Risk Score vs Full Time Employees",
x = "Number of Full Time Employees",
y = "Social Risk Score") +
theme_minimal()
## Warning: Removed 76 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Create a violin plot to examine the distribution of Total ESG Risk score across different controversy levels
ggplot(esg_data, aes(x = `Controversy Level`, y = `Total ESG Risk score`, fill = `Controversy Level`)) +
geom_violin(trim = FALSE) + # Use trim=FALSE to show the full distribution
scale_fill_brewer(palette = "Set3") + # Vibrant color palette
labs(title = "Violin Plot of Total ESG Risk Scores by Controversy Level",
x = "Controversy Level",
y = "Total ESG Risk Score") +
theme_minimal() +
theme(legend.position = "none") # Hide the legend if not needed
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
## Warning: Groups with fewer than two datapoints have been dropped.
## ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
# Select relevant columns for correlation analysis
esg_subset <- esg_data[, c("Total ESG Risk score", "Environment Risk Score", "Governance Risk Score", "Social Risk Score")]
# Compute the correlation matrix
cor_matrix <- cor(esg_subset)
# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)
# Create a vibrant heatmap
ggplot(data = cor_melted, aes(x = Var1, y = Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradientn(colors = brewer.pal(9, "Spectral")) +
theme_minimal() +
labs(title = "Correlation Heatmap of ESG Scores",
x = "ESG Scores",
y = "ESG Scores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Run a linear regression model
model <- lm(`Total ESG Risk score` ~ `Environment Risk Score` + `Governance Risk Score` + `Social Risk Score`, data = esg_data)
# Summary of the model to see coefficients and statistics
summary(model)
##
## Call:
## lm(formula = `Total ESG Risk score` ~ `Environment Risk Score` +
## `Governance Risk Score` + `Social Risk Score`, data = esg_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.11381 -0.22486 -0.03485 0.20012 1.18280
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.212983 0.072291 2.946 0.00339 **
## `Environment Risk Score` 1.001930 0.003710 270.068 < 2e-16 ***
## `Governance Risk Score` 0.964371 0.009754 98.873 < 2e-16 ***
## `Social Risk Score` 1.004262 0.005565 180.453 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4034 on 429 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.9969, Adjusted R-squared: 0.9969
## F-statistic: 4.655e+04 on 3 and 429 DF, p-value: < 2.2e-16
# To check for model assumptions, create diagnostic plots
par(mfrow = c(2, 2))
# Scatter plot of residuals vs. fitted values
plot(model, which = 1, col = "#FF5733", pch = 16)
# Normal Q-Q plot
plot(model, which = 2, col = "#4CBB17", pch = 16)
# Scale-location plot (square root of standardized residuals vs. fitted values)
plot(model, which = 3, col = "#0074D9", pch = 16)
# Residuals vs. leverage plot
plot(model, which = 5, col = "#FFC300", pch = 16)
In the landscape of ESG risk assessment, it is observed that the Energy, Utility, and Materials sectors stand out as the leading trio, each bearing significant ESG risks. These sectors are intrinsically linked to the extraction, production, and consumption of raw materials, placing them at the forefront of environmental scrutiny and sustainability challenges. Their operations are resource-intensive, often directly impacting ecological systems, which heightens their exposure to ESG-related risks. Contrastingly, the Real Estate sector operates on a different paradigm. It is less directly tied to the immediate use of raw materials, as its primary transactions—renting and purchasing existing properties—do not inherently require new construction. This sector offers an alternative that circumvents the initial phase of raw material consumption, distinguishing it from those industries where the procurement and processing of raw goods are fundamental to their business models. The Real Estate sector’s ESG risk profile is thus shaped by different factors, such as property management practices, tenant relations, and investment in sustainable buildings, rather than the direct use of natural resources.
A nuanced landscape emerges where company size, as measured by the number of full-time employees, does not consistently predict ESG risk scores. For instance, industry giants such as Apple Inc, boasting a workforce of 164,000, demonstrate a relatively modest ESG risk score of 17. This indicates that despite their vast operations, they manage to maintain lower levels of ESG-related risks. On the contrary, smaller entities like Albemarle Corp, with a workforce of 7,400, exhibit a higher ESG risk score of 29, suggesting that smaller scale does not inherently equate to lower ESG risk. The spectrum of Controversy Levels further adds complexity to the ESG narrative. These levels span from ‘None’ to ‘Significant’, painting a picture of the varied challenges companies face that could potentially tarnish their reputations and affect their ESG standings. Notably, companies such as Apple Inc and Abbvie Inc find themselves in the ‘Significant’ controversy bracket, hinting at past or ongoing issues that have caught public attention and may influence their ESG ratings. Such controversies could stem from a range of issues, including environmental incidents, social injustices, or governance failures, and they underscore the importance of robust risk management and transparent corporate practices. These insights reveal that ESG performance is influenced by a complex interplay of factors beyond just company size. They underscore the importance for investors and stakeholders to look beyond the surface and consider a holistic view of a company’s operations, policies, and the broader impact of its actions when evaluating ESG performance. The data suggests that effective ESG management can occur at any scale and that vigilance is key in maintaining a positive public image and strong ESG profile. Also, there is no correlation between environment, social and governance scores. Regression analysis is also of no use in this case.
Diversify Across Sectors: Balance portfolios by investing in a mix of sectors with varying ESG risk profiles, including those with high raw material dependency and those like Real Estate with different risk factors.
Assess Beyond Size: Evaluate ESG risks without preconceived notions about company size. Both large and small companies can exhibit a range of ESG risk scores, and each should be assessed on its individual merits and practices.
Monitor Controversy Levels: Keep a close watch on controversy levels as they can significantly impact a company’s ESG rating and public perception. Engage in active dialogue with companies that have higher controversy levels to understand their risk mitigation strategies.
Holistic ESG Evaluation: Adopt a comprehensive approach to ESG evaluation that considers the interplay of environmental, social, and governance factors, rather than relying solely on regression analysis or individual score correlations.
Sustainable Real Estate Practices: For investments in the Real Estate sector, prioritize companies that demonstrate sustainable property management, strong tenant relations, and investments in eco-friendly buildings.
Risk Management: Implement robust risk management practices to navigate the complexities of ESG risks, especially for sectors heavily reliant on raw materials.
Transparent Corporate Practices: Advocate for transparency in corporate practices, as this can mitigate the risks associated with controversies and improve a company’s ESG standing.
Active Engagement: Engage proactively with companies to encourage improvements in ESG practices, regardless of their size or sector.
Continuous Education: Stay informed about the latest developments in ESG criteria and adjust investment strategies accordingly to reflect the evolving landscape.
Positive Impact Investments: Consider the broader impact of investments and support companies that not only have strong ESG scores but also contribute positively to societal and environmental goals.
These recommendations aim to guide ESG-focused investment decisions, ensuring that portfolios are not only financially sound but also aligned with sustainable and ethical practices.