This report follows the 6-step Google Data Analytics framework: Ask, Prepare, Process, Analyze, Share, and Act.
The primary business task is to analyze our company’s historical carbon emissions data to assess the effectiveness of our sustainability initiatives. We aim to identify key emission trends, understand our environmental impact relative to business growth, and communicate these findings to internal and external stakeholders.
Stakeholders (investors, employees, customers, and leadership) want to know:
We are using four CSV datasets provided by the company, reflecting its environmental and financial performance:
greenhouse_gas_emissions.csv: Detailed records of
corporate and product life cycle emissions by year, category, scope, and
source.carbon_footprint_by_product.csv: Life cycle carbon
footprint for individual products by their release year.normalizing_factors.csv: Key business metrics (Revenue,
Market Cap, Employees) by fiscal year. This is crucial for calculating
intensity.data_dictionary.csv: A metadata file explaining the
fields, categories, and definitions used in the datasets.This data is considered reliable as it is based on internal reporting typically used for public corporate sustainability reports (CSR). The analysis is conducted using R and RStudio, with a final report generated using R Markdown.
First, we load the four datasets. We must be mindful of different delimiters (commas vs. semicolons).
# Load the main datasets (comma-separated)
ghg_emissions <- read_csv("greenhouse_gas_emissions.csv")
product_footprint <- read_csv("carbon_footprint_by_product.csv")
factors <- read_csv("normalizing_factors.csv")
# Load the data dictionary (semicolon-separated)
data_dict <- read_delim("data_dictionary.csv", delim = ";")
# Show the first few rows of the main emissions data
head(ghg_emissions)
## # A tibble: 6 × 6
## `Fiscal Year` Category Type Scope Description Emissions
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 2022 Corporate emissions Gross emissions Scope… Natural ga… 39700
## 2 2022 Corporate emissions Gross emissions Scope… Fleet vehi… 12600
## 3 2022 Corporate emissions Gross emissions Scope… Other (R&D… 2900
## 4 2022 Corporate emissions Gross emissions Scope… Electricity 0
## 5 2022 Corporate emissions Gross emissions Scope… Steam, hea… 3000
## 6 2022 Corporate emissions Gross emissions Scope… Business t… 113500
The ghg_emissions dataset has some blank values in the
Emissions column. For accurate analysis, we must convert
these to 0. We will also ensure all column names are clean
and R-friendly.
# Clean column names for easier use (e.g., remove spaces)
ghg_emissions <- ghg_emissions %>%
janitor::clean_names()
product_footprint <- product_footprint %>%
janitor::clean_names()
factors <- factors %>%
janitor::clean_names()
# Check for NAs in the 'emissions' column and replace them with 0
# Blanks are often read as NA, and for summing emissions, NA should be 0.
ghg_emissions <- ghg_emissions %>%
mutate(emissions = as.numeric(emissions), # Ensure it's numeric
emissions = replace_na(emissions, 0)) # Replace NA with 0
# Let's check the structure of our cleaned main dataset
str(ghg_emissions)
## tibble [136 × 6] (S3: tbl_df/tbl/data.frame)
## $ fiscal_year: num [1:136] 2022 2022 2022 2022 2022 ...
## $ category : chr [1:136] "Corporate emissions" "Corporate emissions" "Corporate emissions" "Corporate emissions" ...
## $ type : chr [1:136] "Gross emissions" "Gross emissions" "Gross emissions" "Gross emissions" ...
## $ scope : chr [1:136] "Scope 1" "Scope 1" "Scope 1" "Scope 2 (market-based)" ...
## $ description: chr [1:136] "Natural gas, diesel, propane" "Fleet vehicles" "Other (R&D processes & refrigerant leaks)" "Electricity" ...
## $ emissions : num [1:136] 39700 12600 2900 0 3000 ...
The data is now loaded and cleaned. fiscal_year is
numeric, emissions is numeric, and categorical variables
(category, type, scope) are
characters, which is perfect for ggplot2.
In this section, we will perform the calculations needed to answer our stakeholder questions. The visualizations for these findings will be presented in the Share section.
We need to aggregate total emissions by year and type (Gross vs. Removals) to calculate the net footprint.
# Group by year and type, then sum emissions
total_emissions_summary <- ghg_emissions %>%
group_by(fiscal_year, type) %>%
summarise(total_emissions = sum(emissions)) %>%
ungroup()
# Pivot the data to have 'Gross emissions' and 'Carbon removals' as columns
total_emissions_wide <- total_emissions_summary %>%
pivot_wider(names_from = type, values_from = total_emissions) %>%
janitor::clean_names() # Clean new column names
# Calculate Net Emissions
total_emissions_wide <- total_emissions_wide %>%
mutate(net_emissions = gross_emissions + carbon_removals) # Removals are already negative
head(total_emissions_wide)
## # A tibble: 6 × 4
## fiscal_year carbon_removals gross_emissions net_emissions
## <dbl> <dbl> <dbl> <dbl>
## 1 2015 0 38383470 38383470
## 2 2016 0 29579370 29579370
## 3 2017 0 27416300 27416300
## 4 2018 0 25136170 25136170
## 5 2019 0 25033730 25033730
## 6 2020 -70000 22594440 22524440
Next, we analyze the composition of our gross emissions by GHG Protocol Scope.
# Filter for gross emissions and group by year and scope
scope_summary <- ghg_emissions %>%
filter(type == "Gross emissions" & scope != "") %>% # Filter out non-scoped emissions
group_by(fiscal_year, scope) %>%
summarise(total_emissions = sum(emissions)) %>%
ungroup()
# Calculate the percentage of total for each scope
scope_summary <- scope_summary %>%
group_by(fiscal_year) %>%
mutate(percent_of_total = total_emissions / sum(total_emissions)) %>%
ungroup()
head(scope_summary)
## # A tibble: 6 × 4
## fiscal_year scope total_emissions percent_of_total
## <dbl> <chr> <dbl> <dbl>
## 1 2015 Scope 1 28100 0.000732
## 2 2015 Scope 2 (market-based) 42460 0.00111
## 3 2015 Scope 3 38312910 0.998
## 4 2016 Scope 1 34370 0.00116
## 5 2016 Scope 2 (market-based) 41000 0.00139
## 6 2016 Scope 3 29504000 0.997
This is the most critical analysis for our stakeholders. We will join
our total gross emissions with our financial/HR data
(factors) to create intensity metrics.
# First, get total gross emissions per year
total_gross_by_year <- ghg_emissions %>%
filter(type == "Gross emissions") %>%
group_by(fiscal_year) %>%
summarise(total_gross_emissions = sum(emissions)) %>%
ungroup()
# Join with the normalizing factors
intensity_data <- total_gross_by_year %>%
left_join(factors, by = "fiscal_year")
# Calculate intensity metrics
# Revenue is in millions USD, so we divide by (revenue * 1,000,000)
# This gives us metric tons of CO2e per dollar of revenue
intensity_data <- intensity_data %>%
mutate(
# Emissions (tons) per $1M USD Revenue
intensity_per_mil_revenue = total_gross_emissions / revenue,
# Emissions (tons) per employee
intensity_per_employee = total_gross_emissions / employees,
# Revenue (Millions) per ton of CO2e
revenue_per_ton_co2e = revenue / total_gross_emissions
)
head(intensity_data)
## # A tibble: 6 × 8
## fiscal_year total_gross_emissions revenue market_capitalization employees
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 38383470 233715 580 110000
## 2 2016 29579370 215639 600 116000
## 3 2017 27416300 229234 740 123000
## 4 2018 25136170 265595 830 132000
## 5 2019 25033730 260174 1090 137000
## 6 2020 22594440 274515 1720 147000
## # ℹ 3 more variables: intensity_per_mil_revenue <dbl>,
## # intensity_per_employee <dbl>, revenue_per_ton_co2e <dbl>
Finally, we’ll look at the product_footprint data. This
data is already clean and ready for plotting.
Based on the analysis, here are the top-level findings and actionable recommendations for our stakeholders:
Next Steps: A deeper dive into the
Scope 3 “Manufacturing” category would be beneficial to
identify which specific commodities or suppliers are the biggest
drivers, allowing for more targeted interventions.