The Google Data Analytics Framework

This report follows the 6-step Google Data Analytics framework: Ask, Prepare, Process, Analyze, Share, and Act.


1. Ask

Business Task

The primary business task is to analyze our company’s historical carbon emissions data to assess the effectiveness of our sustainability initiatives. We aim to identify key emission trends, understand our environmental impact relative to business growth, and communicate these findings to internal and external stakeholders.

Key Stakeholder Questions

Stakeholders (investors, employees, customers, and leadership) want to know:

  1. Overall Trend: How has our total carbon footprint (Gross, Removals, and Net) changed over the last several years?
  2. Emission Sources: What are the primary sources of our emissions (i.e., Scopes 1, 2, and 3)? Is our focus on the right areas?
  3. Decoupling: Are we successfully decoupling our carbon footprint from our company’s growth? (i.e., are we growing our business without growing our emissions?)
  4. Product Impact: Are our efforts to create more sustainable products working? How has the carbon footprint of our flagship products changed?

2. Prepare

Data Sources

We are using four CSV datasets provided by the company, reflecting its environmental and financial performance:

  1. greenhouse_gas_emissions.csv: Detailed records of corporate and product life cycle emissions by year, category, scope, and source.
  2. carbon_footprint_by_product.csv: Life cycle carbon footprint for individual products by their release year.
  3. normalizing_factors.csv: Key business metrics (Revenue, Market Cap, Employees) by fiscal year. This is crucial for calculating intensity.
  4. data_dictionary.csv: A metadata file explaining the fields, categories, and definitions used in the datasets.

Data Credibility & Tools

This data is considered reliable as it is based on internal reporting typically used for public corporate sustainability reports (CSR). The analysis is conducted using R and RStudio, with a final report generated using R Markdown.


3. Process

Loading Data

First, we load the four datasets. We must be mindful of different delimiters (commas vs. semicolons).

# Load the main datasets (comma-separated)
ghg_emissions <- read_csv("greenhouse_gas_emissions.csv")
product_footprint <- read_csv("carbon_footprint_by_product.csv")
factors <- read_csv("normalizing_factors.csv")

# Load the data dictionary (semicolon-separated)
data_dict <- read_delim("data_dictionary.csv", delim = ";")

# Show the first few rows of the main emissions data
head(ghg_emissions)
## # A tibble: 6 × 6
##   `Fiscal Year` Category            Type            Scope  Description Emissions
##           <dbl> <chr>               <chr>           <chr>  <chr>           <dbl>
## 1          2022 Corporate emissions Gross emissions Scope… Natural ga…     39700
## 2          2022 Corporate emissions Gross emissions Scope… Fleet vehi…     12600
## 3          2022 Corporate emissions Gross emissions Scope… Other (R&D…      2900
## 4          2022 Corporate emissions Gross emissions Scope… Electricity         0
## 5          2022 Corporate emissions Gross emissions Scope… Steam, hea…      3000
## 6          2022 Corporate emissions Gross emissions Scope… Business t…    113500

Cleaning Data

The ghg_emissions dataset has some blank values in the Emissions column. For accurate analysis, we must convert these to 0. We will also ensure all column names are clean and R-friendly.

# Clean column names for easier use (e.g., remove spaces)
ghg_emissions <- ghg_emissions %>%
  janitor::clean_names()

product_footprint <- product_footprint %>%
  janitor::clean_names()

factors <- factors %>%
  janitor::clean_names()

# Check for NAs in the 'emissions' column and replace them with 0
# Blanks are often read as NA, and for summing emissions, NA should be 0.
ghg_emissions <- ghg_emissions %>%
  mutate(emissions = as.numeric(emissions), # Ensure it's numeric
         emissions = replace_na(emissions, 0)) # Replace NA with 0

# Let's check the structure of our cleaned main dataset
str(ghg_emissions)
## tibble [136 × 6] (S3: tbl_df/tbl/data.frame)
##  $ fiscal_year: num [1:136] 2022 2022 2022 2022 2022 ...
##  $ category   : chr [1:136] "Corporate emissions" "Corporate emissions" "Corporate emissions" "Corporate emissions" ...
##  $ type       : chr [1:136] "Gross emissions" "Gross emissions" "Gross emissions" "Gross emissions" ...
##  $ scope      : chr [1:136] "Scope 1" "Scope 1" "Scope 1" "Scope 2 (market-based)" ...
##  $ description: chr [1:136] "Natural gas, diesel, propane" "Fleet vehicles" "Other (R&D processes & refrigerant leaks)" "Electricity" ...
##  $ emissions  : num [1:136] 39700 12600 2900 0 3000 ...

The data is now loaded and cleaned. fiscal_year is numeric, emissions is numeric, and categorical variables (category, type, scope) are characters, which is perfect for ggplot2.


4. Analyze

In this section, we will perform the calculations needed to answer our stakeholder questions. The visualizations for these findings will be presented in the Share section.

1. Total Footprint Trend

We need to aggregate total emissions by year and type (Gross vs. Removals) to calculate the net footprint.

# Group by year and type, then sum emissions
total_emissions_summary <- ghg_emissions %>%
  group_by(fiscal_year, type) %>%
  summarise(total_emissions = sum(emissions)) %>%
  ungroup()

# Pivot the data to have 'Gross emissions' and 'Carbon removals' as columns
total_emissions_wide <- total_emissions_summary %>%
  pivot_wider(names_from = type, values_from = total_emissions) %>%
  janitor::clean_names() # Clean new column names

# Calculate Net Emissions
total_emissions_wide <- total_emissions_wide %>%
  mutate(net_emissions = gross_emissions + carbon_removals) # Removals are already negative

head(total_emissions_wide)
## # A tibble: 6 × 4
##   fiscal_year carbon_removals gross_emissions net_emissions
##         <dbl>           <dbl>           <dbl>         <dbl>
## 1        2015               0        38383470      38383470
## 2        2016               0        29579370      29579370
## 3        2017               0        27416300      27416300
## 4        2018               0        25136170      25136170
## 5        2019               0        25033730      25033730
## 6        2020          -70000        22594440      22524440

2. Emissions by Scope

Next, we analyze the composition of our gross emissions by GHG Protocol Scope.

# Filter for gross emissions and group by year and scope
scope_summary <- ghg_emissions %>%
  filter(type == "Gross emissions" & scope != "") %>% # Filter out non-scoped emissions
  group_by(fiscal_year, scope) %>%
  summarise(total_emissions = sum(emissions)) %>%
  ungroup()

# Calculate the percentage of total for each scope
scope_summary <- scope_summary %>%
  group_by(fiscal_year) %>%
  mutate(percent_of_total = total_emissions / sum(total_emissions)) %>%
  ungroup()

head(scope_summary)
## # A tibble: 6 × 4
##   fiscal_year scope                  total_emissions percent_of_total
##         <dbl> <chr>                            <dbl>            <dbl>
## 1        2015 Scope 1                          28100         0.000732
## 2        2015 Scope 2 (market-based)           42460         0.00111 
## 3        2015 Scope 3                       38312910         0.998   
## 4        2016 Scope 1                          34370         0.00116 
## 5        2016 Scope 2 (market-based)           41000         0.00139 
## 6        2016 Scope 3                       29504000         0.997

3. Decoupling & Emission Intensity

This is the most critical analysis for our stakeholders. We will join our total gross emissions with our financial/HR data (factors) to create intensity metrics.

# First, get total gross emissions per year
total_gross_by_year <- ghg_emissions %>%
  filter(type == "Gross emissions") %>%
  group_by(fiscal_year) %>%
  summarise(total_gross_emissions = sum(emissions)) %>%
  ungroup()

# Join with the normalizing factors
intensity_data <- total_gross_by_year %>%
  left_join(factors, by = "fiscal_year")

# Calculate intensity metrics
# Revenue is in millions USD, so we divide by (revenue * 1,000,000)
# This gives us metric tons of CO2e per dollar of revenue
intensity_data <- intensity_data %>%
  mutate(
    # Emissions (tons) per $1M USD Revenue
    intensity_per_mil_revenue = total_gross_emissions / revenue,
    # Emissions (tons) per employee
    intensity_per_employee = total_gross_emissions / employees,
    # Revenue (Millions) per ton of CO2e
    revenue_per_ton_co2e = revenue / total_gross_emissions
  )

head(intensity_data)
## # A tibble: 6 × 8
##   fiscal_year total_gross_emissions revenue market_capitalization employees
##         <dbl>                 <dbl>   <dbl>                 <dbl>     <dbl>
## 1        2015              38383470  233715                   580    110000
## 2        2016              29579370  215639                   600    116000
## 3        2017              27416300  229234                   740    123000
## 4        2018              25136170  265595                   830    132000
## 5        2019              25033730  260174                  1090    137000
## 6        2020              22594440  274515                  1720    147000
## # ℹ 3 more variables: intensity_per_mil_revenue <dbl>,
## #   intensity_per_employee <dbl>, revenue_per_ton_co2e <dbl>

4. Product Footprint

Finally, we’ll look at the product_footprint data. This data is already clean and ready for plotting.


5. Share

This section presents the visualizations from our analysis to communicate our findings clearly.

Visualization 1: Total Carbon Footprint (2015-2022)

This chart shows the company’s total (net) carbon footprint over time, detailing the contribution of gross emissions and carbon removals.

ggplot(total_emissions_wide, aes(x = fiscal_year)) +
  geom_col(aes(y = gross_emissions, fill = "Gross Emissions")) +
  geom_col(aes(y = carbon_removals, fill = "Carbon Removals")) +
  geom_line(aes(y = net_emissions, color = "Net Emissions"), size = 1.5, group = 1) +
  geom_point(aes(y = net_emissions, color = "Net Emissions"), size = 3) +
  scale_y_continuous(labels = scales::comma, name = "Emissions (metric tons CO2e)") +
  scale_x_continuous(breaks = unique(total_emissions_wide$fiscal_year)) +
  scale_fill_manual(name = "Flow", values = c("Gross Emissions" = "#d95f02", "Carbon Removals" = "#1b9e77")) +
  scale_color_manual(name = "Total", values = c("Net Emissions" = "#333333")) +
  labs(
    title = "Total Carbon Footprint Has Trended Downward Since 2015",
    subtitle = "Net emissions show a significant decline, accelerated by carbon removals.",
    x = "Fiscal Year"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "top")

Finding: Our company’s net carbon footprint has significantly decreased since 2015. While gross emissions have fallen, the introduction of carbon removals has dramatically accelerated our progress toward net zero.

Visualization 2: Gross Emissions by Scope (2015-2022)

This chart breaks down our gross emissions by Scopes 1, 2, and 3 to show where our impact truly lies.

ggplot(scope_summary, aes(x = fiscal_year, y = total_emissions, fill = scope)) +
  geom_area(alpha = 0.8, position = "fill") +
  scale_y_continuous(labels = scales::percent, name = "Percent of Total Emissions") +
  scale_x_continuous(breaks = unique(scope_summary$fiscal_year)) +
  scale_fill_brewer(palette = "Set2", name = "GHG Scope") +
  labs(
    title = "Scope 3 Dominates Our Carbon Footprint",
    subtitle = "Our direct (Scope 1) and electricity-related (Scope 2) emissions are minimal.",
    x = "Fiscal Year"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "top")

Finding: An overwhelming majority of our emissions (over 98%) are Scope 3. This confirms that our sustainability strategy must focus on our supply chain, manufacturing, product use, and transportation, rather than just our own corporate facilities.

Visualization 3: Emission Intensity vs. Growth (Decoupling)

This is the key stakeholder chart. It answers: “Are we growing responsibly?” We plot our total revenue and employee count against our carbon intensity metrics.

# We need to use a dual-axis plot, which can be tricky.
# A simpler, clearer way is to plot intensity metrics directly.

# Plot 1: Intensity per $1M Revenue
p1 <- ggplot(intensity_data, aes(x = fiscal_year, y = intensity_per_mil_revenue)) +
  geom_line(color = "#7570b3", size = 1.5) +
  geom_point(color = "#7570b3", size = 3) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = unique(intensity_data$fiscal_year)) +
  labs(
    title = "Carbon Intensity per $1M Revenue",
    subtitle = "Emissions per dollar of revenue are decreasing.",
    x = "Fiscal Year",
    y = "Tons CO2e / $1M Revenue"
  ) +
  theme_minimal(base_size = 12)

# Plot 2: Intensity per Employee
p2 <- ggplot(intensity_data, aes(x = fiscal_year, y = intensity_per_employee)) +
  geom_line(color = "#e7298a", size = 1.5) +
  geom_point(color = "#e7298a", size = 3) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = unique(intensity_data$fiscal_year)) +
  labs(
    title = "Carbon Intensity per Employee",
    subtitle = "Emissions per employee are also on a downward trend.",
    x = "Fiscal Year",
    y = "Tons CO2e / Employee"
  ) +
  theme_minimal(base_size = 12)

# Combine the plots
# You may need to install this package: install.packages("patchwork")
library(patchwork)
p1 + p2

Finding: We are successfully decoupling our environmental impact from our business growth. Both our emissions per million dollars of revenue and our emissions per employee have significantly decreased. This is a powerful message that our business model is becoming more efficient and sustainable.

Visualization 4: Product Life Cycle Carbon Footprint

This chart shows the carbon footprint of our flagship products at the time of their release.

# Filter for just the main product line for a clear story
product_story <- product_footprint %>%
  filter(str_detect(product, "iPhone"))

ggplot(product_story, aes(x = release_year, y = carbon_footprint)) +
  geom_line(color = "#66a61e", size = 1) +
  geom_point(color = "#66a61e", size = 3) +
  # Add labels for the product names
  ggrepel::geom_text_repel(aes(label = product), size = 3.5, max.overlaps = 20) +
  scale_x_continuous(breaks = seq(min(product_story$release_year), max(product_story$release_year), by = 1)) +
  labs(
    title = "Our Products Are Getting Greener",
    subtitle = "The life cycle carbon footprint of new iPhones has generally decreased over time.",
    x = "Release Year",
    y = "Product Life Cycle Footprint (kg CO2e)"
  ) +
  theme_minimal(base_size = 14)

Finding: Our product design and manufacturing innovations are paying off. Despite adding new features, the carbon footprint of new products is lower than that of previous generations, demonstrating a commitment to sustainability at the product level.


6. Act

Key Insights & Recommendations

Based on the analysis, here are the top-level findings and actionable recommendations for our stakeholders:

  1. Insight: Our sustainability strategy is working. We have successfully reduced our net carbon footprint and, most importantly, decoupled our emissions from our revenue and headcount growth.
    • Recommendation (Leadership & Investors): Continue to fund and support the current sustainability strategy. This data proves we are mitigating climate-related transition risk and building a more resilient, efficient business.
  2. Insight: Scope 3 emissions (supply chain, manufacturing, product use) are, and will continue to be, our largest source of emissions. Our own operations (Scope 1 & 2) are already highly optimized.
    • Recommendation (Operations & Supply Chain Teams): Double-down on Scope 3 interventions. This includes:
      • Expanding the Supplier Clean Energy Program.
      • Investing in R&D for low-carbon materials and manufacturing processes.
      • Continuing to improve product energy efficiency to reduce “product use” emissions.
  3. Insight: Our product-level innovations (e.g., in the iPhone) show a clear, positive trend of reduced carbon footprints.
    • Recommendation (Marketing & Sales): Leverage this data. Communicate our proven track record of “building better and greener” products to our customers. This is a powerful competitive differentiator.

Next Steps: A deeper dive into the Scope 3 “Manufacturing” category would be beneficial to identify which specific commodities or suppliers are the biggest drivers, allowing for more targeted interventions.