This first part loads the libraries and imports the data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
tuesdata <- tidytuesdayR::tt_load('2024-05-21')
## --- Compiling #TidyTuesday Information for 2024-05-21 ----
## --- There is 1 file available ---
## --- Starting Download ---
##
## Downloading file 1 of 1: `emissions.csv`
## --- Download complete ---
emissions <- tuesdata$emissions
First, I create a subset of the data that holds each entity and its type, then count how many entities there are of each type.
entity_type <- subset(emissions, select = c(parent_entity, parent_type))
entity_type <- entity_type |> distinct()
table(entity_type$parent_type)
##
## Investor-owned Company           Nation State     State-owned Entity
##                     75                     11                     36
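Counting distinct `(parent_entity, parent_type)` pairs yields one row per entity only when no entity appears under more than one type. Below is a minimal check, sketched in base R. It runs on a small hypothetical data frame (`toy`, with made-up entity names) that mimics the two columns used above, so it is self-contained.

```r
# Hypothetical stand-in for `emissions`: entity names are invented for illustration.
toy <- data.frame(
  parent_entity = c("A", "A", "B", "C", "C"),
  parent_type   = c("Investor-owned Company", "Investor-owned Company",
                    "Nation State", "State-owned Entity", "State-owned Entity")
)

# Base-R equivalent of distinct() on the two columns
pairs <- unique(toy[, c("parent_entity", "parent_type")])

# An entity listed under two types would appear twice in `pairs`
types_per_entity <- table(pairs$parent_entity)
multi_typed <- names(types_per_entity)[types_per_entity > 1]
multi_typed                 # empty when every entity has exactly one type

# Entity counts per type, as in the table above
table(pairs$parent_type)
```

If `multi_typed` is empty on the real data, the counts above are true entity counts.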
Investor-owned companies make up the majority of entities in this dataset (75 of 122). A useful next question is what share of the total emissions each type of entity produces.
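A first step toward that question is to compute each entity type's share of all recorded emissions. The sketch below is a base-R illustration on a hypothetical data frame (`toy`, with invented values) that uses the same column names as `emissions`. `emission_share()` is a helper name introduced here and is not part of any package.

```r
# Hypothetical stand-in for `emissions`; the values are made up.
toy <- data.frame(
  parent_type = c("Investor-owned Company", "Investor-owned Company",
                  "Nation State", "State-owned Entity"),
  total_emissions_MtCO2e = c(10, 30, 40, 20)
)

emission_share <- function(df) {
  # Total emissions within each entity type (named numeric vector)
  totals <- vapply(split(df$total_emissions_MtCO2e, df$parent_type), sum, numeric(1))
  # Convert totals to proportions of all emissions
  totals / sum(totals)
}

shares <- emission_share(toy)
round(shares, 2)
# Investor-owned Company 0.4, Nation State 0.4, State-owned Entity 0.2
```

`emission_share(emissions)` would apply the same calculation to the real data. In dplyr, `count(emissions, parent_type, wt = total_emissions_MtCO2e)` returns the same per-type totals before they are normalised.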
Next, a brief summary of `total_emissions_MtCO2e` across the full time span of the data:
summary(emissions$total_emissions_MtCO2e)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##    0.000    8.785   33.059  113.220  102.155 8646.906
The maximum emissions recorded for a single entity, commodity, and year (8646.906 MtCO2e) are extremely large compared with the other summary statistics, so this value may be an outlier. The mean (113.220) is also far above the median (33.059), which shows that the distribution is strongly right-skewed. Because the data run from 1854 to 2022, we also need context on how total emissions changed from year to year.
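A common convention for putting a number on "possibly an outlier" is Tukey's upper fence, which flags values above Q3 + 1.5 × IQR. The sketch below first applies the fence to the quartiles printed above. Those quartiles are rounded, so the fence is approximate. It then wraps the rule in a helper, `upper_fence()`, a name introduced here for illustration.

```r
# Tukey's upper fence: values above Q3 + k * IQR are conventionally flagged as outliers.
upper_fence <- function(x, k = 1.5) {
  # quantile() with its default type 7, the same quartiles summary() reports
  q <- quantile(x, c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  q[2] + k * (q[2] - q[1])
}

# Fence implied by the (rounded) quartiles printed by summary() above
q1 <- 8.785
q3 <- 102.155
fence <- q3 + 1.5 * (q3 - q1)
fence                 # 242.21
8646.906 / fence      # the maximum is roughly 36 times the fence

# On the full column (requires the `emissions` data loaded earlier):
# sum(emissions$total_emissions_MtCO2e > upper_fence(emissions$total_emissions_MtCO2e))
```

In a right-skewed distribution like this one, the fence will usually flag many values, not just the maximum. The count it returns is a rough guide, not a verdict.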
How much does each type of entity emit over time?
library(ggthemes)
# Sum emissions within each year and entity type, then plot one line per type
emissions %>%
  aggregate(total_emissions_MtCO2e ~ year + parent_type, data = ., FUN = sum) %>%
  ggplot(aes(year, total_emissions_MtCO2e, color = parent_type)) +
  geom_line() +
  labs(title = "Emissions over time by Entity Type",
       x = "Year",
       y = "Total Emissions (Million Tonnes of CO2 Equivalent)",
       color = "Entity Type")
Emissions for every entity type roughly follow an exponential increase. The entity type producing the most emissions changes several times. As of 2022, nation states and state-owned entities are roughly tied for the most emissions, while emissions from investor-owned companies are growing more slowly. This may indicate a shift toward state-owned production of fossil fuels such as oil, gas, and coal.
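The "roughly exponential" reading can be checked more directly. Exponential growth appears as a straight line on a log scale, and the slope of log(emissions) against year estimates a constant annual growth rate. The sketch below runs on a synthetic series that grows at exactly 3% per year. `growth_rate()` is a hypothetical helper, not a library function. It drops zero values before taking logs, because the summary above shows a minimum of 0.

```r
# Estimate a constant exponential growth rate by regressing log(value) on year.
growth_rate <- function(year, value) {
  keep <- value > 0                          # log() of zero emissions is -Inf
  fit <- lm(log(value[keep]) ~ year[keep])
  unname(coef(fit)[2])                       # slope = continuous annual growth rate
}

# Synthetic series growing at exactly 3% per year (continuous compounding)
years <- 1900:1950
series <- 5 * exp(0.03 * (years - 1900))
rate <- growth_rate(years, series)
rate              # about 0.03
exp(rate) - 1     # equivalent year-over-year increase, about 3.05%
```

On the real data, the helper could be applied to each entity type's yearly totals. If `yearly` is a hypothetical name for the output of the `aggregate()` call above, this would look like `sapply(split(yearly, yearly$parent_type), function(d) growth_rate(d$year, d$total_emissions_MtCO2e))`. A single rate over 1854 to 2022 is a coarse summary. Adding `scale_y_log10()` to the plot above shows visually whether the growth rate has changed over time.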
How much does the production of each commodity contribute to total emissions?
commodity_emissions <- subset(emissions, select = c(commodity, total_emissions_MtCO2e))
# geom_col() stacks every row's emissions into one bar per commodity
ggplot(commodity_emissions, aes(x = commodity, y = total_emissions_MtCO2e, fill = commodity)) +
  geom_col() +
  labs(title = "Emissions by Commodity Type",
       y = "Total Emissions (Million Tonnes of CO2 Equivalent)") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
Most of the emissions in the dataset come from Oil & NGL (oil and natural gas liquids), followed by bituminous coal and natural gas. This could be because more oil and NGL was produced than any other commodity, or because producing it generates more emissions per unit. In either case, we should look into alternative energy sources such as solar, hydrothermal, and nuclear power.
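The stacked bars show the ranking but not the size of the gaps between commodities. Summing per commodity gives the numbers behind the chart. The sketch below uses a hypothetical `toy` data frame with invented values. One caution applies when choosing between the two explanations above. This sketch assumes that production is recorded in different units for different commodities, which the dataset's `production_unit` column would confirm. If so, raw production values cannot be compared directly across commodities.

```r
# Hypothetical stand-in for `emissions`; the values are made up.
toy <- data.frame(
  commodity = c("Oil & NGL", "Oil & NGL", "Natural Gas", "Cement", "Bituminous Coal"),
  total_emissions_MtCO2e = c(50, 25, 40, 5, 30)
)

# Total emissions per commodity, with each commodity's share of the overall total
by_commodity <- aggregate(total_emissions_MtCO2e ~ commodity, data = toy, FUN = sum)
by_commodity$share <- by_commodity$total_emissions_MtCO2e /
  sum(by_commodity$total_emissions_MtCO2e)

# Largest emitters first
by_commodity <- by_commodity[order(-by_commodity$total_emissions_MtCO2e), ]
by_commodity
```

Replacing `toy` with `commodity_emissions` gives the real totals. In the chart, the same ordering can be obtained with `reorder(commodity, -total_emissions_MtCO2e, sum)` on the x aesthetic.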
How much emissions came from each type of coal each year?
# Drop the non-coal commodities so only the coal types remain
remove <- c("Cement", "Oil & NGL", "Natural Gas")
coal <- emissions[!grepl(paste(remove, collapse = "|"), emissions$commodity), ]
# Sum emissions within each year and coal type, then plot one line per type
coal %>%
  aggregate(total_emissions_MtCO2e ~ year + commodity, data = ., FUN = sum) %>%
  ggplot(aes(year, total_emissions_MtCO2e, color = commodity)) +
  geom_line() +
  labs(title = "Emissions from Coal Production by Year",
       x = "Year",
       y = "Total Emissions (Million Tonnes of CO2 Equivalent)",
       color = "Coal Type")
Bituminous coal leads every other type of coal in yearly emissions. To tell whether this reflects a larger production volume or more emissions per tonne produced, we need to look at how much of each coal type was produced each year.
How much of each type of coal was produced each year?
# Sum production within each year and coal type, then plot one line per type
coal %>%
  aggregate(production_value ~ year + commodity, data = ., FUN = sum) %>%
  ggplot(aes(year, production_value, color = commodity)) +
  geom_line() +
  labs(title = "Total Coal Production by Year",
       x = "Year",
       y = "Million Tonnes",
       color = "Coal Type")
Bituminous coal again leads every other type, this time in total production. Its large share of coal emissions therefore appears to be largely due to the sheer volume of bituminous coal produced.
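A more direct test of this conclusion is to compare emissions per unit produced. If bituminous coal emits about as much per tonne as the other coal types, its lead in emissions is mostly a volume effect. The sketch below runs on a hypothetical `toy_coal` data frame with invented values and illustrative commodity labels. `coal_intensity()` is a helper name introduced here. The ratio is only meaningful when all coal types share a production unit, which the "Million Tonnes" axis label above suggests.

```r
# Hypothetical stand-in for `coal`; values are made up for illustration.
toy_coal <- data.frame(
  commodity = c("Bituminous Coal", "Bituminous Coal", "Lignite Coal"),
  production_value = c(100, 300, 50),
  total_emissions_MtCO2e = c(200, 600, 75)
)

coal_intensity <- function(df) {
  # Sum emissions and production per coal type across all years
  totals <- aggregate(cbind(total_emissions_MtCO2e, production_value) ~ commodity,
                      data = df, FUN = sum)
  # Emissions (MtCO2e) per unit of production (million tonnes)
  totals$intensity <- totals$total_emissions_MtCO2e / totals$production_value
  totals[order(-totals$intensity), ]
}

coal_intensity(toy_coal)
# Bituminous Coal intensity 2, Lignite Coal intensity 1.5
```

`coal_intensity(coal)` would run the same comparison on the filtered coal rows. Similar intensities across coal types would support the volume explanation. A clearly higher intensity for bituminous coal would point to the production process as well.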