Research Question: How do the U.S. and China compare in terms of CO2 emissions?
Introduction
The dataset I’ll be using consists of four variables: Country Name, Country Code, Year, and CO2. For this project, I will use the Country Name, Year, and CO2 variables. The dataset provides CO2 emissions for every country for the years 2000 through 2014. The Country Name variable gives the name of each country, the Year variable gives the year in which the data was recorded, and the CO2 variable gives the maximum value recorded within that year. I want to compare the mean CO2 emissions of the U.S. and China. The data comes from the World Bank Group (WBG) and was cleaned by Ruddy Setiadi Gunawan.
Data Analysis
For this project, I’ll use summary statistics and data visualization, both of which fall under descriptive statistics. For the visualization, I plan to create a scatterplot comparing the yearly CO2 emissions of the U.S. and China over the 15 years from 2000 to 2014.
Loaded the necessary libraries.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Read the dataset I’ll be using into a new variable.
data <- read_csv('co2_global_emissions_lab_clean.csv')
## Rows: 3732 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country Name, Country Code
## dbl (2): year, co2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Renamed the column Country Name to Country because the original name contains a space, so R wouldn’t recognize it unless it was wrapped in backticks.
colnames(data)[colnames(data) == "Country Name"] <- "Country"
Printed out the new column names to check that they work/as a visual reminder for myself.
colnames(data)
## [1] "Country" "Country Code" "year" "co2"
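As a side note on the rename step, here is a minimal sketch of the same idea using dplyr's `rename()`. It assumes a small made-up data frame (`toy`, with placeholder values that are not from the World Bank data) standing in for the real dataset, and it shows that a column name containing a space can still be referenced as long as it is wrapped in backticks.

```r
library(dplyr)

# Toy stand-in for the real dataset; the values are made-up placeholders.
# check.names = FALSE keeps the space in "Country Name" instead of
# converting it to "Country.Name".
toy <- data.frame(
  `Country Name` = c("A", "B"),
  co2 = c(1, 2),
  check.names = FALSE
)

# A name with a space is non-syntactic, so it has to be wrapped in backticks.
original_values <- toy$`Country Name`

# rename() takes new_name = old_name.
renamed <- toy %>%
  rename(Country = `Country Name`)

print(names(renamed))
```

Either approach gives the same column names; `rename()` simply keeps the step inside a pipe.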
Used the filter() function to isolate the rows including China and the United States (removed the rest of the countries).
filter <- data %>%
  filter(Country %in% c("China", "United States"))
Provided the summary statistics as another visual reminder for myself.
summary(filter)
## Country Country Code year co2
## Length:30 Length:30 Min. :2000 Min. : 2.697
## Class :character Class :character 1st Qu.:2003 1st Qu.: 5.427
## Mode :character Mode :character Median :2007 Median :11.931
## Mean :2007 Mean :11.820
## 3rd Qu.:2011 3rd Qu.:18.936
## Max. :2014 Max. :20.179
Calculated the mean CO2 value for China and used the round() function to round it to two decimal places. Across the 15 years of data (2000-2014), the mean CO2 emission for China was 5.26 metric tons.
china_mean_co2 <- mean(data$co2[data$Country == "China"], na.rm = TRUE)
round(china_mean_co2, 2)
## [1] 5.26
Calculated the mean CO2 value for the U.S. and used the round() function to round it to two decimal places. Across the 15 years of data (2000-2014), the mean CO2 emission for the U.S. was 18.38 metric tons.
us_mean_co2 <- mean(data$co2[data$Country == "United States"], na.rm = TRUE)
round(us_mean_co2, 2)
## [1] 18.38
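The two means above could also be computed in one step with `group_by()` and `summarise()`. The sketch below is only an illustration: `mean_co2_by_country()` is a hypothetical helper that is not part of the original analysis, and `toy` holds made-up placeholder values rather than the real emissions data.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): mean co2 per
# country for the requested countries, rounded to two decimal places.
mean_co2_by_country <- function(df, countries) {
  df %>%
    filter(Country %in% countries) %>%
    group_by(Country) %>%
    summarise(mean_co2 = round(mean(co2, na.rm = TRUE), 2), .groups = "drop")
}

# Toy rows with made-up values, used only so the sketch runs on its own.
toy <- data.frame(
  Country = c("A", "A", "B", "B", "C"),
  year = c(2000, 2001, 2000, 2001, 2000),
  co2 = c(1, 3, 10, NA, 5)
)

result <- mean_co2_by_country(toy, c("A", "B"))
print(result)
```

With the real dataset, the call would presumably look like `mean_co2_by_country(data, c("China", "United States"))`, which should match the two values calculated separately above.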
Created the scatterplot, adding a border around the graph with theme_bw() and changing the default colors with scale_color_brewer().
plot1 <- ggplot(filter, aes(x = year, y = co2, color = Country)) +
  geom_point() +
  scale_color_brewer(palette = "Set2") +
  labs(title = "CO2 Emissions Comparison: US vs China (2000-2014)",
       x = "Year",
       y = "CO2 Emissions (in metric tons)",
       color = "Country",
       caption = "Source: World Bank Group (data cleaned by Ruddy Setiadi Gunawan)") +
  theme_bw()
plot1
Conclusion
With this dataset, I was able to compare the CO2 emissions of the U.S. and China between 2000 and 2014. The scatterplot gives a clear picture of how the two countries differ. We can see that China’s CO2 emissions are considerably lower than those of the U.S. However, we can also see that the U.S. values decrease over the period while China’s increase. I calculated the mean CO2 emissions for both countries beforehand in order to check the graph against them. Since China’s mean CO2 emission is 5.26 metric tons and the United States’ is 18.38 metric tons, the means are consistent with what the plot shows. In the future, I would like to add a few more graphs to better visualize this data; however, that would be difficult since there aren’t many variables to work with.
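One way the trend described above could be checked numerically, rather than only by eye, is to fit a straight line to each country's values and look at the sign of the slope. This is a rough sketch, not something done in the original analysis: `co2_trend_by_country()` is a hypothetical helper, and `toy` contains made-up placeholder values instead of the real data.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): least-squares
# slope of co2 on year for each country. A positive slope means emissions
# rise over time; a negative slope means they fall.
co2_trend_by_country <- function(df) {
  df %>%
    group_by(Country) %>%
    summarise(
      slope_per_year = unname(coef(lm(co2 ~ year))["year"]),
      .groups = "drop"
    )
}

# Toy rows with made-up values, used only so the sketch runs on its own.
toy <- data.frame(
  Country = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  co2 = c(1, 2, 3, 6, 4, 2)
)

trend <- co2_trend_by_country(toy)
print(trend)
```

Applied to the filtered U.S. and China rows, a negative slope for the United States and a positive slope for China would support the pattern seen in the scatterplot.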