The dataset I chose for this project covers contributions to global warming by gas and source, pulled from ourworldindata.org. The data was prepared by standardizing country names and world region definitions, converting units, calculating derived indicators such as per capita measures, and adding or adapting metadata such as the name or description given to an indicator. It records the change in global mean surface temperature caused by the emissions of three gases: carbon dioxide, methane, and nitrous oxide. It also groups the data by source: fossil fuels and industry, or agriculture and land use. For my project, I focused on the categorical variables Entity and Year and the quantitative variables ‘Change in global mean surface temperature caused by CO₂ emissions from fossil fuels and industry’ and ‘Change in global mean surface temperature caused by CO₂ emissions from agriculture and land use’. I first cleaned the Entity variable by keeping the ten most visited countries. I then filtered the years to every 20 years starting in 1840, although the filtered output begins at 1860, so 1840 does not appear to be in the data. I wanted to include the 1800s because the Industrial Revolution was under way in that period, increasing the use of fossil fuels. I renamed the quantitative variables to shorter names so that they would be easier to code with. Lastly, I multiplied the quantitative variables by 100 so that the very small values would show up better in my visualizations. I chose this topic for my final project because global warming is a large and growing issue, and I wanted to visualize the trends behind it.
Global warming can be defined as the long-term rise in the planet’s overall temperature. As the human population grows and the world continues to industrialize, it has become evident that burning fossil fuels such as coal, oil, and natural gas has caused the global surface temperature to increase rapidly. This can lead to climate change, in which weather patterns are affected, and to rising sea levels as ice and glaciers melt (Global Warming).
Work Cited: Global Warming. education.nationalgeographic.org/resource/global-warming.
library(tidyverse) #setting libraries
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
setwd("C:/Users/asman/Documents/data110")
globalwarming <- read_csv("globalwarming.csv") #Dataset
## Rows: 41280 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (7): Year, Change in global mean surface temperature caused by nitrous o...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(globalwarming)
## # A tibble: 6 × 9
## Entity Code Year Change in global mean surface…¹ Change in global mea…²
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1851 0.00000000262 0.0000000999
## 2 Afghanistan AFG 1852 0.00000000529 0.000000202
## 3 Afghanistan AFG 1853 0.00000000800 0.000000306
## 4 Afghanistan AFG 1854 0.0000000107 0.000000411
## 5 Afghanistan AFG 1855 0.0000000135 0.000000518
## 6 Afghanistan AFG 1856 0.0000000163 0.000000627
## # ℹ abbreviated names:
## # ¹`Change in global mean surface temperature caused by nitrous oxide emissions from fossil fuels and industry`,
## # ²`Change in global mean surface temperature caused by nitrous oxide emissions from agriculture and land use`
## # ℹ 4 more variables:
## # `Change in global mean surface temperature caused by methane emissions from fossil fuels and industry` <dbl>,
## # `Change in global mean surface temperature caused by methane emissions from agriculture and land use` <dbl>,
## # `Change in global mean surface temperature caused by CO₂ emissions from fossil fuels and industry` <dbl>, …
globalwarming1 <- globalwarming |>
  filter(Entity %in% c("France", "Spain", "United States", "China", "Italy", "Brazil", "United Kingdom", "Mexico", "Germany", "Canada")) |> # keep the ten most visited countries
  filter(Year %in% seq(1840, 2020, by = 20)) |> # keep every 20th year; Year is numeric, so compare with numbers
  rename( # shorten the long column names
    n20fossilfuels_industry = `Change in global mean surface temperature caused by nitrous oxide emissions from fossil fuels and industry`, # n20 is nitrous oxide
    n20agr_land = `Change in global mean surface temperature caused by nitrous oxide emissions from agriculture and land use`,
    ch4fossilfuels_industry = `Change in global mean surface temperature caused by methane emissions from fossil fuels and industry`, # ch4 is methane
    ch4agr_land = `Change in global mean surface temperature caused by methane emissions from agriculture and land use`,
    c02fossilfuels_industry = `Change in global mean surface temperature caused by CO₂ emissions from fossil fuels and industry`, # c02 is carbon dioxide
    c02agr_land = `Change in global mean surface temperature caused by CO₂ emissions from agriculture and land use`
  )
head(globalwarming1)
## # A tibble: 6 × 9
## Entity Code Year n20fossilfuels_industry n20agr_land ch4fossilfuels_industry
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Brazil BRA 1860 0.00000308 0.00000233 0.0000195
## 2 Brazil BRA 1880 0.0000102 0.00000804 0.0000625
## 3 Brazil BRA 1900 0.0000193 0.0000206 0.000132
## 4 Brazil BRA 1920 0.0000313 0.0000660 0.000290
## 5 Brazil BRA 1940 0.0000468 0.000120 0.000582
## 6 Brazil BRA 1960 0.0000770 0.000310 0.00116
## # ℹ 3 more variables: ch4agr_land <dbl>, c02fossilfuels_industry <dbl>,
## # c02agr_land <dbl>
globalwarming2 <- globalwarming1 |>
  mutate( # multiply every measure by 100 so the small values are easier to plot
    n20fossilfuels_industry = n20fossilfuels_industry * 100,
    n20agr_land = n20agr_land * 100,
    ch4fossilfuels_industry = ch4fossilfuels_industry * 100,
    ch4agr_land = ch4agr_land * 100,
    c02fossilfuels_industry = c02fossilfuels_industry * 100,
    c02agr_land = c02agr_land * 100
  )
head(globalwarming2)
## # A tibble: 6 × 9
## Entity Code Year n20fossilfuels_industry n20agr_land ch4fossilfuels_industry
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Brazil BRA 1860 0.000308 0.000233 0.00195
## 2 Brazil BRA 1880 0.00102 0.000804 0.00625
## 3 Brazil BRA 1900 0.00193 0.00206 0.0132
## 4 Brazil BRA 1920 0.00313 0.00660 0.0290
## 5 Brazil BRA 1940 0.00468 0.0120 0.0582
## 6 Brazil BRA 1960 0.00770 0.0310 0.116
## # ℹ 3 more variables: ch4agr_land <dbl>, c02fossilfuels_industry <dbl>,
## # c02agr_land <dbl>
linearmodel <- lm(c02agr_land ~ c02fossilfuels_industry, data = globalwarming2) #equation
summary(linearmodel)
##
## Call:
## lm(formula = c02agr_land ~ c02fossilfuels_industry, data = globalwarming2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5067 -0.4092 -0.2910 0.0801 3.9902
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.31778 0.11343 2.802 0.00625 **
## c02fossilfuels_industry 0.26695 0.03633 7.347 9.77e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9878 on 88 degrees of freedom
## Multiple R-squared: 0.3802, Adjusted R-squared: 0.3732
## F-statistic: 53.98 on 1 and 88 DF, p-value: 9.766e-11
The model has the equation: c02agr_land = 0.27(c02fossilfuels_industry) + 0.32
The p-value for c02fossilfuels_industry (9.77e-11, marked with three asterisks) is well below 0.001, which suggests it is a meaningful predictor of c02agr_land. However, the adjusted R-squared shows that only about 37% of the variation in c02agr_land is explained by the model. In other words, roughly 63% of the variation in the data is not explained by this model.
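To make the reading of the summary concrete, here is a minimal sketch of how the slope, intercept, p-value, and adjusted R-squared can be pulled out of a fitted model object. The data frame `toy` and its values are made up for illustration; they are not the project data.

```r
# Hypothetical toy data (not the project data): y depends linearly on x plus noise
set.seed(110)
toy <- data.frame(x = 1:20)
toy$y <- 0.3 + 0.25 * toy$x + rnorm(20, sd = 0.5)

fit <- lm(y ~ x, data = toy)   # same kind of model as linearmodel
fit_summary <- summary(fit)

intercept <- coef(fit)[["(Intercept)"]]                  # estimated intercept
slope     <- coef(fit)[["x"]]                            # estimated slope
slope_p   <- fit_summary$coefficients["x", "Pr(>|t|)"]   # p-value for the slope
adj_r2    <- fit_summary$adj.r.squared                   # share of variation explained

# A prediction is just the equation applied: intercept + slope * x
predicted <- unname(predict(fit, newdata = data.frame(x = 10)))
```

The same calls applied to `linearmodel` would return the 0.32 intercept, 0.27 slope, and 0.37 adjusted R-squared reported in the summary output.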
linearplot <- ggplot(globalwarming2, aes(x = c02fossilfuels_industry, y = c02agr_land)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "#344E41") + # linear fit without a confidence band
  labs(x = "Warming from CO2: Fossil Fuels & Industry (x100)",
       y = "Warming from CO2: Agriculture & Land Use (x100)",
       title = "Global Surface Temperature Change: CO2 from Fossil Fuels vs Agriculture & Land Use") + # axis labels and title
  theme_classic() +
  theme(panel.background = element_rect(fill = "#A3B18A")) # background color
linearplot
## `geom_smooth()` using formula = 'y ~ x'
simpleplot1 <- ggplot(globalwarming2, aes(x = c02fossilfuels_industry, y = c02agr_land, color = Entity)) +
  geom_point(size = 2) + # bigger points; a constant size belongs outside aes() so it adds no legend
  scale_color_manual(values = c("#A26360", "#E8B298", "#EDCC8B", "#BDE1B3", "#31572c", "#8DD6E2", "#9194E2", "#C6A0c4", "#370c11", "#67657f")) + # one color per country
  labs(x = "Warming from CO2: Fossil Fuels & Industry (x100)",
       y = "Warming from CO2: Agriculture & Land Use (x100)",
       title = "CO2 from Fossil Fuels vs Agriculture & Land Use by Entity") + # axis labels and title
  theme_test() +
  theme(panel.background = element_rect(fill = "#ecf1e6")) # background color
simpleplot1
This visualization colors the points by Entity and shows that the country with the highest change in global surface temperature caused by CO₂ is the United States, and the second highest is China.
For my next two plots, I wanted to take a closer look at the differences between fossil fuels / industry and agriculture / land grouped by Entity.
simpleplot2 <- ggplot(globalwarming2, aes(x = Entity, y = c02fossilfuels_industry)) +
  geom_boxplot(fill = "darkolivegreen", color = "darkseagreen") +
  coord_flip() + # flip the axes so country names are readable
  labs(x = "Entity",
       y = "Warming from CO2: Fossil Fuels & Industry (x100)",
       title = "CO2 from Fossil Fuels & Industry by Entity") + # axis labels and title
  theme_test() +
  theme(panel.background = element_rect(fill = "#ecf1e6"))
simpleplot2
From this visualization we can conclude that the United States, Germany, the United Kingdom, and China have the largest warming contributions from CO₂ emitted by fossil fuels and industry.
simpleplot3 <- ggplot(globalwarming2, aes(x = Entity, y = c02agr_land)) +
  geom_boxplot(fill = "darkolivegreen", color = "darkseagreen") +
  coord_flip() + # flip the axes so country names are readable
  labs(x = "Entity",
       y = "Warming from CO2: Agriculture & Land Use (x100)",
       title = "CO2 from Agriculture & Land Use by Entity") + # axis labels and title
  theme_test() +
  theme(panel.background = element_rect(fill = "#ecf1e6"))
simpleplot3
From this visualization, we can see that Brazil and Canada show a very different picture for agriculture and land use than they do for fossil fuels. The United States and China remain at a high level.
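One way to put numbers on this comparison is to summarise each country's two CO₂ columns and take the difference. The sketch below uses a small made-up data frame named `toy_warming` with the same column names as `globalwarming2`; the values and the names `source_gap` and `land_minus_fossil` are hypothetical.

```r
library(dplyr)

# Hypothetical values that only mimic the structure of globalwarming2
toy_warming <- data.frame(
  Entity = rep(c("Brazil", "United States"), each = 3),
  Year = rep(c(1980, 2000, 2020), times = 2),
  c02fossilfuels_industry = c(0.1, 0.2, 0.3, 10, 14, 18),
  c02agr_land = c(2.0, 3.0, 4.0, 2.5, 2.6, 2.7)
)

source_gap <- toy_warming |>
  group_by(Entity) |>
  summarise(
    median_fossil = median(c02fossilfuels_industry),
    median_land = median(c02agr_land)
  ) |>
  mutate(land_minus_fossil = median_land - median_fossil) |> # positive: land use dominates
  arrange(desc(land_minus_fossil))

source_gap
```

Swapping `toy_warming` for `globalwarming2` would give the same table for all ten countries, which could back up what the two boxplots suggest visually.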
cols <- c("#31572c","#a6b196", "#4f772d", "#90a955", "#ecf39e","#d4f3b7", "#eaeeea","#505c45", "#96d031", "#132a13")#colors
highchart() |>
  hc_add_series(data = globalwarming2,
                type = "streamgraph", # creating a stream graph
                hcaes(x = Year,
                      y = c02fossilfuels_industry,
                      group = Entity)) |> # grouping by country
  hc_chart(backgroundColor = "#d0cdc9") |> # background color
  hc_xAxis(title = list(text = "Year")) |>
  hc_yAxis(title = list(text = "Warming from CO2: Fossil Fuels and Industry (x100)")) |>
  hc_title(text = "Changes in Global Surface Temperature Caused by CO2 from Fossil Fuels & Industry") |>
  hc_colors(cols)
highchart() |>
  hc_add_series(data = globalwarming2,
                type = "streamgraph", # creating a stream graph
                hcaes(x = Year,
                      y = c02agr_land,
                      group = Entity)) |> # grouping by country
  hc_chart(backgroundColor = "#d0cdc9") |> # background color
  hc_colors(cols) |>
  hc_xAxis(title = list(text = "Year")) |>
  hc_yAxis(title = list(text = "Warming from CO2: Agriculture and Land Use (x100)")) |>
  hc_title(text = "Changes in Global Surface Temperature Caused by CO2 from Agriculture and Land Use") |>
  hc_caption(text = "Source: Our World in Data")
Overall, these visualizations show that the changes in global mean surface temperature caused by CO₂ have increased rapidly over time. In the fossil fuels and industry visualization, the increase only becomes visible around 1880, which is consistent with the spread of the Industrial Revolution during the 1800s. Even though I selected the most visited countries, it is mostly the countries with larger populations, such as China and the United States, that show large CO₂ contributions. It is also evident that Brazil has an extremely low contribution from fossil fuels and industry but a really high contribution from agriculture and land use. This could be explained by the large forests and tropical land that exist in Brazil. For this project, I wish I could have faceted the two visualizations so that they would appear next to each other, but I couldn’t figure out how, especially with highcharter. I attempted it with ggplot but did not have enough time to work on it further.
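For the side-by-side view mentioned above, one possible approach in ggplot2 is to reshape the two CO₂ columns into long format and facet by source. This is only a sketch under assumptions: `toy_warming` is a made-up stand-in for `globalwarming2`, and the names `long_warming`, `source`, `warming`, and `source_labels` are introduced here for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical values that only mimic the structure of globalwarming2
toy_warming <- data.frame(
  Entity = rep(c("Brazil", "United States"), each = 3),
  Year = rep(c(1980, 2000, 2020), times = 2),
  c02fossilfuels_industry = c(0.1, 0.2, 0.3, 10, 14, 18),
  c02agr_land = c(2.0, 3.0, 4.0, 2.5, 2.6, 2.7)
)

# One row per country, year, and source instead of one column per source
long_warming <- pivot_longer(
  toy_warming,
  cols = c(c02fossilfuels_industry, c02agr_land),
  names_to = "source",
  values_to = "warming"
)

# Readable panel titles for the two sources
source_labels <- c(
  c02fossilfuels_industry = "Fossil fuels & industry",
  c02agr_land = "Agriculture & land use"
)

faceted_plot <- ggplot(long_warming, aes(x = Year, y = warming, fill = Entity)) +
  geom_area() + # stacked areas, one band per country
  facet_wrap(~ source, labeller = as_labeller(source_labels)) + # one panel per source
  labs(x = "Year", y = "Warming from CO2 (x100)", fill = "Entity") +
  theme_classic()

faceted_plot
```

By default both panels share one y-axis, which makes the two sources directly comparable; passing `scales = "free_y"` to `facet_wrap()` would instead let each panel use its own range.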