DS Labs HW

Author

Jonathan RH

Loading Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library("dslabs")

I loaded the libraries needed to complete this assignment: “tidyverse” for data wrangling and plotting, “plotly” for the interactive scatter plot, and “dslabs” for the data set.

Data

data("temp_carbon")

The data set I chose for this assignment is the “temp_carbon” data set. It comes with five columns: 1) year, 2-4) ocean, land, and temp anomaly, and 5) carbon emissions. The years run from 1751 to 2018. The ocean, land, and temp anomaly columns each record how far the average temperature reading for that region departs from its assigned reference value. Finally, carbon emissions are the release of carbon dioxide (CO2) and other related carbon compounds into the atmosphere each year.

Cleaning the Data

TC_Cleaned <- temp_carbon |>
  filter(year > 1879, year <= 2015) |>
  pivot_longer(2:4,
               names_to = "Environment",
               values_to = "Levels")

Before I create a data visualization, I need to organize the data so the graphing process is easier. First, I filter out the years that contain missing data, which are the years before 1880 and the years after 2015. Then, I transform the data set from a wide format to a long format by stacking the three anomaly columns (columns 2 to 4) into a single Environment column, with their values stored in a Levels column.
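To show what the wide-to-long step does, here is a small sketch on a toy table. The toy_wide and toy_long names and all of the numbers are made up for illustration and are not values from temp_carbon; only the column names and the pivot_longer() call match the code above.

```r
library(tibble)
library(tidyr)

# Hypothetical toy values, NOT taken from temp_carbon.
toy_wide <- tibble(
  year             = c(2000, 2001),
  temp_anomaly     = c(0.1, 0.2),
  land_anomaly     = c(0.3, 0.4),
  ocean_anomaly    = c(0.5, 0.6),
  carbon_emissions = c(10, 11)
)

# Stack columns 2 to 4 into one name column and one value column,
# the same call used on temp_carbon above.
toy_long <- toy_wide |>
  pivot_longer(2:4,
               names_to = "Environment",
               values_to = "Levels")

toy_long
```

Each year now takes up three rows, one per environment, while year and carbon_emissions are repeated on every row. That is the shape ggplot needs to map Environment to a color.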

Changing the Names of the Environment

TC_Cleaned$Environment[TC_Cleaned$Environment == "land_anomaly"] <- "Land"
TC_Cleaned$Environment[TC_Cleaned$Environment == "ocean_anomaly"] <- "Ocean"
TC_Cleaned$Environment[TC_Cleaned$Environment == "temp_anomaly"] <- "Atmosphere"

Finally, I renamed the values in the Environment column to make them simpler.
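The same renaming could also be done in one step with a named lookup vector. This is only an alternative sketch, not the method used above; the env_lookup, env_raw, and env_renamed names are made up for the example.

```r
# Named lookup vector: old column name -> simpler label.
env_lookup <- c(
  land_anomaly  = "Land",
  ocean_anomaly = "Ocean",
  temp_anomaly  = "Atmosphere"
)

# Hypothetical sample of raw Environment values.
env_raw <- c("temp_anomaly", "land_anomaly", "ocean_anomaly", "land_anomaly")

# Index the lookup by the raw names, then drop the leftover names.
env_renamed <- unname(env_lookup[env_raw])

env_renamed
```

On the real data this would be TC_Cleaned$Environment <- unname(env_lookup[TC_Cleaned$Environment]), assuming the column only contains those three names.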

Scatter Plot

P1 <- ggplot(TC_Cleaned, aes(x = year, y = Levels, fill = carbon_emissions,color = Environment))+
  geom_point(stroke = 0.9, shape = 21, aes(size = carbon_emissions))+
  labs(fill = "Carbon Emissions:",
       title = 'Temperature Anomaly in the Environment Due to Carbon Emissions \n (1880 - 2015)',
       x = "Year",
       y = "Temperature Anomaly (C)") +
  theme_bw() +
  scale_fill_gradient(low = "#AE1987", high = "#EACA00")+
  scale_color_manual(values = c(Land = "limegreen", Ocean = '#3271B8', Atmosphere = 'hotpink')) +
  guides(size = "none")+
  theme(
    legend.position = "bottom",
    legend.key = element_rect(fill = "white", colour = "black")
    )
P1
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_point()`).

Now that the data is cleaned, I made a scatter plot to show the increase in carbon emissions alongside the changes in the temperature anomaly over the years. I used the carbon_emissions column for the gradient fill inside each dot, and also for the size of each dot, so both reflect the amount of carbon emitted that year. Then, I used the categorical data in the Environment column to give each environment (land, ocean, and temp/atmosphere) a distinct outline color. Finally, I added a title and axis labels and moved the legends to the bottom of the plot.

Scatter Plot for Interactive Plot

IP <- ggplot(TC_Cleaned, aes(x = year, y = Levels, fill = carbon_emissions, color = Environment))+
  geom_point(stroke = 0.5, shape = 21, size = 3) +
  labs(fill = "Carbon Emissions",
       title = 'Temperature Anomaly in the Environment Due to Carbon Emissions \n (1880 - 2015)',
       x = "Years",
       y = "Temperature Anomaly (C)",
       color = "Environment") +
  theme_bw() +
  scale_fill_gradient(low = "#AE1987", high = "#EACA00")+
  scale_color_manual(values = c(Land = "limegreen", Ocean = '#3271B8', Atmosphere = 'hotpink')) +
  theme(
    legend.position = "bottom",
    legend.key = element_rect(fill = "white", colour = "black")
    )

Interactive Scatter Plot

ISP <- ggplotly(IP)

ISP

I like my scatter plot, but I wanted to make it interactive. I did so by passing the ggplot object to ggplotly() from the “plotly” package.
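One thing that can be adjusted is what shows up when hovering over a point. As a rough sketch, ggplotly() takes a tooltip argument that lists which aesthetics to display. The toy_df data and the toy_plot and toy_interactive names are made up for the example and are not part of the assignment data.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy values, NOT taken from temp_carbon.
toy_df <- data.frame(
  year        = c(2000, 2001, 2002),
  Levels      = c(0.1, 0.2, 0.3),
  Environment = c("Land", "Ocean", "Atmosphere")
)

toy_plot <- ggplot(toy_df, aes(x = year, y = Levels, color = Environment)) +
  geom_point(size = 3) +
  theme_bw()

# Only show the x, y, and color aesthetics in the hover box.
toy_interactive <- ggplotly(toy_plot, tooltip = c("x", "y", "colour"))

toy_interactive
```

Leaving tooltip out, as in the ISP plot above, shows every mapped aesthetic in the hover box.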

BONUS: Heatmap

Cleaning Data

Data_TCA <- temp_carbon |>
  filter(year > 1915, year <= 2014) |>
  pivot_longer(2:4,
               names_to = "Environment")

Data_TCA$Environment[Data_TCA$Environment == "land_anomaly"] <- "Land"
Data_TCA$Environment[Data_TCA$Environment == "ocean_anomaly"] <- "Ocean"
Data_TCA$Environment[Data_TCA$Environment == "temp_anomaly"] <- "Atmosphere"

Heatmap

ggplot(data = Data_TCA, aes(x=year, y=Environment, fill = value)) +
  geom_tile()+
  scale_fill_distiller(palette="Spectral") +
  theme_dark()+
  theme(axis.text.x = element_text(angle = 90))

Sources

plotly: https://r-graph-gallery.com/interactive-charts.html

colors: https://r-charts.com/color-palettes/