DS Labs assignment

Author

Steve Donfack

INTRODUCTION

The data set we are working with today contains global temperature anomalies and carbon emissions from 1751 to 2018. Our analysis will help us answer this question: are carbon emissions increasing or decreasing over time? To do this, we selected a 10-year window, from 2004 to 2014.

Loading Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("dslabs")
library(dplyr)
library(ggfortify)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-stars.R"                        
[25] "make-temp_carbon.R"                  
[26] "make-tissue-gene-expression.R"       
[27] "make-trump_tweets.R"                 
[28] "make-weekly_us_contagious_diseases.R"
[29] "save-gapminder-example-csv.R"        

Import the Data set

First, let's load our data set and the remaining libraries.

library(ggthemes) # a package of new themes
library(ggrepel) # will help us add text labels to points
data("temp_carbon")
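Before cleaning, it can help to see which columns actually contain missing values. The following is a minimal sketch, assuming the dslabs package is installed; `na_per_column` and `year_range` are names introduced here for illustration only.

```r
library(dslabs)
data("temp_carbon")

# Column names and types of the raw data set
str(temp_carbon)

# Number of missing values in each column (illustrative helper name)
na_per_column <- colSums(is.na(temp_carbon))
na_per_column

# First and last year covered by the raw data (illustrative helper name)
year_range <- range(temp_carbon$year)
year_range
```

Any column with a nonzero count here is one that na.omit() will use to drop rows.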

Perform the necessary cleaning

To perform our analysis, we first need to clean our data set by removing every row that contains an NA value. We will also filter the data set to keep only the period from 2004 to 2014.

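temp_carbon_na <- na.omit(temp_carbon) # drop rows with any missing value
# year is numeric, so compare it against numbers rather than the string "2004"
sample001 <- filter(temp_carbon_na, year >= 2004, year <= 2014)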
head(sample001)
  year temp_anomaly land_anomaly ocean_anomaly carbon_emissions
1 2004         0.58         0.81          0.49             7743
2 2005         0.66         1.08          0.50             8042
3 2006         0.63         0.97          0.50             8336
4 2007         0.61         1.12          0.43             8503
5 2008         0.54         0.89          0.41             8776
6 2009         0.64         0.90          0.54             8697
library(extrafont)
Registering fonts with R
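The cleaning and filtering step can be tried out on a small toy data frame. The sketch below uses base R only and made-up values (`toy`, `toy_clean`, `as_text`, and `as_number` are illustrative names, not part of the data set). It also shows why a numeric year is better compared against a number: when one side is a string, R compares both sides as text.

```r
# Toy stand-in for temp_carbon; all values are made up for illustration
toy <- data.frame(
  year = c(998, 2003, 2004, 2014),
  carbon_emissions = c(10, NA, 20, 30)
)

# na.omit() drops the 2003 row because its carbon_emissions is NA
toy_clean <- na.omit(toy)

# Comparing against a string coerces year to text, so "998" is compared
# character by character with "2004" and typically counts as larger
as_text <- toy_clean$year >= "2004"

# Comparing against a number gives the intended result
as_number <- toy_clean$year >= 2004

toy_clean[as_number, ]
```

With four-digit years, as in this report, both comparisons select the same rows, but the numeric form does not rely on that.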

Graph our chart

Now let's create a scatter plot to analyze how carbon emissions evolved over the years. For our analysis, we chose the period from 2004 to 2014. We first defined the graph by setting the x and y axes, the axis labels, a caption, and a title. We then added points to see whether there is a relationship between year and carbon emissions, plus a dashed linear trend line: the goal is to find out whether carbon emissions increased or decreased over the years. We also mapped the temperature anomaly to the color and label of each point, which lets us compare the temperature anomaly with carbon emissions year by year.

ggplot(sample001, aes(x = year, y = carbon_emissions, color = temp_anomaly)) +
  labs(x = "Year",
       y = "Carbon emissions (millions of metric tons of carbon)",
       caption = "Source: DS Labs Database",
       title = "COMPARISON OF CARBON EMISSIONS OVER THE YEARS") +
  theme_minimal(base_size = 12, base_family = "serif") +
  geom_point(size = 5, alpha = 0.5) +
  # dashed linear trend line, drawn in a fixed color
  geom_smooth(method = "lm", se = FALSE, color = "maroon", lty = 2, linewidth = 0.6) +
  # label is mapped here only, so geom_smooth() does not inherit it
  geom_text_repel(aes(label = temp_anomaly), nudge_x = 0.9)
`geom_smooth()` using formula = 'y ~ x'

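As the scatter plot shows, carbon emissions increased from 2004 to 2014. The rise was not perfectly steady: the first rows of the data show a small dip from 8776 in 2008 to 8697 in 2009.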
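To back the visual impression with a number, one could fit the same kind of linear model that geom_smooth() draws and look at the sign of its slope. The sketch below uses only the six rows printed by head(sample001) earlier, so it illustrates the idea and is not the full 2004 to 2014 fit; `first_rows`, `trend`, and `slope` are names introduced here.

```r
# The six rows shown by head(sample001), years 2004 to 2009
first_rows <- data.frame(
  year = 2004:2009,
  carbon_emissions = c(7743, 8042, 8336, 8503, 8776, 8697)
)

# Simple linear regression of emissions on year
trend <- lm(carbon_emissions ~ year, data = first_rows)

# Estimated change in emissions per year; a positive value means an increase
slope <- unname(coef(trend)["year"])
slope > 0
```

Running the same lm() call on sample001 itself would give the slope of the dashed line in the chart.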