---
title: "ANLY 512: Lab 2 - Data Exploration and Analysis"
author: "Tin Bui"
date: "`r Sys.Date()`"
output:
  flexdashboard::flex_dashboard:
    source_code: embed
---


---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

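```{r include=FALSE}
# Packages used by the charts below
library(dplyr)     # filter(), select(), mutate(), summarise() and the %>% pipe
library(ggplot2)   # static plots
library(readxl)    # read_excel() for the GHG-by-sector workbook
library(plotly)    # ggplotly() for interactive versions of the ggplot charts
library(treemap)   # treemap() for the sector breakdown
```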

# Overview

ANLY 512: Lab 2 - Data Exploration and Analysis

Dashboard Laboratory

Overview

Climate change and climate science have been topics of discussion and debate for at least the last decade. Climate data is currently being collected for areas all over the world, and policy decisions are based on the most recent analyses of data extracted from huge online repositories. Due to the inherent growth in the electronic production and storage of information, there is often a feeling of “information overload” or inundation when facing quantitative decision making. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. In addition, data scientists are called upon to present data syntheses and develop questions or ideas based on their data exploration. This lab should take you through the major steps in data exploration and presentation.

Objective

The objective of this laboratory is to survey the available data, plan, design, and create an information dashboard/presentation that not only explores the data but helps you develop questions based on that data exploration. To accomplish this task, you will have to complete a number of steps:

* Identify what information interests you about climate change.
* Find, collect, organize, and summarize the data necessary to create your data exploration plan.
* Design and create the most appropriate visualizations (no fewer than five) to explore the data and present that information.
* Finally, organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
* Develop four questions or ideas about climate change from your visualizations.



```{r echo=FALSE}
# Annual CO2 emissions per country/region (Our World in Data); later chunks reuse `data`
data <- read.csv("C:\\Harrisburg\\02-2022-Late Fall\\ANLY 512-90-O\\Assignments\\Lab 02\\annual-co2-emissions-per-country.csv")
```


# World CO2 Emission 

## Column {.sidebar}

Question 1: How serious is the global CO2 emissions problem and, more importantly, what has the trend looked like since the Paris Agreement in 2015?

In 2015, 195 nations gathered in Paris and adopted the Paris Climate Accord. The goal of the Paris Agreement is to limit global warming to well below 2 degrees Celsius above pre-industrial levels, while pursuing efforts to limit it to 1.5 degrees Celsius.

As seen in the chart, global CO2 emissions began to rise after 1850. Since then, they have followed a roughly exponential upward trend, reaching as high as 37.1 billion tonnes in 2021. The rate at which global CO2 emissions have increased over the last 50 years is a real concern.
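
To put a number on the "exponential" description, one option is to fit a straight line to the logarithm of emissions: if emissions grow exponentially, log(emissions) is linear in time and exp(slope) - 1 is the implied yearly growth rate. The sketch below is a minimal base R illustration with a hypothetical helper (`annual_growth_rate`) and a made-up series; with the real data, the inputs would be `Year` and `Annual.CO2.emissions` from the `World` rows.

```r
# Hypothetical helper: estimate a constant yearly growth rate from a log-linear fit.
# The toy series below is NOT OWID data.
annual_growth_rate <- function(years, emissions) {
  stopifnot(all(emissions > 0))       # log() needs positive values
  fit <- lm(log(emissions) ~ years)   # log(E) = a + b * year
  unname(exp(coef(fit)[2]) - 1)       # growth per year implied by slope b
}

years <- 1971:1980
emissions <- 10 * 1.03^(years - 1971) # toy series growing exactly 3% a year
annual_growth_rate(years, emissions)  # approximately 0.03
```

Real emissions are not perfectly exponential, so the fitted rate is an average over the chosen window; restricting the fit to the last 50 years would target the period singled out above.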

Since 2015, global CO2 emissions have remained at record-high levels. Although emissions dropped slightly in 2020 because of the COVID-19 pandemic, they rose again once production recovered.
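
A simple way to check the post-2015 trend described above is to compute the year-over-year percentage change. This is a minimal sketch with a hypothetical helper (`yoy_change`) and made-up index values (2015 = 100); with the real data, the inputs would be the `World` rows of `Year` and `Annual.CO2.emissions`.

```r
# Hypothetical helper: year-over-year percentage change of a yearly series.
yoy_change <- function(years, emissions) {
  ord <- order(years)                 # make sure the series is in time order
  e <- emissions[ord]
  data.frame(
    Year = years[ord][-1],            # change is reported for the later year
    PctChange = 100 * diff(e) / head(e, -1)
  )
}

years <- 2015:2021
emissions <- c(100, 102, 104, 106, 108, 100, 105)  # made-up index, 2015 = 100
changes <- yoy_change(years, emissions)
print(changes)
```

A negative value marks a drop (here the toy 2020 row), and a positive value after it marks the rebound.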

## Column

### World CO2 Emission 

```{r echo=FALSE}
# Keep only the aggregate "World" rows; `data` was read in the chunk above
world_data <-
  data %>%
  filter(Entity == "World")

# Convert tonnes to billions of tonnes for readability
Co2_1 <-
  world_data %>%
  select(Annual.CO2.emissions, Year) %>%
  mutate(Emissions = Annual.CO2.emissions / 1e9)

p <- ggplot(Co2_1, aes(Year, Emissions)) +
  geom_area(colour = "darkred", alpha = 0.2, fill = "red") +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Annual CO2 Emissions (billion tonnes)",
    title = "Global CO2 Emissions 1750 - 2021")

# Interactive version of the same chart
ggplotly(p)
```



# World CO2 Emission by Continent

## Column {.sidebar}

Apart from looking at global CO2 emissions, it is also useful to look at how much each continent contributes to them.

Question 2: What does the distribution of CO2 emissions look like across the continents?

At first glance, the chart shows that the top 3 CO2 emitting continents are: Asia, Europe and North America.

Asia tops the list, having emitted a total of over 400 billion tonnes of CO2 across all years in the data. It is followed closely by Europe, with North America in third place.

At the bottom of the list are Antarctica, Oceania, and South America.
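
Because the bar chart adds up every year's value for each continent, each bar is effectively a total over the whole period. Expressing those totals as percentage shares makes the ranking easier to compare. A minimal base R sketch, with a hypothetical helper (`continent_share`) and placeholder totals rather than the real OWID values:

```r
# Hypothetical helper: each continent's share of the combined total, in percent,
# sorted from largest to smallest.
continent_share <- function(totals) {
  shares <- 100 * totals / sum(totals)
  sort(shares, decreasing = TRUE)
}

# Placeholder totals (billion tonnes), NOT values from the dataset
totals <- c(Asia = 50, Europe = 30, `North America` = 15, Antarctica = 5)
shares <- continent_share(totals)
round(shares, 1)
```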


## Column


### 
```{r echo=FALSE}
# Continent-level aggregates reported in the OWID dataset
continent_names <- c("Africa", "Oceania", "Asia", "Antarctica", "Europe",
                     "South America", "North America")

# Sum annual emissions over all years so each continent gets a single bar;
# reorder() then ranks the bars by that total
Continents <-
  data %>%
  filter(Entity %in% continent_names) %>%
  group_by(Entity) %>%
  summarise(Emissions = sum(Annual.CO2.emissions, na.rm = TRUE) / 1e9)

p <- ggplot(Continents, aes(x = reorder(Entity, Emissions), y = Emissions, fill = Entity)) +
  geom_col(width = 0.8) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_fill_brewer() +
  coord_flip() +
  labs(
    x = "Continent",
    y = "Total CO2 Emissions, all years (billion tonnes)",
    title = "Global CO2 Emissions by Continent")

ggplotly(p)
```


# Regions and Countries



## Column {.sidebar}



Once we have an idea of how CO2 emissions are distributed among continents, it is also important to see which regions or countries have the largest impact on global CO2 emissions.

Question 3: Which country or region drives CO2 emissions in Asia, and how does it compare to the other top regions, countries, and continents?

For this line chart, two Asian countries, China and India, were added as separate entities, along with Asia excluding China and India.
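
Splitting Asia into China, India, and the rest only works if the parts add back up to the continent. A quick sanity check, sketched below with a hypothetical helper (`parts_match_total`) and made-up values for a single year, compares the sum of the split entities with the `Asia` row. It assumes the OWID aggregates are defined consistently; the relative tolerance absorbs rounding.

```r
# Hypothetical helper: do the parts sum to the total, within a relative tolerance?
parts_match_total <- function(parts, total, tol = 1e-6) {
  abs(sum(parts) - total) <= tol * max(1, abs(total))
}

# Made-up values for one year (billion tonnes), NOT OWID data
year_values <- c(
  "Asia (excl. China and India)" = 5,
  "China" = 11,
  "India" = 3
)
asia_total <- 19

parts_match_total(year_values, asia_total)  # TRUE
parts_match_total(year_values, 25)          # FALSE
```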

China appears to be the largest emitter in recent years: it surpassed the United States sometime in 2006 and has continued to lead since then. The United States, on the other hand, has seen its emissions drop since around 2006. This reduction may be thanks to policies and strategies put in place by the government and companies to reduce their carbon footprint.
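
Rather than reading the crossover year off the chart, it can be computed: find the first year from which one series stays above the other for every later year. The sketch below uses a hypothetical helper (`first_sustained_lead`) and toy vectors standing in for China and the United States; the brief lead in the second toy year shows why a single year above is not enough.

```r
# Hypothetical helper: first year from which `a` stays above `b` in every later year.
first_sustained_lead <- function(years, a, b) {
  ord <- order(years)
  years <- years[ord]
  a <- a[ord]
  b <- b[ord]
  above <- a > b
  # stays_above[i] is TRUE only if a > b in year i and in all later years
  stays_above <- rev(cumprod(rev(above))) == 1
  idx <- which(stays_above)
  if (length(idx) == 0) NA else years[idx[1]]
}

years <- 2003:2009
a <- c(4, 6.5, 5.9, 7, 8, 9, 10)  # toy stand-in for China
b <- rep(6, 7)                    # toy stand-in for the United States
first_sustained_lead(years, a, b) # 2006
```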

Asia (excluding China and India) is still seen as one of the top emitters, while India has a lower emission rate compared to China and the rest of Asia.

Cumulatively, the United States appears to have emitted the most CO2.
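
The difference between the annual and cumulative views comes from the fact that a cumulative series is a running total of annual emissions. This base R sketch (hypothetical helper `cumulative_by_entity`, toy data with a made-up `Annual` column) computes that running total per entity with `ave()` and `cumsum()`. Entity A emits more than B in the final year, yet B still leads cumulatively, which mirrors the China and United States contrast described above.

```r
# Hypothetical helper: running total of annual emissions within each entity.
cumulative_by_entity <- function(df) {
  df <- df[order(df$Entity, df$Year), ]                  # time order within entity
  df$Cumulative <- ave(df$Annual, df$Entity, FUN = cumsum)
  df
}

# Toy data, NOT OWID values
toy <- data.frame(
  Entity = c("A", "A", "A", "B", "B", "B"),
  Year   = c(2000, 2001, 2002, 2000, 2001, 2002),
  Annual = c(1, 2, 3, 5, 1, 1)
)
res <- cumulative_by_entity(toy)
print(res)
```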

## Column {.tabset .tabset-fade}


### Global CO2 Emissions by Top Regions and Countries
```{r echo=FALSE}
# Top emitting regions and countries, with China and India split out of Asia
region_names <- c("Africa", "Oceania", "Asia (excl. China and India)", "China",
                  "India", "European Union (27)", "Europe (excl. EU-27)",
                  "North America (excl. USA)", "United States")

Regions_data <-
  data %>%
  filter(Entity %in% region_names) %>%
  select(Entity, Annual.CO2.emissions, Year) %>%
  mutate(
    Emissions = Annual.CO2.emissions / 1e9,  # tonnes -> billion tonnes
    Entity = factor(Entity))

p <- ggplot(Regions_data, aes(Year, Emissions, colour = Entity)) +
  # `linewidth` needs ggplot2 >= 3.4.0; use `size = 0.9` on older versions
  geom_line(linewidth = 0.9) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "CO2 Emissions (billion tonnes)",
    colour = "Region",
    title = "Global CO2 Emissions by Top Regions and Countries")

ggplotly(p)
```



### Cumulative CO2 Emissions

```{r}
# Cumulative CO2 emissions per entity (Our World in Data)
data2 <- read.csv("C:\\Harrisburg\\02-2022-Late Fall\\ANLY 512-90-O\\Assignments\\Lab 02\\cumulative-co-emissions.csv")

cumulative_names <- c("Africa", "Oceania", "Asia (excl. China and India)", "China",
                      "India", "European Union (27)", "Europe (excl. EU-27)",
                      "United Kingdom", "Brazil", "South Africa",
                      "North America (excl. USA)", "United States")

Cul_ems <-
  data2 %>%
  filter(Entity %in% cumulative_names) %>%
  select(Entity, Cumulative.CO2.emissions, Year) %>%
  mutate(
    Emissions = Cumulative.CO2.emissions / 1e9,  # tonnes -> billion tonnes
    Entity = factor(Entity))

p <- ggplot(Cul_ems, aes(Year, Emissions, colour = Entity)) +
  # `linewidth` needs ggplot2 >= 3.4.0; use `size = 0.9` on older versions
  geom_line(linewidth = 0.9) +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Cumulative CO2 Emissions (billion tonnes)",
    colour = "Region/Country",
    title = "Cumulative CO2 Emissions by Region and Country")

ggplotly(p)
```


# Global GHG Emission by Sector

## Column {.sidebar}

As the previous graphs show, the United States is still a very large emitter. It is therefore also essential to look at how much each sector contributes to greenhouse gas (GHG) emissions, to better understand where to focus when trying to restrict emissions in the US and globally.

Question 4: Which sector contributes the most to GHG emissions?

In the United States, the Environmental Protection Agency (EPA) names transportation as the largest source of GHG emissions among all sectors. At the sub-sector level, this also appears to be the case globally.

As seen in the treemap, road transportation contributes the largest share of global GHG emissions, at 11.9%, followed by Residential at 10.9%.
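
The ranking read from the treemap can also be computed directly by sorting sub-sectors by their share. In the sketch below the helper (`top_subsectors`) is hypothetical; the road and residential shares are the figures quoted above, while the other two rows are placeholders.

```r
# Hypothetical helper: the n sub-sectors with the largest share of GHG emissions.
top_subsectors <- function(subsector, share, n = 2) {
  ord <- order(share, decreasing = TRUE)
  head(data.frame(SubSector = subsector[ord], Share = share[ord]), n)
}

subsector <- c("Sub-sector X", "Road", "Sub-sector Y", "Residential")  # X, Y are placeholders
share <- c(4.0, 11.9, 2.0, 10.9)                                        # % of global GHG emissions
top_subsectors(subsector, share)
```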

## Column

### 
```{r echo=FALSE}
# GHG emissions by sector (WRI 2020, via Our World in Data)
data3 <- read_excel("C:\\Harrisburg\\02-2022-Late Fall\\ANLY 512-90-O\\Assignments\\Lab 02\\Global-GHG-Emissions-by-sector-based-on-WRI-2020.xlsx")

# Shorter names for the sub-sector and its share of global GHG emissions
data3$SubSec <- data3$`Sub-sector`
data3$GHG <- data3$`Share of global greenhouse gas emissions (%)`

# Label each tile with its sub-sector name and percentage share
data3 %>%
  mutate(GHG2 = paste0(SubSec, "\n", GHG, "%")) %>%
  treemap(index = "GHG2",
          vSize = "GHG",
          type = "index",
          palette = "Set2",
          fontcolor.labels = "white",
          border.col = "white",
          border.lwds = 2,
          title = "Share of Global Greenhouse Gas Emissions (%)",
          fontsize.labels = 7,
          inflate.labels = TRUE)
```



# Global Temperature anomaly
CO2 emissions are considered one of the main factors causing global warming. It is therefore essential to look at how global temperature has changed over time and whether those changes align with CO2 emissions.

This chart shows the global temperature anomaly from 1850 to 2022, measured relative to the 1961-1990 average.
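
The anomaly is the difference between each year's temperature and the average over a reference period (1961-1990 for this dataset). The file already contains the anomaly, so the sketch below only illustrates the definition, using a hypothetical helper (`anomaly_vs_baseline`) and made-up temperatures.

```r
# Hypothetical helper: temperature minus its mean over a baseline period.
anomaly_vs_baseline <- function(years, temps, from = 1961, to = 1990) {
  baseline <- mean(temps[years >= from & years <= to])
  temps - baseline
}

years <- c(1961:1990, 2020)
temps <- c(rep(14, 30), 15)  # made-up temperatures in degrees Celsius
a <- anomaly_vs_baseline(years, temps)
a[length(a)]                 # 1, i.e. 1 degree above the 1961-1990 average
```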

## Column

### 
```{r echo=FALSE}
data4 <- read.csv("C:\\Harrisburg\\02-2022-Late Fall\\ANLY 512-90-O\\Assignments\\Lab 02\\temperature-anomaly.csv")

# Anomaly relative to the 1961-1990 average, renamed for brevity
data4$Average.temperature.anomaly <- data4$Average.temperature.anomaly.from.1961.1990.average

data4 <-
  data4 %>%
  select(Average.temperature.anomaly, Year)

p <- ggplot(data4, aes(Year, Average.temperature.anomaly)) +
  geom_area(colour = "darkgreen", alpha = 0.2, fill = "yellow") +
  theme_minimal() +
  labs(
    x = "Year",
    y = "Average Temperature Anomaly (°C)",
    title = "Global Temperature Anomaly 1850 - 2022")

ggplotly(p)
```


# Conclusion

The Global Temperature Anomaly chart shows that temperature anomalies over the last 40 years have risen alongside CO2 emissions, consistent with much of this warming being attributable to the rise in emissions. As CO2 emissions continue to rise, so does global temperature.

This dashboard identifies the top emitters of CO2 and shows a correlation between CO2 emissions and rising global temperature.
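
To back the visual impression with a number, the two series can be joined on `Year` and their Pearson correlation computed. A minimal sketch with a hypothetical helper (`co2_temp_correlation`) and toy data; a high correlation on its own does not establish causation, since both series also trend with time.

```r
# Hypothetical helper: Pearson correlation of two yearly series, matched on Year.
co2_temp_correlation <- function(co2, temp) {
  merged <- merge(co2, temp, by = "Year")  # keeps only years present in both
  cor(merged$Emissions, merged$Anomaly)
}

# Toy data, NOT OWID/NOAA values
co2 <- data.frame(Year = 2000:2004, Emissions = c(1, 2, 3, 4, 5))
temp <- data.frame(Year = 2001:2005, Anomaly = c(0.2, 0.4, 0.6, 0.8, 1.0))
co2_temp_correlation(co2, temp)  # close to 1 for these perfectly linear toy series
```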

Cumulatively, the United States has emitted the most CO2; however, the country has managed to turn its emissions around, with a downward trend that began around 2007. China tops the list of annual emitters, making it the largest emitter in recent years: it surpassed the United States around 2006 and has led ever since. China may want to create strategies and policies to drive its emission levels down.

If anything, the turnaround in US emissions shows that, with the right strategies and policies, it is possible for a nation, no matter how advanced, to reduce its carbon emissions. While the reduction achieved by any single nation may be modest, the collective impact on global temperatures could be substantial.
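
One way to quantify such a turnaround is the percentage change from a nation's peak year to the latest year. The helper below (`decline_from_peak`) is hypothetical and the series is made up; with the real data, the inputs would be the `United States` rows of `Year` and `Annual.CO2.emissions`, assumed sorted by year.

```r
# Hypothetical helper: peak year and % change from the peak to the last value.
# Assumes `years` and `emissions` are sorted by year.
decline_from_peak <- function(years, emissions) {
  i <- which.max(emissions)
  last <- length(emissions)
  c(peak_year = years[i],
    pct_change = 100 * (emissions[last] - emissions[i]) / emissions[i])
}

years <- 2005:2010
emissions <- c(8, 9, 10, 9, 8.5, 8)  # toy values, NOT OWID data
decline_from_peak(years, emissions)  # peak_year 2007, pct_change -20
```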


Antarctica shines as the continent with the least CO2 emissions. 




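References:

Data sources:

* Our World in Data, CO2 dataset: https://ourworldindata.org/co2-dataset
* NOAA National Centers for Environmental Information (NCEI), global temperature anomalies: https://www.ncei.noaa.gov/access/monitoring/global-temperature-anomalies/anomalies
* Our World in Data, emissions by sector: https://ourworldindata.org/emissions-by-sector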