---
title: 'Lab 2 Data Climate Change'
author: "Rabah Douadi"
date: "04-08-2023"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
    theme: sandstone
---

Overview & Objective {data-orientation=columns}
===================================================================
Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are currently being collected for areas all over the world. Policy decisions are based on the most recent analyses of data extracted from huge online repositories. Because of the growth in the electronic production and storage of information, there is often a feeling of "information overload" or inundation when facing quantitative decision making. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. In addition, data scientists are called upon to present data syntheses and develop questions or ideas based on their data exploration. This lab should take you through the major steps in data exploration and presentation.

The objective of this laboratory is to survey the available data, plan, design, and create an information dashboard/presentation that not only explores the data but helps you develop questions based on that data exploration. To accomplish this task you will have to complete a number of steps:
  
1. Identify what information interests you about climate change.
2. Find, collect, organize, and summarize the data necessary to create your data exploration plan.
3. Design and create the most appropriate visualizations (no fewer than 5) to explore the data and present that information.
4. Finally, organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
5. Develop four questions or ideas about climate change from your visualizations.


### **Questions**

* Are the top 3 CO2-emitting countries the same from year to year?
* How is the average temperature changing in the United States?
* How is the Global Land-Ocean Temperature Index changing?
* What are the projected temperature anomalies for the next 50 years?
* How are CO2 emissions spread for major countries over the last century?

```{r setup, include=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Data handling
library(data.table)
library(dplyr)
library(tidyr)
library(tidyverse)
library(zoo)
library(randomForest)
# Plotting and mapping
library(ggplot2)
library(maps)
library(RColorBrewer)
library(scales)
library(ggmap)
library(plotly)
library(scatterpie)
library(usmap)
library(mapproj)
# Data access
library(RCurl)
library(rnoaa)
# maptools and rgdal are not used below and have been retired from CRAN,
# so they are no longer loaded here
```

Top Carbon emitting countries {data-orientation=columns}
===================================================================
Column {data-width=400}
-------------------------------------------------------------------
**Are the top 3 carbon-emitting countries the same in recent years?**

* India, Russia, and the United States have been the top 3 countries for the period.
* India appears to be pulling ahead of Russia as the country with the second-highest CO2 emissions.
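
One way to check whether the top three change from year to year is to rank the countries within each year and compare the resulting sets. The following is a minimal sketch in base R; the country names and values are hypothetical and are not taken from the dataset.

```r
# Hypothetical wide table: one row per country, one column per year
toy <- data.frame(
  Country.Name = c("A", "B", "C", "D"),
  X2018 = c(50, 40, 30, 20),
  X2019 = c(50, 25, 30, 45),
  stringsAsFactors = FALSE
)

year_cols <- c("X2018", "X2019")

# For each year, sort by emissions and keep the names of the top 3 countries
top3_by_year <- lapply(year_cols, function(col) {
  ranked <- toy[order(toy[[col]], decreasing = TRUE), ]
  sort(head(ranked$Country.Name, 3))
})
names(top3_by_year) <- year_cols

# TRUE only if every year has the same set of top-3 countries
same_top3 <- length(unique(top3_by_year)) == 1
```

In this toy example `same_top3` is `FALSE`, because country D replaces country B in the second year.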

**Conclusion**

* Unsurprisingly, the United States ranks first in total carbon emissions.
* We expected to see China in the list, but the country is missing.
* The dataset should be checked to confirm that this information is correct.

Column {data-width=400}
-------------------------------------------------------------------

```{r, warning=FALSE, echo=FALSE, fig.width=8, fig.height=5}
# Read in the data (skip the 4 leading lines before the header row)
data <- read.csv("C:/Users/rabdo/OneDrive/Desktop/HU/512_90/emissions_top.csv", skip = 4)

# Compute the total CO2 emissions for each country for each year, 2014 to 2019
totals <- data %>%
  group_by(Country.Name) %>%
  summarize(across(X2014:X2019, ~ sum(.x, na.rm = TRUE)), .groups = "drop")

# Reshape to one row per country and year, then keep the top 3 emitters per year
top_emissions <- totals %>%
  pivot_longer(X2014:X2019, names_to = "year", values_to = "emissions") %>%
  mutate(year = as.integer(sub("^X", "", year))) %>%
  group_by(year) %>%
  slice_max(emissions, n = 3, with_ties = FALSE) %>%
  mutate(rank = row_number()) %>%
  ungroup()

# One panel per year; y is the emissions value for that year
# (the original mapped every panel to the 2014 column)
# Units are assumed to be kt, per the World Bank EN.ATM.CO2E.KT indicator in References
ggplot(top_emissions, aes(x = Country.Name, y = emissions, fill = Country.Name)) +
  geom_col() +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  facet_wrap(~ year, scales = "free_x", nrow = 2) +
  labs(x = "", y = "CO2 emissions (kt)") +
  theme(axis.text.x = element_text(size = 4)) +
  ggtitle("Top 3 Carbon Emissions by Country (2014-2019)") +
  theme(plot.title = element_text(hjust = 0.5))
```

Average Temperature - United States
===================================== 
Column {data-width=300}
-------------------------------------------------------------------
**How is the average temperature changing in the United States?**

* Southern states have an average temperature of around 65 °F.
* There is a difference of about 30 °F between northern and southern states.

**How is GLOBAL LAND-OCEAN TEMPERATURE INDEX changing?**

* The chart shows the temperature anomaly for each year from 1880 to 2019 in two series, "No_Smoothing" and "Lowess(5)".
* "No_Smoothing" is the annual value as reported, while "Lowess(5)" is a smoothed version of the same series produced with a five-year LOWESS (locally weighted regression) smooth.
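
To illustrate what a LOWESS smooth does, here is a minimal sketch on synthetic data using base R's `lowess()`. The series, the seed, and the span `f` are assumptions chosen for illustration; they are not the settings used to produce the Lowess(5) column.

```r
set.seed(1)
year <- 1880:2019

# Synthetic anomaly: a slow upward trend plus year-to-year noise
anomaly <- 0.008 * (year - 1880) - 0.3 + rnorm(length(year), sd = 0.1)

# f is the fraction of points used in each local fit; 5 / n is an
# assumption meant to approximate a five-year window
smoothed <- lowess(year, anomaly, f = 5 / length(year))

# Year-to-year changes are smaller in the smoothed series than in the raw one
raw_roughness <- sd(diff(anomaly))
smooth_roughness <- sd(diff(smoothed$y))
```

Comparing `raw_roughness` with `smooth_roughness` gives a rough sense of how much noise the smooth removes while keeping the long-term trend.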

**Conclusion**

* Temperatures have been rising since the last century at a roughly steady pace.
* Both land and ocean temperatures show similar behavior.

Column {data-width=600 .tabset}
-------------------------------------------------------------------

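```{r,echo = FALSE, message = FALSE, fig.width=8, fig.height=3}
# Statewide average temperature from NOAA (skip the 3 leading lines before the header row)
data_avg <- read.csv(url("https://www.ncdc.noaa.gov/cag/statewide/mapping/110-tavg-201503-60.csv"), skip = 3)

# Join the temperatures to the state polygons by lower-case state name
state <- map_data("state")
data_avg$region <- tolower(data_avg$Location)
temp <- merge(state, data_avg, by = "region", all = TRUE)
temp$subregion <- NULL             # column 6; mostly NA, so drop it before drop_na()
temp <- drop_na(temp)
temp <- temp[order(temp$order), ]  # restore the polygon point order after merge()

# Choropleth of the average temperature by state
mt <- ggplot(temp, aes(x = long, y = lat, group = group, fill = Value)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Temperature (°F)", low = "yellow", high = "red", na.value = "white") +
  labs(x = "Longitude", y = "Latitude", title = "60 Months Average Temperature") +
  coord_map() +
  theme(plot.title = element_text(hjust = 0.5))
mt

```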

```{r,echo = FALSE, message = FALSE, fig.width=8, fig.height=2}
ocean_t <- read.csv("C:/Users/rabdo/OneDrive/Desktop/HU/512_90/ocean_t.csv")

# Plot the raw and smoothed anomaly series against Year.
# Mapping color inside aes() is what lets scale_color_manual() build the legend.
ggplot(ocean_t, aes(x = Year)) +
  geom_line(aes(y = No_Smoothing, color = "No_Smoothing")) +
  geom_line(aes(y = Lowess.5., color = "Lowess(5)")) +
  scale_color_manual(name = NULL,
                     values = c("No_Smoothing" = "blue", "Lowess(5)" = "red")) +
  xlab("Year") +
  ylab("Temperature anomaly (°C)") +
  ggtitle("Global Temperature Anomaly (1880-2022)") +
  theme(plot.title = element_text(hjust = 0.5))

```

Anomalies in next 50 years and temperatures dispersion {data-orientation=columns}
===================================================================
Column {data-width=400}
-------------------------------------------------------------------
**What are the projected temperature anomalies over the next 50 years?**

* Anomalies are expected to increase over the next 50 years.
* Countries need to take this into account to anticipate future large-scale climate events.
* Projections were calculated with a linear regression model fitted to the available past years.
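
A linear projection of this kind can be sketched with `lm()` and `predict()`. The example below uses synthetic data, so the trend, noise level, and year range are assumptions made for illustration; asking `predict()` for a prediction interval is one possible way to express the uncertainty of the extrapolation.

```r
set.seed(42)
past <- data.frame(Year = 1980:2022)

# Hypothetical anomalies rising by about 0.02 °C per year, plus noise
past$Mean <- 0.02 * (past$Year - 1980) + rnorm(nrow(past), sd = 0.05)

# Fit a straight line through the past values
fit <- lm(Mean ~ Year, data = past)

# Extrapolate 50 years ahead; interval = "prediction" returns the fitted
# value ("fit") together with lower ("lwr") and upper ("upr") bounds
future <- data.frame(Year = 2023:2072)
proj <- predict(fit, newdata = future, interval = "prediction", level = 0.95)

projection <- cbind(future, proj)
```

The interval widens as the projected year moves further from the observed data, which is a useful reminder that a straight-line extrapolation becomes less reliable over long horizons.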

**How are CO2 emissions spread for major countries over the last century?**

* China's CO2 emissions have more outliers at the high end.
* India also has many outliers at the high end.
* The United States still has the highest overall level of CO2 emissions across the period.
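
Box plots conventionally flag a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule with base R's `boxplot.stats()` follows; the emissions series is hypothetical and only mimics a long flat period followed by a sharp recent rise.

```r
# Hypothetical series: slow growth for 20 years, then three very large recent values
emissions <- c(seq(100, 290, by = 10), 2000, 4000, 8000)

# coef = 1.5 sets the whisker length to 1.5 times the interquartile range
stats <- boxplot.stats(emissions, coef = 1.5)

# Values beyond the whiskers are reported as outliers
outliers <- stats$out
```

Here only the three large recent values end up in `outliers`, which is the pattern one would expect when a country's emissions grow sharply late in the period.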

**Conclusion**

* The outliers for China and India come from recent years.
* We can expect that India and China will surpass the United States in the near future.

Column {data-width=400}
-------------------------------------------------------------------

```{r, warning=FALSE, echo=FALSE, fig.width=8, fig.height=2}
# Read in the CSV file
anomaly_df <- read.csv("https://datahub.io/core/global-temp/r/annual.csv")

# Keep one value per year (assumption: the file may hold one row per source per year)
anomaly_df <- aggregate(Mean ~ Year, data = anomaly_df, FUN = mean)

# Create a new data frame with the projected years
proj_Years <- data.frame(Year = 2023:2073)

# Fit a linear model and use predict() to obtain the projected anomalies
fit <- lm(Mean ~ Year, data = anomaly_df)
proj_temp_anomaly <- predict(fit, newdata = proj_Years)

# Combine the original data frame with the projected values
df_combined <- rbind(anomaly_df[, c("Year", "Mean")],
                     data.frame(Year = proj_Years$Year, Mean = proj_temp_anomaly))

# Plot the data with projections; the ribbon is an illustrative +/-10% band
# around the projected values, not a statistical interval
ggplot(df_combined, aes(x = Year, y = Mean)) +
  geom_line(color = "lightblue") +
  geom_ribbon(data = filter(df_combined, Year >= 2023),
              aes(ymin = Mean * 0.9, ymax = Mean * 1.1, fill = "Projection"),
              alpha = 0.2) +
  scale_fill_manual(name = NULL, values = c("Projection" = "red")) +
  ggtitle("Global temperature anomaly") +
  theme_dark() +
  xlab("Year") +
  ylab("Temperature anomaly (°C)") +
  scale_x_continuous(breaks = seq(1880, 2080, by = 20)) +
  theme(plot.title = element_text(hjust = 0.5))

```
```{r, warning=FALSE, echo=FALSE, fig.width=8, fig.height=3}
# Load data from URL (named co2_url to avoid masking base::url())
co2_url <- "https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv"
co2_data <- read.csv(co2_url)

# Remove rows with missing CO2 emissions data and keep the selected countries
co2_data_filtered <- co2_data %>%
  filter(!is.na(co2)) %>%
  filter(country %in% c("China", "United States", "Russia", "India", "France"))

# Box plot of annual CO2 emissions per country (the co2 column is in million tonnes)
ggplot(co2_data_filtered, aes(x = country, y = co2, color = country)) +
  geom_boxplot() +
  ggtitle("Distribution of Annual CO2 Emissions by Country") +
  xlab("Country") +
  ylab("CO2 emissions (million tonnes)") +
  theme_light() +
  theme(panel.background = element_rect(fill = "lightblue")) +
  theme(plot.title = element_text(hjust = 0.5))

```

References {data-orientation=columns}
===================================================================
Column {data-width=400}
-------------------------------------------------------------------
**Data sources:**

* https://datahub.io/core/global-temp/r/annual.csv
* https://data.worldbank.org/indicator/EN.ATM.CO2E.KT
* https://data.giss.nasa.gov/gistemp/graphs/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt
* https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv
* https://www.ncdc.noaa.gov/cag/statewide/mapping/110-tavg-201503-60.csv