---
title: "Lab2 - Data Exploration and Analysis Laboratory"
author: "Anushmita Roy Choudhury"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
    vertical_layout: scroll
---
                    

```{r setup, include=FALSE}
library(flexdashboard)
```

  
Instruction
=====================================

Row {data-height=320}
-------------------------------------
  
### **Overview**

Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are currently being collected for areas all over the world, and policy decisions are based on the most recent analyses of data extracted from huge online repositories. Because the electronic production and storage of information keeps growing, quantitative decision making often comes with a feeling of "information overload." As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of them.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. In addition, data scientists are called upon to present their syntheses and to develop questions or ideas based on their data exploration. This lab takes you through the major steps in data exploration and presentation.

Row {data-height=680}
-------------------------------------

### **Objective** 

The objective of this laboratory is to survey the available data and then plan, design, and create an information dashboard/presentation that not only explores the data but also helps you develop questions based on that exploration. To accomplish this task, you will have to complete a number of steps:

1. Identify what information interests you about climate change.
2. Find, collect, organize, and summarize the data necessary to create your data exploration plan.
3. Design and create the most appropriate visualizations (no fewer than 5) to explore the data and present that information.
4. Organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
5. Develop four questions or ideas about climate change from your visualizations.


### **Dates & Deliverables**
  
You are responsible for submitting a link to your dashboard hosted on the RPubs site. The dashboard must include the `source_code: embed` parameter.

The due date for this project is XX at the start of class. This assignment is worth 75 points, 3x a normal homework; the additional time should allow you to give this assignment the necessary effort.

You are welcome to work in groups of up to two people. However, each person in a group must submit their own link to the assignment on Canvas for grading! Team members can submit the same link to a single RPubs account, but it may be a good idea for each of you to post your own copy to RPubs in case you want to share it with prospective employers.

### **Methods Help**


There are lots of places to get climate data to answer your questions. The simplest is the NOAA National Centers for Environmental Information (https://www.ncdc.noaa.gov/), which hosts all kinds of data (regional, global, marine). The front page of the NOAA website also links to other sites with climate data, such as (https://www.climate.gov/), (https://www.weather.gov/), (https://www.drought.gov/drought/), and (https://www.globalchange.gov/). You don't have to use all of them, but browsing them may help you come up with ideas for your questions.

Alternatively, and more professionally, there are many packages that let you access data directly from R. See here for a great primer on accessing NOAA data with R. It is also a good introduction to API keys and their use.
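
As a rough illustration of what API-key access can look like, the sketch below prepares a request to NOAA's Climate Data Online web service. Several things here are assumptions rather than part of this lab: the base URL, the `token` request header, the `GSOM`/`TAVG` identifiers, the `NOAA_TOKEN` environment variable, and the helper names `build_cdo_request()` and `fetch_cdo()`, which are hypothetical. Sending the request also assumes the httr and jsonlite packages are installed.

```r
# Sketch only: the endpoint, dataset ids, and helper names are assumptions.
cdo_base_url = "https://www.ncei.noaa.gov/cdo-web/api/v2/"

# Build (but do not send) a description of a request to one endpoint.
build_cdo_request = function(endpoint, token, query = list()) {
  stopifnot(is.character(token), nzchar(token))
  list(
    url     = paste0(cdo_base_url, endpoint),
    headers = c(token = token),  # the API key travels in a "token" header
    query   = query
  )
}

# Send the request; needs httr, jsonlite, and network access.
fetch_cdo = function(request) {
  response = httr::GET(
    request$url,
    httr::add_headers(.headers = request$headers),
    query = request$query
  )
  httr::stop_for_status(response)
  jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
}

# Read the key from an environment variable instead of typing it in the source.
noaa_token = Sys.getenv("NOAA_TOKEN", unset = "your-token-here")
request = build_cdo_request(
  "data", noaa_token,
  query = list(datasetid = "GSOM", datatypeid = "TAVG",
               startdate = "2008-01-01", enddate = "2008-12-31", limit = 25)
)
# result = fetch_cdo(request)  # uncomment once NOAA_TOKEN holds a real key
```

Reading the key from the environment matters here because the dashboard embeds its own source code, so a key typed into a chunk would be published along with it.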


```{r, echo = TRUE}
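# Show code by default; the chunks below switch echo off individually.
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)      # data manipulation
library(ggplot2)    # plotting
library(plotly)     # interactive versions of ggplot objects
library("RColorBrewer")
library(lubridate)  # date parsing
library(ggridges)   # ridgeline density plots
library(maps)       # world map polygons used by map_data()
library(ggmap)
# maptools is no longer loaded: it was retired from CRAN and is not used below.
theme_set(theme_bw())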

```

```{r Clean_Data,echo = FALSE, message = FALSE}
# Monthly global climate indicators
Data = read.csv("climate_change.csv", header = TRUE, stringsAsFactors = FALSE)
# Replace month numbers 1-12 with abbreviations ("Jan", ..., "Dec")
Data$Month = month.abb[Data$Month]
Data = na.omit(Data)

# Country-level temperature change and carbon emissions
TemperatureChangeData = read.csv("Temperature_change_Data.csv", header = TRUE, stringsAsFactors = FALSE)
# Lower-case country names in a "region" column so they match map_data("world")
colnames(TemperatureChangeData)[2] = "region"
TemperatureChangeData$region = tolower(TemperatureChangeData$region)
# Non-numeric emission entries become NA here and are dropped by na.omit()
TemperatureChangeData$carbon_emissions = as.numeric(TemperatureChangeData$carbon_emissions)
TemperatureChangeData = na.omit(TemperatureChangeData)

```

Questions
=====================================

Row
-------------------------------------

### **Questions**

1. Are temperatures actually rising? Is global warming real?

2. What factors affect global temperatures? Is carbon dioxide the most important one?

3. Which countries have emitted the most carbon dioxide? Which countries have seen the largest temperature increase?

4. What will the temperature be in a few years if current trends continue?


Temperature Change Year over Year
=====================================

Row
-------------------------------------

### **Summary** 


Temperature continues to rise from year to year over the period covered by the data (1983-2008). This supports the conclusion that global warming is real, even though many people deny it.
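
One way to put a number on that trend is to average the monthly values by year and fit a straight line. The sketch below uses a small made-up data frame, `toy`, with the same `Year` and `Temp` column names as `Data`. The values are invented purely so the example runs on its own, and the linear trend is an assumption, not the method behind the plots on this page.

```r
# Toy stand-in for Data: invented values, NOT real measurements.
toy = data.frame(
  Year = rep(2001:2005, each = 2),
  Temp = c(0.10, 0.14, 0.14, 0.18, 0.18, 0.22, 0.22, 0.26, 0.26, 0.30)
)

# Average the observations within each year.
annual = aggregate(Temp ~ Year, data = toy, FUN = mean)

# Fit a straight line; the Year coefficient is the change per year.
trend = lm(Temp ~ Year, data = annual)
slope_per_year = unname(coef(trend)["Year"])
slope_per_decade = 10 * slope_per_year
```

Swapping `toy` for `Data` would give the corresponding estimate for the real series.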

Row       
-------------------------------------

###                                                                                     
```{r,echo = FALSE, message = FALSE}

# Build a first-of-month date from Year and the month abbreviation
Data_ymd = Data %>%
  mutate(year_month = ymd(paste(Year, Month, 1)))

L1 = ggplot(Data_ymd, aes(year_month, Temp)) + 
  geom_line() + 
  geom_smooth(se=FALSE, linetype = "dotted") + 
  labs(title = "Temperature (1983-2008)",
       x = "Year", 
       y = "Temperature") +
  theme(plot.title = element_text(hjust = 0.5))
ggplotly(L1)

```
###                                          
```{r,echo = FALSE, message = FALSE }
# Ridgeline plot: one temperature density per year, shaded by temperature
ggplot(Data, aes(x = Temp, y = as.factor(Year))) + 
  geom_density_ridges_gradient(aes(fill = after_stat(x)), 
                               scale = 3, size = 0.3, alpha = 0.5) +
  scale_fill_gradientn(colours = c("#0D0887FF", "#CC4678FF", "#F0F921FF"),
                       name = "Temp") +
  labs(title = "Temperature density", x = "Temperature", y = "Year") +
  # Apply the complete theme first so it does not reset the legend position
  theme_minimal(base_size = 10) +
  theme(legend.position = c(0.9, 0.2))

```

Factors Impacting Temperature
=====================================

Row
-------------------------------------

### **Summary**  


Temperature is increasing due to multiple factors, including greenhouse gas concentrations (CO2, CH4, N2O) and total solar irradiance (TSI). Aerosols, on the other hand, have a cooling effect.


Row                                                           
--------------------------------

###
```{r,echo = FALSE, message = FALSE}
# Scatter plot of temperature against one factor, with a linear fit
factor_plot = function(mapping, x_label) {
  ggplot(Data, mapping) +
    geom_point(color = "firebrick") +
    stat_smooth(method = "lm", col = "#C42126", se = FALSE, size = 1) +
    xlab(x_label) + ylab("Temperature")
}

# Greenhouse gases
mg1 = factor_plot(aes(x = CO2, y = Temp), "CO2 concentration")
mg2 = factor_plot(aes(x = N2O, y = Temp), "Nitrous oxide")
mg3 = factor_plot(aes(x = CH4, y = Temp), "Methane")
library(gridExtra)
grid.arrange(mg1, mg2, mg3, ncol = 3)

```

### 
```{r,echo = FALSE, message = FALSE}
# Aerosols, CFCs, and solar irradiance
mg4 = factor_plot(aes(x = Aerosols, y = Temp), "Aerosols")
mg5 = factor_plot(aes(x = CFC.11 + CFC.12, y = Temp), "CFC-11 + CFC-12")
mg6 = factor_plot(aes(x = TSI, y = Temp), "Total solar irradiance")
grid.arrange(mg4, mg5, mg6, ncol = 3)

```

Row                                                           
--------------------------------
### CO2 concentration
```{r,echo = FALSE, message = FALSE}

# Monthly temperatures by year, shaded by CO2
Temperature_vs_Year = ggplot(Data, aes(x = Year, y = Temp, fill = CO2)) + 
  xlab("Year") +
  ylab("Temperature") +
  theme_minimal(base_size = 8)
# Named bar_plot so it does not mask base R's barplot()
bar_plot = Temperature_vs_Year +
  geom_bar(position = "dodge", stat = "identity", color = "white")
ggplotly(bar_plot)

```

Country Data
===================================== 

Row
-------------------------------------

### **Summary**  


These maps show the temperature change in each country and each country's carbon emissions.

Row
-------------------------------------

### Temperature Change by Country
```{r,echo = FALSE, message = FALSE}
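library(maps)     # polygons behind map_data("world")
library(mapproj)
# World map polygons, with country names lower-cased to match TemperatureChangeData
country = map_data("world")
country$region = tolower(country$region)
temp = merge(country, TemperatureChangeData, by = "region")
# merge() reorders rows, so restore the polygon drawing order
temp = temp[order(temp$group, temp$order), ]

ggplot() +
  geom_map(
    data = temp, map = temp,
    aes(long, lat, map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  geom_point(
    data = temp,
    aes(long, lat, color = tem_change),
    alpha = 0.7
  ) +
  labs(x = "Longitude", y = "Latitude", color = "Temperature change")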

```


###  Carbon Emissions by Country             
```{r,echo = FALSE, message = FALSE}


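# Same base map as above, coloured by carbon emissions
ggplot() +
  geom_map(
    data = temp, map = temp,
    aes(long, lat, map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  geom_point(
    data = temp,
    aes(long, lat, color = carbon_emissions),
    alpha = 0.7
  ) +
  labs(x = "Longitude", y = "Latitude", color = "Carbon emissions")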


```

Temperature Prediction
=====================================

Row
-------------------------------------

### **Summary**

A multiple linear regression model built from the historical data estimates temperature from the climate factors. The plot compares its predictions with the observed temperatures.
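
To look past the end of the record, `predict()` needs a `newdata` frame holding predictor values for the future period. The sketch below assumes the simplest case, a linear trend on `Year`, fitted to made-up numbers (`toy_annual` is hypothetical and not taken from the data sets used here). For the multi-factor model on this page, future values of CO2, TSI, and the other predictors would have to be assumed or forecast first.

```r
# Toy annual series: invented values, NOT real measurements.
toy_annual = data.frame(
  Year = 2000:2008,
  Temp = c(0.20, 0.25, 0.21, 0.28, 0.30, 0.27, 0.33, 0.36, 0.34)
)

trend_model = lm(Temp ~ Year, data = toy_annual)

# Predictor values for years beyond the observed range.
future = data.frame(Year = 2009:2013)

# Point forecasts plus 95% prediction intervals (columns fit, lwr, upr).
forecast = cbind(
  future,
  predict(trend_model, newdata = future, interval = "prediction", level = 0.95)
)
```

The `lwr` and `upr` columns widen as the years move away from the observed range, which is a useful reminder of how uncertain extrapolation is.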

Row
-------------------------------------
###
```{r,echo = FALSE, message = FALSE}
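# Full model with every available factor
model = lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data = Data)
#summary(model)
# Reduced model without CH4
model1 = lm(Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data = Data)
#summary(model1)
# Predictions from the reduced model for the observed months
NewData = Data
NewData$PredictTemp = predict(model1, newdata = NewData)
# Monthly date axis, so the twelve values in each year are not stacked at one x position
NewData$Date = ymd(paste(NewData$Year, NewData$Month, 1))
ggplot(NewData, aes(Date)) +
  geom_line(aes(y = Temp, colour = "Temperature")) +
  geom_line(aes(y = PredictTemp, colour = "Predicted Temperature")) +
  scale_color_manual(values = c("Predicted Temperature" = "darkcyan", "Temperature" = "sienna")) +
  labs(x = "Year", y = "Temperature", colour = NULL)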

```







References
=====================================

Row
-------------------------------------

###

NOAA National Centers for Environmental Information: https://www.ncdc.noaa.gov/

Kaggle data set: https://www.kaggle.com/vageeshabudanur/riseintemp-dataset