---
title: "Lab 2 Data Exploration and Visualization - Climate Change"
author: "Kopal Dhariwal"
date: "4/10/2023"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source: embed
  html_document:
    df_print: paged
---
Overview & Objective {data-orientation=columns}
===================================================================
Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are now being collected for areas all over the world, and policy decisions are based on the most recent analyses of data drawn from huge online repositories. With the growth in the electronic production and storage of information, analysts often feel "information overload" when making quantitative decisions. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets.

The ability to synthesize large data sets using visualizations is a skill that all data scientists should have. Data scientists are also called upon to present their syntheses and to develop questions or ideas based on their data exploration. This lab takes you through the major steps in data exploration and presentation.

The objective of this laboratory is to survey the available data and then plan, design, and create an information dashboard/presentation that not only explores the data but also helps you develop questions based on that exploration. To accomplish this task you will have to complete a number of steps:

1. Identify what information interests you about climate change.
2. Find, collect, organize, and summarize the data necessary to create your data exploration plan.
3. Design and create the most appropriate visualizations (no less than 5 visualizations) to explore the data and present that information.
4. Finally organize the layout of those visualizations into a dashboard (use the flexdashboard package) in a way that shows your path of data exploration.
5. Develop four questions or ideas about climate change from your visualizations.
### **Five Questions to answer through this analysis**
1. What is the trend of surface temperature? How much has it increased?
2. How have temperatures changed across cities and states in the United States across all the months?
3. How have temperatures changed across the United States?
4. Do we see an effect on the major cities in the United States?
5. How has global warming affected a developing country like India?
The raw data comes from the Berkeley Earth data page.
A copy can be found at - https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
```{r setup, include=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(data.table)
library(dplyr)
library(tidyr)
library(tidyverse)
library(ggplot2)
library(zoo)
library(randomForest)
library(maps)
```
Is climate change real? {data-orientation=columns}
===================================================================
Sidebar {.sidebar}
-------------------------------------------------------------------
In recent years, global warming has had widespread effects, including more frequent heat waves with record-breaking temperatures. For instance, France saw several record high temperatures in the summer of 2019, leading to an increase in hospitalizations due to heat stroke. Dangerous storms have also become more frequent, including winter storms and several hurricanes that hit Houston and Puerto Rico in 2017.

Graphing the land mass temperature shows how drastic the effect is: the plot shows an increase of about 1.5 °C in land mass temperature.

Although there is a wealth of data on the effects of global warming around the world, we will only be focusing on temperature changes in different cities in the United States. It's important to note that by doing this, we are ignoring many other crucial impacts of global warming, such as rising sea levels and the increased prevalence of extreme weather events. To make things more manageable, we will only consider data from the period between 1960 and 2014.
The following caveats are important to note.

1. One significant constraint in our analysis is the "urban heat island" effect: as cities grow, they become warmer to a greater extent than their surrounding areas. However, urban areas make up only a small share of global land mass, so this effect has a relatively insignificant impact on the global figures.
2. Land and ocean areas have not warmed equally. Land mass temperatures have increased by around twice as much as ocean temperatures, so a 1.5 °C increase in land mass temperature since 1900 does not equate to the same increase in global temperature.
3. Global warming affects different regions of the world in different ways. For example, projections suggest that Europe will experience more severe warming, with snowfall predicted to become extremely rare by 2100. In contrast, central Africa is expected to be relatively less impacted, and there may be a potential increase in habitation of the Sahara as a means of escaping the heat. It is important to note that the effects of global warming on the United States may not necessarily be representative of the effects in other continents, so caution should be taken when extrapolating findings.
Column {data-width=500}
-------------------------------------------------------------------
```{r, include=FALSE}
world_wide <- read_csv("C:/Users/dhari/Downloads/Archive/GlobalTemperatures.csv")
```
```{r, warning=FALSE,echo=FALSE}
world_wide %>%
  ggplot(aes(x = dt, y = LandAverageTemperature)) +
  geom_smooth(method = "loess") +
  # geom_line() +
  labs(title = "Average Landmass Temperature",
       x = "Year",
       y = "Average Temperature (°C)")
```
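One way to put a number on the rise visible in the smoothed curve is to compare the mean temperature of an early period with that of a recent one. The sketch below does this on a synthetic series, so its result says nothing about the Berkeley Earth data; the helper name `period_change` and the period boundaries are assumptions made for illustration.

```r
# Hypothetical helper: mean temperature of a late period minus that of an early period.
period_change <- function(dates, temps, early, late) {
  yrs <- as.numeric(format(dates, "%Y"))
  in_early <- yrs >= early[1] & yrs <= early[2]
  in_late  <- yrs >= late[1]  & yrs <= late[2]
  mean(temps[in_late], na.rm = TRUE) - mean(temps[in_early], na.rm = TRUE)
}

# Synthetic monthly series (NOT real data): a trend of 0.01 °C per year
# plus a seasonal cycle that averages out over each full year.
dates <- seq(as.Date("1900-01-01"), as.Date("2015-12-01"), by = "month")
yrs   <- as.numeric(format(dates, "%Y"))
mon   <- as.numeric(format(dates, "%m"))
temps <- 8 + 0.01 * (yrs - 1900) + 5 * sin(2 * pi * (mon - 4) / 12)

change <- period_change(dates, temps, early = c(1900, 1909), late = c(2006, 2015))
```

With the dashboard's data, the same helper could presumably be called as `period_change(world_wide$dt, world_wide$LandAverageTemperature, ...)`, assuming `dt` was parsed as a date column.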
Seasonal Effect {data-orientation=columns}
===================================================================
Sidebar {.sidebar}
-------------------------------------------------------------------
This page analyzes temperature trends in the United States, using locally weighted regression (loess) to smooth the data. This method allows more flexibility in the shape of the trend than a straight line does. The plots show the average land mass temperature throughout the country, month by month.

Averaging temperatures across all cities and averaging temperatures across all states can give different results, as the number of cities in each state and their geographical distribution can affect the overall temperature trend. Averaging across all cities may be more sensitive to extreme temperatures in a few areas, while averaging across all states can provide a broader and more representative picture of temperature trends across a larger region. It may be worth exploring both methods and comparing the results to better understand temperature trends in the United States.

Higher temperatures in cities are expected because of the urban heat island effect. Winter months are affected more than summer months, which can be attributed to winter nights being longer: there is more time to cool off, so the temperature drop is greater.

The irregularities in the temperature trends can be due to a variety of factors such as regional weather patterns, changes in land use, and variations in climate systems. It is also possible that the irregularities are due to measurement errors or inaccuracies in the data. Regardless, it is important to carefully examine and analyze the data to identify the underlying causes of the irregularities and ensure the accuracy of the results.
Column {data-width=500}
-------------------------------------------------------------------
```{r, include=FALSE}
by_city <- read_csv("C:/Users/dhari/Downloads/Archive/GlobalLandTemperaturesByCity.csv")
# Remove rows with missing values
by_city <- na.omit(by_city)
# Derive month and year columns from the date
by_city <- mutate(by_city,
                  Month = as.numeric(format(dt, "%m")),
                  MonthString = format(dt, "%B"),
                  Year = as.numeric(format(dt, "%Y")))
# Keep United States cities from 1960 onward
by_city <- filter(by_city, Year >= 1960, Country == 'United States')
```
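The month and year columns come from calling `format()` on the date column. The short sketch below shows what those calls return for a few made-up dates; it assumes `dt` is of class `Date`, which is how `read_csv` would normally parse ISO-formatted dates.

```r
# Three illustrative first-of-month dates, in the style of the dt column.
dt <- as.Date(c("1960-01-01", "1987-07-01", "2013-09-01"))

parts <- data.frame(
  dt    = dt,
  Month = as.numeric(format(dt, "%m")),  # month number, 1 to 12
  Year  = as.numeric(format(dt, "%Y"))   # four-digit year
)
# format(dt, "%B") would give the month name in the current locale.
```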
```{r, include=FALSE, warning=FALSE, message=FALSE}
# Strip a trailing parenthetical (and the space before it) from a name
remove_param <- function(x) {
  gsub("\\s*\\([^\\)]+\\)", "", as.character(x))
}
us <- read_csv('C:/Users/dhari/Downloads/Archive/GlobalLandTemperaturesByState.csv')
us <- na.omit(us)
us <- mutate(
  us,
  Month = as.numeric(format(dt, "%m")),
  MonthString = format(dt, "%B"),
  Year = as.numeric(format(dt, "%Y"))
)
us <- filter(us, Year >= 1960, Country == 'United States')
# Lower-case and clean state names so they match the map's region names
us <- mutate(us, State = tolower(State))
us <- mutate(us, State = remove_param(State))
states_by_data <- unique(us$State)
states_map <- map_data("state")
states_by_map <- unique(states_map$region)
```
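The `remove_param` helper exists so that state names in the data line up with the region names in the map data. A small sketch of its effect follows; the input strings are illustrative and are not claimed to be the exact spellings in the file.

```r
# Same regular expression as in the dashboard: drop a parenthetical
# and any whitespace immediately before it.
remove_param <- function(x) {
  gsub("\\s*\\([^\\)]+\\)", "", as.character(x))
}

# Illustrative inputs (assumed spellings, already lower-cased).
states  <- c("georgia (state)", "new york", "district of columbia")
cleaned <- remove_param(states)
```

After cleaning, something like `setdiff(states_by_map, states_by_data)` is a quick way to check whether any map region is still missing from the data.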
```{r,echo=FALSE}
by_city %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperature (By City)",
       x = "Year",
       y = "Average Temperature",
       colour = "Month")
```
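The smoothing in these plots is locally weighted regression: `geom_smooth(method = "loess")` fits the curve with `stats::loess`. The sketch below calls `loess` directly on a synthetic yearly series to show what the smoother does; the data are made up, and the `span` and `degree` values are simply the function's defaults written out.

```r
set.seed(42)

# Synthetic yearly series (NOT real data): slow warming plus noise.
year <- 1960:2013
temp <- 12 + 0.02 * (year - 1960) + rnorm(length(year), sd = 0.5)

# Local quadratic fits, each using the nearest 75% of the points.
fit      <- loess(temp ~ year, span = 0.75, degree = 2)
smoothed <- predict(fit)  # fitted values at the original years
```

A smaller `span` follows the data more closely and a larger one gives a smoother curve, which is the flexibility referred to in the sidebar.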
```{r,echo=FALSE}
us %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperatures (By State)",
       x = "Year",
       y = "Average Temperature",
       colour = "Month")
```
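Why averaging over cities and averaging over states can disagree is easiest to see with a toy example. In the sketch below, one warm state contributes three cities and one cool state contributes a single city; all names and values are invented.

```r
# Synthetic observations (NOT real data).
obs <- data.frame(
  State = c("A", "A", "A", "B"),
  City  = c("a1", "a2", "a3", "b1"),
  Temp  = c(20, 22, 24, 10)
)

# Every city counts once, so state A dominates.
mean_over_cities <- mean(obs$Temp)

# Every state counts once, regardless of how many cities it has.
state_means      <- tapply(obs$Temp, obs$State, mean)
mean_over_states <- mean(state_means)
```

Here the city average is 19 and the state average is 16, purely because of how many cities each state contributes.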
State-Level Changes {data-orientation=columns}
===================================================================
Sidebar {.sidebar}
-------------------------------------------------------------------
Using a linear assumption for the increase in temperature is a simplification and not entirely accurate, as the previous plot showed that temperature trends can be nonlinear and can vary over time. However, it can be a useful way to estimate the average increase in temperature for each state.
By using a linear model to estimate the increase in temperature, we can calculate the slope of the temperature trend line for each state and use this as a measure of the average increase in temperature over time. The states with the steepest slope would have the largest average increase in temperature.
It's important to keep in mind that this method does have limitations and should not be the only way to analyze temperature trends.
We see two prominent trends.

1. Temperature increases are greater on the west coast of the United States than on the east coast.
2. Temperature increases are greater in the northern states than in the southern states.

This makes sense. The western and northern states are known to be much cooler on average.

Column {data-width=500}
-------------------------------------------------------------------
```{r, include=FALSE}
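# Fit a linear trend for one state and return its slope (°C per year)
extract_coef <- function(state) {
  df <- filter(us, State == state)
  linear_model <- lm(AverageTemperature ~ Year, data = df)
  # Row 2 is the Year term; column 1 ("Estimate") is the slope.
  # Column 4 would be the p-value, not the temperature increase.
  summary(linear_model)$coefficients[2, 1]
}
# One slope per state that appears on the map
temps <- sapply(states_by_map, extract_coef, USE.NAMES = FALSE)
df_coef <- tibble(
  State = states_by_map,
  TemperatureIncrease = temps
)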
```
```{r,echo=FALSE}
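# Attach each state's slope to its map polygons
temperature_map <- merge(states_map, df_coef, by.x = "region", by.y = "State")
# merge() can reorder rows, so restore the polygon drawing order
temperature_map <- arrange(temperature_map, group, order)
ggplot(temperature_map, aes(x = long, y = lat, group = group, fill = TemperatureIncrease)) +
  geom_polygon(colour = "black") +
  coord_map("polyconic")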
```
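The per-state slope described in the sidebar can be illustrated on synthetic data with known trends. In the coefficient table returned by `summary()`, the slope is the `Estimate` column of the `Year` row, while the fourth column holds the p-value. The sketch below reads the slope with `coef()` instead; the state names, values, and the helper `slope_for` are assumptions for illustration only.

```r
# Synthetic data (NOT real): two "states" with known warming rates.
years <- 1960:2013
toy <- rbind(
  data.frame(State = "north", Year = years,
             AverageTemperature = 5 + 0.03 * (years - 1960)),
  data.frame(State = "south", Year = years,
             AverageTemperature = 20 + 0.01 * (years - 1960))
)

# Hypothetical helper: slope of the linear trend, in °C per year.
slope_for <- function(d) {
  unname(coef(lm(AverageTemperature ~ Year, data = d))["Year"])
}

# One slope per state, named by state.
slopes <- sapply(split(toy, toy$State), slope_for)
```

Multiplying a slope by the number of years in the window gives an approximate total increase under the linear assumption.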
Focusing on Cities {data-orientation=columns}
===================================================================
Sidebar {.sidebar}
-------------------------------------------------------------------
Let us look at a few representative metropolitan areas: Los Angeles, Seattle, New York City, and Miami. In all of these cases population density has increased, so we would expect some increase in temperature even with no global warming effect, due to the urban heat island effect.

Los Angeles -
Los Angeles, known for its uniformly sunny, warm weather year-round, shows a fairly uniform increase from month to month.

Seattle -
Similar to Los Angeles.

New York City -
New York City, which is known for its cold winters, probably sees its biggest jump during the winter months.
During spring, summer, and fall, temperatures see a small increase.

Miami -
Miami is a little finicky. Each month seems to have its own pattern, with no semblance of a uniform increase. For instance, January temperatures dipped, rose again, and then settled back to where they started. This indicates a lot of noise in the data.

Column {data-width=500}
-------------------------------------------------------------------
```{r,echo=FALSE}
by_city %>%
  filter(City == "Los Angeles") %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperature (Los Angeles)",
       x = "Year",
       y = "Average Temperature (°C)",
       colour = "Month")
```
```{r,echo=FALSE}
by_city %>%
  filter(City == "Seattle") %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperature (Seattle)",
       x = "Year",
       y = "Average Temperature (°C)",
       colour = "Month")
```
```{r,echo=FALSE}
by_city %>%
  filter(City == "New York") %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperature (New York City)",
       x = "Year",
       y = "Average Temperature (°C)",
       colour = "Month")
```
```{r,echo=FALSE}
by_city %>%
  filter(City == "Miami") %>%
  ggplot(aes(x = dt, y = AverageTemperature, color = reorder(MonthString, -AverageTemperature, mean))) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Average Temperature (Miami)",
       x = "Year",
       y = "Average Temperature (°C)",
       colour = "Month")
```
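Statements such as "the biggest jump is in the winter months" can be checked numerically instead of by eye. The sketch below compares, for each month, the mean of the first ten years with the mean of the last ten. It runs on a synthetic city; the helper name `monthly_change` and the ten-year window are assumptions, not part of the original analysis.

```r
# Hypothetical helper: per-month change between the first and last n years.
# Expects columns Year, Month and AverageTemperature.
monthly_change <- function(d, n = 10) {
  first <- d$Year < min(d$Year) + n
  last  <- d$Year > max(d$Year) - n
  early <- sapply(split(d$AverageTemperature[first], d$Month[first]), mean)
  late  <- sapply(split(d$AverageTemperature[last],  d$Month[last]),  mean)
  late - early  # named by month number
}

# Synthetic city (NOT real data): January warms faster than July.
years <- 1960:2013
toy <- rbind(
  data.frame(Year = years, Month = 1, AverageTemperature = 0.04 * (years - 1960)),
  data.frame(Year = years, Month = 7, AverageTemperature = 25 + 0.01 * (years - 1960))
)

change <- monthly_change(toy)
```

On the dashboard's data, a call along the lines of `monthly_change(filter(by_city, City == "New York"))` should work, since `by_city` carries the same three columns.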
Looking at India {data-orientation=columns}
===================================================================
Sidebar {.sidebar}
-------------------------------------------------------------------
The average temperature increase in the interior states of India is higher than in the coastal ones.

This observation should make us aware of the different challenges brought by climate change: at present, the main discussion about global warming centers on coastal areas and flooding, not on continental areas and drought.

Column {data-width=500}
--------------------------------------------------------------------------
```{r}
library(choroplethr)
library(choroplethrMaps)
library(choroplethrAdmin1)
```
```{r}
remove_param <- function(x) {
  gsub("\\s*\\([^\\)]+\\)", "", as.character(x))
}
df <- read_csv('C:/Users/dhari/Downloads/Archive/GlobalLandTemperaturesByState.csv', show_col_types = FALSE)
```
```{r}
options(dplyr.summarise.inform = FALSE)
df_India <- na.omit(df)
# Keep Indian states and split the date into Year, Month and Day
df_India <- df_India %>%
  filter(Country == "India") %>%
  separate(col = dt, into = c("Year", "Month", "Day"), convert = TRUE)
# Yearly mean temperature per state
df_India2 <- df_India %>%
  select(Year, AverageTemperature, State) %>%
  group_by(Year, State) %>%
  dplyr::summarise(value = mean(AverageTemperature))
```
```{r}
# After summarise() the columns are Year, State and value; rename State to
# the "region" column that choroplethr expects, in lower case.
colnames(df_India2)[2] <- "region"
df_India2$region <- tolower(df_India2$region)
df_India1950 <- df_India2 %>%
  filter(Year == 1950)
df_India1950 <- df_India1950[, 2:3]
df_India1950$region <- paste("state of", df_India1950$region, sep = " ")
# Rows 25 and 33 are renamed by position so that they match the map's names;
# this relies on the row order of the data and breaks if that order changes.
df_India1950$region[25] <- "state of odisha"
df_India1950$region[33] <- "state of uttarakhand"
admin1_choropleth(country.name = "india", df_India1950, title = "Average Temperatures in India (1950)")
```
```{r}
df_India2013 <- df_India2 %>%
  filter(Year == 2013)
df_India2013 <- df_India2013[, 2:3]
df_India2013$region <- paste("state of", df_India2013$region, sep = " ")
# Same position-based renaming as for 1950; it depends on the row order.
df_India2013$region[25] <- "state of odisha"
df_India2013$region[33] <- "state of uttarakhand"
admin1_choropleth(country.name = "india", df_India2013, title = "Average Temperatures in India (2013)")
```
```{r}
# Per-state change between the two yearly means. This subtracts row by row,
# so it assumes both tables list the same states in the same order.
diff_1950_2013 <- data.frame(
  region = df_India2013$region,
  value = df_India2013$value - df_India1950$value
)
admin1_choropleth(country.name = "india", diff_1950_2013, title = "Average Temperature Increase in India (1950-2013)")
```
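Subtracting two yearly tables row by row only works if both list the same regions in the same order. A more defensive variant, sketched below on invented values, matches the rows by region name with `merge()` before subtracting; regions missing from either year are dropped. The table names and numbers are assumptions for illustration.

```r
# Synthetic yearly means (NOT real data); note the different row order
# and that "state of b" has no value for 2013.
y1950 <- data.frame(region = c("state of a", "state of b", "state of c"),
                    value  = c(24.0, 26.5, 18.0))
y2013 <- data.frame(region = c("state of c", "state of a"),
                    value  = c(19.0, 24.6))

# Inner join on region, keeping both value columns under distinct names.
both <- merge(y1950, y2013, by = "region", suffixes = c("_1950", "_2013"))

diff_by_region <- data.frame(
  region = both$region,
  value  = both$value_2013 - both$value_1950
)
```

The result has the `region` and `value` columns that `admin1_choropleth` takes, so it could be passed to the map in the same way as the tables above.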
# Conclusion
It would be interesting to perform similar analyses for other countries and continents to see how their temperature trends compare with those in the United States. It would also be valuable to look at climate variables beyond temperature, such as precipitation, humidity, and atmospheric circulation, to get a more complete picture of the impacts of global warming. Understanding how different regions are affected by climate change is crucial for developing effective mitigation and adaptation strategies.