In the current global climate news, we hear about predictions for sea levels rising high enough to sink cities such as Jakarta in Indonesia. Polar bears are often going hungry because they are losing their natural habitats to shrinking glaciers. Global temperatures have been sharply increasing over time. Carbon Dixoide emissions are greater than ever before and are largely impacted by large economies.
One interesting feature used that was not covered in class was plotting data to maps. In this project, global temperature data is plotted to maps! Please see the 4th section for Data Validation and Exploratory Analysis under the tab for Countries by Average Temperature.
The objective of this project is to observe how the climate and the environment have changed over time. Our planet is facing dramatic shifts in the climate and environment overall, with upward linear trends in sea levels and global temperatures. In this project, changes are observed over time in: sea levels, glacier thickness, global temperatures, and pollution emissions. Exploratory analysis is performed on the following various climate data:
We also observe human impact on the environment - more specifically Carbon Dioxide emissions by different countries. Most of the statistical analysis is done on the Carbonx Dioxide Emissions data for this project. The hypothesis is that some countries have higher Carbon Dixoide emissions than other countries. To support this claim, we use statistical analysis such as ANOVA to test our hypothesis and perform TUK analysis to compare countries’ Carbon Dioxide emissions. It would be interesting to see how the fossil fuel emissions from countries of major economies of the world differ amongst each other.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(DT)
library(dygraphs) # timeseries
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library(ggplot2)
library(jsonlite)
library(lubridate) # parse datetimes
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(readr)
library(rworldmap) # plot map
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type : vignette('rworldmap')
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble 2.0.0 ✔ purrr 0.2.5
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ tibble 2.0.0 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ lubridate::intersect() masks base::intersect()
## ✖ dplyr::lag() masks stats::lag()
## ✖ lubridate::setdiff() masks base::setdiff()
## ✖ lubridate::union() masks base::union()
library(viridis) # map color
## Loading required package: viridisLite
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
Data Source
Below, we acquire data in JSON format. The data is copyrighted by CSIRO. Here is the data source site link: https://datahub.io/core/sea-level-rise#resource-csiro_alt_gmsl_yr_2015
CISRO Adjusted Sea level is measured in units of inches, representing changes in sea level. This data describes cumulative changes in sea level for the world’s oceans between years 1880 - 2014. It is based on a combination of long-term tide gauge measurements and recent satellite measurements. The average refers to the height of the ocean surface. It is said on the data source’s site, that the ocean floor has been gradually sinking since the last Ice Age peak which was 20,000 years ago!
json_seaLevelFile <- 'https://datahub.io/core/sea-level-rise/datapackage.json'
json_seaLevelData <- fromJSON(paste(readLines(json_seaLevelFile), collapse=""))
## Warning in readLines(json_seaLevelFile): incomplete final line found on
## 'https://datahub.io/core/sea-level-rise/datapackage.json'
# get list of all resources:
print(json_seaLevelData$resources$name)
## [1] "validation_report" "csiro_alt_gmsl_mo_2015_csv"
## [3] "csiro_alt_gmsl_yr_2015_csv" "csiro_recons_gmsl_mo_2015_csv"
## [5] "csiro_recons_gmsl_yr_2015_csv" "epa-sea-level_csv"
## [7] "csiro_alt_gmsl_mo_2015_json" "csiro_alt_gmsl_yr_2015_json"
## [9] "csiro_recons_gmsl_mo_2015_json" "csiro_recons_gmsl_yr_2015_json"
## [11] "epa-sea-level_json" "sea-level-rise_zip"
## [13] "csiro_alt_gmsl_mo_2015" "csiro_alt_gmsl_yr_2015"
## [15] "csiro_recons_gmsl_mo_2015" "csiro_recons_gmsl_yr_2015"
## [17] "epa-sea-level"
# get all tabular data(if exists any)
for(i in 1:length(json_seaLevelData$resources$datahub$type)){
if(json_seaLevelData$resources$datahub$type[i]=='derived/csv'){
path_to_file = json_seaLevelData$resources$path[i]
seaLevelData <- read.csv(url(path_to_file))
}
}
Here is the raw data:
datatable(seaLevelData, options = list(pageLength = 5))
Data Transformation
Here we perform a data transformation by: * Performing a select operation on the data from 2 specific columns, called: Year and CISRO Adjusted Sea Level.
* Converting the Year column to a datetime format, using the lubridate library, a Feature not covered in class.
seaLevels <- seaLevelData %>%
select("Year", "CSIRO.Adjusted.Sea.Level")
seaLevels$Year <- as_datetime(seaLevels$Year)
## Warning in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz): unknown
## timezone 'zone/tz/2019a.1.0/zoneinfo/America/New_York'
# output to data table format
datatable(seaLevels, options = list(pageLength = 5))
Data Source
Below we acquire average cumulative mass balance of reference Glaciers worldwide, from years 1945 - 2014. Here is the data source’s website: https://datahub.io/core/glacier-mass-balance#readme Data is sourced from sourced from US EPA and the World Glacier Monitoring Service (WGMS).
Negative mass values point to a net loss of ice and snow compared with 1945. The units of mass measurement are in meters, representing changes in average glacier thickness.
json_glacierFile <- 'https://datahub.io/core/glacier-mass-balance/datapackage.json'
json_glacierData <- fromJSON(paste(readLines(json_glacierFile), collapse=""))
## Warning in readLines(json_glacierFile): incomplete final line found on
## 'https://datahub.io/core/glacier-mass-balance/datapackage.json'
# get list of all resources:
print(json_glacierData$resources$name)
## [1] "validation_report" "glaciers_csv"
## [3] "glaciers_json" "glacier-mass-balance_zip"
## [5] "glaciers"
# print all tabular data(if exists any)
for(i in 1:length(json_glacierData$resources$datahub$type)){
if(json_glacierData$resources$datahub$type[i]=='derived/csv'){
path_to_file = json_glacierData$resources$path[i]
glacierData <- read.csv(url(path_to_file))
}
}
Here is the data:
Data Transformation is done to convert the Year column to Year time format.
glacierData$Year <- strptime(glacierData$Year, "%Y")
datatable(glacierData, options = list(pageLength = 5))
Data Source
Below we acquire data on Carbox Dioxide emissions from fossil fuels by country, since the year of 1751, from a CSV file. Here is the data source’s website: https://datahub.io/core/co2-fossil-by-nation#data.
Data citation: Boden, T.A., G. Marland, and R.J. Andres. 2013. Global, Regional, and National Fossil-Fuel CO2 Emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A. doi 10.3334/CDIAC/00001_V2013
Here is the data:
nationEmissionsData <- read_csv("fossil-fuel-co2-emissions-by-nation_csv.csv")
## Parsed with column specification:
## cols(
## Year = col_double(),
## Country = col_character(),
## Total = col_double(),
## `Solid Fuel` = col_double(),
## `Liquid Fuel` = col_double(),
## `Gas Fuel` = col_double(),
## Cement = col_double(),
## `Gas Flaring` = col_double(),
## `Per Capita` = col_double(),
## `Bunker fuels (Not in Total)` = col_double()
## )
datatable(nationEmissionsData, options = list(pageLength = 5))
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
# This is emissions data from older years
oldestEmissions <- nationEmissionsData[which(nationEmissionsData$Year <= 1800),]
# This is emissions data from recent years
recentEmissions <- nationEmissionsData[which(nationEmissionsData$Year >= 2010),]
recentEmissionsCountries <- subset(recentEmissions, Country == "UNITED STATES OF AMERICA" | Country == "CHINA (MAINLAND)" | Country == "INDIA" | Country == "RUSSIAN FEDERATION" | Country == "JAPAN")
datatable(recentEmissionsCountries)
datatable(oldestEmissions)
Data Source
Below we acquire data from a CSV file on Global Land Tempatures by Country, between years 1850 and 2013. The raw data had over 1 million records so below we are displaying a random sample of the data for 50,000 records. Here is the data source’ website: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
Additionally, we have created data frames for oldest data from 1850 and the site’s most recent data from 2013.
Data Transformations
temperatureData <- read_csv("GlobalLandTemperaturesByCity.csv")
## Parsed with column specification:
## cols(
## dt = col_date(format = ""),
## AverageTemperature = col_double(),
## AverageTemperatureUncertainty = col_double(),
## City = col_character(),
## Country = col_character(),
## Latitude = col_character(),
## Longitude = col_number()
## )
sampleTemp <- temperatureData %>% sample_n(size = 50,000)%>% group_by(dt)
sampleTemp$dt <- as_datetime(sampleTemp$dt)
#Parse date to year in data
sampleTempYr <- sampleTemp
sampleTempYr$dt <- as.numeric(format(sampleTempYr$dt,'%Y'))
#Get most recent data from year 2013
temperatureData$dt <- as.numeric(format(temperatureData$dt, '%Y'))
oldTemp <- temperatureData[which(temperatureData$dt == 1850),]
recentTemp <- temperatureData[ which(temperatureData$dt == 2013), ]
#datatable(recentTemp)
datatable(sampleTemp, options = list(pageLength = 5))
1 Feature we didn’t cover in class - Creating time series using dygraph library
Below is a feature that we did not cover in class: dygraph library. It helps us create a time series, with the X-axis representing time.
The time series plot below shows a linear, upward trend of sea levels. We see what looks like the sharpest rise around the year 1985.
analysisSeaLevels=xts(x = seaLevels$CSIRO.Adjusted.Sea.Level, order.by = seaLevels$Year)
dygraph(analysisSeaLevels)
The time series plot below shows an inverse relationship with Glacier Mass and time. As time increases, glacier mass decreases.
analysisGlaciers=xts(x = glacierData$Mean.cumulative.mass.balance, order.by = glacierData$Year)
dygraph(analysisGlaciers)
1 Feature we didn’t cover in class - displaying data onto maps.
We want to compare how temperatures from the past to recent years. Below are two maps displaying global temperature data in degrees Celsius, for years:
* 1850 - Oldest temperature data from our data set
* 2013 - Most recent temperature data from our data set
Please note that every time this rmd file is run, we will be generating new random samples from the raw data each time, for each year. So the graph will look different each time, with possibly different countries highlighted.
During my run of the code, there appeared to be a higher occurence of cooler colors of pink, blues and purples on the map.
map.world <- map_data('world')
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
oldTempMap <- left_join(map.world, oldTemp %>% sample_n(size=200), by = c('region' = 'Country'))
ggplot(data = oldTempMap, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = `AverageTemperature`)) +
scale_fill_viridis(option = 'plasma')+
labs(title = "Countries by Average Temperature - Year 1850", subtitle = "Global Warming", caption = "TBA") +
theme_bw()
During my run of the code, most of the map leaned towards the warmer colors of orange. North America appeared to have cooler colors of purple.
map.world <- map_data('world')
recentTempMap <- left_join(map.world, recentTemp %>% sample_n(size = 200), by = c('region' = 'Country'))
ggplot(data = recentTempMap, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = `AverageTemperature`)) +
scale_fill_viridis(option = 'plasma')+
labs(title = "Countries by Average Temperature - Year 2013", subtitle = "Global Warming", caption = "TBA") +
theme_bw()
Below we perform an ANOVA test to analyze variances in our CO2 emissions data by country. We want to analyze if there is a statistically significant difference in means of CO2 emissions, between different countries.
* Categorical variable: Country * Response variable: Total Emissions in units of ppm
Null hypothesis: There is no relation between countries and means of CO2 emissions.
Alternative hypothesis: There are differences between means of CO2 emissions between countries.
Below is a summary of emissions means across some countries that produce some of the highest CO2 emissions.
emissionsMeans <- round(tapply(recentEmissionsCountries$Total, recentEmissionsCountries$Country, mean), digits = 2)
emissionsMeans
## CHINA (MAINLAND) INDIA JAPAN
## 2677288.6 537393.0 330157.2
## RUSSIAN FEDERATION UNITED STATES OF AMERICA
## 477157.0 1429947.6
Here is a plot of emission means vs country. We can see on the graph that there are visible differences of emissions between countries. China shows to produce the highest emissions out of all countries in our data set.
plotmeans(recentEmissionsCountries$Total~recentEmissionsCountries$Country, digits=2, ccol='red', mean.labels=T, main='Plot of Fossil Fuel Emissions Means by Country')
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
This may not be enough to prove statistical difference. We see in the box plot below that there is very different variation and spread across countries for CO2 emissions. The red dots on the plot represent means. Thus, this analysis is not statistically significant enough for us to reject the null hypothesis.
ggplot(recentEmissionsCountries, aes(x=Country, y=Total, fill=Country)) +
geom_boxplot(alpha=0.4) +
stat_summary(fun.y=mean, geom="point", shape=20, size=10, color="red", fill="red") +
theme(legend.position="none") +
scale_fill_brewer(palette="Set3")
In conclusion, we see that there is a significantly statistical difference of CO2 emissions by country. See below:
We apply ANOVA below figure out if there is a true difference between emissions means and country or if it’s only due to sampling variability. We have a signifcantly high F value of 735.2 and a very low p-value of <2e-16 which is less than the normal scientific standard of 0.05. This means that the variance of CO2 emission means among the countries China, U.S., India, Russia, and Japan is much larger than variations of CO2 emissions within each country. This means that we can accept our alternative hypothesis that there are differences between means of CO2 emissions between countries.
anovaTest <- aov(recentEmissionsCountries$Total ~ recentEmissionsCountries$Country)
summary(anovaTest)
## Df Sum Sq Mean Sq F value Pr(>F)
## recentEmissionsCountries$Country 4 1.947e+13 4.867e+12 735.2 <2e-16 ***
## Residuals 20 1.324e+11 6.620e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For pair comparisons: In in the table below, we are applying a Tukey post hoc test to see which countries have signficant CO2 emissions differences from other countries. This further supports our claim that there are differences in CO2 emissions among different countries.
tuk<- TukeyHSD(anovaTest)
tuk
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = recentEmissionsCountries$Total ~ recentEmissionsCountries$Country)
##
## $`recentEmissionsCountries$Country`
## diff lwr
## INDIA-CHINA (MAINLAND) -2139895.6 -2293877.431
## JAPAN-CHINA (MAINLAND) -2347131.4 -2501113.231
## RUSSIAN FEDERATION-CHINA (MAINLAND) -2200131.6 -2354113.431
## UNITED STATES OF AMERICA-CHINA (MAINLAND) -1247341.0 -1401322.831
## JAPAN-INDIA -207235.8 -361217.631
## RUSSIAN FEDERATION-INDIA -60236.0 -214217.831
## UNITED STATES OF AMERICA-INDIA 892554.6 738572.769
## RUSSIAN FEDERATION-JAPAN 146999.8 -6982.031
## UNITED STATES OF AMERICA-JAPAN 1099790.4 945808.569
## UNITED STATES OF AMERICA-RUSSIAN FEDERATION 952790.6 798808.769
## upr p adj
## INDIA-CHINA (MAINLAND) -1985913.77 0.0000000
## JAPAN-CHINA (MAINLAND) -2193149.57 0.0000000
## RUSSIAN FEDERATION-CHINA (MAINLAND) -2046149.77 0.0000000
## UNITED STATES OF AMERICA-CHINA (MAINLAND) -1093359.17 0.0000000
## JAPAN-INDIA -53253.97 0.0053001
## RUSSIAN FEDERATION-INDIA 93745.83 0.7674649
## UNITED STATES OF AMERICA-INDIA 1046536.43 0.0000000
## RUSSIAN FEDERATION-JAPAN 300981.63 0.0657801
## UNITED STATES OF AMERICA-JAPAN 1253772.23 0.0000000
## UNITED STATES OF AMERICA-RUSSIAN FEDERATION 1106772.43 0.0000000
Below we see many signficant differences in C02 emissions between countries. Significant differences are represented by those lines which do not cross the zero value. This further asserts to accept the alternative hypothesis. Please note, the country names are accidentally cut off on the y-axis.
plot(tuk)
Closing comments
As predicted, we observe the following:
Sea levels Increase, Glacier Mass Decrease, Global Temperature Increase
With regards to the sea level data analysis, it is evident that seal levels have been sharply increasing with time. Glacier mass size has been sharply decreasing with time. Global temperatures have also been having a linear relationship with time.
Carbon Dioxide Emissions
From the statistical analysis we see that China appears to generate the world’s highest CO2 emissions, up to year 2014 from the presented data. This makes sense because according to current world news reports, China is one of, if not the largest economies in the world.
It is evident that climate change is occuring at an accelerated pace. For a decade into the future, I predict these observed data trends to continue. However, one can hope that these trends don’t continue and that people will work together to save our environment.