Climate change is one of the most fiercely debated scientific issues of the past 20 years. Because human-induced warming is superimposed on a naturally varying climate, the temperature rise has not been, and will not be, uniform or smooth across regions or over time.
Thousands of studies conducted by researchers around the world have documented increases in temperature at Earth’s surface, as well as in the atmosphere and oceans. Many other aspects of global climate are changing as well. Human activities, especially emissions of heat-trapping greenhouse gases from fossil fuel combustion, deforestation, and land-use change, are the primary driver of the climate changes observed in the industrial era.
The air, water and land of our planet Earth are all linked to its atmosphere through the exchange of gases. Natural and human activities drive the exchange of many types of gases between the Earth and its atmosphere, including clouds / water vapor (\(H_2O\)), oxygen (\(O_2\)) and carbon dioxide (\(CO_2\)). Some of these are called greenhouse gases (GHGs) because their heat-trapping effect resembles that of the greenhouses used to manage growing conditions for plants. These gases, especially \(CO_2\), nitrous oxide (\(N_2O\)) and methane (\(CH_4\)), absorb radiation from the Earth's surface, clouds and gas molecules and trap it as heat within the lower levels of the atmosphere (see Change (2001)).
Climate change fatalities are generally linked to four different catalysts:
In this project I concentrate on the reasons for rising temperature. To study them, I used climate change data to visualize temperature variation through Exploratory Data Analysis (EDA) in RStudio.
There have been many studies documenting that the average global temperature has been increasing over the last century. The consequences of a continued rise in global temperature will be dire. Rising sea levels and an increased frequency of extreme weather events will affect billions of people.
In this problem, we will attempt to study the relationship between average global temperature and several other factors.
climate_change <- read.csv(file = "climate_change.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
The data set was obtained from https://www.kaggle.com/vageeshabudanur/riseintemp-dataset
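The file climate_change.csv contains monthly climate data from May 1983 to December 2008. The available variables are Year, Month, MEI, CO2, CH4, N2O, CFC-11 and CFC-12 (stored as CFC_a and CFC_b), TSI, Aerosols and Temp.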
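Here \(CO_2\) is expressed in ppmv (parts per million by volume – i.e., 397 ppmv of \(CO_2\) means that \(CO_2\) constitutes 397 millionths of the total volume of the atmosphere). The data source also lists \(N_2O\) and \(CH_4\) in ppmv and CFC.11 and CFC.12 in ppbv (parts per billion by volume). However, the magnitudes in the file (about 304 for \(N_2O\), 1639 for \(CH_4\), 191 for CFC-11 and 350 for CFC-12) correspond to ppbv for \(N_2O\) and \(CH_4\) and to pptv (parts per trillion by volume) for the CFCs.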
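These units differ only by powers of 1000. Here is a minimal sketch using a hypothetical helper, `to_fraction()`, which is not part of the original analysis. It converts a mixing ratio into a fraction of the atmosphere's total volume.

```r
# Hypothetical helper (not from the original analysis):
# convert a volume mixing ratio to a fraction of total atmospheric volume
to_fraction <- function(value, unit = c("ppmv", "ppbv", "pptv")) {
  unit <- match.arg(unit)
  scale <- c(ppmv = 1e6, ppbv = 1e9, pptv = 1e12)[[unit]]
  value / scale
}

to_fraction(397, "ppmv")   # 397 millionths of the atmosphere
to_fraction(1000, "ppbv")  # 1000 ppbv is the same fraction as 1 ppmv
```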
I used OpenRefine to convert all columns except Year and Month to numeric type.
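The same conversion can also be done in R. This is a minimal sketch, not the author's OpenRefine workflow. It uses a small made-up data frame, `raw`, whose numeric columns were read in as text. Every column except `Year` and `Month` is passed through `as.numeric()`.

```r
# Made-up example: numeric columns that were read in as character strings
raw <- data.frame(
  Year  = c(1983L, 1983L),
  Month = c(5L, 6L),
  CO2   = c("345.5", "346.0"),
  Temp  = c("0.109", "0.118"),
  stringsAsFactors = FALSE
)

# Convert every column except Year and Month to numeric
cols_to_convert <- setdiff(names(raw), c("Year", "Month"))
raw[cols_to_convert] <- lapply(raw[cols_to_convert], as.numeric)

str(raw)
```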
We can see the structure and dimensions of the data set using the commands:
str(climate_change)
## 'data.frame': 308 obs. of 11 variables:
## $ Year : int 1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
## $ Month : int 5 6 7 8 9 10 11 12 1 2 ...
## $ MEI : num 2.556 2.167 1.741 1.13 0.428 ...
## $ CO2 : num 346 346 344 342 340 ...
## $ CH4 : num 1639 1634 1633 1631 1648 ...
## $ N2O : num 304 304 304 304 304 ...
## $ CFC_a : num 191 192 193 194 194 ...
## $ CFC_b : num 350 352 354 356 357 ...
## $ TSI : num 1366 1366 1366 1366 1366 ...
## $ Aerosols: num 0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
## $ Temp : num 0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...
dim(climate_change)
## [1] 308 11
# Replace month numbers (1-12) with month names using a lookup vector
month_levels <- c("Jan", "Feb", "March", "April", "May", "June",
"July", "Aug", "Sep", "Oct", "Nov", "Dec")
climate_change$Month <- month_levels[climate_change$Month]
Before cleaning and analyzing the data set, we load the R libraries needed for the analysis.
library(dplyr)
library(ggplot2)
library(plotly)
library("RColorBrewer")
Here, dplyr provides a set of tools for efficiently manipulating datasets in R, with a focus on data frames. ggplot2 is a data visualization package. plotly creates interactive, web-based graphs; its ggplotly() function converts a ggplot2 plot into an interactive one. RColorBrewer provides a variety of sequential, diverging and qualitative color palettes.
A data set may be incomplete, and values that are not available are called missing values. In R, missing values are represented by the symbol NA (not available). Undefined numeric results (e.g., 0/0) are represented by the symbol NaN (not a number); dividing a non-zero number by zero gives Inf, not NaN.
Now create a new data set without missing data.
new_climate_change <- na.omit(climate_change)
Now we look at the dimension of the new data frame.
dim(new_climate_change)
## [1] 308 11
The dimensions of the original and new data frames are the same, so the data set has no missing values.
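The difference between NA and NaN, and the effect of na.omit(), can be seen on a small made-up example. The vector `x` and the data frame `toy` are for illustration only. Comparing the row counts before and after na.omit() is the same check as the dim() comparison above. complete.cases() counts the incomplete rows directly.

```r
# NA marks a missing value; NaN marks an undefined numeric result
x <- c(1.5, NA, 0 / 0, 1 / 0)
is.na(x)    # TRUE for both NA and NaN
is.nan(x)   # TRUE only for the NaN produced by 0 / 0
# 1 / 0 is not NaN in R: it evaluates to Inf

# A tiny made-up data frame with one incomplete row
toy <- data.frame(CO2 = c(346, NA, 344), Temp = c(0.109, 0.118, 0.137))
toy_complete <- na.omit(toy)

nrow(toy)                  # rows before dropping incomplete rows
nrow(toy_complete)         # rows after dropping incomplete rows
sum(!complete.cases(toy))  # number of incomplete rows
```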
EDA aims to find patterns and relationships in data.
climate_change_chart <- ggplot(climate_change, aes(x = Year, y = Temp, fill = CO2)) +
xlab("Year") +
ylab("Temperature") +
theme_minimal(base_size = 14)
barplot <- climate_change_chart +
geom_bar( position = "dodge", stat = "identity",color= "white")
ggplotly(barplot)
library(lubridate)
# adding Year-Month variable as a date (first day of each month)
climate_change_ymd <- climate_change %>%
mutate(year_month = ymd(paste(Year, Month, 1)))
L1 <- ggplot(climate_change_ymd, aes(year_month, Temp)) +
geom_line() +
geom_smooth(se=FALSE, linetype = "dotted") +
labs(title = "Temperature (1983-2008)",
x = "Year",
y = "Temperature") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(L1)
The plot shows a steady rise in temperature over the years. We cannot tell from this plot alone whether this is a permanent shift in the trend or part of a longer cycle. However, let's see whether temperature shows any seasonality across the year.
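# Temperature Month-wise plot for each year in data
# order the months by calendar rather than alphabetically
month_levels <- c("Jan", "Feb", "March", "April", "May", "June",
"July", "Aug", "Sep", "Oct", "Nov", "Dec")
Tg <- ggplot(climate_change, aes(factor(Month, levels = month_levels), Temp)) +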
geom_point(aes(color = as.factor(Year))) +
geom_line(aes(group = as.factor(Year),
color = as.factor(Year)),
alpha = 0.7) +
labs(title = 'Temperature by month') +
xlab("Months") +
ylab("Temperature") +
theme(axis.text.x = element_text(size = 6,angle = 90,hjust = 0.5, vjust = 0.5))
# theme(legend.position = "none")
ggplotly(Tg)
The temperature has been rising steadily across the years, but this plot is a little crowded. So let's try another approach and plot a temperature-density distribution.
library(ggridges)
ggplot(climate_change, aes(x = Temp, y = as.factor(Year))) +
geom_density_ridges_gradient(aes(fill = after_stat(x)),
scale = 3, size = 0.3, alpha = 0.5) +
scale_fill_gradientn(colours = c("#0D0887FF", "#CC4678FF", "#F0F921FF"),
name = "Temp") +
labs(title = 'Temperature density') +
xlab("Temperature") +
ylab("Year") +
# apply the complete theme first; a later theme_minimal() would reset legend.position
theme_minimal(base_size = 10) +
theme(legend.position = c(0.9,0.2))
The plot shows an uptick in temperature over the last few decades. Although there is a dip from 2005 through 2008, extreme temperatures (>0.5 °C) occurred in every year since 2000.
library(ggpubr)
#par(mfrow=c(2,2))
scat_plot1 <- ggplot(climate_change_ymd, aes(year_month, CO2))+geom_line(colour="blueviolet")+geom_smooth(method = "lm")+ggtitle("Carbon Dioxide")
scat_plot2<- ggplot(climate_change_ymd, aes(year_month, N2O))+geom_line()+geom_smooth(method = "lm")+ggtitle("Nitrous Oxide")
scat_plot3<- ggplot(climate_change_ymd, aes(year_month, CH4))+geom_line(colour="springgreen4")+geom_smooth(method = "lm")+ggtitle("Methane")
scat_plot4 <- ggplot(climate_change_ymd, aes(year_month, MEI))+geom_line(colour="mediumorchid4")+ggtitle("MEI")
graph_arrange <- ggarrange(scat_plot1, scat_plot2, scat_plot3, scat_plot4 + rremove("x.text"),
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
annotate_figure(graph_arrange,
top = text_grob("Variations of CO2, N2O, CH4 and MEI by year", color = "red", face = "bold", size = 14)
)
To arrange multiple ggplots on a single page, I used the function ggarrange() from the ggpubr package, which needs to be installed before this code can run.
Carbon Dioxide (\(CO_2\)): Carbon dioxide is a greenhouse gas that absorbs heat radiated from the Earth's surface and so helps warm the planet. As carbon dioxide in the atmosphere increases, the greenhouse effect is strengthened and more warming occurs. The graph shows that \(CO_2\) levels in the atmosphere have been increasing steadily.
Nitrous Oxide (\(N_2O\)): Nitrous oxide has a much higher Global Warming Potential (GWP) than \(CO_2\) or \(CH_4\). The effects of temperature on nitrous oxide (\(N_2O\)) emissions are well explained by Paudel et al. (2015).
Methane (\(CH_4\)): Methane is roughly 20 to 30 times more effective than carbon dioxide at trapping heat over a 100-year period. Its levels have also been rising. Due to climate change, more methane is bubbling up from lakes, ponds, rivers and wetlands throughout the world. The release of methane, a potent greenhouse gas, leads to a further increase in temperature, thus creating a vicious circle.
Multivariate El Niño Southern Oscillation Index (MEI): MEI oscillates over time but shows no sustained upward or downward trend over the years. For more details about MEI, refer to https://bobtisdale.wordpress.com/2010/09/11/the-multivariate-enso-index-mei-captures-the-global-temperature-impacts-of-enso-differently-than-sst-based-indices/
scat_plot5 <- ggplot(climate_change_ymd, aes(year_month, CFC_a))+geom_line(colour="blue")+ggtitle("CFC-11") +
ylab("CFC-11")
scat_plot6<- ggplot(climate_change_ymd, aes(year_month, CFC_b))+geom_line(colour="green")+ggtitle("CFC-12") +
ylab("CFC-12")
scat_plot7<- ggplot(climate_change_ymd, aes(year_month, TSI))+geom_line(colour="red")+ggtitle("TSI")
scat_plot8 <- ggplot(climate_change_ymd, aes(year_month, Aerosols))+geom_line(colour="magenta")+ggtitle("Aerosols")
graph_arrange <- ggarrange(scat_plot5, scat_plot6, scat_plot7, scat_plot8 + rremove("x.text"),
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
annotate_figure(graph_arrange,
top = text_grob("Variations of CFC-11, CFC-12, TSI and Aerosols by year", color = "blue", face = "bold", size = 14)
)
Chlorofluorocarbons (CFC-11 and CFC-12): After rising steadily, the levels of CFC-11 and CFC-12 decline after 2000. CFC-11 and CFC-12 are the most abundant CFCs emitted into the troposphere. Because these CFCs are not soluble in water, deposition does not remove them from the air.
Total Solar Irradiance (TSI): The radiant energy from the Sun is measured and reported as solar irradiance. When radiation at all wavelengths is measured, it is called TSI. TSI is a measure of the solar power over all wavelengths per unit area incident on the Earth's upper atmosphere.
TSI is the main source of energy for the Earth. Although TSI varies by only a fraction of a percent, it is such a large energy input that even this small variation may be enough to cause observable changes on Earth. The effects of total solar irradiance (TSI) are studied by Biktash (2017).
Aerosols: Tiny liquid and solid particles suspended in the atmosphere also cause climate change. Atmospheric aerosols influence the climate directly by scattering and absorbing incoming solar radiation, and indirectly by acting as cloud condensation nuclei and/or ice nuclei (see Huang, Dickinson, and Chameides (2006), Ohring (1979) for more details).
These plots are consistent with pollution being one of the major causes of rising temperature: the concentrations of \(CO_2\), \(N_2O\) and \(CH_4\) have risen over the same period as the temperature. Doctors, nurses and other medical personnel are drawing attention to the negative effects on human health of an increasingly warm, more heavily polluted environment. We need to take care of our environment and make it eco-friendly.
So far, the average global temperature has gone up by about 0.8 °C (1.4 °F).
“According to an ongoing temperature analysis conducted by scientists at NASA’s Goddard Institute for Space Studies (GISS)…the average global temperature on Earth has increased by about 0.8°Celsius (1.4°Fahrenheit) since 1880. Two-thirds of the warming has occurred since 1975, at a rate of roughly 0.15-0.20°C per decade.”
Source: NASA Earth Observatory (https://earthobservatory.nasa.gov/world-of-change/decadaltemp.php)
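Because 0.8 °C is a temperature change, converting it to Fahrenheit needs only the 9/5 scale factor. The +32 offset applies to absolute temperatures, not to differences. The sketch below uses an illustrative function name, `delta_c_to_f()`. It also expresses the quoted rate of 0.15-0.20 °C per decade on a per-year basis.

```r
# Convert a temperature difference from Celsius to Fahrenheit.
# No +32 offset: that only applies to absolute temperatures.
delta_c_to_f <- function(delta_c) delta_c * 9 / 5

delta_c_to_f(0.8)       # 1.44 °F, quoted above as about 1.4 °F

# The quoted warming rate of 0.15-0.20 °C per decade, expressed per year
c(0.15, 0.20) / 10      # 0.015 to 0.020 °C per year
```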
R is a freely distributed software package for statistical analysis and graphics, developed and maintained by the R Development Core Team. Statistical analysis is a component of data analytics. Rossiter (2008) gives an example of statistical data analysis in his tutorial.
The aim of linear regression is to model a continuous variable \(y\) as a mathematical function of one or more \(x\) variable(s), so that we can use this regression model to predict the \(y\) when only the \(x\) is known. This mathematical equation can be generalized as follows:
\(y = a+bx+\epsilon\)
where \(a\) is the intercept and \(b\) is the slope; together they are called the regression coefficients, and \(\epsilon\) is the error term. We will discuss this through an example and use ggPredict() to visualize the model.
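For a single predictor, the least-squares estimates of \(a\) and \(b\) have a closed form: \(b = \mathrm{cov}(x, y)/\mathrm{var}(x)\) and \(a = \bar{y} - b\bar{x}\). The sketch below checks this against lm() on simulated data, not the climate data. The "true" values \(a = -3.6\) and \(b = 0.0106\) are arbitrary choices that loosely resemble the \(CO_2\) model that follows.

```r
set.seed(42)

# Simulated data (not the climate data): y = a + b*x + error
n <- 300
x <- runif(n, min = 340, max = 390)   # a CO2-like range in ppmv
a_true <- -3.6
b_true <- 0.0106
y <- a_true + b_true * x + rnorm(n, sd = 0.12)

# Closed-form least-squares estimates for one predictor
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)

# lm() solves the same least-squares problem, so the estimates agree
fit_sim <- lm(y ~ x)
coef(fit_sim)
c(a_hat, b_hat)
```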
fit <- lm(Temp ~ CO2, data = climate_change)
summary(fit)
##
## Call:
## lm(formula = Temp ~ CO2, data = climate_change)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41822 -0.07992 -0.00072 0.06742 0.45304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.5931404 0.1950997 -18.42 <2e-16 ***
## CO2 0.0105992 0.0005368 19.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.119 on 306 degrees of freedom
## Multiple R-squared: 0.5603, Adjusted R-squared: 0.5588
## F-statistic: 389.9 on 1 and 306 DF, p-value: < 2.2e-16
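The summary() output begins with the five-number summary of the residuals. These are five sample percentiles: the minimum, first quartile, median, third quartile and maximum. It then lists the coefficient estimates with their standard errors, t values and p-values. The residual standard error, \(R^2\) and the F-statistic come last.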
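The Residuals line in the output above is the result of applying quantile() to the model residuals. A small sketch on simulated data shows how to reproduce it. The model `fit_demo` is for illustration only.

```r
set.seed(1)

# Simulated straight-line data, for illustration only
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.1)
fit_demo <- lm(y ~ x)

# Min, 1Q, Median, 3Q, Max of the residuals, as printed by summary()
res_q <- quantile(residuals(fit_demo))
names(res_q) <- c("Min", "1Q", "Median", "3Q", "Max")
res_q
```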
We can also read the fitted regression equation from the model summary:
\[\begin{equation} \widehat{Temp} = -3.5931 + 0.0106 \times CO_2 \end{equation}\]
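Plugging a value into this equation gives a fitted temperature. The sketch below hard-codes the two estimates printed in the summary, and the input of 380 ppmv is only an example. With the model object available, `predict(fit, newdata = data.frame(CO2 = 380))` should return the same number.

```r
# Coefficient estimates copied from summary(fit) above
intercept <- -3.5931404
slope     <-  0.0105992   # change in Temp per 1 ppmv of CO2

co2 <- 380                # example CO2 concentration in ppmv
intercept + slope * co2   # fitted Temp, about 0.435

# A 10 ppmv increase in CO2 raises the fitted Temp by 10 * slope
10 * slope                # about 0.106
```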
We can easily make an interactive plot with the ggPredict() function from the ggiraphExtra package.
library(ggiraph)
library(ggiraphExtra)
ggPredict(fit,se=TRUE,interactive=TRUE)
With this plot, we can identify individual points and see the regression equation by hovering over them with the mouse.
ggPredict() plots the fitted values of a regression model. When the model contains an additional categorical term, it draws a separate fitted line for each group.
Multiple linear regression is used to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. As a predictive analysis, the multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables. The independent variables can be continuous or categorical.
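When a categorical variable such as Month enters lm(), R replaces it with 0/1 indicator (dummy) columns. The first level is dropped and serves as the baseline. model.matrix() shows the design matrix that lm() builds. The four-row data frame below is made up for illustration.

```r
# Made-up rows: one numeric and one categorical predictor
toy_months <- data.frame(
  Year  = c(2000, 2000, 2001, 2001),
  Month = c("Jan", "Feb", "Jan", "Feb")
)

# Character columns are treated as factors with alphabetically sorted levels,
# so "Feb" becomes the baseline and "Jan" gets its own indicator column
X <- model.matrix(~ Year + Month, data = toy_months)
X
```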
climate_multreg <- filter(climate_change, Month %in% c("Jan", "Feb"))
fit1 <- lm(Temp ~ Year + Month, data = climate_multreg)
summary(fit1)
##
## Call:
## lm(formula = Temp ~ Year + Month, data = climate_multreg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39863 -0.07861 0.01844 0.08512 0.41405
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.672555 5.944573 -5.664 8.64e-07 ***
## Year 0.017016 0.002978 5.713 7.29e-07 ***
## MonthJan -0.022480 0.042952 -0.523 0.603
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1519 on 47 degrees of freedom
## Multiple R-squared: 0.4119, Adjusted R-squared: 0.3869
## F-statistic: 16.46 on 2 and 47 DF, p-value: 3.822e-06
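The coefficients can be read directly. February is the baseline month, Year gives the change per year, and MonthJan is the January offset. The sketch below hard-codes the estimates printed above, and the year 2008 is only an example. The Year coefficient of about 0.017 °C per year corresponds to about 0.17 °C per decade, which is within the 0.15-0.20 °C per decade quoted from NASA above. The MonthJan term, however, is not statistically significant (p = 0.603).

```r
# Estimates copied from summary(fit1) above
b0     <- -33.672555   # intercept
b_year <-   0.017016   # change in Temp per year
b_jan  <-  -0.022480   # January offset relative to February

year <- 2008
temp_feb <- b0 + b_year * year           # February is the baseline level
temp_jan <- b0 + b_year * year + b_jan   # January adds its offset

c(Feb = temp_feb, Jan = temp_jan)        # about 0.496 and 0.473
10 * b_year                              # about 0.17 per decade
```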
We can visualize this model with ggPredict().
ggPredict(fit1,se=TRUE, interactive=TRUE)
I also tried to visualize temperature variation through animation using the gganimate library.
library(gganimate)
points_plot <- ggplot(climate_change, aes(Year, Temp, size = CO2, colour = Month)) +
geom_point(alpha = 0.2, show.legend = FALSE) +
facet_wrap(~Month) +
# Here come the gganimate-specific bits
labs(title = 'Year: {frame_time}', x = 'Year', y = 'Temperature') +
transition_time(Year) +
ease_aes('linear')
# render the animation once, display it, then save that same animation as a GIF
points_anim <- animate(points_plot)
points_anim
anim_save("geompoint.gif", animation = points_anim)
p <- ggplot(climate_change, aes(x = Year, y = Temp, color = Month, group = Month)) +
geom_path() +
geom_point() +
facet_wrap(~ Month) +
theme(legend.position = 'none') +
labs(title = 'Temperature Variation, Year: {frame_along}') +
transition_reveal(along = Year) +
ease_aes('linear')
# 100 frames played at 10 frames per second
path_anim <- animate(p, nframes = 100, fps = 10)
path_anim
anim_save("geompath.gif", animation = path_anim)
Note: The data set climate_change.csv contains global temperature data, but I was interested in temperature variation in different countries. So I chose data sets for the USA and India from https://climateknowledgeportal.worldbank.org. Each of these data sets has 1392 observations and 5 variables.
temp_USA <- read.csv(file = "tas_1901_2016_USA.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
temp_IND <- read.csv(file = "tas_1901_2016_IND.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
dim(temp_USA)
## [1] 1392 5
dim(temp_IND)
## [1] 1392 5
The files tas_1901_2016_USA.csv and tas_1901_2016_IND.csv contain monthly temperature data from 1901 to 2016.
I have used Tableau Public to visualize these data sets.
Please refer to the following links for the visualizations:
Temperature Variation (1901-2016) - USA
Temperature Variation (1901-2016) - India
I acknowledge https://www.kaggle.com/vageeshabudanur/riseintemp-dataset and https://climateknowledgeportal.worldbank.org for the data set. I would also like to thank Professor Rachel Saidi, Department of Mathematics, Statistics, and Data Science, Montgomery College for the support and encouragement throughout the course.
Biktash, Lilia. 2017. "Long-Term Global Temperature Variations Under Total Solar Irradiance, Cosmic Rays, and Volcanic Activity." Journal of Advanced Research 8 (4): 329–32. https://doi.org/10.1016/j.jare.2017.03.002.
Change, Climate. 2001. “Climate Change.” Synthesis Report.
Huang, Yan, Robert E Dickinson, and William L Chameides. 2006. “Impact of Aerosol Indirect Effect on Surface Temperature over East Asia.” Proceedings of the National Academy of Sciences 103 (12): 4371–6.
Ohring, George. 1979. “The Effect of Aerosols on the Temperatures of a Zonal Average Climate Model.” Pure and Applied Geophysics 117 (5): 851–64.
Paudel, Shukra Raj, Ohkyung Choi, Samir Kumar Khanal, Kartik Chandran, Sungpyo Kim, and Jae Woo Lee. 2015. "Effects of Temperature on Nitrous Oxide (N2O) Emission from Intensive Aquaculture System." Science of the Total Environment 518: 16–23.
Rossiter, D G. 2008. “Tutorial: An Example of Statistical Data Analysis Using the R Environment for Statistical Computing.”