For this study, I selected the global temperature data from https://datahub.io/collections/climate-change, which is sourced from NASA's GISS Surface Temperature Analysis (GISTEMP).
Climate change refers to long-term shifts in temperatures and weather patterns. These shifts may be natural, but since the 1800s, human activities have been the main driver of climate change, primarily through the burning of fossil fuels (like coal, oil, and gas), which produces heat-trapping gases. As greenhouse gas emissions blanket the Earth, they trap the sun’s heat, which leads to global warming. The world is now warming faster than at any point in recorded history. Because temperatures have risen fairly steadily over the last century, mean temperature anomalies relative to the base years lend themselves to forecasting as a function of the year.
Photo By Bjorn Anders Nymoen, National Geographic
The data covers annual mean temperature anomalies in degrees Celsius from 1880 to 2016. Temperature anomalies indicate how much warmer or colder it is than normal for a particular place and time. For the GISS analysis, normal always means the average over the 30-year period 1951-1980 for that place and time of year. This base period is specific to GISS, not universal. The GISTEMP analysis is updated regularly, and graphs and tables are posted around the middle of every month using the latest GHCN and ERSST data.
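As a minimal sketch of the definition above, an anomaly is the observed temperature minus the mean over the base period. The temperatures below are made-up illustrative values, not GISTEMP data, and `compute_anomaly` is a hypothetical helper; the real GISTEMP analysis works per place and time of year rather than on a single annual series.

```r
# Hypothetical helper: anomaly = observed value minus the base-period mean.
compute_anomaly <- function(years, temps, base_start = 1951, base_end = 1980) {
  in_base <- years >= base_start & years <= base_end
  baseline <- mean(temps[in_base])
  temps - baseline
}

# Made-up absolute temperatures in degrees Celsius (illustrative only).
years <- c(1900, 1951, 1965, 1980, 2010)
temps <- c(13.7, 13.9, 14.0, 14.1, 14.6)

anomalies <- compute_anomaly(years, temps)
print(data.frame(Year = years, Anomaly = round(anomalies, 2)))
```

Years colder than the base-period average get a negative anomaly, and the anomalies of the base-period years average out to zero by construction.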
The GISTEMP analysis is based on temperature reports from weather stations and water temperature reports from ships and buoys. It recalculates consistent temperature anomaly series from 1880 to the present for a regularly spaced array of virtual stations covering the whole globe.
Being passionate about efforts to tackle the impending effects of climate change, I selected this dataset to practice my time series and forecasting skills. The experience of working on this project will enable me to apply these skills in the real world and help me do my part in addressing global climate change.
library(readr)  # read_csv()
library(dplyr)  # select()
# Load the annual series and keep only the GISTEMP rows.
annual_temp1 <- read_csv('https://pkgstore.datahub.io/core/global-temp/annual_csv/data/a26b154688b061cdd04f1df36e4408be/annual_csv.csv')
annual_temp <- select(annual_temp1[annual_temp1$Source == 'GISTEMP',],c('Year','Mean'))
names(annual_temp) <- c('Year','Temperature_Anomaly')
head(annual_temp)
## # A tibble: 6 × 2
## Year Temperature_Anomaly
## <dbl> <dbl>
## 1 2016 0.99
## 2 2015 0.87
## 3 2014 0.74
## 4 2013 0.65
## 5 2012 0.63
## 6 2011 0.6
Here are the summary statistics of the dataset:
sumtable(annual_temp,add.median=TRUE)
Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 50 | Pctl. 75 | Max |
---|---|---|---|---|---|---|---|---|
Year | 137 | 1948 | 39.693 | 1880 | 1914 | 1948 | 1982 | 2016 |
Temperature_Anomaly | 137 | 0.024 | 0.327 | -0.47 | -0.21 | -0.07 | 0.19 | 0.99 |
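Next, we check that the years are continuous, i.e., that no year is missing or duplicated. The comparison below confirms that each year appears exactly once; together with the 1880 to 2016 range and the 137 observations in the summary table (2016 - 1880 + 1 = 137), this means there are no gaps.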
length(unique(annual_temp$Year)) == nrow(annual_temp)
## [1] TRUE
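The uniqueness test above only rules out duplicates on its own. A more direct gap check could look like the following sketch, where `missing_years` is a hypothetical helper and the two input vectors are illustrative, not the real dataset.

```r
# Hypothetical helper: returns the years absent from an annual sequence.
missing_years <- function(years) {
  expected <- seq(min(years), max(years))
  setdiff(expected, years)
}

# Illustrative inputs: one complete span and one with a gap.
complete <- 2016:1880
gappy <- c(1880, 1881, 1883, 1884)

length(missing_years(complete)) == 0  # TRUE: no gaps
missing_years(gappy)                  # 1882
```

With the data loaded above, the call would be `missing_years(annual_temp$Year)`, which should return an empty vector if the series is continuous.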
#histogram
ggplot(data = annual_temp, aes(x = Temperature_Anomaly)) +
geom_histogram(bins = 30) + labs(title = "Histogram")
Here we have plotted the temperature anomalies as a histogram. It shows how the values are distributed but discards their time ordering, so it tells us little in a time series context.
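One way to see why a histogram says little about a time series: shuffling the observations in time leaves the bin counts unchanged. The sketch below uses a synthetic upward-trending series (a hypothetical stand-in, not the GISTEMP values) and base R's `hist()` with plotting turned off.

```r
set.seed(1)
# Hypothetical upward-trending series standing in for the anomalies.
trend <- seq(-0.4, 0.9, length.out = 137) + rnorm(137, sd = 0.1)
shuffled <- sample(trend)  # same values, time order destroyed

# Use identical bin edges for both so the counts are comparable.
breaks <- seq(-1, 1.5, by = 0.1)
counts_trend <- hist(trend, breaks = breaks, plot = FALSE)$counts
counts_shuffled <- hist(shuffled, breaks = breaks, plot = FALSE)$counts

identical(counts_trend, counts_shuffled)  # TRUE
```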
#density plot
ggplot(data = annual_temp, aes(x = Temperature_Anomaly)) +
geom_density() + labs(title = "Density Plot")
Like the histogram, the density plot summarizes the distribution of values without regard to time, so it adds little in a time series context.
#boxplot
ggplot(data = annual_temp, aes(y = Temperature_Anomaly)) +
geom_boxplot() + labs(title = "Box Plot")
The boxplot shows a few outliers above the upper whisker, i.e., more than 1.5 times the interquartile range above the third quartile. These points belong to the most recent years, which is consistent with the rate of global temperature rise having increased in recent years.
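The outlier threshold can be worked out by hand. The sketch below assumes the default `geom_boxplot()` rule (points more than 1.5 times the IQR beyond the box are drawn individually) and uses the quartiles from the summary table above; the hinges ggplot2 computes may differ slightly from these percentiles.

```r
# Quartiles of Temperature_Anomaly as reported in the summary table.
q1 <- -0.21
q3 <- 0.19
iqr <- q3 - q1                 # 0.40
upper_fence <- q3 + 1.5 * iqr  # 0.79
lower_fence <- q1 - 1.5 * iqr  # -0.81

# The six most recent anomalies shown by head(annual_temp).
recent <- c("2016" = 0.99, "2015" = 0.87, "2014" = 0.74,
            "2013" = 0.65, "2012" = 0.63, "2011" = 0.60)

# Which of these exceed the upper fence?
recent[recent > upper_fence]  # 2016 and 2015
```

Only the six years printed earlier are checked here, so other years could also exceed the fence. The minimum anomaly of -0.47 sits well above the lower fence, which is why no low outliers appear.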
#line chart
ggplot(data = annual_temp, aes(x = Year, y = Temperature_Anomaly )) +
geom_line() + labs(title = "Line Chart")
The line chart is the most informative visualization for a time series: it shows directly how the outcome changes over time.
There are no null values in the data. The boxplot shows a few outliers above the upper whisker, with mean temperature anomalies close to 1. These belong to the most recent years, which is consistent with the rate of temperature rise having increased recently.
One interesting observation is that the temperature anomalies range from negative to positive values. This makes sense once we understand how the mean anomalies are calculated (explained above). A negative sign has no special meaning beyond indicating that global temperatures in that year were lower than the average over the base period (1951-1980). For the same reason, the anomalies for the base period years are close to zero.
fore_lm <- lm(Temperature_Anomaly ~ Year, data=annual_temp)
tab_model(fore_lm, show.intercept= FALSE)
Dependent variable: Temperature_Anomaly
Predictors | Estimates | CI | p |
---|---|---|---|
Year | 0.01 | 0.01 – 0.01 | <0.001 |
Observations | 137 | | |
R² / R² adjusted | 0.754 / 0.752 | | |
Here the p-value is very low (< 0.001, well below 0.05), which suggests that the linear effect of time on the mean temperature anomaly is statistically significant.
The coefficient estimate can be interpreted as follows: for each additional year, the mean temperature anomaly increases by about 0.01°C (the value as rounded in the table).
The R² value of 0.754 indicates that 75.4% of the variation in the outcome is explained by the linear time trend. The linear fit does a decent job of describing the behavior of the time series, but it leaves about a quarter of the variation unexplained, so a more flexible model could improve on it.
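To make the interpretation above concrete, the sketch below fits the same kind of model to a synthetic series (a hypothetical linear trend plus noise, not the GISTEMP data). It shows how to read the slope with more digits than a rounded table, how R² follows from the residuals, and how `predict()` could produce forecasts. The names `toy`, `fit`, and `future` are illustrative assumptions.

```r
# Synthetic series (hypothetical, not GISTEMP): a linear trend plus noise.
set.seed(42)
toy <- data.frame(Year = 1880:2016)
toy$Temperature_Anomaly <- -0.5 + 0.007 * (toy$Year - 1880) +
  rnorm(nrow(toy), sd = 0.15)

fit <- lm(Temperature_Anomaly ~ Year, data = toy)

# Slope with more digits, and its per-decade equivalent.
slope <- unname(coef(fit)["Year"])
cat("Slope per year:", signif(slope, 3),
    " per decade:", signif(10 * slope, 3), "\n")

# R squared by hand: 1 - residual sum of squares / total sum of squares.
rss <- sum(residuals(fit)^2)
tss <- sum((toy$Temperature_Anomaly - mean(toy$Temperature_Anomaly))^2)
r2_manual <- 1 - rss / tss
all.equal(r2_manual, summary(fit)$r.squared)

# Point forecasts with 95% prediction intervals beyond the sample.
future <- data.frame(Year = 2017:2021)
forecast <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
print(forecast)
```

Applied to the real model, the same calls would use `fore_lm` in place of `fit`. Forecasts from a straight line assume the historical trend continues unchanged, which is a strong assumption if warming is accelerating.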