Climate change refers to long-term shifts in temperatures and weather patterns. These shifts may be natural, but since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels (like coal, oil, and gas) which produces heat-trapping gases. As the greenhouse gas emissions blanket the Earth, they trap the sun’s heat. This leads to global warming. The world is now warming faster than at any point in recorded history. Since the temperatures are rising steadily for the last century, mean temperature anomalies from the base years are easy to forecast with respect to the year.
Photo By Bjorn Anders Nymoen, National Geographic
For this study I have selected the global temperature data downloaded from https://datahub.io/collections/climate-change which is sourced from the GISS Surface Temperature (GISTEMP) analysis by NASA.
The data explores annual mean temperature anomalies in degrees Celsius from 1880 to 2016 . Temperature anomalies indicate how much warmer or colder it is than normal for a particular place and time. For the GISS analysis, normal always means the average over the 30-year period 1951-1980 for that place and time of year. This base period is specific to GISS, not universal.The GISTEMP analysis is updated regularly and graphs and tables are posted around the middle of every month using the latest GHCN and ERSST data.
The GISTEMP analysis is based on temperature reports from weather stations and water temperature reports from ships and buoys. It recalculates consistent temperature anomaly series from 1880 to the present for a regularly spaced array of virtual stations covering the whole globe.
Being passionate about the efforts to tackle the impending effects of climate change, I have selected this dataset to work on my time series and forecasting skills. The experience working on this project will enable me apply these skills in the real world and help me do my part in solving global climate change.
head(annual_temp)
## # A tibble: 6 × 2
## Year Temperature_Anomaly
## <dbl> <dbl>
## 1 2016 0.99
## 2 2015 0.87
## 3 2014 0.74
## 4 2013 0.65
## 5 2012 0.63
## 6 2011 0.6
Here are the summary statistics of the dataset
sumtable(annual_temp,add.median=TRUE)
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 50 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|---|
| Year | 137 | 1948 | 39.693 | 1880 | 1914 | 1948 | 1982 | 2016 |
| Temperature_Anomaly | 137 | 0.024 | 0.327 | -0.47 | -0.21 | -0.07 | 0.19 | 0.99 |
Checking if the dates are continuous ie, we have not missed any years.
length(unique(annual_temp$Year)) == nrow(annual_temp)
## [1] TRUE
There are no null values in the data.
One interesting observation is that the temperature anomaly values range from negative values to positive values. This makes sense only when we understand the context and how the mean anomalies are calculated (explained above). Here the negative sign does not have any intrinsic value other than the fact that in those years the global temperatures were lower than the temperature in the base period (1951-1980) and hence the anomalies are negative. For the same reason the anomaly values for the base period years are near to zero.
Assessing the time series at hand and applying necessary transformations.
Let’s have a look at the time series
The time series seems to be variance stationary. However we will try out the transformations to confirm this.
We cannot apply natural log or box-cox transformation directly because our data has negative values. Hence we will be adding a constant value of 0.5 to make all the Temperature Anomalies positive. This will not affect the time series, because the very nature of the temperature anomaly value is that it’s a difference from a fixed point which was chosen from the middle values. Adding a constant value will just shift the time series up above negative values, but the nature of the time series will be intact.
##
## Augmented Dickey-Fuller Test
##
## data: ann_temp_log$TA_log
## Dickey-Fuller = -1.7972, Lag order = 5, p-value = 0.6608
## alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: ann_temp_boxcox$TA_boxcox
## Dickey-Fuller = -1.7366, Lag order = 5, p-value = 0.6861
## alternative hypothesis: stationary
Doing these transformations does not change our time series. This can confirm our initial hypothesis that time series is variance stationary
ADF test on undifferenced data
##
## Augmented Dickey-Fuller Test
##
## data: annual_temp$Temperature_Anomaly
## Dickey-Fuller = -1.6814, Lag order = 5, p-value = 0.709
## alternative hypothesis: stationary
ADF test on differenced data
##
## Augmented Dickey-Fuller Test
##
## data: annual_temp_diff$TA_diff
## Dickey-Fuller = -6.7862, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The time series has been made mean stationary
Since I have taken yearly data there is no reason to expect any seasonality in the time series. Had I taken the monthly time series, we could have seen seasonality.
The time series appears to be an AR3 process. It could also be a MA2 process.
Checking BIC for different models
## df BIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0)) 2 91.44968
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1)) 3 -20.17431
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0)) 1 -213.43017
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1)) 2 -220.63216
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0)) 3 -204.16828
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1)) 4 -209.81140
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0)) 2 -214.61432
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1)) 3 -220.60552
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1)) 4 -217.51630
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1)) 5 -216.96070
According to BIC values, the best model has an order = c(0, 1, 1))
Now checking the AIC values
## df AIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0)) 2 85.60972
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1)) 3 -28.93425
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0)) 1 -216.34282
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1)) 2 -226.45747
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0)) 3 -212.92823
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1)) 4 -221.49132
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0)) 2 -220.43963
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1)) 3 -229.34348
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1)) 4 -229.16692
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1)) 5 -231.52397
But according to AIC ,the best model has an order of (3,1,1)
BIC penalizes complex models. That is why (3,1,1) has worse BIC value but a good AIC value.
To decide which one to choose,we can do an auto.arima test.
## Series: annual_temp$Temperature_Anomaly
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## -0.9142 -0.4568 -0.3452 0.6427
## s.e. 0.3064 0.1238 0.0813 0.3347
##
## sigma^2 = 0.01018: log likelihood = 120.76
## AIC=-231.52 AICc=-231.06 BIC=-216.96
auto.arima suggests (3,1,1) as the best model, so from here this model will be used.
forecast::checkresiduals(best_model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 4.9832, df = 6, p-value = 0.546
##
## Model df: 4. Total lags used: 10
##
## Box-Ljung test
##
## data: resi
## X-squared = 14.282, df = 20, p-value = 0.8159
In-sample root mean squared error of the model based on the residuals.
## [1] 0.0990563
train = annual_temp %>%
filter(annual_temp$Year<'2012')
test = annual_temp %>%
filter(annual_temp$Year>='2012')
## $pred
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
##
## $se
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
RMSE = sqrt(mean((test$Temperature_Anomaly - pred_data2$pred.pred)^2))
MAE = mean(abs(test$Temperature_Anomaly - pred_data2$pred.pred))
MAPE = mean(abs((test$Temperature_Anomaly - pred_data2$pred.pred)/test$Temperature_Anomaly))
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 0.19"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 0.16"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.23"
5 time-period forecast for the model
## $pred
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
##
## $se
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061
## pred.pred pred.se Year
## 1 0.9034629 0.09941981 2017
## 2 0.8828516 0.12298671 2018
## 3 0.8997931 0.13340248 2019
## 4 0.9235973 0.14199221 2020
## 5 0.9012220 0.15840061 2021
We use Prophet models to:
Prophet can potentially produce superior results to more traditional time series models such as ARIMA by identifying structural breaks in a time series and making forecasts by taking change points as well as seasonality patterns into account.
library(prophet)
library(lubridate)
prophet_data = annual_temp
prophet_data$ds <- as.character(prophet_data$ds)
prophet_data$ds = paste(prophet_data$ds,'-01-01',sep = "")
train = prophet_data %>%
filter(ds<ymd("2012-01-01"))
test = prophet_data %>%
filter(ds>=ymd("2012-01-01"))
model = prophet(train, weekly.seasonality=FALSE, daily.seasonality=FALSE)
The model is then used to predict 5 years forward, with the predictions compared to the test set (actual values).
future = make_future_dataframe(model,periods = 5, freq="year")
forecast = predict(model,future)
plot(model, forecast) +
ylab("Temp Anomalies")+xlab("Year")+theme_bw()
Very similar to general time series decomposition
prophet_plot_components(model,forecast)
We observe an increasing trend with year.
Most real time series frequently have abrupt changes in their trajectories. These are referred to as changepoints. Identifying them could be important for accurately modelling changes in trends. By default, prophet identifies the changepoints. However, we can also control this using arguments.
plot(model,forecast)+add_changepoints_to_plot(model)+theme_bw()+xlab("Year")+ylab("Temp Anomalies")
From the change points we observe that there has been a slight increase from 1950s onwards.
model = prophet(train, n.changepoints=50, weekly.seasonality=FALSE, daily.seasonality=FALSE)
forecast = predict(model,future)
plot(model,forecast)+add_changepoints_to_plot(model)+theme_bw()+xlab("Year")+ylab("Temp Anomalies")
The model doesn’t need to consider the minimum/maximum point.
Prophet can handle multiple types of seasonality simultaneously
Since we only have yearly data we cant expect any seasonality
prophet_plot_components(mul, mul_fcst)
Estimation based on RMSE and associated visualizations of model performance
## [1] "RMSE: 0.26"
## [1] "MAE: 0.22"
## [1] "MAPE: 0.26"
df.cv <- cross_validation(model, initial = 50*365, period = 5*365 , horizon = 5*365, units = 'days')
df.cv %>%
ggplot()+
geom_point(aes(ds,y)) +
geom_point(aes(ds,yhat,color=factor(cutoff)))+
theme_bw()+
xlab("Year")+
ylab("Temperature Anomaly")+
scale_color_discrete(name = 'Cutoff')+
ggtitle("CV-Ouput")
performance_metrics(df.cv)
## horizon mse rmse mae mape mdape smape
## 1 354 days 0.009221509 0.09602869 0.07074937 2.0556052 1.2613752 1.0639996
## 2 355 days 0.007828527 0.08847896 0.05789598 2.0126305 1.1200250 1.0450094
## 3 357 days 0.006052888 0.07780031 0.05009838 1.4656703 0.3988728 0.9426831
## 4 358 days 0.001646891 0.04058191 0.02955005 1.2266644 0.3079326 0.7192992
## 5 359 days 0.006783064 0.08235936 0.05352129 1.2754472 0.3988728 0.7986822
## 6 360 days 0.006722193 0.08198898 0.05302591 0.5150087 0.3079326 0.5857926
## 7 362 days 0.010131632 0.10065601 0.07328755 0.5483724 0.3413052 0.6277085
## 8 363 days 0.019853741 0.14090331 0.10690198 0.5614333 0.3935485 0.6626462
## 9 364 days 0.026650650 0.16325027 0.13168132 0.3684846 0.3510539 0.4671490
## 10 711 days 0.027842294 0.16686010 0.14354425 0.3759139 0.3510539 0.4775214
## 11 712 days 0.030378633 0.17429467 0.15379775 1.0047373 0.3510539 0.5862442
## 12 713 days 0.038412363 0.19599072 0.18251388 1.2256307 0.4004735 0.8096281
## 13 714 days 0.033372599 0.18268169 0.16088210 1.1903451 0.3510539 0.7484897
## 14 716 days 0.033203666 0.18221873 0.15929120 1.2310470 0.4004735 0.8145177
## 15 717 days 0.030204096 0.17379326 0.14581165 1.2904335 0.5135462 0.9360111
## 16 718 days 0.020683250 0.14381673 0.11610179 1.3212938 0.6369875 0.9965232
## 17 719 days 0.014322912 0.11967837 0.09535324 1.4020423 0.7545358 1.1920204
## 18 721 days 0.013597520 0.11660840 0.09078568 1.4108873 0.7545358 1.2053791
## 19 722 days 0.010209629 0.10104271 0.07077831 0.7329354 0.6369875 1.0301502
## 20 723 days 0.007398550 0.08601482 0.06467624 0.5737911 0.6352785 0.9107930
## 21 724 days 0.018567436 0.13626238 0.09848271 0.6237661 0.6881416 1.0054129
## 22 726 days 0.018294620 0.13525761 0.09277617 0.5513118 0.6881416 0.9028371
## 23 727 days 0.033240995 0.18232113 0.12942214 0.5173687 0.6170547 0.8236638
## 24 728 days 0.044395383 0.21070212 0.16194531 0.4919391 0.5170452 0.7723692
## 25 729 days 0.046276811 0.21512046 0.17083401 0.3991718 0.4469961 0.5600278
## 26 1076 days 0.046517387 0.21567890 0.17259796 0.4154105 0.5119509 0.5489953
## 27 1077 days 0.050748400 0.22527406 0.19502301 1.1739517 0.5425826 0.7295151
## 28 1078 days 0.065341821 0.25562046 0.21914949 1.2876185 0.5425826 0.8488722
## 29 1080 days 0.056210083 0.23708666 0.19798887 1.3781530 0.5425826 0.9543270
## 30 1081 days 0.058614214 0.24210373 0.21519011 1.5109451 0.8072517 1.2037645
## 31 1082 days 0.043443185 0.20843029 0.17620307 1.6033891 1.1770279 1.3594885
## 32 1083 days 0.037082695 0.19256868 0.16352614 1.6676880 1.1770279 1.5291472
## 33 1085 days 0.034600786 0.18601287 0.14730564 1.6557154 1.1770279 1.5106535
## 34 1086 days 0.036219259 0.19031358 0.15495195 1.8321344 1.3721119 1.7076987
## 35 1087 days 0.033696064 0.18356488 0.14656968 1.1513275 1.1770279 1.6502946
## 36 1088 days 0.014357811 0.11982408 0.10503902 1.0074371 1.0338157 1.4717641
## 37 1090 days 0.013646167 0.11681681 0.10205364 0.8715397 0.8444578 1.2784778
## 38 1091 days 0.012407582 0.11138933 0.09678881 0.7759085 0.5663768 1.0729033
## 39 1092 days 0.015249935 0.12349063 0.11141922 0.6613323 0.4076311 0.8797684
## 40 1093 days 0.018942534 0.13763188 0.11937760 0.5961926 0.4076311 0.7086481
## 41 1095 days 0.021081828 0.14519583 0.13434183 0.6010885 0.4076311 0.7180681
## 42 1441 days 0.018955330 0.13767836 0.12228652 0.3824687 0.3702174 0.4907832
## 43 1442 days 0.029494572 0.17173984 0.14679496 0.7855897 0.3702174 0.6096534
## 44 1444 days 0.032670521 0.18074988 0.16005652 0.9099977 0.3702174 0.7881839
## 45 1445 days 0.035889794 0.18944602 0.17077797 0.9978112 0.4251533 0.9814701
## 46 1446 days 0.037099233 0.19261161 0.17593741 1.0749592 0.6993245 1.1495630
## 47 1447 days 0.034076369 0.18459786 0.15733845 1.0715601 0.6993245 1.1293352
## 48 1449 days 0.029775469 0.17255570 0.14781137 1.2461134 0.9956423 1.3004555
## 49 1450 days 0.027809612 0.16676214 0.13627177 1.4643389 1.2561150 1.3953211
## 50 1451 days 0.028151271 0.16778341 0.13950138 1.8506866 1.6580102 1.6226060
## 51 1452 days 0.022951114 0.15149625 0.13005559 1.4714962 1.2561150 1.5688574
## 52 1454 days 0.019247414 0.13873505 0.11033780 1.3071813 0.9956423 1.3336063
## 53 1455 days 0.024040187 0.15504898 0.12057944 1.2512521 0.8992586 1.1972133
## 54 1456 days 0.029597143 0.17203820 0.13483717 1.2048347 0.7522223 1.0789580
## 55 1457 days 0.033667598 0.18348732 0.15655169 1.2156517 0.7522223 1.1107846
## 56 1459 days 0.039016455 0.19752583 0.16800838 1.0317290 0.5862556 0.9242115
## 57 1460 days 0.045261500 0.21274750 0.19149315 0.8277987 0.4887927 0.8483050
## 58 1806 days 0.046415843 0.21544336 0.19782571 0.5135271 0.4887927 0.6682216
## 59 1808 days 0.049495165 0.22247509 0.20372321 0.7001035 0.4887927 0.7219702
## 60 1809 days 0.049659331 0.22284374 0.20674731 0.8070532 0.5862556 0.7894916
## 61 1810 days 0.040634425 0.20157982 0.17966594 0.8330495 0.6620568 0.8542820
## 62 1811 days 0.036347824 0.19065105 0.16952447 0.8713307 0.8046756 0.9462704
## 63 1813 days 0.032282347 0.17967289 0.14811729 0.8409033 0.8046756 0.8991351
## 64 1814 days 0.024129959 0.15533821 0.12733460 1.0728202 0.8433657 1.0857082
## 65 1815 days 0.019769116 0.14060269 0.11507556 1.2117986 0.9174288 1.2881643
## 66 1816 days 0.020422174 0.14290617 0.11765120 1.2705085 1.1135783 1.4682477
## 67 1818 days 0.015997770 0.12648229 0.10877195 1.0549067 0.9174288 1.3378929
## 68 1819 days 0.016349683 0.12786588 0.11211611 0.9756997 0.8433657 1.3081634
## 69 1820 days 0.025901741 0.16094018 0.14014311 0.9537190 0.7554431 1.2520707
## 70 1821 days 0.025339899 0.15918511 0.13842933 0.9030794 0.6521829 1.1381410
## 71 1823 days 0.030846881 0.17563280 0.16352618 0.9275119 0.6521829 1.1758366
## 72 1824 days 0.037187313 0.19284012 0.18089809 0.6949993 0.5480531 0.9883300
## 73 1825 days 0.037642542 0.19401686 0.18257736 0.5457115 0.4243604 0.7719537
## coverage
## 1 0.750
## 2 0.750
## 3 0.875
## 4 1.000
## 5 0.875
## 6 0.875
## 7 0.750
## 8 0.625
## 9 0.500
## 10 0.500
## 11 0.375
## 12 0.250
## 13 0.375
## 14 0.375
## 15 0.500
## 16 0.625
## 17 0.750
## 18 0.750
## 19 0.875
## 20 0.875
## 21 0.750
## 22 0.750
## 23 0.625
## 24 0.500
## 25 0.500
## 26 0.500
## 27 0.375
## 28 0.375
## 29 0.500
## 30 0.500
## 31 0.625
## 32 0.625
## 33 0.625
## 34 0.500
## 35 0.625
## 36 0.750
## 37 0.750
## 38 0.750
## 39 0.750
## 40 0.750
## 41 0.750
## 42 0.875
## 43 0.750
## 44 0.625
## 45 0.500
## 46 0.500
## 47 0.500
## 48 0.500
## 49 0.500
## 50 0.500
## 51 0.500
## 52 0.625
## 53 0.625
## 54 0.500
## 55 0.375
## 56 0.375
## 57 0.250
## 58 0.250
## 59 0.250
## 60 0.250
## 61 0.375
## 62 0.375
## 63 0.500
## 64 0.625
## 65 0.750
## 66 0.625
## 67 0.625
## 68 0.625
## 69 0.500
## 70 0.500
## 71 0.375
## 72 0.250
## 73 0.250
p1<-plot_cross_validation_metric(df.cv, metric = 'rmse')
p2<-plot_cross_validation_metric(df.cv, metric = 'mae')
p3<-plot_cross_validation_metric(df.cv, metric = 'mape')
gridExtra::grid.arrange(p1, p2,p3, nrow = 1)
Train - Test Split - 5 Years
ARIMA(3,1,1) :RMSE: 0.22 | MAE: 0.17| MAPE: 0.19
Prophet :RMSE: 0.26 | MAE:0.22| MAPE: 0.26
The ARIMA seems to be performing better with lower RMSE, MAE and MAPE values. Hence we will be choosing this model