Final Project

Understanding the Time Series

For this study I have selected the global temperature data downloaded from https://datahub.io/collections/climate-change which is sourced from the GISS Surface Temperature (GISTEMP) analysis by NASA.

The data explores annual mean temperature anomalies in degrees Celsius from 1880 to 2016 . Temperature anomalies indicate how much warmer or colder it is than normal for a particular place and time. For the GISS analysis, normal always means the average over the 30-year period 1951-1980 for that place and time of year. This base period is specific to GISS, not universal.The GISTEMP analysis is updated regularly and graphs and tables are posted around the middle of every month using the latest GHCN and ERSST data.

The GISTEMP analysis is based on temperature reports from weather stations and water temperature reports from ships and buoys. It recalculates consistent temperature anomaly series from 1880 to the present for a regularly spaced array of virtual stations covering the whole globe.

Motivation

Being passionate about the efforts to tackle the impending effects of climate change, I have selected this dataset to work on my time series and forecasting skills. The experience working on this project will enable me apply these skills in the real world and help me do my part in solving global climate change.

The data

head(annual_temp)

## # A tibble: 6 × 2
##    Year Temperature_Anomaly
##   <dbl>               <dbl>
## 1  2016                0.99
## 2  2015                0.87
## 3  2014                0.74
## 4  2013                0.65
## 5  2012                0.63
## 6  2011                0.6

Here are the summary statistics of the dataset

sumtable(annual_temp,add.median=TRUE)

Summary Statistics
Variable	N	Mean	Std. Dev.	Min	Pctl. 25	Pctl. 50	Pctl. 75	Max
Year	137	1948	39.693	1880	1914	1948	1982	2016
Temperature_Anomaly	137	0.024	0.327	-0.47	-0.21	-0.07	0.19	0.99

Checking if the dates are continuous ie, we have not missed any years.

length(unique(annual_temp$Year)) == nrow(annual_temp)

## [1] TRUE

Visualizing the Data

Observations

There are no null values in the data.

One interesting observation is that the temperature anomaly values range from negative values to positive values. This makes sense only when we understand the context and how the mean anomalies are calculated (explained above). Here the negative sign does not have any intrinsic value other than the fact that in those years the global temperatures were lower than the temperature in the base period (1951-1980) and hence the anomalies are negative. For the same reason the anomaly values for the base period years are near to zero.

Arima Modelling and Forecasting

Assessing the time series at hand and applying necessary transformations.

Let’s have a look at the time series

The time series seems to be variance stationary. However we will try out the transformations to confirm this.

nlog and Boxcox transformations

We cannot apply natural log or box-cox transformation directly because our data has negative values. Hence we will be adding a constant value of 0.5 to make all the Temperature Anomalies positive. This will not affect the time series, because the very nature of the temperature anomaly value is that it’s a difference from a fixed point which was chosen from the middle values. Adding a constant value will just shift the time series up above negative values, but the nature of the time series will be intact.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ann_temp_log$TA_log
## Dickey-Fuller = -1.7972, Lag order = 5, p-value = 0.6608
## alternative hypothesis: stationary

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ann_temp_boxcox$TA_boxcox
## Dickey-Fuller = -1.7366, Lag order = 5, p-value = 0.6861
## alternative hypothesis: stationary

Doing these transformations does not change our time series. This can confirm our initial hypothesis that time series is variance stationary

Testing for mean stationarity

ADF test on undifferenced data

## 
##  Augmented Dickey-Fuller Test
## 
## data:  annual_temp$Temperature_Anomaly
## Dickey-Fuller = -1.6814, Lag order = 5, p-value = 0.709
## alternative hypothesis: stationary

ADF test on differenced data

## 
##  Augmented Dickey-Fuller Test
## 
## data:  annual_temp_diff$TA_diff
## Dickey-Fuller = -6.7862, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

The time series has been made mean stationary

Seasonality

Since I have taken yearly data there is no reason to expect any seasonality in the time series. Had I taken the monthly time series, we could have seen seasonality.

Plotting ACF/PACF

The time series appears to be an AR3 process. It could also be a MA2 process.

Best Model

Checking BIC for different models

##                                                            df        BIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0))  2   91.44968
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1))  3  -20.17431
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0))  1 -213.43017
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1))  2 -220.63216
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0))  3 -204.16828
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1))  4 -209.81140
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0))  2 -214.61432
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1))  3 -220.60552
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1))  4 -217.51630
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1))  5 -216.96070

According to BIC values, the best model has an order = c(0, 1, 1))

Now checking the AIC values

##                                                            df        AIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0))  2   85.60972
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1))  3  -28.93425
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0))  1 -216.34282
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1))  2 -226.45747
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0))  3 -212.92823
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1))  4 -221.49132
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0))  2 -220.43963
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1))  3 -229.34348
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1))  4 -229.16692
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1))  5 -231.52397

But according to AIC ,the best model has an order of (3,1,1)

BIC penalizes complex models. That is why (3,1,1) has worse BIC value but a good AIC value.

To decide which one to choose,we can do an auto.arima test.

## Series: annual_temp$Temperature_Anomaly 
## ARIMA(3,1,1) 
## 
## Coefficients:
##           ar1      ar2      ar3     ma1
##       -0.9142  -0.4568  -0.3452  0.6427
## s.e.   0.3064   0.1238   0.0813  0.3347
## 
## sigma^2 = 0.01018:  log likelihood = 120.76
## AIC=-231.52   AICc=-231.06   BIC=-216.96

auto.arima suggests (3,1,1) as the best model, so from here this model will be used.

Plotting the actual and predicted values

Box-Ljung test

forecast::checkresiduals(best_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,1)
## Q* = 4.9832, df = 6, p-value = 0.546
## 
## Model df: 4.   Total lags used: 10

## 
##  Box-Ljung test
## 
## data:  resi
## X-squared = 14.282, df = 20, p-value = 0.8159

In-sample root mean squared error of the model based on the residuals.

## [1] 0.0990563

Out of sample testing

Train Test Split

train = annual_temp %>% 
    filter(annual_temp$Year<'2012')
test = annual_temp %>% 
    filter(annual_temp$Year>='2012')

Training the model on train

## $pred
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
## 
## $se
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061

## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220

RMSE = sqrt(mean((test$Temperature_Anomaly - pred_data2$pred.pred)^2))
MAE = mean(abs(test$Temperature_Anomaly - pred_data2$pred.pred))
MAPE = mean(abs((test$Temperature_Anomaly - pred_data2$pred.pred)/test$Temperature_Anomaly))


print(paste("RMSE:",round(RMSE,2)))

## [1] "RMSE: 0.19"

print(paste("MAE:",round(MAE,2)))

## [1] "MAE: 0.16"

print(paste("MAPE:",round(MAPE,2)))

## [1] "MAPE: 0.23"

Forecast

5 time-period forecast for the model

## $pred
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
## 
## $se
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061

##   pred.pred    pred.se Year
## 1 0.9034629 0.09941981 2017
## 2 0.8828516 0.12298671 2018
## 3 0.8997931 0.13340248 2019
## 4 0.9235973 0.14199221 2020
## 5 0.9012220 0.15840061 2021

PROPHET model

We use Prophet models to:

Model “change points” — or periods of significant structural change in the data
Forecast future temperature anomaly values using seasonal and change point parameters

Prophet can potentially produce superior results to more traditional time series models such as ARIMA by identifying structural breaks in a time series and making forecasts by taking change points as well as seasonality patterns into account.

Model Building

library(prophet)
library(lubridate)

prophet_data = annual_temp
prophet_data$ds <-  as.character(prophet_data$ds)
prophet_data$ds = paste(prophet_data$ds,'-01-01',sep = "")


train = prophet_data %>% 
    filter(ds<ymd("2012-01-01"))
test = prophet_data %>%
    filter(ds>=ymd("2012-01-01"))


model = prophet(train, weekly.seasonality=FALSE, daily.seasonality=FALSE)

The model is then used to predict 5 years forward, with the predictions compared to the test set (actual values).

future = make_future_dataframe(model,periods = 5, freq="year")
forecast = predict(model,future)

plot(model, forecast) +
ylab("Temp Anomalies")+xlab("Year")+theme_bw()

Visualizing the Components

Very similar to general time series decomposition

prophet_plot_components(model,forecast)

We observe an increasing trend with year.

Plotting Changepoints

Most real time series frequently have abrupt changes in their trajectories. These are referred to as changepoints. Identifying them could be important for accurately modelling changes in trends. By default, prophet identifies the changepoints. However, we can also control this using arguments.

plot(model,forecast)+add_changepoints_to_plot(model)+theme_bw()+xlab("Year")+ylab("Temp Anomalies")

From the change points we observe that there has been a slight increase from 1950s onwards.

model = prophet(train, n.changepoints=50, weekly.seasonality=FALSE, daily.seasonality=FALSE)
forecast = predict(model,future)
plot(model,forecast)+add_changepoints_to_plot(model)+theme_bw()+xlab("Year")+ylab("Temp Anomalies")

Saturation Point

The model doesn’t need to consider the minimum/maximum point.

Seasonality

Prophet can handle multiple types of seasonality simultaneously

Since we only have yearly data we cant expect any seasonality

prophet_plot_components(mul, mul_fcst)

Out of sample performance of the model

Estimation based on RMSE and associated visualizations of model performance

## [1] "RMSE: 0.26"

## [1] "MAE: 0.22"

## [1] "MAPE: 0.26"

Cross Validation

df.cv <- cross_validation(model, initial = 50*365, period = 5*365 , horizon = 5*365, units = 'days')



df.cv %>%
 ggplot()+
 geom_point(aes(ds,y)) +
 geom_point(aes(ds,yhat,color=factor(cutoff)))+
 theme_bw()+
 xlab("Year")+
 ylab("Temperature Anomaly")+
 scale_color_discrete(name = 'Cutoff')+
 ggtitle("CV-Ouput")

performance_metrics(df.cv)

##      horizon         mse       rmse        mae      mape     mdape     smape
## 1   354 days 0.009221509 0.09602869 0.07074937 2.0556052 1.2613752 1.0639996
## 2   355 days 0.007828527 0.08847896 0.05789598 2.0126305 1.1200250 1.0450094
## 3   357 days 0.006052888 0.07780031 0.05009838 1.4656703 0.3988728 0.9426831
## 4   358 days 0.001646891 0.04058191 0.02955005 1.2266644 0.3079326 0.7192992
## 5   359 days 0.006783064 0.08235936 0.05352129 1.2754472 0.3988728 0.7986822
## 6   360 days 0.006722193 0.08198898 0.05302591 0.5150087 0.3079326 0.5857926
## 7   362 days 0.010131632 0.10065601 0.07328755 0.5483724 0.3413052 0.6277085
## 8   363 days 0.019853741 0.14090331 0.10690198 0.5614333 0.3935485 0.6626462
## 9   364 days 0.026650650 0.16325027 0.13168132 0.3684846 0.3510539 0.4671490
## 10  711 days 0.027842294 0.16686010 0.14354425 0.3759139 0.3510539 0.4775214
## 11  712 days 0.030378633 0.17429467 0.15379775 1.0047373 0.3510539 0.5862442
## 12  713 days 0.038412363 0.19599072 0.18251388 1.2256307 0.4004735 0.8096281
## 13  714 days 0.033372599 0.18268169 0.16088210 1.1903451 0.3510539 0.7484897
## 14  716 days 0.033203666 0.18221873 0.15929120 1.2310470 0.4004735 0.8145177
## 15  717 days 0.030204096 0.17379326 0.14581165 1.2904335 0.5135462 0.9360111
## 16  718 days 0.020683250 0.14381673 0.11610179 1.3212938 0.6369875 0.9965232
## 17  719 days 0.014322912 0.11967837 0.09535324 1.4020423 0.7545358 1.1920204
## 18  721 days 0.013597520 0.11660840 0.09078568 1.4108873 0.7545358 1.2053791
## 19  722 days 0.010209629 0.10104271 0.07077831 0.7329354 0.6369875 1.0301502
## 20  723 days 0.007398550 0.08601482 0.06467624 0.5737911 0.6352785 0.9107930
## 21  724 days 0.018567436 0.13626238 0.09848271 0.6237661 0.6881416 1.0054129
## 22  726 days 0.018294620 0.13525761 0.09277617 0.5513118 0.6881416 0.9028371
## 23  727 days 0.033240995 0.18232113 0.12942214 0.5173687 0.6170547 0.8236638
## 24  728 days 0.044395383 0.21070212 0.16194531 0.4919391 0.5170452 0.7723692
## 25  729 days 0.046276811 0.21512046 0.17083401 0.3991718 0.4469961 0.5600278
## 26 1076 days 0.046517387 0.21567890 0.17259796 0.4154105 0.5119509 0.5489953
## 27 1077 days 0.050748400 0.22527406 0.19502301 1.1739517 0.5425826 0.7295151
## 28 1078 days 0.065341821 0.25562046 0.21914949 1.2876185 0.5425826 0.8488722
## 29 1080 days 0.056210083 0.23708666 0.19798887 1.3781530 0.5425826 0.9543270
## 30 1081 days 0.058614214 0.24210373 0.21519011 1.5109451 0.8072517 1.2037645
## 31 1082 days 0.043443185 0.20843029 0.17620307 1.6033891 1.1770279 1.3594885
## 32 1083 days 0.037082695 0.19256868 0.16352614 1.6676880 1.1770279 1.5291472
## 33 1085 days 0.034600786 0.18601287 0.14730564 1.6557154 1.1770279 1.5106535
## 34 1086 days 0.036219259 0.19031358 0.15495195 1.8321344 1.3721119 1.7076987
## 35 1087 days 0.033696064 0.18356488 0.14656968 1.1513275 1.1770279 1.6502946
## 36 1088 days 0.014357811 0.11982408 0.10503902 1.0074371 1.0338157 1.4717641
## 37 1090 days 0.013646167 0.11681681 0.10205364 0.8715397 0.8444578 1.2784778
## 38 1091 days 0.012407582 0.11138933 0.09678881 0.7759085 0.5663768 1.0729033
## 39 1092 days 0.015249935 0.12349063 0.11141922 0.6613323 0.4076311 0.8797684
## 40 1093 days 0.018942534 0.13763188 0.11937760 0.5961926 0.4076311 0.7086481
## 41 1095 days 0.021081828 0.14519583 0.13434183 0.6010885 0.4076311 0.7180681
## 42 1441 days 0.018955330 0.13767836 0.12228652 0.3824687 0.3702174 0.4907832
## 43 1442 days 0.029494572 0.17173984 0.14679496 0.7855897 0.3702174 0.6096534
## 44 1444 days 0.032670521 0.18074988 0.16005652 0.9099977 0.3702174 0.7881839
## 45 1445 days 0.035889794 0.18944602 0.17077797 0.9978112 0.4251533 0.9814701
## 46 1446 days 0.037099233 0.19261161 0.17593741 1.0749592 0.6993245 1.1495630
## 47 1447 days 0.034076369 0.18459786 0.15733845 1.0715601 0.6993245 1.1293352
## 48 1449 days 0.029775469 0.17255570 0.14781137 1.2461134 0.9956423 1.3004555
## 49 1450 days 0.027809612 0.16676214 0.13627177 1.4643389 1.2561150 1.3953211
## 50 1451 days 0.028151271 0.16778341 0.13950138 1.8506866 1.6580102 1.6226060
## 51 1452 days 0.022951114 0.15149625 0.13005559 1.4714962 1.2561150 1.5688574
## 52 1454 days 0.019247414 0.13873505 0.11033780 1.3071813 0.9956423 1.3336063
## 53 1455 days 0.024040187 0.15504898 0.12057944 1.2512521 0.8992586 1.1972133
## 54 1456 days 0.029597143 0.17203820 0.13483717 1.2048347 0.7522223 1.0789580
## 55 1457 days 0.033667598 0.18348732 0.15655169 1.2156517 0.7522223 1.1107846
## 56 1459 days 0.039016455 0.19752583 0.16800838 1.0317290 0.5862556 0.9242115
## 57 1460 days 0.045261500 0.21274750 0.19149315 0.8277987 0.4887927 0.8483050
## 58 1806 days 0.046415843 0.21544336 0.19782571 0.5135271 0.4887927 0.6682216
## 59 1808 days 0.049495165 0.22247509 0.20372321 0.7001035 0.4887927 0.7219702
## 60 1809 days 0.049659331 0.22284374 0.20674731 0.8070532 0.5862556 0.7894916
## 61 1810 days 0.040634425 0.20157982 0.17966594 0.8330495 0.6620568 0.8542820
## 62 1811 days 0.036347824 0.19065105 0.16952447 0.8713307 0.8046756 0.9462704
## 63 1813 days 0.032282347 0.17967289 0.14811729 0.8409033 0.8046756 0.8991351
## 64 1814 days 0.024129959 0.15533821 0.12733460 1.0728202 0.8433657 1.0857082
## 65 1815 days 0.019769116 0.14060269 0.11507556 1.2117986 0.9174288 1.2881643
## 66 1816 days 0.020422174 0.14290617 0.11765120 1.2705085 1.1135783 1.4682477
## 67 1818 days 0.015997770 0.12648229 0.10877195 1.0549067 0.9174288 1.3378929
## 68 1819 days 0.016349683 0.12786588 0.11211611 0.9756997 0.8433657 1.3081634
## 69 1820 days 0.025901741 0.16094018 0.14014311 0.9537190 0.7554431 1.2520707
## 70 1821 days 0.025339899 0.15918511 0.13842933 0.9030794 0.6521829 1.1381410
## 71 1823 days 0.030846881 0.17563280 0.16352618 0.9275119 0.6521829 1.1758366
## 72 1824 days 0.037187313 0.19284012 0.18089809 0.6949993 0.5480531 0.9883300
## 73 1825 days 0.037642542 0.19401686 0.18257736 0.5457115 0.4243604 0.7719537
##    coverage
## 1     0.750
## 2     0.750
## 3     0.875
## 4     1.000
## 5     0.875
## 6     0.875
## 7     0.750
## 8     0.625
## 9     0.500
## 10    0.500
## 11    0.375
## 12    0.250
## 13    0.375
## 14    0.375
## 15    0.500
## 16    0.625
## 17    0.750
## 18    0.750
## 19    0.875
## 20    0.875
## 21    0.750
## 22    0.750
## 23    0.625
## 24    0.500
## 25    0.500
## 26    0.500
## 27    0.375
## 28    0.375
## 29    0.500
## 30    0.500
## 31    0.625
## 32    0.625
## 33    0.625
## 34    0.500
## 35    0.625
## 36    0.750
## 37    0.750
## 38    0.750
## 39    0.750
## 40    0.750
## 41    0.750
## 42    0.875
## 43    0.750
## 44    0.625
## 45    0.500
## 46    0.500
## 47    0.500
## 48    0.500
## 49    0.500
## 50    0.500
## 51    0.500
## 52    0.625
## 53    0.625
## 54    0.500
## 55    0.375
## 56    0.375
## 57    0.250
## 58    0.250
## 59    0.250
## 60    0.250
## 61    0.375
## 62    0.375
## 63    0.500
## 64    0.625
## 65    0.750
## 66    0.625
## 67    0.625
## 68    0.625
## 69    0.500
## 70    0.500
## 71    0.375
## 72    0.250
## 73    0.250

p1<-plot_cross_validation_metric(df.cv, metric = 'rmse')
p2<-plot_cross_validation_metric(df.cv, metric = 'mae')
p3<-plot_cross_validation_metric(df.cv, metric = 'mape')
gridExtra::grid.arrange(p1, p2,p3, nrow = 1)

Model Comparison and Validation

Out of sample model performance

Train - Test Split - 5 Years

ARIMA(3,1,1) :RMSE: 0.22 | MAE: 0.17| MAPE: 0.19

Prophet :RMSE: 0.26 | MAE:0.22| MAPE: 0.26

The ARIMA seems to be performing better with lower RMSE, MAE and MAPE values. Hence we will be choosing this model

Final Project

abhiramhari

2/22/2022

A Time Series Study on Rise in Global Temperature

Understanding the Time Series

Motivation

The data

Visualizing the Data

Observations

Arima Modelling and Forecasting

nlog and Boxcox transformations

Testing for mean stationarity

Seasonality

Plotting ACF/PACF

Best Model

Plotting the actual and predicted values

Box-Ljung test

Out of sample testing

Train Test Split

Training the model on train

Forecast

PROPHET model

Model Building

Visualizing the Components

Plotting Changepoints

Saturation Point

Seasonality

Out of sample performance of the model

Cross Validation

Model Comparison and Validation

Out of sample model performance