This Assignment is segregated into three tasks. The first task is to present the best 4 weeks ahead forecast in terms of R squared, AIC, BIC, MASE, etc for the mortality series. The second task is to model and forecast the best 4 years ahead the First Flowering Day (FFD) series.
The third task was split into two, an analysis based on univariate climate regressors, and analysis to obtain the 3 year ahead forecasts.
mort.csv : The dataset contains the mortality rates (mort), temperature (temp), pollutant particle size (part), chemical 1 (chem1), and Chemical 2 (chem2) between 2010 to 2020. FFD.csv : The dataset contains the First Flowering Day (FFD), temperature (temp), rainfall (rain), radiation (rad), and relative humidity (humid) from 1984 to 2014. RBO.csv:
Task 1 : The analysis and forecast of the affects of different factors on Mortality rates.
First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, three methods will be utilized to find the best model for forecasting 4 weeks ahead. The methods that is aimed to be used would be Time Series Regression Models, Exponential Smoothing Methods, and State-space Models. The models from all three methods will be compared based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.
## New names:
## * `` -> ...1
## Rows: 508 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): ...1, mortality, temp, chem1, chem2, particle size
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Figure 1
Figure 2
Figure 1 shows the combined graph of the average weekly mortality in Paris against disease specific mortality factors between 2010 to 2020, while Figure 2 shows the scaled and centered version of the combined graph.
Figure 3
Figure 3 shows the changes of the averaged weekly mortality in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph has an intervention point near the end of 2012 where the average weekly mortality reached its peak.
## Warning in adf.test(mort_mort_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: mort_mort_ts
## Dickey-Fuller = -5.4125, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Figure 4
Figure 4 shows the ACF plot for the average weekly mortality rate. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the mortality time series is a stationary time series that contains seasonality.
Figure 5
Figure 5 shows the changes of the averaged weekly temperature in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.
## Warning in adf.test(mort_temp_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: mort_temp_ts
## Dickey-Fuller = -4.4572, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Figure 6
Figure 6 shows the ACF plot for averaged weekly temperature. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the temperature time series is a stationary time series that contains seasonality.
Figure 7
Figure 7 shows the changes of the averaged weekly chemical 1 readings in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.
## Warning in adf.test(mort_chem1_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: mort_chem1_ts
## Dickey-Fuller = -4.4926, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Figure 8
Figure 8 shows the ACF plot for averaged weekly chemical 1 readings. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the chemical 1 time series is a stationary time series that contains seasonality.
Figure 9
Figure 9 shows the changes of the averaged weekly Chemical 2 readings in Paris over the years. The graph shows a very slight decreasing trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.
## Warning in adf.test(mort_chem2_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: mort_chem2_ts
## Dickey-Fuller = -5.2791, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Figure 10
Figure 10 shows the ACF plot for averaged weekly chemical 2 readings. The ACF plot suggests the existence of a trend. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the chemical 2 time series is a stationary time series that contains a trend.
Figure 11
Figure 11 shows the changes of the averaged weekly particle size readings in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph has an intervention point near 2019 where the averaged particle size reached its peak.
## Warning in adf.test(mort_part_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: mort_part_ts
## Dickey-Fuller = -4.493, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Figure 12
Figure 12 shows the ACF plot for averaged weekly particle size readings. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the particle size time series is a stationary time series that contains seasonality.
Figure 13
From the STL decomposition in Figure 13, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series follows the same overall pattern of the series - with spikes around 2013 and a drop at around 2017. Lastly, the remainder in the STL suggests marginal interventions in the series.
Figure 14
From the STL decomposition in Figure 14, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series follows the same overall pattern of the series - with drops around 2011, 2015, late 2018 and 2019. Lastly, the remainder in the STL suggests marginal interventions in the series.
Figure 15
From the STL decomposition in Figure 15, the seasonality components reflect the overall pattern of the series, suggesting seasonality. The trend graph slightly follows the overtall pattern of the series, but largely deviates from 2014 to 2016. Lastly, the remainder in the STL suggests marginal interventions in the series.
Figure 16
From the STL decomposition in Figure 16, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series generally follows the same overall pattern of the series. Lastly, the remainder in the STL suggests marginal interventions in the series.
Figure 17
From the STL decomposition in Figure 17, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series roughly follows the same overall pattern of the series. Lastly, the remainder in the STL suggests marginal interventions in the series.
## mortality temp chem1 chem2 particle size
## mortality 1.0000000 -0.43863962 0.55744759 0.2569989 0.44387133
## temp -0.4386396 1.00000000 -0.09785582 0.4043740 -0.01723095
## chem1 0.5574476 -0.09785582 1.00000000 0.5130047 0.86611747
## chem2 0.2569989 0.40437401 0.51300467 1.0000000 0.46793404
## particle size 0.4438713 -0.01723095 0.86611747 0.4679340 1.00000000
From the results of the Correlation Coefficient above, a high
correlation between Chemical 1 and Particle Size can be observed at
0.86611747 (implies multicollienarity). Hence these
variables should not be treated as independent variables simultaneously.
Another observation that can be observed is a moderate level correlation
between Chemical 1 and Mortality at 0.55744759. For
modelling purposes, Temperature, Chemical 1, and Chemical 2 will be used
as independent variables with Mortality being the dependent
variable.
## q = 1 AIC = 3357.164 BIC = 3390.992 MASE = 0.9355909
## q = 2 AIC = 3311.412 BIC = 3357.904 MASE = 0.8968938
## q = 3 AIC = 3302.736 BIC = 3361.88 MASE = 0.8915604
## q = 4 AIC = 3290.634 BIC = 3362.418 MASE = 0.8831327
## q = 5 AIC = 3285.337 BIC = 3369.749 MASE = 0.8738662
## q = 6 AIC = 3279.125 BIC = 3376.153 MASE = 0.872103
## q = 7 AIC = 3273.536 BIC = 3383.168 MASE = 0.8611472
## q = 8 AIC = 3272.002 BIC = 3394.226 MASE = 0.8558678
## q = 9 AIC = 3272.207 BIC = 3407.01 MASE = 0.8555663
## q = 10 AIC = 3271.771 BIC = 3419.142 MASE = 0.8553405
## q = 11 AIC = 3272.315 BIC = 3432.241 MASE = 0.8559131
## q = 12 AIC = 3272.547 BIC = 3445.017 MASE = 0.858027
From the results above, a lag length of 10 provides the
lowest AIC, BIC (relative), and MASE values. This means that a value of
q=10 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.747 -3.991 -0.497 3.864 35.304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 112.241711 5.568128 20.158 < 2e-16 ***
## x1.t -0.019743 0.068466 -0.288 0.77320
## x1.1 -0.194046 0.070246 -2.762 0.00597 **
## x1.2 -0.154467 0.072274 -2.137 0.03310 *
## x1.3 -0.058983 0.072991 -0.808 0.41945
## x1.4 0.021698 0.074285 0.292 0.77034
## x1.5 -0.125043 0.074317 -1.683 0.09313 .
## x1.6 -0.094553 0.073752 -1.282 0.20047
## x1.7 -0.038808 0.072428 -0.536 0.59234
## x1.8 -0.021044 0.072333 -0.291 0.77123
## x1.9 -0.021308 0.070011 -0.304 0.76100
## x1.10 0.016000 0.069759 0.229 0.81869
## x3.t 0.822583 0.549961 1.496 0.13541
## x3.1 0.443087 0.551247 0.804 0.42193
## x3.2 0.817593 0.567822 1.440 0.15058
## x3.3 -0.044295 0.568023 -0.078 0.93788
## x3.4 0.486762 0.570700 0.853 0.39414
## x3.5 0.320478 0.572441 0.560 0.57586
## x3.6 0.950698 0.572213 1.661 0.09730 .
## x3.7 0.393360 0.564801 0.696 0.48649
## x3.8 0.298915 0.557305 0.536 0.59197
## x3.9 0.041245 0.546535 0.075 0.93988
## x3.10 0.211691 0.541224 0.391 0.69588
## x4.t 0.114905 0.045779 2.510 0.01241 *
## x4.1 0.031179 0.046125 0.676 0.49939
## x4.2 0.020768 0.047269 0.439 0.66061
## x4.3 -0.008694 0.047953 -0.181 0.85622
## x4.4 0.015852 0.049322 0.321 0.74805
## x4.5 0.039415 0.049634 0.794 0.42754
## x4.6 -0.006331 0.049639 -0.128 0.89857
## x4.7 0.041482 0.048317 0.859 0.39104
## x4.8 0.018642 0.047517 0.392 0.69500
## x4.9 0.041326 0.046109 0.896 0.37057
## x4.10 -0.010018 0.044552 -0.225 0.82219
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.455 on 464 degrees of freedom
## Multiple R-squared: 0.6141, Adjusted R-squared: 0.5867
## F-statistic: 22.38 on 33 and 464 DF, p-value: < 2.2e-16
##
## AIC and BIC values for the model:
## AIC BIC
## 1 3305.418 3452.789
The finite distributed lag model with q=10 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 58.6% (0.5867) of variability in mortality
rates. The p-value of the finite distributed lag model is
2.2e-16 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm$model)
## W = 0.97277, p-value = 5.358e-08
Figure 18
##
## Breusch-Godfrey test for serial correlation of order up to 37
##
## data: Residuals
## LM test = 208.71, df = 37, p-value < 2.2e-16
The residuals in Figure 18 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of <2.2e-16 (<0.05). The histogram of
standardised residuals contain slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 5.358e-08 (<0.05) which rejects the null
hypothesis of normality.
## x1.t x1.1 x1.2 x1.3 x1.4 x1.5 x1.6 x1.7
## 4.591187 4.837413 5.133621 5.235821 5.422560 5.432629 5.341759 5.147467
## x1.8 x1.9 x1.10 x3.t x3.1 x3.2 x3.3 x3.4
## 5.118225 4.757270 4.690602 4.038018 4.045829 4.295368 4.293476 4.321524
## x3.5 x3.6 x3.7 x3.8 x3.9 x3.10 x4.t x4.1
## 4.338476 4.334681 4.210682 4.096153 3.939884 3.864821 5.758777 5.836553
## x4.2 x4.3 x4.4 x4.5 x4.6 x4.7 x4.8 x4.9
## 6.150734 6.300165 6.664955 6.748520 6.742381 6.391789 6.182337 5.816225
## x4.10
## 5.448359
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the finite distributed lag model with lag length of 10 is not successful in capturing the autocorrelation and seasonality in the series, nor is it a good performing model due to its lack of significant terms at 5% significance level and explainability.
As finite distributed lag models suffer from multicollinearity in
general due to the inclusion of of the lags of the same model. To reduce
this multicollinearity, a polynomial shape is imposed on the lag
distribution (Judge and Griffiths, 2000). From the Correlation
Coefficient table above, Mortality has the highest
correlation with Chemical 1. Hence, Chemical 1
will be used to fit the polynomial distributed lag model.
## q = 1 AIC = 3554.683 BIC = 3571.597 MASE = 1.171233
## q = 2 AIC = 3505.901 BIC = 3527.034 MASE = 1.122964
## q = 3 AIC = 3482.544 BIC = 3503.667 MASE = 1.108563
## q = 4 AIC = 3419.705 BIC = 3440.817 MASE = 1.046494
## q = 5 AIC = 3401.12 BIC = 3422.223 MASE = 1.030782
## q = 6 AIC = 3363.07 BIC = 3384.163 MASE = 0.9961896
## q = 7 AIC = 3327.74 BIC = 3348.823 MASE = 0.9528036
## q = 8 AIC = 3310.924 BIC = 3331.997 MASE = 0.9442986
## q = 9 AIC = 3300.181 BIC = 3321.244 MASE = 0.9355695
## q = 10 AIC = 3293.549 BIC = 3314.602 MASE = 0.9339622
## q = 11 AIC = 3287.592 BIC = 3308.634 MASE = 0.9376734
## q = 12 AIC = 3280.596 BIC = 3301.629 MASE = 0.9396848
From the results above, a lag length of 10 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=10 will be selected as the lag length of the
model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.200 0.0841 2.38 1.78e-02
## beta.1 0.196 0.0432 4.54 7.17e-06
## beta.2 0.196 0.0229 8.54 1.76e-16
## beta.3 0.199 0.0318 6.24 9.25e-10
## beta.4 0.205 0.0434 4.73 3.00e-06
## beta.5 0.215 0.0477 4.51 8.19e-06
## beta.6 0.228 0.0434 5.25 2.27e-07
## beta.7 0.244 0.0318 7.67 9.10e-14
## beta.8 0.264 0.0229 11.50 2.57e-27
## beta.9 0.287 0.0432 6.64 8.20e-11
## beta.10 0.314 0.0841 3.73 2.14e-04
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.519 -4.292 -0.555 3.686 32.589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.444286 0.855229 80.030 <2e-16 ***
## z.t0 0.199889 0.084077 2.377 0.0178 *
## z.t1 -0.005372 0.049603 -0.108 0.9138
## z.t2 0.001675 0.004911 0.341 0.7332
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.566 on 494 degrees of freedom
## Multiple R-squared: 0.575, Adjusted R-squared: 0.5724
## F-statistic: 222.8 on 3 and 494 DF, p-value: < 2.2e-16
The finite distributed lag model with q=10 and
k=2 has almost no significant term at 5% level of
significance. The adjusted R-squared score only explains
57.2% (0.5724) of variability in mortality rates. The
p-value of the finite distributed lag model is 2.2e-16
(<0.05) hence, it is statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_k2$model)
## W = 0.97445, p-value = 1.225e-07
Figure 19
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 168.02, df = 10, p-value < 2.2e-16
The residuals in Figure 19 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of <2.2e-16 (<0.05) which confirms the
presence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 9.668e-08
(<0.05) which rejects the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.174 0.03600 4.83 1.83e-06
## beta.1 0.185 0.02930 6.32 5.81e-10
## beta.2 0.197 0.02280 8.64 8.06e-17
## beta.3 0.208 0.01660 12.60 1.63e-31
## beta.4 0.219 0.01130 19.30 2.62e-62
## beta.5 0.231 0.00894 25.80 2.58e-93
## beta.6 0.242 0.01130 21.40 5.63e-72
## beta.7 0.254 0.01660 15.30 1.81e-43
## beta.8 0.265 0.02280 11.60 8.15e-28
## beta.9 0.276 0.02930 9.43 1.66e-19
## beta.10 0.288 0.03600 7.99 9.84e-15
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.514 -4.329 -0.515 3.708 32.530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.512699 0.830623 82.48 < 2e-16 ***
## z.t0 0.173989 0.036026 4.83 1.83e-06 ***
## z.t1 0.011375 0.006979 1.63 0.104
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.56 on 495 degrees of freedom
## Multiple R-squared: 0.5749, Adjusted R-squared: 0.5732
## F-statistic: 334.7 on 2 and 495 DF, p-value: < 2.2e-16
The finite distributed lag model with q=10 and
k=1 has one significant term at 5% level of significance.
The adjusted R-squared score only explains 57.3% (0.5732)
of variability in mortality rates. The p-value of the finite distributed
lag model is 2.2e-16 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_k1$model)
## W = 0.97448, p-value = 1.247e-07
Figure 20
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 167.95, df = 10, p-value < 2.2e-16
The residuals in Figure 20 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of <2.2e-16 (<0.05) which confirms the
presence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 9.666e-08
(<0.05) which rejects the null hypothesis of normality.
In conclusion, the polynomial distributed lag model with lag length of 12 is not successful in capturing the autocorrelation and seasonality in the series, nor is it an excellent performing model due to its lack of explainability.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.19302 -4.18513 -0.06397 3.61514 23.08021
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.50952 2.55334 9.599 < 2e-16 ***
## Y.1 0.66674 0.03637 18.331 < 2e-16 ***
## X.t 0.63624 0.15311 4.155 3.81e-05 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 504 173.382 <2e-16 ***
## Wu-Hausman 1 503 0.372 0.542
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.891 on 504 degrees of freedom
## Multiple R-Squared: 0.6544, Adjusted R-squared: 0.653
## Wald test: 443.5 on 2 and 504 DF, p-value: < 2.2e-16
The Koyck model with q=12 has all of its scores
significant at 5% level of significance. The adjusted R-squared
score explains 65.3% (0.653) of variability in mortality
rates. The p-value of the finite distributed lag model is
2.2e-16 (<0.05) hence, it is statistically significant.
The “Weak Instruments” test resulted in a p-value of 2e-16
which means that the model at the first stage of least squares fitting
is significant at 5% level of significance. The “Wu-Hausman” test
resulted in a p-value of 0.542 which means that there is
significant correlation between the error term and the lag-dependent
variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_dlm$model)
## W = 0.99486, p-value = 0.08876
Figure 21
The residuals in Figure 13 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.08876 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.932188 1.932188
The Variance Inflation Factor (VIF) values suggest that the estimates from the Koyck model does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.
for (i in 1:5){
for(j in 1:5){
ardlm_tempchem = ardlDlm(formula = y~x1+x2+x3,
data = data.frame(mort_import),
p = i,
q = j)
cat("p =", i,
"q =", j,
"AIC =", AIC(ardlm_tempchem$model),
"BIC =", BIC(ardlm_tempchem$model),
"MASE =", MASE(ardlm_tempchem)$MASE, "\n")
}
}
## p = 1 q = 1 AIC = 3172.828 BIC = 3210.884 MASE = 0.7992248
## p = 1 q = 2 AIC = 3096.521 BIC = 3138.787 MASE = 0.7421265
## p = 1 q = 3 AIC = 3092.607 BIC = 3139.078 MASE = 0.7415002
## p = 1 q = 4 AIC = 3089.171 BIC = 3139.842 MASE = 0.7418239
## p = 1 q = 5 AIC = 3085.657 BIC = 3140.525 MASE = 0.7404108
## p = 2 q = 1 AIC = 3152.829 BIC = 3203.548 MASE = 0.7825795
## p = 2 q = 2 AIC = 3094.987 BIC = 3149.931 MASE = 0.7365205
## p = 2 q = 3 AIC = 3091.072 BIC = 3150.215 MASE = 0.7361552
## p = 2 q = 4 AIC = 3087.887 BIC = 3151.226 MASE = 0.736728
## p = 2 q = 5 AIC = 3084.559 BIC = 3152.088 MASE = 0.7357571
## p = 3 q = 1 AIC = 3151.333 BIC = 3214.701 MASE = 0.7820418
## p = 3 q = 2 AIC = 3091.897 BIC = 3159.49 MASE = 0.7345853
## p = 3 q = 3 AIC = 3093.835 BIC = 3165.653 MASE = 0.7343553
## p = 3 q = 4 AIC = 3090.528 BIC = 3166.534 MASE = 0.7349866
## p = 3 q = 5 AIC = 3086.97 BIC = 3167.161 MASE = 0.7336521
## p = 4 q = 1 AIC = 3141.082 BIC = 3217.089 MASE = 0.7737713
## p = 4 q = 2 AIC = 3085.842 BIC = 3166.071 MASE = 0.7266196
## p = 4 q = 3 AIC = 3087.804 BIC = 3172.256 MASE = 0.726738
## p = 4 q = 4 AIC = 3089.783 BIC = 3178.457 MASE = 0.7265216
## p = 4 q = 5 AIC = 3085.988 BIC = 3178.841 MASE = 0.7246658
## p = 5 q = 1 AIC = 3137.901 BIC = 3226.534 MASE = 0.7693759
## p = 5 q = 2 AIC = 3083.264 BIC = 3176.117 MASE = 0.7240193
## p = 5 q = 3 AIC = 3085.252 BIC = 3182.326 MASE = 0.7239573
## p = 5 q = 4 AIC = 3087.202 BIC = 3188.496 MASE = 0.7237336
## p = 5 q = 5 AIC = 3088.542 BIC = 3194.057 MASE = 0.7222653
From the autoregressive distributed lag model above, it can be
observed that the model with the lowest AIC value is
p=2, q=5, the lowest BIC value is p=1, q=5,
and the lowest MASE is p=5, q=5. However, as we are after
the lowest MASE values, the models that will be selected are
ARDL(5,3), ARDL(5,4), and
ARDL(5,5)
ardl_5.3 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 3)
summary(ardl_5.3)
##
## Time series regression with "ts" data:
## Start = 6, End = 508
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.7559 -3.6759 -0.3462 3.3930 25.9639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.866385 4.836345 1.213 0.225719
## X.t 0.118708 0.028294 4.195 3.23e-05 ***
## X.1 -0.140714 0.028959 -4.859 1.59e-06 ***
## X.2 -0.040958 0.029878 -1.371 0.171050
## X.3 -0.013007 0.029501 -0.441 0.659490
## X.4 0.110387 0.028395 3.888 0.000115 ***
## X.5 0.008874 0.028308 0.313 0.754034
## Y.1 0.464299 0.045176 10.278 < 2e-16 ***
## Y.2 0.397573 0.046117 8.621 < 2e-16 ***
## Y.3 0.030109 0.045199 0.666 0.505633
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.453 on 493 degrees of freedom
## Multiple R-squared: 0.7078, Adjusted R-squared: 0.7024
## F-statistic: 132.7 on 9 and 493 DF, p-value: < 2.2e-16
The autoregressive distributed lag model ardl_5.3, has
p=5 and q=3. The x.2, x.3, x.5, and y.3 are
not significant at 5% level of significance. The adjusted R-squared
model explains 70.24% (0.7024) of the variability in
mortality rates. The p value of ardl_5.3 is 2.2e-16
(<0.05) hence, it is significant.
Figure 22
##
## Breusch-Godfrey test for serial correlation of order up to 13
##
## data: Residuals
## LM test = 13.995, df = 13, p-value = 0.3742
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.3$model)
## W = 0.99008, p-value = 0.001811
The residuals in Figure 22 above presents a few observations. The
time series plot shows no non-random pattern. The ACF plot suggests that
there are no serial correlation remaining in the residuals. The
histogram of standardized residuals contain no violations in normality
assumptions. Lastly, the Shapiro-Wilk normality test has a value of
0.3742 (<0.05) so we reject the null hypothesis of
normality.
VIF.ardlm_low_aic_tempchem = vif(ardl_5.3$model)
VIF.ardlm_low_aic_tempchem
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.382499 1.447856 1.541086 1.504983 1.394532 1.385800 3.446049 3.597371
## L(y.t, 3)
## 3.450816
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
ardl_5.4 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 4)
summary(ardl_5.4)
##
## Time series regression with "ts" data:
## Start = 6, End = 508
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.7551 -3.6756 -0.3431 3.3950 25.9633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.8727443 4.9631542 1.183 0.237274
## X.t 0.1187023 0.0283409 4.188 3.33e-05 ***
## X.1 -0.1407106 0.0289942 -4.853 1.63e-06 ***
## X.2 -0.0409772 0.0300940 -1.362 0.173935
## X.3 -0.0130300 0.0298044 -0.437 0.662169
## X.4 0.1104239 0.0291236 3.792 0.000168 ***
## X.5 0.0088496 0.0286569 0.309 0.757596
## Y.1 0.4643057 0.0452373 10.264 < 2e-16 ***
## Y.2 0.3976746 0.0493715 8.055 6.06e-15 ***
## Y.3 0.0302231 0.0493179 0.613 0.540277
## Y.4 -0.0002624 0.0451051 -0.006 0.995361
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.459 on 492 degrees of freedom
## Multiple R-squared: 0.7078, Adjusted R-squared: 0.7018
## F-statistic: 119.2 on 10 and 492 DF, p-value: < 2.2e-16
The autoregressive distributed lag model ardl_5.4, has
p=5 and q=3. The x.2, x.3, x.5, y.3, and y.4
are not significant at 5% level of significance. The adjusted R-squared
model explains 70.18% (0.7018) of the variability in
mortality rates. The p value of ardl_5.4 is 2.2e-16
(<0.05) hence, it is significant.
Figure 23
##
## Breusch-Godfrey test for serial correlation of order up to 14
##
## data: Residuals
## LM test = 26.395, df = 14, p-value = 0.02305
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.4$model)
## W = 0.99008, p-value = 0.00181
The residuals in Figure 23 above presents a few observations. The
time series plot shows no non-random pattern. The ACF plot suggests that
there are no serial correlation remaining in the residuals. The
histogram of standardized residuals contain violations in normality
assumptions. Lastly, the Shapiro-Wilk normality test has a value of
0.02305 (<0.05) so we reject the null hypothesis of
normality.
VIF.ardlm_low_bic_tempchem = vif(ardl_5.4$model)
VIF.ardlm_low_bic_tempchem
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.384251 1.448399 1.560279 1.532967 1.464017 1.417321 3.448419 4.114714
## L(y.t, 3) L(y.t, 4)
## 4.100045 3.436769
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
ardl_5.5 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 5)
summary(ardl_5.5)
##
## Time series regression with "ts" data:
## Start = 6, End = 508
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8263 -3.6339 -0.3603 3.3448 26.0775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.387355 5.080508 1.060 0.289485
## X.t 0.119648 0.028440 4.207 3.08e-05 ***
## X.1 -0.140247 0.029035 -4.830 1.83e-06 ***
## X.2 -0.041181 0.030122 -1.367 0.172205
## X.3 -0.011349 0.030057 -0.378 0.705898
## X.4 0.112534 0.029514 3.813 0.000155 ***
## X.5 0.006187 0.029271 0.211 0.832675
## Y.1 0.464476 0.045275 10.259 < 2e-16 ***
## Y.2 0.397698 0.049411 8.049 6.35e-15 ***
## Y.3 0.021894 0.052645 0.416 0.677678
## Y.4 -0.009262 0.049287 -0.188 0.851024
## Y.5 0.020354 0.044751 0.455 0.649432
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.463 on 491 degrees of freedom
## Multiple R-squared: 0.7079, Adjusted R-squared: 0.7014
## F-statistic: 108.2 on 11 and 491 DF, p-value: < 2.2e-16
The autoregressive distributed lag model ardl_5.5, has
p=5 and q=5. The x.t, x.1, y.2, y.3, y.4, and
y.5 are not significant at 5% level of significance. The adjusted
R-squared model explains 70.14% (0.7014) of the variability
in mortality rates. The p value of ardl_5.5 is 2.2e-16
(<0.05) hence, it is significant.
Figure 24
##
## Breusch-Godfrey test for serial correlation of order up to 15
##
## data: Residuals
## LM test = 38.522, df = 15, p-value = 0.000755
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.5$model)
## W = 0.98967, p-value = 0.00132
The residuals in Figure 24 above presents a few observations. The
time series plot shows no non-random pattern. The ACF plot suggests that
there are no serial correlation remaining in the residuals. The
histogram of standardized residuals contain violations in normality
assumptions. Lastly, the Shapiro-Wilk normality test has a value of
0.000755 (<0.05) so we reject the null hypothesis of
normality.
VIF.ardlm_low_mase_tempchem = vif(ardl_5.5$model)
VIF.ardlm_low_mase_tempchem
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.391691 1.450184 1.560623 1.556501 1.501095 1.476349 3.448655 4.114719
## L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 4.664404 4.097032 3.376778
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.
| Model | AIC | BIC | MASE |
|---|---|---|---|
| ARDL(5,5) | 3149.559 | 3204.427 | 0.7909231 |
| ARDL(5,3) | 3145.771 | 3192.198 | 0.7925531 |
| ARDL(5,4) | 3147.771 | 3198.418 | 0.7925661 |
| Koyck | 3241.979 | 3258.893 | 0.8708743 |
| Finite DLM | 3305.418 | 3452.789 | 0.8873134 |
| Poly DLM | 3291.667 | 3308.509 | 0.9350304 |
The table above depicts the accuracy measures (AIC, BIC, and MASE) of
the fitted models thus far. It can be seen that ARDL(5,5)
has the lowest MASE value.
Was unable to fit due to high frequency of the data.
In conclusion, based on all the previous observations made - the model that should be used to forecast the Mortality rate 4 weeks ahead should be the ARDL(5,5) model based on the results of its residual analysis and its low MASE value.
To conclude this exploration, the frequency was too high so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the ARDL(5,5) was left as the best model to use for forecasting.
Task 2 : The modelling and forecasting of First Flowering Day (FFD) with single climate predictors (univariate models).
First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, three methods will be utilized to find the best model for forecasting 4 years ahead. The methods that is aimed to be used would be Time Series Regression Models, Exponential Smoothing Methods, and State-space Models. The models from all three methods will be compared based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.
## Rows: 31 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): Year, Temperature, Rainfall, Radiation, RelHumidity, FFD
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Figure 25
Figure 25 above depicts the time series plots for the First Flowering Factors over time.
Figure 26
Figure 26 shows the changes of Temperature over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA). The graph has an intervention point right after 2005 where there was an immense drop in temperature.
##
## Augmented Dickey-Fuller Test
##
## data: ffd_temp_ts
## Dickey-Fuller = -1.6816, Lag order = 3, p-value = 0.6953
## alternative hypothesis: stationary
Figure 27
Figure 27 shows the ACF plot for the Temperature series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.6953 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the temperature time series is not a stationary time series that contains seasonality.
Figure 28
Figure 28 shows the changes of Rainfall amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1995 where there was potentially an immense drought.
##
## Augmented Dickey-Fuller Test
##
## data: ffd_rain_ts
## Dickey-Fuller = -2.3024, Lag order = 3, p-value = 0.4563
## alternative hypothesis: stationary
Figure 29
Figure 29 shows the ACF plot for the Rainfall series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.4563 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the Rainfall time series is not a stationary time series that contains seasonality.
Figure 30
Figure 30 shows the changes of Rainfall amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1990 where there was a drop in radiation amount.
##
## Augmented Dickey-Fuller Test
##
## data: ffd_rad_ts
## Dickey-Fuller = -2.6949, Lag order = 3, p-value = 0.3052
## alternative hypothesis: stationary
Figure 31
Figure 31 shows the ACF plot for the Radiation series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.3052 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the Radiation time series is not a stationary time series that contains seasonality.
Figure 32
Figure 32 shows the changes of Relative Humidity amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has no distinct intervention points.
##
## Augmented Dickey-Fuller Test
##
## data: ffd_humid_ts
## Dickey-Fuller = -2.7992, Lag order = 3, p-value = 0.2651
## alternative hypothesis: stationary
Figure 33
Figure 33 shows the ACF plot for the Relative Humidity series. The ACF plot suggests an existence of a trend and strong seasonal patterns. The ADF test resulted a p-value of 0.2651 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the solar radiation time series is not a stationary time series that contains seasonality.
Figure 34
Figure 34 shows the changes of First Flowering Day over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point in 2000 where there was a drop in First Flowering Day.
##
## Augmented Dickey-Fuller Test
##
## data: ffd_ffd_ts
## Dickey-Fuller = -2.3723, Lag order = 3, p-value = 0.4294
## alternative hypothesis: stationary
Figure 35
Figure 35 shows the ACF plot for the FFD series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.4294 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the FFD time series is not a stationary time series that contains seasonality.
## Year Temperature Rainfall Radiation RelHumidity
## Year 1.0000000 0.148410676 -0.1752091 0.11881829 0.206355767
## Temperature 0.1484107 1.000000000 0.3933255 -0.24096625 0.009646021
## Rainfall -0.1752091 0.393325545 1.0000000 -0.58131610 0.338461007
## Radiation 0.1188183 -0.240966245 -0.5813161 1.00000000 -0.055209652
## RelHumidity 0.2063558 0.009646021 0.3384610 -0.05520965 1.000000000
## FFD -0.2329975 -0.247933708 0.0506911 0.04677758 -0.128502440
## FFD
## Year -0.23299747
## Temperature -0.24793371
## Rainfall 0.05069110
## Radiation 0.04677758
## RelHumidity -0.12850244
## FFD 1.00000000
From the results of the Correlation Coefficient above, the highest
correlation that can be observed is between Temperature and Rainfall at
0.3933255. However, due to the low correlation between most
of the variables, each variable will be used as independent variables
with FFD being the dependent variable.
## q = 1 AIC = 279.6699 BIC = 285.2747 MASE = 0.6644773
## q = 2 AIC = 272.7512 BIC = 279.5877 MASE = 0.6788097
## q = 3 AIC = 264.2921 BIC = 272.2853 MASE = 0.643964
## q = 4 AIC = 258.0491 BIC = 267.12 MASE = 0.6406911
## q = 5 AIC = 247.1156 BIC = 257.1804 MASE = 0.5959619
## q = 6 AIC = 238.7187 BIC = 249.6886 MASE = 0.5794878
## q = 7 AIC = 231.1518 BIC = 242.9323 MASE = 0.5525488
## q = 8 AIC = 224.2942 BIC = 236.7846 MASE = 0.5377635
## q = 9 AIC = 218.0096 BIC = 231.1021 MASE = 0.5596056
## q = 10 AIC = 211.9345 BIC = 225.5133 MASE = 0.5755493
## q = 11 AIC = 197.6879 BIC = 211.6282 MASE = 0.4764284
## q = 12 AIC = 181.9935 BIC = 196.16 MASE = 0.4122077
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=12 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -6.304 20.608 2.689 -10.388 -21.290 9.263 2.032 6.528 -20.530 10.231
## 11 12 13 14 15 16 17 18 19
## -9.392 9.545 1.053 -17.292 7.820 22.525 16.917 -11.337 -12.679
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 216.585 753.624 0.287 0.785
## x2.t -12.329 19.031 -0.648 0.546
## x2.1 -2.080 21.874 -0.095 0.928
## x2.2 4.420 26.151 0.169 0.872
## x2.3 13.786 22.208 0.621 0.562
## x2.4 10.166 23.340 0.436 0.681
## x2.5 17.897 25.140 0.712 0.508
## x2.6 -16.736 25.926 -0.646 0.547
## x2.7 7.064 24.921 0.283 0.788
## x2.8 13.676 22.558 0.606 0.571
## x2.9 -23.511 22.244 -1.057 0.339
## x2.10 18.381 22.496 0.817 0.451
## x2.11 1.459 20.683 0.071 0.946
## x2.12 -33.028 20.068 -1.646 0.161
##
## Residual standard error: 25.75 on 5 degrees of freedom
## Multiple R-squared: 0.6551, Adjusted R-squared: -0.2415
## F-statistic: 0.7306 on 13 and 5 DF, p-value: 0.7019
##
## AIC and BIC values for the model:
## AIC BIC
## 1 181.9935 196.16
The finite distributed lag model with q=12 has no
significant term at 5% significance level. The adjusted R-squared score
explains -24.15% (-0.2415) of variability in FFD values.
The p-value of the finite distributed lag model is 0.7019
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_temp$model)
## W = 0.95288, p-value = 0.4417
The residuals in Figure 36 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.4417 (>0.05) which fails to rejects
the null hypothesis of normality.
## x2.t x2.1 x2.2 x2.3 x2.4 x2.5 x2.6 x2.7
## 1.682976 1.813251 2.088240 1.634591 1.721548 1.946444 2.381222 2.070046
## x2.8 x2.9 x2.10 x2.11 x2.12
## 2.048046 1.625311 1.667104 1.418715 1.300454
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = 281.9769 BIC = 287.5817 MASE = 0.6758928
## q = 2 AIC = 273.8231 BIC = 280.6596 MASE = 0.6752737
## q = 3 AIC = 265.1863 BIC = 273.1795 MASE = 0.6326564
## q = 4 AIC = 257.086 BIC = 266.1568 MASE = 0.6309338
## q = 5 AIC = 246.1244 BIC = 256.1891 MASE = 0.5808908
## q = 6 AIC = 238.0915 BIC = 249.0614 MASE = 0.5522086
## q = 7 AIC = 231.4148 BIC = 243.1954 MASE = 0.5447815
## q = 8 AIC = 224.8048 BIC = 237.2953 MASE = 0.5553913
## q = 9 AIC = 216.4839 BIC = 229.5764 MASE = 0.5521884
## q = 10 AIC = 197.7206 BIC = 211.2994 MASE = 0.391709
## q = 11 AIC = 185.1533 BIC = 199.0936 MASE = 0.3819204
## q = 12 AIC = 178.947 BIC = 193.1136 MASE = 0.3886174
From the results above, a lag length of 11 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=11 will be selected as the lag length of the
model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.895 -11.127 4.817 9.066 17.143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 291.9218 89.8605 3.249 0.0141 *
## x3.t -12.4000 13.3370 -0.930 0.3834
## x3.1 7.3366 13.2496 0.554 0.5970
## x3.2 -12.2796 12.2579 -1.002 0.3498
## x3.3 26.6428 12.8234 2.078 0.0763 .
## x3.4 -31.5044 13.3419 -2.361 0.0502 .
## x3.5 32.9844 14.1965 2.323 0.0531 .
## x3.6 -4.9389 13.3441 -0.370 0.7222
## x3.7 22.7457 14.0886 1.614 0.1505
## x3.8 -18.4183 14.8639 -1.239 0.2552
## x3.9 -9.6326 14.5036 -0.664 0.5279
## x3.10 -36.3197 14.9434 -2.430 0.0454 *
## x3.11 -0.3935 15.6147 -0.025 0.9806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.8 on 7 degrees of freedom
## Multiple R-squared: 0.6872, Adjusted R-squared: 0.151
## F-statistic: 1.282 on 12 and 7 DF, p-value: 0.384
##
## AIC and BIC values for the model:
## AIC BIC
## 1 185.1533 199.0936
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 15.1% (0.151) of variability in FFD values.
The p-value of the finite distributed lag model is 0.384
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rain$model)
## W = 0.93106, p-value = 0.1619
The residuals in Figure 37 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are significant. The histogram of
standardised residuals contains violations in normality assumptions.
This is also reinforced by the Shapiro-Wilk normality test with a value
of 0.1619 (>0.05) which fails to rejects the null
hypothesis of normality.
## x3.t x3.1 x3.2 x3.3 x3.4 x3.5 x3.6 x3.7
## 1.327923 1.349384 1.253635 1.430914 1.454200 1.539660 1.413921 1.426862
## x3.8 x3.9 x3.10 x3.11
## 1.575726 1.333137 1.413099 1.518843
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = 281.571 BIC = 287.1758 MASE = 0.6579604
## q = 2 AIC = 273.2458 BIC = 280.0823 MASE = 0.6672639
## q = 3 AIC = 264.666 BIC = 272.6592 MASE = 0.6164754
## q = 4 AIC = 257.9491 BIC = 267.02 MASE = 0.5968786
## q = 5 AIC = 249.3748 BIC = 259.4396 MASE = 0.5870893
## q = 6 AIC = 235.4683 BIC = 246.4382 MASE = 0.5298748
## q = 7 AIC = 225.705 BIC = 237.4856 MASE = 0.479178
## q = 8 AIC = 219.9335 BIC = 232.424 MASE = 0.4839788
## q = 9 AIC = 211.6427 BIC = 224.7353 MASE = 0.4680972
## q = 10 AIC = 204.3142 BIC = 217.893 MASE = 0.4716663
## q = 11 AIC = 177.2058 BIC = 191.146 MASE = 0.2874734
## q = 12 AIC = 165.1432 BIC = 179.3097 MASE = 0.2509081
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=12 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 2.4332 14.1989 5.1577 -5.7822 -11.2758 -5.1128 -1.9693 10.9014
## 9 10 11 12 13 14 15 16
## 5.9282 -2.1160 -18.1646 -6.9480 8.3747 14.1349 4.6945 -0.5910
## 17 18 19
## -5.5146 -9.0023 0.6531
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 642.28 573.74 1.119 0.3138
## x4.t -22.88 26.12 -0.876 0.4212
## x4.1 -26.23 31.50 -0.833 0.4429
## x4.2 23.74 25.82 0.920 0.3999
## x4.3 47.17 29.36 1.606 0.1691
## x4.4 -22.93 15.21 -1.508 0.1920
## x4.5 17.72 18.55 0.955 0.3835
## x4.6 -72.21 21.69 -3.329 0.0208 *
## x4.7 34.69 14.48 2.395 0.0620 .
## x4.8 14.33 15.15 0.946 0.3875
## x4.9 17.57 20.27 0.867 0.4258
## x4.10 26.11 21.55 1.212 0.2798
## x4.11 -55.47 21.49 -2.581 0.0494 *
## x4.12 -11.82 25.53 -0.463 0.6629
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.53 on 5 degrees of freedom
## Multiple R-squared: 0.8579, Adjusted R-squared: 0.4886
## F-statistic: 2.323 on 13 and 5 DF, p-value: 0.1806
##
## AIC and BIC values for the model:
## AIC BIC
## 1 165.1432 179.3097
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 48.86% (0.4886) of variability in FFD values.
The p-value of the finite distributed lag model is 0.1806
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rad$model)
## W = 0.97957, p-value = 0.9373
The residuals in Figure 38 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.9373 (>0.05) which fails to rejects
the null hypothesis of normality.
## x4.t x4.1 x4.2 x4.3 x4.4 x4.5 x4.6 x4.7
## 6.358009 11.309715 7.520321 10.003178 3.642591 5.340914 6.829794 3.039277
## x4.8 x4.9 x4.10 x4.11 x4.12
## 3.027825 4.322888 4.306572 4.305906 5.502498
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = 281.6547 BIC = 287.2595 MASE = 0.6748273
## q = 2 AIC = 273.1762 BIC = 280.0127 MASE = 0.6669837
## q = 3 AIC = 260.6544 BIC = 268.6476 MASE = 0.6124744
## q = 4 AIC = 254.6378 BIC = 263.7087 MASE = 0.6133102
## q = 5 AIC = 248.6573 BIC = 258.722 MASE = 0.624735
## q = 6 AIC = 240.4575 BIC = 251.4274 MASE = 0.590676
## q = 7 AIC = 234.5121 BIC = 246.2927 MASE = 0.5919628
## q = 8 AIC = 227.6822 BIC = 240.1727 MASE = 0.5909197
## q = 9 AIC = 219.2216 BIC = 232.3141 MASE = 0.6170883
## q = 10 AIC = 210.9411 BIC = 224.5199 MASE = 0.5542451
## q = 11 AIC = 196.8558 BIC = 210.7961 MASE = 0.497145
## q = 12 AIC = 171.2149 BIC = 185.3814 MASE = 0.266223
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=12 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 6.4704 -7.5114 0.9400 -7.5200 -3.0373 -1.3563 0.3787 23.8934
## 9 10 11 12 13 14 15 16
## -6.8482 6.1290 -7.7876 0.8417 -0.8325 13.5499 -3.7909 1.5788
## 17 18 19
## 16.7523 -14.0342 -17.8158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2226.2646 2797.5003 0.796 0.4622
## x5.t -10.0485 13.4394 -0.748 0.4883
## x5.1 -42.4313 12.6950 -3.342 0.0205 *
## x5.2 3.8226 8.2746 0.462 0.6635
## x5.3 -5.6566 8.3337 -0.679 0.5274
## x5.4 -4.1856 8.2741 -0.506 0.6345
## x5.5 -9.4392 8.3348 -1.133 0.3088
## x5.6 8.3532 8.0866 1.033 0.3490
## x5.7 1.4311 7.6097 0.188 0.8582
## x5.8 -0.6562 7.8694 -0.083 0.9368
## x5.9 1.3827 7.6967 0.180 0.8645
## x5.10 -8.8407 7.0851 -1.248 0.2674
## x5.11 20.8805 9.9947 2.089 0.0910 .
## x5.12 24.1486 10.3485 2.334 0.0669 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.39 on 5 degrees of freedom
## Multiple R-squared: 0.8044, Adjusted R-squared: 0.296
## F-statistic: 1.582 on 13 and 5 DF, p-value: 0.321
##
## AIC and BIC values for the model:
## AIC BIC
## 1 171.2149 185.3814
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 29.6% (0.296) of variability in FFD values.
The p-value of the finite distributed lag model is 0.321
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_humid$model)
## W = 0.95588, p-value = 0.4943
The residuals in Figure 39 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.4943 (>0.05) which fails to rejects
the null hypothesis of normality.
## x5.t x5.1 x5.2 x5.3 x5.4 x5.5 x5.6 x5.7
## 4.889400 4.280613 2.057443 2.088074 1.913716 1.870104 1.829416 2.026883
## x5.8 x5.9 x5.10 x5.11 x5.12
## 2.171511 2.057850 1.759417 3.480250 3.810613
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.488 -11.978 2.279 10.274 22.195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x2.t -11.9314 17.4692 -0.683 0.520
## x2.1 0.6082 18.1984 0.033 0.974
## x2.2 7.7678 21.5484 0.360 0.731
## x2.3 14.8910 20.1316 0.740 0.487
## x2.4 11.3869 21.1229 0.539 0.609
## x2.5 20.9052 21.0371 0.994 0.359
## x2.6 -13.5718 21.6023 -0.628 0.553
## x2.7 8.7907 22.2609 0.395 0.707
## x2.8 14.7124 20.4955 0.718 0.500
## x2.9 -22.6576 20.2895 -1.117 0.307
## x2.10 20.4306 19.6362 1.040 0.338
## x2.11 2.9584 18.4209 0.161 0.878
## x2.12 -32.1126 18.2361 -1.761 0.129
##
## Residual standard error: 23.7 on 6 degrees of freedom
## Multiple R-squared: 0.9958, Adjusted R-squared: 0.9867
## F-statistic: 109.5 on 13 and 6 DF, p-value: 5.013e-06
##
## AIC and BIC values for the model:
## AIC BIC
## 1 180.3048 193.5269
The finite distributed lag model with q=12 has no
significant term at 5% significance level. The adjusted R-squared score
only explains 98.67% (0.9867) of variability in temperature
values. The p-value of the finite distributed lag model is
5.013e-06 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_temp_noint$model)
## W = 0.93973, p-value = 0.2608
The residuals in Figure 40 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.2608 (>0.05) which fails to rejects
the null hypothesis of normality.
## x2.t x2.1 x2.2 x2.3 x2.4 x2.5 x2.6 x2.7
## 927.9078 994.5955 1381.9446 1208.7558 1328.6413 1316.7081 1396.2569 1489.3199
## x2.8 x2.9 x2.10 x2.11 x2.12
## 1268.4649 1249.7225 1166.0083 1030.2099 1011.4169
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.48 -10.09 -1.64 13.63 32.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x3.t 16.3612 21.0059 0.779 0.466
## x3.1 -0.5328 23.0576 -0.023 0.982
## x3.2 17.7826 20.4537 0.869 0.418
## x3.3 23.8554 19.2595 1.239 0.262
## x3.4 -3.6371 22.7883 -0.160 0.878
## x3.5 30.8255 20.7933 1.482 0.189
## x3.6 15.4292 21.9817 0.702 0.509
## x3.7 7.3863 23.5528 0.314 0.764
## x3.8 -12.1466 21.1688 -0.574 0.587
## x3.9 -25.2130 26.8412 -0.939 0.384
## x3.10 -13.1541 20.6498 -0.637 0.548
## x3.11 1.2068 23.0108 0.052 0.960
## x3.12 31.8473 22.9846 1.386 0.215
##
## Residual standard error: 29.68 on 6 degrees of freedom
## Multiple R-squared: 0.9934, Adjusted R-squared: 0.9792
## F-statistic: 69.67 on 13 and 6 DF, p-value: 1.915e-05
##
## AIC and BIC values for the model:
## AIC BIC
## 1 188.8569 202.0791
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 97.92% (0.9792) of variability in FFD values.
The p-value of the finite distributed lag model is
1.915e-05 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rain_noint$model)
## W = 0.97658, p-value = 0.8957
The residuals in Figure 41 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.8957 (>0.05) which fails to rejects
the null hypothesis of normality.
## x3.t x3.1 x3.2 x3.3 x3.4 x3.5 x3.6 x3.7
## 50.30699 62.46667 47.94243 43.15138 60.10997 49.66828 56.38420 67.53181
## x3.8 x3.9 x3.10 x3.11 x3.12
## 55.20205 90.55487 53.90969 66.49923 66.97290
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.7286 -7.2575 0.4361 6.3411 17.7339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x4.t -10.440 24.135 -0.433 0.6804
## x4.1 -20.256 31.692 -0.639 0.5463
## x4.2 26.721 26.215 1.019 0.3474
## x4.3 35.185 27.914 1.260 0.2543
## x4.4 -19.062 15.123 -1.260 0.2543
## x4.5 20.210 18.802 1.075 0.3237
## x4.6 -64.164 20.892 -3.071 0.0219 *
## x4.7 34.732 14.785 2.349 0.0571 .
## x4.8 17.398 15.207 1.144 0.2962
## x4.9 10.527 19.676 0.535 0.6119
## x4.10 31.167 21.510 1.449 0.1975
## x4.11 -55.018 21.935 -2.508 0.0460 *
## x4.12 6.888 19.707 0.350 0.7386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.87 on 6 degrees of freedom
## Multiple R-squared: 0.9979, Adjusted R-squared: 0.9933
## F-statistic: 216.6 on 13 and 6 DF, p-value: 6.582e-07
##
## AIC and BIC values for the model:
## AIC BIC
## 1 167.3926 180.6147
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 99.33% (0.9933) of variability in FFD values.
The p-value of the finite distributed lag model is
6.582e-07 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rad_noint$model)
## W = 0.98551, p-value = 0.9869
The residuals in Figure 42 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.9869 (>0.05) which fails to rejects
the null hypothesis of normality.
## x4.t x4.1 x4.2 x4.3 x4.4 x4.5 x4.6 x4.7
## 8382.247 14383.636 9847.985 11154.732 3260.759 5044.377 6207.043 3108.375
## x4.8 x4.9 x4.10 x4.11 x4.12
## 3277.348 5456.692 6493.486 6754.328 5442.008
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.318 -6.743 -1.152 7.154 25.578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x5.t -5.846 11.975 -0.488 0.6427
## x5.1 -39.198 11.654 -3.363 0.0152 *
## x5.2 5.165 7.849 0.658 0.5349
## x5.3 -3.439 7.610 -0.452 0.6672
## x5.4 -2.002 7.564 -0.265 0.8001
## x5.5 -7.599 7.759 -0.979 0.3652
## x5.6 8.795 7.817 1.125 0.3035
## x5.7 3.596 6.886 0.522 0.6203
## x5.8 1.720 7.055 0.244 0.8156
## x5.9 4.714 6.258 0.753 0.4798
## x5.10 -7.426 6.646 -1.117 0.3066
## x5.11 20.133 9.642 2.088 0.0818 .
## x5.12 23.681 10.011 2.365 0.0559 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.79 on 6 degrees of freedom
## Multiple R-squared: 0.9974, Adjusted R-squared: 0.9916
## F-statistic: 174.6 on 13 and 6 DF, p-value: 1.253e-06
##
## AIC and BIC values for the model:
## AIC BIC
## 1 171.4808 184.7029
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 99.16% (0.9916) of variability in FFD values.
The p-value of the finite distributed lag model is
1.253e-06 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_humid_noint$model)
## W = 0.97026, p-value = 0.7816
The residuals in Figure 43 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.7816 (>0.05) which fails to rejects
the null hypothesis of normality.
## x5.t x5.1 x5.2 x5.3 x5.4 x5.5 x5.6 x5.7
## 69246.00 65603.71 29741.97 27957.85 27566.33 28917.65 29322.45 22798.57
## x5.8 x5.9 x5.10 x5.11 x5.12
## 23929.63 18821.45 21218.72 44660.48 48118.90
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.
In conclusion, the finite distributed lag models with Intercepts are not good performing models due to their lack of significant terms at 5% significance level and explainability. The finite distributed lag models without Intercepts are not good performing models due to their lack of significant terms at 5% significance level.
As finite distributed lag models suffer from multicollinearity in general due to the inclusion of of the lags of the same model. To reduce this multicollinearity, a polynomial shape is imposed on the lag distribution (Judge and Griffiths, 2000).
## q = 1 AIC = 279.6699 BIC = 285.2747 MASE = 0.6644773
## q = 2 AIC = 272.7512 BIC = 279.5877 MASE = 0.6788097
## q = 3 AIC = 262.6923 BIC = 269.3533 MASE = 0.6467993
## q = 4 AIC = 254.835 BIC = 261.3141 MASE = 0.6493527
## q = 5 AIC = 243.9959 BIC = 250.2863 MASE = 0.6548472
## q = 6 AIC = 234.0051 BIC = 240.0995 MASE = 0.6326293
## q = 7 AIC = 225.7006 BIC = 231.5909 MASE = 0.6177101
## q = 8 AIC = 218.7617 BIC = 224.4392 MASE = 0.6419454
## q = 9 AIC = 208.9273 BIC = 214.3825 MASE = 0.6378389
## q = 10 AIC = 201.1493 BIC = 206.3719 MASE = 0.6461061
## q = 11 AIC = 187.5064 BIC = 192.485 MASE = 0.6140005
## q = 12 AIC = 176.3678 BIC = 181.09 MASE = 0.5829889
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE (relative) values. This means that a value of
q=12 will be selected as the lag length of the model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -8.200 7.32 -1.1200 0.3000
## beta.1 -3.690 6.43 -0.5740 0.5840
## beta.2 -0.118 6.18 -0.0191 0.9850
## beta.3 2.530 6.28 0.4020 0.6990
## beta.4 4.240 6.45 0.6570 0.5320
## beta.5 5.020 6.52 0.7700 0.4670
## beta.6 4.870 6.40 0.7610 0.4710
## beta.7 3.790 6.07 0.6240 0.5520
## beta.8 1.780 5.61 0.3170 0.7610
## beta.9 -1.160 5.22 -0.2230 0.8300
## beta.10 -5.040 5.26 -0.9570 0.3710
## beta.11 -9.840 6.12 -1.6100 0.1520
## beta.12 -15.600 7.91 -1.9700 0.0897
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.81 -15.40 -4.26 12.70 38.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 408.4707 590.1991 0.692 0.4995
## z.t0 -8.1974 7.3248 -1.119 0.2807
## z.t1 4.9707 2.5445 1.954 0.0697 .
## z.t2 -0.4655 0.2211 -2.105 0.0525 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.7 on 15 degrees of freedom
## Multiple R-squared: 0.2651, Adjusted R-squared: 0.1181
## F-statistic: 1.804 on 3 and 15 DF, p-value: 0.1897
The polynomial distributed lag model with q=12 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains 11.81% (0.1181) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.1897 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_temp_k2$model)
## W = 0.97181, p-value = 0.8119
Figure 44
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 11.463, df = 7, p-value = 0.1197
The residuals in Figure 44 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.1197 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.8119
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -3.64 7.71 -0.472 0.651
## beta.1 -3.77 7.09 -0.532 0.611
## beta.2 -3.89 6.51 -0.598 0.569
## beta.3 -4.02 6.01 -0.669 0.525
## beta.4 -4.15 5.60 -0.741 0.483
## beta.5 -4.27 5.29 -0.808 0.446
## beta.6 -4.40 5.11 -0.861 0.418
## beta.7 -4.53 5.08 -0.891 0.402
## beta.8 -4.65 5.19 -0.897 0.400
## beta.9 -4.78 5.44 -0.880 0.408
## beta.10 -4.91 5.80 -0.846 0.425
## beta.11 -5.03 6.26 -0.804 0.448
## beta.12 -5.16 6.80 -0.759 0.473
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.468 -14.619 1.136 12.988 42.156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 743.5714 626.3309 1.187 0.252
## z.t0 -3.6391 7.7115 -0.472 0.643
## z.t1 -0.1269 0.8619 -0.147 0.885
##
## Residual standard error: 23.92 on 16 degrees of freedom
## Multiple R-squared: 0.04797, Adjusted R-squared: -0.07103
## F-statistic: 0.4031 on 2 and 16 DF, p-value: 0.6748
The polynomial distributed lag model with q=12 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.71% (-0.07103) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.6748 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_temp_k1$model)
## W = 0.9799, p-value = 0.9412
Figure 45
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 5.2932, df = 6, p-value = 0.5068
The residuals in Figure 45 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.5068 (>0.05) which confirms the absence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.9412
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = 281.9769 BIC = 287.5817 MASE = 0.6758928
## q = 2 AIC = 273.8231 BIC = 280.6596 MASE = 0.6752737
## q = 3 AIC = 263.1955 BIC = 269.8565 MASE = 0.633195
## q = 4 AIC = 254.6528 BIC = 261.132 MASE = 0.6352745
## q = 5 AIC = 247.2862 BIC = 253.5766 MASE = 0.6312437
## q = 6 AIC = 238.2623 BIC = 244.3567 MASE = 0.6392046
## q = 7 AIC = 230.1033 BIC = 235.9936 MASE = 0.6407467
## q = 8 AIC = 221.8263 BIC = 227.5038 MASE = 0.653508
## q = 9 AIC = 211.4687 BIC = 216.9239 MASE = 0.6498614
## q = 10 AIC = 201.0167 BIC = 206.2393 MASE = 0.6248857
## q = 11 AIC = 188.3055 BIC = 193.2841 MASE = 0.5934777
## q = 12 AIC = 181.9729 BIC = 186.6951 MASE = 0.6685301
From the results above, a lag length of 11 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=11 will be selected as the lag length of the
model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -2.950 9.66 -0.3050 0.767
## beta.1 -0.302 6.79 -0.0445 0.965
## beta.2 1.600 5.30 0.3010 0.770
## beta.3 2.750 5.04 0.5450 0.599
## beta.4 3.150 5.26 0.6000 0.564
## beta.5 2.810 5.34 0.5260 0.611
## beta.6 1.720 5.05 0.3420 0.740
## beta.7 -0.111 4.45 -0.0249 0.981
## beta.8 -2.690 4.09 -0.6590 0.526
## beta.9 -6.020 5.03 -1.2000 0.261
## beta.10 -10.100 7.64 -1.3200 0.219
## beta.11 -14.900 11.50 -1.2900 0.228
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.660 -11.415 -1.195 18.459 37.888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 264.5629 99.3182 2.664 0.017 *
## z.t0 -2.9484 9.6630 -0.305 0.764
## z.t1 3.0195 4.2630 0.708 0.489
## z.t2 -0.3735 0.4013 -0.931 0.366
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.34 on 16 degrees of freedom
## Multiple R-squared: 0.0993, Adjusted R-squared: -0.06958
## F-statistic: 0.588 on 3 and 16 DF, p-value: 0.6317
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.69% (-0.06958) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.6317 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_rain_k2$model)
## W = 0.96682, p-value = 0.6867
Figure 46
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 8.9029, df = 7, p-value = 0.2597
The residuals in Figure 46 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.6867 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.2597
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 3.2500 6.97 0.4670 0.652
## beta.1 2.4200 6.10 0.3970 0.700
## beta.2 1.5900 5.28 0.3010 0.770
## beta.3 0.7610 4.55 0.1670 0.871
## beta.4 -0.0695 3.94 -0.0176 0.986
## beta.5 -0.9000 3.54 -0.2540 0.805
## beta.6 -1.7300 3.41 -0.5080 0.624
## beta.7 -2.5600 3.57 -0.7170 0.491
## beta.8 -3.3900 4.00 -0.8480 0.418
## beta.9 -4.2200 4.62 -0.9140 0.385
## beta.10 -5.0500 5.36 -0.9420 0.371
## beta.11 -5.8800 6.19 -0.9510 0.367
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.512 -11.406 -4.063 18.739 44.638
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 241.9881 95.9293 2.523 0.0219 *
## z.t0 3.2531 6.9696 0.467 0.6466
## z.t1 -0.8307 1.0226 -0.812 0.4278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.25 on 17 degrees of freedom
## Multiple R-squared: 0.05055, Adjusted R-squared: -0.06115
## F-statistic: 0.4526 on 2 and 17 DF, p-value: 0.6434
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.61% (-0.06115) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.6434 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_rain_k1$model)
## W = 0.97334, p-value = 0.8232
Figure 47
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 5.7242, df = 6, p-value = 0.4548
The residuals in Figure 47 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.4548 (>0.05) which confirms the absence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.8232
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = 281.571 BIC = 287.1758 MASE = 0.6579604
## q = 2 AIC = 273.2458 BIC = 280.0823 MASE = 0.6672639
## q = 3 AIC = 263.5785 BIC = 270.2395 MASE = 0.622861
## q = 4 AIC = 254.5356 BIC = 261.0148 MASE = 0.6027035
## q = 5 AIC = 246.071 BIC = 252.3614 MASE = 0.6504138
## q = 6 AIC = 239.3461 BIC = 245.4405 MASE = 0.6488056
## q = 7 AIC = 230.74 BIC = 236.6303 MASE = 0.6443999
## q = 8 AIC = 222.2164 BIC = 227.8938 MASE = 0.6629663
## q = 9 AIC = 212.7091 BIC = 218.1644 MASE = 0.6697067
## q = 10 AIC = 203.4282 BIC = 208.6509 MASE = 0.6463697
## q = 11 AIC = 189.6827 BIC = 194.6614 MASE = 0.6384276
## q = 12 AIC = 181.6977 BIC = 186.4199 MASE = 0.6703218
From the results above, a lag length of 11 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=11 will be selected as the lag length of the
model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -1.130 7.98 -0.1420 0.890
## beta.1 0.198 5.29 0.0374 0.971
## beta.2 1.240 3.60 0.3430 0.739
## beta.3 1.980 3.07 0.6470 0.534
## beta.4 2.440 3.23 0.7540 0.470
## beta.5 2.600 3.43 0.7580 0.468
## beta.6 2.470 3.39 0.7290 0.485
## beta.7 2.050 3.18 0.6440 0.536
## beta.8 1.330 3.26 0.4080 0.693
## beta.9 0.325 4.31 0.0755 0.941
## beta.10 -0.975 6.46 -0.1510 0.883
## beta.11 -2.570 9.51 -0.2700 0.793
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.465 -12.733 -5.073 15.051 46.539
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 58.2976 521.6177 0.112 0.912
## z.t0 -1.1343 7.9843 -0.142 0.889
## z.t1 1.4782 3.4488 0.429 0.674
## z.t2 -0.1462 0.3196 -0.458 0.653
##
## Residual standard error: 24.16 on 16 degrees of freedom
## Multiple R-squared: 0.03509, Adjusted R-squared: -0.1458
## F-statistic: 0.1939 on 3 and 16 DF, p-value: 0.899
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.14% (-0.1458) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.899 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_rad_k2$model)
## W = 0.98619, p-value = 0.9879
Figure 48
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 6.2658, df = 7, p-value = 0.5091
The residuals in Figure 48 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.5091 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.9879
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 1.88 4.40 0.428 0.679
## beta.1 1.82 3.84 0.472 0.648
## beta.2 1.75 3.34 0.524 0.613
## beta.3 1.69 2.93 0.576 0.579
## beta.4 1.62 2.63 0.616 0.553
## beta.5 1.56 2.50 0.622 0.549
## beta.6 1.49 2.56 0.581 0.575
## beta.7 1.43 2.81 0.508 0.624
## beta.8 1.36 3.19 0.427 0.680
## beta.9 1.29 3.66 0.354 0.732
## beta.10 1.23 4.20 0.293 0.776
## beta.11 1.16 4.78 0.244 0.813
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.881 -13.395 -3.902 16.646 43.284
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62.69652 439.02324 -0.143 0.888
## z.t0 1.88137 4.39971 0.428 0.674
## z.t1 -0.06515 0.69997 -0.093 0.927
##
## Residual standard error: 23.59 on 17 degrees of freedom
## Multiple R-squared: 0.02246, Adjusted R-squared: -0.09254
## F-statistic: 0.1953 on 2 and 17 DF, p-value: 0.8244
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.09% (-0.09254) of
variability in FFD values. The p-value of the polynomial distributed lag
model is 0.8244 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_rad_k1$model)
## W = 0.9905, p-value = 0.9987
Figure 49
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 5.7074, df = 6, p-value = 0.4568
The residuals in Figure 49 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.4568 (>0.05) which confirms the absence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.9987
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = 281.6547 BIC = 287.2595 MASE = 0.6748273
## q = 2 AIC = 273.1762 BIC = 280.0127 MASE = 0.6669837
## q = 3 AIC = 258.9412 BIC = 265.6022 MASE = 0.6054505
## q = 4 AIC = 252.2204 BIC = 258.6996 MASE = 0.6198356
## q = 5 AIC = 244.2742 BIC = 250.5647 MASE = 0.6376046
## q = 6 AIC = 235.0038 BIC = 241.0982 MASE = 0.6305211
## q = 7 AIC = 227.0295 BIC = 232.9198 MASE = 0.6183031
## q = 8 AIC = 218.1006 BIC = 223.7781 MASE = 0.600106
## q = 9 AIC = 206.7468 BIC = 212.202 MASE = 0.6166688
## q = 10 AIC = 200.4011 BIC = 205.6237 MASE = 0.652098
## q = 11 AIC = 187.3064 BIC = 192.2851 MASE = 0.6058448
## q = 12 AIC = 175.329 BIC = 180.0512 MASE = 0.5564647
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE (relative) values. This means that a value of
q=12 will be selected as the lag length of the model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -9.6000 4.57 -2.1000 0.0736
## beta.1 -7.9100 3.75 -2.1100 0.0729
## beta.2 -6.2400 3.25 -1.9200 0.0958
## beta.3 -4.6200 3.01 -1.5300 0.1690
## beta.4 -3.0300 2.96 -1.0200 0.3410
## beta.5 -1.4700 2.98 -0.4940 0.6360
## beta.6 0.0495 2.99 0.0165 0.9870
## beta.7 1.5300 2.98 0.5150 0.6230
## beta.8 2.9800 2.96 1.0100 0.3470
## beta.9 4.4000 2.99 1.4700 0.1850
## beta.10 5.7700 3.18 1.8200 0.1120
## beta.11 7.1200 3.61 1.9700 0.0896
## beta.12 8.4200 4.36 1.9300 0.0949
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.429 -12.764 -4.294 13.443 37.525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 456.43947 2749.97230 0.166 0.8704
## z.t0 -9.60404 4.56730 -2.103 0.0528 .
## z.t1 1.71555 1.41059 1.216 0.2427
## z.t2 -0.01777 0.10502 -0.169 0.8679
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.12 on 15 degrees of freedom
## Multiple R-squared: 0.3042, Adjusted R-squared: 0.1651
## F-statistic: 2.186 on 3 and 15 DF, p-value: 0.1321
The polynomial distributed lag model with q=12 has none
of its scores at 5% level of significance. The adjusted R-squared score
only explains 16.51% (0.1651) of variability in FFD values.
The p-value of the polynomial distributed lag model is
0.1321 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_humid_k2$model)
## W = 0.96926, p-value = 0.7616
Figure 50
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 12.031, df = 7, p-value = 0.09954
The residuals in Figure 50 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.7616 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.09954
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -9.290 4.05 -2.290 0.0555
## beta.1 -7.790 3.58 -2.180 0.0657
## beta.2 -6.300 3.13 -2.010 0.0843
## beta.3 -4.800 2.73 -1.750 0.1230
## beta.4 -3.300 2.41 -1.370 0.2130
## beta.5 -1.800 2.18 -0.826 0.4360
## beta.6 -0.302 2.09 -0.145 0.8890
## beta.7 1.200 2.15 0.557 0.5950
## beta.8 2.700 2.35 1.150 0.2890
## beta.9 4.190 2.66 1.580 0.1580
## beta.10 5.690 3.04 1.870 0.1030
## beta.11 7.190 3.48 2.070 0.0774
## beta.12 8.690 3.95 2.200 0.0636
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.80 -12.41 -3.61 12.99 37.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 582.2831 2565.8952 0.227 0.8234
## z.t0 -9.2926 4.0512 -2.294 0.0357 *
## z.t1 1.4985 0.5685 2.636 0.0180 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.46 on 16 degrees of freedom
## Multiple R-squared: 0.3029, Adjusted R-squared: 0.2158
## F-statistic: 3.476 on 2 and 16 DF, p-value: 0.05577
The polynomial distributed lag model with q=12 has all
of its scores at 5% level of significance. The adjusted R-squared score
only explains 21.58% (0.2158) of variability in FFD values.
The p-value of the polynomial distributed lag model is
0.05577 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_ffd_humid_k1$model)
## W = 0.97238, p-value = 0.8228
Figure 51
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 9.1529, df = 6, p-value = 0.1652
The residuals in Figure 51 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.8228 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.1652
(>0.05) which fails to reject the null hypothesis of normality.
In conclusion, the polynomial distributed lag models were not successful in capturing the autocorrelation and seasonality in the series, nor are they excellent performing models due to its lack of explainability.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.755 -12.972 -4.079 17.329 58.541
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.87216 608.22407 -0.033 0.974
## Y.1 0.03846 0.25384 0.152 0.881
## X.t 23.22042 61.08917 0.380 0.707
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 1.429 0.242
## Wu-Hausman 1 26 0.582 0.452
## Sargan 0 NA NA NA
##
## Residual standard error: 28.38 on 27 degrees of freedom
## Multiple R-Squared: -0.3272, Adjusted R-squared: -0.4255
## Wald test: 0.07269 on 2 and 27 DF, p-value: 0.9301
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains -0.42%
(-0.4255) of variability in FFD values. The p-value of the Koyck model
is 0.9301 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.242 which means that the model at the first stage of
least squares fitting is not significant at 5% level of significance.
The “Wu-Hausman” test also resulted in a p-value of 0.452
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_ffd_temp_dlm$model)
## W = 0.98351, p-value = 0.9093
Figure 52
The residuals in Figure 52 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.9093 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.280988 1.280988
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.755 -12.972 -4.079 17.329 58.541
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.87216 608.22407 -0.033 0.974
## Y.1 0.03846 0.25384 0.152 0.881
## X.t 23.22042 61.08917 0.380 0.707
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 1.429 0.242
## Wu-Hausman 1 26 0.582 0.452
## Sargan 0 NA NA NA
##
## Residual standard error: 28.38 on 27 degrees of freedom
## Multiple R-Squared: -0.3272, Adjusted R-squared: -0.4255
## Wald test: 0.07269 on 2 and 27 DF, p-value: 0.9301
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains -0.42%
(-0.4255) of variability in FFD values. The p-value of the Koyck model
is 0.9301 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.242 which means that the model at the first stage of
least squares fitting is not significant at 5% level of significance.
The “Wu-Hausman” test also resulted in a p-value of 0.452
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_ffd_rain_dlm$model)
## W = 0.98582, p-value = 0.9504
Figure 53
The residuals in Figure 53 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.9504 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.017508 1.017508
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.229 -19.662 3.956 16.232 54.756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 418.31843 384.12678 1.089 0.286
## Y.1 -0.03254 0.20729 -0.157 0.876
## X.t -13.87153 25.49529 -0.544 0.591
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 7.199 0.0123 *
## Wu-Hausman 1 26 0.538 0.4698
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.54 on 27 degrees of freedom
## Multiple R-Squared: -0.07436, Adjusted R-squared: -0.1539
## Wald test: 0.1486 on 2 and 27 DF, p-value: 0.8626
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains -0.15%
(-0.1539) of variability in FFD values. The p-value of the Koyck model
is 0.8626 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.0123 which means that the model at the first stage of
least squares fitting is significant at 5% level of significance. The
“Wu-Hausman” test also resulted in a p-value of 0.4698
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_ffd_rad_dlm$model)
## W = 0.98748, p-value = 0.9716
Figure 54
The residuals in Figure 54 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.9716 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.055257 1.055257
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.787 -14.896 -3.024 15.673 55.019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1898.18932 4211.29973 0.451 0.656
## Y.1 -0.05904 0.25016 -0.236 0.815
## X.t -17.72952 44.24103 -0.401 0.692
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 0.570 0.457
## Wu-Hausman 1 26 0.121 0.730
## Sargan 0 NA NA NA
##
## Residual standard error: 27 on 27 degrees of freedom
## Multiple R-Squared: -0.2016, Adjusted R-squared: -0.2906
## Wald test: 0.0808 on 2 and 27 DF, p-value: 0.9226
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains -0.29%
(-0.2906) of variability in FFD values. The p-value of the Koyck model
is 0.9226 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.457 which means that the model at the first stage of
least squares fitting is not significant at 5% level of significance.
The “Wu-Hausman” test also resulted in a p-value of 0.730
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_ffd_humid_dlm$model)
## W = 0.98176, p-value = 0.8702
Figure 55
The residuals in Figure 55 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
not significant. The histogram of standardised residuals contains slight
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.8702
(>0.05) which fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.374145 1.374145
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.
## p = 1 q = 1 AIC = 285.0202 BIC = 294.8286 MASE = 0.6413659
## p = 1 q = 2 AIC = 278.1612 BIC = 289.0995 MASE = 0.6569318
## p = 1 q = 3 AIC = 271.2682 BIC = 283.2581 MASE = 0.6415733
## p = 1 q = 4 AIC = 264.4753 BIC = 277.4336 MASE = 0.6252734
## p = 1 q = 5 AIC = 257.6021 BIC = 271.4411 MASE = 0.6025462
## p = 2 q = 1 AIC = 279.4094 BIC = 291.7151 MASE = 0.6670564
## p = 2 q = 2 AIC = 281.2235 BIC = 294.8965 MASE = 0.6629674
## p = 2 q = 3 AIC = 274.0829 BIC = 288.7372 MASE = 0.6480681
## p = 2 q = 4 AIC = 267.1828 BIC = 282.7328 MASE = 0.6160513
## p = 2 q = 5 AIC = 260.6971 BIC = 277.0523 MASE = 0.613472
## p = 3 q = 1 AIC = 271.5688 BIC = 286.2231 MASE = 0.6153833
## p = 3 q = 2 AIC = 273.4313 BIC = 289.4178 MASE = 0.6159004
## p = 3 q = 3 AIC = 275.3477 BIC = 292.6664 MASE = 0.6150322
## p = 3 q = 4 AIC = 268.0854 BIC = 286.2271 MASE = 0.5856246
## p = 3 q = 5 AIC = 261.9042 BIC = 280.7757 MASE = 0.6007602
## p = 4 q = 1 AIC = 265.3386 BIC = 282.1844 MASE = 0.5859466
## p = 4 q = 2 AIC = 266.1178 BIC = 284.2595 MASE = 0.5711558
## p = 4 q = 3 AIC = 268.0671 BIC = 287.5046 MASE = 0.5705796
## p = 4 q = 4 AIC = 269.0287 BIC = 289.762 MASE = 0.567119
## p = 4 q = 5 AIC = 262.5279 BIC = 283.9155 MASE = 0.5766056
## p = 5 q = 1 AIC = 247.989 BIC = 266.8604 MASE = 0.4786405
## p = 5 q = 2 AIC = 249.9562 BIC = 270.0858 MASE = 0.4789554
## p = 5 q = 3 AIC = 249.5842 BIC = 270.9719 MASE = 0.4447671
## p = 5 q = 4 AIC = 244.8735 BIC = 267.5192 MASE = 0.3751994
## p = 5 q = 5 AIC = 245.1251 BIC = 269.029 MASE = 0.3647243
From the autoregressive distributed lag model above, it can be
observed that the model with the lowest AIC value is
p=5, q=4, the lowest BIC value is p=5, q=1,
and the lowest MASE is p=5, q=5. However, as we are after
the lowest MASE values, the models that will be selected are
ARDL(5,3), ARDL(5,4), and
ARDL(5,5)
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.476 -17.398 -1.093 17.228 41.667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2283.90942 2412.18598 0.947 0.358
## X.t -3.16791 6.85976 -0.462 0.650
## X.1 -3.59211 6.53554 -0.550 0.590
## X.2 -4.37282 7.09358 -0.616 0.546
## X.3 -5.85138 7.06229 -0.829 0.420
## X.4 -5.63846 6.76624 -0.833 0.417
## X.5 1.36698 7.73994 0.177 0.862
## Y.1 -0.06381 0.25594 -0.249 0.806
## Y.2 0.04109 0.26854 0.153 0.880
## Y.3 -0.04817 0.27124 -0.178 0.861
##
## Residual standard error: 28.68 on 16 degrees of freedom
## Multiple R-squared: 0.1273, Adjusted R-squared: -0.3635
## F-statistic: 0.2594 on 9 and 16 DF, p-value: 0.9772
The autoregressive distributed lag model ardl_5.3_task2,
has p=5 and q=3. All of its coefficients are
not significant at 5% level of significance. The adjusted R-squared
model explains -36.35% (-0.3635) of the variability in
mortality rates. The p value of ardl_5.3_task2 is 0.9772
(>0.05) hence, it is not significant.
Figure 56
##
## Breusch-Godfrey test for serial correlation of order up to 13
##
## data: Residuals
## LM test = 23.706, df = 13, p-value = 0.03394
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.3_task2$model)
## W = 0.96419, p-value = 0.4806
The residuals in Figure 56 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.03394 (<0.05) which confirms the
presence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.4806 (>0.05) which fails to reject the null hypothesis
of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.593682 1.442035 1.679153 1.657011 1.377161 1.526191 1.171215 1.185184
## L(y.t, 3)
## 1.209149
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.579 -16.818 -1.707 15.937 41.771
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2132.74389 2520.76308 0.846 0.411
## X.t -2.43787 7.37352 -0.331 0.746
## X.1 -3.84154 6.76325 -0.568 0.578
## X.2 -4.71055 7.36447 -0.640 0.532
## X.3 -5.41486 7.37709 -0.734 0.474
## X.4 -5.41607 6.99145 -0.775 0.451
## X.5 1.92651 8.12940 0.237 0.816
## Y.1 -0.05323 0.26512 -0.201 0.844
## Y.2 0.02584 0.27985 0.092 0.928
## Y.3 -0.04807 0.27905 -0.172 0.866
## Y.4 0.09635 0.28192 0.342 0.737
##
## Residual standard error: 29.51 on 15 degrees of freedom
## Multiple R-squared: 0.1341, Adjusted R-squared: -0.4432
## F-statistic: 0.2323 on 10 and 15 DF, p-value: 0.9877
The autoregressive distributed lag model ardl_5.4_task2,
has p=5 and q=3. All of its coefficients are
not significant at 5% level of significance. The adjusted R-squared
model explains -44.32% (-0.4432) of the variability in
mortality rates. The p value of ardl_5.4_task2 is 0.9877
(>0.05) hence, it is not significant.
Figure 57
##
## Breusch-Godfrey test for serial correlation of order up to 14
##
## data: Residuals
## LM test = 22.656, df = 14, p-value = 0.0661
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.4_task2$model)
## W = 0.9653, p-value = 0.5063
The residuals in Figure 57 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.0661 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.5063 (>0.05) which fails to reject the null hypothesis
of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.739695 1.459027 1.709942 1.708221 1.389196 1.590710 1.187388 1.216093
## L(y.t, 3) L(y.t, 4)
## 1.209150 1.286556
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.910 -11.825 0.646 19.287 36.583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2317.41502 2503.84586 0.926 0.370
## X.t -4.24646 7.48261 -0.568 0.579
## X.1 -1.91941 6.91722 -0.277 0.785
## X.2 -6.46056 7.46285 -0.866 0.401
## X.3 -7.07101 7.45821 -0.948 0.359
## X.4 -4.91623 6.94382 -0.708 0.491
## X.5 2.26760 8.06321 0.281 0.783
## Y.1 -0.09393 0.26525 -0.354 0.729
## Y.2 0.05404 0.27851 0.194 0.849
## Y.3 -0.08503 0.27852 -0.305 0.765
## Y.4 0.07639 0.27999 0.273 0.789
## Y.5 0.32209 0.28594 1.126 0.279
##
## Residual standard error: 29.25 on 14 degrees of freedom
## Multiple R-squared: 0.206, Adjusted R-squared: -0.4178
## F-statistic: 0.3303 on 11 and 14 DF, p-value: 0.9641
The autoregressive distributed lag model ardl_5.5_task2,
has p=5 and q=5. All of its coefficients are
not are not significant at 5% level of significance. The adjusted
R-squared model explains -41.78% (-0.4178) of the
variability in mortality rates. The p value of
ardl_5.5_task2 is 0.9641 (>0.05) hence, it is not
significant.
The residuals in Figure 57 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.0661 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.5063 (>0.05) which fails to reject the null hypothesis
of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 1.823662 1.553569 1.787404 1.777290 1.394892 1.592956 1.209830 1.225999
## L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 1.226165 1.291730 1.314591
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.
| Model | AIC | BIC | MASE |
|---|---|---|---|
| Finite DLM /w Intercept (Radiation) | 165.1432 | 179.3097 | 0.2509081 |
| Finite DLM /w Intercept (Humidity) | 171.2149 | 185.3814 | 0.2662230 |
| Finite DLM /wo Intercept (Radiation) | 167.3926 | 180.6147 | 0.2763708 |
| Finite DLM /wo Intercept (Humidity) | 171.4808 | 184.7029 | 0.2941178 |
| Finite DLM /w Intercept (Rainfall) | 185.1533 | 199.0936 | 0.3819204 |
| Finite DLM /w Intercept (Temperature) | 181.9935 | 196.1600 | 0.4122077 |
| Finite DLM /wo Intercept (Temperature) | 180.3048 | 193.5269 | 0.4278512 |
| Finite DLM /wo Intercept (Rainfall) | 188.8569 | 202.0791 | 0.4767021 |
| Poly DLM k=1 (Humidity) | 173.3652 | 177.1430 | 0.5550638 |
| Poly DLM k=2 (Humidity) | 175.3290 | 180.0512 | 0.5564647 |
| ARDL(5,5) | 259.2288 | 275.5840 | 0.5723147 |
| Poly DLM k=2 (Temperature) | 176.3678 | 181.0900 | 0.5829889 |
| Poly DLM k=2 (Rainfall) | 188.3055 | 193.2841 | 0.5934777 |
| ARDL(5,4) | 259.4845 | 274.5816 | 0.6191304 |
| ARDL(5,3) | 257.6861 | 271.5252 | 0.6269468 |
| Poly DLM k=1 (Rainfall) | 187.3596 | 191.3425 | 0.6290417 |
| Poly DLM k=2 (Radiation) | 189.6827 | 194.6614 | 0.6384276 |
| Poly DLM k=1 (Radiation) | 187.9427 | 191.9256 | 0.6397641 |
| Poly DLM k=1 (Temperature) | 179.2866 | 183.0644 | 0.6617769 |
| Koyck (Radiation) | 284.3792 | 289.9840 | 0.7136301 |
| Koyck (Rainfall) | 288.9325 | 294.5373 | 0.7267872 |
| Koyck (Humidity) | 287.7361 | 293.3409 | 0.7335956 |
| Koyck (Temperature) | 290.7199 | 296.3247 | 0.7927338 |
The table above depicts the accuracy measures (AIC, BIC, and MASE) of
the fitted models thus far. It can be seen that
Finite DLM /w Intercept (Radiation) has the lowest MASE
value.
Was unable to fit due to low frequency of the data.
In conclusion, based on all the previous observations made - the model that should be used to forecast the FFD 4 years ahead should be the Finite DLM /w Intercept (Radiation) model based on the results of its residual analysis and its low MASE value.
To conclude this exploration, the frequency was too low so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the Finite DLM /w Intercept (Radiation) was left as the best model to use for forecasting.
Task 3a : Forecast 3 years ahead using univariate regressors, forecasting for each of the best models within the dataset.
Task 3b: Analysis to obtain the 3 year ahead forecast based on the drought period for Australia that occured from 1996 to 2009.
First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, the model will be fitted with a DLM, ARDL, Koyck, and DYNLM to find the best 3 years ahead forecast based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.
## Rows: 31 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): Year, RBO, Temperature, Rainfall, Radiation, RelHumidity
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Figure 58
Figure 58 above depicts the time series plots for the Rank-based Order Similarity. This dataset shares the same factors of Temperature, Rainfall, Radiation, and Relative Humidity as previously explored in Task 2 - hence they will not be repeated for redundancy purposes. The one time series that was not explored is the RBO time series.
Figure 59
Figure 59 shows the changes of Rank-based Order similarity over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1995 where there was a large drop that changed the variance.
##
## Augmented Dickey-Fuller Test
##
## data: rbo_rbo_ts
## Dickey-Fuller = -2.0545, Lag order = 3, p-value = 0.5518
## alternative hypothesis: stationary
Figure 60
Figure 60 shows the ACF plot for the RBO series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.5518 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the FFD time series is not a stationary time series that contains seasonality.
## Year RBO Temperature Rainfall Radiation
## Year 1.0000000 -0.5635171 0.148410676 -0.1752091 0.11881829
## RBO -0.5635171 1.0000000 0.261000659 0.3932282 -0.31736018
## Temperature 0.1484107 0.2610007 1.000000000 0.3933255 -0.24096625
## Rainfall -0.1752091 0.3932282 0.393325545 1.0000000 -0.58131610
## Radiation 0.1188183 -0.3173602 -0.240966245 -0.5813161 1.00000000
## RelHumidity 0.2063558 -0.1776349 0.009646021 0.3384610 -0.05520965
## RelHumidity
## Year 0.206355767
## RBO -0.177634864
## Temperature 0.009646021
## Rainfall 0.338461007
## Radiation -0.055209652
## RelHumidity 1.000000000
From the results of the Correlation Coefficient above, the highest
correlation that can be observed is between RBO and Rainfall at
0.3932282. However, due to the low correlation between most
of the variables, each variable will be used as independent variables
with RBO being the dependent variable.
## q = 1 AIC = -101.8617 BIC = -96.2569 MASE = 0.9239038
## q = 2 AIC = -95.49894 BIC = -88.66246 MASE = 1.032564
## q = 3 AIC = -96.76727 BIC = -88.77404 MASE = 1.033663
## q = 4 AIC = -92.75653 BIC = -83.68567 MASE = 1.005373
## q = 5 AIC = -91.46337 BIC = -81.3986 MASE = 0.8594175
## q = 6 AIC = -85.74127 BIC = -74.77139 MASE = 0.8103361
## q = 7 AIC = -82.04015 BIC = -70.25962 MASE = 0.7518958
## q = 8 AIC = -83.28717 BIC = -70.79674 MASE = 0.6497633
## q = 9 AIC = -88.10651 BIC = -75.014 MASE = 0.5120906
## q = 10 AIC = -87.92965 BIC = -74.35085 MASE = 0.458588
## q = 11 AIC = -80.42149 BIC = -66.48124 MASE = 0.4388714
## q = 12 AIC = -87.10828 BIC = -72.9417 MASE = 0.3522387
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=12 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -0.0184315 0.0196039 0.0067008 -0.0005057 -0.0121655 0.0037651 -0.0114122
## 8 9 10 11 12 13 14
## 0.0150263 -0.0088538 -0.0034608 -0.0043610 0.0061125 0.0043009 -0.0160637
## 15 16 17 18 19
## -0.0011766 0.0047237 0.0189244 0.0097069 -0.0124340
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.341e+00 6.334e-01 2.117 0.0879 .
## x2.t 1.785e-02 1.599e-02 1.116 0.3152
## x2.1 4.221e-02 1.838e-02 2.296 0.0701 .
## x2.2 -4.280e-02 2.198e-02 -1.948 0.1090
## x2.3 -3.798e-02 1.866e-02 -2.035 0.0975 .
## x2.4 1.831e-02 1.962e-02 0.933 0.3935
## x2.5 -5.120e-03 2.113e-02 -0.242 0.8181
## x2.6 -6.533e-03 2.179e-02 -0.300 0.7764
## x2.7 -8.192e-03 2.094e-02 -0.391 0.7118
## x2.8 5.276e-05 1.896e-02 0.003 0.9979
## x2.9 -3.263e-02 1.869e-02 -1.745 0.1414
## x2.10 -8.331e-03 1.891e-02 -0.441 0.6779
## x2.11 1.898e-03 1.738e-02 0.109 0.9173
## x2.12 -5.800e-03 1.687e-02 -0.344 0.7449
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02164 on 5 degrees of freedom
## Multiple R-squared: 0.7848, Adjusted R-squared: 0.2252
## F-statistic: 1.402 on 13 and 5 DF, p-value: 0.3754
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -87.10828 -72.9417
The finite distributed lag model with q=12 has no
significant term at 5% significance level. The adjusted R-squared score
explains 22.52% (0.2252) of variability in FFD values. The
p-value of the finite distributed lag model is 0.3754
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_temp_rbo$model)
## W = 0.96272, p-value = 0.6271
The residuals in Figure 36 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.1253 (>0.05) which fails to rejects
the null hypothesis of normality.
## x2.t x2.1 x2.2 x2.3 x2.4 x2.5 x2.6 x2.7
## 1.682976 1.813251 2.088240 1.634591 1.721548 1.946444 2.381222 2.070046
## x2.8 x2.9 x2.10 x2.11 x2.12
## 2.048046 1.625311 1.667104 1.418715 1.300454
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = -100.898 BIC = -95.29319 MASE = 0.9417954
## q = 2 AIC = -96.70956 BIC = -89.87308 MASE = 0.9993747
## q = 3 AIC = -97.19966 BIC = -89.20643 MASE = 0.9796852
## q = 4 AIC = -90.46187 BIC = -81.39101 MASE = 1.038827
## q = 5 AIC = -87.24242 BIC = -77.17765 MASE = 0.925677
## q = 6 AIC = -82.31788 BIC = -71.348 MASE = 0.8543964
## q = 7 AIC = -77.98405 BIC = -66.20351 MASE = 0.829337
## q = 8 AIC = -76.81922 BIC = -64.32879 MASE = 0.7233794
## q = 9 AIC = -80.79432 BIC = -67.70181 MASE = 0.6205897
## q = 10 AIC = -76.93255 BIC = -63.35376 MASE = 0.585617
## q = 11 AIC = -77.84858 BIC = -63.90833 MASE = 0.4621806
## q = 12 AIC = -79.63373 BIC = -65.46714 MASE = 0.4240236
From the results above, a lag length of 12 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=12 will be selected as the lag length of the
model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -0.028368 -0.006823 0.012033 0.018530 0.006442 0.014196 -0.018606 -0.010792
## 9 10 11 12 13 14 15 16
## -0.002000 0.010196 -0.001337 0.007104 -0.001875 -0.009690 -0.019581 0.018543
## 17 18 19
## -0.001563 -0.006339 0.019931
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8035567 0.1444702 5.562 0.00258 **
## x3.t 0.0190632 0.0234786 0.812 0.45373
## x3.1 -0.0055430 0.0206666 -0.268 0.79925
## x3.2 0.0145805 0.0230040 0.634 0.55405
## x3.3 -0.0222519 0.0170995 -1.301 0.24988
## x3.4 -0.0025011 0.0239167 -0.105 0.92078
## x3.5 -0.0100115 0.0184670 -0.542 0.61101
## x3.6 -0.0002192 0.0216238 -0.010 0.99230
## x3.7 -0.0161457 0.0216731 -0.745 0.48981
## x3.8 0.0138944 0.0192116 0.723 0.50195
## x3.9 -0.0366400 0.0243706 -1.503 0.19305
## x3.10 0.0124757 0.0212846 0.586 0.58324
## x3.11 -0.0314699 0.0204542 -1.539 0.18453
## x3.12 0.0253655 0.0231360 1.096 0.32289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02634 on 5 degrees of freedom
## Multiple R-squared: 0.681, Adjusted R-squared: -0.1482
## F-statistic: 0.8213 on 13 and 5 DF, p-value: 0.6451
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -79.63373 -65.46714
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains -14.82% (-0.1482) of variability in FFD
values. The p-value of the finite distributed lag model is
0.6451 (>0.05) hence, it is not statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rain_rbo$model)
## W = 0.95967, p-value = 0.5659
The residuals in Figure 62 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are significant. The histogram of
standardised residuals contains violations in normality assumptions.
This is also reinforced by the Shapiro-Wilk normality test with a value
of 0.5659 (>0.05) which fails to rejects the null
hypothesis of normality.
## x3.t x3.1 x3.2 x3.3 x3.4 x3.5 x3.6 x3.7
## 2.312923 1.985240 2.481193 1.480135 2.811952 1.615976 2.186146 2.094572
## x3.8 x3.9 x3.10 x3.11 x3.12
## 1.639937 2.344740 1.782273 1.617890 2.066747
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = -97.71113 BIC = -92.10634 MASE = 1.063029
## q = 2 AIC = -91.50708 BIC = -84.67061 MASE = 1.174709
## q = 3 AIC = -92.38658 BIC = -84.39335 MASE = 1.128463
## q = 4 AIC = -86.43017 BIC = -77.35931 MASE = 1.199809
## q = 5 AIC = -83.32223 BIC = -73.25746 MASE = 1.071856
## q = 6 AIC = -81.85403 BIC = -70.88414 MASE = 0.9483202
## q = 7 AIC = -79.04997 BIC = -67.26943 MASE = 0.8584745
## q = 8 AIC = -80.65686 BIC = -68.16642 MASE = 0.707333
## q = 9 AIC = -78.4524 BIC = -65.35989 MASE = 0.6617079
## q = 10 AIC = -78.06879 BIC = -64.49 MASE = 0.603729
## q = 11 AIC = -72.88766 BIC = -58.94741 MASE = 0.5583271
## q = 12 AIC = -65.19474 BIC = -51.02816 MASE = 0.6477083
From the results above, a lag length of 11 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=11 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.03300 -0.01641 0.00607 0.01283 0.03564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.579343 0.795553 0.728 0.490
## x4.t -0.049240 0.042970 -1.146 0.289
## x4.1 0.040767 0.049104 0.830 0.434
## x4.2 0.013172 0.047571 0.277 0.790
## x4.3 -0.008875 0.029431 -0.302 0.772
## x4.4 0.005986 0.028726 0.208 0.841
## x4.5 -0.015280 0.032443 -0.471 0.652
## x4.6 0.027281 0.028464 0.958 0.370
## x4.7 -0.006362 0.028744 -0.221 0.831
## x4.8 0.009588 0.028922 0.332 0.750
## x4.9 0.018459 0.035452 0.521 0.619
## x4.10 -0.007909 0.038327 -0.206 0.842
## x4.11 -0.018495 0.037212 -0.497 0.634
##
## Residual standard error: 0.03284 on 7 degrees of freedom
## Multiple R-squared: 0.3924, Adjusted R-squared: -0.6492
## F-statistic: 0.3767 on 12 and 7 DF, p-value: 0.934
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -72.88766 -58.94741
The finite distributed lag model with q=11 has no
significant term at 5% significance level. The adjusted R-squared score
only explains -64.92% (-0.6492) of variability in FFD
values. The p-value of the finite distributed lag model is
0.934 (>0.05) hence, it is not statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_rad_rbo$model)
## W = 0.94938, p-value = 0.3578
The residuals in Figure 63 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains violations in normality assumptions.
This is also reinforced by the Shapiro-Wilk normality test with a value
of 0.3578 (>0.05) which fails to rejects the null
hypothesis of normality.
## x4.t x4.1 x4.2 x4.3 x4.4 x4.5 x4.6 x4.7
## 5.333552 6.991474 6.757501 3.724608 3.317886 4.141127 3.006734 3.051213
## x4.8 x4.9 x4.10 x4.11
## 2.805563 3.348295 3.497440 3.441187
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
## q = 1 AIC = -94.56619 BIC = -88.9614 MASE = 1.101099
## q = 2 AIC = -88.3792 BIC = -81.54272 MASE = 1.235884
## q = 3 AIC = -89.1954 BIC = -81.20218 MASE = 1.171117
## q = 4 AIC = -82.80086 BIC = -73.73 MASE = 1.262425
## q = 5 AIC = -79.4041 BIC = -69.33932 MASE = 1.149637
## q = 6 AIC = -76.22147 BIC = -65.25158 MASE = 1.026742
## q = 7 AIC = -74.80399 BIC = -63.02345 MASE = 0.9235533
## q = 8 AIC = -72.65628 BIC = -60.16585 MASE = 0.8466119
## q = 9 AIC = -83.46275 BIC = -70.37025 MASE = 0.5821168
## q = 10 AIC = -89.23348 BIC = -75.65468 MASE = 0.451849
## q = 11 AIC = -86.87319 BIC = -72.93294 MASE = 0.3401436
## q = 12 AIC = -102.9404 BIC = -88.77377 MASE = 0.2453801
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE values. This means that a value of
q=12 will be selected as the lag length of the model.
##
## Call:
## lm(formula = as.formula(model.formula), data = design)
##
## Residuals:
## 1 2 3 4 5 6 7
## -6.288e-03 -4.741e-03 1.378e-02 4.883e-03 6.387e-03 -1.051e-02 -7.777e-03
## 8 9 10 11 12 13 14
## -2.019e-05 -8.178e-03 6.657e-03 -5.228e-03 7.668e-03 -1.964e-03 -6.420e-03
## 15 16 17 18 19
## -1.078e-02 4.723e-03 3.717e-03 2.796e-03 1.130e-02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.154469 2.058280 -3.476 0.0177 *
## x5.t 0.025360 0.009888 2.565 0.0504 .
## x5.1 -0.009190 0.009340 -0.984 0.3704
## x5.2 0.017859 0.006088 2.933 0.0325 *
## x5.3 -0.002440 0.006132 -0.398 0.7070
## x5.4 0.002354 0.006088 0.387 0.7149
## x5.5 0.022523 0.006132 3.673 0.0144 *
## x5.6 -0.008408 0.005950 -1.413 0.2167
## x5.7 0.018591 0.005599 3.321 0.0210 *
## x5.8 0.007378 0.005790 1.274 0.2586
## x5.9 0.001636 0.005663 0.289 0.7843
## x5.10 0.003699 0.005213 0.710 0.5096
## x5.11 -0.011267 0.007354 -1.532 0.1860
## x5.12 0.015070 0.007614 1.979 0.1047
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01427 on 5 degrees of freedom
## Multiple R-squared: 0.9065, Adjusted R-squared: 0.6633
## F-statistic: 3.727 on 13 and 5 DF, p-value: 0.07754
##
## AIC and BIC values for the model:
## AIC BIC
## 1 -102.9404 -88.77377
The finite distributed lag model with q=12 has almost no
significant term at 5% significance level. The adjusted R-squared score
only explains 66.33% (0.6633) of variability in FFD values.
The p-value of the finite distributed lag model is 0.07754
(>0.05) hence, it is not statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(fin_dlm_humid_rbo$model)
## W = 0.94432, p-value = 0.315
The residuals in Figure 64 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. The histogram
of standardised residuals contains slight violations in normality
assumptions. This is also reinforced by the Shapiro-Wilk normality test
with a value of 0.315 (>0.05) which fails to rejects the
null hypothesis of normality.
## x5.t x5.1 x5.2 x5.3 x5.4 x5.5 x5.6 x5.7
## 4.889400 4.280613 2.057443 2.088074 1.913716 1.870104 1.829416 2.026883
## x5.8 x5.9 x5.10 x5.11 x5.12
## 2.171511 2.057850 1.759417 3.480250 3.810613
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the finite distributed lag models with Intercepts are not good performing models due to their lack of significant terms at 5% significance level and explainability.
## q = 1 AIC = -101.8617 BIC = -96.2569 MASE = 0.9239038
## q = 2 AIC = -95.49894 BIC = -88.66246 MASE = 1.032564
## q = 3 AIC = -97.93803 BIC = -91.277 MASE = 1.0725
## q = 4 AIC = -94.65533 BIC = -88.17614 MASE = 1.089145
## q = 5 AIC = -92.07915 BIC = -85.78867 MASE = 1.014553
## q = 6 AIC = -90.66426 BIC = -84.56988 MASE = 0.913053
## q = 7 AIC = -88.11262 BIC = -82.22235 MASE = 0.8550191
## q = 8 AIC = -87.76319 BIC = -82.08572 MASE = 0.75509
## q = 9 AIC = -90.75952 BIC = -85.30431 MASE = 0.7002906
## q = 10 AIC = -87.92162 BIC = -82.699 MASE = 0.6742057
## q = 11 AIC = -83.56579 BIC = -78.58713 MASE = 0.6212847
## q = 12 AIC = -82.47435 BIC = -77.75216 MASE = 0.6494928
From the results above, a lag length of 11 provides the
lowest AIC, BIC, and MASE (relative) values. This means that a value of
q=11 will be selected as the lag length of the model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.007930 0.00905 0.8760 0.404
## beta.1 0.005830 0.00742 0.7850 0.452
## beta.2 0.003870 0.00675 0.5730 0.581
## beta.3 0.002060 0.00670 0.3070 0.766
## beta.4 0.000393 0.00685 0.0573 0.956
## beta.5 -0.001130 0.00690 -0.1630 0.874
## beta.6 -0.002500 0.00673 -0.3720 0.719
## beta.7 -0.003730 0.00638 -0.5840 0.573
## beta.8 -0.004810 0.00609 -0.7900 0.450
## beta.9 -0.005750 0.00631 -0.9120 0.386
## beta.10 -0.006540 0.00751 -0.8700 0.407
## beta.11 -0.007180 0.00986 -0.7290 0.485
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.048288 -0.012580 0.001061 0.011338 0.041099
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8222739 0.6265209 1.312 0.208
## z.t0 0.0079300 0.0090547 0.876 0.394
## z.t1 -0.0021760 0.0033762 -0.644 0.528
## z.t2 0.0000729 0.0003119 0.234 0.818
##
## Residual standard error: 0.02608 on 16 degrees of freedom
## Multiple R-squared: 0.1238, Adjusted R-squared: -0.0405
## F-statistic: 0.7535 on 3 and 16 DF, p-value: 0.5363
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.40% (-0.0405) of
variability in RBO values. The p-value of the polynomial distributed lag
model is 0.5363 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_temp_k2$model)
## W = 0.97762, p-value = 0.8998
Figure 65
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 14.731, df = 7, p-value = 0.03961
The residuals in Figure 65 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.03961 (<0.05) which confirms the
presence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.8998
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.006920 0.00773 0.8950 0.394
## beta.1 0.005500 0.00708 0.7770 0.457
## beta.2 0.004080 0.00650 0.6270 0.546
## beta.3 0.002660 0.00602 0.4420 0.669
## beta.4 0.001240 0.00565 0.2190 0.832
## beta.5 -0.000182 0.00543 -0.0336 0.974
## beta.6 -0.001600 0.00536 -0.2990 0.772
## beta.7 -0.003020 0.00546 -0.5530 0.593
## beta.8 -0.004440 0.00572 -0.7770 0.457
## beta.9 -0.005860 0.00611 -0.9600 0.362
## beta.10 -0.007280 0.00661 -1.1000 0.299
## beta.11 -0.008700 0.00721 -1.2100 0.258
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.046511 -0.012185 0.002197 0.011300 0.042490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8143641 0.6079624 1.339 0.198
## z.t0 0.0069189 0.0077302 0.895 0.383
## z.t1 -0.0014202 0.0009443 -1.504 0.151
##
## Residual standard error: 0.02535 on 17 degrees of freedom
## Multiple R-squared: 0.1208, Adjusted R-squared: 0.01736
## F-statistic: 1.168 on 2 and 17 DF, p-value: 0.3348
The polynomial distributed lag model with q=12 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains 0.17% (0.01736) of
variability in RBO values. The p-value of the polynomial distributed lag
model is 0.3348 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_temp_k1$model)
## W = 0.97806, p-value = 0.9066
Figure 66
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 14.281, df = 6, p-value = 0.02665
The residuals in Figure 66 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.02665 (<0.05) which confirms the presence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.9066
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = -100.898 BIC = -95.29319 MASE = 0.9417954
## q = 2 AIC = -96.70956 BIC = -89.87308 MASE = 0.9993747
## q = 3 AIC = -98.84343 BIC = -92.18241 MASE = 0.9491588
## q = 4 AIC = -94.04271 BIC = -87.56352 MASE = 1.006192
## q = 5 AIC = -92.46231 BIC = -86.17182 MASE = 0.9264235
## q = 6 AIC = -89.17565 BIC = -83.08127 MASE = 0.8773226
## q = 7 AIC = -86.84712 BIC = -80.95686 MASE = 0.8319619
## q = 8 AIC = -87.81723 BIC = -82.13976 MASE = 0.7195348
## q = 9 AIC = -90.82864 BIC = -85.37342 MASE = 0.6646487
## q = 10 AIC = -88.26525 BIC = -83.04264 MASE = 0.6451804
## q = 11 AIC = -85.66633 BIC = -80.68767 MASE = 0.6161421
## q = 12 AIC = -84.25719 BIC = -79.53499 MASE = 0.64218
From the results above, a lag length of 11 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=11 will be selected as the lag length of the
model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.009960 0.01020 0.9720 0.3560
## beta.1 0.006380 0.00719 0.8860 0.3980
## beta.2 0.003170 0.00562 0.5640 0.5860
## beta.3 0.000339 0.00534 0.0635 0.9510
## beta.4 -0.002120 0.00558 -0.3790 0.7130
## beta.5 -0.004190 0.00567 -0.7400 0.4780
## beta.6 -0.005900 0.00535 -1.1000 0.2990
## beta.7 -0.007220 0.00472 -1.5300 0.1600
## beta.8 -0.008170 0.00433 -1.8900 0.0919
## beta.9 -0.008750 0.00533 -1.6400 0.1350
## beta.10 -0.008940 0.00810 -1.1000 0.2980
## beta.11 -0.008760 0.01220 -0.7170 0.4920
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.048815 -0.015397 0.003078 0.014180 0.038986
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7945101 0.1052975 7.545 1.17e-06 ***
## z.t0 0.0099617 0.0102448 0.972 0.345
## z.t1 -0.0037719 0.0045196 -0.835 0.416
## z.t2 0.0001881 0.0004255 0.442 0.664
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02475 on 16 degrees of freedom
## Multiple R-squared: 0.2111, Adjusted R-squared: 0.06324
## F-statistic: 1.428 on 3 and 16 DF, p-value: 0.2716
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains 0.63% (0.06324) of
variability in RBO values. The p-value of the polynomial distributed lag
model is 0.2716 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_rain_k2$model)
## W = 0.98469, p-value = 0.9796
Figure 67
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 11.04, df = 7, p-value = 0.1369
The residuals in Figure 67 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.1369 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.9796 (>0.05) which fails to reject the null hypothesis
of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.006840 0.00724 0.944 0.3700
## beta.1 0.005010 0.00634 0.790 0.4500
## beta.2 0.003170 0.00549 0.578 0.5770
## beta.3 0.001340 0.00472 0.284 0.7830
## beta.4 -0.000491 0.00410 -0.120 0.9070
## beta.5 -0.002320 0.00368 -0.632 0.5430
## beta.6 -0.004160 0.00354 -1.170 0.2700
## beta.7 -0.005990 0.00371 -1.610 0.1410
## beta.8 -0.007820 0.00416 -1.880 0.0926
## beta.9 -0.009650 0.00480 -2.010 0.0752
## beta.10 -0.011500 0.00557 -2.060 0.0694
## beta.11 -0.013300 0.00643 -2.070 0.0682
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.048742 -0.015175 0.002259 0.015681 0.038830
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.805883 0.099663 8.086 3.15e-07 ***
## z.t0 0.006837 0.007241 0.944 0.358
## z.t1 -0.001832 0.001062 -1.725 0.103
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02415 on 17 degrees of freedom
## Multiple R-squared: 0.2015, Adjusted R-squared: 0.1076
## F-statistic: 2.145 on 2 and 17 DF, p-value: 0.1477
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains 10.76% (0.1076) of
variability in RBO values. The p-value of the polynomial distributed lag
model is 0.1477 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_rain_k1$model)
## W = 0.98375, p-value = 0.9729
Figure 68
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 8.917, df = 6, p-value = 0.1783
The residuals in Figure 68 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.1783 (>0.05) which confirms the absence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.9729
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = -97.71113 BIC = -92.10634 MASE = 1.063029
## q = 2 AIC = -91.50708 BIC = -84.67061 MASE = 1.174709
## q = 3 AIC = -93.41298 BIC = -86.75196 MASE = 1.140154
## q = 4 AIC = -89.71544 BIC = -83.23625 MASE = 1.203645
## q = 5 AIC = -88.46836 BIC = -82.17788 MASE = 1.085079
## q = 6 AIC = -87.48527 BIC = -81.39089 MASE = 0.9572973
## q = 7 AIC = -86.62398 BIC = -80.73372 MASE = 0.8494332
## q = 8 AIC = -89.0863 BIC = -83.40883 MASE = 0.7259512
## q = 9 AIC = -89.93849 BIC = -84.48328 MASE = 0.6974314
## q = 10 AIC = -88.15084 BIC = -82.92823 MASE = 0.6711964
## q = 11 AIC = -84.26791 BIC = -79.28925 MASE = 0.6193374
## q = 12 AIC = -82.50331 BIC = -77.78112 MASE = 0.6631798
From the results above, a lag length of 11 provides the
lowest AIC (relative), BIC (relative), and MASE values. This means that
a value of q=11 will be selected as the lag length of the
model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 -0.003680 0.00847 -0.4350 0.674
## beta.1 -0.000699 0.00561 -0.1240 0.904
## beta.2 0.001740 0.00382 0.4540 0.660
## beta.3 0.003620 0.00325 1.1100 0.294
## beta.4 0.004960 0.00343 1.4500 0.182
## beta.5 0.005760 0.00364 1.5800 0.148
## beta.6 0.006000 0.00360 1.6700 0.129
## beta.7 0.005700 0.00337 1.6900 0.125
## beta.8 0.004850 0.00346 1.4000 0.195
## beta.9 0.003460 0.00457 0.7570 0.469
## beta.10 0.001520 0.00685 0.2210 0.830
## beta.11 -0.000975 0.01010 -0.0966 0.925
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.044055 -0.011957 0.002021 0.017060 0.038610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2421290 0.5533133 0.438 0.668
## z.t0 -0.0036808 0.0084695 -0.435 0.670
## z.t1 0.0032557 0.0036584 0.890 0.387
## z.t2 -0.0002736 0.0003390 -0.807 0.431
##
## Residual standard error: 0.02563 on 16 degrees of freedom
## Multiple R-squared: 0.154, Adjusted R-squared: -0.004606
## F-statistic: 0.971 on 3 and 16 DF, p-value: 0.4307
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains -0.04% (-0.004606) of
variability in RBO values. The p-value of the polynomial distributed lag
model is 0.4307 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_rad_k2$model)
## W = 0.96191, p-value = 0.5827
Figure 69
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 10.85, df = 7, p-value = 0.1453
The residuals in Figure 69 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.1453 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.5827
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.00196 0.00473 0.415 0.688
## beta.1 0.00233 0.00413 0.564 0.587
## beta.2 0.00270 0.00359 0.750 0.472
## beta.3 0.00307 0.00315 0.975 0.355
## beta.4 0.00343 0.00283 1.210 0.256
## beta.5 0.00380 0.00269 1.410 0.191
## beta.6 0.00417 0.00276 1.510 0.165
## beta.7 0.00454 0.00302 1.500 0.167
## beta.8 0.00490 0.00343 1.430 0.186
## beta.9 0.00527 0.00394 1.340 0.213
## beta.10 0.00564 0.00452 1.250 0.243
## beta.11 0.00601 0.00514 1.170 0.273
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.043545 -0.016730 -0.000869 0.015616 0.039930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0157308 0.4720065 0.033 0.974
## z.t0 0.0019620 0.0047303 0.415 0.683
## z.t1 0.0003678 0.0007526 0.489 0.631
##
## Residual standard error: 0.02536 on 17 degrees of freedom
## Multiple R-squared: 0.1196, Adjusted R-squared: 0.016
## F-statistic: 1.154 on 2 and 17 DF, p-value: 0.3387
The polynomial distributed lag model with q=11 has none
of its scores significant at 5% level of significance. The adjusted
R-squared score only explains 0.16% (0.016) of variability
in RBO values. The p-value of the polynomial distributed lag model is
0.3387 (>0.05) hence, it is statistically not
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_rad_k1$model)
## W = 0.97034, p-value = 0.762
Figure 70
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 8.7669, df = 6, p-value = 0.1871
The residuals in Figure 70 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation in the residuals are not significant. This is also
reinforced by the Breusch-Godfrey test results of a p-value of
0.1871 (>0.05) which confirms the absence of serial
correlation. The histogram of standardised residuals contain some
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.762
(>0.05) which fails to reject the null hypothesis of normality.
## q = 1 AIC = -94.56619 BIC = -88.9614 MASE = 1.101099
## q = 2 AIC = -88.3792 BIC = -81.54272 MASE = 1.235884
## q = 3 AIC = -90.71527 BIC = -84.05425 MASE = 1.205438
## q = 4 AIC = -86.21246 BIC = -79.73328 MASE = 1.265024
## q = 5 AIC = -83.611 BIC = -77.32052 MASE = 1.218437
## q = 6 AIC = -82.49202 BIC = -76.39764 MASE = 1.124788
## q = 7 AIC = -80.66896 BIC = -74.77869 MASE = 1.039317
## q = 8 AIC = -81.50599 BIC = -75.82852 MASE = 0.8827226
## q = 9 AIC = -85.27436 BIC = -79.81915 MASE = 0.7726485
## q = 10 AIC = -85.61489 BIC = -80.39227 MASE = 0.6821919
## q = 11 AIC = -83.5761 BIC = -78.59744 MASE = 0.5912701
## q = 12 AIC = -88.16372 BIC = -83.44153 MASE = 0.5722198
From the results above, a lag length of 12 provides the
lowest AIC, BIC, and MASE (relative) values. This means that a value of
q=12 will be selected as the lag length of the model.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.00885 0.00445 1.99 0.0868
## beta.1 0.00818 0.00365 2.24 0.0602
## beta.2 0.00758 0.00316 2.40 0.0477
## beta.3 0.00706 0.00294 2.40 0.0472
## beta.4 0.00662 0.00288 2.30 0.0552
## beta.5 0.00627 0.00290 2.16 0.0674
## beta.6 0.00599 0.00291 2.06 0.0789
## beta.7 0.00580 0.00290 2.00 0.0861
## beta.8 0.00568 0.00288 1.97 0.0894
## beta.9 0.00565 0.00291 1.94 0.0938
## beta.10 0.00569 0.00309 1.84 0.1080
## beta.11 0.00582 0.00352 1.65 0.1420
## beta.12 0.00603 0.00425 1.42 0.1990
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.033016 -0.010342 0.001035 0.013832 0.031004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.349e+00 2.679e+00 -2.743 0.0151 *
## z.t0 8.855e-03 4.449e-03 1.990 0.0651 .
## z.t1 -7.197e-04 1.374e-03 -0.524 0.6081
## z.t2 4.037e-05 1.023e-04 0.395 0.6987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02057 on 15 degrees of freedom
## Multiple R-squared: 0.4167, Adjusted R-squared: 0.3
## F-statistic: 3.572 on 3 and 15 DF, p-value: 0.03956
The polynomial distributed lag model with q=12 has none
of its scores at 5% level of significance. The adjusted R-squared score
only explains 3% (0.3) of variability in RBO values. The
p-value of the polynomial distributed lag model is 0.03956
(<0.05) hence, it is statistically significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_humid_k2$model)
## W = 0.9501, p-value = 0.3967
Figure 71
##
## Breusch-Godfrey test for serial correlation of order up to 7
##
## data: Residuals
## LM test = 11.539, df = 7, p-value = 0.1168
The residuals in Figure 71 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.1168 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.3967
(>0.05) which fails to reject the null hypothesis of normality.
## Estimates and t-tests for beta coefficients:
## Estimate Std. Error t value P(>|t|)
## beta.0 0.00815 0.00396 2.06 0.0788
## beta.1 0.00792 0.00350 2.26 0.0579
## beta.2 0.00769 0.00306 2.51 0.0403
## beta.3 0.00747 0.00267 2.79 0.0268
## beta.4 0.00724 0.00235 3.08 0.0179
## beta.5 0.00701 0.00213 3.29 0.0133
## beta.6 0.00679 0.00204 3.32 0.0127
## beta.7 0.00656 0.00210 3.12 0.0167
## beta.8 0.00633 0.00230 2.76 0.0281
## beta.9 0.00611 0.00260 2.35 0.0510
## beta.10 0.00588 0.00297 1.98 0.0885
## beta.11 0.00566 0.00340 1.66 0.1400
## beta.12 0.00543 0.00386 1.41 0.2020
##
## Call:
## "Y ~ (Intercept) + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.034112 -0.011865 0.004151 0.013036 0.029452
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.6343783 2.5099292 -3.042 0.00777 **
## z.t0 0.0081475 0.0039628 2.056 0.05648 .
## z.t1 -0.0002266 0.0005561 -0.407 0.68908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02002 on 16 degrees of freedom
## Multiple R-squared: 0.4106, Adjusted R-squared: 0.337
## F-statistic: 5.574 on 2 and 16 DF, p-value: 0.01456
The polynomial distributed lag model with q=12 has all
of its scores at 5% level of significance. The adjusted R-squared score
only explains 33.7% (0.337) of variability in RBO values.
The p-value of the polynomial distributed lag model is
0.01456 (<0.05) hence, it is statistically
significant.
##
## Shapiro-Wilk normality test
##
## data: residuals(poly_dlm_rbo_humid_k1$model)
## W = 0.94453, p-value = 0.3176
Figure 72
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: Residuals
## LM test = 10.571, df = 6, p-value = 0.1026
The residuals in Figure 72 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.1026 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contain violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.3176
(>0.05) which fails to reject the null hypothesis of normality.
In conclusion, the polynomial distributed lag models were not successful in capturing the autocorrelation and seasonality in the series, nor are they excellent performing models due to its lack of explainability.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.15981 -0.04678 -0.01440 0.04750 0.14952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2032 1.6184 -0.743 0.464
## Y.1 0.2469 0.4609 0.536 0.597
## X.t 0.1847 0.1947 0.949 0.351
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 1.011 0.3236
## Wu-Hausman 1 26 3.247 0.0832 .
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07557 on 27 degrees of freedom
## Multiple R-Squared: -1.597, Adjusted R-squared: -1.789
## Wald test: 2.119 on 2 and 27 DF, p-value: 0.1397
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains (-1.789) of
variability in RBO values. The p-value of the Koyck model is
0.1397 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.3236 which means that the model at the first stage of
least squares fitting is not significant at 5% level of significance.
The “Wu-Hausman” test also resulted in a p-value of 0.0832
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_ffd_temp_dlm$model)
## W = 0.98351, p-value = 0.9093
The residuals in Figure 73 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.9093 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 2.188316 2.188316
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.15981 -0.04678 -0.01440 0.04750 0.14952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2032 1.6184 -0.743 0.464
## Y.1 0.2469 0.4609 0.536 0.597
## X.t 0.1847 0.1947 0.949 0.351
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 1.011 0.3236
## Wu-Hausman 1 26 3.247 0.0832 .
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07557 on 27 degrees of freedom
## Multiple R-Squared: -1.597, Adjusted R-squared: -1.789
## Wald test: 2.119 on 2 and 27 DF, p-value: 0.1397
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains (-1.789) of
variability in RBO values. The p-value of the Koyck model is
0.1397 (>0.05) hence, it is not statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.3236 which means that the model at the first stage of
least squares fitting is not significant at 5% level of significance.
The “Wu-Hausman” test also resulted in a p-value of 0.0832
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_rbo_rain_dlm$model)
## W = 0.96568, p-value = 0.4287
The residuals in Figure 74 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.4287 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 5531.807 5531.807
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.082255 -0.017008 -0.001036 0.021424 0.106984
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.48011 0.94819 -0.506 0.6167
## Y.1 0.69801 0.24502 2.849 0.0083 **
## X.t 0.04812 0.05661 0.850 0.4028
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 4.942 0.0348 *
## Wu-Hausman 1 26 2.765 0.1084
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0467 on 27 degrees of freedom
## Multiple R-Squared: 0.008467, Adjusted R-squared: -0.06498
## Wald test: 4.731 on 2 and 27 DF, p-value: 0.01732
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains -0.64%
(-0.06498) of variability in RBO values. The p-value of the Koyck model
is 0.01732 (<0.05) hence, it is statistically
significant. The “Weak Instruments” test resulted in a p-value of
0.0348 which means that the model at the first stage of
least squares fitting is significant at 5% level of significance. The
“Wu-Hausman” test also resulted in a p-value of 0.1084
which means that there is no significant correlation between the error
term and the lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_rbo_rad_dlm$model)
## W = 0.96083, p-value = 0.3253
Figure 75
The residuals in Figure 75 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
significant. The histogram of standardised residuals contain violations
in normality assumptions. This is also reinforced by the Shapiro-Wilk
normality test with a value of 0.3253 (>0.05) which
fails to reject the null hypothesis of normality.
## Y.1 X.t
## 1.619594 1.619594
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.080897 -0.021103 -0.004676 0.022673 0.111041
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.16679 8.04941 -0.145 0.8858
## Y.1 0.62503 0.34753 1.798 0.0833 .
## X.t 0.01525 0.08274 0.184 0.8551
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 27 0.393 0.536
## Wu-Hausman 1 26 0.055 0.816
## Sargan 0 NA NA NA
##
## Residual standard error: 0.04127 on 27 degrees of freedom
## Multiple R-Squared: 0.2256, Adjusted R-squared: 0.1682
## Wald test: 5.612 on 2 and 27 DF, p-value: 0.009161
The Koyck model has none of its scores significant at 5% level of
significance. The adjusted R-squared score explains 16.82%
(0.1682) of variability in RBO values. The p-value of the Koyck model is
0.009161 (<0.05) hence, it is statistically significant.
The “Weak Instruments” test resulted in a p-value of 0.536
which means that the model at the first stage of least squares fitting
is not significant at 5% level of significance. The “Wu-Hausman” test
also resulted in a p-value of 0.816 which means that there
is no significant correlation between the error term and the
lag-dependent variable.
##
## Shapiro-Wilk normality test
##
## data: residuals(koyck_rbo_humid_dlm$model)
## W = 0.97232, p-value = 0.6044
Figure 76
The residuals in Figure 76 above presents a few observations. The
time series plot shows a non-random pattern. The ACF plot suggests that
the serial correlation and seasonality remaining in the residuals are
not significant. The histogram of standardised residuals contains slight
violations in normality assumptions. This is also reinforced by the
Shapiro-Wilk normality test with a value of 0.6044
(>0.05) which fails to reject the null hypothesis of normality.
## Y.1 X.t
## 4.171591 4.171591
The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.
## p = 1 q = 1 AIC = -104.4462 BIC = -94.63785 MASE = 0.8287036
## p = 1 q = 2 AIC = -103.9101 BIC = -92.97176 MASE = 0.8239655
## p = 1 q = 3 AIC = -108.2781 BIC = -96.28825 MASE = 0.777656
## p = 1 q = 4 AIC = -103.1442 BIC = -90.18588 MASE = 0.8081884
## p = 1 q = 5 AIC = -97.24854 BIC = -83.40948 MASE = 0.7553928
## p = 2 q = 1 AIC = -95.81121 BIC = -83.50555 MASE = 0.898983
## p = 2 q = 2 AIC = -101.4288 BIC = -87.75582 MASE = 0.766712
## p = 2 q = 3 AIC = -108.3739 BIC = -93.71963 MASE = 0.7290062
## p = 2 q = 4 AIC = -101.6237 BIC = -86.07367 MASE = 0.7606791
## p = 2 q = 5 AIC = -95.78843 BIC = -79.43317 MASE = 0.6950639
## p = 3 q = 1 AIC = -98.57578 BIC = -83.92153 MASE = 0.8391506
## p = 3 q = 2 AIC = -106.2021 BIC = -90.21565 MASE = 0.7339625
## p = 3 q = 3 AIC = -105.0644 BIC = -87.74577 MASE = 0.7207013
## p = 3 q = 4 AIC = -99.62896 BIC = -81.48724 MASE = 0.7164256
## p = 3 q = 5 AIC = -94.31971 BIC = -75.44826 MASE = 0.6607785
## p = 4 q = 1 AIC = -95.30416 BIC = -78.45828 MASE = 0.7700148
## p = 4 q = 2 AIC = -99.65453 BIC = -81.51282 MASE = 0.6934633
## p = 4 q = 3 AIC = -97.79474 BIC = -78.35719 MASE = 0.6924341
## p = 4 q = 4 AIC = -96.13468 BIC = -75.40129 MASE = 0.7127527
## p = 4 q = 5 AIC = -92.76031 BIC = -71.37267 MASE = 0.6234692
## p = 5 q = 1 AIC = -96.93151 BIC = -78.06006 MASE = 0.6401758
## p = 5 q = 2 AIC = -100.1132 BIC = -79.98367 MASE = 0.5990576
## p = 5 q = 3 AIC = -98.62372 BIC = -77.23608 MASE = 0.603044
## p = 5 q = 4 AIC = -97.14898 BIC = -74.50324 MASE = 0.5895437
## p = 5 q = 5 AIC = -96.04542 BIC = -72.14159 MASE = 0.5702066
From the autoregressive distributed lag model above, it can be
observed that the model with the lowest AIC value is
p=4, q=5, the lowest BIC value is p=4, q=5,
and the lowest MASE is p=5, q=5. However, as we are after
the lowest MASE values, the models that will be selected are
ARDL(5,2), ARDL(5,4), and
ARDL(5,5)
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.044837 -0.010709 0.000636 0.013783 0.061326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.859327 1.920150 -1.489 0.1548
## X.t 0.008326 0.009824 0.847 0.4085
## X.1 -0.003273 0.007845 -0.417 0.6818
## X.2 0.018191 0.007914 2.299 0.0345 *
## X.3 -0.009886 0.008070 -1.225 0.2373
## X.4 0.001330 0.008752 0.152 0.8810
## X.5 0.012198 0.008215 1.485 0.1559
## Y.1 0.475549 0.207151 2.296 0.0347 *
## Y.2 0.416397 0.270505 1.539 0.1421
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02927 on 17 degrees of freedom
## Multiple R-squared: 0.663, Adjusted R-squared: 0.5044
## F-statistic: 4.181 on 8 and 17 DF, p-value: 0.006333
The autoregressive distributed lag model ardl_5.2_task3,
has p=5 and q=2. Almost all of its
coefficients are not significant at 5% level of significance. The
adjusted R-squared model explains 50.44% (0.5044) of the
variability in mortality rates. The p value of
ardl_5.2_task3 is 0.006333 (<0.05) hence, it is
significant.
Figure 77
##
## Breusch-Godfrey test for serial correlation of order up to 12
##
## data: Residuals
## LM test = 17.078, df = 12, p-value = 0.1467
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.2_task3$model)
## W = 0.97623, p-value = 0.7857
The residuals in Figure 77 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.7857 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains violations in normality assumptions. This is also reinforced by
the Shapiro-Wilk normality test with a value of 0.1467
(>0.05) which fails to reject the null hypothesis of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 2.439200 1.554434 1.520936 1.605404 1.879600 1.549683 2.358583 4.032632
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.042942 -0.010248 0.001514 0.014243 0.060600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.2002014 2.6807937 -1.567 0.138
## X.t 0.0093770 0.0104375 0.898 0.383
## X.1 0.0007341 0.0093073 0.079 0.938
## X.2 0.0199841 0.0095605 2.090 0.054 .
## X.3 -0.0058925 0.0094126 -0.626 0.541
## X.4 0.0025447 0.0095462 0.267 0.793
## X.5 0.0116316 0.0086107 1.351 0.197
## Y.1 0.3623728 0.2484277 1.459 0.165
## Y.2 0.3666810 0.2873796 1.276 0.221
## Y.3 0.1926356 0.2237680 0.861 0.403
## Y.4 0.0859316 0.2354194 0.365 0.720
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03031 on 15 degrees of freedom
## Multiple R-squared: 0.6811, Adjusted R-squared: 0.4685
## F-statistic: 3.203 on 10 and 15 DF, p-value: 0.0208
The autoregressive distributed lag model ardl_5.4_task3,
has p=5 and q=4. All of its coefficients are
not significant at 5% level of significance. The adjusted R-squared
model explains 46.85% (0.4685) of the variability in
mortality rates. The p value of ardl_5.4_task3 is 0.0208
(<0.05) hence, it is significant.
Figure 78
##
## Breusch-Godfrey test for serial correlation of order up to 14
##
## data: Residuals
## LM test = 23.417, df = 14, p-value = 0.05382
##
## Shapiro-Wilk normality test
##
## data: residuals(ardl_5.4_task3$model)
## W = 0.95566, p-value = 0.3133
The residuals in Figure 78 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.05382 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.3133 (>0.05) which fails to reject the null hypothesis
of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 2.567369 2.040296 2.069773 2.036219 2.084862 1.587393 3.162892 4.243817
## L(y.t, 3) L(y.t, 4)
## 3.180939 3.497515
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
##
## Time series regression with "ts" data:
## Start = 6, End = 31
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.043603 -0.007140 0.002577 0.010431 0.061704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.457663 3.187293 -1.712 0.109
## X.t 0.010043 0.010626 0.945 0.361
## X.1 0.002499 0.009726 0.257 0.801
## X.2 0.020996 0.009791 2.144 0.050 .
## X.3 -0.002192 0.010729 -0.204 0.841
## X.4 0.003910 0.009852 0.397 0.697
## X.5 0.013895 0.009234 1.505 0.155
## Y.1 0.362583 0.252044 1.439 0.172
## Y.2 0.317047 0.298850 1.061 0.307
## Y.3 0.166393 0.229659 0.725 0.481
## Y.4 0.092673 0.239013 0.388 0.704
## Y.5 0.178912 0.236435 0.757 0.462
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03075 on 14 degrees of freedom
## Multiple R-squared: 0.6936, Adjusted R-squared: 0.4529
## F-statistic: 2.881 on 11 and 14 DF, p-value: 0.03285
The autoregressive distributed lag model ardl_5.5_task3,
has p=5 and q=5. All of its coefficients are
not are not significant at 5% level of significance. The adjusted
R-squared model explains 45.29% (0.4529) of the variability
in mortality rates. The p value of ardl_5.5_task3 is
0.03285 (<0.05) hence, it is significant.
The residuals in Figure 79 above presents a few observations.The time
series plot shows a non-random pattern. The ACF plot suggests that the
serial correlation and seasonality remaining in the residuals are not
significant. This is also reinforced by the Breusch-Godfrey test results
of a p-value of 0.0661 (>0.05) which confirms the
absence of serial correlation. The histogram of standardised residuals
contains slight violations in normality assumptions. This is also
reinforced by the Shapiro-Wilk normality test with a value of
0.5063 (>0.05) which fails to reject the null hypothesis
of normality.
## X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2)
## 2.585128 2.164717 2.109082 2.570112 2.157184 1.773510 3.162896 4.458597
## L(y.t, 3) L(y.t, 4) L(y.t, 5)
## 3.255167 3.502381 3.361625
The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.
In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.
| Model | AIC | BIC | MASE |
|---|---|---|---|
| Finite DLM (Humidity) | -102.94035 | -88.77377 | 0.2453801 |
| Finite DLM (Temperature) | -87.10828 | -72.94170 | 0.3522387 |
| Finite DLM (Rainfall) | -79.63373 | -65.46714 | 0.4240236 |
| Finite DLM (Radiation) | -72.88766 | -58.94741 | 0.5583271 |
| Poly DLM k=2 (Humidity) | -88.16372 | -83.44153 | 0.5722198 |
| Poly DLM k=1 (Humidity) | -89.96751 | -86.18975 | 0.5828660 |
| Poly DLM k=2 (Rainfall) | -85.66633 | -80.68767 | 0.6161421 |
| Poly DLM k=1 (Rainfall) | -87.42340 | -83.44047 | 0.6185122 |
| Poly DLM k=2 (Radiation) | -84.26791 | -79.28925 | 0.6193374 |
| ARDL(5,5) | -97.36157 | -81.00632 | 0.6208847 |
| Poly DLM k=2 (Temperature) | -83.56579 | -78.58713 | 0.6212847 |
| Poly DLM k=1 (Temperature) | -85.49762 | -81.51469 | 0.6248693 |
| Poly DLM k=1 (Radiation) | -85.46993 | -81.48700 | 0.6570757 |
| ARDL(5,4) | -98.31933 | -83.22217 | 0.6599444 |
| ARDL(5,2) | -100.88457 | -88.30361 | 0.7045789 |
| Koyck (Humidity) | -101.28049 | -95.67571 | 0.9559618 |
| Koyck (Radiation) | -93.86690 | -88.26211 | 1.0314227 |
| Koyck (Temperature) | -64.98345 | -59.37866 | 1.9150155 |
| Koyck (Rainfall) | 76.22068 | 81.82547 | 19.1057647 |
The table above depicts the accuracy measures (AIC, BIC, and MASE) of
the fitted models thus far. It can be seen that
Finite DLM (Humidity) has the lowest MASE value.
In conclusion, based on all the previous observations made - the model that should be used to forecast the FFD 4 years ahead should be the Finite DLM (Humidity) model based on the results of its residual analysis and its low MASE value.
To conclude this exploration, the frequency was too low so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the Finite DLM (Humidity) was left as the best model to use for forecasting.
Demirhan H., 2020. Time Series Regression Models with Distributed Lag Models. Package ‘dLagM’. Available at: https://cran.r-project.org/web/packages/dLagM/dLagM.pdf [Accessed 13 September 2021].
Sauer, S., 2016. Different ways to set figure size in RMarkdown. [online] Sebastian Sauer Stats Blog. Available at: https://sebastiansauer.github.io/figure_sizing_knitr/ [Accessed 13 September 2021].
Common Box-Cox Transformations: https://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/
# Task 1
### Data Exploration and Visualization
# Data Import and Cleaning
mort_import <- read_csv("mort .csv")
mort_import = mort_import[,-1] # Drop first column
# Time-series Conversion
mort_mort_ts <- ts(mort_import$mortality, start = c(2010, 7), frequency = 53)
mort_temp_ts <- ts(mort_import$temp, start = c(2010, 7), frequency = 53)
mort_chem1_ts <- ts(mort_import$chem1, start = c(2010, 7), frequency = 53)
mort_chem2_ts <- ts(mort_import$chem2, start = c(2010, 7), frequency = 53)
mort_part_ts <- ts(mort_import$`particle size`, start = c(2010, 7), frequency = 53)
mort_ts <- ts(mort_import, start = c(2010, 7), frequency = 53)
# Plotting of Solar Time-Series
par(xpd=NA)
plot(mort_ts,
xlab = "Year",
ylab = "Mortality",
plot.type = "s",
col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
main = "Plot of Climate and Pollution on Disease specific morality factors")
add_legend("bottomright",
legend = c("Mortality",
"Temperature",
"Chemical 1",
"Chemical 2",
"Particle Size"),
pch = 20,
col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
horiz = TRUE,
bty = 'n',
cex = 0.9)
# Scale and center the series
mort_scaled <- scale(mort_ts)
plot(mort_scaled,
xlab = "Time",
plot.type = "s",
col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
main = "Scaled plot of Climate and Pollution on Disease specific morality factors")
add_legend("bottomright",
legend = c("Mortality",
"Temperature",
"Chemical 1",
"Chemical 2",
"Particle Size"),
pch = 20,
col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
horiz = TRUE,
bty = 'n',
cex = 0.9)
### Mortality Time Series Analysis
plot(mort_mort_ts,
xlab = "Time",
ylab = "Amount of Mortality",
main = "Mortality Time Series")
### Mortality ADF & ACF
adf.test(mort_mort_ts)
acf(mort_mort_ts,
lag.max = 48,
main = "ACF of Mortality series")
### Temperature Time Series Analysis
plot(mort_temp_ts,
xlab = "Time",
ylab = "Temperature",
main = "Temperature Time Series")
### Temperature ADF & ACF
adf.test(mort_temp_ts)
acf(mort_temp_ts,
lag.max = 48,
main = "ACF of Temperature series")
### Chemical 1 Time Series Analysis
plot(mort_chem1_ts,
xlab = "Time",
ylab = "Amount of Chemical 1",
main = "Chemical 1 Time Series")
### Chemical 1 ADF & ACF
adf.test(mort_chem1_ts)
acf(mort_chem1_ts,
lag.max = 48,
main = "ACF of Chemical 1 series")
### Chemical 2 Time Series Analysis
plot(mort_chem2_ts,
xlab = "Time",
ylab = "Amount of Chemical 2",
main = "Chemical 2 Time Series")
### Chemical 2 ADF & ACF
adf.test(mort_chem2_ts)
acf(mort_chem2_ts,
lag.max = 48,
main = "ACF of Chemical 2 series")
### Particle Size Time Series Analysis
plot(mort_part_ts,
xlab = "Time",
ylab = "Size of Particle",
main = "Particle Size Time Series Plot")
### Particle Size ADF & ACF
adf.test(mort_part_ts)
acf(mort_part_ts,
lag.max = 48,
main = "ACF of Particle Size")
### Impact of Trend and Seasonality Decomposition
mort_mort_decomposition_stl <- stl(mort_mort_ts, t.window = 15,
s.window = "periodic",
robust = TRUE)
plot(mort_mort_decomposition_stl,
main = "STL Decomposition of Mortality Time Series")
mort_temp_decomposition_stl <- stl(mort_temp_ts,
t.window = 15,
s.window = "periodic",
robust = TRUE)
plot(mort_temp_decomposition_stl,
main = "STL Decomposition of Temperature Time Series")
mort_chem1_decomposition_stl <- stl(mort_chem1_ts,
t.window = 15,
s.window = "periodic",
robust = TRUE)
plot(mort_chem1_decomposition_stl,
main = "STL Decomposition of Chemical 1 Time Series")
mort_chem2_decomposition_stl <- stl(mort_chem2_ts,
t.window = 15,
s.window = "periodic",
robust = TRUE)
plot(mort_chem2_decomposition_stl,
main = "STL Decomposition of Chemical 2 Time Series")
mort_part_decomposition_stl <- stl(mort_part_ts,
t.window = 15,
s.window = "periodic",
robust = TRUE)
plot(mort_mort_decomposition_stl,
main = "STL Decomposition of Particle Size Time Series")
## Time Series Regression Model Fitting
### Correlation Coefficient
cor(mort_ts)
### Fitting Finite Distributed Lag Model
#### Finite DLM
colnames(mort_import)<-c("y","x1","x2","x3","x4")
for (i in 1:12){
fin_model <- dlm(formula = y~x1+x2+x3,
data = data.frame(mort_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model$model),
"BIC =", BIC(fin_model$model),
"MASE =", MASE(fin_model)$MASE, "\n")
}
fin_dlm <- dlm(formula = y~x1+x3+x4,
data = data.frame(mort_import),
q = 10)
summary(fin_dlm)
shapiro.test(residuals(fin_dlm$model))
checkresiduals(fin_dlm$model)
vif_fin_dlm = vif(fin_dlm$model)
vif_fin_dlm
### Fitting Polynomial Distributed Lag Model
for ( i in 1:12){
poly_dlm_k2 = polyDlm(y = as.vector(mort_mort_ts),
x = as.vector(mort_chem1_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_k2$model),
"BIC =", BIC(poly_dlm_k2$model),
"MASE =", MASE(poly_dlm_k2)$MASE, "\n")
}
poly_dlm_k2 = polyDlm(y = as.vector(mort_mort_ts),
x = as.vector(mort_chem1_ts),
q = 10,
k = 2)
summary(poly_dlm_k2)
shapiro.test(residuals(poly_dlm_k2$model))
checkresiduals(poly_dlm_k2$model)
poly_dlm_k1 = polyDlm(y = as.vector(mort_mort_ts),
x = as.vector(mort_chem1_ts),
q = 10,
k = 1)
summary(poly_dlm_k1)
shapiro.test(residuals(poly_dlm_k1$model))
checkresiduals(poly_dlm_k1$model)
### Fitting Koyck Distributed Lag Model
koyck_dlm = koyckDlm(y = as.vector(mort_mort_ts),
x = as.vector(mort_chem1_ts))
summary(koyck_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_dlm$model))
checkresiduals(koyck_dlm$model)
vif_koyck_dlm = vif(koyck_dlm$model)
vif_koyck_dlm
### Fitting Autoregressive Distributed Lag Model
### Fitting Autoregressive Distributed Lag Models with three predictors
for (i in 1:5){
for(j in 1:5){
ardlm_tempchem = ardlDlm(formula = y~x1+x2+x3,
data = data.frame(mort_import),
p = i,
q = j)
cat("p =", i,
"q =", j,
"AIC =", AIC(ardlm_tempchem$model),
"BIC =", BIC(ardlm_tempchem$model),
"MASE =", MASE(ardlm_tempchem)$MASE, "\n")
}
}
ardl_5.3 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 3)
summary(ardl_5.3)
checkresiduals(ardl_5.3$model)
shapiro.test(residuals(ardl_5.3$model))
VIF.ardlm_low_aic_tempchem = vif(ardl_5.3$model)
VIF.ardlm_low_aic_tempchem
ardl_5.4 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 4)
summary(ardl_5.4)
checkresiduals(ardl_5.4$model)
shapiro.test(residuals(ardl_5.4$model))
VIF.ardlm_low_bic_tempchem = vif(ardl_5.4$model)
VIF.ardlm_low_bic_tempchem
ardl_5.5 = ardlDlm(x = as.vector(mort_temp_ts) +
as.vector(mort_chem1_ts) +
as.vector(mort_chem2_ts),
y = as.vector(mort_mort_ts),
p = 5,
q = 5)
summary(ardl_5.5)
checkresiduals(ardl_5.5$model)
shapiro.test(residuals(ardl_5.5$model))
VIF.ardlm_low_mase_tempchem = vif(ardl_5.5$model)
VIF.ardlm_low_mase_tempchem
### Time Series Regression Models Table Summary
attr(koyck_dlm$model,"class") = "lm"
models <- c("Finite DLM", "Poly DLM", "Koyck", "ARDL(5,3)", "ARDL(5,4)", "ARDL(5,5)")
aic_score <- AIC(fin_dlm$model, poly_dlm_k1$model,
koyck_dlm$model, ardl_5.3$model,
ardl_5.4$model, ardl_5.5$model)$AIC
bic_score <- BIC(fin_dlm$model, poly_dlm_k1$model,
koyck_dlm$model, ardl_5.3$model,
ardl_5.4$model, ardl_5.5$model)$BIC
MASE_score <- MASE(fin_dlm$model, poly_dlm_k1$model,
koyck_dlm$model, ardl_5.3,
ardl_5.4, ardl_5.5)$MASE
accuracy_table <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table <- arrange(accuracy_table, MASE)
kable(accuracy_table, caption = "Accuracy Score Table")
# Task 2
### Data Exploration and Visualization
# Data Import and Cleaning
ffd_import<- read_csv("FFD .csv")
# Time Series Conversion
ffd_ts <- ts(ffd_import, start = c(1984), frequency = 1)
ffd_temp_ts <- ts(ffd_import$Temperature, start = c(1984), frequency = 1)
ffd_rain_ts <- ts(ffd_import$Rainfall, start = c(1984), frequency = 1)
ffd_rad_ts <- ts(ffd_import$Radiation, start = c(1984), frequency = 1)
ffd_humid_ts <- ts(ffd_import$RelHumidity, start = c(1984), frequency = 1)
ffd_ffd_ts <- ts(ffd_import$FFD, start = c(1984), frequency = 1)
# Plotting of Time Series(s)
plot(ffd_ts, main="Time series plots of Species First Flowering Factors", type="o")
### Temperature Time Series Analysis
plot(ffd_temp_ts,
xlab = "Time",
ylab = "Temperature",
type = "o",
main = "Temperature Time Series")
### Temperature ADF & ACF
adf.test(ffd_temp_ts)
acf(ffd_temp_ts,
lag.max = 48,
main = "ACF of Temperature series")
### Rainfall Time Series Analysis
plot(ffd_rain_ts,
xlab = "Time",
ylab = "Amount of Rainfall",
type = "o",
main = "Rainfall Time Series")
### Rainfall ADF & ACF
adf.test(ffd_rain_ts)
acf(ffd_rain_ts,
lag.max = 48,
main = "ACF of Rainfall series")
### Radiation Time Series Analysis
plot(ffd_rad_ts,
xlab = "Time",
ylab = "Amount of Radiation",
type = "o",
main = "Radiation Time Series")
### Radiation ADF & ACF
adf.test(ffd_rad_ts)
acf(ffd_rad_ts,
lag.max = 48,
main = "ACF of Radiation series")
### Relative Humidity Time Series Analysis
plot(ffd_humid_ts,
xlab = "Time",
ylab = "Relative Humidity Levels",
type = "o",
main = "Relative Humidity Time Series")
### Relative Humidity ADF & ACF
adf.test(ffd_humid_ts)
acf(ffd_humid_ts,
lag.max = 48,
main = "ACF of Relative Humidity series")
### First Flowering Day Time Series Analysis
plot(ffd_ffd_ts,
xlab = "Time",
ylab = "First Flowering Day (FFD)",
type = "o",
main = "First Flowering Day Time Series Plot")
### First Flowering Day ADF & ACF
adf.test(ffd_ffd_ts)
acf(ffd_ffd_ts,
lag.max = 48,
main = "ACF of First Flowering Day (FFD)")
## Time Series Regression Model Fitting
### Correlation Coefficient
cor(ffd_ts)
### Fitting Finite Distributed Lag Model with Intercepts
#### Finite DLM Temperature
colnames(ffd_import)<-c("x1","x2","x3","x4","x5","y")
for (i in 1:12){
fin_model_temp <- dlm(formula = y~x2,
data = data.frame(ffd_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_temp$model),
"BIC =", BIC(fin_model_temp$model),
"MASE =", MASE(fin_model_temp)$MASE, "\n")
}
fin_dlm_temp <- dlm(formula = y~x2,
data = data.frame(ffd_import),
q = 12)
shapiro.test(residuals(fin_dlm_temp$model))
checkresiduals(fin_dlm_temp$model)
vif_fin_dlm_temp = vif(fin_dlm_temp$model)
vif_fin_dlm_temp
#### Finite DLM Rainfall
for (i in 1:12){
fin_model_rain <- dlm(formula = y~x3,
data = data.frame(ffd_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_rain$model),
"BIC =", BIC(fin_model_rain$model),
"MASE =", MASE(fin_model_rain)$MASE, "\n")
}
fin_dlm_rain <- dlm(formula = y~x3,
data = data.frame(ffd_import),
q = 11)
summary(fin_dlm_rain)
shapiro.test(residuals(fin_dlm_rain$model))
checkresiduals(fin_dlm_rain$model)
vif_fin_dlm_rain = vif(fin_dlm_rain$model)
vif_fin_dlm_rain
#### Finite DLM Radiation
for (i in 1:12){
fin_model_rad <- dlm(formula = y~x4,
data = data.frame(ffd_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_rad$model),
"BIC =", BIC(fin_model_rad$model),
"MASE =", MASE(fin_model_rad)$MASE, "\n")
}
fin_dlm_rad <- dlm(formula = y~x4,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_rad)
shapiro.test(residuals(fin_dlm_rad$model))
checkresiduals(fin_dlm_rad$model)
vif_fin_dlm_rad = vif(fin_dlm_rad$model)
vif_fin_dlm_rad
#### Finite DLM Relative Humidity
for (i in 1:12){
fin_model_humid <- dlm(formula = y~x5,
data = data.frame(ffd_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_humid$model),
"BIC =", BIC(fin_model_humid$model),
"MASE =", MASE(fin_model_humid)$MASE, "\n")
}
fin_dlm_humid <- dlm(formula = y~x5,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_humid)
shapiro.test(residuals(fin_dlm_humid$model))
checkresiduals(fin_dlm_humid$model)
vif_fin_dlm_humid = vif(fin_dlm_humid$model)
vif_fin_dlm_humid
### Fitting Finite Distributed Lag Model without Intercepts
#### Finite DLM Temperature (no Intercept)
fin_dlm_temp_noint <- dlm(formula = y~x2-1,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_temp_noint)
shapiro.test(residuals(fin_dlm_temp_noint$model))
checkresiduals(fin_dlm_temp_noint$model)
vif_fin_dlm_temp_noint = vif(fin_dlm_temp_noint$model)
vif_fin_dlm_temp_noint
#### Finite DLM Rainfall (no Intercept)
fin_dlm_rain_noint <- dlm(formula = y~x3-1,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_rain_noint)
shapiro.test(residuals(fin_dlm_rain_noint$model))
checkresiduals(fin_dlm_rain_noint$model)
vif_fin_dlm_rain_noint = vif(fin_dlm_rain_noint$model)
vif_fin_dlm_rain_noint
#### Finite DLM Radiation (no Intercept)
fin_dlm_rad_noint <- dlm(formula = y~x4-1,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_rad_noint)
shapiro.test(residuals(fin_dlm_rad_noint$model))
checkresiduals(fin_dlm_rad_noint$model)
vif_fin_dlm_rad_noint = vif(fin_dlm_rad_noint$model)
vif_fin_dlm_rad_noint
#### Finite DLM Relative Humidity (no Intercept)
fin_dlm_humid_noint <- dlm(formula = y~x5-1,
data = data.frame(ffd_import),
q = 12)
summary(fin_dlm_humid_noint)
shapiro.test(residuals(fin_dlm_humid_noint$model))
checkresiduals(fin_dlm_humid_noint$model)
vif_fin_dlm_humid_noint = vif(fin_dlm_humid_noint$model)
vif_fin_dlm_humid_noint
### Fitting Polynomial Distributed Lag Model
### FFD and Temperature
for ( i in 1:12){
poly_dlm_ffd_temp_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_temp_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_ffd_temp_k2$model),
"BIC =", BIC(poly_dlm_ffd_temp_k2$model),
"MASE =", MASE(poly_dlm_ffd_temp_k2)$MASE, "\n")
}
poly_dlm_ffd_temp_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_temp_ts),
q = 12,
k = 2)
shapiro.test(residuals(poly_dlm_ffd_temp_k2$model))
checkresiduals(poly_dlm_ffd_temp_k2$model)
poly_dlm_ffd_temp_k1 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_temp_ts),
q = 12,
k = 1)
summary(poly_dlm_ffd_temp_k1)
shapiro.test(residuals(poly_dlm_ffd_temp_k1$model))
checkresiduals(poly_dlm_ffd_temp_k1$model)
### FFD and Rainfall
for ( i in 1:12){
poly_dlm_ffd_rain_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rain_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_ffd_rain_k2$model),
"BIC =", BIC(poly_dlm_ffd_rain_k2$model),
"MASE =", MASE(poly_dlm_ffd_rain_k2)$MASE, "\n")
}
poly_dlm_ffd_rain_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rain_ts),
q = 11,
k = 2)
summary(poly_dlm_ffd_rain_k2)
shapiro.test(residuals(poly_dlm_ffd_rain_k2$model))
checkresiduals(poly_dlm_ffd_rain_k2$model)
poly_dlm_ffd_rain_k1 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rain_ts),
q = 11,
k = 1)
summary(poly_dlm_ffd_rain_k1)
shapiro.test(residuals(poly_dlm_ffd_rain_k1$model))
checkresiduals(poly_dlm_ffd_rain_k1$model)
### FFD and Radiation
for ( i in 1:12){
poly_dlm_ffd_rad_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rad_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_ffd_rad_k2$model),
"BIC =", BIC(poly_dlm_ffd_rad_k2$model),
"MASE =", MASE(poly_dlm_ffd_rad_k2)$MASE, "\n")
}
poly_dlm_ffd_rad_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rad_ts),
q = 11,
k = 2)
summary(poly_dlm_ffd_rad_k2)
shapiro.test(residuals(poly_dlm_ffd_rad_k2$model))
checkresiduals(poly_dlm_ffd_rad_k2$model)
poly_dlm_ffd_rad_k1 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rad_ts),
q = 11,
k = 1)
summary(poly_dlm_ffd_rad_k1)
shapiro.test(residuals(poly_dlm_ffd_rad_k1$model))
checkresiduals(poly_dlm_ffd_rad_k1$model)
### FFD and Humidity
for ( i in 1:12){
poly_dlm_ffd_humid_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_humid_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_ffd_humid_k2$model),
"BIC =", BIC(poly_dlm_ffd_humid_k2$model),
"MASE =", MASE(poly_dlm_ffd_humid_k2)$MASE, "\n")
}
poly_dlm_ffd_humid_k2 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_humid_ts),
q = 12,
k = 2)
summary(poly_dlm_ffd_humid_k2)
shapiro.test(residuals(poly_dlm_ffd_humid_k2$model))
checkresiduals(poly_dlm_ffd_humid_k2$model)
poly_dlm_ffd_humid_k1 = polyDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_humid_ts),
q = 12,
k = 1)
summary(poly_dlm_ffd_humid_k1)
shapiro.test(residuals(poly_dlm_ffd_humid_k1$model))
checkresiduals(poly_dlm_ffd_humid_k1$model)
### Fitting Koyck Distributed Lag Model
### FFD and Temperature
koyck_ffd_temp_dlm = koyckDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_temp_ts))
summary(koyck_ffd_temp_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_temp_dlm$model))
checkresiduals(koyck_ffd_temp_dlm$model)
vif_koyck_ffd_temp_dlm = vif(koyck_ffd_temp_dlm$model)
vif_koyck_ffd_temp_dlm
### FFD and Rainfall
koyck_ffd_rain_dlm = koyckDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rain_ts))
summary(koyck_ffd_temp_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_rain_dlm$model))
checkresiduals(koyck_ffd_rain_dlm$model)
vif_koyck_ffd_rain_dlm = vif(koyck_ffd_rain_dlm$model)
vif_koyck_ffd_rain_dlm
### FFD and Radiation
koyck_ffd_rad_dlm = koyckDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_rad_ts))
summary(koyck_ffd_rad_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_rad_dlm$model))
checkresiduals(koyck_ffd_rad_dlm$model)
vif_koyck_ffd_rad_dlm = vif(koyck_ffd_rad_dlm$model)
vif_koyck_ffd_rad_dlm
### FFD and Humidity
koyck_ffd_humid_dlm = koyckDlm(y = as.vector(ffd_ffd_ts),
x = as.vector(ffd_humid_ts))
summary(koyck_ffd_humid_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_humid_dlm$model))
checkresiduals(koyck_ffd_humid_dlm$model)
vif_koyck_ffd_humid_dlm = vif(koyck_ffd_humid_dlm$model)
vif_koyck_ffd_humid_dlm
### Fitting Autoregressive Distributed Lag Model
for (i in 1:5){
for(j in 1:5){
ardlm_rainhumid = ardlDlm(formula = y~x2+x4,
data = data.frame(ffd_import),
p = i,
q = j)
cat("p =", i,
"q =", j,
"AIC =", AIC(ardlm_rainhumid$model),
"BIC =", BIC(ardlm_rainhumid$model),
"MASE =", MASE(ardlm_rainhumid)$MASE, "\n")
}
}
ardl_5.3_task2 = ardlDlm(x = as.vector(ffd_rain_ts) +
as.vector(ffd_humid_ts),
y = as.vector(ffd_ffd_ts),
p = 5,
q = 3)
summary(ardl_5.3_task2)
checkresiduals(ardl_5.3_task2$model)
shapiro.test(residuals(ardl_5.3_task2$model))
VIF.ardlm_rain_humid = vif(ardl_5.3_task2$model)
VIF.ardlm_rain_humid
ardl_5.4_task2 = ardlDlm(x = as.vector(ffd_rain_ts) +
as.vector(ffd_humid_ts),
y = as.vector(ffd_ffd_ts),
p = 5,
q = 4)
summary(ardl_5.4_task2)
checkresiduals(ardl_5.4_task2$model)
shapiro.test(residuals(ardl_5.4_task2$model))
VIF.ardlm_rain_humid = vif(ardl_5.4_task2$model)
VIF.ardlm_rain_humid
ardl_5.5_task2 = ardlDlm(x = as.vector(ffd_rain_ts) +
as.vector(ffd_humid_ts),
y = as.vector(ffd_ffd_ts),
p = 5,
q = 5)
summary(ardl_5.5_task2)
checkresiduals(ardl_5.5_task2$model)
shapiro.test(residuals(ardl_5.5_task2$model))
VIF.ardlm_rain_humid = vif(ardl_5.5_task2$model)
VIF.ardlm_rain_humid
### Time Series Regression Models Table Summary
attr(koyck_ffd_temp_dlm$model,"class") = "lm"
attr(koyck_ffd_rain_dlm$model,"class") = "lm"
attr(koyck_ffd_rad_dlm$model,"class") = "lm"
attr(koyck_ffd_humid_dlm$model,"class") = "lm"
models <- c("Finite DLM /w Intercept (Temperature)",
"Finite DLM /w Intercept (Rainfall)",
"Finite DLM /w Intercept (Radiation)",
"Finite DLM /w Intercept (Humidity)",
"Finite DLM /wo Intercept (Temperature)",
"Finite DLM /wo Intercept (Rainfall)",
"Finite DLM /wo Intercept (Radiation)",
"Finite DLM /wo Intercept (Humidity)",
"Poly DLM k=2 (Temperature)",
"Poly DLM k=2 (Rainfall)",
"Poly DLM k=2 (Radiation)",
"Poly DLM k=2 (Humidity)",
"Poly DLM k=1 (Temperature)",
"Poly DLM k=1 (Rainfall)",
"Poly DLM k=1 (Radiation)",
"Poly DLM k=1 (Humidity)",
"Koyck (Temperature)",
"Koyck (Rainfall)",
"Koyck (Radiation)",
"Koyck (Humidity)",
"ARDL(5,3)",
"ARDL(5,4)",
"ARDL(5,5)")
aic_score <- AIC(fin_dlm_temp$model,
fin_dlm_rain$model,
fin_dlm_rad$model,
fin_dlm_humid$model,
fin_dlm_temp_noint$model,
fin_dlm_rain_noint$model,
fin_dlm_rad_noint$model,
fin_dlm_humid_noint$model,
poly_dlm_ffd_temp_k2$model,
poly_dlm_ffd_rain_k2$model,
poly_dlm_ffd_rad_k2$model,
poly_dlm_ffd_humid_k2$model,
poly_dlm_ffd_temp_k1$model,
poly_dlm_ffd_rain_k1$model,
poly_dlm_ffd_rad_k1$model,
poly_dlm_ffd_humid_k1$model,
koyck_ffd_temp_dlm$model,
koyck_ffd_rain_dlm$model,
koyck_ffd_rad_dlm$model,
koyck_ffd_humid_dlm$model,
ardl_5.3_task2$model,
ardl_5.4_task2$model,
ardl_5.5_task2$model)$AIC
bic_score <- BIC(fin_dlm_temp$model,
fin_dlm_rain$model,
fin_dlm_rad$model,
fin_dlm_humid$model,
fin_dlm_temp_noint$model,
fin_dlm_rain_noint$model,
fin_dlm_rad_noint$model,
fin_dlm_humid_noint$model,
poly_dlm_ffd_temp_k2$model,
poly_dlm_ffd_rain_k2$model,
poly_dlm_ffd_rad_k2$model,
poly_dlm_ffd_humid_k2$model,
poly_dlm_ffd_temp_k1$model,
poly_dlm_ffd_rain_k1$model,
poly_dlm_ffd_rad_k1$model,
poly_dlm_ffd_humid_k1$model,
koyck_ffd_temp_dlm$model,
koyck_ffd_rain_dlm$model,
koyck_ffd_rad_dlm$model,
koyck_ffd_humid_dlm$model,
ardl_5.3_task2$model,
ardl_5.4_task2$model,
ardl_5.5_task2$model)$BIC
MASE_score <- MASE(fin_dlm_temp$model,
fin_dlm_rain$model,
fin_dlm_rad$model,
fin_dlm_humid$model,
fin_dlm_temp_noint$model,
fin_dlm_rain_noint$model,
fin_dlm_rad_noint$model,
fin_dlm_humid_noint$model,
poly_dlm_ffd_temp_k2$model,
poly_dlm_ffd_rain_k2$model,
poly_dlm_ffd_rad_k2$model,
poly_dlm_ffd_humid_k2$model,
poly_dlm_ffd_temp_k1$model,
poly_dlm_ffd_rain_k1$model,
poly_dlm_ffd_rad_k1$model,
poly_dlm_ffd_humid_k1$model,
koyck_ffd_temp_dlm$model,
koyck_ffd_rain_dlm$model,
koyck_ffd_rad_dlm$model,
koyck_ffd_humid_dlm$model,
ardl_5.3_task2,
ardl_5.4_task2,
ardl_5.5_task2)$MASE
accuracy_table_task2 <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table_task2) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table_task2 <- arrange(accuracy_table_task2, MASE)
kable(accuracy_table_task2, caption = "Accuracy Score Table")
# Task 3a
### Data Exploration and Visualization
# Data Import and Cleaning
rbo_import<- read_csv("RBO .csv")
# Time Series Conversion
rbo_ts <- ts(rbo_import, start = c(1984), frequency = 1)
rbo_temp_ts <- ts(rbo_import$Temperature, start = c(1984), frequency = 1)
rbo_rain_ts <- ts(rbo_import$Rainfall, start = c(1984), frequency = 1)
rbo_rad_ts <- ts(rbo_import$Radiation, start = c(1984), frequency = 1)
rbo_humid_ts <- ts(rbo_import$RelHumidity, start = c(1984), frequency = 1)
rbo_rbo_ts <- ts(rbo_import$RBO, start = c(1984), frequency = 1)
# Plotting of Time Series(s)
plot(rbo_ts, main="Time series plots of Rank-based Order similarity metric and factors", type="o")
### RBO Time Series Analysis
plot(rbo_rbo_ts,
xlab = "Time",
ylab = "Rank-based Order similarity metric (RBO)",
type = "o",
main = "Rank-based Order similarity metric (RBO) Time Series Plot")
### RBO ADF & ACF
adf.test(rbo_rbo_ts)
acf(rbo_rbo_ts,
lag.max = 48,
main = "ACF of Rank-based Order similarity metric (RBO)")
## Time Series Regression Model Fitting
### Correlation Coefficient
cor(rbo_ts)
### Fitting Finite Distributed Lag Model with Intercepts
#### Finite DLM Temperature
colnames(rbo_import)<-c("x1","y","x2","x3","x4","x5")
for (i in 1:12){
fin_model_temp_rbo <- dlm(formula = y~x2,
data = data.frame(rbo_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_temp_rbo$model),
"BIC =", BIC(fin_model_temp_rbo$model),
"MASE =", MASE(fin_model_temp_rbo)$MASE, "\n")
}
fin_dlm_temp_rbo <- dlm(formula = y~x2,
data = data.frame(rbo_import),
q = 12)
summary(fin_dlm_temp_rbo)
shapiro.test(residuals(fin_dlm_temp_rbo$model))
#checkresiduals(fin_dlm_temp_rbo$model)
vif_fin_dlm_temp_rbo = vif(fin_dlm_temp_rbo$model)
vif_fin_dlm_temp_rbo
#### Finite DLM Rainfall
for (i in 1:12){
fin_model_rain_rbo <- dlm(formula = y~x3,
data = data.frame(rbo_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_rain_rbo$model),
"BIC =", BIC(fin_model_rain_rbo$model),
"MASE =", MASE(fin_model_rain_rbo)$MASE, "\n")
}
fin_dlm_rain_rbo <- dlm(formula = y~x3,
data = data.frame(rbo_import),
q = 12)
summary(fin_dlm_rain_rbo)
shapiro.test(residuals(fin_dlm_rain_rbo$model))
#checkresiduals(fin_dlm_rain_rbo$model)
vif_fin_dlm_rain_rbo = vif(fin_dlm_rain_rbo$model)
vif_fin_dlm_rain_rbo
#### Finite DLM Radiation
for (i in 1:12){
fin_model_rad_rbo <- dlm(formula = y~x4,
data = data.frame(rbo_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_rad_rbo$model),
"BIC =", BIC(fin_model_rad_rbo$model),
"MASE =", MASE(fin_model_rad_rbo)$MASE, "\n")
}
fin_dlm_rad_rbo <- dlm(formula = y~x4,
data = data.frame(rbo_import),
q = 11)
summary(fin_dlm_rad_rbo)
shapiro.test(residuals(fin_dlm_rad_rbo$model))
#checkresiduals(fin_dlm_rad_rbo$model)
vif_fin_dlm_rad_rbo = vif(fin_dlm_rad_rbo$model)
vif_fin_dlm_rad_rbo
#### Finite DLM Relative Humidity
for (i in 1:12){
fin_model_humid_rbo <- dlm(formula = y~x5,
data = data.frame(rbo_import),
q = i)
cat("q =", i,
"AIC =", AIC(fin_model_humid_rbo$model),
"BIC =", BIC(fin_model_humid_rbo$model),
"MASE =", MASE(fin_model_humid_rbo)$MASE, "\n")
}
fin_dlm_humid_rbo <- dlm(formula = y~x5,
data = data.frame(rbo_import),
q = 12)
summary(fin_dlm_humid_rbo)
shapiro.test(residuals(fin_dlm_humid_rbo$model))
#checkresiduals(fin_dlm_humid_rbo$model)
vif_fin_dlm_humid_rbo = vif(fin_dlm_humid_rbo$model)
vif_fin_dlm_humid_rbo
### Fitting Polynomial Distributed Lag Model
### RBO and Temperature
for ( i in 1:12){
poly_dlm_rbo_temp_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_temp_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_rbo_temp_k2$model),
"BIC =", BIC(poly_dlm_rbo_temp_k2$model),
"MASE =", MASE(poly_dlm_rbo_temp_k2)$MASE, "\n")
}
poly_dlm_rbo_temp_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_temp_ts),
q = 11,
k = 2)
summary(poly_dlm_rbo_temp_k2)
shapiro.test(residuals(poly_dlm_rbo_temp_k2$model))
checkresiduals(poly_dlm_rbo_temp_k2$model)
poly_dlm_rbo_temp_k1 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_temp_ts),
q = 11,
k = 1)
summary(poly_dlm_rbo_temp_k1)
shapiro.test(residuals(poly_dlm_rbo_temp_k1$model))
checkresiduals(poly_dlm_rbo_temp_k1$model)
### RBO and Rainfall
for ( i in 1:12){
poly_dlm_rbo_rain_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rain_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_rbo_rain_k2$model),
"BIC =", BIC(poly_dlm_rbo_rain_k2$model),
"MASE =", MASE(poly_dlm_rbo_rain_k2)$MASE, "\n")
}
poly_dlm_rbo_rain_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rain_ts),
q = 11,
k = 2)
summary(poly_dlm_rbo_rain_k2)
shapiro.test(residuals(poly_dlm_rbo_rain_k2$model))
checkresiduals(poly_dlm_rbo_rain_k2$model)
poly_dlm_rbo_rain_k1 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rain_ts),
q = 11,
k = 1)
summary(poly_dlm_rbo_rain_k1)
shapiro.test(residuals(poly_dlm_rbo_rain_k1$model))
checkresiduals(poly_dlm_rbo_rain_k1$model)
### RBO and Radiation
for ( i in 1:12){
poly_dlm_rbo_rad_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rad_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_rbo_rad_k2$model),
"BIC =", BIC(poly_dlm_rbo_rad_k2$model),
"MASE =", MASE(poly_dlm_rbo_rad_k2)$MASE, "\n")
}
poly_dlm_rbo_rad_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rad_ts),
q = 11,
k = 2)
summary(poly_dlm_rbo_rad_k2)
shapiro.test(residuals(poly_dlm_rbo_rad_k2$model))
checkresiduals(poly_dlm_rbo_rad_k2$model)
poly_dlm_rbo_rad_k1 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rad_ts),
q = 11,
k = 1)
summary(poly_dlm_rbo_rad_k1)
shapiro.test(residuals(poly_dlm_rbo_rad_k1$model))
checkresiduals(poly_dlm_rbo_rad_k1$model)
### RBO and Humidity
for ( i in 1:12){
poly_dlm_rbo_humid_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_humid_ts),
q = i,
k = 2,
show.beta = FALSE)
cat("q =", i,
"AIC =", AIC(poly_dlm_rbo_humid_k2$model),
"BIC =", BIC(poly_dlm_rbo_humid_k2$model),
"MASE =", MASE(poly_dlm_rbo_humid_k2)$MASE, "\n")
}
poly_dlm_rbo_humid_k2 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_humid_ts),
q = 12,
k = 2)
summary(poly_dlm_rbo_humid_k2)
shapiro.test(residuals(poly_dlm_rbo_humid_k2$model))
checkresiduals(poly_dlm_rbo_humid_k2$model)
poly_dlm_rbo_humid_k1 = polyDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_humid_ts),
q = 12,
k = 1)
summary(poly_dlm_rbo_humid_k1)
shapiro.test(residuals(poly_dlm_rbo_humid_k1$model))
checkresiduals(poly_dlm_rbo_humid_k1$model)
### Fitting Koyck Distributed Lag Model
### RBO and Temperature
koyck_rbo_temp_dlm = koyckDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_temp_ts))
summary(koyck_rbo_temp_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_temp_dlm$model))
#checkresiduals(koyck_ffd_temp_dlm$model)
vif_koyck_rbo_temp_dlm = vif(koyck_rbo_temp_dlm$model)
vif_koyck_rbo_temp_dlm
### RBO and Rainfall
koyck_rbo_rain_dlm = koyckDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rain_ts))
summary(koyck_rbo_temp_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_rbo_rain_dlm$model))
#checkresiduals(koyck_rbo_rain_dlm$model)
vif_koyck_rbo_rain_dlm = vif(koyck_rbo_rain_dlm$model)
vif_koyck_rbo_rain_dlm
### RBO and Radiation
koyck_rbo_rad_dlm = koyckDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_rad_ts))
summary(koyck_rbo_rad_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_rbo_rad_dlm$model))
checkresiduals(koyck_rbo_rad_dlm$model)
vif_koyck_rbo_rad_dlm = vif(koyck_rbo_rad_dlm$model)
vif_koyck_rbo_rad_dlm
### RBO and Humidity
koyck_rbo_humid_dlm = koyckDlm(y = as.vector(rbo_rbo_ts),
x = as.vector(rbo_humid_ts))
summary(koyck_rbo_humid_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_rbo_humid_dlm$model))
checkresiduals(koyck_rbo_humid_dlm$model)
vif_koyck_rbo_humid_dlm = vif(koyck_rbo_humid_dlm$model)
vif_koyck_rbo_humid_dlm
### Fitting Autoregressive Distributed Lag Model
for (i in 1:5){
for(j in 1:5){
ardlm_radhumid = ardlDlm(formula = y~x4+x5,
data = data.frame(rbo_import),
p = i,
q = j)
cat("p =", i,
"q =", j,
"AIC =", AIC(ardlm_radhumid$model),
"BIC =", BIC(ardlm_radhumid$model),
"MASE =", MASE(ardlm_radhumid)$MASE, "\n")
}
}
ardl_5.2_task3 = ardlDlm(x = as.vector(rbo_rad_ts) +
as.vector(rbo_humid_ts),
y = as.vector(rbo_rbo_ts),
p = 5,
q = 2)
summary(ardl_5.2_task3)
checkresiduals(ardl_5.2_task3$model)
shapiro.test(residuals(ardl_5.2_task3$model))
VIF.ardlm_rad_humid = vif(ardl_5.2_task3$model)
VIF.ardlm_rad_humid
ardl_5.4_task3 = ardlDlm(x = as.vector(rbo_rad_ts) +
as.vector(rbo_humid_ts),
y = as.vector(rbo_rbo_ts),
p = 5,
q = 4)
summary(ardl_5.4_task3)
checkresiduals(ardl_5.4_task3$model)
shapiro.test(residuals(ardl_5.4_task3$model))
VIF.ardlm_rad_humid = vif(ardl_5.4_task3$model)
VIF.ardlm_rad_humid
ardl_5.5_task3 = ardlDlm(x = as.vector(rbo_rad_ts) +
as.vector(rbo_humid_ts),
y = as.vector(rbo_rbo_ts),
p = 5,
q = 5)
summary(ardl_5.5_task3)
#checkresiduals(ardl_5.5_task3$model)
#shapiro.test(residuals(ardl_5.5_task3$model))
VIF.ardlm_rain_humid = vif(ardl_5.5_task3$model)
VIF.ardlm_rain_humid
### Time Series Regression Models Table Summary
attr(koyck_rbo_temp_dlm$model,"class") = "lm"
attr(koyck_rbo_rain_dlm$model,"class") = "lm"
attr(koyck_rbo_rad_dlm$model,"class") = "lm"
attr(koyck_rbo_humid_dlm$model,"class") = "lm"
models <- c("Finite DLM (Temperature)",
"Finite DLM (Rainfall)",
"Finite DLM (Radiation)",
"Finite DLM (Humidity)",
"Poly DLM k=2 (Temperature)",
"Poly DLM k=2 (Rainfall)",
"Poly DLM k=2 (Radiation)",
"Poly DLM k=2 (Humidity)",
"Poly DLM k=1 (Temperature)",
"Poly DLM k=1 (Rainfall)",
"Poly DLM k=1 (Radiation)",
"Poly DLM k=1 (Humidity)",
"Koyck (Temperature)",
"Koyck (Rainfall)",
"Koyck (Radiation)",
"Koyck (Humidity)",
"ARDL(5,2)",
"ARDL(5,4)",
"ARDL(5,5)")
aic_score <- AIC(fin_dlm_temp_rbo$model,
fin_dlm_rain_rbo$model,
fin_dlm_rad_rbo$model,
fin_dlm_humid_rbo$model,
poly_dlm_rbo_temp_k2$model,
poly_dlm_rbo_rain_k2$model,
poly_dlm_rbo_rad_k2$model,
poly_dlm_rbo_humid_k2$model,
poly_dlm_rbo_temp_k1$model,
poly_dlm_rbo_rain_k1$model,
poly_dlm_rbo_rad_k1$model,
poly_dlm_rbo_humid_k1$model,
koyck_rbo_temp_dlm$model,
koyck_rbo_rain_dlm$model,
koyck_rbo_rad_dlm$model,
koyck_rbo_humid_dlm$model,
ardl_5.2_task3$model,
ardl_5.4_task3$model,
ardl_5.5_task3$model)$AIC
bic_score <- BIC(fin_dlm_temp_rbo$model,
fin_dlm_rain_rbo$model,
fin_dlm_rad_rbo$model,
fin_dlm_humid_rbo$model,
poly_dlm_rbo_temp_k2$model,
poly_dlm_rbo_rain_k2$model,
poly_dlm_rbo_rad_k2$model,
poly_dlm_rbo_humid_k2$model,
poly_dlm_rbo_temp_k1$model,
poly_dlm_rbo_rain_k1$model,
poly_dlm_rbo_rad_k1$model,
poly_dlm_rbo_humid_k1$model,
koyck_rbo_temp_dlm$model,
koyck_rbo_rain_dlm$model,
koyck_rbo_rad_dlm$model,
koyck_rbo_humid_dlm$model,
ardl_5.2_task3$model,
ardl_5.4_task3$model,
ardl_5.5_task3$model)$BIC
MASE_score <- MASE(fin_dlm_temp_rbo$model,
fin_dlm_rain_rbo$model,
fin_dlm_rad_rbo$model,
fin_dlm_humid_rbo$model,
poly_dlm_rbo_temp_k2$model,
poly_dlm_rbo_rain_k2$model,
poly_dlm_rbo_rad_k2$model,
poly_dlm_rbo_humid_k2$model,
poly_dlm_rbo_temp_k1$model,
poly_dlm_rbo_rain_k1$model,
poly_dlm_rbo_rad_k1$model,
poly_dlm_rbo_humid_k1$model,
koyck_rbo_temp_dlm$model,
koyck_rbo_rain_dlm$model,
koyck_rbo_rad_dlm$model,
koyck_rbo_humid_dlm$model,
ardl_5.2_task3,
ardl_5.4_task3,
ardl_5.5_task3)$MASE
accuracy_table_task3 <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table_task3) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table_task3 <- arrange(accuracy_table_task3, MASE)
kable(accuracy_table_task3, caption = "Accuracy Score Table")