Final Project - Forecasting

Assignment Overview

This Assignment is segregated into three tasks. The first task is to present the best 4 weeks ahead forecast in terms of R squared, AIC, BIC, MASE, etc for the mortality series. The second task is to model and forecast the best 4 years ahead the First Flowering Day (FFD) series.

The third task was split into two, an analysis based on univariate climate regressors, and analysis to obtain the 3 year ahead forecasts.

Dataset Description

mort.csv : The dataset contains the mortality rates (mort), temperature (temp), pollutant particle size (part), chemical 1 (chem1), and Chemical 2 (chem2) between 2010 to 2020. FFD.csv : The dataset contains the First Flowering Day (FFD), temperature (temp), rainfall (rain), radiation (rad), and relative humidity (humid) from 1984 to 2014. RBO.csv:

Task 1

Objectives

Task 1 : The analysis and forecast of the affects of different factors on Mortality rates.

Methodology

First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, three methods will be utilized to find the best model for forecasting 4 weeks ahead. The methods that is aimed to be used would be Time Series Regression Models, Exponential Smoothing Methods, and State-space Models. The models from all three methods will be compared based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.

Data Exploration and Visualization

## New names:
## * `` -> ...1

## Rows: 508 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): ...1, mortality, temp, chem1, chem2, particle size
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Figure 1

Figure 2

Figure 1 shows the combined graph of the average weekly mortality in Paris against disease specific mortality factors between 2010 to 2020, while Figure 2 shows the scaled and centered version of the combined graph.

Mortality Time Series Analysis

Figure 3

Figure 3 shows the changes of the averaged weekly mortality in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph has an intervention point near the end of 2012 where the average weekly mortality reached its peak.

Mortality ADF & ACF

## Warning in adf.test(mort_mort_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mort_mort_ts
## Dickey-Fuller = -5.4125, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Figure 4

Figure 4 shows the ACF plot for the average weekly mortality rate. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the mortality time series is a stationary time series that contains seasonality.

Temperature Time Series Analysis

Figure 5

Figure 5 shows the changes of the averaged weekly temperature in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.

Temperature ADF & ACF

## Warning in adf.test(mort_temp_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mort_temp_ts
## Dickey-Fuller = -4.4572, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Figure 6

Figure 6 shows the ACF plot for averaged weekly temperature. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the temperature time series is a stationary time series that contains seasonality.

Chemical 1 Time Series Analysis

Figure 7

Figure 7 shows the changes of the averaged weekly chemical 1 readings in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.

Chemical 1 ADF & ACF

## Warning in adf.test(mort_chem1_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mort_chem1_ts
## Dickey-Fuller = -4.4926, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Figure 8

Figure 8 shows the ACF plot for averaged weekly chemical 1 readings. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the chemical 1 time series is a stationary time series that contains seasonality.

Chemical 2 Time Series Analysis

Figure 9

Figure 9 shows the changes of the averaged weekly Chemical 2 readings in Paris over the years. The graph shows a very slight decreasing trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph shows no distinct intervention points.

Chemical 2 ADF & ACF

## Warning in adf.test(mort_chem2_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mort_chem2_ts
## Dickey-Fuller = -5.2791, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Figure 10

Figure 10 shows the ACF plot for averaged weekly chemical 2 readings. The ACF plot suggests the existence of a trend. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the chemical 2 time series is a stationary time series that contains a trend.

Particle Size Time Series Analysis

Figure 11

Figure 11 shows the changes of the averaged weekly particle size readings in Paris over the years. The graph shows no apparent trend. There is some seasonality in the graph as depicted by the cyclic pattern observed. The graph mostly reflects a Moving Average (MA). The graph has an intervention point near 2019 where the averaged particle size reached its peak.

Particle Size ADF & ACF

## Warning in adf.test(mort_part_ts): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mort_part_ts
## Dickey-Fuller = -4.493, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Figure 12

Figure 12 shows the ACF plot for averaged weekly particle size readings. The ACF plot suggests the existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.01 (<0.05) hence, we reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the particle size time series is a stationary time series that contains seasonality.

Impact of Trend and Seasonality Decomposition

Figure 13

From the STL decomposition in Figure 13, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series follows the same overall pattern of the series - with spikes around 2013 and a drop at around 2017. Lastly, the remainder in the STL suggests marginal interventions in the series.

Figure 14

From the STL decomposition in Figure 14, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series follows the same overall pattern of the series - with drops around 2011, 2015, late 2018 and 2019. Lastly, the remainder in the STL suggests marginal interventions in the series.

Figure 15

From the STL decomposition in Figure 15, the seasonality components reflect the overall pattern of the series, suggesting seasonality. The trend graph slightly follows the overtall pattern of the series, but largely deviates from 2014 to 2016. Lastly, the remainder in the STL suggests marginal interventions in the series.

Figure 16

From the STL decomposition in Figure 16, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series generally follows the same overall pattern of the series. Lastly, the remainder in the STL suggests marginal interventions in the series.

Figure 17

From the STL decomposition in Figure 17, the seasonality components are not meaningful in this scenario as previous observation suggests the presence of seasonality. Additionally, the trend in the series roughly follows the same overall pattern of the series. Lastly, the remainder in the STL suggests marginal interventions in the series.

Time Series Regression Model Fitting

Correlation Coefficient

##                mortality        temp       chem1     chem2 particle size
## mortality      1.0000000 -0.43863962  0.55744759 0.2569989    0.44387133
## temp          -0.4386396  1.00000000 -0.09785582 0.4043740   -0.01723095
## chem1          0.5574476 -0.09785582  1.00000000 0.5130047    0.86611747
## chem2          0.2569989  0.40437401  0.51300467 1.0000000    0.46793404
## particle size  0.4438713 -0.01723095  0.86611747 0.4679340    1.00000000

From the results of the Correlation Coefficient above, a high correlation between Chemical 1 and Particle Size can be observed at 0.86611747 (implies multicollienarity). Hence these variables should not be treated as independent variables simultaneously. Another observation that can be observed is a moderate level correlation between Chemical 1 and Mortality at 0.55744759. For modelling purposes, Temperature, Chemical 1, and Chemical 2 will be used as independent variables with Mortality being the dependent variable.

Fitting Finite Distributed Lag Model

Finite DLM

## q = 1 AIC = 3357.164 BIC = 3390.992 MASE = 0.9355909 
## q = 2 AIC = 3311.412 BIC = 3357.904 MASE = 0.8968938 
## q = 3 AIC = 3302.736 BIC = 3361.88 MASE = 0.8915604 
## q = 4 AIC = 3290.634 BIC = 3362.418 MASE = 0.8831327 
## q = 5 AIC = 3285.337 BIC = 3369.749 MASE = 0.8738662 
## q = 6 AIC = 3279.125 BIC = 3376.153 MASE = 0.872103 
## q = 7 AIC = 3273.536 BIC = 3383.168 MASE = 0.8611472 
## q = 8 AIC = 3272.002 BIC = 3394.226 MASE = 0.8558678 
## q = 9 AIC = 3272.207 BIC = 3407.01 MASE = 0.8555663 
## q = 10 AIC = 3271.771 BIC = 3419.142 MASE = 0.8553405 
## q = 11 AIC = 3272.315 BIC = 3432.241 MASE = 0.8559131 
## q = 12 AIC = 3272.547 BIC = 3445.017 MASE = 0.858027

From the results above, a lag length of 10 provides the lowest AIC, BIC (relative), and MASE values. This means that a value of q=10 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.747  -3.991  -0.497   3.864  35.304 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 112.241711   5.568128  20.158  < 2e-16 ***
## x1.t         -0.019743   0.068466  -0.288  0.77320    
## x1.1         -0.194046   0.070246  -2.762  0.00597 ** 
## x1.2         -0.154467   0.072274  -2.137  0.03310 *  
## x1.3         -0.058983   0.072991  -0.808  0.41945    
## x1.4          0.021698   0.074285   0.292  0.77034    
## x1.5         -0.125043   0.074317  -1.683  0.09313 .  
## x1.6         -0.094553   0.073752  -1.282  0.20047    
## x1.7         -0.038808   0.072428  -0.536  0.59234    
## x1.8         -0.021044   0.072333  -0.291  0.77123    
## x1.9         -0.021308   0.070011  -0.304  0.76100    
## x1.10         0.016000   0.069759   0.229  0.81869    
## x3.t          0.822583   0.549961   1.496  0.13541    
## x3.1          0.443087   0.551247   0.804  0.42193    
## x3.2          0.817593   0.567822   1.440  0.15058    
## x3.3         -0.044295   0.568023  -0.078  0.93788    
## x3.4          0.486762   0.570700   0.853  0.39414    
## x3.5          0.320478   0.572441   0.560  0.57586    
## x3.6          0.950698   0.572213   1.661  0.09730 .  
## x3.7          0.393360   0.564801   0.696  0.48649    
## x3.8          0.298915   0.557305   0.536  0.59197    
## x3.9          0.041245   0.546535   0.075  0.93988    
## x3.10         0.211691   0.541224   0.391  0.69588    
## x4.t          0.114905   0.045779   2.510  0.01241 *  
## x4.1          0.031179   0.046125   0.676  0.49939    
## x4.2          0.020768   0.047269   0.439  0.66061    
## x4.3         -0.008694   0.047953  -0.181  0.85622    
## x4.4          0.015852   0.049322   0.321  0.74805    
## x4.5          0.039415   0.049634   0.794  0.42754    
## x4.6         -0.006331   0.049639  -0.128  0.89857    
## x4.7          0.041482   0.048317   0.859  0.39104    
## x4.8          0.018642   0.047517   0.392  0.69500    
## x4.9          0.041326   0.046109   0.896  0.37057    
## x4.10        -0.010018   0.044552  -0.225  0.82219    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.455 on 464 degrees of freedom
## Multiple R-squared:  0.6141, Adjusted R-squared:  0.5867 
## F-statistic: 22.38 on 33 and 464 DF,  p-value: < 2.2e-16
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 3305.418 3452.789

The finite distributed lag model with q=10 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 58.6% (0.5867) of variability in mortality rates. The p-value of the finite distributed lag model is 2.2e-16 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm$model)
## W = 0.97277, p-value = 5.358e-08

Figure 18

## 
##  Breusch-Godfrey test for serial correlation of order up to 37
## 
## data:  Residuals
## LM test = 208.71, df = 37, p-value < 2.2e-16

The residuals in Figure 18 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of <2.2e-16 (<0.05). The histogram of standardised residuals contain slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 5.358e-08 (<0.05) which rejects the null hypothesis of normality.

##     x1.t     x1.1     x1.2     x1.3     x1.4     x1.5     x1.6     x1.7 
## 4.591187 4.837413 5.133621 5.235821 5.422560 5.432629 5.341759 5.147467 
##     x1.8     x1.9    x1.10     x3.t     x3.1     x3.2     x3.3     x3.4 
## 5.118225 4.757270 4.690602 4.038018 4.045829 4.295368 4.293476 4.321524 
##     x3.5     x3.6     x3.7     x3.8     x3.9    x3.10     x4.t     x4.1 
## 4.338476 4.334681 4.210682 4.096153 3.939884 3.864821 5.758777 5.836553 
##     x4.2     x4.3     x4.4     x4.5     x4.6     x4.7     x4.8     x4.9 
## 6.150734 6.300165 6.664955 6.748520 6.742381 6.391789 6.182337 5.816225 
##    x4.10 
## 5.448359

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

In conclusion, the finite distributed lag model with lag length of 10 is not successful in capturing the autocorrelation and seasonality in the series, nor is it a good performing model due to its lack of significant terms at 5% significance level and explainability.

Fitting Polynomial Distributed Lag Model

As finite distributed lag models suffer from multicollinearity in general due to the inclusion of of the lags of the same model. To reduce this multicollinearity, a polynomial shape is imposed on the lag distribution (Judge and Griffiths, 2000). From the Correlation Coefficient table above, Mortality has the highest correlation with Chemical 1. Hence, Chemical 1 will be used to fit the polynomial distributed lag model.

## q = 1 AIC = 3554.683 BIC = 3571.597 MASE = 1.171233 
## q = 2 AIC = 3505.901 BIC = 3527.034 MASE = 1.122964 
## q = 3 AIC = 3482.544 BIC = 3503.667 MASE = 1.108563 
## q = 4 AIC = 3419.705 BIC = 3440.817 MASE = 1.046494 
## q = 5 AIC = 3401.12 BIC = 3422.223 MASE = 1.030782 
## q = 6 AIC = 3363.07 BIC = 3384.163 MASE = 0.9961896 
## q = 7 AIC = 3327.74 BIC = 3348.823 MASE = 0.9528036 
## q = 8 AIC = 3310.924 BIC = 3331.997 MASE = 0.9442986 
## q = 9 AIC = 3300.181 BIC = 3321.244 MASE = 0.9355695 
## q = 10 AIC = 3293.549 BIC = 3314.602 MASE = 0.9339622 
## q = 11 AIC = 3287.592 BIC = 3308.634 MASE = 0.9376734 
## q = 12 AIC = 3280.596 BIC = 3301.629 MASE = 0.9396848

From the results above, a lag length of 10 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=10 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0     0.200     0.0841    2.38 1.78e-02
## beta.1     0.196     0.0432    4.54 7.17e-06
## beta.2     0.196     0.0229    8.54 1.76e-16
## beta.3     0.199     0.0318    6.24 9.25e-10
## beta.4     0.205     0.0434    4.73 3.00e-06
## beta.5     0.215     0.0477    4.51 8.19e-06
## beta.6     0.228     0.0434    5.25 2.27e-07
## beta.7     0.244     0.0318    7.67 9.10e-14
## beta.8     0.264     0.0229   11.50 2.57e-27
## beta.9     0.287     0.0432    6.64 8.20e-11
## beta.10    0.314     0.0841    3.73 2.14e-04

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.519  -4.292  -0.555   3.686  32.589 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 68.444286   0.855229  80.030   <2e-16 ***
## z.t0         0.199889   0.084077   2.377   0.0178 *  
## z.t1        -0.005372   0.049603  -0.108   0.9138    
## z.t2         0.001675   0.004911   0.341   0.7332    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.566 on 494 degrees of freedom
## Multiple R-squared:  0.575,  Adjusted R-squared:  0.5724 
## F-statistic: 222.8 on 3 and 494 DF,  p-value: < 2.2e-16

The finite distributed lag model with q=10 and k=2 has almost no significant term at 5% level of significance. The adjusted R-squared score only explains 57.2% (0.5724) of variability in mortality rates. The p-value of the finite distributed lag model is 2.2e-16 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_k2$model)
## W = 0.97445, p-value = 1.225e-07

Figure 19

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 168.02, df = 10, p-value < 2.2e-16

The residuals in Figure 19 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of <2.2e-16 (<0.05) which confirms the presence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 9.668e-08 (<0.05) which rejects the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value  P(>|t|)
## beta.0     0.174    0.03600    4.83 1.83e-06
## beta.1     0.185    0.02930    6.32 5.81e-10
## beta.2     0.197    0.02280    8.64 8.06e-17
## beta.3     0.208    0.01660   12.60 1.63e-31
## beta.4     0.219    0.01130   19.30 2.62e-62
## beta.5     0.231    0.00894   25.80 2.58e-93
## beta.6     0.242    0.01130   21.40 5.63e-72
## beta.7     0.254    0.01660   15.30 1.81e-43
## beta.8     0.265    0.02280   11.60 8.15e-28
## beta.9     0.276    0.02930    9.43 1.66e-19
## beta.10    0.288    0.03600    7.99 9.84e-15

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.514  -4.329  -0.515   3.708  32.530 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 68.512699   0.830623   82.48  < 2e-16 ***
## z.t0         0.173989   0.036026    4.83 1.83e-06 ***
## z.t1         0.011375   0.006979    1.63    0.104    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.56 on 495 degrees of freedom
## Multiple R-squared:  0.5749, Adjusted R-squared:  0.5732 
## F-statistic: 334.7 on 2 and 495 DF,  p-value: < 2.2e-16

The finite distributed lag model with q=10 and k=1 has one significant term at 5% level of significance. The adjusted R-squared score only explains 57.3% (0.5732) of variability in mortality rates. The p-value of the finite distributed lag model is 2.2e-16 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_k1$model)
## W = 0.97448, p-value = 1.247e-07

Figure 20

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 167.95, df = 10, p-value < 2.2e-16

The residuals in Figure 20 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of <2.2e-16 (<0.05) which confirms the presence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 9.666e-08 (<0.05) which rejects the null hypothesis of normality.

In conclusion, the polynomial distributed lag model with lag length of 12 is not successful in capturing the autocorrelation and seasonality in the series, nor is it an excellent performing model due to its lack of explainability.

Fitting Koyck Distributed Lag Model

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -20.19302  -4.18513  -0.06397   3.61514  23.08021 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.50952    2.55334   9.599  < 2e-16 ***
## Y.1          0.66674    0.03637  18.331  < 2e-16 ***
## X.t          0.63624    0.15311   4.155 3.81e-05 ***
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   1 504   173.382  <2e-16 ***
## Wu-Hausman         1 503     0.372   0.542    
## Sargan             0  NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.891 on 504 degrees of freedom
## Multiple R-Squared: 0.6544,  Adjusted R-squared: 0.653 
## Wald test: 443.5 on 2 and 504 DF,  p-value: < 2.2e-16

The Koyck model with q=12 has all of its scores significant at 5% level of significance. The adjusted R-squared score explains 65.3% (0.653) of variability in mortality rates. The p-value of the finite distributed lag model is 2.2e-16 (<0.05) hence, it is statistically significant. The “Weak Instruments” test resulted in a p-value of 2e-16 which means that the model at the first stage of least squares fitting is significant at 5% level of significance. The “Wu-Hausman” test resulted in a p-value of 0.542 which means that there is significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_dlm$model)
## W = 0.99486, p-value = 0.08876

Figure 21

The residuals in Figure 13 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.08876 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.932188 1.932188

The Variance Inflation Factor (VIF) values suggest that the estimates from the Koyck model does not suffer from multicollinearity as all VIF values are less than 10.

In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.

Fitting Autoregressive Distributed Lag Model

Fitting Autoregressive Distributed Lag Models with three predictors

for (i in 1:5){
  for(j in 1:5){
    ardlm_tempchem = ardlDlm(formula = y~x1+x2+x3,
                             data = data.frame(mort_import),
                             p = i, 
                             q = j)
    cat("p =", i,
        "q =", j,
        "AIC =", AIC(ardlm_tempchem$model), 
        "BIC =", BIC(ardlm_tempchem$model), 
        "MASE =", MASE(ardlm_tempchem)$MASE, "\n")
  }
}

## p = 1 q = 1 AIC = 3172.828 BIC = 3210.884 MASE = 0.7992248 
## p = 1 q = 2 AIC = 3096.521 BIC = 3138.787 MASE = 0.7421265 
## p = 1 q = 3 AIC = 3092.607 BIC = 3139.078 MASE = 0.7415002 
## p = 1 q = 4 AIC = 3089.171 BIC = 3139.842 MASE = 0.7418239 
## p = 1 q = 5 AIC = 3085.657 BIC = 3140.525 MASE = 0.7404108 
## p = 2 q = 1 AIC = 3152.829 BIC = 3203.548 MASE = 0.7825795 
## p = 2 q = 2 AIC = 3094.987 BIC = 3149.931 MASE = 0.7365205 
## p = 2 q = 3 AIC = 3091.072 BIC = 3150.215 MASE = 0.7361552 
## p = 2 q = 4 AIC = 3087.887 BIC = 3151.226 MASE = 0.736728 
## p = 2 q = 5 AIC = 3084.559 BIC = 3152.088 MASE = 0.7357571 
## p = 3 q = 1 AIC = 3151.333 BIC = 3214.701 MASE = 0.7820418 
## p = 3 q = 2 AIC = 3091.897 BIC = 3159.49 MASE = 0.7345853 
## p = 3 q = 3 AIC = 3093.835 BIC = 3165.653 MASE = 0.7343553 
## p = 3 q = 4 AIC = 3090.528 BIC = 3166.534 MASE = 0.7349866 
## p = 3 q = 5 AIC = 3086.97 BIC = 3167.161 MASE = 0.7336521 
## p = 4 q = 1 AIC = 3141.082 BIC = 3217.089 MASE = 0.7737713 
## p = 4 q = 2 AIC = 3085.842 BIC = 3166.071 MASE = 0.7266196 
## p = 4 q = 3 AIC = 3087.804 BIC = 3172.256 MASE = 0.726738 
## p = 4 q = 4 AIC = 3089.783 BIC = 3178.457 MASE = 0.7265216 
## p = 4 q = 5 AIC = 3085.988 BIC = 3178.841 MASE = 0.7246658 
## p = 5 q = 1 AIC = 3137.901 BIC = 3226.534 MASE = 0.7693759 
## p = 5 q = 2 AIC = 3083.264 BIC = 3176.117 MASE = 0.7240193 
## p = 5 q = 3 AIC = 3085.252 BIC = 3182.326 MASE = 0.7239573 
## p = 5 q = 4 AIC = 3087.202 BIC = 3188.496 MASE = 0.7237336 
## p = 5 q = 5 AIC = 3088.542 BIC = 3194.057 MASE = 0.7222653

From the autoregressive distributed lag model above, it can be observed that the model with the lowest AIC value is p=2, q=5, the lowest BIC value is p=1, q=5, and the lowest MASE is p=5, q=5. However, as we are after the lowest MASE values, the models that will be selected are ARDL(5,3), ARDL(5,4), and ARDL(5,5)

ardl_5.3 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts), 
                   y = as.vector(mort_mort_ts),
                   p = 5, 
                   q = 3)
summary(ardl_5.3)

## 
## Time series regression with "ts" data:
## Start = 6, End = 508
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7559  -3.6759  -0.3462   3.3930  25.9639 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.866385   4.836345   1.213 0.225719    
## X.t          0.118708   0.028294   4.195 3.23e-05 ***
## X.1         -0.140714   0.028959  -4.859 1.59e-06 ***
## X.2         -0.040958   0.029878  -1.371 0.171050    
## X.3         -0.013007   0.029501  -0.441 0.659490    
## X.4          0.110387   0.028395   3.888 0.000115 ***
## X.5          0.008874   0.028308   0.313 0.754034    
## Y.1          0.464299   0.045176  10.278  < 2e-16 ***
## Y.2          0.397573   0.046117   8.621  < 2e-16 ***
## Y.3          0.030109   0.045199   0.666 0.505633    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.453 on 493 degrees of freedom
## Multiple R-squared:  0.7078, Adjusted R-squared:  0.7024 
## F-statistic: 132.7 on 9 and 493 DF,  p-value: < 2.2e-16

The autoregressive distributed lag model ardl_5.3, has p=5 and q=3. The x.2, x.3, x.5, and y.3 are not significant at 5% level of significance. The adjusted R-squared model explains 70.24% (0.7024) of the variability in mortality rates. The p value of ardl_5.3 is 2.2e-16 (<0.05) hence, it is significant.

Figure 22

## 
##  Breusch-Godfrey test for serial correlation of order up to 13
## 
## data:  Residuals
## LM test = 13.995, df = 13, p-value = 0.3742

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.3$model)
## W = 0.99008, p-value = 0.001811

The residuals in Figure 22 above presents a few observations. The time series plot shows no non-random pattern. The ACF plot suggests that there are no serial correlation remaining in the residuals. The histogram of standardized residuals contain no violations in normality assumptions. Lastly, the Shapiro-Wilk normality test has a value of 0.3742 (<0.05) so we reject the null hypothesis of normality.

VIF.ardlm_low_aic_tempchem = vif(ardl_5.3$model)
VIF.ardlm_low_aic_tempchem

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.382499  1.447856  1.541086  1.504983  1.394532  1.385800  3.446049  3.597371 
## L(y.t, 3) 
##  3.450816

The Variance Inflation Factor (VIF) values suggest that the estimates from the autoregressive lag model with three predictors does not suffer from multicollinearity as all VIF values are less than 10.

ardl_5.4 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts),
                   y = as.vector(mort_mort_ts), 
                   p = 5, 
                   q = 4)
summary(ardl_5.4)

## 
## Time series regression with "ts" data:
## Start = 6, End = 508
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7551  -3.6756  -0.3431   3.3950  25.9633 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.8727443  4.9631542   1.183 0.237274    
## X.t          0.1187023  0.0283409   4.188 3.33e-05 ***
## X.1         -0.1407106  0.0289942  -4.853 1.63e-06 ***
## X.2         -0.0409772  0.0300940  -1.362 0.173935    
## X.3         -0.0130300  0.0298044  -0.437 0.662169    
## X.4          0.1104239  0.0291236   3.792 0.000168 ***
## X.5          0.0088496  0.0286569   0.309 0.757596    
## Y.1          0.4643057  0.0452373  10.264  < 2e-16 ***
## Y.2          0.3976746  0.0493715   8.055 6.06e-15 ***
## Y.3          0.0302231  0.0493179   0.613 0.540277    
## Y.4         -0.0002624  0.0451051  -0.006 0.995361    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.459 on 492 degrees of freedom
## Multiple R-squared:  0.7078, Adjusted R-squared:  0.7018 
## F-statistic: 119.2 on 10 and 492 DF,  p-value: < 2.2e-16

The autoregressive distributed lag model ardl_5.4, has p=5 and q=3. The x.2, x.3, x.5, y.3, and y.4 are not significant at 5% level of significance. The adjusted R-squared model explains 70.18% (0.7018) of the variability in mortality rates. The p value of ardl_5.4 is 2.2e-16 (<0.05) hence, it is significant.

Figure 23

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 26.395, df = 14, p-value = 0.02305

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.4$model)
## W = 0.99008, p-value = 0.00181

The residuals in Figure 23 above presents a few observations. The time series plot shows no non-random pattern. The ACF plot suggests that there are no serial correlation remaining in the residuals. The histogram of standardized residuals contain violations in normality assumptions. Lastly, the Shapiro-Wilk normality test has a value of 0.02305 (<0.05) so we reject the null hypothesis of normality.

VIF.ardlm_low_bic_tempchem = vif(ardl_5.4$model)
VIF.ardlm_low_bic_tempchem

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.384251  1.448399  1.560279  1.532967  1.464017  1.417321  3.448419  4.114714 
## L(y.t, 3) L(y.t, 4) 
##  4.100045  3.436769

ardl_5.5 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts),
                   y = as.vector(mort_mort_ts), 
                   p = 5, 
                   q = 5)
summary(ardl_5.5)

## 
## Time series regression with "ts" data:
## Start = 6, End = 508
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.8263  -3.6339  -0.3603   3.3448  26.0775 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.387355   5.080508   1.060 0.289485    
## X.t          0.119648   0.028440   4.207 3.08e-05 ***
## X.1         -0.140247   0.029035  -4.830 1.83e-06 ***
## X.2         -0.041181   0.030122  -1.367 0.172205    
## X.3         -0.011349   0.030057  -0.378 0.705898    
## X.4          0.112534   0.029514   3.813 0.000155 ***
## X.5          0.006187   0.029271   0.211 0.832675    
## Y.1          0.464476   0.045275  10.259  < 2e-16 ***
## Y.2          0.397698   0.049411   8.049 6.35e-15 ***
## Y.3          0.021894   0.052645   0.416 0.677678    
## Y.4         -0.009262   0.049287  -0.188 0.851024    
## Y.5          0.020354   0.044751   0.455 0.649432    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.463 on 491 degrees of freedom
## Multiple R-squared:  0.7079, Adjusted R-squared:  0.7014 
## F-statistic: 108.2 on 11 and 491 DF,  p-value: < 2.2e-16

The autoregressive distributed lag model ardl_5.5, has p=5 and q=5. The x.t, x.1, y.2, y.3, y.4, and y.5 are not significant at 5% level of significance. The adjusted R-squared model explains 70.14% (0.7014) of the variability in mortality rates. The p value of ardl_5.5 is 2.2e-16 (<0.05) hence, it is significant.

Figure 24

## 
##  Breusch-Godfrey test for serial correlation of order up to 15
## 
## data:  Residuals
## LM test = 38.522, df = 15, p-value = 0.000755

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.5$model)
## W = 0.98967, p-value = 0.00132

The residuals in Figure 24 above presents a few observations. The time series plot shows no non-random pattern. The ACF plot suggests that there are no serial correlation remaining in the residuals. The histogram of standardized residuals contain violations in normality assumptions. Lastly, the Shapiro-Wilk normality test has a value of 0.000755 (<0.05) so we reject the null hypothesis of normality.

VIF.ardlm_low_mase_tempchem = vif(ardl_5.5$model)
VIF.ardlm_low_mase_tempchem

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.391691  1.450184  1.560623  1.556501  1.501095  1.476349  3.448655  4.114719 
## L(y.t, 3) L(y.t, 4) L(y.t, 5) 
##  4.664404  4.097032  3.376778

In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.

Time Series Regression Models Table Summary

Accuracy Score Table
Model	AIC	BIC	MASE
ARDL(5,5)	3149.559	3204.427	0.7909231
ARDL(5,3)	3145.771	3192.198	0.7925531
ARDL(5,4)	3147.771	3198.418	0.7925661
Koyck	3241.979	3258.893	0.8708743
Finite DLM	3305.418	3452.789	0.8873134
Poly DLM	3291.667	3308.509	0.9350304

The table above depicts the accuracy measures (AIC, BIC, and MASE) of the fitted models thus far. It can be seen that ARDL(5,5) has the lowest MASE value.

Fitting Exponential Smoothing Models and State-Space Models

Was unable to fit due to high frequency of the data.

Forecasting

In conclusion, based on all the previous observations made - the model that should be used to forecast the Mortality rate 4 weeks ahead should be the ARDL(5,5) model based on the results of its residual analysis and its low MASE value.

Task 1 Conclusion

To conclude this exploration, the frequency was too high so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the ARDL(5,5) was left as the best model to use for forecasting.

Task 2

Objectives

Task 2 : The modelling and forecasting of First Flowering Day (FFD) with single climate predictors (univariate models).

Methodology

First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, three methods will be utilized to find the best model for forecasting 4 years ahead. The methods that is aimed to be used would be Time Series Regression Models, Exponential Smoothing Methods, and State-space Models. The models from all three methods will be compared based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.

Data Exploration and Visualization

## Rows: 31 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): Year, Temperature, Rainfall, Radiation, RelHumidity, FFD
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Figure 25

Figure 25 above depicts the time series plots for the First Flowering Factors over time.

Temperature Time Series Analysis

Figure 26

Figure 26 shows the changes of Temperature over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA). The graph has an intervention point right after 2005 where there was an immense drop in temperature.

Temperature ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ffd_temp_ts
## Dickey-Fuller = -1.6816, Lag order = 3, p-value = 0.6953
## alternative hypothesis: stationary

Figure 27

Figure 27 shows the ACF plot for the Temperature series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.6953 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the temperature time series is not a stationary time series that contains seasonality.

Rainfall Time Series Analysis

Figure 28

Figure 28 shows the changes of Rainfall amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1995 where there was potentially an immense drought.

Rainfall ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ffd_rain_ts
## Dickey-Fuller = -2.3024, Lag order = 3, p-value = 0.4563
## alternative hypothesis: stationary

Figure 29

Figure 29 shows the ACF plot for the Rainfall series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.4563 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the Rainfall time series is not a stationary time series that contains seasonality.

Radiation Time Series Analysis

Figure 30

Figure 30 shows the changes of Rainfall amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1990 where there was a drop in radiation amount.

Radiation ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ffd_rad_ts
## Dickey-Fuller = -2.6949, Lag order = 3, p-value = 0.3052
## alternative hypothesis: stationary

Figure 31

Figure 31 shows the ACF plot for the Radiation series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.3052 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the Radiation time series is not a stationary time series that contains seasonality.

Relative Humidity Time Series Analysis

Figure 32

Figure 32 shows the changes of Relative Humidity amount over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has no distinct intervention points.

Relative Humidity ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ffd_humid_ts
## Dickey-Fuller = -2.7992, Lag order = 3, p-value = 0.2651
## alternative hypothesis: stationary

Figure 33

Figure 33 shows the ACF plot for the Relative Humidity series. The ACF plot suggests an existence of a trend and strong seasonal patterns. The ADF test resulted a p-value of 0.2651 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the solar radiation time series is not a stationary time series that contains seasonality.

First Flowering Day Time Series Analysis

Figure 34

Figure 34 shows the changes of First Flowering Day over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point in 2000 where there was a drop in First Flowering Day.

First Flowering Day ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ffd_ffd_ts
## Dickey-Fuller = -2.3723, Lag order = 3, p-value = 0.4294
## alternative hypothesis: stationary

Figure 35

Figure 35 shows the ACF plot for the FFD series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.4294 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the FFD time series is not a stationary time series that contains seasonality.

Time Series Regression Model Fitting

Correlation Coefficient

##                   Year  Temperature   Rainfall   Radiation  RelHumidity
## Year         1.0000000  0.148410676 -0.1752091  0.11881829  0.206355767
## Temperature  0.1484107  1.000000000  0.3933255 -0.24096625  0.009646021
## Rainfall    -0.1752091  0.393325545  1.0000000 -0.58131610  0.338461007
## Radiation    0.1188183 -0.240966245 -0.5813161  1.00000000 -0.055209652
## RelHumidity  0.2063558  0.009646021  0.3384610 -0.05520965  1.000000000
## FFD         -0.2329975 -0.247933708  0.0506911  0.04677758 -0.128502440
##                     FFD
## Year        -0.23299747
## Temperature -0.24793371
## Rainfall     0.05069110
## Radiation    0.04677758
## RelHumidity -0.12850244
## FFD          1.00000000

From the results of the Correlation Coefficient above, the highest correlation that can be observed is between Temperature and Rainfall at 0.3933255. However, due to the low correlation between most of the variables, each variable will be used as independent variables with FFD being the dependent variable.

Fitting Finite Distributed Lag Model with Intercepts

Finite DLM Temperature

## q = 1 AIC = 279.6699 BIC = 285.2747 MASE = 0.6644773 
## q = 2 AIC = 272.7512 BIC = 279.5877 MASE = 0.6788097 
## q = 3 AIC = 264.2921 BIC = 272.2853 MASE = 0.643964 
## q = 4 AIC = 258.0491 BIC = 267.12 MASE = 0.6406911 
## q = 5 AIC = 247.1156 BIC = 257.1804 MASE = 0.5959619 
## q = 6 AIC = 238.7187 BIC = 249.6886 MASE = 0.5794878 
## q = 7 AIC = 231.1518 BIC = 242.9323 MASE = 0.5525488 
## q = 8 AIC = 224.2942 BIC = 236.7846 MASE = 0.5377635 
## q = 9 AIC = 218.0096 BIC = 231.1021 MASE = 0.5596056 
## q = 10 AIC = 211.9345 BIC = 225.5133 MASE = 0.5755493 
## q = 11 AIC = 197.6879 BIC = 211.6282 MASE = 0.4764284 
## q = 12 AIC = 181.9935 BIC = 196.16 MASE = 0.4122077

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##       1       2       3       4       5       6       7       8       9      10 
##  -6.304  20.608   2.689 -10.388 -21.290   9.263   2.032   6.528 -20.530  10.231 
##      11      12      13      14      15      16      17      18      19 
##  -9.392   9.545   1.053 -17.292   7.820  22.525  16.917 -11.337 -12.679 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  216.585    753.624   0.287    0.785
## x2.t         -12.329     19.031  -0.648    0.546
## x2.1          -2.080     21.874  -0.095    0.928
## x2.2           4.420     26.151   0.169    0.872
## x2.3          13.786     22.208   0.621    0.562
## x2.4          10.166     23.340   0.436    0.681
## x2.5          17.897     25.140   0.712    0.508
## x2.6         -16.736     25.926  -0.646    0.547
## x2.7           7.064     24.921   0.283    0.788
## x2.8          13.676     22.558   0.606    0.571
## x2.9         -23.511     22.244  -1.057    0.339
## x2.10         18.381     22.496   0.817    0.451
## x2.11          1.459     20.683   0.071    0.946
## x2.12        -33.028     20.068  -1.646    0.161
## 
## Residual standard error: 25.75 on 5 degrees of freedom
## Multiple R-squared:  0.6551, Adjusted R-squared:  -0.2415 
## F-statistic: 0.7306 on 13 and 5 DF,  p-value: 0.7019
## 
## AIC and BIC values for the model:
##        AIC    BIC
## 1 181.9935 196.16

The finite distributed lag model with q=12 has no significant term at 5% significance level. The adjusted R-squared score explains -24.15% (-0.2415) of variability in FFD values. The p-value of the finite distributed lag model is 0.7019 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_temp$model)
## W = 0.95288, p-value = 0.4417

The residuals in Figure 36 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.4417 (>0.05) which fails to rejects the null hypothesis of normality.

##     x2.t     x2.1     x2.2     x2.3     x2.4     x2.5     x2.6     x2.7 
## 1.682976 1.813251 2.088240 1.634591 1.721548 1.946444 2.381222 2.070046 
##     x2.8     x2.9    x2.10    x2.11    x2.12 
## 2.048046 1.625311 1.667104 1.418715 1.300454

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Rainfall

## q = 1 AIC = 281.9769 BIC = 287.5817 MASE = 0.6758928 
## q = 2 AIC = 273.8231 BIC = 280.6596 MASE = 0.6752737 
## q = 3 AIC = 265.1863 BIC = 273.1795 MASE = 0.6326564 
## q = 4 AIC = 257.086 BIC = 266.1568 MASE = 0.6309338 
## q = 5 AIC = 246.1244 BIC = 256.1891 MASE = 0.5808908 
## q = 6 AIC = 238.0915 BIC = 249.0614 MASE = 0.5522086 
## q = 7 AIC = 231.4148 BIC = 243.1954 MASE = 0.5447815 
## q = 8 AIC = 224.8048 BIC = 237.2953 MASE = 0.5553913 
## q = 9 AIC = 216.4839 BIC = 229.5764 MASE = 0.5521884 
## q = 10 AIC = 197.7206 BIC = 211.2994 MASE = 0.391709 
## q = 11 AIC = 185.1533 BIC = 199.0936 MASE = 0.3819204 
## q = 12 AIC = 178.947 BIC = 193.1136 MASE = 0.3886174

From the results above, a lag length of 11 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.895 -11.127   4.817   9.066  17.143 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 291.9218    89.8605   3.249   0.0141 *
## x3.t        -12.4000    13.3370  -0.930   0.3834  
## x3.1          7.3366    13.2496   0.554   0.5970  
## x3.2        -12.2796    12.2579  -1.002   0.3498  
## x3.3         26.6428    12.8234   2.078   0.0763 .
## x3.4        -31.5044    13.3419  -2.361   0.0502 .
## x3.5         32.9844    14.1965   2.323   0.0531 .
## x3.6         -4.9389    13.3441  -0.370   0.7222  
## x3.7         22.7457    14.0886   1.614   0.1505  
## x3.8        -18.4183    14.8639  -1.239   0.2552  
## x3.9         -9.6326    14.5036  -0.664   0.5279  
## x3.10       -36.3197    14.9434  -2.430   0.0454 *
## x3.11        -0.3935    15.6147  -0.025   0.9806  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.8 on 7 degrees of freedom
## Multiple R-squared:  0.6872, Adjusted R-squared:  0.151 
## F-statistic: 1.282 on 12 and 7 DF,  p-value: 0.384
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 185.1533 199.0936

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 15.1% (0.151) of variability in FFD values. The p-value of the finite distributed lag model is 0.384 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rain$model)
## W = 0.93106, p-value = 0.1619

The residuals in Figure 37 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are significant. The histogram of standardised residuals contains violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.1619 (>0.05) which fails to rejects the null hypothesis of normality.

##     x3.t     x3.1     x3.2     x3.3     x3.4     x3.5     x3.6     x3.7 
## 1.327923 1.349384 1.253635 1.430914 1.454200 1.539660 1.413921 1.426862 
##     x3.8     x3.9    x3.10    x3.11 
## 1.575726 1.333137 1.413099 1.518843

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Radiation

## q = 1 AIC = 281.571 BIC = 287.1758 MASE = 0.6579604 
## q = 2 AIC = 273.2458 BIC = 280.0823 MASE = 0.6672639 
## q = 3 AIC = 264.666 BIC = 272.6592 MASE = 0.6164754 
## q = 4 AIC = 257.9491 BIC = 267.02 MASE = 0.5968786 
## q = 5 AIC = 249.3748 BIC = 259.4396 MASE = 0.5870893 
## q = 6 AIC = 235.4683 BIC = 246.4382 MASE = 0.5298748 
## q = 7 AIC = 225.705 BIC = 237.4856 MASE = 0.479178 
## q = 8 AIC = 219.9335 BIC = 232.424 MASE = 0.4839788 
## q = 9 AIC = 211.6427 BIC = 224.7353 MASE = 0.4680972 
## q = 10 AIC = 204.3142 BIC = 217.893 MASE = 0.4716663 
## q = 11 AIC = 177.2058 BIC = 191.146 MASE = 0.2874734 
## q = 12 AIC = 165.1432 BIC = 179.3097 MASE = 0.2509081

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##        1        2        3        4        5        6        7        8 
##   2.4332  14.1989   5.1577  -5.7822 -11.2758  -5.1128  -1.9693  10.9014 
##        9       10       11       12       13       14       15       16 
##   5.9282  -2.1160 -18.1646  -6.9480   8.3747  14.1349   4.6945  -0.5910 
##       17       18       19 
##  -5.5146  -9.0023   0.6531 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   642.28     573.74   1.119   0.3138  
## x4.t          -22.88      26.12  -0.876   0.4212  
## x4.1          -26.23      31.50  -0.833   0.4429  
## x4.2           23.74      25.82   0.920   0.3999  
## x4.3           47.17      29.36   1.606   0.1691  
## x4.4          -22.93      15.21  -1.508   0.1920  
## x4.5           17.72      18.55   0.955   0.3835  
## x4.6          -72.21      21.69  -3.329   0.0208 *
## x4.7           34.69      14.48   2.395   0.0620 .
## x4.8           14.33      15.15   0.946   0.3875  
## x4.9           17.57      20.27   0.867   0.4258  
## x4.10          26.11      21.55   1.212   0.2798  
## x4.11         -55.47      21.49  -2.581   0.0494 *
## x4.12         -11.82      25.53  -0.463   0.6629  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.53 on 5 degrees of freedom
## Multiple R-squared:  0.8579, Adjusted R-squared:  0.4886 
## F-statistic: 2.323 on 13 and 5 DF,  p-value: 0.1806
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 165.1432 179.3097

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 48.86% (0.4886) of variability in FFD values. The p-value of the finite distributed lag model is 0.1806 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rad$model)
## W = 0.97957, p-value = 0.9373

The residuals in Figure 38 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9373 (>0.05) which fails to rejects the null hypothesis of normality.

##      x4.t      x4.1      x4.2      x4.3      x4.4      x4.5      x4.6      x4.7 
##  6.358009 11.309715  7.520321 10.003178  3.642591  5.340914  6.829794  3.039277 
##      x4.8      x4.9     x4.10     x4.11     x4.12 
##  3.027825  4.322888  4.306572  4.305906  5.502498

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Relative Humidity

## q = 1 AIC = 281.6547 BIC = 287.2595 MASE = 0.6748273 
## q = 2 AIC = 273.1762 BIC = 280.0127 MASE = 0.6669837 
## q = 3 AIC = 260.6544 BIC = 268.6476 MASE = 0.6124744 
## q = 4 AIC = 254.6378 BIC = 263.7087 MASE = 0.6133102 
## q = 5 AIC = 248.6573 BIC = 258.722 MASE = 0.624735 
## q = 6 AIC = 240.4575 BIC = 251.4274 MASE = 0.590676 
## q = 7 AIC = 234.5121 BIC = 246.2927 MASE = 0.5919628 
## q = 8 AIC = 227.6822 BIC = 240.1727 MASE = 0.5909197 
## q = 9 AIC = 219.2216 BIC = 232.3141 MASE = 0.6170883 
## q = 10 AIC = 210.9411 BIC = 224.5199 MASE = 0.5542451 
## q = 11 AIC = 196.8558 BIC = 210.7961 MASE = 0.497145 
## q = 12 AIC = 171.2149 BIC = 185.3814 MASE = 0.266223

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##        1        2        3        4        5        6        7        8 
##   6.4704  -7.5114   0.9400  -7.5200  -3.0373  -1.3563   0.3787  23.8934 
##        9       10       11       12       13       14       15       16 
##  -6.8482   6.1290  -7.7876   0.8417  -0.8325  13.5499  -3.7909   1.5788 
##       17       18       19 
##  16.7523 -14.0342 -17.8158 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 2226.2646  2797.5003   0.796   0.4622  
## x5.t         -10.0485    13.4394  -0.748   0.4883  
## x5.1         -42.4313    12.6950  -3.342   0.0205 *
## x5.2           3.8226     8.2746   0.462   0.6635  
## x5.3          -5.6566     8.3337  -0.679   0.5274  
## x5.4          -4.1856     8.2741  -0.506   0.6345  
## x5.5          -9.4392     8.3348  -1.133   0.3088  
## x5.6           8.3532     8.0866   1.033   0.3490  
## x5.7           1.4311     7.6097   0.188   0.8582  
## x5.8          -0.6562     7.8694  -0.083   0.9368  
## x5.9           1.3827     7.6967   0.180   0.8645  
## x5.10         -8.8407     7.0851  -1.248   0.2674  
## x5.11         20.8805     9.9947   2.089   0.0910 .
## x5.12         24.1486    10.3485   2.334   0.0669 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.39 on 5 degrees of freedom
## Multiple R-squared:  0.8044, Adjusted R-squared:  0.296 
## F-statistic: 1.582 on 13 and 5 DF,  p-value: 0.321
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 171.2149 185.3814

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 29.6% (0.296) of variability in FFD values. The p-value of the finite distributed lag model is 0.321 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_humid$model)
## W = 0.95588, p-value = 0.4943

The residuals in Figure 39 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.4943 (>0.05) which fails to rejects the null hypothesis of normality.

##     x5.t     x5.1     x5.2     x5.3     x5.4     x5.5     x5.6     x5.7 
## 4.889400 4.280613 2.057443 2.088074 1.913716 1.870104 1.829416 2.026883 
##     x5.8     x5.9    x5.10    x5.11    x5.12 
## 2.171511 2.057850 1.759417 3.480250 3.810613

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Fitting Finite Distributed Lag Model without Intercepts

Finite DLM Temperature (no Intercept)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.488 -11.978   2.279  10.274  22.195 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)
## x2.t  -11.9314    17.4692  -0.683    0.520
## x2.1    0.6082    18.1984   0.033    0.974
## x2.2    7.7678    21.5484   0.360    0.731
## x2.3   14.8910    20.1316   0.740    0.487
## x2.4   11.3869    21.1229   0.539    0.609
## x2.5   20.9052    21.0371   0.994    0.359
## x2.6  -13.5718    21.6023  -0.628    0.553
## x2.7    8.7907    22.2609   0.395    0.707
## x2.8   14.7124    20.4955   0.718    0.500
## x2.9  -22.6576    20.2895  -1.117    0.307
## x2.10  20.4306    19.6362   1.040    0.338
## x2.11   2.9584    18.4209   0.161    0.878
## x2.12 -32.1126    18.2361  -1.761    0.129
## 
## Residual standard error: 23.7 on 6 degrees of freedom
## Multiple R-squared:  0.9958, Adjusted R-squared:  0.9867 
## F-statistic: 109.5 on 13 and 6 DF,  p-value: 5.013e-06
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 180.3048 193.5269

The finite distributed lag model with q=12 has no significant term at 5% significance level. The adjusted R-squared score only explains 98.67% (0.9867) of variability in temperature values. The p-value of the finite distributed lag model is 5.013e-06 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_temp_noint$model)
## W = 0.93973, p-value = 0.2608

The residuals in Figure 40 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.2608 (>0.05) which fails to rejects the null hypothesis of normality.

##      x2.t      x2.1      x2.2      x2.3      x2.4      x2.5      x2.6      x2.7 
##  927.9078  994.5955 1381.9446 1208.7558 1328.6413 1316.7081 1396.2569 1489.3199 
##      x2.8      x2.9     x2.10     x2.11     x2.12 
## 1268.4649 1249.7225 1166.0083 1030.2099 1011.4169

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.

Finite DLM Rainfall (no Intercept)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -38.48 -10.09  -1.64  13.63  32.67 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)
## x3.t   16.3612    21.0059   0.779    0.466
## x3.1   -0.5328    23.0576  -0.023    0.982
## x3.2   17.7826    20.4537   0.869    0.418
## x3.3   23.8554    19.2595   1.239    0.262
## x3.4   -3.6371    22.7883  -0.160    0.878
## x3.5   30.8255    20.7933   1.482    0.189
## x3.6   15.4292    21.9817   0.702    0.509
## x3.7    7.3863    23.5528   0.314    0.764
## x3.8  -12.1466    21.1688  -0.574    0.587
## x3.9  -25.2130    26.8412  -0.939    0.384
## x3.10 -13.1541    20.6498  -0.637    0.548
## x3.11   1.2068    23.0108   0.052    0.960
## x3.12  31.8473    22.9846   1.386    0.215
## 
## Residual standard error: 29.68 on 6 degrees of freedom
## Multiple R-squared:  0.9934, Adjusted R-squared:  0.9792 
## F-statistic: 69.67 on 13 and 6 DF,  p-value: 1.915e-05
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 188.8569 202.0791

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 97.92% (0.9792) of variability in FFD values. The p-value of the finite distributed lag model is 1.915e-05 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rain_noint$model)
## W = 0.97658, p-value = 0.8957

The residuals in Figure 41 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.8957 (>0.05) which fails to rejects the null hypothesis of normality.

##     x3.t     x3.1     x3.2     x3.3     x3.4     x3.5     x3.6     x3.7 
## 50.30699 62.46667 47.94243 43.15138 60.10997 49.66828 56.38420 67.53181 
##     x3.8     x3.9    x3.10    x3.11    x3.12 
## 55.20205 90.55487 53.90969 66.49923 66.97290

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.

Finite DLM Radiation (no Intercept)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.7286  -7.2575   0.4361   6.3411  17.7339 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)  
## x4.t   -10.440     24.135  -0.433   0.6804  
## x4.1   -20.256     31.692  -0.639   0.5463  
## x4.2    26.721     26.215   1.019   0.3474  
## x4.3    35.185     27.914   1.260   0.2543  
## x4.4   -19.062     15.123  -1.260   0.2543  
## x4.5    20.210     18.802   1.075   0.3237  
## x4.6   -64.164     20.892  -3.071   0.0219 *
## x4.7    34.732     14.785   2.349   0.0571 .
## x4.8    17.398     15.207   1.144   0.2962  
## x4.9    10.527     19.676   0.535   0.6119  
## x4.10   31.167     21.510   1.449   0.1975  
## x4.11  -55.018     21.935  -2.508   0.0460 *
## x4.12    6.888     19.707   0.350   0.7386  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.87 on 6 degrees of freedom
## Multiple R-squared:  0.9979, Adjusted R-squared:  0.9933 
## F-statistic: 216.6 on 13 and 6 DF,  p-value: 6.582e-07
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 167.3926 180.6147

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 99.33% (0.9933) of variability in FFD values. The p-value of the finite distributed lag model is 6.582e-07 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rad_noint$model)
## W = 0.98551, p-value = 0.9869

The residuals in Figure 42 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9869 (>0.05) which fails to rejects the null hypothesis of normality.

##      x4.t      x4.1      x4.2      x4.3      x4.4      x4.5      x4.6      x4.7 
##  8382.247 14383.636  9847.985 11154.732  3260.759  5044.377  6207.043  3108.375 
##      x4.8      x4.9     x4.10     x4.11     x4.12 
##  3277.348  5456.692  6493.486  6754.328  5442.008

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.

Finite DLM Relative Humidity (no Intercept)

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.318  -6.743  -1.152   7.154  25.578 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)  
## x5.t    -5.846     11.975  -0.488   0.6427  
## x5.1   -39.198     11.654  -3.363   0.0152 *
## x5.2     5.165      7.849   0.658   0.5349  
## x5.3    -3.439      7.610  -0.452   0.6672  
## x5.4    -2.002      7.564  -0.265   0.8001  
## x5.5    -7.599      7.759  -0.979   0.3652  
## x5.6     8.795      7.817   1.125   0.3035  
## x5.7     3.596      6.886   0.522   0.6203  
## x5.8     1.720      7.055   0.244   0.8156  
## x5.9     4.714      6.258   0.753   0.4798  
## x5.10   -7.426      6.646  -1.117   0.3066  
## x5.11   20.133      9.642   2.088   0.0818 .
## x5.12   23.681     10.011   2.365   0.0559 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.79 on 6 degrees of freedom
## Multiple R-squared:  0.9974, Adjusted R-squared:  0.9916 
## F-statistic: 174.6 on 13 and 6 DF,  p-value: 1.253e-06
## 
## AIC and BIC values for the model:
##        AIC      BIC
## 1 171.4808 184.7029

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 99.16% (0.9916) of variability in FFD values. The p-value of the finite distributed lag model is 1.253e-06 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_humid_noint$model)
## W = 0.97026, p-value = 0.7816

The residuals in Figure 43 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.7816 (>0.05) which fails to rejects the null hypothesis of normality.

##     x5.t     x5.1     x5.2     x5.3     x5.4     x5.5     x5.6     x5.7 
## 69246.00 65603.71 29741.97 27957.85 27566.33 28917.65 29322.45 22798.57 
##     x5.8     x5.9    x5.10    x5.11    x5.12 
## 23929.63 18821.45 21218.72 44660.48 48118.90

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.

In conclusion, the finite distributed lag models with Intercepts are not good performing models due to their lack of significant terms at 5% significance level and explainability. The finite distributed lag models without Intercepts are not good performing models due to their lack of significant terms at 5% significance level.

Fitting Polynomial Distributed Lag Model

FFD and Temperature

## q = 1 AIC = 279.6699 BIC = 285.2747 MASE = 0.6644773 
## q = 2 AIC = 272.7512 BIC = 279.5877 MASE = 0.6788097 
## q = 3 AIC = 262.6923 BIC = 269.3533 MASE = 0.6467993 
## q = 4 AIC = 254.835 BIC = 261.3141 MASE = 0.6493527 
## q = 5 AIC = 243.9959 BIC = 250.2863 MASE = 0.6548472 
## q = 6 AIC = 234.0051 BIC = 240.0995 MASE = 0.6326293 
## q = 7 AIC = 225.7006 BIC = 231.5909 MASE = 0.6177101 
## q = 8 AIC = 218.7617 BIC = 224.4392 MASE = 0.6419454 
## q = 9 AIC = 208.9273 BIC = 214.3825 MASE = 0.6378389 
## q = 10 AIC = 201.1493 BIC = 206.3719 MASE = 0.6461061 
## q = 11 AIC = 187.5064 BIC = 192.485 MASE = 0.6140005 
## q = 12 AIC = 176.3678 BIC = 181.09 MASE = 0.5829889

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE (relative) values. This means that a value of q=12 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0    -8.200       7.32 -1.1200  0.3000
## beta.1    -3.690       6.43 -0.5740  0.5840
## beta.2    -0.118       6.18 -0.0191  0.9850
## beta.3     2.530       6.28  0.4020  0.6990
## beta.4     4.240       6.45  0.6570  0.5320
## beta.5     5.020       6.52  0.7700  0.4670
## beta.6     4.870       6.40  0.7610  0.4710
## beta.7     3.790       6.07  0.6240  0.5520
## beta.8     1.780       5.61  0.3170  0.7610
## beta.9    -1.160       5.22 -0.2230  0.8300
## beta.10   -5.040       5.26 -0.9570  0.3710
## beta.11   -9.840       6.12 -1.6100  0.1520
## beta.12  -15.600       7.91 -1.9700  0.0897

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -32.81 -15.40  -4.26  12.70  38.05 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 408.4707   590.1991   0.692   0.4995  
## z.t0         -8.1974     7.3248  -1.119   0.2807  
## z.t1          4.9707     2.5445   1.954   0.0697 .
## z.t2         -0.4655     0.2211  -2.105   0.0525 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.7 on 15 degrees of freedom
## Multiple R-squared:  0.2651, Adjusted R-squared:  0.1181 
## F-statistic: 1.804 on 3 and 15 DF,  p-value: 0.1897

The polynomial distributed lag model with q=12 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains 11.81% (0.1181) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.1897 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_temp_k2$model)
## W = 0.97181, p-value = 0.8119

Figure 44

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 11.463, df = 7, p-value = 0.1197

The residuals in Figure 44 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1197 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.8119 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0     -3.64       7.71  -0.472   0.651
## beta.1     -3.77       7.09  -0.532   0.611
## beta.2     -3.89       6.51  -0.598   0.569
## beta.3     -4.02       6.01  -0.669   0.525
## beta.4     -4.15       5.60  -0.741   0.483
## beta.5     -4.27       5.29  -0.808   0.446
## beta.6     -4.40       5.11  -0.861   0.418
## beta.7     -4.53       5.08  -0.891   0.402
## beta.8     -4.65       5.19  -0.897   0.400
## beta.9     -4.78       5.44  -0.880   0.408
## beta.10    -4.91       5.80  -0.846   0.425
## beta.11    -5.03       6.26  -0.804   0.448
## beta.12    -5.16       6.80  -0.759   0.473

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.468 -14.619   1.136  12.988  42.156 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 743.5714   626.3309   1.187    0.252
## z.t0         -3.6391     7.7115  -0.472    0.643
## z.t1         -0.1269     0.8619  -0.147    0.885
## 
## Residual standard error: 23.92 on 16 degrees of freedom
## Multiple R-squared:  0.04797,    Adjusted R-squared:  -0.07103 
## F-statistic: 0.4031 on 2 and 16 DF,  p-value: 0.6748

The polynomial distributed lag model with q=12 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.71% (-0.07103) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.6748 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_temp_k1$model)
## W = 0.9799, p-value = 0.9412

Figure 45

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 5.2932, df = 6, p-value = 0.5068

The residuals in Figure 45 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.5068 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9412 (>0.05) which fails to reject the null hypothesis of normality.

FFD and Rainfall

## q = 1 AIC = 281.9769 BIC = 287.5817 MASE = 0.6758928 
## q = 2 AIC = 273.8231 BIC = 280.6596 MASE = 0.6752737 
## q = 3 AIC = 263.1955 BIC = 269.8565 MASE = 0.633195 
## q = 4 AIC = 254.6528 BIC = 261.132 MASE = 0.6352745 
## q = 5 AIC = 247.2862 BIC = 253.5766 MASE = 0.6312437 
## q = 6 AIC = 238.2623 BIC = 244.3567 MASE = 0.6392046 
## q = 7 AIC = 230.1033 BIC = 235.9936 MASE = 0.6407467 
## q = 8 AIC = 221.8263 BIC = 227.5038 MASE = 0.653508 
## q = 9 AIC = 211.4687 BIC = 216.9239 MASE = 0.6498614 
## q = 10 AIC = 201.0167 BIC = 206.2393 MASE = 0.6248857 
## q = 11 AIC = 188.3055 BIC = 193.2841 MASE = 0.5934777 
## q = 12 AIC = 181.9729 BIC = 186.6951 MASE = 0.6685301

From the results above, a lag length of 11 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0    -2.950       9.66 -0.3050   0.767
## beta.1    -0.302       6.79 -0.0445   0.965
## beta.2     1.600       5.30  0.3010   0.770
## beta.3     2.750       5.04  0.5450   0.599
## beta.4     3.150       5.26  0.6000   0.564
## beta.5     2.810       5.34  0.5260   0.611
## beta.6     1.720       5.05  0.3420   0.740
## beta.7    -0.111       4.45 -0.0249   0.981
## beta.8    -2.690       4.09 -0.6590   0.526
## beta.9    -6.020       5.03 -1.2000   0.261
## beta.10  -10.100       7.64 -1.3200   0.219
## beta.11  -14.900      11.50 -1.2900   0.228

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.660 -11.415  -1.195  18.459  37.888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 264.5629    99.3182   2.664    0.017 *
## z.t0         -2.9484     9.6630  -0.305    0.764  
## z.t1          3.0195     4.2630   0.708    0.489  
## z.t2         -0.3735     0.4013  -0.931    0.366  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.34 on 16 degrees of freedom
## Multiple R-squared:  0.0993, Adjusted R-squared:  -0.06958 
## F-statistic: 0.588 on 3 and 16 DF,  p-value: 0.6317

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.69% (-0.06958) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.6317 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_rain_k2$model)
## W = 0.96682, p-value = 0.6867

Figure 46

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 8.9029, df = 7, p-value = 0.2597

The residuals in Figure 46 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.6867 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.2597 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0    3.2500       6.97  0.4670   0.652
## beta.1    2.4200       6.10  0.3970   0.700
## beta.2    1.5900       5.28  0.3010   0.770
## beta.3    0.7610       4.55  0.1670   0.871
## beta.4   -0.0695       3.94 -0.0176   0.986
## beta.5   -0.9000       3.54 -0.2540   0.805
## beta.6   -1.7300       3.41 -0.5080   0.624
## beta.7   -2.5600       3.57 -0.7170   0.491
## beta.8   -3.3900       4.00 -0.8480   0.418
## beta.9   -4.2200       4.62 -0.9140   0.385
## beta.10  -5.0500       5.36 -0.9420   0.371
## beta.11  -5.8800       6.19 -0.9510   0.367

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.512 -11.406  -4.063  18.739  44.638 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 241.9881    95.9293   2.523   0.0219 *
## z.t0          3.2531     6.9696   0.467   0.6466  
## z.t1         -0.8307     1.0226  -0.812   0.4278  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.25 on 17 degrees of freedom
## Multiple R-squared:  0.05055,    Adjusted R-squared:  -0.06115 
## F-statistic: 0.4526 on 2 and 17 DF,  p-value: 0.6434

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.61% (-0.06115) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.6434 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_rain_k1$model)
## W = 0.97334, p-value = 0.8232

Figure 47

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 5.7242, df = 6, p-value = 0.4548

The residuals in Figure 47 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.4548 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.8232 (>0.05) which fails to reject the null hypothesis of normality.

FFD and Radiation

## q = 1 AIC = 281.571 BIC = 287.1758 MASE = 0.6579604 
## q = 2 AIC = 273.2458 BIC = 280.0823 MASE = 0.6672639 
## q = 3 AIC = 263.5785 BIC = 270.2395 MASE = 0.622861 
## q = 4 AIC = 254.5356 BIC = 261.0148 MASE = 0.6027035 
## q = 5 AIC = 246.071 BIC = 252.3614 MASE = 0.6504138 
## q = 6 AIC = 239.3461 BIC = 245.4405 MASE = 0.6488056 
## q = 7 AIC = 230.74 BIC = 236.6303 MASE = 0.6443999 
## q = 8 AIC = 222.2164 BIC = 227.8938 MASE = 0.6629663 
## q = 9 AIC = 212.7091 BIC = 218.1644 MASE = 0.6697067 
## q = 10 AIC = 203.4282 BIC = 208.6509 MASE = 0.6463697 
## q = 11 AIC = 189.6827 BIC = 194.6614 MASE = 0.6384276 
## q = 12 AIC = 181.6977 BIC = 186.4199 MASE = 0.6703218

From the results above, a lag length of 11 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0    -1.130       7.98 -0.1420   0.890
## beta.1     0.198       5.29  0.0374   0.971
## beta.2     1.240       3.60  0.3430   0.739
## beta.3     1.980       3.07  0.6470   0.534
## beta.4     2.440       3.23  0.7540   0.470
## beta.5     2.600       3.43  0.7580   0.468
## beta.6     2.470       3.39  0.7290   0.485
## beta.7     2.050       3.18  0.6440   0.536
## beta.8     1.330       3.26  0.4080   0.693
## beta.9     0.325       4.31  0.0755   0.941
## beta.10   -0.975       6.46 -0.1510   0.883
## beta.11   -2.570       9.51 -0.2700   0.793

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.465 -12.733  -5.073  15.051  46.539 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  58.2976   521.6177   0.112    0.912
## z.t0         -1.1343     7.9843  -0.142    0.889
## z.t1          1.4782     3.4488   0.429    0.674
## z.t2         -0.1462     0.3196  -0.458    0.653
## 
## Residual standard error: 24.16 on 16 degrees of freedom
## Multiple R-squared:  0.03509,    Adjusted R-squared:  -0.1458 
## F-statistic: 0.1939 on 3 and 16 DF,  p-value: 0.899

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.14% (-0.1458) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.899 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_rad_k2$model)
## W = 0.98619, p-value = 0.9879

Figure 48

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 6.2658, df = 7, p-value = 0.5091

The residuals in Figure 48 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.5091 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9879 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0      1.88       4.40   0.428   0.679
## beta.1      1.82       3.84   0.472   0.648
## beta.2      1.75       3.34   0.524   0.613
## beta.3      1.69       2.93   0.576   0.579
## beta.4      1.62       2.63   0.616   0.553
## beta.5      1.56       2.50   0.622   0.549
## beta.6      1.49       2.56   0.581   0.575
## beta.7      1.43       2.81   0.508   0.624
## beta.8      1.36       3.19   0.427   0.680
## beta.9      1.29       3.66   0.354   0.732
## beta.10     1.23       4.20   0.293   0.776
## beta.11     1.16       4.78   0.244   0.813

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.881 -13.395  -3.902  16.646  43.284 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62.69652  439.02324  -0.143    0.888
## z.t0          1.88137    4.39971   0.428    0.674
## z.t1         -0.06515    0.69997  -0.093    0.927
## 
## Residual standard error: 23.59 on 17 degrees of freedom
## Multiple R-squared:  0.02246,    Adjusted R-squared:  -0.09254 
## F-statistic: 0.1953 on 2 and 17 DF,  p-value: 0.8244

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.09% (-0.09254) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.8244 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_rad_k1$model)
## W = 0.9905, p-value = 0.9987

Figure 49

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 5.7074, df = 6, p-value = 0.4568

The residuals in Figure 49 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.4568 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9987 (>0.05) which fails to reject the null hypothesis of normality.

FFD and Humidity

## q = 1 AIC = 281.6547 BIC = 287.2595 MASE = 0.6748273 
## q = 2 AIC = 273.1762 BIC = 280.0127 MASE = 0.6669837 
## q = 3 AIC = 258.9412 BIC = 265.6022 MASE = 0.6054505 
## q = 4 AIC = 252.2204 BIC = 258.6996 MASE = 0.6198356 
## q = 5 AIC = 244.2742 BIC = 250.5647 MASE = 0.6376046 
## q = 6 AIC = 235.0038 BIC = 241.0982 MASE = 0.6305211 
## q = 7 AIC = 227.0295 BIC = 232.9198 MASE = 0.6183031 
## q = 8 AIC = 218.1006 BIC = 223.7781 MASE = 0.600106 
## q = 9 AIC = 206.7468 BIC = 212.202 MASE = 0.6166688 
## q = 10 AIC = 200.4011 BIC = 205.6237 MASE = 0.652098 
## q = 11 AIC = 187.3064 BIC = 192.2851 MASE = 0.6058448 
## q = 12 AIC = 175.329 BIC = 180.0512 MASE = 0.5564647

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE (relative) values. This means that a value of q=12 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0   -9.6000       4.57 -2.1000  0.0736
## beta.1   -7.9100       3.75 -2.1100  0.0729
## beta.2   -6.2400       3.25 -1.9200  0.0958
## beta.3   -4.6200       3.01 -1.5300  0.1690
## beta.4   -3.0300       2.96 -1.0200  0.3410
## beta.5   -1.4700       2.98 -0.4940  0.6360
## beta.6    0.0495       2.99  0.0165  0.9870
## beta.7    1.5300       2.98  0.5150  0.6230
## beta.8    2.9800       2.96  1.0100  0.3470
## beta.9    4.4000       2.99  1.4700  0.1850
## beta.10   5.7700       3.18  1.8200  0.1120
## beta.11   7.1200       3.61  1.9700  0.0896
## beta.12   8.4200       4.36  1.9300  0.0949

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.429 -12.764  -4.294  13.443  37.525 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  456.43947 2749.97230   0.166   0.8704  
## z.t0          -9.60404    4.56730  -2.103   0.0528 .
## z.t1           1.71555    1.41059   1.216   0.2427  
## z.t2          -0.01777    0.10502  -0.169   0.8679  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.12 on 15 degrees of freedom
## Multiple R-squared:  0.3042, Adjusted R-squared:  0.1651 
## F-statistic: 2.186 on 3 and 15 DF,  p-value: 0.1321

The polynomial distributed lag model with q=12 has none of its scores at 5% level of significance. The adjusted R-squared score only explains 16.51% (0.1651) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.1321 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_humid_k2$model)
## W = 0.96926, p-value = 0.7616

Figure 50

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 12.031, df = 7, p-value = 0.09954

The residuals in Figure 50 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.7616 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.09954 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0    -9.290       4.05  -2.290  0.0555
## beta.1    -7.790       3.58  -2.180  0.0657
## beta.2    -6.300       3.13  -2.010  0.0843
## beta.3    -4.800       2.73  -1.750  0.1230
## beta.4    -3.300       2.41  -1.370  0.2130
## beta.5    -1.800       2.18  -0.826  0.4360
## beta.6    -0.302       2.09  -0.145  0.8890
## beta.7     1.200       2.15   0.557  0.5950
## beta.8     2.700       2.35   1.150  0.2890
## beta.9     4.190       2.66   1.580  0.1580
## beta.10    5.690       3.04   1.870  0.1030
## beta.11    7.190       3.48   2.070  0.0774
## beta.12    8.690       3.95   2.200  0.0636

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33.80 -12.41  -3.61  12.99  37.05 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  582.2831  2565.8952   0.227   0.8234  
## z.t0          -9.2926     4.0512  -2.294   0.0357 *
## z.t1           1.4985     0.5685   2.636   0.0180 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.46 on 16 degrees of freedom
## Multiple R-squared:  0.3029, Adjusted R-squared:  0.2158 
## F-statistic: 3.476 on 2 and 16 DF,  p-value: 0.05577

The polynomial distributed lag model with q=12 has all of its scores at 5% level of significance. The adjusted R-squared score only explains 21.58% (0.2158) of variability in FFD values. The p-value of the polynomial distributed lag model is 0.05577 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_ffd_humid_k1$model)
## W = 0.97238, p-value = 0.8228

Figure 51

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 9.1529, df = 6, p-value = 0.1652

The residuals in Figure 51 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.8228 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.1652 (>0.05) which fails to reject the null hypothesis of normality.

In conclusion, the polynomial distributed lag models were not successful in capturing the autocorrelation and seasonality in the series, nor are they excellent performing models due to its lack of explainability.

Fitting Koyck Distributed Lag Model

FFD and Temperature

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.755 -12.972  -4.079  17.329  58.541 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.87216  608.22407  -0.033    0.974
## Y.1           0.03846    0.25384   0.152    0.881
## X.t          23.22042   61.08917   0.380    0.707
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value
## Weak instruments   1  27     1.429   0.242
## Wu-Hausman         1  26     0.582   0.452
## Sargan             0  NA        NA      NA
## 
## Residual standard error: 28.38 on 27 degrees of freedom
## Multiple R-Squared: -0.3272, Adjusted R-squared: -0.4255 
## Wald test: 0.07269 on 2 and 27 DF,  p-value: 0.9301

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains -0.42% (-0.4255) of variability in FFD values. The p-value of the Koyck model is 0.9301 (>0.05) hence, it is not statistically significant. The “Weak Instruments” test resulted in a p-value of 0.242 which means that the model at the first stage of least squares fitting is not significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.452 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_ffd_temp_dlm$model)
## W = 0.98351, p-value = 0.9093

Figure 52

The residuals in Figure 52 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9093 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.280988 1.280988

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

FFD and Rainfall

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.755 -12.972  -4.079  17.329  58.541 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.87216  608.22407  -0.033    0.974
## Y.1           0.03846    0.25384   0.152    0.881
## X.t          23.22042   61.08917   0.380    0.707
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value
## Weak instruments   1  27     1.429   0.242
## Wu-Hausman         1  26     0.582   0.452
## Sargan             0  NA        NA      NA
## 
## Residual standard error: 28.38 on 27 degrees of freedom
## Multiple R-Squared: -0.3272, Adjusted R-squared: -0.4255 
## Wald test: 0.07269 on 2 and 27 DF,  p-value: 0.9301

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_ffd_rain_dlm$model)
## W = 0.98582, p-value = 0.9504

Figure 53

The residuals in Figure 53 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9504 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.017508 1.017508

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

FFD and Radiation

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.229 -19.662   3.956  16.232  54.756 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 418.31843  384.12678   1.089    0.286
## Y.1          -0.03254    0.20729  -0.157    0.876
## X.t         -13.87153   25.49529  -0.544    0.591
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   1  27     7.199  0.0123 *
## Wu-Hausman         1  26     0.538  0.4698  
## Sargan             0  NA        NA      NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.54 on 27 degrees of freedom
## Multiple R-Squared: -0.07436,    Adjusted R-squared: -0.1539 
## Wald test: 0.1486 on 2 and 27 DF,  p-value: 0.8626

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains -0.15% (-0.1539) of variability in FFD values. The p-value of the Koyck model is 0.8626 (>0.05) hence, it is not statistically significant. The “Weak Instruments” test resulted in a p-value of 0.0123 which means that the model at the first stage of least squares fitting is significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.4698 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_ffd_rad_dlm$model)
## W = 0.98748, p-value = 0.9716

Figure 54

The residuals in Figure 54 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9716 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.055257 1.055257

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

FFD and Humidity

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.787 -14.896  -3.024  15.673  55.019 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1898.18932 4211.29973   0.451    0.656
## Y.1           -0.05904    0.25016  -0.236    0.815
## X.t          -17.72952   44.24103  -0.401    0.692
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value
## Weak instruments   1  27     0.570   0.457
## Wu-Hausman         1  26     0.121   0.730
## Sargan             0  NA        NA      NA
## 
## Residual standard error: 27 on 27 degrees of freedom
## Multiple R-Squared: -0.2016, Adjusted R-squared: -0.2906 
## Wald test: 0.0808 on 2 and 27 DF,  p-value: 0.9226

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains -0.29% (-0.2906) of variability in FFD values. The p-value of the Koyck model is 0.9226 (>0.05) hence, it is not statistically significant. The “Weak Instruments” test resulted in a p-value of 0.457 which means that the model at the first stage of least squares fitting is not significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.730 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_ffd_humid_dlm$model)
## W = 0.98176, p-value = 0.8702

Figure 55

The residuals in Figure 55 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.8702 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.374145 1.374145

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.

Fitting Autoregressive Distributed Lag Model

## p = 1 q = 1 AIC = 285.0202 BIC = 294.8286 MASE = 0.6413659 
## p = 1 q = 2 AIC = 278.1612 BIC = 289.0995 MASE = 0.6569318 
## p = 1 q = 3 AIC = 271.2682 BIC = 283.2581 MASE = 0.6415733 
## p = 1 q = 4 AIC = 264.4753 BIC = 277.4336 MASE = 0.6252734 
## p = 1 q = 5 AIC = 257.6021 BIC = 271.4411 MASE = 0.6025462 
## p = 2 q = 1 AIC = 279.4094 BIC = 291.7151 MASE = 0.6670564 
## p = 2 q = 2 AIC = 281.2235 BIC = 294.8965 MASE = 0.6629674 
## p = 2 q = 3 AIC = 274.0829 BIC = 288.7372 MASE = 0.6480681 
## p = 2 q = 4 AIC = 267.1828 BIC = 282.7328 MASE = 0.6160513 
## p = 2 q = 5 AIC = 260.6971 BIC = 277.0523 MASE = 0.613472 
## p = 3 q = 1 AIC = 271.5688 BIC = 286.2231 MASE = 0.6153833 
## p = 3 q = 2 AIC = 273.4313 BIC = 289.4178 MASE = 0.6159004 
## p = 3 q = 3 AIC = 275.3477 BIC = 292.6664 MASE = 0.6150322 
## p = 3 q = 4 AIC = 268.0854 BIC = 286.2271 MASE = 0.5856246 
## p = 3 q = 5 AIC = 261.9042 BIC = 280.7757 MASE = 0.6007602 
## p = 4 q = 1 AIC = 265.3386 BIC = 282.1844 MASE = 0.5859466 
## p = 4 q = 2 AIC = 266.1178 BIC = 284.2595 MASE = 0.5711558 
## p = 4 q = 3 AIC = 268.0671 BIC = 287.5046 MASE = 0.5705796 
## p = 4 q = 4 AIC = 269.0287 BIC = 289.762 MASE = 0.567119 
## p = 4 q = 5 AIC = 262.5279 BIC = 283.9155 MASE = 0.5766056 
## p = 5 q = 1 AIC = 247.989 BIC = 266.8604 MASE = 0.4786405 
## p = 5 q = 2 AIC = 249.9562 BIC = 270.0858 MASE = 0.4789554 
## p = 5 q = 3 AIC = 249.5842 BIC = 270.9719 MASE = 0.4447671 
## p = 5 q = 4 AIC = 244.8735 BIC = 267.5192 MASE = 0.3751994 
## p = 5 q = 5 AIC = 245.1251 BIC = 269.029 MASE = 0.3647243

From the autoregressive distributed lag model above, it can be observed that the model with the lowest AIC value is p=5, q=4, the lowest BIC value is p=5, q=1, and the lowest MASE is p=5, q=5. However, as we are after the lowest MASE values, the models that will be selected are ARDL(5,3), ARDL(5,4), and ARDL(5,5)

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.476 -17.398  -1.093  17.228  41.667 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2283.90942 2412.18598   0.947    0.358
## X.t           -3.16791    6.85976  -0.462    0.650
## X.1           -3.59211    6.53554  -0.550    0.590
## X.2           -4.37282    7.09358  -0.616    0.546
## X.3           -5.85138    7.06229  -0.829    0.420
## X.4           -5.63846    6.76624  -0.833    0.417
## X.5            1.36698    7.73994   0.177    0.862
## Y.1           -0.06381    0.25594  -0.249    0.806
## Y.2            0.04109    0.26854   0.153    0.880
## Y.3           -0.04817    0.27124  -0.178    0.861
## 
## Residual standard error: 28.68 on 16 degrees of freedom
## Multiple R-squared:  0.1273, Adjusted R-squared:  -0.3635 
## F-statistic: 0.2594 on 9 and 16 DF,  p-value: 0.9772

The autoregressive distributed lag model ardl_5.3_task2, has p=5 and q=3. All of its coefficients are not significant at 5% level of significance. The adjusted R-squared model explains -36.35% (-0.3635) of the variability in mortality rates. The p value of ardl_5.3_task2 is 0.9772 (>0.05) hence, it is not significant.

Figure 56

## 
##  Breusch-Godfrey test for serial correlation of order up to 13
## 
## data:  Residuals
## LM test = 23.706, df = 13, p-value = 0.03394

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.3_task2$model)
## W = 0.96419, p-value = 0.4806

The residuals in Figure 56 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.03394 (<0.05) which confirms the presence of serial correlation. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.4806 (>0.05) which fails to reject the null hypothesis of normality.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.593682  1.442035  1.679153  1.657011  1.377161  1.526191  1.171215  1.185184 
## L(y.t, 3) 
##  1.209149

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.579 -16.818  -1.707  15.937  41.771 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2132.74389 2520.76308   0.846    0.411
## X.t           -2.43787    7.37352  -0.331    0.746
## X.1           -3.84154    6.76325  -0.568    0.578
## X.2           -4.71055    7.36447  -0.640    0.532
## X.3           -5.41486    7.37709  -0.734    0.474
## X.4           -5.41607    6.99145  -0.775    0.451
## X.5            1.92651    8.12940   0.237    0.816
## Y.1           -0.05323    0.26512  -0.201    0.844
## Y.2            0.02584    0.27985   0.092    0.928
## Y.3           -0.04807    0.27905  -0.172    0.866
## Y.4            0.09635    0.28192   0.342    0.737
## 
## Residual standard error: 29.51 on 15 degrees of freedom
## Multiple R-squared:  0.1341, Adjusted R-squared:  -0.4432 
## F-statistic: 0.2323 on 10 and 15 DF,  p-value: 0.9877

The autoregressive distributed lag model ardl_5.4_task2, has p=5 and q=3. All of its coefficients are not significant at 5% level of significance. The adjusted R-squared model explains -44.32% (-0.4432) of the variability in mortality rates. The p value of ardl_5.4_task2 is 0.9877 (>0.05) hence, it is not significant.

Figure 57

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 22.656, df = 14, p-value = 0.0661

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.4_task2$model)
## W = 0.9653, p-value = 0.5063

The residuals in Figure 57 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.0661 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.5063 (>0.05) which fails to reject the null hypothesis of normality.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.739695  1.459027  1.709942  1.708221  1.389196  1.590710  1.187388  1.216093 
## L(y.t, 3) L(y.t, 4) 
##  1.209150  1.286556

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.910 -11.825   0.646  19.287  36.583 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2317.41502 2503.84586   0.926    0.370
## X.t           -4.24646    7.48261  -0.568    0.579
## X.1           -1.91941    6.91722  -0.277    0.785
## X.2           -6.46056    7.46285  -0.866    0.401
## X.3           -7.07101    7.45821  -0.948    0.359
## X.4           -4.91623    6.94382  -0.708    0.491
## X.5            2.26760    8.06321   0.281    0.783
## Y.1           -0.09393    0.26525  -0.354    0.729
## Y.2            0.05404    0.27851   0.194    0.849
## Y.3           -0.08503    0.27852  -0.305    0.765
## Y.4            0.07639    0.27999   0.273    0.789
## Y.5            0.32209    0.28594   1.126    0.279
## 
## Residual standard error: 29.25 on 14 degrees of freedom
## Multiple R-squared:  0.206,  Adjusted R-squared:  -0.4178 
## F-statistic: 0.3303 on 11 and 14 DF,  p-value: 0.9641

The autoregressive distributed lag model ardl_5.5_task2, has p=5 and q=5. All of its coefficients are not are not significant at 5% level of significance. The adjusted R-squared model explains -41.78% (-0.4178) of the variability in mortality rates. The p value of ardl_5.5_task2 is 0.9641 (>0.05) hence, it is not significant.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  1.823662  1.553569  1.787404  1.777290  1.394892  1.592956  1.209830  1.225999 
## L(y.t, 3) L(y.t, 4) L(y.t, 5) 
##  1.226165  1.291730  1.314591

In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.

Time Series Regression Models Table Summary

Accuracy Score Table
Model	AIC	BIC	MASE
Finite DLM /w Intercept (Radiation)	165.1432	179.3097	0.2509081
Finite DLM /w Intercept (Humidity)	171.2149	185.3814	0.2662230
Finite DLM /wo Intercept (Radiation)	167.3926	180.6147	0.2763708
Finite DLM /wo Intercept (Humidity)	171.4808	184.7029	0.2941178
Finite DLM /w Intercept (Rainfall)	185.1533	199.0936	0.3819204
Finite DLM /w Intercept (Temperature)	181.9935	196.1600	0.4122077
Finite DLM /wo Intercept (Temperature)	180.3048	193.5269	0.4278512
Finite DLM /wo Intercept (Rainfall)	188.8569	202.0791	0.4767021
Poly DLM k=1 (Humidity)	173.3652	177.1430	0.5550638
Poly DLM k=2 (Humidity)	175.3290	180.0512	0.5564647
ARDL(5,5)	259.2288	275.5840	0.5723147
Poly DLM k=2 (Temperature)	176.3678	181.0900	0.5829889
Poly DLM k=2 (Rainfall)	188.3055	193.2841	0.5934777
ARDL(5,4)	259.4845	274.5816	0.6191304
ARDL(5,3)	257.6861	271.5252	0.6269468
Poly DLM k=1 (Rainfall)	187.3596	191.3425	0.6290417
Poly DLM k=2 (Radiation)	189.6827	194.6614	0.6384276
Poly DLM k=1 (Radiation)	187.9427	191.9256	0.6397641
Poly DLM k=1 (Temperature)	179.2866	183.0644	0.6617769
Koyck (Radiation)	284.3792	289.9840	0.7136301
Koyck (Rainfall)	288.9325	294.5373	0.7267872
Koyck (Humidity)	287.7361	293.3409	0.7335956
Koyck (Temperature)	290.7199	296.3247	0.7927338

The table above depicts the accuracy measures (AIC, BIC, and MASE) of the fitted models thus far. It can be seen that Finite DLM /w Intercept (Radiation) has the lowest MASE value.

Fitting Exponential Smoothing Models and State-Space Models

Was unable to fit due to low frequency of the data.

Forecasting

In conclusion, based on all the previous observations made - the model that should be used to forecast the FFD 4 years ahead should be the Finite DLM /w Intercept (Radiation) model based on the results of its residual analysis and its low MASE value.

Task 2 Conclusion

To conclude this exploration, the frequency was too low so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the Finite DLM /w Intercept (Radiation) was left as the best model to use for forecasting.

Task 3

Objectives

Task 3a : Forecast 3 years ahead using univariate regressors, forecasting for each of the best models within the dataset.

Task 3b: Analysis to obtain the 3 year ahead forecast based on the drought period for Australia that occured from 1996 to 2009.

Methodology

First, a time series will be created for all series within the dataset. They will then be examined for stationarity and subsequently decomposed. Once the exploration is complete, the model will be fitted with a DLM, ARDL, Koyck, and DYNLM to find the best 3 years ahead forecast based on their residual analysis and MASE value. The best model will be selected based on the model’s ability to capture autocorrelation and seasonality components, and also low MASE values.

Task 3a

Data Exploration and Visualization

## Rows: 31 Columns: 6-- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): Year, RBO, Temperature, Rainfall, Radiation, RelHumidity
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Figure 58

Figure 58 above depicts the time series plots for the Rank-based Order Similarity. This dataset shares the same factors of Temperature, Rainfall, Radiation, and Relative Humidity as previously explored in Task 2 - hence they will not be repeated for redundancy purposes. The one time series that was not explored is the RBO time series.

RBO Time Series Analysis

Figure 59

Figure 59 shows the changes of Rank-based Order similarity over the years. The graph shows no apparent trend. There seems to be no seasonality in the graph. The graph appears to have some changing variance. The graph mostly reflects a Moving Average (MA) with certain instances of Auto-regressive points. The graph has an intervention point after 1995 where there was a large drop that changed the variance.

RBO ADF & ACF

## 
##  Augmented Dickey-Fuller Test
## 
## data:  rbo_rbo_ts
## Dickey-Fuller = -2.0545, Lag order = 3, p-value = 0.5518
## alternative hypothesis: stationary

Figure 60

Figure 60 shows the ACF plot for the RBO series. The ACF plot suggests an existence of a trend and seasonal patterns. The ADF test resulted a p-value of 0.5518 (>0.05) hence, we fail to reject the null-hypothesis of non-stationarity. Based on these results, it can be concluded that the FFD time series is not a stationary time series that contains seasonality.

Time Series Regression Model Fitting

Correlation Coefficient

##                   Year        RBO  Temperature   Rainfall   Radiation
## Year         1.0000000 -0.5635171  0.148410676 -0.1752091  0.11881829
## RBO         -0.5635171  1.0000000  0.261000659  0.3932282 -0.31736018
## Temperature  0.1484107  0.2610007  1.000000000  0.3933255 -0.24096625
## Rainfall    -0.1752091  0.3932282  0.393325545  1.0000000 -0.58131610
## Radiation    0.1188183 -0.3173602 -0.240966245 -0.5813161  1.00000000
## RelHumidity  0.2063558 -0.1776349  0.009646021  0.3384610 -0.05520965
##              RelHumidity
## Year         0.206355767
## RBO         -0.177634864
## Temperature  0.009646021
## Rainfall     0.338461007
## Radiation   -0.055209652
## RelHumidity  1.000000000

From the results of the Correlation Coefficient above, the highest correlation that can be observed is between RBO and Rainfall at 0.3932282. However, due to the low correlation between most of the variables, each variable will be used as independent variables with RBO being the dependent variable.

Fitting Finite Distributed Lag Model with Intercepts

Finite DLM Temperature

## q = 1 AIC = -101.8617 BIC = -96.2569 MASE = 0.9239038 
## q = 2 AIC = -95.49894 BIC = -88.66246 MASE = 1.032564 
## q = 3 AIC = -96.76727 BIC = -88.77404 MASE = 1.033663 
## q = 4 AIC = -92.75653 BIC = -83.68567 MASE = 1.005373 
## q = 5 AIC = -91.46337 BIC = -81.3986 MASE = 0.8594175 
## q = 6 AIC = -85.74127 BIC = -74.77139 MASE = 0.8103361 
## q = 7 AIC = -82.04015 BIC = -70.25962 MASE = 0.7518958 
## q = 8 AIC = -83.28717 BIC = -70.79674 MASE = 0.6497633 
## q = 9 AIC = -88.10651 BIC = -75.014 MASE = 0.5120906 
## q = 10 AIC = -87.92965 BIC = -74.35085 MASE = 0.458588 
## q = 11 AIC = -80.42149 BIC = -66.48124 MASE = 0.4388714 
## q = 12 AIC = -87.10828 BIC = -72.9417 MASE = 0.3522387

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##          1          2          3          4          5          6          7 
## -0.0184315  0.0196039  0.0067008 -0.0005057 -0.0121655  0.0037651 -0.0114122 
##          8          9         10         11         12         13         14 
##  0.0150263 -0.0088538 -0.0034608 -0.0043610  0.0061125  0.0043009 -0.0160637 
##         15         16         17         18         19 
## -0.0011766  0.0047237  0.0189244  0.0097069 -0.0124340 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.341e+00  6.334e-01   2.117   0.0879 .
## x2.t         1.785e-02  1.599e-02   1.116   0.3152  
## x2.1         4.221e-02  1.838e-02   2.296   0.0701 .
## x2.2        -4.280e-02  2.198e-02  -1.948   0.1090  
## x2.3        -3.798e-02  1.866e-02  -2.035   0.0975 .
## x2.4         1.831e-02  1.962e-02   0.933   0.3935  
## x2.5        -5.120e-03  2.113e-02  -0.242   0.8181  
## x2.6        -6.533e-03  2.179e-02  -0.300   0.7764  
## x2.7        -8.192e-03  2.094e-02  -0.391   0.7118  
## x2.8         5.276e-05  1.896e-02   0.003   0.9979  
## x2.9        -3.263e-02  1.869e-02  -1.745   0.1414  
## x2.10       -8.331e-03  1.891e-02  -0.441   0.6779  
## x2.11        1.898e-03  1.738e-02   0.109   0.9173  
## x2.12       -5.800e-03  1.687e-02  -0.344   0.7449  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02164 on 5 degrees of freedom
## Multiple R-squared:  0.7848, Adjusted R-squared:  0.2252 
## F-statistic: 1.402 on 13 and 5 DF,  p-value: 0.3754
## 
## AIC and BIC values for the model:
##         AIC      BIC
## 1 -87.10828 -72.9417

The finite distributed lag model with q=12 has no significant term at 5% significance level. The adjusted R-squared score explains 22.52% (0.2252) of variability in FFD values. The p-value of the finite distributed lag model is 0.3754 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_temp_rbo$model)
## W = 0.96272, p-value = 0.6271

The residuals in Figure 36 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.1253 (>0.05) which fails to rejects the null hypothesis of normality.

##     x2.t     x2.1     x2.2     x2.3     x2.4     x2.5     x2.6     x2.7 
## 1.682976 1.813251 2.088240 1.634591 1.721548 1.946444 2.381222 2.070046 
##     x2.8     x2.9    x2.10    x2.11    x2.12 
## 2.048046 1.625311 1.667104 1.418715 1.300454

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Rainfall

## q = 1 AIC = -100.898 BIC = -95.29319 MASE = 0.9417954 
## q = 2 AIC = -96.70956 BIC = -89.87308 MASE = 0.9993747 
## q = 3 AIC = -97.19966 BIC = -89.20643 MASE = 0.9796852 
## q = 4 AIC = -90.46187 BIC = -81.39101 MASE = 1.038827 
## q = 5 AIC = -87.24242 BIC = -77.17765 MASE = 0.925677 
## q = 6 AIC = -82.31788 BIC = -71.348 MASE = 0.8543964 
## q = 7 AIC = -77.98405 BIC = -66.20351 MASE = 0.829337 
## q = 8 AIC = -76.81922 BIC = -64.32879 MASE = 0.7233794 
## q = 9 AIC = -80.79432 BIC = -67.70181 MASE = 0.6205897 
## q = 10 AIC = -76.93255 BIC = -63.35376 MASE = 0.585617 
## q = 11 AIC = -77.84858 BIC = -63.90833 MASE = 0.4621806 
## q = 12 AIC = -79.63373 BIC = -65.46714 MASE = 0.4240236

From the results above, a lag length of 12 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##         1         2         3         4         5         6         7         8 
## -0.028368 -0.006823  0.012033  0.018530  0.006442  0.014196 -0.018606 -0.010792 
##         9        10        11        12        13        14        15        16 
## -0.002000  0.010196 -0.001337  0.007104 -0.001875 -0.009690 -0.019581  0.018543 
##        17        18        19 
## -0.001563 -0.006339  0.019931 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.8035567  0.1444702   5.562  0.00258 **
## x3.t         0.0190632  0.0234786   0.812  0.45373   
## x3.1        -0.0055430  0.0206666  -0.268  0.79925   
## x3.2         0.0145805  0.0230040   0.634  0.55405   
## x3.3        -0.0222519  0.0170995  -1.301  0.24988   
## x3.4        -0.0025011  0.0239167  -0.105  0.92078   
## x3.5        -0.0100115  0.0184670  -0.542  0.61101   
## x3.6        -0.0002192  0.0216238  -0.010  0.99230   
## x3.7        -0.0161457  0.0216731  -0.745  0.48981   
## x3.8         0.0138944  0.0192116   0.723  0.50195   
## x3.9        -0.0366400  0.0243706  -1.503  0.19305   
## x3.10        0.0124757  0.0212846   0.586  0.58324   
## x3.11       -0.0314699  0.0204542  -1.539  0.18453   
## x3.12        0.0253655  0.0231360   1.096  0.32289   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02634 on 5 degrees of freedom
## Multiple R-squared:  0.681,  Adjusted R-squared:  -0.1482 
## F-statistic: 0.8213 on 13 and 5 DF,  p-value: 0.6451
## 
## AIC and BIC values for the model:
##         AIC       BIC
## 1 -79.63373 -65.46714

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains -14.82% (-0.1482) of variability in FFD values. The p-value of the finite distributed lag model is 0.6451 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rain_rbo$model)
## W = 0.95967, p-value = 0.5659

The residuals in Figure 62 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are significant. The histogram of standardised residuals contains violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.5659 (>0.05) which fails to rejects the null hypothesis of normality.

##     x3.t     x3.1     x3.2     x3.3     x3.4     x3.5     x3.6     x3.7 
## 2.312923 1.985240 2.481193 1.480135 2.811952 1.615976 2.186146 2.094572 
##     x3.8     x3.9    x3.10    x3.11    x3.12 
## 1.639937 2.344740 1.782273 1.617890 2.066747

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Radiation

## q = 1 AIC = -97.71113 BIC = -92.10634 MASE = 1.063029 
## q = 2 AIC = -91.50708 BIC = -84.67061 MASE = 1.174709 
## q = 3 AIC = -92.38658 BIC = -84.39335 MASE = 1.128463 
## q = 4 AIC = -86.43017 BIC = -77.35931 MASE = 1.199809 
## q = 5 AIC = -83.32223 BIC = -73.25746 MASE = 1.071856 
## q = 6 AIC = -81.85403 BIC = -70.88414 MASE = 0.9483202 
## q = 7 AIC = -79.04997 BIC = -67.26943 MASE = 0.8584745 
## q = 8 AIC = -80.65686 BIC = -68.16642 MASE = 0.707333 
## q = 9 AIC = -78.4524 BIC = -65.35989 MASE = 0.6617079 
## q = 10 AIC = -78.06879 BIC = -64.49 MASE = 0.603729 
## q = 11 AIC = -72.88766 BIC = -58.94741 MASE = 0.5583271 
## q = 12 AIC = -65.19474 BIC = -51.02816 MASE = 0.6477083

From the results above, a lag length of 11 provides the lowest AIC, BIC, and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.03300 -0.01641  0.00607  0.01283  0.03564 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.579343   0.795553   0.728    0.490
## x4.t        -0.049240   0.042970  -1.146    0.289
## x4.1         0.040767   0.049104   0.830    0.434
## x4.2         0.013172   0.047571   0.277    0.790
## x4.3        -0.008875   0.029431  -0.302    0.772
## x4.4         0.005986   0.028726   0.208    0.841
## x4.5        -0.015280   0.032443  -0.471    0.652
## x4.6         0.027281   0.028464   0.958    0.370
## x4.7        -0.006362   0.028744  -0.221    0.831
## x4.8         0.009588   0.028922   0.332    0.750
## x4.9         0.018459   0.035452   0.521    0.619
## x4.10       -0.007909   0.038327  -0.206    0.842
## x4.11       -0.018495   0.037212  -0.497    0.634
## 
## Residual standard error: 0.03284 on 7 degrees of freedom
## Multiple R-squared:  0.3924, Adjusted R-squared:  -0.6492 
## F-statistic: 0.3767 on 12 and 7 DF,  p-value: 0.934
## 
## AIC and BIC values for the model:
##         AIC       BIC
## 1 -72.88766 -58.94741

The finite distributed lag model with q=11 has no significant term at 5% significance level. The adjusted R-squared score only explains -64.92% (-0.6492) of variability in FFD values. The p-value of the finite distributed lag model is 0.934 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_rad_rbo$model)
## W = 0.94938, p-value = 0.3578

The residuals in Figure 63 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.3578 (>0.05) which fails to rejects the null hypothesis of normality.

##     x4.t     x4.1     x4.2     x4.3     x4.4     x4.5     x4.6     x4.7 
## 5.333552 6.991474 6.757501 3.724608 3.317886 4.141127 3.006734 3.051213 
##     x4.8     x4.9    x4.10    x4.11 
## 2.805563 3.348295 3.497440 3.441187

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

Finite DLM Relative Humidity

## q = 1 AIC = -94.56619 BIC = -88.9614 MASE = 1.101099 
## q = 2 AIC = -88.3792 BIC = -81.54272 MASE = 1.235884 
## q = 3 AIC = -89.1954 BIC = -81.20218 MASE = 1.171117 
## q = 4 AIC = -82.80086 BIC = -73.73 MASE = 1.262425 
## q = 5 AIC = -79.4041 BIC = -69.33932 MASE = 1.149637 
## q = 6 AIC = -76.22147 BIC = -65.25158 MASE = 1.026742 
## q = 7 AIC = -74.80399 BIC = -63.02345 MASE = 0.9235533 
## q = 8 AIC = -72.65628 BIC = -60.16585 MASE = 0.8466119 
## q = 9 AIC = -83.46275 BIC = -70.37025 MASE = 0.5821168 
## q = 10 AIC = -89.23348 BIC = -75.65468 MASE = 0.451849 
## q = 11 AIC = -86.87319 BIC = -72.93294 MASE = 0.3401436 
## q = 12 AIC = -102.9404 BIC = -88.77377 MASE = 0.2453801

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE values. This means that a value of q=12 will be selected as the lag length of the model.

## 
## Call:
## lm(formula = as.formula(model.formula), data = design)
## 
## Residuals:
##          1          2          3          4          5          6          7 
## -6.288e-03 -4.741e-03  1.378e-02  4.883e-03  6.387e-03 -1.051e-02 -7.777e-03 
##          8          9         10         11         12         13         14 
## -2.019e-05 -8.178e-03  6.657e-03 -5.228e-03  7.668e-03 -1.964e-03 -6.420e-03 
##         15         16         17         18         19 
## -1.078e-02  4.723e-03  3.717e-03  2.796e-03  1.130e-02 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -7.154469   2.058280  -3.476   0.0177 *
## x5.t         0.025360   0.009888   2.565   0.0504 .
## x5.1        -0.009190   0.009340  -0.984   0.3704  
## x5.2         0.017859   0.006088   2.933   0.0325 *
## x5.3        -0.002440   0.006132  -0.398   0.7070  
## x5.4         0.002354   0.006088   0.387   0.7149  
## x5.5         0.022523   0.006132   3.673   0.0144 *
## x5.6        -0.008408   0.005950  -1.413   0.2167  
## x5.7         0.018591   0.005599   3.321   0.0210 *
## x5.8         0.007378   0.005790   1.274   0.2586  
## x5.9         0.001636   0.005663   0.289   0.7843  
## x5.10        0.003699   0.005213   0.710   0.5096  
## x5.11       -0.011267   0.007354  -1.532   0.1860  
## x5.12        0.015070   0.007614   1.979   0.1047  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01427 on 5 degrees of freedom
## Multiple R-squared:  0.9065, Adjusted R-squared:  0.6633 
## F-statistic: 3.727 on 13 and 5 DF,  p-value: 0.07754
## 
## AIC and BIC values for the model:
##         AIC       BIC
## 1 -102.9404 -88.77377

The finite distributed lag model with q=12 has almost no significant term at 5% significance level. The adjusted R-squared score only explains 66.33% (0.6633) of variability in FFD values. The p-value of the finite distributed lag model is 0.07754 (>0.05) hence, it is not statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fin_dlm_humid_rbo$model)
## W = 0.94432, p-value = 0.315

The residuals in Figure 64 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.315 (>0.05) which fails to rejects the null hypothesis of normality.

##     x5.t     x5.1     x5.2     x5.3     x5.4     x5.5     x5.6     x5.7 
## 4.889400 4.280613 2.057443 2.088074 1.913716 1.870104 1.829416 2.026883 
##     x5.8     x5.9    x5.10    x5.11    x5.12 
## 2.171511 2.057850 1.759417 3.480250 3.810613

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

In conclusion, the finite distributed lag models with Intercepts are not good performing models due to their lack of significant terms at 5% significance level and explainability.

Fitting Polynomial Distributed Lag Model

RBO and Temperature

## q = 1 AIC = -101.8617 BIC = -96.2569 MASE = 0.9239038 
## q = 2 AIC = -95.49894 BIC = -88.66246 MASE = 1.032564 
## q = 3 AIC = -97.93803 BIC = -91.277 MASE = 1.0725 
## q = 4 AIC = -94.65533 BIC = -88.17614 MASE = 1.089145 
## q = 5 AIC = -92.07915 BIC = -85.78867 MASE = 1.014553 
## q = 6 AIC = -90.66426 BIC = -84.56988 MASE = 0.913053 
## q = 7 AIC = -88.11262 BIC = -82.22235 MASE = 0.8550191 
## q = 8 AIC = -87.76319 BIC = -82.08572 MASE = 0.75509 
## q = 9 AIC = -90.75952 BIC = -85.30431 MASE = 0.7002906 
## q = 10 AIC = -87.92162 BIC = -82.699 MASE = 0.6742057 
## q = 11 AIC = -83.56579 BIC = -78.58713 MASE = 0.6212847 
## q = 12 AIC = -82.47435 BIC = -77.75216 MASE = 0.6494928

From the results above, a lag length of 11 provides the lowest AIC, BIC, and MASE (relative) values. This means that a value of q=11 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##          Estimate Std. Error t value P(>|t|)
## beta.0   0.007930    0.00905  0.8760   0.404
## beta.1   0.005830    0.00742  0.7850   0.452
## beta.2   0.003870    0.00675  0.5730   0.581
## beta.3   0.002060    0.00670  0.3070   0.766
## beta.4   0.000393    0.00685  0.0573   0.956
## beta.5  -0.001130    0.00690 -0.1630   0.874
## beta.6  -0.002500    0.00673 -0.3720   0.719
## beta.7  -0.003730    0.00638 -0.5840   0.573
## beta.8  -0.004810    0.00609 -0.7900   0.450
## beta.9  -0.005750    0.00631 -0.9120   0.386
## beta.10 -0.006540    0.00751 -0.8700   0.407
## beta.11 -0.007180    0.00986 -0.7290   0.485

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.048288 -0.012580  0.001061  0.011338  0.041099 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.8222739  0.6265209   1.312    0.208
## z.t0         0.0079300  0.0090547   0.876    0.394
## z.t1        -0.0021760  0.0033762  -0.644    0.528
## z.t2         0.0000729  0.0003119   0.234    0.818
## 
## Residual standard error: 0.02608 on 16 degrees of freedom
## Multiple R-squared:  0.1238, Adjusted R-squared:  -0.0405 
## F-statistic: 0.7535 on 3 and 16 DF,  p-value: 0.5363

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.40% (-0.0405) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.5363 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_temp_k2$model)
## W = 0.97762, p-value = 0.8998

Figure 65

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 14.731, df = 7, p-value = 0.03961

The residuals in Figure 65 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.03961 (<0.05) which confirms the presence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.8998 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##          Estimate Std. Error t value P(>|t|)
## beta.0   0.006920    0.00773  0.8950   0.394
## beta.1   0.005500    0.00708  0.7770   0.457
## beta.2   0.004080    0.00650  0.6270   0.546
## beta.3   0.002660    0.00602  0.4420   0.669
## beta.4   0.001240    0.00565  0.2190   0.832
## beta.5  -0.000182    0.00543 -0.0336   0.974
## beta.6  -0.001600    0.00536 -0.2990   0.772
## beta.7  -0.003020    0.00546 -0.5530   0.593
## beta.8  -0.004440    0.00572 -0.7770   0.457
## beta.9  -0.005860    0.00611 -0.9600   0.362
## beta.10 -0.007280    0.00661 -1.1000   0.299
## beta.11 -0.008700    0.00721 -1.2100   0.258

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.046511 -0.012185  0.002197  0.011300  0.042490 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.8143641  0.6079624   1.339    0.198
## z.t0         0.0069189  0.0077302   0.895    0.383
## z.t1        -0.0014202  0.0009443  -1.504    0.151
## 
## Residual standard error: 0.02535 on 17 degrees of freedom
## Multiple R-squared:  0.1208, Adjusted R-squared:  0.01736 
## F-statistic: 1.168 on 2 and 17 DF,  p-value: 0.3348

The polynomial distributed lag model with q=12 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains 0.17% (0.01736) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.3348 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_temp_k1$model)
## W = 0.97806, p-value = 0.9066

Figure 66

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 14.281, df = 6, p-value = 0.02665

The residuals in Figure 66 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.02665 (<0.05) which confirms the presence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9066 (>0.05) which fails to reject the null hypothesis of normality.

RBO and Rainfall

## q = 1 AIC = -100.898 BIC = -95.29319 MASE = 0.9417954 
## q = 2 AIC = -96.70956 BIC = -89.87308 MASE = 0.9993747 
## q = 3 AIC = -98.84343 BIC = -92.18241 MASE = 0.9491588 
## q = 4 AIC = -94.04271 BIC = -87.56352 MASE = 1.006192 
## q = 5 AIC = -92.46231 BIC = -86.17182 MASE = 0.9264235 
## q = 6 AIC = -89.17565 BIC = -83.08127 MASE = 0.8773226 
## q = 7 AIC = -86.84712 BIC = -80.95686 MASE = 0.8319619 
## q = 8 AIC = -87.81723 BIC = -82.13976 MASE = 0.7195348 
## q = 9 AIC = -90.82864 BIC = -85.37342 MASE = 0.6646487 
## q = 10 AIC = -88.26525 BIC = -83.04264 MASE = 0.6451804 
## q = 11 AIC = -85.66633 BIC = -80.68767 MASE = 0.6161421 
## q = 12 AIC = -84.25719 BIC = -79.53499 MASE = 0.64218

From the results above, a lag length of 11 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##          Estimate Std. Error t value P(>|t|)
## beta.0   0.009960    0.01020  0.9720  0.3560
## beta.1   0.006380    0.00719  0.8860  0.3980
## beta.2   0.003170    0.00562  0.5640  0.5860
## beta.3   0.000339    0.00534  0.0635  0.9510
## beta.4  -0.002120    0.00558 -0.3790  0.7130
## beta.5  -0.004190    0.00567 -0.7400  0.4780
## beta.6  -0.005900    0.00535 -1.1000  0.2990
## beta.7  -0.007220    0.00472 -1.5300  0.1600
## beta.8  -0.008170    0.00433 -1.8900  0.0919
## beta.9  -0.008750    0.00533 -1.6400  0.1350
## beta.10 -0.008940    0.00810 -1.1000  0.2980
## beta.11 -0.008760    0.01220 -0.7170  0.4920

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.048815 -0.015397  0.003078  0.014180  0.038986 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.7945101  0.1052975   7.545 1.17e-06 ***
## z.t0         0.0099617  0.0102448   0.972    0.345    
## z.t1        -0.0037719  0.0045196  -0.835    0.416    
## z.t2         0.0001881  0.0004255   0.442    0.664    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02475 on 16 degrees of freedom
## Multiple R-squared:  0.2111, Adjusted R-squared:  0.06324 
## F-statistic: 1.428 on 3 and 16 DF,  p-value: 0.2716

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains 0.63% (0.06324) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.2716 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_rain_k2$model)
## W = 0.98469, p-value = 0.9796

Figure 67

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 11.04, df = 7, p-value = 0.1369

The residuals in Figure 67 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1369 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9796 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##          Estimate Std. Error t value P(>|t|)
## beta.0   0.006840    0.00724   0.944  0.3700
## beta.1   0.005010    0.00634   0.790  0.4500
## beta.2   0.003170    0.00549   0.578  0.5770
## beta.3   0.001340    0.00472   0.284  0.7830
## beta.4  -0.000491    0.00410  -0.120  0.9070
## beta.5  -0.002320    0.00368  -0.632  0.5430
## beta.6  -0.004160    0.00354  -1.170  0.2700
## beta.7  -0.005990    0.00371  -1.610  0.1410
## beta.8  -0.007820    0.00416  -1.880  0.0926
## beta.9  -0.009650    0.00480  -2.010  0.0752
## beta.10 -0.011500    0.00557  -2.060  0.0694
## beta.11 -0.013300    0.00643  -2.070  0.0682

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.048742 -0.015175  0.002259  0.015681  0.038830 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.805883   0.099663   8.086 3.15e-07 ***
## z.t0         0.006837   0.007241   0.944    0.358    
## z.t1        -0.001832   0.001062  -1.725    0.103    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02415 on 17 degrees of freedom
## Multiple R-squared:  0.2015, Adjusted R-squared:  0.1076 
## F-statistic: 2.145 on 2 and 17 DF,  p-value: 0.1477

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains 10.76% (0.1076) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.1477 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_rain_k1$model)
## W = 0.98375, p-value = 0.9729

Figure 68

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 8.917, df = 6, p-value = 0.1783

The residuals in Figure 68 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1783 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9729 (>0.05) which fails to reject the null hypothesis of normality.

RBO and Radiation

## q = 1 AIC = -97.71113 BIC = -92.10634 MASE = 1.063029 
## q = 2 AIC = -91.50708 BIC = -84.67061 MASE = 1.174709 
## q = 3 AIC = -93.41298 BIC = -86.75196 MASE = 1.140154 
## q = 4 AIC = -89.71544 BIC = -83.23625 MASE = 1.203645 
## q = 5 AIC = -88.46836 BIC = -82.17788 MASE = 1.085079 
## q = 6 AIC = -87.48527 BIC = -81.39089 MASE = 0.9572973 
## q = 7 AIC = -86.62398 BIC = -80.73372 MASE = 0.8494332 
## q = 8 AIC = -89.0863 BIC = -83.40883 MASE = 0.7259512 
## q = 9 AIC = -89.93849 BIC = -84.48328 MASE = 0.6974314 
## q = 10 AIC = -88.15084 BIC = -82.92823 MASE = 0.6711964 
## q = 11 AIC = -84.26791 BIC = -79.28925 MASE = 0.6193374 
## q = 12 AIC = -82.50331 BIC = -77.78112 MASE = 0.6631798

From the results above, a lag length of 11 provides the lowest AIC (relative), BIC (relative), and MASE values. This means that a value of q=11 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##          Estimate Std. Error t value P(>|t|)
## beta.0  -0.003680    0.00847 -0.4350   0.674
## beta.1  -0.000699    0.00561 -0.1240   0.904
## beta.2   0.001740    0.00382  0.4540   0.660
## beta.3   0.003620    0.00325  1.1100   0.294
## beta.4   0.004960    0.00343  1.4500   0.182
## beta.5   0.005760    0.00364  1.5800   0.148
## beta.6   0.006000    0.00360  1.6700   0.129
## beta.7   0.005700    0.00337  1.6900   0.125
## beta.8   0.004850    0.00346  1.4000   0.195
## beta.9   0.003460    0.00457  0.7570   0.469
## beta.10  0.001520    0.00685  0.2210   0.830
## beta.11 -0.000975    0.01010 -0.0966   0.925

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044055 -0.011957  0.002021  0.017060  0.038610 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.2421290  0.5533133   0.438    0.668
## z.t0        -0.0036808  0.0084695  -0.435    0.670
## z.t1         0.0032557  0.0036584   0.890    0.387
## z.t2        -0.0002736  0.0003390  -0.807    0.431
## 
## Residual standard error: 0.02563 on 16 degrees of freedom
## Multiple R-squared:  0.154,  Adjusted R-squared:  -0.004606 
## F-statistic: 0.971 on 3 and 16 DF,  p-value: 0.4307

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains -0.04% (-0.004606) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.4307 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_rad_k2$model)
## W = 0.96191, p-value = 0.5827

Figure 69

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 10.85, df = 7, p-value = 0.1453

The residuals in Figure 69 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1453 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.5827 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0   0.00196    0.00473   0.415   0.688
## beta.1   0.00233    0.00413   0.564   0.587
## beta.2   0.00270    0.00359   0.750   0.472
## beta.3   0.00307    0.00315   0.975   0.355
## beta.4   0.00343    0.00283   1.210   0.256
## beta.5   0.00380    0.00269   1.410   0.191
## beta.6   0.00417    0.00276   1.510   0.165
## beta.7   0.00454    0.00302   1.500   0.167
## beta.8   0.00490    0.00343   1.430   0.186
## beta.9   0.00527    0.00394   1.340   0.213
## beta.10  0.00564    0.00452   1.250   0.243
## beta.11  0.00601    0.00514   1.170   0.273

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043545 -0.016730 -0.000869  0.015616  0.039930 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0157308  0.4720065   0.033    0.974
## z.t0        0.0019620  0.0047303   0.415    0.683
## z.t1        0.0003678  0.0007526   0.489    0.631
## 
## Residual standard error: 0.02536 on 17 degrees of freedom
## Multiple R-squared:  0.1196, Adjusted R-squared:  0.016 
## F-statistic: 1.154 on 2 and 17 DF,  p-value: 0.3387

The polynomial distributed lag model with q=11 has none of its scores significant at 5% level of significance. The adjusted R-squared score only explains 0.16% (0.016) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.3387 (>0.05) hence, it is statistically not significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_rad_k1$model)
## W = 0.97034, p-value = 0.762

Figure 70

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 8.7669, df = 6, p-value = 0.1871

The residuals in Figure 70 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1871 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain some violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.762 (>0.05) which fails to reject the null hypothesis of normality.

RBO and Humidity

## q = 1 AIC = -94.56619 BIC = -88.9614 MASE = 1.101099 
## q = 2 AIC = -88.3792 BIC = -81.54272 MASE = 1.235884 
## q = 3 AIC = -90.71527 BIC = -84.05425 MASE = 1.205438 
## q = 4 AIC = -86.21246 BIC = -79.73328 MASE = 1.265024 
## q = 5 AIC = -83.611 BIC = -77.32052 MASE = 1.218437 
## q = 6 AIC = -82.49202 BIC = -76.39764 MASE = 1.124788 
## q = 7 AIC = -80.66896 BIC = -74.77869 MASE = 1.039317 
## q = 8 AIC = -81.50599 BIC = -75.82852 MASE = 0.8827226 
## q = 9 AIC = -85.27436 BIC = -79.81915 MASE = 0.7726485 
## q = 10 AIC = -85.61489 BIC = -80.39227 MASE = 0.6821919 
## q = 11 AIC = -83.5761 BIC = -78.59744 MASE = 0.5912701 
## q = 12 AIC = -88.16372 BIC = -83.44153 MASE = 0.5722198

From the results above, a lag length of 12 provides the lowest AIC, BIC, and MASE (relative) values. This means that a value of q=12 will be selected as the lag length of the model.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0   0.00885    0.00445    1.99  0.0868
## beta.1   0.00818    0.00365    2.24  0.0602
## beta.2   0.00758    0.00316    2.40  0.0477
## beta.3   0.00706    0.00294    2.40  0.0472
## beta.4   0.00662    0.00288    2.30  0.0552
## beta.5   0.00627    0.00290    2.16  0.0674
## beta.6   0.00599    0.00291    2.06  0.0789
## beta.7   0.00580    0.00290    2.00  0.0861
## beta.8   0.00568    0.00288    1.97  0.0894
## beta.9   0.00565    0.00291    1.94  0.0938
## beta.10  0.00569    0.00309    1.84  0.1080
## beta.11  0.00582    0.00352    1.65  0.1420
## beta.12  0.00603    0.00425    1.42  0.1990

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.033016 -0.010342  0.001035  0.013832  0.031004 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -7.349e+00  2.679e+00  -2.743   0.0151 *
## z.t0         8.855e-03  4.449e-03   1.990   0.0651 .
## z.t1        -7.197e-04  1.374e-03  -0.524   0.6081  
## z.t2         4.037e-05  1.023e-04   0.395   0.6987  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02057 on 15 degrees of freedom
## Multiple R-squared:  0.4167, Adjusted R-squared:    0.3 
## F-statistic: 3.572 on 3 and 15 DF,  p-value: 0.03956

The polynomial distributed lag model with q=12 has none of its scores at 5% level of significance. The adjusted R-squared score only explains 3% (0.3) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.03956 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_humid_k2$model)
## W = 0.9501, p-value = 0.3967

Figure 71

## 
##  Breusch-Godfrey test for serial correlation of order up to 7
## 
## data:  Residuals
## LM test = 11.539, df = 7, p-value = 0.1168

The residuals in Figure 71 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1168 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.3967 (>0.05) which fails to reject the null hypothesis of normality.

## Estimates and t-tests for beta coefficients:
##         Estimate Std. Error t value P(>|t|)
## beta.0   0.00815    0.00396    2.06  0.0788
## beta.1   0.00792    0.00350    2.26  0.0579
## beta.2   0.00769    0.00306    2.51  0.0403
## beta.3   0.00747    0.00267    2.79  0.0268
## beta.4   0.00724    0.00235    3.08  0.0179
## beta.5   0.00701    0.00213    3.29  0.0133
## beta.6   0.00679    0.00204    3.32  0.0127
## beta.7   0.00656    0.00210    3.12  0.0167
## beta.8   0.00633    0.00230    2.76  0.0281
## beta.9   0.00611    0.00260    2.35  0.0510
## beta.10  0.00588    0.00297    1.98  0.0885
## beta.11  0.00566    0.00340    1.66  0.1400
## beta.12  0.00543    0.00386    1.41  0.2020

## 
## Call:
## "Y ~ (Intercept) + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.034112 -0.011865  0.004151  0.013036  0.029452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -7.6343783  2.5099292  -3.042  0.00777 **
## z.t0         0.0081475  0.0039628   2.056  0.05648 . 
## z.t1        -0.0002266  0.0005561  -0.407  0.68908   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02002 on 16 degrees of freedom
## Multiple R-squared:  0.4106, Adjusted R-squared:  0.337 
## F-statistic: 5.574 on 2 and 16 DF,  p-value: 0.01456

The polynomial distributed lag model with q=12 has all of its scores at 5% level of significance. The adjusted R-squared score only explains 33.7% (0.337) of variability in RBO values. The p-value of the polynomial distributed lag model is 0.01456 (<0.05) hence, it is statistically significant.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(poly_dlm_rbo_humid_k1$model)
## W = 0.94453, p-value = 0.3176

Figure 72

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals
## LM test = 10.571, df = 6, p-value = 0.1026

The residuals in Figure 72 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.1026 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.3176 (>0.05) which fails to reject the null hypothesis of normality.

Fitting Koyck Distributed Lag Model

RBO and Temperature

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15981 -0.04678 -0.01440  0.04750  0.14952 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.2032     1.6184  -0.743    0.464
## Y.1           0.2469     0.4609   0.536    0.597
## X.t           0.1847     0.1947   0.949    0.351
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   1  27     1.011  0.3236  
## Wu-Hausman         1  26     3.247  0.0832 .
## Sargan             0  NA        NA      NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07557 on 27 degrees of freedom
## Multiple R-Squared: -1.597,  Adjusted R-squared: -1.789 
## Wald test: 2.119 on 2 and 27 DF,  p-value: 0.1397

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains (-1.789) of variability in RBO values. The p-value of the Koyck model is 0.1397 (>0.05) hence, it is not statistically significant. The “Weak Instruments” test resulted in a p-value of 0.3236 which means that the model at the first stage of least squares fitting is not significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.0832 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_ffd_temp_dlm$model)
## W = 0.98351, p-value = 0.9093

The residuals in Figure 73 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.9093 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 2.188316 2.188316

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

RBO and Rainfall

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15981 -0.04678 -0.01440  0.04750  0.14952 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.2032     1.6184  -0.743    0.464
## Y.1           0.2469     0.4609   0.536    0.597
## X.t           0.1847     0.1947   0.949    0.351
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   1  27     1.011  0.3236  
## Wu-Hausman         1  26     3.247  0.0832 .
## Sargan             0  NA        NA      NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07557 on 27 degrees of freedom
## Multiple R-Squared: -1.597,  Adjusted R-squared: -1.789 
## Wald test: 2.119 on 2 and 27 DF,  p-value: 0.1397

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_rbo_rain_dlm$model)
## W = 0.96568, p-value = 0.4287

The residuals in Figure 74 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.4287 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 5531.807 5531.807

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model suffers from multicollinearity as all VIF values are greater than 10.

RBO and Radiation

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.082255 -0.017008 -0.001036  0.021424  0.106984 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.48011    0.94819  -0.506   0.6167   
## Y.1          0.69801    0.24502   2.849   0.0083 **
## X.t          0.04812    0.05661   0.850   0.4028   
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value  
## Weak instruments   1  27     4.942  0.0348 *
## Wu-Hausman         1  26     2.765  0.1084  
## Sargan             0  NA        NA      NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0467 on 27 degrees of freedom
## Multiple R-Squared: 0.008467,    Adjusted R-squared: -0.06498 
## Wald test: 4.731 on 2 and 27 DF,  p-value: 0.01732

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains -0.64% (-0.06498) of variability in RBO values. The p-value of the Koyck model is 0.01732 (<0.05) hence, it is statistically significant. The “Weak Instruments” test resulted in a p-value of 0.0348 which means that the model at the first stage of least squares fitting is significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.1084 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_rbo_rad_dlm$model)
## W = 0.96083, p-value = 0.3253

Figure 75

The residuals in Figure 75 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are significant. The histogram of standardised residuals contain violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.3253 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 1.619594 1.619594

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

RBO and Humidity

## 
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.080897 -0.021103 -0.004676  0.022673  0.111041 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -1.16679    8.04941  -0.145   0.8858  
## Y.1          0.62503    0.34753   1.798   0.0833 .
## X.t          0.01525    0.08274   0.184   0.8551  
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value
## Weak instruments   1  27     0.393   0.536
## Wu-Hausman         1  26     0.055   0.816
## Sargan             0  NA        NA      NA
## 
## Residual standard error: 0.04127 on 27 degrees of freedom
## Multiple R-Squared: 0.2256,  Adjusted R-squared: 0.1682 
## Wald test: 5.612 on 2 and 27 DF,  p-value: 0.009161

The Koyck model has none of its scores significant at 5% level of significance. The adjusted R-squared score explains 16.82% (0.1682) of variability in RBO values. The p-value of the Koyck model is 0.009161 (<0.05) hence, it is statistically significant. The “Weak Instruments” test resulted in a p-value of 0.536 which means that the model at the first stage of least squares fitting is not significant at 5% level of significance. The “Wu-Hausman” test also resulted in a p-value of 0.816 which means that there is no significant correlation between the error term and the lag-dependent variable.

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(koyck_rbo_humid_dlm$model)
## W = 0.97232, p-value = 0.6044

Figure 76

The residuals in Figure 76 above presents a few observations. The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.6044 (>0.05) which fails to reject the null hypothesis of normality.

##      Y.1      X.t 
## 4.171591 4.171591

The Variance Inflation Factor (VIF) values suggest that the estimates from the finite distributed lag model does not suffer from multicollinearity as all VIF values are less than 10.

In conclusion, the Koyck model was unsuccessful in capturing the autocorrelation and seasonality in the series.

Fitting Autoregressive Distributed Lag Model

## p = 1 q = 1 AIC = -104.4462 BIC = -94.63785 MASE = 0.8287036 
## p = 1 q = 2 AIC = -103.9101 BIC = -92.97176 MASE = 0.8239655 
## p = 1 q = 3 AIC = -108.2781 BIC = -96.28825 MASE = 0.777656 
## p = 1 q = 4 AIC = -103.1442 BIC = -90.18588 MASE = 0.8081884 
## p = 1 q = 5 AIC = -97.24854 BIC = -83.40948 MASE = 0.7553928 
## p = 2 q = 1 AIC = -95.81121 BIC = -83.50555 MASE = 0.898983 
## p = 2 q = 2 AIC = -101.4288 BIC = -87.75582 MASE = 0.766712 
## p = 2 q = 3 AIC = -108.3739 BIC = -93.71963 MASE = 0.7290062 
## p = 2 q = 4 AIC = -101.6237 BIC = -86.07367 MASE = 0.7606791 
## p = 2 q = 5 AIC = -95.78843 BIC = -79.43317 MASE = 0.6950639 
## p = 3 q = 1 AIC = -98.57578 BIC = -83.92153 MASE = 0.8391506 
## p = 3 q = 2 AIC = -106.2021 BIC = -90.21565 MASE = 0.7339625 
## p = 3 q = 3 AIC = -105.0644 BIC = -87.74577 MASE = 0.7207013 
## p = 3 q = 4 AIC = -99.62896 BIC = -81.48724 MASE = 0.7164256 
## p = 3 q = 5 AIC = -94.31971 BIC = -75.44826 MASE = 0.6607785 
## p = 4 q = 1 AIC = -95.30416 BIC = -78.45828 MASE = 0.7700148 
## p = 4 q = 2 AIC = -99.65453 BIC = -81.51282 MASE = 0.6934633 
## p = 4 q = 3 AIC = -97.79474 BIC = -78.35719 MASE = 0.6924341 
## p = 4 q = 4 AIC = -96.13468 BIC = -75.40129 MASE = 0.7127527 
## p = 4 q = 5 AIC = -92.76031 BIC = -71.37267 MASE = 0.6234692 
## p = 5 q = 1 AIC = -96.93151 BIC = -78.06006 MASE = 0.6401758 
## p = 5 q = 2 AIC = -100.1132 BIC = -79.98367 MASE = 0.5990576 
## p = 5 q = 3 AIC = -98.62372 BIC = -77.23608 MASE = 0.603044 
## p = 5 q = 4 AIC = -97.14898 BIC = -74.50324 MASE = 0.5895437 
## p = 5 q = 5 AIC = -96.04542 BIC = -72.14159 MASE = 0.5702066

From the autoregressive distributed lag model above, it can be observed that the model with the lowest AIC value is p=4, q=5, the lowest BIC value is p=4, q=5, and the lowest MASE is p=5, q=5. However, as we are after the lowest MASE values, the models that will be selected are ARDL(5,2), ARDL(5,4), and ARDL(5,5)

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044837 -0.010709  0.000636  0.013783  0.061326 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -2.859327   1.920150  -1.489   0.1548  
## X.t          0.008326   0.009824   0.847   0.4085  
## X.1         -0.003273   0.007845  -0.417   0.6818  
## X.2          0.018191   0.007914   2.299   0.0345 *
## X.3         -0.009886   0.008070  -1.225   0.2373  
## X.4          0.001330   0.008752   0.152   0.8810  
## X.5          0.012198   0.008215   1.485   0.1559  
## Y.1          0.475549   0.207151   2.296   0.0347 *
## Y.2          0.416397   0.270505   1.539   0.1421  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02927 on 17 degrees of freedom
## Multiple R-squared:  0.663,  Adjusted R-squared:  0.5044 
## F-statistic: 4.181 on 8 and 17 DF,  p-value: 0.006333

The autoregressive distributed lag model ardl_5.2_task3, has p=5 and q=2. Almost all of its coefficients are not significant at 5% level of significance. The adjusted R-squared model explains 50.44% (0.5044) of the variability in mortality rates. The p value of ardl_5.2_task3 is 0.006333 (<0.05) hence, it is significant.

Figure 77

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 17.078, df = 12, p-value = 0.1467

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.2_task3$model)
## W = 0.97623, p-value = 0.7857

The residuals in Figure 77 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.7857 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contains violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.1467 (>0.05) which fails to reject the null hypothesis of normality.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  2.439200  1.554434  1.520936  1.605404  1.879600  1.549683  2.358583  4.032632

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.042942 -0.010248  0.001514  0.014243  0.060600 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -4.2002014  2.6807937  -1.567    0.138  
## X.t          0.0093770  0.0104375   0.898    0.383  
## X.1          0.0007341  0.0093073   0.079    0.938  
## X.2          0.0199841  0.0095605   2.090    0.054 .
## X.3         -0.0058925  0.0094126  -0.626    0.541  
## X.4          0.0025447  0.0095462   0.267    0.793  
## X.5          0.0116316  0.0086107   1.351    0.197  
## Y.1          0.3623728  0.2484277   1.459    0.165  
## Y.2          0.3666810  0.2873796   1.276    0.221  
## Y.3          0.1926356  0.2237680   0.861    0.403  
## Y.4          0.0859316  0.2354194   0.365    0.720  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03031 on 15 degrees of freedom
## Multiple R-squared:  0.6811, Adjusted R-squared:  0.4685 
## F-statistic: 3.203 on 10 and 15 DF,  p-value: 0.0208

The autoregressive distributed lag model ardl_5.4_task3, has p=5 and q=4. All of its coefficients are not significant at 5% level of significance. The adjusted R-squared model explains 46.85% (0.4685) of the variability in mortality rates. The p value of ardl_5.4_task3 is 0.0208 (<0.05) hence, it is significant.

Figure 78

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals
## LM test = 23.417, df = 14, p-value = 0.05382

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ardl_5.4_task3$model)
## W = 0.95566, p-value = 0.3133

The residuals in Figure 78 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.05382 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.3133 (>0.05) which fails to reject the null hypothesis of normality.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  2.567369  2.040296  2.069773  2.036219  2.084862  1.587393  3.162892  4.243817 
## L(y.t, 3) L(y.t, 4) 
##  3.180939  3.497515

## 
## Time series regression with "ts" data:
## Start = 6, End = 31
## 
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043603 -0.007140  0.002577  0.010431  0.061704 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -5.457663   3.187293  -1.712    0.109  
## X.t          0.010043   0.010626   0.945    0.361  
## X.1          0.002499   0.009726   0.257    0.801  
## X.2          0.020996   0.009791   2.144    0.050 .
## X.3         -0.002192   0.010729  -0.204    0.841  
## X.4          0.003910   0.009852   0.397    0.697  
## X.5          0.013895   0.009234   1.505    0.155  
## Y.1          0.362583   0.252044   1.439    0.172  
## Y.2          0.317047   0.298850   1.061    0.307  
## Y.3          0.166393   0.229659   0.725    0.481  
## Y.4          0.092673   0.239013   0.388    0.704  
## Y.5          0.178912   0.236435   0.757    0.462  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03075 on 14 degrees of freedom
## Multiple R-squared:  0.6936, Adjusted R-squared:  0.4529 
## F-statistic: 2.881 on 11 and 14 DF,  p-value: 0.03285

The autoregressive distributed lag model ardl_5.5_task3, has p=5 and q=5. All of its coefficients are not are not significant at 5% level of significance. The adjusted R-squared model explains 45.29% (0.4529) of the variability in mortality rates. The p value of ardl_5.5_task3 is 0.03285 (<0.05) hence, it is significant.

The residuals in Figure 79 above presents a few observations.The time series plot shows a non-random pattern. The ACF plot suggests that the serial correlation and seasonality remaining in the residuals are not significant. This is also reinforced by the Breusch-Godfrey test results of a p-value of 0.0661 (>0.05) which confirms the absence of serial correlation. The histogram of standardised residuals contains slight violations in normality assumptions. This is also reinforced by the Shapiro-Wilk normality test with a value of 0.5063 (>0.05) which fails to reject the null hypothesis of normality.

##       X.t L(X.t, 1) L(X.t, 2) L(X.t, 3) L(X.t, 4) L(X.t, 5) L(y.t, 1) L(y.t, 2) 
##  2.585128  2.164717  2.109082  2.570112  2.157184  1.773510  3.162896  4.458597 
## L(y.t, 3) L(y.t, 4) L(y.t, 5) 
##  3.255167  3.502381  3.361625

In conclusion, the autoregressive distributed lag models were not successful at capturing the autocorrelation and seasonal patterns in the series.

Time Series Regression Models Table Summary

Accuracy Score Table
Model	AIC	BIC	MASE
Finite DLM (Humidity)	-102.94035	-88.77377	0.2453801
Finite DLM (Temperature)	-87.10828	-72.94170	0.3522387
Finite DLM (Rainfall)	-79.63373	-65.46714	0.4240236
Finite DLM (Radiation)	-72.88766	-58.94741	0.5583271
Poly DLM k=2 (Humidity)	-88.16372	-83.44153	0.5722198
Poly DLM k=1 (Humidity)	-89.96751	-86.18975	0.5828660
Poly DLM k=2 (Rainfall)	-85.66633	-80.68767	0.6161421
Poly DLM k=1 (Rainfall)	-87.42340	-83.44047	0.6185122
Poly DLM k=2 (Radiation)	-84.26791	-79.28925	0.6193374
ARDL(5,5)	-97.36157	-81.00632	0.6208847
Poly DLM k=2 (Temperature)	-83.56579	-78.58713	0.6212847
Poly DLM k=1 (Temperature)	-85.49762	-81.51469	0.6248693
Poly DLM k=1 (Radiation)	-85.46993	-81.48700	0.6570757
ARDL(5,4)	-98.31933	-83.22217	0.6599444
ARDL(5,2)	-100.88457	-88.30361	0.7045789
Koyck (Humidity)	-101.28049	-95.67571	0.9559618
Koyck (Radiation)	-93.86690	-88.26211	1.0314227
Koyck (Temperature)	-64.98345	-59.37866	1.9150155
Koyck (Rainfall)	76.22068	81.82547	19.1057647

The table above depicts the accuracy measures (AIC, BIC, and MASE) of the fitted models thus far. It can be seen that Finite DLM (Humidity) has the lowest MASE value.

Forecasting

In conclusion, based on all the previous observations made - the model that should be used to forecast the FFD 4 years ahead should be the Finite DLM (Humidity) model based on the results of its residual analysis and its low MASE value.

Task 3a Conclusion

To conclude this exploration, the frequency was too low so Exponential Smoothing and State-Space Models were unable to fitted at this time. Due to time constraints, a forecasting plot was also unable to be achieved. Theoretically, based on the work done - thus far the Finite DLM (Humidity) was left as the best model to use for forecasting.

References

Demirhan H., 2020. Time Series Regression Models with Distributed Lag Models. Package ‘dLagM’. Available at: https://cran.r-project.org/web/packages/dLagM/dLagM.pdf [Accessed 13 September 2021].
Sauer, S., 2016. Different ways to set figure size in RMarkdown. [online] Sebastian Sauer Stats Blog. Available at: https://sebastiansauer.github.io/figure_sizing_knitr/ [Accessed 13 September 2021].

Common Box-Cox Transformations: https://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/

Appendix

# Task 1
### Data Exploration and Visualization
# Data Import and Cleaning
mort_import <- read_csv("mort  .csv")
mort_import = mort_import[,-1] # Drop first column

# Time-series Conversion
mort_mort_ts <- ts(mort_import$mortality, start = c(2010, 7), frequency = 53)
mort_temp_ts <- ts(mort_import$temp, start = c(2010, 7), frequency = 53)
mort_chem1_ts <- ts(mort_import$chem1, start = c(2010, 7), frequency = 53)
mort_chem2_ts <- ts(mort_import$chem2, start = c(2010, 7), frequency = 53)
mort_part_ts <- ts(mort_import$`particle size`, start = c(2010, 7), frequency = 53)
mort_ts <- ts(mort_import, start = c(2010, 7), frequency = 53)

# Plotting of Solar Time-Series
par(xpd=NA)
plot(mort_ts, 
     xlab = "Year",
     ylab = "Mortality",
     plot.type = "s", 
     col = c("blue", "brown2", "green4", "chartreuse", "yellow3"), 
     main = "Plot of Climate and Pollution on Disease specific morality factors")
add_legend("bottomright", 
           legend = c("Mortality", 
                      "Temperature", 
                      "Chemical 1", 
                      "Chemical 2", 
                      "Particle Size"), 
           pch = 20, 
           col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
           horiz = TRUE, 
           bty = 'n', 
           cex = 0.9)

# Scale and center the series
mort_scaled <- scale(mort_ts)
plot(mort_scaled, 
     xlab = "Time", 
     plot.type = "s", 
     col = c("blue", "brown2", "green4", "chartreuse", "yellow3"), 
     main = "Scaled plot of Climate and Pollution on Disease specific morality factors")
add_legend("bottomright", 
           legend = c("Mortality", 
                      "Temperature", 
                      "Chemical 1", 
                      "Chemical 2", 
                      "Particle Size"), 
           pch = 20, 
           col = c("blue", "brown2", "green4", "chartreuse", "yellow3"),
           horiz = TRUE, 
           bty = 'n', 
           cex = 0.9)


### Mortality Time Series Analysis
plot(mort_mort_ts, 
     xlab = "Time", 
     ylab = "Amount of Mortality", 
     main = "Mortality Time Series")


### Mortality ADF & ACF
adf.test(mort_mort_ts)
acf(mort_mort_ts, 
    lag.max = 48, 
    main = "ACF of Mortality series")

### Temperature Time Series Analysis
plot(mort_temp_ts, 
     xlab = "Time", 
     ylab = "Temperature", 
     main = "Temperature Time Series")

### Temperature ADF & ACF
adf.test(mort_temp_ts)
acf(mort_temp_ts, 
    lag.max = 48, 
    main = "ACF of Temperature series")

### Chemical 1 Time Series Analysis
plot(mort_chem1_ts, 
     xlab = "Time", 
     ylab = "Amount of Chemical 1", 
     main = "Chemical 1 Time Series")

### Chemical 1 ADF & ACF
adf.test(mort_chem1_ts)
acf(mort_chem1_ts, 
    lag.max = 48, 
    main = "ACF of Chemical 1 series")


### Chemical 2 Time Series Analysis
plot(mort_chem2_ts, 
     xlab = "Time", 
     ylab = "Amount of Chemical 2", 
     main = "Chemical 2 Time Series")

### Chemical 2 ADF & ACF
adf.test(mort_chem2_ts)
acf(mort_chem2_ts, 
    lag.max = 48, 
    main = "ACF of Chemical 2 series")

### Particle Size Time Series Analysis
plot(mort_part_ts, 
     xlab = "Time", 
     ylab = "Size of Particle", 
     main = "Particle Size Time Series Plot")


### Particle Size ADF & ACF
adf.test(mort_part_ts)
acf(mort_part_ts, 
    lag.max = 48, 
    main = "ACF of Particle Size")

### Impact of Trend and Seasonality Decomposition
mort_mort_decomposition_stl <- stl(mort_mort_ts, t.window = 15, 
                                   s.window = "periodic", 
                                   robust = TRUE)
plot(mort_mort_decomposition_stl, 
     main = "STL Decomposition of Mortality Time Series")


mort_temp_decomposition_stl <- stl(mort_temp_ts, 
                                   t.window = 15, 
                                   s.window = "periodic", 
                                   robust = TRUE)
plot(mort_temp_decomposition_stl, 
     main = "STL Decomposition of Temperature Time Series")


mort_chem1_decomposition_stl <- stl(mort_chem1_ts, 
                                    t.window = 15, 
                                    s.window = "periodic", 
                                    robust = TRUE)
plot(mort_chem1_decomposition_stl, 
     main = "STL Decomposition of Chemical 1 Time Series")

mort_chem2_decomposition_stl <- stl(mort_chem2_ts, 
                                    t.window = 15, 
                                    s.window = "periodic", 
                                    robust = TRUE)
plot(mort_chem2_decomposition_stl, 
     main = "STL Decomposition of Chemical 2 Time Series")

mort_part_decomposition_stl <- stl(mort_part_ts, 
                                   t.window = 15, 
                                   s.window = "periodic", 
                                   robust = TRUE)
plot(mort_mort_decomposition_stl, 
     main = "STL Decomposition of Particle Size Time Series")

## Time Series Regression Model Fitting
### Correlation Coefficient
cor(mort_ts)

### Fitting Finite Distributed Lag Model
#### Finite DLM
colnames(mort_import)<-c("y","x1","x2","x3","x4")
for (i in 1:12){
  fin_model <- dlm(formula = y~x1+x2+x3,
                   data = data.frame(mort_import),
                   q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model$model), 
      "BIC =", BIC(fin_model$model), 
      "MASE =", MASE(fin_model)$MASE, "\n")
}

fin_dlm <- dlm(formula = y~x1+x3+x4,
               data = data.frame(mort_import),
               q = 10)
summary(fin_dlm)
shapiro.test(residuals(fin_dlm$model))
checkresiduals(fin_dlm$model)
vif_fin_dlm = vif(fin_dlm$model)
vif_fin_dlm

### Fitting Polynomial Distributed Lag Model
for ( i in 1:12){
    poly_dlm_k2 = polyDlm(y = as.vector(mort_mort_ts), 
                          x = as.vector(mort_chem1_ts), 
                          q = i, 
                          k = 2, 
                          show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_k2$model), 
        "BIC =", BIC(poly_dlm_k2$model), 
        "MASE =", MASE(poly_dlm_k2)$MASE, "\n")
}

poly_dlm_k2 = polyDlm(y = as.vector(mort_mort_ts), 
                      x = as.vector(mort_chem1_ts), 
                      q = 10, 
                      k = 2)
summary(poly_dlm_k2)
shapiro.test(residuals(poly_dlm_k2$model))
checkresiduals(poly_dlm_k2$model)
poly_dlm_k1 = polyDlm(y = as.vector(mort_mort_ts), 
                      x = as.vector(mort_chem1_ts), 
                      q = 10, 
                      k = 1)
summary(poly_dlm_k1)
shapiro.test(residuals(poly_dlm_k1$model))
checkresiduals(poly_dlm_k1$model)

### Fitting Koyck Distributed Lag Model
koyck_dlm = koyckDlm(y = as.vector(mort_mort_ts), 
                     x = as.vector(mort_chem1_ts))
summary(koyck_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_dlm$model))
checkresiduals(koyck_dlm$model)

vif_koyck_dlm = vif(koyck_dlm$model)
vif_koyck_dlm

### Fitting Autoregressive Distributed Lag Model
### Fitting Autoregressive Distributed Lag Models with three predictors
for (i in 1:5){
  for(j in 1:5){
    ardlm_tempchem = ardlDlm(formula = y~x1+x2+x3,
                             data = data.frame(mort_import),
                             p = i, 
                             q = j)
    cat("p =", i,
        "q =", j,
        "AIC =", AIC(ardlm_tempchem$model), 
        "BIC =", BIC(ardlm_tempchem$model), 
        "MASE =", MASE(ardlm_tempchem)$MASE, "\n")
  }
}

ardl_5.3 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts), 
                   y = as.vector(mort_mort_ts),
                   p = 5, 
                   q = 3)
summary(ardl_5.3)
checkresiduals(ardl_5.3$model)
shapiro.test(residuals(ardl_5.3$model))
VIF.ardlm_low_aic_tempchem = vif(ardl_5.3$model)
VIF.ardlm_low_aic_tempchem
ardl_5.4 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts),
                   y = as.vector(mort_mort_ts), 
                   p = 5, 
                   q = 4)
summary(ardl_5.4)
checkresiduals(ardl_5.4$model)
shapiro.test(residuals(ardl_5.4$model))
VIF.ardlm_low_bic_tempchem = vif(ardl_5.4$model)
VIF.ardlm_low_bic_tempchem
ardl_5.5 = ardlDlm(x = as.vector(mort_temp_ts) + 
                       as.vector(mort_chem1_ts) + 
                       as.vector(mort_chem2_ts),
                   y = as.vector(mort_mort_ts), 
                   p = 5, 
                   q = 5)
summary(ardl_5.5)
checkresiduals(ardl_5.5$model)
shapiro.test(residuals(ardl_5.5$model))
VIF.ardlm_low_mase_tempchem = vif(ardl_5.5$model)
VIF.ardlm_low_mase_tempchem

### Time Series Regression Models Table Summary
attr(koyck_dlm$model,"class") = "lm"
models <- c("Finite DLM", "Poly DLM", "Koyck", "ARDL(5,3)", "ARDL(5,4)", "ARDL(5,5)")
aic_score <- AIC(fin_dlm$model, poly_dlm_k1$model, 
           koyck_dlm$model, ardl_5.3$model, 
           ardl_5.4$model, ardl_5.5$model)$AIC
bic_score <- BIC(fin_dlm$model, poly_dlm_k1$model, 
           koyck_dlm$model, ardl_5.3$model, 
           ardl_5.4$model, ardl_5.5$model)$BIC
MASE_score <- MASE(fin_dlm$model, poly_dlm_k1$model, 
             koyck_dlm$model, ardl_5.3, 
             ardl_5.4, ardl_5.5)$MASE

accuracy_table <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table <- arrange(accuracy_table, MASE)
kable(accuracy_table, caption = "Accuracy Score Table")


# Task 2
### Data Exploration and Visualization
# Data Import and Cleaning
ffd_import<- read_csv("FFD  .csv")

# Time Series Conversion
ffd_ts <- ts(ffd_import, start = c(1984), frequency = 1)
ffd_temp_ts <- ts(ffd_import$Temperature, start = c(1984), frequency = 1)
ffd_rain_ts <- ts(ffd_import$Rainfall, start = c(1984), frequency = 1)
ffd_rad_ts <- ts(ffd_import$Radiation, start = c(1984), frequency = 1)
ffd_humid_ts <- ts(ffd_import$RelHumidity, start = c(1984), frequency = 1)
ffd_ffd_ts <- ts(ffd_import$FFD, start = c(1984), frequency = 1)

# Plotting of Time Series(s)
plot(ffd_ts, main="Time series plots of Species First Flowering Factors", type="o")

### Temperature Time Series Analysis
plot(ffd_temp_ts, 
     xlab = "Time", 
     ylab = "Temperature", 
     type = "o",
     main = "Temperature Time Series")

### Temperature ADF & ACF
adf.test(ffd_temp_ts)
acf(ffd_temp_ts, 
    lag.max = 48, 
    main = "ACF of Temperature series")

### Rainfall Time Series Analysis
plot(ffd_rain_ts, 
     xlab = "Time", 
     ylab = "Amount of Rainfall", 
     type = "o",
     main = "Rainfall Time Series")


### Rainfall ADF & ACF
adf.test(ffd_rain_ts)
acf(ffd_rain_ts, 
    lag.max = 48, 
    main = "ACF of Rainfall series")

### Radiation Time Series Analysis
plot(ffd_rad_ts, 
     xlab = "Time", 
     ylab = "Amount of Radiation",
     type = "o",
     main = "Radiation Time Series")

### Radiation ADF & ACF
adf.test(ffd_rad_ts)
acf(ffd_rad_ts, 
    lag.max = 48, 
    main = "ACF of Radiation series")

### Relative Humidity Time Series Analysis
plot(ffd_humid_ts, 
     xlab = "Time", 
     ylab = "Relative Humidity Levels",
     type = "o",
     main = "Relative Humidity Time Series")

### Relative Humidity ADF & ACF
adf.test(ffd_humid_ts)
acf(ffd_humid_ts, 
    lag.max = 48, 
    main = "ACF of Relative Humidity series")

### First Flowering Day Time Series Analysis
plot(ffd_ffd_ts, 
     xlab = "Time", 
     ylab = "First Flowering Day (FFD)", 
     type = "o",
     main = "First Flowering Day Time Series Plot")

### First Flowering Day ADF & ACF
adf.test(ffd_ffd_ts)
acf(ffd_ffd_ts, 
    lag.max = 48, 
    main = "ACF of First Flowering Day (FFD)")

## Time Series Regression Model Fitting
### Correlation Coefficient
cor(ffd_ts)

### Fitting Finite Distributed Lag Model with Intercepts
#### Finite DLM Temperature
colnames(ffd_import)<-c("x1","x2","x3","x4","x5","y")
for (i in 1:12){
  fin_model_temp <- dlm(formula = y~x2,
                        data = data.frame(ffd_import),
                        q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_temp$model), 
      "BIC =", BIC(fin_model_temp$model), 
      "MASE =", MASE(fin_model_temp)$MASE, "\n")
}

fin_dlm_temp <- dlm(formula = y~x2,
                    data = data.frame(ffd_import),
                    q = 12)

shapiro.test(residuals(fin_dlm_temp$model))
checkresiduals(fin_dlm_temp$model)
vif_fin_dlm_temp = vif(fin_dlm_temp$model)
vif_fin_dlm_temp

#### Finite DLM Rainfall
for (i in 1:12){
  fin_model_rain <- dlm(formula = y~x3,
                        data = data.frame(ffd_import),
                        q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_rain$model), 
      "BIC =", BIC(fin_model_rain$model), 
      "MASE =", MASE(fin_model_rain)$MASE, "\n")
}

fin_dlm_rain <- dlm(formula = y~x3,
                    data = data.frame(ffd_import),
                    q = 11)
summary(fin_dlm_rain)
shapiro.test(residuals(fin_dlm_rain$model))
checkresiduals(fin_dlm_rain$model)
vif_fin_dlm_rain = vif(fin_dlm_rain$model)
vif_fin_dlm_rain

#### Finite DLM Radiation
for (i in 1:12){
  fin_model_rad <- dlm(formula = y~x4,
                       data = data.frame(ffd_import),
                       q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_rad$model), 
      "BIC =", BIC(fin_model_rad$model), 
      "MASE =", MASE(fin_model_rad)$MASE, "\n")
}
fin_dlm_rad <- dlm(formula = y~x4,
                   data = data.frame(ffd_import),
                   q = 12)
summary(fin_dlm_rad)
shapiro.test(residuals(fin_dlm_rad$model))
checkresiduals(fin_dlm_rad$model)

vif_fin_dlm_rad = vif(fin_dlm_rad$model)
vif_fin_dlm_rad

#### Finite DLM Relative Humidity
for (i in 1:12){
  fin_model_humid <- dlm(formula = y~x5,
                         data = data.frame(ffd_import),
                         q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_humid$model), 
      "BIC =", BIC(fin_model_humid$model), 
      "MASE =", MASE(fin_model_humid)$MASE, "\n")
}

fin_dlm_humid <- dlm(formula = y~x5,
                     data = data.frame(ffd_import),
                     q = 12)
summary(fin_dlm_humid)
shapiro.test(residuals(fin_dlm_humid$model))
checkresiduals(fin_dlm_humid$model)
vif_fin_dlm_humid = vif(fin_dlm_humid$model)
vif_fin_dlm_humid

### Fitting Finite Distributed Lag Model without Intercepts
#### Finite DLM Temperature (no Intercept)
fin_dlm_temp_noint <- dlm(formula = y~x2-1,
                          data = data.frame(ffd_import),
                          q = 12)
summary(fin_dlm_temp_noint)
shapiro.test(residuals(fin_dlm_temp_noint$model))
checkresiduals(fin_dlm_temp_noint$model)
vif_fin_dlm_temp_noint = vif(fin_dlm_temp_noint$model)
vif_fin_dlm_temp_noint

#### Finite DLM Rainfall (no Intercept)
fin_dlm_rain_noint <- dlm(formula = y~x3-1,
                          data = data.frame(ffd_import),
                          q = 12)
summary(fin_dlm_rain_noint)
shapiro.test(residuals(fin_dlm_rain_noint$model))
checkresiduals(fin_dlm_rain_noint$model)
vif_fin_dlm_rain_noint = vif(fin_dlm_rain_noint$model)
vif_fin_dlm_rain_noint

#### Finite DLM Radiation (no Intercept)
fin_dlm_rad_noint <- dlm(formula = y~x4-1,
                         data = data.frame(ffd_import),
                         q = 12)
summary(fin_dlm_rad_noint)
shapiro.test(residuals(fin_dlm_rad_noint$model))
checkresiduals(fin_dlm_rad_noint$model)
vif_fin_dlm_rad_noint = vif(fin_dlm_rad_noint$model)
vif_fin_dlm_rad_noint

#### Finite DLM Relative Humidity (no Intercept)
fin_dlm_humid_noint <- dlm(formula = y~x5-1,
                           data = data.frame(ffd_import),
                           q = 12)
summary(fin_dlm_humid_noint)
shapiro.test(residuals(fin_dlm_humid_noint$model))
checkresiduals(fin_dlm_humid_noint$model)
vif_fin_dlm_humid_noint = vif(fin_dlm_humid_noint$model)
vif_fin_dlm_humid_noint

### Fitting Polynomial Distributed Lag Model
### FFD and Temperature
for ( i in 1:12){
    poly_dlm_ffd_temp_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                                   x = as.vector(ffd_temp_ts), 
                                   q = i, 
                                   k = 2, 
                                   show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_ffd_temp_k2$model), 
        "BIC =", BIC(poly_dlm_ffd_temp_k2$model), 
        "MASE =", MASE(poly_dlm_ffd_temp_k2)$MASE, "\n")
}

poly_dlm_ffd_temp_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_temp_ts), 
                               q = 12, 
                               k = 2)
shapiro.test(residuals(poly_dlm_ffd_temp_k2$model))
checkresiduals(poly_dlm_ffd_temp_k2$model)
poly_dlm_ffd_temp_k1 = polyDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_temp_ts), 
                               q = 12, 
                               k = 1)
summary(poly_dlm_ffd_temp_k1)
shapiro.test(residuals(poly_dlm_ffd_temp_k1$model))
checkresiduals(poly_dlm_ffd_temp_k1$model)

### FFD and Rainfall
for ( i in 1:12){
    poly_dlm_ffd_rain_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                                   x = as.vector(ffd_rain_ts), 
                                   q = i, 
                                   k = 2, 
                                   show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_ffd_rain_k2$model), 
        "BIC =", BIC(poly_dlm_ffd_rain_k2$model), 
        "MASE =", MASE(poly_dlm_ffd_rain_k2)$MASE, "\n")
}
poly_dlm_ffd_rain_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_rain_ts), 
                               q = 11, 
                               k = 2)
summary(poly_dlm_ffd_rain_k2)
shapiro.test(residuals(poly_dlm_ffd_rain_k2$model))
checkresiduals(poly_dlm_ffd_rain_k2$model)
poly_dlm_ffd_rain_k1 = polyDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_rain_ts), 
                               q = 11, 
                               k = 1)
summary(poly_dlm_ffd_rain_k1)
shapiro.test(residuals(poly_dlm_ffd_rain_k1$model))
checkresiduals(poly_dlm_ffd_rain_k1$model)

### FFD and Radiation
for ( i in 1:12){
    poly_dlm_ffd_rad_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                                  x = as.vector(ffd_rad_ts), 
                                  q = i, 
                                  k = 2, 
                                  show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_ffd_rad_k2$model), 
        "BIC =", BIC(poly_dlm_ffd_rad_k2$model), 
        "MASE =", MASE(poly_dlm_ffd_rad_k2)$MASE, "\n")
}
poly_dlm_ffd_rad_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                              x = as.vector(ffd_rad_ts), 
                              q = 11, 
                              k = 2)
summary(poly_dlm_ffd_rad_k2)
shapiro.test(residuals(poly_dlm_ffd_rad_k2$model))
checkresiduals(poly_dlm_ffd_rad_k2$model)
poly_dlm_ffd_rad_k1 = polyDlm(y = as.vector(ffd_ffd_ts), 
                              x = as.vector(ffd_rad_ts), 
                              q = 11, 
                              k = 1)
summary(poly_dlm_ffd_rad_k1)
shapiro.test(residuals(poly_dlm_ffd_rad_k1$model))
checkresiduals(poly_dlm_ffd_rad_k1$model)

### FFD and Humidity
for ( i in 1:12){
    poly_dlm_ffd_humid_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                                    x = as.vector(ffd_humid_ts), 
                                    q = i, 
                                    k = 2, 
                                    show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_ffd_humid_k2$model), 
        "BIC =", BIC(poly_dlm_ffd_humid_k2$model), 
        "MASE =", MASE(poly_dlm_ffd_humid_k2)$MASE, "\n")
}

poly_dlm_ffd_humid_k2 = polyDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_humid_ts), 
                               q = 12, 
                               k = 2)
summary(poly_dlm_ffd_humid_k2)
shapiro.test(residuals(poly_dlm_ffd_humid_k2$model))
checkresiduals(poly_dlm_ffd_humid_k2$model)

poly_dlm_ffd_humid_k1 = polyDlm(y = as.vector(ffd_ffd_ts), 
                                x = as.vector(ffd_humid_ts), 
                                q = 12, 
                                k = 1)
summary(poly_dlm_ffd_humid_k1)
shapiro.test(residuals(poly_dlm_ffd_humid_k1$model))
checkresiduals(poly_dlm_ffd_humid_k1$model)

### Fitting Koyck Distributed Lag Model
### FFD and Temperature
koyck_ffd_temp_dlm = koyckDlm(y = as.vector(ffd_ffd_ts), 
                              x = as.vector(ffd_temp_ts))
summary(koyck_ffd_temp_dlm$model, diagnostics = TRUE)
shapiro.test(residuals(koyck_ffd_temp_dlm$model))
checkresiduals(koyck_ffd_temp_dlm$model)
vif_koyck_ffd_temp_dlm = vif(koyck_ffd_temp_dlm$model)
vif_koyck_ffd_temp_dlm

### FFD and Rainfall
koyck_ffd_rain_dlm = koyckDlm(y = as.vector(ffd_ffd_ts), 
                              x = as.vector(ffd_rain_ts))
summary(koyck_ffd_temp_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_ffd_rain_dlm$model))
checkresiduals(koyck_ffd_rain_dlm$model)

vif_koyck_ffd_rain_dlm = vif(koyck_ffd_rain_dlm$model)
vif_koyck_ffd_rain_dlm

### FFD and Radiation
koyck_ffd_rad_dlm = koyckDlm(y = as.vector(ffd_ffd_ts), 
                             x = as.vector(ffd_rad_ts))
summary(koyck_ffd_rad_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_ffd_rad_dlm$model))
checkresiduals(koyck_ffd_rad_dlm$model)

vif_koyck_ffd_rad_dlm = vif(koyck_ffd_rad_dlm$model)
vif_koyck_ffd_rad_dlm

### FFD and Humidity
koyck_ffd_humid_dlm = koyckDlm(y = as.vector(ffd_ffd_ts), 
                               x = as.vector(ffd_humid_ts))
summary(koyck_ffd_humid_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_ffd_humid_dlm$model))
checkresiduals(koyck_ffd_humid_dlm$model)

vif_koyck_ffd_humid_dlm = vif(koyck_ffd_humid_dlm$model)
vif_koyck_ffd_humid_dlm

### Fitting Autoregressive Distributed Lag Model
for (i in 1:5){
  for(j in 1:5){
    ardlm_rainhumid = ardlDlm(formula = y~x2+x4,
                              data = data.frame(ffd_import),
                              p = i, 
                              q = j)
    cat("p =", i,
        "q =", j,
        "AIC =", AIC(ardlm_rainhumid$model), 
        "BIC =", BIC(ardlm_rainhumid$model), 
        "MASE =", MASE(ardlm_rainhumid)$MASE, "\n")
  }
}

ardl_5.3_task2 = ardlDlm(x = as.vector(ffd_rain_ts) + 
                             as.vector(ffd_humid_ts), 
                         y = as.vector(ffd_ffd_ts),
                         p = 5, 
                         q = 3)
summary(ardl_5.3_task2)

checkresiduals(ardl_5.3_task2$model)
shapiro.test(residuals(ardl_5.3_task2$model))

VIF.ardlm_rain_humid = vif(ardl_5.3_task2$model)
VIF.ardlm_rain_humid

ardl_5.4_task2 = ardlDlm(x = as.vector(ffd_rain_ts) + 
                             as.vector(ffd_humid_ts), 
                         y = as.vector(ffd_ffd_ts),
                         p = 5, 
                         q = 4)
summary(ardl_5.4_task2)

checkresiduals(ardl_5.4_task2$model)
shapiro.test(residuals(ardl_5.4_task2$model))

VIF.ardlm_rain_humid = vif(ardl_5.4_task2$model)
VIF.ardlm_rain_humid

ardl_5.5_task2 = ardlDlm(x = as.vector(ffd_rain_ts) + 
                             as.vector(ffd_humid_ts), 
                         y = as.vector(ffd_ffd_ts),
                         p = 5, 
                         q = 5)
summary(ardl_5.5_task2)

checkresiduals(ardl_5.5_task2$model)
shapiro.test(residuals(ardl_5.5_task2$model))

VIF.ardlm_rain_humid = vif(ardl_5.5_task2$model)
VIF.ardlm_rain_humid

### Time Series Regression Models Table Summary
attr(koyck_ffd_temp_dlm$model,"class") = "lm"
attr(koyck_ffd_rain_dlm$model,"class") = "lm"
attr(koyck_ffd_rad_dlm$model,"class") = "lm"
attr(koyck_ffd_humid_dlm$model,"class") = "lm"

models <- c("Finite DLM /w Intercept (Temperature)",
            "Finite DLM /w Intercept (Rainfall)",
            "Finite DLM /w Intercept (Radiation)",
            "Finite DLM /w Intercept (Humidity)",
            "Finite DLM /wo Intercept (Temperature)",
            "Finite DLM /wo Intercept (Rainfall)",
            "Finite DLM /wo Intercept (Radiation)",
            "Finite DLM /wo Intercept (Humidity)",
            "Poly DLM k=2 (Temperature)", 
            "Poly DLM k=2 (Rainfall)", 
            "Poly DLM k=2 (Radiation)", 
            "Poly DLM k=2 (Humidity)", 
            "Poly DLM k=1 (Temperature)", 
            "Poly DLM k=1 (Rainfall)", 
            "Poly DLM k=1 (Radiation)", 
            "Poly DLM k=1 (Humidity)", 
            "Koyck (Temperature)", 
            "Koyck (Rainfall)",
            "Koyck (Radiation)",
            "Koyck (Humidity)",
            "ARDL(5,3)", 
            "ARDL(5,4)", 
            "ARDL(5,5)")
aic_score <- AIC(fin_dlm_temp$model,
                 fin_dlm_rain$model,
                 fin_dlm_rad$model,
                 fin_dlm_humid$model,
                 fin_dlm_temp_noint$model,
                 fin_dlm_rain_noint$model,
                 fin_dlm_rad_noint$model,
                 fin_dlm_humid_noint$model,
                 poly_dlm_ffd_temp_k2$model,
                 poly_dlm_ffd_rain_k2$model,
                 poly_dlm_ffd_rad_k2$model,
                 poly_dlm_ffd_humid_k2$model,
                 poly_dlm_ffd_temp_k1$model,
                 poly_dlm_ffd_rain_k1$model,
                 poly_dlm_ffd_rad_k1$model,
                 poly_dlm_ffd_humid_k1$model,
                 koyck_ffd_temp_dlm$model, 
                 koyck_ffd_rain_dlm$model,
                 koyck_ffd_rad_dlm$model,
                 koyck_ffd_humid_dlm$model,
                 ardl_5.3_task2$model, 
                 ardl_5.4_task2$model, 
                 ardl_5.5_task2$model)$AIC
bic_score <- BIC(fin_dlm_temp$model,
                 fin_dlm_rain$model,
                 fin_dlm_rad$model,
                 fin_dlm_humid$model,
                 fin_dlm_temp_noint$model,
                 fin_dlm_rain_noint$model,
                 fin_dlm_rad_noint$model,
                 fin_dlm_humid_noint$model,
                 poly_dlm_ffd_temp_k2$model,
                 poly_dlm_ffd_rain_k2$model,
                 poly_dlm_ffd_rad_k2$model,
                 poly_dlm_ffd_humid_k2$model,
                 poly_dlm_ffd_temp_k1$model,
                 poly_dlm_ffd_rain_k1$model,
                 poly_dlm_ffd_rad_k1$model,
                 poly_dlm_ffd_humid_k1$model,
                 koyck_ffd_temp_dlm$model, 
                 koyck_ffd_rain_dlm$model,
                 koyck_ffd_rad_dlm$model,
                 koyck_ffd_humid_dlm$model,
                 ardl_5.3_task2$model, 
                 ardl_5.4_task2$model, 
                 ardl_5.5_task2$model)$BIC
MASE_score <- MASE(fin_dlm_temp$model,
                 fin_dlm_rain$model,
                 fin_dlm_rad$model,
                 fin_dlm_humid$model,
                 fin_dlm_temp_noint$model,
                 fin_dlm_rain_noint$model,
                 fin_dlm_rad_noint$model,
                 fin_dlm_humid_noint$model,
                 poly_dlm_ffd_temp_k2$model,
                 poly_dlm_ffd_rain_k2$model,
                 poly_dlm_ffd_rad_k2$model,
                 poly_dlm_ffd_humid_k2$model,
                 poly_dlm_ffd_temp_k1$model,
                 poly_dlm_ffd_rain_k1$model,
                 poly_dlm_ffd_rad_k1$model,
                 poly_dlm_ffd_humid_k1$model,
                 koyck_ffd_temp_dlm$model, 
                 koyck_ffd_rain_dlm$model,
                 koyck_ffd_rad_dlm$model,
                 koyck_ffd_humid_dlm$model,
                 ardl_5.3_task2, 
                 ardl_5.4_task2, 
                 ardl_5.5_task2)$MASE

accuracy_table_task2 <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table_task2) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table_task2 <- arrange(accuracy_table_task2, MASE)
kable(accuracy_table_task2, caption = "Accuracy Score Table")


# Task 3a
### Data Exploration and Visualization
# Data Import and Cleaning
rbo_import<- read_csv("RBO  .csv")

# Time Series Conversion
rbo_ts <- ts(rbo_import, start = c(1984), frequency = 1)
rbo_temp_ts <- ts(rbo_import$Temperature, start = c(1984), frequency = 1)
rbo_rain_ts <- ts(rbo_import$Rainfall, start = c(1984), frequency = 1)
rbo_rad_ts <- ts(rbo_import$Radiation, start = c(1984), frequency = 1)
rbo_humid_ts <- ts(rbo_import$RelHumidity, start = c(1984), frequency = 1)
rbo_rbo_ts <- ts(rbo_import$RBO, start = c(1984), frequency = 1)

# Plotting of Time Series(s)
plot(rbo_ts, main="Time series plots of Rank-based Order similarity metric and factors", type="o")

### RBO Time Series Analysis
plot(rbo_rbo_ts, 
     xlab = "Time", 
     ylab = "Rank-based Order similarity metric (RBO)", 
     type = "o",
     main = "Rank-based Order similarity metric (RBO) Time Series Plot")

### RBO ADF & ACF
adf.test(rbo_rbo_ts)
acf(rbo_rbo_ts, 
    lag.max = 48, 
    main = "ACF of Rank-based Order similarity metric (RBO)")

## Time Series Regression Model Fitting
### Correlation Coefficient
cor(rbo_ts)

### Fitting Finite Distributed Lag Model with Intercepts
#### Finite DLM Temperature
colnames(rbo_import)<-c("x1","y","x2","x3","x4","x5")
for (i in 1:12){
  fin_model_temp_rbo <- dlm(formula = y~x2,
                            data = data.frame(rbo_import),
                            q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_temp_rbo$model), 
      "BIC =", BIC(fin_model_temp_rbo$model), 
      "MASE =", MASE(fin_model_temp_rbo)$MASE, "\n")
}

fin_dlm_temp_rbo <- dlm(formula = y~x2,
                        data = data.frame(rbo_import),
                        q = 12)
summary(fin_dlm_temp_rbo)

shapiro.test(residuals(fin_dlm_temp_rbo$model))
#checkresiduals(fin_dlm_temp_rbo$model)

vif_fin_dlm_temp_rbo = vif(fin_dlm_temp_rbo$model)
vif_fin_dlm_temp_rbo

#### Finite DLM Rainfall
for (i in 1:12){
  fin_model_rain_rbo <- dlm(formula = y~x3,
                            data = data.frame(rbo_import),
                            q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_rain_rbo$model), 
      "BIC =", BIC(fin_model_rain_rbo$model), 
      "MASE =", MASE(fin_model_rain_rbo)$MASE, "\n")
}

fin_dlm_rain_rbo <- dlm(formula = y~x3,
                    data = data.frame(rbo_import),
                    q = 12)
summary(fin_dlm_rain_rbo)

shapiro.test(residuals(fin_dlm_rain_rbo$model))
#checkresiduals(fin_dlm_rain_rbo$model)

vif_fin_dlm_rain_rbo = vif(fin_dlm_rain_rbo$model)
vif_fin_dlm_rain_rbo

#### Finite DLM Radiation
for (i in 1:12){
  fin_model_rad_rbo <- dlm(formula = y~x4,
                           data = data.frame(rbo_import),
                           q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_rad_rbo$model), 
      "BIC =", BIC(fin_model_rad_rbo$model), 
      "MASE =", MASE(fin_model_rad_rbo)$MASE, "\n")
}

fin_dlm_rad_rbo <- dlm(formula = y~x4,
                       data = data.frame(rbo_import),
                       q = 11)
summary(fin_dlm_rad_rbo)

shapiro.test(residuals(fin_dlm_rad_rbo$model))
#checkresiduals(fin_dlm_rad_rbo$model)

vif_fin_dlm_rad_rbo = vif(fin_dlm_rad_rbo$model)
vif_fin_dlm_rad_rbo

#### Finite DLM Relative Humidity
for (i in 1:12){
  fin_model_humid_rbo <- dlm(formula = y~x5,
                             data = data.frame(rbo_import),
                             q = i)
  cat("q =", i, 
      "AIC =", AIC(fin_model_humid_rbo$model), 
      "BIC =", BIC(fin_model_humid_rbo$model), 
      "MASE =", MASE(fin_model_humid_rbo)$MASE, "\n")
}

fin_dlm_humid_rbo <- dlm(formula = y~x5,
                         data = data.frame(rbo_import),
                         q = 12)
summary(fin_dlm_humid_rbo)

shapiro.test(residuals(fin_dlm_humid_rbo$model))
#checkresiduals(fin_dlm_humid_rbo$model)

vif_fin_dlm_humid_rbo = vif(fin_dlm_humid_rbo$model)
vif_fin_dlm_humid_rbo

### Fitting Polynomial Distributed Lag Model
### RBO and Temperature
for ( i in 1:12){
    poly_dlm_rbo_temp_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                   x = as.vector(rbo_temp_ts), 
                                   q = i, 
                                   k = 2, 
                                   show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_rbo_temp_k2$model), 
        "BIC =", BIC(poly_dlm_rbo_temp_k2$model), 
        "MASE =", MASE(poly_dlm_rbo_temp_k2)$MASE, "\n")
}

poly_dlm_rbo_temp_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                               x = as.vector(rbo_temp_ts), 
                               q = 11, 
                               k = 2)
summary(poly_dlm_rbo_temp_k2)

shapiro.test(residuals(poly_dlm_rbo_temp_k2$model))
checkresiduals(poly_dlm_rbo_temp_k2$model)

poly_dlm_rbo_temp_k1 = polyDlm(y = as.vector(rbo_rbo_ts), 
                               x = as.vector(rbo_temp_ts), 
                               q = 11, 
                               k = 1)
summary(poly_dlm_rbo_temp_k1)

shapiro.test(residuals(poly_dlm_rbo_temp_k1$model))
checkresiduals(poly_dlm_rbo_temp_k1$model)

### RBO and Rainfall
for ( i in 1:12){
    poly_dlm_rbo_rain_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                   x = as.vector(rbo_rain_ts), 
                                   q = i, 
                                   k = 2, 
                                   show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_rbo_rain_k2$model), 
        "BIC =", BIC(poly_dlm_rbo_rain_k2$model), 
        "MASE =", MASE(poly_dlm_rbo_rain_k2)$MASE, "\n")
}

poly_dlm_rbo_rain_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                               x = as.vector(rbo_rain_ts), 
                               q = 11, 
                               k = 2)
summary(poly_dlm_rbo_rain_k2)

shapiro.test(residuals(poly_dlm_rbo_rain_k2$model))
checkresiduals(poly_dlm_rbo_rain_k2$model)

poly_dlm_rbo_rain_k1 = polyDlm(y = as.vector(rbo_rbo_ts), 
                               x = as.vector(rbo_rain_ts), 
                               q = 11, 
                               k = 1)
summary(poly_dlm_rbo_rain_k1)

shapiro.test(residuals(poly_dlm_rbo_rain_k1$model))
checkresiduals(poly_dlm_rbo_rain_k1$model)

### RBO and Radiation
for ( i in 1:12){
    poly_dlm_rbo_rad_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                  x = as.vector(rbo_rad_ts), 
                                  q = i, 
                                  k = 2, 
                                  show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_rbo_rad_k2$model), 
        "BIC =", BIC(poly_dlm_rbo_rad_k2$model), 
        "MASE =", MASE(poly_dlm_rbo_rad_k2)$MASE, "\n")
}

poly_dlm_rbo_rad_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                              x = as.vector(rbo_rad_ts), 
                              q = 11, 
                              k = 2)
summary(poly_dlm_rbo_rad_k2)
shapiro.test(residuals(poly_dlm_rbo_rad_k2$model))
checkresiduals(poly_dlm_rbo_rad_k2$model)

poly_dlm_rbo_rad_k1 = polyDlm(y = as.vector(rbo_rbo_ts), 
                              x = as.vector(rbo_rad_ts), 
                              q = 11, 
                              k = 1)
summary(poly_dlm_rbo_rad_k1)

shapiro.test(residuals(poly_dlm_rbo_rad_k1$model))
checkresiduals(poly_dlm_rbo_rad_k1$model)

### RBO and Humidity
for ( i in 1:12){
    poly_dlm_rbo_humid_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                    x = as.vector(rbo_humid_ts), 
                                    q = i, 
                                    k = 2, 
                                    show.beta = FALSE)
    cat("q =", i, 
        "AIC =", AIC(poly_dlm_rbo_humid_k2$model), 
        "BIC =", BIC(poly_dlm_rbo_humid_k2$model), 
        "MASE =", MASE(poly_dlm_rbo_humid_k2)$MASE, "\n")
}

poly_dlm_rbo_humid_k2 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                x = as.vector(rbo_humid_ts), 
                                q = 12, 
                                k = 2)
summary(poly_dlm_rbo_humid_k2)

shapiro.test(residuals(poly_dlm_rbo_humid_k2$model))
checkresiduals(poly_dlm_rbo_humid_k2$model)

poly_dlm_rbo_humid_k1 = polyDlm(y = as.vector(rbo_rbo_ts), 
                                x = as.vector(rbo_humid_ts), 
                                q = 12, 
                                k = 1)
summary(poly_dlm_rbo_humid_k1)

shapiro.test(residuals(poly_dlm_rbo_humid_k1$model))
checkresiduals(poly_dlm_rbo_humid_k1$model)

### Fitting Koyck Distributed Lag Model
### RBO and Temperature
koyck_rbo_temp_dlm = koyckDlm(y = as.vector(rbo_rbo_ts), 
                              x = as.vector(rbo_temp_ts))
summary(koyck_rbo_temp_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_ffd_temp_dlm$model))
#checkresiduals(koyck_ffd_temp_dlm$model)

vif_koyck_rbo_temp_dlm = vif(koyck_rbo_temp_dlm$model)
vif_koyck_rbo_temp_dlm


### RBO and Rainfall
koyck_rbo_rain_dlm = koyckDlm(y = as.vector(rbo_rbo_ts), 
                              x = as.vector(rbo_rain_ts))
summary(koyck_rbo_temp_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_rbo_rain_dlm$model))
#checkresiduals(koyck_rbo_rain_dlm$model)

vif_koyck_rbo_rain_dlm = vif(koyck_rbo_rain_dlm$model)
vif_koyck_rbo_rain_dlm

### RBO and Radiation
koyck_rbo_rad_dlm = koyckDlm(y = as.vector(rbo_rbo_ts), 
                             x = as.vector(rbo_rad_ts))
summary(koyck_rbo_rad_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_rbo_rad_dlm$model))
checkresiduals(koyck_rbo_rad_dlm$model)

vif_koyck_rbo_rad_dlm = vif(koyck_rbo_rad_dlm$model)
vif_koyck_rbo_rad_dlm

### RBO and Humidity
koyck_rbo_humid_dlm = koyckDlm(y = as.vector(rbo_rbo_ts), 
                               x = as.vector(rbo_humid_ts))
summary(koyck_rbo_humid_dlm$model, diagnostics = TRUE)

shapiro.test(residuals(koyck_rbo_humid_dlm$model))
checkresiduals(koyck_rbo_humid_dlm$model)

vif_koyck_rbo_humid_dlm = vif(koyck_rbo_humid_dlm$model)
vif_koyck_rbo_humid_dlm

### Fitting Autoregressive Distributed Lag Model
for (i in 1:5){
  for(j in 1:5){
    ardlm_radhumid = ardlDlm(formula = y~x4+x5,
                             data = data.frame(rbo_import),
                             p = i, 
                             q = j)
    cat("p =", i,
        "q =", j,
        "AIC =", AIC(ardlm_radhumid$model), 
        "BIC =", BIC(ardlm_radhumid$model), 
        "MASE =", MASE(ardlm_radhumid)$MASE, "\n")
  }
}

ardl_5.2_task3 = ardlDlm(x = as.vector(rbo_rad_ts) + 
                             as.vector(rbo_humid_ts), 
                         y = as.vector(rbo_rbo_ts),
                         p = 5, 
                         q = 2)
summary(ardl_5.2_task3)

checkresiduals(ardl_5.2_task3$model)
shapiro.test(residuals(ardl_5.2_task3$model))

VIF.ardlm_rad_humid = vif(ardl_5.2_task3$model)
VIF.ardlm_rad_humid

ardl_5.4_task3 = ardlDlm(x = as.vector(rbo_rad_ts) + 
                             as.vector(rbo_humid_ts), 
                         y = as.vector(rbo_rbo_ts),
                         p = 5, 
                         q = 4)
summary(ardl_5.4_task3)

checkresiduals(ardl_5.4_task3$model)
shapiro.test(residuals(ardl_5.4_task3$model))

VIF.ardlm_rad_humid = vif(ardl_5.4_task3$model)
VIF.ardlm_rad_humid

ardl_5.5_task3 = ardlDlm(x = as.vector(rbo_rad_ts) + 
                             as.vector(rbo_humid_ts), 
                         y = as.vector(rbo_rbo_ts),
                         p = 5, 
                         q = 5)
summary(ardl_5.5_task3)

#checkresiduals(ardl_5.5_task3$model)
#shapiro.test(residuals(ardl_5.5_task3$model))

VIF.ardlm_rain_humid = vif(ardl_5.5_task3$model)
VIF.ardlm_rain_humid

### Time Series Regression Models Table Summary
attr(koyck_rbo_temp_dlm$model,"class") = "lm"
attr(koyck_rbo_rain_dlm$model,"class") = "lm"
attr(koyck_rbo_rad_dlm$model,"class") = "lm"
attr(koyck_rbo_humid_dlm$model,"class") = "lm"

models <- c("Finite DLM (Temperature)",
            "Finite DLM (Rainfall)",
            "Finite DLM (Radiation)",
            "Finite DLM (Humidity)",
            "Poly DLM k=2 (Temperature)", 
            "Poly DLM k=2 (Rainfall)", 
            "Poly DLM k=2 (Radiation)", 
            "Poly DLM k=2 (Humidity)", 
            "Poly DLM k=1 (Temperature)", 
            "Poly DLM k=1 (Rainfall)", 
            "Poly DLM k=1 (Radiation)", 
            "Poly DLM k=1 (Humidity)", 
            "Koyck (Temperature)", 
            "Koyck (Rainfall)",
            "Koyck (Radiation)",
            "Koyck (Humidity)",
            "ARDL(5,2)", 
            "ARDL(5,4)", 
            "ARDL(5,5)")
aic_score <- AIC(fin_dlm_temp_rbo$model,
                 fin_dlm_rain_rbo$model,
                 fin_dlm_rad_rbo$model,
                 fin_dlm_humid_rbo$model,
                 poly_dlm_rbo_temp_k2$model,
                 poly_dlm_rbo_rain_k2$model,
                 poly_dlm_rbo_rad_k2$model,
                 poly_dlm_rbo_humid_k2$model,
                 poly_dlm_rbo_temp_k1$model,
                 poly_dlm_rbo_rain_k1$model,
                 poly_dlm_rbo_rad_k1$model,
                 poly_dlm_rbo_humid_k1$model,
                 koyck_rbo_temp_dlm$model, 
                 koyck_rbo_rain_dlm$model,
                 koyck_rbo_rad_dlm$model,
                 koyck_rbo_humid_dlm$model,
                 ardl_5.2_task3$model, 
                 ardl_5.4_task3$model, 
                 ardl_5.5_task3$model)$AIC
bic_score <- BIC(fin_dlm_temp_rbo$model,
                 fin_dlm_rain_rbo$model,
                 fin_dlm_rad_rbo$model,
                 fin_dlm_humid_rbo$model,
                 poly_dlm_rbo_temp_k2$model,
                 poly_dlm_rbo_rain_k2$model,
                 poly_dlm_rbo_rad_k2$model,
                 poly_dlm_rbo_humid_k2$model,
                 poly_dlm_rbo_temp_k1$model,
                 poly_dlm_rbo_rain_k1$model,
                 poly_dlm_rbo_rad_k1$model,
                 poly_dlm_rbo_humid_k1$model,
                 koyck_rbo_temp_dlm$model, 
                 koyck_rbo_rain_dlm$model,
                 koyck_rbo_rad_dlm$model,
                 koyck_rbo_humid_dlm$model,
                 ardl_5.2_task3$model, 
                 ardl_5.4_task3$model, 
                 ardl_5.5_task3$model)$BIC
MASE_score <- MASE(fin_dlm_temp_rbo$model,
                 fin_dlm_rain_rbo$model,
                 fin_dlm_rad_rbo$model,
                 fin_dlm_humid_rbo$model,
                 poly_dlm_rbo_temp_k2$model,
                 poly_dlm_rbo_rain_k2$model,
                 poly_dlm_rbo_rad_k2$model,
                 poly_dlm_rbo_humid_k2$model,
                 poly_dlm_rbo_temp_k1$model,
                 poly_dlm_rbo_rain_k1$model,
                 poly_dlm_rbo_rad_k1$model,
                 poly_dlm_rbo_humid_k1$model,
                 koyck_rbo_temp_dlm$model, 
                 koyck_rbo_rain_dlm$model,
                 koyck_rbo_rad_dlm$model,
                 koyck_rbo_humid_dlm$model,
                 ardl_5.2_task3, 
                 ardl_5.4_task3,
                 ardl_5.5_task3)$MASE

accuracy_table_task3 <- data.frame(models, aic_score, bic_score, MASE_score)
colnames(accuracy_table_task3) <- c("Model", "AIC", "BIC", "MASE")
accuracy_table_task3 <- arrange(accuracy_table_task3, MASE)
kable(accuracy_table_task3, caption = "Accuracy Score Table")

Final Project - Forecasting

Hon Meng Lee

10/4/2021

Assignment Overview

Dataset Description

Task 1

Objectives

Methodology

Data Exploration and Visualization

Mortality Time Series Analysis

Mortality ADF & ACF

Temperature Time Series Analysis

Temperature ADF & ACF

Chemical 1 Time Series Analysis

Chemical 1 ADF & ACF

Chemical 2 Time Series Analysis

Chemical 2 ADF & ACF

Particle Size Time Series Analysis

Particle Size ADF & ACF

Impact of Trend and Seasonality Decomposition

Time Series Regression Model Fitting

Correlation Coefficient

Fitting Finite Distributed Lag Model

Finite DLM

Fitting Polynomial Distributed Lag Model

Fitting Koyck Distributed Lag Model

Fitting Autoregressive Distributed Lag Model

Fitting Autoregressive Distributed Lag Models with three predictors

Time Series Regression Models Table Summary

Fitting Exponential Smoothing Models and State-Space Models

Forecasting

Task 1 Conclusion

Task 2

Objectives

Methodology

Data Exploration and Visualization

Temperature Time Series Analysis

Temperature ADF & ACF

Rainfall Time Series Analysis

Rainfall ADF & ACF

Radiation Time Series Analysis

Radiation ADF & ACF

Relative Humidity Time Series Analysis

Relative Humidity ADF & ACF

First Flowering Day Time Series Analysis

First Flowering Day ADF & ACF

Time Series Regression Model Fitting

Correlation Coefficient

Fitting Finite Distributed Lag Model with Intercepts

Finite DLM Temperature

Finite DLM Rainfall

Finite DLM Radiation

Finite DLM Relative Humidity

Fitting Finite Distributed Lag Model without Intercepts

Finite DLM Temperature (no Intercept)

Finite DLM Rainfall (no Intercept)

Finite DLM Radiation (no Intercept)

Finite DLM Relative Humidity (no Intercept)

Fitting Polynomial Distributed Lag Model

FFD and Temperature

FFD and Rainfall

FFD and Radiation

FFD and Humidity

Fitting Koyck Distributed Lag Model

FFD and Temperature

FFD and Rainfall

FFD and Radiation

FFD and Humidity

Fitting Autoregressive Distributed Lag Model

Time Series Regression Models Table Summary

Fitting Exponential Smoothing Models and State-Space Models

Forecasting

Task 2 Conclusion

Task 3

Objectives

Methodology

Task 3a

Data Exploration and Visualization

RBO Time Series Analysis

RBO ADF & ACF