The time series that I have chosen to analyze is the total monthly precipitation in Pennsylvania from 1895 to 2025, a span of about 130 years. The data are provided by the National Centers for Environmental Information, which collects them to track and monitor climate and precipitation trends in support of research related to weather forecasting and public safety. Because precipitation is a critical variable in studying the changing climate, long-term trends in precipitation totals affect how agriculture, infrastructure, and water sources function. Knowing these trends helps services plan for droughts, floods, and periods of normal precipitation.
The purpose of analyzing this time series is to observe its trends and behavior and to determine whether there are any clear patterns that can help us forecast future precipitation values. Through modeling this time series, some scientific questions that I want to answer are the following:
Total Monthly Precipitation in Pennsylvania (1895-2025)
Dickey-Fuller Test:
\(H_0\) - The time series is not stationary.
\(H_a\) - The time series is stationary.
##
## Augmented Dickey-Fuller Test
##
## data: Precipitation$Value
## Dickey-Fuller = -10.218, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
KPSS Test:
\(H_0\) - The time series is stationary.
\(H_a\) - The time series is not stationary.
##
## KPSS Test for Level Stationarity
##
## data: Precipitation$Value
## KPSS Level = 0.92124, Truncation lag parameter = 7, p-value = 0.01
Detrended Series
Dickey-Fuller Test:
\(H_0\) - The time series is not stationary.
\(H_a\) - The time series is stationary.
##
## Augmented Dickey-Fuller Test
##
## data: detrended
## Dickey-Fuller = -10.218, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
Test Statistic = -10.218
p-value = 0.01
Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the detrended series is stationary.
KPSS Test:
\(H_0\) - The time series is stationary.
\(H_a\) - The time series is not stationary.
##
## KPSS Test for Level Stationarity
##
## data: detrended
## KPSS Level = 0.14733, Truncation lag parameter = 7, p-value = 0.1
Test Statistic = 0.14733
p-value = 0.1
As mentioned above, the hypotheses of the KPSS test are the reverse of those of the Dickey-Fuller test. Since the p-value is greater than 0.05, we fail to reject the null hypothesis, which is consistent with the detrended series being stationary.
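Because the two tests have opposite null hypotheses, their p-values have to be read in opposite directions before they can be combined. The following is a minimal sketch of that decision logic in plain Python, used here only for illustration. The function name `combine_adf_kpss` and the 0.05 threshold are assumptions for this example, and the p-values passed in are the ones printed in the test output above.

```python
def combine_adf_kpss(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one verdict.

    ADF null hypothesis:  the series is NOT stationary.
    KPSS null hypothesis: the series IS stationary.
    """
    adf_says_stationary = adf_p < alpha      # ADF rejects non-stationarity
    kpss_says_stationary = kpss_p >= alpha   # KPSS fails to reject stationarity
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "tests disagree"


# p-values as printed in the output above
print("original: ", combine_adf_kpss(adf_p=0.01, kpss_p=0.01))
print("detrended:", combine_adf_kpss(adf_p=0.01, kpss_p=0.10))
```

Under these assumptions the two tests disagree for the original series and agree on stationarity for the detrended series, which matches the reasoning for working with the detrended data.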
ACF of Precipitation in PA
PACF of Precipitation in PA
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 1.7666 0.0254 69.5912 0.0000
## ar2 -1.0594 0.0440 -24.0883 0.0000
## ar3 0.0345 0.0254 1.3585 0.1745
## ma1 -1.7355 NaN NaN NaN
## ma2 0.9993 NaN NaN NaN
## xmean -0.0059 0.0336 -0.1758 0.8605
##
## sigma^2 estimated as 1.691657 on 1556 degrees of freedom
##
## AIC = 3.377028 AICc = 3.377063 BIC = 3.401021
##
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.0351 0.0254 1.3809 0.1675
## ar2 0.0241 0.0254 0.9473 0.3436
## sar1 -0.0031 0.0263 -0.1167 0.9071
## sma1 -0.9857 0.0110 -89.9844 0.0000
## constant 0.0000 0.0001 0.0390 0.9689
##
## sigma^2 estimated as 1.695389 on 1545 degrees of freedom
##
## AIC = 3.400984 AICc = 3.401009 BIC = 3.421678
##
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.0406 0.0254 1.5968 0.1105
## ar2 0.0267 0.0253 1.0536 0.2922
## sar1 0.9991 0.0012 854.3352 0.0000
## sma1 -0.9856 0.0096 -102.4054 0.0000
## xmean 0.0092 0.1413 0.0652 0.9481
##
## sigma^2 estimated as 1.694474 on 1557 degrees of freedom
##
## AIC = 3.384692 AICc = 3.384717 BIC = 3.405257
##
When comparing these models, the first step is to compare the AIC and BIC values of the three models.
Model 1 AIC: 3.37696 | BIC: 3.400953
Model 2 AIC: 3.400984 | BIC: 3.421678
Model 3 AIC: 3.384692 | BIC: 3.405257
When looking at AIC and BIC values, we are looking for the model with the lowest values. According to the values above, the model with the lowest AIC and BIC is Model 1, the ARMA(3,2), which would suggest that it is the best model. However, this does not mean that it is actually the best model. We now have to look at the residual plots and analyze them.
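The ranking step can be written out explicitly. The sketch below is plain Python for illustration only. The dictionary name `criteria` is an assumption, and the numbers are copied from the comparison listed above.

```python
# (AIC, BIC) pairs copied from the comparison above
criteria = {
    "Model 1": (3.37696, 3.400953),
    "Model 2": (3.400984, 3.421678),
    "Model 3": (3.384692, 3.405257),
}

# Lower is better for both criteria
best_by_aic = min(criteria, key=lambda name: criteria[name][0])
best_by_bic = min(criteria, key=lambda name: criteria[name][1])

# How far each model's AIC sits above the best one
aic_gap = {
    name: round(aic - criteria[best_by_aic][0], 6)
    for name, (aic, _bic) in criteria.items()
}

print("lowest AIC:", best_by_aic)
print("lowest BIC:", best_by_bic)
print("AIC gap from best:", aic_gap)
```

The gaps between the models are small, which is one more reason not to pick a model on AIC and BIC alone.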
Model 1:
Model 1 Residuals
The plot of the standardized residuals shows no obvious pattern or trend. In the ACF of the residuals, the autocorrelations at all lags fall between the blue lines, which indicates that the residuals are not individually correlated at those lags. The normal Q-Q plot of the standardized residuals looks normal until the tails, where the points deviate from normality (which is acceptable). Finally, the Ljung-Box p-values appear high except at the first few lags, so not all of the tests are consistent with white-noise residuals. For the Ljung-Box statistic, the null hypothesis is that the residuals are not correlated, and the alternative hypothesis is that they exhibit serial correlation. Since this model has p-values at 0 for the early lags, we reject the null hypothesis there. So although this model looked like a good fit because its AIC and BIC are the lowest of the three, it fails the residual diagnostics and is not the best model.
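For reference, the Ljung-Box statistic behind those p-values is \(Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)\), where \(r_k\) is the lag-\(k\) sample autocorrelation of the residuals. The sketch below computes it in plain Python on a made-up, strongly alternating residual series. The function names and toy data are assumptions for illustration and are not taken from the analysis. For residuals of a fitted ARMA(p, q) model, the chi-square degrees of freedom are usually reduced by p + q, which this sketch does not handle.

```python
def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1 .. max_lag."""
    n = len(x)
    r = autocorrelations(x, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


# Toy residuals (hypothetical): perfectly alternating, so clearly not white noise
toy_residuals = [1.0, -1.0] * 50

CHI2_95_DF1 = 3.841  # 95% critical value of chi-square with 1 degree of freedom
q1 = ljung_box_q(toy_residuals, max_lag=1)
print("Q(1) =", q1, "reject no-correlation null:", q1 > CHI2_95_DF1)
```

A large Q relative to the chi-square critical value corresponds to a small p-value, which is the situation described for the early lags of Model 1.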
Model 2:
Model 2 Residuals
The plot of the standardized residuals shows no obvious pattern or trend. In the ACF of the residuals, the autocorrelations at all lags fall between the blue lines, which indicates that the residuals are not correlated. The normal Q-Q plot of the standardized residuals looks normal until the tails, where the points deviate from normality (which is acceptable). Finally, the Ljung-Box p-values all appear high and above 0, so we fail to reject the null hypothesis of no serial correlation, and the residuals are consistent with white noise. This seasonal model seems to be a good fit. Everything checks out, but I need to check one more model with a slight change.
Model 3:
Model 3 Residuals
Model 1: ARMA(3,2)
\[\left(1-\phi_1 B - \phi_2 B^2 - \phi_3 B^3\right)X_t=\left(1 + \theta_1 B + \theta_2 B^2\right)w_t \]
| Terms | Estimate | SE | p.value |
|---|---|---|---|
| ar1 | 1.7669 | 0.0254 | 0.0000 |
| ar2 | -1.0599 | 0.0440 | 0.0000 |
| ar3 | 0.0348 | 0.0254 | 0.1707 |
| ma1 | -1.7356 | NaN | NaN |
| ma2 | 0.9995 | NaN | NaN |
| xmean | 0.0066 | 0.0336 | 0.8436 |
Model 2: SARIMA(2,0,0)(1,1,1)[12]
\[ \left(1 - \phi_1 B - \phi_2 B^2\right)\left(1 - \Phi_1 B^{12}\right)\left(1 - B^{12}\right)X_t = \left(1 + \Theta_1 B^{12}\right)w_t \]
| Terms | Estimate | SE | p.value |
|---|---|---|---|
| ar1 | 0.0351 | 0.0254 | 0.1675 |
| ar2 | 0.0241 | 0.0254 | 0.3436 |
| sar1 | -0.0031 | 0.0263 | 0.9071 |
| sma1 | -0.9857 | 0.0110 | 0.0000 |
| constant | 0.0000 | 0.0001 | 0.9689 |
Model 3: SARIMA(2,0,0)(1,0,1)[12]
\[ \left(1 - \phi_1 B - \phi_2 B^2\right)\left(1 - \Phi_1 B^{12}\right)X_t = \left(1 + \Theta_1 B^{12}\right)w_t \]
| Terms | Estimate | SE | p.value |
|---|---|---|---|
| ar1 | 0.0406 | 0.0254 | 0.1105 |
| ar2 | 0.0267 | 0.0253 | 0.2922 |
| sar1 | 0.9919 | 0.0012 | 0.0000 |
| sma1 | -0.9856 | 0.0096 | 0.0000 |
| xmean | 0.0092 | 0.1413 | 0.9481 |
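Multiplying out the autoregressive factors of the SARIMA(2,0,0)(1,0,1)[12] equation above gives the recursion that a forecast relies on. This is a sketch that assumes \(X_t\) has already been mean-centered, as the displayed equation does by omitting the mean term:

\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \Phi_1 X_{t-12} - \phi_1\Phi_1 X_{t-13} - \phi_2\Phi_1 X_{t-14} + w_t + \Theta_1 w_{t-12} \]

With \(\phi_1\) and \(\phi_2\) close to 0 in the table above, the lag-12 terms carry most of the fitted structure, while the short-lag terms contribute little.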
Is there a seasonal pattern in precipitation, meaning is there more precipitation in the spring compared to the winter, or in the summer compared to the fall? Yes, there is a seasonal pattern in precipitation, which means that precipitation totals differ systematically between seasons. We know this from the ACF and PACF of the detrended series, which showed strong autocorrelations that suggest a yearly seasonal pattern. The fitted SARIMA model further supports this seasonal pattern.
Based on the patterns found in the historical data, can we forecast future precipitation values and compare them to the actual values to see how accurate the forecasts are? Since the SARIMA(2,0,0)(1,0,1)[12] model passed all of the residual tests, it is suitable for generating forecasts. We can therefore forecast values and compare them to the actual values one year from now.
Based on our analysis, we can use this model to forecast precipitation one year into the future.
Forecasted Precipitation
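Once the actual values for the forecast year are available, the comparison described above can be summarized with standard error measures. The sketch below uses plain Python for illustration. The function names are assumptions, and the numbers are made up to show the calculation, not taken from the forecast.

```python
import math


def mae(actual, forecast):
    """Mean absolute error between observed and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error between observed and forecasted values."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


# Hypothetical monthly totals, NOT results from this analysis
actual = [2.0, 4.0, 7.0]
forecast = [3.0, 4.0, 5.0]

print("MAE :", mae(actual, forecast))
print("RMSE:", rmse(actual, forecast))
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a fuller picture of how the forecasts behave in unusually wet or dry months.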
Future work on this data could add more variables that directly affect precipitation. One example is whether we are in an El Niño or La Niña weather pattern, which could raise or lower total precipitation. Temperature could also play a big role and could be used as a variable. Future work could also examine the seasonal trends more closely to pinpoint the exact pattern of precipitation. As mentioned in the limitations section, the data could be examined further to see whether a more complex model can be fitted once inputs such as weather patterns and temperature are added. With these additions, we could forecast precipitation values more accurately.
This has been a Time Series Analysis of Monthly Precipitation in Pennsylvania from 1895 to present day.