Trend and survival analysis of extreme precipitation

Numerous studies have predicted climate change-induced shifts in precipitation patterns across the Northeastern United States (US EPA 2016; US GCRP 2014). Both the volume and intensity of precipitation are projected to increase. To investigate this claim, climate data from U.S. Historical Climatology Network (USHCN) station 431081 at Burlington, VT were obtained and analyzed. The period of record (POR) is 1941 through 2016.

To assess trends in precipitation volume, monthly and annual precipitation totals were analyzed over the POR. Trends in precipitation intensity were investigated using the annual count of 24hr events with >= 1 inch precipitation. NB: the term ‘precipitation’ is used to denote liquid equivalent precipitation totals.

Data Exploration

First, we look at the data.

Figure 1. Density histogram of measurable 24hr precipitation totals. Data from USHCN Station 431081 at Burlington, VT.

As with most precipitation data, the observed distribution exhibits a strong positive skew; most 24hr precipitation events are relatively small, a few are large (Figure 1; Table 1). For instance, the 75th percentile measureable precipitation value is almost 3 times the median value.

Table 1. Summary statistics for measurable (>trace) 24hr precipitation events. Units are inches.

Min	25%	Median	Mean	75%	Max
0.010	0.030	0.110	0.228	0.300	3.990

An empirical cumulative density function (ECDF) plot of all 24hr observations, including days with zero precipitation, further illuminates this trend (Figure 2). An ECDF calculates the probability that an observation will take a value less than or equal to a given threshold. For example, the probability that a 24hr precipitation value will be equal to or less than 1 inch is 98.2%. Fifty percent of all 24hr observations are less than or equal to 0.01 inch.

Figure 2. Empirical cumulative density function for 24hr precipitation, all days. Events >= 1 inch are in the 98th percentile of all observations (noted with circle). Data from USHCN Station 431081 at Burlington, VT.

The 1 inch precipitation threshold will serve as our definition of ‘extreme’ precipitation; the probability of a 24hr event of this magnitude occuring is less than 2%.

Precipitation Volume Trend Analysis

To assess trends in precipitation volume, we look for evidence that the distribution of 24hr precipitation values has shifted over time; specifically, whether precipitation totals have increased over the station’s POR. We will examine two temporal resolutions: monthly and annual precipitation totals.

A panel plot of monthly precipitation totals suggests that most months have experienced a general increasing trend (Figure 3). A seasonal Kendall trend test for this series is statistically significant, indicating a trend in each season. However, the seasonal estimate for November indicates a slight negative trend (slope = -4.55e-03).

Figure 3. Total precipitation by month. Linear regression with 95% confidence interval shown by month. Seasonal Kendall trend test p-value = 0.0196; test uses extension proposed by Hirsch and Slack (1984) that allows for serial dependence in the observations. Even though several months exhibit an apparent flat to negative trend, a Van Belle-Hughes heterogeneity test for uniform trend direction does not provide evidence for the rejection of the null hypothesis that seasonal trends are monotonic (p-value = 0.813). Data from USHCN Station 431081 at Burlington, VT.

Because the seasonal trend direction may not be uniform, annual precipitation totals should also be investigated (Figure 4). Annual values also generally reduce or eliminate any serial autocorrelation present in climate data.

Figure 4. Annual total precipitation with trendline from Kendall trend test. The test is statistically significant with a p-value of 0.00138. The trendline slope over ther POR is 6.89 inches. Series autocorrelation is negligible at -0.0266. Data from USHCN Station 431081 at Burlington, VT.

The annual precipitation series demonstrates a clear increasing trend, with the trendline slope estimating an average increase of almost 7 additional inches of precipitation per year since 1941!

Precipitation Intensity Trend Analysis

To investigate trends in precipitation intensity, we examine the number of 24hr precipitation events with totals greater than or equal to 1 inch. As shown above, this threshold is in the 98th percentile of all 24hr periods (e.g., 98% of the time, 24hr precipitation is less than 1 inch).

Figure 5. Annual count of 24hr periods with precipitation totals >= 1 inch. Poisson regression line displayed; p-value is 0.000165. The trend over the POR is 3.33 days. The dispersion parameter of the fitted model is 1.31. The p-value of a quasipoisson model is 0.00132. Autocorrelation is -0.0425.

First, we calculate the annual count of 24hr extreme precipitation events. A Poisson regression fit to this data series is statistically significant (Figure 5). The regression line indicates an average increase of over 3 days of >= 1 inch 24hr precipitation events per year over the period of record. Using observed data, the 2007-2016 median count of extreme events is 60% higher than the 1941-1950 median value.

Survival Analysis

The risk of extreme precipitation can also be characterized using survival analysis. Survival analysis is used to calculate a survivor function that estimates the probability, or risk, of an outcome, typically over time. If meaningful groups (called strata) can be identified in the dataset, the risk of a given outcome can be compared across group membership.

The outcome in this example is the annual risk of a >=1 inch precipitation event. Two time periods are identified: pre and post-1990. These time periods make up the strata used in the analysis. The pre-1990 strata includes 49 years of data; the post-1990 strata includes 27 years of data.

Because there are no unknown or censored observations in the dataset, the survivor function for a given outcome can be calculated as the proportion of events greater than that outcome. This simplifies to a comparison of two discrete value ECDFs, one for each strata.

Figure 6. Survival analysis of extreme 24hr precipitation for two time periods: pre-1990 and 1990-present. The figure displays the survival function probability (or risk) of experiencing a year with a given number of 24hr precipitation events >= 1 inch. Binomial proportion 95% confidence intervals were estimated using the Wilson score interval approach.

A plot comparing survivor functions for each strata indicates a significant difference in outcome probabilities (Figure 6). For example, the risk of a year with more than 9 >=1 inch precipitation events was 2% before 1990, and 22.2% post-1990 (a greater than 1100% increase!) (Figure 6; Table 2).

Table 2. Probability of exceeding selected annual extreme precipitation thresholds by strata.

Extreme events per year	Strata 1 Risk, pre-1990	Strata 2 Risk, post-1990
>7	14.2%	44%
>9	2.0%	22.2%
>11	0%	7.4%

The statistical significance of observed differences between strata can be assessed by analyzing the proportion of extreme events in a 2 x 2 contingency table. These proportions are first visualized in a mosaic plot (Figure 7). Although the overall proportion of extreme events is small, the post-1990 strata has a visibily higher relative frequency of >=1 inch precipitation events despite having a significantly smaller sample size (27 versus 49 observation years).

Figure 7. Mosaic plot of extreme precipitation event proportions by time period strata. The p-value of a 2-sample test for equality of proportions with continuity correction is 0.0001183, using the alternative hypothesis that the post-1990 strata is greater than the pre-1990 strata.

A 2-sample test for equality of proportions is used to test the alternative hypothesis that the proportion of extreme events in the post-1990 time period is greater than the pre-1990 period. The p-value of this test is 0.0001183, which leads us to accept the alternative hypothesis that the proportion of extreme events is greater in the post-1990 strata (the Bonferroni corrected p-value for multiple comparisons is 0.0089908).

Based on these analyses, there is strong evidence that both the volume and intensity of precipitation have increased at this climate station.

Note: as an alternative to the custom plot in Figure 6, the {survminer} package provides automated functions to display both survival curves and associated risk tables (Figure 8).

Figure 8. Survival plot and risk table produced with {survminer} package. Log-rank test p-value is 0.0019. Risk table displays absolute count and percent exceedance values by strata. Note: confidence interval and p-value estimation methodologies differ slightly from those used to produce the results in Figure 6.

References

Hirsch, R.M. and J.R. Slack. 1984. A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.

U.S. Environmental Protection Agency. 2016. Climate change indicators in the United States, 2016. Fourth edition. EPA 430-R-16-004. https://www.epa.gov/climate-indicators.

U.S. Global Change Research Program. 2014. Climate change impacts in the United States: U.S. national climate assessment. http://nca2014.globalchange.gov.