Trend Analysis Summary Plots

Jeffrey D. Walker 2013-01-14

This tutorial explains how to perform parametric and non-parametric trend tests on multiple time series, and plot a summary of the resulting slope estimates.

First, we load four libraries needed to perform the analysis.

library(lubridate)
library(scales)
library(plyr)
library(ggplot2)
library(wq)
theme_set(theme_bw())

Next, we create a dummy set of data for multiple sites and parameters. For each site/parameter pair, we 1) generate random slopes and error variances, and 2) generate a synthetic monthly time series.

start.date <- as.Date("2000-01-01")  # initial date of all timeseries
n.years <- 10  # number of years
initial.value <- 100  # initial value of each time series

dates <- seq.Date(start.date, by = "month", length.out = 12 * n.years)  # array of dates
sites <- factor(c("A", "B", "C"))  # site factors
params <- factor(c("X", "Y", "Z"))  # parameter factors

set.seed(123)  # set seed for reproducibility

# generate random slopes and error variances for each site/parameter pair
ts.slopes <- expand.grid(SITE = sites, PARAM = params)
ts.slopes$SLOPE <- runif(n = nrow(ts.slopes), min = -0.1, max = 0.1)
ts.slopes$VAR <- runif(n = nrow(ts.slopes), min = 0, max = 3)
ts.slopes
##   SITE PARAM     SLOPE    VAR
## 1    A     X -0.042484 1.3698
## 2    B     X  0.057661 2.8705
## 3    C     X -0.018205 1.3600
## 4    A     Y  0.076603 2.0327
## 5    B     Y  0.088093 1.7179
## 6    C     Y -0.090889 0.3088
## 7    A     Z  0.005621 2.6995
## 8    B     Z  0.078484 0.7383
## 9    C     Z  0.010287 0.1262

# generate time series for each site/parameter
df <- expand.grid(SITE = sites, PARAM = params, DATE = dates, VALUE = NA)
for (s in sites) {
    for (p in params) {
        SLOPE <- subset(ts.slopes, SITE == s & PARAM == p)$SLOPE
        VAR <- subset(ts.slopes, SITE == s & PARAM == p)$VAR
        df[which(df$SITE == s & df$PARAM == p), "VALUE"] <- initial.value + 
            seq(0, length(dates) - 1) * SLOPE/12 + rnorm(length(dates), 0, sqrt(VAR))
    }
}

Generate plot of each time series, with linear regression overlay.

ggplot(df, aes(DATE, VALUE)) + geom_line() + geom_smooth(method = "lm") + facet_grid(SITE ~ 
    PARAM, scales = "free_y")

plot of chunk plot.ts

It is generally a good idea to compare different statistical methods to test for trends. We will use two types of tests: 1) Seasonall Kendall Test (non-parametric), and 2) Linear Regression (parametric). For each test, we define a function that takes a single time series (as a subset of the original dataframe, df), and returns a 1-row data frame containing a summary of the slope, p-value, etc.

To explain how each function works, we first extract one time series from the data frame corresponding to Site 'A' and Parameter 'X', which is saved as variable x.

x <- subset(df, SITE == "A" & PARAM == "X")
head(x)
##    SITE PARAM       DATE  VALUE
## 1     A     X 2000-01-01  99.48
## 10    A     X 2000-02-01 101.43
## 19    A     X 2000-03-01 100.41
## 28    A     X 2000-04-01 100.46
## 37    A     X 2000-05-01 100.12
## 46    A     X 2000-06-01  99.33

For the Seasonal Kendall test, we will use the seaKen() function from the wq package. Since this function requires the input timeseries to be of type ts, we must convert x from a dataframe to a ts object. Note that the arguments start and end both take a vector of length 2, with the first element being the initial/final year, and the second being the initial/final month. Since this is a month time series, the frequency is 12.

t <- ts(x$VALUE, start = c(year(start.date), 1), end = c(year(start.date) + 
    n.years - 1, 12), frequency = 12)
t
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2000  99.48 101.43 100.41 100.46 100.12  99.33 102.07 100.56  97.67 100.79
## 2001  99.70  98.75  99.10  99.22  97.97 100.92 100.12  98.60 101.40 100.42
## 2002 100.94 100.87 100.71 100.55  99.83  99.54  99.45  99.08  99.64  98.40
## 2003  98.56  99.40  99.32 100.77  99.76 100.15  99.82  99.80 101.45  99.58
## 2004 100.51  99.97 100.08 100.26  99.23  99.42  98.62  98.55 100.16 100.32
## 2005 102.19  99.21  97.08 100.95  98.94  98.96 100.97  99.43  98.33  99.97
## 2006 100.20  99.31 100.49  99.48 100.12 101.01 100.23  99.34 101.06 100.88
## 2007  98.97 101.29  98.99 102.25 101.48  99.41  98.48  98.85  99.97  99.38
## 2008  99.61  98.74  97.70  99.20 100.72  98.97 100.35  97.74  99.57 100.24
## 2009  98.87  98.62  98.41  99.74  98.49  99.03  99.30 101.75  98.83  99.86
##         Nov    Dec
## 2000  99.41  98.71
## 2001  99.58 100.97
## 2002 102.42 101.29
## 2003 101.61  98.02
## 2004  99.86 100.87
## 2005  99.59  99.76
## 2006 100.35  99.99
## 2007  99.26  98.55
## 2008  99.98  99.74
## 2009  99.67  98.45

We can then apply the function seaKen() to the object t.

sk <- seaKen(t)
sk
## $sen.slope
## [1] -0.09148
## 
## $sen.slope.pct
## [1] -0.09168
## 
## $p.value
## [1] 0.01226
## 
## $miss
##  1  2  3  4  5  6  7  8  9 10 11 12 
##  0  0  0  0  0  0  0  0  0  0  0  0
str(sk)
## List of 4
##  $ sen.slope    : num -0.0915
##  $ sen.slope.pct: num -0.0917
##  $ p.value      : num 0.0123
##  $ miss         : Named num [1:12] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "names")= chr [1:12] "1" "2" "3" "4" ...

The slope, percent slope and associated p-value can thus be extracted from the output of the seaKen() function, and the results saved as a 1-row data frame.

slope <- sk$sen.slope
slope.pct <- sk$sen.slope.pct/100
direction <- ordered(ifelse(slope > 0, "Increasing", "Decreasing"), levels = c("Increasing", 
    "Decreasing"))
pval <- sk$p.value
signif <- cut(pval, breaks = c(0, 0.05, 0.1, 1), labels = c("p<0.05", "0.05<p<0.10", 
    "p>0.10"))
data.frame(METHOD = "SeasonalKendall", SLOPE = slope, SLOPE.PCT = slope.pct, 
    PVAL = pval, SIGNIF = signif, DIRECTION = direction)
##            METHOD    SLOPE  SLOPE.PCT    PVAL SIGNIF  DIRECTION
## 1 SeasonalKendall -0.09148 -0.0009168 0.01226 p<0.05 Decreasing

This process is then saved to a function called trend.sk().

trend.sk <- function(x) {
    t <- ts(x$VALUE, start = c(year(start.date), 1), end = c(year(start.date) + 
        n.years - 1, 12), frequency = 12)
    sk <- seaKen(t)
    slope <- sk$sen.slope
    slope.pct <- sk$sen.slope.pct/100
    direction <- ordered(ifelse(slope > 0, "Increasing", "Decreasing"), levels = c("Increasing", 
        "Decreasing"))
    pval <- sk$p.value
    signif <- cut(pval, breaks = c(0, 0.05, 0.1, 1), labels = c("p<0.05", "0.05<p<0.10", 
        "p>0.10"))
    data.frame(METHOD = "SeasonalKendall", SLOPE = slope, SLOPE.PCT = slope.pct, 
        PVAL = pval, SIGNIF = signif, DIRECTION = direction)
}
trend.sk(x)
##            METHOD    SLOPE  SLOPE.PCT    PVAL SIGNIF  DIRECTION
## 1 SeasonalKendall -0.09148 -0.0009168 0.01226 p<0.05 Decreasing

A similar function can be written that summarizes the trend test using linear reag

trend.lm <- function(x) {
    x$Year <- year(x$DATE) + (month(x$DATE) - 1)/12
    l <- summary(lm(VALUE ~ Year, data = x))
    slope <- l$coefficients[2, "Estimate"]
    slope.pct <- slope/mean(x$VALUE)
    direction <- ordered(ifelse(slope > 0, "Increasing", "Decreasing"), levels = c("Increasing", 
        "Decreasing"))
    pval <- l$coefficients[2, "Pr(>|t|)"]
    signif <- cut(pval, breaks = c(0, 0.05, 0.1, 1), labels = c("p<0.05", "0.05<p<0.10", 
        "p>0.10"))
    data.frame(METHOD = "LinearRegression", SLOPE = slope, SLOPE.PCT = slope.pct, 
        PVAL = pval, SIGNIF = signif, DIRECTION = direction)
}
trend.lm(x)
##             METHOD    SLOPE  SLOPE.PCT    PVAL SIGNIF  DIRECTION
## 1 LinearRegression -0.06524 -0.0006538 0.04922 p<0.05 Decreasing

Batch Test

With the two trend tests defined by trend.sk() and trend.lm(), which both return a 1-row data frame with the same column names, we can apply both tests to all the time series in the original data frame using the ddply() function from the plyr package.

trend.sk.batch <- ddply(df, c("SITE", "PARAM"), trend.sk)
trend.lm.batch <- ddply(df, c("SITE", "PARAM"), trend.lm)
trends <- rbind(trend.sk.batch, trend.lm.batch)
trends
##    SITE PARAM           METHOD    SLOPE  SLOPE.PCT      PVAL      SIGNIF
## 1     A     X  SeasonalKendall -0.09148 -0.0009168 1.226e-02      p<0.05
## 2     A     Y  SeasonalKendall  0.02117  0.0002110 4.540e-01      p>0.10
## 3     A     Z  SeasonalKendall -0.01813 -0.0001809 7.371e-01      p>0.10
## 4     B     X  SeasonalKendall  0.17183  0.0017121 4.887e-03      p<0.05
## 5     B     Y  SeasonalKendall  0.06103  0.0006080 5.281e-02 0.05<p<0.10
## 6     B     Z  SeasonalKendall  0.11096  0.0011065 1.494e-03      p<0.05
## 7     C     X  SeasonalKendall -0.11504 -0.0011500 2.122e-03      p<0.05
## 8     C     Y  SeasonalKendall -0.11936 -0.0011983 4.782e-07      p<0.05
## 9     C     Z  SeasonalKendall  0.01169  0.0001168 3.394e-01      p>0.10
## 10    A     X LinearRegression -0.06524 -0.0006538 4.922e-02      p<0.05
## 11    A     Y LinearRegression  0.06420  0.0006398 1.595e-01      p>0.10
## 12    A     Z LinearRegression -0.02638 -0.0002632 6.158e-01      p>0.10
## 13    B     X LinearRegression  0.16765  0.0016704 2.142e-03      p<0.05
## 14    B     Y LinearRegression  0.06343  0.0006320 1.015e-01      p>0.10
## 15    B     Z LinearRegression  0.08732  0.0008708 1.754e-03      p<0.05
## 16    C     X LinearRegression -0.08264 -0.0008262 3.424e-02      p<0.05
## 17    C     Y LinearRegression -0.11392 -0.0011437 7.480e-09      p<0.05
## 18    C     Z LinearRegression  0.01761  0.0001759 1.286e-01      p>0.10
##     DIRECTION
## 1  Decreasing
## 2  Increasing
## 3  Decreasing
## 4  Increasing
## 5  Increasing
## 6  Increasing
## 7  Decreasing
## 8  Decreasing
## 9  Increasing
## 10 Decreasing
## 11 Increasing
## 12 Decreasing
## 13 Increasing
## 14 Increasing
## 15 Increasing
## 16 Decreasing
## 17 Decreasing
## 18 Increasing

We can then compare the results of each test using a bar chart.

ggplot(trends, aes(SITE, SLOPE, fill = METHOD)) + geom_bar(stat = "identity", 
    position = "dodge") + facet_wrap(~PARAM) + labs(x = "Site", y = "Trend Slope (units/yr)")

plot of chunk plot.bar

Finally, we can also summarize the results using a “lolli-pop” chart, which shows the magnitude, direction and significance of each time series.

p.slope <- ggplot() + aes(SLOPE, SITE) +
  geom_vline(xint=0) +
  geom_segment(aes(x=0, xend=SLOPE, yend=SITE)) +
  geom_point(size=4, color='white') +
  geom_point(size=4, aes(alpha=ordered(SIGNIF, levels=c("p>0.10","0.05<p<0.10","p<0.05")), color=DIRECTION)) +
  geom_point(size=4, shape=1) +
  scale_alpha_manual('Significance', values=c(0.01, 0.4, 1), drop=FALSE) +
  scale_color_manual('Slope Sign', values=c('orangered','steelblue'), labels=c('Increasing','Decreasing'), drop=FALSE) +
  labs(x='Trend Slope (units/yr)', y='Site')
p.slope %+% subset(trends, METHOD=="SeasonalKendall") + facet_wrap(~PARAM) + ggtitle("Seasonal Kendall Trend Tests")

plot of chunk plot.lolli

p.slope %+% subset(trends, METHOD=="LinearRegression") + facet_wrap(~PARAM) + ggtitle("Linear Regression Trend Tests")

plot of chunk plot.lolli