Time trends
This package facilitates the investigation of two basic problems in the analysis of data collected over time:
1. Is there a time trend at all (beyond chance)?
2. If there is a time trend, how big is it?
The Mann-Kendall correlation
The first question is attacked using the Mann-Kendall correlation, which is the Kendall correlation in which the x-variable is time. This is a non-parametric, rank-based correlation that does not assume linearity and is resistant to outliers, so it will detect trends that are monotonic but not linear. See the Wikipedia article on the Kendall rank correlation coefficient for details.
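As a rough illustration of why a rank-based correlation suits this job, the sketch below uses base R's `cor()` and `cor.test()` on a made-up series (the data and variable names are hypothetical, not from this package). It is the ordinary Kendall correlation against time, without any adjustment for autocorrelation.

```r
# Hypothetical toy series: exponential growth (monotonic but clearly
# non-linear) with one wild outlier at the end.
time <- 1:30
y <- exp(time / 5)
y[30] <- 1e6   # outlier, but still larger than every earlier value

r   <- cor(time, y, method = "pearson")  # affected by curvature and the outlier
tau <- cor(time, y, method = "kendall")  # depends only on the ordering of pairs

c(pearson = r, kendall = tau)

# cor.test() adds a P-value for the null hypothesis of no monotonic trend.
cor.test(time, y, method = "kendall")$p.value
```

Because every later value exceeds every earlier one, all pairs are concordant and the Kendall correlation is at its maximum, whereas the Pearson correlation is pulled well below 1.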
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
The data set used here is http://www.utsc.utoronto.ca/~butler/d29/temperature.csv (read below from a local file).
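# Read the data from a local file; change the path to wherever the CSV is saved.
temp <- read.csv("d:/MANN-KENDALL/temp.csv")
# Drop the first column (presumably row numbers written out when the file was saved).
temp <- temp[, -1]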
head(temp)
## X1 Year temperature year
## 1 1 1880-12-31 13.72 1880
## 2 2 1881-12-31 13.79 1881
## 3 3 1882-12-31 13.74 1882
## 4 4 1883-12-31 13.73 1883
## 5 5 1884-12-31 13.68 1884
## 6 6 1885-12-31 13.68 1885
ggplot(temp, aes(x = year, y = temperature)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The plot appears to show an upward trend, but with a lot of variability. Is the trend statistically significant?
kendall_Z_adjusted(temp$temperature)
## $z
## [1] 11.77267
##
## $z_star
## [1] 4.475666
##
## $ratio
## [1] 6.918858
##
## $P_value
## [1] 0
##
## $P_value_adj
## [1] 7.617357e-06
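The printed components appear to be related in a simple way: dividing `z` by the square root of `ratio` gives `z_star`, and a two-sided normal P-value for `z_star` gives `P_value_adj`. This is inferred from the numbers, not from the package source. Reading `ratio` as a variance inflation factor that corrects for autocorrelation is likewise an assumption. The check below uses only base R and the values copied from the output above.

```r
# Values copied from the kendall_Z_adjusted() output above.
z     <- 11.77267
ratio <- 6.918858

# Assumed relationships (inferred from the printed numbers):
z_star <- z / sqrt(ratio)           # compare with $z_star
p_adj  <- 2 * pnorm(-abs(z_star))   # two-sided normal P-value; compare with $P_value_adj

c(z_star = z_star, p_adj = p_adj)
```

Either way, the adjusted P-value is still very small, so the upward trend is statistically significant even after the adjustment shrinks the test statistic.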
theil_sen_slope(temp$temperature)
## [1] 0.005675676
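The second question, how big the trend is, is addressed by the Theil-Sen slope, which is conventionally defined as the median of the slopes between all pairs of points. The sketch below is a minimal base-R version of that definition. The function name `pairwise_median_slope` and the toy data are hypothetical, and the package's own `theil_sen_slope` may differ in details such as the handling of ties or missing values.

```r
# Median of all pairwise slopes; x defaults to equally spaced time points,
# so the result is a change per time step.
pairwise_median_slope <- function(y, x = seq_along(y)) {
  idx <- combn(length(y), 2)   # every index pair (i, j) with i < j
  slopes <- (y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]])
  median(slopes)
}

# Hypothetical toy data: an exact line with slope 2, then one gross outlier.
y <- 2 * (1:10) + 1
y_out <- y
y_out[5] <- 100

clean_slope   <- pairwise_median_slope(y)
outlier_slope <- pairwise_median_slope(y_out)
ls_slope      <- unname(coef(lm(y_out ~ seq_along(y_out)))[2])  # least squares, for contrast

c(clean = clean_slope, with_outlier = outlier_slope, least_squares = ls_slope)
```

Only the pairs that involve the outlier have slopes different from 2, and they are a minority, so the median is unchanged, whereas the least-squares slope is affected by the outlier.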
The trend appears to be approximately linear up to about 1970, and approximately linear after that, but with a steeper slope. We might calculate and compare two separate Theil-Sen slopes, thus:
temp %>%
  mutate(time_period = ifelse(year <= 1970, "pre-1970", "post-1970")) %>%
  # name the nested column explicitly; unnamed nest(-time_period) is deprecated and warns
  nest(data = -time_period) %>%
  mutate(theil_sen = map_dbl(data, ~ theil_sen_slope(.$temperature)))
## # A tibble: 2 x 3
## time_period data theil_sen
## <chr> <list> <dbl>
## 1 pre-1970 <tibble [91 x 4]> 0.00429
## 2 post-1970 <tibble [40 x 4]> 0.0168
head(temp,20)
## X1 Year temperature year
## 1 1 1880-12-31 13.72 1880
## 2 2 1881-12-31 13.79 1881
## 3 3 1882-12-31 13.74 1882
## 4 4 1883-12-31 13.73 1883
## 5 5 1884-12-31 13.68 1884
## 6 6 1885-12-31 13.68 1885
## 7 7 1886-12-31 13.71 1886
## 8 8 1887-12-31 13.65 1887
## 9 9 1888-12-31 13.73 1888
## 10 10 1889-12-31 13.83 1889
## 11 11 1890-12-31 13.61 1890
## 12 12 1891-12-31 13.73 1891
## 13 13 1892-12-31 13.68 1892
## 14 14 1893-12-31 13.67 1893
## 15 15 1894-12-31 13.67 1894
## 16 16 1895-12-31 13.75 1895
## 17 17 1896-12-31 13.86 1896
## 18 18 1897-12-31 13.89 1897
## 19 19 1898-12-31 13.74 1898
## 20 20 1899-12-31 13.84 1899
The Theil-Sen slope since 1970 is very nearly four times as large as the slope before 1970 (0.0168 versus 0.00429 per year), and even within the later period the trend appears to be getting steeper with time.