The objective of this analysis is to determine trends in total phosphorus (TP) and orthophosphorus (OP) in Spring Creek. Data from two probabilistic sites (Pr) the Rotating Basin Monitoring Program (RB), Blue Thumb Program (BT), and Oklahoma Water Resources Board (OWRB) were included. RB data included both TP and OP, BT data had only OP, and OWRB data had only TP.
## [1] 910 11
## WBID Latittude Longitude DataSource
## 1 NEOGRD-093 36.08760 -95.06800 Pr
## 2 NEOGRD-122 36.13630 -94.92660 Pr
## 3 OK121600-01-0290D 36.13095 -95.18850 RB
## 4 OK121600-01-0290T 36.14448 -94.90732 RB
## 5 OK121600-01-0290C 36.12778 -95.20756 BT
## 6 OK121600-01-0290B 36.12474 -95.21441 BT
## 7 OK121600-01-0290R 36.12547 -94.94700 BT
## 8 OK121600-01-0290H 36.12305 -95.10900 BT
## 9 OK121600-01-0290U 36.15590 -94.87267 BT
## 10 OK121600-01-0290N 36.08444 -95.03444 BT
## 11 OK121600-01-0290K 36.09140 -95.08380 BT
## 12 OK121600-01-0290S 36.13431 -94.93258 BT
## 13 OK121600-01-0290T 36.14448 -94.90732 BT
## 14 121600010290-001AT 36.13104 -95.19016 OWRB
## SiteName
## 1 Spring Creek - 093
## 2 Spring Creek - 122
## 3 Spring Creek
## 4 Spring Creek: Rocky Ford
## 5 Spring Creek: Cavalier Road
## 6 Spring Creek: Coopers
## 7 Spring Creek: Fram Site
## 8 Spring Creek: Timberlake, Downstream
## 9 Spring Creek: Three Spring Farm
## 10 Spring Creek: Rooney
## 11 Spring Creek: Cherokee Cattle Company
## 12 Spring Creek: Evans
## 13 Spring Creek: Rocky Ford
## 14 Spring Creek, off US 412, Murphy
Spring Creek at Rocky Ford was sampled by both RB and BT
##
## 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## BT 0 0 0 4 18 19 15 12 12 14 26 29 23 19
## OWRB 1 12 9 9 9 10 10 10 9 10 8 10 6 4
## Pr 0 0 0 0 0 0 0 0 0 0 0 0 0 2
## RB 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 2012 2013 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
## BT 12 7 7 18 22 29 23 21 29 35 32 22
## OWRB 6 8 6 3 7 5 4 4 7 5 0 0
## Pr 0 0 0 0 0 0 0 0 0 0 0 0
## RB 0 0 0 0 0 0 0 0 108 132 48 0
Good temporal coverage by BT and OWRB, RB data are more recent
Check data source
OWRB data distribution is way lower than RB distribution of TP
Maybe it is a site effect
There appears to be difference in sites. Need to check for significant differences, but first check for log-normal distribution.
Also, Pr sites are each represented by only a single data point and should be dropped. The BT site “Timberlake Downstream” has only 4 points with 0 variability, it will be dropped
## [1] 902 11
Let’s make better site labels based on the last letter(s) of WBID, AT refers to OWRB site.
Both parameters appear to be log-normal and will be transformed to meet the assumptions of the statistical tests. Notice the extreme low outliers in the transformed OP plot, this is likely due to 0’s in the data which would be more accurately interpreted as below the detectable limit.The lowest non-zero value in the data is 0.005. I will change 0’s to 0.004 (affects 35 entries)
## Analysis of Variance Table
##
## Response: log(TP)
## Df Sum Sq Mean Sq F value Pr(>F)
## SID 2 84.339 42.170 139.49 < 2.2e-16 ***
## Residuals 383 115.782 0.302
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log(TP) ~ SID, data = TPdat)
##
## $SID
## diff lwr upr p adj
## D-AT -0.01246197 -0.1718633 0.1469394 0.9815202
## T-AT 1.00421482 0.8448135 1.1636162 0.0000000
## T-D 1.01667680 0.8496668 1.1836868 0.0000000
The OWRB site and the Downstream RB site are statistically similar, but the upstream RB site is different.
Check OP
## Analysis of Variance Table
##
## Response: log(OP)
## Df Sum Sq Mean Sq F value Pr(>F)
## SID 8 182.21 22.7762 64.321 < 2.2e-16 ***
## Residuals 668 236.54 0.3541
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log(OP) ~ SID, data = OPdat)
##
## $SID
## diff lwr upr p adj
## C-B -0.29569276 -1.00506367 0.4136781 0.9321165
## D-B -0.15114028 -0.82733188 0.5250513 0.9988394
## K-B -0.13022023 -1.08863173 0.8281913 0.9999723
## N-B -0.31840062 -1.05526333 0.4184621 0.9173984
## R-B 0.71455487 0.04718241 1.3819273 0.0253937
## S-B 0.91000214 0.17046963 1.6495346 0.0044292
## T-B 1.06483775 0.39480090 1.7348746 0.0000337
## U-B 0.90554687 0.21049965 1.6005941 0.0018339
## D-C 0.14455248 -0.17658049 0.4656855 0.8974969
## K-C 0.16547253 -0.58582230 0.9167674 0.9989568
## N-C -0.02270785 -0.45728577 0.4118701 1.0000000
## R-C 1.01024763 0.70812628 1.3123690 0.0000000
## S-C 1.20569490 0.76660533 1.6447845 0.0000000
## T-C 1.36053051 1.05256835 1.6684927 0.0000000
## U-C 1.20123963 0.84209907 1.5603802 0.0000000
## K-D 0.02092005 -0.69912989 0.7409700 1.0000000
## N-D -0.16726034 -0.54526293 0.2107423 0.9062759
## R-D 0.86569515 0.65284472 1.0785456 0.0000000
## S-D 1.06114242 0.67796146 1.4443234 0.0000000
## T-D 1.21597803 0.99491539 1.4370407 0.0000000
## U-D 1.05668715 0.76857269 1.3448016 0.0000000
## N-K -0.18818038 -0.96548566 0.5891249 0.9979436
## R-K 0.84477510 0.13300067 1.5565495 0.0073410
## S-K 1.04022237 0.26038573 1.8200590 0.0012357
## T-K 1.19505798 0.48078477 1.9093312 0.0000091
## U-K 1.03576710 0.29798158 1.7735526 0.0004898
## R-N 1.03295549 0.67096532 1.3949457 0.0000000
## S-N 1.22840275 0.74615881 1.7106467 0.0000000
## T-N 1.38323837 1.01635928 1.7501175 0.0000000
## U-N 1.22394748 0.81316589 1.6347291 0.0000000
## S-R 0.19544727 -0.17194702 0.5628416 0.7733081
## T-R 0.35028288 0.15787807 0.5426877 0.0000008
## U-R 0.19099200 -0.07576793 0.4577519 0.3885456
## T-S 0.15483561 -0.21737662 0.5270478 0.9328803
## U-S -0.00445527 -0.42000695 0.4110964 1.0000000
## U-T -0.15929088 -0.43264824 0.1140665 0.6733082
Sites ‘B’, ‘C’, ‘D’, ‘K’, and ‘N’ are similar, ‘R’, ‘S’, and ‘U’ are similar, and ‘T’ matches with ‘S’ and ‘U’. However, ‘D’ and ‘T’ are recent RB sites and each appear to have less variance due to the amount of data associated with RB sites. This appears to be driving the significance of ‘T’ from ‘R’.
It looks like there is an upstream and downstream group. From here on, sites AT, B, C, D, K, and N will be grouped as downstream and all others will be considered upstream.
First check for temporal patterns in the each data group. Hopefully condensing the data to years will break any temporal correlations
Points appear to be scattered randomly across time. The Rotating Basin sites are limited by time, try testing TP trend without Rotating Basin sites to avoid skew.
## [1] 146 13
##
## Call:
## lm(formula = log(TP) ~ Year, data = TPdat1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1434 -0.3981 -0.1096 0.2135 3.0081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.414420 15.209368 1.868 0.0638 .
## Year -0.016180 0.007574 -2.136 0.0343 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6557 on 144 degrees of freedom
## Multiple R-squared: 0.03072, Adjusted R-squared: 0.02399
## F-statistic: 4.564 on 1 and 144 DF, p-value: 0.03434
The model tests for a linear trend across years. There was a significant declining trend in TP overtime at an alpha or 0.05 (95% confidence).
The Adjusted R-squared value of the model was 0.02, indicating that the model poorly explained the spread of data around the estimated trend, but does not undermine the detected trend.
We appear to be under predicting data at the extreme high end according the qqnorm plot. The high outliers (>2) are apparent in the heteroskedasticity plot, the residuals are otherwise evenly spread around 0 except at the very high end (> -4.0).
At the Downstream site, the model estimated a decline in TP of 0.007 (mg/L) from 1998 - 2022. Although the model poorly fit the data, most of the high outliers occurred earlier in the study period (i.e., before 2011), indicating that the declining line may be due to a reduced frequency in high spikes.
First check for temporal patterns in the each data group. Hopefully condensing the data to years will break any temporal correlations
No apparent patterns in the data over time.
##
## Call:
## lm(formula = log(OP + 0.001) ~ Year + Group, data = OPdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5894 -0.0974 0.1005 0.3425 2.0868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -48.600501 9.124545 -5.326 1.37e-07 ***
## Year 0.021837 0.004525 4.826 1.73e-06 ***
## GroupUpstream 1.106802 0.066793 16.571 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8043 on 674 degrees of freedom
## Multiple R-squared: 0.3114, Adjusted R-squared: 0.3094
## F-statistic: 152.4 on 2 and 674 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(OP + 0.001) ~ Year * Group, data = OPdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5915 -0.1025 0.0994 0.3440 2.0771
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -47.239398 15.550231 -3.038 0.00247 **
## Year 0.021162 0.007712 2.744 0.00623 **
## GroupUpstream -0.970815 19.213042 -0.051 0.95972
## Year:GroupUpstream 0.001030 0.009527 0.108 0.91392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8049 on 673 degrees of freedom
## Multiple R-squared: 0.3114, Adjusted R-squared: 0.3084
## F-statistic: 101.5 on 3 and 673 DF, p-value: < 2.2e-16
## df AIC
## OP1 4 1631.361
## OP2 5 1633.349
The first model test for different intercepts but similar trends in the Upstream and Downstream sites. A significant increasing trend was detected with this model.
The second model tested for differing intercepts and trends found only a shared positive trend between the Upstream and Downstream sites.
According to AIC, the first model best explained the data. The associated R-squared value of this model was 0.31, which is a the low end of a moderate fit.
Check model assumptions
The model is not predicting extreme low values. This could be a problem if low values are skewed to earlier or later time periods.
Low residual values (<= -2) are evenly distributed across time. Although the model may not predict the low values, the trend line of interest appears to be accurate.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01300 0.03000 0.02984 0.04000 0.15300
Can we tighten our model estimates with seasonal/ cyclic month variable, or stream stage?
No Stream Stage in OWRB data, just try month
##
## Call:
## lm(formula = log(TP) ~ Month.x + Month.y + Year, data = TPdat1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2581 -0.3801 -0.0977 0.2034 3.2029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.216852 15.030512 1.944 0.0539 .
## Month.x -0.131302 0.075697 -1.735 0.0850 .
## Month.y -0.134088 0.076688 -1.748 0.0825 .
## Year -0.016588 0.007485 -2.216 0.0283 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6471 on 142 degrees of freedom
## Multiple R-squared: 0.06906, Adjusted R-squared: 0.04939
## F-statistic: 3.511 on 3 and 142 DF, p-value: 0.01697
Month improved the model fit slightly (adj-R2 from 0.02 to 0.05). There
are still unexplained high outliers in the data. The declining trend in
TP remains.
##
## Call:
## lm(formula = log(OP) ~ Group + Month.x + Month.y + StreamStage +
## Year, data = OPdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.24953 -0.22512 0.04449 0.24687 1.95298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -37.441070 6.660130 -5.622 2.78e-08 ***
## GroupUpstream 1.116889 0.049384 22.617 < 2e-16 ***
## Month.x -0.103842 0.034534 -3.007 0.002737 **
## Month.y -0.170167 0.032330 -5.263 1.91e-07 ***
## StreamStage 0.061641 0.016885 3.651 0.000282 ***
## Year 0.016139 0.003309 4.877 1.35e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5785 on 669 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.4645, Adjusted R-squared: 0.4605
## F-statistic: 116.1 on 5 and 669 DF, p-value: < 2.2e-16
## df AIC
## OP1 4 1631.361
## OP4 7 1184.653
Month and stream stage substantially improved the fit of the model (adj-R2 from 0.3 to 0.46). Each feature was statistically significant and the increasing trend in OP remained.
In this analysis, data from multiple sites and sources were combined to determine trends in Total Phosphorus (TP) and Orthophosphorus (OP) in Spring Creek. I first determined that data from different sources were compatible with one another, but there were differences in means and variances among sites. Sites were therefore grouped into Upstream and Downstream categories, as Upstream sites tended to have higher loads of both TP and OP.
To assess TP, only the OWRB data spanned the full range of years. Rotating Basin data included TP as well but was only collected in the past 3 years. Rotating Basin data were excluded from the TP analysis to avoid excessive influence in more recent years. A significant declining trend was detected in the TP model, however the model fit was very weak (adjusted R-squared = 0.02). Adding a cyclical Month variable improved the model fit slightly (Adjusted R-squared = 0.05), but the month variable were not statistically significant. High spikes in TP concentration appear to be the reason for the poor fit of the model.
In the above figure, TP was estimated to be higher at the start of the time frame with an estimated decrease of 0.007 mg/L over the past 25 years.
To assess OP, several Blue Thumb sites and two Rotating Basin sites were included. Unlike the TP analysis, coverage was uniform across the time frame of interest. At both, Upstream and Downstream sites, a significant increasing trend was detected. Additionally, there was a significant seasonal effect (cyclical Month variable), and a significant positive relationship with Stream Stage and OP.
The analysis of OP at Spring Creek estimates a slight but significant increasing trend in OP. The increase was more pronounced at the Upstream sites (blue, increase of 0.015 mg/L) than the downstream sites (black, increase of 0.005 mg/L). The OP model fit the data better than the TP model especially when seasonal and Stream Stage features were included (adjusted R-square = 0.46).