The objective of this analysis is to determine trends in total phosphorus (TP) and orthophosphorus (OP) in Spring Creek. Data from two probabilistic sites (Pr) the Rotating Basin Monitoring Program (RB), Blue Thumb Program (BT), and Oklahoma Water Resources Board (OWRB) were included. RB data included both TP and OP, BT data had only OP, and OWRB data had only TP.

Overview of Sites

## [1] 910  11
##                  WBID Latittude Longitude DataSource
## 1          NEOGRD-093  36.08760 -95.06800         Pr
## 2          NEOGRD-122  36.13630 -94.92660         Pr
## 3   OK121600-01-0290D  36.13095 -95.18850         RB
## 4   OK121600-01-0290T  36.14448 -94.90732         RB
## 5   OK121600-01-0290C  36.12778 -95.20756         BT
## 6   OK121600-01-0290B  36.12474 -95.21441         BT
## 7   OK121600-01-0290R  36.12547 -94.94700         BT
## 8   OK121600-01-0290H  36.12305 -95.10900         BT
## 9   OK121600-01-0290U  36.15590 -94.87267         BT
## 10  OK121600-01-0290N  36.08444 -95.03444         BT
## 11  OK121600-01-0290K  36.09140 -95.08380         BT
## 12  OK121600-01-0290S  36.13431 -94.93258         BT
## 13  OK121600-01-0290T  36.14448 -94.90732         BT
## 14 121600010290-001AT  36.13104 -95.19016       OWRB
##                                  SiteName
## 1                      Spring Creek - 093
## 2                      Spring Creek - 122
## 3                            Spring Creek
## 4               Spring Creek:  Rocky Ford
## 5             Spring Creek: Cavalier Road
## 6                  Spring Creek:  Coopers
## 7                Spring Creek:  Fram Site
## 8   Spring Creek:  Timberlake, Downstream
## 9         Spring Creek: Three Spring Farm
## 10                  Spring Creek:  Rooney
## 11 Spring Creek:  Cherokee Cattle Company
## 12                    Spring Creek: Evans
## 13              Spring Creek:  Rocky Ford
## 14       Spring Creek, off US 412, Murphy

Spring Creek at Rocky Ford was sampled by both RB and BT

What years are covered?

##       
##        1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
##   BT      0    0    0    4   18   19   15   12   12   14   26   29   23   19
##   OWRB    1   12    9    9    9   10   10   10    9   10    8   10    6    4
##   Pr      0    0    0    0    0    0    0    0    0    0    0    0    0    2
##   RB      0    0    0    0    0    0    0    0    0    0    0    0    0    0
##       
##        2012 2013 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
##   BT     12    7    7   18   22   29   23   21   29   35   32   22
##   OWRB    6    8    6    3    7    5    4    4    7    5    0    0
##   Pr      0    0    0    0    0    0    0    0    0    0    0    0
##   RB      0    0    0    0    0    0    0    0  108  132   48    0

Good temporal coverage by BT and OWRB, RB data are more recent

Check for differences in data sources, sites and for a relationship with stream flow

Check data source

OWRB data distribution is way lower than RB distribution of TP

Maybe it is a site effect

There appears to be difference in sites. Need to check for significant differences, but first check for log-normal distribution.

Also, Pr sites are each represented by only a single data point and should be dropped. The BT site “Timberlake Downstream” has only 4 points with 0 variability, it will be dropped

## [1] 902  11

Let’s make better site labels based on the last letter(s) of WBID, AT refers to OWRB site.

Both parameters appear to be log-normal and will be transformed to meet the assumptions of the statistical tests. Notice the extreme low outliers in the transformed OP plot, this is likely due to 0’s in the data which would be more accurately interpreted as below the detectable limit.The lowest non-zero value in the data is 0.005. I will change 0’s to 0.004 (affects 35 entries)

Test for differences among sites using ANOVA

## Analysis of Variance Table
## 
## Response: log(TP)
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## SID         2  84.339  42.170  139.49 < 2.2e-16 ***
## Residuals 383 115.782   0.302                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log(TP) ~ SID, data = TPdat)
## 
## $SID
##             diff        lwr       upr     p adj
## D-AT -0.01246197 -0.1718633 0.1469394 0.9815202
## T-AT  1.00421482  0.8448135 1.1636162 0.0000000
## T-D   1.01667680  0.8496668 1.1836868 0.0000000

The OWRB site and the Downstream RB site are statistically similar, but the upstream RB site is different.

Check OP

## Analysis of Variance Table
## 
## Response: log(OP)
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## SID         8 182.21 22.7762  64.321 < 2.2e-16 ***
## Residuals 668 236.54  0.3541                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log(OP) ~ SID, data = OPdat)
## 
## $SID
##            diff         lwr       upr     p adj
## C-B -0.29569276 -1.00506367 0.4136781 0.9321165
## D-B -0.15114028 -0.82733188 0.5250513 0.9988394
## K-B -0.13022023 -1.08863173 0.8281913 0.9999723
## N-B -0.31840062 -1.05526333 0.4184621 0.9173984
## R-B  0.71455487  0.04718241 1.3819273 0.0253937
## S-B  0.91000214  0.17046963 1.6495346 0.0044292
## T-B  1.06483775  0.39480090 1.7348746 0.0000337
## U-B  0.90554687  0.21049965 1.6005941 0.0018339
## D-C  0.14455248 -0.17658049 0.4656855 0.8974969
## K-C  0.16547253 -0.58582230 0.9167674 0.9989568
## N-C -0.02270785 -0.45728577 0.4118701 1.0000000
## R-C  1.01024763  0.70812628 1.3123690 0.0000000
## S-C  1.20569490  0.76660533 1.6447845 0.0000000
## T-C  1.36053051  1.05256835 1.6684927 0.0000000
## U-C  1.20123963  0.84209907 1.5603802 0.0000000
## K-D  0.02092005 -0.69912989 0.7409700 1.0000000
## N-D -0.16726034 -0.54526293 0.2107423 0.9062759
## R-D  0.86569515  0.65284472 1.0785456 0.0000000
## S-D  1.06114242  0.67796146 1.4443234 0.0000000
## T-D  1.21597803  0.99491539 1.4370407 0.0000000
## U-D  1.05668715  0.76857269 1.3448016 0.0000000
## N-K -0.18818038 -0.96548566 0.5891249 0.9979436
## R-K  0.84477510  0.13300067 1.5565495 0.0073410
## S-K  1.04022237  0.26038573 1.8200590 0.0012357
## T-K  1.19505798  0.48078477 1.9093312 0.0000091
## U-K  1.03576710  0.29798158 1.7735526 0.0004898
## R-N  1.03295549  0.67096532 1.3949457 0.0000000
## S-N  1.22840275  0.74615881 1.7106467 0.0000000
## T-N  1.38323837  1.01635928 1.7501175 0.0000000
## U-N  1.22394748  0.81316589 1.6347291 0.0000000
## S-R  0.19544727 -0.17194702 0.5628416 0.7733081
## T-R  0.35028288  0.15787807 0.5426877 0.0000008
## U-R  0.19099200 -0.07576793 0.4577519 0.3885456
## T-S  0.15483561 -0.21737662 0.5270478 0.9328803
## U-S -0.00445527 -0.42000695 0.4110964 1.0000000
## U-T -0.15929088 -0.43264824 0.1140665 0.6733082

Sites ‘B’, ‘C’, ‘D’, ‘K’, and ‘N’ are similar, ‘R’, ‘S’, and ‘U’ are similar, and ‘T’ matches with ‘S’ and ‘U’. However, ‘D’ and ‘T’ are recent RB sites and each appear to have less variance due to the amount of data associated with RB sites. This appears to be driving the significance of ‘T’ from ‘R’.

Let’s see how these sites look on the map.

It looks like there is an upstream and downstream group. From here on, sites AT, B, C, D, K, and N will be grouped as downstream and all others will be considered upstream.

Results

Total Phosphorus

First check for temporal patterns in the each data group. Hopefully condensing the data to years will break any temporal correlations

Points appear to be scattered randomly across time. The Rotating Basin sites are limited by time, try testing TP trend without Rotating Basin sites to avoid skew.

## [1] 146  13
## 
## Call:
## lm(formula = log(TP) ~ Year, data = TPdat1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1434 -0.3981 -0.1096  0.2135  3.0081 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 28.414420  15.209368   1.868   0.0638 .
## Year        -0.016180   0.007574  -2.136   0.0343 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6557 on 144 degrees of freedom
## Multiple R-squared:  0.03072,    Adjusted R-squared:  0.02399 
## F-statistic: 4.564 on 1 and 144 DF,  p-value: 0.03434

The model tests for a linear trend across years. There was a significant declining trend in TP overtime at an alpha or 0.05 (95% confidence).

The Adjusted R-squared value of the model was 0.02, indicating that the model poorly explained the spread of data around the estimated trend, but does not undermine the detected trend.

Check other assumptions of the polynomial TP model

We appear to be under predicting data at the extreme high end according the qqnorm plot. The high outliers (>2) are apparent in the heteroskedasticity plot, the residuals are otherwise evenly spread around 0 except at the very high end (> -4.0).

Look at the relationship of mean modeled points with year

At the Downstream site, the model estimated a decline in TP of 0.007 (mg/L) from 1998 - 2022. Although the model poorly fit the data, most of the high outliers occurred earlier in the study period (i.e., before 2011), indicating that the declining line may be due to a reduced frequency in high spikes.

Check OP

First check for temporal patterns in the each data group. Hopefully condensing the data to years will break any temporal correlations

No apparent patterns in the data over time.

## 
## Call:
## lm(formula = log(OP + 0.001) ~ Year + Group, data = OPdat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5894 -0.0974  0.1005  0.3425  2.0868 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -48.600501   9.124545  -5.326 1.37e-07 ***
## Year            0.021837   0.004525   4.826 1.73e-06 ***
## GroupUpstream   1.106802   0.066793  16.571  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8043 on 674 degrees of freedom
## Multiple R-squared:  0.3114, Adjusted R-squared:  0.3094 
## F-statistic: 152.4 on 2 and 674 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = log(OP + 0.001) ~ Year * Group, data = OPdat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5915 -0.1025  0.0994  0.3440  2.0771 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        -47.239398  15.550231  -3.038  0.00247 **
## Year                 0.021162   0.007712   2.744  0.00623 **
## GroupUpstream       -0.970815  19.213042  -0.051  0.95972   
## Year:GroupUpstream   0.001030   0.009527   0.108  0.91392   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8049 on 673 degrees of freedom
## Multiple R-squared:  0.3114, Adjusted R-squared:  0.3084 
## F-statistic: 101.5 on 3 and 673 DF,  p-value: < 2.2e-16
##     df      AIC
## OP1  4 1631.361
## OP2  5 1633.349

The first model test for different intercepts but similar trends in the Upstream and Downstream sites. A significant increasing trend was detected with this model.

The second model tested for differing intercepts and trends found only a shared positive trend between the Upstream and Downstream sites.

According to AIC, the first model best explained the data. The associated R-squared value of this model was 0.31, which is a the low end of a moderate fit.

Check model assumptions

The model is not predicting extreme low values. This could be a problem if low values are skewed to earlier or later time periods.

Low residual values (<= -2) are evenly distributed across time. Although the model may not predict the low values, the trend line of interest appears to be accurate.

View the predicted trend line

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.01300 0.03000 0.02984 0.04000 0.15300

Can we tighten our model estimates with seasonal/ cyclic month variable, or stream stage?

TP

No Stream Stage in OWRB data, just try month

## 
## Call:
## lm(formula = log(TP) ~ Month.x + Month.y + Year, data = TPdat1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2581 -0.3801 -0.0977  0.2034  3.2029 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 29.216852  15.030512   1.944   0.0539 .
## Month.x     -0.131302   0.075697  -1.735   0.0850 .
## Month.y     -0.134088   0.076688  -1.748   0.0825 .
## Year        -0.016588   0.007485  -2.216   0.0283 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6471 on 142 degrees of freedom
## Multiple R-squared:  0.06906,    Adjusted R-squared:  0.04939 
## F-statistic: 3.511 on 3 and 142 DF,  p-value: 0.01697

Month improved the model fit slightly (adj-R2 from 0.02 to 0.05). There are still unexplained high outliers in the data. The declining trend in TP remains.

OP + Month + Stream Stage

## 
## Call:
## lm(formula = log(OP) ~ Group + Month.x + Month.y + StreamStage + 
##     Year, data = OPdat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.24953 -0.22512  0.04449  0.24687  1.95298 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -37.441070   6.660130  -5.622 2.78e-08 ***
## GroupUpstream   1.116889   0.049384  22.617  < 2e-16 ***
## Month.x        -0.103842   0.034534  -3.007 0.002737 ** 
## Month.y        -0.170167   0.032330  -5.263 1.91e-07 ***
## StreamStage     0.061641   0.016885   3.651 0.000282 ***
## Year            0.016139   0.003309   4.877 1.35e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5785 on 669 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.4645, Adjusted R-squared:  0.4605 
## F-statistic: 116.1 on 5 and 669 DF,  p-value: < 2.2e-16
##     df      AIC
## OP1  4 1631.361
## OP4  7 1184.653

Month and stream stage substantially improved the fit of the model (adj-R2 from 0.3 to 0.46). Each feature was statistically significant and the increasing trend in OP remained.

Summary

In this analysis, data from multiple sites and sources were combined to determine trends in Total Phosphorus (TP) and Orthophosphorus (OP) in Spring Creek. I first determined that data from different sources were compatible with one another, but there were differences in means and variances among sites. Sites were therefore grouped into Upstream and Downstream categories, as Upstream sites tended to have higher loads of both TP and OP.

To assess TP, only the OWRB data spanned the full range of years. Rotating Basin data included TP as well but was only collected in the past 3 years. Rotating Basin data were excluded from the TP analysis to avoid excessive influence in more recent years. A significant declining trend was detected in the TP model, however the model fit was very weak (adjusted R-squared = 0.02). Adding a cyclical Month variable improved the model fit slightly (Adjusted R-squared = 0.05), but the month variable were not statistically significant. High spikes in TP concentration appear to be the reason for the poor fit of the model.

In the above figure, TP was estimated to be higher at the start of the time frame with an estimated decrease of 0.007 mg/L over the past 25 years.

To assess OP, several Blue Thumb sites and two Rotating Basin sites were included. Unlike the TP analysis, coverage was uniform across the time frame of interest. At both, Upstream and Downstream sites, a significant increasing trend was detected. Additionally, there was a significant seasonal effect (cyclical Month variable), and a significant positive relationship with Stream Stage and OP.

The analysis of OP at Spring Creek estimates a slight but significant increasing trend in OP. The increase was more pronounced at the Upstream sites (blue, increase of 0.015 mg/L) than the downstream sites (black, increase of 0.005 mg/L). The OP model fit the data better than the TP model especially when seasonal and Stream Stage features were included (adjusted R-square = 0.46).