Determining the effects of lake variables on the relationship of TP (log transformed ug/L) with Chl-a concentrations (log transformed ug/L), using the gradient of residuals against variables.
#linear model of chlorophyll to phosphorus
lm1 <- lm(logCHLA~logTP, data=df)
#creating residuals
df$res.lm <- resid(lm1)plot.lm1 <- ggplot(df,aes(x=logTP,y=logCHLA))+
geom_point()+
geom_smooth(method='lm')+
labs(title="Linear log-log model",x="log TP",y="log CHLA")
print(plot.lm1)## `geom_smooth()` using formula 'y ~ x'
##
## Family: gaussian
## Link function: identity
##
## Formula:
## logCHLA ~ s(logTP)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.52699 0.01145 133.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(logTP) 7.636 8.521 551 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.421 Deviance explained = 42.2%
## GCV = 0.8489 Scale est. = 0.84777 n = 6463
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| SAMPLE_DATE | 0.850289148463432 | -6.89826635426592e-05 | 5.46234608554396e-119 | 2.00279661146439e-130 |
| DEPTH, BOTTOM_OW_NA | 0.183054485000925 | -0.011287521881885 | 4.5378047443628e-12 | 4.70045350964983e-08 |
| NITROGEN, NITRATE-NITRITE_OW_T | -0.00356774000397907 | -0.304239935061802 | 0.798986954082028 | 2.98343585229416e-05 |
| SPECIFIC CONDUCTANCE_OW_NA | 0.0238878360602011 | -0.000151213618286602 | 0.322744969555356 | 0.22335561977934 |
##
## Call:
## lm(formula = logCHLA ~ logTP, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6639 -0.4642 0.0532 0.5741 4.5213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.92671 0.06888 86.04 <2e-16 ***
## logTP 1.03479 0.01596 64.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9419 on 6461 degrees of freedom
## Multiple R-squared: 0.3941, Adjusted R-squared: 0.394
## F-statistic: 4202 on 1 and 6461 DF, p-value: < 2.2e-16
| Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|
| 1 | 3.73e+03 | 3.73e+03 | 4.2e+03 | 0 |
| 6461 | 5.73e+03 | 0.887 |
Outliers were identified as points with studentized residuals outside 3:-3
print(plot.lm2<- ggplot(df2,aes(x=logTP,y=logCHLA,alpha=0.5))+
geom_point()+
geom_abline(intercept=lm2$coefficients[1],slope=lm2$coefficients[2],size=1.1)+
theme_bw()+
theme(legend.position = "none")+
ylim(-2.5,6)+
xlim(-7.5,-2))## Warning: Removed 23 rows containing missing values (geom_point).
##
## Call:
## lm(formula = logCHLA ~ logTP, data = df2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9893 -0.4695 0.0344 0.5298 2.8209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.31345 0.06215 101.58 <2e-16 ***
## logTP 1.11567 0.01440 77.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8247 on 6356 degrees of freedom
## Multiple R-squared: 0.4855, Adjusted R-squared: 0.4855
## F-statistic: 5999 on 1 and 6356 DF, p-value: < 2.2e-16
| Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|
| 1 | 4.08e+03 | 4.08e+03 | 6e+03 | 0 |
| 6356 | 4.32e+03 | 0.68 |
Red is overall trend, green is first year of sampling, blue is most
recent year of sampling.
##
## Call:
## lm(formula = logCHLA ~ logTP, data = df3[df3$`sampling event` ==
## "first", ])
##
## Coefficients:
## (Intercept) logTP
## 6.095 1.036
##
## Call:
## lm(formula = logCHLA ~ logTP, data = df3[df3$`sampling event` ==
## "last", ])
##
## Coefficients:
## (Intercept) logTP
## 6.081 1.086
Pattern of logCHLA~logTP model residuals along gradients of variables. All variables are fitted to a linear model on the residuals.
# only keep relevant variables
df3 <- df2 %>%
mutate("Log(N:P)"=log(NPratio)) %>%
select(SAMPLE_DATE,
`DEPTH, SECCHI DISK DEPTH_SD_NA`,
`Log(N:P)`,
`TEMPERATURE, WATER_OW_NA`,
`TRUE COLOR_OW_T`,
res.lm) %>%
rename("Sample date"=SAMPLE_DATE,
"Secchi disk depth"=`DEPTH, SECCHI DISK DEPTH_SD_NA`,
"Lake temperature"=`TEMPERATURE, WATER_OW_NA`,
"True color"=`TRUE COLOR_OW_T`)
#loop
for(i in (2:ncol(df3)-1)){
lm.df <- df3[is.finite(df3[,i]),] #removing NAs from relevant column
lm.loop <- lm(res.lm ~ lm.df[,i], data=lm.df) #linear model of residuals against variable
p1 <- ggplot(df3,aes(x=df3[,i],y=res.lm))+ #scatterplot
geom_point()+
geom_smooth(method='lm')+
labs(x=paste(colnames(df3[i])),
y='residuals of LM(logCHLA~logTP)')
if(colnames(df3[i])=="Sample date"){
p1 <- p1+scale_x_date(date_labels="%Y")
}
p2 <- ggplot(data=df3, aes(x=df3[,i])) + #density plot
geom_density(fill="grey") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.grid=element_blank(),
axis.line=element_blank(),
panel.background = element_blank())
require(gridExtra)
grid.arrange(p2, p1, ncol = 1, heights = c(1, 4)) #output plots
stats[nrow(stats)+1,] <- c(colnames(df3[i]), #coefficients into stats table
coef(lm.loop),
summary(lm.loop)$coefficients[2,4],
summary(lm.loop)$r.squared)
vars[nrow(vars)+1,] <- c(colnames(df3)[i], #coefficients into variables table
paste(min(as.numeric(na.omit(df3[,i]))),
"-",max(as.numeric(na.omit(df3[,i])))),
nrow(df3[i]),
paste(median(as.numeric(na.omit(df3[,i]))),
"(",summary(as.numeric(na.omit(df3[,i])))[2],
"-",summary(as.numeric(na.omit(df3[,i])))[5],")")) #stats
}Variable descriptive statistics
| characteristic | range | n | median..IQR. |
|---|---|---|---|
| Sample date | 6007 - 18535 | 6358 | 12684.5 ( 8230 - 15878.5 ) |
| Secchi disk depth | 0.05 - 13.6 | 6358 | 3.3 ( 2 - 4.75 ) |
| Log(N:P) | -Inf - 5.64662406081625 | 6358 | 0.0143887374520995 ( -0.671485683778766 - 1.0929824056555 ) |
| Lake temperature | 4 - 36 | 6358 | 22.7777777777778 ( 20 - 25 ) |
| True color | 0 - 222 | 6358 | 10 ( 6 - 19 ) |
| N WET DEPOSITION | 2.30022078752517 - 10.3009757995605 | 2923 | 5.33139736132514 ( 4.55434440887906 - 6.44047206640243 ) |
| PRECIPITATION | 76.6439971923828 - 192.690994262695 | 2515 | 105.960250854492 ( 96.1419982910156 - 119.623184926382 ) |
| COLOR (LOW) | 0 - 20 | 5017 | 8 ( 5 - 13 ) |
| COLOR (HIGH) | 3.04452243772342 - 5.40267738187228 | 1444 | 3.40119738166216 ( 3.2188758248682 - 3.73766961828337 ) |
Coefficients for linear models of residuals along gradient of variables. Bold values indicate significant p-values (<0.05). Sorted by rsquared.
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| Secchi disk depth | 0.541803029246122 | -0.153340460763037 | 2.27804470391924e-173 | 0.119984980885286 |
| Sample date | 0.781027159754332 | -6.34600046156702e-05 | 4.68974655730493e-136 | 0.0924225582939648 |
| N WET DEPOSITION | -0.755215544881601 | 0.13670885904978 | 3.12451405622739e-41 | 0.0600533806507492 |
| Lake temperature | 0.356056733585495 | -0.015532781758223 | 2.46192107503823e-07 | 0.00439476951584657 |
| COLOR (LOW) | -0.0426556441410483 | 0.00639000301443781 | 0.00387279944648067 | 0.00169712544101299 |
| PRECIPITATION | 0.242827192111365 | -0.0011032455041843 | 0.208225997216743 | 0.000630136773462971 |
| Log(N:P) | -0.013505519227058 | -0.0132933619011853 | 0.097471001081596 | 0.000542896327019578 |
| COLOR (HIGH) | -0.22487052258355 | 0.0507795364317734 | 0.432435783381794 | 0.000460314324769141 |
| True color | 0.0060336628838756 | -0.000312665969811818 | 0.655097197357787 | 3.19121762271381e-05 |
## Error in `$<-.data.frame`(`*tmp*`, "DEPTH, BOTTOM_OW_NA", value = NA_real_): replacement has 1 row, data has 0
## Error in summary(lm.dep)$coefficients[2, 4]: subscript out of bounds
Variable descriptive statistics
| characteristic | years | range | n | median..IQR. |
|---|---|---|---|---|
| DEPTH, SECCHI DISK DEPTH_SD_NA | 2018 - 2020 | 0.65 - 11.85 | 275 | 3.9 ( 2.3 - 5.25 ) |
| NITROGEN, NITRATE-NITRITE_OW_T | 2018 - 2020 | 0.0035 - 0.677 | 275 | 0.01925 ( 0.01 - 0.06655 ) |
| NPratio | 2018 - 2020 | 0.0737463126843658 - 115.529010238908 | 275 | 1.41768029054015 ( 0.685363247863248 - 4.68072493224932 ) |
| TEMPERATURE, WATER_OW_NA | 2018 - 2020 | 15 - 30.5 | 275 | 25 ( 22.375 - 26.75 ) |
| TRUE COLOR_OW_T | 2018 - 2020 | 1 - 85 | 275 | 7 ( 6 - 13.25 ) |
| LAKE DEPTH | 2020 - 2020 | 11 - 11 | 9 | |
| ACID DEPOSITION (DIN:TP>1.5) | 2011 - 2018 | 2.30022078752517 - 6.68831753730773 | 82 | |
| ACID DEPOISITION (DIN:TP<1.5) | 2017 - 2018 | 2.30022078752517 - 7.62836072437158 | 176 | |
| PRECIPITATION | 2013 - 2016 | 76.9628023420061 - 131.316277313232 | 126 | |
| NITRATE+NITRITE (DIN:TP<1.5) | 2017 - 2020 | 0.0035 - 0.118 | 140 | |
| NITRATE+NITRITE (DIN:TP>1.5) | 2011 - 2020 | 0.01 - 0.677 | 72 |
Coefficients for linear models of residuals along gradient of variables in most recent year. Bold values indicate significant p-values (<0.05). Sorted by rsquared.
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| DEPTH, SECCHI DISK DEPTH_SD_NA | 0.428630517179621 | -0.135986962241364 | 1.60201682818058e-10 | 0.14227166740512 |
| NITROGEN, NITRATE-NITRITE_OW_T | 0.0016712605690871 | -1.28016037957015 | 0.0290891404205754 | 0.054174610820437 |
| NITRATE+NITRITE (DIN:TP>1.5) | -0.179891730221681 | -0.934634493915856 | 0.122262066952841 | 0.0347590611084569 |
| TRUE COLOR_OW_T | -0.193909744081353 | 0.00747050945244508 | 0.0342646800214777 | 0.0164911846704551 |
| NPratio | -0.0662176579841025 | -0.00589729926793749 | 0.256652296075805 | 0.0149363811196209 |
| PRECIPITATION | 0.362357170866486 | -0.00759818209253073 | 0.327031653256195 | 0.0080063144947164 |
| ACID DEPOSITION (DIN:TP>1.5) | -0.655290094819347 | 0.0731107936030177 | 0.518044200940721 | 0.00551816085938825 |
| ACID DEPOSITION (DIN:TP<1.5) | -0.655290094819347 | 0.0731107936030177 | 0.518044200940721 | 0.00551816085938825 |
| NITRATE+NITRITE (DIN:TP<1.5) | -0.0663124202509833 | 0.683616956989121 | 0.850616835120542 | 0.000261725815197304 |
| TEMPERATURE, WATER_OW_NA | -0.0512983597476334 | -0.00215894114077526 | 0.882001025195815 | 8.17492733348602e-05 |
High vs low color, sample date [1] “HIGH COLOR, N= 1444” [1] “LOW COLOR, N= 5017”
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| High color | 0.977960932533248 | -7.60055366367557e-05 | 2.22703116802528e-22 | 0.0682986263223938 |
| Low color | 0.767216135836936 | -6.27160009326501e-05 | 2.2179012103683e-119 | 0.104058314969691 |
Color trend, sample date [1] “INCREASING COLOR, N= 4250” [1] “DECREASING COLOR, N= 2108”
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| Increasing color | 0.863341279568197 | -7.26260704715752e-05 | 6.22896911385686e-116 | 0.116023927247504 |
| Decreasing color | 0.658066306787453 | -4.88466089095998e-05 | 8.81348882000011e-30 | 0.0591965327510898 |
Pattern of logCHLA~logTP model residuals along sample date for each lake. Data from each lake are fitted to a linear model on the residuals.
A red regression line indicates the model is statistically insignificant (p>0.05). A red border medians that the lake had a negative chlorophyll trend and a flat or positive phosphorous trend.
#each lake
#loop over each lake
LAKE_IDS <- unique(df$LAKE_ID) #list lakes
coeffs <- data.frame(lake=character(),
intercept=double(),
slope=double(),
'p-value'=double(),
rsquared=double()) #blank coefficient table
incongruous <- read.csv("~/OneDrive - New York State Office of Information Technology Services/Rscripts/Trend/chl-tp-lakes.csv")
incongruous <- incongruous[which(incongruous$CHL_slope!=incongruous$PHOS_slope),]
incongruous <- incongruous[incongruous$CHL_slope=="neg",]
for (i in 1:length(LAKE_IDS)) {
lake <- LAKE_IDS[[i]] #pick a lake
dat <- df2[df2$LAKE_ID == lake,] #create new data with that lake
resid.loop <- lm(logCHLA~logTP,data=dat) #linear model of chl vs tp
dat$res.loop <- resid(resid.loop) #residuals for that lake
lm.loop <- lm(res.loop~SAMPLE_DATE,data=dat) #linear model of residuals vs year
co <- coef(lm.loop)
plot <- ggplot(dat,aes(x=SAMPLE_DATE,y=res.loop))+ #plot lake
geom_point() +
labs(title=paste(lake),x='year',y='residuals of LM(logCHLA~logTP)')
if(summary(lm.loop)$coefficients[,4]<0.05){ #plot significant lakes
plot <- plot + geom_smooth(method='lm')
}else{
plot <- plot + geom_smooth(method='lm',aes(color="red"),show.legend=FALSE) #plot insignificant lakes
}
if(lake %in% incongruous$LAKE_ID){
plot <- plot + theme(panel.border = element_rect(colour = "red", fill=NA, size=3)) #color incongruous lakes
}
print(plot) #print
coeffs[nrow(coeffs)+1,] <- c(lake,
co,
summary(lm.loop)$coefficients[2,4],
summary(lm.loop)$r.squared)
}Coefficients for linear models of residuals along gradient of time for each lake. Bold values indicate significant p-values (<0.05). Asterisks (***) indicate that the lake had a negative chlorophyll trend and a flat or positive phosphorous trend.
| lake | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| 0202CHA0122 | 0.944152825865916 | -7.53079675136563e-05 | 9.56194504310585e-12 | 0.173548887674343 |
| 0202FIN0153 | 0.635749577648235 | -5.9507857767644e-05 | 0.000174570204931618 | 0.0712769366388969 |
| 0302SOD0096 | 0.48666769255914 | -3.82719485604449e-05 | 0.0106414539140654 | 0.0540104682370989 |
| 0403SIL0115 | 0.502854823465325 | -4.10077102364865e-05 | 0.00183205382752677 | 0.0576617991573737 |
| 0602BRA0154 | 1.06149274566162 | -8.763415619743e-05 | 1.9026283344591e-11 | 0.255844767247595 |
| 0602CRO0073 | -0.0279684006324571 | 2.34944278403729e-06 | 0.846100684208956 | 0.000251961954288501 |
| 0602EAR0146 | 0.851959664465121 | -6.94640784852235e-05 | 4.50387751003872e-06 | 0.124292588234536 |
| 0602EAT0163 | 1.3623772971199 | -0.000112118118047524 | 1.6356436919286e-15 | 0.344014792791017 |
| 0602HAT0155 | 0.593254365900425 | -4.89064089889259e-05 | 2.94395645687316e-05 | 0.106817806226883 |
| 0602LEB0153 | 1.11902446562697 | -9.78474627625745e-05 | 1.5274614012525e-11 | 0.271692914186169 |
| 0602MEL0039 | 0.63727113476422 | -5.08639374049919e-05 | 6.48105278768945e-06 | 0.0866190932257855 |
| 0602MOR0152 | 0.891944581461988 | -7.51301268569577e-05 | 2.05080228334012e-08 | 0.113300646478107 |
| 0602PET0078 | 0.331789456726402 | -2.531689333677e-05 | 0.0369076621152344 | 0.0234515777766317 |
| 0703CAZ0153 | 0.449212205317357 | -3.63154232384546e-05 | 0.00283750828850936 | 0.0539950072335631 |
| 0703DER0139A | 1.53095378510904 | -0.000125733656249714 | 7.94950449808548e-27 | 0.373941960810729 |
| 0703TUS0153A | 0.949073117601567 | -8.1672021690225e-05 | 2.25647663987518e-10 | 0.148365549026748 |
| 0705COM0333 | 0.673891148746326 | -5.21493736896624e-05 | 0.000318633359493314 | 0.064448084168005 |
| 0705DUC0222 | 1.14655293221319 | -9.03839962163188e-05 | 8.00695783808612e-07 | 0.17024507008015 |
| 0801SEC0782B | 0.129851054598145 | -1.10468600256644e-05 | 0.246408207564907 | 0.00725564585045497 |
| 0906BLA0001 | 1.10590132446955 | -9.06208131345451e-05 | 6.25783448358663e-08 | 0.127600869619626 |
| 0906BON0024 | 0.114791426411991 | -8.89386232928277e-06 | 0.245389759332291 | 0.0079291818651228 |
| 0906BUT0054 | 0.796388583921065 | -6.64826965822309e-05 | 1.57274435460662e-06 | 0.0944090894268697 |
| 1005GLE0441 | 0.367866424215461 | -2.90789951287784e-05 | 0.0136180282723028 | 0.0361108061211256 |
| 1102BAB1109 | 0.730036058917398 | -5.88644234673037e-05 | 0.000188977510494215 | 0.0604552801273828 |
| 1104GOO0672A | 0.607548239418278 | -5.2413319085535e-05 | 0.00132457826703812 | 0.0722238220775747 |
| 1104SAC0314 | 0.58979347330651 | -5.07916555060258e-05 | 2.63834051446714e-05 | 0.127488749571478 |
| 1104SCH0374 | 1.52501684883298 | -0.000116213381005224 | 5.64158238577791e-10 | 0.202904429881148 |
| 1302CAR0062A | 0.435295439195984 | -3.9720317605433e-05 | 0.00432172076770994 | 0.0962057148638054 |
| 1302WAC0117 | 0.0706969591230926 | -5.66313319549475e-06 | 0.602348849006115 | 0.00151966569190011 |
| 1308ROB0902 | 0.728387033604061 | -5.34849245969702e-05 | 0.00432618905257373 | 0.0648110381252875 |
| 1310QUE0057 | 0.508237857356091 | -3.99890117026577e-05 | 0.00447872961409716 | 0.0398814593146024 |
| 1401ANA0251 | 0.467390753646235 | -3.66754111085952e-05 | 0.00387574012362269 | 0.0386700308115073 |
| 1402WOL0037 | 1.12013529268377 | -9.92151035598942e-05 | 1.83213731872521e-08 | 0.231067618390379 |
| 1404OQU0383 | 1.09146002550101 | -8.28530634906585e-05 | 7.14167052707757e-08 | 0.150071736337309 |
| 1501LUC0982B | -0.217136136363866 | 1.76125478750811e-05 | 0.380000553377481 | 0.00714356847465035 |
| 1701LFR0662 | 1.42885257134606 | -0.000105374054264106 | 3.28397374780158e-06 | 0.119867716701038 |
Dot plot of slope for each lake. Grey dots are statistically insignificant (p>0.05).