Introduction

Reckhow and Simpson (1980) provide a method to predict lake phosphorus concentrations and, thereby, assess lake trophic state. An important contribution of their work was the inclusion of nonparametric error analysis. By including this uncertainty estimate the authors provide a key measure of the value of the information and facilitate risk assessment during the application of model results.

This type of model is a great candidate for interactive data visualizations in R. The 1980 publication presents the results in static tables and figures. A previous update to the model, built in Excel by Reckhow and Hession, provides interactive spreadsheets and tables. I used the data in Reckhow and Simpson (1980) to demonstrate how the same model might be presented in R to facilitate exploration and application of their results.

It is important to note that this model only applies to north temperate lakes. Table 1 in Reckhow and Simpson (1980) provides a set of minimum and maximum values for phosphorus concentration, phosphorus loading, and areal water loading. The model is only intended for lakes meeting these criteria.

Reckhow KH, Simpson JT (1980) A procedure using modeling and error analysis for the prediction of lake phosphorus concentration from land use information. Canadian Journal of Fisheries and Aquatic Sciences 37(9):1439-1448. DOI: 10.1139/f80-184

Data Preparation

Code Book

Variable Definition Units
P phosphorus concentration mg/L
L phosphorus loading g/m2/yr
qs areal water loading m/yr
vs apparent phosphorus settling velocity m/yr
Q inflow water volume to lake 10^6 m3/yr
Ad watershed area 10^6 m2
Ao lake surface area 10^6 m2
r total annual unit runoff m/yr
Pr mean annual net precipitation m/yr
EcFr(Hi,Lk,Lo) export coefficient for forest land (high, most likely, low) kg/10^6 m2/yr
EcAg(Hi,Lk,Lo) export coefficient for agricultural land (high, most likely, low) kg/10^6 m2/yr
EcUr(Hi,Lk,Lo) export coefficient for urban area (high, most likely, low) kg/10^6 m2/yr
EcPr(Hi,Lk,Lo) export coefficient for precipitation (high, most likely, low) kg/10^6 m2/yr
EcSt(Hi,Lk,Lo) export coefficient for septic tanks (high, most likely, low) kg/(capita/yr)/yr
AreaFr area of forest land 10^6 m2
AreaAg area of agricultural land 10^6 m2
AreaUr area of urban land 10^6 m2
NCap number of capita years in watershed serviced by septic tank/tile filed systems impacting the lake N
SR(Hi,Lk,Lo) soil retention coefficient (high, most likely, low) dimensionless
PSI(Hi,Lk,Lo) point source input (high, most likely, low) kg/yr
Ind(P,S) average number of persons per living unit (Permanent, Seasonal) N
Days(P,S) number of days spent at unit per year (Permanent, Seasonal) N
Units(P,S) number of living units (Permanent, Seasonal) N
M total phosphorus mass loading kg/yr
smlog log model error logarithmic units
sm(pos,neg) model error (pos,neg) mg/L
sL(pos,neg) loading error (pos,neg) mg/L
sT(pos,neg) uncertainty (pos,neg) mg/L
h multiplier constant unitless
pbounded probability that the confidence limits contain the true value probability
limit(up/lo) confidence limits (up,lo) mg/L

At present, only the Higgins Lake data have been entered into the data tables. This code requires five input data frames as follows:

  1. lake: Name, Ad, r, Ao, Pr
  2. area: Name, For, Agr, Urb
  3. coHi: Name, ECFrHi, EcAgHi, EcUrHi, EcPrHi, EcStHi, SRHi, PSIHi
  4. coLk: Name, ECFrLk, EcAgLk, EcUrLk, EcPrLk, EcStLk, SRLk, PSILk
  5. coLo: Name, ECFrLo, EcAgLo, EcUrLo, EcPrLo, EcStLo, SRLo, PSILo

The data could be maintained in single table, but the assumption is that a database design could be implemented and the separate data.frmes represent data chunks pulled from different sources and likely maintained (updated) on different schedules by different individuals. The design is also chosen to expedidite analysis later.

# TODO: change to read external csv files and select lake data by lake name
lake <- data.frame(Name="Higgins", Ad=87.41, r=0.2415, Ao=38.4, Pr=0.254)
area <- data.frame(Name="Higgins", For=83.47, Agr=0.16, Urb=3.78)
coHi <- data.frame(Name="Higgins", EcFrHi=40, EcAgHi=80, EcUrHi=150, EcPrHi=50, EcStHi=1, SRHi=0.5, PSIHi=0)
coLk <- data.frame(Name="Higgins", EcFrLk=20, EcAgLk=30, EcUrLk=90,EcPrLk=30, EcStLk=0.6, SRLk=0.75, PSILk=0)
coLo <- data.frame(Name="Higgins", EcFrLo=2, EcAgLo=10, EcUrLo=50, EcPrLo=15, EcStLo=0.3, SRLo=0.95, PSILo=0)
# NB: hi-lo SR values appear reversed because will be 1-value in equations
pop <- data.frame(Name="Higgins", IndP=0, DaysP=0, UnitsP=0, IndS=3.5, DaysS=60, UnitsS=1000)
# TODO: add high and low future scenarios
# TODO: add input error check to confirm that input parameters conform to restrictions in Table 1.

Data Analysis to Calculate Lake Phosphorus Concentration

Steps 1 through 3 in Reckhow and Simpson (1980)

Estimate Areal Water Loading

Q <- (lake$Ad*lake$r)+(lake$Ao*lake$Pr)
qs <- Q/lake$Ao

The total inflow volume of water to lake Higgins is 30.86 106 m3/yr and the estimated areal water loading is 0.8.

Estimate Export Coefficients

High, low, and most likely coefficient values estimated from literature review and best professional judgement are in the coHi, CoLk, and CoLo dataframes.

Estimate Number of Per Capita years

NCap <- (pop$IndP*(pop$DaysP/365)*pop$UnitsP)+(pop$IndS*(pop$DaysS/365)*pop$UnitsS)

Higgins has 1000 seasonal homes and 0 permanent homes. The estimated number of per capita years is 575.34.

Estimate Soil Retention Coefficient

High, low, and most likely coefficient values estimated from literature review and best professional judgement are in the coHi, CoLk, and CoLo dataframes.

Estimate Point Source Input

These were 0 kg/yr in the published model for the study lake because it had no known point source inputs. If point sources were present, then a High, Low, and Most Likey value would need to be provided. These estimates would come from literature review and best professional judgement and would be provided via the coHi, CoLk, and CoLo dataframes.

Calculate Total Phosphorus Mass Loading

MHi <- (area$For*coHi$EcFrHi)+(area$Agr*coHi$EcAgHi)+(area$Urb*coHi$EcUrHi)+(lake$Ao*coHi$EcPrHi)+(NCap*coHi$EcStHi*(1-coHi$SRHi))+coHi$PSIHi

MLk <- (area$For*coLk$EcFrLk)+(area$Agr*coLk$EcAgLk)+(area$Urb*coLk$EcUrLk)+(lake$Ao*coLk$EcPrLk)+(NCap*coLk$EcStLk*(1-coLk$SRLk))+coLk$PSILk

MLo <- (area$For*coLo$EcFrLo)+(area$Agr*coLo$EcAgLo)+(area$Urb*coLo$EcUrLo)+(lake$Ao*coLo$EcPrLo)+(NCap*coLo$EcStLo*(1-coLo$SRLo))+coLo$PSILo
#TODO: vectorize this code for efficiency

barplot(c(MHi,MLk,MLo), main="Total Phosphorus Mass Loading", 
        ylab="kg/yr", xlab="Scenario", col="blue",
        names.arg=c("High", "Most Likely", "Low"))

The mass loading estimates for Higgins are: High = 6126.27, Most Likely = 3252.7, and Low = 942.17.

Calculate Annual Areal Phosphorus Loading

LHi <- (MHi/lake$Ao)*10^-3
LLk <- MLk/lake$Ao*10^-3
LLo <- MLo/lake$Ao*10^-3
#TODO: vectorize this code for efficiency

barplot(c(LHi,LLk,LLo), main="Annual Areal Phosphorus Loading", 
        ylab="g/m^2/yr", xlab="Scenario", col="blue",
        names.arg=c("High", "Most Likely", "Low"))

The areal loading estimates for Higgins are: High = 0.16, Most Likely = 0.08, and Low = 0.02.

Calculate Lake Phosphorus Concentration

PHi<- LHi/(11.6+(1.2*qs))
PLk<- LLk/(11.6+(1.2*qs))
PLo<- LLo/(11.6+(1.2*qs))
#TODO: vectorize this code for efficiency

barplot(c(PHi,PLk,PLo), main="Lake Phosphorus Concentration", 
        ylab="mg/L", xlab="Scenario", col="blue",
        names.arg=c("High", "Most Likely", "Low"))

The phosphorus concentration estimates for Higgins are: High = 0.01, Most Likely = 0.01, and Low = 0.

Data Analysis of Model Uncertainty

Step 4 in Reckhow and Simpson (1980)

Calculate Log of Most-Likely Phosphorus Value

logPLk <- log10(PLk)

Estimate Model and Loading Errors

This calculation requires the value smlog, model error, as one of the inputs. It is not clear how the value, 0.128, was obtained.

#Positive and negative model errors
smpos <- (10^(logPLk+0.128))-PLk
smneg <- (10^(logPLk-0.128))-PLk

#Positive and negative loading errors
sLpos <- (PHi-PLk)/2
sLneg <- (PLk-PLo)/2

#Positive and negative uncertainty
sTpos <- sqrt((smpos)^2+(sLpos)^2)
sTneg <- sqrt((smneg)^2+(sLneg)^2)

# Confidence limits
# multiple of prediction error, h=1=55% bounds, h=2=90% bounds
h = 2 
pbounded <- 1-(1/(2.25*h^2))
limitlo <- PLk-(h*sTneg)
limitup <- PLk+(h*sTpos)

There is a 90% probability that the true phosphorus concentration in Higgins falls between 0.0008 mg/L and 0.0143 mg/L.

Trophic State Determination

#determine where lower bound falls
if (limitlo<0.010){
    trophlo <- "oligotrophic"
} else if(0.01<limitlo & limitlo<0.020){
        trophlo <- "mesotrophic"
    } else if (0.020<limitlo & limitlo<0.050){
            trophlo <- "eutrophic"
        } else {
            trophlo <- "hypereutrophic"
        }
#determine where upper bound falls
if (limitup<0.010){
    trophup <- "oligotrophic"
} else if(0.01<limitup & limitup<0.020){
        trophup <- "mesotrophic"
    } else if (0.020<limitup & limitup<0.050){
            trophup <- "eutrophic"
        } else {
            trophup <- "hypereutrophic"
        }

The lower bound falls in the oligotrophic state and the upper bound falls in the mesotrophic state.

Next Steps

With this code I have all the components into R and have confirmed that the output at each step matches the published values. My goal is to include the ability to interactively explore the model by extending this example into R Shiny. Immediate next steps could be to:

  1. Add a few additional lakes to allow user to select a lake for analysis.
  2. Allow user to vary (slider bars) the values of P source parameters to see the effect on a known lake. This could allow side-by-side comparison of alternative water quality management actions or targets.
  3. Allow user to vary (slider bars) the most likely values for the variables estimated from literature review and best professional judgement.