Non-detect method examples

Site_ID	Region	Date	Concentration	Detect_Limit
270535501	1	2009-07-30	0.00148	0.10
270535501	1	2009-09-30	0.00064	0.10
270535501	1	2009-11-30	0.24256	0.10
270535501	1	2009-12-30	0.00064	0.10
270535501	1	2010-01-30	0.16001	0.10
270535501	1	2010-03-30	0.19300	0.10
270535502	1	2009-03-30	0.21219	0.01
270535502	1	2009-07-30	0.01113	0.01
270535502	1	2009-09-30	0.00044	0.01
270535502	1	2009-11-30	0.00127	0.01
270535502	1	2009-12-30	0.00613	0.01
270535502	1	2010-01-30	0.02216	0.01
270535502	1	2010-03-30	0.08113	0.02

Methods

For air monitoring data we currently use the EnvStats package to generate annual summary values and confidence intervals. The package provides multiple non-detect methods that will automatically calculate confidence intervals for you. The methods below will help reduce the effects of different detection limits between sites and years.

To begin, add a column indicating if the sample was detected or not. When the concentration is less than the detection limit the Censored column is set to TRUE.

df$Censored <-  (df$Concentration) < (df$Detect_Limit)

Site_ID	Region	Date	Concentration	Detect_Limit	Censored
270535501	1	2009-07-30	0.00148	0.1	TRUE
270535501	1	2009-09-30	0.00064	0.1	TRUE
270535501	1	2009-11-30	0.24256	0.1	FALSE
270535501	1	2009-12-30	0.00064	0.1	TRUE
270535501	1	2010-01-30	0.16001	0.1	FALSE
270535501	1	2010-03-30	0.19300	0.1	FALSE

Similar to NADA, EnvStats will assume a detection limit is located at the value of the lowest detected concentration. To force the summary to show the correct detection limits you can substitute the detection limit in for the non-detected values.

# Save the original mean of the raw values
raw_mean <- mean(df$Concentration, na.rm=T)

# Replace non-detects with the detection limit
df$Concentration <- ifelse(df$Censored, df$Detect_Limit, df$Concentration)

Next we can use EnvStats to estimate the mean for the entire region. Because there are multiple detection limits the potential methods we can use are more limited. For multiple detection limits with limited amounts of detections (less than 10), Helsel recommends using Kaplan-Meier. Here’s an example of running Kaplan-Meier in EnvStats using the function enparCensored().

Kaplan-Meier

library(EnvStats)

# Kaplan-Meier

# summary for all sites and dates


km_summary <- enparCensored(df$Concentration, 
                            df$Censored, 
                            ci=T,
                            ci.method = "bootstrap",
                            n.bootstraps = 3000)

print(km_summary)

## 
## Results of Distribution Parameter Estimation
## Based on Type I Censored Data
## --------------------------------------------
## 
## Assumed Distribution:            None
## 
## Censoring Side:                  left
## 
## Censoring Level(s):              0.01 0.10 
## 
## Estimated Parameter(s):          mean    = 0.07879923
##                                  sd      = 0.08648545
##                                  se.mean = 0.01346641
## 
## Estimation Method:               Kaplan-Meier
## 
## Data:                            df$Concentration
## 
## Censoring Variable:              df$Censored
## 
## Sample Size:                     13
## 
## Percent Censored:                46.15385%
## 
## Confidence Interval for:         mean
## 
## Assumed Sample Size:             13
## 
## Confidence Interval Method:      Bootstrap
## 
## Number of Bootstraps:            3000
## 
## Number of Bootstrap Samples
## With No Censored Values:         1
## 
## Number of Times Bootstrap
## Repeated Because Too Few
## Uncensored Observations:         1
## 
## Confidence Interval Type:        two-sided
## 
## Confidence Level:                95%
## 
## Confidence Interval:             Pct.LCL = 0.03936345
##                                  Pct.UCL = 0.11841510
##                                  BCa.LCL = 0.03888024
##                                  BCa.UCL = 0.11788602
##                                  t.LCL   = 0.04240100
##                                  t.UCL   = 0.16189737

To pull out the mean value from the results use: $parameters[[1]]

km_mean <- enparCensored(df$Concentration, df$Censored)$parameters[[1]]

paste("The Kaplan-Meier mean is", round(km_mean, 3))

## [1] "The Kaplan-Meier mean is 0.079"

NADA

Compare with the NADA package

library(NADA)

Nada_KM_Mean <- cenfit(df$Concentration, df$Censored)

print(Nada_KM_Mean)

##           n       n.cen      median        mean          sd 
## 13.00000000  6.00000000  0.02216000  0.07919038  0.09448937

ROS and MLE

When you have larger data sets there are additional methods such as ROS and MLE, which use the the observed distribution of the detected values and applies it to predict the location of non-detect values. These methods usually generate less conservative (lower estimates of the mean), but in most cases are a closer approximation to reality. Below are 2 examples of using the ROS and MLE methods in the EnvStats package.

library(EnvStats)

# ROS

# Summary across all sites and dates

ros_summary <- enormCensored(df$Concentration, 
                             df$Censored, 
                             method = "rROS", 
                             ci = F,
                             lb.impute = min(df$Detect_Limit) * 0.10,
                             ub.impute = max(df$Detect_Limit))

# The lower bound imputed limt (lb.impute) sets the minimum assigned value for non-detects. If you know the background concentration for a body of water that could be a good minimum value. The example above uses 10% of the minimum detection limit for the lower bound.

# The upper bound imputed limt (ub.impute) sets the maximum assigned value for non-detects. It's set at the highest detection limit above.

print(ros_summary)

## 
## Results of Distribution Parameter Estimation
## Based on Type I Censored Data
## --------------------------------------------
## 
## Assumed Distribution:            Normal
## 
## Censoring Side:                  left
## 
## Censoring Level(s):              0.01 0.10 
## 
## Estimated Parameter(s):          mean = 0.07624475
##                                  sd   = 0.09239318
## 
## Estimation Method:               Imputation With Q-Q Regression (rROS)
## 
## Plotting Position Method:        hirsch-stedinger
## 
## Plotting Position Constant:      0.375
## 
## Data:                            df$Concentration
## 
## Censoring Variable:              df$Censored
## 
## Sample Size:                     13
## 
## Percent Censored:                46.15385%

# MLE

#summary across all sites and dates

mle_summary <- enormCensored(df$Concentration, 
                             df$Censored, 
                             method = "mle", 
                             ci = F)

print(mle_summary)

## 
## Results of Distribution Parameter Estimation
## Based on Type I Censored Data
## --------------------------------------------
## 
## Assumed Distribution:            Normal
## 
## Censoring Side:                  left
## 
## Censoring Level(s):              0.01 0.10 
## 
## Estimated Parameter(s):          mean = 0.04964891
##                                  sd   = 0.12195139
## 
## Estimation Method:               MLE
## 
## Data:                            df$Concentration
## 
## Censoring Variable:              df$Censored
## 
## Sample Size:                     13
## 
## Percent Censored:                46.15385%

1/2 the DL

Compare to substitution of 1/2 the detection limt

sub_half <- df

sub_half$Concentration <- ifelse(df$Censored, 
                                 df$Detect_Limit * 0.5, 
                                 df$Concentration)

sub_half_DL_Mean <- mean(sub_half$Concentration)

print(sub_half_DL_Mean)

## [1] 0.08362923

Random substitution

Compare to random substitution between 0 and the detection limit

set.seed(8*3*2016)

sub_random <- df

sub_random$Concentration <- ifelse(df$Censored, 
                                   runif(1, 0, df$Detect_Limit), 
                                   df$Concentration)

sub_random_Mean <- mean(sub_random$Concentration)

print(sub_random_Mean)

## [1] 0.08726135

Single detection limit

When your data has a single detection limit, as for air monitoring data from a single year, annual statistics can be generated using the imputed MLE method below.

# Imputed MLE

#summary across a single site with one detection limit

site1 <- filter(df, Site_ID == 270535501)

mle_summary <- enormCensored(site1$Concentration, 
                             site1$Censored, 
                             method = "impute.w.mle",
                             lb.impute = min(site1$Detect_Limit) * 0.10,
                             ci = T,
                             ci.method = "bootstrap")

#print(mle_summary)