Headline message

The use of a temporal adjustment method to estimate the long term mean concentrations based on a small sample is a difficult one and one needs to be extremely careful when selecting such method.

This very superficial and simple analysis suggests that the error in estimating an annual mean from a subset is dependent on the relationship between the reference and the rest of the sites as well as the length of the period used to estimate the annual mean. If the reference site is completely independent from the rest of the sites, then the errors are expected to be large but if the reference is chosen as background and there is a common temporal variation with the rest of the sites, the errors are expected to be smaller. Also, shorter periods (here we simulated one week and one month) lead to significantly larger expected errors.

Finally, for all the tests performed, the difference method resulted in smaller expected errors than the ratio method.

Analysis

This is an attempt at clarifying my understanding of the temporal adjustment applied to integrated samples within the SPARTANZ project. The problem can be summarized as follows:

We have data from several sites during a measurement campaign where only a small set of those sites were sampled for the duration of the campaign while the rest only have information for part of the period.

Hoek et al. (2002) applies the following:

To limit potential bias in the comparison of annual averages across sites due to temporal variation, an adjustment was conducted using data from the continuous measurement site. The adjustment procedure included calculation of the annual average concentration for the continuous measurement site (involving all 16 or more measurement periods); calculation for the continuous measurement site of the difference and the ratio of the measurement for period t (t = 1-16 or more) from the annual average; multiplication of the measurement at site i (i = 1-40) in period t with the adjustment factor for period t (or addition of the difference for period t) and finally calculation of the annual average concentration from the adjusted concentration values.

To test the suitability of these methods, let’s create some artificial datasets representing different ideals. All the datasets will consist of 50 sites, plus one labelled reference site

set.seed(1)

Random measurements are completely independent of each other

This is simulated by extracting all the datapoints from a log-normal distribution whose maximum is also extracted from a log-normal distribution (mean=0, sd=1). This gives 50 random time series with random means.

independent_data <- as.data.frame(matrix(data = NA, nrow = 360,ncol = 50))
independent_data$reference <- rlnorm(360,0,1)
for (i in (1:50)){
  independent_data[,i]<-rlnorm(360,rlnorm(1,0,1),1)
}

Then we calculate the annual mean for all the variables and we divide the year into 45 weeks of 8 days each and calculate the mean for all those periods.

independent_data.periods <- subset(independent_data,as.numeric(row.names(independent_data))<=45)
for (i in (1:45)){
  independent_data.periods[i,] <-colMeans(independent_data[((i-1)*8+1):(i*8),])
}

Now, we calculate the annaul mean for the reference variable and the weighting vector for temporal adjustment.

ref_mean <- mean(independent_data$reference)
independent_data.periods$temp_adj1 <- ref_mean / independent_data.periods$reference
independent_data.periods$temp_adj2 <- ref_mean - independent_data.periods$reference

With the mean and temporal adjustments we calculate the annual mean estimates for each period and compare them to the actual annual mean for the variables.

annual_obs<-colMeans(independent_data.periods[,1:51],na.rm = TRUE)
annual_estimate <- independent_data.periods
annual_estimate$temp_adj1<-NULL
annual_estimate$temp_adj2<-NULL
annual_errors <- independent_data.periods
annual_errors$temp_adj1<-NULL
annual_errors$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate[i,] <-independent_data.periods[i,1:51]*independent_data.periods$temp_adj1[i]
  annual_errors[i,] <- 100*(annual_estimate[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by ratio',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')
 annual_estimate2 <- independent_data.periods
annual_estimate2$temp_adj1<-NULL
annual_estimate2$temp_adj2<-NULL
annual_errors2 <- independent_data.periods
annual_errors2$temp_adj1<-NULL
annual_errors2$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate2[i,] <-independent_data.periods[i,1:51]+independent_data.periods$temp_adj2[i]
  annual_errors2[i,] <- 100*(annual_estimate2[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by difference',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')

Reference is the “background”

This is simulated by first extracting the refrerence dataset from a log-normal distribution (mean=0, sd=1). Then, each sample location data is obtained by adding values taken from a log-normal distribution (mean=0, sd=1) to the reference value for that point. This gives 50 random time series with random means that are all bigger than the reference values and thus simulating a random variation in the data over a common background.

independent_data <- as.data.frame(matrix(data = NA, nrow = 360,ncol = 50))
independent_data$reference <- rlnorm(360,0,1)
for (i in (1:50)){
  independent_data[,i]<-rlnorm(360,rlnorm(1,0,1),1)+independent_data$reference
}

Then we calculate the annual mean for all the variables and we divide the year into 45 weeks of 8 days each and calculate the mean for all those periods.

independent_data.periods <- subset(independent_data,as.numeric(row.names(independent_data))<=45)
for (i in (1:45)){
  independent_data.periods[i,] <-colMeans(independent_data[((i-1)*8+1):(i*8),])
}

Now, we calculate the annaul mean for the reference variable and the weighting vector for temporal adjustment.

ref_mean <- mean(independent_data$reference)
independent_data.periods$temp_adj1 <- ref_mean / independent_data.periods$reference
independent_data.periods$temp_adj2 <- ref_mean - independent_data.periods$reference

With the mean and temporal adjustments we calculate the annual mean estimates for each period and compare them to the actual annual mean for the variables.

annual_obs<-colMeans(independent_data.periods[,1:51],na.rm = TRUE)
annual_estimate <- independent_data.periods
annual_estimate$temp_adj1<-NULL
annual_estimate$temp_adj2<-NULL
annual_errors <- independent_data.periods
annual_errors$temp_adj1<-NULL
annual_errors$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate[i,] <-independent_data.periods[i,1:51]*independent_data.periods$temp_adj1[i]
  annual_errors[i,] <- 100*(annual_estimate[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by ratio',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')
annual_estimate2 <- independent_data.periods
annual_estimate2$temp_adj1<-NULL
annual_estimate2$temp_adj2<-NULL
annual_errors2 <- independent_data.periods
annual_errors2$temp_adj1<-NULL
annual_errors2$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate2[i,] <-independent_data.periods[i,1:51]+independent_data.periods$temp_adj2[i]
  annual_errors2[i,] <- 100*(annual_estimate2[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by difference',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')

Reference is the “background” and there is a common seasonal variability in the data

This is simulated by first extracting the reference dataset from a log-normal distribution (mean=1, sd=1), modulated by a sinusoidal seasonal trend with larger values in the middle of the period (simulating high concentrations during the southern hemisphere winter). The sample location data added to the reference data is still extracted from a log-normal distribution but the mean of the distribution follows the same sinusoidal seasonal trend but with a larger amplitude (double the one of the reference)

seasonal_adj<-sin(pi*(1:360)/360)
independent_data <- as.data.frame(matrix(data = NA, nrow = 360,ncol = 50))
independent_data$reference <- rlnorm(360,1,1)*seasonal_adj/2
for (i in (1:50)){
    for (j in (1:360)){
      independent_data[j,i]<-rlnorm(1,seasonal_adj[j],1)+independent_data$reference[j]
    }
}

Then we calculate the annual mean for all the variables and we divide the year into 45 weeks of 8 days each and calculate the mean for all those periods.

independent_data.periods <- subset(independent_data,as.numeric(row.names(independent_data))<=45)
for (i in (1:45)){
  independent_data.periods[i,] <-colMeans(independent_data[((i-1)*8+1):(i*8),])
}

Now, we calculate the annaul mean for the reference variable and the weighting vector for temporal adjustment.

ref_mean <- mean(independent_data$reference)
independent_data.periods$temp_adj1 <- ref_mean / independent_data.periods$reference
independent_data.periods$temp_adj2 <- ref_mean - independent_data.periods$reference

With the mean and temporal adjustments we calculate the annual mean estimates for each period and compare them to the actual annual mean for the variables.

annual_obs<-colMeans(independent_data.periods[,1:51],na.rm = TRUE)
annual_estimate <- independent_data.periods
annual_estimate$temp_adj1<-NULL
annual_estimate$temp_adj2<-NULL
annual_errors <- independent_data.periods
annual_errors$temp_adj1<-NULL
annual_errors$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate[i,] <-independent_data.periods[i,1:51]*independent_data.periods$temp_adj1[i]
  annual_errors[i,] <- 100*(annual_estimate[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by ratio',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')
annual_estimate2 <- independent_data.periods
annual_estimate2$temp_adj1<-NULL
annual_estimate2$temp_adj2<-NULL
annual_errors2 <- independent_data.periods
annual_errors2$temp_adj1<-NULL
annual_errors2$temp_adj2<-NULL
for (i in (1:45)){
  annual_estimate2[i,] <-independent_data.periods[i,1:51]+independent_data.periods$temp_adj2[i]
  annual_errors2[i,] <- 100*(annual_estimate2[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors2,main = 'Temporal adjustment by difference',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')

Reference is the “background” and there is a common seasonal variability in the data MONTHLY DATA

The same original data as in the previous section was used but instead of weekly periods, 30 days were used (monthly) instead.

seasonal_adj<-sin(pi*(1:360)/360)
independent_data <- as.data.frame(matrix(data = NA, nrow = 360,ncol = 50))
independent_data$reference <- rlnorm(360,1,1)*seasonal_adj/2
for (i in (1:50)){
    for (j in (1:360)){
      independent_data[j,i]<-rlnorm(1,seasonal_adj[j],1)+independent_data$reference[j]
    }
}

Then we calculate the annual mean for all the variables and we divide the year into 12 months of 30 days each and calculate the mean for all those periods.

independent_data.periods <- subset(independent_data,as.numeric(row.names(independent_data))<=12)
for (i in (1:12)){
  independent_data.periods[i,] <-colMeans(independent_data[((i-1)*30+1):(i*30),])
}

Now, we calculate the annaul mean for the reference variable and the weighting vector for temporal adjustment.

ref_mean <- mean(independent_data$reference)
independent_data.periods$temp_adj1 <- ref_mean / independent_data.periods$reference
independent_data.periods$temp_adj2 <- ref_mean - independent_data.periods$reference

With the mean and temporal adjustments we calculate the annual mean estimates for each period and compare them to the actual annual mean for the variables.

annual_obs<-colMeans(independent_data.periods[,1:51],na.rm = TRUE)
annual_estimate <- independent_data.periods
annual_estimate$temp_adj1<-NULL
annual_estimate$temp_adj2<-NULL
annual_errors <- independent_data.periods
annual_errors$temp_adj1<-NULL
annual_errors$temp_adj2<-NULL
for (i in (1:12)){
  annual_estimate[i,] <-independent_data.periods[i,1:51]*independent_data.periods$temp_adj1[i]
  annual_errors[i,] <- 100*(annual_estimate[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors,main = 'Temporal adjustment by ratio',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')
annual_estimate2 <- independent_data.periods
annual_estimate2$temp_adj1<-NULL
annual_estimate2$temp_adj2<-NULL
annual_errors2 <- independent_data.periods
annual_errors2$temp_adj1<-NULL
annual_errors2$temp_adj2<-NULL
for (i in (1:12)){
  annual_estimate2[i,] <-independent_data.periods[i,1:51]+independent_data.periods$temp_adj2[i]
  annual_errors2[i,] <- 100*(annual_estimate2[i,]-annual_obs)/annual_obs
}
boxplot(annual_errors2,main = 'Temporal adjustment by difference',xlab = 'Sample site',ylab = 'Annual Mean Error [%]')

References

Hoek, Gerard, Kees Meliefste, Josef Cyrys, Marie Lewné, Tom Bellander, Mike Brauer, Paul Fischer et al. “Spatial variability of fine particle concentrations in three European areas.” Atmospheric Environment 36, no. 25 (2002): 4077-4088.