Calculating Reference Intervals for Analytes in R

2024-02-01

Calculating reference intervals for analytes used in assessing liver health

Normal values of analytes can vary depending on several factors:\(^1\)

- The nature of the population being sampled (sex, age, treatment status)
- Regional variations (temperatures, pressures, altitudes)
- The method used to measure the analyte

To help distinguish healthy from disease states, CLSI recommends a reference interval that covers the central 95% of healthy individuals in a population, with a 90% confidence interval reported for each limit.

Selecting Populations

Hepatitis C Data

- Contains analyte data from 615 anonymized patients:
  - 533 disease negative
  - 7 of unknown status
  - 75 in a disease state

Reference ranges should be calculated separately by sex and age. For this presentation, we will be establishing a reference range for the AST marker in healthy adult men and women.
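As a minimal sketch of this selection step, the snippet below filters a small stand-in data frame down to healthy adults and splits AST by sex. The column names (`Category`, `Age`, `Sex`, `AST`), the `"0=Blood Donor"` label, the adult cutoff of 18 years, and the toy rows are all assumptions for illustration; check them against the actual file before reusing this.

```r
# Hypothetical stand-in for the HCV data (values invented for illustration)
hcv <- data.frame(
  Category = c("0=Blood Donor", "0=Blood Donor", "1=Hepatitis",
               "0=Blood Donor", "0s=suspect Blood Donor"),
  Age = c(32, 45, 51, 17, 60),
  Sex = c("m", "f", "m", "f", "m"),
  AST = c(22.1, 19.4, 98.0, 20.3, 35.6)
)

# Keep only disease-negative adults (assumed cutoff: 18 years and older)
healthy_adults <- hcv[hcv$Category == "0=Blood Donor" & hcv$Age >= 18, ]

# One AST vector per sex, so each group gets its own reference interval
ast_by_sex <- split(healthy_adults$AST, healthy_adults$Sex)
ast_by_sex
```

Splitting first keeps the later calculations simple: each element of `ast_by_sex` is a plain numeric vector that can be passed to `mean()`, `sd()`, and `var()`.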

HCV sample data provided by ZAHRA AMINI on Kaggle.com https://www.kaggle.com/datasets/aminizahra/hcv-data

Population histogram for AST among healthy adult males and females

Scatter plot of AST values among healthy and diseased populations

Basic Statistics calculations

First we calculate the mean:

\[ Mean = \bar{x} = \frac{\sum_{i=1}^{n}x_{i}}{n} \]

Then find the standard deviation:

\[ SD = \large\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}{n-1}} \]

Then calculate the variance, which is the square of the standard deviation:

\[ s^2 = \large\frac{1}{n-1}\sum_{i=1}^n(x_{i}-\bar{x})^2 \]
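These statistics can be checked numerically. The sketch below computes them by hand using the usual sample definitions (squared deviations from the mean, divided by n - 1) and compares the results with R's built-in `mean()`, `var()`, and `sd()`. The vector `x` is made up for illustration and is not taken from the HCV data.

```r
# Hypothetical AST-like values, for illustration only
x <- c(18, 22, 25, 27, 30, 33, 41)
n <- length(x)

x_bar <- sum(x) / n                   # mean
s2 <- sum((x - x_bar)^2) / (n - 1)    # sample variance
s <- sqrt(s2)                         # sample standard deviation

# Each comparison should report TRUE
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
all.equal(s, sd(x))
```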

Calculations to find a 95% reference interval with 90% confidence intervals

To find the upper and lower limits of the analyte for a specific population we can use the formula\(^2\):

\[ Limits = Mean \pm 1.96 \times Standard Deviation \]
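A short sketch of this step follows. The helper name `ref_limits` and the sample values are hypothetical. Note that the multiplier 1.96 corresponds to the central 95% of a normal distribution, so this form assumes the analyte is roughly normally distributed in the chosen population.

```r
# Hypothetical helper: lower and upper reference limits as mean +/- 1.96 SD
ref_limits <- function(x) {
  x <- x[!is.na(x)]   # drop missing measurements
  m <- mean(x)
  s <- sd(x)
  c(lower = m - 1.96 * s, upper = m + 1.96 * s)
}

# Hypothetical AST-like values, for illustration only
x <- c(18, 22, 25, 27, 30, 33, 41, NA)
lims <- ref_limits(x)
lims
```

The two limits are symmetric around the mean, which is one quick sanity check on the output.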

Then calculate a 90% confidence interval for each limit using the following formula\(^2\):

\[ Limit\: \pm\: 1.64 \: \times\: \sqrt{Variance \times(\frac{1}{n} + \frac{2}{n-1})} \]
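The formula can be sketched as a small function that returns both ends of the 90% confidence interval for a single limit. The name `limit_ci` and the inputs in the example call are hypothetical, chosen only to show the shape of the calculation.

```r
# Hypothetical helper: 90% confidence interval around one reference limit,
# following Limit +/- 1.64 * sqrt(Variance * (1/n + 2/(n - 1)))
limit_ci <- function(limit, variance, n) {
  margin <- 1.64 * sqrt(variance * (1 / n + 2 / (n - 1)))
  c(lower = limit - margin, upper = limit + margin)
}

# Illustrative inputs only (not from the HCV data)
ci <- limit_ci(limit = 40, variance = 49, n = 100)
ci
```

Because the margin shrinks as `n` grows, larger reference populations give tighter confidence intervals around each limit.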

Sample R code for finding reference interval

library(dplyr)  # provides %>%, select(), all_of(), and pull()

confidence_int <- function(dataset, test) {
  # extract the requested analyte column as a vector, dropping missing values
  testdata <- dataset %>%
    select(all_of(test)) %>%
    na.omit() %>%
    pull()
  n <- length(testdata)        # sample size
  mean_pop <- mean(testdata)   # mean
  sd_pop <- sd(testdata)       # standard deviation
  var_pop <- var(testdata)     # variance

  # reference limits: mean +/- 1.96 standard deviations
  upper_limit <- mean_pop + (1.96 * sd_pop)
  lower_limit <- mean_pop - (1.96 * sd_pop)

  # half-width of the 90% confidence interval around each limit
  margin <- 1.64 * sqrt(var_pop * (1 / n + 2 / (n - 1)))

  # the margin is added to both limits, so each returned value is the
  # upper end of that limit's 90% confidence interval
  low <- lower_limit + margin
  high <- upper_limit + margin

  low <- max(low, 0)  # an analyte level cannot be below zero
  c(low, high)
}

The AST range for healthy adult males is 14.47 to 43.9.
The AST range for healthy adult females is 9.679 to 40.75.

A selection of reference intervals for common liver analytes

References

1 - Schoonjans, F. (n.d.). Reference interval. MedCalc. Retrieved February 1, 2024, from https://www.medcalc.org/manual/referenceinterval.php

2 - Learn how reference range is determined for laboratory tests. (n.d.). ClinLab Navigator. Retrieved February 1, 2024, from https://www.clinlabnavigator.com/reference-ranges.html