Statistical Analysis of Historical Data and Recommendation for Alert and Action Levels

Purpose

The purpose of this analysis is to prescribe alert and action levels for water conductivity, total organic carbon (TOC), endotoxin, and bioburden environmental monitoring, based on the historical data provided.

Scope

The scope of this analysis, and of the recommendations in it, is limited to the historical data provided. For brevity and simplicity, it does not address sampling methodology, and it assumes normal operating conditions. In this document:

Levels are based on historical data
Initial alert levels are set at population estimates plus two standard deviations
Initial action levels are set at population estimates plus three standard deviations
Recommendations are given for addressing data spikes
Recommendations are given for long-term action levels

Introduction

Within any biopharmaceutical environment, chemical or microbial contamination of water can have a significant negative impact on the manufacturing process, and through it on the product and ultimately the consumer. Environmental monitoring is a key tool for preventing unwanted contaminants: it verifies that the manufacturing process remains consistent with the acceptable levels set out in FDA and EU guidelines. Staying within these levels provides assurance that our products are manufactured in a safe, controlled environment and are free of these contaminants.

Alert limits are intended to signal a potential drift from normal operating conditions, such as an increase in bacterial endotoxin levels in the manufacturing water system. Reaching an alert limit does not necessarily require suspending an operation, but it indicates that operators and key personnel should monitor the process more closely or begin an investigation. Action limits are intended to signal a definitive drift from normal operating conditions, indicating that the system is no longer in a state of manufacturing control. Reaching an action limit requires suspending the operation and opening a formal investigation.
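As a minimal sketch of how a single result is judged against the two tiers, the hypothetical helper below (classify_result is not part of the original analysis) labels each value as normal, alert, or action. It assumes that "reaching" a limit means the result is greater than or equal to it; a site SOP may instead define an excursion as strictly greater.

```r
# Hypothetical helper: classify results against alert and action levels.
# Assumption: a result "reaches" a level when it is >= that level.
classify_result <- function(x, alert, action) {
  stopifnot(alert < action)
  # Missing results propagate as NA instead of being counted as "normal"
  ifelse(x >= action, "action",
         ifelse(x >= alert, "alert", "normal"))
}

classify_result(c(5, 150, 700, NA), alert = 100, action = 600)
# returns c("normal", "alert", "action", NA)
```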

Regular trend analysis of environmental monitoring data is crucial to ensuring that control is maintained under normal operating conditions. Trend analysis helps by:

Identifying any drift from control over time
Identifying issues before alert limits are reached
Allowing risk mitigation before a drift affects products
Supporting readiness for regulatory audits

Per FDA and EU guidelines, the environmental monitoring program must be defined, documented, and maintained, and it must specify:

Sampling locations
Monitoring periodicity
Sampling periodicity
Sampling duration
Sample size (mass, liquid volume, surface area, air volume)
Sampling equipment
Sampling techniques
Alert and action levels
Actions taken when specifications are exceeded

Bioburden and Endotoxin

Purpose of bioburden and endotoxin testing

Bioburden is defined as the population of viable microorganisms on or in a particular object or medium, formulation, or finished product, that is, the number of microorganisms living in or on a surface or medium that has not been sterilized. It is expressed in colony-forming units (CFU), as CFU/mL for water samples. The purpose of bioburden testing is to:

determine the total number of viable microorganisms in or on a medical device, container, or component after completion of all in-process steps before sterilization
act as an early warning system for possible production issues that could lead to inadequate sterilization and possible product recall
calculate the necessary dose for effective sterilization against bacteria
act as an indicator of the overall manufacturing condition

Lipopolysaccharide (LPS) is the target agent in endotoxin testing, and results are expressed in endotoxin units per milliliter (EU/mL). LPS is found in the outer membrane of Gram-negative bacteria (e.g., E. coli). LPS macromolecules range from 10 to 20 kDa in molecular weight. The component responsible for the molecule's toxicity is lipid A, the hydrophobic section that anchors LPS in the bacterial outer membrane. While LPS protects bacteria from bile salts and lipophilic antibiotics, it acts as a pyrogen when introduced into body tissue or blood, causing effects ranging from fever to septic shock.

Figure 1. Endotoxin Structure

Outer membrane components of Gram-negative bacteria are continually released into the environment during bacterial cell division and lysis. Because LPS is an extremely stable molecule (resistant to heat, ethylene oxide, and irradiation) and because it accumulates in heat- and chemical-resistant biofilms, endotoxin monitoring is pivotal to assuring a safe and controlled medical device and pharmaceutical environment.

Establishing a microbial contamination control program is critical to qualifying a new facility and to keeping it in a state of control once qualified. Water is a primary source of bioburden and endotoxin because biofilm readily adheres to and accumulates on pipe surfaces. Flow velocities below about 3 ft/sec are particularly susceptible to biofilm accumulation, whereas faster flow discourages it. A sample can pass endotoxin testing yet fail bioburden testing, or vice versa; it is therefore important to perform both tests in microbial screening.

Legal basis of bioburden and endotoxin testing

The legal basis for bioburden testing lies in Title 21 of the Code of Federal Regulations (21 CFR) in the United States and in ISO 11737 internationally. 21 CFR 211.110(a)(6) requires in-process bioburden testing to be conducted according to written procedures during the manufacture of drug products. The current good manufacturing practice (CGMP) requirements in 21 CFR 211.113, Control of Microbiological Contamination, state: “Appropriate written procedures, designed to prevent objectionable microorganisms in drug products not required to be sterile, shall be established and followed.” The United States Pharmacopeia (USP) provides several tests that can be used to quantify the bioburden of non-sterile drug products.

Water is a frequent source of both bioburden and endotoxin, much of it originating from Gram-negative bacteria. There is no required or established industry method for setting water monitoring levels such as bioburden levels; however, the use of standard deviations is the most common approach because of its simplicity. A misleading argument often raised against standard deviations is that microbial data do not fit a normal distribution. However, the standard deviation remains a useful indicator of the dispersion of the data whether or not the data are normally distributed, even though the familiar coverage percentages attached to it hold exactly only for normal data.

The Federal Register of January 18, 1980, proposed guidelines for determining endotoxins with the Limulus Amebocyte Lysate (LAL) test. The draft guideline was subsequently revised and reissued in 1983. The USP XX, 5th Supplement, revised the Bacterial Endotoxins Test; unlike the FDA draft guideline, however, it included no retest provisions. Water for Injection has an allowable endotoxin limit of 0.25 EU/mL, whereas Bacteriostatic Water for Injection and Sterile Water for Inhalation have been given a slightly higher limit of 0.5 EU/mL (USP, Supplement 4a, 1984).

TOC and Conductivity

Purpose of TOC and Conductivity testing

Total organic carbon (TOC) is an important parameter in purified water. It is expressed in parts per billion of carbon (ppbC). TOC quantifies the organic carbon present and is used as an indicator of water quality and of the cleanliness of pharmaceutical manufacturing facilities and equipment. Water conductivity is another measure of water purity, one that responds to ionic contaminants. It is expressed in microsiemens per centimeter (µS/cm). Water molecules dissociate into ions as a function of temperature and pH, which gives pure water a very predictable conductivity; theoretically pure water, for instance, has a conductivity of about 0.055 µS/cm at 25 °C. Many gases (e.g., carbon dioxide) readily dissolve in water and react to form ions, and because of this their contribution to conductivity is considered intrinsic to water. Water is also affected by extraneous ions (e.g., chloride, ammonium), so conductivity monitoring is a crucial part of ensuring that water quality does not adversely affect pharmaceutical processes, and through them products and consumers.

Statistical considerations for analysis

When evaluating historical data, it is always appropriate to consider how important it is that the data fit a standard statistical model (e.g., a normal distribution). For environmental monitoring (EM) alert and action levels, whether the data fit a classical model is less important than whether the levels are based on real empirical data. One common reason data fail to fit a normal distribution is spikes. Bioburden data, for example, typically lie near the mean with occasional spikes.
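The effect of a single spike on a normality test can be illustrated with simulated values (hypothetical data, not the site data analyzed in this report). The sketch runs base R's shapiro.test on 49 normally distributed readings, then on the same readings with one spike appended.

```r
set.seed(1)
# Hypothetical readings: 49 values drawn from a normal distribution
baseline <- rnorm(49, mean = 5, sd = 1)
# The same readings plus a single spike
with_spike <- c(baseline, 100)

p_baseline <- shapiro.test(baseline)$p.value
p_spike    <- shapiro.test(with_spike)$p.value

# p_spike falls far below 0.05 even though 49 of the 50 values are normal
c(baseline = p_baseline, with_spike = p_spike)
```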

Another consideration is how to handle missing data, which first requires some knowledge of how the data were captured. Techniques such as mean or regression imputation rest on assumptions about the data, so identifying the type of missingness present is crucial to choosing an appropriate technique. Missingness can range from being a function of the observed data (missing at random, MAR) to depending on factors not captured in the data (missing not at random, MNAR). Without prior knowledge of the data-capture process, it is difficult to identify a root cause for the missing data and thereby the appropriate statistical response. When large amounts of data are repeatedly missing across multiple attributes, it is common practice to treat the data as missing completely at random (MCAR), that is, missing independently of both the observed and unobserved data (e.g., because of who submitted the data, or an error in the data-capture process as a whole). In many cases, missing values can be replaced with column means or regression-imputed values, but when setting alert and action levels, the levels are best informed by empirical data alone, which often requires omitting records for which there is no observation.
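To illustrate one reason imputation is avoided here, the sketch below (hypothetical values) compares the standard deviation after omitting missing values with the standard deviation after mean imputation. Filling gaps with the mean adds points at the center without adding spread, so the standard deviation shrinks and any level built as a mean plus k standard deviations becomes artificially tight.

```r
# Hypothetical site readings with two missing observations
x <- c(1, 2, NA, 4, NA, 10)

# Complete-case approach: omit missing values
sd_observed <- sd(x, na.rm = TRUE)

# Mean imputation: replace each NA with the mean of the observed values
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
sd_imputed <- sd(x_imputed)

# Same mean, same sum of squared deviations, larger n: smaller SD
c(observed = sd_observed, imputed = sd_imputed)   # about 4.03 vs 3.12
```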

Another consideration is whether there are too few data. When there are not enough environmental monitoring data to establish long-term levels, initial (temporary) alert and action levels can be set to serve as a baseline. Once sufficient historical data exist, long-term levels can be established. At that point, a plan of action for setting the long-term alert and action levels should be documented, covering both the transition from temporary to long-term levels and the frequency of re-evaluation.

Alert and action levels should not be triggered often; frequent excursions indicate either too much variability in the results or a level set too low. It is best to establish levels from estimates rather than from averages or maximum values. In this simple approach, each estimate is the mean plus its standard error, and two or three standard deviations are added to it. For normally distributed data, the mean ± 2 and ± 3 standard deviations cover approximately 95% and 99.7% of the population; as one-sided upper limits, the mean plus 2 and plus 3 standard deviations correspond to roughly the 97.7th and 99.9th percentiles.
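The coverage figures quoted above can be checked in base R. The first two rows assume a normal distribution. The third row uses Chebyshev's inequality, a distribution-free bound that is not otherwise used in this document: for any distribution with finite variance, at least 75% of values lie within 2 standard deviations of the mean and about 89% within 3, which is one way to quantify the claim that standard deviation remains informative for non-normal data.

```r
k <- c(2, 3)

# Normal distribution: share of the population within mean +/- k SD
two_sided <- 2 * pnorm(k) - 1   # about 0.954 and 0.997
# Normal distribution: share of the population below mean + k SD
one_sided <- pnorm(k)           # about 0.977 and 0.999
# Chebyshev's inequality: minimum share within mean +/- k SD, any distribution
chebyshev <- 1 - 1 / k^2        # 0.75 and about 0.889

round(rbind(two_sided, one_sided, chebyshev), 4)
```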

Figure 2. Normal Distribution

Methods

Assuming that sampling methodology is already defined and maintained, and that the historical data represent normal operating conditions, a trend analysis was performed on the historical data to prescribe appropriate alert and action levels for bioburden, endotoxin, TOC, and conductivity environmental monitoring.

Missing data were presumed MCAR, with no multicollinearity across attributes (sampling sites). Sampling dates with more than 40% missing data were omitted from the analysis, which accounted for roughly 17% of the original data. No imputation was performed, so that only observed empirical data, rather than inferred, projected, or extrapolated values, were used to create alert and action levels.
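The 40% omission rule can be sketched as follows. The data frame em, its dates, and its values are hypothetical, with one row per sampling date and one column per site; the scripts in this report select the retained rows by index instead.

```r
# Hypothetical wide-format data: one row per sampling date, one column per site
em <- data.frame(
  date = as.Date(c("2020-01-06", "2020-01-13", "2020-01-20")),
  X104 = c(1, NA, 2),
  X108 = c(0, NA, NA),
  X115 = c(3, NA, 1),
  X123 = c(2, 5, 4)
)
site_cols <- setdiff(names(em), "date")

# Fraction of sites with no observation on each sampling date
missing_frac <- rowMeans(is.na(em[site_cols]))

# Omit dates with more than 40% missing; remaining NAs are left as NA (no imputation)
em_kept <- em[missing_frac <= 0.4, ]
em_kept
```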

Univariate analysis was performed on each data set, and a Shapiro-Wilk test was used to evaluate the normality of the observations. Alert levels were set 2 standard deviations above the historical estimates, and action levels 3 standard deviations above them. For data that fit a normal distribution, this corresponds to the familiar 95% (2 SD) and 99.7% (3 SD) coverage intervals. The standard deviation is deemed appropriate because it remains a useful measure of the dispersion of the data. It yields alert and action levels that are tight without being overly restrictive, yet can still signal drift from normal operating conditions. Because historical data were limited, the prescribed levels are initial (temporary) levels that serve as a baseline until more historical data become available.
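A minimal sketch of the level calculation described above, using a hypothetical helper em_levels (not a function from the original scripts). It omits missing values, takes the estimate as the mean plus the standard error, and adds 2 or 3 standard deviations; the scripts below apply the same arithmetic per site and to the vector of site means.

```r
# Hypothetical helper: estimate = mean + SE; alert = estimate + 2 SD; action = estimate + 3 SD
em_levels <- function(x) {
  x <- x[!is.na(x)]                       # use observed values only
  s <- sd(x)
  estimate <- mean(x) + s / sqrt(length(x))
  c(estimate = estimate, alert = estimate + 2 * s, action = estimate + 3 * s)
}

em_levels(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
# estimate about 5.76, alert about 10.03, action about 12.17
```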

Bioburden

Analysis

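# fileEncoding = "UTF-8-BOM" drops the byte-order mark that otherwise mangles
# the first column name into "ï..Collection.Date"
biob <- read.csv("C:\\Users\\Greg Mack\\Documents\\Bioburden.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
biob2 <- biob[c(17:109, 111:112), ]   # retained sampling dates
# as.character() first, so factor columns convert by value rather than by level code
biob2$QC.Front <- as.numeric(as.character(biob2$QC.Front))
biob2$QC.Back <- as.numeric(as.character(biob2$QC.Back))

if (!requireNamespace("lubridate", quietly = TRUE)) install.packages("lubridate")
library(lubridate)
Collect_date <- dmy(biob2$Collection.Date)
biob2$Col_Date <- Collect_date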

par(mfrow=c(2,2))

hist(biob2$X104, xlab="CFU", ylab = "Frequency", main = "X104 CFU distribution")
hist(biob2$X108, xlab="CFU", ylab = "Frequency", main = "X108 CFU distribution")
hist(biob2$X111.01, xlab="CFU", ylab = "Frequency", main = "X111.01 CFU distribution")
hist(biob2$X111.05, xlab="CFU", ylab = "Frequency", main = "X111.05 CFU distribution")
hist(biob2$X111.06, xlab="CFU", ylab = "Frequency", main = "X111.06 CFU distribution")
hist(biob2$X113.01, xlab="CFU", ylab = "Frequency", main = "X113.01 CFU distribution")
hist(biob2$X113.02, xlab="CFU", ylab = "Frequency", main = "X113.02 CFU distribution")
hist(biob2$X115, xlab="CFU", ylab = "Frequency", main = "X115 CFU distribution")
hist(biob2$X117, xlab="CFU", ylab = "Frequency", main = "X117 CFU distribution")
hist(biob2$X123, xlab="CFU", ylab = "Frequency", main = "X123 CFU distribution")
hist(biob2$QC.Front, xlab="CFU", ylab = "Frequency", main = "QC Front CFU distribution")
hist(biob2$QC.Back, xlab="CFU", ylab = "Frequency", main = "QC Back CFU distribution")
hist(biob2$X172.Source, xlab="CFU", ylab = "Frequency", main = "X172 Source CFU distribution")
hist(biob2$X172.Return, xlab="CFU", ylab = "Frequency", main = "X172 Return CFU distribution")

# One boxplot per site, covering all 14 sites (columns 2:15)
for (site in names(biob2)[2:15]) {
  boxplot(biob2[[site]], ylab = "CFU", main = paste(site, "CFU distribution"))
}

# Shapiro-Wilk p-value per site (shapiro.test drops NAs itself)
sapply(biob2[, 2:15], function(x) shapiro.test(x)$p.value)

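# Standard error of the mean; NAs are dropped by default
SE <- function(x, na.rm = TRUE){
  if (na.rm) x <- na.omit(x)
  sd(x) / sqrt(length(x))
}

# Statistical mode (most frequent value). Note: this masks base::mode().
# Matching against the unique values also handles zeros and non-integer results.
mode <- function(x){
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Standard error without NA removal (applied to the vector of site means)
Se <- function(x){
  sd(x) / sqrt(length(x))
}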


means_bio <- sapply(biob2[c(2:15)], mean, na.rm=TRUE)
stdDEVs_bio <- sapply(biob2[c(2:15)], sd, na.rm=TRUE)
maxs_bio <- sapply(biob2[c(2:15)], max, na.rm=TRUE)
mins_bio <- sapply(biob2[c(2:15)], min, na.rm=TRUE)
stdErrs_bio <- sapply(biob2[c(2:15)], SE, na.rm=TRUE)
Modes_bio <- sapply(biob2[,c(2:15)], mode)

df_bio <- cbind(means_bio, stdDEVs_bio, stdErrs_bio, mins_bio, maxs_bio, Modes_bio)

Estimates_bio <- rowSums(df_bio[, c(1,3)])
Estimates_2xSD_bio <- Estimates_bio + 2 * df_bio[, "stdDEVs_bio"]   # estimate (mean + SE) + 2 SD
Estimates_3xSD_bio <- Estimates_bio + 3 * df_bio[, "stdDEVs_bio"]   # estimate (mean + SE) + 3 SD
df_bio <- cbind(df_bio, Estimates_bio, Estimates_2xSD_bio, Estimates_3xSD_bio)
df_bio

Average_bio <- mean(means_bio)
StandardDeviation_bio <- sd(means_bio)
StandardError_bio <- Se(means_bio)
EstimateOverall_bio <- Average_bio + StandardError_bio
Estimate_2SD_bio <- EstimateOverall_bio + 2*StandardDeviation_bio
Estimate_3SD_bio <- EstimateOverall_bio + 3*StandardDeviation_bio

Levels_bio <- cbind(Average_bio, StandardDeviation_bio, StandardError_bio, EstimateOverall_bio, Estimate_2SD_bio, Estimate_3SD_bio)
Levels_bio

Results

Analysis of the data showed considerable variability in CFU between sites. Univariate analysis showed a strongly right-skewed, Poisson-like distribution of bioburden. Non-normality was confirmed by the Shapiro-Wilk test, with p-values well below the 0.05 cutoff (on the order of 10^-5 or smaller). The predominant modes per site were 1, 2, and 4 CFU. QC.Back, however, showed a markedly higher mode (224 CFU) and mean (184 CFU) than all other sites. X123 showed the highest spike (1761 CFU). Means and modes near 0 CFU juxtaposed with high bioburden spikes produce this skewed distribution, which is typical of bioburden data. Even as a larger sample of bioburden data becomes available, it still may not follow a normal distribution: a single high value is sufficient to make the data non-normal. If a normal distribution is desired, alternative statistical models can be proposed, such as applying a transformation. However, if the data contain even a few large values, a transformation could cause those possibly important outliers to be missed. Further consideration should be given to the importance of the outliers versus the importance of normality, though in this statistician's opinion the inclusion of all empirical data is more important than achieving normality.

Figure 3. General Bioburden Distribution


par(mfrow=c(1,1))

sites <- c("X104", "X108", "X111.01", "X111.05", "X111.06", "X113.01", "X113.02", "X115", "X117", "X123", "QC.Front", "QC.Back", "X172.Source", "X172.Return")
# 14 distinct colours; the default palette has only 8 and would repeat colours
line_cols <- hcl.colors(length(sites), "Dark 3")

plot(biob2$Col_Date, biob2[[sites[1]]], type = "l", col = line_cols[1], xlab = "Date", ylab = "CFU/mL", main = "Bioburden Trend Analysis with Alert/Action Levels", ylim = c(0, 1800))
for (i in seq_along(sites)[-1]) {
  lines(biob2$Col_Date, biob2[[sites[i]]], col = line_cols[i])
}
abline(h = 150.55, col = "blue", lty = 2, lwd = 2)   # alert level
abline(h = 199.17, col = "red", lty = 2, lwd = 2)    # action level
legend("topright", sites, lty = 1, col = line_cols)

Table 1. Bioburden Estimates

Figure 4. Bioburden Alert / Action Levels

The levels above were triggered frequently. An alternative set of levels was therefore derived from the estimates for QC.Back alone, the site with the highest mean bioburden. The same approach, basing levels on the site with the highest mean, was used for the remaining data sets, which also showed high spikes against low means. Even these levels, however, were exceeded by the disproportionately high bioburden spikes at X123 and QC.Back.
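The alternative model can be sketched as: find the site with the highest mean, derive its estimate and its 2 SD and 3 SD levels, then count how many results at any site reach those levels. The three-site data frame is hypothetical, chosen so that a spike at a lower-mean site (C, analogous to X123) still exceeds the levels derived from the highest-mean site; "reaching" a level is assumed to mean greater than or equal.

```r
# Hypothetical data: columns are sites, rows are sampling dates
em <- data.frame(
  A = c(1, 2, 1, 3),
  B = c(10, 30, 20, 40),
  C = c(2, 1, 90, 1)
)

site_means <- colMeans(em, na.rm = TRUE)
ref_site <- names(which.max(site_means))   # site with the highest mean

# Levels from the reference site only: estimate = mean + SE, then + 2 or 3 SD
x <- em[[ref_site]]
x <- x[!is.na(x)]
s <- sd(x)
estimate <- mean(x) + s / sqrt(length(x))
alert <- estimate + 2 * s
action <- estimate + 3 * s

# Results at or above each level, counted across all sites
n_alert <- sum(as.matrix(em) >= alert, na.rm = TRUE)
n_action <- sum(as.matrix(em) >= action, na.rm = TRUE)
c(alert = alert, action = action, n_alert = n_alert, n_action = n_action)
```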


sites <- c("X104", "X108", "X111.01", "X111.05", "X111.06", "X113.01", "X113.02", "X115", "X117", "X123", "QC.Front", "QC.Back", "X172.Source", "X172.Return")
# 14 distinct colours; the default palette has only 8 and would repeat colours
line_cols <- hcl.colors(length(sites), "Dark 3")

plot(biob2$Col_Date, biob2[[sites[1]]], type = "l", col = line_cols[1], xlab = "Date", ylab = "CFU/mL", main = "Bioburden Trend Analysis 2", ylim = c(0, 1800))
for (i in seq_along(sites)[-1]) {
  lines(biob2$Col_Date, biob2[[sites[i]]], col = line_cols[i])
}
abline(h = 480.12, col = "blue", lty = 2, lwd = 2)   # alert level
abline(h = 608.18, col = "red", lty = 2, lwd = 2)    # action level
legend("topright", sites, lty = 1, col = line_cols)

Figure 5. Alternative Bioburden Alert / Action Levels

Endotoxin

Analysis


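endo <- read.csv("C:\\Users\\Greg Mack\\Documents\\Endotoxin.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
endo2 <- endo[c(17:109, 111:112), ]
# Convert this file's own QC columns to numeric
endo2$QC.Front <- as.numeric(as.character(endo2$QC.Front))
endo2$QC.Back <- as.numeric(as.character(endo2$QC.Back))

# Parse dates from this file (assumes the same "Collection Date" column as the other files)
endo2$Col_Date <- dmy(endo2$Collection.Date)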

par(mfrow=c(2,2))

hist(endo2$X104, xlab="EU", ylab = "Frequency", main = "X104 EU distribution")
hist(endo2$X108, xlab="EU", ylab = "Frequency", main = "X108 EU distribution")
hist(endo2$X111.01, xlab="EU", ylab = "Frequency", main = "X111.01 EU distribution")
hist(endo2$X111.05, xlab="EU", ylab = "Frequency", main = "X111.05 EU distribution")
hist(endo2$X111.06, xlab="EU", ylab = "Frequency", main = "X111.06 EU distribution")
hist(endo2$X113.01, xlab="EU", ylab = "Frequency", main = "X113.01 EU distribution")
hist(endo2$X113.02, xlab="EU", ylab = "Frequency", main = "X113.02 EU distribution")
hist(endo2$X115, xlab="EU", ylab = "Frequency", main = "X115 EU distribution")
hist(endo2$X117, xlab="EU", ylab = "Frequency", main = "X117 EU distribution")
hist(endo2$X123, xlab="EU", ylab = "Frequency", main = "X123 EU distribution")
hist(endo2$QC.Front, xlab="EU", ylab = "Frequency", main = "QC Front EU distribution")
hist(endo2$QC.Back, xlab="EU", ylab = "Frequency", main = "QC Back EU distribution")
hist(endo2$X172.Source, xlab="EU", ylab = "Frequency", main = "X172 Source EU distribution")
hist(endo2$X172.Return, xlab="EU", ylab = "Frequency", main = "X172 Return EU distribution")

# One boxplot per site, covering all 14 sites (columns 2:15)
for (site in names(endo2)[2:15]) {
  boxplot(endo2[[site]], ylab = "EU/mL", main = paste(site, "EU distribution"))
}


# Shapiro-Wilk p-value per site (shapiro.test drops NAs itself)
sapply(endo2[, 2:15], function(x) shapiro.test(x)$p.value)


means_endo <- sapply(endo2[c(2:15)], mean, na.rm=TRUE)
stdDEVs_endo <- sapply(endo2[c(2:15)], sd, na.rm=TRUE)
maxs_endo <- sapply(endo2[c(2:15)], max, na.rm=TRUE)
mins_endo <- sapply(endo2[c(2:15)], min, na.rm=TRUE)
stdErrs_endo <- sapply(endo2[c(2:15)], SE, na.rm=TRUE)
Modes_endo <- sapply(endo2[,c(2:15)], mode)

df_endo <- cbind(means_endo, stdDEVs_endo, stdErrs_endo, mins_endo, maxs_endo, Modes_endo)

Estimates_endo <- rowSums(df_endo[, c(1,3)])
Estimates_2xSD_endo <- Estimates_endo + 2 * df_endo[, "stdDEVs_endo"]   # estimate (mean + SE) + 2 SD
Estimates_3xSD_endo <- Estimates_endo + 3 * df_endo[, "stdDEVs_endo"]   # estimate (mean + SE) + 3 SD
df_endo <- cbind(df_endo, Estimates_endo, Estimates_2xSD_endo, Estimates_3xSD_endo)
df_endo

Average_endo <- mean(means_endo)
StandardDeviation_endo <- sd(means_endo)
StandardError_endo <- Se(means_endo)
EstimateOverall_endo <- Average_endo + StandardError_endo
Estimate_2SD_endo <- EstimateOverall_endo + 2*StandardDeviation_endo
Estimate_3SD_endo <- EstimateOverall_endo + 3*StandardDeviation_endo

Levels_endo <- cbind(Average_endo, StandardDeviation_endo, StandardError_endo, EstimateOverall_endo, Estimate_2SD_endo, Estimate_3SD_endo)
Levels_endo

Results

Univariate analysis showed a non-normal distribution, which the Shapiro-Wilk test confirmed with p-values well below the 0.05 cutoff (smaller than 10^-5). The predominant mode per site was 1 EU/mL, with QC.Back an outlier with a mean of 224 EU/mL. QC.Front and QC.Back showed the highest spikes, at 92 and 680 EU/mL respectively.


par(mfrow=c(1,1))

sites <- c("X104", "X108", "X111.01", "X111.05", "X111.06", "X113.01", "X113.02", "X115", "X117", "X123", "QC.Front", "QC.Back", "X172.Source", "X172.Return")
# 14 distinct colours; the default palette has only 8 and would repeat colours
line_cols <- hcl.colors(length(sites), "Dark 3")

plot(endo2$Col_Date, endo2[[sites[1]]], type = "l", col = line_cols[1], xlab = "Date", ylab = "EU/mL", main = "Endotoxin Trend Analysis with Alert/Action Levels", ylim = c(0, 690))
for (i in seq_along(sites)[-1]) {
  lines(endo2$Col_Date, endo2[[sites[i]]], col = line_cols[i])
}
abline(h = 480.12, col = "blue", lty = 2, lwd = 2)   # alert level
abline(h = 608.19, col = "red", lty = 2, lwd = 2)    # action level
legend("topright", sites, lty = 1, col = line_cols)

Table 2. Endotoxin Estimates

Figure 6. Endotoxin Alert / Action Levels

TOC

Analysis

toc <- read.csv("C:\\Users\\Greg Mack\\Documents\\toc.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
toc2 <- toc[c(17:114, 116:117), ]
toc2$QC.Front <- as.numeric(as.character(toc2$QC.Front))
toc2$QC.Back <- as.numeric(as.character(toc2$QC.Back))

Collect_date <- dmy(toc2$Collection.Date)
toc2$Col_Date <- Collect_date

par(mfrow=c(2,2))

hist(toc2$X104, xlab="ppbC", ylab = "Frequency", main = "X104 TOC distribution")
hist(toc2$X108, xlab="ppbC", ylab = "Frequency", main = "X108 TOC distribution")
hist(toc2$X111.01, xlab="ppbC", ylab = "Frequency", main = "X111.01 TOC distribution")
hist(toc2$X111.05, xlab="ppbC", ylab = "Frequency", main = "X111.05 TOC distribution")
hist(toc2$X111.06, xlab="ppbC", ylab = "Frequency", main = "X111.06 TOC distribution")
hist(toc2$X113.01, xlab="ppbC", ylab = "Frequency", main = "X113.01 TOC distribution")
hist(toc2$X113.02, xlab="ppbC", ylab = "Frequency", main = "X113.02 TOC distribution")
hist(toc2$X115, xlab="ppbC", ylab = "Frequency", main = "X115 TOC distribution")
hist(toc2$X117, xlab="ppbC", ylab = "Frequency", main = "X117 TOC distribution")
hist(toc2$X123, xlab="ppbC", ylab = "Frequency", main = "X123 TOC distribution")
hist(toc2$QC.Front, xlab="ppbC", ylab = "Frequency", main = "QC Front TOC distribution")
hist(toc2$QC.Back, xlab="ppbC", ylab = "Frequency", main = "QC Back TOC distribution")
hist(toc2$X172.Source, xlab="ppbC", ylab = "Frequency", main = "X172 Source TOC distribution")
hist(toc2$X172.Return, xlab="ppbC", ylab = "Frequency", main = "X172 Return TOC distribution")

# One boxplot per site, covering all 14 sites (columns 2:15)
for (site in names(toc2)[2:15]) {
  boxplot(toc2[[site]], ylab = "ppbC", main = paste(site, "TOC distribution"))
}


# Shapiro-Wilk p-value per site (shapiro.test drops NAs itself)
sapply(toc2[, 2:15], function(x) shapiro.test(x)$p.value)


means_toc <- sapply(toc2[c(2:15)], mean, na.rm=TRUE)
stdDEVs_toc <- sapply(toc2[c(2:15)], sd, na.rm=TRUE)
maxs_toc <- sapply(toc2[c(2:15)], max, na.rm=TRUE)
mins_toc <- sapply(toc2[c(2:15)], min, na.rm=TRUE)
stdErrs_toc <- sapply(toc2[c(2:15)], SE, na.rm=TRUE)

df_toc <- cbind(means_toc, stdDEVs_toc, stdErrs_toc, mins_toc, maxs_toc)

Estimates_toc <- rowSums(df_toc[, c(1,3)])
df_toc <- cbind(df_toc, Estimates_toc)
Estimates_2xSD_toc <- df_toc[,c(2)]*2 + df_toc[, c(6)]
Estimates_3xSD_toc <- df_toc[, c(2)]*3 + df_toc[, c(6)]
df_toc <- cbind(df_toc, Estimates_2xSD_toc, Estimates_3xSD_toc)
df_toc

Average_toc <- mean(means_toc)
StandardDeviation_toc <- sd(means_toc)
StandardError_toc <- Se(means_toc)
EstimateOverall_toc <- Average_toc + StandardError_toc
Estimate_2SD_toc <- EstimateOverall_toc + 2*StandardDeviation_toc
Estimate_3SD_toc <- EstimateOverall_toc + 3*StandardDeviation_toc

Levels_toc <- cbind(Average_toc, StandardDeviation_toc, StandardError_toc, EstimateOverall_toc, Estimate_2SD_toc, Estimate_3SD_toc)
Levels_toc

Results

Univariate analysis showed a non-normal distribution, which the Shapiro-Wilk test confirmed with p-values well below the 0.05 cutoff (smaller than 10^-10). TOC means were much tighter and more consistent throughout the data set, but spiking was more aggressive. The standard deviation of the site means was 7.19, with a standard error of 1.02. No specific site stood out as out of trend (OOT) relative to the overall data. Estimates from X123 were used because it had the highest mean. Even so, the high frequency of TOC spikes caused the data to exceed these alert and action levels in multiple instances.

par(mfrow=c(1,1))

sites <- c("X104", "X108", "X111.01", "X111.05", "X111.06", "X113.01", "X113.02", "X115", "X117", "X123", "QC.Front", "QC.Back", "X172.Source", "X172.Return")
# 14 distinct colours; the default palette has only 8 and would repeat colours
line_cols <- hcl.colors(length(sites), "Dark 3")

plot(toc2$Col_Date, toc2[[sites[1]]], type = "l", col = line_cols[1], xlab = "Date", ylab = "ppbC", main = "TOC Trend Analysis with Alert/Action Levels", ylim = c(0, 500))
for (i in seq_along(sites)[-1]) {
  lines(toc2$Col_Date, toc2[[sites[i]]], col = line_cols[i])
}
abline(h = 264.04, col = "blue", lty = 2, lwd = 2)   # alert level
abline(h = 353.59, col = "red", lty = 2, lwd = 2)    # action level
legend("topright", sites, lty = 1, col = line_cols)

Table 3. TOC Estimates

Figure 7. TOC Alert / Action Levels

Conductivity

Analysis


cond <- read.csv("C:\\Users\\Greg Mack\\Documents\\conductivity.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
cond2 <- cond[c(17:109, 111:112), ]
cond2$QC.Front <- as.numeric(as.character(cond2$QC.Front))
cond2$QC.Back <- as.numeric(as.character(cond2$QC.Back))

Collect_date <- dmy(cond2$Collection.Date)
cond2$Col_Date <- Collect_date

par(mfrow=c(2,2))

hist(cond2$X104, xlab="µS/cm", ylab = "Frequency", main = "X104 Conductivity distribution")
hist(cond2$X108, xlab="µS/cm", ylab = "Frequency", main = "X108 Conductivity distribution")
hist(cond2$X111.01, xlab="µS/cm", ylab = "Frequency", main = "X111.01 Conductivity distribution")
hist(cond2$X111.05, xlab="µS/cm", ylab = "Frequency", main = "X111.05 Conductivity distribution")
hist(cond2$X111.06, xlab="µS/cm", ylab = "Frequency", main = "X111.06 Conductivity distribution")
hist(cond2$X113.01, xlab="µS/cm", ylab = "Frequency", main = "X113.01 Conductivity distribution")
hist(cond2$X113.02, xlab="µS/cm", ylab = "Frequency", main = "X113.02 Conductivity distribution")
hist(cond2$X115, xlab="µS/cm", ylab = "Frequency", main = "X115 Conductivity distribution")
hist(cond2$X117, xlab="µS/cm", ylab = "Frequency", main = "X117 Conductivity distribution")
hist(cond2$X123, xlab="µS/cm", ylab = "Frequency", main = "X123 Conductivity distribution")
hist(cond2$QC.Front, xlab="µS/cm", ylab = "Frequency", main = "QC Front Conductivity distribution")
hist(cond2$QC.Back, xlab="µS/cm", ylab = "Frequency", main = "QC Back Conductivity distribution")
hist(cond2$X172.Source, xlab="µS/cm", ylab = "Frequency", main = "X172 Source Conductivity distribution")
hist(cond2$X172.Return, xlab="µS/cm", ylab = "Frequency", main = "X172 Return Conductivity distribution")

boxplot(cond2$X104, ylab="µS/cm")
boxplot(cond2$X108, ylab="µS/cm")
boxplot(cond2$X111.01, ylab="µS/cm")
boxplot(cond2$X111.05, ylab="µS/cm")
boxplot(cond2$X111.06, ylab="µS/cm")
boxplot(cond2$X113.01, ylab="µS/cm")
boxplot(cond2$X113.02, ylab="µS/cm")
boxplot(cond2$X115, ylab="µS/cm")
boxplot(cond2$X117, ylab="µS/cm")
boxplot(cond2$X123, ylab="µS/cm")
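boxplot(cond2$QC.Front, ylab="µS/cm")
boxplot(cond2$QC.Back, ylab="µS/cm")
boxplot(cond2$X172.Source, ylab="µS/cm")
boxplot(cond2$X172.Return, ylab="µS/cm")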


sapply(cond2[, c(2:15)], shapiro.test)


means_cond <- sapply(cond2[c(2:15)], mean, na.rm=TRUE)
stdDEVs_cond <- sapply(cond2[c(2:15)], sd, na.rm=TRUE)
maxs_cond <- sapply(cond2[c(2:15)], max, na.rm=TRUE)
mins_cond <- sapply(cond2[c(2:15)], min, na.rm=TRUE)
stdErrs_cond <- sapply(cond2[c(2:15)], SE, na.rm=TRUE)

df_cond <- cbind(means_cond, stdDEVs_cond, stdErrs_cond, mins_cond, maxs_cond)

Estimates_cond <- rowSums(df_cond[, c(1,3)])
df_cond <- cbind(df_cond, Estimates_cond)
Estimates_2xSD_cond <- df_cond[,c(2)]*2 + df_cond[, c(6)]
Estimates_3xSD_cond <- df_cond[, c(2)]*3 + df_cond[, c(6)]
df_cond <- cbind(df_cond, Estimates_2xSD_cond, Estimates_3xSD_cond)
df_cond

Average_cond <- mean(means_cond)
StandardDeviation_cond <- sd(means_cond)
StandardError_cond <- SE(means_cond)  # same SE() helper used for the per-site standard errors
EstimateOverall_cond <- Average_cond + StandardError_cond
Estimate_2SD_cond <- EstimateOverall_cond + 2*StandardDeviation_cond
Estimate_3SD_cond <- EstimateOverall_cond + 3*StandardDeviation_cond

Levels_cond <- cbind(Average_cond, StandardDeviation_cond, StandardError_cond, EstimateOverall_cond, Estimate_2SD_cond, Estimate_3SD_cond)
Levels_cond
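For reference, the level calculations in the code above can be written out as formulas. Here $\bar{x}$, $s$ and $\mathrm{SE}$ are a site's mean, standard deviation and standard error (`means_cond`, `stdDEVs_cond`, `stdErrs_cond`); for the overall levels they are the mean, standard deviation and standard error of the 14 site means. The form $\mathrm{SE} = s/\sqrt{n}$ is an assumption: it presumes that `SE()` returns the standard error of the mean over the $n$ non-missing results, which the code itself does not show.

```latex
\begin{aligned}
\mathrm{SE} &= \frac{s}{\sqrt{n}} \\
\text{Estimate} &= \bar{x} + \mathrm{SE} \\
\text{Alert level} &= \bar{x} + \mathrm{SE} + 2s \\
\text{Action level} &= \bar{x} + \mathrm{SE} + 3s
\end{aligned}
```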

Results

Univariate analysis showed some approximately normal features, but overall the data were not normally distributed. A Shapiro-Wilk test did not reject normality at QC.Back and X172.Return but rejected it at all other sites. Estimates from X172.Return were used to create the alert and action levels. Only X113.02 showed a spike exceeding the action level, and the trend plot showed values decreasing in magnitude over time.
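The per-site normality decisions described above can be collected into one table instead of being read from the printed `shapiro.test` output. The sketch below uses a hypothetical helper, `normality_table`, that is not part of the original analysis; the 0.05 significance level is an assumption, since the analysis does not state one. A p-value above the threshold means normality was not rejected, not that it was confirmed.

```r
# Hypothetical helper: Shapiro-Wilk p-value and decision for each column of a data frame.
# alpha = 0.05 is an assumed significance level.
normality_table <- function(df, alpha = 0.05) {
  # shapiro.test() drops missing values and needs 3 to 5000 non-missing results per column
  p <- vapply(df, function(x) shapiro.test(x)$p.value, numeric(1))
  data.frame(
    site = names(p),
    p_value = unname(p),
    # a small p-value rejects normality; a large one only fails to reject it
    normality_rejected = unname(p < alpha),
    stringsAsFactors = FALSE
  )
}

# Simulated values for illustration only (not site data)
set.seed(1)
demo <- data.frame(
  roughly_normal = rnorm(50, mean = 0.5, sd = 0.05),
  right_skewed   = rexp(50, rate = 10)
)
normality_table(demo)
```

Applied to the conductivity data, the corresponding call would be `normality_table(cond2[, 2:15])`.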

par(mfrow=c(1,1))

plot(cond2$Col_Date, cond2$X104, type = "l", col = 2, xlab = "Date", ylab = "µS/cm", main = "Conductivity Trend Analysis with Alert/Action Levels", ylim = c(0, 1.10))
lines(cond2$Col_Date, cond2$X108, type = "l", col = 3)
lines(cond2$Col_Date, cond2$X111.01, type = "l", col = 4)
lines(cond2$Col_Date, cond2$X111.05, type = "l", col = 5)
lines(cond2$Col_Date, cond2$X111.06, type = "l", col = 6)
lines(cond2$Col_Date, cond2$X113.01, type = "l", col = 7)
lines(cond2$Col_Date, cond2$X113.02, type = "l", col = 8)
lines(cond2$Col_Date, cond2$X115, type = "l", col = 9)
lines(cond2$Col_Date, cond2$X117, type = "l", col = 10)
lines(cond2$Col_Date, cond2$X123, type = "l", col = 11)
lines(cond2$Col_Date, cond2$QC.Front, type = "l", col = 12)
lines(cond2$Col_Date, cond2$QC.Back, type = "l", col = 13)
lines(cond2$Col_Date, cond2$X172.Source, type = "l", col = 14)
lines(cond2$Col_Date, cond2$X172.Return, type = "l", col = 15)
abline(h=0.88, col="blue")  # alert level: X172.Return 2 SD estimate
abline(h=1.03, col="red")   # action level: X172.Return 3 SD estimate
legend("topright",
       c("X104", "X108", "X111.01", "X111.05", "X111.06", "X113.01", "X113.02", "X115", "X117", "X123", "QC.Front", "QC.Back", "X172.Source", "X172.Return"),
       lty = 1, col = 2:15)

Table 4. Conductivity Estimates

Figure 8. Conductivity Alert / Action Levels
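Statements such as "only X113.02 showed a spike exceeding the action level" can also be checked by counting results against the levels rather than reading them off the plot. The helper below, `count_exceedances`, is hypothetical. It treats a result equal to a level as having reached it, and its defaults are the 0.88 and 1.03 µS/cm lines drawn in the trend plot; other tests would pass their own levels.

```r
# Hypothetical helper: per-site counts of results that reached the alert or action level.
# A result equal to a level counts as reaching it (>=).
count_exceedances <- function(df, alert = 0.88, action = 1.03) {
  counts <- vapply(df, function(x) {
    x <- x[!is.na(x)]  # ignore missing results
    c(n = length(x),
      alert_only = sum(x >= alert & x < action),  # reached alert but not action
      action = sum(x >= action))                  # reached action
  }, numeric(3))
  t(counts)  # one row per site
}

# Illustrative values only (not site data)
demo <- data.frame(siteA = c(0.50, 0.90, 1.10, NA),
                   siteB = c(0.40, 0.45, 0.50, 0.60))
count_exceedances(demo)
```

With the conductivity data, `count_exceedances(cond2[, 2:15])` would list every site with its number of alert-only and action-level results.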

Conclusion

Procedures for exceeded levels

Analysis of the historical data identified variability across sites and the prevalence of spikes, and highlighted potential problem sites (QC.Back, X123). The spikes in the data could have many causes. For instance, a bioburden agar plate may have growth covering the entire surface, so that distinct colonies cannot be enumerated. It is generally good practice not to assign such plates a CFU value: there are too many CFUs to yield a meaningful count, and an arbitrarily assigned value beyond the countable range, such as 1000, would underestimate the bioburden. Because such a plate does not allow an accurate count, its result should be excluded when gathering historical data to establish bioburden levels; such results may have been included in this data set. While these results can indicate a bioburden problem, they often indicate a problem with the testing method.

Values considered spikes or outliers (defined as greater than or equal to twice the mean) should be investigated. However, they should not necessarily be included in the data. Instead, new samples from the same lot or family can be tested to determine whether the spike is an actual representation of that lot or family or a one-time or infrequent event. If it is determined that it is not a true value, that value should be discarded. If the investigation determines that it is a true value, it may indicate either a bioburden problem or a testing process problem. In either case, it is unwise to set alert and action levels while such a problem is present. If at all possible, the cause of the spike should be identified and corrected before alert and action limits are set.
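The spike rule above (a result greater than or equal to twice the mean) can be applied mechanically before levels are set. This sketch uses a hypothetical helper, `flag_spikes`, and assumes too-numerous-to-count plates are recorded as NA or as text such as "TNTC", which it excludes rather than assigning an arbitrary count. It computes the mean from all remaining results at the site, including the candidate spike itself; the report does not specify whether the spike should be left out of that mean.

```r
# Hypothetical helper: TRUE where a result is >= multiplier x the site mean.
# Assumes TNTC plates are stored as NA or as non-numeric text; both are excluded.
flag_spikes <- function(x, multiplier = 2) {
  x <- suppressWarnings(as.numeric(as.character(x)))  # "TNTC" and similar text become NA
  site_mean <- mean(x, na.rm = TRUE)                  # mean of countable results only
  !is.na(x) & x >= multiplier * site_mean             # excluded plates are never flagged
}

# Illustrative CFU results only (not site data)
cfu <- c(2, 3, 1, 4, 25, "TNTC", 2)
flag_spikes(cfu)
```

Flagged values would then go to investigation and re-sampling as described above, rather than being dropped automatically.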

It is important to understand that even when a process is in control, there will be an occasional single value outside the alert or action level. When a single value exceeds an action level, it is not expected to trigger a long list of corrective actions, but it could be investigated. Generally, the main focus should be on trends, not individual values. Although determining the causes of spikes (whether sampling or testing in nature) is important, such determinations fall outside the scope of this paper.

Investigations into the spikes at these sites are highly recommended, particularly at site QC.Back, as it contained disproportionately high values in every EM category. It may also be worth breaking the data into sampling intervals, in which each point on the x-axis represents the mean for that interval. This would dilute the effect of a single high data point on a particular day, which, in turn, could result in more acceptable alert results. However, when considering how the data should best be represented, it is also important not to structure the data in a way that forces an inference.
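Breaking the data into intervals amounts to averaging all results that fall within each interval before plotting. A minimal sketch, assuming calendar-month intervals (this report does not fix the interval length) and using the hypothetical helper name `interval_means`:

```r
# Hypothetical helper: mean result per calendar interval (monthly by default).
interval_means <- function(dates, values, fmt = "%Y-%m") {
  interval <- format(as.Date(dates), fmt)              # e.g. "2019-01"
  means <- tapply(values, interval, mean, na.rm = TRUE) # average within each interval
  data.frame(interval = names(means), mean = as.numeric(means),
             stringsAsFactors = FALSE)
}

# Illustrative values only (not site data)
d <- as.Date(c("2019-01-03", "2019-01-17", "2019-02-05", "2019-02-19"))
v <- c(0.5, 1.5, 0.6, 0.8)
interval_means(d, v)
```

For example, `interval_means(cond2$Col_Date, cond2$X113.02)` would give monthly means for X113.02. Shorter intervals keep more of the day-to-day variation; longer ones smooth it further.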

Long term Alert/Action levels

Bioburden

Ideally, water CFU counts would be consistent across sites. Achieving this does not require suspending operations, but it does require opening investigations into the prevalent bioburden spikes. The following is a recommendation for establishing long term levels.

Consider re-evaluating the data without QC.Back to determine alert/action levels that better fit the overall data
Continue regular environmental testing
Begin an investigation into bioburden spikes at X123 and QC.Back in order to reduce the predominance and severity of bioburden spikes
Determine the appropriate periodicity for evaluation and trend analysis
When trend analysis periodicity is established, perform regular trend analysis at the set periodicity to determine whether the established levels remain appropriate as more historical data become available.
If the current levels become inappropriate, use means and standard deviations to establish appropriate water CFU alert / action levels

Naturally, the more data collected to establish the alert and action levels, the more representative the data will be. Moreover, with a small data set, the margin of error can be quite large. As more data are gathered, the margin of error will decrease. Below are ideal long term bioburden alert and action levels for reverse osmosis water.


longterm <- read.csv("C:\\Users\\Greg Mack\\Documents\\bioburdenLongterm.csv", header = TRUE)
View(longterm)

Table 5. Ideal Bioburden Levels
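The statement that the margin of error shrinks as more data are gathered can be made concrete with the half-width of a t-based confidence interval for a mean, qt(0.975, n - 1) * s / sqrt(n). The sketch below assumes a 95% confidence level and holds the standard deviation fixed at a hypothetical s = 1 so that only the sample size changes; `margin_of_error` is an illustrative name, not part of the original analysis.

```r
# Hypothetical helper: half-width of a two-sided t confidence interval for a mean.
margin_of_error <- function(s, n, conf = 0.95) {
  qt(1 - (1 - conf) / 2, df = n - 1) * s / sqrt(n)
}

# Same (hypothetical) standard deviation, increasing numbers of results
n <- c(5, 10, 30, 100)
round(margin_of_error(s = 1, n = n), 3)
```

Both factors shrink as n grows: the t quantile approaches its normal-distribution value and s / sqrt(n) falls, so the margin of error decreases steadily.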

Endotoxin

The following is a recommendation for establishing long term levels.

Consider re-evaluating the data without QC.Back to determine alert/action levels that better fit the overall data
Continue regular environmental testing
Begin an investigation into QC.Back endotoxin spikes
Determine the appropriate periodicity for evaluation and trend analysis
When trend analysis periodicity is established, perform regular trend analysis at the set periodicity to determine whether the established levels remain appropriate as more historical data become available.
If the current levels become inappropriate, use means and standard deviations to establish appropriate endotoxin alert / action levels

TOC

The following is a recommendation for establishing long term levels.

Continue regular environmental testing
Begin an investigation into TOC spikes
Determine the appropriate periodicity for evaluation and trend analysis
When trend analysis periodicity is established, perform regular trend analysis at the set periodicity to determine whether the established levels remain appropriate as more historical data become available.
If the current levels become inappropriate, use means and standard deviations to establish appropriate TOC alert / action levels

Conductivity

The following is a recommendation for establishing long term levels.

Continue regular environmental testing
Begin an investigation into spikes at X113.02
Determine the appropriate periodicity for evaluation and trend analysis
When trend analysis periodicity is established, perform regular trend analysis at the set periodicity to determine whether the established levels remain appropriate as more historical data become available.
If the current levels become inappropriate, use means and standard deviations to establish appropriate conductivity alert / action levels

Recommendations for addressing exceeded alert / action levels

Reached alert level: 1) Initiate an investigation. 2) Conduct 3 consecutive repeat/follow-up samples. 3) Upon completion of the investigation, if an OOS result is determined, raise a deviation report.

Reached action level: 1) Initiate an investigation. 2) Conduct 3 consecutive repeat/follow-up samples. 3) Upon completion of the investigation, if an OOS result is determined, raise a deviation report and determine a corrective and preventive action (CAPA).

References

1. Sterilization of medical devices - Microbiological methods - Part 1: Determination of a population of microorganisms on products, ANSI/AAMI/ISO 11737-1:2006, Arlington, VA, Association for the Advancement of Medical Instrumentation, 2006.

2. Sterilization of health care products - Radiation - Part 2: Establishing the sterilization dose, ANSI/AAMI/ISO 11137-2:2012, Arlington, VA, Association for the Advancement of Medical Instrumentation, 2012.

3. PDA Journal of Pharmaceutical Science and Technology. Fundamentals of an Environmental Monitoring Program. PDA Technical Report No. 13, September/October 2001; vol. 55, No. 5.

4. The United States Pharmacopeia-National Formulary, <1116> Microbiological Evaluation of Clean Rooms and Other Controlled Environments, USP 34-NF 29, May 1, 2011-April 30, 2012; vol. 36(6), pp. 633.

5. Guidance for Industry, Sterile Drug Products Produced by Aseptic Processing - Current Good Manufacturing Practice. Pharmaceutical CGMPs, September 2004.

6. The United States Pharmacopeia-National Formulary, <1231> Water for Pharmaceutical Purposes, USP 34-NF 29, May 1, 2011-April 30, 2012; vol. 35(5), pp. 787.

Statistical Analysis of Historical Data and Recommendation for Alert and Action Levels

Gregory Mack

4/2/2020

Purpose

Scope

Introduction

Bioburden and Endotoxin

Purpose of bioburden and endotoxin testing

Legal basis of bioburden and endotoxin testing

TOC and Conductivity

Purpose of TOC and Conductivity testing

Statistical considerations for analysis

Methods

Bioburden

Analysis

Results

Endotoxin

Analysis

Results

TOC

Analysis

Results

Conductivity

Analysis

Results

Conclusion

Procedures for exceeded levels

Long term Alert/Action levels

Bioburden

Endotoxin

TOC

Conductivity

Recommendations for addressing exceeded alert / action levels

References