Background

Quantitative real-time polymerase chain reaction (qPCR) can be used for different purposes, however, in the context of genetics it is often used to estimate differences in the expression of particular genes. Notably, high throughput sequencing technologies often include this method in their pipelines in order to increase the starting material and improve detection¹.

Two different strategies can be thought of when it comes to running qPCR experiments. In a less frequent scenario a standard curve is utilized to estimate the “exact” initial quantity of a particular gene. This method is known as absolute quantification. In some qPCR experiments, a standard curve is generated using known concentrations of a reference DNA. The Ct values of these standards are used to create a linear regression line, which is then used to estimate the initial amount of target DNA in the experimental samples. Conversely, relative quantification estimates the expression of specific target gene(s) in specific conditions (known as treatment group) relative to a reference gene (known as housekeeping genes) and a reference sample (frequently labeled as the control group)²^,³.

Biological concepts

Gene expression

A gene is expressed in accordance to its transcription rate of a particular cell or group of cells. The latter is thought to match the function of such cell and to satisfactorily represent its phenotype.

qPCR: molecular basis

In qPCR, an initial amount of DNA material is amplified in exactly the same way that traditional PCR works. The big difference is that qPCR incorporates real-time monitoring of the amplification process. At the core of the rational behind qPCR lies the relationship between the amount of starting target sequence(s) and the amount of produced material through a specific (and known) number of amplification cycles.

These cycles represent the molecular basis of qPCR and can be described through key steps of 15-30 seconds each:

1- DNA denaturation

In which, the high temperature (~95°C) causes the double-stranded DNA in the template to separate into two single strands, breaking hydrogen bonds and exposing the target region for amplification.

2- Primer annealing

In which, at lowr temperatures (50-65°C) primers (short DNA sequences) designed to be complementary to the target region anneal to their respective sites on the single-stranded DNA.

3- DNA extension (Elongation)

Herein, DNA polymerase synthesizes a complementary DNA strand by adding nucleotides to the 3’ end of each primer, extending the sequence. At the end of this phase, a fluorescent dye or probe specific to the amplified DNA binds to the newly synthesized DNA. Finally, Fluorescence emission is detected using a detector (fluorometer).

Data acquisition and analysis

The fluorescence intensity is recorded at the end of each cycle and this variable is used to estimate the expression in downstream analysis. There are some characteristics of this fluorescence that are noteworthy.

For starters, the amplification line follows a sigmoidal-like behavior. but both the beginning and the end of the curve are not reliable. At the beginning, there is a relative absence of amplification due to the low amount of DNA which results in a fluorescence signal indistinguishable from background noise. Moreover, the DNA polymerase may be still in the process of activation and the primers are just binding to the template DNA. On the other hand, at the end there are no reagents left which leads to a plateau. A typical behavior can be observed in Figure 1A. Usually, as part of the initial quality check and data processing a baseline is taken into account to adjust the fluorescence signal which becomes then $\Delta Rn$. The transformation is depicted in figure 1B.

Delta delta method

Essentially, a threshold cycle (Ct) is established as the cycle at which the fluorescence reaches certain value that represent a reliable increase in the amount of the gene (amplicons). In other words, when the fluorescent signal is significantly above the background signal. And, of paramount importance: the Ct is inversely proportional to the original relative expression level of the gene of interest.

Specifically, in ThermoFisher the Cq, which is the cycle established as the threshold, can correspond either to Ct or Crt. In the first case, Ct is determined through the baseline threshold algorithm, whereas, the Crt is the result of the Relative Threshold algorithm.

The baseline threshold algorithm is an expression estimation algorithm that subtracts a baseline component and sets a fluorescence threshold in the exponential region, which can be observed in Figure 1. First, a baseline is determined as the background noise level from the initial cycles (before the amplification occurs). This is done with the help of a control reporter dye and it corresponds to the average of fluorescence signal over the cycles defined to be in the initial phase of the plot. In Figure 1 we selected 10 cycles. In theory, this initial phase ends in beginning of the exponential phase of the amplification curve. Indeed, the Ct is defined as the cycle in which the fluorescence level goes above the previously defined level⁴.

On the other hand, the relative threshold algorithm estimates a threshold for each curve individually and it does it based on the shape of the amplification curve, independently from the baseline fluorescence. For more details check [ref].

The Ct is, no brainer, at the very core of the $-Ct $ method. In mathematics $\Delta$ frequently represents a difference, just as it does in this case. Remember that our objective is to estimate the relative quantification of a gene in comparison to a reference gene (housekeeping genes). To do so we will determine the differences between the differences (not repeated) between Ct of a target gene and the Ct of a housekeeping gene in a target sample relative to a control sample. This can of course be divided into two steps:

1- First difference ($\Delta Ct$)

This corresponds to the difference between the Ct of a target gene and a housekeeping gene. So, within each treatment group (condition, level of a factor, etc.) of interest:

\[\Delta Ct=Ct\:\: of \:a\:target\:gene\:-Ct\:\: of \:a\:housekeeping\:gene\:\]

2- Second difference ($\Delta \Delta Ct$)

This corresponds to the difference between the $\Delta Ct$ of the treatment of interest and the $\Delta Ct$ of the reference group. A difference in differences.

3- power up the difference of differences ($2^{-\Delta \Delta Ct}$)

Finally we exponentiate $2$ to the $-\Delta \Delta Ct$ which serves as an estimation of the relative quantification of the target gene in the target condition in comparison to the control condition taking into account the amount of a housekeeping gene.

The exponential function is denoted by $f(x) = exp(x)$ or $e^x$ where $x$ is an exponent and $e$ is the base number. There are some properties of this measure that are worth mentioning to grasp the logic behind using it in this context.

Firstly, when we assume that the efficiency of the qPCR is $100%$, which is what most researchers do when analyzing relative quantification, the initial material would double after each cycle and this would be adequately captured by the exponential expression $2^n$. This can be seen in Figure 2.A.

Secondly, to ease the graphical representation of the the amplification many softwares produce figures in which the fluorescence (which serves as an approximation of the quantity of the genetic material) is transformed to a logarithmic scale. Just, as a reminder (or an useful note) the logarithmic scaleis somewhat the opposite of the exponential one when the base number is shared $(exp\:(1)\approx2.72 \:and\:log\:(2.72)\approx1)$. That is why when we plot the amplification curve using a log base 2 transformation of the material we get Figure 2.C with $y=x$.

Finally, derived from the first aspect that we mentioned, exponentiating 2 to the power of the difference $(\Delta \Delta Ct)$ will estimate a value of material in a target gene when the housekeeping value is one. One because dividing two exponentials expressions with the same base leads to subtract the two exponents and leave the base number as such. Then, using 2 as the base number, exponentiating to $(\Delta \Delta Ct)$ is the same as saying $\frac{2^{\Delta Ct\:[In\:target\:condition]}}{2^{\Delta Ct\:[In\:control\:condition]}}$. The latter corresponds to a ratio and in the ratio world 1 represents equality (as oppose to 0 in the subtraction world). And, as we are subtracting in both the numerator and the denominator the amount of housekeeping gene, this quantification are relative to those values (which will be equal to 1). The last point concerns the minus symbol. This is established as such because, as we mentioned above, the Ct is inversely proportional to the original relative expression level of the genetic material. So, a greater positive difference implies lesser starting quanitty of the target gene. The minus will invert this interpretation.

Statistical analysis

How to analyse the data to identify “significant” differences between the quantity of specific genes in specific conditions relative to a control is a heated debate (although by only a few).Even which variable should be explored is not straight forward. Although most research papers perform ANOVA or t-tests and adjust by multiple comparisons on the relative quantification values there are various reasons that make other alternatives perhaps more suited to do such analyses [Yuan]. Although take a look at [Levik].

Just to establish a clear context, there are two main models that are used to estimate the relative quantification: the efficiency calibrated model (Introduced by Michael Pfaffl) and the $(\Delta \Delta Ct)$ model. However, the former is a more generalized model of the second [Yuan]. Indeed, in the efficiency calibrated model the relative quantification is estimated as follows:

\[Relative\:quantification\:=\frac{(Efficiency\:\:in\:\:target\:\:gene)^{\:\:\Delta CT_{\:target\:gene}}}{(Efficiency\:\:in\:\:reference\:\:gene)^{\:\:\Delta CT_{\:reference\:gene}}}\] Thus, if the efficiencies are equal, due the properties of the ratios of exponential functions, the relative quantification would be equal to the base number(in this case 2 because the efficiency is “perfect” and the material doubles each cycle) exponentiated to the power of the subtractions of the two exponents, a.k.a $2^{\Delta CT_{Target} - \Delta CT_{reference}}$ which is the same as $2^{\Delta \Delta CT}$.

The Pfaffl model, however, requires a step to determine the efficiency, which in turn, requires to run a series of dilutions and determine the standard curve. Although, this is recommended it is of course more demanding. Therefore, most qPCR are intended to be analysed without knowing the efficiencies but assuming they are equal (polemic).

Once you obtained the CT values you can already perform the analyses. No need to perform any additional transformation, however, some like to show plots of the relative quantification in different logarithmic scales. As long are the stats are correctly done (or at least the choices justified) it really does not matter how you choose to present your data.

Classically, relative quantification measures are compared through t-tests and ANOVA, nonetheless, personally, I consider that a regression strategy can increase the power and the efficiency of the analysis and that to perform it using the untransformed CT values is quite enough. In particular mixed models might be an interesting alternative. Linear if you assume (or prove) that your independent variable follows a normal distribution (although more importantly, the normality should be checked on the residual and the pertinence of the model should be evaluated through a diagnostic process). On the other hand, generalized linear mixed models can be a more flexible tool when is hard to compromise with these assumptions.

Noteworthy, the way we analyse the data depends on, well, the data and its complexity. If your design is straight forward and there is no reason to think that different wells are correlated, then, a simple regression would do the job.

Now we will see a real case study and analyse the data generated by the thermofisher machine.

Case study with R

Description of the case:

# This sets the directory in which my xls files, produced by thermo, are stored
setwd("C:/Users/artur/OneDrive/Escritorio/HNeruStres/Qpcr/")

Loading packages

# Loading necessary packages 
library(ggplot2)# For plots
library(forcats)# Data shaping
library(readxl)# To import files
library(dplyr)# Data manipulation

Importing and preparing data for analysis

# Creating the data.frame

# 1) Importing the 4 files corresponding to the 4 runs of the PCR machine
trial1 <- read_excel("Comparative-Ct-SYBR-1.xls", sheet = "Results", skip = 37,n_max = 385)
trial2 <- read_excel("Comparative-Ct-SYBR-2.xls", sheet = "Results", skip = 37,n_max = 193)
trial3 <- read_excel("Comparative-Ct-SYBR-2.xls", sheet = "Results", skip = 37,n_max = 49)
trial4 <- read_excel("Comparative-Ct-SYBR-2.xls", sheet = "Results", skip = 37,n_max = 37)
# 2)  Selecting only three variables of interest 
# "Sample Name" (in this case corresponds to the genes)
# "Target Name" (in this case corresponds to the conditions)
# "CT" (value generated by the Quantstudio analysis through the basline threshold algorithm)
# And finally, we add a batch variable for every run of the machine.
qPCR1 <- trial1[c(4,5,15)];qPCR1$batch <- "SYBR1"
qPCR2 <- trial2[c(4,5,15)];qPCR2$batch <- "SYBR2"
qPCR3 <- trial3[c(4,5,15)];qPCR3$batch <- "SYBR3"
qPCR4 <- trial4[c(4,5,15)];qPCR4$batch <- "SYBR4"
# 3) Preparing the data.frame to work with.
# We append the four data.frames into a big one ("qPCR")
qPCR<- rbind(qPCR1,qPCR2,qPCR3,qPCR4)
# We create a condition variable merging all the biological replicates (3 per condition)
qPCR$condition <- gsub('[[:digit:]]+', '', qPCR$`Target Name`)
# Rename the variables for the ease of the manipulation
names(qPCR)[c(1,2)] <- c("Sample","Target")
# Fixing some typos
qPCR$condition[qPCR$condition=="estrodial"] <- "estradiol"
qPCR$condition[qPCR$condition=="progesterone "] <- "progesterone"
qPCR$condition[qPCR$condition=="estrodial"] <- "estradiol"
qPCR$Sample[qPCR$Sample=="INF"] <- "IFN"

Applying the Delta delta method

Given that we have a lot of conditions with several replicates we are going to generate summary statistics per gene for every condition. For instance, we determine the mean and the standard deviation of the Ct values of GAPDH accross replicates in non-treated (NT) wells, as well as in the Dexamethasone (Dex) treated wells and so forth.

# Generate statistics per target per group 
qPCR <- qPCR %>% left_join(aggregate(CT ~ condition+Sample, qPCR, mean), by= c("condition", "Sample"))
qPCR <- qPCR %>% left_join(aggregate(CT.x ~ condition+Sample, qPCR, sd), by= c("condition", "Sample"))
names(qPCR)[c(3,6,7)] <- c("CT","mean","sd"); qPCR_pruned <- qPCR[,c(-2,-3,-4)][!duplicated(qPCR[,c(-2,-3,-4)]),]

Subsequently, we calculate the $\Delta Ct$ of the mean of the Ct value in target gene in a specific condition in comparison to two housekeeping genes. As the values of these two housekeeping genes was very similar we considered that there were no differences choosing one or the other as a reference (Figure 3). To perform the analysis with several housekeeping genes take a look at paper of Riedel and colleagues¹.

Correlation between two housekeeping genes: GAPDH and RPLP0

ggplot(qPCR[qPCR$Sample=="GAPDH"|qPCR$Sample=="RPLP0",], aes(x=CT, y=CT, color=condition))+
geom_point()+
labs(title = " ",
       x = "CT",
       y = "CT")+
  scale_x_continuous(limits = c(10, 20),breaks=c(seq(10,20,2)))+
  scale_y_continuous(limits = c(10, 20),breaks=c(seq(10,20,2)))+
  theme_classic()+
  theme(axis.title.x = element_text(size=14))+
  theme(axis.title.y = element_text(size=14, vjust = 2))+
  theme(axis.text.x = element_text(size = 13))+
  theme(axis.text.y = element_text(size = 13))+
  theme(plot.title = element_text(size=14, face = "bold"), 
        legend.text = element_text(size=13), 
        legend.title =element_text(size=13, face = "bold"))+
  theme_classic()+
  facet_wrap(~Sample)

We chose GAPDH as the reference housekeeping genes, however we calculated the $\Delta Ct$ for both of them. In this case the mean of the replicates of GAPDH or RPLP0is subtracted from the mean of every gene within every condition.

# Calculating Deltas with two references genes
qPCR_pruned_2 <- qPCR_pruned  %>%
    group_by(condition) %>%
    mutate(deltaC_GADPH = mean - mean[Sample == 'GAPDH'])

qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(condition) %>%
    mutate(deltaC_RPLP0 = mean - mean[Sample == 'RPLP0'])

We then proceeded to calculate $\Delta \Delta Ct$ for each condition and its corresponding control. The latter corresponds to the vehicle used in the administration of the different exposures. For instance, LPS is administered with saline solution and Dihydrotestosterone (DHT) with methanol (MEOH).

# Calculating deltas deltas according to 4 different reference groups
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_d_NT = deltaC_GADPH - deltaC_GADPH[condition=="NT"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_d_ETOH = deltaC_GADPH - deltaC_GADPH[condition=="ETOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_d_MEOH = deltaC_GADPH - deltaC_GADPH[condition=="MEOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_d_Saline = deltaC_GADPH - deltaC_GADPH[condition=="saline"])

As the last step in the $\Delta \Delta Ct$ method we estimate the relative quantification exponentiating 2 to the $-\Delta \Delta Ct$ obtained.

# Estimating relative quantification according to the four references
qPCR_pruned_2$RQ_NT<- 2^-(qPCR_pruned_2$delta_d_NT)  
qPCR_pruned_2$RQ_ETOH <- 2^-(qPCR_pruned_2$delta_d_ETOH)  
qPCR_pruned_2$RQ_MEOH <- 2^-(qPCR_pruned_2$delta_d_MEOH)  
qPCR_pruned_2$RQ_saline <- 2^-(qPCR_pruned_2$delta_d_Saline)

Given that we want to see the variability of the relative quantification (not necessarily, depends on how you perform your statistical analysis) we repeat the process for the standard deviation (upper and lower limits separately). - Controversial I should check this-

############ VARIABILITY ##########
qPCR_pruned_2$lower_mean <- qPCR_pruned_2$mean- qPCR_pruned_2$sd
qPCR_pruned_2$higher_mean <- qPCR_pruned_2$mean + qPCR_pruned_2$sd
##### Lower_mean ######
# Calculating Deltas with two references genes

qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(condition) %>%
    mutate(deltaC_lower_GADPH = lower_mean - lower_mean[Sample == 'GAPDH'])

# Calculating deltas deltas of sd according to the 4 different reference groups
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_lower_NT_GADPH = deltaC_lower_GADPH - deltaC_lower_GADPH[condition=="NT"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_lower_ETOH_GADPH = deltaC_lower_GADPH - deltaC_lower_GADPH[condition=="ETOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_lower_MEOH_GADPH = deltaC_lower_GADPH - deltaC_lower_GADPH[condition=="MEOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_lower_Saline_GADPH = deltaC_lower_GADPH - deltaC_lower_GADPH[condition=="saline"])

# Estimating relative quantification according to the four references
qPCR_pruned_2$RQ_min_NT<- 2^-(qPCR_pruned_2$delta_lower_NT_GADPH)  
qPCR_pruned_2$RQ_min_ETOH <- 2^-(qPCR_pruned_2$delta_lower_ETOH_GADPH)  
qPCR_pruned_2$RQ_min_MEOH <- 2^-(qPCR_pruned_2$delta_lower_MEOH_GADPH)  
qPCR_pruned_2$RQ_min_saline <- 2^-(qPCR_pruned_2$delta_lower_Saline_GADPH)  

##### higher_mean ######
# Calculating Deltas with two references genes
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(condition) %>%
    mutate(deltaC_higher_GADPH = higher_mean - higher_mean[Sample == 'GAPDH'])

# Calculating deltas deltas of sd according to the 4 different reference groups
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_higher_NT_GADPH = deltaC_higher_GADPH - deltaC_higher_GADPH[condition=="NT"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_higher_ETOH_GADPH = deltaC_higher_GADPH - deltaC_higher_GADPH[condition=="ETOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_higher_MEOH_GADPH = deltaC_higher_GADPH - deltaC_higher_GADPH[condition=="MEOH"])
qPCR_pruned_2 <- qPCR_pruned_2  %>%
    group_by(Sample) %>%
    mutate(delta_higher_Saline_GADPH = deltaC_higher_GADPH - deltaC_higher_GADPH[condition=="saline"])

# Estimating relative quantification according to the four references
qPCR_pruned_2$RQ_max_NT<- 2^-(qPCR_pruned_2$delta_higher_NT_GADPH)  
qPCR_pruned_2$RQ_max_ETOH <- 2^-(qPCR_pruned_2$delta_higher_ETOH_GADPH)  
qPCR_pruned_2$RQ_max_MEOH <- 2^-(qPCR_pruned_2$delta_higher_MEOH_GADPH)  
qPCR_pruned_2$RQ_max_saline <- 2^-(qPCR_pruned_2$delta_higher_Saline_GADPH)

Results

Now we check the relative quantification of all the conditions in comparison to the non-treated group.

### 
qPCR_pruned_2$Sample_fac <- factor(qPCR_pruned_2$Sample, levels = c("GAPDH", "RPLP0", "FKBP5",
                                                                    "GR","IFN","AR","ER","PR"))
qPCR_pruned_2$condition_fac <- factor(qPCR_pruned_2$condition, levels = c("NT", "ETOH", "MEOH",
                                                                    "saline","Dex","LPS","estradiol","DHT",
                                                                    "progesterone"))

qPCR_pruned_2[qPCR_pruned_2$condition_fac!="NT" & qPCR_pruned_2$condition_fac!="saline" ,] %>%
  ggplot(aes(x = condition_fac, y=RQ_NT,fill=Sample_fac)) +
  labs(x = " ")+
  geom_errorbar(aes(ymin=RQ_min_NT, ymax=RQ_max_NT), width=.6,
                position=position_dodge(.9))+
  guides(fill = guide_legend(title = "Genes"))+
  scale_y_continuous(limits=c(0,6), breaks = c(0,1,2,3,4,5,6))+
  geom_bar(position="dodge", stat="identity",colour="black") +
  geom_hline(yintercept = 1,linetype="dashed", color="Red", size=1.1)+
  scale_y_continuous(limits = c(0,4), expand = c(0, 0)) +
  theme_classic()+
  ylab("Relative Quantification")+
  ggtitle("qPCR analysis relative \n to the non-treated group") +
  theme(axis.title.y = element_text(size=15, vjust = 2))+
  theme(axis.text.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 14))+
  theme(plot.title = element_text(size=22, hjust = 0.5, vjust = -4))

Then we visualize specific condition to their respective controls.

LPS vs saline

#### 
qPCR_pruned_2[qPCR_pruned_2$condition_fac=="LPS",] %>%
  ggplot(aes(x = condition_fac, y=RQ_saline,fill=Sample_fac)) +
  labs(x = " ")+
  geom_errorbar(aes(ymin=RQ_min_saline, ymax=RQ_max_saline), width=.6,
                position=position_dodge(.9))+
  guides(fill = guide_legend(title = "Genes"))+
  scale_y_continuous(limits=c(0,6), breaks = c(0,1,2,3,4,5,6))+
  geom_bar(position="dodge", stat="identity",colour="black") +
  geom_hline(yintercept = 1,linetype="dashed", color="Red", size=1.1)+
  scale_y_continuous(limits = c(0,4), expand = c(0, 0)) +
  theme_classic()+
  ylab("Relative Quantification")+
  ggtitle("qPCR analysis relative \n to the saline group") +
  theme(axis.title.y = element_text(size=15, vjust = 2))+
  theme(axis.text.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 14))+
  theme(plot.title = element_text(size=22, hjust = 0.5, vjust = -4))

2) Dex, estradiol and progesterone vs MEOH

###
qPCR_pruned_2[qPCR_pruned_2$condition_fac!="ETOH" & qPCR_pruned_2$condition_fac!="LPS" 
              & qPCR_pruned_2$condition_fac!="MEOH" & qPCR_pruned_2$condition_fac!="DHT"
              & qPCR_pruned_2$condition_fac!="saline"& qPCR_pruned_2$condition_fac!="NT",] %>%
  ggplot(aes(x = fct_infreq(condition_fac), y=RQ_ETOH,fill=Sample_fac)) +
  labs(x = " ")+
  geom_errorbar(aes(ymin=RQ_min_ETOH, ymax=RQ_max_ETOH), width=.6,
                position=position_dodge(.9))+
  guides(fill = guide_legend(title = "Genes"))+
  scale_y_continuous(limits=c(0,6), breaks = c(0,1,2,3,4,5,6))+
  geom_bar(position="dodge", stat="identity",colour="black") +
  geom_hline(yintercept = 1,linetype="dashed", color="Red", size=1.1)+
  scale_y_continuous(limits = c(0,4), expand = c(0, 0)) +
  theme_classic()+
  ylab("Relative Quantification")+
  ggtitle("qPCR analysis relative \n to the ETOH group") +
  theme(axis.title.y = element_text(size=15, vjust = 2))+
  theme(axis.text.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 14))+
  theme(plot.title = element_text(size=22, hjust = 0.5, vjust = -4))

DHT vs MEOH

### 
qPCR_pruned_2[qPCR_pruned_2$condition_fac!="ETOH" & qPCR_pruned_2$condition_fac!="LPS" 
              & qPCR_pruned_2$condition_fac!="MEOH" & qPCR_pruned_2$condition_fac!="Dex"
              & qPCR_pruned_2$condition_fac!="saline"& qPCR_pruned_2$condition_fac!="NT"
              & qPCR_pruned_2$condition_fac!="progesterone"& qPCR_pruned_2$condition_fac!="estradiol",] %>%
  ggplot(aes(x = fct_infreq(condition_fac), y=RQ_MEOH,fill=Sample_fac)) +
  labs(x = " ")+
  geom_errorbar(aes(ymin=RQ_min_MEOH, ymax=RQ_max_MEOH), width=.6,
                position=position_dodge(.9))+
  guides(fill = guide_legend(title = "Genes"))+
  scale_y_continuous(limits=c(0,6), breaks = c(0,1,2,3,4,5,6))+
  geom_bar(position="dodge", stat="identity",colour="black") +
  geom_hline(yintercept = 1,linetype="dashed", color="Red", size=1.1)+
  scale_y_continuous(limits = c(0,4), expand = c(0, 0)) +
  theme_classic()+
  ylab("Relative Quantification")+
  ggtitle("qPCR analysis relative \n to the MEOH group") +
  theme(axis.title.y = element_text(size=15, vjust = 2))+
  theme(axis.text.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 14))+
  theme(plot.title = element_text(size=22, hjust = 0.5, vjust = -4))

Statistical analysis

Quality check

This first step is to check the trend of quantification throughout the cycles. It should have a negative slope (around -1).

ggplot(qPCR_pruned_2,aes(x=log2(RQ_NT),y=mean, color=Sample))+
  geom_point(size=2.5)+
  geom_line(size=1)+
  labs(title = " ",
       x = "Log2 (Relative quantification to NT)",
       y = "CT")+
  scale_color_discrete(name="Gene")+
  theme(axis.title.x = element_text(size=14))+
  theme(axis.title.y = element_text(size=14, vjust = 2))+
  theme(axis.text.x = element_text(size = 13))+
  theme(axis.text.y = element_text(size = 13))+
  theme(plot.title = element_text(size=14, face = "bold"), 
        legend.text = element_text(size=13), 
        legend.title =element_text(size=13, face = "bold"))+
  theme_classic()

The curves kind of make sense (some would suggest to make an statistical test to see if the slope is no signifcantly different from 0).

Independent variable visualization

Although we already checked the transformed data, given that the analyses will be performed on the raw CT values, we want to check a little bit their distribution.

# As usual we create a backup data.frame 
qPCR_analysis <- qPCR
# Factorize and establish an order for the gene and the condition levels
qPCR_analysis$Sample_fac <- factor(qPCR_analysis$Sample, levels = c("GAPDH", "RPLP0", "FKBP5",
                                                                    "GR","IFN","AR","ER","PR"))
qPCR_analysis$condition_fac <- factor(qPCR_analysis$condition, levels = c("NT", "ETOH", "MEOH",
                                                                    "saline","Dex","LPS","estradiol","DHT",
                                                                    "progesterone"))
# We add a control vs treatment variable for visualization
qPCR_analysis$grouped_condition <- NA
qPCR_analysis$grouped_condition[qPCR_analysis$condition_fac!="ETOH" & qPCR_analysis$condition_fac!="MEOH" 
                                & qPCR_analysis$condition_fac!="saline"& qPCR_analysis$condition_fac!="NT"] <- "Treatment"
qPCR_analysis$grouped_condition[is.na(qPCR_analysis$grouped_condition)] <- "Control"
qPCR_analysis$for_geom <- paste0(qPCR_analysis$condition,"_",qPCR_analysis$Sample)
# Create a boxplot for every gene
ggplot(qPCR_analysis, aes(x=condition_fac, y=CT, color=Sample_fac ))+
  geom_point(position = position_jitterdodge(dodge.width = 0.1,jitter.width = 0.1, jitter.height = 0.1),aes(group=Sample_fac))+
  geom_boxplot(aes(x=condition_fac), position = position_dodge(0.1))+
  geom_smooth(aes(group=Sample_fac))+
    labs(title = " ",
       x = "Condition",
       y = "CT")+
  scale_color_discrete(name="Gene")+theme_classic2()+
  theme(axis.title.x = element_text(size=16, vjust = -1))+
  theme(axis.title.y = element_text(size=16, vjust = 2))+
  theme(axis.text.x = element_text(size = 15))+
  theme(axis.text.y = element_text(size = 15))+
  theme(plot.title = element_text(size=17, face = "bold"), 
        legend.text = element_text(size=15), 
        legend.title =element_text(size=16, face = "bold"))+
  facet_wrap(~grouped_condition, scales = "free_x")+
  theme(strip.text.x = element_text(size = 15))

As we can see, in general terms we see pretty much the same thing that we saw with the relative quantification plots. Dexamethasone lowers the FKBP5 curve, but the rest of conditions or genes do not seem to have a major effect. And there are no important differences between the control samples.

Multiple linear regression

Now we proceed to make the formal analysis. First we will implement a multiple “fixed effect” linear model and then a more complex mixed linear model. Nonetheless, a more thorough analysis could be implemented if the distribution of CT was taking into account.

#### Statistical analysis
library(sjPlot)
qPCR_analysis$C_ <- qPCR_analysis$condition_fac
qPCR_analysis$G_ <- qPCR_analysis$Sample_fac
# considering that relative quantification is not the measure per se I consider go with linear 
# regression with the independent variable as Ct within each group

# All vs NT
lm_all <- lm(CT~C_*G_, qPCR_analysis[qPCR_analysis$C_!="ETOH" & qPCR_analysis$C_!="saline" & qPCR_analysis$C_!="MEOH",])

Multiple mixed linear regression

library(lme4)
crossedmm <- lmer(CT ~ C_*G_ +
                    (1|C_) +
                    (1|G_),
                  qPCR_analysis[qPCR_analysis$C_!="ETOH" & qPCR_analysis$C_!="saline" & qPCR_analysis$C_!="MEOH",])

tab_model(lm_all, crossedmm,auto.label = F,CSS = list(css.centeralign = 'text-align: center;',
                                              css.arc = 'color:blue;'))

	CT			CT
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	16.77	16.27 – 17.28	<0.001	16.77	-1.55 – 35.10	0.073
C_Dex	-0.36	-0.93 – 0.20	0.203	-0.36	-1.01 – 0.28	0.264
C_LPS	-0.27	-0.81 – 0.26	0.314	-0.27	-0.89 – 0.34	0.383
C_estradiol	-0.00	-0.56 – 0.56	1.000	-0.00	-0.64 – 0.64	1.000
C_DHT	-0.06	-0.63 – 0.50	0.822	-0.06	-0.70 – 0.58	0.844
C_progesterone	-0.59	-1.15 – -0.03	0.040	-0.59	-1.23 – 0.05	0.071
G_RPLP0	0.08	-0.64 – 0.79	0.835	0.08	-25.84 – 25.99	0.995
G_FKBP5	7.49	6.78 – 8.20	<0.001	7.49	-18.42 – 33.41	0.570
G_GR	5.34	4.63 – 6.05	<0.001	5.34	-20.58 – 31.25	0.686
G_IFN	9.97	9.26 – 10.68	<0.001	9.97	-15.95 – 35.88	0.450
G_AR	8.82	8.11 – 9.53	<0.001	8.82	-17.09 – 34.74	0.504
G_ER	10.33	9.62 – 11.04	<0.001	10.33	-15.59 – 36.24	0.434
G_PR	11.61	10.90 – 12.32	<0.001	11.61	-14.31 – 37.52	0.379
C_Dex:G_RPLP0	-0.10	-0.89 – 0.70	0.806	-0.10	-0.89 – 0.70	0.806
C_LPS:G_RPLP0	-0.21	-0.97 – 0.55	0.588	-0.21	-0.97 – 0.55	0.588
C_estradiol:G_RPLP0	0.08	-0.71 – 0.88	0.843	0.08	-0.71 – 0.88	0.843
C_DHT:G_RPLP0	-0.27	-1.06 – 0.53	0.510	-0.27	-1.06 – 0.53	0.510
C_progesterone:G_RPLP0	0.00	-0.79 – 0.80	0.996	0.00	-0.79 – 0.80	0.996
C_Dex:G_FKBP5	-1.38	-2.17 – -0.58	0.001	-1.38	-2.17 – -0.58	0.001
C_LPS:G_FKBP5	0.34	-0.41 – 1.09	0.375	0.34	-0.41 – 1.09	0.375
C_estradiol:G_FKBP5	0.33	-0.46 – 1.13	0.412	0.33	-0.46 – 1.13	0.412
C_DHT:G_FKBP5	0.02	-0.77 – 0.82	0.955	0.02	-0.77 – 0.82	0.955
C_progesterone:G_FKBP5	0.55	-0.25 – 1.34	0.176	0.55	-0.25 – 1.34	0.176
C_Dex:G_GR	0.47	-0.33 – 1.26	0.248	0.47	-0.33 – 1.26	0.248
C_LPS:G_GR	0.27	-0.49 – 1.02	0.485	0.27	-0.49 – 1.02	0.485
C_estradiol:G_GR	0.21	-0.59 – 1.00	0.612	0.21	-0.59 – 1.00	0.612
C_DHT:G_GR	0.09	-0.71 – 0.88	0.826	0.09	-0.71 – 0.88	0.826
C_progesterone:G_GR	0.40	-0.39 – 1.20	0.320	0.40	-0.39 – 1.20	0.320
C_Dex:G_IFN	0.86	0.07 – 1.66	0.033	0.86	0.07 – 1.66	0.033
C_LPS:G_IFN	0.25	-0.51 – 1.00	0.520	0.25	-0.51 – 1.00	0.520
C_estradiol:G_IFN	0.20	-0.60 – 0.99	0.627	0.20	-0.60 – 0.99	0.627
C_DHT:G_IFN	0.03	-0.77 – 0.82	0.945	0.03	-0.77 – 0.82	0.945
C_progesterone:G_IFN	1.06	0.27 – 1.86	0.009	1.06	0.27 – 1.86	0.009
C_Dex:G_AR	0.06	-0.73 – 0.86	0.876	0.06	-0.73 – 0.86	0.876
C_LPS:G_AR	-0.38	-1.13 – 0.38	0.329	-0.38	-1.13 – 0.38	0.329
C_estradiol:G_AR	-0.02	-0.82 – 0.77	0.960	-0.02	-0.82 – 0.77	0.960
C_DHT:G_AR	-0.07	-0.86 – 0.73	0.870	-0.07	-0.86 – 0.73	0.870
C_progesterone:G_AR	-0.22	-1.02 – 0.57	0.583	-0.22	-1.02 – 0.57	0.583
C_Dex:G_ER	0.54	-0.26 – 1.33	0.186	0.54	-0.26 – 1.33	0.186
C_LPS:G_ER	0.06	-0.70 – 0.81	0.884	0.06	-0.70 – 0.81	0.884
C_estradiol:G_ER	-0.10	-0.90 – 0.69	0.797	-0.10	-0.90 – 0.69	0.797
C_DHT:G_ER	-0.06	-0.86 – 0.73	0.876	-0.06	-0.86 – 0.73	0.876
C_progesterone:G_ER	0.91	0.12 – 1.71	0.025	0.91	0.12 – 1.71	0.025
C_Dex:G_PR	1.04	0.24 – 1.83	0.011	1.04	0.24 – 1.83	0.011
C_LPS:G_PR	0.04	-0.71 – 0.80	0.912	0.04	-0.71 – 0.80	0.912
C_estradiol:G_PR	0.48	-0.31 – 1.28	0.232	0.48	-0.31 – 1.28	0.232
C_DHT:G_PR	0.15	-0.65 – 0.94	0.713	0.15	-0.65 – 0.94	0.713
C_progesterone:G_PR	0.96	0.16 – 1.75	0.018	0.96	0.16 – 1.75	0.018
Random Effects
σ²				0.20
τ₀₀				86.96 _{G_}
				0.01 _{C_}
ICC				1.00
N				6 _{C_}
				8 _{G_}
Observations	590			590
R² / R² adjusted	0.991 / 0.990			0.181 / 0.998

Vs ETOH

crossedmm_etoh <- lmer(CT ~ C_*G_ +
                    (1|C_) +
                    (1|G_),
                  qPCR_analysis[qPCR_analysis$C_=="ETOH" | qPCR_analysis$C_=="Dex"| qPCR_analysis$C_=="estradiol"| qPCR_analysis$C_=="progesterone",])

tab_model(crossedmm_etoh, auto.label = F, CSS = list(css.centeralign = 'text-align: center;'))

	CT
Predictors	Estimates	CI	p
(Intercept)	16.61	1.57 – 31.65	0.031
C_Dex	-0.20	-0.90 – 0.50	0.574
C_estradiol	0.16	-0.54 – 0.87	0.645
C_progesterone	-0.43	-1.13 – 0.28	0.234
G_RPLP0	-0.03	-21.30 – 21.24	0.998
G_FKBP5	7.89	-13.38 – 29.15	0.466
G_GR	5.62	-15.65 – 26.89	0.603
G_IFN	10.46	-10.81 – 31.73	0.334
G_AR	8.76	-12.50 – 30.03	0.418
G_ER	10.93	-10.34 – 32.20	0.312
G_PR	12.09	-9.18 – 33.35	0.264
C_Dex:G_RPLP0	0.01	-0.82 – 0.83	0.988
C_estradiol:G_RPLP0	0.19	-0.64 – 1.01	0.657
C_progesterone:G_RPLP0	0.11	-0.71 – 0.93	0.797
C_Dex:G_FKBP5	-1.77	-2.59 – -0.95	<0.001
C_estradiol:G_FKBP5	-0.06	-0.88 – 0.76	0.886
C_progesterone:G_FKBP5	0.16	-0.67 – 0.98	0.708
C_Dex:G_GR	0.18	-0.64 – 1.01	0.660
C_estradiol:G_GR	-0.08	-0.90 – 0.74	0.850
C_progesterone:G_GR	0.12	-0.70 – 0.94	0.776
C_Dex:G_IFN	0.37	-0.45 – 1.20	0.371
C_estradiol:G_IFN	-0.29	-1.12 – 0.53	0.482
C_progesterone:G_IFN	0.57	-0.25 – 1.40	0.171
C_Dex:G_AR	0.12	-0.70 – 0.94	0.774
C_estradiol:G_AR	0.04	-0.79 – 0.86	0.930
C_progesterone:G_AR	-0.16	-0.99 – 0.66	0.693
C_Dex:G_ER	-0.07	-0.89 – 0.75	0.868
C_estradiol:G_ER	-0.71	-1.53 – 0.11	0.090
C_progesterone:G_ER	0.31	-0.52 – 1.13	0.463
C_Dex:G_PR	0.56	-0.26 – 1.38	0.183
C_estradiol:G_PR	0.01	-0.82 – 0.83	0.989
C_progesterone:G_PR	0.48	-0.34 – 1.30	0.252
Random Effects
σ²	0.21
τ₀₀ _{G_}	58.29
τ₀₀ _{C_}	0.02
ICC	1.00
N _{C_}	4
N _{G_}	8
Observations	312
Marginal R² / Conditional R²	0.256 / 0.997

Vs MEOH

crossedmm_meoh <- lmer(CT ~ C_*G_ +
                    (1|C_) +
                    (1|G_),
                  qPCR_analysis[qPCR_analysis$C_=="MEOH" | qPCR_analysis$C_=="DHT",])

tab_model(crossedmm_meoh, auto.label = F, CSS = list(css.centeralign = 'text-align: center;'))

	CT
Predictors	Estimates	CI	p
(Intercept)	16.72	2.48 – 30.96	0.022
C_DHT	-0.01	-0.48 – 0.46	0.964
G_RPLP0	-0.15	-20.28 – 19.99	0.988
G_FKBP5	7.87	-12.27 – 28.00	0.440
G_GR	5.61	-14.53 – 25.75	0.582
G_IFN	10.33	-9.81 – 30.46	0.311
G_AR	9.07	-11.06 – 29.21	0.374
G_ER	10.68	-9.45 – 30.82	0.295
G_PR	12.52	-7.62 – 32.65	0.220
C_DHT:G_RPLP0	-0.04	-0.45 – 0.36	0.837
C_DHT:G_FKBP5	-0.35	-0.75 – 0.06	0.091
C_DHT:G_GR	-0.18	-0.59 – 0.22	0.371
C_DHT:G_IFN	-0.33	-0.74 – 0.07	0.106
C_DHT:G_AR	-0.32	-0.72 – 0.09	0.126
C_DHT:G_ER	-0.42	-0.82 – -0.01	0.043
C_DHT:G_PR	-0.76	-1.17 – -0.36	<0.001
Random Effects
σ²	0.05
τ₀₀ _{G_}	51.50
τ₀₀ _{C_}	0.02
ICC	1.00
N _{C_}	2
N _{G_}	8
Observations	120
Marginal R² / Conditional R²	0.270 / 0.999

Vs saline

crossedmm_saline <- lmer(CT ~ C_*G_ +
                    (1|C_) +
                    (1|G_), data = qPCR_analysis[qPCR_analysis$C_=="saline" | qPCR_analysis$C_=="LPS",])

tab_model(crossedmm_saline, auto.label = F, CSS = list(css.centeralign = 'text-align: center;'))

	CT
Predictors	Estimates	CI	p
(Intercept)	16.69	1.39 – 31.98	0.033
C_LPS	-0.19	-1.09 – 0.72	0.684
G_RPLP0	-0.03	-21.65 – 21.59	0.998
G_FKBP5	8.23	-13.39 – 29.85	0.454
G_GR	5.93	-15.69 – 27.55	0.589
G_IFN	10.77	-10.85 – 32.38	0.327
G_AR	9.21	-12.41 – 30.82	0.402
G_ER	11.25	-10.36 – 32.87	0.306
G_PR	12.85	-8.77 – 34.46	0.243
C_LPS:G_RPLP0	-0.10	-0.92 – 0.72	0.806
C_LPS:G_FKBP5	-0.40	-1.22 – 0.42	0.339
C_LPS:G_GR	-0.32	-1.14 – 0.50	0.437
C_LPS:G_IFN	-0.55	-1.37 – 0.27	0.187
C_LPS:G_AR	-0.76	-1.58 – 0.06	0.069
C_LPS:G_ER	-0.87	-1.69 – -0.05	0.037
C_LPS:G_PR	-1.20	-2.02 – -0.38	0.005
Random Effects
σ²	0.23
τ₀₀ _{G_}	59.97
τ₀₀ _{C_}	0.06
ICC	1.00
N _{C_}	2
N _{G_}	8
Observations	206
Marginal R² / Conditional R²	0.238 / 0.997

References

Riedel, G. et al. An extended $\Delta$CT-method facilitating normalisation with multiple reference genes suited for quantitative RT-PCR analyses of human hepatocyte-like cells. PLoS ONE 9, (2014).

Arya, M. et al. Basic principles of real-time quantitative PCR. Expert Review of Molecular Diagnostics 5, 209–219 (2005).

Rao, X., Huang, X., Zhou, Z. & Lin, X. An improvement of the 2ˆ(-delta delta CT) method for quantitative real-time polymerase chain reaction data analysis. Biostatistics, bioinformatics and biomathematics 3, 71–85 (2013).

Li, X. et al. qPCRtools: An R package for qPCR data processing and visualization. Frontiers in Genetics 13, (2022).

qPCR - Analysis

Arturo Marroquín

2023-11-21

Background

Biological concepts

Gene expression

qPCR: molecular basis

Data acquisition and analysis

Delta delta method

Statistical analysis

Case study with R

Loading packages

Importing and preparing data for analysis

Applying the Delta delta method

Correlation between two housekeeping genes: GAPDH and RPLP0

Results

Statistical analysis

Quality check

Independent variable visualization

Multiple linear regression

Multiple mixed linear regression

Vs ETOH

Vs MEOH

Vs saline

References