require(tidyverse)
require(cowplot)
old = theme_set(theme_cowplot())

Goal

The goal of this analysis is to use qPCR to measure the gene induction time course after phosphate starvation in Saccharomyces cerevisiae in comparison with Candida glabrata. In particular, we would like to resolve a previous contradiction where oxidative stress response genes were found to be induced upon phosphate starvation in S. cerevisiae in one experiment but not in another. Here we repeat the time course experiment in S. cerevisiae and C. glabrata side by side to see if the induction of OSR genes upon phosphate starvation is species-specific.

Experimental Design

experimental design Preparation: Cells were grown to log phase in synthetic complete (SC) medium with replete phosphate; Sample C1 was collected from this pre-stress condition; Switch media: Log phase cultures were then subjected to parallel media exchange via spin/filter procedure: experimental cultures were transferred to phosphate-depleted SC medium (-Pi), while control cultures were transferred to fresh phosphate-replete SC medium to control for media switch effects. The parallel phosphate-replete culture (+Pi) was then sampled and used as a secondary control (C2). Following media exchange, a time course was performed with samples collected at 30 minutes (E1 for -Pi condition) and 60 minutes (E2 for -Pi condition); for the parallel +Pi culture, an experimental sample was only taken at 30 minutes post-stress induction. In a separate experiment for oxidative stress analysis, log phase cells grown in SC medium were transferred via spin/filter to H2O2-treated medium (1 mM for S. cerevisiae, and 10 mM for C. glabrata). A positive control sample was collected immediately post-transfer (C1 control), followed by experimental samples at 30 minutes (E1) and 60 minutes (E2) of H2O2 exposure. All samples underwent total RNA isolation, cDNA synthesis, and RT-qPCR analysis. Representative amplification curves show threshold determination for calculating Ct values, with reactions performed in duplicate (2x).

Data

The qPCR data file is a tab-delimited text file with the following columns: Well, Row, Col, Sample, Species, Task, Treatment, Timepoint, Replicate, Target, Cq, Amp Status

  1. Sample (e.g. A1, B1, etc.,no RT,NTC)
  2. Species (Sc or Cg)
  3. Task (e.g. unknown, NTC, NoRT)
  4. Treatment (e.g. Pi, H2O2)
  5. Timepoint (e.g. 0,30,60)
  6. Replicate (e.g. 1,2)
  7. Target (e.g. ACT1, CTA1, TRX2, PHO84)
  8. Cq (Ct value)
  9. Amp Status (e.g. OK, No Amp, etc.)

Read & Format data

dat <- read_csv("../input/20251010-OSR-qPCR-Ct-raw-formatted.csv",
                col_types = cols()) |>
  mutate(Ct = na_if(Cq, "Undetermined") |> as.numeric())

Check to make sure the NTC are negative

dat %>% filter(Task %in% c("NTC", "NoRT")) |> 
                 select(Well, Sample, Task, Target, Ct, `Amp Status`)

There was one NTC reaction for PHO84 that had extremely low Ct. Double check

EDA and QC

Overview of Ct values

dat %>%
  filter(Task  == "Unknown") %>%
  mutate(label = paste(Species, Treatment, Timepoint, sep = "_")) %>% 
  ggplot(aes(x = label, Ct)) + geom_point(aes(color = Replicate), shape=5) + 
  facet_wrap( ~Target ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 10))

Looking at ACT1, we see that one sample, Cg_Pi_60m_b1, appears to have a problem (Ct value much higher than expected). We will examine this sample further

dat %>% filter(Sample == "Cg-Pi-60m-b1") %>% 
  select(Well, Sample, Task, Target, Ct, `Amp Status`)

This sample clearly failed. Will remove from further analyses

Also, we notice that ‘Cg-H2O2-30m-b2’ appears to be missing.

dat %>% filter(Sample == "Cg-H2O2-30m-b2") %>% 
  select(Well, Sample, Task, Target, Ct, `Amp Status`)

This sample failed as well. We will remove both failed samples from further analyses

filtered_dat <- dat %>% filter(!Sample %in% c("Cg-Pi-60m-b1", "Cg-H2O2-30m-b2"),
                               Task == "Unknown")

Check the experimental design and number of replicates per group after filtering

with(filtered_dat %>% mutate(Target = fct_recode(Target, CTT1 = "CTA1")),
     table(Species, paste(Treatment, Timepoint, sep = "_"), Target))
, , Target = ACT1

       
Species CTRL_0m H2O2_30m H2O2_60m Pi_30m Pi_60m
     Cg       3        3        6      6      3
     Sc       3        6        6      6      6

, , Target = CTT1

       
Species CTRL_0m H2O2_30m H2O2_60m Pi_30m Pi_60m
     Cg       3        3        6      6      3
     Sc       3        6        6      6      6

, , Target = PHO84

       
Species CTRL_0m H2O2_30m H2O2_60m Pi_30m Pi_60m
     Cg       3        3        6      6      3
     Sc       3        6        6      6      6

, , Target = TRX2-p1

       
Species CTRL_0m H2O2_30m H2O2_60m Pi_30m Pi_60m
     Cg       3        3        6      6      3
     Sc       3        6        6      6      6

, , Target = TRX2-p2

       
Species CTRL_0m H2O2_30m H2O2_60m Pi_30m Pi_60m
     Cg       0        3        6      6      3
     Sc       0        6        6      6      6
  1. TRX2-p2 was not run on the reference sample.
  2. Cg_H2O2_30m only has one replicate after filtering.
  3. Cg_Pi_60m only has one replicate after filtering.

ddCt analysis

Standard curve and efficiency calculation

To calculate efficiency, use linear regression \[ n_t = D_i n_0 E^{C_{T,i}} \]

We don’t have standard curves in this experiment. Skip

ddCt calculation

Set the reference gene and reference condition

ref.target <- "ACT1"
ref.treat <- "CTRL"
dilute <- 100 # all samples diluted to 100 fold in this experiment
# Mean and sd
# ---
# all samples are diluted to the same 100 fold, hence no need to back calculate here
# data1$Ct1 <- data1$Ct - log2(data1$Dilution/dilute)
# ---
average_Ct <- filtered_dat %>% 
  group_by(Species, Treatment, Timepoint, Replicate, Target) %>%
  summarize(Ct = Ct(Ct, na.rm=TRUE), sd = sd(Ct, na.rm=TRUE), .groups="drop")
Error in `summarize()`:
ℹ In argument: `Ct = Ct(Ct, na.rm = TRUE)`.
ℹ In group 1: `Species = "Cg"`, `Treatment = "CTRL"`, `Timepoint = "0m"`, `Replicate = "b1"`, `Target =
  "ACT1"`.
Caused by error in `Ct()`:
! could not find function "Ct"
Backtrace:
  1. ... %>% ...
  3. dplyr:::summarise.grouped_df(., Ct = Ct(Ct, na.rm = TRUE), sd = sd(Ct, na.rm = TRUE), .groups = "drop")
  4. dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
  6. dplyr:::map(quosures, summarise_eval_one, mask = mask)
  7. base::lapply(.x, .f, ...)
  8. dplyr (local) FUN(X[[i]], ...)
  9. mask$eval_all_summarise(quo)
 10. dplyr (local) eval()

Next, we group the data by {Species, Treatment, Timepoint, and Replicate}, and calculate ddCt relative to the reference target and then the reference sample.

dCt calculation

ddCt calculation.

Note that only one control sample was run and will be used as the reference sample for both biological replicates; also, while two sets of primers were used for TRX2, only one of them, p1, was tested on the reference sample. p2 will be dropped as a result.

Plot results

---
title: "qPCR analysis of Pi starvation time course to determine OSR gene induction"
date: 2025-10-21, updated on `r Sys.Date()`
author: Bin He
output: 
  html_notebook:
    toc: true
    toc_depth: 3
    toc_float: true
    theme: readable
    code_folding: hide
---

```{r}
require(tidyverse)
require(cowplot)
```

```{r}
old = theme_set(theme_cowplot())
```

## Goal
The goal of this analysis is to use qPCR to measure the gene induction time course after phosphate starvation in _Saccharomyces cerevisiae_ in comparison with _Candida glabrata_. In particular, we would like to resolve a previous contradiction where oxidative stress response genes were found to be induced upon phosphate starvation in _S. cerevisiae_ in one experiment but not in another. Here we repeat the time course experiment in _S. cerevisiae_ and _C. glabrata_ side by side to see if the induction of OSR genes upon phosphate starvation is species-specific. 

## Experimental Design
![experimental design](../asset/20251010-OSR-qRT-PCR-design.png)
**Preparation**: Cells were grown to log phase in synthetic complete (SC) medium with replete phosphate; **Sample C1** was collected from this pre-stress condition; **Switch media**: Log phase cultures were then subjected to parallel media exchange via spin/filter procedure: experimental cultures were transferred to phosphate-depleted SC medium (-Pi), while control cultures were transferred to fresh phosphate-replete SC medium to control for media switch effects. The parallel phosphate-replete culture (+Pi) was then sampled and used as a secondary control (C2). Following media exchange, a time course was performed with samples collected at 30 minutes (E1 for -Pi condition) and 60 minutes (E2 for -Pi condition); for the parallel +Pi culture, an experimental sample was only taken at 30 minutes post-stress induction. In a separate experiment for oxidative stress analysis, log phase cells grown in SC medium were transferred via spin/filter to H2O2-treated medium (1 mM for _S. cerevisiae_, and 10 mM for _C. glabrata_). A positive control sample was collected immediately post-transfer (C1 control), followed by experimental samples at 30 minutes (E1) and 60 minutes (E2) of H2O2 exposure. All samples underwent total RNA isolation, cDNA synthesis, and RT-qPCR analysis. Representative amplification curves show threshold determination for calculating Ct values, with reactions performed in duplicate (2x).

## Data
The qPCR data file is a tab-delimited text file with the following columns:
Well, Row, Col, Sample, Species, Task, Treatment, Timepoint, Replicate, Target, Cq, Amp Status

3. Sample       (e.g. A1, B1, etc.,no RT,NTC)
4. Species      (Sc or Cg)
5. Task         (e.g. unknown, NTC, NoRT)
6. Treatment    (e.g. Pi, H2O2)
7. Timepoint    (e.g. 0,30,60)
8. Replicate    (e.g. 1,2)
9. Target       (e.g. ACT1, CTA1, TRX2, PHO84)
10. Cq          (Ct value)
11. Amp Status  (e.g. OK, No Amp, etc.)

## Read & Format data
```{r}
dat <- read_csv("../input/20251010-OSR-qPCR-Ct-raw-formatted.csv",
                col_types = cols()) |>
  mutate(Ct = na_if(Cq, "Undetermined") |> as.numeric())
```

Check to make sure the NTC are negative
```{r}
dat %>% filter(Task %in% c("NTC", "NoRT")) |> 
                 select(Well, Sample, Task, Target, Ct, `Amp Status`)
```
> There was one NTC reaction for PHO84 that had extremely low Ct. Double check

## EDA and QC

### Overview of Ct values
```{r}
dat %>%
  filter(Task  == "Unknown") %>%
  mutate(label = paste(Species, Treatment, Timepoint, sep = "_")) %>% 
  ggplot(aes(x = label, Ct)) + geom_point(aes(color = Replicate), shape=5) + 
  facet_wrap( ~Target ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 10))
```

> Looking at ACT1, we see that one sample, Cg_Pi_60m_b1, appears to have a problem (Ct value much higher than expected). We will examine this sample further

```{r}
dat %>% filter(Sample == "Cg-Pi-60m-b1") %>% 
  select(Well, Sample, Task, Target, Ct, `Amp Status`)
```
> This sample clearly failed. Will remove from further analyses

Also, we notice that 'Cg-H2O2-30m-b2' appears to be missing.
```{r}
dat %>% filter(Sample == "Cg-H2O2-30m-b2") %>% 
  select(Well, Sample, Task, Target, Ct, `Amp Status`)
```
> This sample failed as well. We will remove both failed samples from further analyses

```{r}
filtered_dat <- dat %>% filter(!Sample %in% c("Cg-Pi-60m-b1", "Cg-H2O2-30m-b2"),
                               Task == "Unknown")
```

Check the experimental design and number of replicates per group after filtering
```{r}
with(filtered_dat %>% mutate(Target = fct_recode(Target, CTT1 = "CTA1")),
     table(Species, paste(Treatment, Timepoint, sep = "_"), Target))
```

> 1. TRX2-p2 was not run on the reference sample.
> 2. Cg_H2O2_30m only has one replicate after filtering.
> 3. Cg_Pi_60m only has one replicate after filtering.

## ddCt analysis
### Standard curve and efficiency calculation
To calculate efficiency, use linear regression
$$ n_t = D_i n_0 E^{C_{T,i}} $$ 

We don't have standard curves in this experiment. Skip
```{r echo=FALSE, include=FALSE}
# Standard curve
p <- ggplot( subset(data1, Well.Type == "std"), aes(log2(Dilution/dilute), Ct) ) 
p + geom_point() + geom_smooth(aes(group=1),method="lm") + facet_grid( Plate ~ Assay, labeller = label_both )

# Calc primer amplification efficiency
linmod <- function(df) {
  if( sum(!is.na(df$Ct)) == 0 ) { # if no non-NA data
    return( c("intercept"=NA, "slope"=NA, "p.value"=NA, "r.squared"=NA) )
  }
  lm <- lm( log2(Dilution/dilute) ~ Ct, data=df, na.action = "na.omit")
  intercept <- as.numeric(coef(lm)[1])
  slope <- as.numeric(coef(lm)[2])
  efficiency <- 2^(slope)
  sum <- summary(lm)
  p <- coef(sum)[2,4]
  R2 <- sum$r.squared
  return( c("efficiency"=efficiency, "slope"=slope, "p.value"=p, "r.squared"=R2) )
}
eff <- ddply( subset(data1,Well.Type == "std"), .(Plate, Assay), linmod)
print(eff, digits=3)
#cat("There are some problems with the 8 fold dilution in plate C.")
```

### ddCt calculation
Set the reference gene and reference condition
```{r}
ref.target <- "ACT1"
ref.treat <- "CTRL"
dilute <- 100 # all samples diluted to 100 fold in this experiment
```

```{r fig.width=9, fig.height=5, warning=FALSE}
# Mean and sd
# ---
# all samples are diluted to the same 100 fold, hence no need to back calculate here
# data1$Ct1 <- data1$Ct - log2(data1$Dilution/dilute)
# ---
average_Ct <- filtered_dat %>% 
  group_by(Species, Treatment, Timepoint, Replicate, Target) %>%
  summarize(Ct = mean(Ct, na.rm=TRUE), sd = sd(Ct, na.rm=TRUE), .groups="drop")
```

Next, we group the data by {Species, Treatment, Timepoint, and Replicate}, and calculate ddCt relative to the reference target and then the reference sample.

dCt calculation
```{r}
dCt <- average_Ct %>%
  select(-sd) %>% 
  group_by(Species, Treatment, Timepoint, Replicate) %>%
  mutate(dCt = Ct - Ct[Target == ref.target]) %>%
  filter(Target != ref.target) %>%
  ungroup()
```

ddCt calculation.

> Note that only one control sample was run and will be used as the reference sample
> for both biological replicates; also, while two sets of primers were used for TRX2,
> only one of them, p1, was tested on the reference sample. p2 will be dropped as a 
> result.

```{r}
ddCt <- dCt %>%
  filter(Target != "TRX2-p2") %>%
  group_by(Species, Target) %>%
  arrange(Species, Target) %>%
  mutate(ddCt = dCt - dCt[Treatment == ref.treat]) %>%
  filter(Treatment != ref.treat) %>%
  ungroup()
```

## Plot results
```{r}
species = c("Sc" = "_S. cerevisiae_", "Cg" = "_C. glabrata_")
treat = c("Pi" = "-Pi", "H2O2" = "H<sub>2</sub>O<sub>2</sub>")
ddCt %>% 
  mutate(Target = fct_recode(Target, TRX2 = "TRX2-p1"),
         Target = fct_relevel(Target, "CTA1", "CTT1", "TRX2", "PHO84"),
         Species = factor(Species, levels = names(species), labels = species),
         Treatment = factor(Treatment, levels = names(treat), labels = treat)
         ) %>%
  ggplot(aes(x = Target, y = -ddCt, group = Replicate)) +
  geom_bar(stat = "summary", fun = "mean", width = 0.45, 
           fill = "grey50", position = position_dodge(.5)) +
  geom_point(aes(color = Replicate), position = position_dodge(.5)) + 
  scale_color_manual(values = c("orange", "steelblue3")) +
  facet_grid(Treatment ~ Species, scales = "free_x") +
  theme_bw(base_size=14) +
  theme(
    strip.background = element_blank(),
    strip.text = element_markdown(size = rel(0.9))
  )
p + xlab("Time after starvation (min)") + ylab( "fold induction" ) + labs( title="Fold induction relative to ALG9 and pre-stress")

```
