This project, Using CRISPRa to Investigate the Role of SCRIB Homolog Upregulation in Yeast as a Model for Cancer Therapeutics, focuses on upregulating the BEM1 gene in Saccharomyces cerevisiae using CRISPR activation (CRISPRa). BEM1 is a homolog of the tumour suppressor gene SCRIB, which regulates processes frequently disrupted in cancer progression. The study employs a catalytically dead Cas9 (dCas9) fused to transcriptional activators such as VP64 to enhance BEM1 expression, simulating therapeutic strategies that restore tumour suppressor function. Guide RNAs (gRNAs) targeting the BEM1 promoter were designed and validated using CHOPCHOP and CRISPOR, and were selected for high specificity (MIT and CFD scores), balanced efficiency (Doench and Mor.-Mateos scores), and minimal off-target risk.
# Guide RNA Analysis and Validation for CRISPRa Upregulation of BEM1 in Saccharomyces cerevisiae (yeast)

# This analysis aims to identify the optimal guide RNA sequence among the three top-ranked guides, focusing on specificity, efficiency, and off-target effects.

# Load necessary libraries
install.packages("ggcorrplot")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(readr)
library(ggcorrplot)
## Loading required package: ggplot2
library(tidyr)
library(readxl)      # For reading Excel files
library(dplyr)       # For data manipulation
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)     # For data visualisation
library(knitr)       # For rendering tables in Markdown
library(kableExtra)  # For enhanced table aesthetics
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(gridExtra)   # For multi-plot layout
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
# Load the data
gRNA_data <- read_csv("Guide_RNA_Analysis_BEM1.xlsx - Sheet1.csv")
## Rows: 3 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Target Sequence, Genomic Location, Strand, Off-Targets, Restrictio...
## dbl (13): Rank, GC Content (%), Self-Complementarity, MM0, MM1, MM2, MM3, Ef...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the data table
kable(gRNA_data) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 | MM1 | MM2 | MM3 | Efficiency | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CTAAACGGACAAATGGCGAAGGG | chrII:620692 |  | 45 | 3 | 0 | 0 | 0 | 0 | 65.85 | 100 | 100 | 66 | 34 | 15 | 0 | None |
| 2 | TTCCTGTTCGTAAATGAATGGGG | chrII:620715 |  | 35 | 0 | 0 | 0 | 0 | 0 | 60.31 | 100 | 100 | 60 | 18 | -60 | 2 (CPA2, COQ6) | TspDTI |
| 3 | CCTAAACGGACAAATGGCGAAGG | chrII:620691 |  | 50 | 3 | 0 | 0 | 0 | 0 | 61.09 | 100 | 100 | 61 | 23 | 15 | 0 | None |
# Summary statistics of the dataset
summary(gRNA_data)
##       Rank     Target Sequence    Genomic Location      Strand         
##  Min.   :1.0   Length:3           Length:3           Length:3          
##  1st Qu.:1.5   Class :character   Class :character   Class :character  
##  Median :2.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2.0                                                           
##  3rd Qu.:2.5                                                           
##  Max.   :3.0                                                           
##  GC Content (%)  Self-Complementarity      MM0         MM1         MM2   
##  Min.   :35.00   Min.   :0.0          Min.   :0   Min.   :0   Min.   :0  
##  1st Qu.:40.00   1st Qu.:1.5          1st Qu.:0   1st Qu.:0   1st Qu.:0  
##  Median :45.00   Median :3.0          Median :0   Median :0   Median :0  
##  Mean   :43.33   Mean   :2.0          Mean   :0   Mean   :0   Mean   :0  
##  3rd Qu.:47.50   3rd Qu.:3.0          3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
##  Max.   :50.00   Max.   :3.0          Max.   :0   Max.   :0   Max.   :0  
##       MM3      Efficiency    MIT Specificity Score CFD Specificity Score
##  Min.   :0   Min.   :60.31   Min.   :100           Min.   :100          
##  1st Qu.:0   1st Qu.:60.70   1st Qu.:100           1st Qu.:100          
##  Median :0   Median :61.09   Median :100           Median :100          
##  Mean   :0   Mean   :62.42   Mean   :100           Mean   :100          
##  3rd Qu.:0   3rd Qu.:63.47   3rd Qu.:100           3rd Qu.:100          
##  Max.   :0   Max.   :65.85   Max.   :100           Max.   :100          
##  Doench '16 Efficiency Mor.-Mateos Efficiency Doench RuleSet3 Score
##  Min.   :60.00         Min.   :18.0           Min.   :-60.0        
##  1st Qu.:60.50         1st Qu.:20.5           1st Qu.:-22.5        
##  Median :61.00         Median :23.0           Median : 15.0        
##  Mean   :62.33         Mean   :25.0           Mean   :-10.0        
##  3rd Qu.:63.50         3rd Qu.:28.5           3rd Qu.: 15.0        
##  Max.   :66.00         Max.   :34.0           Max.   : 15.0        
##  Off-Targets        Restriction Enzymes
##  Length:3           Length:3           
##  Class :character   Class :character   
##  Mode  :character   Mode  :character   
##                                        
##                                        
## 
# Plot Doench '16, Mor.-Mateos, and RuleSet3 efficiency scores
p1 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Doench '16 Efficiency`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Doench '16 Efficiency", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()

p2 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Mor.-Mateos Efficiency`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Mor.-Mateos Efficiency", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()

p3 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Doench RuleSet3 Score`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Doench RuleSet3 Score", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()

# Arrange the plots side by side
grid.arrange(p1, p2, p3, ncol = 3)

Figure 1 shows the efficiency scores for the three ranked guide RNAs (gRNAs) targeting the BEM1 promoter, as predicted by three algorithms: Doench ’16 Efficiency, Mor.-Mateos Efficiency, and Doench RuleSet3 Score. The x-axis gives the guide rank (1, 2, and 3), colour-coded red for Rank 1, green for Rank 2, and blue for Rank 3; the y-axis gives the score predicted by each algorithm. In the Doench ’16 Efficiency panel all three guides score 60 or above (66, 60, and 61 for Ranks 1, 2, and 3), with Rank 1 highest, suggesting the strongest predicted on-target activity. The Mor.-Mateos Efficiency panel shows lower scores overall (34, 18, and 23), again with Rank 1 ahead. The Doench RuleSet3 Score panel shows the greatest spread: Rank 2 scores -60, indicating potential inefficiency, while Ranks 1 and 3 both score 15.
# Display off-target information
off_target_summary <- gRNA_data %>%
  select(Rank, `Off-Targets`)

kable(off_target_summary, col.names = c("Guide Rank", "Off-Target Summary")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Guide Rank | Off-Target Summary |
|---|---|
| 1 | 0 |
| 2 | 2 (CPA2, COQ6) |
| 3 | 0 |
Table 1 summarises the off-target information for the three ranked guide RNAs (gRNAs) targeting the BEM1 promoter in yeast. The first column, Guide Rank, gives the rank of each gRNA (1, 2, or 3) based on specificity and efficiency metrics. The second column, Off-Target Summary, gives the number of predicted off-target sites and, where present, the genes affected. The guides ranked 1 and 3 have no off-targets, making them the preferred candidates for CRISPRa experiments with minimal unintended dCas9 binding. In contrast, the guide ranked 2 has two off-targets, in exonic regions of the CPA2 and COQ6 genes, which may introduce unintended effects.
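Because the Off-Targets column mixes a count with gene names (for example "2 (CPA2, COQ6)"), it is read as text. One way to make it usable in numeric comparisons is to extract the leading integer. The sketch below is illustrative only: the vector is typed in from Table 1 rather than read from the CSV, and `off_target_count` is a hypothetical name that is not part of the original analysis.

```r
# Off-target entries as they appear in Table 1 (typed in by hand for illustration)
off_targets <- c("0", "2 (CPA2, COQ6)", "0")

# Keep only the leading run of digits and convert it to an integer.
# `off_target_count` is a hypothetical name, not from the original script.
off_target_count <- as.integer(sub("^([0-9]+).*$", "\\1", off_targets))

# Positions (guide ranks here) with no reported off-targets
clean_guides <- which(off_target_count == 0)
print(clean_guides)
```

A numeric column of this kind allows a comparison such as `off_target_count == 0` instead of a string match on "0", which would silently fail if the text were formatted differently.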
# Plot GC content
ggplot(gRNA_data, aes(x = factor(Rank), y = `GC Content (%)`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "GC Content by Guide Rank", x = "Guide Rank", y = "GC Content (%)") +
  theme_minimal()

Figure 2 is a bar chart of the GC content (%) of each ranked guide RNA (gRNA) targeting the BEM1 promoter in yeast. The x-axis gives the Guide Rank (1, 2, and 3), assigned on the basis of specificity, efficiency, and off-target analysis. The y-axis gives the GC Content (%), a critical factor influencing the stability and binding efficiency of gRNAs. Rank 1 has a GC content of 45%, Rank 2 has the lowest at 35%, and Rank 3 has the highest at 50%.
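The GC values plotted here can be cross-checked directly from the target sequences. The sketch below assumes that GC content is computed over the 20-nucleotide protospacer, excluding the 3-nucleotide PAM at the 3' end. That assumption is not stated in the source, although it does reproduce the tabulated 45%, 35%, and 50%. `gc_percent` is a hypothetical helper.

```r
# Target sequences from the data table (20-nt protospacer + 3-nt PAM)
guides <- c(
  rank1 = "CTAAACGGACAAATGGCGAAGGG",
  rank2 = "TTCCTGTTCGTAAATGAATGGGG",
  rank3 = "CCTAAACGGACAAATGGCGAAGG"
)

# Assumption: GC content is reported for the first 20 nt only (PAM excluded)
protospacers <- substr(guides, 1, 20)

# Hypothetical helper: percentage of G or C bases in one sequence
gc_percent <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  100 * mean(bases %in% c("G", "C"))
}

gc <- vapply(protospacers, gc_percent, numeric(1))
print(round(gc))
```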
# Plot self-complementarity
ggplot(gRNA_data, aes(x = factor(Rank), y = `Self-Complementarity`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Self-Complementarity by Guide Rank", x = "Guide Rank", y = "Self-Complementarity") +
  theme_minimal()

Figure 3 is a bar chart of the Self-Complementarity scores of the ranked guide RNAs (gRNAs) targeting the BEM1 promoter in yeast. The x-axis gives the Guide Rank (1, 2, and 3), assigned on the basis of specificity, efficiency, and off-target analysis. The y-axis gives the Self-Complementarity, a measure of the tendency of a gRNA to form secondary structures, such as hairpins, through intramolecular base pairing. Ranks 1 and 3 each have a self-complementarity score of 3, indicating a moderate likelihood of forming such structures, while Rank 2 has a score of 0, indicating no predicted secondary structure.
# Calculate correlation matrix
cor_matrix <- cor(gRNA_data %>% select(`MIT Specificity Score`, `CFD Specificity Score`, `Doench '16 Efficiency`, `Mor.-Mateos Efficiency`, `Doench RuleSet3 Score`, `GC Content (%)`))
## Warning in cor(gRNA_data %>% select(`MIT Specificity Score`, `CFD Specificity
## Score`, : the standard deviation is zero
# Plot correlation matrix
ggcorrplot(cor_matrix, lab = TRUE, title = "Correlation Matrix of Guide RNA Metrics")

Figure 4 is a heatmap of the correlation matrix of the metrics used to evaluate guide RNAs (gRNAs) for CRISPRa targeting of the BEM1 promoter in yeast. The x-axis and y-axis list the compared metrics: MIT Specificity Score, CFD Specificity Score, Doench ’16 Efficiency, Mor.-Mateos Efficiency, Doench RuleSet3 Score, and GC Content (%). The colour scale indicates the strength and direction of each correlation, from red (positive) to blue (negative). Because the MIT and CFD specificity scores are 100 for all three guides, their standard deviation is zero and their correlations with the other metrics are undefined (NA), which is what the warning from cor() reports. With only three guides, the remaining coefficients should be read as descriptive only.
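One way to avoid undefined entries in the matrix is to drop constant columns before calling cor(). The following is a minimal sketch under stated assumptions, not the approach used in the original analysis: the values are typed in from the data table, the short column names are hypothetical stand-ins for the CSV headers, and only base R is used.

```r
# Metric values copied from the data table; column names are simplified stand-ins
metrics <- data.frame(
  mit        = c(100, 100, 100),
  cfd        = c(100, 100, 100),
  doench16   = c(66, 60, 61),
  mor_mateos = c(34, 18, 23),
  ruleset3   = c(15, -60, 15),
  gc         = c(45, 35, 50)
)

# Flag columns whose values actually vary across guides
has_variance <- vapply(metrics, function(col) sd(col) > 0, logical(1))

# Correlate only the varying columns, so no coefficient is NA
cor_matrix <- cor(metrics[, has_variance, drop = FALSE])
print(round(cor_matrix, 2))
```

The dropped columns (here the two specificity scores) carry no information for a correlation, so removing them changes nothing about the remaining coefficients.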
# Filter the data to shortlist the best guide RNA candidates
# Criteria: MIT Specificity Score >= 98, CFD Specificity Score == 100, Doench '16 Efficiency > 60, and no off-targets
best_guide <- gRNA_data %>%
  filter(`MIT Specificity Score` >= 98 & 
         `CFD Specificity Score` == 100 & 
         `Doench '16 Efficiency` > 60 & 
         `Off-Targets` == "0")

# Display the best guide in a formatted table
kable(best_guide, col.names = c("Rank", "Target Sequence", "Genomic Location", "Strand", "GC Content (%)",
                                "Self-Complementarity", "MM0", "MM1", "MM2", "MM3", 
                                "Efficiency", "MIT Specificity Score", "CFD Specificity Score", 
                                "Doench '16 Efficiency", "Mor.-Mateos Efficiency", 
                                "Doench RuleSet3 Score", "Off-Targets", "Restriction Enzymes")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 | MM1 | MM2 | MM3 | Efficiency | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CTAAACGGACAAATGGCGAAGGG | chrII:620692 |  | 45 | 3 | 0 | 0 | 0 | 0 | 65.85 | 100 | 100 | 66 | 34 | 15 | 0 | None |
| 3 | CCTAAACGGACAAATGGCGAAGG | chrII:620691 |  | 50 | 3 | 0 | 0 | 0 | 0 | 61.09 | 100 | 100 | 61 | 23 | 15 | 0 | None |
Table 2 lists the guide RNAs (gRNAs) that pass the filter for CRISPRa experiments targeting the BEM1 promoter in yeast: Rank 1 and Rank 3. The Rank column gives the priority of each gRNA, with Rank 1 being the most optimal. Target Sequence lists the 23-nucleotide target (the 20-nucleotide protospacer followed by the 3-nucleotide PAM), while Genomic Location and Strand specify the position and strand targeted within the yeast genome. GC Content (%) and Self-Complementarity affect the stability and binding efficiency of the gRNA. The Doench ’16 Efficiency scores (66 for Rank 1, 61 for Rank 3) indicate moderate-to-high predicted activity. The MIT and CFD Specificity Scores are 100 for both guides, the maximum value, reflecting minimal predicted off-target risk. Both guides have 0 off-targets. Restriction Enzymes lists any enzymes whose recognition sites overlap the sequence; none were found for either guide.
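The filter returns two guides, so choosing a single one needs a tie-break. The sketch below shows one possible rule, which is an assumption rather than the rule used in the source: sort by Doench '16 efficiency and then by Mor.-Mateos efficiency, both descending. The values are copied from Table 2, and the short column names are hypothetical.

```r
# The two guides that pass the filter, with values copied from Table 2
passing <- data.frame(
  rank       = c(1, 3),
  doench16   = c(66, 61),
  mor_mateos = c(34, 23)
)

# Assumed tie-break: higher Doench '16 first, then higher Mor.-Mateos
ordered <- passing[order(-passing$doench16, -passing$mor_mateos), ]

# The first row after sorting is the preferred guide under this rule
top_guide <- ordered[1, ]
print(top_guide)
```

Under this rule the preferred guide is Rank 1, which agrees with the ranking already given in the table.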