This project, "Using CRISPRa to Investigate the Role of SCRIB Homolog
Upregulation in Yeast as a Model for Cancer Therapeutics", focuses on
upregulating the BEM1 gene in Saccharomyces cerevisiae using CRISPR
activation (CRISPRa). BEM1 is a homolog of the tumour suppressor gene
SCRIB, whose function is frequently disrupted in cancer progression. The
study employs a catalytically dead Cas9 (dCas9) fused to transcriptional
activators such as VP64 to enhance BEM1 expression, simulating
therapeutic strategies that restore tumour suppressor function. Guide
RNAs (gRNAs) targeting the BEM1 promoter were designed and validated
with CHOPCHOP and CRISPOR, selecting for high specificity (MIT and CFD
scores), balanced efficiency (Doench and Mor.-Mateos scores), and
minimal off-target risk.
# Guide RNA analysis and validation for CRISPRa upregulation of BEM1 in Saccharomyces cerevisiae (yeast).
# This analysis aims to identify the optimal guide RNA sequence among the top-ranked guides, focusing on specificity, efficiency, and off-target effects.
# Load necessary libraries
# Install ggcorrplot only if it is not already available
if (!requireNamespace("ggcorrplot", quietly = TRUE)) install.packages("ggcorrplot")
library(readr)
library(ggcorrplot)
## Loading required package: ggplot2
library(tidyr)
library(readxl) # For reading Excel files
library(dplyr) # For data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # For data visualisation
library(knitr) # For rendering tables in Markdown
library(kableExtra) # For enhanced table aesthetics
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(gridExtra) # For multi-plot layout
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# Load the data
gRNA_data <- read_csv("Guide_RNA_Analysis_BEM1.xlsx - Sheet1.csv")
## Rows: 3 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Target Sequence, Genomic Location, Strand, Off-Targets, Restrictio...
## dbl (13): Rank, GC Content (%), Self-Complementarity, MM0, MM1, MM2, MM3, Ef...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the data table
kable(gRNA_data) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 | MM1 | MM2 | MM3 | Efficiency | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CTAAACGGACAAATGGCGAAGGG | chrII:620692 | | 45 | 3 | 0 | 0 | 0 | 0 | 65.85 | 100 | 100 | 66 | 34 | 15 | 0 | None |
| 2 | TTCCTGTTCGTAAATGAATGGGG | chrII:620715 | | 35 | 0 | 0 | 0 | 0 | 0 | 60.31 | 100 | 100 | 60 | 18 | -60 | 2 (CPA2, COQ6) | TspDTI |
| 3 | CCTAAACGGACAAATGGCGAAGG | chrII:620691 | | 50 | 3 | 0 | 0 | 0 | 0 | 61.09 | 100 | 100 | 61 | 23 | 15 | 0 | None |
# Summary statistics of the dataset
summary(gRNA_data)
## Rank Target Sequence Genomic Location Strand
## Min. :1.0 Length:3 Length:3 Length:3
## 1st Qu.:1.5 Class :character Class :character Class :character
## Median :2.0 Mode :character Mode :character Mode :character
## Mean :2.0
## 3rd Qu.:2.5
## Max. :3.0
## GC Content (%) Self-Complementarity MM0 MM1 MM2
## Min. :35.00 Min. :0.0 Min. :0 Min. :0 Min. :0
## 1st Qu.:40.00 1st Qu.:1.5 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :45.00 Median :3.0 Median :0 Median :0 Median :0
## Mean :43.33 Mean :2.0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:47.50 3rd Qu.:3.0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :50.00 Max. :3.0 Max. :0 Max. :0 Max. :0
## MM3 Efficiency MIT Specificity Score CFD Specificity Score
## Min. :0 Min. :60.31 Min. :100 Min. :100
## 1st Qu.:0 1st Qu.:60.70 1st Qu.:100 1st Qu.:100
## Median :0 Median :61.09 Median :100 Median :100
## Mean :0 Mean :62.42 Mean :100 Mean :100
## 3rd Qu.:0 3rd Qu.:63.47 3rd Qu.:100 3rd Qu.:100
## Max. :0 Max. :65.85 Max. :100 Max. :100
## Doench '16 Efficiency Mor.-Mateos Efficiency Doench RuleSet3 Score
## Min. :60.00 Min. :18.0 Min. :-60.0
## 1st Qu.:60.50 1st Qu.:20.5 1st Qu.:-22.5
## Median :61.00 Median :23.0 Median : 15.0
## Mean :62.33 Mean :25.0 Mean :-10.0
## 3rd Qu.:63.50 3rd Qu.:28.5 3rd Qu.: 15.0
## Max. :66.00 Max. :34.0 Max. : 15.0
## Off-Targets Restriction Enzymes
## Length:3 Length:3
## Class :character Class :character
## Mode :character Mode :character
##
##
##
# Plot Doench '16, Mor.-Mateos, and RuleSet3 efficiency scores
p1 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Doench '16 Efficiency`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Doench '16 Efficiency", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()
p2 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Mor.-Mateos Efficiency`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Mor.-Mateos Efficiency", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()
p3 <- ggplot(gRNA_data, aes(x = factor(Rank), y = `Doench RuleSet3 Score`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Doench RuleSet3 Score", x = "Guide Rank", y = "Efficiency Score") +
  theme_minimal()
# Arrange the plots side by side
grid.arrange(p1, p2, p3, ncol = 3)
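The three near-identical plotting calls above could also be written once over long-format data. The following is a minimal sketch, not the original analysis: the `scores` data frame is an illustrative stand-in for `gRNA_data` built from the three-guide values in the table above, and the names `scores_long` and `p_scores` are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Illustrative stand-in for gRNA_data, using the scores from the table above
scores <- data.frame(
  Rank = 1:3,
  `Doench '16 Efficiency` = c(66, 60, 61),
  `Mor.-Mateos Efficiency` = c(34, 18, 23),
  `Doench RuleSet3 Score` = c(15, -60, 15),
  check.names = FALSE
)

# Reshape to one row per guide and metric
scores_long <- pivot_longer(
  scores,
  cols = -Rank,
  names_to = "Metric",
  values_to = "Score"
)

# One panel per metric, each with its own y-axis scale
p_scores <- ggplot(scores_long, aes(x = factor(Rank), y = Score, fill = factor(Rank))) +
  geom_col() +
  facet_wrap(~ Metric, scales = "free_y") +
  labs(x = "Guide Rank", y = "Score", fill = "Guide Rank") +
  theme_minimal()

print(p_scores)
```

Free y-axis scales are used here because the RuleSet3 score can be negative while the other two scores are not; a shared axis would compress the smaller panels.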

Figure 2 is a bar chart showing the GC content (%) of each guide RNA
(gRNA) ranked for targeting the BEM1 promoter in yeast. The x-axis
shows the Guide Rank (1, 2, and 3), which orders gRNAs by specificity,
efficiency, and off-target analysis. The y-axis shows the GC Content
(%), a factor that influences the stability and binding efficiency of
gRNAs. Rank 1 has a GC content of 45%, Rank 2 has the lowest at 35%, and
Rank 3 has the highest at 50%.
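The GC content values can be cross-checked directly from the target sequences. The sketch below assumes that each 23-nucleotide target is a 20-nucleotide protospacer followed by a 3-nucleotide NGG PAM, and that GC content is computed over the protospacer only; the helper name `gc_percent` is hypothetical.

```r
# Target sequences (protospacer + PAM) copied from the table above
targets <- c(
  "1" = "CTAAACGGACAAATGGCGAAGGG",
  "2" = "TTCCTGTTCGTAAATGAATGGGG",
  "3" = "CCTAAACGGACAAATGGCGAAGG"
)

# Split each 23-nt target into the assumed 20-nt protospacer and 3-nt PAM
protospacer <- substr(targets, 1, 20)
pam <- substr(targets, 21, 23)

# Hypothetical helper: percentage of G and C bases in a sequence
gc_percent <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

gc_values <- vapply(protospacer, gc_percent, numeric(1))

# TRUE where the last three bases match the NGG PAM pattern
has_ngg_pam <- grepl("^[ACGT]GG$", pam)

print(data.frame(Rank = names(targets), PAM = pam,
                 GC = gc_values, NGG = has_ngg_pam))
```

Under these assumptions the computed values are 45, 35, and 50 for Ranks 1, 2, and 3, which agrees with the GC Content (%) column, and all three targets end in an NGG PAM.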
# Plot self-complementarity
ggplot(gRNA_data, aes(x = factor(Rank), y = `Self-Complementarity`, fill = factor(Rank))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Self-Complementarity by Guide Rank", x = "Guide Rank", y = "Self-Complementarity") +
  theme_minimal()

Figure 4 is a heatmap that visualises the correlation matrix of
various metrics used to evaluate guide RNAs (gRNAs) for CRISPRa
targeting the BEM1 promoter in yeast. The x-axis and y-axis represent
the compared metrics, including MIT Specificity Score, CFD Specificity
Score, Doench ’16 Efficiency, Mor.-Mateos Efficiency, Doench RuleSet3
Score, and GC Content (%). The colour scale indicates the strength and
direction of the correlation, ranging from red (positive correlation) to
blue (negative correlation).
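A correlation heatmap of this kind could be produced roughly as follows. This is a sketch under stated assumptions, not the original code: the `metrics` data frame is an illustrative stand-in built from the three guides in the table above, and Pearson correlation is assumed. With these values both specificity scores are 100 for every guide, so their correlation with any other metric is undefined; the sketch drops such constant columns before calling `cor()`.

```r
# Illustrative stand-in for gRNA_data, using values from the table above
metrics <- data.frame(
  `MIT Specificity Score` = c(100, 100, 100),
  `CFD Specificity Score` = c(100, 100, 100),
  `Doench '16 Efficiency` = c(66, 60, 61),
  `Mor.-Mateos Efficiency` = c(34, 18, 23),
  `Doench RuleSet3 Score` = c(15, -60, 15),
  `GC Content (%)` = c(45, 35, 50),
  check.names = FALSE
)

# Drop zero-variance columns: their correlations are undefined (NA)
varies <- vapply(metrics, function(x) sd(x) > 0, logical(1))
cor_matrix <- cor(metrics[, varies, drop = FALSE], method = "pearson")
print(round(cor_matrix, 2))

# Draw the heatmap only if ggcorrplot is installed
if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  print(ggcorrplot::ggcorrplot(cor_matrix, lab = TRUE))
}
```

With only three guides, each coefficient rests on three points, so the values are best read as descriptive rather than as evidence of a general relationship between the metrics.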
# Filter the data to shortlist the best guide RNAs
# Criteria: MIT Specificity Score >= 98, CFD Specificity Score = 100, Doench '16 Efficiency > 60, and no off-targets
# Note: Off-Targets is read as a character column, so it is compared with the string "0"
best_guide <- gRNA_data %>%
  filter(`MIT Specificity Score` >= 98 &
           `CFD Specificity Score` == 100 &
           `Doench '16 Efficiency` > 60 &
           `Off-Targets` == "0")
# Display the shortlisted guides in a formatted table
kable(best_guide, col.names = c("Rank", "Target Sequence", "Genomic Location", "Strand", "GC Content (%)",
                                "Self-Complementarity", "MM0", "MM1", "MM2", "MM3",
                                "Efficiency", "MIT Specificity Score", "CFD Specificity Score",
                                "Doench '16 Efficiency", "Mor.-Mateos Efficiency",
                                "Doench RuleSet3 Score", "Off-Targets", "Restriction Enzymes")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 | MM1 | MM2 | MM3 | Efficiency | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CTAAACGGACAAATGGCGAAGGG | chrII:620692 | | 45 | 3 | 0 | 0 | 0 | 0 | 65.85 | 100 | 100 | 66 | 34 | 15 | 0 | None |
| 3 | CCTAAACGGACAAATGGCGAAGG | chrII:620691 | | 50 | 3 | 0 | 0 | 0 | 0 | 61.09 | 100 | 100 | 61 | 23 | 15 | 0 | None |
Table 2 lists the guide RNAs (gRNAs) that pass the filter for CRISPRa
experiments targeting the BEM1 promoter in yeast: Rank 1 and Rank 3. The
Rank column gives the priority of each gRNA, with Rank 1 being the most
favourable. Target Sequence lists the 23-nucleotide target site (a
20-nucleotide protospacer followed by the 3-nucleotide PAM), while
Genomic Location and Strand specify the position and strand targeted
within the yeast genome. GC Content (%) and Self-Complementarity are
metrics that affect the stability and binding efficiency of the gRNA.
The key efficiency score is Doench ’16 Efficiency (66 for Rank 1, 61 for
Rank 3), indicating moderate-to-high activation potential. MIT
Specificity Score and CFD Specificity Score are both 100 for these
guides, the maximum value, reflecting minimal predicted off-target risk.
Both guides have 0 off-targets. Restriction Enzymes lists any enzymes
whose recognition sites overlap the sequence; here, none do.
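Because the filter returns two guides, choosing a single one needs an explicit tie-break. The sketch below is one possible approach rather than the method used in the original analysis: it assumes the shortlist is ordered by Doench '16 Efficiency and then by Mor.-Mateos Efficiency, and the `guides`, `shortlist`, and `top_guide` names are hypothetical stand-ins built from the values in the tables above.

```r
library(dplyr)

# Illustrative stand-in for gRNA_data, using values from the table above
guides <- data.frame(
  Rank = 1:3,
  `MIT Specificity Score` = c(100, 100, 100),
  `CFD Specificity Score` = c(100, 100, 100),
  `Doench '16 Efficiency` = c(66, 60, 61),
  `Mor.-Mateos Efficiency` = c(34, 18, 23),
  `Off-Targets` = c("0", "2 (CPA2, COQ6)", "0"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same criteria as the filter above
shortlist <- guides %>%
  filter(
    `MIT Specificity Score` >= 98,
    `CFD Specificity Score` == 100,
    `Doench '16 Efficiency` > 60,
    `Off-Targets` == "0"
  )

# Assumed tie-break: highest Doench '16, then highest Mor.-Mateos
top_guide <- shortlist %>%
  arrange(desc(`Doench '16 Efficiency`), desc(`Mor.-Mateos Efficiency`)) %>%
  slice_head(n = 1)

print(top_guide)
```

With the values in the table, this ordering selects the Rank 1 guide, which leads on both efficiency scores. A different tie-break (for example, lower self-complementarity) would need to be justified separately, since Rank 1 and Rank 3 are equal on that metric.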