This project focuses on the CRISPR-Cas9 mediated knockout of SRG1, a
regulatory RNA in Saccharomyces cerevisiae, to study its role in
chromatin regulation and gene expression. SRG1 is particularly relevant
because it regulates the SER3 gene by modulating chromatin structure,
providing a strong model for understanding the behaviour of long
non-coding RNAs (lncRNAs) in human cancer biology. To achieve this,
three guide RNAs (gRNAs) targeting SRG1 were designed and evaluated
based on specificity and efficiency using tools like CRISPOR and
ChopChop. The guides ranked 1, 2 and 3 were compared on the basis of
their high Doench ’16 efficiency scores and minimal off-targets. The
knockout experiments are designed to investigate the
impact of SRG1 loss on chromatin structure, transcriptional repression,
and stress response pathways in yeast, mimicking cancer-like
dysregulation.
# Guide RNA analysis and validation for CRISPR knockout of SRG1 in Saccharomyces cerevisiae (yeast).
# This analysis aims to identify the optimal guide RNA sequence among the top three ranked guides, focusing on specificity, efficiency, and off-target effects.
# Load necessary libraries
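# Install ggcorrplot only if it is not already available
if (!requireNamespace("ggcorrplot", quietly = TRUE)) install.packages("ggcorrplot")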
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(readr)
library(ggcorrplot)
## Loading required package: ggplot2
library(tidyr)
library(readxl) # For reading Excel files
library(dplyr) # For data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # For data visualisation
library(knitr) # For rendering tables in Markdown
library(kableExtra) # For enhanced table aesthetics
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(gridExtra) # For multi-plot layout
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# Load the data
RNA_Seq <- read_csv("Project_4_Guide_RNA_Analysis_SRG1_in_yeast.xlsx - Sheet1.csv")
## Rows: 3 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Target Sequence, Genomic Location, Strand, Off-Targets, Restrictio...
## dbl (12): Rank, GC Content (%), Self-Complementarity, MM0 (Exact Match), MM1...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the data table
kable(RNA_Seq) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 (Exact Match) | MM1 | MM2 | MM3 | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CAACAAGCTATGAATATGAGCGG | chrV:322725-322747 | | 35 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 69 | 35 | 39 | 0 | None |
| 2 | ACTCACAAATGGAATTCAAGGGG | chrV:322335-322357 | | 35 | 0 | 0 | 0 | 0 | 0 | 98 | 100 | 62 | 28 | 26 | 2 (HGH1, TPK3) | AgsI |
| 3 | CCCGTGCAGGGTTTTCTGAGCGG | chrV:322467-322489 | | 60 | 0 | 0 | 0 | 0 | 0 | 99 | 100 | 62 | 50 | -91 | 2 (intergenic: HSP82-YAR1, YJU3-MBR1) | BstDEI, Hpy188I |
# Summary statistics of the dataset
summary(RNA_Seq)
## Rank Target Sequence Genomic Location Strand
## Min. :1.0 Length:3 Length:3 Length:3
## 1st Qu.:1.5 Class :character Class :character Class :character
## Median :2.0 Mode :character Mode :character Mode :character
## Mean :2.0
## 3rd Qu.:2.5
## Max. :3.0
## GC Content (%) Self-Complementarity MM0 (Exact Match) MM1 MM2
## Min. :35.00 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:35.00 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :35.00 Median :0 Median :0 Median :0 Median :0
## Mean :43.33 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:47.50 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :60.00 Max. :0 Max. :0 Max. :0 Max. :0
## MM3 MIT Specificity Score CFD Specificity Score Doench '16 Efficiency
## Min. :0 Min. : 98.0 Min. :100 Min. :62.00
## 1st Qu.:0 1st Qu.: 98.5 1st Qu.:100 1st Qu.:62.00
## Median :0 Median : 99.0 Median :100 Median :62.00
## Mean :0 Mean : 99.0 Mean :100 Mean :64.33
## 3rd Qu.:0 3rd Qu.: 99.5 3rd Qu.:100 3rd Qu.:65.50
## Max. :0 Max. :100.0 Max. :100 Max. :69.00
## Mor.-Mateos Efficiency Doench RuleSet3 Score Off-Targets
## Min. :28.00 Min. :-91.000 Length:3
## 1st Qu.:31.50 1st Qu.:-32.500 Class :character
## Median :35.00 Median : 26.000 Mode :character
## Mean :37.67 Mean : -8.667
## 3rd Qu.:42.50 3rd Qu.: 32.500
## Max. :50.00 Max. : 39.000
## Restriction Enzymes
## Length:3
## Class :character
## Mode :character
##
##
##
# Visualising the MIT Specificity Score for each guide rank
ggplot(RNA_Seq, aes(x = factor(Rank), y = `MIT Specificity Score`, fill = factor(Rank))) +
  geom_bar(stat = "identity") +
  labs(title = "MIT Specificity Score by Guide Rank", x = "Guide Rank", y = "MIT Specificity Score") +
  theme_minimal() +
  theme(legend.position = "none")

Table 1 summarises the off-target analysis for guide RNAs (gRNAs)
ranked 1 to 3, targeting the SRG1 gene in yeast. Rank 1 demonstrates
perfect specificity with zero off-targets, making it the most precise
choice for CRISPR-Cas9-mediated knockout experiments. Rank 2, while
highly specific, has two off-targets located in exonic regions of the
HGH1 and TPK3 genes, which may pose a risk of unintended gene
disruptions. Rank 3 also shows two off-targets located in intergenic
regions between HSP82-YAR1 and YJU3-MBR1.
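The off-target comparison above can be reproduced with a small sketch in base R. The `Off-Targets` annotations are copied from the guide table. The column names `Annotation`, `Count`, `Intergenic` and `Risk` are hypothetical labels introduced here for illustration, and the rule that an annotation containing "intergenic" marks a non-genic site is an assumption about how the annotations were written.

```r
# Off-target annotations copied from the guide table (one entry per rank).
off_targets <- data.frame(
  Rank = 1:3,
  Annotation = c("0",
                 "2 (HGH1, TPK3)",
                 "2 (intergenic: HSP82-YAR1, YJU3-MBR1)"),
  stringsAsFactors = FALSE
)

# The leading integer of each annotation is the number of predicted off-target sites.
off_targets$Count <- as.integer(sub("^([0-9]+).*$", "\\1", off_targets$Annotation))

# Assumption: annotations mentioning "intergenic" describe sites between genes.
off_targets$Intergenic <- grepl("intergenic", off_targets$Annotation, fixed = TRUE)

# Hypothetical risk label: no sites, sites between genes, or sites within genes.
off_targets$Risk <- ifelse(off_targets$Count == 0, "none",
                           ifelse(off_targets$Intergenic, "intergenic", "genic"))

print(off_targets[, c("Rank", "Count", "Risk")])
```

Under these assumptions Rank 1 is labelled "none", Rank 2 "genic" and Rank 3 "intergenic", which mirrors the ordering of risk described in the paragraph.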
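# Create a correlation matrix for specificity and efficiency scores
# Note: CFD Specificity Score is 100 for every guide, so its standard deviation
# is zero; cor() returns NA for that column and emits the warning below.
cor_matrix <- cor(RNA_Seq %>%
  select(`MIT Specificity Score`, `CFD Specificity Score`, `Doench '16 Efficiency`, `Mor.-Mateos Efficiency`, `Doench RuleSet3 Score`))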
## Warning in cor(RNA_Seq %>% select(`MIT Specificity Score`, `CFD Specificity
## Score`, : the standard deviation is zero
# Visualize the correlation matrix
ggcorrplot(cor_matrix, method = "circle", type = "lower", lab = TRUE, lab_size = 3, title = "Correlation Matrix of Scores")
Figure 4: The correlation matrix visualises the relationships between
the scoring metrics used to evaluate guide RNAs (gRNAs) targeting the
SRG1 gene in yeast. The x-axis and y-axis represent the scores:
Doench ’16 Efficiency, Mor.-Mateos Efficiency, MIT Specificity Score,
and Doench RuleSet3 Score, with their pairwise correlations indicated in
the cells. The CFD Specificity Score is 100 for all three guides, so its
standard deviation is zero and its correlations are undefined. The
colour gradient signifies the strength and direction of the correlation,
ranging from blue for strong negative correlations (-1.0) to red for
strong positive correlations (+1.0). Key observations include a strong
positive correlation (0.87) between the MIT Specificity Score and
Doench ’16 Efficiency, suggesting that guides with high specificity also
tend to have higher efficiency, although with only three guides this
trend is indicative rather than conclusive.
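The 0.87 value can be checked with a minimal base R sketch. The scores are copied from the guide table. The short column names (`MIT`, `CFD`, `Doench16`, `MorMateos`, `RuleSet3`) are hypothetical stand-ins for the full column headers, and dropping zero-variance columns before calling `cor()` is one possible way to avoid the "standard deviation is zero" warning, not necessarily the approach used for the figure.

```r
# Scores copied from the guide table; short column names are illustrative only.
scores <- data.frame(
  MIT       = c(100, 98, 99),
  CFD       = c(100, 100, 100),
  Doench16  = c(69, 62, 62),
  MorMateos = c(35, 28, 50),
  RuleSet3  = c(39, 26, -91)
)

# Keep only columns that vary; a constant column has no defined correlation.
has_variance <- vapply(scores, function(x) sd(x) > 0, logical(1))
cor_matrix <- cor(scores[, has_variance])

# Pearson correlation between MIT specificity and Doench '16 efficiency.
round(cor_matrix["MIT", "Doench16"], 2)
# → 0.87
```

With only three observations per metric, each coefficient rests on very few data points, so the matrix is better read as a descriptive summary than as evidence of a general relationship.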
# Identify the best guide based on specificity and efficiency
best_guide <- RNA_Seq %>%
  filter(
    `MIT Specificity Score` >= 98,
    `CFD Specificity Score` == 100,
    `Doench '16 Efficiency` > 60,
    `Off-Targets` == "0"
  )
# Display the best guide in a formatted table
kable(best_guide, col.names = c("Rank", "Target Sequence", "Genomic Location", "Strand", "GC Content (%)",
"Self-Complementarity", "MM0", "MM1", "MM2", "MM3",
"MIT Specificity Score", "CFD Specificity Score",
"Doench '16 Efficiency", "Mor.-Mateos Efficiency",
"Doench RuleSet3 Score", "Off-Targets", "Restriction Enzymes")) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Rank | Target Sequence | Genomic Location | Strand | GC Content (%) | Self-Complementarity | MM0 | MM1 | MM2 | MM3 | MIT Specificity Score | CFD Specificity Score | Doench ’16 Efficiency | Mor.-Mateos Efficiency | Doench RuleSet3 Score | Off-Targets | Restriction Enzymes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CAACAAGCTATGAATATGAGCGG | chrV:322725-322747 | | 35 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 69 | 35 | 39 | 0 | None |
# Conclusion: Based on the analysis, the guide RNA with Rank 1 is the optimal choice due to its high specificity, strong efficiency, and absence of off-target effects.
Table 2 summarises the optimal guide RNA for targeting the SRG1 gene
in yeast. Rank 1 is the most suitable choice because it performs well
across multiple evaluation metrics. Its target sequence,
CAACAAGCTATGAATATGAGCGG, is located on chromosome V at position
322725–322747 on the forward strand, with a GC content of 35%. The guide
shows no self-complementarity and no predicted off-target sites with 0
to 3 mismatches (MM0–MM3), minimising unintended interactions.
Importantly, it achieves the maximum MIT Specificity Score (100) and CFD
Specificity Score (100), indicating high predicted specificity in the
yeast genome. The guide’s Doench ’16 Efficiency Score of 69 is the
highest of the three guides, alongside a moderate Mor.-Mateos Efficiency
Score of 35 and a Doench RuleSet3 Score of 39, together indicating good
potential for effective gene knockout. Furthermore, this guide has zero
off-targets and no associated restriction enzyme conflicts.
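
The sequence-level properties quoted above can be sanity-checked directly from the target sequences. The sketch below assumes that each 23-nt target consists of a 20-nt protospacer followed by a 3-nt PAM, and that GC content is computed over the protospacer only. Under that assumption the results agree with the GC Content (%) column of the guide table. The helper `gc_percent` and the variable names are hypothetical and introduced here for illustration.

```r
# Target sequences copied from the guide table (ranks 1 to 3).
guides <- c(
  "CAACAAGCTATGAATATGAGCGG",
  "ACTCACAAATGGAATTCAAGGGG",
  "CCCGTGCAGGGTTTTCTGAGCGG"
)

# Assumption: first 20 nt are the protospacer, last 3 nt are the PAM.
protospacer <- substr(guides, 1, 20)
pam <- substr(guides, 21, 23)

# Percentage of G and C bases in a single sequence.
gc_percent <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

gc <- vapply(protospacer, gc_percent, numeric(1), USE.NAMES = FALSE)

# SpCas9 recognises an NGG PAM: any base followed by two guanines.
has_ngg_pam <- grepl("^[ACGT]GG$", pam)

print(data.frame(Rank = 1:3, PAM = pam, GC = gc, NGG = has_ngg_pam))
```

All three targets end in an NGG PAM, and the computed GC values are 35, 35 and 60, matching the table for ranks 1, 2 and 3.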