Guide_RNA_Analysis_SRG1_in

This project focuses on the CRISPR-Cas9-mediated knockout of SRG1, a regulatory RNA in Saccharomyces cerevisiae, to study its role in chromatin regulation and gene expression. SRG1 is particularly relevant because it regulates the SER3 gene by modulating chromatin structure, providing a strong model for understanding the behaviour of long non-coding RNAs (lncRNAs) in human cancer biology. To achieve this, three guide RNAs (gRNAs) targeting SRG1 were designed and evaluated for specificity and efficiency using tools such as CRISPOR and ChopChop. The three top-ranked guides (Rank 1, Rank 2 and Rank 3) were compared on their Doench ’16 Efficiency Scores (62 to 69) and their small number of predicted off-targets (0 to 2). The knockout experiments are designed to investigate the impact of SRG1 loss on chromatin structure, transcriptional repression, and stress response pathways in yeast, mimicking cancer-like dysregulation.

# Guide RNA analysis and validation for the CRISPR knockout of SRG1 in Saccharomyces cerevisiae (yeast).

# This analysis aims to identify the optimal guide RNA sequence among the top three ranked guides, focusing on specificity, efficiency, and off-target effects.

# Load necessary libraries
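# Install ggcorrplot only if it is not already available
if (!requireNamespace("ggcorrplot", quietly = TRUE)) install.packages("ggcorrplot")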

## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

library(readr)
library(ggcorrplot)

## Loading required package: ggplot2

library(tidyr)
library(readxl)      # For reading Excel files
library(dplyr)       # For data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)     # For data visualisation
library(knitr)       # For rendering tables in Markdown
library(kableExtra)  # For enhanced table aesthetics

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(gridExtra)   # For multi-plot layout

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

# Load the data
RNA_Seq <- read_csv("Project_4_Guide_RNA_Analysis_SRG1_in_yeast.xlsx - Sheet1.csv")

## Rows: 3 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Target Sequence, Genomic Location, Strand, Off-Targets, Restrictio...
## dbl (12): Rank, GC Content (%), Self-Complementarity, MM0 (Exact Match), MM1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Display the data table
kable(RNA_Seq) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Rank	Target Sequence	Genomic Location	GC Content (%)	MIT Specificity Score	CFD Specificity Score	Doench ’16 Efficiency	Mor.-Mateos Efficiency	Doench RuleSet3 Score	Off-Targets	Restriction Enzymes
1	CAACAAGCTATGAATATGAGCGG	chrV:322725-322747	35	100	100	69	35	39	0	None
2	ACTCACAAATGGAATTCAAGGGG	chrV:322335-322357	35	98	100	62	28	26	2 (HGH1, TPK3)	AgsI
3	CCCGTGCAGGGTTTTCTGAGCGG	chrV:322467-322489	60	99	100	62	50	-91	2 (intergenic: HSP82-YAR1, YJU3-MBR1)	BstDEI, Hpy188I

# Summary statistics of the dataset
summary(RNA_Seq)

##       Rank     Target Sequence    Genomic Location      Strand         
##  Min.   :1.0   Length:3           Length:3           Length:3          
##  1st Qu.:1.5   Class :character   Class :character   Class :character  
##  Median :2.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2.0                                                           
##  3rd Qu.:2.5                                                           
##  Max.   :3.0                                                           
##  GC Content (%)  Self-Complementarity MM0 (Exact Match)      MM1         MM2   
##  Min.   :35.00   Min.   :0            Min.   :0         Min.   :0   Min.   :0  
##  1st Qu.:35.00   1st Qu.:0            1st Qu.:0         1st Qu.:0   1st Qu.:0  
##  Median :35.00   Median :0            Median :0         Median :0   Median :0  
##  Mean   :43.33   Mean   :0            Mean   :0         Mean   :0   Mean   :0  
##  3rd Qu.:47.50   3rd Qu.:0            3rd Qu.:0         3rd Qu.:0   3rd Qu.:0  
##  Max.   :60.00   Max.   :0            Max.   :0         Max.   :0   Max.   :0  
##       MM3    MIT Specificity Score CFD Specificity Score Doench '16 Efficiency
##  Min.   :0   Min.   : 98.0         Min.   :100           Min.   :62.00        
##  1st Qu.:0   1st Qu.: 98.5         1st Qu.:100           1st Qu.:62.00        
##  Median :0   Median : 99.0         Median :100           Median :62.00        
##  Mean   :0   Mean   : 99.0         Mean   :100           Mean   :64.33        
##  3rd Qu.:0   3rd Qu.: 99.5         3rd Qu.:100           3rd Qu.:65.50        
##  Max.   :0   Max.   :100.0         Max.   :100           Max.   :69.00        
##  Mor.-Mateos Efficiency Doench RuleSet3 Score Off-Targets       
##  Min.   :28.00          Min.   :-91.000       Length:3          
##  1st Qu.:31.50          1st Qu.:-32.500       Class :character  
##  Median :35.00          Median : 26.000       Mode  :character  
##  Mean   :37.67          Mean   : -8.667                         
##  3rd Qu.:42.50          3rd Qu.: 32.500                         
##  Max.   :50.00          Max.   : 39.000                         
##  Restriction Enzymes
##  Length:3           
##  Class :character   
##  Mode  :character   
##                     
##                     
##
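The GC content column can be cross-checked directly from the target sequences. The sketch below assumes that GC content is computed over the 20-nt protospacer only, excluding the 3-nt PAM at the 3' end; under that assumption it reproduces the tabulated values of 35, 35 and 60. The variable names are illustrative and not part of the original analysis.

```r
# Target sequences (20-nt protospacer + 3-nt PAM), copied from the data table
guides <- c(
  "CAACAAGCTATGAATATGAGCGG",
  "ACTCACAAATGGAATTCAAGGGG",
  "CCCGTGCAGGGTTTTCTGAGCGG"
)

# Assumption: GC content is measured on the first 20 nt (PAM excluded)
protospacer <- substr(guides, 1, 20)
pam         <- substr(guides, 21, 23)

# Count G and C by deleting every other character and measuring what is left
gc_count   <- nchar(gsub("[^GC]", "", protospacer))
gc_percent <- 100 * gc_count / nchar(protospacer)

# Each PAM should fit the NGG pattern expected for SpCas9
pam_is_ngg <- grepl("^[ACGT]GG$", pam)

data.frame(rank = 1:3, protospacer, pam, gc_percent, pam_is_ngg)
```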

# Visualising MIT and CFD Specificity Scores
ggplot(RNA_Seq, aes(x = factor(Rank), y = `MIT Specificity Score`, fill = factor(Rank))) +
  geom_bar(stat = "identity") +
  labs(title = "MIT Specificity Score by Guide Rank", x = "Guide Rank", y = "MIT Specificity Score") +
  theme_minimal() +
  theme(legend.position = "none")

Figure 1 is a bar chart of the MIT Specificity Scores of guide RNAs (gRNAs) ranked 1 to 3, measuring their precision for targeting the SRG1 gene in yeast during CRISPR-Cas9 experiments. The x-axis represents the guide rank, from 1 to 3, based on suitability for gene targeting. The y-axis shows the MIT Specificity Score, which ranges from 0 to 100; a higher score indicates greater specificity and a lower likelihood of off-target effects. Guide Rank 1 achieves the maximum score of 100, while Ranks 2 and 3 score 98 and 99 respectively, confirming that all three guides are highly specific.

ggplot(RNA_Seq, aes(x = factor(Rank), y = `CFD Specificity Score`, fill = factor(Rank))) +
  geom_bar(stat = "identity") +
  labs(title = "CFD Specificity Score by Guide Rank", x = "Guide Rank", y = "CFD Specificity Score") +
  theme_minimal() +
  theme(legend.position = "none")

Figure 2 is a bar chart of the CFD Specificity Scores for guide RNAs (gRNAs) ranked 1 to 3, evaluated for their precision in targeting the SRG1 gene in yeast for CRISPR-Cas9 experiments. The x-axis represents the guide rank, from 1 to 3, while the y-axis shows the CFD (Cutting Frequency Determination) Specificity Score, scaled from 0 to 100. The CFD score quantifies the likelihood of accurate binding to the intended target sequence while minimising off-target binding events. All three guides achieve the maximum CFD score of 100, so this metric does not discriminate between them.

# Efficiency scores comparison using a scatter plot
ggplot(RNA_Seq, aes(x = factor(Rank))) +
  geom_point(aes(y = `Doench '16 Efficiency`, color = "Doench '16")) +
  geom_point(aes(y = `Mor.-Mateos Efficiency`, color = "Mor.-Mateos")) +
  geom_point(aes(y = `Doench RuleSet3 Score`, color = "RuleSet3")) +
  labs(title = "Efficiency Scores by Guide Rank", x = "Guide Rank", y = "Efficiency Score") +
  scale_color_manual(values = c("Doench '16" = "blue", "Mor.-Mateos" = "green", "RuleSet3" = "red")) +
  theme_minimal()

Figure 3 is a scatter plot of the Efficiency Scores for guide RNAs (gRNAs) ranked 1 to 3. The x-axis displays the guide rank, while the y-axis shows three predicted efficiency metrics: Doench ’16 (blue), Mor.-Mateos (green), and RuleSet3 (red). The Doench ’16 Efficiency Score predicts gRNA binding and cleavage success, with higher values signifying more potent activity. The Mor.-Mateos Efficiency Score is a separate efficiency prediction, and for the RuleSet3 score positive values indicate higher predicted efficiency. Guide Rank 1 has the highest Doench ’16 (69) and RuleSet3 (39) scores, whereas Rank 3 has the highest Mor.-Mateos score (50) but a strongly negative RuleSet3 score (-91). Taken together, Rank 1 shows the most consistent targeting potential.
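A quick way to see which guide leads on each efficiency metric is to look up the rank with the maximum score per column. This is a minimal base-R sketch using the values from the data table; the data frame and its short column names are illustrative stand-ins for the corresponding `RNA_Seq` columns.

```r
# Efficiency scores copied from the data table (column names shortened here)
scores <- data.frame(
  rank       = 1:3,
  doench16   = c(69, 62, 62),
  mor_mateos = c(35, 28, 50),
  ruleset3   = c(39, 26, -91)
)

metrics <- c("doench16", "mor_mateos", "ruleset3")

# For each metric, report the rank of the guide with the highest score
top_rank <- sapply(scores[metrics], function(x) scores$rank[which.max(x)])
top_rank
```

The metrics do not agree on a single winner: Rank 1 leads on Doench ’16 and RuleSet3, while Rank 3 leads on Mor.-Mateos.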

# Table summarizing off-targets for each guide
off_target_summary <- RNA_Seq %>%
  select(Rank, `Off-Targets`)

kable(off_target_summary, col.names = c("Rank", "Off-Target Summary")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Rank	Off-Target Summary
1	0
2	2 (HGH1, TPK3)
3	2 (intergenic: HSP82-YAR1, YJU3-MBR1)

Table 1 summarises the off-target analysis for guide RNAs (gRNAs) ranked 1 to 3, targeting the SRG1 gene in yeast. Rank 1 demonstrates perfect specificity with zero off-targets, making it the most precise choice for CRISPR-Cas9-mediated knockout experiments. Rank 2, while highly specific, has two off-targets located in exonic regions of the HGH1 and TPK3 genes, which may pose a risk of unintended gene disruptions. Rank 3 also shows two off-targets located in intergenic regions between HSP82-YAR1 and YJU3-MBR1.
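Because the Off-Targets column is stored as text (for example "2 (HGH1, TPK3)"), it can be useful to split it into a numeric count and a free-text annotation before filtering or plotting. The following is a sketch under the assumption that every entry starts with the count, optionally followed by details in parentheses; `n_off` and `detail` are hypothetical names.

```r
# Off-target entries copied from Table 1
off_targets <- c(
  "0",
  "2 (HGH1, TPK3)",
  "2 (intergenic: HSP82-YAR1, YJU3-MBR1)"
)

# Leading integer = number of predicted off-target sites
n_off <- as.integer(sub("^[[:space:]]*([0-9]+).*$", "\\1", off_targets))

# Text inside the parentheses, or NA when no annotation is present
detail <- ifelse(
  grepl("(", off_targets, fixed = TRUE),
  sub("^[^(]*\\((.*)\\)[[:space:]]*$", "\\1", off_targets),
  NA_character_
)

data.frame(rank = 1:3, n_off, detail)
```

A numeric count makes comparisons such as `n_off == 0` independent of how the annotation is written.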

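# Create a correlation matrix for specificity and efficiency scores.
# Note: the CFD Specificity Score is 100 for every guide (zero variance), so its
# correlations are undefined (NA) and cor() emits the warning shown below.
cor_matrix <- cor(RNA_Seq %>%
  select(`MIT Specificity Score`, `CFD Specificity Score`, `Doench '16 Efficiency`, `Mor.-Mateos Efficiency`, `Doench RuleSet3 Score`))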

## Warning in cor(RNA_Seq %>% select(`MIT Specificity Score`, `CFD Specificity
## Score`, : the standard deviation is zero

# Visualize the correlation matrix
ggcorrplot(cor_matrix, method = "circle", type = "lower", lab = TRUE, lab_size = 3, title = "Correlation Matrix of Scores")

Figure 4 is a correlation matrix that visualises the relationships between the scoring metrics used to evaluate guide RNAs (gRNAs) for targeting the SRG1 gene in yeast. The x-axis and y-axis represent the scores: Doench ’16 Efficiency, Mor.-Mateos Efficiency, MIT Specificity Score, and Doench RuleSet3 Score, with their pairwise correlations indicated in the cells. The colour gradient signifies the strength and direction of the correlation, ranging from blue for strong negative correlations (-1.0) to red for strong positive correlations (+1.0). Key observations include a strong positive correlation (0.87) between the MIT Specificity Score and Doench ’16 Efficiency, suggesting that guides with high specificity also tend to have higher efficiency. Because the matrix is computed from only three guides, these coefficients are descriptive rather than statistically reliable.
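The 0.87 value can be reproduced by hand from the three MIT and Doench ’16 scores in the data table. The sketch below computes Pearson's r from its definition, compares it with `cor()`, and shows why a constant column such as the CFD score yields `NA`. Vector names are illustrative.

```r
# Scores copied from the data table, in rank order 1 to 3
mit      <- c(100, 98, 99)
doench16 <- c(69, 62, 62)
cfd      <- c(100, 100, 100)

# Pearson correlation from its definition:
# sum of cross-products of deviations / sqrt(product of sums of squares)
dx <- mit - mean(mit)
dy <- doench16 - mean(doench16)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Same quantity via the built-in function
r_builtin <- cor(mit, doench16)

# A constant vector has zero standard deviation, so the correlation is
# undefined; cor() returns NA with a warning, which is suppressed here
r_cfd <- suppressWarnings(cor(cfd, doench16))

round(c(manual = r_manual, builtin = r_builtin, cfd = r_cfd), 2)
```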

# Identify the best guide based on specificity and efficiency
best_guide <- RNA_Seq %>%
  filter(`MIT Specificity Score` >= 98 & `CFD Specificity Score` == 100 & `Doench '16 Efficiency` > 60 & `Off-Targets` == "0")

# Display the best guide in a formatted table
kable(best_guide, col.names = c("Rank", "Target Sequence", "Genomic Location", "Strand", "GC Content (%)",
                                "Self-Complementarity", "MM0", "MM1", "MM2", "MM3", 
                                "MIT Specificity Score", "CFD Specificity Score", 
                                "Doench '16 Efficiency", "Mor.-Mateos Efficiency", 
                                "Doench RuleSet3 Score", "Off-Targets", "Restriction Enzymes")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Rank	Target Sequence	Genomic Location	Strand	GC Content (%)	Self-Complementarity	MM0	MM1	MM2	MM3	MIT Specificity Score	CFD Specificity Score	Doench ’16 Efficiency	Mor.-Mateos Efficiency	Doench RuleSet3 Score	Off-Targets	Restriction Enzymes
1	CAACAAGCTATGAATATGAGCGG	chrV:322725-322747		35	0	0	0	0	0	100	100	69	35	39	0	None

# Conclusion: Based on the analysis, the guide RNA with Rank 1 is the optimal choice due to its high specificity, strong efficiency, and absence of off-target effects.

Table 2 summarises the optimal guide RNA for targeting the SRG1 gene in yeast. Rank 1 is the most suitable choice because it performs well across multiple evaluation metrics. Its Target Sequence, CAACAAGCTATGAATATGAGCGG, is located on chromosome V at position 322725–322747 on the forward strand with a GC content of 35%. The guide shows no self-complementarity and no predicted off-target sites with zero to three mismatches (MM0–MM3), minimising unintended interactions. Importantly, it achieves the maximum MIT Specificity Score (100) and CFD Specificity Score (100), indicating high precision and specificity in the yeast genome. The guide’s Doench ’16 Efficiency Score is a robust 69, coupled with a moderate Mor.-Mateos Efficiency Score of 35 and a Doench RuleSet3 Score of 39, collectively indicating good potential for effective gene knockout. Furthermore, this guide has zero off-targets and no associated restriction enzyme conflicts.
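The selection rule used above can be restated in base R to see which criterion actually separates the guides. This is a sketch with values copied from the data table and a numeric off-target count in place of the text column; the data frame and column names are hypothetical.

```r
# Key scores per guide, copied from the data table
guides <- data.frame(
  rank     = 1:3,
  mit      = c(100, 98, 99),
  cfd      = c(100, 100, 100),
  doench16 = c(69, 62, 62),
  n_off    = c(0, 2, 2)   # numeric off-target count
)

# Score thresholds from the filter: MIT >= 98, CFD == 100, Doench '16 > 60
passes_scores <- with(guides, mit >= 98 & cfd == 100 & doench16 > 60)

# Final criterion: no predicted off-targets
passes_off_target <- guides$n_off == 0

best <- guides[passes_scores & passes_off_target, ]
best$rank
```

All three guides pass the score thresholds, so with these cut-offs the off-target count is the only criterion that singles out Rank 1.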

Guide_RNA_Analysis_SRG1_in_yeast

Catherine Taylor

2024-10-31