This R notebook demonstrates how to retrieve and analyze genetic
variant information for the PTPN11 gene using data from
ClinVar and the Ensembl Variant Effect Predictor (VEP). The analysis
focuses on a specific variant, rs121918459, to illustrate
data retrieval and interpretation.
First, we need to install and load the necessary R packages. If you
are running this notebook for the first time, uncomment and run the
install.packages and devtools::install_github
lines. BiocManager::install is used for Bioconductor
packages.
# Install clinvaR from GitHub (uncomment and run if not already installed)
# install.packages("devtools")
# devtools::install_github("jamesdiao/clinvaR")
# BiocManager::install("ensemblVEP")
library(DT)
library(ensemblVEP)
library(clinvaR)
library(httr)
library(jsonlite)
# Confirm successful loading of libraries
print("Required libraries loaded successfully.")
## [1] "Required libraries loaded successfully."
We will download the ClinVar database using the
get_clinvar() function from the clinvaR
package. This may take some time depending on your internet connection
as it retrieves a large dataset.
After downloading, we will filter the data to only include entries
related to the PTPN11 gene.
# Download ClinVar data
clinvar_data <- get_clinvar()
# Subset data for the PTPN11 gene
ptpn11_subset <- clinvar_data[which(clinvar_data$GENE == "PTPN11") ,]
print(paste("Entries for PTPN11 gene:", nrow(ptpn11_subset)))
## [1] "Entries for PTPN11 gene: 259"
# Display the first few rows of the PTPN11 subset
head(as.data.frame(ptpn11_subset))
## VAR_ID GENE CHROM POS REF ALT ID CLNSIG
## 159034 12_112856737_G_C PTPN11 12 112856737 G C rs545369072 0, 0, 0
## 159035 12_112856777_G_A PTPN11 12 112856777 G A rs886048964 0, 0, 0
## 159037 12_112856871_T_G PTPN11 12 112856871 T G rs587781131 2
## 159038 12_112856920_C_T PTPN11 12 112856920 C T rs267606990 5, 5, 5
## 159039 12_112856925_C_G PTPN11 12 112856925 C G rs886041517 0
## 159040 12_112856937_G_T PTPN11 12 112856937 G T rs886048965 0, 0, 0
## CLNDBN pathogenic_incl_conflicts pathogenic_no_conflicts
## 159034 Noonan_s.... FALSE FALSE
## 159035 Noonan_s.... FALSE FALSE
## 159037 not_spec.... FALSE FALSE
## 159038 Noonan_s.... TRUE TRUE
## 159039 not_spec.... FALSE FALSE
## 159040 Noonan_s.... FALSE FALSE
From the PTPN11 subset, we will search for a specific
variant using its ClinVar ID, rs121918459. This variant is
c.188A>G.
# Search for the specific variant rs121918459 (c.188A>G) in the PTPN11 subset
specific_variant <- ptpn11_subset[which(ptpn11_subset$ID == "rs121918459"),]
if (nrow(specific_variant) > 0) {
print("Specific variant rs121918459 found:")
# Display as interactive DT table
datatable(specific_variant,
caption = "Variant rs121918459 (c.188A>G) in PTPN11",
rownames = FALSE,
options = list(
scrollX = TRUE, # Enable horizontal scrolling
autoWidth = TRUE,
pageLength = 10,
columnDefs = list(
list(width = '100px', targets = "_all") # Set minimum column width
),
dom = 't' # Only show table (no pagination needed for single row)
),
class = 'cell-border stripe hover') %>%
formatStyle(columns = colnames(specific_variant),
fontSize = '12px',
backgroundColor = styleEqual(1, '#f0f8ff'))
} else {
print("Specific variant rs121918459 not found in the PTPN11 subset.")
}
## [1] "Specific variant rs121918459 found:"
To get detailed annotations for the variant, we will use the Ensembl
Variant Effect Predictor (VEP) API. A custom R function
get_vep_variant is defined below to interact with the
Ensembl REST API. This function constructs the API request based on
chromosome, position, reference allele, alternative allele, species, and
genome build.
get_vep_variant <- function(chromosome, position, ref_allele, alt_allele,
species = "human", genome = "GRCh38") {
# Select the appropriate server based on genome build
if (genome == "GRCh37" || genome == "hg19") {
server <- "https://grch37.rest.ensembl.org"
} else if (genome == "GRCh38" || genome == "hg38") {
server <- "https://rest.ensembl.org"
} else {
stop("Genome must be either 'GRCh37' or 'GRCh38'")
}
# Construct the region string for the VEP API
# The VEP API expects format: chr:start-end/allele
region <- paste0(chromosome, ":", position, "-", position, "/", alt_allele)
# Build the API endpoint
endpoint <- paste0("/vep/", species, "/region/", region)
# Make the HTTP GET request to the Ensembl VEP API
response <- GET(paste0(server, endpoint),
content_type("application/json"),
accept("application/json"))
# Parse the JSON response
if (status_code(response) == 200) {
result <- fromJSON(toJSON(content(response)))
return(result)
} else {
stop(paste("API request failed with status:", status_code(response), "and message:", content(response, "text")))
}
}
print("get_vep_variant function defined.")
## [1] "get_vep_variant function defined."
Now, we will use the get_vep_variant function to query
the Ensembl VEP API for our specific variant (rs121918459)
using the GRCh37 genome build. This will provide detailed
annotations such as transcript consequences.
# GRCh37 variant
variant_grch37 <- get_vep_variant(specific_variant$CHROM, specific_variant$POS, specific_variant$REF, specific_variant$ALT, genome = "GRCh37")
# Display transcript consequences as interactive table
if (!is.null(variant_grch37$transcript_consequences)) {
tc_df <- as.data.frame(variant_grch37$transcript_consequences)
datatable(tc_df,
caption = "Transcript Consequences for Variant",
filter = 'top',
options = list(
pageLength = 10,
scrollX = TRUE,
autoWidth = TRUE,
columnDefs = list(list(width = '150px', targets = "_all"))
))
}
This notebook successfully demonstrated how to:
PTPN11).rs121918459).GRCh37 genome
build.This workflow can be adapted to analyze other genes and variants, providing valuable insights for genetic research and clinical interpretation.