drawProteins is available from Bioconductor. You’ll need to install and load the BiocManager package to download it. Packages from Bioconductor are installed with BiocManager’s install() function.
library(BiocManager)
## Bioconductor version '3.13' is out-of-date; the current release version '3.14'
## is available with R version '4.1'; see https://bioconductor.org/install
#install("drawProteins")
library(ggplot2)
library(drawProteins)
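If you rerun this setup often, one option is to install only when a package is missing. The following is a minimal sketch, not part of the original workflow: needs_install() and do_install are hypothetical names introduced here, and the sketch assumes the usual install.packages() and BiocManager::install() routes.

```r
# Report whether a package still needs installing (TRUE = not installed yet).
# needs_install() is a hypothetical helper, not part of BiocManager.
needs_install <- function(pkg) !requireNamespace(pkg, quietly = TRUE)

# Left FALSE so that running this sketch has no side effects;
# set to TRUE to actually install anything that is missing.
do_install <- FALSE

if (do_install) {
  if (needs_install("BiocManager")) install.packages("BiocManager")
  if (needs_install("drawProteins")) BiocManager::install("drawProteins")
}
```

Checking first avoids reinstalling the packages every time the document is knitted.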
Example: SNX13
First, we use a UniProt accession to download data from UniProt. This produces a list.
Q9Y5W8_json <- drawProteins::get_features("Q9Y5W8")
## [1] "Download has worked"
is(Q9Y5W8_json)
## [1] "list" "vector"
Then the raw data from the webpage is converted to a dataframe.
my_prot_df <- drawProteins::feature_to_dataframe(Q9Y5W8_json)
is(my_prot_df)
## [1] "data.frame" "list" "oldClass" "vector"
The information available on a protein on UniProt varies a lot depending on how much it’s been studied. drawProteins can extract information about features such as domains, regions, motifs, repeats, phosphorylation sites, and folding, among others.
If that information is available, drawProteins can plot it. You can get a sense for what’s available by looking at the dataframe produced by drawProteins::feature_to_dataframe().
my_prot_df[,-2]
## type begin end length accession entryName taxid order
## featuresTemp CHAIN 1 968 967 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.1 DOMAIN 97 284 187 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.2 DOMAIN 373 496 123 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.3 DOMAIN 570 691 121 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.4 BINDING 612 612 0 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.5 BINDING 614 614 0 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.6 BINDING 639 639 0 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.7 BINDING 653 653 0 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.8 VAR_SEQ 569 579 10 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.9 VARIANT 472 472 0 Q9Y5W8 SNX13_HUMAN 9606 1
## featuresTemp.10 CONFLICT 638 638 0 Q9Y5W8 SNX13_HUMAN 9606 1
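A quick way to summarise what an entry offers is to tabulate the type column. The sketch below rebuilds a small dataframe by hand from the output printed above (type, begin and end only) so that it runs without a download; feat, type_counts and domains are names introduced here for illustration. In a live session you would use my_prot_df directly.

```r
# Hand-copied subset of the printed output above, so no UniProt download
# is needed to run this sketch.
feat <- data.frame(
  type  = c("CHAIN", "DOMAIN", "DOMAIN", "DOMAIN", "BINDING", "BINDING",
            "BINDING", "BINDING", "VAR_SEQ", "VARIANT", "CONFLICT"),
  begin = c(1, 97, 373, 570, 612, 614, 639, 653, 569, 472, 638),
  end   = c(968, 284, 496, 691, 612, 614, 639, 653, 579, 472, 638)
)

# Count how many features of each type the entry has.
type_counts <- table(feat$type)
print(type_counts)

# Keep only the domain rows and recompute their lengths as end - begin,
# which matches the length column in the printed output.
domains <- feat[feat$type == "DOMAIN", ]
domains$length <- domains$end - domains$begin
print(domains)
```

Feature types with a count of zero (or missing from the table altogether) are a hint that the matching plotting function will have nothing to draw.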
From the dataframe, drawProteins can plot the available information. It uses ggplot2, so it follows some ggplot2 coding conventions that can look unfamiliar if you’re new to it. Also, it’s a little tricky to understand how information in the dataframe gets turned into things on the plot by the different functions.
For a particular protein, I recommend trying out each chunk of code below and seeing what looks most interesting. Ideally there are domains, but this isn’t always the case with less well-studied proteins. You can comment out a line of code in a chunk to see how it affects what gets plotted.
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df,
label_size = 2.5)
my_canvas <- draw_domains(my_canvas, my_prot_df)
my_canvas
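The same download, convert, canvas, chains, layers sequence recurs for every protein, so it can be wrapped in a small helper. This is only a sketch: apply_layers() and plot_protein() are hypothetical names introduced here, not drawProteins functions, and the helper assumes every layer function takes the canvas and the dataframe as its first two arguments, as the draw_*() calls in this document do.

```r
# Apply a list of layer functions to a canvas, one after another.
# Each layer must accept (canvas, feature_df) and return the updated canvas.
apply_layers <- function(canvas, feature_df, layers) {
  for (layer in layers) {
    canvas <- layer(canvas, feature_df)
  }
  canvas
}

# Download one UniProt entry and draw its chain plus the requested layers.
# Needs the drawProteins package and a network connection when called.
plot_protein <- function(accession, layers, label_size = 2.5) {
  feature_json <- drawProteins::get_features(accession)
  feature_df   <- drawProteins::feature_to_dataframe(feature_json)
  canvas <- drawProteins::draw_canvas(feature_df)
  canvas <- drawProteins::draw_chains(canvas, feature_df,
                                      label_size = label_size)
  apply_layers(canvas, feature_df, layers)
}

# Example call (not run here because it downloads data):
# plot_protein("Q9Y5W8", layers = list(drawProteins::draw_domains))
```

Removing a function from the layers list has the same effect as commenting out the corresponding line in a chunk.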
The following protein has no domains or motifs, only folds.
DIO1_json <- drawProteins::get_features("P49895")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(DIO1_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_recept_dom(my_canvas, my_prot_df)
my_canvas
Here’s an example of a protein with LOTS of info. First I’ll plot everything I can, then dial it back so it’s manageable to read.
Q9Y6K5_json <- drawProteins::get_features("Q9Y6K5")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(Q9Y6K5_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas
Fibrinogen gamma chain (P04115) has information on regions (a disordered stretch at the end) and folding.
P04115_json <- drawProteins::get_features("P04115")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P04115_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas
p53 (P04637) has lots of “region” information. I commented out regions and just included the code for draw_folding() to reduce the clutter.
P04637_json <- drawProteins::get_features("P04637")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P04637_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
#my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas
P42127
P42127_json <- drawProteins::get_features("P42127")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P42127_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas
Leucine-rich repeat transmembrane protein FLRT3 (Q9NZU0) has several leucine-rich repeats, which can be shown with draw_repeat().
Q9NZU0_json <- drawProteins::get_features("Q9NZU0")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(Q9NZU0_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_repeat(my_canvas, my_prot_df)
my_canvas
Cellobiose 2-epimerase (A4XGA6) is known to have repeats according to RepeatsDB. However, drawProteins isn’t able to extract anything interesting from the UniProt entry. This is totally OK; it just indicates that the UniProt entry doesn’t have sufficient annotations to allow us to plot anything. I used almost every drawProteins function and nothing shows up; we can see why by looking at the dataframe.
A4XGA6_json <- drawProteins::get_features("A4XGA6")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(A4XGA6_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_motif(my_canvas, my_prot_df)
my_canvas <- draw_phospho(my_canvas, my_prot_df)
my_canvas <- draw_repeat(my_canvas, my_prot_df)
my_canvas <- draw_recept_dom(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas
my_prot_df
## type description begin end length accession entryName taxid
## featuresTemp COILED NONE 264 284 20 A4XGA6 A4XGA6_CALS8 351627
## 1 CHAIN NONE 1 390 389 A4XGA6 A4XGA6_CALS8 351627
## order
## featuresTemp 1
## 1 1
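The empty plot can be predicted from the type column before drawing anything. The sketch below pairs each plotting function with the feature types I understand it to read; that mapping is an assumption based on my reading of the package and should be checked against the drawProteins documentation (draw_phospho(), for instance, also appears to filter on the description text). feat, layer_types and usable_layers() are names introduced here for illustration.

```r
# Hand-copied from the dataframe printed above (type, begin, end only).
feat <- data.frame(
  type  = c("COILED", "CHAIN"),
  begin = c(264, 1),
  end   = c(284, 390)
)

# ASSUMPTION: feature types each plotting function looks for.
# Verify against the drawProteins documentation before relying on it.
layer_types <- list(
  draw_domains    = "DOMAIN",
  draw_regions    = "REGION",
  draw_motif      = "MOTIF",
  draw_repeat     = "REPEAT",
  draw_phospho    = "MOD_RES",
  draw_recept_dom = c("TOPO_DOM", "TRANSMEM"),
  draw_folding    = c("STRAND", "HELIX", "TURN")
)

# Return the names of the plotting functions that have at least one
# matching feature type in the dataframe.
usable_layers <- function(feature_df, mapping = layer_types) {
  present <- unique(as.character(feature_df$type))
  hits <- vapply(mapping, function(types) any(types %in% present),
                 logical(1))
  names(mapping)[hits]
}

print(usable_layers(feat))
```

For this entry the result is empty, because the only annotated types are COILED and CHAIN, and none of the functions in the mapping reads either of them.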
Calmodulin (P0DP23) is a heavily phosphorylated protein. draw_phospho() shows the phosphorylation sites. UniProt also has information on folding that we can include if we want (I’ve commented it out).
P0DP23_json <- drawProteins::get_features("P0DP23")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P0DP23_json)
my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_phospho(my_canvas, my_prot_df)
#my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas