Packages

drawProteins is available from Bioconductor. You’ll need to have and load the BiocManager package to download it. Packages from Bioconductor are installed with BiocManager’s install() function.

library(BiocManager)
## Bioconductor version '3.13' is out-of-date; the current release version '3.14'
##   is available with R version '4.1'; see https://bioconductor.org/install
#install("drawProteins")
library(drawProteins)
library(ggplot2)
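Before loading anything, it can help to check which of these packages are actually installed. The chunk below is a minimal sketch of that check; missing_packages() is a helper name I made up for illustration and is not part of BiocManager or drawProteins. The install lines are left commented out, as above, so nothing gets installed by accident.

```r
# Return the packages in `pkgs` that are not installed.
# (missing_packages() is an illustrative helper, not a BiocManager function.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("BiocManager", "drawProteins", "ggplot2"))
print(to_install)

# Uncomment whichever lines you need:
# install.packages("BiocManager")
# BiocManager::install("drawProteins")
# install.packages("ggplot2")
```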

Building a protein diagram

Example: SNX13

First, we use a UniProt accession to download data from UniProt. This produces a list.

Q9Y5W8_json  <- drawProteins::get_features("Q9Y5W8")
## [1] "Download has worked"
is(Q9Y5W8_json)
## [1] "list"   "vector"

Then the raw data from the webpage is converted to a dataframe.

my_prot_df <- drawProteins::feature_to_dataframe(Q9Y5W8_json)
is(my_prot_df)
## [1] "data.frame" "list"       "oldClass"   "vector"
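Since the same two steps (download, then convert) are repeated for every protein, they can be wrapped in one function. This is only a sketch: get_protein_df() is a hypothetical helper, not something drawProteins provides. Its fetch and convert arguments default to the two drawProteins functions used above and are only looked up when the function is called, so they can be swapped for stand-ins when you’re offline.

```r
# Hypothetical helper (not part of drawProteins): download and convert in one call.
# `fetch` and `convert` default to the two drawProteins functions used above.
get_protein_df <- function(accession,
                           fetch = drawProteins::get_features,
                           convert = drawProteins::feature_to_dataframe) {
  # Expect one non-empty string holding the accession
  stopifnot(is.character(accession), length(accession) == 1, nzchar(accession))
  convert(fetch(accession))
}

# Needs drawProteins and an internet connection:
# my_prot_df <- get_protein_df("Q9Y5W8")
```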

The information available on a protein on UniProt varies a lot depending on how much it’s been studied. drawProteins can extract information about the following things:

  1. domains
  2. chains
  3. regions
  4. motifs
  5. phosphorylated sites
  6. repeats

and others.

If this information is available, drawProteins can plot it. You can get a sense of what’s available by looking at the dataframe produced by drawProteins::feature_to_dataframe().

my_prot_df[,-2]
##                     type begin end length accession   entryName taxid order
## featuresTemp       CHAIN     1 968    967    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.1    DOMAIN    97 284    187    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.2    DOMAIN   373 496    123    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.3    DOMAIN   570 691    121    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.4   BINDING   612 612      0    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.5   BINDING   614 614      0    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.6   BINDING   639 639      0    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.7   BINDING   653 653      0    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.8   VAR_SEQ   569 579     10    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.9   VARIANT   472 472      0    Q9Y5W8 SNX13_HUMAN  9606     1
## featuresTemp.10 CONFLICT   638 638      0    Q9Y5W8 SNX13_HUMAN  9606     1
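A quick way to summarize a table like this is to count the rows per feature type. The sketch below uses a hand-typed copy of the type, begin and end columns printed above so that it runs without a download; with real data you would use my_prot_df directly.

```r
# Hand-typed copy of three columns from the table printed above
features <- data.frame(
  type  = c("CHAIN", "DOMAIN", "DOMAIN", "DOMAIN", "BINDING", "BINDING",
            "BINDING", "BINDING", "VAR_SEQ", "VARIANT", "CONFLICT"),
  begin = c(1, 97, 373, 570, 612, 614, 639, 653, 569, 472, 638),
  end   = c(968, 284, 496, 691, 612, 614, 639, 653, 579, 472, 638),
  stringsAsFactors = FALSE
)

# The length column in the printed table is end minus begin
features$length <- features$end - features$begin

# How many rows of each feature type are there?
print(table(features$type))

# Look at just the domains
print(subset(features, type == "DOMAIN"))
```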

From the dataframe, drawProteins can plot the available information. It uses ggplot2 and so follows some ggplot2 coding conventions, which can look unfamiliar if you’re new to it. Also, it’s a little tricky to understand how information in the dataframe gets turned into things on the plot by the different functions.

For a particular protein, what I recommend is trying out each chunk of code below and seeing what looks most interesting. Ideally there are domains, but this isn’t always the case with less well-studied proteins. You can comment out a line of code in a chunk to see how it affects what gets plotted.
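One way to guess in advance which plotting functions will draw something is to compare the values in the type column with the feature types each function looks for. The lookup table below is my own reading of how the draw functions behave, not an official table from the package, so treat it as an assumption and check the package documentation. usable_layers() is a made-up helper name.

```r
# ASSUMED mapping from draw functions to the UniProt feature types they use.
# This is an informal reading of the package's behaviour, not an official table.
layer_types <- list(
  draw_chains     = "CHAIN",
  draw_domains    = "DOMAIN",
  draw_regions    = "REGION",
  draw_motif      = "MOTIF",
  draw_repeat     = "REPEAT",
  draw_phospho    = "MOD_RES",   # presumably only rows describing phosphorylation
  draw_recept_dom = c("TOPO_DOM", "TRANSMEM"),
  draw_folding    = c("STRAND", "HELIX", "TURN")
)

# Names of the draw functions with at least one matching feature type.
usable_layers <- function(types, mapping = layer_types) {
  has_match <- vapply(mapping, function(wanted) any(wanted %in% types), logical(1))
  names(mapping)[has_match]
}

# Feature types present in the SNX13 dataframe shown earlier
snx13_types <- c("CHAIN", "DOMAIN", "BINDING", "VAR_SEQ", "VARIANT", "CONFLICT")
print(usable_layers(snx13_types))
```

With a real dataframe the call would be usable_layers(unique(my_prot_df$type)).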

Domains present

my_canvas <- draw_canvas(my_prot_df)  
my_canvas <- draw_chains(my_canvas, my_prot_df, 
                         label_size = 2.5)
my_canvas <- draw_domains(my_canvas, my_prot_df)
my_canvas

Only “receptor domain”

The following protein has no domains or motifs, only folds.

DIO1_json <- drawProteins::get_features("P49895")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(DIO1_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_recept_dom(my_canvas, my_prot_df)
my_canvas

Protein with LOTS of “Region” information

Here’s an example of a protein with LOTS of info. First I’ll plot everything I can, then dial it back so it’s manageable to read.

Q9Y6K5_json <- drawProteins::get_features("Q9Y6K5")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(Q9Y6K5_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas
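One possible way to dial a crowded plot back is to filter the dataframe before handing it to a draw function, for example by dropping very short regions. The sketch below uses made-up rows so it runs on its own; keep_long_regions() and the 25-residue cutoff are illustrative choices of mine, not drawProteins features.

```r
# Made-up rows for illustration only; real rows come from feature_to_dataframe().
toy_df <- data.frame(
  type        = c("CHAIN", "REGION", "REGION", "REGION"),
  description = c("Example protein", "Short region", "Long region", "Medium region"),
  begin       = c(1, 10, 50, 200),
  end         = c(300, 18, 150, 230),
  stringsAsFactors = FALSE
)

# Keep every non-REGION row, plus REGION rows spanning at least `min_span` residues.
keep_long_regions <- function(df, min_span = 25) {
  is_region <- df$type == "REGION"
  span <- df$end - df$begin
  df[!is_region | span >= min_span, , drop = FALSE]
}

trimmed <- keep_long_regions(toy_df)
print(trimmed$description)
```

The trimmed dataframe could then be passed to draw_regions() in place of the full one, while draw_canvas() and draw_chains() still get the full dataframe.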

Protein with region AND folding information

Fibrinogen gamma chain (P04115) has information on regions (a disordered stretch at the end) and folding.

P04115_json <- drawProteins::get_features("P04115")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P04115_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas

Protein with LOTS of information

p53 (P04637) has lots of “region” information. I commented out draw_regions() and kept only draw_folding() to reduce the clutter.

P04637_json <- drawProteins::get_features("P04637")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P04637_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
#my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas

Another example of a protein with region AND folding information

P42127

P42127_json <- drawProteins::get_features("P42127")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P42127_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas

Protein with repeats

Leucine-rich repeat transmembrane protein FLRT3 (Q9NZU0) has several leucine-rich repeats, which can be shown with draw_repeat().

Q9NZU0_json <- drawProteins::get_features("Q9NZU0")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(Q9NZU0_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)

my_canvas <- draw_repeat(my_canvas, my_prot_df)
my_canvas

And sometimes, there’s nothing

Cellobiose 2-epimerase (A4XGA6) is known to have repeats according to RepeatsDB. However, drawProteins isn’t able to extract anything interesting from the UniProt entry. This is totally OK: it just indicates that the UniProt entry doesn’t have sufficient annotations to allow us to plot anything. I used almost every draw function in drawProteins and nothing shows up; we can see why from looking at the dataframe.

A4XGA6_json <- drawProteins::get_features("A4XGA6")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(A4XGA6_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)

my_canvas <- draw_regions(my_canvas, my_prot_df)
my_canvas <- draw_motif(my_canvas, my_prot_df)
my_canvas <- draw_phospho(my_canvas, my_prot_df)
my_canvas <- draw_repeat(my_canvas, my_prot_df)
my_canvas <- draw_recept_dom(my_canvas, my_prot_df)
my_canvas <- draw_folding(my_canvas, my_prot_df)
my_canvas

my_prot_df
##                type description begin end length accession    entryName  taxid
## featuresTemp COILED        NONE   264 284     20    A4XGA6 A4XGA6_CALS8 351627
## 1             CHAIN        NONE     1 390    389    A4XGA6 A4XGA6_CALS8 351627
##              order
## featuresTemp     1
## 1                1

Phosphorylation

Calmodulin (P0DP23) is a heavily phosphorylated protein. draw_phospho() shows the phosphorylation sites. UniProt also has information on folding that we can include if we want (I’ve commented it out).

P0DP23_json <- drawProteins::get_features("P0DP23")
## [1] "Download has worked"
my_prot_df <- drawProteins::feature_to_dataframe(P0DP23_json)

my_canvas <- draw_canvas(my_prot_df)
my_canvas <- draw_chains(my_canvas, my_prot_df, label_size = 2.5)
my_canvas <- draw_phospho(my_canvas, my_prot_df)
#my_canvas <- draw_folding(my_canvas, my_prot_df)

my_canvas
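To see which rows of a dataframe are likely to end up as phosphorylation sites, you can filter it by hand. The rule used below (rows of type MOD_RES whose description mentions “Phospho”) is my assumption about how the sites are chosen, and the rows are made up so the chunk runs without a download; they are not real calmodulin annotations.

```r
# Made-up rows for illustration; positions and descriptions are NOT real annotations.
toy_df <- data.frame(
  type        = c("CHAIN", "MOD_RES", "MOD_RES", "MOD_RES"),
  description = c("Example protein", "Phosphoserine", "N6-acetyllysine", "Phosphotyrosine"),
  begin       = c(1, 20, 35, 60),
  end         = c(100, 20, 35, 60),
  stringsAsFactors = FALSE
)

# ASSUMED rule: modified residues whose description mentions "Phospho"
phospho_rows <- subset(toy_df, type == "MOD_RES" & grepl("Phospho", description))
print(phospho_rows[, c("description", "begin")])
```

With a real dataframe, the same subset() call on my_prot_df gives a rough preview of what draw_phospho() has to work with.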