Introduction

PON2 encodes paraoxonase 2, a ubiquitous membrane-bound protein, and belongs to the paraoxonase gene family on human chromosome 7. It likely has an immune function: it may reduce oxidative stress, and it can break down the acyl-homoserine lactones that Gram-negative bacteria use in quorum sensing. This quorum-quenching activity may help prevent virulence factors from being expressed.

References:
https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/pon2
https://www.uniprot.org/uniprot/Q15165
https://www.genecards.org/cgi-bin/carddisp.pl?gene=PON2
https://www.ncbi.nlm.nih.gov/homologene/385
https://www.ncbi.nlm.nih.gov/nuccore/209447066

Preliminary

Loading relevant packages:

library(rentrez)
#used to access Entrez
library(compbio4all)
#used to help clean FASTA files
library(msa)
#used to build MSAs
library(ggplot2)
#used for better plotting
library(pander)
#used for displaying data frames better
library(ape)
#used for phylogenies
library(drawProteins)
#used to draw protein domains
library(HGNChelper)
#used for gene symbols 
library(seqinr)
#used for distance alignments

Accession Numbers

Creating a data frame of 10 PON2 proteins found in different species:

Ref_Accessions <- c("NP_000296.2","NP_899131","XP_519213","XP_003809757","XP_018886552","NP_001080649","NP_001003205","NP_001013606","XP_013835161.1","NP_997899.1")
Uni_Accessions <-c("Q15165","Q62086","H2QUY6","A0A2R9A6X3","G3RXU5","Q6IRR7","P54832","Q58DS7","F1SFA2","Q6NXA5")
PDB_Accessions <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
Name_Sci <- c("Homo sapiens","Mus musculus","Pan troglodytes","Pan paniscus","Gorilla gorilla","Xenopus laevis","Canis lupus familiaris","Bos taurus","Sus scrofa","Danio rerio")
Name_Common <- c("Human","House Mouse","Chimpanzee","Bonobo","Gorilla","African Clawed Frog","Dog","Cow","Pig","Zebrafish")

PON2_Accession <-data.frame(Ref_Accessions, Uni_Accessions, PDB_Accessions, Name_Sci, Name_Common)

Displaying Data:

pander(PON2_Accession)
Ref_Accessions  Uni_Accessions  PDB_Accessions  Name_Sci                Name_Common
NP_000296.2     Q15165          NA              Homo sapiens            Human
NP_899131       Q62086          NA              Mus musculus            House Mouse
XP_519213       H2QUY6          NA              Pan troglodytes         Chimpanzee
XP_003809757    A0A2R9A6X3      NA              Pan paniscus            Bonobo
XP_018886552    G3RXU5          NA              Gorilla gorilla         Gorilla
NP_001080649    Q6IRR7          NA              Xenopus laevis          African Clawed Frog
NP_001003205    P54832          NA              Canis lupus familiaris  Dog
NP_001013606    Q58DS7          NA              Bos taurus              Cow
XP_013835161.1  F1SFA2          NA              Sus scrofa              Pig
NP_997899.1     Q6NXA5          NA              Danio rerio             Zebrafish

Data Preparation

Downloading all FASTA files using RefSeq Accession Numbers:

PON2_Sequences <-matrix(nrow = length(Ref_Accessions), ncol=1)
for(i in 1:nrow(PON2_Sequences)){
  PON2_Sequences[i,] <- rentrez::entrez_fetch(db = "protein", id = Ref_Accessions[i], rettype = "fasta")
}

Cleaning the FASTA file sequences we downloaded:

Cleaned_Sequences <- PON2_Sequences
for(i in 1:nrow(PON2_Sequences)){
  Cleaned_Sequences[i,] <- fasta_cleaner(Cleaned_Sequences[i,], parse=FALSE)
}
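
As a rough illustration of what the cleaning step involves, here is a minimal base-R sketch. The function name clean_fasta_sketch and the toy record are hypothetical, and this is not the compbio4all implementation of fasta_cleaner; it only assumes that cleaning means dropping the header line and joining the wrapped sequence lines, returning either one string or a vector of single residues.

```r
# Hypothetical helper: a simplified stand-in for a FASTA cleaner.
clean_fasta_sketch <- function(fasta_text, parse = TRUE) {
  lines <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
  lines <- lines[!grepl("^>", lines)]          # drop the ">" header line(s)
  sequence <- paste(lines, collapse = "")      # join the wrapped sequence lines
  sequence <- gsub("[^A-Za-z]", "", sequence)  # remove any stray whitespace
  if (parse) strsplit(sequence, "")[[1]] else sequence
}

# Toy record (made up, not a real accession)
toy_fasta <- ">toy_record hypothetical protein\nMAKLV\nTGGS\n"
clean_fasta_sketch(toy_fasta, parse = FALSE)  # one string: "MAKLVTGGS"
clean_fasta_sketch(toy_fasta, parse = TRUE)   # vector of 9 single residues
```

The single-string form suits pairwise and multiple alignment, while the one-residue-per-element form suits dot plots and frequency tables.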

General Protein Information

Creating a data frame of features:

PON2_features <- get_features("Q15165")
## [1] "Download has worked"
PON2_feature_df<-feature_to_dataframe(PON2_features)
## Warning in drawProteins::extract_feat_acc(features_in_lists_of_six[[i]]): NAs
## introduced by coercion

Visualizing those features:

my_canvas <- draw_canvas(PON2_feature_df)  
my_canvas <- draw_chains(my_canvas, PON2_feature_df, label_size = 2.5)
my_canvas <- draw_regions(my_canvas, PON2_feature_df)
my_canvas <- draw_motif(my_canvas, PON2_feature_df)
my_canvas <- draw_phospho(my_canvas, PON2_feature_df)
my_canvas <- draw_repeat(my_canvas, PON2_feature_df)
my_canvas <- draw_recept_dom(my_canvas, PON2_feature_df)
my_canvas <- draw_folding(my_canvas, PON2_feature_df)
my_canvas

The plot shows almost no detail, which suggests that UniProt holds little feature annotation for PON2's structure.

Plotting the human PON2 protein sequence against itself in dot plots with different window size (wsize) and match threshold (nmatch) settings:

#create a 2x2 grid of dot plots with different parameter values
human_PON2 <- fasta_cleaner(PON2_Sequences[1])
par(mfrow = c(2,2), mar = c(0,0,2,1))
dotPlot(human_PON2, human_PON2, wsize = 1, nmatch = 1, main = "PON2: Default")
dotPlot(human_PON2, human_PON2, wsize = 10, nmatch = 1, main = "PON2: wsize = 10, nmatch = 1")
dotPlot(human_PON2, human_PON2, wsize = 10, nmatch = 5, main = "PON2: wsize = 10, nmatch = 5")
dotPlot(human_PON2, human_PON2, wsize = 20, nmatch = 5, main = "PON2: wsize = 20, nmatch = 5")

par(mfrow = c(1,1), mar = c(4,4,4,4))

Enlarging Best Dot Plot:

#single large plot with the best version
dotPlot(human_PON2, human_PON2, wsize = 20, nmatch = 5, main = "PON2: wsize = 20, nmatch = 5")
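
To make the effect of wsize and nmatch concrete, the sketch below builds the logical matrix behind a dot plot in base R. The function dot_matrix_sketch and the toy sequence are hypothetical, and the windowing is a simplified reading of the idea (a dot is drawn when at least nmatch of the wsize positions in a window match), not seqinr's actual implementation.

```r
# Hypothetical helper, not seqinr's code: mark window pair (i, j) when at
# least nmatch of the wsize positions compared are identical.
dot_matrix_sketch <- function(seq1, seq2, wsize = 1, nmatch = 1) {
  starts1 <- seq_len(length(seq1) - wsize + 1)
  starts2 <- seq_len(length(seq2) - wsize + 1)
  hits <- matrix(FALSE, nrow = length(starts1), ncol = length(starts2))
  for (i in starts1) {
    for (j in starts2) {
      window1 <- seq1[i:(i + wsize - 1)]
      window2 <- seq2[j:(j + wsize - 1)]
      hits[i, j] <- sum(window1 == window2) >= nmatch
    }
  }
  hits
}

# Toy sequence (made up for illustration)
toy <- strsplit("MGRLVAVGLLGIALALLGERLL", "")[[1]]
loose  <- dot_matrix_sketch(toy, toy, wsize = 1, nmatch = 1)
strict <- dot_matrix_sketch(toy, toy, wsize = 5, nmatch = 4)
sum(loose)   # every chance single-residue match produces a dot
sum(strict)  # larger windows with a higher threshold filter most of them out
```

With a self-comparison the main diagonal is always filled; stricter settings mainly remove the off-diagonal noise, which is why the larger-window plots look cleaner.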

Creating Table of PON2 features:

features <- c("Arylesterase start:167 end:252", "http://pfam.xfam.org/protein/Q15165")
DisProt <- c(NA,NA)
RepeatsDB <- c(NA,NA)
Subcell_loc <- c("Membrane Protein", "https://www.uniprot.org/uniprot/Q15165")
Sec_class <- c("alpha + beta","https://alphafold.ebi.ac.uk/entry/Q15165")
Pro_properties <- data.frame(features,DisProt,RepeatsDB,Subcell_loc,Sec_class)
colnames(Pro_properties) <- c("Features", "Disorganized", "Repeats", "Subcellular", "Structure Class")
pander(Pro_properties)
Features                             Disorganized  Repeats  Subcellular                             Structure Class
Arylesterase start:167 end:252       NA            NA       Membrane Protein                        alpha + beta
http://pfam.xfam.org/protein/Q15165  NA            NA       https://www.uniprot.org/uniprot/Q15165  https://alphafold.ebi.ac.uk/entry/Q15165

Protein Feature Prediction

These three helper functions are needed by the code below:

table_to_vector <- function(table_x){
  table_names <- attr(table_x, "dimnames")[[1]]
  table_vect <- as.vector(table_x)
  names(table_vect) <- table_names
  return(table_vect)
}
chou_cor <- function(x,y){
  numerator <- sum(x*y)
  denominator <- sqrt((sum(x^2))*(sum(y^2)))
  result <- numerator/denominator
  return(result)
}
chou_cosine <- function(z.1, z.2){
  z.1.abs <- sqrt(sum(z.1^2))
  z.2.abs <- sqrt(sum(z.2^2))
  my.cosine <- sum(z.1*z.2)/(z.1.abs*z.2.abs)
  return(my.cosine)
}
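
One thing worth noting about these helpers: chou_cor and chou_cosine are algebraically the same quantity, since sqrt(a*b) equals sqrt(a)*sqrt(b) for non-negative a and b. Both are an uncentered cosine similarity, so they will always return matching values, and both differ from Pearson's correlation, which subtracts the means first. The short check below redefines the two functions so that it runs on its own; the vectors x and y are made-up values for illustration.

```r
chou_cor <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}
chou_cosine <- function(z.1, z.2) {
  sum(z.1 * z.2) / (sqrt(sum(z.1^2)) * sqrt(sum(z.2^2)))
}

# Toy proportion vectors (hypothetical values)
x <- c(0.10, 0.25, 0.40, 0.25)
y <- c(0.20, 0.20, 0.35, 0.25)

all.equal(chou_cor(x, y), chou_cosine(x, y))  # the two helpers agree
cor(x, y)                                     # Pearson: centered, so a different value
```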

Compiling Chou’s (1995) data for protein prediction:

alpha <- c(285, 53, 97, 163, 22, 67, 134, 197, 111, 91, 221, 249, 48, 123, 82, 122, 119, 33, 63, 167)
beta <- c(203, 67, 139, 121, 75, 122, 86, 297, 49, 120, 177, 115, 16, 85, 127, 341, 253, 44, 110, 229)
a.plus.b <- c(175, 78, 120, 111, 74, 74, 86, 171, 33, 93, 110, 112, 25, 52, 71, 126, 117, 30, 108, 123)
a.div.b <- c(361, 146, 183, 244, 63, 114, 257, 377, 107, 239, 339, 321, 91, 158, 188, 327, 238, 72, 130, 378)
alpha.prop <- alpha/sum(alpha)
beta.prop <- beta/sum(beta)
a.plus.b.prop <- a.plus.b/sum(a.plus.b)
a.div.b <- a.div.b/sum(a.div.b)
aa.prop <- data.frame(alpha.prop, beta.prop, a.plus.b.prop, a.div.b)
row.names(aa.prop) <- c("A","R","N","D","C","Q","E","G","H","I","L","K","M","F","P","S","T","W","Y","V")
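
The row names above follow Chou's amino acid order rather than alphabetical order, whereas table() returns its counts sorted alphabetically by one-letter code. Any frequency vector attached to this data frame therefore needs to be matched by name, not by position. The sketch below shows one way to do that with a factor whose levels are fixed to Chou's order; the toy sequence and the variable names are hypothetical.

```r
chou_order <- c("A","R","N","D","C","Q","E","G","H","I",
                "L","K","M","F","P","S","T","W","Y","V")

# Toy sequence (made up for illustration)
toy_seq <- strsplit("MKRAAWCNR", "")[[1]]

# Plain table(): alphabetical, and residues that never occur are missing
alphabetical <- table(toy_seq) / length(toy_seq)
names(alphabetical)

# Fixing the factor levels gives all 20 residues in Chou's order,
# with zeros for residues absent from the sequence
by_chou_order <- table(factor(toy_seq, levels = chou_order)) / length(toy_seq)
names(by_chou_order)
```

Fixing the levels also guards against short or unusual sequences in which some residues do not occur at all.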

Getting PON2 amino acid frequencies:

PON2_freq_table <- table(human_PON2)/length(human_PON2)
PON2_freq <- table_to_vector(PON2_freq_table)
#table() sorts residues alphabetically by one-letter code, so reorder the
#frequencies by name to match the row order of aa.prop before adding them
aa.prop$PON2_freq <- PON2_freq[row.names(aa.prop)]
pander(aa.prop)
  alpha.prop beta.prop a.plus.b.prop a.div.b PON2_freq
A 0.1165 0.07313 0.09264 0.08331 0.06215
R 0.02166 0.02414 0.04129 0.03369 0.03107
N 0.03964 0.05007 0.06353 0.04223 0.06497
D 0.06661 0.04359 0.05876 0.05631 0.06215
C 0.008991 0.02702 0.03917 0.01454 0.008475
Q 0.02738 0.04395 0.03917 0.02631 0.0113
E 0.05476 0.03098 0.04553 0.05931 0.06497
G 0.08051 0.107 0.09052 0.08701 0.06497
H 0.04536 0.01765 0.01747 0.02469 0.04237
I 0.03719 0.04323 0.04923 0.05516 0.05932
L 0.09031 0.06376 0.05823 0.07824 0.1328
K 0.1018 0.04143 0.05929 0.07408 0.05085
M 0.01962 0.005764 0.01323 0.021 0.0113
F 0.05027 0.03062 0.02753 0.03646 0.0452
P 0.03351 0.04575 0.03759 0.04339 0.05367
S 0.04986 0.1228 0.0667 0.07547 0.0678
T 0.04863 0.09114 0.06194 0.05493 0.03955
W 0.01349 0.01585 0.01588 0.01662 0.00565
Y 0.02575 0.03963 0.05717 0.03 0.03955
V 0.06825 0.08249 0.06511 0.08724 0.08192

Calculating Correlation, Similarity, and Distance

#Correlation
corr.alpha <- chou_cor(aa.prop[,5], aa.prop[,1])
corr.beta  <- chou_cor(aa.prop[,5], aa.prop[,2])
corr.apb   <- chou_cor(aa.prop[,5], aa.prop[,3])
corr.adb   <- chou_cor(aa.prop[,5], aa.prop[,4])
#Cosine Similarity
cos.alpha <- chou_cosine(aa.prop[,5], aa.prop[,1])
cos.beta  <- chou_cosine(aa.prop[,5], aa.prop[,2])
cos.apb   <- chou_cosine(aa.prop[,5], aa.prop[,3])
cos.adb   <- chou_cosine(aa.prop[,5], aa.prop[,4])
#Euclidean Distance
aa.prop.flipped <- t(aa.prop)
dist.alpha <- dist((aa.prop.flipped[c(1,5),]),  method = "euclidean")
dist.beta  <- dist((aa.prop.flipped[c(2,5),]),  method = "euclidean")
dist.apb   <- dist((aa.prop.flipped[c(3,5),]),  method = "euclidean")
dist.adb  <- dist((aa.prop.flipped[c(4,5),]), method = "euclidean")

Compile all the data together and display:

fold.type <- c("alpha","beta","alpha plus beta", "alpha/beta")
corr.sim <- round(c(corr.alpha,corr.beta,corr.apb,corr.adb),5)
cosine.sim <- round(c(cos.alpha,cos.beta,cos.apb,cos.adb),5)
Euclidean.dist <- round(c(dist.alpha,dist.beta,dist.apb,dist.adb),5)

#flag the best-scoring fold type from the values instead of hard-coding a row
sim.sum <- ifelse(cosine.sim == max(cosine.sim), "most.sim", "")
dist.sum <- ifelse(Euclidean.dist == min(Euclidean.dist), "min.dist", "")

df <- data.frame(fold.type, corr.sim, cosine.sim, Euclidean.dist, sim.sum, dist.sum)
pander(df)
fold.type        corr.sim  cosine.sim  Euclidean.dist  sim.sum   dist.sum
alpha            0.9209    0.9209      0.1027
beta             0.875     0.875       0.13
alpha plus beta  0.9068    0.9068      0.1092
alpha/beta       0.9505    0.9505      0.08025         most.sim  min.dist

PID Table

Calculate percent identity (PID) between the human, mouse, chimpanzee, and bonobo sequences:

pid_matrix <- matrix(nrow=4, ncol=4)
for(i in 1:4){
  for(j in 1:4){
    temp_align <- pairwiseAlignment(Cleaned_Sequences[i], Cleaned_Sequences[j])
    pid_matrix[i,j] <- pid(temp_align)
  }
}

Display the PID matrix:

pid_names<-c(Name_Sci[1:4])
colnames(pid_matrix) <- pid_names
rownames(pid_matrix) <- pid_names
pid_matrix
##                 Homo sapiens Mus musculus Pan troglodytes Pan paniscus
## Homo sapiens       100.00000     88.13559        94.40000     94.40000
## Mus musculus        88.13559    100.00000        88.13559     88.13559
## Pan troglodytes     94.40000     88.13559       100.00000    100.00000
## Pan paniscus        94.40000     88.13559       100.00000    100.00000

Calculating PID using different methods, demonstrated through comparing human and chimp PON2 proteins:

chimp.human.align <- pairwiseAlignment(Cleaned_Sequences[1,], Cleaned_Sequences[3,])

methods <- c("PID1","PID2","PID3","PID4")
chimpPID <- c(NA,NA,NA,NA)
for(i in 1:4){
  chimpPID[i]<-pid(chimp.human.align, type = methods[i])
}
denominator <- c("aligned positions + internal gap positions","aligned positions","length shorter sequence","average length of the two sequences")
pid_data1 <- data.frame(methods, chimpPID, denominator)
pander(pid_data1)
methods  chimpPID  denominator
PID1     94.4      aligned positions + internal gap positions
PID2     100       aligned positions
PID3     100       length shorter sequence
PID4     97.12     average length of the two sequences

Calculating PID using different methods, demonstrated through comparing human and Zebrafish PON2 proteins:

fish.human.align <- pairwiseAlignment(Cleaned_Sequences[1,], Cleaned_Sequences[10,])

fishPID <- c(NA,NA,NA,NA)
for(i in 1:4){
  fishPID[i]<-pid(fish.human.align, type = methods[i])
}

pid_data2 <- data.frame(methods, fishPID, denominator)
pander(pid_data2)
methods  fishPID  denominator
PID1     53.93    aligned positions + internal gap positions
PID2     54.55    aligned positions
PID3     54.24    length shorter sequence
PID4     54.16    average length of the two sequences
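
The four denominators can be worked through by hand on a tiny alignment. The base-R sketch below is illustrative only: pid_sketch is a hypothetical function, the two aligned strings are made up, and the way internal gaps are counted (gap columns lying between the first and last residue-to-residue column) is an assumption rather than a description of how Biostrings computes pid().

```r
# Hypothetical helper: percent identity under the four denominators listed above.
pid_sketch <- function(aligned1, aligned2) {
  a <- strsplit(aligned1, "")[[1]]
  b <- strsplit(aligned2, "")[[1]]
  stopifnot(length(a) == length(b))
  both <- a != "-" & b != "-"          # columns with a residue in both sequences
  identical_n <- sum(both & a == b)    # identical residue pairs
  # Assumption: internal gaps are gap columns between the first and last
  # residue-to-residue column
  span <- seq(min(which(both)), max(which(both)))
  internal_gaps <- sum(!both[span])
  len1 <- sum(a != "-")
  len2 <- sum(b != "-")
  c(PID1 = 100 * identical_n / (sum(both) + internal_gaps),
    PID2 = 100 * identical_n / sum(both),
    PID3 = 100 * identical_n / min(len1, len2),
    PID4 = 100 * identical_n / mean(c(len1, len2)))
}

# Toy alignment: 5 identical pairs, 6 aligned columns, 1 internal gap
toy_pid <- pid_sketch("ACDEFGH", "ACD-FGK")
round(toy_pid, 2)
```

In this toy case PID1 is the lowest value because the gap column is added to its denominator, which is the same pattern seen in the human versus chimpanzee table, where PID1 falls below PID2.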

Multiple Sequence Alignment (MSA)

Build the MSA using all 10 PON2 protein sequences:

PON2_ss <- AAStringSet(Cleaned_Sequences)
names(PON2_ss) <- Name_Sci
PON2_msa <- msa(PON2_ss, method = "ClustalW")
## use default substitution matrix

Display the MSA:

class(PON2_msa) <- "AAMultipleAlignment"
PON2_align_seqinr <- msaConvert(PON2_msa, type = "seqinr::alignment")
print_msa(alignment = PON2_align_seqinr, 
          chunksize = 50)
## [1] "MAPPTELLARPERSSAPGSRAMGRLVAVGLLGIALA-LLGERLLALRNRL 0"
## [1] "MAPPTELLARPERSSAPGSRAMGRLVAVGLLGIALA-LLGERLLALRNRL 0"
## [1] "---------------------MGRLVAVGLLGIALA-LLGERLLALRNRL 0"
## [1] "MAPPTELLARPERGSARGSRAMGRLVAVGLLGIALA-LLGERLLALRNRL 0"
## [1] "---------------------MGRLLALSLLGIALA-LLGERLLALRNRL 0"
## [1] "---------------------MGRLLALSLLGIALA-LLGERLLALRNRL 0"
## [1] "---------------------MGRLLAVGLLGLALA-LLGERLLALRNRL 0"
## [1] "---------------------MGRMVALSLLGIGLA-LLGERFLALRSRL 0"
## [1] "---------------------MGKLLKVTLIGILLA-FIGERIVQFCHRA 0"
## [1] "---------------------MGTLAFLSLAVVAFAVLIGERLISLRHVA 0"
## [1] " "
## [1] "KASREVESVD-LPHCHLIKGIEAGSEDIDILPNGLAFFSVGLKFPGLHSF 0"
## [1] "KASREVESVD-LPHCHLIKGIEAGSEDIDILPNGLAFFSVGLKFPGLHSF 0"
## [1] "KASREVESVD-LPHCHLIKGIEAGSEDIDILPNGLAFFSVGLKFPGLHSF 0"
## [1] "KASREVESVD-LPHCHLIKGIEAGSEDIDILPNGLAFFSVGLKFPGLHSF 0"
## [1] "KASREVESVD-LPNCHLIKGIEAGAEDIDILPNGLAFFSVGLKCPGLHSF 0"
## [1] "KASREVESVD-LPNCHLIKGIEAGSEDIDILPSGLAFFSVGLKCPGLHSF 0"
## [1] "KASREVESVD-LPNCHLIKGIEAGADDIDILPNGLAFFSVGLKCPGLHSF 0"
## [1] "KASREVESVD-LPNCHLIKGIETGAEDIDILPNGLAFFSVGLKFPGLHSF 0"
## [1] "NAFRKVDPVDLLPNCQLLKGIEFGSEDIEILPNGLAFISSGLKYPGVMNF 0"
## [1] "LSYRELTQNY-LPNCNFIEGIDFGAEDITIL-DGLAFLSTGLKYPGVPSY 0"
## [1] " "
## [1] "APDKPGGILMMDLKEEKPRARELRISRGFDLASFNPHGISTFID-NDDTV 0"
## [1] "APDKPGGILMMDLKEEKPRARELRISRGFDLASFNPHGISTFID-NDDTV 0"
## [1] "APDKPGGILMMDLKEEKPRARELRISRGFDLASFNPHGISTFID-NDDTV 0"
## [1] "APDKPGGILMMDLKEEKPRARELRISRGFDLASFNPHGISTFID-NDDTV 0"
## [1] "APDKPGGILMMDLNEENPRALELRVSRGFNLASFNPHGISTFID-SDDTV 0"
## [1] "APDKPGGILMMDLKEENPRALELRISRGFNLASFNPHGISTFID-SDDTV 0"
## [1] "SPDKPGGILLMDLKKENPRALELRISRGFNLASFNPHGISTFID-SDDTV 0"
## [1] "APDKPGGILMMDLKDERPRALELRVSWGFDLASFNPHGISTFID-DDDTV 0"
## [1] "QPDKPGEIFLLDLNDEKLRPVPLRLSRGFDFSTFNPHGMSTYIDPKDDTV 0"
## [1] "SED-PGKIYTLNLLDSEQKIKVLHIRGDFDKDSFNPHGISVYTDDKDGAI 0"
## [1] " "
## [1] "YLFVVNHPEFKNTVEIFKFEEAENSLLHLKTVKHELLPSVNDITAVGPAH 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEAENSLLHLKTVKHELLPSVNDITAVGPAH 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEAENSLLHLKTVKHELLPSVNDITAVGPAH 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEAENSLLHLKTVKHELLPSVNDITAVGPAH 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEEENSLLHLKTIKHELLPSVNDIIAVGPEH 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEEENSLLHLKTIKHELLP------------ 0"
## [1] "YLFVVNHPEFKNTVEIFKFEEEENSLLHLKTIKHELLPSVNDIIAVGPAH 0"
## [1] "YLFVVNHPQFKSTVEIFKFQEEENSLLHLKTIKHELLPSVNDIIAVGPTH 0"
## [1] "YLFVVNHPLYKTTIELFKFEEEENVLLHLKTIKHDLMWSANDIVAVGPES 0"
## [1] "YLFVVNHPQGKSQVEIFRFLENENALEYLKTIRHELLHNVNDIVAVGTES 0"
## [1] " "
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPNEVKVVAEGFDSANGIN 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPNEVKVVAEGFDSANGIN 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPNEVKVVAEGFDSANGIN 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPNEVKVVAEGFDSANGIN 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWTNVVYYSPNEVKVVAEGFDSANGIN 0"
## [1] "-------------------------------------------------- 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPDEVKVVAEGFDAANGIN 0"
## [1] "FYATNDHYFSDPFLKYLETYLNLHWANVVYYSPEEVKLVAEGFDSANGIN 0"
## [1] "FYTTNDLYFTDFTMRQLEIFLGIAWSNVIYYSPTEVKQVSSGYYYANGIA 0"
## [1] "FYATNDHYFTNDILKIVEPFLSLPWCDVVYYSPETVQVVAGGFLSANGIN 0"
## [1] " "
## [1] "ISPDDKYIYVADILAHEIHVLEKHTNMNLTQLKVLELDTLVDNLSIDPSS 0"
## [1] "ISPDDKYIYVADILAHEIHVLEKHTNMNLTQLKVLELDTLVDNLSIDPSS 0"
## [1] "ISPDDKYIYVADILAHEIHVLEKHTNMNLTQLKVLELDTLVDNLSIDPSS 0"
## [1] "ISPDDKYIYVADILAHEIHVLEKHTNMNLTQLKVLELDTLVDNLSIDPSS 0"
## [1] "ISPDKKYIYVADILAHEIHVLEKHPNMNLTQLKVLKLDTLVDNLSIDPSS 0"
## [1] "-----RYIYVADILAHEIHVLEKQPNMNLTQLKVLELDTLVDNISIDPSS 0"
## [1] "ISPDKKYIYVADILAHEIHVLEKHPNMNLTQLKVLKLDTLVDNLSIDPSS 0"
## [1] "ISPDKKYVYVADILAHEIHVLEKQPNMNLTQLKVLQLGTLVDNLSIDPSS 0"
## [1] "MSTDNKYIYVADIMGHTIDILEKQADWSLTPVKVLKLDTLLDNLFVDPNT 0"
## [1] "ISPDKRHLYVSHILKHTIAVLEIQKNTVLSHVKEIDVGSLCDNIEVDRET 0"
## [1] " "
## [1] "GDIWVGCHPNGQKLFVYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDIWVGCHPNGQKLFVYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDIWVGCHPNGQKLFVYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDIWVGCHPNGQKLFVYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDVLVGCHPNGQKLFVYDPKNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDILVGCHPNGQKLFVYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSI 0"
## [1] "GDILVGCHPNGQKLFIYDPNNPPSSEVLRIQNILSEKPTVTTVYANNGSV 0"
## [1] "GDIWVGCHPNGQRLFVYHPNHPPASEVLRIQNILSEKPSVTTVYINNGSV 0"
## [1] "GDIWTGAHPNGWKLFSYNSDDLPGSEVIRVQNIHSDNPIVTQVYVNNGSV 0"
## [1] "GDLWIGCHPNGLKCVFHDPNDPPGSEVIRIENILSEKPQVTQVYSDDGSV 0"
## [1] " "
## [1] "LQGSSVASVYDGKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASVYDGKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASVYDGKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASVYDGKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASVYDKKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASLYDRKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVASVYDRKLLIGTLYHRALYCEL- 21"
## [1] "LQGSSVATIYDRKLLVGTLYQKALYCEL- 21"
## [1] "IQASSSAAVYEGKLLIGTVFHKALCCELS 21"
## [1] "IIASSVAAPYREKLLIGTVYQKALICDLK 21"
## [1] " "
#ggmsa(PON2_msa,start=160,end=260)

Distance Matrix

Building a Distance Matrix with all sequences for use in constructing a phylogenetic tree:

PON2_dist <- dist.alignment(PON2_align_seqinr, matrix = "identity")
PON2_dist_round <- round(PON2_dist, 3)
PON2_dist_round
##                        Pan troglodytes Pan paniscus Homo sapiens
## Pan paniscus                     0.000                          
## Homo sapiens                     0.000        0.000             
## Gorilla gorilla                  0.073        0.073        0.000
## Bos taurus                       0.260        0.260        0.260
## Sus scrofa                       0.264        0.264        0.264
## Canis lupus familiaris           0.260        0.260        0.260
## Mus musculus                     0.344        0.344        0.344
## Xenopus laevis                   0.618        0.618        0.618
## Danio rerio                      0.674        0.674        0.674
##                        Gorilla gorilla Bos taurus Sus scrofa
## Pan paniscus                                                
## Homo sapiens                                                
## Gorilla gorilla                                             
## Bos taurus                       0.260                      
## Sus scrofa                       0.264      0.213           
## Canis lupus familiaris           0.260      0.219      0.236
## Mus musculus                     0.344      0.332      0.354
## Xenopus laevis                   0.618      0.618      0.619
## Danio rerio                      0.674      0.668      0.683
##                        Canis lupus familiaris Mus musculus Xenopus laevis
## Pan paniscus                                                             
## Homo sapiens                                                             
## Gorilla gorilla                                                          
## Bos taurus                                                               
## Sus scrofa                                                               
## Canis lupus familiaris                                                   
## Mus musculus                            0.349                            
## Xenopus laevis                          0.620        0.633               
## Danio rerio                             0.672        0.661          0.701
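
These values are not plain fractions of mismatched positions. According to the seqinr documentation, dist.alignment returns the square root of the pairwise distance, so with matrix = "identity" an entry d corresponds to an identity of 1 - d^2. The small sketch below converts a few entries from the Homo sapiens column back to percent identity; identity_from_dist is a hypothetical helper name.

```r
# Hypothetical helper: convert a seqinr identity distance back to percent identity,
# assuming d = sqrt(1 - identity) as described in the seqinr documentation.
identity_from_dist <- function(d) {
  100 * (1 - d^2)
}

# Distances to Homo sapiens, copied from the matrix above
human_dist <- c("Mus musculus" = 0.344, "Xenopus laevis" = 0.618, "Danio rerio" = 0.674)
human_identity <- identity_from_dist(human_dist)
round(human_identity, 1)
```

For human versus zebrafish this gives about 54.6%, close to the PID2 value of 54.55 in the PID section, which is what one would expect if both are measuring nearly the same thing on slightly different alignments.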

Phylogenetic Tree

Making a neighbor-joining phylogenetic tree from all PON2 sequences (note that nj() returns an unrooted tree):

#build the tree from the unrounded distances so rounding cannot create ties
PON2_tree <- nj(PON2_dist)

plot.phylo(PON2_tree, use.edge.length = FALSE, main = "PON2 Protein Family Tree")
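
Because neighbor joining produces an unrooted tree, a root has to be chosen explicitly if a rooted tree is wanted. One common option is to root on an outgroup, and zebrafish would be a natural candidate here as the only non-tetrapod in the set. The sketch below shows the idea with ape on a toy distance matrix; the taxa labels and distance values are hypothetical, not the PON2 results.

```r
library(ape)

# Toy distance matrix (hypothetical values)
taxa <- c("Human", "Mouse", "Frog", "Zebrafish")
toy_matrix <- matrix(c(0.0, 0.3, 0.6, 0.7,
                       0.3, 0.0, 0.6, 0.7,
                       0.6, 0.6, 0.0, 0.7,
                       0.7, 0.7, 0.7, 0.0),
                     nrow = 4, dimnames = list(taxa, taxa))
toy_dist <- as.dist(toy_matrix)

toy_tree <- nj(toy_dist)
is.rooted(toy_tree)   # the neighbor-joining result is unrooted

# Root on the outgroup; resolve.root = TRUE adds a bifurcating root node
toy_rooted <- root(toy_tree, outgroup = "Zebrafish", resolve.root = TRUE)
is.rooted(toy_rooted)
```

Applied to the PON2 tree, the same call would use outgroup = "Danio rerio", since the tip labels come from Name_Sci.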