Transient receptor potential cation channel subfamily M (melastatin) member 8 (TRPM8) is a protein encoded by the TRPM8 gene. The gene codes for a cold-sensing TRP channel that is activated by chemical ligands such as menthol and icilin; it is the primary molecular transducer of cold somatosensation in humans.
Some important resources used to compile this information:
RefSeq Page: https://www.ncbi.nlm.nih.gov/nuccore/NM_001397607.1
HomoloGene Page: https://www.ncbi.nlm.nih.gov/gene/79054
UniProt Page: https://www.uniprot.org/uniprot/Q7Z2W7
PDB Page: https://www.rcsb.org/structure/6BPQ
Other resources consulted include:
Neanderthal Genome: http://neandertal.ensemblgenomes.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000144481
# Github packages
library(compbio4all)
library(ggmsa)
# CRAN packages
library(rentrez)
library(seqinr)
library(ape)
library(pander)
library(ggplot2)
# Bioconductor packages
library(BiocManager)
library(drawProteins) # not working
# library(msa) # does not work, will use manual functions
library(Biostrings)
library(HGNChelper)
data(BLOSUM50)
Accession numbers were obtained from RefSeq, HomoloGene, UniProt, and PDB. UniProt accession numbers can be found by searching for the gene name. PDB accessions can be found by searching with a UniProt accession or a gene name; many proteins are not in PDB, but TRPM8 is. The Neanderthal genome database was searched as well.
A protein BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) was first carried out with vertebrates excluded, to determine whether TRPM8 occurs outside of vertebrates. The gene does not appear in non-vertebrates, so a second search was conducted with mammals excluded instead.
TRPM8 does not occur outside of vertebrates.
# One vector per column; the string "NA" marks an accession that was not found
ncbi.protein.accession <- c("NP_076985", "NP_599013.1", "NP_599198.2",
                            "NP_001007083.1", "XP_005049003.1", "NP_001104239.1",
                            "NP_001166561.1", "XP_028687091.1", "NP_001192995.1",
                            "XP_024210920.1")
UniProt.id <- c("Q7Z2W7", "Q8R4D5", "Q8R455", "NA", "NA", "NA", "NA", "NA", "NA",
                "NA")
PDB <- c("6BPQ", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA")
# Scientific and common names get separate vectors so that neither overwrites the other
scientific.name <- c("Homo sapiens", "Mus musculus", "Rattus norvegicus", "Gallus gallus",
                     "Ficedula albicollis", "Canis lupus familiaris", "Cavia porcellus",
                     "Macaca mulatta", "Bos taurus", "Pan troglodytes")
species <- c("Human", "Mouse", "Rat", "Chicken", "Flycatcher", "Dog", "Guinea pig",
             "Monkey", "Cattle", "Chimpanzee")
gene.name <- c("TRPM8", "Trpm8", "Trpm8", "TRPM8", "TRPM8", "TRPM8", "Trpm8",
               "TRPM8", "TRPM8", "TRPM8")
# Converting the vectors into a combined dataframe
trpm8.df <- data.frame(ncbi.protein.accession = ncbi.protein.accession,
                       UniProt.id = UniProt.id,
                       PDB = PDB,
                       scientific.name = scientific.name,
                       species = species,
                       gene.name = gene.name)
# Display the table
pander::pander(trpm8.df)
| ncbi.protein.accession | UniProt.id | PDB | scientific.name | species | gene.name |
|---|---|---|---|---|---|
| NP_076985 | Q7Z2W7 | 6BPQ | Homo sapiens | Human | TRPM8 |
| NP_599013.1 | Q8R4D5 | NA | Mus musculus | Mouse | Trpm8 |
| NP_599198.2 | Q8R455 | NA | Rattus norvegicus | Rat | Trpm8 |
| NP_001007083.1 | NA | NA | Gallus gallus | Chicken | TRPM8 |
| XP_005049003.1 | NA | NA | Ficedula albicollis | Flycatcher | TRPM8 |
| NP_001104239.1 | NA | NA | Canis lupus familiaris | Dog | TRPM8 |
| NP_001166561.1 | NA | NA | Cavia porcellus | Guinea pig | Trpm8 |
| XP_028687091.1 | NA | NA | Macaca mulatta | Monkey | TRPM8 |
| NP_001192995.1 | NA | NA | Bos taurus | Cattle | TRPM8 |
| XP_024210920.1 | NA | NA | Pan troglodytes | Chimpanzee | TRPM8 |
All sequences were downloaded using the wrapper compbio4all::entrez_fetch_list(), which uses rentrez::entrez_fetch() to access NCBI databases.
## [1] 10
## [1] ">NP_076985.4 transient receptor potential cation channel subfamily M member 8 isoform 1 [Homo sapiens]\nMSFRAARLSMRNRRNDTLDSTRTLYSSASRSTDLSYSESDLVNFIQANFKKRECVFFTKDSKATENVCKC\nGYAQSQHMEGTQINQSEKWNYKKHTKEFPTDAFGDIQFETLGKKGKYIRLSCDTDAEILYELLTQHWHLK\nTPNLVISVTGGAKNFALKPRMRKIFSRLIYIAQSKGAWILTGGTHYGLMKYIGEVVRDNTISRSSEENIV\nAIGIAAWGMVSNRDTLIRNCDAEGYFLAQYLMDDFTRDPLYILDNNHTHLLLVDNGCHGHPTVEAKLRNQ\nLEKYISERTIQDSNYGGKIPIVCFAQGGGKETLKAINTSIKNKIPCVVVEGSGQIADVIASLVEVEDALT\nSSAVKEKLVRFLPRTVSRLPEEETESWIKWLKEILECSHLLTVIKMEEAGDEIVSNAISYALYKAFSTSE\nQDKDNWNGQLKLLLEWNQLDLANDEIFTNDRRWESADLQEVMFTALIKDRPKFVRLFLENGLNLRKFLTH\nDVLTELFSNHFSTLVYRNLQIAKNSYNDALLTFVWKLVANFRRGFRKEDRNGRDEMDIELHDVSPITRHP\nLQALFIWAILQNKKELSKVIWEQTRGCTLAALGASKLLKTLAKVKNDINAAGESEELANEYETRAVELFT\nECYSSDEDLAEQLLVYSCEAWGGSNCLELAVEATDQHFIAQPGVQNFLSKQWYGEISRDTKNWKIILCLF\nIIPLVGCGFVSFRKKPVDKHKKLLWYYVAFFTSPFVVFSWNVVFYIAFLLLFAYVLLMDFHSVPHPPELV\nLYSLVFVLFCDEVRQWYVNGVNYFTDLWNVMDTLGLFYFIAGIVFRLHSSNKSSLYSGRVIFCLDYIIFT\nLRLIHIFTVSRNLGPKIIMLQRMLIDVFFFLFLFAVWMVAFGVARQGILRQNEQRWRWIFRSVIYEPYLA\nMFGQVPSDVDGTTYDFAHCTFTGNESKPLCVELDEHNLPRFPEWITIPLVCIYMLSTNILLVNLLVAMFG\nYTVGTVQENNDQVWKFQRYFLVQEYCSRLNIPFPFIVFAYFYMVVKKCFKCCCKEKNMESSVCCFKNEDN\nETLAWEGVMKENYLVKINTKANDTSEEMRHRFRQLDTKLNDLKGLLKEIANKIK\n\n"
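The download code itself is not shown in this report. A minimal sketch of what such a wrapper might look like is given here, assuming the rentrez package is installed and NCBI is reachable; `fetch_fasta_list()` is a hypothetical stand-in written for illustration, not the source of compbio4all::entrez_fetch_list().

```r
# Hypothetical stand-in for compbio4all::entrez_fetch_list(); not the package's own code.
fetch_fasta_list <- function(accessions, db = "protein") {
  # One request per accession; each element is one FASTA record held as a single string
  fasta.list <- lapply(accessions, function(acc) {
    rentrez::entrez_fetch(db = db, id = acc, rettype = "fasta")
  })
  names(fasta.list) <- accessions
  fasta.list
}

# Usage (needs network access, so it is left commented out here):
# trpm8.list <- fetch_fasta_list(ncbi.protein.accession)
# length(trpm8.list)
```

Naming the list elements by accession keeps each sequence tied to its row in the accession table.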
Remove the FASTA header and newline characters.
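The cleaning step can be sketched in base R as follows. `clean_fasta()` and the toy record are hypothetical and only illustrate the idea; the report's own cleaning function may differ.

```r
# Hypothetical helper: strip the ">" header line and the newlines from one FASTA record
clean_fasta <- function(fasta.record, split = TRUE) {
  no.header <- sub("^>[^\n]*\n", "", fasta.record)      # drop everything up to the first newline
  sequence  <- gsub("\n", "", no.header, fixed = TRUE)  # join the wrapped sequence lines
  if (split) strsplit(sequence, "")[[1]] else sequence  # optionally return one residue per element
}

# Toy record with the same layout as the NCBI output above
toy.record <- ">toy_1 example protein [Toy species]\nMSFRA\nARLSM\n\n"
clean_fasta(toy.record, split = FALSE)
clean_fasta(toy.record)
```

Returning a character vector with one residue per element matches the form that the later frequency calculations work on.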
First, we use a UniProt accession to download data from UniProt. This produces a list.
## [1] "Download has worked"
## [1] "list" "vector" "list_OR_List" "vector_OR_Vector"
## [5] "vector_OR_factor"
Then the raw data from the webpage is converted to a dataframe.
## [1] "data.frame" "list" "oldClass" "vector"
## [5] "list_OR_List" "vector_OR_Vector" "vector_OR_factor"
The information available on a protein in UniProt varies a lot depending on how much it has been studied. drawProteins can extract information about the following things:
domains, chains, regions, motifs, phosphorylated sites, repeats, and others.
If available, it can plot the information. You can get a sense of what's available by looking at the dataframe produced by drawProteins::feature_to_dataframe().
## type begin end length accession entryName taxid order
## featuresTemp CHAIN 1 1104 1103 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.1 TOPO_DOM 1 691 690 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.2 TRANSMEM 692 712 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.3 TOPO_DOM 713 734 21 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.4 TRANSMEM 735 755 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.5 TOPO_DOM 756 759 3 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.6 TRANSMEM 760 780 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.7 TOPO_DOM 781 794 13 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.8 TRANSMEM 795 815 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.9 TOPO_DOM 816 829 13 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.10 TRANSMEM 830 850 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.11 TOPO_DOM 851 958 107 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.12 TRANSMEM 959 979 20 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.13 TOPO_DOM 980 1104 124 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.14 REGION 187 195 8 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.15 COILED 1071 1104 33 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.16 CARBOHYD 934 934 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.17 VAR_SEQ 1 188 187 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.18 VAR_SEQ 1 77 76 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.19 VAR_SEQ 1 2 1 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.20 VAR_SEQ 3 314 311 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.21 VAR_SEQ 234 242 8 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.22 VAR_SEQ 243 1104 861 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.23 VAR_SEQ 675 784 109 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.24 VARIANT 247 247 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.25 VARIANT 251 251 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.26 VARIANT 419 419 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.27 VARIANT 462 462 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.28 VARIANT 732 732 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.29 VARIANT 821 821 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.30 MUTAGEN 821 821 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.31 MUTAGEN 934 934 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.32 MUTAGEN 946 946 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.33 MUTAGEN 1089 1089 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.34 CONFLICT 58 58 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.35 CONFLICT 693 693 0 Q7Z2W7 TRPM8_HUMAN 9606 1
## featuresTemp.36 CONFLICT 795 795 0 Q7Z2W7 TRPM8_HUMAN 9606 1
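The feature table can be filtered like any other data frame. The sketch below rebuilds a few rows by hand from the printed output rather than querying UniProt, so `features.df` is a hypothetical, partial copy and not the object produced by drawProteins::feature_to_dataframe().

```r
# Hand-copied subset of the feature table printed above
features.df <- data.frame(
  type  = c("CHAIN", "TRANSMEM", "TRANSMEM", "TRANSMEM", "TRANSMEM",
            "TRANSMEM", "TRANSMEM", "COILED", "CARBOHYD"),
  begin = c(1, 692, 735, 760, 795, 830, 959, 1071, 934),
  end   = c(1104, 712, 755, 780, 815, 850, 979, 1104, 934)
)
# The printed "length" column equals end - begin
features.df$length <- features.df$end - features.df$begin

# Keep only the annotated transmembrane segments
tm.df <- features.df[features.df$type == "TRANSMEM", ]
nrow(tm.df)           # number of transmembrane segments
unique(tm.df$length)  # span reported for each segment
```

The six transmembrane rows match what would be expected for a TRP channel subunit.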
Taking only the human sequence of TRPM8.
## chr [1:1104] "M" "S" "F" "R" "A" "A" "R" "L" "S" "M" "R" "N" "R" "R" "N" ...
Below are links to relevant information. This particular protein is not in Pfam, DisProt, or RepeatDB. In UniProt, the sub-cellular location is listed as: Endoplasmic reticulum membrane. In PDB, the secondary structure is shown as containing alpha helices and beta sheets.
Multivariate statistical techniques were used to check the information about protein structure and location found in the online databases.
UniProt indicates that the protein is a membrane-bound protein in the ER.
AlphaFold indicates that there is a mix of alpha helices and beta sheets. I therefore predict that machine-learning methods will indicate an a+b or a/b structure.
## aa.1.1 alpha beta a.plus.b a.div.b
## 1 A 285 203 175 361
## 2 R 53 67 78 146
## 3 N 97 139 120 183
## 4 D 163 121 111 244
## 5 C 22 75 74 63
## 6 Q 67 122 74 114
## 7 E 134 86 86 257
## 8 G 197 297 171 377
## 9 H 111 49 33 107
## 10 I 91 120 93 239
## 11 L 221 177 110 339
## 12 K 249 115 112 321
## 13 M 48 16 25 91
## 14 F 123 85 52 158
## 15 P 82 127 71 188
## 16 S 122 341 126 327
## 17 T 119 253 117 238
## 18 W 33 44 30 72
## 19 Y 63 110 108 130
## 20 V 167 229 123 378
## alpha.prop beta.prop a.plus.b.prop a.div.b
## A 0.116469146 0.073126801 0.09264161 0.08331410
## R 0.021659174 0.024135447 0.04129169 0.03369490
## N 0.039640376 0.050072046 0.06352567 0.04223402
## D 0.066612178 0.043587896 0.05876125 0.05631202
## C 0.008990601 0.027017291 0.03917417 0.01453958
## Q 0.027380466 0.043948127 0.03917417 0.02630972
## E 0.054760932 0.030979827 0.04552673 0.05931225
## G 0.080506743 0.106988473 0.09052409 0.08700669
## H 0.045361667 0.017651297 0.01746956 0.02469421
## I 0.037188394 0.043227666 0.04923240 0.05515809
## L 0.090314671 0.063760807 0.05823187 0.07823679
## K 0.101757254 0.041426513 0.05929063 0.07408262
## M 0.019615856 0.005763689 0.01323452 0.02100162
## F 0.050265631 0.030619597 0.02752779 0.03646434
## P 0.033510421 0.045749280 0.03758602 0.04338795
## S 0.049856968 0.122838617 0.06670196 0.07546734
## T 0.048630977 0.091138329 0.06193753 0.05492730
## W 0.013485901 0.015850144 0.01588142 0.01661666
## Y 0.025745811 0.039625360 0.05717311 0.03000231
## V 0.068246833 0.082492795 0.06511382 0.08723748
## A C D E F G H
## 0.05706522 0.02445652 0.04981884 0.06702899 0.06340580 0.04710145 0.02083333
## I K L M N P Q
## 0.06250000 0.06521739 0.11050725 0.01992754 0.05706522 0.02536232 0.03351449
## R S T V W Y
## 0.04981884 0.06159420 0.05434783 0.06974638 0.02264493 0.03804348
## character(0)
## named numeric(0)
|   | alpha.prop | beta.prop | a.plus.b.prop | a.div.b | TRPM8.human.aa.freq |
|---|---|---|---|---|---|
| A | 0.1165 | 0.07313 | 0.09264 | 0.08331 | 0.05707 |
| R | 0.02166 | 0.02414 | 0.04129 | 0.03369 | 0.02446 |
| N | 0.03964 | 0.05007 | 0.06353 | 0.04223 | 0.04982 |
| D | 0.06661 | 0.04359 | 0.05876 | 0.05631 | 0.06703 |
| C | 0.008991 | 0.02702 | 0.03917 | 0.01454 | 0.06341 |
| Q | 0.02738 | 0.04395 | 0.03917 | 0.02631 | 0.0471 |
| E | 0.05476 | 0.03098 | 0.04553 | 0.05931 | 0.02083 |
| G | 0.08051 | 0.107 | 0.09052 | 0.08701 | 0.0625 |
| H | 0.04536 | 0.01765 | 0.01747 | 0.02469 | 0.06522 |
| I | 0.03719 | 0.04323 | 0.04923 | 0.05516 | 0.1105 |
| L | 0.09031 | 0.06376 | 0.05823 | 0.07824 | 0.01993 |
| K | 0.1018 | 0.04143 | 0.05929 | 0.07408 | 0.05707 |
| M | 0.01962 | 0.005764 | 0.01323 | 0.021 | 0.02536 |
| F | 0.05027 | 0.03062 | 0.02753 | 0.03646 | 0.03351 |
| P | 0.03351 | 0.04575 | 0.03759 | 0.04339 | 0.04982 |
| S | 0.04986 | 0.1228 | 0.0667 | 0.07547 | 0.06159 |
| T | 0.04863 | 0.09114 | 0.06194 | 0.05493 | 0.05435 |
| W | 0.01349 | 0.01585 | 0.01588 | 0.01662 | 0.06975 |
| Y | 0.02575 | 0.03963 | 0.05717 | 0.03 | 0.02264 |
| V | 0.06825 | 0.08249 | 0.06511 | 0.08724 | 0.03804 |
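One detail to watch when adding the TRPM8 column: table() returns frequencies sorted alphabetically by one-letter code (A, C, D, E, ...), while the reference table uses the order A, R, N, D, .... Binding the column by position would therefore pair most frequencies with the wrong residue. In the table above, the TRPM8.human.aa.freq column appears to follow the alphabetical order (for example, the R row shows 0.02446, which is the value printed for C), so the similarity values that depend on it are worth re-checking. The following is a minimal sketch of aligning by residue, using a toy sequence and hypothetical object names.

```r
# Toy sequence standing in for the 1104-residue human TRPM8 vector
toy.seq <- c("M", "S", "F", "R", "A", "A", "R", "L", "S", "M")

# table() on a character vector orders its names alphabetically
names(table(toy.seq))

# Row order used by the reference table (A, R, N, D, ...)
ref.order <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
               "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Fixing the factor levels forces the counts into the reference order,
# with 0 for residues that do not occur in the sequence
aligned.counts <- table(factor(toy.seq, levels = ref.order))
aligned.freq   <- setNames(as.vector(aligned.counts) / length(toy.seq), ref.order)
aligned.freq[c("A", "R", "S")]
```

With the levels fixed, the frequency vector can be bound to the reference table without any reordering.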
Two custom functions are needed: one to calculate correlations between two columns of our table, and one to calculate cosine similarities.
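The report does not show these functions. A minimal sketch of the three measures that appear in the results table (correlation, cosine similarity, and Euclidean distance) is given here; the function names and toy vectors are hypothetical and are not the functions used in the original analysis.

```r
# Hypothetical helpers; the names are illustrative only
corr_sim    <- function(x, y) cor(x, y)                               # Pearson correlation
cosine_sim  <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))  # cosine of the angle between x and y
euclid_dist <- function(x, y) sqrt(sum((x - y)^2))                    # straight-line distance

# Toy frequency vectors; y is exactly 2 * x
x <- c(0.10, 0.20, 0.30, 0.40)
y <- c(0.20, 0.40, 0.60, 0.80)

corr_sim(x, y)
cosine_sim(x, y)
euclid_dist(x, y)

# Base R's dist() gives the same Euclidean distance when the vectors are stacked as rows
as.numeric(dist(rbind(x, y)))
```

Correlation and cosine similarity are highest for the best match, whereas Euclidean distance is lowest, which is why the results table tracks a maximum for one and a minimum for the other.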
## A R N D C Q E G H I L K
## alpha.prop 0.12 0.02 0.04 0.07 0.01 0.03 0.05 0.08 0.05 0.04 0.09 0.10
## beta.prop 0.07 0.02 0.05 0.04 0.03 0.04 0.03 0.11 0.02 0.04 0.06 0.04
## a.plus.b.prop 0.09 0.04 0.06 0.06 0.04 0.04 0.05 0.09 0.02 0.05 0.06 0.06
## a.div.b 0.08 0.03 0.04 0.06 0.01 0.03 0.06 0.09 0.02 0.06 0.08 0.07
## TRPM8.human.aa.freq 0.06 0.02 0.05 0.07 0.06 0.05 0.02 0.06 0.07 0.11 0.02 0.06
## M F P S T W Y V
## alpha.prop 0.02 0.05 0.03 0.05 0.05 0.01 0.03 0.07
## beta.prop 0.01 0.03 0.05 0.12 0.09 0.02 0.04 0.08
## a.plus.b.prop 0.01 0.03 0.04 0.07 0.06 0.02 0.06 0.07
## a.div.b 0.02 0.04 0.04 0.08 0.05 0.02 0.03 0.09
## TRPM8.human.aa.freq 0.03 0.03 0.05 0.06 0.05 0.07 0.02 0.04
## alpha.prop beta.prop a.plus.b.prop a.div.b
## beta.prop 0.13342098
## a.plus.b.prop 0.09281824 0.08289406
## a.div.b 0.06699039 0.08659174 0.06175113
## TRPM8.human.aa.freq 0.16132094 0.15447076 0.12884017 0.14072023
| fold.type | corr.sim | cosine.sim | Euclidean.dist | sim.sum | dist.sum |
|---|---|---|---|---|---|
| alpha | 0.7949 | 0.7949 | 0.1613 | | |
| beta | 0.8154 | 0.8154 | 0.1545 | | |
| alpha plus beta | 0.8599 | 0.8599 | 0.1288 | most.sim | min.dist |
| alpha/beta | 0.8362 | 0.8362 | 0.1407 | | |
Convert all FASTA records into entries in a single vector. The FASTA entries are contained in a list produced at the beginning of the script; they were cleaned to remove the header and newline characters.
## [1] "NP_076985" "NP_599013.1" "NP_599198.2" "NP_001007083.1"
## [5] "XP_005049003.1" "NP_001104239.1" "NP_001166561.1" "XP_028687091.1"
## [9] "NP_001192995.1" "XP_024210920.1"
## [1] 10
## $NP_076985
## [1] "MSFRAARLSMRNRRNDTLDSTRTLYSSASRSTDLSYSESDLVNFIQANFKKRECVFFTKDSKATENVCKCGYAQSQHMEGTQINQSEKWNYKKHTKEFPTDAFGDIQFETLGKKGKYIRLSCDTDAEILYELLTQHWHLKTPNLVISVTGGAKNFALKPRMRKIFSRLIYIAQSKGAWILTGGTHYGLMKYIGEVVRDNTISRSSEENIVAIGIAAWGMVSNRDTLIRNCDAEGYFLAQYLMDDFTRDPLYILDNNHTHLLLVDNGCHGHPTVEAKLRNQLEKYISERTIQDSNYGGKIPIVCFAQGGGKETLKAINTSIKNKIPCVVVEGSGQIADVIASLVEVEDALTSSAVKEKLVRFLPRTVSRLPEEETESWIKWLKEILECSHLLTVIKMEEAGDEIVSNAISYALYKAFSTSEQDKDNWNGQLKLLLEWNQLDLANDEIFTNDRRWESADLQEVMFTALIKDRPKFVRLFLENGLNLRKFLTHDVLTELFSNHFSTLVYRNLQIAKNSYNDALLTFVWKLVANFRRGFRKEDRNGRDEMDIELHDVSPITRHPLQALFIWAILQNKKELSKVIWEQTRGCTLAALGASKLLKTLAKVKNDINAAGESEELANEYETRAVELFTECYSSDEDLAEQLLVYSCEAWGGSNCLELAVEATDQHFIAQPGVQNFLSKQWYGEISRDTKNWKIILCLFIIPLVGCGFVSFRKKPVDKHKKLLWYYVAFFTSPFVVFSWNVVFYIAFLLLFAYVLLMDFHSVPHPPELVLYSLVFVLFCDEVRQWYVNGVNYFTDLWNVMDTLGLFYFIAGIVFRLHSSNKSSLYSGRVIFCLDYIIFTLRLIHIFTVSRNLGPKIIMLQRMLIDVFFFLFLFAVWMVAFGVARQGILRQNEQRWRWIFRSVIYEPYLAMFGQVPSDVDGTTYDFAHCTFTGNESKPLCVELDEHNLPRFPEWITIPLVCIYMLSTNILLVNLLVAMFGYTVGTVQENNDQVWKFQRYFLVQEYCSRLNIPFPFIVFAYFYMVVKKCFKCCCKEKNMESSVCCFKNEDNETLAWEGVMKENYLVKINTKANDTSEEMRHRFRQLDTKLNDLKGLLKEIANKIK"
## [1] 93.75
## [1] 93.75
## [1] 80.70652
## [1] 98.55072
## [1] 79.98188
## [1] 79.71014
|   | Human | Mouse | Rat | Chicken |
|---|---|---|---|---|
| Human | 1 | NA | NA | NA |
| Mouse | 93.75 | 1 | NA | NA |
| Rat | 93.75 | 98.55 | 1 | NA |
| Chicken | 80.71 | 79.98 | 79.71 | 1 |
| method | PID | Denominator |
|---|---|---|
| PID1 | 80.7065217391304 | (aligned positions + internal gap positions) |
| PID2 | 81.3698630136986 | (aligned positions) |
| PID3 | 81.3698630136986 | (length shorter sequence) |
| PID4 | 81.0368349249659 | (average length of the two sequences) |
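The four PID values share the same numerator (identical aligned residues) and differ only in the denominator. A minimal sketch on a toy alignment shows the arithmetic; the sequences are invented for illustration, and the code assumes every gap in the toy alignment is internal.

```r
# Toy global alignment ("-" marks a gap); not the TRPM8 alignment itself
aln.a <- strsplit("ACD-FG", "")[[1]]
aln.b <- strsplit("ACDEYG", "")[[1]]

both.present <- aln.a != "-" & aln.b != "-"         # columns where neither sequence has a gap
identical.n  <- sum(aln.a == aln.b & both.present)  # identical aligned residues
aligned.n    <- sum(both.present)                   # aligned positions
gap.n        <- sum(!both.present)                  # gap positions (all internal in this toy case)
len.a        <- sum(aln.a != "-")                   # ungapped length of each sequence
len.b        <- sum(aln.b != "-")

pid <- c(
  PID1 = 100 * identical.n / (aligned.n + gap.n),
  PID2 = 100 * identical.n / aligned.n,
  PID3 = 100 * identical.n / min(len.a, len.b),
  PID4 = 100 * identical.n / mean(c(len.a, len.b))
)
round(pid, 2)
```

PID1 is the smallest of the four here because its denominator is the only one that counts gap positions.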
I am skipping the ggpubr step due to problems with MSA. Dr. Brouwer said this was okay!
## NP_599198.2 NP_001166561.1 XP_028687091.1 XP_024210920.1
## NP_001166561.1 0.2388833
## XP_028687091.1 0.2623749 0.2658048
## XP_024210920.1 0.2675032 0.2708682 0.2658048
## NP_076985 0.2500000 0.2500000 0.2481818 0.1563858
## NP_001104239.1 0.2292078 0.2252213 0.2481818 0.2211629
## NP_001192995.1 0.2725351 0.2691910 0.2855201 0.2571443
## NP_001007083.1 0.4400083 0.4431106 0.4420789 0.4410448
## XP_005049003.1 0.4189305 0.4211068 0.4200200 0.4178381
## NP_076985 NP_001104239.1 NP_001192995.1 NP_001007083.1
## NP_001166561.1
## XP_028687091.1
## XP_024210920.1
## NP_076985
## NP_001104239.1 0.2272233
## NP_001192995.1 0.2606430 0.2388833
## NP_001007083.1 0.4400083 0.4316264 0.4368839
## XP_005049003.1 0.4200200 0.4112228 0.4167428 0.4211068
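The distance matrix above is printed as a wrapped lower triangle, which makes it hard to scan. As a small sketch, four of the printed values can be re-entered by hand to find the closest pair; the subset (rat, chimpanzee, human, chicken) and the object names are chosen for illustration only.

```r
# Hand-copied subset of the printed distance matrix
ids <- c("NP_599198.2", "XP_024210920.1", "NP_076985", "NP_001007083.1")
d <- matrix(0, nrow = 4, ncol = 4, dimnames = list(ids, ids))
d["XP_024210920.1", "NP_599198.2"]    <- 0.2675032
d["NP_076985",      "NP_599198.2"]    <- 0.2500000
d["NP_076985",      "XP_024210920.1"] <- 0.1563858
d["NP_001007083.1", "NP_599198.2"]    <- 0.4400083
d["NP_001007083.1", "XP_024210920.1"] <- 0.4410448
d["NP_001007083.1", "NP_076985"]      <- 0.4400083
d <- d + t(d)  # mirror the lower triangle into the upper triangle

trpm8.dist <- as.dist(d)  # same class of object as the matrix printed above

# Find the closest pair, ignoring the zero diagonal
d.off <- d
diag(d.off) <- NA
closest <- which(d.off == min(d.off, na.rm = TRUE), arr.ind = TRUE)[1, ]
closest.pair <- ids[closest]
closest.pair

# A tree could then be built from the dist object, for example with
# ape::nj(trpm8.dist), assuming the ape package is installed.
```

In this subset the human and chimpanzee sequences form the closest pair, and the chicken sequence is the most distant from all three mammals.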