(SUPER) BASIC COMMANDS IN R & MAFTOOLS

My first plot

Let's start by assigning some vectors and creating basic plots in R.

#vectors
height <- c(145, 167, 176, 123, 150)
weight <- c(51, 63, 64, 40, 55)

#plotting the vectors
plot(height,weight)

#checking class and structure
class(height)
## [1] "numeric"
str(height)
##  num [1:5] 145 167 176 123 150
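The two vectors can also be stored together in a data frame, which keeps each height paired with its weight and makes it easy to label the axes. A minimal sketch; the units (cm and kg) and the name `people` are assumptions for illustration.

```r
# Vectors from the example above (units assumed: cm and kg)
height <- c(145, 167, 176, 123, 150)
weight <- c(51, 63, 64, 40, 55)

# Keep each pair of measurements together in one data frame
people <- data.frame(height = height, weight = weight)

# Scatter plot with labeled axes, using the formula interface (y ~ x)
plot(weight ~ height, data = people,
     xlab = "Height (cm)", ylab = "Weight (kg)", main = "Height vs weight")

# Simple numeric summaries of each column
mean(people$height)  # 152.2
mean(people$weight)  # 54.6
```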

Getting information from built-in datasets.

We will use the iris dataset, which contains measurements of 3 species of Iris flowers (Iris setosa, Iris versicolor, Iris virginica).

#Dataset
flower<-iris
class(flower)
## [1] "data.frame"
str(flower)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#numeric summaries of the data frame
summary(flower)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
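`summary()` pools all 150 flowers together. Because `Species` is a factor, the same numeric columns can also be summarized per species with base R's `aggregate()` or `tapply()`; a short sketch:

```r
# Built-in dataset used in the tutorial
flower <- iris

# Mean of every numeric column within each species (formula: all columns ~ Species)
means_by_species <- aggregate(. ~ Species, data = flower, FUN = mean)
print(means_by_species)

# The same idea for a single column with tapply()
tapply(flower$Sepal.Length, flower$Species, mean)  # 5.006 5.936 6.588
```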
#plotting a boxplot of the first two columns (sepal length and sepal width)
boxplot(flower[,1:2], main='Sepal Length vs Sepal Width')
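The boxplot above compares two columns across all flowers. To compare one measurement between the three species, the formula interface of `boxplot()` can be used instead (iris measurements are in centimeters); a sketch:

```r
flower <- iris

# One box per species for sepal length (formula: value ~ group)
boxplot(Sepal.Length ~ Species, data = flower,
        main = "Sepal Length by species", ylab = "Sepal length (cm)")

# With plot = FALSE, boxplot() returns the statistics instead of drawing them
stats <- boxplot(Sepal.Length ~ Species, data = flower, plot = FALSE)
stats$n      # number of flowers per species
stats$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```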

#saving
#getwd() shows the folder where the files will be saved
#getwd()
#setwd() changes the folder where the files will be saved
#setwd("path/to/folder")
write.table(flower,file = "flores.txt", sep = "\t",row.names = FALSE)
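To check that the file was written correctly, it can be read back with `read.delim()`, which assumes a tab separator and a header row by default. The sketch writes to a temporary file (`tempfile()`) so it does not touch the working directory:

```r
flower <- iris

# Write to a temporary file instead of the working directory
out_file <- tempfile(fileext = ".txt")
write.table(flower, file = out_file, sep = "\t", row.names = FALSE)

# Read it back: read.delim() uses sep = "\t" and header = TRUE by default
flores <- read.delim(out_file)
dim(flores)  # 150 5
str(flores)

# The numeric columns survive the round trip unchanged
all.equal(flores[, 1:4], flower[, 1:4])
```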

Using Maftools

Maftools is an R/Bioconductor package with tools for analyzing MAF (Mutation Annotation Format) files, a file format widely used in cancer genomics. For more information, see here.

For this part, we will use a predefined MAF file that can be downloaded from this address.

Once downloaded, place the file in the folder where the code is run (the working directory). To find out which folder that is, just type getwd().

#reading the MAF-formatted file as a plain data frame

pre_maf_ov<- read.delim("MAFOV70MAISCOMPLETO.txt", sep="\t")

#exploring class and structure
class(pre_maf_ov)
## [1] "data.frame"
str(pre_maf_ov)
## 'data.frame':    10370 obs. of  16 variables:
##  $ Hugo_Symbol           : chr  "TCF7L2" "PCDH15" "ANKK1" "TECTA" ...
##  $ Entrez_Gene_Id        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Center                : chr  "." "." "." "." ...
##  $ NCBI_Build            : chr  "GRCh37" "GRCh37" "GRCh37" "GRCh37" ...
##  $ Chromosome            : chr  "10" "10" "11" "11" ...
##  $ Start_Position        : int  114903690 55700670 113267956 120996396 2156723 71193951 122277623 40742213 57559950 78400349 ...
##  $ End_Position          : int  114903690 55700670 113267956 120996396 2156723 71193951 122277624 40742213 57559950 78400349 ...
##  $ Strand                : chr  "+" "+" "+" "+" ...
##  $ Variant_Classification: chr  "Missense_Mutation" "Missense_Mutation" "Silent" "Missense_Mutation" ...
##  $ Variant_Type          : chr  "SNP" "SNP" "SNP" "SNP" ...
##  $ Reference_Allele      : chr  "C" "A" "C" "C" ...
##  $ Tumor_Seq_Allele1     : chr  "C" "A" "C" "C" ...
##  $ Tumor_Seq_Allele2     : chr  "T" "T" "A" "G" ...
##  $ Tumor_Sample_Barcode  : chr  "TCGA-04-1341-01" "TCGA-04-1341-01" "TCGA-04-1341-01" "TCGA-04-1341-01" ...
##  $ Genome_Change         : chr  "ENST00000355995.4:c.694C>T" "ENST00000320301.6:c.3188T>A" "ENST00000303941.3:c.849C>A" "ENST00000264037.2:c.1589C>G" ...
##  $ Protein_Change        : chr  "p.R232W" "p.I1063N" "p.I283=" "p.T530R" ...
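Before converting the data frame, it can help to confirm that the columns maftools needs are present. The list below follows the mandatory fields described in the maftools vignette (to our understanding; check the documentation of your installed version), and `check_maf_columns()` is a hypothetical helper, not part of maftools:

```r
# Fields listed as mandatory in the maftools vignette (assumption: may vary by version)
required_cols <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                   "Reference_Allele", "Tumor_Seq_Allele2",
                   "Variant_Classification", "Variant_Type", "Tumor_Sample_Barcode")

# Hypothetical helper: returns the required columns that are missing from a data frame
check_maf_columns <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy example with one missing column (Variant_Type); values are illustrative only
toy <- data.frame(Hugo_Symbol = "TP53", Chromosome = "17",
                  Start_Position = 1L, End_Position = 1L,
                  Reference_Allele = "C", Tumor_Seq_Allele2 = "T",
                  Variant_Classification = "Missense_Mutation",
                  Tumor_Sample_Barcode = "SAMPLE-01")
check_maf_columns(toy)  # "Variant_Type"

# With the tutorial data, character(0) would mean nothing is missing:
# check_maf_columns(pre_maf_ov)
```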
#installing the maftools package (only needed once)
#maftools is distributed through Bioconductor, not CRAN, so install it with BiocManager:
#install.packages("BiocManager")
#BiocManager::install("maftools")

#loading the maftools package
library(maftools)

#converting the MAF-formatted data frame into a MAF object
maf_ov<-read.maf(pre_maf_ov)
## -Validating
## --Removed 13 duplicated variants
## --Non MAF specific values in Variant_Classification column:
##   Start_Codon_Del
## -Silent variants: 3036 
## -Summarizing
## --Mutiple reference builds found
## GRCh37;37--Mutiple centers found
## .;broad.mit.edu;genome.wustl.edu;hgsc.bcm.edu--Possible FLAGS among top ten genes:
##   TTN
##   HMCN1
##   FLG
##   DST
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.916s elapsed (0.977s cpu)
class(maf_ov)
## [1] "MAF"
## attr(,"package")
## [1] "maftools"
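The log shows that `read.maf()` removed duplicated variants and set aside 3036 silent variants. As we understand the maftools defaults, only the non-synonymous classes listed in the `vc_nonSyn` argument are kept in the main object, and the remaining variants are stored separately. The base-R sketch below mimics that split on a toy table; the class list mirrors the documented default (check `?read.maf`), and the duplicate key columns are an assumption, not necessarily the ones maftools uses:

```r
# Non-synonymous classes (mirrors the documented default of read.maf's vc_nonSyn)
vc_nonSyn <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
               "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
               "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation")

# Toy MAF-like table (illustrative values, not from the tutorial file)
toy <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S1", "S2", "S2"),
  Hugo_Symbol            = c("TP53", "TP53", "TTN", "TTN", "RB1"),
  Start_Position         = c(100L, 100L, 500L, 700L, 900L),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Silent", "Nonsense_Mutation", "Splice_Site")
)

# Drop duplicates of the (assumed) key columns: sample + gene + position
key <- toy[, c("Tumor_Sample_Barcode", "Hugo_Symbol", "Start_Position")]
dedup <- toy[!duplicated(key), ]

# Split into non-synonymous variants and everything else ("silent")
is_nonsyn <- dedup$Variant_Classification %in% vc_nonSyn
nonsyn <- dedup[is_nonsyn, ]
silent <- dedup[!is_nonsyn, ]
nrow(nonsyn)  # 3
nrow(silent)  # 1
```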
#CHECKING USEFUL COMMANDS

#oncoplot summarizing the 100 most frequently mutated genes in the cohort
oncoplot(maf_ov,top = 100)
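The `top` argument of `oncoplot()` restricts the plot to the most frequently altered genes. A base-R sketch of that kind of ranking, assuming genes are ordered by the number of distinct samples in which they are mutated; the helpers and the toy table are hypothetical and illustrative, not maftools' implementation:

```r
# Toy MAF-like table (illustrative values)
toy <- data.frame(
  Hugo_Symbol          = c("TP53", "TP53", "TP53", "TTN", "TTN", "RB1"),
  Tumor_Sample_Barcode = c("S1",   "S2",   "S2",   "S1",  "S1",  "S3")
)

# Hypothetical helper: number of distinct samples in which each gene is mutated
mutated_samples <- function(df) {
  tapply(df$Tumor_Sample_Barcode, df$Hugo_Symbol, function(s) length(unique(s)))
}

# Hypothetical helper: rank genes by that count and keep the first n
top_genes <- function(df, n) {
  counts <- mutated_samples(df)
  ranked <- names(counts)[order(counts, decreasing = TRUE)]
  head(ranked, n)
}

mutated_samples(toy)  # RB1 1, TP53 2, TTN 1
top_genes(toy, 1)     # "TP53"
```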

#summary of genes by variant type
head(getGeneSummary(maf_ov))
##    Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
## 1:        TP53               8               1            1            0
## 2:         TTN               1               0            0            0
## 3:       HMCN1               0               0            0            0
## 4:        RYR2               0               0            0            0
## 5:       CSMD3               0               0            0            0
## 6:         RB1               3               0            0            0
##    Missense_Mutation Nonsense_Mutation Nonstop_Mutation Splice_Site total
## 1:                70                10                0          15   105
## 2:                28                 3                0           1    33
## 3:                10                 0                0           0    10
## 4:                10                 0                0           0    10
## 5:                 8                 0                0           1     9
## 6:                 4                 1                0           1     9
##    MutatedSamples AlteredSamples
## 1:            103            103
## 2:             21             21
## 3:             10             10
## 4:              9              9
## 5:              9              9
## 6:              9              9
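Each row of `getGeneSummary()` counts a gene's variants by `Variant_Classification`; `total` is the row sum, while `MutatedSamples` counts distinct samples. That is why TP53 has 105 variants but only 103 mutated samples: some samples carry more than one TP53 mutation. A base-R sketch of the same bookkeeping on a toy table (illustrative values, not the maftools implementation):

```r
toy <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "TP53", "RB1"),
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Frame_Shift_Del")
)

# Genes x variant classes contingency table
by_class <- table(toy$Hugo_Symbol, toy$Variant_Classification)
by_class

# total: number of variants per gene
total <- rowSums(by_class)

# MutatedSamples: number of distinct samples per gene
mutated <- tapply(toy$Tumor_Sample_Barcode, toy$Hugo_Symbol,
                  function(s) length(unique(s)))

data.frame(total = total, MutatedSamples = as.vector(mutated[names(total)]))
# TP53: total 3, MutatedSamples 2; RB1: total 1, MutatedSamples 1
```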
#summary of samples by variant type
head(getSampleSummary(maf_ov))
##    Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
## 1:      TCGA-59-2349-01              49              11            8
## 2:      TCGA-13-0888-01               2               3            1
## 3:      TCGA-10-0930-01               2               1            4
## 4:      TCGA-13-0885-01               1               0            1
## 5:      TCGA-61-1740-01               3               1            5
## 6:      TCGA-29-1761-01               8               0            1
##    In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation
## 1:            0               668                49                0
## 2:            0               145                11                1
## 3:            0               103                 7                2
## 4:            0               117                 3                0
## 5:            0               105                 5                0
## 6:            0               102                 3                0
##    Splice_Site total
## 1:           6   791
## 2:           3   166
## 3:          10   129
## 4:           5   127
## 5:           8   127
## 6:           9   123
#assigning the summaries to variables
resumo_genes<-getGeneSummary(maf_ov)
resumo_amostras<-getSampleSummary(maf_ov)
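Because `resumo_genes` and `resumo_amostras` are tables (maftools returns them as `data.table`s, as the `1:`, `2:` row labels suggest), they can be filtered like any data frame. The sketch rebuilds the first rows of the gene summary printed above as a plain data frame and keeps genes mutated in at least 10 samples. The cohort size of 123 is an assumption, taken from the `00/11/01/10` counts printed by `somaticInteractions()`, which sum to 123 for every pair:

```r
# First rows of the gene summary printed above (values copied from that output)
resumo_genes <- data.frame(
  Hugo_Symbol    = c("TP53", "TTN", "HMCN1", "RYR2", "CSMD3", "RB1"),
  total          = c(105, 33, 10, 10, 9, 9),
  MutatedSamples = c(103, 21, 10, 9, 9, 9)
)

# Assumed cohort size (see the lead-in)
n_samples <- 123

# Fraction of the cohort in which each gene is mutated
resumo_genes$fraction <- round(resumo_genes$MutatedSamples / n_samples, 3)

# Keep genes mutated in at least 10 samples
frequent <- subset(resumo_genes, MutatedSamples >= 10)
frequent$Hugo_Symbol  # "TP53" "TTN" "HMCN1"
```

On the real `data.table` object, the equivalent filter would be `resumo_genes[MutatedSamples >= 10]`.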
#exploring where the variants fall within the protein domains
lollipopPlot(maf_ov,AACol = "Protein_Change",gene = "TP53",showDomainLabel = FALSE)
## 8 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
##    HGNC    refseq.ID   protein.ID aa.length
## 1: TP53    NM_000546    NP_000537       393
## 2: TP53 NM_001126112 NP_001119584       393
## 3: TP53 NM_001126118 NP_001119590       354
## 4: TP53 NM_001126115 NP_001119587       261
## 5: TP53 NM_001126113 NP_001119585       346
## 6: TP53 NM_001126117 NP_001119589       214
## 7: TP53 NM_001126114 NP_001119586       341
## 8: TP53 NM_001126116 NP_001119588       209
## Using longer transcript NM_000546 for now.
## Removed 11 mutations for which AA position was not available
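`lollipopPlot()` reads the amino-acid position from the column given in `AACol` (here `Protein_Change`), and mutations whose position cannot be read are dropped, as the "Removed 11 mutations" message reports. A hypothetical helper that extracts positions from strings such as `p.R232W` illustrates the idea; it is not maftools' own parser:

```r
# Hypothetical helper: amino-acid position from strings like "p.R232W" or "p.I283="
aa_position <- function(protein_change) {
  # Match "p.", the reference residue(s), then capture the digits that follow
  m <- regmatches(protein_change,
                  regexec("^p\\.[A-Za-z*]+([0-9]+)", protein_change))
  # Each element is c(full match, captured digits), or empty when nothing matched
  vapply(m, function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

# Examples taken from the Protein_Change column shown by str(pre_maf_ov)
aa_position(c("p.R232W", "p.I1063N", "p.I283=", "p.T530R"))  # 232 1063 283 530

# Strings without a parsable position give NA (they cannot be placed on the plot)
aa_position(c("", "."))  # NA NA
```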

#gene-by-sample mutation count matrix
head(mutCountMatrix(maf_ov))
##       TCGA-13-0885-01 TCGA-29-1761-01 TCGA-13-0755-01 TCGA-61-1733-01
## TP53                1               1               1               1
## TTN                 3               2               2               2
## HMCN1               1               0               0               0
## RYR2                2               1               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               1               1
##       TCGA-24-1469-01 TCGA-09-1674-01 TCGA-25-2399-01 TCGA-61-1740-01
## TP53                1               1               1               1
## TTN                 1               2               1               2
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1331-01 TCGA-25-2392-01 TCGA-23-1032-01 TCGA-24-2033-01
## TP53                1               1               1               1
## TTN                 1               1               1               1
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1365-01 TCGA-04-1338-01 TCGA-04-1343-01 TCGA-13-0757-01
## TP53                1               1               1               1
## TTN                 1               1               1               2
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-25-1329-01 TCGA-04-1332-01 TCGA-61-1727-01 TCGA-13-0888-01
## TP53                1               1               1               1
## TTN                 1               2               0               0
## HMCN1               0               0               1               1
## RYR2                0               0               1               0
## CSMD3               0               0               0               0
## FLG                 0               0               1               1
##       TCGA-25-1635-01 TCGA-25-1630-01 TCGA-61-2102-01 TCGA-23-2641-01
## TP53                1               2               1               1
## TTN                 0               0               0               0
## HMCN1               1               1               1               1
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-24-1466-01 TCGA-24-1463-01 TCGA-13-1507-01 TCGA-04-1347-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               1               0               0               0
## RYR2                0               1               1               1
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-13-0923-01 TCGA-09-2044-01 TCGA-20-1686-01 TCGA-61-2613-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                1               1               0               0
## CSMD3               0               0               1               1
## FLG                 0               0               0               0
##       TCGA-59-2352-01 TCGA-24-2288-01 TCGA-24-2261-01 TCGA-25-1323-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               1               1               1               1
## FLG                 0               0               0               0
##       TCGA-24-1464-01 TCGA-13-1409-01 TCGA-10-0933-01 TCGA-13-1500-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               1               1               0               0
## FLG                 0               0               1               2
##       TCGA-29-1778-01 TCGA-24-1850-01 TCGA-13-1489-01 TCGA-09-1665-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 1               1               0               0
##       TCGA-24-1849-01 TCGA-10-0938-01 TCGA-25-1325-01 TCGA-04-1652-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-36-2543-01 TCGA-29-1766-01 TCGA-20-0991-01 TCGA-13-1487-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1346-01 TCGA-24-2030-01 TCGA-61-2097-01 TCGA-42-2587-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-24-1558-01 TCGA-36-2534-01 TCGA-09-2053-01 TCGA-29-1771-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-13-1481-01 TCGA-24-2024-01 TCGA-24-1422-01 TCGA-13-0768-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1337-01 TCGA-25-1631-01 TCGA-13-0802-01 TCGA-13-1498-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-20-0990-01 TCGA-13-1411-01 TCGA-WR-A838-01 TCGA-25-2400-01
## TP53                1               1               2               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1649-01 TCGA-13-0889-01 TCGA-61-2614-01 TCGA-25-1627-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-09-0364-01 TCGA-24-1552-01 TCGA-23-1116-01 TCGA-24-1544-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-24-0966-01 TCGA-13-0724-01 TCGA-31-1950-01 TCGA-24-2260-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-04-1341-01 TCGA-61-1730-01 TCGA-59-2372-01 TCGA-OY-A56Q-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-13-0730-01 TCGA-13-0804-01 TCGA-09-1661-01 TCGA-24-0982-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-61-1741-01 TCGA-13-0891-01 TCGA-25-2409-01 TCGA-09-0365-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-25-1623-01 TCGA-29-1702-01 TCGA-36-1575-01 TCGA-25-1319-01
## TP53                1               1               1               1
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-25-2396-01 TCGA-23-2643-01 TCGA-04-1517-01 TCGA-59-2349-01
## TP53                1               1               1               0
## TTN                 0               0               0               4
## HMCN1               0               0               0               1
## RYR2                0               0               0               1
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-61-2012-01 TCGA-29-1774-01 TCGA-29-1693-01 TCGA-61-1899-01
## TP53                0               0               0               0
## TTN                 1               1               0               0
## HMCN1               0               0               1               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               1
## FLG                 0               0               0               0
##       TCGA-10-0928-01 TCGA-25-2393-01 TCGA-13-2065-01 TCGA-24-1565-01
## TP53                0               0               0               0
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-24-2280-01 TCGA-09-1672-01 TCGA-10-0930-01 TCGA-29-2429-01
## TP53                0               0               0               0
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-13-0921-01 TCGA-25-2398-01 TCGA-04-1342-01 TCGA-13-0727-01
## TP53                0               0               0               0
## TTN                 0               0               0               0
## HMCN1               0               0               0               0
## RYR2                0               0               0               0
## CSMD3               0               0               0               0
## FLG                 0               0               0               0
##       TCGA-25-1634-01 TCGA-36-1576-01 TCGA-04-1351-01
## TP53                0               0               0
## TTN                 0               0               0
## HMCN1               0               0               0
## RYR2                0               0               0
## CSMD3               0               0               0
## FLG                 0               0               0
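`mutCountMatrix()` places genes in rows and samples in columns, with the number of mutations of that gene in that sample in each cell; that is why TTN shows 3 in TCGA-13-0885-01. A base-R sketch of the same layout using `table()`, plus a 0/1 mutated/not-mutated version, which is the kind of input a pairwise co-occurrence test works on (toy data, illustrative only):

```r
toy <- data.frame(
  Hugo_Symbol          = c("TP53", "TTN", "TTN", "TTN", "RYR2"),
  Tumor_Sample_Barcode = c("S1",   "S1",  "S1",  "S2",  "S3")
)

# Genes x samples counts; unclass() drops the "table" class, leaving an integer matrix
counts <- unclass(table(toy$Hugo_Symbol, toy$Tumor_Sample_Barcode))
counts

# Binary version: 1 if the gene is mutated at least once in the sample, else 0
binary <- (counts > 0) * 1L
binary

# Number of mutated samples per gene, from the binary matrix
rowSums(binary)  # RYR2 1, TP53 1, TTN 2
```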
matriz_genes_amostras<-as.matrix(mutCountMatrix(maf_ov))
#genes in co-occurrence and in mutual exclusivity
somaticInteractions(maf_ov,top = 50)

##          gene1  gene2       pValue oddsRatio  00 11 01 10              Event
##    1:     FAT4  SYNE1 0.0006362758  49.96064 115  3  3  2       Co_Occurence
##    2:      DST   FAT4 0.0017363983  30.88428 113  3  2  5       Co_Occurence
##    3:    ASXL3    TTN 0.0029194934  22.77023 101  4 17  1       Co_Occurence
##    4:      ATM  HMCN1 0.0036253850  22.18643 111  3  7  2       Co_Occurence
##    5:   NOTCH4    RB1 0.0049477502  17.37517 111  3  6  3       Co_Occurence
##   ---                                                                       
## 1221:     FBN2 ZNF208 1.0000000000   0.00000 112  0  6  5 Mutually_Exclusive
## 1222:    FREM1 ZNF208 1.0000000000   0.00000 112  0  6  5 Mutually_Exclusive
## 1223: KIAA1462 ZNF208 1.0000000000   0.00000 112  0  6  5 Mutually_Exclusive
## 1224:    KMT2A ZNF208 1.0000000000   0.00000 112  0  6  5 Mutually_Exclusive
## 1225:    KMT2C ZNF208 1.0000000000   0.00000 112  0  6  5 Mutually_Exclusive
##                   pair event_ratio
##    1:      FAT4, SYNE1         3/5
##    2:        DST, FAT4         3/7
##    3:       ASXL3, TTN        4/18
##    4:       ATM, HMCN1         3/9
##    5:      NOTCH4, RB1         3/9
##   ---                             
## 1221:     FBN2, ZNF208        0/11
## 1222:    FREM1, ZNF208        0/11
## 1223: KIAA1462, ZNF208        0/11
## 1224:    KMT2A, ZNF208        0/11
## 1225:    KMT2C, ZNF208        0/11
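`somaticInteractions()` is documented as running pairwise Fisher's exact tests on the mutated/not-mutated status of each gene pair. The columns `00`, `11`, `01` and `10` are the four cells of that 2×2 table, so the first row (FAT4, SYNE1) can be checked with base R's `fisher.test()`. A sketch, assuming `10` means "only gene1 mutated" and `01` "only gene2 mutated" (swapping them transposes the table, which does not change the p-value):

```r
# 2x2 table for FAT4 (gene1) vs SYNE1 (gene2), counts copied from the first output row
tab <- matrix(c(115, 2,    # "00" (neither), "10" (FAT4 only)
                3,   3),   # "01" (SYNE1 only), "11" (both)
              nrow = 2,
              dimnames = list(FAT4  = c("wild-type", "mutated"),
                              SYNE1 = c("wild-type", "mutated")))

res <- fisher.test(tab)
res$p.value   # about 0.000636, the pValue column of the first row
res$estimate  # conditional MLE odds ratio; > 1 points to co-occurrence
```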
#saving images
jpeg("interacoes_genes.jpeg",height = 10, width =10, units = 'in', res=300)
somaticInteractions(maf_ov,top = 50)
dev.off()
## png 
##   2
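The same open device / plot / `dev.off()` pattern works with other base R graphics devices such as `png()` or `pdf()`. A hypothetical helper that guarantees the device is closed even if the plotting code fails, sketched with `pdf()` and a base R plot:

```r
# Hypothetical helper: open a PDF device, run the plotting code, always close the device
save_pdf <- function(file, plot_fun, width = 10, height = 10) {
  pdf(file, width = width, height = height)  # size in inches
  on.exit(dev.off(), add = TRUE)             # runs even if plot_fun() throws an error
  plot_fun()
  invisible(file)
}

out <- tempfile(fileext = ".pdf")
save_pdf(out, function() boxplot(Sepal.Length ~ Species, data = iris))
file.exists(out)  # TRUE

# With maftools loaded, the same helper could wrap, for example:
# save_pdf("interacoes_genes.pdf", function() somaticInteractions(maf_ov, top = 50))
```

Because the call runs inside a function, its return value is not auto-printed, so the interaction table is not echoed a second time.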
#saving tables
write.table(resumo_genes,file = "resumo_genes.txt", sep = "\t",row.names = FALSE)