Let's start by assigning some vectors and creating basic plots in R.
#vectors
height <- c(145, 167, 176, 123, 150)
weight <- c(51, 63, 64, 40, 55)
#plotting the vectors
plot(height,weight)
#checking class and structure
class(height)
## [1] "numeric"
str(height)
## num [1:5] 145 167 176 123 150
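`class()` and `str()` describe a vector's type and contents. A short sketch of two related behaviors may help. First, a vector holds a single type, so mixing numbers and text coerces every element to character. Second, two vectors of equal length can be paired into a data frame, the structure used with `iris` below. The names `mixed`, `ratio` and `people` are illustrative only.

```r
# Numeric vectors from the example above
height <- c(145, 167, 176, 123, 150)
weight <- c(51, 63, 64, 40, 55)

# Vectors are homogeneous: mixing a number and a string coerces all elements to character
mixed <- c(145, "tall")
print(class(mixed))

# Arithmetic is vectorized: weight divided by height, element by element
ratio <- weight / height

# Two vectors of equal length can be combined column-wise into a data frame
people <- data.frame(height = height, weight = weight)
str(people)
```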
Next we will use the iris dataset, which is built into R. It contains measurements of 150 flowers from 3 Iris species (Iris setosa, Iris versicolor, Iris virginica), 50 of each.
#Dataset
flower<-iris
class(flower)
## [1] "data.frame"
str(flower)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#Numerical summaries of the data frame
summary(flower)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
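`summary()` pools all 150 flowers, even though the three species differ considerably. A per-species view is often more informative. One way to get it in base R is `aggregate()` with a formula. This is a sketch; `tapply()` or `by()` would work too.

```r
# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# The same idea for a single column, returning one value per species
petal_medians <- tapply(iris$Petal.Length, iris$Species, median)
print(petal_medians)
```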
#Plotting a boxplot of sepal length and sepal width
boxplot(flower[,1:2], main='Sepal Length vs Width')
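The call above draws two columns side by side for all flowers together. The formula interface instead splits one measurement by group, which shows differences between species directly. Passing `plot = FALSE` returns the computed statistics without drawing them, which is handy for checking the numbers behind the figure.

```r
# One box per species for petal length (iris measurements are in cm)
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal length by species", ylab = "Petal length (cm)")

# Same statistics, returned instead of drawn: row 3 of $stats holds the medians
petal_stats <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)
petal_stats$stats[3, ]
```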
#Saving the data frame to a tab-separated text file
#getwd() shows the working directory, where files are saved by default
#getwd()
#setwd() changes the folder where files will be saved
#setwd("insert/folder/path/here")
write.table(flower,file = "flores.txt", sep = "\t",row.names = FALSE)
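To confirm the export worked, the file can be read back with `read.delim()`, which expects tab-separated values with a header row. The sketch below writes to a temporary file created with `tempfile()` rather than to `flores.txt`, so it leaves the working directory untouched.

```r
flower <- iris
out_file <- tempfile(fileext = ".txt")

# Same call as above, pointed at a temporary file
write.table(flower, file = out_file, sep = "\t", row.names = FALSE)

# Read it back: read.delim() defaults to sep = "\t" and header = TRUE
flores_back <- read.delim(out_file)
str(flores_back)

# Since R 4.0, text columns are read as character, so Species is turned back into a factor
flores_back$Species <- factor(flores_back$Species)
```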
Maftools is a package that provides tools for analyzing MAF (Mutation Annotation Format) files, a tab-delimited format widely used in cancer genomics. For more information, see here.
For this part, we will use a prepared MAF file that can be downloaded from this address.
Once downloaded, place the file in the working directory, i.e. the folder where the code runs. To find out which folder that is, run getwd().
#Reading the MAF-formatted file into a data frame
pre_maf_ov<- read.delim("MAFOV70MAISCOMPLETO.txt", sep="\t")
#exploring class and structure
class(pre_maf_ov)
## [1] "data.frame"
str(pre_maf_ov)
## 'data.frame': 10370 obs. of 16 variables:
## $ Hugo_Symbol : chr "TCF7L2" "PCDH15" "ANKK1" "TECTA" ...
## $ Entrez_Gene_Id : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Center : chr "." "." "." "." ...
## $ NCBI_Build : chr "GRCh37" "GRCh37" "GRCh37" "GRCh37" ...
## $ Chromosome : chr "10" "10" "11" "11" ...
## $ Start_Position : int 114903690 55700670 113267956 120996396 2156723 71193951 122277623 40742213 57559950 78400349 ...
## $ End_Position : int 114903690 55700670 113267956 120996396 2156723 71193951 122277624 40742213 57559950 78400349 ...
## $ Strand : chr "+" "+" "+" "+" ...
## $ Variant_Classification: chr "Missense_Mutation" "Missense_Mutation" "Silent" "Missense_Mutation" ...
## $ Variant_Type : chr "SNP" "SNP" "SNP" "SNP" ...
## $ Reference_Allele : chr "C" "A" "C" "C" ...
## $ Tumor_Seq_Allele1 : chr "C" "A" "C" "C" ...
## $ Tumor_Seq_Allele2 : chr "T" "T" "A" "G" ...
## $ Tumor_Sample_Barcode : chr "TCGA-04-1341-01" "TCGA-04-1341-01" "TCGA-04-1341-01" "TCGA-04-1341-01" ...
## $ Genome_Change : chr "ENST00000355995.4:c.694C>T" "ENST00000320301.6:c.3188T>A" "ENST00000303941.3:c.849C>A" "ENST00000264037.2:c.1589C>G" ...
## $ Protein_Change : chr "p.R232W" "p.I1063N" "p.I283=" "p.T530R" ...
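Before converting, it can help to check that the data frame has the columns `read.maf()` needs. The list below follows the mandatory fields we understand the maftools documentation to name. Treat it as an assumption and compare it with the documentation for your installed version. All nine appear in the `str()` output above. `missing_maf_columns()` is a hypothetical helper, not part of maftools.

```r
# Mandatory MAF fields (assumed from the maftools documentation; verify for your version)
required_maf_cols <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                       "Reference_Allele", "Tumor_Seq_Allele2", "Variant_Classification",
                       "Variant_Type", "Tumor_Sample_Barcode")

# Hypothetical helper: returns the required columns that are absent from a data frame
missing_maf_columns <- function(df, required = required_maf_cols) {
  setdiff(required, colnames(df))
}

# Usage with the data frame read above:
# missing_maf_columns(pre_maf_ov)   # character(0) means nothing is missing
```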
#Installing the maftools package (needed only once)
#maftools is distributed through Bioconductor, so it is installed with BiocManager:
#install.packages("BiocManager")
#BiocManager::install("maftools")
#Loading the maftools package
library(maftools)
#Converting the MAF-formatted data frame into a MAF object
maf_ov<-read.maf(pre_maf_ov)
## -Validating
## --Removed 13 duplicated variants
## --Non MAF specific values in Variant_Classification column:
## Start_Codon_Del
## -Silent variants: 3036
## -Summarizing
## --Mutiple reference builds found
## GRCh37;37--Mutiple centers found
## .;broad.mit.edu;genome.wustl.edu;hgsc.bcm.edu--Possible FLAGS among top ten genes:
## TTN
## HMCN1
## FLG
## DST
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.916s elapsed (0.977s cpu)
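The log above reports silent variants and a classification maftools does not recognize (`Start_Codon_Del`). As we understand it, `read.maf()` treats variants whose `Variant_Classification` is listed in its `vc_nonSyn` argument as non-synonymous and considers the rest as silent. A base R sketch of that split on the raw data frame follows. The list mirrors what we believe is the default `vc_nonSyn`; check `?read.maf` for your version. `split_by_classification()` is a hypothetical helper.

```r
# Assumed default vc_nonSyn of maftools::read.maf (check ?read.maf)
non_syn_classes <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
                     "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
                     "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation")

# Hypothetical helper: splits rows into non-synonymous and "other" by classification
split_by_classification <- function(df, keep = non_syn_classes) {
  is_non_syn <- df$Variant_Classification %in% keep
  list(non_syn = df[is_non_syn, , drop = FALSE],
       other   = df[!is_non_syn, , drop = FALSE])
}

# Usage with the raw data frame (counts may differ slightly from the log,
# which is computed after duplicate removal):
# parts <- split_by_classification(pre_maf_ov)
# table(parts$other$Variant_Classification)
```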
class(maf_ov)
## [1] "MAF"
## attr(,"package")
## [1] "maftools"
#EXPLORING SOME USEFUL COMMANDS
#Oncoplot summarizing the 100 most frequently mutated genes in the cohort
oncoplot(maf_ov,top = 100)
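We assume `oncoplot()` ranks genes by how many samples carry at least one alteration, with `top` setting how many genes are drawn. That ranking can be sketched in base R from a long table with one row per variant. `top_mutated_genes()` is a hypothetical helper and ignores maftools details such as tie-breaking.

```r
# Hypothetical helper: rank genes by the number of distinct samples mutated
top_mutated_genes <- function(genes, samples, n = 10) {
  # Keep one row per gene/sample pair so repeated hits in a sample count once
  pairs <- unique(data.frame(gene = genes, sample = samples))
  counts <- sort(table(pairs$gene), decreasing = TRUE)
  head(counts, n)
}

# Toy data: TP53 is hit twice in s1, which still counts as one mutated sample
genes   <- c("TP53", "TP53", "TP53", "TTN", "TTN", "RB1")
samples <- c("s1",   "s1",   "s2",   "s1",  "s3",  "s2")
top_mutated_genes(genes, samples, n = 2)
```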
#Gene-level summary: number of variants of each type per gene
head(getGeneSummary(maf_ov))
## Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
## 1: TP53 8 1 1 0
## 2: TTN 1 0 0 0
## 3: HMCN1 0 0 0 0
## 4: RYR2 0 0 0 0
## 5: CSMD3 0 0 0 0
## 6: RB1 3 0 0 0
## Missense_Mutation Nonsense_Mutation Nonstop_Mutation Splice_Site total
## 1: 70 10 0 15 105
## 2: 28 3 0 1 33
## 3: 10 0 0 0 10
## 4: 10 0 0 0 10
## 5: 8 0 0 1 9
## 6: 4 1 0 1 9
## MutatedSamples AlteredSamples
## 1: 103 103
## 2: 21 21
## 3: 10 10
## 4: 9 9
## 5: 9 9
## 6: 9 9
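Each row of `getGeneSummary()` counts variants per classification, and `total` is their sum (for TP53, 8 + 1 + 1 + 0 + 70 + 10 + 0 + 15 = 105). `MutatedSamples` counts distinct samples. A comparable table can be sketched from a long variant table with `table()`. `gene_summary()` is a hypothetical helper, not the maftools implementation.

```r
# Hypothetical helper: variants per gene and classification, plus totals
gene_summary <- function(genes, classes, samples) {
  counts <- as.data.frame.matrix(table(genes, classes))
  counts$total <- rowSums(counts)
  # Number of distinct samples with at least one variant in each gene
  mutated <- tapply(samples, genes, function(s) length(unique(s)))
  counts$MutatedSamples <- as.integer(mutated[rownames(counts)])
  counts[order(-counts$total), ]
}

genes   <- c("TP53", "TP53", "TP53", "TTN")
classes <- c("Missense_Mutation", "Missense_Mutation", "Splice_Site", "Missense_Mutation")
samples <- c("s1", "s2", "s2", "s1")
gene_summary(genes, classes, samples)
```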
#Sample-level summary: number of variants of each type per sample
head(getSampleSummary(maf_ov))
## Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
## 1: TCGA-59-2349-01 49 11 8
## 2: TCGA-13-0888-01 2 3 1
## 3: TCGA-10-0930-01 2 1 4
## 4: TCGA-13-0885-01 1 0 1
## 5: TCGA-61-1740-01 3 1 5
## 6: TCGA-29-1761-01 8 0 1
## In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation
## 1: 0 668 49 0
## 2: 0 145 11 1
## 3: 0 103 7 2
## 4: 0 117 3 0
## 5: 0 105 5 0
## 6: 0 102 3 0
## Splice_Site total
## 1: 6 791
## 2: 3 166
## 3: 10 129
## 4: 5 127
## 5: 8 127
## 6: 9 123
#Storing the summaries in variables
resumo_genes<-getGeneSummary(maf_ov)
resumo_amostras<-getSampleSummary(maf_ov)
#Lollipop plot: position of TP53 variants along the protein and its domains
lollipopPlot(maf_ov,AACol = "Protein_Change",gene = "TP53",showDomainLabel = FALSE)
## 8 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
## HGNC refseq.ID protein.ID aa.length
## 1: TP53 NM_000546 NP_000537 393
## 2: TP53 NM_001126112 NP_001119584 393
## 3: TP53 NM_001126118 NP_001119590 354
## 4: TP53 NM_001126115 NP_001119587 261
## 5: TP53 NM_001126113 NP_001119585 346
## 6: TP53 NM_001126117 NP_001119589 214
## 7: TP53 NM_001126114 NP_001119586 341
## 8: TP53 NM_001126116 NP_001119588 209
## Using longer transcript NM_000546 for now.
## Removed 11 mutations for which AA position was not available
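`lollipopPlot()` places each variant at the amino-acid position encoded in the `AACol` column (here `Protein_Change`, e.g. `p.R232W`). As the message above says, it drops variants whose position is not available. The output also notes that a transcript was picked automatically. The `refSeqID` argument (e.g. `refSeqID = "NM_000546"`) selects one explicitly. The position parsing can be sketched with a regular expression. `aa_position()` is a hypothetical helper, and maftools' own parser may handle more notations.

```r
# Hypothetical helper: extract the first amino-acid position from strings like "p.R232W"
aa_position <- function(protein_change) {
  pos <- rep(NA_integer_, length(protein_change))
  # Expect "p." followed by residue letters (or "*") and then digits
  hit <- grepl("^p\\.[A-Za-z*]+[0-9]+", protein_change)
  pos[hit] <- as.integer(sub("^p\\.[A-Za-z*]+([0-9]+).*$", "\\1", protein_change[hit]))
  pos
}

# Entries without a readable position come back as NA
aa_position(c("p.R232W", "p.I1063N", "p.I283=", "", NA))
```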
#Gene x sample matrix of mutation counts
head(mutCountMatrix(maf_ov))
## TCGA-13-0885-01 TCGA-29-1761-01 TCGA-13-0755-01 TCGA-61-1733-01
## TP53 1 1 1 1
## TTN 3 2 2 2
## HMCN1 1 0 0 0
## RYR2 2 1 0 0
## CSMD3 0 0 0 0
## FLG 0 0 1 1
## TCGA-24-1469-01 TCGA-09-1674-01 TCGA-25-2399-01 TCGA-61-1740-01
## TP53 1 1 1 1
## TTN 1 2 1 2
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1331-01 TCGA-25-2392-01 TCGA-23-1032-01 TCGA-24-2033-01
## TP53 1 1 1 1
## TTN 1 1 1 1
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1365-01 TCGA-04-1338-01 TCGA-04-1343-01 TCGA-13-0757-01
## TP53 1 1 1 1
## TTN 1 1 1 2
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-25-1329-01 TCGA-04-1332-01 TCGA-61-1727-01 TCGA-13-0888-01
## TP53 1 1 1 1
## TTN 1 2 0 0
## HMCN1 0 0 1 1
## RYR2 0 0 1 0
## CSMD3 0 0 0 0
## FLG 0 0 1 1
## TCGA-25-1635-01 TCGA-25-1630-01 TCGA-61-2102-01 TCGA-23-2641-01
## TP53 1 2 1 1
## TTN 0 0 0 0
## HMCN1 1 1 1 1
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-24-1466-01 TCGA-24-1463-01 TCGA-13-1507-01 TCGA-04-1347-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 1 0 0 0
## RYR2 0 1 1 1
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-13-0923-01 TCGA-09-2044-01 TCGA-20-1686-01 TCGA-61-2613-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 1 1 0 0
## CSMD3 0 0 1 1
## FLG 0 0 0 0
## TCGA-59-2352-01 TCGA-24-2288-01 TCGA-24-2261-01 TCGA-25-1323-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 1 1 1 1
## FLG 0 0 0 0
## TCGA-24-1464-01 TCGA-13-1409-01 TCGA-10-0933-01 TCGA-13-1500-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 1 1 0 0
## FLG 0 0 1 2
## TCGA-29-1778-01 TCGA-24-1850-01 TCGA-13-1489-01 TCGA-09-1665-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 1 1 0 0
## TCGA-24-1849-01 TCGA-10-0938-01 TCGA-25-1325-01 TCGA-04-1652-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-36-2543-01 TCGA-29-1766-01 TCGA-20-0991-01 TCGA-13-1487-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1346-01 TCGA-24-2030-01 TCGA-61-2097-01 TCGA-42-2587-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-24-1558-01 TCGA-36-2534-01 TCGA-09-2053-01 TCGA-29-1771-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-13-1481-01 TCGA-24-2024-01 TCGA-24-1422-01 TCGA-13-0768-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1337-01 TCGA-25-1631-01 TCGA-13-0802-01 TCGA-13-1498-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-20-0990-01 TCGA-13-1411-01 TCGA-WR-A838-01 TCGA-25-2400-01
## TP53 1 1 2 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1649-01 TCGA-13-0889-01 TCGA-61-2614-01 TCGA-25-1627-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-09-0364-01 TCGA-24-1552-01 TCGA-23-1116-01 TCGA-24-1544-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-24-0966-01 TCGA-13-0724-01 TCGA-31-1950-01 TCGA-24-2260-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-04-1341-01 TCGA-61-1730-01 TCGA-59-2372-01 TCGA-OY-A56Q-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-13-0730-01 TCGA-13-0804-01 TCGA-09-1661-01 TCGA-24-0982-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-61-1741-01 TCGA-13-0891-01 TCGA-25-2409-01 TCGA-09-0365-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-25-1623-01 TCGA-29-1702-01 TCGA-36-1575-01 TCGA-25-1319-01
## TP53 1 1 1 1
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-25-2396-01 TCGA-23-2643-01 TCGA-04-1517-01 TCGA-59-2349-01
## TP53 1 1 1 0
## TTN 0 0 0 4
## HMCN1 0 0 0 1
## RYR2 0 0 0 1
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-61-2012-01 TCGA-29-1774-01 TCGA-29-1693-01 TCGA-61-1899-01
## TP53 0 0 0 0
## TTN 1 1 0 0
## HMCN1 0 0 1 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 1
## FLG 0 0 0 0
## TCGA-10-0928-01 TCGA-25-2393-01 TCGA-13-2065-01 TCGA-24-1565-01
## TP53 0 0 0 0
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-24-2280-01 TCGA-09-1672-01 TCGA-10-0930-01 TCGA-29-2429-01
## TP53 0 0 0 0
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-13-0921-01 TCGA-25-2398-01 TCGA-04-1342-01 TCGA-13-0727-01
## TP53 0 0 0 0
## TTN 0 0 0 0
## HMCN1 0 0 0 0
## RYR2 0 0 0 0
## CSMD3 0 0 0 0
## FLG 0 0 0 0
## TCGA-25-1634-01 TCGA-36-1576-01 TCGA-04-1351-01
## TP53 0 0 0
## TTN 0 0 0
## HMCN1 0 0 0
## RYR2 0 0 0
## CSMD3 0 0 0
## FLG 0 0 0
matriz_genes_amostras<-as.matrix(mutCountMatrix(maf_ov))
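`mutCountMatrix()` returns genes in rows and samples in columns, with the number of variants of each gene in each sample (TTN has 3 in TCGA-13-0885-01 above). In base R, a similar matrix comes from `table()`. Comparing it with zero gives a presence/absence matrix, the form that co-occurrence analyses work with. A sketch on toy data:

```r
genes   <- c("TP53", "TTN", "TTN", "TTN", "RB1")
samples <- c("s1",   "s1",  "s1",  "s2",  "s2")

# Counts of variants per gene (rows) and sample (columns)
count_mat <- unclass(table(genes, samples))

# 1 if the gene carries at least one variant in that sample, otherwise 0
binary_mat <- (count_mat > 0) * 1L
binary_mat
```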
#Co-occurring and mutually exclusive gene pairs among the top 50 genes
somaticInteractions(maf_ov,top = 50)
## gene1 gene2 pValue oddsRatio 00 11 01 10 Event
## 1: FAT4 SYNE1 0.0006362758 49.96064 115 3 3 2 Co_Occurence
## 2: DST FAT4 0.0017363983 30.88428 113 3 2 5 Co_Occurence
## 3: ASXL3 TTN 0.0029194934 22.77023 101 4 17 1 Co_Occurence
## 4: ATM HMCN1 0.0036253850 22.18643 111 3 7 2 Co_Occurence
## 5: NOTCH4 RB1 0.0049477502 17.37517 111 3 6 3 Co_Occurence
## ---
## 1221: FBN2 ZNF208 1.0000000000 0.00000 112 0 6 5 Mutually_Exclusive
## 1222: FREM1 ZNF208 1.0000000000 0.00000 112 0 6 5 Mutually_Exclusive
## 1223: KIAA1462 ZNF208 1.0000000000 0.00000 112 0 6 5 Mutually_Exclusive
## 1224: KMT2A ZNF208 1.0000000000 0.00000 112 0 6 5 Mutually_Exclusive
## 1225: KMT2C ZNF208 1.0000000000 0.00000 112 0 6 5 Mutually_Exclusive
## pair event_ratio
## 1: FAT4, SYNE1 3/5
## 2: DST, FAT4 3/7
## 3: ASXL3, TTN 4/18
## 4: ATM, HMCN1 3/9
## 5: NOTCH4, RB1 3/9
## ---
## 1221: FBN2, ZNF208 0/11
## 1222: FREM1, ZNF208 0/11
## 1223: KIAA1462, ZNF208 0/11
## 1224: KMT2A, ZNF208 0/11
## 1225: KMT2C, ZNF208 0/11
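`somaticInteractions()` tests each pair of the top genes with a pairwise Fisher's exact test. The test uses the number of samples with neither gene mutated (00), both (11), or only one (01, 10). The first row can be rechecked with base R's `fisher.test()`. Rebuilding the FAT4/SYNE1 table this way gives the same p-value as the `pValue` column. Which gene "01" and "10" refer to is an assumption here, but the two-sided test is symmetric, so it does not affect the result.

```r
# Counts for FAT4/SYNE1 from the output above: 3 both, 2 and 3 in only one gene, 115 neither
gene1 <- c(rep(1, 3), rep(1, 2), rep(0, 3), rep(0, 115))
gene2 <- c(rep(1, 3), rep(0, 2), rep(1, 3), rep(0, 115))

# 2x2 contingency table of mutated (1) / not mutated (0) samples
tab <- table(gene1 = factor(gene1, levels = c(1, 0)),
             gene2 = factor(gene2, levels = c(1, 0)))

ft <- fisher.test(tab)
ft$p.value   # matches the pValue reported above (0.0006362758)
ft$estimate  # odds ratio > 1 suggests co-occurrence, < 1 mutual exclusivity
```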
#Saving the plot as an image file
jpeg("interacoes_genes.jpeg",height = 10, width =10, units = 'in', res=300)
somaticInteractions(maf_ov,top = 50)
dev.off()
## png
## 2
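The `jpeg()` ... `dev.off()` pattern opens a file device, draws into it, and must close it, otherwise the file is left incomplete. If the plotting call fails, `dev.off()` is never reached. A small wrapper using `on.exit()` closes the device regardless. `save_plot()` is a hypothetical helper. Its device defaults to `jpeg()`, and any device whose first argument is the file name (such as `pdf()`) also works.

```r
# Hypothetical helper: open a file device, draw, and always close the device
save_plot <- function(file, draw, device = grDevices::jpeg, ...) {
  device(file, ...)
  on.exit(grDevices::dev.off(), add = TRUE)
  draw()
  invisible(file)
}

# Usage mirroring the call above (requires maftools and maf_ov):
# save_plot("interacoes_genes.jpeg", function() somaticInteractions(maf_ov, top = 50),
#           height = 10, width = 10, units = "in", res = 300)

# Self-contained example with a base plot, written to a temporary PDF
out <- save_plot(tempfile(fileext = ".pdf"), function() plot(iris$Sepal.Length),
                 device = grDevices::pdf)
```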
#Saving tables
write.table(resumo_genes,file = "resumo_genes.txt", sep = "\t",row.names = FALSE)