For this project, I will show the log2 fold change in differential gene expression for RNA-seq data from two subtypes of colon cancer: adenocarcinomas and cystic, mucinous, and serous neoplasms. The final deliverable will be the Glimma vignette. Its input files will be HTSeq-count data obtained from The Cancer Genome Atlas Program (TCGA). I will turn in an HTML file containing my R Notebook.
Data is obtained from TCGA. I filtered for the RNA-Seq experimental strategy, the TXT data format, and the HTSeq-counts workflow type. HTSeq-count is a tool that quantifies the aligned reads overlapping each gene's exons; only reads that map unambiguously to one gene are counted. HTSeq output has no header and is tab-delimited. The first column is the Ensembl gene ID and the second column is the number of reads mapped to that gene. The counts will be used for differential gene expression analysis with edgeR. They will first be normalized with edgeR's calcNormFactors() function, which uses TMM normalization by default.
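The HTSeq-count format described above can be shown with a minimal base-R sketch. The gene IDs and counts come from the first sample shown later in this notebook. The two "__" summary rows and their values are made up for illustration. The sketch reads the text without a header and drops the "__" rows that HTSeq-count writes at the end of its output. edgeR's readDGE() reports these rows as "meta tags".

```r
# Minimal sketch of the HTSeq-count layout: no header, tab-delimited,
# column 1 = Ensembl gene ID, column 2 = read count.
# The "__" summary rows and their values are illustrative only.
htseq_text <- paste(
  "ENSG00000000003.13\t5290",
  "ENSG00000000005.5\t47",
  "ENSG00000000419.11\t1212",
  "__no_feature\t123456",
  "__ambiguous\t7890",
  sep = "\n"
)

# header = FALSE because HTSeq output has no header row
counts <- read.delim(text = htseq_text, header = FALSE,
                     col.names = c("gene_id", "count"),
                     stringsAsFactors = FALSE)

# Keep only real gene rows; HTSeq's summary counters start with "__"
is_gene <- !startsWith(counts$gene_id, "__")
gene_counts <- counts[is_gene, ]
print(gene_counts)
```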
Raw data: URLs for the cystic, mucinous, and serous neoplasm samples (I will choose 30):
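When the first file is read in with read.delim()'s default header = TRUE, the first gene row becomes the column header. R also prefixes the count 5290 with "X" to make a valid column name. The gene ID column is read as a factor and the counts as integers. The first 10 of the 60 rows read are:
ENSG00000000003.13   X5290
ENSG00000000005.5    47
ENSG00000000419.11   1212
ENSG00000000457.12   1176
ENSG00000000460.15   121
ENSG00000000938.11   166
ENSG00000000971.14   1012
ENSG00000001036.12   4401
ENSG00000001084.9    1977
ENSG00000001167.13   976
ENSG00000001460.16   1638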
When the dataset is built from the 60 text files, class(x) and dim(x) return:
[1] "DGEList"
attr(,"package")
[1] "edgeR"
[1] 60487 60
Make box plots of the unnormalized and normalized data.
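The box-plot step can be sketched in base R. The toy matrix holds the first four genes of three samples printed later in this notebook. The log-CPM here is simplified: log2(CPM + 1) with a pseudo-count of 1. edgeR's cpm(log = TRUE) instead adds a prior count scaled to library size, so its values will differ slightly. The `norm_factors` vector is a placeholder for what calcNormFactors() would estimate.

```r
# Toy counts: first 4 genes x 3 samples taken from the printed DGEList
counts <- matrix(c(47, 1212, 1176, 121,
                   4,  710,  236, 211,
                   2,  702,  552, 320),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Placeholder normalization factors (all 1, as before calcNormFactors());
# the effective library size is lib.size * norm.factors
norm_factors <- c(1, 1, 1)
eff_lib <- colSums(counts) * norm_factors

# Simplified log2 counts-per-million with a pseudo-count of 1
cpm_values <- sweep(counts, 2, eff_lib, "/") * 1e6
log_cpm <- log2(cpm_values + 1)

# One box per sample (column)
boxplot(log_cpm, las = 2, main = "log2-CPM (toy data)", ylab = "log2-CPM")
```

Comparing this plot for raw and normalized effective library sizes shows whether the sample distributions line up after normalization.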
This project will compute and analyze the log ratio (log2 fold change) of gene expression between two subtypes of colon cancer. edgeR will be used to import, organize, and normalize the data. Homo.sapiens will be used for gene annotation, since the TCGA samples are human. limma will be used for the differential expression analysis and exploratory plots, and Glimma will make these plots interactive. RColorBrewer and gplots will be used to make heatmaps.
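Because the data are on a log2 scale, a fold change between two groups is a difference of group means. A minimal sketch for a single gene follows; the log2-CPM values are hypothetical. limma reports a comparable logFC, but it fits (possibly weighted) linear models across all genes rather than taking plain means.

```r
# Hypothetical log2-CPM values for one gene in 3 CMS and 3 adenocarcinoma samples
log_cpm_gene <- c(CMS1 = 5.1, CMS2 = 4.8, CMS3 = 5.3,
                  ADENO1 = 3.9, ADENO2 = 4.2, ADENO3 = 4.0)
group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 3))

cms_mean   <- mean(log_cpm_gene[group == "CMS"])
adeno_mean <- mean(log_cpm_gene[group == "ADENOCARCINOMA"])

# On the log2 scale, the fold change is the difference of group means;
# a positive value means higher expression in CMS
log2_fc <- cms_mean - adeno_mean
fold_change <- 2^log2_fc
print(round(c(log2_fc = log2_fc, fold_change = fold_change), 3))
```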
library(DiagrammeR)
grViz("digraph flowchart {
# node definitions with substituted label text
node [fontname = Helvetica, shape = rectangle]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
tab4 [label = '@@4']
tab5 [label = '@@5']
tab6 [label = '@@6']
tab7 [label = '@@7']
tab8 [label = '@@8']
tab9 [label = '@@9']
tab10 [label = '@@10']
# edge definitions with the node IDs
tab1 -> tab2;
tab2 -> tab3;
tab3 -> tab4;
tab4 -> tab8 -> tab5;
tab4 -> tab5 -> tab6 -> tab7 -> tab9 -> tab10
}
[1]: 'Download necessary libraries'
[2]: 'Load and read datasets'
[3]: 'Join datasets'
[4]: 'Unit test: Data was properly loaded?'
[5]: 'Normalize data'
[6]: 'Find Mean-Variance Trend'
[7]: 'Analyze DE genes'
[8]: 'Troubleshoot and fix errors'
[9]: 'Make Interactive MDS plot'
[10]: 'Make HeatMap of log-CPM data'
")
Week 1: Run the Glimma vignette. I will install the necessary packages in R and understand each step in the vignette.
Week 2: Load the data (joining the files into one dataset) and run a simple one-line unit test on it. I will download 60 datasets (30 from each subtype) and join them. I emailed Dr. Craig on 11/19, and we agreed that I would turn in this milestone on Sat, Nov 23, 2019.
Week 3: Confirm that the data was loaded correctly and analyze it following the Glimma vignette. I emailed Dr. Craig on 11/26, and we agreed that I would turn in this milestone on 12/1.
Week 4: Troubleshoot any remaining errors and improve the user interface.
I anticipate having boxplots, heatmaps, and interactive multi-dimensional scaling (MDS) plots done in an R Notebook. I will submit an HTML page of my completed R Notebook.
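The idea behind an MDS plot can be sketched with base R's cmdscale(), a static stand-in for the interactive plot. The data are simulated, with a group difference built in. limma's plotMDS() computes distances from the leading log2 fold changes between samples instead of plain Euclidean distances, so its layout will differ.

```r
set.seed(1)
# Simulated log2-CPM matrix: 50 genes x 6 samples (illustrative only);
# the first 20 genes are shifted up in the last 3 samples
log_cpm <- matrix(rnorm(50 * 6, mean = 5), nrow = 50,
                  dimnames = list(NULL, c(paste0("CMS", 1:3), paste0("ADENO", 1:3))))
log_cpm[1:20, 4:6] <- log_cpm[1:20, 4:6] + 6

# Classical MDS on Euclidean distances between samples (columns)
mds <- cmdscale(dist(t(log_cpm)), k = 2)

group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 3))
plot(mds, col = as.integer(group), pch = 16,
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Classical MDS (simulated data)")
text(mds, labels = colnames(log_cpm), pos = 3, cex = 0.7)
```

With a strong group effect, the two groups should fall on opposite sides of dimension 1.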
# Install BiocManager if needed, then the Bioconductor packages used in the vignette
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("limma", "Glimma", "edgeR", "Mus.musculus", "CAMERA"))
library(limma)
library(Glimma)
library(edgeR)
library(Mus.musculus)
library(R.utils)
library(CAMERA)
Glimma generates an HTML page; set launch = TRUE to open it in a browser.
I created a new folder called COAD and stored all 60 files in it. I then changed their extensions to .txt and opened them.
https://portal.gdc.cancer.gov/files/536f5a77-0087-457d-ac95-6d1a9abad8cb, UUID: 536f5a77-0087-457d-ac95-6d1a9abad8cb, case: TCGA-AA-3516
https://portal.gdc.cancer.gov/files/ed52de66-66fa-44ce-b679-cf641b0d92cd, UUID: ed52de66-66fa-44ce-b679-cf641b0d92cd, case: TCGA-AA-3516
https://portal.gdc.cancer.gov/files/b28090c5-c42d-4836-9bb1-ce906d3ead95, UUID: b28090c5-c42d-4836-9bb1-ce906d3ead95, case: TCGA-AA-3854
https://portal.gdc.cancer.gov/cases/57cdaa1c-4e94-4a28-ab3b-300c0457555f, UUID: 49e29c69-d9d7-4496-9f24-26f42c8b6d8e, case: TCGA-A6-2674
https://portal.gdc.cancer.gov/files/08ed32e4-fb94-4bc0-8715-83ee2143a13d, UUID: 08ed32e4-fb94-4bc0-8715-83ee2143a13d, case: TCGA-AA-A00J
https://portal.gdc.cancer.gov/files/6e571f71-d5fb-42f3-a35b-554c5ab76587, UUID: 6e571f71-d5fb-42f3-a35b-554c5ab76587, case: TCGA-AA-A01G
https://portal.gdc.cancer.gov/files/8b12a000-f588-4a78-a9eb-f06041a65789, UUID: 8b12a000-f588-4a78-a9eb-f06041a65789, case: TCGA-A6-6780
https://portal.gdc.cancer.gov/files/02734d4d-fc8f-4ef7-ac82-1b4d7184cc5e, UUID: 02734d4d-fc8f-4ef7-ac82-1b4d7184cc5e, case: TCGA-CK-4950
https://portal.gdc.cancer.gov/files/6466a8b1-d1e2-4195-a353-0800576c13c8, UUID: 6466a8b1-d1e2-4195-a353-0800576c13c8, case: TCGA-G4-6322
https://portal.gdc.cancer.gov/files/bc47f01c-1994-4ff8-a356-94d9679b66ee, UUID: bc47f01c-1994-4ff8-a356-94d9679b66ee, case: TCGA-AA-3947
https://portal.gdc.cancer.gov/files/b045ee79-82a6-4636-a875-1a58603d89ff, UUID: b045ee79-82a6-4636-a875-1a58603d89ff, case: TCGA-A6-A566
https://portal.gdc.cancer.gov/files/c383ba2c-b00a-4bd2-82cb-b3f04c2a8172, UUID: c383ba2c-b00a-4bd2-82cb-b3f04c2a8172, case: TCGA-AA-3877
https://portal.gdc.cancer.gov/files/b52775aa-273e-484e-82c7-c625f09415fa, UUID: b52775aa-273e-484e-82c7-c625f09415fa, case: TCGA-A6-3809
https://portal.gdc.cancer.gov/files/7b15a87a-805c-4b8a-84de-549cec9c44e3, UUID: 7b15a87a-805c-4b8a-84de-549cec9c44e3, case: TCGA-AA-3684
https://portal.gdc.cancer.gov/files/b4f3dbbb-2686-4896-9e60-5bef6c9150b4, UUID: b4f3dbbb-2686-4896-9e60-5bef6c9150b4, case: TCGA-AA-3692
https://portal.gdc.cancer.gov/files/0b16e2bd-3ec7-4901-9ff0-a389670e5019, UUID: 0b16e2bd-3ec7-4901-9ff0-a389670e5019, case: TCGA-D5-6534
https://portal.gdc.cancer.gov/files/a6690007-f347-49c3-a0ba-28e01d131971, UUID: a6690007-f347-49c3-a0ba-28e01d131971, case: TCGA-A6-3809
https://portal.gdc.cancer.gov/files/a1742cf6-c3c5-43e7-879c-489494460e78, UUID: a1742cf6-c3c5-43e7-879c-489494460e78, case: TCGA-AA-A00N
https://portal.gdc.cancer.gov/files/d5be795d-beb6-4def-bda8-f485ee45bfc1, UUID: d5be795d-beb6-4def-bda8-f485ee45bfc1, case: TCGA-A6-2674
https://portal.gdc.cancer.gov/files/46306072-c59c-4b4b-963c-9c4e778ff34b, UUID: 46306072-c59c-4b4b-963c-9c4e778ff34b, case: TCGA-A6-6780
https://portal.gdc.cancer.gov/files/a938cb2c-c8e8-4395-915b-37e1e279a4da, UUID: a938cb2c-c8e8-4395-915b-37e1e279a4da, case: TCGA-G4-6302
https://portal.gdc.cancer.gov/files/7fec7c90-fd2e-4ee2-ba1a-77f85920771f, UUID: 7fec7c90-fd2e-4ee2-ba1a-77f85920771f, case: TCGA-DM-A282
https://portal.gdc.cancer.gov/files/2c3fd34c-70d1-4331-9628-260b77329b53, UUID: 2c3fd34c-70d1-4331-9628-260b77329b53, case: TCGA-F4-6704
https://portal.gdc.cancer.gov/files/4168a720-521e-47ff-afb5-4abe3e815490, UUID: 4168a720-521e-47ff-afb5-4abe3e815490, case: TCGA-AA-3950
https://portal.gdc.cancer.gov/files/ecc90bd1-f594-41ea-ba4b-d42f4c64880b, UUID: ecc90bd1-f594-41ea-ba4b-d42f4c64880b, case: TCGA-A6-6781
https://portal.gdc.cancer.gov/files/8736ed27-2141-48d9-b677-b1a0e14d4b50, UUID: 8736ed27-2141-48d9-b677-b1a0e14d4b50, case: TCGA-CA-6717
https://portal.gdc.cancer.gov/files/3b8d04cd-d658-46ba-adca-079fee531e17, UUID: 3b8d04cd-d658-46ba-adca-079fee531e17, case: TCGA-AA-3821
https://portal.gdc.cancer.gov/files/b27da518-d023-4f9c-a9ab-5cd68ee37870, UUID: b27da518-d023-4f9c-a9ab-5cd68ee37870, case: TCGA-CK-4951
https://portal.gdc.cancer.gov/files/e7005df6-f78b-4e47-abe7-61ae6a2ee026, UUID: e7005df6-f78b-4e47-abe7-61ae6a2ee026, case: TCGA-AA-A01R
https://portal.gdc.cancer.gov/files/e3598d14-292c-41cc-9b59-4497fa078272, UUID: e3598d14-292c-41cc-9b59-4497fa078272, case: TCGA-D5-6930
https://portal.gdc.cancer.gov/files/f1185347-ad15-43ae-9ef3-d5343b31a0fc, UUID: f1185347-ad15-43ae-9ef3-d5343b31a0fc, case: TCGA-A6-6654
https://portal.gdc.cancer.gov/files/0d53cb1c-97c4-4088-9e43-029de88fd66d, UUID: 0d53cb1c-97c4-4088-9e43-029de88fd66d, case: TCGA-DM-A1D4
https://portal.gdc.cancer.gov/files/a74bbce0-7f3d-434e-b294-7fa45e5b3a60, UUID: a74bbce0-7f3d-434e-b294-7fa45e5b3a60, case: TCGA-A6-2684
https://portal.gdc.cancer.gov/files/47554e4e-cd13-4b92-80be-e1940f9a950f, UUID: 47554e4e-cd13-4b92-80be-e1940f9a950f, case: TCGA-A6-5657
https://portal.gdc.cancer.gov/files/de60dbd7-8a93-47a5-b1ea-a3f95beade8a, UUID: de60dbd7-8a93-47a5-b1ea-a3f95beade8a, case: TCGA-F4-6854
https://portal.gdc.cancer.gov/files/70883b31-d130-4efd-a7c6-169c8d4a253d, UUID: 70883b31-d130-4efd-a7c6-169c8d4a253d, case: TCGA-AD-A5EJ
https://portal.gdc.cancer.gov/files/042bda3d-77aa-4522-8a97-c121711a760e, UUID: 042bda3d-77aa-4522-8a97-c121711a760e, case: TCGA-AG-3582
https://portal.gdc.cancer.gov/files/b6388e09-7ed5-4041-97bb-4427ba5571ba, UUID: b6388e09-7ed5-4041-97bb-4427ba5571ba, case: TCGA-AY-6197
https://portal.gdc.cancer.gov/files/54394c0b-6ae3-4b48-8e89-350ad5349611, UUID: 54394c0b-6ae3-4b48-8e89-350ad5349611, case: TCGA-AA-3554
https://portal.gdc.cancer.gov/files/f7e21d61-19b6-4e99-887f-463d4419628c, UUID: f7e21d61-19b6-4e99-887f-463d4419628c, case: TCGA-AG-4015
https://portal.gdc.cancer.gov/files/b4114885-38cd-4e8a-874b-b78da8d95e2c, UUID: b4114885-38cd-4e8a-874b-b78da8d95e2c, case: TCGA-CM-6171
https://portal.gdc.cancer.gov/files/f9fda40d-67e4-4cb9-859c-ddc2ea84b7e4, UUID: f9fda40d-67e4-4cb9-859c-ddc2ea84b7e4, case: TCGA-CM-6170
https://portal.gdc.cancer.gov/files/b4aebb2a-d0b8-43d8-bd1f-78af2065d8f9, UUID: b4aebb2a-d0b8-43d8-bd1f-78af2065d8f9, case: TCGA-AA-3846
https://portal.gdc.cancer.gov/files/6a750710-5ed9-4d24-b2bf-3a4e3211878f, UUID: 6a750710-5ed9-4d24-b2bf-3a4e3211878f, case: TCGA-CM-6677
https://portal.gdc.cancer.gov/files/93d1a78f-423e-4560-b4d3-ee4a89ac922b, UUID: 93d1a78f-423e-4560-b4d3-ee4a89ac922b, case: TCGA-RU-A8FL
https://portal.gdc.cancer.gov/files/2e632fd9-fa17-4290-9601-a5d462cf152c, UUID: 2e632fd9-fa17-4290-9601-a5d462cf152c, case: TCGA-AZ-4323
https://portal.gdc.cancer.gov/files/7239b026-2587-489d-81fe-7bc657b7523c, UUID: 7239b026-2587-489d-81fe-7bc657b7523c, case: TCGA-CM-6164
https://portal.gdc.cancer.gov/files/90e86a26-fffa-4c38-b2e0-bf0704ee3615, UUID: 90e86a26-fffa-4c38-b2e0-bf0704ee3615, case: TCGA-AZ-4315
https://portal.gdc.cancer.gov/files/9ff11fe0-037c-405e-95c3-dc4a15413db8, UUID: 9ff11fe0-037c-405e-95c3-dc4a15413db8, case: TCGA-G4-6311
https://portal.gdc.cancer.gov/files/b8eed826-6051-4358-9b3d-44d1553dd9ad, UUID: b8eed826-6051-4358-9b3d-44d1553dd9ad, case: TCGA-AA-3522
https://portal.gdc.cancer.gov/files/c172bc07-d4f0-41be-a558-49abc81065c2, UUID: c172bc07-d4f0-41be-a558-49abc81065c2, case: TCGA-AA-3667
https://portal.gdc.cancer.gov/files/260edc5e-1ca6-4b07-b96d-59594d03ac54, UUID: 260edc5e-1ca6-4b07-b96d-59594d03ac54, case: TCGA-AA-A00U
https://portal.gdc.cancer.gov/files/0c5c1a38-7e9c-4b43-810d-0761c3af49b1, UUID: 0c5c1a38-7e9c-4b43-810d-0761c3af49b1, case: TCGA-AA-3506
https://portal.gdc.cancer.gov/files/7024ba0c-be56-4907-9254-cdb2579e536e, UUID: 7024ba0c-be56-4907-9254-cdb2579e536e, case: TCGA-NH-A8F7
https://portal.gdc.cancer.gov/files/031cf2a5-74e0-4b5f-98bd-da60628c0854, UUID: 031cf2a5-74e0-4b5f-98bd-da60628c0854, case: TCGA-AA-3680
https://portal.gdc.cancer.gov/files/91991ecf-cc54-4110-8a4e-9236bf8aa072, UUID: 91991ecf-cc54-4110-8a4e-9236bf8aa072, case: TCGA-A6-4105
https://portal.gdc.cancer.gov/files/47aceec1-a01d-419f-9689-c46284c79bcb, UUID: 47aceec1-a01d-419f-9689-c46284c79bcb, case: TCGA-D5-6922
https://portal.gdc.cancer.gov/files/ce84c955-63db-473a-a6d7-0e3daad6efd4, UUID: ce84c955-63db-473a-a6d7-0e3daad6efd4, case: TCGA-AA-3524
https://portal.gdc.cancer.gov/files/a071fc45-61ea-4815-93bf-be34980e59ee, UUID: a071fc45-61ea-4815-93bf-be34980e59ee, case: TCGA-AA-3855
https://portal.gdc.cancer.gov/files/8b275144-b885-4fb0-af39-fea1e48a970a, UUID: 8b275144-b885-4fb0-af39-fea1e48a970a, case: TCGA-AA-A00Q
First rename the “.count” files to “.txt”, then unzip each one by opening it.
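Manual unzipping may not be strictly necessary. R's file connections decompress gzip files transparently when reading, so read.delim() can read a ".gz" count file directly. A minimal sketch follows. It writes a small gzip file to a temporary folder; the file name `example.htseq.counts.gz` is hypothetical.

```r
# Create a small gzip-compressed HTSeq-style file (hypothetical file name)
gz_path <- file.path(tempdir(), "example.htseq.counts.gz")
con <- gzfile(gz_path, "w")
writeLines(c("ENSG00000000003.13\t5290", "ENSG00000000005.5\t47"), con)
close(con)

# read.delim() decompresses gzip input automatically;
# header = FALSE because HTSeq output has no header row
counts <- read.delim(gz_path, header = FALSE, col.names = c("gene_id", "count"))
print(counts)
```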
setwd('~/Desktop/COAD_Data/')
COAD_files <- c("9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt", "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt",
"5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt", "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt",
"15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt", "649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq.txt",
"86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq.txt", "28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq.txt", "911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq.txt", "d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq.txt", "f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq.txt", "f590941d-19dc-427a-95b6-942c97ea8333.htseq.txt", "55aa6d16-3598-42ca-8844-0fe84739ef66.htseq.txt", "0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq.txt", "9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq.txt", "d2587070-cb7d-440d-ae49-52f5077248e6.htseq.txt", "7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq.txt", "2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq.txt", "424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq.txt", "934f9dc6-1260-4268-b022-870f1e37dd6f.htseq.txt", "0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq.txt", "c8544a8a-4352-438d-94d4-3495af2e9a78.htseq.txt", "dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq.txt", "e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq.txt", "debd6982-7c27-42e8-b778-20afcc78a5f3.htseq.txt", "17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq.txt", "7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq.txt", "fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq.txt", "abe20df7-6b97-4397-8864-881bac27e92c.htseq.txt", "62f84581-4c7d-4c8e-835c-9304bcec3106.htseq.txt", "3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq.txt", "087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq.txt",
"c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq.txt", "13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq.txt",
"6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq.txt", "8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq.txt",
"168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq.txt", "0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq.txt", "4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq.txt", "7fb73a84-867a-4c28-aa02-93068efffb7b.htseq.txt", "b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq.txt", "f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq.txt", "f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq.txt", "e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq.txt", "a26d49db-2309-46a0-a3ed-275378d484e7.htseq.txt", "a3f88a5d-7169-465b-bb80-e5999590681c.htseq.txt", "c264fe3b-482b-44ec-83a4-73df565663ff.htseq.txt", "bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq.txt", "7261b656-c79c-4581-a503-15b653e2b5d2.htseq.txt", "ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq.txt", "f596eabc-e39a-4e35-9fc6-edade04eb785.htseq.txt", "bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq.txt", "564daa81-cfef-45b6-94a0-3249b2724d9b.htseq.txt", "82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq.txt", "9c52ed00-325f-4664-8873-327bcaa5ea74.htseq.txt", "fabefb10-5546-4017-8ea1-29982a10fb3c.htseq.txt", "32a115cf-570f-4ad9-a123-8e1970062f51.htseq.txt", "05eef9f8-a246-403a-b0be-07d274b6f93a.htseq.txt", "5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq.txt", "43b292be-5d63-4523-a43f-666d20039208.htseq.txt")
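read.delim(COAD_files[1], header = FALSE, nrows = 60) # HTSeq files have no header row
Use edgeR's readDGE() to combine the 60 text files into a single count matrix stored in a DGEList object.
I spoke to Professor Craig on 12/4. It is fine not to change the root directory and to use setwd() to point to the folder on my desktop, since the files were downloaded locally.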
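The "Join datasets" step can be illustrated in base R by merging per-sample tables on the gene ID. This is a rough sketch of the idea, not how readDGE() is implemented. The two small tables reuse the first three genes of the first two samples printed later in this notebook.

```r
# Per-sample count tables (in practice, one per HTSeq file)
s1 <- data.frame(gene_id = c("ENSG00000000005.5", "ENSG00000000419.11", "ENSG00000000457.12"),
                 count = c(47L, 1212L, 1176L))
s2 <- data.frame(gene_id = c("ENSG00000000005.5", "ENSG00000000419.11", "ENSG00000000457.12"),
                 count = c(4L, 710L, 236L))
samples <- list(sample1 = s1, sample2 = s2)

# Rename each "count" column to its sample name, then merge on gene ID
renamed <- Map(function(df, nm) setNames(df, c("gene_id", nm)),
               samples, names(samples))
joined <- Reduce(function(a, b) merge(a, b, by = "gene_id"), renamed)

# Convert to a genes x samples matrix with gene IDs as row names
count_matrix <- as.matrix(joined[, -1])
rownames(count_matrix) <- joined$gene_id
print(count_matrix)
```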
setwd('~/Desktop/COAD_Data/')
library(edgeR)
## Loading required package: limma
x <- readDGE(COAD_files, columns=c(1,2)) #joins my 60 files and creates a dataset
## Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique
class(x)
## [1] "DGEList"
## attr(,"package")
## [1] "edgeR"
dim(x)
## [1] 60487 60
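The dim(x) output above is the kind of quick unit test planned for Week 2. A slightly stricter check is sketched below on a toy matrix. Applied to this notebook's objects, it could be written as stopifnot(ncol(x) == length(COAD_files), !anyNA(x$counts), all(x$counts >= 0)). That application is a suggestion, not code the notebook runs.

```r
# Toy stand-in for the inputs and x$counts (illustrative values)
files <- c("a.htseq.txt", "b.htseq.txt")
count_matrix <- matrix(c(47L, 1212L, 4L, 710L), nrow = 2,
                       dimnames = list(c("ENSG00000000005.5", "ENSG00000000419.11"),
                                       c("a.htseq", "b.htseq")))

# One column per input file, no missing or negative counts, no duplicated gene IDs
stopifnot(ncol(count_matrix) == length(files),
          !anyNA(count_matrix),
          all(count_matrix >= 0),
          !anyDuplicated(rownames(count_matrix)))
```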
names(x) #lists the components of the DGEList
## [1] "samples" "counts"
str(x) #compact display of the structure of x; an alternative to summary() that works well for lists
## Formal class 'DGEList' [package "edgeR"] with 1 slot
## ..@ .Data:List of 2
## .. ..$ :'data.frame': 60 obs. of 4 variables:
## .. .. ..$ files : chr [1:60] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt" "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt" "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt" "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt" ...
## .. .. ..$ group : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
## .. .. ..$ lib.size : num [1:60] 9.02e+07 3.68e+07 4.30e+07 1.12e+08 3.63e+07 ...
## .. .. ..$ norm.factors: num [1:60] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ : num [1:60487, 1:60] 47 1212 1176 121 166 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ Tags : chr [1:60487] "ENSG00000000005.5" "ENSG00000000419.11" "ENSG00000000457.12" "ENSG00000000460.15" ...
## .. .. .. ..$ Samples: chr [1:60] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq" "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq" "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq" "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq" ...
x$samples
samplenames <- substring(colnames(x), 1, nchar(colnames(x))) #note: starting at character 1 keeps the full name, so the names are unchanged
samplenames
## [1] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq"
## [2] "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq"
## [3] "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq"
## [4] "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq"
## [5] "15864159-be88-41c8-bdef-c2c5927cb1a1.htseq"
## [6] "649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq"
## [7] "86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq"
## [8] "28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq"
## [9] "911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq"
## [10] "d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq"
## [11] "f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq"
## [12] "f590941d-19dc-427a-95b6-942c97ea8333.htseq"
## [13] "55aa6d16-3598-42ca-8844-0fe84739ef66.htseq"
## [14] "0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq"
## [15] "9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq"
## [16] "d2587070-cb7d-440d-ae49-52f5077248e6.htseq"
## [17] "7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq"
## [18] "2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq"
## [19] "424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq"
## [20] "934f9dc6-1260-4268-b022-870f1e37dd6f.htseq"
## [21] "0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq"
## [22] "c8544a8a-4352-438d-94d4-3495af2e9a78.htseq"
## [23] "dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq"
## [24] "e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq"
## [25] "debd6982-7c27-42e8-b778-20afcc78a5f3.htseq"
## [26] "17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq"
## [27] "7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq"
## [28] "fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq"
## [29] "abe20df7-6b97-4397-8864-881bac27e92c.htseq"
## [30] "62f84581-4c7d-4c8e-835c-9304bcec3106.htseq"
## [31] "3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq"
## [32] "087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq"
## [33] "c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq"
## [34] "13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq"
## [35] "6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq"
## [36] "8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq"
## [37] "168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq"
## [38] "0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq"
## [39] "4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq"
## [40] "7fb73a84-867a-4c28-aa02-93068efffb7b.htseq"
## [41] "b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq"
## [42] "f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq"
## [43] "f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq"
## [44] "e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq"
## [45] "a26d49db-2309-46a0-a3ed-275378d484e7.htseq"
## [46] "a3f88a5d-7169-465b-bb80-e5999590681c.htseq"
## [47] "c264fe3b-482b-44ec-83a4-73df565663ff.htseq"
## [48] "bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq"
## [49] "7261b656-c79c-4581-a503-15b653e2b5d2.htseq"
## [50] "ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq"
## [51] "f596eabc-e39a-4e35-9fc6-edade04eb785.htseq"
## [52] "bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq"
## [53] "564daa81-cfef-45b6-94a0-3249b2724d9b.htseq"
## [54] "82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq"
## [55] "9c52ed00-325f-4664-8873-327bcaa5ea74.htseq"
## [56] "fabefb10-5546-4017-8ea1-29982a10fb3c.htseq"
## [57] "32a115cf-570f-4ad9-a123-8e1970062f51.htseq"
## [58] "05eef9f8-a246-403a-b0be-07d274b6f93a.htseq"
## [59] "5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq"
## [60] "43b292be-5d63-4523-a43f-666d20039208.htseq"
colnames(x) <- samplenames
group <- as.factor(c("CMS", "CMS", "CMS", "CMS", "CMS", "CMS",
"CMS", "CMS", "CMS", "CMS", "CMS", "CMS",
"CMS", "CMS", "CMS", "CMS", "CMS", "CMS",
"CMS", "CMS", "CMS", "CMS", "CMS", "CMS",
"CMS", "CMS", "CMS", "CMS", "CMS", "CMS",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA", "ADENOCARCINOMA",
"ADENOCARCINOMA", "ADENOCARCINOMA"))
x$samples$group <- group
x$samples
DF<-x$samples #for my own visualization purposes
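The group assignment can be checked quickly. factor() sorts levels alphabetically, so ADENOCARCINOMA becomes the first level. In a design such as model.matrix(~ group), that first level is the baseline; relevel(group, ref = "CMS") would change it. A minimal sketch:

```r
group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 30))

print(table(group))   # expect 30 samples in each subtype
print(levels(group))  # first level is the baseline in model.matrix(~ group)
```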
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Homo.sapiens"); library(Homo.sapiens)
install.packages("gsubfn"); library(gsubfn) # package names must be quoted strings
First install Homo.sapiens. Then use a regular expression to remove the version suffix (the decimal point and the digits after it) from all 60,487 Ensembl gene IDs, so that they match the unversioned ENSEMBL keys used by Homo.sapiens.
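The version-stripping pattern used below can be tried on a few IDs from this dataset. The sketch also shows why removing only the decimal point is not enough: the version digits would be glued onto the ID.

```r
ids <- c("ENSG00000000005.5", "ENSG00000000419.11", "ENSG00000000003.13")

# Same pattern as the notebook: drop a trailing "." plus digits (the version)
unversioned <- gsub("\\.[0-9]*$", "", ids)
print(unversioned)

# Removing only the dot keeps the version digits, e.g. "ENSG000000000055"
dot_only <- sub("[.]", "", ids)
print(dot_only)

# Check that no two rows collapse to the same ID after stripping
stopifnot(!anyDuplicated(unversioned))
```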
library(Homo.sapiens)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following object is masked from 'package:limma':
##
## plotMA
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: OrganismDbi
## Loading required package: GenomicFeatures
## Loading required package: GenomeInfoDb
## Loading required package: GenomicRanges
## Loading required package: GO.db
##
## Loading required package: org.Hs.eg.db
##
## Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
#library(stringr)
library(gsubfn)
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
## dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
## Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
## Reason: image not found
## Could not load tcltk. Will use slower R code instead.
geneid <- rownames(x)
#geneid_test <- c("ENSG00000000005",
# "ENSG00000000419",
# "ENSG00000000457",
# "ENSG00000000938")
#geneid <- str_remove(geneid, "[.]") removes decimals only
geneid <- gsub("\\.[0-9]*$", "", geneid) #strip the version suffix, e.g. ENSG00000000005.5 -> ENSG00000000005
genes <- select(Homo.sapiens, keys=geneid, columns=c("SYMBOL", "TXCHROM"),
keytype="ENSEMBL")
## 'select()' returned 1:many mapping between keys and columns
head(genes)
genes <- genes[!duplicated(genes$ENSEMBL),]
x$genes <- genes
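The assignment x$genes <- genes relies on the de-duplicated select() output having exactly one row per count-matrix row, in the same order. A more defensive approach aligns the annotation with match(). A minimal sketch with a hypothetical annotation table follows; ENSG99999999999 is a made-up ID standing in for a gene with no annotation.

```r
# Hypothetical select()-style output: 1:many and not in count-matrix order
anno <- data.frame(ENSEMBL = c("ENSG00000000419", "ENSG00000000005", "ENSG00000000005"),
                   SYMBOL  = c("DPM1", "TNMD", "TNMD"),
                   stringsAsFactors = FALSE)
row_ids <- c("ENSG00000000005", "ENSG00000000419", "ENSG99999999999")

# Keep the first mapping per gene, then align rows to the count matrix order
anno <- anno[!duplicated(anno$ENSEMBL), ]
aligned <- anno[match(row_ids, anno$ENSEMBL), ]   # NA row where a gene is unmapped
aligned$ENSEMBL <- row_ids                        # keep the ID even when unmapped
rownames(aligned) <- NULL

stopifnot(nrow(aligned) == length(row_ids))
print(aligned)
```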
x
## An object of class "DGEList"
## $samples
## files
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt
## group lib.size norm.factors
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq CMS 90179803 1
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq CMS 36807306 1
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq CMS 42963355 1
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq CMS 111649651 1
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq CMS 36349055 1
## 55 more rows ...
##
## $counts
## Samples
## Tags 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
## ENSG00000000005.5 47
## ENSG00000000419.11 1212
## ENSG00000000457.12 1176
## ENSG00000000460.15 121
## ENSG00000000938.11 166
## Samples
## Tags bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
## ENSG00000000005.5 4
## ENSG00000000419.11 710
## ENSG00000000457.12 236
## ENSG00000000460.15 211
## ENSG00000000938.11 140
## Samples
## Tags 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 702
## ENSG00000000457.12 552
## ENSG00000000460.15 320
## ENSG00000000938.11 93
## Samples
## Tags 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
## ENSG00000000005.5 10
## ENSG00000000419.11 432
## ENSG00000000457.12 803
## ENSG00000000460.15 605
## ENSG00000000938.11 473
## Samples
## Tags 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
## ENSG00000000005.5 3
## ENSG00000000419.11 641
## ENSG00000000457.12 311
## ENSG00000000460.15 182
## ENSG00000000938.11 130
## Samples
## Tags 649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
## ENSG00000000005.5 14
## ENSG00000000419.11 1151
## ENSG00000000457.12 246
## ENSG00000000460.15 202
## ENSG00000000938.11 52
## Samples
## Tags 86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
## ENSG00000000005.5 14
## ENSG00000000419.11 3675
## ENSG00000000457.12 1901
## ENSG00000000460.15 1436
## ENSG00000000938.11 862
## Samples
## Tags 28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
## ENSG00000000005.5 37
## ENSG00000000419.11 2278
## ENSG00000000457.12 835
## ENSG00000000460.15 697
## ENSG00000000938.11 687
## Samples
## Tags 911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 1934
## ENSG00000000457.12 745
## ENSG00000000460.15 464
## ENSG00000000938.11 138
## Samples
## Tags d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
## ENSG00000000005.5 3
## ENSG00000000419.11 707
## ENSG00000000457.12 366
## ENSG00000000460.15 206
## ENSG00000000938.11 80
## Samples
## Tags f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
## ENSG00000000005.5 10
## ENSG00000000419.11 1282
## ENSG00000000457.12 624
## ENSG00000000460.15 267
## ENSG00000000938.11 979
## Samples
## Tags f590941d-19dc-427a-95b6-942c97ea8333.htseq
## ENSG00000000005.5 0
## ENSG00000000419.11 727
## ENSG00000000457.12 356
## ENSG00000000460.15 240
## ENSG00000000938.11 378
## Samples
## Tags 55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
## ENSG00000000005.5 1
## ENSG00000000419.11 2949
## ENSG00000000457.12 892
## ENSG00000000460.15 823
## ENSG00000000938.11 389
## Samples
## Tags 0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
## ENSG00000000005.5 10
## ENSG00000000419.11 219
## ENSG00000000457.12 95
## ENSG00000000460.15 106
## ENSG00000000938.11 320
## Samples
## Tags 9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
## ENSG00000000005.5 7
## ENSG00000000419.11 1503
## ENSG00000000457.12 566
## ENSG00000000460.15 389
## ENSG00000000938.11 235
## Samples
## Tags d2587070-cb7d-440d-ae49-52f5077248e6.htseq
## ENSG00000000005.5 124
## ENSG00000000419.11 2070
## ENSG00000000457.12 886
## ENSG00000000460.15 283
## ENSG00000000938.11 2117
## Samples
## Tags 7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 604
## ENSG00000000457.12 215
## ENSG00000000460.15 255
## ENSG00000000938.11 228
## Samples
## Tags 2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
## ENSG00000000005.5 6
## ENSG00000000419.11 518
## ENSG00000000457.12 215
## ENSG00000000460.15 119
## ENSG00000000938.11 159
## Samples
## Tags 424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
## ENSG00000000005.5 5
## ENSG00000000419.11 2300
## ENSG00000000457.12 1445
## ENSG00000000460.15 831
## ENSG00000000938.11 2183
## Samples
## Tags 934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
## ENSG00000000005.5 11
## ENSG00000000419.11 627
## ENSG00000000457.12 518
## ENSG00000000460.15 401
## ENSG00000000938.11 227
## Samples
## Tags 0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 1012
## ENSG00000000457.12 468
## ENSG00000000460.15 187
## ENSG00000000938.11 534
## Samples
## Tags c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
## ENSG00000000005.5 433
## ENSG00000000419.11 3532
## ENSG00000000457.12 771
## ENSG00000000460.15 449
## ENSG00000000938.11 99
## Samples
## Tags dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
## ENSG00000000005.5 6
## ENSG00000000419.11 3445
## ENSG00000000457.12 840
## ENSG00000000460.15 523
## ENSG00000000938.11 891
## Samples
## Tags e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
## ENSG00000000005.5 0
## ENSG00000000419.11 757
## ENSG00000000457.12 234
## ENSG00000000460.15 232
## ENSG00000000938.11 333
## Samples
## Tags debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
## ENSG00000000005.5 0
## ENSG00000000419.11 1519
## ENSG00000000457.12 869
## ENSG00000000460.15 317
## ENSG00000000938.11 526
## Samples
## Tags 17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
## ENSG00000000005.5 11
## ENSG00000000419.11 1875
## ENSG00000000457.12 650
## ENSG00000000460.15 325
## ENSG00000000938.11 742
## Samples
## Tags 7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
## ENSG00000000005.5 11
## ENSG00000000419.11 757
## ENSG00000000457.12 332
## ENSG00000000460.15 237
## ENSG00000000938.11 139
## Samples
## Tags fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
## ENSG00000000005.5 8
## ENSG00000000419.11 959
## ENSG00000000457.12 307
## ENSG00000000460.15 246
## ENSG00000000938.11 324
## Samples
## Tags abe20df7-6b97-4397-8864-881bac27e92c.htseq
## ENSG00000000005.5 3
## ENSG00000000419.11 392
## ENSG00000000457.12 341
## ENSG00000000460.15 335
## ENSG00000000938.11 342
## Samples
## Tags 62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
## ENSG00000000005.5 1
## ENSG00000000419.11 1144
## ENSG00000000457.12 358
## ENSG00000000460.15 320
## ENSG00000000938.11 199
## Samples
## Tags 3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
## ENSG00000000005.5 1
## ENSG00000000419.11 2901
## ENSG00000000457.12 731
## ENSG00000000460.15 494
## ENSG00000000938.11 1845
## Samples
## Tags 087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
## ENSG00000000005.5 36
## ENSG00000000419.11 3725
## ENSG00000000457.12 1188
## ENSG00000000460.15 741
## ENSG00000000938.11 89
## Samples
## Tags c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 378
## ENSG00000000457.12 171
## ENSG00000000460.15 230
## ENSG00000000938.11 440
## Samples
## Tags 13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
## ENSG00000000005.5 100
## ENSG00000000419.11 2292
## ENSG00000000457.12 831
## ENSG00000000460.15 874
## ENSG00000000938.11 489
## Samples
## Tags 6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
## ENSG00000000005.5 31
## ENSG00000000419.11 4884
## ENSG00000000457.12 765
## ENSG00000000460.15 628
## ENSG00000000938.11 284
## Samples
## Tags 8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
## ENSG00000000005.5 4
## ENSG00000000419.11 1593
## ENSG00000000457.12 575
## ENSG00000000460.15 368
## ENSG00000000938.11 376
## Samples
## Tags 168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
## ENSG00000000005.5 76
## ENSG00000000419.11 1247
## ENSG00000000457.12 274
## ENSG00000000460.15 239
## ENSG00000000938.11 158
## Samples
## Tags 0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
## ENSG00000000005.5 3
## ENSG00000000419.11 1853
## ENSG00000000457.12 673
## ENSG00000000460.15 437
## ENSG00000000938.11 271
## Samples
## Tags 4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
## ENSG00000000005.5 1
## ENSG00000000419.11 506
## ENSG00000000457.12 270
## ENSG00000000460.15 184
## ENSG00000000938.11 918
## Samples
## Tags 7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
## ENSG00000000005.5 19
## ENSG00000000419.11 1464
## ENSG00000000457.12 271
## ENSG00000000460.15 303
## ENSG00000000938.11 93
## Samples
## Tags b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
## ENSG00000000005.5 1
## ENSG00000000419.11 1331
## ENSG00000000457.12 743
## ENSG00000000460.15 422
## ENSG00000000938.11 437
## Samples
## Tags f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
## ENSG00000000005.5 72
## ENSG00000000419.11 4749
## ENSG00000000457.12 877
## ENSG00000000460.15 536
## ENSG00000000938.11 446
## Samples
## Tags f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
## ENSG00000000005.5 7
## ENSG00000000419.11 1954
## ENSG00000000457.12 422
## ENSG00000000460.15 283
## ENSG00000000938.11 97
## Samples
## Tags e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
## ENSG00000000005.5 252
## ENSG00000000419.11 2538
## ENSG00000000457.12 581
## ENSG00000000460.15 521
## ENSG00000000938.11 208
## Samples
## Tags a26d49db-2309-46a0-a3ed-275378d484e7.htseq
## ENSG00000000005.5 26
## ENSG00000000419.11 3001
## ENSG00000000457.12 875
## ENSG00000000460.15 462
## ENSG00000000938.11 39
## Samples
## Tags a3f88a5d-7169-465b-bb80-e5999590681c.htseq
## ENSG00000000005.5 4
## ENSG00000000419.11 2746
## ENSG00000000457.12 732
## ENSG00000000460.15 542
## ENSG00000000938.11 888
## Samples
## Tags c264fe3b-482b-44ec-83a4-73df565663ff.htseq
## ENSG00000000005.5 104
## ENSG00000000419.11 5777
## ENSG00000000457.12 684
## ENSG00000000460.15 634
## ENSG00000000938.11 304
## Samples
## Tags bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
## ENSG00000000005.5 10
## ENSG00000000419.11 980
## ENSG00000000457.12 193
## ENSG00000000460.15 309
## ENSG00000000938.11 76
## Samples
## Tags 7261b656-c79c-4581-a503-15b653e2b5d2.htseq
## ENSG00000000005.5 2
## ENSG00000000419.11 2981
## ENSG00000000457.12 658
## ENSG00000000460.15 845
## ENSG00000000938.11 459
## Samples
## Tags ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
## ENSG00000000005.5 78
## ENSG00000000419.11 1846
## ENSG00000000457.12 1368
## ENSG00000000460.15 415
## ENSG00000000938.11 283
## Samples
## Tags f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
## ENSG00000000005.5 12
## ENSG00000000419.11 478
## ENSG00000000457.12 98
## ENSG00000000460.15 95
## ENSG00000000938.11 112
## Samples
## Tags bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
## ENSG00000000005.5 4
## ENSG00000000419.11 850
## ENSG00000000457.12 277
## ENSG00000000460.15 315
## ENSG00000000938.11 67
## Samples
## Tags 564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
## ENSG00000000005.5 19
## ENSG00000000419.11 202
## ENSG00000000457.12 117
## ENSG00000000460.15 61
## ENSG00000000938.11 91
## Samples
## Tags 82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
## ENSG00000000005.5 26
## ENSG00000000419.11 5155
## ENSG00000000457.12 728
## ENSG00000000460.15 626
## ENSG00000000938.11 46
## Samples
## Tags 9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
## ENSG00000000005.5 24
## ENSG00000000419.11 557
## ENSG00000000457.12 454
## ENSG00000000460.15 175
## ENSG00000000938.11 70
## Samples
## Tags fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
## ENSG00000000005.5 15
## ENSG00000000419.11 4147
## ENSG00000000457.12 679
## ENSG00000000460.15 764
## ENSG00000000938.11 477
## Samples
## Tags 32a115cf-570f-4ad9-a123-8e1970062f51.htseq
## ENSG00000000005.5 37
## ENSG00000000419.11 2843
## ENSG00000000457.12 1259
## ENSG00000000460.15 869
## ENSG00000000938.11 438
## Samples
## Tags 05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
## ENSG00000000005.5 42
## ENSG00000000419.11 844
## ENSG00000000457.12 137
## ENSG00000000460.15 133
## ENSG00000000938.11 24
## Samples
## Tags 5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
## ENSG00000000005.5 179
## ENSG00000000419.11 1307
## ENSG00000000457.12 571
## ENSG00000000460.15 307
## ENSG00000000938.11 136
## Samples
## Tags 43b292be-5d63-4523-a43f-666d20039208.htseq
## ENSG00000000005.5 140
## ENSG00000000419.11 1101
## ENSG00000000457.12 407
## ENSG00000000460.15 191
## ENSG00000000938.11 85
## 60482 more rows ...
##
## $genes
## ENSEMBL SYMBOL TXCHROM
## 1 ENSG00000000005 TNMD chrX
## 2 ENSG00000000419 DPM1 chr20
## 3 ENSG00000000457 SCYL3 chr1
## 4 ENSG00000000460 C1orf112 chr1
## 5 ENSG00000000938 FGR chr1
## 60482 more rows ...
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)
L <- mean(x$samples$lib.size) * 1e-6
M <- median(x$samples$lib.size) * 1e-6
c(L, M)
## [1] 64.23804 58.76902
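For reference, `cpm(x, log=TRUE)` adds a small prior count before taking logs so that zero counts stay finite. The sketch below is a base-R approximation of that calculation. It assumes edgeR's default `prior.count = 2`, scaled in proportion to each library size. The helper name and toy matrix are hypothetical. It is illustrative only: edgeR's own `cpm()` should be used for the real analysis, and it also multiplies library sizes by the normalisation factors.

```r
# Approximate log-CPM with a library-size-scaled prior count (sketch, not edgeR's code)
log_cpm_sketch <- function(counts, lib.size = colSums(counts), prior.count = 2) {
  # each sample's prior is proportional to its library size
  prior.scaled <- prior.count * lib.size / mean(lib.size)
  # library sizes are inflated to account for the added prior
  lib.adj <- lib.size + 2 * prior.scaled
  # t() so the per-sample vectors recycle across samples, not genes
  log2(t((t(counts) + prior.scaled) / lib.adj) * 1e6)
}

counts <- matrix(c(0, 10, 250,
                   5,  0, 100,
                   0, 40,   0), nrow = 3, byrow = TRUE)
lib.size <- c(1e6, 2e6, 3e6)   # hypothetical library sizes
lcpm.toy <- log_cpm_sketch(counts, lib.size)
lcpm.toy
```

Because the prior is proportional to library size, every zero count maps to the same log-CPM in every sample. This is why each column of `summary(lcpm)` starts at the same minimum.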
summary(lcpm)
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5302
## 3rd Qu.:-0.7721
## Max. :17.9542
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.573
## 3rd Qu.:-1.020
## Max. :18.478
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.548
## 3rd Qu.:-1.079
## Max. :18.160
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-3.4138
## Mean :-2.3687
## 3rd Qu.:-0.6434
## Max. :19.0973
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.406
## 3rd Qu.:-0.591
## Max. :18.390
## 649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4792
## 3rd Qu.:-0.8821
## Max. :18.3537
## 86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.6634
## Mean :-2.4026
## 3rd Qu.:-0.7838
## Max. :18.3240
## 28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.6140
## Mean :-2.4599
## 3rd Qu.:-0.7757
## Max. :18.0427
## 911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5570
## Mean :-2.4067
## 3rd Qu.:-0.6097
## Max. :17.9832
## d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4745
## 3rd Qu.:-0.6094
## Max. :18.1525
## f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
## Min. :-5.00536
## 1st Qu.:-5.00536
## Median :-4.29934
## Mean :-2.20443
## 3rd Qu.:-0.06656
## Max. :17.62822
## f590941d-19dc-427a-95b6-942c97ea8333.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.3872
## 3rd Qu.:-0.5109
## Max. :18.2714
## 55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-4.694
## Mean :-2.678
## 3rd Qu.:-1.498
## Max. :19.004
## 0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4202
## 3rd Qu.:-0.3006
## Max. :18.0102
## 9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.2625
## Mean :-2.3000
## 3rd Qu.:-0.2363
## Max. :18.3430
## d2587070-cb7d-440d-ae49-52f5077248e6.htseq
## Min. :-5.00536
## 1st Qu.:-5.00536
## Median :-4.55846
## Mean :-2.21997
## 3rd Qu.:-0.02803
## Max. :17.69040
## 7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-4.644
## Mean :-2.912
## 3rd Qu.:-1.374
## Max. :18.862
## 2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.3752
## 3rd Qu.:-0.3785
## Max. :17.9383
## 424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.1417
## Mean :-2.4586
## 3rd Qu.:-0.7395
## Max. :19.0773
## 934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-3.9604
## Mean :-2.5190
## 3rd Qu.:-0.7787
## Max. :18.7200
## 0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.310
## 3rd Qu.:-0.129
## Max. :17.739
## c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5364
## Mean :-2.4309
## 3rd Qu.:-0.6687
## Max. :17.9586
## dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.6905
## Mean :-2.3374
## 3rd Qu.:-0.2584
## Max. :18.1117
## e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5190
## 3rd Qu.:-0.8769
## Max. :18.2798
## debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-4.529
## Mean :-2.669
## 3rd Qu.:-1.269
## Max. :19.271
## 17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.4684
## Mean :-2.2833
## 3rd Qu.:-0.1722
## Max. :18.0424
## 7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.315
## 3rd Qu.:-0.347
## Max. :18.253
## fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4626
## 3rd Qu.:-0.7878
## Max. :18.5223
## abe20df7-6b97-4397-8864-881bac27e92c.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4819
## 3rd Qu.:-0.6892
## Max. :18.2755
## 62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4094
## 3rd Qu.:-0.6332
## Max. :18.1225
## 3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.4363
## Mean :-2.3126
## 3rd Qu.:-0.3237
## Max. :17.6880
## 087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5625
## Mean :-2.4787
## 3rd Qu.:-0.9039
## Max. :18.0167
## c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5007
## 3rd Qu.:-0.6715
## Max. :18.2731
## 13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.6001
## Mean :-2.2739
## 3rd Qu.:-0.3063
## Max. :17.8027
## 6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5434
## Mean :-2.3342
## 3rd Qu.:-0.4192
## Max. :17.8651
## 8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.4560
## Mean :-2.4084
## 3rd Qu.:-0.7874
## Max. :17.8969
## 168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4576
## 3rd Qu.:-0.6493
## Max. :18.3474
## 0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5027
## Mean :-2.3667
## 3rd Qu.:-0.5842
## Max. :17.8840
## 4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.3788
## 3rd Qu.:-0.5019
## Max. :18.2679
## 7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5250
## 3rd Qu.:-0.8667
## Max. :18.4535
## b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.4286
## Mean :-2.3446
## 3rd Qu.:-0.5695
## Max. :17.7747
## f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5018
## Mean :-2.3837
## 3rd Qu.:-0.5533
## Max. :18.0439
## f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.374
## 3rd Qu.:-0.463
## Max. :18.269
## e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.494
## 3rd Qu.:-0.882
## Max. :18.606
## a26d49db-2309-46a0-a3ed-275378d484e7.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.505
## 3rd Qu.:-1.076
## Max. :17.950
## a3f88a5d-7169-465b-bb80-e5999590681c.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.0762
## Mean :-1.9487
## 3rd Qu.: 0.8739
## Max. :18.1564
## c264fe3b-482b-44ec-83a4-73df565663ff.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5239
## Mean :-2.2732
## 3rd Qu.:-0.3071
## Max. :17.8054
## bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4576
## 3rd Qu.:-0.7238
## Max. :18.4775
## 7261b656-c79c-4581-a503-15b653e2b5d2.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.3392
## 3rd Qu.:-0.4668
## Max. :17.7787
## ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.6205
## Mean :-2.3583
## 3rd Qu.:-0.4248
## Max. :17.9569
## f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.635
## 3rd Qu.:-0.981
## Max. :18.298
## bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5172
## 3rd Qu.:-0.9961
## Max. :18.2876
## 564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.5218
## 3rd Qu.:-0.6848
## Max. :18.2312
## 82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-4.5039
## Mean :-2.4147
## 3rd Qu.:-0.7359
## Max. :17.8176
## 9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.3284
## 3rd Qu.:-0.2542
## Max. :18.0456
## fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-4.399
## Mean :-2.282
## 3rd Qu.:-0.272
## Max. :17.898
## 32a115cf-570f-4ad9-a123-8e1970062f51.htseq
## Min. :-5.00536
## 1st Qu.:-5.00536
## Median :-4.51407
## Mean :-2.18955
## 3rd Qu.:-0.05016
## Max. :17.82851
## 05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
## Min. :-5.005
## 1st Qu.:-5.005
## Median :-5.005
## Mean :-2.610
## 3rd Qu.:-1.036
## Max. :18.150
## 5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4734
## 3rd Qu.:-0.6384
## Max. :18.7800
## 43b292be-5d63-4523-a43f-666d20039208.htseq
## Min. :-5.0054
## 1st Qu.:-5.0054
## Median :-5.0054
## Mean :-2.4267
## 3rd Qu.:-0.6233
## Max. :18.3746
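The repeated minimum of -5.0054 can be checked against the library sizes printed earlier. This assumes the prior count of 2, and that the prior's effect on the library size is negligible. Under those assumptions, a zero count has a log-CPM of about log2(2/L), where L is the mean library size in millions. The same reasoning gives the log-CPM threshold `log2(10/M + 2/L)` used for the density plots. That threshold is roughly 10 reads at the median library size M.

```r
L <- 64.23804  # mean library size in millions, from c(L, M) above
M <- 58.76902  # median library size in millions, from c(L, M) above
zero.floor  <- log2(2 / L)           # log-CPM of a zero count
lcpm.cutoff <- log2(10 / M + 2 / L)  # log-CPM of ~10 reads at the median library size
round(c(zero.floor = zero.floor, lcpm.cutoff = lcpm.cutoff), 4)
# approximately -5.0054 and -2.3126
```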
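The TRUE column is meant to count genes with zero counts in every sample, i.e. genes that are not expressed in any sample. Note that `== 9` comes from the vignette, which has 9 samples. With 60 samples here, the output below actually counts genes that are zero in exactly 9 samples. Counting genes that are zero everywhere requires `== ncol(x)`.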
table(rowSums(x$counts==0)==9)
##
## FALSE TRUE
## 60074 413
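A minimal sketch of counting genes with zero counts in every sample, using a small hypothetical matrix. Comparing against `ncol()` keeps the check correct whatever the number of samples (60 here, 9 in the vignette).

```r
counts <- matrix(c(0, 0, 0,
                   5, 0, 2,
                   0, 0, 0,
                   1, 1, 1), nrow = 4, byrow = TRUE)
# TRUE when a gene has a zero in every column
zero.everywhere <- rowSums(counts == 0) == ncol(counts)
table(zero.everywhere)
```

Applied to the notebook's object, the same idea would be `table(rowSums(x$counts == 0) == ncol(x))`.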
keep.exprs <- filterByExpr(x, group=group)
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)
## [1] 19105 60
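`filterByExpr()` keeps genes that reach a minimum CPM in enough samples. The function below is a simplified base-R sketch of that rule. It assumes the defaults documented for edgeR 3.x: `min.count = 10`, `min.total.count = 15`, `large.n = 10` and `min.prop = 0.7`. The helper name is hypothetical, and the real implementation has further details such as design-matrix handling, so treat this only as an illustration of the logic. With two groups of 30 samples, the smallest group has 30 samples. A gene must therefore pass the CPM cutoff in about 10 + 0.7 * 20 = 24 samples.

```r
# Simplified sketch of the filterByExpr() rule (hypothetical helper, not edgeR's code)
filter_by_expr_sketch <- function(counts, group, lib.size = colSums(counts),
                                  min.count = 10, min.total.count = 15,
                                  large.n = 10, min.prop = 0.7) {
  # the smallest group sets how many samples must pass the CPM cutoff
  n <- min(table(group))
  if (n > large.n) n <- large.n + (n - large.n) * min.prop
  # CPM equivalent of min.count reads at the median library size
  cpm.cutoff <- min.count / median(lib.size) * 1e6
  cpm.mat <- t(t(counts) / lib.size * 1e6)
  tol <- 1e-14
  keep.cpm <- rowSums(cpm.mat >= cpm.cutoff) >= n - tol
  keep.total <- rowSums(counts) >= min.total.count - tol
  keep.cpm & keep.total
}

counts <- rbind(high = rep(100, 6),
                low  = rep(1, 6),
                half = c(100, 100, 100, 0, 0, 0))
group <- factor(rep(c("ADENOCARCINOMA", "CMS"), each = 3))
keep <- filter_by_expr_sketch(counts, group, lib.size = rep(1e6, 6))
keep
```

The "half" gene is kept because it passes the cutoff in all samples of one group. This is the behaviour that lets group-specific genes survive filtering.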
There is a sample that is a potential outlier (green line). It could be removed from further analysis, but I spoke to Professor Craig on 12/4 and we agreed to leave it in, since the vignette includes a normalisation step.
I spoke to Professor Craig on 12/4 and we agreed to stop working on this issue. The "Paired" palette only offers 12 colors, so the color scheme repeats from the 13th sample onward. I tried increasing the number of available colors with colorRampPalette but was unsuccessful.
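One way to get a distinct color per sample is to interpolate the 12 "Paired" colors up to the number of samples. Base R's `colorRampPalette()` from grDevices can do this. In this sketch, the hex codes are the 12 colors that `RColorBrewer::brewer.pal(12, "Paired")` returns, written out so the example runs without RColorBrewer.

This may also explain part of the trouble. `brewer.pal(nsamples, "Paired")` returns only 12 colors, so in the density-plot loop `col[i]` is `NA` for samples 13 to 60. R draws `NA`-colored lines as invisible rather than repeating colors. The boxplots recycle `col`, which is where the repeating colors appear.

```r
# The 12 colors of RColorBrewer's "Paired" palette
paired12 <- c("#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
              "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928")
nsamples <- 60
# colorRampPalette() returns a function; calling it with n interpolates n colors
col <- colorRampPalette(paired12)(nsamples)
length(col)
head(col)
```

In the notebook itself this would be `col <- colorRampPalette(brewer.pal(12, "Paired"))(nsamples)`. Neighbouring samples get similar shades, so a legend is still needed to tell them apart.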
lcpm.cutoff <- log2(10/M + 2/L)
library(RColorBrewer)
#library(colorRamps)
nsamples <- ncol(x)
col <- brewer.pal(nsamples, "Paired") #gives the warning: n too large, allowed maximum for palette Paired is 12; only 12 colors are returned
## Warning in brewer.pal(nsamples, "Paired"): n too large, allowed maximum for palette Paired is 12
## Returning the palette you asked for with that many colors
#nb.cols <- nsamples
#col <- colorRampPalette(brewer.pal(12, "Paired"))(nb.cols) #colorRampPalette is a constructor function that builds a palette with an arbitrary number of colors by interpolating an existing palette
par(mfrow=c(1,2)) #1 row, 2 columns
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.26), las=2, main="", xlab="")
title(main="A. Raw data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")
lcpm <- cpm(x, log=TRUE)
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.26), las=2, main="", xlab="")
title(main="B. Filtered data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")
x <- calcNormFactors(x, method = "TMM")
x$samples$norm.factors
## [1] 0.8247877 0.8701067 0.9744718 0.3338411 1.0672340 0.9853437 0.8355374
## [8] 1.0685271 1.1041480 1.1621714 1.3190864 1.1616288 0.5656024 1.0079779
## [15] 1.1311621 1.3455788 0.2548806 1.1728192 0.5164198 0.4158854 1.1552465
## [22] 1.0977806 1.1538364 1.0380711 0.5054850 1.2962443 1.1705252 0.9900169
## [29] 0.9259593 1.1222135 1.2985756 1.0598233 0.9471138 1.3391984 1.3043419
## [36] 1.1144424 1.0697018 1.1921660 1.1054413 0.9399911 1.2384865 1.2243347
## [43] 1.0760378 0.9933192 1.1164186 1.4176459 1.3689476 0.8989757 1.3130426
## [50] 1.0789261 0.7851273 1.0110826 0.9751891 1.1994225 1.2667583 1.3476310
## [57] 1.4869359 1.0917179 0.8579364 1.0468322
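The TMM factors are not applied to the counts directly. Instead they rescale each library size, and downstream functions work with the effective library size `lib.size * norm.factors`. The sketch below uses the library sizes and factors of the first five samples, as printed in `v$targets` further down. It also illustrates re-centring raw factors so that they multiply to one (geometric mean of 1). I assume this matches what `calcNormFactors()` does internally; the raw factors used for that part are hypothetical.

```r
lib.size     <- c(74292479, 31987425, 41835745, 37079196, 38748737)
norm.factors <- c(0.8247877, 0.8701067, 0.9744718, 0.3338411, 1.0672340)
# effective library sizes used by cpm(), voom() etc.
eff.lib <- lib.size * norm.factors
round(eff.lib / 1e6, 2)   # in millions

# re-centring raw factors so that their geometric mean is 1
raw <- c(0.5, 1, 2, 4)    # hypothetical unscaled factors
scaled <- raw / exp(mean(log(raw)))
prod(scaled)
```

The fourth sample's factor of about 0.33 shrinks its effective library to roughly a third of its raw size. This is one way an unusual sample shows up at the normalisation step.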
x2 <- x
x2$samples$norm.factors <- 1
x2$counts[,1] <- ceiling(x2$counts[,1]*0.05)
x2$counts[,2] <- x2$counts[,2]*5
par(mfrow=c(1,1)) #makes boxplot look less cramped
lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="A. Example: Unnormalised data",ylab="Log-cpm")
x2 <- calcNormFactors(x2)
x2$samples$norm.factors
## [1] 0.04889808 4.36314201 0.99857375 0.36344409 1.08814631 1.00048470
## [7] 0.91165862 1.07196030 1.10500025 1.21054683 1.28772624 1.16063702
## [13] 0.60882746 1.04998828 1.13203522 1.31787908 0.28023587 1.16142488
## [19] 0.53329090 0.46494919 1.14274272 1.09741972 1.16406362 1.04925131
## [25] 0.51082602 1.30832124 1.17758030 1.00128465 0.97668711 1.11752935
## [31] 1.28315406 1.05679584 0.99263908 1.36510564 1.33873108 1.14293994
## [37] 1.10193770 1.21035319 1.10148876 0.97629109 1.25905436 1.25325781
## [43] 1.12605092 1.02503014 1.11956858 1.41609997 1.38704078 0.91182127
## [49] 1.31684398 1.09702295 0.85028060 1.03939580 1.01768934 1.20164571
## [55] 1.27684217 1.35509884 1.51897054 1.10244902 0.85428854 1.05218206
Normalisation forces the sample distributions to line up. This may not be ideal here, because there is a potential outlier.
lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="B. Example: Normalised data",ylab="Log-cpm")
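I spoke to Professor Craig on 12/4, and it is OK to ignore the warning below, since I am only comparing 2 subsets of colon cancer. "Set1" returns at least 3 colors, so with only 2 groups brewer.pal() warns and returns 3 colors anyway. The vignette also colors samples by sequencing lane, but my data do not include lane information, so those lines are commented out.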
lcpm <- cpm(x, log=TRUE)
par(mfrow=c(1,1)) #1 row, 1 column
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Set1") #n= number of different colors in a palette with the min being 3
## Warning in brewer.pal(nlevels(col.group), "Set1"): minimal value for n is 3, returning requested palette with 3 different levels
col.group <- as.character(col.group)
#col.lane <- lane did not have lanes for my data
#levels(col.lane) <- brewer.pal(nlevels(col.lane), "Set2")
#col.lane <- as.character(col.lane)
plotMDS(lcpm, labels=group, col=col.group)
title(main="A. Sample groups")
#plotMDS(lcpm, labels=lane, col=col.lane, dim=c(3,4))
#title(main="B. Sequencing lanes")
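`plotMDS()` places samples using "leading log-fold-change" distances. For each pair of samples, it takes the root-mean-square of the largest log-fold-changes between them (the top 500 genes by default), and then applies classical multidimensional scaling. The function below is a simplified base-R sketch of that idea using `cmdscale()`. The helper name and toy data are hypothetical, and the sketch ignores details of limma's implementation.

```r
mds_sketch <- function(lcpm, top = 500, k = 2) {
  n <- ncol(lcpm)
  top <- min(top, nrow(lcpm))
  d <- matrix(0, n, n)
  for (i in 2:n) {
    for (j in 1:(i - 1)) {
      sq <- (lcpm[, i] - lcpm[, j])^2
      # RMS of the `top` largest squared log-fold-changes for this pair
      d[i, j] <- d[j, i] <- sqrt(mean(sort(sq, decreasing = TRUE)[1:top]))
    }
  }
  cmdscale(as.dist(d), k = k)   # classical MDS on the distance matrix
}

set.seed(1)
lcpm.toy <- matrix(rnorm(100 * 6, mean = 5, sd = 0.1), nrow = 100)
lcpm.toy[1:50, 4:6] <- lcpm.toy[1:50, 4:6] + 5   # hypothetical group effect
coords <- mds_sketch(lcpm.toy)
coords
```

The two groups of three samples land on opposite sides of the first dimension, which is the pattern `plotMDS()` would show for well-separated subtypes.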
An HTML page will be generated and opened in a browser if launch=TRUE.
library(Glimma)
glMDSPlot(lcpm, labels=as.character(group), #the vignette pastes group and lane; with only group, paste(group, sep="_") adds nothing
groups=x$samples[,c(1,2)], launch=TRUE)
design <- model.matrix(~0+group) #removes intercept from the factor group
#design <- model.matrix(~group) keeps the intercept for the factor group, but contrasts are more straightforward without an intercept
colnames(design) <- gsub("group", "", colnames(design))
design
## ADENOCARCINOMA CMS
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 6 0 1
## 7 0 1
## 8 0 1
## 9 0 1
## 10 0 1
## 11 0 1
## 12 0 1
## 13 0 1
## 14 0 1
## 15 0 1
## 16 0 1
## 17 0 1
## 18 0 1
## 19 0 1
## 20 0 1
## 21 0 1
## 22 0 1
## 23 0 1
## 24 0 1
## 25 0 1
## 26 0 1
## 27 0 1
## 28 0 1
## 29 0 1
## 30 0 1
## 31 1 0
## 32 1 0
## 33 1 0
## 34 1 0
## 35 1 0
## 36 1 0
## 37 1 0
## 38 1 0
## 39 1 0
## 40 1 0
## 41 1 0
## 42 1 0
## 43 1 0
## 44 1 0
## 45 1 0
## 46 1 0
## 47 1 0
## 48 1 0
## 49 1 0
## 50 1 0
## 51 1 0
## 52 1 0
## 53 1 0
## 54 1 0
## 55 1 0
## 56 1 0
## 57 1 0
## 58 1 0
## 59 1 0
## 60 1 0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
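To see what `~0+group` changes, the sketch below builds both parameterisations for a small hypothetical factor with the same two levels. Without an intercept, each column is an indicator for one group, so each coefficient is that group's mean expression. With an intercept, the first level (ADENOCARCINOMA, alphabetically first) becomes the baseline. The second column then directly represents the CMS-minus-ADENOCARCINOMA difference.

```r
group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 3))
levels(group)   # levels are sorted alphabetically by default

design0 <- model.matrix(~0 + group)   # one indicator column per group
colnames(design0) <- gsub("group", "", colnames(design0))
design0

design1 <- model.matrix(~group)       # intercept = baseline group
colnames(design1)
```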
Since I am only comparing CMS (cystic, mucinous and serous neoplasms) and adenocarcinoma, there is only one pairwise comparison.
library(limma)
contr.matrix <- makeContrasts(
ADENOCARCINOMAvsCMS = ADENOCARCINOMA-CMS,
levels = colnames(design))
contr.matrix
## Contrasts
## Levels ADENOCARCINOMAvsCMS
## ADENOCARCINOMA 1
## CMS -1
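With the no-intercept design, the contrast `ADENOCARCINOMA-CMS` is just the coefficient vector multiplied by (1, -1). The base-R sketch below fits one hypothetical gene by ordinary least squares. It shows that the contrast equals the difference in group means, i.e. the log-fold-change. limma's `lmFit()` differs in that it additionally uses the voom weights and fits all genes at once.

```r
design <- cbind(ADENOCARCINOMA = rep(c(0, 1), each = 3),
                CMS            = rep(c(1, 0), each = 3))
y <- c(5.0, 5.2, 4.8, 7.0, 7.1, 6.9)   # hypothetical log-CPM; first 3 samples are CMS
beta <- qr.solve(design, y)             # least-squares coefficients = group means
contrast <- c(ADENOCARCINOMA = 1, CMS = -1)
logFC <- sum(beta * contrast)
beta
logFC   # difference in group means
```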
Each black dot represents a gene. The red curve is the estimated mean-variance trend used to compute the voom weights.
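The mean-variance trend can be sketched in base R, assuming the steps described in the limma documentation for `voom()`:

- Log-CPM is computed with 0.5 added to each count and 1 added to each library size.
- A linear model is fitted per gene.
- The square root of each gene's residual standard deviation is smoothed against its average log2 count with `lowess()`.
- Each observation's weight is the inverse of the predicted variance, i.e. one over the fourth power of the trend value.

The helper name and simulated counts are hypothetical. For simplicity the sketch uses raw library sizes without normalisation factors, and it is not a substitute for `voom()`.

```r
voom_sketch <- function(counts, lib.size, design, span = 0.5) {
  # log-CPM with voom's offsets: +0.5 to counts, +1 to library sizes
  y <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
  fit <- lm.fit(design, t(y))              # one linear model per gene
  df.res <- nrow(design) - ncol(design)
  sigma <- sqrt(colSums(fit$residuals^2) / df.res)
  # x: average log2 count; y: sqrt of residual standard deviation
  sx <- rowMeans(y) + mean(log2(lib.size + 1)) - log2(1e6)
  sy <- sqrt(sigma)
  trend <- lowess(sx, sy, f = span)
  f <- approxfun(trend, rule = 2, ties = mean)
  # fitted log-CPM -> fitted log2 count -> predicted sqrt-sd -> weight
  fitted.cpm <- 2^t(fit$fitted.values)
  fitted.logcount <- log2(t(t(fitted.cpm) * (lib.size + 1) / 1e6))
  w <- 1 / f(fitted.logcount)^4
  dim(w) <- dim(fitted.logcount)
  list(E = y, weights = w, trend = trend)
}

set.seed(42)
lib.size <- c(2e6, 3e6, 2.5e6, 4e6, 3.5e6, 3e6)   # hypothetical library sizes
prop <- rgamma(200, shape = 1)
prop <- prop / sum(prop)                            # relative expression per gene
counts <- matrix(rnbinom(200 * 6, mu = outer(prop, lib.size), size = 10), nrow = 200)
group <- factor(rep(c("ADENOCARCINOMA", "CMS"), each = 3))
design <- model.matrix(~0 + group)
vs <- voom_sketch(counts, lib.size, design)
summary(as.vector(vs$weights))
```

Low-count genes sit where the trend is high, so they receive small weights. This is how voom down-weights noisy observations before `lmFit()`.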
par(mfrow=c(1,2))
v <- voom(x, design, plot=TRUE) #voom converts raw counts to log-CPM values by extracting library sizes and normalisation factors from x
v
## An object of class "EList"
## $genes
## ENSEMBL SYMBOL TXCHROM
## 1 ENSG00000000005 TNMD chrX
## 2 ENSG00000000419 DPM1 chr20
## 3 ENSG00000000457 SCYL3 chr1
## 4 ENSG00000000460 C1orf112 chr1
## 5 ENSG00000000938 FGR chr1
## 19100 more rows ...
##
## $targets
## files
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt
## group lib.size norm.factors
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq CMS 74292479 0.8247877
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq CMS 31987425 0.8701067
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq CMS 41835745 0.9744718
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq CMS 37079196 0.3338411
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq CMS 38748737 1.0672340
## 55 more rows ...
##
## $E
## Samples
## Tags 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
## ENSG00000000005.5 -0.6452887
## ENSG00000000419.11 4.0286248
## ENSG00000000457.12 3.9851413
## ENSG00000000460.15 0.7096682
## ENSG00000000938.11 1.1642341
## Samples
## Tags bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
## ENSG00000000005.5 -2.829508
## ENSG00000000419.11 4.473258
## ENSG00000000457.12 2.886263
## ENSG00000000460.15 2.725081
## ENSG00000000938.11 2.134993
## Samples
## Tags 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
## ENSG00000000005.5 -4.064736
## ENSG00000000419.11 4.069690
## ENSG00000000457.12 3.723166
## ENSG00000000460.15 2.937516
## ENSG00000000938.11 1.160230
## Samples
## Tags 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
## ENSG00000000005.5 -1.820221
## ENSG00000000419.11 3.544018
## ENSG00000000457.12 4.437616
## ENSG00000000460.15 4.029445
## ENSG00000000938.11 3.674683
## Samples
## Tags 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
## ENSG00000000005.5 -3.468722
## ENSG00000000419.11 4.049228
## ENSG00000000457.12 3.007011
## ENSG00000000460.15 2.235675
## ENSG00000000938.11 1.751829
## Samples
## Tags 649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
## ENSG00000000005.5 -1.1743179
## ENSG00000000419.11 5.1369998
## ENSG00000000457.12 2.9131449
## ENSG00000000460.15 2.6294792
## ENSG00000000938.11 0.6819466
## Samples
## Tags 86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
## ENSG00000000005.5 -2.786013
## ENSG00000000419.11 5.199731
## ENSG00000000457.12 4.248928
## ENSG00000000460.15 3.844348
## ENSG00000000938.11 3.108387
## Samples
## Tags 28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
## ENSG00000000005.5 -1.553263
## ENSG00000000419.11 4.371787
## ENSG00000000457.12 2.924414
## ENSG00000000460.15 2.663967
## ENSG00000000938.11 2.643134
## Samples
## Tags 911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
## ENSG00000000005.5 -5.2805208
## ENSG00000000419.11 4.3152962
## ENSG00000000457.12 2.9396157
## ENSG00000000460.15 2.2570859
## ENSG00000000938.11 0.5112933
## Samples
## Tags d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
## ENSG00000000005.5 -2.408949
## ENSG00000000419.11 5.250282
## ENSG00000000457.12 4.301365
## ENSG00000000460.15 3.473694
## ENSG00000000938.11 2.114612
## Samples
## Tags f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
## ENSG00000000005.5 -2.674378
## ENSG00000000419.11 4.258048
## ENSG00000000457.12 3.219863
## ENSG00000000460.15 1.996700
## ENSG00000000938.11 3.869207
## Samples
## Tags f590941d-19dc-427a-95b6-942c97ea8333.htseq
## ENSG00000000005.5 -6.376198
## ENSG00000000419.11 4.130606
## ENSG00000000457.12 3.101561
## ENSG00000000460.15 2.533696
## ENSG00000000938.11 3.187952
## Samples
## Tags 55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
## ENSG00000000005.5 -5.645779
## ENSG00000000419.11 5.295513
## ENSG00000000457.12 3.570967
## ENSG00000000460.15 3.454883
## ENSG00000000938.11 2.374738
## Samples
## Tags 0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
## ENSG00000000005.5 -1.143810
## ENSG00000000419.11 3.241950
## ENSG00000000457.12 2.041301
## ENSG00000000460.15 2.198582
## ENSG00000000938.11 3.788053
## Samples
## Tags 9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
## ENSG00000000005.5 -2.844269
## ENSG00000000419.11 4.802950
## ENSG00000000457.12 3.394772
## ENSG00000000460.15 2.854320
## ENSG00000000938.11 2.128424
## Samples
## Tags d2587070-cb7d-440d-ae49-52f5077248e6.htseq
## ENSG00000000005.5 0.06657207
## ENSG00000000419.11 4.12233363
## ENSG00000000457.12 2.89854696
## ENSG00000000460.15 1.25377506
## ENSG00000000938.11 4.15471639
## Samples
## Tags 7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
## ENSG00000000005.5 -3.518050
## ENSG00000000419.11 4.399620
## ENSG00000000457.12 2.911566
## ENSG00000000460.15 3.157201
## ENSG00000000938.11 2.996072
## Samples
## Tags 2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
## ENSG00000000005.5 -2.288229
## ENSG00000000419.11 4.029531
## ENSG00000000457.12 2.762875
## ENSG00000000460.15 1.912198
## ENSG00000000938.11 2.328744
## Samples
## Tags 424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
## ENSG00000000005.5 -3.874298
## ENSG00000000419.11 4.834002
## ENSG00000000457.12 4.163623
## ENSG00000000460.15 3.365842
## ENSG00000000938.11 4.758697
## Samples
## Tags 934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
## ENSG00000000005.5 -1.706603
## ENSG00000000419.11 4.063307
## ENSG00000000457.12 3.788035
## ENSG00000000460.15 3.419091
## ENSG00000000938.11 2.599558
## Samples
## Tags 0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
## ENSG00000000005.5 -4.763380
## ENSG00000000419.11 3.898399
## ENSG00000000457.12 2.786598
## ENSG00000000460.15 1.465439
## ENSG00000000938.11 2.976738
## Samples
## Tags c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
## ENSG00000000005.5 2.2408036
## ENSG00000000419.11 5.2673893
## ENSG00000000457.12 3.0724378
## ENSG00000000460.15 2.2930928
## ENSG00000000938.11 0.1175401
## Samples
## Tags dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
## ENSG00000000005.5 -4.544553
## ENSG00000000419.11 4.505505
## ENSG00000000457.12 2.470112
## ENSG00000000460.15 1.787053
## ENSG00000000938.11 2.555099
## Samples
## Tags e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
## ENSG00000000005.5 -5.921106
## ENSG00000000419.11 4.643996
## ENSG00000000457.12 2.952338
## ENSG00000000460.15 2.939981
## ENSG00000000938.11 3.460437
## Samples
## Tags debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
## ENSG00000000005.5 -7.372309
## ENSG00000000419.11 4.197072
## ENSG00000000457.12 3.391733
## ENSG00000000460.15 1.938303
## ENSG00000000938.11 2.667980
## Samples
## Tags 17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
## ENSG00000000005.5 -3.003489
## ENSG00000000419.11 4.346009
## ENSG00000000457.12 2.818355
## ENSG00000000460.15 1.819463
## ENSG00000000938.11 3.009197
## Samples
## Tags 7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
## ENSG00000000005.5 -1.751115
## ENSG00000000419.11 4.290426
## ENSG00000000457.12 3.102534
## ENSG00000000460.15 2.617107
## ENSG00000000938.11 1.849445
## Samples
## Tags fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
## ENSG00000000005.5 -2.408306
## ENSG00000000419.11 4.410370
## ENSG00000000457.12 2.768674
## ENSG00000000460.15 2.449675
## ENSG00000000938.11 2.846306
## Samples
## Tags abe20df7-6b97-4397-8864-881bac27e92c.htseq
## ENSG00000000005.5 -3.366592
## ENSG00000000419.11 3.442602
## ENSG00000000457.12 3.241795
## ENSG00000000460.15 3.216222
## ENSG00000000938.11 3.246014
## Samples
## Tags 62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
## ENSG00000000005.5 -5.454801
## ENSG00000000419.11 4.120738
## ENSG00000000457.12 2.446066
## ENSG00000000460.15 2.284417
## ENSG00000000938.11 1.600482
## Samples
## Tags 3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
## ENSG00000000005.5 -5.844178
## ENSG00000000419.11 5.073443
## ENSG00000000457.12 3.085574
## ENSG00000000460.15 2.520686
## ENSG00000000938.11 4.420656
## Samples
## Tags 087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
## ENSG00000000005.5 -1.37511256
## ENSG00000000419.11 5.29828123
## ENSG00000000457.12 3.64998907
## ENSG00000000460.15 2.96936577
## ENSG00000000938.11 -0.08112134
## Samples
## Tags c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
## ENSG00000000005.5 -3.735749
## ENSG00000000419.11 3.506472
## ENSG00000000457.12 2.364387
## ENSG00000000460.15 2.790945
## ENSG00000000938.11 3.725321
## Samples
## Tags 13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
## ENSG00000000005.5 -0.3984018
## ENSG00000000419.11 4.1132525
## ENSG00000000457.12 2.6501190
## ENSG00000000460.15 2.7228611
## ENSG00000000938.11 1.8857116
## Samples
## Tags 6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
## ENSG00000000005.5 -1.815983
## ENSG00000000419.11 5.460732
## ENSG00000000457.12 2.786995
## ENSG00000000460.15 2.502506
## ENSG00000000938.11 1.359022
## Samples
## Tags 8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
## ENSG00000000005.5 -4.099968
## ENSG00000000419.11 4.368090
## ENSG00000000457.12 2.898779
## ENSG00000000460.15 2.255627
## ENSG00000000938.11 2.286613
## Samples
## Tags 168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
## ENSG00000000005.5 1.352327
## ENSG00000000419.11 5.379763
## ENSG00000000457.12 3.195602
## ENSG00000000460.15 2.998821
## ENSG00000000938.11 2.403278
## Samples
## Tags 0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
## ENSG00000000005.5 -4.712731
## ENSG00000000419.11 4.335951
## ENSG00000000457.12 2.875449
## ENSG00000000460.15 2.253054
## ENSG00000000938.11 1.564723
## Samples
## Tags 4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
## ENSG00000000005.5 -4.648538
## ENSG00000000419.11 3.750918
## ENSG00000000457.12 2.845984
## ENSG00000000460.15 2.293977
## ENSG00000000938.11 4.609635
## Samples
## Tags 7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
## ENSG00000000005.5 -0.8970334
## ENSG00000000419.11 5.3337569
## ENSG00000000457.12 2.9023728
## ENSG00000000460.15 3.0631171
## ENSG00000000938.11 1.3644589
## Samples
## Tags b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
## ENSG00000000005.5 -5.751942
## ENSG00000000419.11 4.041932
## ENSG00000000457.12 3.201284
## ENSG00000000460.15 2.385903
## ENSG00000000938.11 2.436234
## Samples
## Tags f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
## ENSG00000000005.5 -0.375647
## ENSG00000000419.11 5.658004
## ENSG00000000457.12 3.221699
## ENSG00000000460.15 2.511878
## ENSG00000000938.11 2.246960
## Samples
## Tags f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
## ENSG00000000005.5 -2.367437
## ENSG00000000419.11 5.658257
## ENSG00000000457.12 3.448480
## ENSG00000000460.15 2.872878
## ENSG00000000938.11 1.333003
## Samples
## Tags e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
## ENSG00000000005.5 2.116966
## ENSG00000000419.11 5.446587
## ENSG00000000457.12 3.320462
## ENSG00000000460.15 3.163350
## ENSG00000000938.11 1.840730
## Samples
## Tags a26d49db-2309-46a0-a3ed-275378d484e7.htseq
## ENSG00000000005.5 -1.557761
## ENSG00000000419.11 5.265786
## ENSG00000000457.12 3.488282
## ENSG00000000460.15 2.567628
## ENSG00000000938.11 -0.981901
## Samples
## Tags a3f88a5d-7169-465b-bb80-e5999590681c.htseq
## ENSG00000000005.5 -4.475837
## ENSG00000000419.11 4.777617
## ENSG00000000457.12 2.870923
## ENSG00000000460.15 2.437718
## ENSG00000000938.11 3.149466
## Samples
## Tags c264fe3b-482b-44ec-83a4-73df565663ff.htseq
## ENSG00000000005.5 -0.08499537
## ENSG00000000419.11 5.70387514
## ENSG00000000457.12 2.62655223
## ENSG00000000460.15 2.51712185
## ENSG00000000938.11 1.45794392
## Samples
## Tags bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
## ENSG00000000005.5 -1.499968
## ENSG00000000419.11 5.045088
## ENSG00000000457.12 2.703904
## ENSG00000000460.15 3.381510
## ENSG00000000938.11 1.365102
## Samples
## Tags 7261b656-c79c-4581-a503-15b653e2b5d2.htseq
## ENSG00000000005.5 -4.885539
## ENSG00000000419.11 5.334355
## ENSG00000000457.12 3.155572
## ENSG00000000460.15 3.516193
## ENSG00000000938.11 2.636454
## Samples
## Tags ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
## ENSG00000000005.5 -0.528445
## ENSG00000000419.11 4.027512
## ENSG00000000457.12 3.595314
## ENSG00000000460.15 1.875639
## ENSG00000000938.11 1.324139
## Samples
## Tags f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
## ENSG00000000005.5 -0.4005062
## ENSG00000000419.11 4.8580127
## ENSG00000000457.12 2.5776894
## ENSG00000000460.15 2.5330664
## ENSG00000000938.11 2.7694188
## Samples
## Tags bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
## ENSG00000000005.5 -2.9336310
## ENSG00000000419.11 4.6286114
## ENSG00000000457.12 3.0127879
## ENSG00000000460.15 3.1979402
## ENSG00000000938.11 0.9732596
## Samples
## Tags 564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
## ENSG00000000005.5 0.2422946
## ENSG00000000419.11 3.6186704
## ENSG00000000457.12 2.8334093
## ENSG00000000460.15 1.8994068
## ENSG00000000938.11 2.4725922
## Samples
## Tags 82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
## ENSG00000000005.5 -1.804916
## ENSG00000000419.11 5.799060
## ENSG00000000457.12 2.975948
## ENSG00000000460.15 2.758334
## ENSG00000000938.11 -0.993678
## Samples
## Tags 9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
## ENSG00000000005.5 -0.425799
## ENSG00000000419.11 4.082319
## ENSG00000000457.12 3.787628
## ENSG00000000460.15 2.414818
## ENSG00000000938.11 1.099043
## Samples
## Tags fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
## ENSG00000000005.5 -2.416932
## ENSG00000000419.11 5.646898
## ENSG00000000457.12 3.037201
## ENSG00000000460.15 3.207244
## ENSG00000000938.11 2.528229
## Samples
## Tags 32a115cf-570f-4ad9-a123-8e1970062f51.htseq
## ENSG00000000005.5 -1.647989
## ENSG00000000419.11 4.596644
## ENSG00000000457.12 3.421828
## ENSG00000000460.15 2.887235
## ENSG00000000938.11 1.899625
## Samples
## Tags 05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
## ENSG00000000005.5 1.5670755
## ENSG00000000419.11 5.8796382
## ENSG00000000457.12 3.2609724
## ENSG00000000460.15 3.2183805
## ENSG00000000938.11 0.7723944
## Samples
## Tags 5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
## ENSG00000000005.5 2.000466
## ENSG00000000419.11 4.865221
## ENSG00000000457.12 3.671236
## ENSG00000000460.15 2.777068
## ENSG00000000938.11 1.605383
## Samples
## Tags 43b292be-5d63-4523-a43f-666d20039208.htseq
## ENSG00000000005.5 1.851908
## ENSG00000000419.11 4.822736
## ENSG00000000457.12 3.388138
## ENSG00000000460.15 2.298683
## ENSG00000000938.11 1.135335
## 19100 more rows ...
##
## $weights
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.4272084 0.3605236 0.3757352 0.365797 0.3693936 0.3605236 0.4570683
## [2,] 1.9814015 1.6365197 1.7819632 1.719739 1.7436574 1.6501657 2.0277074
## [3,] 1.6643834 1.1684340 1.3145104 1.246240 1.2706888 1.1800283 1.8183637
## [4,] 1.3834658 0.9685114 1.0782455 1.026371 1.0448994 0.9770827 1.5681360
## [5,] 1.3694188 0.9598946 1.0679714 1.016793 1.0351053 0.9683706 1.5534252
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 0.4672378 0.4540638 0.3605236 0.4174268 0.3751305 0.4282511 0.3605236
## [2,] 2.0368915 2.0243354 1.3060271 1.9565953 1.7784959 1.9834135 1.4385854
## [3,] 1.8583854 1.8042999 0.9377362 1.6032037 1.3103639 1.6709795 1.0217228
## [4,] 1.6255947 1.5498688 0.8002510 1.3223929 1.0749748 1.3900854 0.8613773
## [5,] 1.6119818 1.5353716 0.7943371 1.3091158 1.0648054 1.3759544 0.8544378
## [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## [1,] 0.3974921 0.4756471 0.3605236 0.3605236 0.4352165 0.3667904 0.4186382
## [2,] 1.8910097 2.0431154 1.5704385 1.6320811 1.9967627 1.7263329 1.9598408
## [3,] 1.4673786 1.8891483 1.1145441 1.1646677 1.7094138 1.2529634 1.6107185
## [4,] 1.2010650 1.6720149 0.9289599 0.9657247 1.4336004 1.0314693 1.3298629
## [5,] 1.1890175 1.6579150 0.9210325 0.9571388 1.4193715 1.0218322 1.3164920
## [,22] [,23] [,24] [,25] [,26] [,27] [,28]
## [1,] 0.4481102 0.5034813 0.3605236 0.4378621 0.4486748 0.3693138 0.3821114
## [2,] 2.0166399 2.0545965 1.6045624 2.0017922 2.0173736 1.7431256 1.8186144
## [3,] 1.7764873 1.9617136 1.1414146 1.7237786 1.7791217 1.2701433 1.3587262
## [4,] 1.5140091 1.7983482 0.9485683 1.4502539 1.5173906 1.0444864 1.1130911
## [5,] 1.4999295 1.7871833 0.9404314 1.4358184 1.5032717 1.0346972 1.1022734
## [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,] 0.3636339 0.4156818 0.5566926 0.5692988 0.4457222 0.6160439 0.5910201
## [2,] 1.7054115 1.9519111 2.0455284 2.0508520 1.8436462 2.0553194 2.0556041
## [3,] 1.2316779 1.5924075 1.7025964 1.7535783 1.1491360 1.8988201 1.8304581
## [4,] 1.0153172 1.3116768 1.5372851 1.5961086 1.0233400 1.7812708 1.6886882
## [5,] 1.0059897 1.2983333 1.1829062 1.2325189 0.8208803 1.4281382 1.3216210
## [,36] [,37] [,38] [,39] [,40] [,41] [,42]
## [1,] 0.5422713 0.4351421 0.5651271 0.4583116 0.4546091 0.5482909 0.5684231
## [2,] 2.0374975 1.7951817 2.0493867 1.8918037 1.8778143 2.0416245 2.0506310
## [3,] 1.6396777 1.0987680 1.7365313 1.2115461 1.1929927 1.6677198 1.7499956
## [4,] 1.4686227 0.9815781 1.5772147 1.0754848 1.0599818 1.4980710 1.5923585
## [5,] 1.1280796 0.7932221 1.2158480 0.8549097 0.8448113 1.1506285 1.2290082
## [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## [1,] 0.4612994 0.507252 0.5436882 0.576914 0.5909329 0.4342706 0.5367139
## [2,] 1.9008018 2.003378 2.0385513 2.052757 2.0556004 1.7909115 2.0333376
## [3,] 1.2266409 1.466563 1.6462312 1.780728 1.8301464 1.0946825 1.6140829
## [4,] 1.0883208 1.298794 1.4756170 1.628880 1.6883568 0.9782159 1.4413704
## [5,] 0.8634054 1.004397 1.1333250 1.263292 1.3212579 0.7909791 1.1076346
## [,50] [,51] [,52] [,53] [,54] [,55] [,56]
## [1,] 0.5938889 0.3807075 0.4489648 0.3806337 0.5663092 0.4445163 0.5513897
## [2,] 2.0557268 1.4513570 1.8561201 1.4508110 2.0499284 1.8390051 2.0430719
## [3,] 1.8385894 0.8728752 1.1650257 0.8726089 1.7413566 1.1432791 1.6813825
## [4,] 1.6996040 0.7953596 1.0366278 0.7951382 1.5829164 1.0184223 1.5124736
## [5,] 1.3335985 0.6665224 0.8295810 0.6663713 1.2205580 0.8176537 1.1624749
## [,57] [,58] [,59] [,60]
## [1,] 0.5991062 0.3690418 0.4772951 0.4619017
## [2,] 2.0559479 1.3656913 1.9465166 1.9026135
## [3,] 1.8531617 0.8321907 1.3093027 1.2296981
## [4,] 1.7195119 0.7611163 1.1586773 1.0909190
## [5,] 1.3555422 0.6433334 0.9098613 0.8651225
## 19100 more rows ...
##
## $design
## ADENOCARCINOMA CMS
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 55 more rows ...
Each black dot is a gene. The blue line is the average log2 residual standard deviation estimated by the empirical Bayes algorithm.
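vfit <- lmFit(v, design) #fit a linear model to each gene, using the voom precision weights
vfit <- contrasts.fit(vfit, contrasts=contr.matrix) #estimate the ADENOCARCINOMA vs CMS contrast
efit <- eBayes(vfit) #empirical Bayes moderation of the gene-wise variances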
plotSA(efit, main="Final model: Mean-variance trend") #plots log2 residual standard deviations against mean log-CPM values
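The base-R sketch below shows what eBayes() adds on top of lmFit() by spelling out the variance moderation. `moderated_t` is a hypothetical helper, not limma's code. It assumes the prior degrees of freedom `d0` and prior variance `s0_sq` are already known, whereas eBayes() estimates both from all genes. The toy numbers are made up and do not come from the TCGA data. The residual degrees of freedom (58) correspond to 60 samples and a 2-column design.

```r
# Sketch of empirical Bayes variance moderation (the idea behind eBayes()).
# ASSUMPTION: d0 and s0_sq are fixed inputs here; eBayes() estimates them from all genes.
moderated_t <- function(coef, stdev_unscaled, sigma, df_residual, d0, s0_sq) {
  # Posterior (moderated) variance: a degrees-of-freedom-weighted average of
  # the prior variance and each gene's own residual variance.
  s2_post <- (d0 * s0_sq + df_residual * sigma^2) / (d0 + df_residual)
  # Moderated t-statistic: an ordinary t-statistic with the moderated SD in the denominator.
  t_mod <- coef / (stdev_unscaled * sqrt(s2_post))
  # Two-sided p-values on the augmented degrees of freedom d0 + df_residual.
  p <- 2 * pt(abs(t_mod), df = d0 + df_residual, lower.tail = FALSE)
  list(s2_post = s2_post, t = t_mod, p.value = p)
}

# Toy example with three genes (hypothetical values).
res <- moderated_t(coef = c(2, 0.5, -1.5),
                   stdev_unscaled = c(0.3, 0.3, 0.3),
                   sigma = c(0.5, 2, 0.1),
                   df_residual = 58,
                   d0 = 4, s0_sq = 1)
print(res$s2_post)  # approx. 0.298, 3.806, 0.074
```

Every gene's variance is pulled toward the prior variance `s0_sq`. With 58 residual degrees of freedom and a small `d0`, the pull is modest.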
A quick look at how many genes are down-regulated, up-regulated, or not significantly differentially expressed. By default, decideTests uses a 5% cutoff on Benjamini-Hochberg adjusted p-values.
summary(decideTests(efit))
## ADENOCARCINOMAvsCMS
## Down 1474
## NotSig 15810
## Up 1821
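The default classification for a single contrast can be sketched in base R. `call_de` is a hypothetical helper and the p-values are made up. decideTests() itself also supports other adjustment methods and a minimum log-fold-change through its `lfc` argument.

```r
# Sketch of the default decideTests() rule for one contrast:
# BH-adjust the p-values, then call a gene Up (1) or Down (-1) by the sign of its
# log-fold-change if the adjusted p-value is below the cutoff; otherwise NotSig (0).
call_de <- function(logFC, p.value, cutoff = 0.05) {
  adj_p <- p.adjust(p.value, method = "BH")
  status <- ifelse(adj_p < cutoff, sign(logFC), 0)
  factor(status, levels = c(-1, 0, 1), labels = c("Down", "NotSig", "Up"))
}

logFC <- c(2.1, -1.4, 0.2, -0.1, 3.0)
pval  <- c(0.001, 0.004, 0.30, 0.80, 0.0001)
tab <- table(call_de(logFC, pval))
print(tab)  # Down: 1, NotSig: 2, Up: 2
```

Summing the Down and Up counts gives the number of DE genes, which is how the 3295 figure for efit (1474 + 1821) arises.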
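The treat function applies a stricter definition of significance: it requires a minimum log2-fold-change (lfc=1, i.e. a two-fold change). This could be overcorrecting for my data, since no genes are now called down-regulated or up-regulated.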
tfit <- treat(vfit, lfc=1) #p-values calculated from empirical Bayes moderated t-statistics with a minimum log-FC requirement.
dt <- decideTests(tfit)
#dt <- decideTests(efit) #for testing purposes
summary(dt)
## ADENOCARCINOMAvsCMS
## Down 0
## NotSig 19105
## Up 0
No genes are differentially expressed when tfit is used. When efit is used, 3295 genes are differentially expressed (1474 down-regulated and 1821 up-regulated).
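The gap between tfit and efit is easier to see with the TREAT p-value written out. This base-R sketch follows my understanding of the formula in limma's treat(): the null hypothesis is that the absolute log2-fold-change is at most `lfc`. The helpers `treat_p` and `ordinary_p` are hypothetical, and the numbers (including 62 total degrees of freedom) are made up for illustration.

```r
# Sketch of a TREAT p-value for one gene.
# H0: |true log2-fold-change| <= lfc, tested with a t distribution on df degrees of freedom.
treat_p <- function(coef, se, df, lfc = 1) {
  t_right <- (abs(coef) - lfc) / se
  t_left  <- (abs(coef) + lfc) / se
  pt(t_right, df = df, lower.tail = FALSE) + pt(t_left, df = df, lower.tail = FALSE)
}

# Ordinary two-sided t-test p-value (H0: log2-fold-change = 0), for comparison.
ordinary_p <- function(coef, se, df) 2 * pt(abs(coef) / se, df = df, lower.tail = FALSE)

# A gene with log2-fold-change 0.8 and a small standard error (hypothetical):
p_zero  <- ordinary_p(0.8, 0.1, 62)  # very small: clearly different from 0
p_treat <- treat_p(0.8, 0.1, 62)     # roughly 0.97: not clearly beyond a two-fold change
```

With `lfc = 0` the two functions agree, so the threshold is the only thing that changes. A gene can be clearly different from zero yet not significantly beyond a two-fold change, which is consistent with tfit calling no genes while efit calls 3295.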
de.common <- which(dt[,1]!=0)
length(de.common)
## [1] 0
If efit is used, the first 20 differentially expressed genes are: “DPM1”, “CFH”, “LAS1L”, “CFTR”, “TMEM176A”, “DBNDD1”, “TFPI”, “SLC7A2”, “ARF5”, “POLDIP2”, “ARHGAP33”, “UPF1”, “MCUB”, “POLR2J”, “THSD7A”, “LIG3”, “SPPL2B”, “IBTK”, “PDK2”, “REX1BD”.
head(tfit$genes$SYMBOL[de.common], n=20)
## character(0)
My Venn diagram has only one circle because I have only one pairwise comparison.
vennDiagram(dt[,1], circle.col=c("turquoise", "salmon"))
write.fit(tfit, dt, file="results.txt")
ADENOCARCINOMA.vs.CMS <- topTreat(tfit, coef=1, n=Inf)
head(ADENOCARCINOMA.vs.CMS)
If efit were used, the plot would show red, blue, and black genes, with black marking genes that are not significant. Since tfit is used, all genes are black.
plotMD(tfit, column=1, status=dt[,1], main=colnames(tfit)[1],
xlim=c(-8,13))
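An MD plot places each gene's average log-expression on the x-axis and its log2-fold-change on the y-axis, coloured by its decideTests status (-1, 0, 1). This base-R sketch uses simulated data. `md_colours` and its blue/black/red mapping are a hypothetical choice for illustration, not plotMD()'s exact defaults.

```r
# Minimal base-R sketch of a mean-difference (MD) plot coloured by DE status.
# ASSUMPTION: the colour mapping below is hypothetical, chosen for illustration.
md_colours <- function(status) {
  cols <- c("-1" = "blue", "0" = "black", "1" = "red")
  unname(cols[as.character(status)])
}

set.seed(1)
ave_expr <- runif(200, -2, 12)   # simulated average log-CPM
log_fc   <- rnorm(200, sd = 1)   # simulated log2-fold-changes
# Toy status calls by a simple threshold (NOT a real DE test)
status   <- ifelse(log_fc > 2, 1, ifelse(log_fc < -2, -1, 0))

pdf(NULL)                        # null device: nothing is written to disk
plot(ave_expr, log_fc, pch = 16, cex = 0.5, col = md_colours(status),
     xlab = "Average log-expression", ylab = "log2-fold-change")
abline(h = 0, col = "grey")
invisible(dev.off())
```

When every status is 0, as with tfit here, every point gets the background colour.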
To open the HTML page in a browser, set launch=TRUE.
library(Glimma)
glMDPlot(tfit, coef=1, status=dt, main=colnames(tfit)[1],
side.main="ENSEMBL", counts=lcpm, groups=group, launch=TRUE)
I installed heatmap.plus because heatmap.2 did not work for my data.
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:IRanges':
##
## space
## The following object is masked from 'package:S4Vectors':
##
## space
## The following object is masked from 'package:stats':
##
## lowess
library(heatmap.plus)
ADENOCARCINOMA.vs.CMS.topgenes <- ADENOCARCINOMA.vs.CMS$ENSEMBL[1:100]
i <- which(v$genes$ENSEMBL %in% ADENOCARCINOMA.vs.CMS.topgenes)
mycol <- colorpanel(1000,"blue","white","red")
#par("mar") should return the default margins: [1] 5.1 4.1 4.1 2.1
par(cex.main=0.8,mar=c(1,1,1,1)) #mar=c(1,1,1,1) shrinks the figure margins so the plot region is large enough
heatmap.plus(lcpm[i,], col=bluered(20),cexRow=1,cexCol=0.2, margins = c(10,10), main = "HeatMap") #changed the margins to have a more legible heatmap
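The heatmap steps above (select the top genes by ID, draw them with a blue-white-red palette, adjust margins) can be sketched with base R only. This version uses simulated data and swaps in stats::heatmap() for heatmap.plus so it runs without extra packages. The gene IDs are hypothetical. It also saves and restores the margin setting rather than leaving par(mar) changed for later plots.

```r
# Base-R sketch of the heatmap steps, on simulated data.
set.seed(42)
lcpm_sim <- matrix(rnorm(500 * 6, mean = 5), nrow = 500,
                   dimnames = list(paste0("ENSG_sim", 1:500), paste0("S", 1:6)))
top_ids <- paste0("ENSG_sim", c(3, 10, 250, 499))  # hypothetical "top genes"

# which(... %in% ...) keeps rows in their original matrix order, not ranked order
i <- which(rownames(lcpm_sim) %in% top_ids)

# Row-wise z-scores, so colours show relative expression within each gene
z <- t(scale(t(lcpm_sim[i, ])))

pdf(NULL)                          # null device: nothing is written to disk
op <- par(mar = c(1, 1, 1, 1))     # change the margins, keeping the old value in op
heatmap(z, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(20),
        margins = c(10, 10), main = "HeatMap")
par(op)                            # restore the previous margins
restored_mar <- par("mar")         # back to 5.1 4.1 4.1 2.1
invisible(dev.off())
```

Saving the return value of par() and passing it back afterwards is the usual base-R idiom for undoing a temporary graphics change.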
I spoke to Professor Craig on 12/4, and we agreed that this step would not work for me because I did not have differentially expressed genes.
I spoke to Professor Craig on 12/6; the interactive plots are not embedded in RPubs.