Final Project

Section 1: Objective

For this project, I will show the log2(fold-change) in differential gene expression for RNA-seq data in two subsets of colon cancer: adenocarcinoma, and cystic, mucinous and serous neoplasms. The final deliverable will follow the Glimma vignette. The input files will be HTSeq-count data obtained from The Cancer Genome Atlas (TCGA) program. I will turn in an HTML file rendered from my R Notebook.

Section 2: Datasets

Loading data

Data were obtained from TCGA. I filtered for the RNA-Seq experimental strategy, TXT data format, and HTSeq-counts workflow type. HTSeq-count is a tool that counts the aligned reads overlapping each gene’s exons. HTSeq output has no header and is tab-delimited: the first column is the Ensembl gene ID and the second column is the number of reads mapped to that gene. Only reads that unambiguously map to one gene are counted. The counts will be used for differential gene expression analysis with edgeR, after normalization with calcNormFactors.

RAW: URL for cystic, mucinous, and serous neoplasms, I will choose 30:

RAW: URL for adenocarcinoma, I will choose 30:

Unit test for data

When data is read in:

ENSG00000000003.13 <fctr>   X5290 <int>
ENSG00000000005.5   47          
ENSG00000000419.11  1212            
ENSG00000000457.12  1176            
ENSG00000000460.15  121         
ENSG00000000938.11  166         
ENSG00000000971.14  1012            
ENSG00000001036.12  4401            
ENSG00000001084.9   1977            
ENSG00000001167.13  976         
ENSG00000001460.16  1638            
1-10 of 60 rows
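
The preview above shows the first gene ID and its count (`ENSG00000000003.13`, `X5290`) as column names. That is what `read.delim` does by default: it treats the first line as a header. Below is a minimal sketch of the difference, using a three-line stand-in for an HTSeq file with values copied from the preview. The column labels `gene_id` and `count` are my own, not part of the file format.

```r
# Three lines in HTSeq-count layout: no header, tab-delimited, gene ID then count.
# Values are copied from the preview above; the text is a stand-in for a real file.
htseq_text <- "ENSG00000000003.13\t5290\nENSG00000000005.5\t47\nENSG00000000419.11\t1212\n"

# Default behaviour: the first record is consumed as the header row.
with_header <- read.delim(text = htseq_text)

# header = FALSE keeps every record; col.names are illustrative labels.
no_header <- read.delim(text = htseq_text, header = FALSE,
                        col.names = c("gene_id", "count"),
                        stringsAsFactors = FALSE)

names(with_header)  # first gene has become the column names
nrow(with_header)   # one record fewer than the file contains
nrow(no_header)     # all records kept
```

The edgeR documentation states that `readDGE` passes extra arguments on to `read.delim`, so the same `header = FALSE` option may be needed there as well. Comparing the row count of the combined dataset against the line count of one input file would confirm whether a gene is being dropped.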

When dataset is made using the 60 text files:

[1] "DGEList"
attr(,"package")
[1] "edgeR"
[1] 60487    60

Unit test for normalized data

Make box plots of unnormalized and normalized data

Proposed Analysis

This project will compute and analyze the log2 fold change in gene expression between two subtypes of colon cancer. edgeR will be used to import, organize, and normalize the data; Homo.sapiens will be used for gene annotations (the vignette itself uses Mus.musculus because its data are from mouse); limma will be used for the differential expression analysis and exploratory plots; and Glimma will be used to make these plots interactive. RColorBrewer and gplots will be used to make heatmaps.

Flowchart

library(DiagrammeR)
grViz("digraph flowchart {
      # node definitions with substituted label text
      node [fontname = Helvetica, shape = rectangle]        
      tab1 [label = '@@1']
      tab2 [label = '@@2']
      tab3 [label = '@@3']
      tab4 [label = '@@4']
      tab5 [label = '@@5']
      tab6 [label = '@@6']
      tab7 [label = '@@7']
      tab8 [label = '@@8']
      tab9 [label = '@@9']
      tab10 [label = '@@10']

      # edge definitions with the node IDs
      tab1 -> tab2;
      tab2 -> tab3;
      tab3 -> tab4;
      tab4 -> tab8 -> tab5;
      tab4 -> tab5 -> tab6 -> tab7 -> tab9 -> tab10
      }

      [1]: 'Download necessary libraries'
      [2]: 'Load and read datasets'
      [3]: 'Join datasets'
      [4]: 'Unit test: Data was properly loaded?'
      [5]: 'Normalize data'
      [6]: 'Find Mean Variance Trend'
      [7]: 'Analyze DE genes'
      [8]: 'Troubleshoot and fix errors'
      [9]: 'Make Interactive MDS plot'
      [10]: 'Make HeatMap of log-CPM data'
      ")

Loading libraries

Proposed Timeline and Milestones

Week 1: Run the Glimma vignette. I will install the necessary packages in R and understand each step in the vignette.

Week 2: Load in the data (joins, creating datasets) and do a simple, one-line unit test to look at the data. I will download 60 datasets (30 from each subtype) and join them. Emailed Dr. Craig on 11/19 and agreed on turning this milestone in on Sat Nov 23, 2019.

Week 3: Confirm that the data was loaded in correctly and analyze the data using the Glimma vignette. Emailed Dr. Craig on 11/26 and agreed on turning this milestone in on 12/1.

Week 4: Troubleshoot for more errors and enhance the user interface.

User Interface

I anticipate having boxplots, heatmaps, and interactive multi-dimensional scaling (MDS) plots done in an R Notebook. I will submit an HTML page of my completed R Notebook.

MDS plot

Make MDS plot interactive

HeatMap of log-CPM data

Ashley E Noriega,

Nov 13, 2019

TRGN 510 Final Project: Milestone 1

A script for setting up the Glimma Vignette

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install the Bioconductor packages used by the vignette in one call
BiocManager::install(c("limma", "Glimma", "edgeR", "Mus.musculus"))

library(limma)
library(Glimma)
library(edgeR)
library(Mus.musculus)
library(R.utils)  # CRAN package; run install.packages("R.utils") first if it is missing

# camera() is a limma function, so the separate CAMERA package is not needed

Data Packaging

Read in count data

Read the 9 text files into R and combine them into a matrix of counts

Annotate the samples

Organize sample information

Organize gene annotations

Resolve duplicate gene IDs

Package in a DGEList-object containing raw count data with associated sample information and gene annotations

Data Pre-processing

Transformations from the raw-scale: convert raw counts to counts per million (CPM) and log2-counts per million (log-CPM)

Remove lowly expressed genes

Filter genes while keeping as many genes as possible with worthwhile counts

Plot the density of log-CPM values for raw and filtered data

Normalize gene expression distributions

Improve visualization by duplicating data then adjusting the counts

Boxplot expression distribution of samples for normalised and unnormalised data

Unsupervised clustering of samples: make a multi-dimensional scaling (MDS) plot to show similarities and dissimilarities between samples in an unsupervised manner

Make interactive using Glimma

HTML page will be generated and opened in a browser if launch=TRUE
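
The CPM, log-CPM, and low-expression filtering steps listed above can be sketched in base R. This is a simplified illustration on made-up counts, not the vignette's code: the names `cpm_cutoff` and `min_samples` are hypothetical, and the fixed offset of 1 in the log transform is my simplification. edgeR's `cpm(x, log = TRUE)` handles the offset differently, so its values will not match exactly.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, not TCGA data).
counts <- matrix(c(  0,   2,   1,
                   150, 300, 220,
                    10,   0,   5,
                   900, 450, 700),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

lib_size <- colSums(counts)  # total reads per sample

# CPM: divide each column by its library size, then scale to one million.
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Simplified log-CPM with a fixed offset of 1 to avoid log2(0).
log_cpm <- log2(cpm_mat + 1)

# Keep genes whose CPM exceeds the cutoff in at least `min_samples` samples.
# The cutoff is illustrative and very high only because the toy libraries are tiny.
cpm_cutoff  <- 1e4
min_samples <- 2
keep <- rowSums(cpm_mat > cpm_cutoff) >= min_samples
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

With real libraries of tens of millions of reads, a much lower cutoff would be appropriate. The vignette's own threshold should be used when following it.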

Differential Expression Analysis

Create a design matrix

Contrasts for pairwise comparisons between cell populations

Remove heteroscedasticity from count data

Apply voom precision weights to data

Examine the number of DE genes

Set a minimum log-fold change (log-FC) of 1

Extract genes that are DE in multiple comparisons

Extract and write results for all 3 comparisons (basalvsLP, basalvsML, and LPvsML) to a single output file

Examine individual DE genes from top to bottom

Summarize results for genes using mean-difference plots that highlight differentially expressed genes

Make interactive mean-difference plot.

To open the HTML page in a browser, set launch=TRUE

Make heatmap

Gene set testing by applying the camera method to gene signatures from the Broad Institute’s MSigDB c2 collection
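
The design matrix and contrast steps in the outline above can be illustrated with base R for a two-group comparison like the one planned in this project. This is only a sketch on made-up values: the group labels, the toy expression vector `y`, and the use of `lm.fit` are my assumptions for illustration. In the vignette the fitting is done by limma, and limma's `makeContrasts` builds the same kind of contrast vector from an expression.

```r
# Two-group layout with three samples per group (labels are illustrative).
group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 3))

# One column per group and no intercept, so each coefficient is a group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast for CMS minus ADENOCARCINOMA, named to match the design columns.
contrast <- c(ADENOCARCINOMA = -1, CMS = 1)

# Toy log-CPM values for one gene: first three samples CMS, last three adenocarcinoma.
y <- c(8.1, 7.9, 8.0, 6.0, 6.2, 5.8)

# Least-squares fit; the contrast of the two group means is the log2 fold change.
fit <- lm.fit(design, y)
log_fc <- sum(contrast * fit$coefficients[names(contrast)])

log_fc      # difference of group means on the log2 scale
2^log_fc    # the same difference expressed as a fold change
```

On the log2 scale, a minimum log-FC of 1 corresponds to a two-fold difference in expression.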

Ashley E Noriega,

Nov 20, 2019

TRGN 510 Final Project: Milestone 2

Loading in RNA seq colon cancer data from TCGA

RNASeq files for cystic, mucinous, and serous neoplasms and adenocarcinoma

I created a new folder called COAD and stored all 60 files in it. I then changed them to TXT files and opened them.

Data Packaging

The URLs for the 30 files I downloaded locally for cystic, mucinous and serous colon cancer (CMS)

  1. https://portal.gdc.cancer.gov/files/536f5a77-0087-457d-ac95-6d1a9abad8cb, UUID: 536f5a77-0087-457d-ac95-6d1a9abad8cb, case: TCGA-AA-3516

  2. https://portal.gdc.cancer.gov/files/ed52de66-66fa-44ce-b679-cf641b0d92cd, UUID: ed52de66-66fa-44ce-b679-cf641b0d92cd, case: TCGA-AA-3516

  3. https://portal.gdc.cancer.gov/files/b28090c5-c42d-4836-9bb1-ce906d3ead95, UUID: b28090c5-c42d-4836-9bb1-ce906d3ead95, case: TCGA-AA-3854

  4. https://portal.gdc.cancer.gov/cases/57cdaa1c-4e94-4a28-ab3b-300c0457555f, UUID: 49e29c69-d9d7-4496-9f24-26f42c8b6d8e, case: TCGA-A6-2674

  5. https://portal.gdc.cancer.gov/files/08ed32e4-fb94-4bc0-8715-83ee2143a13d, UUID: 08ed32e4-fb94-4bc0-8715-83ee2143a13d, case: TCGA-AA-A00J

  6. https://portal.gdc.cancer.gov/files/6e571f71-d5fb-42f3-a35b-554c5ab76587, UUID: 6e571f71-d5fb-42f3-a35b-554c5ab76587, case: TCGA-AA-A01G

  7. https://portal.gdc.cancer.gov/files/8b12a000-f588-4a78-a9eb-f06041a65789, UUID: 8b12a000-f588-4a78-a9eb-f06041a65789, case: TCGA-A6-6780

  8. https://portal.gdc.cancer.gov/files/02734d4d-fc8f-4ef7-ac82-1b4d7184cc5e, UUID: 02734d4d-fc8f-4ef7-ac82-1b4d7184cc5e, case: TCGA-CK-4950

  9. https://portal.gdc.cancer.gov/files/6466a8b1-d1e2-4195-a353-0800576c13c8, UUID: 6466a8b1-d1e2-4195-a353-0800576c13c8, case: TCGA-G4-6322

  10. https://portal.gdc.cancer.gov/files/bc47f01c-1994-4ff8-a356-94d9679b66ee, UUID: bc47f01c-1994-4ff8-a356-94d9679b66ee, case: TCGA-AA-3947

  11. https://portal.gdc.cancer.gov/files/b045ee79-82a6-4636-a875-1a58603d89ff, UUID: b045ee79-82a6-4636-a875-1a58603d89ff, case: TCGA-A6-A566

  12. https://portal.gdc.cancer.gov/files/c383ba2c-b00a-4bd2-82cb-b3f04c2a8172, UUID: c383ba2c-b00a-4bd2-82cb-b3f04c2a8172, case: TCGA-AA-3877

  13. https://portal.gdc.cancer.gov/files/b52775aa-273e-484e-82c7-c625f09415fa, UUID: b52775aa-273e-484e-82c7-c625f09415fa, case: TCGA-A6-3809

  14. https://portal.gdc.cancer.gov/files/7b15a87a-805c-4b8a-84de-549cec9c44e3, UUID: 7b15a87a-805c-4b8a-84de-549cec9c44e3, case: TCGA-AA-3684

  15. https://portal.gdc.cancer.gov/files/b4f3dbbb-2686-4896-9e60-5bef6c9150b4, UUID: b4f3dbbb-2686-4896-9e60-5bef6c9150b4, case: TCGA-AA-3692

  16. https://portal.gdc.cancer.gov/files/0b16e2bd-3ec7-4901-9ff0-a389670e5019, UUID: 0b16e2bd-3ec7-4901-9ff0-a389670e5019, case: TCGA-D5-6534

  17. https://portal.gdc.cancer.gov/files/a6690007-f347-49c3-a0ba-28e01d131971, UUID: a6690007-f347-49c3-a0ba-28e01d131971, case: TCGA-A6-3809

  18. https://portal.gdc.cancer.gov/files/a1742cf6-c3c5-43e7-879c-489494460e78, UUID: a1742cf6-c3c5-43e7-879c-489494460e78, case: TCGA-AA-A00N

  19. https://portal.gdc.cancer.gov/files/d5be795d-beb6-4def-bda8-f485ee45bfc1, UUID: d5be795d-beb6-4def-bda8-f485ee45bfc1, case: TCGA-A6-2674

  20. https://portal.gdc.cancer.gov/files/46306072-c59c-4b4b-963c-9c4e778ff34b, UUID: 46306072-c59c-4b4b-963c-9c4e778ff34b, case: TCGA-A6-6780

  21. https://portal.gdc.cancer.gov/files/a938cb2c-c8e8-4395-915b-37e1e279a4da, UUID: a938cb2c-c8e8-4395-915b-37e1e279a4da, case: TCGA-G4-6302

  22. https://portal.gdc.cancer.gov/files/7fec7c90-fd2e-4ee2-ba1a-77f85920771f, UUID: 7fec7c90-fd2e-4ee2-ba1a-77f85920771f, case: TCGA-DM-A282

  23. https://portal.gdc.cancer.gov/files/2c3fd34c-70d1-4331-9628-260b77329b53, UUID: 2c3fd34c-70d1-4331-9628-260b77329b53, case: TCGA-F4-6704

  24. https://portal.gdc.cancer.gov/files/4168a720-521e-47ff-afb5-4abe3e815490, UUID: 4168a720-521e-47ff-afb5-4abe3e815490, case: TCGA-AA-3950

  25. https://portal.gdc.cancer.gov/files/ecc90bd1-f594-41ea-ba4b-d42f4c64880b, UUID: ecc90bd1-f594-41ea-ba4b-d42f4c64880b, case: TCGA-A6-6781

  26. https://portal.gdc.cancer.gov/files/8736ed27-2141-48d9-b677-b1a0e14d4b50, UUID: 8736ed27-2141-48d9-b677-b1a0e14d4b50, case: TCGA-CA-6717

  27. https://portal.gdc.cancer.gov/files/3b8d04cd-d658-46ba-adca-079fee531e17, UUID: 3b8d04cd-d658-46ba-adca-079fee531e17, case: TCGA-AA-3821

  28. https://portal.gdc.cancer.gov/files/b27da518-d023-4f9c-a9ab-5cd68ee37870, UUID: b27da518-d023-4f9c-a9ab-5cd68ee37870, case: TCGA-CK-4951

  29. https://portal.gdc.cancer.gov/files/e7005df6-f78b-4e47-abe7-61ae6a2ee026, UUID: e7005df6-f78b-4e47-abe7-61ae6a2ee026, case: TCGA-AA-A01R

  30. https://portal.gdc.cancer.gov/files/e3598d14-292c-41cc-9b59-4497fa078272, UUID: e3598d14-292c-41cc-9b59-4497fa078272, case: TCGA-D5-6930

The URLs for the 30 files I downloaded locally for adenocarcinoma

  1. https://portal.gdc.cancer.gov/files/f1185347-ad15-43ae-9ef3-d5343b31a0fc, UUID: f1185347-ad15-43ae-9ef3-d5343b31a0fc, case: TCGA-A6-6654

  2. https://portal.gdc.cancer.gov/files/0d53cb1c-97c4-4088-9e43-029de88fd66d, UUID: 0d53cb1c-97c4-4088-9e43-029de88fd66d, case: TCGA-DM-A1D4

  3. https://portal.gdc.cancer.gov/files/a74bbce0-7f3d-434e-b294-7fa45e5b3a60, UUID: a74bbce0-7f3d-434e-b294-7fa45e5b3a60, case: TCGA-A6-2684

  4. https://portal.gdc.cancer.gov/files/47554e4e-cd13-4b92-80be-e1940f9a950f, UUID: 47554e4e-cd13-4b92-80be-e1940f9a950f, case: TCGA-A6-5657

  5. https://portal.gdc.cancer.gov/files/de60dbd7-8a93-47a5-b1ea-a3f95beade8a, UUID: de60dbd7-8a93-47a5-b1ea-a3f95beade8a, case: TCGA-F4-6854

  6. https://portal.gdc.cancer.gov/files/70883b31-d130-4efd-a7c6-169c8d4a253d, UUID: 70883b31-d130-4efd-a7c6-169c8d4a253d, case: TCGA-AD-A5EJ

  7. https://portal.gdc.cancer.gov/files/042bda3d-77aa-4522-8a97-c121711a760e, UUID: 042bda3d-77aa-4522-8a97-c121711a760e, case: TCGA-AG-3582

  8. https://portal.gdc.cancer.gov/files/b6388e09-7ed5-4041-97bb-4427ba5571ba, UUID: b6388e09-7ed5-4041-97bb-4427ba5571ba, case: TCGA-AY-6197

  9. https://portal.gdc.cancer.gov/files/54394c0b-6ae3-4b48-8e89-350ad5349611, UUID: 54394c0b-6ae3-4b48-8e89-350ad5349611, case: TCGA-AA-3554

  10. https://portal.gdc.cancer.gov/files/f7e21d61-19b6-4e99-887f-463d4419628c, UUID: f7e21d61-19b6-4e99-887f-463d4419628c, case: TCGA-AG-4015

  11. https://portal.gdc.cancer.gov/files/b4114885-38cd-4e8a-874b-b78da8d95e2c, UUID: b4114885-38cd-4e8a-874b-b78da8d95e2c, case: TCGA-CM-6171

  12. https://portal.gdc.cancer.gov/files/f9fda40d-67e4-4cb9-859c-ddc2ea84b7e4, UUID: f9fda40d-67e4-4cb9-859c-ddc2ea84b7e4, case: TCGA-CM-6170

  13. https://portal.gdc.cancer.gov/files/b4aebb2a-d0b8-43d8-bd1f-78af2065d8f9, UUID: b4aebb2a-d0b8-43d8-bd1f-78af2065d8f9, case: TCGA-AA-3846

  14. https://portal.gdc.cancer.gov/files/6a750710-5ed9-4d24-b2bf-3a4e3211878f, UUID: 6a750710-5ed9-4d24-b2bf-3a4e3211878f, case: TCGA-CM-6677

  15. https://portal.gdc.cancer.gov/files/93d1a78f-423e-4560-b4d3-ee4a89ac922b, UUID: 93d1a78f-423e-4560-b4d3-ee4a89ac922b, case: TCGA-RU-A8FL

  16. https://portal.gdc.cancer.gov/files/2e632fd9-fa17-4290-9601-a5d462cf152c, UUID: 2e632fd9-fa17-4290-9601-a5d462cf152c, case: TCGA-AZ-4323

  17. https://portal.gdc.cancer.gov/files/7239b026-2587-489d-81fe-7bc657b7523c, UUID: 7239b026-2587-489d-81fe-7bc657b7523c, case: TCGA-CM-6164

  18. https://portal.gdc.cancer.gov/files/90e86a26-fffa-4c38-b2e0-bf0704ee3615, UUID: 90e86a26-fffa-4c38-b2e0-bf0704ee3615, case: TCGA-AZ-4315

  19. https://portal.gdc.cancer.gov/files/9ff11fe0-037c-405e-95c3-dc4a15413db8, UUID: 9ff11fe0-037c-405e-95c3-dc4a15413db8, case: TCGA-G4-6311

  20. https://portal.gdc.cancer.gov/files/b8eed826-6051-4358-9b3d-44d1553dd9ad, UUID: b8eed826-6051-4358-9b3d-44d1553dd9ad, case: TCGA-AA-3522

  21. https://portal.gdc.cancer.gov/files/c172bc07-d4f0-41be-a558-49abc81065c2, UUID: c172bc07-d4f0-41be-a558-49abc81065c2, case: TCGA-AA-3667

  22. https://portal.gdc.cancer.gov/files/260edc5e-1ca6-4b07-b96d-59594d03ac54, UUID: 260edc5e-1ca6-4b07-b96d-59594d03ac54, case: TCGA-AA-A00U

  23. https://portal.gdc.cancer.gov/files/0c5c1a38-7e9c-4b43-810d-0761c3af49b1, UUID: 0c5c1a38-7e9c-4b43-810d-0761c3af49b1, case: TCGA-AA-3506

  24. https://portal.gdc.cancer.gov/files/7024ba0c-be56-4907-9254-cdb2579e536e, UUID: 7024ba0c-be56-4907-9254-cdb2579e536e, case: TCGA-NH-A8F7

  25. https://portal.gdc.cancer.gov/files/031cf2a5-74e0-4b5f-98bd-da60628c0854, UUID: 031cf2a5-74e0-4b5f-98bd-da60628c0854, case: TCGA-AA-3680

  26. https://portal.gdc.cancer.gov/files/91991ecf-cc54-4110-8a4e-9236bf8aa072, UUID: 91991ecf-cc54-4110-8a4e-9236bf8aa072, case: TCGA-A6-4105

  27. https://portal.gdc.cancer.gov/files/47aceec1-a01d-419f-9689-c46284c79bcb, UUID: 47aceec1-a01d-419f-9689-c46284c79bcb, case: TCGA-D5-6922

  28. https://portal.gdc.cancer.gov/files/ce84c955-63db-473a-a6d7-0e3daad6efd4, UUID: ce84c955-63db-473a-a6d7-0e3daad6efd4, case: TCGA-AA-3524

  29. https://portal.gdc.cancer.gov/files/a071fc45-61ea-4815-93bf-be34980e59ee, UUID: a071fc45-61ea-4815-93bf-be34980e59ee, case: TCGA-AA-3855

  30. https://portal.gdc.cancer.gov/files/8b275144-b885-4fb0-af39-fea1e48a970a, UUID: 8b275144-b885-4fb0-af39-fea1e48a970a, case: TCGA-AA-A00Q

Load in count data

First, rename the “.count” files to “.txt”, then unzip each one by opening it.

setwd('~/Desktop/COAD_Data/')
COAD_files <- c("9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt", "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt", 
   "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt", "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt",
   "15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt", "649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq.txt",
   "86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq.txt", "28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq.txt",                 "911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq.txt", "d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq.txt",                 "f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq.txt", "f590941d-19dc-427a-95b6-942c97ea8333.htseq.txt",                 "55aa6d16-3598-42ca-8844-0fe84739ef66.htseq.txt", "0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq.txt",                 "9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq.txt", "d2587070-cb7d-440d-ae49-52f5077248e6.htseq.txt",                 "7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq.txt", "2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq.txt",                 "424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq.txt", "934f9dc6-1260-4268-b022-870f1e37dd6f.htseq.txt",                 "0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq.txt", "c8544a8a-4352-438d-94d4-3495af2e9a78.htseq.txt",                 "dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq.txt", "e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq.txt",                 "debd6982-7c27-42e8-b778-20afcc78a5f3.htseq.txt", "17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq.txt",                 "7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq.txt", "fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq.txt",                 "abe20df7-6b97-4397-8864-881bac27e92c.htseq.txt", "62f84581-4c7d-4c8e-835c-9304bcec3106.htseq.txt", "3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq.txt", "087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq.txt", 
   "c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq.txt", "13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq.txt",
   "6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq.txt", "8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq.txt",
   "168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq.txt", "0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq.txt",                 "4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq.txt", "7fb73a84-867a-4c28-aa02-93068efffb7b.htseq.txt",                 "b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq.txt", "f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq.txt",                 "f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq.txt", "e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq.txt",                 "a26d49db-2309-46a0-a3ed-275378d484e7.htseq.txt", "a3f88a5d-7169-465b-bb80-e5999590681c.htseq.txt",                 "c264fe3b-482b-44ec-83a4-73df565663ff.htseq.txt", "bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq.txt",                 "7261b656-c79c-4581-a503-15b653e2b5d2.htseq.txt", "ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq.txt",                 "f596eabc-e39a-4e35-9fc6-edade04eb785.htseq.txt", "bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq.txt",                 "564daa81-cfef-45b6-94a0-3249b2724d9b.htseq.txt", "82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq.txt",                 "9c52ed00-325f-4664-8873-327bcaa5ea74.htseq.txt", "fabefb10-5546-4017-8ea1-29982a10fb3c.htseq.txt",                 "32a115cf-570f-4ad9-a123-8e1970062f51.htseq.txt", "05eef9f8-a246-403a-b0be-07d274b6f93a.htseq.txt",                 "5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq.txt", "43b292be-5d63-4523-a43f-666d20039208.htseq.txt")
read.delim(COAD_files[1], header = FALSE, nrows = 60) # HTSeq files have no header row

Create dataset, join the 60 loaded txt files

Use edgeR's readDGE to combine the 60 text files into a single matrix of counts.

Known issue: Working directory

Spoke to Professor Craig on 12/4: it is OK not to change the root directory and to simply setwd to a folder on my desktop, since the files were downloaded locally.

setwd('~/Desktop/COAD_Data/')
library(edgeR)
## Loading required package: limma
x <- readDGE(COAD_files, columns=c(1,2)) #joins my 60 files and creates a dataset
## Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique
class(x)
## [1] "DGEList"
## attr(,"package")
## [1] "edgeR"
dim(x)
## [1] 60487    60
names(x) #accessor function  
## [1] "samples" "counts"
str(x) #displays the structure of x in compact way, alternative to summary and best for displaying contents of lists
## Formal class 'DGEList' [package "edgeR"] with 1 slot
##   ..@ .Data:List of 2
##   .. ..$ :'data.frame':  60 obs. of  4 variables:
##   .. .. ..$ files       : chr [1:60] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt" "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt" "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt" "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt" ...
##   .. .. ..$ group       : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..$ lib.size    : num [1:60] 9.02e+07 3.68e+07 4.30e+07 1.12e+08 3.63e+07 ...
##   .. .. ..$ norm.factors: num [1:60] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : num [1:60487, 1:60] 47 1212 1176 121 166 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ Tags   : chr [1:60487] "ENSG00000000005.5" "ENSG00000000419.11" "ENSG00000000457.12" "ENSG00000000460.15" ...
##   .. .. .. ..$ Samples: chr [1:60] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq" "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq" "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq" "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq" ...
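
The `readDGE` message above lists HTSeq summary rows such as `__no_feature` and `__ambiguous`. As far as I can tell the message only reports them, so they may still be among the 60487 rows. A minimal sketch of dropping such rows from a count matrix follows. The matrix is a toy stand-in for `x$counts`: the two gene rows reuse values from the output in this notebook, and the two summary rows are made up.

```r
# Toy stand-in for x$counts: two genes plus two HTSeq summary rows.
counts <- matrix(c(  47,    4,
                   1212,  710,
                   3000, 2500,   # made-up value for __no_feature
                    800,  650),  # made-up value for __ambiguous
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("ENSG00000000005.5", "ENSG00000000419.11",
                                   "__no_feature", "__ambiguous"),
                                 c("sample1", "sample2")))

# HTSeq summary rows start with two underscores; Ensembl gene IDs do not.
is_meta <- grepl("^__", rownames(counts))
gene_counts <- counts[!is_meta, , drop = FALSE]

rownames(gene_counts)
```

The same logical index could be applied to the rows of the DGEList itself. Library sizes would then need recomputing, since the summary rows would otherwise remain counted in `lib.size`.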

Annotate the samples

x$samples

Organize sample information

Associate sample-level information with the columns of the counts matrix

samplenames <- substring(colnames(x), 1, nchar(colnames(x)))
samplenames
##  [1] "9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq"
##  [2] "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq"
##  [3] "5697212f-b3fd-479f-84b0-ec0aae54534a.htseq"
##  [4] "7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq"
##  [5] "15864159-be88-41c8-bdef-c2c5927cb1a1.htseq"
##  [6] "649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq"
##  [7] "86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq"
##  [8] "28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq"
##  [9] "911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq"
## [10] "d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq"
## [11] "f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq"
## [12] "f590941d-19dc-427a-95b6-942c97ea8333.htseq"
## [13] "55aa6d16-3598-42ca-8844-0fe84739ef66.htseq"
## [14] "0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq"
## [15] "9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq"
## [16] "d2587070-cb7d-440d-ae49-52f5077248e6.htseq"
## [17] "7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq"
## [18] "2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq"
## [19] "424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq"
## [20] "934f9dc6-1260-4268-b022-870f1e37dd6f.htseq"
## [21] "0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq"
## [22] "c8544a8a-4352-438d-94d4-3495af2e9a78.htseq"
## [23] "dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq"
## [24] "e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq"
## [25] "debd6982-7c27-42e8-b778-20afcc78a5f3.htseq"
## [26] "17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq"
## [27] "7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq"
## [28] "fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq"
## [29] "abe20df7-6b97-4397-8864-881bac27e92c.htseq"
## [30] "62f84581-4c7d-4c8e-835c-9304bcec3106.htseq"
## [31] "3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq"
## [32] "087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq"
## [33] "c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq"
## [34] "13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq"
## [35] "6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq"
## [36] "8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq"
## [37] "168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq"
## [38] "0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq"
## [39] "4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq"
## [40] "7fb73a84-867a-4c28-aa02-93068efffb7b.htseq"
## [41] "b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq"
## [42] "f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq"
## [43] "f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq"
## [44] "e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq"
## [45] "a26d49db-2309-46a0-a3ed-275378d484e7.htseq"
## [46] "a3f88a5d-7169-465b-bb80-e5999590681c.htseq"
## [47] "c264fe3b-482b-44ec-83a4-73df565663ff.htseq"
## [48] "bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq"
## [49] "7261b656-c79c-4581-a503-15b653e2b5d2.htseq"
## [50] "ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq"
## [51] "f596eabc-e39a-4e35-9fc6-edade04eb785.htseq"
## [52] "bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq"
## [53] "564daa81-cfef-45b6-94a0-3249b2724d9b.htseq"
## [54] "82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq"
## [55] "9c52ed00-325f-4664-8873-327bcaa5ea74.htseq"
## [56] "fabefb10-5546-4017-8ea1-29982a10fb3c.htseq"
## [57] "32a115cf-570f-4ad9-a123-8e1970062f51.htseq"
## [58] "05eef9f8-a246-403a-b0be-07d274b6f93a.htseq"
## [59] "5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq"
## [60] "43b292be-5d63-4523-a43f-666d20039208.htseq"
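
The sample names printed above are identical to the column names, because `substring()` from position 1 to `nchar()` returns each string unchanged. If shorter labels are wanted for plots, one option is to drop the trailing `.htseq` so that only the file UUID remains. The sketch below uses two names copied from the output above. The variable `uuids` is a hypothetical name, and this step is not part of the original analysis.

```r
# Two column names copied from the output above.
cols <- c("9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq",
          "bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq")

# substring() from 1 to nchar() returns each string unchanged.
unchanged <- substring(cols, 1, nchar(cols))

# Removing the trailing ".htseq" leaves the bare file UUID.
uuids <- sub("\\.htseq$", "", cols)

uuids
```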

Specify which files are Cystic, Mucinous, and Serous (CMS) and which files are Adenocarcinoma

colnames(x) <- samplenames
# The first 30 files are CMS and the last 30 are adenocarcinoma
group <- factor(rep(c("CMS", "ADENOCARCINOMA"), each = 30))

x$samples$group <- group
x$samples
DF <- x$samples # for my own visualization purposes

Script to organize gene annotations

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("Homo.sapiens")
library(Homo.sapiens)

install.packages("gsubfn") # the package name must be quoted
library(gsubfn)

Script to annotate Genes

First install Homo.sapiens, then strip the version suffix (the decimal point and the digits after it) from all 60487 Ensembl gene IDs so that they match the annotation keys.

library(Homo.sapiens)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following object is masked from 'package:limma':
## 
##     plotMA
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: OrganismDbi
## Loading required package: GenomicFeatures
## Loading required package: GenomeInfoDb
## Loading required package: GenomicRanges
## Loading required package: GO.db
## 
## Loading required package: org.Hs.eg.db
## 
## Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
#library(stringr)
library(gsubfn)
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
##   dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
##   Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
##   Reason: image not found
## Could not load tcltk.  Will use slower R code instead.
geneid <- rownames(x)
#geneid_test <- c("ENSG00000000005", 
#   "ENSG00000000419",
#   "ENSG00000000457",
#   "ENSG00000000938") 
#geneid <- str_remove(geneid, "[.]") removes decimals only
geneid <- gsub("\\.[0-9]*$", "", geneid) #remove decimals and numbers after decimals
genes <- select(Homo.sapiens, keys=geneid, columns=c("SYMBOL", "TXCHROM"), 
                keytype="ENSEMBL")
## 'select()' returned 1:many mapping between keys and columns
head(genes)

Remove duplicate genes

genes <- genes[!duplicated(genes$ENSEMBL),]
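
To illustrate what the line above does with a 1:many mapping: `duplicated()` flags the second and later occurrences of an ID, so only the first annotation row per gene is kept. The sketch below uses a toy table with placeholder IDs, symbols, and chromosomes, none of them real annotations. The final check that the annotation order matches the count rows is my own addition, not something the original code performs.

```r
# Toy annotation table with a 1:many mapping (all values are placeholders).
genes <- data.frame(ENSEMBL = c("ENSG_A", "ENSG_A", "ENSG_B"),
                    SYMBOL  = c("SYM1", "SYM1", "SYM2"),
                    TXCHROM = c("chr1", "chr1_alt", "chr2"),
                    stringsAsFactors = FALSE)
gene_ids <- c("ENSG_A", "ENSG_B")  # stands in for the row order of the count matrix

# duplicated() marks repeats, so negating it keeps the first row per ID.
genes <- genes[!duplicated(genes$ENSEMBL), ]

# The annotation must line up row-for-row with the counts before it is attached.
stopifnot(identical(genes$ENSEMBL, gene_ids))

genes
```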

Package in a DGEList-object containing raw count data with associated sample information and gene annotations

x$genes <- genes
x
## An object of class "DGEList"
## $samples
##                                                                                     files
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt
##                                            group  lib.size norm.factors
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq   CMS  90179803            1
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq   CMS  36807306            1
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq   CMS  42963355            1
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq   CMS 111649651            1
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq   CMS  36349055            1
## 55 more rows ...
## 
## $counts
##                     Samples
## Tags                 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
##   ENSG00000000005.5                                          47
##   ENSG00000000419.11                                       1212
##   ENSG00000000457.12                                       1176
##   ENSG00000000460.15                                        121
##   ENSG00000000938.11                                        166
##                     Samples
## Tags                 bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
##   ENSG00000000005.5                                           4
##   ENSG00000000419.11                                        710
##   ENSG00000000457.12                                        236
##   ENSG00000000460.15                                        211
##   ENSG00000000938.11                                        140
##                     Samples
## Tags                 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                        702
##   ENSG00000000457.12                                        552
##   ENSG00000000460.15                                        320
##   ENSG00000000938.11                                         93
##                     Samples
## Tags                 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
##   ENSG00000000005.5                                          10
##   ENSG00000000419.11                                        432
##   ENSG00000000457.12                                        803
##   ENSG00000000460.15                                        605
##   ENSG00000000938.11                                        473
##                     Samples
## Tags                 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
##   ENSG00000000005.5                                           3
##   ENSG00000000419.11                                        641
##   ENSG00000000457.12                                        311
##   ENSG00000000460.15                                        182
##   ENSG00000000938.11                                        130
##                     Samples
## Tags                 649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
##   ENSG00000000005.5                                          14
##   ENSG00000000419.11                                       1151
##   ENSG00000000457.12                                        246
##   ENSG00000000460.15                                        202
##   ENSG00000000938.11                                         52
##                     Samples
## Tags                 86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
##   ENSG00000000005.5                                          14
##   ENSG00000000419.11                                       3675
##   ENSG00000000457.12                                       1901
##   ENSG00000000460.15                                       1436
##   ENSG00000000938.11                                        862
##                     Samples
## Tags                 28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
##   ENSG00000000005.5                                          37
##   ENSG00000000419.11                                       2278
##   ENSG00000000457.12                                        835
##   ENSG00000000460.15                                        697
##   ENSG00000000938.11                                        687
##                     Samples
## Tags                 911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                       1934
##   ENSG00000000457.12                                        745
##   ENSG00000000460.15                                        464
##   ENSG00000000938.11                                        138
##                     Samples
## Tags                 d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
##   ENSG00000000005.5                                           3
##   ENSG00000000419.11                                        707
##   ENSG00000000457.12                                        366
##   ENSG00000000460.15                                        206
##   ENSG00000000938.11                                         80
##                     Samples
## Tags                 f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
##   ENSG00000000005.5                                          10
##   ENSG00000000419.11                                       1282
##   ENSG00000000457.12                                        624
##   ENSG00000000460.15                                        267
##   ENSG00000000938.11                                        979
##                     Samples
## Tags                 f590941d-19dc-427a-95b6-942c97ea8333.htseq
##   ENSG00000000005.5                                           0
##   ENSG00000000419.11                                        727
##   ENSG00000000457.12                                        356
##   ENSG00000000460.15                                        240
##   ENSG00000000938.11                                        378
##                     Samples
## Tags                 55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
##   ENSG00000000005.5                                           1
##   ENSG00000000419.11                                       2949
##   ENSG00000000457.12                                        892
##   ENSG00000000460.15                                        823
##   ENSG00000000938.11                                        389
##                     Samples
## Tags                 0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
##   ENSG00000000005.5                                          10
##   ENSG00000000419.11                                        219
##   ENSG00000000457.12                                         95
##   ENSG00000000460.15                                        106
##   ENSG00000000938.11                                        320
##                     Samples
## Tags                 9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
##   ENSG00000000005.5                                           7
##   ENSG00000000419.11                                       1503
##   ENSG00000000457.12                                        566
##   ENSG00000000460.15                                        389
##   ENSG00000000938.11                                        235
##                     Samples
## Tags                 d2587070-cb7d-440d-ae49-52f5077248e6.htseq
##   ENSG00000000005.5                                         124
##   ENSG00000000419.11                                       2070
##   ENSG00000000457.12                                        886
##   ENSG00000000460.15                                        283
##   ENSG00000000938.11                                       2117
##                     Samples
## Tags                 7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                        604
##   ENSG00000000457.12                                        215
##   ENSG00000000460.15                                        255
##   ENSG00000000938.11                                        228
##                     Samples
## Tags                 2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
##   ENSG00000000005.5                                           6
##   ENSG00000000419.11                                        518
##   ENSG00000000457.12                                        215
##   ENSG00000000460.15                                        119
##   ENSG00000000938.11                                        159
##                     Samples
## Tags                 424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
##   ENSG00000000005.5                                           5
##   ENSG00000000419.11                                       2300
##   ENSG00000000457.12                                       1445
##   ENSG00000000460.15                                        831
##   ENSG00000000938.11                                       2183
##                     Samples
## Tags                 934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
##   ENSG00000000005.5                                          11
##   ENSG00000000419.11                                        627
##   ENSG00000000457.12                                        518
##   ENSG00000000460.15                                        401
##   ENSG00000000938.11                                        227
##                     Samples
## Tags                 0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                       1012
##   ENSG00000000457.12                                        468
##   ENSG00000000460.15                                        187
##   ENSG00000000938.11                                        534
##                     Samples
## Tags                 c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
##   ENSG00000000005.5                                         433
##   ENSG00000000419.11                                       3532
##   ENSG00000000457.12                                        771
##   ENSG00000000460.15                                        449
##   ENSG00000000938.11                                         99
##                     Samples
## Tags                 dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
##   ENSG00000000005.5                                           6
##   ENSG00000000419.11                                       3445
##   ENSG00000000457.12                                        840
##   ENSG00000000460.15                                        523
##   ENSG00000000938.11                                        891
##                     Samples
## Tags                 e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
##   ENSG00000000005.5                                           0
##   ENSG00000000419.11                                        757
##   ENSG00000000457.12                                        234
##   ENSG00000000460.15                                        232
##   ENSG00000000938.11                                        333
##                     Samples
## Tags                 debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
##   ENSG00000000005.5                                           0
##   ENSG00000000419.11                                       1519
##   ENSG00000000457.12                                        869
##   ENSG00000000460.15                                        317
##   ENSG00000000938.11                                        526
##                     Samples
## Tags                 17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
##   ENSG00000000005.5                                          11
##   ENSG00000000419.11                                       1875
##   ENSG00000000457.12                                        650
##   ENSG00000000460.15                                        325
##   ENSG00000000938.11                                        742
##                     Samples
## Tags                 7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
##   ENSG00000000005.5                                          11
##   ENSG00000000419.11                                        757
##   ENSG00000000457.12                                        332
##   ENSG00000000460.15                                        237
##   ENSG00000000938.11                                        139
##                     Samples
## Tags                 fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
##   ENSG00000000005.5                                           8
##   ENSG00000000419.11                                        959
##   ENSG00000000457.12                                        307
##   ENSG00000000460.15                                        246
##   ENSG00000000938.11                                        324
##                     Samples
## Tags                 abe20df7-6b97-4397-8864-881bac27e92c.htseq
##   ENSG00000000005.5                                           3
##   ENSG00000000419.11                                        392
##   ENSG00000000457.12                                        341
##   ENSG00000000460.15                                        335
##   ENSG00000000938.11                                        342
##                     Samples
## Tags                 62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
##   ENSG00000000005.5                                           1
##   ENSG00000000419.11                                       1144
##   ENSG00000000457.12                                        358
##   ENSG00000000460.15                                        320
##   ENSG00000000938.11                                        199
##                     Samples
## Tags                 3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
##   ENSG00000000005.5                                           1
##   ENSG00000000419.11                                       2901
##   ENSG00000000457.12                                        731
##   ENSG00000000460.15                                        494
##   ENSG00000000938.11                                       1845
##                     Samples
## Tags                 087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
##   ENSG00000000005.5                                          36
##   ENSG00000000419.11                                       3725
##   ENSG00000000457.12                                       1188
##   ENSG00000000460.15                                        741
##   ENSG00000000938.11                                         89
##                     Samples
## Tags                 c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                        378
##   ENSG00000000457.12                                        171
##   ENSG00000000460.15                                        230
##   ENSG00000000938.11                                        440
##                     Samples
## Tags                 13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
##   ENSG00000000005.5                                         100
##   ENSG00000000419.11                                       2292
##   ENSG00000000457.12                                        831
##   ENSG00000000460.15                                        874
##   ENSG00000000938.11                                        489
##                     Samples
## Tags                 6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
##   ENSG00000000005.5                                          31
##   ENSG00000000419.11                                       4884
##   ENSG00000000457.12                                        765
##   ENSG00000000460.15                                        628
##   ENSG00000000938.11                                        284
##                     Samples
## Tags                 8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
##   ENSG00000000005.5                                           4
##   ENSG00000000419.11                                       1593
##   ENSG00000000457.12                                        575
##   ENSG00000000460.15                                        368
##   ENSG00000000938.11                                        376
##                     Samples
## Tags                 168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
##   ENSG00000000005.5                                          76
##   ENSG00000000419.11                                       1247
##   ENSG00000000457.12                                        274
##   ENSG00000000460.15                                        239
##   ENSG00000000938.11                                        158
##                     Samples
## Tags                 0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
##   ENSG00000000005.5                                           3
##   ENSG00000000419.11                                       1853
##   ENSG00000000457.12                                        673
##   ENSG00000000460.15                                        437
##   ENSG00000000938.11                                        271
##                     Samples
## Tags                 4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
##   ENSG00000000005.5                                           1
##   ENSG00000000419.11                                        506
##   ENSG00000000457.12                                        270
##   ENSG00000000460.15                                        184
##   ENSG00000000938.11                                        918
##                     Samples
## Tags                 7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
##   ENSG00000000005.5                                          19
##   ENSG00000000419.11                                       1464
##   ENSG00000000457.12                                        271
##   ENSG00000000460.15                                        303
##   ENSG00000000938.11                                         93
##                     Samples
## Tags                 b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
##   ENSG00000000005.5                                           1
##   ENSG00000000419.11                                       1331
##   ENSG00000000457.12                                        743
##   ENSG00000000460.15                                        422
##   ENSG00000000938.11                                        437
##                     Samples
## Tags                 f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
##   ENSG00000000005.5                                          72
##   ENSG00000000419.11                                       4749
##   ENSG00000000457.12                                        877
##   ENSG00000000460.15                                        536
##   ENSG00000000938.11                                        446
##                     Samples
## Tags                 f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
##   ENSG00000000005.5                                           7
##   ENSG00000000419.11                                       1954
##   ENSG00000000457.12                                        422
##   ENSG00000000460.15                                        283
##   ENSG00000000938.11                                         97
##                     Samples
## Tags                 e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
##   ENSG00000000005.5                                         252
##   ENSG00000000419.11                                       2538
##   ENSG00000000457.12                                        581
##   ENSG00000000460.15                                        521
##   ENSG00000000938.11                                        208
##                     Samples
## Tags                 a26d49db-2309-46a0-a3ed-275378d484e7.htseq
##   ENSG00000000005.5                                          26
##   ENSG00000000419.11                                       3001
##   ENSG00000000457.12                                        875
##   ENSG00000000460.15                                        462
##   ENSG00000000938.11                                         39
##                     Samples
## Tags                 a3f88a5d-7169-465b-bb80-e5999590681c.htseq
##   ENSG00000000005.5                                           4
##   ENSG00000000419.11                                       2746
##   ENSG00000000457.12                                        732
##   ENSG00000000460.15                                        542
##   ENSG00000000938.11                                        888
##                     Samples
## Tags                 c264fe3b-482b-44ec-83a4-73df565663ff.htseq
##   ENSG00000000005.5                                         104
##   ENSG00000000419.11                                       5777
##   ENSG00000000457.12                                        684
##   ENSG00000000460.15                                        634
##   ENSG00000000938.11                                        304
##                     Samples
## Tags                 bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
##   ENSG00000000005.5                                          10
##   ENSG00000000419.11                                        980
##   ENSG00000000457.12                                        193
##   ENSG00000000460.15                                        309
##   ENSG00000000938.11                                         76
##                     Samples
## Tags                 7261b656-c79c-4581-a503-15b653e2b5d2.htseq
##   ENSG00000000005.5                                           2
##   ENSG00000000419.11                                       2981
##   ENSG00000000457.12                                        658
##   ENSG00000000460.15                                        845
##   ENSG00000000938.11                                        459
##                     Samples
## Tags                 ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
##   ENSG00000000005.5                                          78
##   ENSG00000000419.11                                       1846
##   ENSG00000000457.12                                       1368
##   ENSG00000000460.15                                        415
##   ENSG00000000938.11                                        283
##                     Samples
## Tags                 f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
##   ENSG00000000005.5                                          12
##   ENSG00000000419.11                                        478
##   ENSG00000000457.12                                         98
##   ENSG00000000460.15                                         95
##   ENSG00000000938.11                                        112
##                     Samples
## Tags                 bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
##   ENSG00000000005.5                                           4
##   ENSG00000000419.11                                        850
##   ENSG00000000457.12                                        277
##   ENSG00000000460.15                                        315
##   ENSG00000000938.11                                         67
##                     Samples
## Tags                 564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
##   ENSG00000000005.5                                          19
##   ENSG00000000419.11                                        202
##   ENSG00000000457.12                                        117
##   ENSG00000000460.15                                         61
##   ENSG00000000938.11                                         91
##                     Samples
## Tags                 82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
##   ENSG00000000005.5                                          26
##   ENSG00000000419.11                                       5155
##   ENSG00000000457.12                                        728
##   ENSG00000000460.15                                        626
##   ENSG00000000938.11                                         46
##                     Samples
## Tags                 9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
##   ENSG00000000005.5                                          24
##   ENSG00000000419.11                                        557
##   ENSG00000000457.12                                        454
##   ENSG00000000460.15                                        175
##   ENSG00000000938.11                                         70
##                     Samples
## Tags                 fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
##   ENSG00000000005.5                                          15
##   ENSG00000000419.11                                       4147
##   ENSG00000000457.12                                        679
##   ENSG00000000460.15                                        764
##   ENSG00000000938.11                                        477
##                     Samples
## Tags                 32a115cf-570f-4ad9-a123-8e1970062f51.htseq
##   ENSG00000000005.5                                          37
##   ENSG00000000419.11                                       2843
##   ENSG00000000457.12                                       1259
##   ENSG00000000460.15                                        869
##   ENSG00000000938.11                                        438
##                     Samples
## Tags                 05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
##   ENSG00000000005.5                                          42
##   ENSG00000000419.11                                        844
##   ENSG00000000457.12                                        137
##   ENSG00000000460.15                                        133
##   ENSG00000000938.11                                         24
##                     Samples
## Tags                 5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
##   ENSG00000000005.5                                         179
##   ENSG00000000419.11                                       1307
##   ENSG00000000457.12                                        571
##   ENSG00000000460.15                                        307
##   ENSG00000000938.11                                        136
##                     Samples
## Tags                 43b292be-5d63-4523-a43f-666d20039208.htseq
##   ENSG00000000005.5                                         140
##   ENSG00000000419.11                                       1101
##   ENSG00000000457.12                                        407
##   ENSG00000000460.15                                        191
##   ENSG00000000938.11                                         85
## 60482 more rows ...
## 
## $genes
##           ENSEMBL   SYMBOL TXCHROM
## 1 ENSG00000000005     TNMD    chrX
## 2 ENSG00000000419     DPM1   chr20
## 3 ENSG00000000457    SCYL3    chr1
## 4 ENSG00000000460 C1orf112    chr1
## 5 ENSG00000000938      FGR    chr1
## 60482 more rows ...

Ashley E Noriega

Nov 30, 2019

TRGN 510 Final Project: Milestone 3

Running the Glimma Vignette

Data Pre-processing

Transformations from the raw scale: convert raw counts to counts per million (CPM) and log2 counts per million (log-CPM) so that samples with different library sizes can be compared.

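# CPM and log2-CPM for every gene in every sample (edgeR's cpm function)
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)
# Mean (L) and median (M) library size, in millions of reads
L <- mean(x$samples$lib.size) * 1e-6
M <- median(x$samples$lib.size) * 1e-6
c(L, M)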
## [1] 64.23804 58.76902
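
As a rough sketch of what these transformations compute, the base-R example below reproduces CPM and log-CPM by hand on a made-up matrix of 3 genes and 2 samples. It assumes a prior count of 2 (which I believe is the current edgeR default) and my reading of how cpm() scales that prior by library size, so it is an illustration rather than edgeR's actual code. The names counts, lib_size, cpm_manual, lcpm_manual and floor_approx are hypothetical and are not used elsewhere in this notebook.

```r
# Toy counts: 3 genes x 2 samples (made-up numbers, not TCGA data)
counts <- matrix(c(0, 10, 90,
                   5, 45, 450),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Library size = total reads per sample (column sums)
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million
cpm_manual <- t(t(counts) / lib_size) * 1e6

# log-CPM (assumed form): add a prior count, scaled by relative library size,
# to every count before taking log2 so that zero counts stay finite
prior_count <- 2
prior_scaled <- prior_count * lib_size / mean(lib_size)
lcpm_manual <- log2(t((t(counts) + prior_scaled) /
                        (lib_size + 2 * prior_scaled)) * 1e6)

# Approximate log-CPM of a zero count when the mean library size is
# L million reads (L taken from the c(L, M) output above)
L <- 64.23804
floor_approx <- log2(prior_count / L)

print(cpm_manual)
print(round(lcpm_manual, 3))
print(floor_approx)
```

With L = 64.238 from the output above, a gene with zero reads lands near log2(2/64.238), or about -5.005, which is the minimum that summary(lcpm) reports for each sample.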
summary(lcpm)
##  9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5302                           
##  3rd Qu.:-0.7721                           
##  Max.   :17.9542                           
##  bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.573                            
##  3rd Qu.:-1.020                            
##  Max.   :18.478                            
##  5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.548                            
##  3rd Qu.:-1.079                            
##  Max.   :18.160                            
##  7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-3.4138                           
##  Mean   :-2.3687                           
##  3rd Qu.:-0.6434                           
##  Max.   :19.0973                           
##  15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.406                            
##  3rd Qu.:-0.591                            
##  Max.   :18.390                            
##  649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4792                           
##  3rd Qu.:-0.8821                           
##  Max.   :18.3537                           
##  86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.6634                           
##  Mean   :-2.4026                           
##  3rd Qu.:-0.7838                           
##  Max.   :18.3240                           
##  28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.6140                           
##  Mean   :-2.4599                           
##  3rd Qu.:-0.7757                           
##  Max.   :18.0427                           
##  911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5570                           
##  Mean   :-2.4067                           
##  3rd Qu.:-0.6097                           
##  Max.   :17.9832                           
##  d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4745                           
##  3rd Qu.:-0.6094                           
##  Max.   :18.1525                           
##  f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
##  Min.   :-5.00536                          
##  1st Qu.:-5.00536                          
##  Median :-4.29934                          
##  Mean   :-2.20443                          
##  3rd Qu.:-0.06656                          
##  Max.   :17.62822                          
##  f590941d-19dc-427a-95b6-942c97ea8333.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.3872                           
##  3rd Qu.:-0.5109                           
##  Max.   :18.2714                           
##  55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-4.694                            
##  Mean   :-2.678                            
##  3rd Qu.:-1.498                            
##  Max.   :19.004                            
##  0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4202                           
##  3rd Qu.:-0.3006                           
##  Max.   :18.0102                           
##  9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.2625                           
##  Mean   :-2.3000                           
##  3rd Qu.:-0.2363                           
##  Max.   :18.3430                           
##  d2587070-cb7d-440d-ae49-52f5077248e6.htseq
##  Min.   :-5.00536                          
##  1st Qu.:-5.00536                          
##  Median :-4.55846                          
##  Mean   :-2.21997                          
##  3rd Qu.:-0.02803                          
##  Max.   :17.69040                          
##  7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-4.644                            
##  Mean   :-2.912                            
##  3rd Qu.:-1.374                            
##  Max.   :18.862                            
##  2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.3752                           
##  3rd Qu.:-0.3785                           
##  Max.   :17.9383                           
##  424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.1417                           
##  Mean   :-2.4586                           
##  3rd Qu.:-0.7395                           
##  Max.   :19.0773                           
##  934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-3.9604                           
##  Mean   :-2.5190                           
##  3rd Qu.:-0.7787                           
##  Max.   :18.7200                           
##  0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.310                            
##  3rd Qu.:-0.129                            
##  Max.   :17.739                            
##  c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5364                           
##  Mean   :-2.4309                           
##  3rd Qu.:-0.6687                           
##  Max.   :17.9586                           
##  dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.6905                           
##  Mean   :-2.3374                           
##  3rd Qu.:-0.2584                           
##  Max.   :18.1117                           
##  e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5190                           
##  3rd Qu.:-0.8769                           
##  Max.   :18.2798                           
##  debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-4.529                            
##  Mean   :-2.669                            
##  3rd Qu.:-1.269                            
##  Max.   :19.271                            
##  17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.4684                           
##  Mean   :-2.2833                           
##  3rd Qu.:-0.1722                           
##  Max.   :18.0424                           
##  7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.315                            
##  3rd Qu.:-0.347                            
##  Max.   :18.253                            
##  fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4626                           
##  3rd Qu.:-0.7878                           
##  Max.   :18.5223                           
##  abe20df7-6b97-4397-8864-881bac27e92c.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4819                           
##  3rd Qu.:-0.6892                           
##  Max.   :18.2755                           
##  62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4094                           
##  3rd Qu.:-0.6332                           
##  Max.   :18.1225                           
##  3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.4363                           
##  Mean   :-2.3126                           
##  3rd Qu.:-0.3237                           
##  Max.   :17.6880                           
##  087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5625                           
##  Mean   :-2.4787                           
##  3rd Qu.:-0.9039                           
##  Max.   :18.0167                           
##  c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5007                           
##  3rd Qu.:-0.6715                           
##  Max.   :18.2731                           
##  13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.6001                           
##  Mean   :-2.2739                           
##  3rd Qu.:-0.3063                           
##  Max.   :17.8027                           
##  6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5434                           
##  Mean   :-2.3342                           
##  3rd Qu.:-0.4192                           
##  Max.   :17.8651                           
##  8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.4560                           
##  Mean   :-2.4084                           
##  3rd Qu.:-0.7874                           
##  Max.   :17.8969                           
##  168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4576                           
##  3rd Qu.:-0.6493                           
##  Max.   :18.3474                           
##  0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5027                           
##  Mean   :-2.3667                           
##  3rd Qu.:-0.5842                           
##  Max.   :17.8840                           
##  4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.3788                           
##  3rd Qu.:-0.5019                           
##  Max.   :18.2679                           
##  7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5250                           
##  3rd Qu.:-0.8667                           
##  Max.   :18.4535                           
##  b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.4286                           
##  Mean   :-2.3446                           
##  3rd Qu.:-0.5695                           
##  Max.   :17.7747                           
##  f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5018                           
##  Mean   :-2.3837                           
##  3rd Qu.:-0.5533                           
##  Max.   :18.0439                           
##  f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.374                            
##  3rd Qu.:-0.463                            
##  Max.   :18.269                            
##  e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.494                            
##  3rd Qu.:-0.882                            
##  Max.   :18.606                            
##  a26d49db-2309-46a0-a3ed-275378d484e7.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.505                            
##  3rd Qu.:-1.076                            
##  Max.   :17.950                            
##  a3f88a5d-7169-465b-bb80-e5999590681c.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.0762                           
##  Mean   :-1.9487                           
##  3rd Qu.: 0.8739                           
##  Max.   :18.1564                           
##  c264fe3b-482b-44ec-83a4-73df565663ff.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5239                           
##  Mean   :-2.2732                           
##  3rd Qu.:-0.3071                           
##  Max.   :17.8054                           
##  bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4576                           
##  3rd Qu.:-0.7238                           
##  Max.   :18.4775                           
##  7261b656-c79c-4581-a503-15b653e2b5d2.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.3392                           
##  3rd Qu.:-0.4668                           
##  Max.   :17.7787                           
##  ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.6205                           
##  Mean   :-2.3583                           
##  3rd Qu.:-0.4248                           
##  Max.   :17.9569                           
##  f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.635                            
##  3rd Qu.:-0.981                            
##  Max.   :18.298                            
##  bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5172                           
##  3rd Qu.:-0.9961                           
##  Max.   :18.2876                           
##  564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.5218                           
##  3rd Qu.:-0.6848                           
##  Max.   :18.2312                           
##  82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-4.5039                           
##  Mean   :-2.4147                           
##  3rd Qu.:-0.7359                           
##  Max.   :17.8176                           
##  9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.3284                           
##  3rd Qu.:-0.2542                           
##  Max.   :18.0456                           
##  fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-4.399                            
##  Mean   :-2.282                            
##  3rd Qu.:-0.272                            
##  Max.   :17.898                            
##  32a115cf-570f-4ad9-a123-8e1970062f51.htseq
##  Min.   :-5.00536                          
##  1st Qu.:-5.00536                          
##  Median :-4.51407                          
##  Mean   :-2.18955                          
##  3rd Qu.:-0.05016                          
##  Max.   :17.82851                          
##  05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
##  Min.   :-5.005                            
##  1st Qu.:-5.005                            
##  Median :-5.005                            
##  Mean   :-2.610                            
##  3rd Qu.:-1.036                            
##  Max.   :18.150                            
##  5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4734                           
##  3rd Qu.:-0.6384                           
##  Max.   :18.7800                           
##  43b292be-5d63-4523-a43f-666d20039208.htseq
##  Min.   :-5.0054                           
##  1st Qu.:-5.0054                           
##  Median :-5.0054                           
##  Mean   :-2.4267                           
##  3rd Qu.:-0.6233                           
##  Max.   :18.3746

Remove lowly expressed genes

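TRUE counts the genes that have a zero count in exactly 9 samples. The 9 is carried over from the vignette, which has 9 samples; with the 60 samples here the comparison should be against ncol(x), so the table below does not yet show how many genes are unexpressed in all samples.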

table(rowSums(x$counts==0)==9)
## 
## FALSE  TRUE 
## 60074   413
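
As a minimal sketch of the check described above, the toy example below counts genes that are zero in every sample by comparing against the number of columns instead of a fixed 9. The matrix and its values are made up for illustration and are not taken from the TCGA data.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(c(0, 0, 0,
                   5, 0, 2,
                   0, 0, 1,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Number of samples in which each gene has a zero count.
zeros_per_gene <- rowSums(counts == 0)

# A gene is unexpressed everywhere when that number equals the sample count.
all_zero <- zeros_per_gene == ncol(counts)
table(all_zero)
```

On the real object the equivalent call would presumably be table(rowSums(x$counts==0)==ncol(x)).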

Filter genes while keeping as many genes as possible with worthwhile counts

keep.exprs <- filterByExpr(x, group=group)
x <- x[keep.exprs,, keep.lib.sizes=FALSE]
dim(x)
## [1] 19105    60
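
To give a rough idea of what this kind of filter does, here is a simplified base R sketch loosely modelled on the CPM-based rule behind filterByExpr: keep a gene if its CPM reaches a cutoff in at least as many samples as the smallest group. It leaves out details of the real function (for example the minimum total count and the adjustment for large groups), and the toy counts, the names cpm_mat, min_count and min_samples are my own assumptions.

```r
# Toy counts: 4 genes x 4 samples, two groups of two (made-up values).
counts <- matrix(c(100, 120,  90, 110,
                     0,   1,   0,   2,
                    50,  60,   0,   0,
                     3,   0,   4,   1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))
group <- factor(c("A", "A", "B", "B"))

# Counts per million for each sample.
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# CPM cutoff that corresponds to min_count reads in a median-sized library.
min_count <- 10
cpm_cutoff <- min_count / median(lib_size) * 1e6

# Require the cutoff to be met in at least as many samples as the smallest group.
min_samples <- min(table(group))
keep <- rowSums(cpm_mat >= cpm_cutoff) >= min_samples

counts[keep, , drop = FALSE]
```

In this toy case gene3 is kept even though it is absent from group B, because it is well expressed in both samples of group A.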

Plot the density of log-CPM values for raw and filtered data

One sample is a potential outlier (green line). It could be removed from future analysis, but I spoke to Professor Craig on 12/4 and we agreed to leave it in since the vignette has a normalisation step.

Known issue: color palette

Spoke to Professor Craig on 12/4 and agreed to stop working on this issue. The “Paired” palette offers only 12 colors, so with 60 samples the colors repeat from the 13th sample onward. I tried to increase the number of colors with colorRampPalette but was unsuccessful.

lcpm.cutoff <- log2(10/M + 2/L)
library(RColorBrewer)
#library(colorRamps)
nsamples <- ncol(x)
col <- brewer.pal(nsamples, "Paired") #gives a warning (not an error): n too large, allowed maximum for palette Paired is 12; the 12-color palette is returned
## Warning in brewer.pal(nsamples, "Paired"): n too large, allowed maximum for palette Paired is 12
## Returning the palette you asked for with that many colors
#nb.cols = 60
#colorRampPalette is a constructor function that builds palettes with an arbitrary number of colors by interpolating an existing palette
#col <- colorRampPalette(brewer.pal(nsamples, "Paired"))(nb.cols)
par(mfrow=c(1,2)) #1 row, 2 columns
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.26), las=2, main="", xlab="")
title(main="A. Raw data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")
lcpm <- cpm(x, log=TRUE)
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.26), las=2, main="", xlab="")
title(main="B. Filtered data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}
legend("topright", samplenames, text.col=col, bty="n")
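
For the color palette issue, one possible way to get 60 colors is sketched below. It is untested on this notebook. The idea is to ask brewer.pal for exactly 12 colors (its maximum for "Paired") and let colorRampPalette interpolate between them, instead of passing nsamples to brewer.pal. The name col60 and the fallback palette are my own assumptions; the fallback is only there so the snippet runs without RColorBrewer installed.

```r
# colorRampPalette() and hcl.colors() are in grDevices, which ships with R.
n_samples <- 60

# Twelve base colors: "Paired" if RColorBrewer is installed, otherwise a
# built-in HCL palette as a stand-in (an assumption, not the notebook's choice).
base_cols <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(12, "Paired")
} else {
  hcl.colors(12, "Dark 3")
}

# Interpolate the 12 base colors into one color per sample.
col60 <- colorRampPalette(base_cols)(n_samples)
length(col60)
```

Neighbouring interpolated colors can look very similar, so 60 lines may still be hard to tell apart; coloring by group is another option.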

Normalising gene expression distributions

x <- calcNormFactors(x, method = "TMM")
x$samples$norm.factors
##  [1] 0.8247877 0.8701067 0.9744718 0.3338411 1.0672340 0.9853437 0.8355374
##  [8] 1.0685271 1.1041480 1.1621714 1.3190864 1.1616288 0.5656024 1.0079779
## [15] 1.1311621 1.3455788 0.2548806 1.1728192 0.5164198 0.4158854 1.1552465
## [22] 1.0977806 1.1538364 1.0380711 0.5054850 1.2962443 1.1705252 0.9900169
## [29] 0.9259593 1.1222135 1.2985756 1.0598233 0.9471138 1.3391984 1.3043419
## [36] 1.1144424 1.0697018 1.1921660 1.1054413 0.9399911 1.2384865 1.2243347
## [43] 1.0760378 0.9933192 1.1164186 1.4176459 1.3689476 0.8989757 1.3130426
## [50] 1.0789261 0.7851273 1.0110826 0.9751891 1.1994225 1.2667583 1.3476310
## [57] 1.4869359 1.0917179 0.8579364 1.0468322
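
As a small worked example of how these factors are used, the sketch below multiplies library sizes by normalisation factors to get effective library sizes. The numbers are the first five samples' lib.size and norm.factors as printed in the voom output later in this notebook; the variable names are my own.

```r
# lib.size and norm.factors of the first five samples, copied from this notebook.
lib_size <- c(74292479, 31987425, 41835745, 37079196, 38748737)
norm_factors <- c(0.8247877, 0.8701067, 0.9744718, 0.3338411, 1.0672340)

# The effective library size is the raw library size scaled by the TMM factor.
eff_lib_size <- lib_size * norm_factors

data.frame(lib_size, norm_factors, eff_lib_size = round(eff_lib_size))
```

A factor well below 1, such as 0.3338411 for the fourth sample, shrinks the effective library size, which in turn raises that sample's CPM values.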

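Improve visualization by duplicating the data, then reducing the counts of the first sample to 5% of their original values and inflating those of the second sample 5-fold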

x2 <- x
x2$samples$norm.factors <- 1
x2$counts[,1] <- ceiling(x2$counts[,1]*0.05)
x2$counts[,2] <- x2$counts[,2]*5
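
To see roughly how large these adjustments are on the log-CPM scale, here is a short sketch under simplifying assumptions: the library size is held fixed, and the prior count and the ceiling() rounding are ignored. The counts and the helper log_cpm are made up for illustration.

```r
counts <- c(10, 200, 3000)   # made-up counts for one sample
lib_size <- sum(counts)      # library size is held fixed after scaling

# Plain log2 counts per million, without a prior count.
log_cpm <- function(y, lib) log2(y / lib * 1e6)

# Shift in log-CPM caused by scaling the counts by 0.05 and by 5.
shift_down <- log_cpm(counts * 0.05, lib_size) - log_cpm(counts, lib_size)
shift_up   <- log_cpm(counts * 5, lib_size) - log_cpm(counts, lib_size)

rbind(shift_down, shift_up)
```

Under these assumptions every gene shifts by log2(0.05) in the first sample and by log2(5) in the second, which is why the two boxes sit so far from the rest in the unnormalised plot.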

Boxplot expression distribution of samples for unnormalised data

par(mfrow=c(1,1)) #makes boxplot look less cramped 
lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="A. Example: Unnormalised data",ylab="Log-cpm")

x2 <- calcNormFactors(x2)  
x2$samples$norm.factors
##  [1] 0.04889808 4.36314201 0.99857375 0.36344409 1.08814631 1.00048470
##  [7] 0.91165862 1.07196030 1.10500025 1.21054683 1.28772624 1.16063702
## [13] 0.60882746 1.04998828 1.13203522 1.31787908 0.28023587 1.16142488
## [19] 0.53329090 0.46494919 1.14274272 1.09741972 1.16406362 1.04925131
## [25] 0.51082602 1.30832124 1.17758030 1.00128465 0.97668711 1.11752935
## [31] 1.28315406 1.05679584 0.99263908 1.36510564 1.33873108 1.14293994
## [37] 1.10193770 1.21035319 1.10148876 0.97629109 1.25905436 1.25325781
## [43] 1.12605092 1.02503014 1.11956858 1.41609997 1.38704078 0.91182127
## [49] 1.31684398 1.09702295 0.85028060 1.03939580 1.01768934 1.20164571
## [55] 1.27684217 1.35509884 1.51897054 1.10244902 0.85428854 1.05218206

Boxplot expression distribution of samples for normalised data

Normalisation pulls the sample distributions toward each other, which may not be desirable here since one sample is a potential outlier.

lcpm <- cpm(x2, log=TRUE)
boxplot(lcpm, las=2, col=col, main="")
title(main="B. Example: Normalised data",ylab="Log-cpm")

Unsupervised clustering of samples: make a multi-dimensional scaling (MDS) plot to show similarities and dissimilarities between samples in an unsupervised manner

Known issue: color palette

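I spoke to Professor Craig on 12/4, and we agreed the warning below can be ignored because I am only comparing 2 subsets of colon cancer. It appears because brewer.pal needs at least 3 colors while group has only 2 levels; asking for 3 colors and using the first 2 would avoid it. The vignette's second factor, lane, is not available for my data.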

lcpm <- cpm(x, log=TRUE)
par(mfrow=c(1,1)) #1 row, 1 column 
col.group <- group
levels(col.group) <-  brewer.pal(nlevels(col.group), "Set1") #n= number of different colors in a palette with the min being 3 
## Warning in brewer.pal(nlevels(col.group), "Set1"): minimal value for n is 3, returning requested palette with 3 different levels
col.group <- as.character(col.group)
#col.lane <- lane #not used: there is no lane information for my data
#levels(col.lane) <-  brewer.pal(nlevels(col.lane), "Set2")
#col.lane <- as.character(col.lane)
plotMDS(lcpm, labels=group, col=col.group)
title(main="A. Sample groups")

#plotMDS(lcpm, labels=lane, col=col.lane, dim=c(3,4))
#title(main="B. Sequencing lanes")
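
As an alternative to overwriting the factor levels, group colors can be looked up from a named vector, which sidesteps the minimum-of-3 warning. This is only a sketch: the group vector below is a four-sample stand-in, and the hex codes are the first three "Set1" colors written out by hand so the snippet does not depend on RColorBrewer.

```r
# Stand-in for the real 60-sample group factor.
group <- factor(c("CMS", "CMS", "ADENOCARCINOMA", "ADENOCARCINOMA"))

# First three colors of the "Set1" palette, hard-coded.
set1 <- c("#E41A1C", "#377EB8", "#4DAF4A")

# One color per level, named by level, using only as many colors as needed.
col_by_level <- setNames(set1[seq_len(nlevels(group))], levels(group))

# One color per sample, in sample order.
col.group <- unname(col_by_level[as.character(group)])
col.group
```

The resulting col.group could then be passed to plotMDS(lcpm, labels=group, col=col.group) in the same way as above.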

Make interactive using Glimma

An HTML page will be generated and opened in a browser if launch=TRUE

library(Glimma)
glMDSPlot(lcpm, labels=paste(group, sep="_"), 
          groups=x$samples[,c(1,2)], launch=TRUE)

Differential expression analysis

Creating a design matrix

design <- model.matrix(~0+group) #removes intercept from the factor group
#design <- model.matrix(~group) #keeps the intercept for the factor group; model contrasts are more straightforward without it
colnames(design) <- gsub("group", "", colnames(design))
design
##    ADENOCARCINOMA CMS
## 1               0   1
## 2               0   1
## 3               0   1
## 4               0   1
## 5               0   1
## 6               0   1
## 7               0   1
## 8               0   1
## 9               0   1
## 10              0   1
## 11              0   1
## 12              0   1
## 13              0   1
## 14              0   1
## 15              0   1
## 16              0   1
## 17              0   1
## 18              0   1
## 19              0   1
## 20              0   1
## 21              0   1
## 22              0   1
## 23              0   1
## 24              0   1
## 25              0   1
## 26              0   1
## 27              0   1
## 28              0   1
## 29              0   1
## 30              0   1
## 31              1   0
## 32              1   0
## 33              1   0
## 34              1   0
## 35              1   0
## 36              1   0
## 37              1   0
## 38              1   0
## 39              1   0
## 40              1   0
## 41              1   0
## 42              1   0
## 43              1   0
## 44              1   0
## 45              1   0
## 46              1   0
## 47              1   0
## 48              1   0
## 49              1   0
## 50              1   0
## 51              1   0
## 52              1   0
## 53              1   0
## 54              1   0
## 55              1   0
## 56              1   0
## 57              1   0
## 58              1   0
## 59              1   0
## 60              1   0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
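
To make the difference between the two parameterisations easier to see, here is a small sketch on a four-sample stand-in for group. The names design_means and design_int are my own.

```r
# Stand-in for the real 60-sample group factor.
group <- factor(c("CMS", "CMS", "ADENOCARCINOMA", "ADENOCARCINOMA"))

# Without an intercept: one indicator column per group.
design_means <- model.matrix(~0 + group)
colnames(design_means) <- gsub("group", "", colnames(design_means))

# With an intercept: the first level becomes the baseline and the second
# column is the difference of CMS from that baseline.
design_int <- model.matrix(~group)

design_means
design_int
```

With the intercept-free version every row has exactly one 1, so each coefficient is a group mean and the comparison of interest is written as an explicit contrast.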

Contrasts for pairwise comparisons between cancer subtypes

Since I am only comparing CMS and Adenocarcinoma, I will only have 1 pairwise comparison.

library(limma)
contr.matrix <- makeContrasts(
   ADENOCARCINOMAvsCMS = ADENOCARCINOMA-CMS, 
   levels = colnames(design))
contr.matrix
##                 Contrasts
## Levels           ADENOCARCINOMAvsCMS
##   ADENOCARCINOMA                   1
##   CMS                             -1
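
As a worked example of what the contrast encodes, the coefficients 1 and -1 are multiplied with the two group means of a gene and summed, which gives the log2 fold-change of adenocarcinoma over CMS. The group means below are made-up numbers for a single hypothetical gene.

```r
# Made-up mean log2 expression of one gene in each group.
group_means <- c(ADENOCARCINOMA = 5.2, CMS = 3.7)

# The same coefficients as the ADENOCARCINOMAvsCMS column above.
contrast <- c(ADENOCARCINOMA = 1, CMS = -1)

# log2 fold-change = weighted sum of the group means.
log_fc <- sum(contrast * group_means[names(contrast)])
log_fc
```

A positive value means the gene is higher in adenocarcinoma and a negative value means it is higher in CMS.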

Remove heteroscedasticity from count data

Voom plot

Each black dot represents a gene. The red curve is the estimated mean-variance trend used to compute the voom weights.
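
For reference, the log-CPM values on which the trend is estimated can be sketched in base R as below. This covers only the transformation step, with an offset of 0.5 on the counts and 1 on the library size, and not the trend fitting or the weights. The counts are made up; the library size and normalisation factor are those printed for the first sample in this notebook.

```r
counts <- c(0, 50, 1200)   # made-up counts for three genes in one sample

# Effective library size = lib.size * norm.factors (first sample's values).
lib_size <- 74292479 * 0.8247877

# log2 counts per million with small offsets so that zero counts stay finite.
log_cpm <- log2((counts + 0.5) / (lib_size + 1) * 1e6)
log_cpm
```

The offsets are why a zero count gives a finite, strongly negative log-CPM instead of -Inf.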

par(mfrow=c(1,2))
#voom converts raw counts to log-CPM values by extracting library sizes and normalisation factors from x
v <- voom(x, design, plot=TRUE)
v
## An object of class "EList"
## $genes
##           ENSEMBL   SYMBOL TXCHROM
## 1 ENSG00000000005     TNMD    chrX
## 2 ENSG00000000419     DPM1   chr20
## 3 ENSG00000000457    SCYL3    chr1
## 4 ENSG00000000460 C1orf112    chr1
## 5 ENSG00000000938      FGR    chr1
## 19100 more rows ...
## 
## $targets
##                                                                                     files
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq.txt
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq.txt
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq.txt
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq.txt
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq.txt
##                                            group lib.size norm.factors
## 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq   CMS 74292479    0.8247877
## bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq   CMS 31987425    0.8701067
## 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq   CMS 41835745    0.9744718
## 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq   CMS 37079196    0.3338411
## 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq   CMS 38748737    1.0672340
## 55 more rows ...
## 
## $E
##                     Samples
## Tags                 9e8b528b-1172-4c07-a09b-ebb23cf2310c.htseq
##   ENSG00000000005.5                                  -0.6452887
##   ENSG00000000419.11                                  4.0286248
##   ENSG00000000457.12                                  3.9851413
##   ENSG00000000460.15                                  0.7096682
##   ENSG00000000938.11                                  1.1642341
##                     Samples
## Tags                 bda1a9a4-a14f-4463-81d2-a4fcca65d6f1.htseq
##   ENSG00000000005.5                                   -2.829508
##   ENSG00000000419.11                                   4.473258
##   ENSG00000000457.12                                   2.886263
##   ENSG00000000460.15                                   2.725081
##   ENSG00000000938.11                                   2.134993
##                     Samples
## Tags                 5697212f-b3fd-479f-84b0-ec0aae54534a.htseq
##   ENSG00000000005.5                                   -4.064736
##   ENSG00000000419.11                                   4.069690
##   ENSG00000000457.12                                   3.723166
##   ENSG00000000460.15                                   2.937516
##   ENSG00000000938.11                                   1.160230
##                     Samples
## Tags                 7f9a629b-12ed-48cc-8d5c-1c2f5db9cf1f.htseq
##   ENSG00000000005.5                                   -1.820221
##   ENSG00000000419.11                                   3.544018
##   ENSG00000000457.12                                   4.437616
##   ENSG00000000460.15                                   4.029445
##   ENSG00000000938.11                                   3.674683
##                     Samples
## Tags                 15864159-be88-41c8-bdef-c2c5927cb1a1.htseq
##   ENSG00000000005.5                                   -3.468722
##   ENSG00000000419.11                                   4.049228
##   ENSG00000000457.12                                   3.007011
##   ENSG00000000460.15                                   2.235675
##   ENSG00000000938.11                                   1.751829
##                     Samples
## Tags                 649b19e1-96e2-4b55-951d-3b6ee9f4b91f.htseq
##   ENSG00000000005.5                                  -1.1743179
##   ENSG00000000419.11                                  5.1369998
##   ENSG00000000457.12                                  2.9131449
##   ENSG00000000460.15                                  2.6294792
##   ENSG00000000938.11                                  0.6819466
##                     Samples
## Tags                 86679663-dfc5-46ad-8cf9-c7954c4b339b.htseq
##   ENSG00000000005.5                                   -2.786013
##   ENSG00000000419.11                                   5.199731
##   ENSG00000000457.12                                   4.248928
##   ENSG00000000460.15                                   3.844348
##   ENSG00000000938.11                                   3.108387
##                     Samples
## Tags                 28004569-048d-4f8c-99aa-7a8c69a98dcc.htseq
##   ENSG00000000005.5                                   -1.553263
##   ENSG00000000419.11                                   4.371787
##   ENSG00000000457.12                                   2.924414
##   ENSG00000000460.15                                   2.663967
##   ENSG00000000938.11                                   2.643134
##                     Samples
## Tags                 911f6378-8a25-4570-9d3b-80f5b5bfc085.htseq
##   ENSG00000000005.5                                  -5.2805208
##   ENSG00000000419.11                                  4.3152962
##   ENSG00000000457.12                                  2.9396157
##   ENSG00000000460.15                                  2.2570859
##   ENSG00000000938.11                                  0.5112933
##                     Samples
## Tags                 d5dca54e-d7e9-4328-b2ca-1d191a2b8b4c.htseq
##   ENSG00000000005.5                                   -2.408949
##   ENSG00000000419.11                                   5.250282
##   ENSG00000000457.12                                   4.301365
##   ENSG00000000460.15                                   3.473694
##   ENSG00000000938.11                                   2.114612
##                     Samples
## Tags                 f3895ae4-1228-49b3-9342-3c3b86cb5243.htseq
##   ENSG00000000005.5                                   -2.674378
##   ENSG00000000419.11                                   4.258048
##   ENSG00000000457.12                                   3.219863
##   ENSG00000000460.15                                   1.996700
##   ENSG00000000938.11                                   3.869207
##                     Samples
## Tags                 f590941d-19dc-427a-95b6-942c97ea8333.htseq
##   ENSG00000000005.5                                   -6.376198
##   ENSG00000000419.11                                   4.130606
##   ENSG00000000457.12                                   3.101561
##   ENSG00000000460.15                                   2.533696
##   ENSG00000000938.11                                   3.187952
##                     Samples
## Tags                 55aa6d16-3598-42ca-8844-0fe84739ef66.htseq
##   ENSG00000000005.5                                   -5.645779
##   ENSG00000000419.11                                   5.295513
##   ENSG00000000457.12                                   3.570967
##   ENSG00000000460.15                                   3.454883
##   ENSG00000000938.11                                   2.374738
##                     Samples
## Tags                 0e7094cf-4c79-43f4-8b72-9de259e5e18f.htseq
##   ENSG00000000005.5                                   -1.143810
##   ENSG00000000419.11                                   3.241950
##   ENSG00000000457.12                                   2.041301
##   ENSG00000000460.15                                   2.198582
##   ENSG00000000938.11                                   3.788053
##                     Samples
## Tags                 9a62fe1f-36ec-4e8e-b3d9-bdfc62f71905.htseq
##   ENSG00000000005.5                                   -2.844269
##   ENSG00000000419.11                                   4.802950
##   ENSG00000000457.12                                   3.394772
##   ENSG00000000460.15                                   2.854320
##   ENSG00000000938.11                                   2.128424
##                     Samples
## Tags                 d2587070-cb7d-440d-ae49-52f5077248e6.htseq
##   ENSG00000000005.5                                  0.06657207
##   ENSG00000000419.11                                 4.12233363
##   ENSG00000000457.12                                 2.89854696
##   ENSG00000000460.15                                 1.25377506
##   ENSG00000000938.11                                 4.15471639
##                     Samples
## Tags                 7800bdb2-aa8b-43e0-8e45-1b968872b34e.htseq
##   ENSG00000000005.5                                   -3.518050
##   ENSG00000000419.11                                   4.399620
##   ENSG00000000457.12                                   2.911566
##   ENSG00000000460.15                                   3.157201
##   ENSG00000000938.11                                   2.996072
##                     Samples
## Tags                 2bcd2efd-4fd6-40ee-86a4-867ae82711b0.htseq
##   ENSG00000000005.5                                   -2.288229
##   ENSG00000000419.11                                   4.029531
##   ENSG00000000457.12                                   2.762875
##   ENSG00000000460.15                                   1.912198
##   ENSG00000000938.11                                   2.328744
##                     Samples
## Tags                 424d8e5f-9fc6-470b-ad2c-b4447b0eb07e.htseq
##   ENSG00000000005.5                                   -3.874298
##   ENSG00000000419.11                                   4.834002
##   ENSG00000000457.12                                   4.163623
##   ENSG00000000460.15                                   3.365842
##   ENSG00000000938.11                                   4.758697
##                     Samples
## Tags                 934f9dc6-1260-4268-b022-870f1e37dd6f.htseq
##   ENSG00000000005.5                                   -1.706603
##   ENSG00000000419.11                                   4.063307
##   ENSG00000000457.12                                   3.788035
##   ENSG00000000460.15                                   3.419091
##   ENSG00000000938.11                                   2.599558
##                     Samples
## Tags                 0fa55c0e-6f8f-44a6-82fe-9a42495d3484.htseq
##   ENSG00000000005.5                                   -4.763380
##   ENSG00000000419.11                                   3.898399
##   ENSG00000000457.12                                   2.786598
##   ENSG00000000460.15                                   1.465439
##   ENSG00000000938.11                                   2.976738
##                     Samples
## Tags                 c8544a8a-4352-438d-94d4-3495af2e9a78.htseq
##   ENSG00000000005.5                                   2.2408036
##   ENSG00000000419.11                                  5.2673893
##   ENSG00000000457.12                                  3.0724378
##   ENSG00000000460.15                                  2.2930928
##   ENSG00000000938.11                                  0.1175401
##                     Samples
## Tags                 dade0b16-ecc3-43b3-b328-3819a8fc18c6.htseq
##   ENSG00000000005.5                                   -4.544553
##   ENSG00000000419.11                                   4.505505
##   ENSG00000000457.12                                   2.470112
##   ENSG00000000460.15                                   1.787053
##   ENSG00000000938.11                                   2.555099
##                     Samples
## Tags                 e875ae4e-4645-4e84-b0ff-9c9a694717a9.htseq
##   ENSG00000000005.5                                   -5.921106
##   ENSG00000000419.11                                   4.643996
##   ENSG00000000457.12                                   2.952338
##   ENSG00000000460.15                                   2.939981
##   ENSG00000000938.11                                   3.460437
##                     Samples
## Tags                 debd6982-7c27-42e8-b778-20afcc78a5f3.htseq
##   ENSG00000000005.5                                   -7.372309
##   ENSG00000000419.11                                   4.197072
##   ENSG00000000457.12                                   3.391733
##   ENSG00000000460.15                                   1.938303
##   ENSG00000000938.11                                   2.667980
##                     Samples
## Tags                 17c88994-9e8e-4f16-8c41-34e98a0d8c52.htseq
##   ENSG00000000005.5                                   -3.003489
##   ENSG00000000419.11                                   4.346009
##   ENSG00000000457.12                                   2.818355
##   ENSG00000000460.15                                   1.819463
##   ENSG00000000938.11                                   3.009197
##                     Samples
## Tags                 7f5a924a-ddf3-45ff-be1f-5b5909305f46.htseq
##   ENSG00000000005.5                                   -1.751115
##   ENSG00000000419.11                                   4.290426
##   ENSG00000000457.12                                   3.102534
##   ENSG00000000460.15                                   2.617107
##   ENSG00000000938.11                                   1.849445
##                     Samples
## Tags                 fa73bdce-67fb-42aa-883f-635f0e7bcdc6.htseq
##   ENSG00000000005.5                                   -2.408306
##   ENSG00000000419.11                                   4.410370
##   ENSG00000000457.12                                   2.768674
##   ENSG00000000460.15                                   2.449675
##   ENSG00000000938.11                                   2.846306
##                     Samples
## Tags                 abe20df7-6b97-4397-8864-881bac27e92c.htseq
##   ENSG00000000005.5                                   -3.366592
##   ENSG00000000419.11                                   3.442602
##   ENSG00000000457.12                                   3.241795
##   ENSG00000000460.15                                   3.216222
##   ENSG00000000938.11                                   3.246014
##                     Samples
## Tags                 62f84581-4c7d-4c8e-835c-9304bcec3106.htseq
##   ENSG00000000005.5                                   -5.454801
##   ENSG00000000419.11                                   4.120738
##   ENSG00000000457.12                                   2.446066
##   ENSG00000000460.15                                   2.284417
##   ENSG00000000938.11                                   1.600482
##                     Samples
## Tags                 3abbd2b5-04db-4fe0-8dd1-ea2b48caa4c1.htseq
##   ENSG00000000005.5                                   -5.844178
##   ENSG00000000419.11                                   5.073443
##   ENSG00000000457.12                                   3.085574
##   ENSG00000000460.15                                   2.520686
##   ENSG00000000938.11                                   4.420656
##                     Samples
## Tags                 087666cd-47ae-4f56-b947-d6aa1c25e8a7.htseq
##   ENSG00000000005.5                                 -1.37511256
##   ENSG00000000419.11                                 5.29828123
##   ENSG00000000457.12                                 3.64998907
##   ENSG00000000460.15                                 2.96936577
##   ENSG00000000938.11                                -0.08112134
##                     Samples
## Tags                 c14f98e2-8e9b-49f4-a244-3d06c6cb7126.htseq
##   ENSG00000000005.5                                   -3.735749
##   ENSG00000000419.11                                   3.506472
##   ENSG00000000457.12                                   2.364387
##   ENSG00000000460.15                                   2.790945
##   ENSG00000000938.11                                   3.725321
##                     Samples
## Tags                 13abc91e-fbfc-4c55-bf54-fbd134979ccc.htseq
##   ENSG00000000005.5                                  -0.3984018
##   ENSG00000000419.11                                  4.1132525
##   ENSG00000000457.12                                  2.6501190
##   ENSG00000000460.15                                  2.7228611
##   ENSG00000000938.11                                  1.8857116
##                     Samples
## Tags                 6ae2dd6c-2a39-411f-a1fc-11e0e6e82165.htseq
##   ENSG00000000005.5                                   -1.815983
##   ENSG00000000419.11                                   5.460732
##   ENSG00000000457.12                                   2.786995
##   ENSG00000000460.15                                   2.502506
##   ENSG00000000938.11                                   1.359022
##                     Samples
## Tags                 8f77f4f4-b184-40c7-8ab8-2f95b13620b5.htseq
##   ENSG00000000005.5                                   -4.099968
##   ENSG00000000419.11                                   4.368090
##   ENSG00000000457.12                                   2.898779
##   ENSG00000000460.15                                   2.255627
##   ENSG00000000938.11                                   2.286613
##                     Samples
## Tags                 168e5cb2-7390-45ad-ad04-c9aa4416e950.htseq
##   ENSG00000000005.5                                    1.352327
##   ENSG00000000419.11                                   5.379763
##   ENSG00000000457.12                                   3.195602
##   ENSG00000000460.15                                   2.998821
##   ENSG00000000938.11                                   2.403278
##                     Samples
## Tags                 0ed65bdf-cb92-47c1-8aeb-42518ce639b8.htseq
##   ENSG00000000005.5                                   -4.712731
##   ENSG00000000419.11                                   4.335951
##   ENSG00000000457.12                                   2.875449
##   ENSG00000000460.15                                   2.253054
##   ENSG00000000938.11                                   1.564723
##                     Samples
## Tags                 4e7c6811-88e4-4bb7-a88f-7491dfa6d072.htseq
##   ENSG00000000005.5                                   -4.648538
##   ENSG00000000419.11                                   3.750918
##   ENSG00000000457.12                                   2.845984
##   ENSG00000000460.15                                   2.293977
##   ENSG00000000938.11                                   4.609635
##                     Samples
## Tags                 7fb73a84-867a-4c28-aa02-93068efffb7b.htseq
##   ENSG00000000005.5                                  -0.8970334
##   ENSG00000000419.11                                  5.3337569
##   ENSG00000000457.12                                  2.9023728
##   ENSG00000000460.15                                  3.0631171
##   ENSG00000000938.11                                  1.3644589
##                     Samples
## Tags                 b53f9a9d-b24d-410d-b3e9-f2a8bf22ca27.htseq
##   ENSG00000000005.5                                   -5.751942
##   ENSG00000000419.11                                   4.041932
##   ENSG00000000457.12                                   3.201284
##   ENSG00000000460.15                                   2.385903
##   ENSG00000000938.11                                   2.436234
##                     Samples
## Tags                 f7ce175f-763e-4a55-97e3-0381d889b0eb.htseq
##   ENSG00000000005.5                                   -0.375647
##   ENSG00000000419.11                                   5.658004
##   ENSG00000000457.12                                   3.221699
##   ENSG00000000460.15                                   2.511878
##   ENSG00000000938.11                                   2.246960
##                     Samples
## Tags                 f346f2d2-285c-455c-ba34-ea8eec3fa881.htseq
##   ENSG00000000005.5                                   -2.367437
##   ENSG00000000419.11                                   5.658257
##   ENSG00000000457.12                                   3.448480
##   ENSG00000000460.15                                   2.872878
##   ENSG00000000938.11                                   1.333003
##                     Samples
## Tags                 e53e1a83-1979-4e12-bbb7-79b37d0cfe03.htseq
##   ENSG00000000005.5                                    2.116966
##   ENSG00000000419.11                                   5.446587
##   ENSG00000000457.12                                   3.320462
##   ENSG00000000460.15                                   3.163350
##   ENSG00000000938.11                                   1.840730
##                     Samples
## Tags                 a26d49db-2309-46a0-a3ed-275378d484e7.htseq
##   ENSG00000000005.5                                   -1.557761
##   ENSG00000000419.11                                   5.265786
##   ENSG00000000457.12                                   3.488282
##   ENSG00000000460.15                                   2.567628
##   ENSG00000000938.11                                  -0.981901
##                     Samples
## Tags                 a3f88a5d-7169-465b-bb80-e5999590681c.htseq
##   ENSG00000000005.5                                   -4.475837
##   ENSG00000000419.11                                   4.777617
##   ENSG00000000457.12                                   2.870923
##   ENSG00000000460.15                                   2.437718
##   ENSG00000000938.11                                   3.149466
##                     Samples
## Tags                 c264fe3b-482b-44ec-83a4-73df565663ff.htseq
##   ENSG00000000005.5                                 -0.08499537
##   ENSG00000000419.11                                 5.70387514
##   ENSG00000000457.12                                 2.62655223
##   ENSG00000000460.15                                 2.51712185
##   ENSG00000000938.11                                 1.45794392
##                     Samples
## Tags                 bd2dfab3-88a8-4673-ba36-3daf252d0b4d.htseq
##   ENSG00000000005.5                                   -1.499968
##   ENSG00000000419.11                                   5.045088
##   ENSG00000000457.12                                   2.703904
##   ENSG00000000460.15                                   3.381510
##   ENSG00000000938.11                                   1.365102
##                     Samples
## Tags                 7261b656-c79c-4581-a503-15b653e2b5d2.htseq
##   ENSG00000000005.5                                   -4.885539
##   ENSG00000000419.11                                   5.334355
##   ENSG00000000457.12                                   3.155572
##   ENSG00000000460.15                                   3.516193
##   ENSG00000000938.11                                   2.636454
##                     Samples
## Tags                 ee4dcccc-514b-4cc6-ae63-6ed3e7519a40.htseq
##   ENSG00000000005.5                                   -0.528445
##   ENSG00000000419.11                                   4.027512
##   ENSG00000000457.12                                   3.595314
##   ENSG00000000460.15                                   1.875639
##   ENSG00000000938.11                                   1.324139
##                     Samples
## Tags                 f596eabc-e39a-4e35-9fc6-edade04eb785.htseq
##   ENSG00000000005.5                                  -0.4005062
##   ENSG00000000419.11                                  4.8580127
##   ENSG00000000457.12                                  2.5776894
##   ENSG00000000460.15                                  2.5330664
##   ENSG00000000938.11                                  2.7694188
##                     Samples
## Tags                 bf9c448b-bdc9-4f74-b13a-374e6add7939.htseq
##   ENSG00000000005.5                                  -2.9336310
##   ENSG00000000419.11                                  4.6286114
##   ENSG00000000457.12                                  3.0127879
##   ENSG00000000460.15                                  3.1979402
##   ENSG00000000938.11                                  0.9732596
##                     Samples
## Tags                 564daa81-cfef-45b6-94a0-3249b2724d9b.htseq
##   ENSG00000000005.5                                   0.2422946
##   ENSG00000000419.11                                  3.6186704
##   ENSG00000000457.12                                  2.8334093
##   ENSG00000000460.15                                  1.8994068
##   ENSG00000000938.11                                  2.4725922
##                     Samples
## Tags                 82e00e45-734c-471f-ba97-79ec3b7e0baa.htseq
##   ENSG00000000005.5                                   -1.804916
##   ENSG00000000419.11                                   5.799060
##   ENSG00000000457.12                                   2.975948
##   ENSG00000000460.15                                   2.758334
##   ENSG00000000938.11                                  -0.993678
##                     Samples
## Tags                 9c52ed00-325f-4664-8873-327bcaa5ea74.htseq
##   ENSG00000000005.5                                   -0.425799
##   ENSG00000000419.11                                   4.082319
##   ENSG00000000457.12                                   3.787628
##   ENSG00000000460.15                                   2.414818
##   ENSG00000000938.11                                   1.099043
##                     Samples
## Tags                 fabefb10-5546-4017-8ea1-29982a10fb3c.htseq
##   ENSG00000000005.5                                   -2.416932
##   ENSG00000000419.11                                   5.646898
##   ENSG00000000457.12                                   3.037201
##   ENSG00000000460.15                                   3.207244
##   ENSG00000000938.11                                   2.528229
##                     Samples
## Tags                 32a115cf-570f-4ad9-a123-8e1970062f51.htseq
##   ENSG00000000005.5                                   -1.647989
##   ENSG00000000419.11                                   4.596644
##   ENSG00000000457.12                                   3.421828
##   ENSG00000000460.15                                   2.887235
##   ENSG00000000938.11                                   1.899625
##                     Samples
## Tags                 05eef9f8-a246-403a-b0be-07d274b6f93a.htseq
##   ENSG00000000005.5                                   1.5670755
##   ENSG00000000419.11                                  5.8796382
##   ENSG00000000457.12                                  3.2609724
##   ENSG00000000460.15                                  3.2183805
##   ENSG00000000938.11                                  0.7723944
##                     Samples
## Tags                 5c18c6a8-9ad2-43a8-a3a0-83d8fc0cc257.htseq
##   ENSG00000000005.5                                    2.000466
##   ENSG00000000419.11                                   4.865221
##   ENSG00000000457.12                                   3.671236
##   ENSG00000000460.15                                   2.777068
##   ENSG00000000938.11                                   1.605383
##                     Samples
## Tags                 43b292be-5d63-4523-a43f-666d20039208.htseq
##   ENSG00000000005.5                                    1.851908
##   ENSG00000000419.11                                   4.822736
##   ENSG00000000457.12                                   3.388138
##   ENSG00000000460.15                                   2.298683
##   ENSG00000000938.11                                   1.135335
## 19100 more rows ...
## 
## $weights
##           [,1]      [,2]      [,3]     [,4]      [,5]      [,6]      [,7]
## [1,] 0.4272084 0.3605236 0.3757352 0.365797 0.3693936 0.3605236 0.4570683
## [2,] 1.9814015 1.6365197 1.7819632 1.719739 1.7436574 1.6501657 2.0277074
## [3,] 1.6643834 1.1684340 1.3145104 1.246240 1.2706888 1.1800283 1.8183637
## [4,] 1.3834658 0.9685114 1.0782455 1.026371 1.0448994 0.9770827 1.5681360
## [5,] 1.3694188 0.9598946 1.0679714 1.016793 1.0351053 0.9683706 1.5534252
##           [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
## [1,] 0.4672378 0.4540638 0.3605236 0.4174268 0.3751305 0.4282511 0.3605236
## [2,] 2.0368915 2.0243354 1.3060271 1.9565953 1.7784959 1.9834135 1.4385854
## [3,] 1.8583854 1.8042999 0.9377362 1.6032037 1.3103639 1.6709795 1.0217228
## [4,] 1.6255947 1.5498688 0.8002510 1.3223929 1.0749748 1.3900854 0.8613773
## [5,] 1.6119818 1.5353716 0.7943371 1.3091158 1.0648054 1.3759544 0.8544378
##          [,15]     [,16]     [,17]     [,18]     [,19]     [,20]     [,21]
## [1,] 0.3974921 0.4756471 0.3605236 0.3605236 0.4352165 0.3667904 0.4186382
## [2,] 1.8910097 2.0431154 1.5704385 1.6320811 1.9967627 1.7263329 1.9598408
## [3,] 1.4673786 1.8891483 1.1145441 1.1646677 1.7094138 1.2529634 1.6107185
## [4,] 1.2010650 1.6720149 0.9289599 0.9657247 1.4336004 1.0314693 1.3298629
## [5,] 1.1890175 1.6579150 0.9210325 0.9571388 1.4193715 1.0218322 1.3164920
##          [,22]     [,23]     [,24]     [,25]     [,26]     [,27]     [,28]
## [1,] 0.4481102 0.5034813 0.3605236 0.4378621 0.4486748 0.3693138 0.3821114
## [2,] 2.0166399 2.0545965 1.6045624 2.0017922 2.0173736 1.7431256 1.8186144
## [3,] 1.7764873 1.9617136 1.1414146 1.7237786 1.7791217 1.2701433 1.3587262
## [4,] 1.5140091 1.7983482 0.9485683 1.4502539 1.5173906 1.0444864 1.1130911
## [5,] 1.4999295 1.7871833 0.9404314 1.4358184 1.5032717 1.0346972 1.1022734
##          [,29]     [,30]     [,31]     [,32]     [,33]     [,34]     [,35]
## [1,] 0.3636339 0.4156818 0.5566926 0.5692988 0.4457222 0.6160439 0.5910201
## [2,] 1.7054115 1.9519111 2.0455284 2.0508520 1.8436462 2.0553194 2.0556041
## [3,] 1.2316779 1.5924075 1.7025964 1.7535783 1.1491360 1.8988201 1.8304581
## [4,] 1.0153172 1.3116768 1.5372851 1.5961086 1.0233400 1.7812708 1.6886882
## [5,] 1.0059897 1.2983333 1.1829062 1.2325189 0.8208803 1.4281382 1.3216210
##          [,36]     [,37]     [,38]     [,39]     [,40]     [,41]     [,42]
## [1,] 0.5422713 0.4351421 0.5651271 0.4583116 0.4546091 0.5482909 0.5684231
## [2,] 2.0374975 1.7951817 2.0493867 1.8918037 1.8778143 2.0416245 2.0506310
## [3,] 1.6396777 1.0987680 1.7365313 1.2115461 1.1929927 1.6677198 1.7499956
## [4,] 1.4686227 0.9815781 1.5772147 1.0754848 1.0599818 1.4980710 1.5923585
## [5,] 1.1280796 0.7932221 1.2158480 0.8549097 0.8448113 1.1506285 1.2290082
##          [,43]    [,44]     [,45]    [,46]     [,47]     [,48]     [,49]
## [1,] 0.4612994 0.507252 0.5436882 0.576914 0.5909329 0.4342706 0.5367139
## [2,] 1.9008018 2.003378 2.0385513 2.052757 2.0556004 1.7909115 2.0333376
## [3,] 1.2266409 1.466563 1.6462312 1.780728 1.8301464 1.0946825 1.6140829
## [4,] 1.0883208 1.298794 1.4756170 1.628880 1.6883568 0.9782159 1.4413704
## [5,] 0.8634054 1.004397 1.1333250 1.263292 1.3212579 0.7909791 1.1076346
##          [,50]     [,51]     [,52]     [,53]     [,54]     [,55]     [,56]
## [1,] 0.5938889 0.3807075 0.4489648 0.3806337 0.5663092 0.4445163 0.5513897
## [2,] 2.0557268 1.4513570 1.8561201 1.4508110 2.0499284 1.8390051 2.0430719
## [3,] 1.8385894 0.8728752 1.1650257 0.8726089 1.7413566 1.1432791 1.6813825
## [4,] 1.6996040 0.7953596 1.0366278 0.7951382 1.5829164 1.0184223 1.5124736
## [5,] 1.3335985 0.6665224 0.8295810 0.6663713 1.2205580 0.8176537 1.1624749
##          [,57]     [,58]     [,59]     [,60]
## [1,] 0.5991062 0.3690418 0.4772951 0.4619017
## [2,] 2.0559479 1.3656913 1.9465166 1.9026135
## [3,] 1.8531617 0.8321907 1.3093027 1.2296981
## [4,] 1.7195119 0.7611163 1.1586773 1.0909190
## [5,] 1.3555422 0.6433334 0.9098613 0.8651225
## 19100 more rows ...
## 
## $design
##   ADENOCARCINOMA CMS
## 1              0   1
## 2              0   1
## 3              0   1
## 4              0   1
## 5              0   1
## 55 more rows ...

Apply voom precision weights to data

Each black dot is a gene. The blue line is the average log2 residual standard deviation estimated by the empirical Bayes algorithm.

vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
plotSA(efit, main="Final model: Mean-variance trend") #plots log2 residual standard deviations against mean log-CPM values
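
To make the empirical Bayes step more concrete, here is a minimal base-R sketch of how a gene-wise variance is pulled toward a common prior value. The variances, the prior variance `s02`, and the prior degrees of freedom `df0` are made-up numbers for illustration only. They are not values estimated from my data, and this is a simplification of what eBayes does internally.

```r
# Toy illustration of empirical Bayes variance moderation (not the project's data).
# All numbers below are hypothetical.
s2  <- c(geneA = 0.10, geneB = 0.40, geneC = 2.00)  # gene-wise residual variances
df  <- 58     # residual degrees of freedom: 60 samples minus 2 group means
s02 <- 0.40   # hypothetical prior variance shared by all genes
df0 <- 4      # hypothetical prior degrees of freedom

# Posterior variance: weighted average of the prior and the gene-wise variance
s2_post <- (df0 * s02 + df * s2) / (df0 + df)

print(round(cbind(s2, s2_post), 4))
```

In this sketch the low-variance gene is moved up toward the prior and the high-variance gene is moved down, while a gene that already sits at the prior value is unchanged.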

Examine the number of DE genes

A quick look at how many genes are down-regulated, up-regulated, or not statistically significant. The default adjusted p-value cutoff is 5%.

summary(decideTests(efit))
##        ADENOCARCINOMAvsCMS
## Down                  1474
## NotSig               15810
## Up                    1821
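
The Down/NotSig/Up counts above come from adjusting the p-values for multiple testing and then applying the 5% cutoff. A minimal base-R sketch of that idea is below, assuming a Benjamini-Hochberg adjustment (the default in decideTests). The p-values and log-fold changes are toy numbers, not my results.

```r
# Toy example of classifying genes with adjusted p-values (hypothetical numbers).
p_raw <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.6)   # raw p-values for 6 genes
logFC <- c(1.8, -0.7, 0.4, -1.2, 0.3, -0.1)        # matching log-fold changes

# Benjamini-Hochberg adjustment controls the false discovery rate
p_adj <- p.adjust(p_raw, method = "BH")

# 1 = up, -1 = down, 0 = not significant at the 5% cutoff
status <- ifelse(p_adj < 0.05, sign(logFC), 0)

print(table(factor(status, levels = c(-1, 0, 1),
                   labels = c("Down", "NotSig", "Up"))))
```

The direction of a significant gene comes from the sign of its log-fold change, which is why the summary separates Down from Up.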

Set a minimum log-fold change (log-FC) of 1

This is a stricter definition of significance, and it may be too strict for this data: with a minimum log-FC of 1, no genes are called down-regulated or up-regulated.

tfit <- treat(vfit, lfc=1) #p-values calculated from empirical Bayes moderated t-statistics with a minimum log-FC requirement.
dt <- decideTests(tfit)
#dt <- decideTests(efit) #for testing purposes
summary(dt)
##        ADENOCARCINOMAvsCMS
## Down                     0
## NotSig               19105
## Up                       0
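
One reason the counts drop to zero is that treat does not simply filter genes with |log-FC| > 1. It tests whether the log-FC is significantly greater than the threshold, so a gene needs an estimate well above 1 to pass. The sketch below is a simplified base-R version of that test with toy numbers. The function names `treat_p` and `ordinary_p` are my own for illustration and are not limma functions. As I understand the limma documentation for treat, smaller thresholds such as log2(1.2) or log2(1.5) are typical, so one is included for comparison.

```r
# Simplified sketch of a TREAT-style p-value (hypothetical helper functions and numbers).
treat_p <- function(logFC, se, df, lfc = 1) {
  t_right <- (abs(logFC) - lfc) / se   # distance above the threshold
  t_left  <- (abs(logFC) + lfc) / se   # distance from the opposite threshold
  pt(t_right, df, lower.tail = FALSE) + pt(t_left, df, lower.tail = FALSE)
}

# Ordinary two-sided test of log-FC different from 0
ordinary_p <- function(logFC, se, df) {
  2 * pt(abs(logFC) / se, df, lower.tail = FALSE)
}

logFC <- 1.2; se <- 0.25; df <- 58     # one toy gene with a 2.3-fold change

p0 <- ordinary_p(logFC, se, df)                 # H0: log-FC = 0
p1 <- treat_p(logFC, se, df, lfc = 1)           # H0: |log-FC| <= 1
p2 <- treat_p(logFC, se, df, lfc = log2(1.5))   # H0: |log-FC| <= log2(1.5)

print(signif(c(ordinary = p0, treat_lfc1 = p1, treat_lfc0.58 = p2), 3))
```

In this toy case the gene is clearly significant against zero and against the smaller threshold, but not against a threshold of 1, even though its estimated log-FC is above 1.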

Extract genes that are DE in multiple comparisons

I don’t have any DE genes if tfit is used. If efit is used, I have 3295 DE genes.

de.common <- which(dt[,1]!=0)
length(de.common) 
## [1] 0

The first 20 DE genes

If efit is used the genes are: “DPM1”, “CFH”, “LAS1L”, “CFTR”, “TMEM176A”, “DBNDD1”, “TFPI”, “SLC7A2”, “ARF5”, “POLDIP2”, “ARHGAP33”, “UPF1”, “MCUB”, “POLR2J”, “THSD7A”, “LIG3”, “SPPL2B”, “IBTK”, “PDK2”, “REX1BD”

head(tfit$genes$SYMBOL[de.common], n=20)
## character(0)

Make Venn Diagram

My diagram only has 1 circle because I only have 1 pairwise comparison.

vennDiagram(dt[,1], circle.col=c("turquoise", "salmon"))

Extract and write results for comparisons of ADENOCARCINOMAvsCMS to a single output file

write.fit(tfit, dt, file="results.txt")

Examining individual DE genes from top to bottom

ADENOCARCINOMA.vs.CMS <- topTreat(tfit, coef=1, n=Inf)
head(ADENOCARCINOMA.vs.CMS)

Summarize results for genes using mean-difference plots that highlight DE genes

If efit is used, the plot will have red, black, and blue genes. Since tfit is used, all genes are black.

plotMD(tfit, column=1, status=dt[,1], main=colnames(tfit)[1], 
       xlim=c(-8,13))

Make interactive mean-difference plot

To open the HTML page in a browser, set launch=TRUE.

library(Glimma)
glMDPlot(tfit, coef=1, status=dt, main=colnames(tfit)[1],
         side.main="ENSEMBL", counts=lcpm, groups=group, launch=TRUE)

Make heatmap

Install heatmap.plus because heatmap.2 did not work for my data.

library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:IRanges':
## 
##     space
## The following object is masked from 'package:S4Vectors':
## 
##     space
## The following object is masked from 'package:stats':
## 
##     lowess
library(heatmap.plus)
ADENOCARCINOMA.vs.CMS.topgenes <- ADENOCARCINOMA.vs.CMS$ENSEMBL[1:100]
i <- which(v$genes$ENSEMBL %in% ADENOCARCINOMA.vs.CMS.topgenes)
mycol <- colorpanel(1000,"blue","white","red")
#par("mar") OUTPUT SHOULD BE [1] 5.1 4.1 4.1 2.1
par(cex.main=0.8,mar=c(1,1,1,1)) #mar=c(1,1,1,1) shrinks the margins so the figure fits in the plotting device
heatmap.plus(lcpm[i,], col=bluered(20),cexRow=1,cexCol=0.2, margins = c(10,10), main = "HeatMap") #changed the margins to have a more legible heatmap

Gene set testing with Camera

I spoke to Professor Craig on 12/4 and we agreed that this step would not work for me because I did not have differentially expressed genes.

Ashley E Noriega,

Dec 6, 2019

TRGN 510 Final Project: Milestone 4

HTML containing Knitted R Notebook

Known Issues: Interactive plots

I spoke to Professor Craig on 12/6, and the interactive plots are not embedded in RPubs.