First we’ll load the data. Here I’m loading from an .rds file, but this could also be from a .tsv or similar file.
full_gene_matrix <- readRDS("~/Documents/projects/rgunaratna/removing_duplicate_genes/expdf_subset_ramesh.rds")
full_gene_matrix
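If your data lives in a tab-separated file instead, a minimal sketch of the loading step could look like the following. The toy table, its column names (`sample_a`, `sample_b`) and the temporary file path are made up for illustration; only `gene_symbol` matches the column used in this notebook.

```r
# Hypothetical toy data standing in for a real expression table.
toy <- data.frame(
  gene_symbol = c("TP53", "TP53", "BRCA1"),
  sample_a = c(10L, 20L, 5L),
  sample_b = c(0L, 4L, 7L)
)

# Write it to a temporary tab-separated file so the example is self-contained.
tsv_path <- tempfile(fileext = ".tsv")
write.table(toy, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() defaults to header = TRUE and sep = "\t".
# stringsAsFactors = FALSE keeps gene_symbol as character on older R versions.
gene_matrix_from_tsv <- read.delim(tsv_path, stringsAsFactors = FALSE)
gene_matrix_from_tsv
```

For a real file you would pass its path to `read.delim()` directly and skip the `write.table()` step.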
Next, we’ll use the dplyr package to help us perform the grouping operation. You may need to install it first with install.packages("dplyr") or possibly install.packages("tidyverse").
library(dplyr)
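# funs() is deprecated as of dplyr 0.8.0; across() (dplyr >= 1.0.0) is the current idiom.
mean_gene_matrix <- full_gene_matrix %>% group_by(gene_symbol) %>% summarise(across(everything(), mean))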
mean_gene_matrix
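To see what the grouping does, here is a small sketch on a made-up table where one gene appears twice. The data and the sample column names are assumptions for illustration; the sketch uses `across()`, which needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy table: TP53 is duplicated, BRCA1 appears once.
toy <- data.frame(
  gene_symbol = c("TP53", "TP53", "BRCA1"),
  sample_a = c(10, 20, 5),
  sample_b = c(0, 4, 7)
)

# One row per gene_symbol; every other column is averaged within the group.
toy_mean <- toy %>%
  group_by(gene_symbol) %>%
  summarise(across(everything(), mean))
toy_mean
```

The two TP53 rows collapse into a single row holding the per-sample means (15 and 2), while the BRCA1 row passes through unchanged.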
We can apply other summary operations in the same way, including the median.
# Same pattern as the mean, with median as the summary function.
median_gene_matrix <- full_gene_matrix %>% group_by(gene_symbol) %>% summarise(across(everything(), median))
median_gene_matrix
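Since only the summary function changes between these steps, the pattern can be wrapped in a small helper. This is a sketch, not part of the original analysis: `collapse_duplicate_genes` is a hypothetical name, and the toy data is made up. It assumes dplyr 1.0.0 or later for `across()`.

```r
library(dplyr)

# Hypothetical helper: collapse rows that share a gene_symbol
# using whichever summary function is passed in.
collapse_duplicate_genes <- function(gene_matrix, summary_fn) {
  gene_matrix %>%
    group_by(gene_symbol) %>%
    summarise(across(everything(), summary_fn))
}

# Made-up table with TP53 present three times.
toy <- data.frame(
  gene_symbol = c("TP53", "TP53", "TP53", "BRCA1"),
  sample_a = c(10, 20, 90, 5),
  sample_b = c(0, 4, 2, 7)
)

toy_median <- collapse_duplicate_genes(toy, median)
toy_max <- collapse_duplicate_genes(toy, max)
toy_sum <- collapse_duplicate_genes(toy, sum)
toy_median
```

Which function is appropriate depends on the data: the median is less sensitive to a single outlying duplicate (the 90 above), while a sum may suit cases where duplicates represent counts that should be pooled.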
The results are no longer integers, which can cause problems for downstream analyses that expect integer counts. You can convert the values back to whole numbers by rounding. Note: You could also use the floor or ceiling functions in place of the round function, depending on your desired results.
mean_gene_matrix_rounded <- mean_gene_matrix %>% mutate_if(is.numeric, round)
mean_gene_matrix_rounded
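The three functions differ on fractional values, and `round()` returns whole numbers that are still stored as doubles. The sketch below shows both points on made-up values; wrapping the result in `as.integer()` is one option if a downstream tool checks the storage type, and is an addition here rather than part of the original steps.

```r
library(dplyr)

x <- c(0.5, 1.5, 2.5, 2.4, 2.6)
round(x)    # 0 2 2 2 3  (exact halves go to the even neighbour)
floor(x)    # 0 1 2 2 2
ceiling(x)  # 1 2 3 3 3

# round() yields whole numbers, but the storage type is still double.
is.integer(round(x))              # FALSE
is.integer(as.integer(round(x)))  # TRUE

# Hypothetical averaged table; convert every numeric column to integer type.
toy_mean <- data.frame(
  gene_symbol = c("BRCA1", "TP53"),
  sample_a = c(5, 15.5),
  stringsAsFactors = FALSE
)
toy_int <- toy_mean %>%
  mutate(across(where(is.numeric), ~ as.integer(round(.x))))
toy_int
```

Note that 15.5 becomes 16 here because R rounds exact halves to the nearest even number, so 14.5 would become 14 rather than 15.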