1. Read counts for each gene by sample. This is the data Maia produces before using edgeR to statistically determine differential expression; these counts are the input to the differential expression calculation.
2. The differential expression results, which are what is usually handed off to the researcher.
counts <- read.csv("CF_countsSum.csv")
head(counts)
## X HAP77_CTL HAP79_CTL HAP83_CTL CF573_CTL CF580_CTL
## 1 ENSG00000000419 31 29 5 74 178
## 2 ENSG00000000457 2 2 2 10 18
## 3 ENSG00000000460 7 0 4 10 24
## 4 ENSG00000000971 107 159 44 596 635
## 5 ENSG00000001036 38 40 17 76 135
## 6 ENSG00000001084 109 81 19 153 374
## CF582_CTL CF586_CTL HAP77_RV1B HAP79_RV1B HAP83_RV1B CF573_RV1B
## 1 61 328 42 135 59 73
## 2 18 45 8 14 9 8
## 3 8 74 7 19 36 11
## 4 179 732 117 437 424 172
## 5 40 339 46 151 116 56
## 6 127 777 125 492 159 61
## CF580_RV1B CF582_RV1B CF586_RV1B
## 1 80 130 121
## 2 18 52 13
## 3 14 43 12
## 4 323 1391 309
## 5 67 179 116
## 6 66 276 136
str(counts)
## 'data.frame': 14273 obs. of 15 variables:
## $ X : Factor w/ 14273 levels "ENSG00000000419",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ HAP77_CTL : int 31 2 7 107 38 109 50 17 86 38 ...
## $ HAP79_CTL : int 29 2 0 159 40 81 29 20 74 22 ...
## $ HAP83_CTL : int 5 2 4 44 17 19 6 1 7 7 ...
## $ CF573_CTL : int 74 10 10 596 76 153 53 38 223 81 ...
## $ CF580_CTL : int 178 18 24 635 135 374 175 101 453 184 ...
## $ CF582_CTL : int 61 18 8 179 40 127 60 21 83 41 ...
## $ CF586_CTL : int 328 45 74 732 339 777 346 160 1626 212 ...
## $ HAP77_RV1B: int 42 8 7 117 46 125 42 27 78 35 ...
## $ HAP79_RV1B: int 135 14 19 437 151 492 66 39 157 65 ...
## $ HAP83_RV1B: int 59 9 36 424 116 159 53 12 50 71 ...
## $ CF573_RV1B: int 73 8 11 172 56 61 29 22 87 66 ...
## $ CF580_RV1B: int 80 18 14 323 67 66 38 33 104 44 ...
## $ CF582_RV1B: int 130 52 43 1391 179 276 79 89 171 110 ...
## $ CF586_RV1B: int 121 13 12 309 116 136 67 49 161 74 ...
names(counts)
## [1] "X" "HAP77_CTL" "HAP79_CTL" "HAP83_CTL" "CF573_CTL"
## [6] "CF580_CTL" "CF582_CTL" "CF586_CTL" "HAP77_RV1B" "HAP79_RV1B"
## [11] "HAP83_RV1B" "CF573_RV1B" "CF580_RV1B" "CF582_RV1B" "CF586_RV1B"
rownames(counts[1:10, ])
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
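The `rownames()` output shows that the gene IDs live in column `X`, while the row names are just integers. If you want the Ensembl IDs as row names and a purely numeric count matrix, one option is `read.csv(..., row.names = 1)`, which uses the first column as row names. To stay self-contained, the sketch below reads a small inline excerpt (three genes and two samples copied from `head(counts)`) through the `text` argument instead of `CF_countsSum.csv`; the names `csv_text`, `counts_df` and `count_mat` are illustrative only.

```r
# Inline excerpt of CF_countsSum.csv (values copied from head(counts) above)
csv_text <- "X,HAP77_CTL,HAP79_CTL
ENSG00000000419,31,29
ENSG00000000457,2,2
ENSG00000000460,7,0"

# row.names = 1 turns the first column (gene IDs) into row names,
# so only the integer counts remain as columns
counts_df <- read.csv(text = csv_text, row.names = 1)

# Convert to a plain integer matrix, the usual input for count-based tools
count_mat <- as.matrix(counts_df)
count_mat
```

With the real file the equivalent call would be `read.csv("CF_countsSum.csv", row.names = 1)`.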
There are 14 samples from 7 individuals: 7 controls (_CTL) and 7 treated with RV1B (_RV1B), i.e. each individual is sampled once before and once after treatment.
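Because the sample names encode both the individual and the condition, a sample table can be derived from them directly. This is a minimal sketch, assuming every name follows the `<individual>_<condition>` pattern seen in `names(counts)` and that the group is the letter prefix of the individual ID; `sample_info` is a hypothetical name.

```r
# Sample names as printed by names(counts), minus the gene-ID column "X"
samples <- c("HAP77_CTL", "HAP79_CTL", "HAP83_CTL", "CF573_CTL", "CF580_CTL",
             "CF582_CTL", "CF586_CTL", "HAP77_RV1B", "HAP79_RV1B", "HAP83_RV1B",
             "CF573_RV1B", "CF580_RV1B", "CF582_RV1B", "CF586_RV1B")

# Split e.g. "HAP77_CTL" into individual ("HAP77") and condition ("CTL")
parts <- do.call(rbind, strsplit(samples, "_", fixed = TRUE))

sample_info <- data.frame(
  sample     = samples,
  individual = parts[, 1],
  # Assumed: the group is the letter prefix of the individual ID ("HAP" or "CF")
  group      = sub("[0-9]+$", "", parts[, 1]),
  condition  = factor(parts[, 2], levels = c("CTL", "RV1B")),
  stringsAsFactors = FALSE
)

# Cross-tabulate samples per group and condition
table(sample_info$group, sample_info$condition)
```

The table confirms the layout described above: 4 CF and 3 HAP individuals, each with one CTL and one RV1B sample.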
tDat <- read.csv("CF_edgeR_DE_CFresponse2RV1B.csv")
head(tDat)
## X logFC logCPM LR PValue FDR
## 1 ENSG00000134326 7.140 7.632 166.8 3.789e-38 4.271e-34
## 2 ENSG00000134321 7.859 10.069 165.8 5.985e-38 4.271e-34
## 3 ENSG00000183486 6.222 8.084 157.2 4.524e-36 2.153e-32
## 4 ENSG00000135114 6.446 8.437 154.4 1.894e-35 6.760e-32
## 5 ENSG00000157601 5.335 9.410 152.4 5.196e-35 1.483e-31
## 6 ENSG00000119917 6.306 10.356 146.7 9.183e-34 2.184e-30
str(tDat)
## 'data.frame': 14273 obs. of 6 variables:
## $ X : Factor w/ 14273 levels "ENSG00000000419",..: 5251 5249 10843 5353 7642 3891 3892 5884 11122 5885 ...
## $ logFC : num 7.14 7.86 6.22 6.45 5.33 ...
## $ logCPM: num 7.63 10.07 8.08 8.44 9.41 ...
## $ LR : num 167 166 157 154 152 ...
## $ PValue: num 3.79e-38 5.98e-38 4.52e-36 1.89e-35 5.20e-35 ...
## $ FDR : num 4.27e-34 4.27e-34 2.15e-32 6.76e-32 1.48e-31 ...
Columns in the edgeR results:
logFC - log_2 fold change
logCPM - average log_2 counts per million reads. This value can be used to calculate RPKM or FPKM by subtracting log_2 of the gene length in kilobases
LR - likelihood ratio test statistic
PValue - p-value of the likelihood ratio test
FDR - false discovery rate: p-values adjusted with the Benjamini and Hochberg procedure
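Since RPKM = CPM / (gene length in kb), on the log_2 scale the conversion is a subtraction. The sketch below uses the `logCPM` values of the first two genes in `head(tDat)`; the gene lengths are hypothetical, since the files shown here contain none. Note that edgeR's `logCPM` is an average over all samples (computed with a small prior count), so the result is an approximate average log_2 RPKM rather than a per-sample value.

```r
# logCPM values for the first two genes in head(tDat)
logCPM <- c(7.632, 10.069)

# HYPOTHETICAL gene lengths in base pairs (not part of this dataset)
gene_length_bp <- c(2000, 500)

# RPKM = CPM / (length in kb), so on the log2 scale:
# log2(RPKM) = log2(CPM) - log2(length / 1000)
logRPKM <- logCPM - log2(gene_length_bp / 1000)
logRPKM
```

For per-sample values, edgeR also provides an `rpkm()` function that takes gene lengths directly.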
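This file appears to contain the differential expression results for the CF group (CF573, CF580, CF582, CF586), i.e. the CF response to RV1B, as the file name suggests. It lists all 14,273 genes (the same number as in the counts file), apparently sorted by p-value, not only the significant ones.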
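Because the file lists every gene, the differentially expressed subset has to be selected with cut-offs. The thresholds below (FDR < 0.05 and |logFC| >= 1, i.e. at least a two-fold change) are common choices but are assumptions, not values stated in this analysis. The data frame is an excerpt of the six rows shown by `head(tDat)`, and reading positive `logFC` as "higher after RV1B" assumes the contrast was RV1B vs. CTL.

```r
# Excerpt of tDat: the six rows shown by head(tDat) above (LR, PValue omitted)
tDat <- data.frame(
  X      = c("ENSG00000134326", "ENSG00000134321", "ENSG00000183486",
             "ENSG00000135114", "ENSG00000157601", "ENSG00000119917"),
  logFC  = c(7.140, 7.859, 6.222, 6.446, 5.335, 6.306),
  logCPM = c(7.632, 10.069, 8.084, 8.437, 9.410, 10.356),
  FDR    = c(4.271e-34, 4.271e-34, 2.153e-32, 6.760e-32, 1.483e-31, 2.184e-30),
  stringsAsFactors = FALSE
)

# ASSUMED cut-offs: 5% FDR and at least a two-fold change (|log2 FC| >= 1)
fdr_cutoff   <- 0.05
logfc_cutoff <- 1

sig <- subset(tDat, FDR < fdr_cutoff & abs(logFC) >= logfc_cutoff)

# Split by direction (assuming logFC is RV1B relative to CTL)
up   <- sig[sig$logFC > 0, ]
down <- sig[sig$logFC < 0, ]
```

All six top genes pass and are up-regulated; on the full table the same filter would return the complete DE list.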
Two groups: HAP and CF. The HAP group has 3 individuals (or replicates) and the CF group has 4 individuals (or replicates). Both groups have gene expression measured before (_CTL) and after (_RV1B) treatment with RV1B.
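The source does not say which design Maia used. A common edgeR setup for before/after samples from the same individuals is a paired (blocked) design, which compares RV1B with CTL within each individual. A minimal sketch for the CF group using base R's `model.matrix()` follows; in edgeR such a `design` would typically be passed to `glmFit()` and the `treatmentRV1B` coefficient tested with `glmLRT()`, which is consistent with the `LR` column seen in `tDat`.

```r
# CF samples, following the naming used in the counts file
cf_samples <- c("CF573_CTL", "CF580_CTL", "CF582_CTL", "CF586_CTL",
                "CF573_RV1B", "CF580_RV1B", "CF582_RV1B", "CF586_RV1B")

# Individual = text before "_", treatment = text after "_"
individual <- factor(sub("_.*$", "", cf_samples))
treatment  <- factor(sub("^.*_", "", cf_samples), levels = c("CTL", "RV1B"))

# Paired design: one baseline term per individual plus a shared treatment
# effect; the last column estimates RV1B vs CTL within individuals
design <- model.matrix(~ individual + treatment)
colnames(design)
```

Blocking on individual removes the baseline differences between people, so only the within-person change contributes to the treatment estimate.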