By Erica Acton
Load photoRec data.
prDat <- read.table("GSE4051_MINI.txt", header = TRUE, row.names = 1)
str(prDat)
## 'data.frame': 39 obs. of 6 variables:
## $ sample : int 20 21 22 23 16 17 6 24 25 26 ...
## $ devStage : Factor w/ 5 levels "4_weeks","E16",..: 2 2 2 2 2 2 2 4 4 4 ...
## $ gType : Factor w/ 2 levels "NrlKO","wt": 2 2 2 2 1 1 1 2 2 2 ...
## $ crabHammer: num 10.22 10.02 9.64 9.65 8.58 ...
## $ eggBomb : num 7.46 6.89 6.72 6.53 6.47 ...
## $ poisonFang: num 7.37 7.18 7.35 7.04 7.49 ...
How many rows?
nrow(prDat)
## [1] 39
How many columns?
ncol(prDat)
## [1] 6
Inspect the first few observations or the last few or for a random sample.
head(prDat)
## sample devStage gType crabHammer eggBomb poisonFang
## Sample_20 20 E16 wt 10.220 7.462 7.370
## Sample_21 21 E16 wt 10.020 6.890 7.177
## Sample_22 22 E16 wt 9.642 6.720 7.350
## Sample_23 23 E16 wt 9.652 6.529 7.040
## Sample_16 16 E16 NrlKO 8.583 6.470 7.494
## Sample_17 17 E16 NrlKO 10.140 7.065 7.005
tail(prDat)
## sample devStage gType crabHammer eggBomb poisonFang
## Sample_38 38 4_weeks wt 9.767 6.608 7.329
## Sample_39 39 4_weeks wt 10.200 7.003 7.320
## Sample_11 11 4_weeks NrlKO 9.677 7.204 6.981
## Sample_12 12 4_weeks NrlKO 9.129 7.165 7.350
## Sample_2 2 4_weeks NrlKO 9.744 7.107 7.075
## Sample_9 9 4_weeks NrlKO 9.822 6.558 7.043
prDat[9, ]
## sample devStage gType crabHammer eggBomb poisonFang
## Sample_25 25 P2 wt 9.038 6.17 7.449
What does row correspond to - different genes or different mice?
rownames(prDat)
## [1] "Sample_20" "Sample_21" "Sample_22" "Sample_23" "Sample_16"
## [6] "Sample_17" "Sample_6" "Sample_24" "Sample_25" "Sample_26"
## [11] "Sample_27" "Sample_14" "Sample_3" "Sample_5" "Sample_8"
## [16] "Sample_28" "Sample_29" "Sample_30" "Sample_31" "Sample_1"
## [21] "Sample_10" "Sample_4" "Sample_7" "Sample_32" "Sample_33"
## [26] "Sample_34" "Sample_35" "Sample_13" "Sample_15" "Sample_18"
## [31] "Sample_19" "Sample_36" "Sample_37" "Sample_38" "Sample_39"
## [36] "Sample_11" "Sample_12" "Sample_2" "Sample_9"
# each row corresponds to different mice
What are the variable names?
names(prDat)
## [1] "sample" "devStage" "gType" "crabHammer" "eggBomb"
## [6] "poisonFang"
What flavor is each variable?
str(prDat)
## 'data.frame': 39 obs. of 6 variables:
## $ sample : int 20 21 22 23 16 17 6 24 25 26 ...
## $ devStage : Factor w/ 5 levels "4_weeks","E16",..: 2 2 2 2 2 2 2 4 4 4 ...
## $ gType : Factor w/ 2 levels "NrlKO","wt": 2 2 2 2 1 1 1 2 2 2 ...
## $ crabHammer: num 10.22 10.02 9.64 9.65 8.58 ...
## $ eggBomb : num 7.46 6.89 6.72 6.53 6.47 ...
## $ poisonFang: num 7.37 7.18 7.35 7.04 7.49 ...
# sample = integer, devStage & gType = factors, genes = numbers)
Do a sanity check that each integer between 1 and the number of rows in the dataset occurs exactly once.
seq(prDat)
## [1] 1 2 3 4 5 6
sort(prDat$sample)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
seq_len(nrow(prDat))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
all(sort(prDat$sample) == seq_len(nrow(prDat)))
## [1] TRUE
identical(sort(prDat$sample), seq_len(nrow(prDat)))
## [1] TRUE
For each factor variable, what are the levels?
# for devStage
levels(prDat$devStage)
## [1] "4_weeks" "E16" "P10" "P2" "P6"
# for gType
levels(prDat$gType)
## [1] "NrlKO" "wt"
How many observations do we have for each level of devStage? gType?
# for devStage
summary(prDat$devStage)
## 4_weeks E16 P10 P2 P6
## 8 7 8 8 8
# for gType
summary(prDat$gType)
## NrlKO wt
## 19 20
Perform cross-tabulation of devStage and gType.
table(prDat$devStage, prDat$gType)
##
## NrlKO wt
## 4_weeks 4 4
## E16 3 4
## P10 4 4
## P2 4 4
## P6 4 4
addmargins(with(prDat, table(devStage, gType)))
## gType
## devStage NrlKO wt Sum
## 4_weeks 4 4 8
## E16 3 4 7
## P10 4 4 8
## P2 4 4 8
## P6 4 4 8
## Sum 19 20 39
If you had to guess, what do you think the intended experimental design was? What happened in real life?
I would assumed the experimental design was to have 4 mice for each genotype and developmental stage, but that one of the NrlKO mice at embryonic stage died.
For each quantitative variable, what are the extremes? How about the average or median?
# samples
range(prDat$sample)
## [1] 1 39
# poisonfang
summary(prDat$poisonFang)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.74 7.19 7.35 7.38 7.48 8.58
# eggbomb
summary(prDat$eggBomb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.14 6.28 6.76 6.79 7.09 8.17
# crabhammer
summary(prDat$crabHammer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.21 8.94 9.61 9.43 9.83 10.30
Print the observations with row names “Sample_16” and “Sample_38” to screen, showing only the 3 gene expression variables.
prDat[c("Sample_16", "Sample_38"), c("crabHammer", "eggBomb", "poisonFang")]
## crabHammer eggBomb poisonFang
## Sample_16 8.583 6.470 7.494
## Sample_38 9.767 6.608 7.329
Which samples have eggBomb less than the 0.10 quartile?
rownames(prDat[prDat$eggBomb < quantile(prDat$eggBomb, 0.1), ])
## [1] "Sample_25" "Sample_14" "Sample_3" "Sample_35"