Omar AlOmeir

Seminar 01

prDat <- read.table("GSE4051_MINI.txt", header = TRUE, row.names = 1)

How many rows are there? Hint: nrow(), dim().

nrow(prDat)

## [1] 39

How many columns or variables are there? Hint: ncol(), length(), dim().

ncol(prDat)

## [1] 6


dim(prDat)  #returns both number of rows and number of columns

## [1] 39  6

Inspect the first few observations or the last few or a random sample. Hint: head(), tail(), x[i, j] combined with sample().

head(prDat)

##           sample devStage gType crabHammer eggBomb poisonFang
## Sample_20     20      E16    wt     10.220   7.462      7.370
## Sample_21     21      E16    wt     10.020   6.890      7.177
## Sample_22     22      E16    wt      9.642   6.720      7.350
## Sample_23     23      E16    wt      9.652   6.529      7.040
## Sample_16     16      E16 NrlKO      8.583   6.470      7.494
## Sample_17     17      E16 NrlKO     10.140   7.065      7.005

tail(prDat)

##           sample devStage gType crabHammer eggBomb poisonFang
## Sample_38     38  4_weeks    wt      9.767   6.608      7.329
## Sample_39     39  4_weeks    wt     10.200   7.003      7.320
## Sample_11     11  4_weeks NrlKO      9.677   7.204      6.981
## Sample_12     12  4_weeks NrlKO      9.129   7.165      7.350
## Sample_2       2  4_weeks NrlKO      9.744   7.107      7.075
## Sample_9       9  4_weeks NrlKO      9.822   6.558      7.043

prDat[sample(nrow(prDat), size = 6), ]  # sample() draws 6 row indices at random from 1:nrow(prDat)

##           sample devStage gType crabHammer eggBomb poisonFang
## Sample_17     17      E16 NrlKO     10.140   7.065      7.005
## Sample_12     12  4_weeks NrlKO      9.129   7.165      7.350
## Sample_36     36  4_weeks    wt      9.960   7.866      6.993
## Sample_28     28       P6    wt      8.214   6.530      7.428
## Sample_37     37  4_weeks    wt      9.667   6.992      7.324
## Sample_7       7       P6 NrlKO      8.803   6.188      7.754
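The rows returned by sample() differ on every run. A minimal sketch of making the draw reproducible with set.seed(), assuming a stand-in data frame (the name toy is hypothetical) built from the six rows printed by head(prDat); the seed value 123 is arbitrary.

```r
# Stand-in for prDat: the six rows printed by head(prDat) (toy is a hypothetical name)
toy <- data.frame(
  sample     = c(20L, 21L, 22L, 23L, 16L, 17L),
  gType      = factor(c("wt", "wt", "wt", "wt", "NrlKO", "NrlKO")),
  crabHammer = c(10.220, 10.020, 9.642, 9.652, 8.583, 10.140),
  row.names  = paste0("Sample_", c(20, 21, 22, 23, 16, 17))
)

set.seed(123)                        # arbitrary seed; fixes the random draw
idx1 <- sample(nrow(toy), size = 3)
set.seed(123)                        # resetting the same seed repeats the draw
idx2 <- sample(nrow(toy), size = 3)

identical(idx1, idx2)                # TRUE: both draws pick the same rows
toy[idx1, ]
```

Without the set.seed() calls, the two draws would generally differ.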

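What does a row correspond to, different genes or different mice? Different mice. Each row is one sample (one mouse); the three genes (crabHammer, eggBomb, poisonFang) are columns.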

What are the variable names? Hint: names(), dimnames().

names(prDat)  #sample, devStage, gType, crabHammer, eggBomb, poisonFang

## [1] "sample"     "devStage"   "gType"      "crabHammer" "eggBomb"   
## [6] "poisonFang"

What “flavor” is each variable, i.e. numeric, character, factor? Hint: str().

str(prDat)  # int, Factor, Factor, num, num, num

## 'data.frame':    39 obs. of  6 variables:
##  $ sample    : int  20 21 22 23 16 17 6 24 25 26 ...
##  $ devStage  : Factor w/ 5 levels "4_weeks","E16",..: 2 2 2 2 2 2 2 4 4 4 ...
##  $ gType     : Factor w/ 2 levels "NrlKO","wt": 2 2 2 2 1 1 1 2 2 2 ...
##  $ crabHammer: num  10.22 10.02 9.64 9.65 8.58 ...
##  $ eggBomb   : num  7.46 6.89 6.72 6.53 6.47 ...
##  $ poisonFang: num  7.37 7.18 7.35 7.04 7.49 ...

For sample, do a sanity check that each integer between 1 and the number of rows in the dataset occurs exactly once. Hint: a:b, seq(), seq_len(), sort(), table(), ==, all(), all.equal(), identical().

identical(seq(1, nrow(prDat)), sort(prDat[["sample"]]))

## [1] TRUE
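The same sanity check can be written with other functions from the hint. A minimal sketch, assuming a short hypothetical ID vector (ids) in place of prDat$sample, plus a deliberately broken one (bad) to show a failing case.

```r
# Hypothetical ID vector standing in for prDat$sample (a shuffled 1..n)
ids <- c(3L, 1L, 5L, 2L, 4L)
n   <- length(ids)

# Check 1: the sorted IDs equal 1..n element by element
ok_sorted <- all(sort(ids) == seq_len(n))

# Check 2: every value in 1..n occurs exactly once
ok_counts <- all(table(factor(ids, levels = seq_len(n))) == 1)

# A vector with a duplicate (and therefore a missing value) fails both checks
bad        <- c(3L, 1L, 5L, 2L, 2L)
bad_sorted <- all(sort(bad) == seq_len(n))
bad_counts <- all(table(factor(bad, levels = seq_len(n))) == 1)

c(ok_sorted, ok_counts, bad_sorted, bad_counts)
```

seq_len(n) is used rather than seq(1, n) because it returns an empty vector when n is 0.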

For each factor variable, what are the levels? Hint: levels(), str().

str(prDat)  # devStage has 5 levels: 4_weeks, E16, P10, P2, P6. gType has 2 levels: NrlKO, wt

## 'data.frame':    39 obs. of  6 variables:
##  $ sample    : int  20 21 22 23 16 17 6 24 25 26 ...
##  $ devStage  : Factor w/ 5 levels "4_weeks","E16",..: 2 2 2 2 2 2 2 4 4 4 ...
##  $ gType     : Factor w/ 2 levels "NrlKO","wt": 2 2 2 2 1 1 1 2 2 2 ...
##  $ crabHammer: num  10.22 10.02 9.64 9.65 8.58 ...
##  $ eggBomb   : num  7.46 6.89 6.72 6.53 6.47 ...
##  $ poisonFang: num  7.37 7.18 7.35 7.04 7.49 ...
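str() truncates the list of levels, so levels() is the more direct way to see all of them. A minimal sketch, assuming two small hypothetical factors (dev and geno) that reuse the level names from the data; the values assigned to individual elements are made up for illustration.

```r
# Hypothetical factors reusing the level names seen in prDat
dev  <- factor(c("E16", "E16", "P2", "P6", "P10", "4_weeks"))
geno <- factor(c("wt", "NrlKO", "wt", "NrlKO", "wt", "wt"))

levels(dev)    # every level of the factor, untruncated
nlevels(dev)   # number of levels
levels(geno)
nlevels(geno)
```

On the real data the equivalent calls would be levels(prDat$devStage) and levels(prDat$gType).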

How many observations do we have for each level of devStage? For gType? Hint: summary(), table().

summary(prDat)  # devStage: 8, 7, 8, 8, 8. gType: 19, 20

##      sample        devStage   gType      crabHammer       eggBomb    
##  Min.   : 1.0   4_weeks:8   NrlKO:19   Min.   : 8.21   Min.   :6.14  
##  1st Qu.:10.5   E16    :7   wt   :20   1st Qu.: 8.94   1st Qu.:6.28  
##  Median :20.0   P10    :8              Median : 9.61   Median :6.76  
##  Mean   :20.0   P2     :8              Mean   : 9.43   Mean   :6.79  
##  3rd Qu.:29.5   P6     :8              3rd Qu.: 9.83   3rd Qu.:7.09  
##  Max.   :39.0                          Max.   :10.34   Max.   :8.17  
##    poisonFang  
##  Min.   :6.74  
##  1st Qu.:7.19  
##  Median :7.35  
##  Mean   :7.38  
##  3rd Qu.:7.48  
##  Max.   :8.58

Perform a cross-tabulation of devStage and gType. Hint: table().

with(prDat, table(devStage, gType))

##          gType
## devStage  NrlKO wt
##   4_weeks     4  4
##   E16         3  4
##   P10         4  4
##   P2          4  4
##   P6          4  4
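Row and column totals make an unbalanced cell easier to spot. A minimal sketch using addmargins(), assuming a stand-in data frame (toy, a hypothetical name) built from the twelve rows printed by head(prDat) and tail(prDat).

```r
# Stand-in for prDat: devStage and gType of the rows shown by head() and tail()
toy <- data.frame(
  devStage = factor(rep(c("E16", "4_weeks"), each = 6)),
  gType    = factor(c("wt", "wt", "wt", "wt", "NrlKO", "NrlKO",
                      "wt", "wt", "NrlKO", "NrlKO", "NrlKO", "NrlKO"))
)

tab <- with(toy, table(devStage, gType))  # two-way cross-tabulation
tab
addmargins(tab)                           # same table with "Sum" row and column
```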

If you had to take a wild guess, what do you think the intended experimental design was? What actually happened in real life?

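The data come from photoreceptor cells in mice: expression of 3 genes measured at 5 developmental stages in 2 genotypes. The intended design was probably balanced, with 4 mice for each combination of developmental stage and genotype (5 x 2 x 4 = 40 mice). In practice there are only 39 samples: the E16 NrlKO group has 3 mice instead of 4, so one sample was presumably lost or excluded.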

For each quantitative variable, what are the extremes? How about average or median? Hint: min(), max(), range(), summary(), fivenum(), mean(), median(), quantile().

summary(prDat$sample)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    10.5    20.0    20.0    29.5    39.0

summary(prDat$crabHammer)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.21    8.94    9.61    9.43    9.83   10.30

summary(prDat$eggBomb)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.14    6.28    6.76    6.79    7.09    8.17

summary(prDat$poisonFang)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.74    7.19    7.35    7.38    7.48    8.58
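The per-column calls can be collapsed into one with sapply(). A minimal sketch, assuming a stand-in data frame (toy, a hypothetical name) that holds the three gene columns of the six rows printed by head(prDat).

```r
# Stand-in for the gene columns of prDat, taken from the head(prDat) output
toy <- data.frame(
  crabHammer = c(10.220, 10.020, 9.642, 9.652, 8.583, 10.140),
  eggBomb    = c(7.462, 6.890, 6.720, 6.529, 6.470, 7.065),
  poisonFang = c(7.370, 7.177, 7.350, 7.040, 7.494, 7.005)
)

extremes <- sapply(toy, range)    # 2 x 3 matrix: row 1 = min, row 2 = max
medians  <- sapply(toy, median)   # named vector, one median per gene
extremes
medians
```

On the full data this would need to be restricted to the numeric columns, since range() is not defined for the unordered factors devStage and gType.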

Create a new data.frame called weeDat only containing observations for which expression of poisonFang is above 7.5.

weeDat <- subset(prDat, subset = poisonFang > 7.5)

For how many observations is poisonFang > 7.5? How do they break down by genotype and developmental stage?

nrow(weeDat)  # 9. In terms of gType, 5 are wt and 4 are NrlKO. In terms of devStage, 4 are P2, 1 is P6, 4 are P10

## [1] 9
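The breakdown by genotype and developmental stage can be computed with table() rather than counted by eye. A minimal sketch, assuming a stand-in data frame (toy, a hypothetical name) built from the rows printed by head(prDat) and tail(prDat). None of those rows exceeds 7.5, so the sketch uses an illustrative cutoff of 7.3 instead.

```r
# Stand-in for prDat: rows shown by head() and tail()
toy <- data.frame(
  devStage   = factor(rep(c("E16", "4_weeks"), each = 6)),
  gType      = factor(c("wt", "wt", "wt", "wt", "NrlKO", "NrlKO",
                        "wt", "wt", "NrlKO", "NrlKO", "NrlKO", "NrlKO")),
  poisonFang = c(7.370, 7.177, 7.350, 7.040, 7.494, 7.005,
                 7.329, 7.320, 6.981, 7.350, 7.075, 7.043)
)

wee <- subset(toy, poisonFang > 7.3)   # 7.3 is an illustrative cutoff, not the 7.5 used above
nrow(wee)                              # number of observations kept
table(wee$gType)                       # breakdown by genotype
table(wee$devStage)                    # breakdown by developmental stage
with(wee, table(devStage, gType))      # both at once
```

Applied to the real subset, the same calls would be table(weeDat$gType) and table(weeDat$devStage).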

Print the observations with row names “Sample_16” and “Sample_38” to screen, showing only the 3 gene expression variables.

(subset(prDat, subset = sample == 38, select = c("crabHammer", "eggBomb", "poisonFang")))

##           crabHammer eggBomb poisonFang
## Sample_38      9.767   6.608      7.329

(subset(prDat, subset = sample == 16, select = c("crabHammer", "eggBomb", "poisonFang")))

##           crabHammer eggBomb poisonFang
## Sample_16      8.583    6.47      7.494
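Since the question asks for rows by row name, both rows can also be selected in one step by indexing with the row names. A minimal sketch, assuming a stand-in data frame (toy, a hypothetical name) containing three of the rows printed above.

```r
# Stand-in for prDat: three rows copied from the output above
toy <- data.frame(
  sample     = c(16L, 17L, 38L),
  crabHammer = c(8.583, 10.140, 9.767),
  eggBomb    = c(6.470, 7.065, 6.608),
  poisonFang = c(7.494, 7.005, 7.329),
  row.names  = c("Sample_16", "Sample_17", "Sample_38")
)

genes <- c("crabHammer", "eggBomb", "poisonFang")
res <- toy[c("Sample_16", "Sample_38"), genes]   # rows by name, columns by name
res
```

Character indexing returns the rows in the order requested, so Sample_16 is printed before Sample_38.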

Which samples have expression of eggBomb less than the 0.10 quantile?

subset(prDat, subset = eggBomb < quantile(prDat$eggBomb, probs = seq(0, 1, 0.1))[2], 
    select = "sample")  # samples: 25, 14, 3, 35

##           sample
## Sample_25     25
## Sample_14     14
## Sample_3       3
## Sample_35     35
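Passing probs = 0.1 returns the 0.10 quantile directly, which is equivalent to taking the second element of the full decile vector. A minimal sketch, assuming a small named vector (egg, a hypothetical name) holding the eggBomb values of the six rows printed by head(prDat).

```r
# eggBomb values from the head(prDat) output, named by sample
egg <- c(Sample_20 = 7.462, Sample_21 = 6.890, Sample_22 = 6.720,
         Sample_23 = 6.529, Sample_16 = 6.470, Sample_17 = 7.065)

cutoff  <- quantile(egg, probs = 0.1)                 # 0.10 quantile, requested directly
deciles <- quantile(egg, probs = seq(0, 1, 0.1))      # all deciles; element 2 is the 10% one

all.equal(unname(cutoff), unname(deciles[2]))         # both routes give the same cutoff
low <- names(egg)[egg < cutoff]                       # samples below the cutoff
low
```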