Dependencies

This document depends on the following packages:

library(devtools)
library(Biobase)
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, cbind, colnames,
##     do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, lengths, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff,
##     sort, table, tapply, union, unique, unsplit, which, which.max,
##     which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
library(broom)

Download the data

Module 2, quiz question #3

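# Open a connection to the remote .RData file, load its objects, then close it
con = url("http://bowtie-bio.sourceforge.net/recount/ExpressionSets/bodymap_eset.RData")
load(file = con)
close(con)
bm = bodymap.eset
edata = exprs(bm)       # gene-by-sample count matrix
pdata_bm = pData(bm)    # sample-level phenotype data (one row per sample)
ls()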
## [1] "bm"           "bodymap.eset" "con"          "edata"       
## [5] "pdata_bm"
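The model below pairs `edata[1, ]` with `pdata_bm$num.tech.reps` purely by position, so it relies on the expression columns and the phenotype rows describing the same samples in the same order. A minimal sanity check is sketched here; the helper name `samples_aligned` is hypothetical, not part of Biobase.

```r
# Hypothetical helper (sketch): TRUE when the columns of the expression matrix
# and the rows of the phenotype table name the same samples in the same order.
samples_aligned <- function(expr_matrix, pheno) {
  ncol(expr_matrix) == nrow(pheno) &&
    identical(colnames(expr_matrix), rownames(pheno))
}
```

Usage: `samples_aligned(edata, pdata_bm)`. For data pulled from a single ExpressionSet this is expected to be `TRUE`, since the object keeps sample names consistent between its assay data and phenotype data; the check matters more once matrices and tables are subset or reordered by hand.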

Question #3

Fit a linear model relating the first gene’s counts to the number of technical replicates, treating the number of replicates as a factor. Plot the data for this gene versus the covariate. Can you think of why this model might not fit well?

a. The data are right skewed.

b. The difference between 2 and 5 technical replicates is not the same as the difference between 5 and 6 technical replicates.

c. The variable num.tech.reps is a continuous variable.

d. There are very few samples with more than 2 replicates so the estimates for those values will not be very good.
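Options (a) and (d) can be checked directly: count how many samples sit at each value of the covariate, and look at the spread of the gene's counts. The sketch below bundles both; `inspect_gene` is a hypothetical helper name, and it only uses base R `table()` and `summary()`.

```r
# Hypothetical helper (sketch): `counts` is one gene's counts (e.g. edata[1, ])
# and `covariate` the matching per-sample values (e.g. pdata_bm$num.tech.reps).
inspect_gene <- function(counts, covariate) {
  stopifnot(length(counts) == length(covariate))
  list(
    samples_per_level = table(covariate),  # number of samples at each covariate value
    count_summary     = summary(counts)    # min, quartiles, mean and max of the counts
  )
}
```

Usage: `inspect_gene(edata[1, ], pdata_bm$num.tech.reps)`. A mean well above the median in `count_summary` points toward right skew, and small numbers in `samples_per_level` flag covariate values whose estimates rest on only a few samples.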

Fit a simple linear regression (num.tech.reps as a numeric covariate)

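edata = as.matrix(edata)
# Regress the first gene's counts on num.tech.reps; as coded here the
# covariate is numeric, so the model fits one intercept and one slope
lm1 = lm(edata[1, ] ~ pdata_bm$num.tech.reps)
tidy(lm1)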
##                     term  estimate std.error statistic      p.value
## 1            (Intercept) -1833.725  427.7917 -4.286491 4.992869e-04
## 2 pdata_bm$num.tech.reps  1324.579  152.2522  8.699900 1.144012e-07
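The fit above treats num.tech.reps as a number, whereas the question asks for it as a factor. With factor coding, `lm()` estimates a separate mean for each distinct replicate count (an intercept for the baseline level plus one offset per other level) instead of forcing a single slope. A minimal sketch, with the hypothetical helper name `fit_factor_model`:

```r
# Hypothetical helper (sketch): fit counts ~ covariate with the covariate coded
# as a factor, so each distinct replicate count gets its own mean.
fit_factor_model <- function(counts, covariate) {
  df <- data.frame(counts = counts, reps = factor(covariate))
  lm(counts ~ reps, data = df)
}
```

Usage: `lm2 = fit_factor_model(edata[1, ], pdata_bm$num.tech.reps)` followed by `tidy(lm2)`. The coefficient for a level observed in only one or two samples is estimated from those few points alone, which is worth keeping in mind when reading its standard error.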

Is fitting the line a good idea?

Visual diagnostics are often among the most helpful checks.

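par(pch = 19)
# Scatter the gene's counts against the covariate and overlay the fitted line
plot(pdata_bm$num.tech.reps, edata[1, ], col = 3,
     xlab = "num.tech.reps", ylab = "Counts for gene 1")
abline(lm1, col = 2, lwd = 2)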
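To compare the straight line with what a factor-coded model would fit, the per-level means can be drawn on the same kind of scatter plot. The sketch below assumes a numeric covariate (the x positions of the means are recovered from the level names); `plot_level_means` is a hypothetical helper name.

```r
# Hypothetical helper (sketch): scatter counts against a numeric covariate and
# mark the mean count at each distinct covariate value (the factor-model fit).
plot_level_means <- function(counts, covariate, ...) {
  level_means <- tapply(counts, covariate, mean)  # one mean per covariate value
  plot(covariate, counts, pch = 19, col = 3, ...)
  points(as.numeric(names(level_means)), level_means,
         col = 2, pch = 4, cex = 2, lwd = 2)
  invisible(level_means)
}
```

Usage: `plot_level_means(edata[1, ], pdata_bm$num.tech.reps, xlab = "num.tech.reps", ylab = "Counts for gene 1")`. If the marked means do not fall near a straight line, a single slope is a poor summary of how the counts change with the covariate.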

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] broom_0.4.2         Biobase_2.34.0      BiocGenerics_0.20.0
## [4] devtools_1.13.2    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11     knitr_1.16       magrittr_1.5     mnormt_1.5-5    
##  [5] lattice_0.20-35  R6_2.2.1         rlang_0.1.1      stringr_1.2.0   
##  [9] plyr_1.8.4       dplyr_0.5.0      tools_3.3.1      grid_3.3.1      
## [13] nlme_3.1-131     psych_1.7.5      DBI_0.6-1        withr_1.0.2     
## [17] htmltools_0.3.6  yaml_2.1.14      rprojroot_1.2    digest_0.6.12   
## [21] assertthat_0.2.0 tibble_1.3.3     tidyr_0.6.3      reshape2_1.4.2  
## [25] memoise_1.1.0    evaluate_0.10    rmarkdown_1.5    stringi_1.1.5   
## [29] backports_1.1.0  foreign_0.8-68
