R Markdown

Insert R chunk : Ctrl + Alt + i
Reformat Code: Ctrl + Shift + a

# All these libraries were used in Week 3
library(devtools)
library(Biobase)
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, cbind, colnames,
##     do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, lengths, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff,
##     sort, table, tapply, union, unique, unsplit, which, which.max,
##     which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
library(limma)
## 
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA
library(edge)
library(genefilter)
library(snpStats) # Min necessary
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:edge':
## 
##     kidney
## Loading required package: Matrix
library(broom)    # Min necessary

# library(MASS)
# library(DESeq2)
# library(qvalue) # Testing this in final two lectures of week 3

Q. Fit a logistic regression model on a recessive (need 2 copies of minor allele to confer risk) and ADDITIVE scale for the 10th SNP. Make a table of the fitted values versus the case/control status. Does one model fit better than the other?


Here we use SNP data from a case-control genome-wide association study. Load data set then take a smaller subset of samples for shorter run.

data(for.exercise)
use <- seq(1, ncol(snps.10), 10)
sub.10 <- snps.10[,use] # use ony 1/10 of dataset

Calculate the Principle Components

xxmat <- xxt(sub.10, correct.for.missing = FALSE)
evv <- eigen(xxmat, symmetric = TRUE)
pcs <- evv$vectors[, 1:5]

First do an unadjusted logistic regression assuming an additive model.

snpdata = sub.10@.Data
status = subject.support$cc
snp1 = as.numeric(snpdata[, 1])
snp1[snp1 == 0] = NA
glm1 = glm(status ~ snp1, family = "binomial")
tidy(glm1)
##          term   estimate std.error  statistic   p.value
## 1 (Intercept)  0.1122026 0.2322000  0.4832155 0.6289427
## 2        snp1 -0.1010806 0.2012159 -0.5023489 0.6154221

Q. First we do an unadjusted logistic regression assuming an additive model.The coefficient is the change in log-odds for a one unit decrease (because homozygous major allele is coded 1) in the number of copies of the minor allele.

snpdata = sub.10@.Data
status = subject.support$cc
snp1 = as.numeric(snpdata[,1])
snp1[snp1 == 0] = NA
glm1 = glm(status ~ snp1, family = "binomial")
tidy(glm1)
##          term   estimate std.error  statistic   p.value
## 1 (Intercept)  0.1122026 0.2322000  0.4832155 0.6289427
## 2        snp1 -0.1010806 0.2012159 -0.5023489 0.6154221

Possible Answers:

  1. The recessive model fits much better since there are more parameters to fit and the effect size is so large.
    ***
  2. The additive model fits much better since there are fewer parameters to fit and the effect size is so large.XX - MCC1st
    ***
  3. No, in all cases, the fitted values are near 0.5 and there are about an equal number of cases and controls in each group. This is true regardless of whether you fit a recessive or additive model.
    ***
  4. The additive model shows a strong effect, but the recessive model shows no difference so the additive model is better.
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] broom_0.4.2         snpStats_1.24.0     Matrix_1.2-8       
##  [4] survival_2.41-2     genefilter_1.56.0   edge_2.6.0         
##  [7] limma_3.30.13       Biobase_2.34.0      BiocGenerics_0.20.0
## [10] devtools_1.13.2    
## 
## loaded via a namespace (and not attached):
##  [1] qvalue_2.6.0         reshape2_1.4.2       splines_3.3.1       
##  [4] lattice_0.20-35      colorspace_1.3-2     htmltools_0.3.6     
##  [7] stats4_3.3.1         yaml_2.1.14          mgcv_1.8-17         
## [10] XML_3.98-1.7         rlang_0.1.1          nloptr_1.0.4        
## [13] foreign_0.8-68       withr_1.0.2          DBI_0.6-1           
## [16] snm_1.22.0           plyr_1.8.4           sva_3.22.0          
## [19] stringr_1.2.0        zlibbioc_1.20.0      munsell_0.4.3       
## [22] gtable_0.2.0         lfa_1.4.0            psych_1.7.5         
## [25] memoise_1.1.0        evaluate_0.10        knitr_1.16          
## [28] IRanges_2.8.1        AnnotationDbi_1.36.2 jackstraw_1.1       
## [31] Rcpp_0.12.11         xtable_1.8-2         corpcor_1.6.8       
## [34] scales_0.4.1         backports_1.1.0      S4Vectors_0.12.1    
## [37] annotate_1.52.1      lme4_1.1-12          mnormt_1.5-5        
## [40] ggplot2_2.2.1        digest_0.6.12        stringi_1.1.5       
## [43] dplyr_0.5.0          grid_3.3.1           rprojroot_1.2       
## [46] tools_3.3.1          bitops_1.0-6         magrittr_1.5        
## [49] lazyeval_0.2.0       RCurl_1.95-4.8       tibble_1.3.3        
## [52] RSQLite_1.1-2        tidyr_0.6.3          MASS_7.3-45         
## [55] assertthat_0.2.0     minqa_1.2.4          rmarkdown_1.6       
## [58] R6_2.2.1             nlme_3.1-131

EOF