library(qtl)
library(devtools)
install_github("jtlovell/qtlTools")## Downloading GitHub repo jtlovell/qtlTools@master
## from URL https://api.github.com/repos/jtlovell/qtlTools/zipball/master
## Installing qtlTools
## '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
## --no-environ --no-save --no-restore --quiet CMD INSTALL \
## '/private/var/folders/b6/w7rk82fx273ct14rskd_csk40000gp/T/RtmpGtjxfY/devtools2616754f3e49/jtlovell-qtlTools-588f597' \
## --library='/Library/Frameworks/R.framework/Versions/3.3/Resources/library' \
## --install-tests
##
## Reloading installed qtlTools
library(qtlTools)# Set the seed so that all our maps are identical
set.seed(42)
# Simulate a 2-chromosome map with 25 markers on each 100cM linkage group
map<-sim.map(len = c(100,100), n.mar = c(25,25), include.x=F)
# Simulate 1 QTL on each chromosome
# Here, we set the bigger one on Chr 1 to be purely additive
# the second, smaller one is dominant
cross<-sim.cross(map, n.ind=200, type="f2", map.function="kosambi",
model = rbind(c(1,10,10,1),c(2,90,5,5)))cross is an R object containing the genetic map, genotype and phenotype matricesThe genetic map is an array of genetic markers, where markers are ordered by their similarity. This should reflect a) the recombination rate and b) the physical position of the markers in the genome. A centimorgan (cM) is a measure of the distance between genetic markers, referring to the probability of a recombination event happening for 1/100 individuals between a pair of markers
plot.map(cross)Yellow indicates low recombination fractions (fewer cross-overs), which means that those markers should be closer to eachother on the genetic map. See the tutorial on improving genetic maps for more details.
plot.rf(cross)## Warning in plot.rf(cross): Running est.rf.
QTL mapping in experimental populations was traditionally limitted by the number of markers that could be feasibly genotyped. With NGS data, this is less of a problem; however, we still need to calculate conditional genotype probabilities to satisfy various statistical assumptions, and to fix genotyping errors. R/qtl has two functions, sim.geno and calc.genoprob that do this. Here, we only focus on calc.genoprob because it is better for complete genotype matrices with low error rate, and it is much faster.
cross<-calc.genoprob(cross,
step = 2, # This means that we want
# pseudomarkers every 2cM
error.prob = 0.001)In our simulated cross object, we placed two QTL, one on the top of Chr1 and another on the tail of Chr2. This is what the genotype means look like across the genome:
meanScan(cross, pheno.col = "phenotype")Another way to think about the effects of QTL is to order the individuals in a cross by their phenotypic value - where structure exists in this ordered matrix, a QTL is likely present.
The randomly ordered Chr1 genotype matrix
geno.image(cross, chr = 1, reorder = FALSE)The Chr1 genotype matrix, ordered by the phenotype. Note the structure on the top of Chr1.
geno.image(cross, chr = 1, reorder = TRUE)Now that we have the general idea of QTL mapping, lets simulate a more realistic cross with QTL that have smaller effect sizes.
set.seed(42)
map<-sim.map(len = c(100,100), n.mar = c(25,25), include.x=F)
cross<-sim.cross(map, n.ind=200, type="f2", map.function="kosambi",
model = rbind(c(1,10,1,1),c(1,90,.5,0),c(2,90,.5,1)))
cross<-calc.genoprob(cross, step=2, error.prob = 0.001)Scanone makes a statistical test at each point in the (pseudo)marker grid. The LOD statistic (log of the odds of the likelihood ratio) tests the hypothesis that there is no QTL versus the presence of a single qtl.
s1<-scanone(cross, method = "hk")
summary(s1)## chr pos lod
## D1M2 1 11.7 13.69
## D2M21 2 90.7 6.76
plot(s1, main = "one-way QTL scan")By permuting (randomizing) the phenotype matrix, while keeping the phenotype matrix constant, we can define the null distribution of the LOD score. The number of times a permuted LOD excedes that of a QTL peak (out of 100) is the P-value.
perms<-scanone(cross, n.perm=100, verbose=F, method = "hk")
plot(perms)
abline(v = quantile(perms,.95), col = "red", lty=2)summary(s1, perms = perms, pvalues=T)## chr pos lod pval
## D1M2 1 11.7 13.69 0
## D2M21 2 90.7 6.76 0
plot(s1, main = "one-way QTL scan with significance threshold")
add.threshold(out=s1, perms=perms, alpha = 0.05, col = "red", lty=2)Note that we simulated this cross with 3 QTL, 2 on Chr1, 1 on Chr2. Yet summary(s1, perms = perms, pvalues=T) returns only the top peaks per chromosome. In fact, scanone can only define the best QTL on a chromosome.
To get a better picture of what is actually going on, esspecially where multiple QTL exist on a chromosome, we have to build multiple QTL models. This can be done manually… See: addqtl, addtoqtl, refineqtl
To begin, we start with the best QTL peak
maxS1 <- max(s1)
s1chr <- as.numeric(as.character(maxS1$chr))
s1pos <- as.numeric(as.character(maxS1$pos))We use makeqtl to generate a qtl model containing the genotype probabilities at the chosen QTL position
qtl1 <- makeqtl(cross, chr=s1chr, pos=s1pos, what="prob")
plot(qtl1)Given the presence of this first QTL, we look for a second and add it to the model using addqtl and addtoqtl.
form1 <- "y ~ Q1"
s2 <- addqtl(cross, qtl=qtl1, formula=form1, method="hk")
plot(s2)s2chr <- max(s2)$chr
s2pos <- max(s2)$pos
qtl2 <- addtoqtl(cross, qtl=qtl1, chr=s2chr, pos=s2pos)Now, with two qtl, it is important to make sure out positions are correct, conditional on the previously defined QTL.
form2 <- "y ~ Q1 + Q2"
qtl2 <- refineqtl(cross, formula=form2, qtl=qtl2, method="hk", verbose=F)
plotLodProfile(qtl2)Lets finally repeat this process looking for that minor QTL on chr1
plot(s3 <- addqtl(cross, qtl=qtl2, formula=form2, method="hk"))plot(qtl3 <- addtoqtl(cross, qtl=qtl2, chr=max(s3)$chr, pos=max(s3)$pos))form3 <- "y ~ Q1 + Q2 + Q3"
plotLodProfile(qtl.model <- refineqtl(cross, formula=form3, qtl=qtl3, method="hk", verbose=F))Here, we can do all of the steps described above, in one line. Follow the output to see how it works.
step.model<-stepwiseqtl(cross, max.qtl = 3, method = "hk", additive.only = T)## -Initial scan
## initial lod: 13.6942
## ** new best ** (pLOD increased by 10.1742)
## no.qtl = 1 pLOD = 10.1742 formula: y ~ Q1
## -Step 1
## ---Scanning for additive qtl
## plod = 18.37641
## ---Refining positions
## no.qtl = 2 pLOD = 18.37641 formula: y ~ Q1 + Q2
## ** new best ** (pLOD increased by 8.2022)
## -Step 2
## ---Scanning for additive qtl
## plod = 20.21612
## ---Refining positions
## no.qtl = 3 pLOD = 20.21612 formula: y ~ Q1 + Q2 + Q3
## ** new best ** (pLOD increased by 1.8397)
## -Starting backward deletion
## ---Dropping Q3
## no.qtl = 2 pLOD = 18.37641 formula: y ~ Q1 + Q2
## ---Refining positions
## ---Dropping Q2
## no.qtl = 1 pLOD = 10.1742 formula: y ~ Q1
## ---Refining positions
## ---One last pass through refineqtl
plotLodProfile(step.model)plot(step.model)The most important function is fitqtl which fits an analysis of variance to the conditional genotype probabilities in the qtl model.
print(fit<-summary(fitqtl(cross, qtl = step.model, formula = formula(step.model),
dropone = T, get.ests = T, method = "hk")))##
## fitqtl summary
##
## Method: Haley-Knott regression
## Model: normal phenotype
## Number of observations : 200
##
## Full model result
## ----------------------------------
## Model formula: y ~ Q1 + Q2 + Q3
##
## df SS MS LOD %var Pvalue(Chi2) Pvalue(F)
## Model 6 223.8609 37.310147 30.77612 50.76899 0 0
## Error 193 217.0793 1.124763
## Total 199 440.9402
##
##
## Drop one QTL at a time ANOVA table:
## ----------------------------------
## df Type III SS LOD %var F value Pvalue(Chi2) Pvalue(F)
## 1@11.7 2 118.46 18.91 26.865 52.66 0 < 2e-16 ***
## 1@91.5 2 28.51 5.36 6.467 12.68 0 6.73e-06 ***
## 2@90.6 2 79.02 13.48 17.921 35.13 0 9.77e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Estimated effects:
## -----------------
## est SE t
## Intercept 0.9337 0.0754 12.385
## 1@11.7a 0.8365 0.1070 7.820
## 1@11.7d 1.0747 0.1551 6.930
## 1@91.5a 0.5553 0.1124 4.941
## 1@91.5d 0.1683 0.1538 1.094
## 2@90.6a 0.6754 0.1078 6.265
## 2@90.6d 0.8322 0.1534 5.427
It is also useful to understand the size of the interval that contains the QTL. There are two methods to get confidence intervals: lodint assesses the interval by the relative drop in LOD score from the QTL peak; bayesint uses the area under the peak to determine confidence intervals
lodint(step.model, qtl.index = 1, drop = 1.5) # LOD drop of 1.5## chr pos lod
## c1.loc8 1 8.00000 15.24452
## D1M2 1 11.74874 18.91250
## D1M3 1 13.46666 15.33715
lodint(step.model, qtl.index = 1, drop = 4) # LOD drop of 4## chr pos lod
## c1.loc6 1 6.00000 12.21613
## D1M2 1 11.74874 18.91250
## c1.loc16 1 16.00000 14.77962
bayesint(step.model, qtl.index = 1, prob = .80)## chr pos lod
## D1M2 1 11.74874 18.91250
## D1M2 1 11.74874 18.91250
## c1.loc12 1 12.00000 18.67627
bayesint(step.model, qtl.index = 1, prob = .99)## chr pos lod
## c1.loc10 1 10.00000 17.65661
## D1M2 1 11.74874 18.91250
## c1.loc12 1 12.00000 18.67627
Since we have multiple QTL, we would have to do this calculation for each QTL peak. However, we can use qtlTools::calcCis, which loops through the QTL and puts the output into a dataframe.
print(cis<-calcCis(cross, mod = step.model, lodint = TRUE, drop = 1.5))## pheno qtlname chr pos maxLod lowmarker highmarker lowposition
## 1 NA 1@11.7 1 11.74874 18.912503 c1.loc8 D1M3 8
## 2 NA 1@91.5 1 91.48060 5.359709 c1.loc86 D1M24 86
## 3 NA 2@90.6 2 90.57381 13.482043 c2.loc88 c2.loc94 88
## highposition
## 1 13.46666
## 2 98.88917
## 3 94.00000
For publication, reviewers often want to see the position of QTL intervals on a genetic map. qtlTools::segmentsOnMap provides one method to do this.
segmentsOnMap(cross=cross,
phe=cis$pheno,
chr=cis$chr,
l = cis$lowposition,
h = cis$highposition,
lwd = 5, legendPosition = "right", leg.inset=.1,
palette = rainbow)## phe chr lod cm l h
## 1 1@8 1 NA NA 8 13.46666
## 2 1@86 1 NA NA 86 98.88917
## 3 2@88 2 NA NA 88 94.00000
Or more simply:
segmentsOnMap(cross=cross,
calcCisResults=cis,
legendPosition = "right", leg.inset=.1,
palette = rainbow)## phe chr lod cm l h
## 1 1@12 1 18.912503 11.74874 8 13.46666
## 2 1@91 1 5.359709 91.48060 86 98.88917
## 3 2@91 2 13.482043 90.57381 88 94.00000
While R/qtl provides many functions to conduct qtl mapping, it is often difficult to extract and manipulate statical outputs. qtlTools::qtlStats provides a simple method to get statistics from a qtl model.
qtlStats(cross, pheno.col = "phenotype",
form = formula(step.model),
mod = step.model)## terms altnames qtlnames phenotype formula df Type.III.SS
## 1 1@11.7 Q1 1@11.7 phenotype y ~ Q1 + Q2 + Q3 2 118.46006
## 2 1@91.5 Q2 1@91.5 phenotype y ~ Q1 + Q2 + Q3 2 28.51343
## 3 2@90.6 Q3 2@90.6 phenotype y ~ Q1 + Q2 + Q3 2 79.02101
## 4 <NA> <NA> <NA> <NA> <NA> NA NA
## 5 <NA> <NA> <NA> <NA> <NA> NA NA
## 6 <NA> <NA> <NA> <NA> <NA> NA NA
## LOD X.var F.value Pvalue.Chi2. Pvalue.F. modelLod
## 1 18.912503 26.865334 52.65999 0.000000e+00 0.000000e+00 30.77612
## 2 5.359709 6.466508 12.67530 4.368084e-06 6.727920e-06 30.77612
## 3 13.482043 17.921025 35.12784 3.297362e-14 9.769963e-14 30.77612
## 4 NA NA NA NA NA NA
## 5 NA NA NA NA NA NA
## 6 NA NA NA NA NA NA
## modelPercVar qtlname chr pos maxLod lowmarker highmarker
## 1 50.76899 <NA> <NA> NA NA <NA> <NA>
## 2 50.76899 <NA> <NA> NA NA <NA> <NA>
## 3 50.76899 <NA> <NA> NA NA <NA> <NA>
## 4 NA 1@11.7 1 11.74874 18.912503 c1.loc8 D1M3
## 5 NA 1@91.5 1 91.48060 5.359709 c1.loc86 D1M24
## 6 NA 2@90.6 2 90.57381 13.482043 c2.loc88 c2.loc94
## lowposition highposition
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 8 13.46666
## 5 86 98.88917
## 6 88 94.00000
`lsmeans4qtl produces a dataframe with SAS-style LSMeans, and also includes standard means, and se.
library(lsmeans)## Loading required package: estimability
##
## Attaching package: 'lsmeans'
## The following object is masked from 'package:devtools':
##
## test
alllsms<-lsmeans4qtl(cross,
pheno.col = "phenotype",
form = "y ~ Q1 + Q2 + Q3",
mod = step.model,
covar=NULL)
print(alllsms)## Q1 Q2 Q3 lsmean SE df lower.CL upper.CL
## 1 AA <NA> <NA> -0.60919625 0.1601700 193 -0.9251046 -0.29328792
## 2 AB <NA> <NA> 1.30326041 0.1154055 193 1.0756424 1.53087842
## 3 BB <NA> <NA> 1.06590193 0.1545786 193 0.7610217 1.37078214
## 4 <NA> AA <NA> -0.02770275 0.1636935 193 -0.3505607 0.29515519
## 5 <NA> AB <NA> 0.70019254 0.1124443 193 0.4784150 0.92197004
## 6 <NA> BB <NA> 1.08747630 0.1600063 193 0.7718908 1.40306181
## 7 <NA> <NA> AA -0.36570969 0.1594569 193 -0.6802116 -0.05120775
## 8 <NA> <NA> AB 1.14012265 0.1105252 193 0.9221303 1.35811497
## 9 <NA> <NA> BB 0.98555312 0.1590346 193 0.6718842 1.29922206
## mean sem
## 1 -0.39956595 0.1765342
## 2 1.43599907 0.1231533
## 3 1.31643680 0.1958242
## 4 0.21348170 0.2566733
## 5 1.06129123 0.1240206
## 6 1.41357005 0.2162085
## 7 -0.01403133 0.1979478
## 8 1.23201204 0.1338871
## 9 1.37607761 0.2067131
Often we may be interested in some interaction among QTL (epistasis) or with a covariate. Lets add in one arbitrarily, then cull to the LSmeans of Q1 and Q3, averaging across values at Q2.
alllsms<-lsmeans4qtl(cross,
pheno.col = "phenotype",
form = "y ~ Q1 + Q2 + Q3 + Q1*Q3",
mod = step.model,
covar=NULL)## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions
lsms<-alllsms[!is.na(alllsms$Q1) &
!is.na(alllsms$Q3),]
lsms<-lsms[,c("Q1","Q3","lsmean","SE","mean","sem")]
library(ggplot2)
pos<-position_dodge(.1)
ggplot(lsms, aes(x = Q1, y = lsmean, shape = Q3,
color = Q3, group = Q3))+
geom_point(position = pos)+
geom_line(position = pos)+
theme_jtl()+
geom_errorbar(aes(ymin = lsmean - SE, ymax = lsmean+SE), width = .1,position = pos)+
ggtitle("sas-style LSMeans")Here, we went over QTL mapping basics, including: - the effects of QTL - one-way QTL scans - building multiple QTL models - Fitting statistics to multiple QTL models
We have ignored many complexities in QTL mapping, including: - Epistasis - QTL*covariate interactions - Multiple phenotypes - Generating genetic maps
For more information on complex parts of QTL mapping, see R/qtl’s website.