Introduction to QTL mapping

`cross` is an R object containing the genetic map, genotype and phenotype matrices

The genetic map is an array of genetic markers, where markers are ordered by their similarity. This should reflect a) the recombination rate and b) the physical position of the markers in the genome. A centimorgan (cM) is a measure of the distance between genetic markers, referring to the probability of a recombination event happening for 1/100 individuals between a pair of markers

This is a genetic map, where each horizontal bar represents a marker

plot.map(cross)

Here, we have plotted the relationship among markers in the map:

Yellow indicates low recombination fractions (fewer cross-overs), which means that those markers should be closer to eachother on the genetic map. See the tutorial on improving genetic maps for more details.

plot.rf(cross)

## Warning in plot.rf(cross): Running est.rf.

Calculating conditional genotype probabilities

QTL mapping in experimental populations was traditionally limitted by the number of markers that could be feasibly genotyped. With NGS data, this is less of a problem; however, we still need to calculate conditional genotype probabilities to satisfy various statistical assumptions, and to fix genotyping errors. R/qtl has two functions, sim.geno and calc.genoprob that do this. Here, we only focus on calc.genoprob because it is better for complete genotype matrices with low error rate, and it is much faster.

cross<-calc.genoprob(cross, 
                     step = 2, # This means that we want 
                               # pseudomarkers every 2cM
                     error.prob = 0.001)

QTL mapping is simply the correlation of the genotype and phenotype

In our simulated cross object, we placed two QTL, one on the top of Chr1 and another on the tail of Chr2. This is what the genotype means look like across the genome:

meanScan(cross, pheno.col = "phenotype")

Another way to think about the effects of QTL is to order the individuals in a cross by their phenotypic value - where structure exists in this ordered matrix, a QTL is likely present.

The randomly ordered Chr1 genotype matrix

geno.image(cross, chr = 1, reorder = FALSE)

The Chr1 genotype matrix, ordered by the phenotype. Note the structure on the top of Chr1.

geno.image(cross, chr = 1, reorder = TRUE)

Basic QTL Mapping

Now that we have the general idea of QTL mapping, lets simulate a more realistic cross with QTL that have smaller effect sizes.

set.seed(42) 
map<-sim.map(len = c(100,100), n.mar = c(25,25), include.x=F)
cross<-sim.cross(map, n.ind=200, type="f2", map.function="kosambi",
                 model = rbind(c(1,10,1,1),c(1,90,.5,0),c(2,90,.5,1)))
cross<-calc.genoprob(cross, step=2, error.prob = 0.001)

Scanone - the QTL package’s simplest test for QTL

Scanone makes a statistical test at each point in the (pseudo)marker grid. The LOD statistic (log of the odds of the likelihood ratio) tests the hypothesis that there is no QTL versus the presence of a single qtl.

s1<-scanone(cross, method = "hk")
summary(s1)

##       chr  pos   lod
## D1M2    1 11.7 13.69
## D2M21   2 90.7  6.76

plot(s1, main = "one-way QTL scan")

Permutations allow inference of significance

By permuting (randomizing) the phenotype matrix, while keeping the phenotype matrix constant, we can define the null distribution of the LOD score. The number of times a permuted LOD excedes that of a QTL peak (out of 100) is the P-value.

perms<-scanone(cross, n.perm=100, verbose=F, method = "hk")
plot(perms)
abline(v = quantile(perms,.95), col = "red", lty=2)

summary(s1, perms = perms, pvalues=T)

##       chr  pos   lod pval
## D1M2    1 11.7 13.69    0
## D2M21   2 90.7  6.76    0

plot(s1, main = "one-way QTL scan with significance threshold")
add.threshold(out=s1, perms=perms, alpha = 0.05, col = "red", lty=2)

Scanone has its limitations

Note that we simulated this cross with 3 QTL, 2 on Chr1, 1 on Chr2. Yet summary(s1, perms = perms, pvalues=T) returns only the top peaks per chromosome. In fact, scanone can only define the best QTL on a chromosome.

Multiple QTL modeling - basics

To get a better picture of what is actually going on, esspecially where multiple QTL exist on a chromosome, we have to build multiple QTL models. This can be done manually… See: addqtl, addtoqtl, refineqtl

To begin, we start with the best QTL peak

maxS1 <- max(s1)
s1chr <- as.numeric(as.character(maxS1$chr))
s1pos <- as.numeric(as.character(maxS1$pos))

We use makeqtl to generate a qtl model containing the genotype probabilities at the chosen QTL position

qtl1 <- makeqtl(cross, chr=s1chr, pos=s1pos, what="prob")
plot(qtl1)

Given the presence of this first QTL, we look for a second and add it to the model using addqtl and addtoqtl.

form1 <- "y ~ Q1"
s2 <- addqtl(cross, qtl=qtl1, formula=form1, method="hk")
plot(s2)

s2chr <- max(s2)$chr
s2pos <- max(s2)$pos
qtl2 <- addtoqtl(cross, qtl=qtl1, chr=s2chr, pos=s2pos)

Now, with two qtl, it is important to make sure out positions are correct, conditional on the previously defined QTL.

form2 <- "y ~ Q1 + Q2"
qtl2 <- refineqtl(cross, formula=form2, qtl=qtl2, method="hk", verbose=F)
plotLodProfile(qtl2)

Lets finally repeat this process looking for that minor QTL on chr1

plot(s3 <- addqtl(cross, qtl=qtl2, formula=form2, method="hk"))

plot(qtl3 <- addtoqtl(cross, qtl=qtl2, chr=max(s3)$chr, pos=max(s3)$pos))

form3 <- "y ~ Q1 + Q2 + Q3"
plotLodProfile(qtl.model <- refineqtl(cross, formula=form3, qtl=qtl3, method="hk", verbose=F))

Automated stepwise QTL model selection

Here, we can do all of the steps described above, in one line. Follow the output to see how it works.

step.model<-stepwiseqtl(cross, max.qtl = 3, method = "hk", additive.only = T)

##  -Initial scan
## initial lod:  13.6942 
## ** new best ** (pLOD increased by 10.1742)
##     no.qtl =  1   pLOD = 10.1742   formula: y ~ Q1 
##  -Step 1 
##  ---Scanning for additive qtl
##         plod = 18.37641 
##  ---Refining positions
##     no.qtl =  2   pLOD = 18.37641   formula: y ~ Q1 + Q2 
## ** new best ** (pLOD increased by 8.2022)
##  -Step 2 
##  ---Scanning for additive qtl
##         plod = 20.21612 
##  ---Refining positions
##     no.qtl =  3   pLOD = 20.21612   formula: y ~ Q1 + Q2 + Q3 
## ** new best ** (pLOD increased by 1.8397)
##  -Starting backward deletion
##  ---Dropping Q3 
##     no.qtl =  2   pLOD = 18.37641   formula: y ~ Q1 + Q2 
##  ---Refining positions
##  ---Dropping Q2 
##     no.qtl =  1   pLOD = 10.1742   formula: y ~ Q1 
##  ---Refining positions
##  ---One last pass through refineqtl

plotLodProfile(step.model)

plot(step.model)

Fitting a multiple QTL model.

The most important function is fitqtl which fits an analysis of variance to the conditional genotype probabilities in the qtl model.

print(fit<-summary(fitqtl(cross, qtl = step.model, formula = formula(step.model), 
                    dropone = T, get.ests = T, method = "hk")))

## 
##      fitqtl summary
## 
## Method: Haley-Knott regression 
## Model:  normal phenotype
## Number of observations : 200 
## 
## Full model result
## ----------------------------------  
## Model formula: y ~ Q1 + Q2 + Q3 
## 
##        df       SS        MS      LOD     %var Pvalue(Chi2) Pvalue(F)
## Model   6 223.8609 37.310147 30.77612 50.76899            0         0
## Error 193 217.0793  1.124763                                         
## Total 199 440.9402                                                   
## 
## 
## Drop one QTL at a time ANOVA table: 
## ----------------------------------  
##        df Type III SS   LOD   %var F value Pvalue(Chi2) Pvalue(F)    
## 1@11.7  2      118.46 18.91 26.865   52.66            0   < 2e-16 ***
## 1@91.5  2       28.51  5.36  6.467   12.68            0  6.73e-06 ***
## 2@90.6  2       79.02 13.48 17.921   35.13            0  9.77e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Estimated effects:
## -----------------
##              est     SE      t
## Intercept 0.9337 0.0754 12.385
## 1@11.7a   0.8365 0.1070  7.820
## 1@11.7d   1.0747 0.1551  6.930
## 1@91.5a   0.5553 0.1124  4.941
## 1@91.5d   0.1683 0.1538  1.094
## 2@90.6a   0.6754 0.1078  6.265
## 2@90.6d   0.8322 0.1534  5.427

Getting confidence intervals.

It is also useful to understand the size of the interval that contains the QTL. There are two methods to get confidence intervals: lodint assesses the interval by the relative drop in LOD score from the QTL peak; bayesint uses the area under the peak to determine confidence intervals

lodint(step.model, qtl.index = 1, drop = 1.5) # LOD drop of 1.5

##         chr      pos      lod
## c1.loc8   1  8.00000 15.24452
## D1M2      1 11.74874 18.91250
## D1M3      1 13.46666 15.33715

lodint(step.model, qtl.index = 1, drop = 4) # LOD drop of 4

##          chr      pos      lod
## c1.loc6    1  6.00000 12.21613
## D1M2       1 11.74874 18.91250
## c1.loc16   1 16.00000 14.77962

bayesint(step.model, qtl.index = 1, prob = .80)

##          chr      pos      lod
## D1M2       1 11.74874 18.91250
## D1M2       1 11.74874 18.91250
## c1.loc12   1 12.00000 18.67627

bayesint(step.model, qtl.index = 1, prob = .99)

##          chr      pos      lod
## c1.loc10   1 10.00000 17.65661
## D1M2       1 11.74874 18.91250
## c1.loc12   1 12.00000 18.67627

Since we have multiple QTL, we would have to do this calculation for each QTL peak. However, we can use qtlTools::calcCis, which loops through the QTL and puts the output into a dataframe.

print(cis<-calcCis(cross, mod = step.model, lodint = TRUE, drop = 1.5))

##   pheno qtlname chr      pos    maxLod lowmarker highmarker lowposition
## 1    NA  1@11.7   1 11.74874 18.912503   c1.loc8       D1M3           8
## 2    NA  1@91.5   1 91.48060  5.359709  c1.loc86      D1M24          86
## 3    NA  2@90.6   2 90.57381 13.482043  c2.loc88   c2.loc94          88
##   highposition
## 1     13.46666
## 2     98.88917
## 3     94.00000

Plotting confidence intervals

For publication, reviewers often want to see the position of QTL intervals on a genetic map. qtlTools::segmentsOnMap provides one method to do this.

segmentsOnMap(cross=cross, 
              phe=cis$pheno, 
              chr=cis$chr, 
              l = cis$lowposition, 
              h = cis$highposition, 
              lwd = 5, legendPosition = "right", leg.inset=.1,
              palette = rainbow)

##    phe chr lod cm  l        h
## 1  1@8   1  NA NA  8 13.46666
## 2 1@86   1  NA NA 86 98.88917
## 3 2@88   2  NA NA 88 94.00000

Or more simply:

segmentsOnMap(cross=cross,
              calcCisResults=cis, 
              legendPosition = "right", leg.inset=.1, 
              palette = rainbow)

##    phe chr       lod       cm  l        h
## 1 1@12   1 18.912503 11.74874  8 13.46666
## 2 1@91   1  5.359709 91.48060 86 98.88917
## 3 2@91   2 13.482043 90.57381 88 94.00000

Compiling statistics

While R/qtl provides many functions to conduct qtl mapping, it is often difficult to extract and manipulate statical outputs. qtlTools::qtlStats provides a simple method to get statistics from a qtl model.

qtlStats(cross,  pheno.col = "phenotype",
         form = formula(step.model), 
         mod = step.model)

##    terms altnames qtlnames phenotype          formula df Type.III.SS
## 1 1@11.7       Q1   1@11.7 phenotype y ~ Q1 + Q2 + Q3  2   118.46006
## 2 1@91.5       Q2   1@91.5 phenotype y ~ Q1 + Q2 + Q3  2    28.51343
## 3 2@90.6       Q3   2@90.6 phenotype y ~ Q1 + Q2 + Q3  2    79.02101
## 4   <NA>     <NA>     <NA>      <NA>             <NA> NA          NA
## 5   <NA>     <NA>     <NA>      <NA>             <NA> NA          NA
## 6   <NA>     <NA>     <NA>      <NA>             <NA> NA          NA
##         LOD     X.var  F.value Pvalue.Chi2.    Pvalue.F. modelLod
## 1 18.912503 26.865334 52.65999 0.000000e+00 0.000000e+00 30.77612
## 2  5.359709  6.466508 12.67530 4.368084e-06 6.727920e-06 30.77612
## 3 13.482043 17.921025 35.12784 3.297362e-14 9.769963e-14 30.77612
## 4        NA        NA       NA           NA           NA       NA
## 5        NA        NA       NA           NA           NA       NA
## 6        NA        NA       NA           NA           NA       NA
##   modelPercVar qtlname  chr      pos    maxLod lowmarker highmarker
## 1     50.76899    <NA> <NA>       NA        NA      <NA>       <NA>
## 2     50.76899    <NA> <NA>       NA        NA      <NA>       <NA>
## 3     50.76899    <NA> <NA>       NA        NA      <NA>       <NA>
## 4           NA  1@11.7    1 11.74874 18.912503   c1.loc8       D1M3
## 5           NA  1@91.5    1 91.48060  5.359709  c1.loc86      D1M24
## 6           NA  2@90.6    2 90.57381 13.482043  c2.loc88   c2.loc94
##   lowposition highposition
## 1          NA           NA
## 2          NA           NA
## 3          NA           NA
## 4           8     13.46666
## 5          86     98.88917
## 6          88     94.00000

Extracting and Plotting qtl effects

`lsmeans4qtl produces a dataframe with SAS-style LSMeans, and also includes standard means, and se.

library(lsmeans)

## Loading required package: estimability

## 
## Attaching package: 'lsmeans'

## The following object is masked from 'package:devtools':
## 
##     test

alllsms<-lsmeans4qtl(cross, 
                     pheno.col = "phenotype",
                     form = "y ~ Q1 + Q2 + Q3", 
                     mod = step.model, 
                     covar=NULL)
print(alllsms)

##     Q1   Q2   Q3      lsmean        SE  df   lower.CL    upper.CL
## 1   AA <NA> <NA> -0.60919625 0.1601700 193 -0.9251046 -0.29328792
## 2   AB <NA> <NA>  1.30326041 0.1154055 193  1.0756424  1.53087842
## 3   BB <NA> <NA>  1.06590193 0.1545786 193  0.7610217  1.37078214
## 4 <NA>   AA <NA> -0.02770275 0.1636935 193 -0.3505607  0.29515519
## 5 <NA>   AB <NA>  0.70019254 0.1124443 193  0.4784150  0.92197004
## 6 <NA>   BB <NA>  1.08747630 0.1600063 193  0.7718908  1.40306181
## 7 <NA> <NA>   AA -0.36570969 0.1594569 193 -0.6802116 -0.05120775
## 8 <NA> <NA>   AB  1.14012265 0.1105252 193  0.9221303  1.35811497
## 9 <NA> <NA>   BB  0.98555312 0.1590346 193  0.6718842  1.29922206
##          mean       sem
## 1 -0.39956595 0.1765342
## 2  1.43599907 0.1231533
## 3  1.31643680 0.1958242
## 4  0.21348170 0.2566733
## 5  1.06129123 0.1240206
## 6  1.41357005 0.2162085
## 7 -0.01403133 0.1979478
## 8  1.23201204 0.1338871
## 9  1.37607761 0.2067131

Often we may be interested in some interaction among QTL (epistasis) or with a covariate. Lets add in one arbitrarily, then cull to the LSmeans of Q1 and Q3, averaging across values at Q2.

alllsms<-lsmeans4qtl(cross, 
                     pheno.col = "phenotype",
                     form = "y ~ Q1 + Q2 + Q3 + Q1*Q3", 
                     mod = step.model, 
                     covar=NULL)

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

lsms<-alllsms[!is.na(alllsms$Q1) & 
                !is.na(alllsms$Q3),]
lsms<-lsms[,c("Q1","Q3","lsmean","SE","mean","sem")]

library(ggplot2)
pos<-position_dodge(.1)
ggplot(lsms, aes(x = Q1, y = lsmean, shape = Q3,
   color = Q3, group = Q3))+
   geom_point(position = pos)+
   geom_line(position = pos)+
   theme_jtl()+
   geom_errorbar(aes(ymin = lsmean - SE, ymax = lsmean+SE), width = .1,position = pos)+
   ggtitle("sas-style LSMeans")

Introduction to QTL mapping

JT Lovell

2016-11-01

First, lets simulate some data

`cross` is an R object containing the genetic map, genotype and phenotype matrices

This is a genetic map, where each horizontal bar represents a marker

Here, we have plotted the relationship among markers in the map:

Calculating conditional genotype probabilities

QTL mapping is simply the correlation of the genotype and phenotype

Basic QTL Mapping

Scanone - the QTL package’s simplest test for QTL

Permutations allow inference of significance

Scanone has its limitations

Multiple QTL modeling - basics

Automated stepwise QTL model selection

Fitting a multiple QTL model.

Getting confidence intervals.

Plotting confidence intervals

Compiling statistics

Extracting and Plotting qtl effects

Summary

Introduction to QTL mapping

JT Lovell

2016-11-01

First, lets simulate some data

cross is an R object containing the genetic map, genotype and phenotype matrices

This is a genetic map, where each horizontal bar represents a marker

Here, we have plotted the relationship among markers in the map:

Calculating conditional genotype probabilities

QTL mapping is simply the correlation of the genotype and phenotype

Basic QTL Mapping

Scanone - the QTL package’s simplest test for QTL

Permutations allow inference of significance

Scanone has its limitations

Multiple QTL modeling - basics

Automated stepwise QTL model selection

Fitting a multiple QTL model.

Getting confidence intervals.

Plotting confidence intervals

Compiling statistics

Extracting and Plotting qtl effects

Summary

`cross` is an R object containing the genetic map, genotype and phenotype matrices