Calculating ORFScore

We begin with a set of reads mapped into transcript-relative coordinates, stored in an IRanges object. Here, I generate a sample dataset for a theoretical 300nt long transcript with 400 RPFs of size 28 and 29nt.

library(IRanges)

exampleTx <- IRanges(start=sample(seq(1,300), 400, replace=T),
                      width=sample(c(28,29), 400, replace=T))
exampleTx

## IRanges of length 400
##       start end width
## [1]     267 294    28
## [2]     270 297    28
## [3]      67  94    28
## [4]      31  58    28
## [5]     219 246    28
## ...     ... ...   ...
## [396]   162 189    28
## [397]   290 318    29
## [398]    87 115    29
## [399]   131 159    29
## [400]   262 289    28

Here, we define a function that takes that read profile and collapses it. Note that in our dataset, position +12 of the read represents the ribosomal P-site. This offset (and the in-frame read sizes) may change between datasets.

collapseReads <- function(orfProfile, pSiteOffset, orfStart, orfEnd) {
  # Adjust the read location based on p-site offset
  readLocs <- start(orfProfile) + pSiteOffset
  # Take only reads within the internal part of the ORF
  readLocs <- readLocs[(readLocs >= orfStart+3) & (readLocs <= orfEnd-8)]
  # Make read locations relative to the ORF Start for proper frame calculation
  readLocs <- readLocs - orfStart + 1
  # Calculate and return the frame pileup
  frameVec <- readLocs %% 3 + 1
  readVec <- c( sum(frameVec==1),
                sum(frameVec==2),
                sum(frameVec==3) )
  return(readVec)
}

Now, let’s get the read profile for a theoretical 87aa ORF in the transcript from position 9 to 272

exampleReads <- collapseReads(exampleTx, 12, 9, 272)
exampleReads

## [1] 120 116  95

Note that here, the reads are close to uniformly distributed, but in a true ribosome profiling dataset they should have a profile much more biased towards the first frame if the region is actively translated

Finally, we create a function to calculate the ORFScore, given a length-3 vector of number of reads per frame.

calcORFScore <- function(frameVec) {
  orfscore <- as.vector(chisq.test(c(frameVec[1], frameVec[2], frameVec[3]),
                                   p=c(1/3,1/3,1/3))$statistic)
  # Apply sign based on dominant frame
  if (frameVec[1] < frameVec[2] || frameVec[1] < frameVec[3]) {
    orfscore <- orfscore * -1
  }

  return(orfscore)
}

calcORFScore(exampleReads)

## [1] 3.268882

This will return the raw (non-log-scaled) ORFScore for the ORF

NOTE: In the publication, we used log-adjusted ORFScores The 6.044 log-scaled ORFScore threshold presented in the paper is approximately a raw ORFScore of 50 (everything higher than 50 we would thus define as translated). However, as mentioned in the paper, we recommend establishing an empirical threshold based on known coding ORFs in your dataset as ribosome profiling quality, depth, and phasing strength varies between experiments.

Calculating ORFScore - example

Timothy Johnstone