We begin with a set of reads mapped into transcript-relative coordinates, stored in an IRanges object. Here, I generate a sample dataset for a theoretical 300nt long transcript with 400 RPFs of size 28 and 29nt.
library(IRanges)
exampleTx <- IRanges(start=sample(seq(1,300), 400, replace=T),
width=sample(c(28,29), 400, replace=T))
exampleTx
## IRanges of length 400
## start end width
## [1] 267 294 28
## [2] 270 297 28
## [3] 67 94 28
## [4] 31 58 28
## [5] 219 246 28
## ... ... ... ...
## [396] 162 189 28
## [397] 290 318 29
## [398] 87 115 29
## [399] 131 159 29
## [400] 262 289 28
Here, we define a function that takes that read profile and collapses it. Note that in our dataset, position +12 of the read represents the ribosomal P-site. This offset (and the in-frame read sizes) may change between datasets.
collapseReads <- function(orfProfile, pSiteOffset, orfStart, orfEnd) {
# Adjust the read location based on p-site offset
readLocs <- start(orfProfile) + pSiteOffset
# Take only reads within the internal part of the ORF
readLocs <- readLocs[(readLocs >= orfStart+3) & (readLocs <= orfEnd-8)]
# Make read locations relative to the ORF Start for proper frame calculation
readLocs <- readLocs - orfStart + 1
# Calculate and return the frame pileup
frameVec <- readLocs %% 3 + 1
readVec <- c( sum(frameVec==1),
sum(frameVec==2),
sum(frameVec==3) )
return(readVec)
}
Now, let’s get the read profile for a theoretical 87aa ORF in the transcript from position 9 to 272
exampleReads <- collapseReads(exampleTx, 12, 9, 272)
exampleReads
## [1] 120 116 95
Note that here, the reads are close to uniformly distributed, but in a true ribosome profiling dataset they should have a profile much more biased towards the first frame if the region is actively translated
Finally, we create a function to calculate the ORFScore, given a length-3 vector of number of reads per frame.
calcORFScore <- function(frameVec) {
orfscore <- as.vector(chisq.test(c(frameVec[1], frameVec[2], frameVec[3]),
p=c(1/3,1/3,1/3))$statistic)
# Apply sign based on dominant frame
if (frameVec[1] < frameVec[2] || frameVec[1] < frameVec[3]) {
orfscore <- orfscore * -1
}
return(orfscore)
}
calcORFScore(exampleReads)
## [1] 3.268882
This will return the raw (non-log-scaled) ORFScore for the ORF
NOTE: In the publication, we used log-adjusted ORFScores The 6.044 log-scaled ORFScore threshold presented in the paper is approximately a raw ORFScore of 50 (everything higher than 50 we would thus define as translated). However, as mentioned in the paper, we recommend establishing an empirical threshold based on known coding ORFs in your dataset as ribosome profiling quality, depth, and phasing strength varies between experiments.