I'm making this example in response to a friend's request to emulate the plots here, showing the genome position on the \( X \) axis and the number of ORFs on the \( Y \) axis.
#### Basic data: number of ORFs by window
## Sliding windows of size 100, shifted by 1/4 of the window size (25)
set.seed(101)
df <- data.frame(
    start=seq(1, 9901, by=25),
    end=seq(100, 10000, by=25),
    nORFs=round(runif(397, max = 200))
)
head(df)
##   start end nORFs
## 1     1 100    74
## 2    26 125     9
## 3    51 150   142
## 4    76 175   132
## 5   101 200    50
## 6   126 225    60
#### Summarize information for making the plot
## Start by saving the positions of interest
data <- data.frame(
    ## Could use something simpler (df$start + 50) since we know the actual window size
    pos=df$start + round((df$end - df$start)/2)
)
## For each position, get the nORFs of every window that overlaps it
subsets <- lapply(data$pos, function(x) {
    subset(df, start <= x & end >= x)$nORFs
})
## Complete data.frame of interest
data <- cbind(data, data.frame(
    mean=sapply(subsets, mean),
    min=sapply(subsets, min),
    max=sapply(subsets, max)
))
head(data)
##   pos  mean min max
## 1  51 75.00   9 142
## 2  76 89.25   9 142
## 3 101 83.25   9 142
## 4 126 96.00  50 142
## 5 151 89.75  50 132
## 6 176 73.50  50 117
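The summarization steps above can also be wrapped in a single function that records how many windows overlap each position. This is only a sketch: `summarize_overlaps()` and the column `n` are hypothetical names introduced here for illustration, and the data is the same simulated set as above.

```r
## Same simulated windows as above
set.seed(101)
df <- data.frame(
    start = seq(1, 9901, by = 25),
    end = seq(100, 10000, by = 25),
    nORFs = round(runif(397, max = 200))
)

## Hypothetical helper (not part of the original code): summarize the
## windows that overlap the mid position of each window
summarize_overlaps <- function(df) {
    pos <- df$start + round((df$end - df$start) / 2)
    subsets <- lapply(pos, function(x) {
        df$nORFs[df$start <= x & df$end >= x]
    })
    data.frame(
        pos = pos,
        n = lengths(subsets),  # number of overlapping windows
        mean = vapply(subsets, mean, numeric(1)),
        min = vapply(subsets, min, numeric(1)),
        max = vapply(subsets, max, numeric(1))
    )
}

overlaps <- summarize_overlaps(df)
## How many windows contribute to each position
table(overlaps$n)
```

With a step of 25 and a window size of 100, most positions are covered by four windows. Positions near either end of the region are covered by fewer: three at the first position and two at the last.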
#### Make the plot
library(ggplot2)
ggplot(data, aes(x=pos, y=mean)) +
    geom_ribbon(aes(ymin=min, ymax=max), alpha=0.2) +
    geom_line()
The plot doesn't look great because I generated random data. The idea is to show, for the mid position of each window, the mean number of ORFs along with some information (here the minimum and maximum) from the other windows that overlap that position. Depending on how many windows actually overlap, you might be interested in using the mean ± the standard deviation instead.
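As a rough sketch of that last suggestion, the ribbon can be drawn at the mean ± one standard deviation instead of the min/max range. The name `data_sd` and the axis labels are assumptions made for this illustration. The plot is only drawn if ggplot2 is installed.

```r
## Same simulated windows as above
set.seed(101)
df <- data.frame(
    start = seq(1, 9901, by = 25),
    end = seq(100, 10000, by = 25),
    nORFs = round(runif(397, max = 200))
)

## Mid position of each window and the nORFs of the windows overlapping it
pos <- df$start + round((df$end - df$start) / 2)
subsets <- lapply(pos, function(x) {
    df$nORFs[df$start <= x & df$end >= x]
})

## Mean and standard deviation per position; sd is NA whenever only one
## window overlaps a position (that does not happen with this design)
data_sd <- data.frame(
    pos = pos,
    mean = vapply(subsets, mean, numeric(1)),
    sd = vapply(subsets, stats::sd, numeric(1))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(data_sd, aes(x = pos, y = mean)) +
        geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd), alpha = 0.2) +
        geom_line() +
        labs(x = "Genome position", y = "Number of ORFs")
    print(p)
}
```

Note that, unlike the min/max ribbon, the lower edge of this ribbon can drop below zero when the standard deviation exceeds the mean, so you may want to truncate it for count data.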