Moving Average Plot

Moving Average Plot R Code

Created by: Awni Mousa

July 29th, 2015

The following R function will apply a sliding window of moving average method given a window size and steps.

As input, a two column data frame, df, of a response and terms, is used, where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. Example, response = log2FC, predictor = geneLength. Optional arguments can be changed to control the window size and the step value.

The winSize, an integer > 1, controls the desired window size; default 200.

The step controls the distance between windows and is an integer <= winSize; default 40.

The output is a LIST of 2 elements:

These values can be assigned to a variable to be used in the environment by calling the first element in the list[[1]] to get the data frame with the values, and by calling the second element in the list[[2]] to initiate the plot (see example below).

R code

movingAverage.AM <- function( df, winSize=200, step=40){
  colnames(df) <- c("response","predictor")
  meanValue <- NULL
  l <- seq(1,length(df\( response),step)
  for(i in l){
    if(i+winSize-1 > length(df \)response)){
      i1 = length(df$response) }
    else i1 = i+winSize-1
    # mean
    respAvg <- mean(df$response[i:i1])
    # 95% CI
    ucl <- respAvg + 1.96 * sd(df\( response[i:i1]/sqrt(length(df \)response[i:i1])))
    lcl <- respAvg - 1.96 * sd(df\( response[i:i1]/sqrt(length(df \)response[i:i1])))
    # average gene length in bin
    predMean <- mean(df$predictor[i:i1])
    meanValue.i <- cbind(predMean,respAvg,ucl,lcl)
    meanValue <- rbind(meanValue, meanValue.i)
  }
  meanValue <- as.data.frame(meanValue)
  # make plot
  library(ggplot2)
  p = ggplot(df, aes(y=response, x=predictor))
  P <- p + scale_x_log10(breaks=10^(seq(-1,10)),labels=10^(seq(-1,10))) +
    geom_point(color="grey64", alpha=1/2) +
    geom_abline(intercept = 0, slope = 0, lwd = 1, col = "#009E73", alpha=1/2) +
    annotation_logticks(sides="b") +
    geom_smooth(aes(x=predMean, y=respAvg, ymin=lcl, ymax=ucl), data=meanValue,
                stat="identity", color="red", fill="red") +
    geom_rug(position="jitter", size=0.05, color="magenta", alpha=1/8,
             sides="tr") +
    xlab("predictor") + ylab("response") +
    ggtitle(bquote( atop("Moving Average", paste("Window Size ",.(winSize),
                                                 " Step ", .(step) ))))
  return(list(MovingAverage=meanValue, plot=P))
}

Usage and example plot

df <- read.table("df.txt",header = T, sep = "\t", quote = "")

df is a data frame with two columns:

head(df)
##         log2FC gene_length_kb
## 4   0.13645050          0.019
## 10 -0.04667352          0.019
## 13 -0.07517732          0.019
## 15 -0.14915979          0.019
## 20  0.29137589          0.020
## 23 -0.09182034          0.020

it has 21334 rows:

dim(df)
## [1] 21334     2

To run the function and assign the results to a variable:

result <- movingAverage.AM(df = df , winSize = 200 , step = 40)

To extract the output table:

# result\( MovingAverage
head(result \)MovingAverage)
##   predMean      respAvg           ucl          lcl
## 1 0.028355  0.025392100  0.0602557922 -0.009471592
## 2 0.031020  0.024814286  0.0603577014 -0.010729130
## 3 0.033785  0.002299482  0.0374499663 -0.032851003
## 4 0.037580 -0.017463724  0.0159893048 -0.050916754
## 5 0.042830 -0.029813453 -0.0008006119 -0.058826294
## 6 0.049185 -0.024630348  0.0034561658 -0.052716862

To plot the output type in:

result$plot

plot of chunk unnamed-chunk-7

GitHub: awnimo