Created by: Awni Mousa
July 29th, 2015
The following R function will apply a sliding window of moving average method given a window size and steps.
As input, a two column data frame, df, of a response and terms, is used, where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. Example, response = log2FC, predictor = geneLength. Optional arguments can be changed to control the window size and the step value.
The winSize, an integer > 1, controls the desired window size; default 200.
The step controls the distance between windows and is an integer <= winSize; default 40.
The output is a LIST of 2 elements:
These values can be assigned to a variable to be used in the environment by calling the first element in the list[[1]] to get the data frame with the values, and by calling the second element in the list[[2]] to initiate the plot (see example below).
movingAverage.AM <- function( df, winSize=200, step=40){ colnames(df) <- c("response","predictor") meanValue <- NULL l <- seq(1,length(df\( response),step) for(i in l){ if(i+winSize-1 > length(df \)response)){ i1 = length(df$response) } else i1 = i+winSize-1 # mean respAvg <- mean(df$response[i:i1]) # 95% CI ucl <- respAvg + 1.96 * sd(df\( response[i:i1]/sqrt(length(df \)response[i:i1]))) lcl <- respAvg - 1.96 * sd(df\( response[i:i1]/sqrt(length(df \)response[i:i1]))) # average gene length in bin predMean <- mean(df$predictor[i:i1]) meanValue.i <- cbind(predMean,respAvg,ucl,lcl) meanValue <- rbind(meanValue, meanValue.i) } meanValue <- as.data.frame(meanValue) # make plot library(ggplot2) p = ggplot(df, aes(y=response, x=predictor)) P <- p + scale_x_log10(breaks=10^(seq(-1,10)),labels=10^(seq(-1,10))) + geom_point(color="grey64", alpha=1/2) + geom_abline(intercept = 0, slope = 0, lwd = 1, col = "#009E73", alpha=1/2) + annotation_logticks(sides="b") + geom_smooth(aes(x=predMean, y=respAvg, ymin=lcl, ymax=ucl), data=meanValue, stat="identity", color="red", fill="red") + geom_rug(position="jitter", size=0.05, color="magenta", alpha=1/8, sides="tr") + xlab("predictor") + ylab("response") + ggtitle(bquote( atop("Moving Average", paste("Window Size ",.(winSize), " Step ", .(step) )))) return(list(MovingAverage=meanValue, plot=P)) }
df <- read.table("df.txt",header = T, sep = "\t", quote = "")
df is a data frame with two columns:
head(df)
## log2FC gene_length_kb ## 4 0.13645050 0.019 ## 10 -0.04667352 0.019 ## 13 -0.07517732 0.019 ## 15 -0.14915979 0.019 ## 20 0.29137589 0.020 ## 23 -0.09182034 0.020
it has 21334 rows:
dim(df)
## [1] 21334 2
To run the function and assign the results to a variable:
result <- movingAverage.AM(df = df , winSize = 200 , step = 40)
To extract the output table:
# result\( MovingAverage head(result \)MovingAverage)
## predMean respAvg ucl lcl ## 1 0.028355 0.025392100 0.0602557922 -0.009471592 ## 2 0.031020 0.024814286 0.0603577014 -0.010729130 ## 3 0.033785 0.002299482 0.0374499663 -0.032851003 ## 4 0.037580 -0.017463724 0.0159893048 -0.050916754 ## 5 0.042830 -0.029813453 -0.0008006119 -0.058826294 ## 6 0.049185 -0.024630348 0.0034561658 -0.052716862
To plot the output type in:
result$plot
GitHub: awnimo