rco - The R Code Optimizer

A brief web search is enough to see that R is slow compared with other popular programming languages. “The R interpreter is not fast and execution of large amounts of R code can be unacceptably slow” (Ihaka 2010). The main reason is that “R was purposely designed to make data analysis and statistics easier for you to do. It was not designed to make life easier for your computer” (Wickham 2014). Currently, the most widely used R interpreter is GNU-R. Several alternative implementations attempt to improve execution speed (MRAN, pqR, renjin, FastR, Riposte, and rho), but “switching interpreters is something to consider carefully” (Gillespie and Lovelace 2016). “Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it’s poorly written. Few R users have any formal training in programming or software development … This means that it’s relatively easy to make most R code much faster” (Wickham 2014). “It is important to pursue efficiency issues, and in particular, speed” (Ihaka 1998). “A good deal of work is going into making R more efficient. Much of this work consists of reimplementing interpreted R code” (Ihaka 2010).

As with other programming languages (for example, gcc for C), a tool that automatically analyzes and optimizes R code is crucial. Automatic code optimization strategies were first implemented for compiled languages, the best-known example being the GNU Compiler Collection (gcc; formerly called the GNU C Compiler), which implements more than 100 different code optimization techniques. Even though R is interpreted, many of these techniques can also be applied to it. To the best of our knowledge, the only existing tool that automatically optimizes R code is the {compiler} package, whose impact is evident from its inclusion in GNU-R since version 2.13.0. Although the {compiler} package can, in certain cases, improve the execution time of R code, its objective is to compile expressions into byte code. Since optimization is not its main goal, it leaves aside several well-known optimization strategies (Cooper and Torczon 2011). Moreover, because the output of the {compiler} package is byte code, it does not let users easily understand which modifications make their code more efficient.
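To make the last point concrete, here is a minimal sketch using `cmpfun()` from the {compiler} package, which byte-compiles an R function. The function `sum_squares` is a hypothetical example chosen for illustration. The compiled version returns the same result, but printing it shows the original source followed by a `<bytecode: ...>` marker rather than any rewritten R code.

```r
library(compiler)  # base package shipped with GNU-R since version 2.13.0

# Hypothetical example function: sums the squares of a numeric vector.
sum_squares <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}

# cmpfun() returns a byte-compiled closure with the same behavior.
sum_squares_bc <- cmpfun(sum_squares)

sum_squares(1:4)     # 30
sum_squares_bc(1:4)  # 30

# Printing the compiled function shows the original R source plus a
# "<bytecode: ...>" line; the compiled form itself is not readable R code.
print(sum_squares_bc)
```

`compiler::disassemble()` can list the generated instructions, but these are byte-code virtual machine opcodes, not R code a user could read and edit.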

To fill this gap for the R user community, I present The R Code Optimizer ({rco}) package. {rco} is a GNU-R package whose main goal is to provide functions that let users automatically apply different strategies to optimize their R code: common subexpression elimination, constant folding and propagation, dead code and dead store elimination, dead expression elimination, and loop-invariant code motion. With seven optimization techniques currently implemented, {rco} is a useful and easy-to-apply tool. Its functions take R code as input and return R code as output; thus, after applying {rco}, users obtain more efficient code and can also understand which modifications caused the efficiency gain.
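As an illustration only, the following sketch shows in plain R the kind of source-to-source rewrites the techniques listed above perform. It does not call {rco} and is not the package's actual output; the functions `original` and `optimized` are hypothetical, and the second one was optimized by hand. For numeric input, both return identical values.

```r
# Hypothetical "before" function containing typical inefficiencies.
original <- function(x) {
  n <- 10 * 6                          # constant folding: 10 * 6 becomes 60
  unused <- x * 2                      # dead store: value is never read
  x + 1                                # dead expression: result is discarded
  res <- numeric(length(x))
  for (i in seq_along(x)) {
    k <- sqrt(n) + 1                   # loop-invariant: does not depend on i
    res[i] <- x[i] * k + (x[i] * k)^2  # x[i] * k is computed twice
  }
  if (FALSE) {
    res <- NULL                        # dead code: can never run
  }
  res
}

# Hand-optimized "after" version producing the same results.
optimized <- function(x) {
  res <- numeric(length(x))
  k <- sqrt(60) + 1                    # n propagated as 60, then hoisted out of the loop
  for (i in seq_along(x)) {
    xk <- x[i] * k                     # common subexpression computed once
    res[i] <- xk + xk^2
  }
  res
}

x <- c(1.5, 2, 3)
identical(original(x), optimized(x))  # TRUE
```

Each comment in `original` names the technique that removes or rewrites that line, and because both versions are ordinary R code they can be compared side by side.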

Acknowledgements

This project was funded by the Google Summer of Code 2019 program. The author would like to thank Dr. Nicolás Wolovick and Dr. Yihui Xie for volunteering as mentors for this project.

References

Cooper, Keith, and Linda Torczon. 2011. Engineering a Compiler. Elsevier.

Gillespie, Colin, and Robin Lovelace. 2016. Efficient R Programming. O’Reilly Media, Incorporated.

Ihaka, Ross. 1998. “R: Past and Future History.” Computing Science and Statistics.

Ihaka, Ross. 2010. “R: Lessons Learned, Directions for the Future.” Joint Statistical Meetings.

Wickham, Hadley. 2014. Advanced R. Chapman & Hall/CRC.