There is an ongoing debate in the data science world about the merits of R vs. Python. From my perspective, both languages have their own strengths, and users should choose based on their specific use case. However, it can be difficult to use both languages in one data science workflow because each has its own tools and data structures. The recently released reticulate package for R helps bridge the gap between the two languages and lets users integrate them into the same workflow.

Reticulate offers the ability to run Python code directly from R. It is a powerful package that translates data between R and Python, allowing almost seamless integration between the two languages. While this is not the first package of its kind, its ease of use and available features make it very useful.

In this post, we will help you get started with reticulate and demonstrate a few useful features of the package.

Getting Started

To get started, install the reticulate package and point it at the correct version of Python with the use_python() function. While not strictly required, explicitly choosing a Python instance is a best practice. Once Python has been initialized for a session, the instance cannot be changed, so if you run into issues setting the path to the Python instance, restart R and try again. The py_config() command shows which version of Python R is connected to.

# Install and load reticulate package
#install.packages("reticulate")
library("reticulate")

# Set the path to the Python executable file (adjust this path for your machine)
use_python("/Users/matthewlbrown/anaconda3/bin/python", required = TRUE)

# Check the version of Python.
py_config()
## python:         /Users/matthewlbrown/anaconda3/bin/python
## libpython:      /Users/matthewlbrown/anaconda3/lib/libpython3.6m.dylib
## pythonhome:     /Users/matthewlbrown/anaconda3:/Users/matthewlbrown/anaconda3
## version:        3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/matthewlbrown/anaconda3/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.0
## 
## NOTE: Python version was forced by use_python function
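Once R is connected, you can also run a quick check from the Python side, for example in a Python chunk or from a script run with py_run_file(). The minimal sketch below reports the interpreter path and version and whether numpy and pandas (which the later examples rely on) can be imported, so you can compare it with the py_config() output. It uses only the standard library; the interpreter_report() name is hypothetical, and in embedded setups sys.executable may not always match the path given to use_python().

```python
import importlib.util
import sys


def interpreter_report(packages=("numpy", "pandas")):
    """Summarize the running interpreter and whether each package can be imported."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # find_spec() returns None when a top-level package is not installed
        "available": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


report = interpreter_report()
print(report["executable"])
print(report["version"])
for name, ok in report["available"].items():
    print(f"{name}: {'found' if ok else 'missing'}")
```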

Calling Python functions from R

The first feature of reticulate we will discuss is calling Python functions directly from R with the source_python() function. source_python() evaluates the specified script and makes all public (non-module) objects in the main Python module available in the R environment. Any public function or object defined in the script can then be used from R, and you can pass arguments from R into the Python functions.
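To make the "public (non-module)" rule concrete, here is a pure-Python approximation of that filter, assuming "public" means a name without a leading underscore. It is an illustration, not reticulate's own code; the namespace contents and the public_objects() helper are hypothetical. One practical consequence: an `import numpy as np` at the top of a sourced script does not create an `np` object in R.

```python
import math
import types

# Hypothetical contents of a sourced script's main-module namespace
namespace = {
    "math": math,                    # a module: skipped
    "_helper_offset": 5,             # leading underscore: treated as private
    "my_function": lambda x: x + 5,  # public function: exported
    "threshold": 10,                 # public value: exported
}


def public_objects(ns):
    """Keep names without a leading underscore whose values are not modules."""
    return {
        name: value
        for name, value in ns.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }


print(sorted(public_objects(namespace)))  # ['my_function', 'threshold']
```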

To run this code, you must:

  1. Create a separate script called “test_func.py” with the code below
  2. Save that script in the same directory as your R script
  3. Set the working directory to the source file location (in RStudio: Session > Set Working Directory > To Source File Location)
  4. Run the code below

“test_func.py”

def my_function(x):
    return x + 5

Here we will demonstrate this functionality.

# Evaluate the chosen script
source_python("test_func.py")

# Create parameter to pass to Python function
x = 10

# Call my_function()
y = my_function(x)
print(y)
## [1] 15

We were able to run a Python function directly from R. This is my favorite feature of the package because we can essentially access any Python functionality using this method.
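One detail worth knowing when passing arguments: an R literal such as 10 is a double, so reticulate hands it to Python as a float, while 10L is an R integer and arrives as an int. That is why my_function(10) computes 15.0 in Python yet prints as 15 in R. The hypothetical describe_argument() function below could be added to test_func.py so you can compare describe_argument(10) with describe_argument(10L) from R; run on its own in Python, it simply shows the two types involved.

```python
def my_function(x):
    return x + 5


def describe_argument(x):
    """Return the Python type name of x alongside my_function(x)."""
    return type(x).__name__, my_function(x)


# Simulate what R's 10 (a double) and 10L (an integer) look like on arrival
print(describe_argument(10.0))  # ('float', 15.0)
print(describe_argument(10))    # ('int', 15)
```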

Calling a Python script from R

Another nice feature of the reticulate package is the ability to run Python scripts directly from R. To run this example, follow the steps above, this time saving the code below as “test_script.py”.

“test_script.py”

import pandas
my_var = "Hello from the other side"
Dict = {'x1': [1,2,3], 'x2': [4,5,6], 'x3': [7,8,9] }
my_df = pandas.DataFrame(Dict)

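Optionally, confirm that pandas can be loaded with the import() function. The script imports pandas itself, so this call mainly checks that the module is available to the Python instance reticulate is using.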

import("pandas")
## Module(pandas)

Run the Python script with py_run_file().

py_run_file("test_script.py")

Print the objects defined in test_script.py. They are reached through reticulate's py object, which gives R access to Python's main module.

# Print objects from the Python script
print(py$my_var)
## [1] "Hello from the other side"
print(py$my_df)
##   x1 x2 x3
## 1  1  4  7
## 2  2  5  8
## 3  3  6  9
print(py$my_df$x1)
## [1] 1 2 3

Objects created by the Python script are accessed with the py$ syntax used in the print statements above: py$my_df arrives in R as a data frame, and py$my_df$x1 is an ordinary R vector. While this is a trivial example, it demonstrates another powerful feature of this package.
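Because py_run_file() executes the file as ordinary Python, it can help to test a script on its own before calling it from R. As a rough plain-Python analogue (not what reticulate does internally), the standard library's runpy.run_path() executes a file and returns its top-level names, which correspond to the names you would then reach through py$. The sketch writes a pandas-free stand-in for test_script.py to a temporary file; SCRIPT_SOURCE and run_script_namespace() are hypothetical names.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for test_script.py (pandas omitted to keep it stdlib-only)
SCRIPT_SOURCE = '''
my_var = "Hello from the other side"
Dict = {'x1': [1, 2, 3], 'x2': [4, 5, 6], 'x3': [7, 8, 9]}


def column_total(name):
    total = sum(Dict[name])  # 'total' is local to the function, not a top-level name
    return total
'''


def run_script_namespace(source):
    """Write source to a temporary .py file, run it, and return its top-level names."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        return runpy.run_path(path)
    finally:
        os.remove(path)


namespace = run_script_namespace(SCRIPT_SOURCE)
print(namespace["my_var"])      # Hello from the other side
print(namespace["Dict"]["x1"])  # [1, 2, 3]
print("total" in namespace)     # False
```

Only names bound at the top level of the script (my_var, Dict, column_total) show up; variables local to a function do not.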

Running Python code directly from R Markdown

Next, we will demonstrate running Python code directly from R Markdown and passing the values from a Python chunk to an R chunk.

Here we define Python objects in the Python chunk.

{Python}

import pandas as pd
Dict = {'x1': [1,2,3], 'x2': [4,5,6], 'x3': [7,8,9] }
my_df2 = pd.DataFrame(Dict)
print(my_df2)
##    x1  x2  x3
## 0   1   4   7
## 1   2   5   8
## 2   3   6   9

Print the previously defined Python objects in the R chunk.

{R}

print(py$my_df2)
##   x1 x2 x3
## 1  1  4  7
## 2  2  5  8
## 3  3  6  9

Notice that the py$ syntax is the same one we used in the previous section, and that while the pandas index starts at 0, the converted R data frame is numbered from 1. At the moment, you cannot interactively pass data between chunks in R Markdown; you must knit the document for it to work properly. There are workarounds to this problem, but they are out of the scope of this post.

Calling R data from Python

We can also do the opposite and pass R objects to Python chunks.

Define R objects in the R chunk.

{R}

df = data.frame(x1 = 1:10, x2 = 11:20)

Print the R object from the Python chunk.

{Python}

print(r.df)
##    x1  x2
## 0   1  11
## 1   2  12
## 2   3  13
## 3   4  14
## 4   5  15
## 5   6  16
## 6   7  17
## 7   8  18
## 8   9  19
## 9  10  20

R objects are accessed from Python chunks using the syntax r.[R object]. As the output shows, the R data frame arrives in Python as a pandas data frame with a 0-based index.

If you have any comments / questions, please leave them below and we will get back to you.