With only 2 steps, we are able to use Python in R!
Fire up an R Markdown document and load tidyverse and reticulate:
tidyverse – Loads the core data wrangling and visualization packages needed to work in R.
reticulate – The key link between R and Python.
Next, we need to make sure the Python environment we want to use is set up. To manage Python environments, we will use conda, the package and environment manager that ships with the Anaconda distribution (a Python distribution that bundles many data science packages).
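library(tidyverse)
library(reticulate)
# Activate the "py3.8" conda environment (created with the conda command below)
use_condaenv("py3.8", required = TRUE)
# Windows workaround: tell Qt where this environment's platform plugins live so
# matplotlib's Qt backend can find them; adjust the path to your own Anaconda install
py_run_string("import os")
py_run_string("os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/Users/2015524/Anaconda3/envs/py3.8/Library/plugins/platforms'")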
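The hard-coded plugin path above only works on one machine. A minimal sketch of a more portable version, to run in a {python} chunk before any plotting, derives the path from the active environment's `sys.prefix`. It assumes conda on Windows keeps the Qt platform plugins under `<env>/Library/plugins/platforms`, which is the layout of the path above; `qt_plugin_path()` is a hypothetical helper, and the variable is only set when that folder exists, so on macOS or Linux the code does nothing.

```python
import os
import sys


def qt_plugin_path(prefix=sys.prefix):
    """Hypothetical helper: where a Windows conda env keeps its Qt platform plugins.

    Assumes the <env>/Library/plugins/platforms layout seen in the setup chunk.
    """
    return os.path.join(prefix, "Library", "plugins", "platforms")


candidate = qt_plugin_path()
# Only set the variable if the folder exists (i.e. a Windows conda env);
# everywhere else this is a no-op.
if os.path.isdir(candidate):
    os.environ["QT_QPA_PLATFORM_PLUGIN_PATH"] = candidate
```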
conda create -n py3.8 python=3.8 scikit-learn pandas numpy matplotlib
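Run this command in a terminal (on Windows, the Anaconda Prompt). It creates a new conda environment named py3.8 with Python 3.8 and installs scikit-learn, pandas, numpy, and matplotlib into it.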
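Once the environment exists, a quick way to confirm that the four packages actually installed is to ask Python whether it can locate them. A minimal sketch, to run in a {python} chunk once the environment is active; `missing_modules()` is a hypothetical helper, and note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Import names for the packages installed by `conda create` above
# (scikit-learn is imported as sklearn).
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib"]


def missing_modules(names):
    """Hypothetical helper: return the top-level module names Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("All packages found" if not missing else "Missing: " + ", ".join(missing))
```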
Back in R Markdown, we can confirm that the environment exists by listing all conda environments with reticulate::conda_list().
conda_list()
## name python
## 1 Anaconda3 C:\\Users\\2015524\\Anaconda3\\python.exe
## 2 py3.8 C:\\Users\\2015524\\Anaconda3\\envs\\py3.8\\python.exe
## 3 venv C:\\Users\\2015524\\Anaconda3\\envs\\venv\\python.exe
Make sure your R Markdown document activates the “py3.8” environment using use_condaenv().
use_condaenv("py3.8", required = TRUE)
Double-check that reticulate is actually using your new conda environment by running py_config().
py_config()
## python: C:/Users/2015524/Anaconda3/envs/py3.8/python.exe
## libpython: C:/Users/2015524/Anaconda3/envs/py3.8/python38.dll
## pythonhome: C:/Users/2015524/Anaconda3/envs/py3.8
## version: 3.8.2 (default, Apr 14 2020, 19:01:40) [MSC v.1916 64 bit (AMD64)]
## Architecture: 64bit
## numpy: C:/Users/2015524/Anaconda3/envs/py3.8/Lib/site-packages/numpy
## numpy_version: 1.18.1
##
## NOTE: Python version was forced by use_python function
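py_config() reports what reticulate sees from the R side. As a cross-check from inside a {python} chunk, the sketch below reports which interpreter is running and which numpy version it can load; with the setup above, the executable path should point into envs/py3.8. It uses only the standard library (`importlib.metadata` needs Python 3.8 or later, which this environment satisfies); `python_report()` is a hypothetical helper.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def python_report(package="numpy"):
    """Hypothetical helper: summarize the running interpreter and one package."""
    try:
        pkg_version = version(package)
    except PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "executable": sys.executable,
        "python_version": platform.python_version(),
        package: pkg_version,
    }


for key, value in python_report().items():
    print(f"{key}: {value}")
```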
All of the code in this section uses Python code chunks, which means each chunk must start with {python} instead of {r}.
Errors in this section: These are most likely caused by a code chunk that still starts with {r} (it’s super easy to make this mistake).
Solution: Replace {r} with {python}.
Is Python working?
1+1
## 2
import numpy as np
import pandas as pd
np.arange(1, 10)
## array([1, 2, 3, 4, 5, 6, 7, 8, 9])
df = pd.DataFrame(data = {"sequence":np.arange(1,20,.01)})
df = df.assign(value=np.sin(df["sequence"]))
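`np.arange(1, 20, .01)` works here, but NumPy's documentation recommends `numpy.linspace` when the step is not an integer, because floating-point rounding can make the number of points hard to predict. A minimal sketch of the same grid with a fixed point count, assuming the intent is 1,900 points from 1.00 to 19.99 in steps of 0.01:

```python
import numpy as np
import pandas as pd

# 1,900 evenly spaced points from 1.00 to 19.99 (inclusive), i.e. a 0.01 step.
# Fixing the count means rounding cannot add or drop a point at the end.
sequence = np.linspace(1, 19.99, num=1900)

df = pd.DataFrame(data={"sequence": sequence})
df = df.assign(value=np.sin(df["sequence"]))
print(df.shape)
```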
Run the following pandas plotting code. If the visualization appears, matplotlib is installed.
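import matplotlib.pyplot as plt  # pyplot is the interface conventionally aliased as plt
df.plot(x="sequence", y="value", title="Matplotlib")
plt.show()  # render the figure in the knitted document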
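If the plot fails with a Qt platform plugin error (the problem the QT_QPA_PLATFORM_PLUGIN_PATH line in the setup chunk works around), you can still check that matplotlib itself works by rendering to a file with the non-interactive Agg backend, which does not use Qt. A minimal sketch; the file name matplotlib_check.png is arbitrary, and switching to Agg affects later plots in the same Python session.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # file-only backend: no display or Qt needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"sequence": np.arange(1, 20, .01)})
df = df.assign(value=np.sin(df["sequence"]))

# pandas returns the matplotlib Axes it drew on
ax = df.plot(x="sequence", y="value", title="Matplotlib")

out_path = os.path.join(tempfile.gettempdir(), "matplotlib_check.png")
ax.figure.savefig(out_path)
plt.close(ax.figure)
print(f"Saved {os.path.getsize(out_path)} bytes to {out_path}")
```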
Run a test random forest using RandomForestClassifier from the sklearn.ensemble module of scikit-learn.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
X = [[ 1,  2,  3],  # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1] # classes of each sample
clf.fit(X, y)
## RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
## criterion='gini', max_depth=None, max_features='auto',
## max_leaf_nodes=None, max_samples=None,
## min_impurity_decrease=0.0, min_impurity_split=None,
## min_samples_leaf=1, min_samples_split=2,
## min_weight_fraction_leaf=0.0, n_estimators=100,
## n_jobs=None, oob_score=False, random_state=0, verbose=0,
## warm_start=False)
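`fit()` only returns the configured estimator, so the printout above confirms that the model trained, not that it predicts sensibly. A short follow-up, mirroring scikit-learn's getting-started example, predicts the two training samples and two new samples that each sit near one training row:

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)
X = [[ 1,  2,  3],  # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]  # classes of each sample
clf.fit(X, y)

print(clf.predict(X))                          # classes of the training data
print(clf.predict([[4, 5, 6], [14, 15, 16]]))  # classes of new data
```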
Another simple test is to run the AffinityPropagation clustering demo from the scikit-learn website.
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)
# Compute Affinity Propagation
af = AffinityPropagation(preference=-50).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_
n_clusters_ = len(cluster_centers_indices)
# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle
plt.close('all')
plt.figure(1)
plt.clf()
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    class_members = labels == k
    cluster_center = X[cluster_centers_indices[k]]
    plt.plot(X[class_members, 0], X[class_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
    for x in X[class_members]:
        plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
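The plot gives a visual check. scikit-learn's own version of this demo also reports clustering metrics that put a number on how well the recovered clusters match the three blob centers. A condensed sketch that reruns the clustering without the plot and prints two of them, the adjusted Rand index against the true labels and the silhouette coefficient:

```python
from sklearn import metrics
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Same sample data as above: 300 points around three centers
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

af = AffinityPropagation(preference=-50).fit(X)
labels = af.labels_
n_clusters_ = len(af.cluster_centers_indices_)

# Adjusted Rand index: 1.0 means perfect agreement with the true labels, ~0 means chance
ari = metrics.adjusted_rand_score(labels_true, labels)
# Silhouette coefficient: from -1 to 1; higher means tighter, better-separated clusters
silhouette = metrics.silhouette_score(X, labels)

print("Estimated number of clusters: %d" % n_clusters_)
print("Adjusted Rand Index: %0.3f" % ari)
print("Silhouette Coefficient: %0.3f" % silhouette)
```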