This script will show some elements of working in Python from RStudio.
Version 2 (updated). Version 1: Jan 19, 2021.
Here is your main project in R:
data <- mtcars
str(data)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
In order to start using Python syntax in RStudio, we must dance around a bit.
Install and load the reticulate package:
#install.packages("reticulate")
library(reticulate)
To run a whole Python script, call source_python() from an {r} chunk:
reticulate::source_python("script.py")
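source_python() runs the file and assigns its top-level functions and variables into the R session. The contents of script.py are not shown in this script, so here is a hypothetical one (GREETING and greet are invented names, just for illustration):

```python
# script.py: hypothetical contents, invented for illustration.
# After reticulate::source_python("script.py"), top-level names such as
# GREETING and greet() should be usable from R like ordinary R objects.

GREETING = "Hello"


def greet(name):
    """Return a greeting string for the given name."""
    return f"{GREETING}, {name}!"
```

From R, greet("RStudio") should then return "Hello, RStudio!".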
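To run single lines of Python code, use the py_ functions, such as py_run_string(), from an {r} chunk:
#library(reticulate) # already loaded above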
py_run_string("import numpy as np")
py_run_string("my_python_array = np.array([1,4,6,8])")
py_run_string("print(my_python_array)")
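By default py_run_string() runs each string in Python's main module, so names created in one call (np, my_python_array) are still there in the next call and in later {python} chunks. A rough plain-Python analogue, not reticulate itself, is exec() with one shared namespace; a list stands in for the NumPy array so the sketch needs no extra packages:

```python
# Plain-Python analogue (not reticulate): run code strings one at a time in a
# single shared namespace, similar to how py_run_string() reuses __main__.
def run_strings(snippets):
    namespace = {}  # stands in for the main module's globals
    for code in snippets:
        exec(code, namespace)  # each snippet sees names defined by earlier ones
    return namespace


ns = run_strings([
    "import math",
    "my_python_list = [1, 4, 6, 8]",
    "total = math.fsum(my_python_list)",
])
print(ns["total"])  # 19.0
```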
To install Python packages, call py_install() from an {r} chunk:
py_install("pandas")
py_install("seaborn")
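py_install() installs into the Python environment that reticulate is using. To check from a {python} chunk that the packages are visible to that interpreter, a small helper (check_packages is a hypothetical name) can ask importlib whether each module can be found, without actually importing it:

```python
import importlib.util


def check_packages(names):
    """Map each package name to True if the current Python can find it."""
    # find_spec() returns None when a top-level module cannot be found
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(check_packages(["pandas", "seaborn"]))
```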
To write whole blocks of Python code as part of the script, use a {python} chunk:
import numpy as np
my_python_array = np.array([3,4,6,8])
my_python_array
## array([3, 4, 6, 8])
import pandas as pd
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None)
df.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'salary']
df.head()
## age workclass fnlwgt ... hours-per-week native-country salary
## 0 39 State-gov 77516 ... 40 United-States <=50K
## 1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K
## 2 38 Private 215646 ... 40 United-States <=50K
## 3 53 Private 234721 ... 40 United-States <=50K
## 4 28 Private 338409 ... 40 Cuba <=50K
##
## [5 rows x 15 columns]
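In adult.data the fields are separated by a comma followed by a space, so with the call above the text columns keep a leading space (for example ' <=50K'), which trips up comparisons like df['salary'] == '<=50K'. A sketch of the fix, using two made-up rows in the same layout instead of the download, passes names= and skipinitialspace=True to read_csv:

```python
import io

import pandas as pd

COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'salary']

# Two made-up rows in the adult.data layout (comma followed by a space).
SAMPLE = (
    "30, Private, 100000, HS-grad, 9, Never-married, Sales, Not-in-family, "
    "White, Male, 0, 0, 40, United-States, <=50K\n"
    "45, Private, 200000, Masters, 14, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, 50, United-States, >50K\n"
)

# names= sets the column labels directly; header=None says the file has none
raw = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
clean = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS,
                    skipinitialspace=True)  # drop the space after each comma

print(repr(raw.loc[0, 'salary']))    # ' <=50K' (leading space kept)
print(repr(clean.loc[0, 'salary']))  # '<=50K'
```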
NB: don’t be scared if neither Python objects nor code lines appear in the Environment or console.
import seaborn as sns
sns.countplot(x="salary", data=df)
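countplot() draws one bar per category, with the number of rows in that category as its height. The same numbers can be printed with value_counts(); a sketch on a tiny hypothetical salary column (with the real data you would use df['salary']):

```python
import pandas as pd

# Hypothetical stand-in for df['salary'].
salary = pd.Series(['<=50K', '>50K', '<=50K', '<=50K'], name='salary')

counts = salary.value_counts()  # sorted from most to least frequent
print(counts)
```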
To use a Python object in R, access it with py$ from an {r} chunk:
my_r_array <- py$my_python_array
class(my_r_array) # an R array converted from NumPy: it still has a dim attribute
## [1] "array"
my_r_vector <- as.vector(py$my_python_array)
class(my_r_vector) # as.vector() drops the dim attribute: a plain numeric vector
## [1] "numeric"
## [1] "numeric"
my_r_vector <- my_r_vector * 2
my_r_vector
## [1] 6 8 12 16
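For comparison, the same element-by-element doubling can be done on the NumPy array directly in a {python} chunk; NumPy keeps the integer dtype here, while R stores its result as doubles (class "numeric" above):

```python
import numpy as np

my_python_array = np.array([3, 4, 6, 8])
doubled = my_python_array * 2  # elementwise, like my_r_vector * 2 in R
print(doubled)  # [ 6  8 12 16]
```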
To use an R object in Python, access it with r. (in {python} chunks):
my_python_array2 = r.my_r_vector # an R vector arrives as a Python list
print(my_python_array2)
## [6.0, 8.0, 12.0, 16.0]
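The printed [6.0, 8.0, 12.0, 16.0] shows that the R vector reached Python as a plain list of floats, not a NumPy array, so NumPy methods need an explicit conversion. A sketch, with the list written out literally in place of r.my_r_vector so it runs outside RStudio:

```python
import numpy as np

# Literal stand-in for r.my_r_vector, which arrives as a list.
my_python_array2 = [6.0, 8.0, 12.0, 16.0]

arr = np.asarray(my_python_array2)  # list -> float64 NumPy array
print(arr.mean())  # 10.5
```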
The same r. prefix reaches any R object from a {python} chunk, including data frames, which arrive as pandas DataFrames. For example:
cars = r.mtcars # our mtcars
cars.head()
## mpg cyl disp hp drat ... qsec vs am gear carb
## Mazda RX4 21.0 6.0 160.0 110.0 3.90 ... 16.46 0.0 1.0 4.0 4.0
## Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 ... 17.02 0.0 1.0 4.0 4.0
## Datsun 710 22.8 4.0 108.0 93.0 3.85 ... 18.61 1.0 1.0 4.0 1.0
## Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 ... 19.44 1.0 0.0 3.0 1.0
## Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 ... 17.02 0.0 0.0 3.0 2.0
##
## [5 rows x 11 columns]
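As the output shows, the R row names become the DataFrame index, so ordinary pandas methods apply. A sketch that rebuilds the five rows printed above (mpg and cyl only) and averages mpg by number of cylinders:

```python
import pandas as pd

# The first five mtcars rows shown above, mpg and cyl columns only.
cars = pd.DataFrame(
    {"mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
     "cyl": [6.0, 6.0, 4.0, 6.0, 8.0]},
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
           "Hornet 4 Drive", "Hornet Sportabout"],
)

print(cars.loc["Datsun 710", "mpg"])      # 22.8 (look up a row by car name)
print(cars.groupby("cyl")["mpg"].mean())  # mean mpg per cylinder count
```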
Another example (start in an {r} chunk):
x <- 1:10 # create a vector in R
Then switch to a {python} chunk:
r.x # access the created vector with Python
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
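Two things differ between the languages here: R's 1:10 includes both endpoints while Python's range() stops before its end value, and R counts positions from 1 while Python counts from 0. A short sketch of the Python counterpart of x:

```python
# Python counterpart of R's x <- 1:10: range() excludes its stop value.
x = list(range(1, 11))
print(x)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# R indexes from 1, Python from 0: R's x[1] corresponds to Python's x[0].
print(x[0])  # 1
```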
Source: https://www.infoworld.com/article/3340120/how-to-run-python-in-r.html