Set Up

library(reticulate) # to use python in RStudio
library(tidyverse) # data wrangling and plotting with R

R code chunks are shown in light pink and Python code chunks in light blue.

Data Types

Python

float: Python uses 8 bytes (64 bits) to represent floating-point numbers. Unlike the integer type, the float type always uses a fixed number of bytes.
int: Python integers have arbitrary precision. Storage grows with the magnitude of the value instead of being fixed at 8, 16, or 32 bits, so there is no built-in maximum.
bool: True & False values (only capitalize the 1st letter, see below for how R differs)
str: strings
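
The two storage claims above can be checked from Python itself. This is a minimal sketch that assumes CPython (the standard interpreter); sys.getsizeof reports whole-object sizes including interpreter overhead, so only the growth matters, not the absolute numbers.

```python
import struct
import sys

# A C double, which CPython uses to store a float, is 8 bytes (64 bits).
float_bytes = struct.calcsize("d")
print(float_bytes)  # 8

# Integers are arbitrary precision: a value far beyond 64 bits stays exact.
big = 2 ** 100
print(big.bit_length())  # 101

# The object holding a larger integer takes more memory (CPython detail).
grows = sys.getsizeof(big) > sys.getsizeof(1)
print(grows)  # True
```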

R

numeric: 64-bit double conforming to the IEEE 754 standard
integer: 32-bit numbers (hold between -2147483647 and +2147483647; the remaining bit pattern, -2147483648, is reserved for NA). R does not have native support for 64-bit integers. However, the bit64 package provides support for them.
logical: TRUE & FALSE values (capitalize all letters, see above for how Python differs)
character: strings
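
The upper bound quoted for R's integer type is 2^31 - 1. As a small sketch, with Python used only as a calculator, the same arithmetic shows that Python's int simply carries on past that limit. In R, integer arithmetic that exceeds it yields NA with a warning instead.

```python
# Largest value a signed 32-bit integer can hold.
int32_max = 2 ** 31 - 1
print(int32_max)  # 2147483647

# Python integers keep growing past the 32-bit limit.
beyond = int32_max + 1
print(beyond)  # 2147483648
print(type(beyond).__name__)  # int
```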

Data Types in Practice

The following code chunks demonstrate some interesting differences.

Python automatically distinguishes between integers and floats
- 5 is recognized as an integer, while 5.0 is recognized as a float
- boolP = TRUE results in an error (use True instead)
In R, a number is stored as an integer only if you explicitly request it by appending “L”
- both 5 and 5.0 are recognized as numeric
- to get the integer data type, append “L” (for example, 5L)
- logicR <- True results in an error (use TRUE instead)

Python

a = 5
type(a)

## <class 'int'>

b = 5.0
type(b)

## <class 'float'>

boolP = TRUE # in Python, boolean type values should only have the first letter capitalized

## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'TRUE' is not defined
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>

boolP = True
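
As a complement to the type checks above, here is a short sketch of how the two numeric types relate in Python and how the TRUE mistake can be caught in code. The variable names bool_p and message are illustrative only.

```python
a = 5
b = 5.0

# The two values compare equal even though their types differ.
print(a == b)              # True
print(type(a) is type(b))  # False

# Explicit conversion between the two types.
print(float(a))  # 5.0
print(int(b))    # 5

# TRUE is treated as an ordinary, undefined variable name, hence NameError.
try:
    bool_p = TRUE
except NameError as err:
    message = str(err)
    print(message)  # the message mentions the undefined name TRUE
```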

R

a <- 5
class(a)

## [1] "numeric"

b <- 5.0
class(b)

## [1] "numeric"

# explicitly request the integer class
c <- 5L
class(c)

## [1] "integer"

logicR <- True # in R, logical values must be written in all capitals

## Error in eval(expr, envir, enclos): object 'True' not found

logicR <- TRUE

Assignment: “<-” vs “=”

In the above code chunks, “=” is used to assign values in Python, while “<-” is used in R.

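You can also use “=” in R. At the top level the two are interchangeable, but they behave differently inside a function call:

The main difference between “<-” and “=” is whether an object is created in the calling scope.
- “<-” always performs an assignment, so it creates an object in the user’s workspace even when it appears inside a function call
- “=” inside a function call only matches a value to an argument name, so no object is created in the workspace (see the demo in the following R code chunk)
“<-” is recommended in both the Google R Style Guide and Hadley Wickham’s style guide.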

# calculate the sum of a vector ranging from 1 to 10
sum(x = 1:10)

## [1] 55

# this results in an error: inside the call, x = 1:10 only named an argument, so no object x was created
x

## Error in eval(expr, envir, enclos): object 'x' not found

# same function, this time use "<-" instead of "=" to assign the vector
sum(x <- 1:10)

## [1] 55

# x now exists in the workspace because "<-" performed an assignment
x

##  [1]  1  2  3  4  5  6  7  8  9 10
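
For readers coming from R, Python has a loosely similar split. The sketch below is an analogy, not an exact equivalent: a keyword argument only labels a value for the call, while the walrus operator := (Python 3.8 or later) assigns inside an expression. The helper total is hypothetical and exists only for this illustration.

```python
def total(x):
    """Hypothetical helper that sums the values passed in as x."""
    return sum(x)

# A keyword argument only labels the value for this call;
# no variable named x is created in the calling scope.
result_keyword = total(x=range(1, 11))
print(result_keyword)  # 55

try:
    x
    x_exists_after_keyword = True
except NameError:
    x_exists_after_keyword = False
print(x_exists_after_keyword)  # False

# The walrus operator assigns x and passes the value in one step.
result_walrus = total(x := list(range(1, 11)))
print(result_walrus)  # 55
print(x)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```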

Operation on Data Types

The following code chunks demonstrate some interesting differences.

Adding two strings?
- R: error
- Python: strings concatenated
Multiply a string by n?
- R: error
- Python: repeat the string n times

Python code

x = "cute"
y = "bunny"
z = "hop"
x + y

## 'cutebunny'

z * 5

## 'hophophophophop'

R code

With the reticulate package, it is easy to use an object created in a Python code chunk from R code: simply refer to it as py$python_object_created.

# call object x created in the above Python code chunk
py$x

## [1] "cute"

# adding or multiplying strings in R results in an error
py$x + py$y

## Error in py$x + py$y: non-numeric argument to binary operator

py$z * 5

## Error in py$z * 5: non-numeric argument to binary operator

# to join two strings, use paste() (adds a space) or paste0() (no separator)
paste(py$x, py$y)

## [1] "cute bunny"

paste0(py$x, py$y)

## [1] "cutebunny"
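
For comparison, the paste() and paste0() results above can be reproduced in Python with str.join. This is a minimal sketch; the names with_space and no_space are illustrative only.

```python
x = "cute"
y = "bunny"

# Counterpart of paste(x, y): join with a single space.
with_space = " ".join([x, y])
print(with_space)  # cute bunny

# Counterpart of paste0(x, y): join with no separator.
no_space = "".join([x, y])
print(no_space)  # cutebunny

# An f-string gives the spaced result with inline formatting.
formatted = f"{x} {y}"
print(formatted)  # cute bunny
```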

R & Python Basics 1: Data Type

An R user’s Python learning note

Mena WANG

26/09/2021
