Set Up

library(reticulate) # to use python in RStudio
library(tidyverse) # data wrangling and plotting with R

R code chunks are shown in light pink, and Python code chunks in light blue.

Data Types

Python

  • float: Python uses 8 bytes (64 bits) to represent floating-point numbers. Unlike the integer type, the float type has a fixed size (more here)
  • int: Python integers have arbitrary precision: the number of bits used to store an integer grows with its magnitude instead of being fixed at 8, 16, or 32 bits (more here)
  • bool: True & False values (only capitalize the 1st letter, see below for how R differs)
  • str: strings
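
The size claims above can be checked from Python itself. The following is a minimal sketch, assuming CPython 3; the variable names are made up for this illustration.

```python
import struct
import sys

# A float is always a 64-bit IEEE 754 double, whatever its value.
float_bytes = struct.calcsize("d")       # size of a C double in bytes
mantissa_bits = sys.float_info.mant_dig  # significant bits in a double

# An int has no fixed width: the bit count grows with the magnitude.
small = 5
big = 2 ** 100
small_bits = small.bit_length()
big_bits = big.bit_length()

print(float_bytes, mantissa_bits)  # 8 53
print(small_bits, big_bits)        # 3 101
print(big + 1)                     # no overflow: 1267650600228229401496703205377
```

Because integers grow as needed, arithmetic on very large values does not overflow the way a fixed 32-bit integer would.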

R

  • numeric: 64-bit double conforming to the IEEE 754 standard
  • integer: 32-bit numbers (holding values between -2147483648 and +2147483647). R does not have native support for 64-bit integers; however, the bit64 package provides support for them (more here)
  • logical: TRUE & FALSE values (capitalize all letters, see above for how Python differs)
  • character: strings

Data Types in Practice

The following code chunks demonstrate some interesting differences:

  • Python automatically distinguishes between integers and floats
    • 5 is recognized as an integer, while 5.0 is recognized as a float
    • boolP = TRUE results in an error (use True)
  • In R, a number must be explicitly marked as an integer by appending “L”
    • both 5 and 5.0 are recognized as numeric
    • 5L is recognized as integer
    • logicR <- True results in an error (use TRUE)

Python

a = 5
type(a)
## <class 'int'>
b = 5.0
type(b)
## <class 'float'>
boolP = TRUE # in Python, boolean type values should only have the first letter capitalized
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'TRUE' is not defined
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
boolP = True

R

a <- 5
class(a)
## [1] "numeric"
b <- 5.0
class(b)
## [1] "numeric"
# explicitly request the integer class with the L suffix
c <- 5L
class(c)
## [1] "integer"
logicR <- True # in R, logical values should have all letters capitalized
## Error in eval(expr, envir, enclos): object 'True' not found
logicR <- TRUE
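
On the Python side, the int/float distinction can be explored a little further. This is a small sketch that uses only built-in functions; the variable names a_as_float and b_as_int are chosen for this illustration.

```python
# The literal's form decides the type: no suffix such as R's "L" is needed.
a = 5
b = 5.0
print(type(a).__name__, type(b).__name__)  # int float

# The two values still compare equal even though their types differ.
print(a == b)  # True

# Explicit conversion between the two types.
a_as_float = float(a)
b_as_int = int(b)
print(a_as_float, b_as_int)  # 5.0 5

# True division always yields a float; floor division of ints yields an int.
print(type(6 / 3).__name__, type(6 // 3).__name__)  # float int
```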

Assignment: “<-” vs “=”

In the above code chunks, “=” is used to assign values in Python, while “<-” is used in R.

Actually, you can also use “=” to assign values in R, but “=” and “<-” do not behave the same way everywhere:

  • At the top level, “=” and “<-” both create an object in the user’s workspace.
  • Inside a function call, “=” names an argument rather than assigning, so no object is created in the user’s workspace.
  • Inside a function call, “<-” still assigns: the object is created in the calling environment (here, the user’s workspace) and its value is then passed to the function (please see the demo in the following R code chunk).
  • “<-” is recommended in both the Google R Style Guide and Hadley Wickham’s style guide

# calculate the sum of a vector ranging from 1 to 10
sum(x = 1:10)
## [1] 55
# this results in an error because x was only an argument name; no object x was created
x
## Error in eval(expr, envir, enclos): object 'x' not found
# same function, this time use "<-" instead of "=" to assign the vector
sum(x <- 1:10)
## [1] 55
x
##  [1]  1  2  3  4  5  6  7  8  9 10
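
Python has a loosely comparable pair of behaviors, sketched below. This is an illustration rather than an exact equivalent of the R semantics: total is a hypothetical helper defined only for this demo, and the assignment expression (“:=”) requires Python 3.8 or later.

```python
def total(x):
    """Return the sum of the values in x (hypothetical helper for this demo)."""
    return sum(x)

# Keyword argument: x names the parameter and creates no variable here.
print(total(x=range(1, 11)))  # 55
try:
    x
    x_exists_after_keyword_call = True
except NameError:
    x_exists_after_keyword_call = False
print(x_exists_after_keyword_call)  # False

# Assignment expression: binds x here, then passes the value on.
print(total(x := list(range(1, 11))))  # 55
print(x)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

As with sum(x = 1:10) in R, the keyword form only labels an argument, while the assignment expression plays the role of sum(x <- 1:10).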

Operation on Data Types

The following code chunks demonstrate some interesting differences:

  • Adding two strings?
    • R: error
    • Python: strings concatenated
  • Multiply a string by n?
    • R: error
    • Python: repeat the string n times

Python code

x="cute"
y="bunny"
z="hop"
x + y
## 'cutebunny'
z * 5
## 'hophophophophop'

R code

With the reticulate package, it is easy to use an object created in a Python code chunk from R code: simply refer to it as py$python_object_created.

# call object x created in the above Python code chunk
py$x
## [1] "cute"
# adding or multiplying strings in R raises an error
py$x + py$y
## Error in py$x + py$y: non-numeric argument to binary operator
py$z * 5
## Error in py$z * 5: non-numeric argument to binary operator
# to concatenate two strings, use paste() or paste0()
paste(py$x,py$y)
## [1] "cute bunny"
paste0(py$x,py$y)
## [1] "cutebunny"
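
For comparison, rough Python counterparts of paste() and paste0() can be written with str.join. This is a minimal sketch that redefines x, y, and z so it runs on its own; the names spaced, joined, and labelled are chosen for this illustration.

```python
x = "cute"
y = "bunny"
z = "hop"

# Like paste(x, y): join with a single space between the pieces.
spaced = " ".join([x, y])
print(spaced)  # cute bunny

# Like paste0(x, y): join with no separator (same result as x + y).
joined = "".join([x, y])
print(joined)  # cutebunny

# Adding a string and a number raises TypeError in Python; convert first.
try:
    x + 5
except TypeError as err:
    print("TypeError:", err)
labelled = x + str(5)
print(labelled)  # cute5
```

Python's “+” concatenates strings but does not convert numbers to strings automatically, whereas paste() in R converts its arguments to character for you.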