Category
R
Python
use curly braces {} and parentheses (); R code can be spread over multiple lines
for (i in 1:5) {
  print("code block 1")
  if (i > 3) {
    print("code block 2")
  }
  print("code block 1 again")
}
## [1] "code block 1"
## [1] "code block 1 again"
## [1] "code block 1"
## [1] "code block 1 again"
## [1] "code block 1"
## [1] "code block 1 again"
## [1] "code block 1"
## [1] "code block 2"
## [1] "code block 1 again"
## [1] "code block 1"
## [1] "code block 2"
## [1] "code block 1 again"
use indentation and colons; a Python statement spread over multiple lines must be continued either with a backslash \ or by wrapping it in parentheses ()
for i in range(5):
    print("code block 1")
    if i > 3:
        print("code block 2")
    print("code block 1 again")
## code block 1
## code block 1 again
## code block 1
## code block 1 again
## code block 1
## code block 1 again
## code block 1
## code block 1 again
## code block 1
## code block 2
## code block 1 again
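A minimal sketch of the two continuation styles mentioned above; the variable names are purely illustrative.

```python
# Explicit line continuation with a trailing backslash
total = 1 + 2 + \
    3 + 4

# Implicit continuation: any expression inside (), [] or {} may span lines
total_paren = (1 + 2 +
               3 + 4)

print(total, total_paren)  # 10 10
```

Parentheses are usually preferred: a stray space after the backslash is a syntax error.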
use the colon operator : between two numbers to create a sequence (e.g. 1:10)
## [1] 1 2 3 4 5 6 7 8 9 10
use the range() function, which must be converted to a list to display all its elements (list(range())), or use np.arange()
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
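A short sketch of how Python's ranges relate to R's 1:10; the names `r`, `one_to_ten` and `arr` are illustrative only.

```python
import numpy as np

r = range(10)               # lazy range object: 0, 1, ..., 9
print(r)                    # range(0, 10)
print(list(r))              # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(r[3])                 # 3 (single elements can be indexed without converting)

# R's 1:10 includes both ends; in Python the stop value is excluded
one_to_ten = list(range(1, 11))
arr = np.arange(1, 11)      # NumPy array with the same values
print(one_to_ten)
```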
indexing starts at 1, and the last element of a sequence is included (e.g. 1:10 ends with 10)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
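For comparison, a sketch of the Python side: indexing starts at 0 and a slice excludes its stop index. The list `seq` is a hypothetical stand-in for R's 1:10.

```python
seq = list(range(1, 11))    # same values as R's 1:10

first = seq[0]              # 1   (R: seq[1])
last = seq[-1]              # 10  (R: seq[length(seq)])
first_three = seq[0:3]      # [1, 2, 3]: stop index 3 is excluded (R: seq[1:3])

for value in seq:           # iterate like R's for (v in seq) print(v)
    print(value)
```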
define a function by assigning it to a variable with the function command
## [1] 6
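The R function that produced 6 is not shown, so the sketch below uses a hypothetical `add` function to illustrate the Python equivalents: `def`, or a `lambda` bound to a name.

```python
# Hypothetical function, roughly R's: add <- function(x, y) x + y
def add(x, y):
    return x + y

# Binding a lambda to a name mirrors the R assignment style,
# although PEP 8 recommends `def` for named functions
add_lambda = lambda x, y: x + y

print(add(2, 4))          # 6
print(add_lambda(2, 4))   # 6
```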
specify default arguments; named arguments can be passed regardless of their order
## [1] 13
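Python supports default values and keyword arguments in any order as well. The function `describe` and its arguments are hypothetical, chosen only to show the calling rules.

```python
# Hypothetical function: `city` has a default value
def describe(name, age, city="unknown"):
    return f"{name} ({age}) from {city}"

print(describe("Alice", 25))                      # Alice (25) from unknown
print(describe(age=30, name="Bob"))               # keyword arguments in any order
print(describe("Charlie", city="Paris", age=22))  # positional first, then keywords
```

Unlike R, which also accepts positional arguments after named ones, Python requires positional arguments to come first: `describe(name="Bob", 30)` is a syntax error.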
use paste, paste0 or sprintf to concatenate strings and insert variables into them
## [1] "The value of variable 1 is: 5 , and of variable 2 is: 7"
## [1] "The value of variable 1 is: 5, and of variable 2 is: 7"
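A sketch of the Python counterparts that reproduce the second string above, assuming the two variables hold 5 and 7 as the output suggests. The extra space before the comma in the first R output comes from paste's default separator, sep = " ".

```python
var1, var2 = 5, 7

# Concatenation: numbers must be converted with str() explicitly
s_concat = "The value of variable 1 is: " + str(var1) + ", and of variable 2 is: " + str(var2)

# f-string (Python 3.6+), the closest analogue of paste0/sprintf
s_fstring = f"The value of variable 1 is: {var1}, and of variable 2 is: {var2}"

# str.format, and printf-style % formatting (same format codes as sprintf)
s_format = "The value of variable 1 is: {}, and of variable 2 is: {}".format(var1, var2)
s_percent = "The value of variable 1 is: %d, and of variable 2 is: %d" % (var1, var2)

print(s_fstring)
```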
no built-in dictionary, but most objects can be named (e.g. a named list can be used). If the elements are all of the same type, a named vector can be used
## [1] 25
## [1] 25
## [1] 30
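Python has a built-in dictionary type. The R object behind the outputs above is not shown, so the keys and values here are hypothetical.

```python
ages = {"Alice": 25, "Bob": 30}   # hypothetical key/value pairs

print(ages["Alice"])              # 25: lookup by key, like a named list/vector in R
print(ages.get("Alice"))          # 25: .get() returns None instead of raising KeyError
print(ages["Bob"])                # 30

ages["Charlie"] = 22              # add a new entry
for name, age in ages.items():    # iterate over key/value pairs
    print(name, age)
```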
library or require: attaches all of a package's functions to the namespace; if some functions are masked by another package, the original versions can still be accessed with the :: operator
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dplyr) # dplyr::filter masks stats::filter
# filter(df, Age > 25)
x = 1:100
stats::filter(x, c(1,1,1)) # explicitly call the stats filter function
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] NA 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54
## [19] 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108
## [37] 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
## [55] 165 168 171 174 177 180 183 186 189 192 195 198 201 204 207 210 213 216
## [73] 219 222 225 228 231 234 237 240 243 246 249 252 255 258 261 264 267 270
## [91] 273 276 279 282 285 288 291 294 297 NA
import: gives access to a module's functions through the module name followed by "." (numpy.arange). You can alias the library:
import numpy as np
import specific functions:
from numpy import arange
or import all functions:
from numpy import *
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
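A wildcard import can shadow existing names, much like attaching dplyr masks stats::filter in R. The sketch below uses the standard `math` module as an example: after `from math import *`, `pow` refers to `math.pow`, while the built-in stays reachable through the `builtins` module, playing the role of R's `::`.

```python
import builtins
from math import *   # wildcard import: math.pow now shadows the built-in pow

print(pow(2, 3))            # 8.0  (math.pow always returns a float)
print(builtins.pow(2, 3))   # 8    (the built-in is still reachable via its module)
```

This is one reason wildcard imports are usually discouraged outside interactive sessions.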
## [1] 5 7 9
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 13 16 19 22
## [2,] 14 17 20 23
## [3,] 15 18 21 24
## array([ 2, 4, 6, 8, 10])
## array([[1, 2, 3],
## [4, 5, 6]])
## array([[[ 0, 1, 2, 3],
## [ 4, 5, 6, 7],
## [ 8, 9, 10, 11]],
##
## [[12, 13, 14, 15],
## [16, 17, 18, 19],
## [20, 21, 22, 23]]])
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
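The two 2×3 matrices above hold the same values in different fill orders: R's matrix() fills column by column unless byrow = TRUE. A sketch of the NumPy equivalent, where reshape fills row by row by default and `order="F"` switches to column-major filling:

```python
import numpy as np

values = np.arange(1, 7)                     # 1..6, like R's 1:6

row_major = values.reshape(2, 3)             # default order="C": fills row by row
col_major = values.reshape(2, 3, order="F")  # fills column by column, like R's matrix(1:6, nrow=2)

print(row_major)
# [[1 2 3]
#  [4 5 6]]
print(col_major)
# [[1 3 5]
#  [2 4 6]]
```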
Broadcasting refers to arithmetic operations between arrays of different dimensions, for example adding a vector to a matrix.
mat = matrix(1:6, nrow=2)
vec = c(10,20,30)
mat + vec # vec is recycled in column-major order (down each column), not added row-wise as NumPy broadcasting would
## [,1] [,2] [,3]
## [1,] 11 33 25
## [2,] 22 14 36
## [,1] [,2] [,3]
## [1,] 11 23 35
## [2,] 12 24 36
## [,1] [,2] [,3]
## [1,] 11 23 35
## [2,] 12 24 36
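A sketch of the same operation in NumPy. Broadcasting aligns `vec` with the last axis, so it is added to every row; the second part is a hypothetical reconstruction of R's recycling rule, shown only to explain the first R result.

```python
import numpy as np

mat = np.array([[1, 3, 5],
                [2, 4, 6]])       # same values as R's matrix(1:6, nrow=2)
vec = np.array([10, 20, 30])

# NumPy broadcasting: vec (shape (3,)) is added to every row
print(mat + vec)
# [[11 23 35]
#  [12 24 36]]

# Reproducing R's recycling: flatten in column-major order, add vec repeated
# end to end, then reshape back in column-major order
recycled = (mat.ravel(order="F") + np.tile(vec, 2)).reshape(2, 3, order="F")
print(recycled)
# [[11 33 25]
#  [22 14 36]]
```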
has built-in data frame capabilities (data.frame), but dplyr is preferable for handling the data
library(dplyr)
df2 = df %>%
mutate(Score_Doubled = Score * 2) %>% # create a new column/variable
filter(Age > 23) # filtering rows
df2
doesn’t have built-in data frame capabilities; the most widely used data frame library is pandas
import pandas as pd
data = {
'Name': ["Alice","Bob","Charlie"],
'Age': [25,30,22],
'Score': [95,80,75]
}
df = pd.DataFrame(data)
print(df)
## Name Age Score
## 0 Alice 25 95
## 1 Bob 30 80
## 2 Charlie 22 75
df['Score_Doubled'] = df['Score']*2 # creating a new column/variable
df2 = df['Age'] > 23 # boolean mask marking the rows to keep; df[df['Age'] > 23] actually filters them
df2
## 0 True
## 1 True
## 2 False
## Name: Age, dtype: bool
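To actually filter rows, the boolean mask is used to index the DataFrame. A sketch, rebuilding the same `df` so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Score": [95, 80, 75],
})

mask = df["Age"] > 23          # boolean Series, as printed above
df2 = df[mask]                 # keeps only the rows where mask is True
print(df2)

# Combine conditions with & and | (each condition in parentheses)
df3 = df[(df["Age"] > 23) & (df["Score"] >= 90)]
print(df3)
```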
using the dplyr pipe operator %>% (Ctrl+Shift+M in RStudio)
using vectorized boolean expressions or the dplyr filter command
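pandas has no pipe operator, but method chaining gives a similar style. This sketch mirrors the dplyr pipeline shown earlier (mutate, then filter) using `DataFrame.assign` and `DataFrame.query`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Score": [95, 80, 75],
})

# Method chaining plays the role of %>%: each method returns a new DataFrame
df2 = (
    df.assign(Score_Doubled=lambda d: d["Score"] * 2)  # like mutate()
      .query("Age > 23")                               # like filter()
)
print(df2)
```

Because `assign` returns a copy, the original `df` is left unchanged, as with dplyr.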
## [,1] [,2] [,3]
## [1,] 5 9 13
## [2,] 6 10 14
## [3,] 7 11 15
## [4,] 8 12 16
# data frame: subsetting works as for arrays; R can also name the rows of a data frame (row labels, similar to a pandas index) and then subset by those labels
df[,1:2]
## [1] 25 22
## array([[ 2, 3],
## [ 6, 7],
## [10, 11],
## [14, 15]])
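A sketch translating an R column slice to NumPy, assuming the R output above comes from a 4×4 matrix filled with 1:16. The 1-based, end-inclusive R range 2:4 becomes the 0-based, end-exclusive slice 1:4.

```python
import numpy as np

# Same values as R's matrix(1:16, nrow=4): column-major fill
m = np.arange(1, 17).reshape(4, 4, order="F")

# R's m[, 2:4] becomes m[:, 1:4]
sub = m[:, 1:4]
print(sub)
# [[ 5  9 13]
#  [ 6 10 14]
#  [ 7 11 15]
#  [ 8 12 16]]
```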
## Series([], Name: Age, dtype: int64)
# Note: unlike normal Python slicing, .loc selects by label and includes the end of the slice
df.iloc[0:2,0] # .iloc selects by integer position with regular Python slicing: returns only rows 0 and 1
## 0 Alice
## 1 Bob
## Name: Name, dtype: object
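A side-by-side sketch of the two selectors on a small hypothetical DataFrame. Both calls below return two rows, but `.loc` gets there by including its end label while `.iloc` excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})

by_label = df.loc[0:1, "Name"]      # labels 0..1, end label included -> 2 rows
by_position = df.iloc[0:2, 0]       # positions 0 and 1, end excluded -> 2 rows

# With string row labels (similar to R row names), .loc slices by name
df.index = ["a", "b", "c"]
print(df.loc["a":"b", "Age"])       # rows "a" and "b"
```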
built-in random number generator
## [1] 0.9717502
## [1] 0.08375751
set the seed on the component that generates the numbers. With the random module:
import random
random.seed(42)
random.random()
and with NumPy:
import numpy as np
np.random.seed(42)
np.random.random()
## 0.6484972199788831
## 0.7013695409686748
import numpy as np
np.random.seed(27) # seed the component that actually generates the numbers (NumPy's generator); seeding another one has no effect on it
np.random.random()
## 0.4257214105188958
## 0.8145837404945526
# Caution!!
random.seed(27) # setting the seed on the random library
np.random.random() # draws from NumPy's generator, which random.seed() does not affect; the result changes on every run
## 0.7353972901996796
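A sketch of reproducible draws. It additionally uses NumPy's `np.random.default_rng`, the newer Generator interface not used in the original; it sidesteps the pitfall above because the seed belongs to the generator object you draw from.

```python
import random
import numpy as np

# The standard-library generator is only affected by random.seed()
random.seed(27)
a = random.random()
random.seed(27)
b = random.random()
print(a == b)                           # True

# NumPy Generator API: each generator carries its own seed
rng1 = np.random.default_rng(27)
rng2 = np.random.default_rng(27)
print(rng1.random() == rng2.random())   # True
```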
length() for vectors and lists; dim() for matrices and data frames
## [1] 3
## [1] 4 4
## [1] 50 3
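The Python counterparts are `len()` and the `.shape` attribute. A sketch with hypothetical objects matching the sizes printed above:

```python
import numpy as np
import pandas as pd

v = [1, 2, 3]
print(len(v))                    # 3, like length() in R

m = np.zeros((4, 4))
print(m.shape)                   # (4, 4), like dim() for a matrix

# Hypothetical 50 x 3 data frame
df = pd.DataFrame({"a": range(50), "b": range(50), "c": range(50)})
print(df.shape)                  # (50, 3), like dim() for a data frame
print(len(df))                   # 50
```

Note that `len()` of a DataFrame counts rows, whereas R's length() of a data frame counts columns.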
Four rules are used to help interpret run charts by detecting non-random patterns (i.e. signals) in the data.
Five rules are used to identify special-cause variation in control charts and to understand whether improvement is occurring: