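This week's assignments and progress:

Study Advanced R: Functions
Study the e-book The Quick Python Book, 3rd Edition, Chapter 9; the code from the book is available in Ch09_code.txt
Preview Advanced R: Functionals
Study The Quick Python Book, 3rd Edition (2018, English edition), Ch6: Strings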

R functions

return()
formals()
body()
environment()
do.call()

Python functions

def
return
print()
string operation

R

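In R, a function is an object, just like R's other data types such as vectors.
R functions are also called "first-class objects":
- They can be stored in variables or other data structures
- They can be passed as arguments to other functions
- They can be returned as the value of a function
- They can be created at run time rather than having to be written out entirely at design time
- They can exist without being bound to a name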

Function Components

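the body(): the function body, the code that defines what the function does
the formals(): the function's formal arguments
the environment(): where the function was created, also called the "enclosing environment"; it determines how variables are looked up when the function runs.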

f <- function(x = 0) {   
  return(x^2)
}
body(f)     # function body

## {
##     return(x^2)
## }

formals(f)  # pair list

## $x
## [1] 0

environment(f)  # enclosing environment

## <environment: R_GlobalEnv>

Primitive functions (implemented in C) do not have these three components.

sin

## function (x)  .Primitive("sin")

typeof(sin)

## [1] "builtin"

typeof(`[`)

## [1] "special"

formals(sin)

## NULL

body(sin)

## NULL

environment(sin)

## NULL

Lazy Evaluation

Function arguments are lazily evaluated: an argument is evaluated only when it is accessed.
This can make R run more efficiently.

f <- function(x) {
  10
}

f(x = stop("This is an error")) # x is never used inside the function, so the error is not raised

## [1] 10

y <- 10
g <- function(x) {
  y <- 100
  x + 1
}

g(x = y)

## [1] 11

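g(x <- 1000)   # the assignment is evaluated in the calling environment when x is first accessed

## [1] 1001

x   # x was created in the global environment by the call above

## [1] 1000

y   # the y <- 100 inside g() is local, so the global y is unchanged

## [1] 10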

Function Default Values

myfun <- function(x = 0, y = 0) {
  sqrt(x^2 + y^2)
}
myfun(x = 5, y = 12)

## [1] 13

normalize <- function(x, m, s) {
  (x - m)/s
}

set.seed(seed = 188)
d <- rnorm(n = 20, mean = 10, sd = 5)
M <- mean(x = d)
M

## [1] 9.517236

S <- sd(x = d)
S

## [1] 5.292199

# fix(normalize)
# x has no default here: a self-referential default such as x = x would fail if x were ever missing
normalize <- function(x, m = mean(x = x), s = sd(x = x)) {
  (x - m)/s
}

d[length(d)] <- NA
normalize(x = d)

##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

normalize <- function(x, m = mean(x = x, na.rm = na.rm), s = sd(x = x, na.rm = na.rm), na.rm = FALSE) {
  (x - m)/s
}
normalize(x = d, na.rm = TRUE)

##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA

normalize <- function(x, m = mean(x = x, ...), s = sd(x = x, ...), ...) {
  (x - m)/s
}
normalize(x = d, na.rm = TRUE)

##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA

normalize <- function(x, m = mean(x = x, trim = trim, ...), 
                      s = sd(x = x, ...), trim = 0, ...) {
  (x - m)/s
}
normalize(x = d, trim = 0, na.rm = TRUE)

##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA

normalize(x = d, trim = 0.2, na.rm = TRUE)

##  [1] -1.41823144  0.04341101 -1.88254190 -0.14118529  0.13223646 -1.06078869
##  [7]  0.41479512 -0.09889509  0.60478714 -0.86240289  0.43411887  0.87677543
## [13] -0.02571524 -0.62384375  1.35051329 -0.55673202 -0.19734975  2.25153908
## [19]  1.18705926          NA

h <- function(x = ls()) {
  a <- 1
  x
}

h()   # the default ls() is evaluated inside the function, so it lists h's local variables

## [1] "a" "x"

h(x = ls())   # a supplied argument is evaluated in the calling (global) environment

##  [1] "d"         "f"         "g"         "h"         "M"         "myfun"    
##  [7] "normalize" "S"         "x"         "y"

Function Arguments

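Argument matching: named arguments are matched by name (partial matching of names is supported); unnamed arguments are matched by position.
In most cases, arguments behave as if passed by value.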

addTheLog <- function(first, second) {
  first + log(second)
}

addTheLog(first = 1, second = exp(4))

## [1] 5

addTheLog(second = exp(4), first = 1)

## [1] 5

addTheLog(f = 1, s = exp(4))

## [1] 5

addTheLog(s = exp(4), f = 1)

## [1] 5

addTheLog(1, exp(4))

## [1] 5

addTheLog(exp(4), 1)

## [1] 54.59815

mean(1:10, n = T)

## [1] 5.5

mean(1:10, , FALSE)

## [1] 5.5

mean(1:10, 0.05)

## [1] 5.5

mean(, TRUE, x = c(1:10, NA))

## [1] 5.5

Calling a function with a list of arguments

Often used together with the result of lapply().

args <- list(1:10, na.rm = TRUE)  # the argument list
do.call(what = mean, args = args)

## [1] 5.5

# Equivalent to
mean(1:10, na.rm = TRUE)

## [1] 5.5

Function Return Value

Implicit versus Explicit

Implicit: the value of the last evaluated expression is returned

j01 <- function(x) {
  if (x < 10) {
    0
  } else {
    10
  }
}
j01(5)

## [1] 0

j01(20)

## [1] 10

Explicit: use return()

j02 <- function(x) {
  if (x < 10) {
    return(0)
  } else {
    return(10)
  }
}

Visible versus Invisible

k01 <- function() 1
k01()

## [1] 1

a <- k01()
a

## [1] 1

k02 <- function() invisible(1)
k02()
b <- k02()
b

## [1] 1

withVisible(k02())

## $value
## [1] 1
## 
## $visible
## [1] FALSE

Python

Defining Function

If a function has no return statement, it returns None by default.
Example:

def f(x, y):
    x + y

f(1, 2)
ans = f(1, 2)
ans
type(ans)

## <class 'NoneType'>

def fact(n):
    """Return the factorial of the given number."""    # docstring: documents what the function does
    r = 1
    while n > 0:
        r = r * n
        n = n - 1
    return r    # the return value is r

type(fact)

## <class 'function'>

x = fact(n = 4)
x

## 24

help(fact)      # look up the function's help text

## Help on function fact in module __main__:
## 
## fact(n)
##     Return the factorial of the given number.

fact.__doc__    # returns the docstring, which is also an attribute of the function

## 'Return the factorial of the given number.'

Function Attributes

def func(a):
    b = 'spam'
    return b * a

func(3)

## 'spamspamspam'

func.__name__

## 'func'

dir(func)

## ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

func.count = 0
func.count += 1
func.count

## 1

dir(func)

## ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count']
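
Because user-defined attributes such as count are stored on the function object, they can carry state between calls. The following is a minimal sketch of that idea; the function greet and the attribute calls are hypothetical names chosen for illustration and are not part of the course material.

```python
def greet(name):
    """Return a greeting and record that the function was called."""
    greet.calls += 1          # the attribute lives on the function object itself
    return "Hello, " + name

greet.calls = 0               # attach the attribute before the first call

greet("Ann")
greet("Bob")
print(greet.calls)            # number of calls so far
print(greet.__dict__)         # user-defined attributes are kept in __dict__
```

The attribute must be created before the first call, otherwise greet.calls += 1 would raise an AttributeError.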

Function Arguments

Passing by position: arguments are matched to parameters by position; these are called "positional arguments".

def power(x, y):   # positional parameter
    r = 1
    while y > 0:
        r = r * x
        y = y - 1
    return r

power(3, 4)     # positional argument

## 81

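For a function whose parameters have no default values, the number of arguments in a call must match the number of formal parameters in the definition; otherwise a TypeError is raised.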

# power(3)
# TypeError: power() missing 1 required positional argument: 'y'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

Function Default Values

When defining a function, you can give parameters default values.

def power(x, y = 2):   # y has a default value
    r = 1
    while y > 0:
        r = r * x
        y = y - 1
    return r
    
power(3, 4)

## 81

power(3)

## 9

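In a function definition, parameters without default values must come before those with defaults; otherwise an error occurs.
This is because Python matches arguments primarily by position.

# def func(x = 2, y):
#     pass
# 
# Error: non-default argument follows default argument (<string>, line 1)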
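
The ordering rule is enforced when the definition is compiled, before any call takes place. The sketch below checks this with the built-in compile(); the helper compiles and the two source strings are hypothetical names used only for illustration. The exact wording of the error message differs between Python versions, so it is not printed here.

```python
# Hypothetical source text: a defaulted parameter placed before a non-defaulted one.
bad_source = "def func(x = 2, y):\n    pass\n"
# The same parameters in the accepted order.
good_source = "def func(y, x = 2):\n    pass\n"

def compiles(source):
    """Return True if the source compiles, False if it is a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(bad_source))
print(compiles(good_source))
```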

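When calling a function, you must supply an argument for every positional parameter that has no default.
You can also pass arguments by naming the parameter explicitly; these are called "keyword arguments".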

power(2, 3)

## 8

power(3, 2)

## 9

power(y = 2, x = 3)

## 9

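Look at the following example and note the error message, which is quite different from R's behavior. (Consider: would the same call raise an error in R?)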

# power(2, x = 3)
# TypeError: power() got multiple values for argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

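Ordering of keyword arguments

When a call mixes positional and keyword arguments, the order must be:
positional arguments first, keyword arguments after.

power(3, y = 2)   # valid: positional argument first, keyword argument after

## 9

# Note the result and error message of the following example
# power(x = 3, 2)  # invalid: a positional argument follows a keyword argument
# Error: positional argument follows keyword argument (<string>, line 2)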

Parameters with * and **

A parameter prefixed with * collects any extra positional arguments into a tuple.
A parameter prefixed with ** collects any extra keyword arguments into a dict.

def f(*x):
    print(x)
    
f(1, 2, 3, 4)

## (1, 2, 3, 4)

# f(x =1, y = 2, p = 3, q = 4)
# TypeError: f() got an unexpected keyword argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

def g(**x):
    print(x)

g(x =1, y = 2, p = 3, q = 4)

## {'x': 1, 'y': 2, 'p': 3, 'q': 4}

# g(1, 2, 3, 4)
# TypeError: g() takes 0 positional arguments but 4 were given
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
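
A common use of * and ** together is a wrapper that forwards whatever it receives to another function. The sketch below is only an illustration; describe and logged are hypothetical names, not functions from the book or the course files.

```python
def describe(*args, **kwargs):
    """Return what was captured, so both containers can be inspected."""
    return args, kwargs

def logged(func, *args, **kwargs):
    """Hypothetical helper: report the call, then forward every argument unchanged."""
    print("calling", func.__name__, "with", args, kwargs)
    return func(*args, **kwargs)   # unpack again when forwarding

result = logged(describe, 1, 2, p=3)
print(result)
```

Inside logged, args is a tuple and kwargs is a dict; writing func(*args, **kwargs) unpacks them again so that describe sees the same call the user wrote.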

def maximum(*numbers):   # parameter prefixed with *
    if len(numbers) == 0:
        return None
    else:
         maxnum = numbers[0]
         for n in numbers[1:]:
             if n > maxnum:
                 maxnum = n
         return maxnum
         
maximum(1, 5, 9, -2, 2)

## 9

def example_fun(x, y, **other):
        print("x: {0}, y: {1}, keys in 'other': {2}".format(x, 
              y, list(other.keys())))
        other_total = 0
        for k in other.keys():
            other_total = other_total + other[k]
        print("The total of values in 'other' is {0}".format(other_total))


example_fun(2, y="1", foo=3, bar=4)

## x: 2, y: 1, keys in 'other': ['foo', 'bar']
## The total of values in 'other' is 7

def f1(x, y, *args):
    print(x, y, args, sep = '\n')
    
f1(1, 2, 3)

## 1
## 2
## (3,)

f1(1, 2, 3, 4, 5)

## 1
## 2
## (3, 4, 5)

def f2(x, y, **kwargs):
    print(x, y, kwargs, sep = '\n')
    
f2(1, 2, p = 3)

## 1
## 2
## {'p': 3}

f2(1, 2, p = 3, q = 4, r = 5)

## 1
## 2
## {'p': 3, 'q': 4, 'r': 5}

# f2(1, 2, p = 3, q = 4, r = 5, x = 100)
# TypeError: f2() got multiple values for argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

def func(a, b, c, d): print(a, b, c, d)

func(*(1, 2, 3,4))

## 1 2 3 4

func(**{'a': 1, 'd': 2, 'b': 3, 'c': 4})

## 1 3 4 2

func(1, 2, **{'c':3, 'd': 4})

## 1 2 3 4

func(1, c = 3, *(2,), **{'d': 4})

## 1 2 3 4

func(1, *(2,), c = 3, **{'d': 4})

## 1 2 3 4

# func(1, b = 3, *(2,), **{'d': 4})
# TypeError: func() got multiple values for argument 'b'

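Mixing different kinds of parameters

They must appear in the following order:
positional parameters -> parameters with default values -> *args -> **kwargs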

def myfun(x, y = 0, *args, **kwargs):
    print(x, y, args, kwargs, sep = '\n')
    

# myfun(1, y = 2, 3, 4, 5, p = 100, q = 200)
# Error: positional argument follows keyword argument (<string>, line 5)

# myfun(1, y = 2, *(3, 4, 5), p = 100, q = 200)
# TypeError: myfun() got multiple values for argument 'y'

myfun(1, 2, 3, 4, 5, p = 100, q = 200)

## 1
## 2
## (3, 4, 5)
## {'p': 100, 'q': 200}

myfun(1, *(3, 4, 5), p = 100, q = 200)

## 1
## 3
## (4, 5)
## {'p': 100, 'q': 200}
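
One way to check how Python classifies each parameter in such a definition is the standard inspect module. This is a small sketch that assumes the same signature as myfun above; the variable kinds is a name introduced here for illustration.

```python
import inspect

def myfun(x, y=0, *args, **kwargs):
    return x, y, args, kwargs

# Map each parameter name to the name of its kind.
kinds = {name: p.kind.name
         for name, p in inspect.signature(myfun).parameters.items()}

for name, kind in kinds.items():
    print(name, kind)
```

Both x and y are reported as POSITIONAL_OR_KEYWORD, which is why the extra positional value 3 in myfun(1, *(3, 4, 5), ...) fills y rather than going into args.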

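Passing mutable objects as arguments

Passed by reference: in a function call, what is passed is a reference to the object, not a copy of its value. Rebinding a parameter name inside the function (as with list2 and n below) does not affect the caller.
So take special care when the object passed in is mutable (such as a list or dict): modifying its contents inside the function also changes the original object.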

def f(n, list1, list2):
    list1.append(3)
    list2 = [4, 5, 6]
    n = n + 1

x = 5
y = [1, 2]
z = [4, 5]
f(x, y, z)
x, y, z

## (5, [1, 2, 3], [4, 5])

def f(lst):
    lst = lst[:]
    lst.append(3)

x = [1, 2]
f(x)
x

## [1, 2]
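
The difference between modifying an object in place and rebinding the parameter name can be made visible with the is operator, which tests object identity. A minimal sketch, with mutate and rebind as hypothetical function names:

```python
def mutate(lst):
    lst.append(3)        # changes the object the caller also sees
    return lst

def rebind(lst):
    lst = lst + [3]      # builds a new list; the caller's object is untouched
    return lst

a = [1, 2]
b = [1, 2]
same = mutate(a) is a    # the returned list is the caller's own object
new = rebind(b) is b     # the returned list is a different object
print(a, same)
print(b, new)
```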

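Keyword-only parameters

Parameters that must be passed by keyword (as keyword arguments) and are never filled by positional arguments.
Keyword-only parameters are the named parameters written after *args.

def kwonly(a, *args, c):   # c is a keyword-only parameter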
    print(a, args, c)
    
kwonly(1, 2, c = 3)

## 1 (2,) 3

kwonly(1, c = 3)

## 1 () 3

# kwonly(1, 2, 3)
# TypeError: kwonly() missing 1 required keyword-only argument: 'c'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

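You can also use a bare * in the parameter list to indicate that the function does not accept a variable-length argument list, while still requiring every parameter after the * to be passed as a keyword-only argument.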

def kwonly(a, *, b, c):
    print(a, b, c)
    
kwonly(1, c = 3, b = 2)

## 1 2 3

kwonly(c= 3, b = 2, a = 1)

## 1 2 3

# kwonly(1, 2, 3)
# TypeError: kwonly() takes 1 positional argument but 3 were given
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>


# kwonly(1)
# TypeError: kwonly() missing 2 required keyword-only arguments: 'b' and 'c'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>

Keyword-only parameters can of course have default values as well

def kwonly(a, *, b = 0, c = 0):
    print(a, b, c)
    
kwonly(1)

## 1 0 0

kwonly(1, c = 100)

## 1 0 100
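
Defaults of keyword-only parameters are stored separately from those of ordinary parameters: the attribute list printed by dir(func) earlier includes both __defaults__ and __kwdefaults__. The sketch below inspects them for the same signature as above; returning a tuple instead of printing is a small change made here for illustration.

```python
def kwonly(a, *, b=0, c=0):
    return a, b, c

print(kwonly.__defaults__)     # defaults of ordinary parameters (none here)
print(kwonly.__kwdefaults__)   # defaults of keyword-only parameters
```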

Keyword-only parameters must appear after a single *, not after two asterisks

# def kwonly(a, **args, b, c):
#     pass
#     
# Error: invalid syntax (<string>, line 1)

# def kwonly(a, **, b, c):
#     pass
# 
# Error: invalid syntax (<string>, line 6)

Therefore, keyword-only parameters must appear before the ** parameter

# def f(a, *b, **d, c = 6): print(a, b, c, d)
# Error: invalid syntax (<string>, line 1)

def f(a, *b, c = 10, **d): print(a, b, c, d, sep= '\n')

f(1, 2, 3, x = 4, y = 5)

## 1
## (2, 3)
## 10
## {'x': 4, 'y': 5}

f(1, 2, 3, x = 4, y = 5, c = 6)

## 1
## (2, 3)
## 6
## {'x': 4, 'y': 5}

def f(a, c = 6, *b, **d):
    print(a, b, c, d)
    
f(1, 2, 3, 4)

## 1 (3, 4) 2 {}

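Using a mutable object as a parameter default value

Unlike some other languages such as R, Python evaluates a parameter's default value once, when the function is defined.
This has no visible effect when the default is an immutable object.
When the default is a mutable object, it can cause problems.

def data_append(v, lst = []):     # the default for lst is a mutable empty list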
    lst.append(v)
    return lst
    
data_append(1)

## [1]

data_append(2)  # look at the result: is it [2]?

## [1, 2]

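In the example above, the lst parameter refers to a mutable list object created when the function was defined. That object is not garbage-collected when a call finishes; it persists between calls, so the result of the first call affects the second.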
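
The persistent default object can be observed directly through the function's __defaults__ attribute. This sketch redefines the problematic version of data_append so that it is self-contained; before and after are names introduced here for illustration.

```python
def data_append(v, lst=[]):
    lst.append(v)
    return lst

before = list(data_append.__defaults__[0])   # copy of the default before any call
data_append(1)
data_append(2)
after = data_append.__defaults__[0]          # the very same list, now modified

print(before)
print(after)
```

Both calls appended to the single list stored in __defaults__, which is why the second result contains the value from the first call.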

The fix is as follows:

def data_append(v, lst = None):
    if lst is None:
        lst = []
    lst.append(v)
    return lst
    
data_append(1)

## [1]

data_append(2)  # no longer affected by the previous call

## [2]

def data_append(v, lst = None):
    if lst is None:
        lst = []
    lst.append(v)
    return lst
    
lt = [100, 200, 300]
lt

## [100, 200, 300]

data_append(1, lst = lt)

## [100, 200, 300, 1]

lt  # has the external lt been changed as well?

## [100, 200, 300, 1]

# How should data_append() be revised?
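
One possible answer to the question of how to revise data_append(), offered as a sketch rather than the intended solution: copy the caller's list before appending, so the function returns a new list and leaves its argument alone. The use of list(lst) for a shallow copy is a choice made here; nested mutable elements would still be shared.

```python
def data_append(v, lst=None):
    """Return a new list with v appended; the caller's list is left untouched."""
    lst = [] if lst is None else list(lst)   # shallow copy of the caller's list
    lst.append(v)
    return lst

lt = [100, 200, 300]
out = data_append(1, lst=lt)
print(out)
print(lt)
```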

Functions for R and Python

Zheng-Hui Chen

4/14/2020
