This week's assignments and progress:
  1. Read Advanced R: Functions
  2. Read the e-book The Quick Python Book, 3rd Edition, Chapter 9; the code from the book is in Ch09_code.txt
  3. Preview Advanced R: Functionals
  4. Read The Quick Python Book, 3rd Edition (2018, English edition), Ch6: Strings


R functions
Python functions

R

Function Components

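  • body(): the function body, the code that defines what the function does
  • formals(): the function's formal arguments
  • environment(): where the function was created, also called the enclosing environment; it determines how variables are looked up when the function runs.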
f <- function(x = 0) {   
  return(x^2)
}
body(f)     # function body
## {
##     return(x^2)
## }
formals(f)  # pair list
## $x
## [1] 0
environment(f)  # enclosing environment
## <environment: R_GlobalEnv>
  • Primitive functions (implemented in C) do not have these three components.
sin
## function (x)  .Primitive("sin")
typeof(sin)
## [1] "builtin"
typeof(`[`)
## [1] "special"
formals(sin)
## NULL
body(sin)
## NULL
environment(sin)
## NULL

Lazy Evaluation

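  • Function arguments are lazily evaluated: an argument is evaluated only when it is accessed.
  • This lets R skip the work for arguments that are never used, which can make execution more efficient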
f <- function(x) {
  10
}

f(x = stop("This is an error")) # x is never used inside the function, so no error is raised
## [1] 10
y <- 10
g <- function(x) {
  y <- 100
  x + 1
}

g(x = y)
## [1] 11
g(x <- 1000)
## [1] 1001
x
## [1] 1000
y
## [1] 10

Function Default Values

myfun <- function(x = 0, y = 0) {
  sqrt(x^2 + y^2)
}
myfun(x = 5, y = 12)
## [1] 13
normalize <- function(x, m, s) {
  (x - m)/s
}

set.seed(seed = 188)
d <- rnorm(n = 20, mean = 10, sd = 5)
M <- mean(x = d)
M
## [1] 9.517236
S <- sd(x = d)
S
## [1] 5.292199
# fix(normalize)
normalize <- function(x, m = mean(x = x), s = sd(x = x)) {
  (x - m)/s
}
d[length(d)] <- NA
normalize(x = d)
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
normalize <- function(x, m = mean(x = x, na.rm = na.rm), s = sd(x = x, na.rm = na.rm), na.rm = FALSE) {
  (x - m)/s
}
normalize(x = d, na.rm = TRUE)
##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA
normalize <- function(x, m = mean(x = x, ...), s = sd(x = x, ...), ...) {
  (x - m)/s
}
normalize(x = d, na.rm = TRUE)
##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA
normalize <- function(x, m = mean(x = x, trim = trim, ...), 
                      s = sd(x = x, ...), trim = 0, ...) {
  (x - m)/s
}
normalize(x = d, trim = 0, na.rm = TRUE)
##  [1] -1.44073405  0.02090840 -1.90504451 -0.16368790  0.10973385 -1.08329130
##  [7]  0.39229251 -0.12139770  0.58228453 -0.88490550  0.41161626  0.85427282
## [13] -0.04821785 -0.64634636  1.32801068 -0.57923463 -0.21985236  2.22903647
## [19]  1.16455665          NA
normalize(x = d, trim = 0.2, na.rm = TRUE)
##  [1] -1.41823144  0.04341101 -1.88254190 -0.14118529  0.13223646 -1.06078869
##  [7]  0.41479512 -0.09889509  0.60478714 -0.86240289  0.43411887  0.87677543
## [13] -0.02571524 -0.62384375  1.35051329 -0.55673202 -0.19734975  2.25153908
## [19]  1.18705926          NA
h <- function(x = ls()) {
  a <- 1
  x
}

h()
## [1] "a" "x"
h(x = ls())
##  [1] "d"         "f"         "g"         "h"         "M"         "myfun"    
##  [7] "normalize" "S"         "x"         "y"

Function Arguments

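  • Argument matching: named arguments are matched by name (partial matching of names is supported); unnamed arguments are matched by position
  • In most cases, arguments behave as if passed by value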
addTheLog <- function(first, second) {
  first + log(second)
}

addTheLog(first = 1, second = exp(4))
## [1] 5
addTheLog(second = exp(4), first = 1)
## [1] 5
addTheLog(f = 1, s = exp(4))
## [1] 5
addTheLog(s = exp(4), f = 1)
## [1] 5
addTheLog(1, exp(4))
## [1] 5
addTheLog(exp(4), 1)
## [1] 54.59815
mean(1:10, n = T)
## [1] 5.5
mean(1:10, , FALSE)
## [1] 5.5
mean(1:10, 0.05)
## [1] 5.5
mean(, TRUE, x = c(1:10, NA))
## [1] 5.5

Calling a function with an argument list

Often used together with the result of lapply().

args <- list(1:10, na.rm = TRUE)  # argument list
do.call(what = mean, args = args)
## [1] 5.5
# Equivalent to
mean(1:10, na.rm = TRUE)
## [1] 5.5

Return Value

Implicit versus Explicit

  • Implicit: returns the value of the last evaluated expression
j01 <- function(x) {
  if (x < 10) {
    0
  } else {
    10
  }
}
j01(5)
## [1] 0
j01(20)
## [1] 10
  • Explicit: uses return()
j02 <- function(x) {
  if (x < 10) {
    return(0)
  } else {
    return(10)
  }
}

Visible versus Invisible

k01 <- function() 1
k01()
## [1] 1
a <- k01()
a
## [1] 1
k02 <- function() invisible(1)
k02()
b <- k02()
b
## [1] 1
withVisible(k02())
## $value
## [1] 1
## 
## $visible
## [1] FALSE

Python

Defining Function

  • If a function has no return statement, it returns None by default
  • Example:
def f(x, y):
    x + y

f(1, 2)
ans = f(1, 2)
ans
type(ans)
## <class 'NoneType'>
def fact(n):
    """Return the factorial of the given number."""    # docstring: describes what the function is for
    r = 1
    while n > 0:
        r = r * n
        n = n - 1
    return r    # the return value is r

type(fact)
## <class 'function'>
x = fact(n = 4)
x 
## 24
help(fact)      # show the help text for the function
## Help on function fact in module __main__:
## 
## fact(n)
##     Return the factorial of the given number.
fact.__doc__    # the docstring text, which is also an attribute of the function
## 'Return the factorial of the given number.'

Function Attributes

def func(a):
    b = 'spam'
    return b * a

func(3)
## 'spamspamspam'
func.__name__
## 'func'
dir(func)
## ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
func.count = 0
func.count += 1
func.count
## 1
dir(func)
## ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count']
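Because attributes are stored on the function object itself, they can hold state that survives between calls. The sketch below is only an illustration: the function `greet` and its attribute `calls` are hypothetical names, not part of the notes.

```python
def greet(name):
    """Return a greeting and count how often greet() has been called."""
    greet.calls += 1              # update the attribute stored on the function object
    return "Hello, " + name

greet.calls = 0                   # attach the attribute before the first call

greet("Ann")
greet("Bob")
print(greet.calls)                # 2
print(greet.__dict__)             # {'calls': 2}
```

User-defined attributes such as `calls` live in the function's `__dict__`, which is why `dir(func)` above lists `count` after it has been assigned.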

Function Arguments

  • Passing by position: arguments are matched to parameters by their position; such arguments are called positional arguments
def power(x, y):   # positional parameter
    r = 1
    while y > 0:
        r = r * x
        y = y - 1
    return r

power(3, 4)     # positional argument
## 81
  • The number of arguments in a call must match the number of formal parameters in the definition; otherwise a TypeError is raised
# power(3)
# TypeError: power() missing 1 required positional argument: 'y'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
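To see the error without stopping the script, the call can be wrapped in try/except. This is a minimal sketch that repeats the `power` definition from above so that it runs on its own; the variable name `message` is introduced here only for illustration.

```python
def power(x, y):
    r = 1
    while y > 0:
        r = r * x
        y = y - 1
    return r

try:
    power(3)                      # y is missing
except TypeError as err:
    message = str(err)            # keep the text of the error for inspection
    print("TypeError:", message)
```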

Function Default Values

When defining a function, you can give parameters default values.

def power(x, y = 2):   # y has a default value
    r = 1
    while y > 0:
        r = r * x
        y = y - 1
    return r
    
power(3, 4)
## 81
power(3)
## 9
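  • In a function definition, parameters without default values must come first; otherwise an error occurs.
  • This is because Python matches arguments to parameters by position first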
# def func(x = 2, y):
#     pass
# 
# SyntaxError: non-default argument follows default argument (<string>, line 1)
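  • When calling a function, you must supply enough arguments for all positional parameters
  • You can also pass an argument by giving the parameter name explicitly; this is called a keyword argument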
power(2, 3)
## 8
power(3, 2)
## 9
power(y = 2, x = 3)
## 9

Look at the example below and note the error message; the behavior is very different from R. (Exercise: would the same call raise an error in R?)

# power(2, x = 3)
# TypeError: power() got multiple values for argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
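The reason for this error is that the positional argument 2 is bound to x first, and the keyword argument x = 3 then tries to bind x a second time. The sketch below reproduces that; it uses `x ** y` as a shorthand for the loop in `power`, which is an assumption made to keep the example short.

```python
def power(x, y=2):
    return x ** y                 # shorthand for the loop version in the notes

print(power(2, y=3))              # 8: 2 goes to x by position, 3 goes to y by name

try:
    power(2, x=3)                 # 2 already fills x, so x=3 is a second value for x
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```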

Order of keyword arguments in a call

  • When a call mixes positional and keyword arguments, the order must be:
    positional arguments first, keyword arguments after
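power(3, y = 2)   # positional argument first, keyword argument after: valid
## 9

# Note the error message for the example below
# power(x = 3, 2)  # violates the rule: positional argument first, keyword argument after
# SyntaxError: positional argument follows keyword argument (<string>, line 2)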
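Unlike the TypeError cases, this ordering rule is checked when the code is parsed, before anything runs. One way to observe that, sketched here with the built-in compile(), is to parse the call from a string; note that `power` does not even need to be defined. The names `source` and `caught` are illustrative.

```python
source = "power(x = 3, 2)"        # keyword argument before a positional one
caught = None

try:
    compile(source, "<string>", "eval")   # only parses; nothing is called
except SyntaxError as err:
    caught = err.msg
    print("SyntaxError:", caught)
```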

Parameters with * and **

  • A parameter prefixed with * captures any number of positional arguments and collects them into a tuple
  • A parameter prefixed with ** captures any number of keyword arguments and collects them into a dict
def f(*x):
    print(x)
    
f(1, 2, 3, 4)
## (1, 2, 3, 4)

# f(x = 1, y = 2, p = 3, q = 4)
# TypeError: f() got an unexpected keyword argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
def g(**x):
    print(x)

g(x = 1, y = 2, p = 3, q = 4)
## {'x': 1, 'y': 2, 'p': 3, 'q': 4}

# g(1, 2, 3, 4)
# TypeError: g() takes 0 positional arguments but 4 were given
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
def maximum(*numbers):   # parameter with *
    if len(numbers) == 0:
        return None
    else:
        maxnum = numbers[0]
        for n in numbers[1:]:
            if n > maxnum:
                maxnum = n
        return maxnum
         
maximum(1, 5, 9, -2, 2)
## 9
def example_fun(x, y, **other):
        print("x: {0}, y: {1}, keys in 'other': {2}".format(x, 
              y, list(other.keys())))
        other_total = 0
        for k in other.keys():
            other_total = other_total + other[k]
        print("The total of values in 'other' is {0}".format(other_total))


example_fun(2, y="1", foo=3, bar=4)
## x: 2, y: 1, keys in 'other': ['foo', 'bar']
## The total of values in 'other' is 7
def f1(x, y, *args):
    print(x, y, args, sep = '\n')
    
f1(1, 2, 3)
## 1
## 2
## (3,)
f1(1, 2, 3, 4, 5)
## 1
## 2
## (3, 4, 5)
def f2(x, y, **kwargs):
    print(x, y, kwargs, sep = '\n')
    
f2(1, 2, p = 3)
## 1
## 2
## {'p': 3}
f2(1, 2, p = 3, q = 4, r = 5)
## 1
## 2
## {'p': 3, 'q': 4, 'r': 5}

# f2(1, 2, p = 3, q = 4, r = 5, x = 100)
# TypeError: f2() got multiple values for argument 'x'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
def func(a, b, c, d): print(a, b, c, d)

func(*(1, 2, 3,4))
## 1 2 3 4
func(**{'a': 1, 'd': 2, 'b': 3, 'c': 4})
## 1 3 4 2
func(1, 2, **{'c':3, 'd': 4})
## 1 2 3 4
func(1, c = 3, *(2,), **{'d': 4})
## 1 2 3 4
func(1, *(2,), c = 3, **{'d': 4})
## 1 2 3 4
# func(1, b = 3, *(2,), **{'d': 4})
# TypeError: func() got multiple values for argument 'b'

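Mixing different kinds of parameters

  • They must appear in the following order:
    positional parameters -> parameters with default values -> *args -> **kwargs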
def myfun(x, y = 0, *args, **kwargs):
    print(x, y, args, kwargs, sep = '\n')
    

# myfun(1, y = 2, 3, 4, 5, p = 100, q = 200)
# SyntaxError: positional argument follows keyword argument (<string>, line 5)

# myfun(1, y = 2, *(3, 4, 5), p = 100, q = 200)
# TypeError: myfun() got multiple values for argument 'y'

myfun(1, 2, 3, 4, 5, p = 100, q = 200)
## 1
## 2
## (3, 4, 5)
## {'p': 100, 'q': 200}
myfun(1, *(3, 4, 5), p = 100, q = 200)
## 1
## 3
## (4, 5)
## {'p': 100, 'q': 200}
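The kind of each parameter can also be read programmatically with the standard inspect module. The sketch below repeats the `myfun` signature with an empty body; the dictionary name `kinds` is illustrative.

```python
import inspect

def myfun(x, y=0, *args, **kwargs):
    pass

# Map each parameter name to the name of its kind
kinds = {name: p.kind.name
         for name, p in inspect.signature(myfun).parameters.items()}

for name, kind in kinds.items():
    print(name, kind)
# x POSITIONAL_OR_KEYWORD
# y POSITIONAL_OR_KEYWORD
# args VAR_POSITIONAL
# kwargs VAR_KEYWORD
```

Both x and y are reported as POSITIONAL_OR_KEYWORD, which matches the calls above: y can be filled either by position (`myfun(1, 2, ...)`) or by name.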

Passing mutable objects as arguments

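  • Passed by reference: in a function call, what is passed is a reference to the object, not a copy of its value. Rebinding a parameter name inside the function (e.g. list2 = [4, 5, 6]) does not affect the caller.
  • Therefore, take special care when the object passed in is mutable (such as a list or dict): modifying its contents inside the function also changes the original object.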
def f(n, list1, list2):
    list1.append(3)
    list2 = [4, 5, 6]
    n = n + 1

x = 5
y = [1, 2]
z = [4, 5]
f(x, y, z)
x, y, z
## (5, [1, 2, 3], [4, 5])
def f(lst):
    lst = lst[:]
    lst.append(3)

x = [1, 2]
f(x)
x
## [1, 2]
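The difference between mutating an object and rebinding a name can be checked with the `is` operator, which tests object identity. This is a small sketch; the function names `append_in_place` and `rebind` are hypothetical.

```python
def append_in_place(lst):
    lst.append(3)                 # mutates the object the caller passed in
    return lst

def rebind(lst):
    lst = lst + [3]               # builds a new list; the caller's object is untouched
    return lst

x = [1, 2]
same = append_in_place(x) is x    # the returned list is the caller's own object
print(x, same)                    # [1, 2, 3] True

y = [1, 2]
new = rebind(y)
print(y, new, new is y)           # [1, 2] [1, 2, 3] False
```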

Keyword-only parameters

  • Parameters that must be passed by keyword (as keyword arguments) and can never be filled by a positional argument.
  • Keyword-only parameters are the named parameters written after *args
def kwonly(a, *args, c):   # c is a keyword-only parameter
    print(a, args, c)
    
kwonly(1, 2, c = 3)
## 1 (2,) 3
kwonly(1, c = 3)
## 1 () 3
# kwonly(1, 2, 3)
# TypeError: kwonly() missing 1 required keyword-only argument: 'c'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
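  • A bare * in the parameter list states that the function does not accept a variable-length argument list, while every parameter after the * must still be passed as a keyword-only argument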
def kwonly(a, *, b, c):
    print(a, b, c)
    
kwonly(1, c = 3, b = 2)
## 1 2 3
kwonly(c= 3, b = 2, a = 1)
## 1 2 3
# kwonly(1, 2, 3)
# TypeError: kwonly() takes 1 positional argument but 3 were given
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>


# kwonly(1)
# TypeError: kwonly() missing 2 required keyword-only arguments: 'b' and 'c'
# 
# Detailed traceback: 
#   File "<string>", line 1, in <module>
  • Keyword-only parameters can also have default values
def kwonly(a, *, b = 0, c = 0):
    print(a, b, c)
    
kwonly(1)
## 1 0 0
kwonly(1, c = 100)
## 1 0 100
  • Keyword-only parameters must come after a single *, not after two asterisks
# def kwonly(a, **args, b, c):
#     pass
#     
# SyntaxError: invalid syntax (<string>, line 1)

# def kwonly(a, **, b, c):
#     pass
# 
# SyntaxError: invalid syntax (<string>, line 6)
  • Therefore, keyword-only parameters must appear before the ** parameter
# def f(a, *b, **d, c = 6): print(a, b, c, d)
# SyntaxError: invalid syntax (<string>, line 1)

def f(a, *b, c = 10, **d): print(a, b, c, d, sep= '\n')

f(1, 2, 3, x = 4, y = 5)
## 1
## (2, 3)
## 10
## {'x': 4, 'y': 5}
f(1, 2, 3, x = 4, y = 5, c = 6)
## 1
## (2, 3)
## 6
## {'x': 4, 'y': 5}
def f(a, c = 6, *b, **d):
    print(a, b, c, d)
    
f(1, 2, 3, 4)
## 1 (3, 4) 2 {}
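Keyword-only parameters also appear in Python's own built-ins. As an illustration chosen here (it is not from the notes), the built-in sorted() accepts `key` and `reverse` only by keyword, so passing them by position fails:

```python
result = sorted([3, 1, 2], reverse=True)   # reverse passed by keyword
print(result)                              # [3, 2, 1]

raised = False
try:
    sorted([3, 1, 2], None, True)          # key and reverse passed by position
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

Forcing such options to be named at the call site keeps calls readable, since a bare `True` would not say what it switches on.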

Using mutable objects as parameter default values

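  • Unlike some other languages, such as R, Python evaluates a parameter's default value once, at the time the function is defined
  • When the default value is an immutable object, this has no visible effect
  • When the default value is a mutable object, problems can arise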
def data_append(v, lst = []):     # the default value of lst is a mutable empty list
    lst.append(v)
    return lst
    
data_append(1)
## [1]
data_append(2)  # note the result: is it [2]?
## [1, 2]

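In the example above, the lst parameter is bound to a mutable list object when the function is defined. That object is not garbage-collected when a call finishes; it is kept for as long as the function exists, so the result of the first call affects the result of the second.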
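The stored default can be inspected directly through the function's `__defaults__` attribute, which makes the sharing visible. This sketch repeats the problematic definition so that it runs on its own; the variable name `shared` is illustrative.

```python
def data_append(v, lst=[]):
    lst.append(v)
    return lst

print(data_append.__defaults__)   # ([],)
data_append(1)
data_append(2)
print(data_append.__defaults__)   # ([1, 2],)

# The list returned by a call is the very object stored as the default
shared = data_append(3) is data_append.__defaults__[0]
print(shared)                     # True
```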

The fix is as follows:

def data_append(v, lst = None):
    if lst is None:
        lst = []
    lst.append(v)
    return lst
    
data_append(1)
## [1]
data_append(2)  # no longer affected by the previous call
## [2]
def data_append(v, lst = None):
    if lst is None:
        lst = []
    lst.append(v)
    return lst
    
lt = [100, 200, 300]
lt
## [100, 200, 300]
data_append(1, lst = lt)
## [100, 200, 300, 1]
lt  # has the outer lt been changed as well?
## [100, 200, 300, 1]

# How should data_append() be fixed?
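One possible answer to the question above, offered as a sketch rather than the only fix, is to copy the list that was passed in before appending. This assumes the caller wants a new list back and expects the original to stay unchanged; `list(lst)` makes a shallow copy.

```python
def data_append(v, lst=None):
    """Return a new list with v appended; the caller's list is left unchanged."""
    if lst is None:
        lst = []
    else:
        lst = list(lst)           # shallow copy, so the caller's object is not mutated
    lst.append(v)
    return lst

lt = [100, 200, 300]
out = data_append(1, lst=lt)
print(out)                        # [100, 200, 300, 1]
print(lt)                         # [100, 200, 300]
```

If in-place modification is actually the intended behavior, the original version is fine; the design choice is whether the function should have side effects on its argument.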