Unit 2: Introduction to the R Language

Basic notebook setup

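rm(list = ls(all.names = TRUE))               # clear the workspace, including hidden objects
knitr::opts_chunk$set(comment = NA)           # print chunk output without a comment prefix
knitr::opts_knit$set(global.par = TRUE)       # keep par() settings across chunks
par(cex = 0.8)                                # scale plot text to 80%
options(scipen = 20, digits = 4, width = 90)  # prefer fixed notation, 4 significant digits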

Install and load a few basic packages

if(!require(devtools)) install.packages("devtools")
if(!require(emo)) devtools::install_github("hadley/emo")
if(!require(pacman)) install.packages("pacman")
pacman::p_load(magrittr)

Please do not modify the code above.

【A】A Minimal R Program

Given a strategy which may lead to $n$ different outcomes …

$p_i$ : the probability of the $i$-th outcome, $i \in [1, n]$
$v_i$ : the payoff of the $i$-th outcome
$\pi$ : the expected payoff of the strategy

\[\pi = \sum_{i=1}^{n} p_i \times v_i \qquad(1)\]

1. Define the data objects: store the payoffs ($v_i$) and the probabilities ($p_i$) in two numeric vectors, v and p.

v = c(100, 50, -50, 0)
p = c(0.1, 0.2, 0.3, 0.4)

2. Evaluate the expression: compute $\pi$ with sum(p * v) and store the result in the object payoff.

payoff = sum(p * v)

3. Print the result: display the value of payoff.

payoff

[1] 5
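As a small sanity check on equation (1), the sketch below confirms that the probabilities sum to 1 and shows the element-wise products before they are added up. The name terms is introduced here for illustration only and does not appear in the original notes.

```r
v = c(100, 50, -50, 0)        # payoffs, as defined above
p = c(0.1, 0.2, 0.3, 0.4)     # probabilities, as defined above

# Probabilities should sum to 1 (all.equal tolerates floating-point rounding)
stopifnot(isTRUE(all.equal(sum(p), 1)))

terms = p * v                 # element-wise products p_i * v_i (hypothetical name)
payoff = sum(terms)           # expected payoff, equation (1)
print(terms)
print(payoff)
```

Looking at terms first makes it easier to see which outcome contributes most to the expected payoff.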

【B】Types of Atomic Values

noTimes = 3L                   # integer
myWeight = 75.2                # numeric (double)
isAsian = TRUE                 # logical (Boolean)
myName = "Tony Chuo"           # character (string)
date1 = as.Date("2019-01-01")  # Date
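One way to confirm the type of each object is the base function class(). The sketch below reuses the five objects defined above; the name types is an assumption made for this example.

```r
noTimes = 3L
myWeight = 75.2
isAsian = TRUE
myName = "Tony Chuo"
date1 = as.Date("2019-01-01")

# Ask R for the class of each object
types = sapply(list(noTimes, myWeight, isAsian, myName, date1), class)
print(types)

# Without the L suffix, a whole number is stored as numeric (double)
print(class(3))
print(is.integer(3L))
```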

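【C】Basic Data Structure: Vector

The vector is R's basic data structure.
※ noTimes, myWeight, isAsian, myName, and date1 are in fact all vectors of length 1.

Vectors of atomic values

noBuy = c(3L, 5L, 1L, 1L, 3L)               # integer vector
height = c(175, 168, 180, 181, 169)         # numeric vector
isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE)  # logical vector

Character vector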

name = c("Amy", "Bob", "Cindy", "Danny", "Edward")

Categorical vector (factor)

gender = factor( c("F", "M", "F", "M", "M") )     
skin_color = factor( c("black", "black", "white", "yellow", "white") )
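A factor stores each category as an integer code plus a set of labels (levels). The following sketch, which is an illustration rather than part of the original notes, inspects the gender factor and also checks that a single value is a vector of length 1.

```r
gender = factor(c("F", "M", "F", "M", "M"))

lv = levels(gender)         # the distinct categories, sorted alphabetically
codes = as.integer(gender)  # the integer code stored for each element
counts = table(gender)      # how many elements fall in each category

print(lv)
print(codes)
print(counts)

# A single value is simply a vector of length 1
print(length(75.2))
```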

【D】Vector Indexing

A vector's index is itself a vector (integer or logical).

Indexing with an integer vector

noBuy[c(1, 5)]

[1] 3 3

height[2:4]

[1] 168 180 181

i = c(1:3, 5)
isMale[i]

[1] FALSE  TRUE FALSE  TRUE

Indexing with a logical vector

noBuy[c(T,T,F,F,T)]

[1] 3 5 3

height[c(T,T,F,F,T)]

[1] 175 168 169

i = c(T,T,F,F,T)
isMale[i]

[1] FALSE  TRUE  TRUE

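When would we index with a logical vector?

When the index is the result of a logical expression. For example, to list the names of all females: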

name[gender == "F"]

[1] "Amy"   "Cindy"
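To see why this works, it can help to look at the intermediate logical vector. The sketch below stores the comparison result in isF (a name assumed for this example) and shows that which() converts it into the equivalent integer index.

```r
name = c("Amy", "Bob", "Cindy", "Danny", "Edward")
gender = factor(c("F", "M", "F", "M", "M"))

isF = gender == "F"    # logical vector, one element per person (hypothetical name)
pos = which(isF)       # positions where isF is TRUE

print(isF)
print(pos)
print(name[isF])       # logical index
print(name[pos])       # equivalent integer index
print(sum(isF))        # TRUE counts as 1, so this is the number of females
```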

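Exercise:
List …
■ the names of all males
■ the names of everyone taller than 180 cm
■ the names of everyone taller than 180 cm whose skin_color is "yellow"
Compute …
■ the mean height of the males
■ the total number of purchases made by the females
■ the number of females whose skin_color is "white"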

name[gender == "M"]

[1] "Bob"    "Danny"  "Edward"

name[height > 180]

[1] "Danny"

name[height > 180 & skin_color == "yellow"]

[1] "Danny"

mean(height[isMale])

[1] 172.7

sum(noBuy[!isMale])

[1] 4

sum(gender == "F" & skin_color == "white")

[1] 1

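【E】Operators

R was originally designed mainly to simplify vector computation.
※ Arithmetic operators and most built-in functions work directly on vectors, element by element.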

c(1, 2, 3, 4) * c(1, 10, 100, 1000)

[1]    1   20  300 4000

Consecutive integers

1:6

[1] 1 2 3 4 5 6

Exponentiation and scientific notation

10^(-2:3)

[1]    0.01    0.10    1.00   10.00  100.00 1000.00

When the vectors differ in length, the shorter one is recycled (repeated) to match the longer one

c(100, 200, 300, 400) / c(10, 20)

[1] 10 10 30 20

A single value is a vector of length 1

c(100, 200, 300, 400) / 10

[1] 10 20 30 40

c(10,20,30,40,50,60,70,80) + c(1, 2, 3)

Warning in c(10, 20, 30, 40, 50, 60, 70, 80) + c(1, 2, 3): longer object length is not a
multiple of shorter object length

[1] 11 22 33 41 52 63 71 82
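The recycling rule can be made explicit with rep(). The sketch below, added here as an illustration, expands the shorter vector by hand and checks that the result matches what R computes on its own. The names x, y, and y_recycled are assumptions for this example.

```r
x = c(100, 200, 300, 400)
y = c(10, 20)

# Repeat y until it is as long as x: 10 20 10 20
y_recycled = rep(y, length.out = length(x))
print(y_recycled)
print(identical(x / y, x / y_recycled))

# When the longer length is not a multiple of the shorter one,
# the last repetition is cut off, which is what the warning reports
z = rep(c(1, 2, 3), length.out = 8)
print(z)
print(c(10, 20, 30, 40, 50, 60, 70, 80) + z)
```

Expanding the vector explicitly avoids the warning and documents the intent.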

Assigning a name to an object: the assignment operators = and <-

Prob = c(0.1, 0.2, 0.3, 0.4)
Value = c(120, 100, -50, -60)
Prob * Value

[1]  12  20 -15 -24

name[skin_color == "black"]

[1] "Amy" "Bob"
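The two assignment operators are easy to confuse with the comparison operator ==. The following sketch is an illustration with made-up object names a and b: at the top level = and <- both bind a name to a value, whereas == compares element by element and assigns nothing.

```r
a = c(1, 2, 3)     # assignment with =
b <- c(1, 2, 3)    # assignment with <-

same = identical(a, b)   # both operators created the same kind of object
cmp = a == b             # element-wise comparison, returns a logical vector

print(same)
print(cmp)
```

Inside a function call, = names an argument instead of creating an object, as in log(x = 1000, base = 10).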

【F】Functions and Arguments

The expected payoff is $\sum p \times v$:

expPayoff = sum(Prob * Value)
expPayoff

[1] -7

The first argument of a built-in function is usually a vector.

sqrt(1:9)

[1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000

R functions often take several arguments. For each argument, pay attention to its:
※ name
※ position
※ default value

log(1000)

[1] 6.908

help(log)

log(x=1000, base=10)

[1] 3

log(1000,10)

[1] 3
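The three points above (name, position, default value) can be checked directly on log(). This is a minimal sketch; the names byName, swapped, byPosition, and natural are assumptions made for the example.

```r
byName = log(x = 1000, base = 10)     # arguments matched by name
swapped = log(base = 10, x = 1000)    # named arguments may come in any order
byPosition = log(1000, 10)            # unnamed arguments are matched by position

# The default base is e, so log(1000) is the natural logarithm
natural = log(1000)

print(c(byName, swapped, byPosition))
print(isTRUE(all.equal(natural, log(1000, base = exp(1)))))
```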

Nested function calls (round(x, 2) rounds to two decimal places)

round(sqrt(1:9), 2)

[1] 1.00 1.41 1.73 2.00 2.24 2.45 2.65 2.83 3.00

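The pipe operator %>%: the left-hand side, sqrt(1:9), is passed as the first argument of round(), so round(3) rounds it to three decimal places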

sqrt(1:9) %>% round(3)

[1] 1.000 1.414 1.732 2.000 2.236 2.449 2.646 2.828 3.000
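To make the equivalence concrete, the sketch below compares the nested call with the piped call. It assumes the magrittr package is installed, as in the setup section; the names nested, piped, and chained are introduced only for this example.

```r
library(magrittr)   # provides %>%

nested = round(sqrt(1:9), 3)       # read from the inside out
piped = sqrt(1:9) %>% round(3)     # read from left to right

print(identical(nested, piped))

# Pipes can be chained; each step receives the previous result as its first argument
chained = 1:9 %>% sqrt %>% round(3)
print(identical(chained, nested))
```

The piped form tends to be easier to read once more than two functions are involved.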

【G】Data Frames

Typically each row (record) of a data frame represents one unit of analysis, and each column holds one attribute of those units.

df = data.frame(
  noBuy = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  gender = factor( c("F", "M", "F", "M", "M") ),
  skin_color = factor( c("black", "black", "white", "yellow", "white")),
  stringsAsFactors=FALSE
  )
df

  noBuy height isMale   name gender skin_color
1     3    175  FALSE    Amy      F      black
2     5    168   TRUE    Bob      M      black
3     1    180  FALSE  Cindy      F      white
4     1    181   TRUE  Danny      M     yellow
5     3    169   TRUE Edward      M      white

Easy filtering

subset(df, isMale & skin_color == "black")

  noBuy height isMale name gender skin_color
2     5    168   TRUE  Bob      M      black

Easy summary statistics

mean(df$height)

[1] 174.6

Group-wise computation

tapply(df$height, df$gender, mean)

    F     M 
177.5 172.7
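One way to picture what tapply() does is to split the values by group and apply the function to each piece. The sketch below does this by hand with split() and sapply() on a two-column version of the data frame; groups, byHand, and byTapply are names assumed for this example.

```r
df = data.frame(
  height = c(175, 168, 180, 181, 169),
  gender = factor(c("F", "M", "F", "M", "M"))
)

groups = split(df$height, df$gender)    # list with one height vector per gender
byHand = sapply(groups, mean)           # mean of each piece
byTapply = tapply(df$height, df$gender, mean)

print(groups)
print(byHand)
print(isTRUE(all.equal(as.vector(byHand), as.vector(byTapply))))
```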

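【H】Data Frame Indexing

A data frame can be indexed in many ways, and these alone are enough for fairly detailed exploratory data analysis:

Integer vector index: df[c(1,2), c(2,3)] (columns 2 and 3 of rows 1 and 2)
Empty index: df[c(1,2), ] (all columns)
Logical vector index: df[df$height < 175 & df$isMale, ] (all columns)
Character vector index: df[df$height < 175 & df$isMale, "name"] (the name column)
Single-column operator ($): df$name[df$height < 175 & df$isMale]
The subset() function:
- subset(df, height<175 & isMale)
- subset(df, height<175 & isMale, name) (selects the same values as df[df$height < 175 & df$isMale, "name"], but returns a one-column data frame rather than a vector)
- subset(df, height<175 & isMale)$name
- subset(df, height<175 & isMale, c(name, noBuy))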

df[c(1,2), c(2,3)]

  height isMale
1    175  FALSE
2    168   TRUE

df[c(1,2), ]

  noBuy height isMale name gender skin_color
1     3    175  FALSE  Amy      F      black
2     5    168   TRUE  Bob      M      black

df[df$height < 175 & df$isMale, ]

  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white

df[df$height < 175 & df$isMale, "name"]

[1] "Bob"    "Edward"

df$name[df$height < 175 & df$isMale]

[1] "Bob"    "Edward"

subset(df, height<175 & isMale)

  noBuy height isMale   name gender skin_color
2     5    168   TRUE    Bob      M      black
5     3    169   TRUE Edward      M      white

subset(df, height<175 & isMale, name)

    name
2    Bob
5 Edward

subset(df, height<175 & isMale)$name

[1] "Bob"    "Edward"

subset(df, height<175 & isMale, c(name, noBuy))

    name noBuy
2    Bob     5
5 Edward     3
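The outputs above differ in shape: some are character vectors and some are one-column data frames. The sketch below, an illustration with assumed names rows, asVector, asFrame, and viaSubset, shows that selecting a single column with [ returns a vector unless drop = FALSE is given, while subset() keeps the data frame.

```r
df = data.frame(
  noBuy = c(3L, 5L, 1L, 1L, 3L),
  height = c(175, 168, 180, 181, 169),
  isMale = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  name = c("Amy", "Bob", "Cindy", "Danny", "Edward"),
  stringsAsFactors = FALSE
)

rows = df$height < 175 & df$isMale            # logical row index

asVector = df[rows, "name"]                   # single column: simplified to a vector
asFrame = df[rows, "name", drop = FALSE]      # drop = FALSE keeps the data frame
viaSubset = subset(df, height < 175 & isMale, name)

print(is.character(asVector))
print(is.data.frame(asFrame))
print(is.data.frame(viaSubset))
print(identical(asVector, viaSubset$name))
```

Keeping track of which form is returned matters when the result is passed on to functions such as nrow() or mean().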

Exercise:
What does each of the following expressions mean? …
■ df$name[df$isMale]
■ df[df$height > 180 , "name"]
■ subset(df, height > 170 & !isMale)$name
■ mean(df$height[df$isMale])
■ df$height[!df$isMale] %>% mean
■ sum( subset(df, !isMale)$noBuy )
■ subset(df, skin_color == "white" & !isMale ) %>% nrow
■ sum(df$skin_color == "white" & !df$isMale )

Names of all males

df$name[df$isMale]

[1] "Bob"    "Danny"  "Edward"

Names of everyone taller than 180

df[df$height > 180 , "name"]

[1] "Danny"

Names of the females taller than 170

subset(df, height > 170 & !isMale)$name

[1] "Amy"   "Cindy"

Mean height of the males

mean(df$height[df$isMale])

[1] 172.7

Mean height of the females

df$height[!df$isMale] %>% mean

[1] 177.5

Total number of purchases made by all females

sum( subset(df, !isMale)$noBuy )

[1] 4

Number of females whose skin_color is "white" (counting the rows of the subset)

subset(df, skin_color == "white" & !isMale ) %>% nrow

[1] 1

Number of females whose skin_color is "white" (summing a logical vector)

sum(df$skin_color == "white" & !df$isMale )

[1] 1

王子菁, College of Management, Sun Yat-sen University

2019-03-03 15:20:20
