Install the required packages
packages = c("dplyr","ggplot2")
existing = as.character(installed.packages()[,1])
for(pkg in packages[!(packages %in% existing)]) install.packages(pkg)
Load the packages and data
require(ggplot2)
require(dplyr)
data("diamonds")
2.1 Common dplyr verbs
1. select(col1, col2): select the columns col1 and col2 from the dataset.
str(diamonds)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 53940 obs. of 10 variables:
$ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num 55 61 65 58 58 57 57 55 61 61 ...
$ price : int 326 326 327 334 335 336 336 337 337 338 ...
$ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
There are 10 variables in total.
# SQL equivalent: SELECT carat, cut FROM diamonds;
str(diamonds %>% select(carat, cut))
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 53940 obs. of 2 variables:
$ carat: num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
After selecting, only 2 variables remain.
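select() also accepts a few other forms of column specification. The sketch below is illustrative only; the object names (no_dims, first_four, renamed) are made up for this example and do not appear in the original notebook.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Drop columns with a minus sign: keep everything except x, y and z.
no_dims <- diamonds %>% select(-x, -y, -z)
names(no_dims)

# Select a contiguous range of columns by name.
first_four <- diamonds %>% select(carat:clarity)
names(first_four)

# Select and rename in one step (new_name = old_name).
renamed <- diamonds %>% select(weight = carat, cut)
names(renamed)
```

In SQL terms, the last call roughly corresponds to SELECT carat AS weight, cut FROM diamonds;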
2. distinct(): remove duplicate rows from the dataset.
summary(diamonds %>% select(cut))
cut
Fair : 1610
Good : 4906
Very Good:12082
Premium :13791
Ideal :21551
Each of the five levels has many rows.
# SQL equivalent: SELECT DISTINCT cut FROM diamonds;
summary(diamonds %>% select(cut) %>% distinct())
cut
Fair :1
Good :1
Very Good:1
Premium :1
Ideal :1
Rows that are exactly identical are removed, so each level is left with a single row.
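As a small variation on the example above, distinct() can take column names directly, so the select() step is optional. This is a sketch; the object names cut_levels and cut_color are assumptions introduced here.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Passing a column to distinct() keeps one row per unique value of that column.
cut_levels <- diamonds %>% distinct(cut)
nrow(cut_levels)

# With several columns, distinct() keeps one row per unique combination.
cut_color <- diamonds %>% distinct(cut, color)
nrow(cut_color)

# n_distinct() just counts the unique values without building a new table.
n_distinct(diamonds$cut)
```

The two-column form roughly corresponds to SELECT DISTINCT cut, color FROM diamonds;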
3. filter(condition): keep the observations that satisfy the condition.
str(diamonds)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 53940 obs. of 10 variables:
$ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num 55 61 65 58 58 57 57 55 61 61 ...
$ price : int 326 326 327 334 335 336 336 337 337 338 ...
$ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
The diamonds dataset has 53940 rows and 10 columns.
# SQL equivalent: SELECT * FROM diamonds WHERE cut = 'Fair';
diamonds_Fair <- diamonds %>% filter(cut == "Fair")
str(diamonds_Fair)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1610 obs. of 10 variables:
$ carat : num 0.22 0.86 0.96 0.7 0.7 0.91 0.91 0.98 0.84 1.01 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 3 3 3 5 5 5 4 2 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 4 2 2 4 4 2 2 2 3 1 ...
$ depth : num 65.1 55.1 66.3 64.5 65.3 64.4 65.7 67.9 55.1 64.5 ...
$ table : num 61 69 62 57 55 57 60 60 67 58 ...
$ price : int 337 2757 2759 2762 2762 2763 2763 2777 2782 2788 ...
$ x : num 3.87 6.45 6.27 5.57 5.63 6.11 6.03 6.05 6.39 6.29 ...
$ y : num 3.78 6.33 5.95 5.53 5.58 6.09 5.99 5.97 6.2 6.21 ...
$ z : num 2.49 3.52 4.07 3.58 3.66 3.93 3.95 4.08 3.47 4.03 ...
In the diamonds dataset, 1610 rows have a cut grade of "Fair".
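filter() can combine several conditions. The following sketch shows two common patterns; the threshold of 1 carat and the object names are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Conditions separated by commas are combined with AND.
# SQL equivalent: SELECT * FROM diamonds WHERE cut = 'Fair' AND carat > 1;
fair_heavy <- diamonds %>% filter(cut == "Fair", carat > 1)
nrow(fair_heavy)

# %in% tests membership in a set of values.
# SQL equivalent: SELECT * FROM diamonds WHERE cut IN ('Premium', 'Ideal');
top_cuts <- diamonds %>% filter(cut %in% c("Premium", "Ideal"))
nrow(top_cuts)
```

Given the level counts printed in the distinct() example (13791 Premium and 21551 Ideal), top_cuts should contain 35342 rows.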
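4. group_by() + summarise(): grouped aggregation
- group_by(): groups the data by the specified column(s).
- summarise(col_name = fun(col)): computes an aggregate of a column within each group.
  - col_name: the name of the new column.
  - fun: the aggregate function, e.g. sum() for the total, mean() for the average, max() for the maximum.
  - col: the column to aggregate.
Suppose each row of the diamonds dataset describes one diamond, and the cut of a diamond takes one of five grades. We want the total carat weight for each cut grade.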
# SQL equivalent: SELECT cut, SUM(carat) FROM diamonds GROUP BY cut;
diamonds %>%
  group_by(cut) %>%
  summarise(sum_carat = sum(carat))
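summarise() can compute several aggregates in one call. A minimal sketch, assuming we also want the row count, mean carat and maximum price per cut (the column names n, mean_carat and max_price are chosen here for illustration):

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# SQL equivalent:
#   SELECT cut, COUNT(*), AVG(carat), MAX(price) FROM diamonds GROUP BY cut;
by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n          = n(),          # number of rows in each group
    mean_carat = mean(carat),  # average carat per group
    max_price  = max(price)    # highest price per group
  )
by_cut
```

The result has one row per cut grade, so five rows in total.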
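5. mutate(col_name = ____): add a new column to the dataset
- col_name: the name of the new column.
- ____: the expression that computes the new column.
head(diamonds)
Suppose we want to add a new column "average_price" that holds each diamond's price per carat.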
diamonds %>% mutate(average_price = price / carat)
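Two details are worth sketching here: mutate() returns a new table rather than modifying diamonds in place, and a column created in a mutate() call can be used by later expressions in the same call. The 5000 cutoff and the names with_ppc and pricey are arbitrary, hypothetical choices.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

with_ppc <- diamonds %>%
  mutate(
    average_price = price / carat,        # price per carat
    pricey        = average_price > 5000  # uses the column defined just above
  )

ncol(diamonds)  # the original dataset is unchanged
ncol(with_ppc)  # the result has two extra columns
```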
7. arrange(col): sort the observations by a column.
head(diamonds)
The rows shown above are not sorted by carat.
# SQL equivalent (applied here to the first 6 rows only): SELECT * FROM diamonds ORDER BY carat DESC;
head(diamonds) %>% arrange(desc(carat))
The rows are sorted by carat in descending order.
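arrange() also accepts several sort keys; later keys break ties in earlier ones. A brief sketch (the object name sorted is an assumption for this example):

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# SQL equivalent: SELECT * FROM diamonds ORDER BY cut ASC, price DESC;
# cut is an ordered factor, so it sorts by level order ("Fair" first).
sorted <- diamonds %>% arrange(cut, desc(price))
head(sorted)
```

The first row of the result is the most expensive diamond among those with a "Fair" cut.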
8. top_n(n = num, wt = column): return the num rows with the largest values of column
head(diamonds)
# SQL equivalent: SELECT * FROM diamonds ORDER BY carat DESC LIMIT 3;
head(diamonds) %>% top_n(3, wt = carat)
From the first 6 rows of diamonds, pick the three rows with the largest carat values.
# SQL equivalent: SELECT * FROM diamonds ORDER BY carat ASC LIMIT 3;
head(diamonds) %>% top_n(-3, wt = carat)
From the first 6 rows of diamonds, pick the three rows with the smallest carat values.
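In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max() and slice_min(). Assuming such a version is installed, the two top_n() calls above could be sketched as follows (first_six, largest and smallest are names introduced here for illustration):

```r
library(dplyr)    # slice_max() / slice_min() need dplyr >= 1.0.0
library(ggplot2)  # provides the diamonds dataset

first_six <- head(diamonds)

# The three rows with the largest carat, returned in descending order.
largest <- first_six %>% slice_max(carat, n = 3)
largest$carat

# The three rows with the smallest carat, returned in ascending order.
smallest <- first_six %>% slice_min(carat, n = 3)
smallest$carat
```

Like top_n(), both functions keep tied rows by default (with_ties = TRUE), so they can return more than n rows when values are tied at the cutoff.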
For a more detailed introduction, see: this tutorial
For the full function reference, see: dplyr Cheat Sheet
2.2 ggplot2
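ggplot2 is a widely used plotting system for R. It treats a graphic as a series of transformations that starts from the data. The full process involves many steps and a fair amount of syntax, so this section introduces only the most basic syntax and concepts.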

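For more detailed ggplot2 examples, see: this tutorial
For the full function reference, see: ggplot2 Cheat Sheet
ggplot2 treats plotting as a series of transformations that starts from the data. This tutorial covers the three lowest layers. Source: GTW Blog
diamonds # use the diamonds dataset bundled with ggplot2 as the example data
1. Data: pass the data to the ggplot() function.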
plot <- diamonds %>% ggplot()
plot

2. Aesthetics: decide which columns are mapped to which dimensions of the plot.
diamonds %>% ggplot(aes(x = clarity, y = price))

Use aes() to specify which columns go on the x axis and the y axis.
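aes() is not limited to x and y. As a sketch, the example below also maps the cut column to the fill colour; it borrows geom_boxplot() from the Geometries step so that the mapping has something to draw. The object name p_fill is an assumption for this example.

```r
library(ggplot2)  # provides ggplot(), aes() and the diamonds dataset

# Map clarity to x, price to y, and cut to the fill colour of each box.
p_fill <- ggplot(diamonds, aes(x = clarity, y = price, fill = cut)) +
  geom_boxplot()

# print(p_fill) draws the plot: one box per clarity/cut combination.
```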
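3. Geometries: choose the type of plot.
ggplot2 ships with several geometry types, including:
- geom_bar(): bar chart for a discrete variable
- geom_histogram(): histogram for a continuous variable
- geom_line(): line chart
- geom_boxplot(): box plot
Different geometries require different aesthetics. Here are a few examples: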
diamonds %>% ggplot(aes(x = clarity, y = price)) + geom_boxplot()

diamonds %>% ggplot(aes(x = clarity)) + geom_bar()

diamonds %>% ggplot(aes(x = price)) + geom_histogram()
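Two follow-ups to the examples above, offered as a sketch. First, geom_histogram() falls back to 30 bins and prints a message when no bin setting is given, so an explicit binwidth is usually clearer; the value 500 is an arbitrary choice. Second, geom_line() from the list above can be fed by a dplyr summary; rounding carat to one decimal place is an assumption made only to get one point per x value.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Histogram of price with an explicit bin width of 500.
p_hist <- diamonds %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 500)

# Line chart: mean price for each carat value rounded to 0.1.
p_line <- diamonds %>%
  mutate(carat_bin = round(carat, 1)) %>%
  group_by(carat_bin) %>%
  summarise(mean_price = mean(price)) %>%
  ggplot(aes(x = carat_bin, y = mean_price)) +
  geom_line()

# print(p_hist) and print(p_line) draw the plots.
```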
