Data table examples using public data

Introduction.

Setup before starting.

First, set the working directory to the folder that holds the downloaded files. If data.table and ggplot2 are not installed yet, install both packages first.

setwd("C:/Users/kwon/Documents/dt_ex")

rm(list = ls())
library(data.table)
library(ggplot2)
source("functions/ExtractIsoTime.R")
source("functions/wtf.R")

Reading the CSV file.

setwd('C:/Users/kwon/Documents/dt_ex')
rawdat = read.table (file = "Database 2013-01-21 (8zQ4cW7T).csv" , sep = "," ,
                     quote = "\"", flush = F, header = T, nrows = -1, fill = F,
                     stringsAsFactors = F, na.strings = c("None",""))
str(rawdat)
## 'data.frame':    19207 obs. of  11 variables:
##  $ charges_citation       : chr  "720 ILCS 5 12-3.4(a)(2) [16145" "625 ILCS 5 6-101 [12935]" "720 ILCS 5 12-3(a)(1) [10529]" "720 ILCS 550 5(c) [5020200]" ...
##  $ race                   : chr  "WH" "LW" "BK" "BK" ...
##  $ age_at_booking         : int  26 37 18 32 49 26 41 56 40 20 ...
##  $ gender                 : chr  "M" "M" "M" "F" ...
##  $ booking_date           : chr  "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" ...
##  $ jail_id                : chr  "2013-0120171" "2013-0120170" "2013-0120169" "2013-0120167" ...
##  $ bail_status            : chr  NA NA NA NA ...
##  $ housing_location       : chr  "05-" "05-" "05-L-2-2-1" "17-WR-N-A-2" ...
##  $ charges                : chr  NA NA NA NA ...
##  $ bail_amount            : int  5000 10000 5000 50000 5000 5000 25000 5000 25000 10000 ...
##  $ discharge_date_earliest: chr  NA NA NA NA ...
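
As an aside, data.table ships its own reader, fread(), which returns a data.table directly. The sketch below is not part of the original workflow: it writes a tiny made-up CSV to a temporary file (the rows are invented for illustration) and reads it back with the same na.strings setting used above.

```r
library(data.table)

# Write a tiny, made-up CSV to a temporary file so the example runs on its own.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "race,age_at_booking,bail_status",
  "WH,26,None",
  "BK,18,NO BOND",
  "LW,,None"
), tmp)

# fread() returns a data.table; "None" and empty fields are read as NA.
small <- fread(tmp, sep = ",", header = TRUE, na.strings = c("None", ""))
str(small)
```

With fread() the separate as.data.table() step is not needed, although read.table() followed by a conversion works just as well.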

Next, convert rawdat to a data.table.

dat = as.data.table(rawdat)
str(dat)
## Classes 'data.table' and 'data.frame':   19207 obs. of  11 variables:
##  $ charges_citation       : chr  "720 ILCS 5 12-3.4(a)(2) [16145" "625 ILCS 5 6-101 [12935]" "720 ILCS 5 12-3(a)(1) [10529]" "720 ILCS 550 5(c) [5020200]" ...
##  $ race                   : chr  "WH" "LW" "BK" "BK" ...
##  $ age_at_booking         : int  26 37 18 32 49 26 41 56 40 20 ...
##  $ gender                 : chr  "M" "M" "M" "F" ...
##  $ booking_date           : chr  "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" ...
##  $ jail_id                : chr  "2013-0120171" "2013-0120170" "2013-0120169" "2013-0120167" ...
##  $ bail_status            : chr  NA NA NA NA ...
##  $ housing_location       : chr  "05-" "05-" "05-L-2-2-1" "17-WR-N-A-2" ...
##  $ charges                : chr  NA NA NA NA ...
##  $ bail_amount            : int  5000 10000 5000 50000 5000 5000 25000 5000 25000 10000 ...
##  $ discharge_date_earliest: chr  NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>

Next, convert booking_date and discharge_date_earliest from ISO strings to a date-time representation.

e.g. 2013-01-20T00:00:00 -> 2013-01-20 00:00:00

dat[, `:=`(booking_date, ExtractIsoTime(dat$booking_date))]
##                      charges_citation race age_at_booking gender
##     1: 720 ILCS 5 12-3.4(a)(2) [16145   WH             26      M
##     2:       625 ILCS 5 6-101 [12935]   LW             37      M
##     3:  720 ILCS 5 12-3(a)(1) [10529]   BK             18      M
##     4:    720 ILCS 550 5(c) [5020200]   BK             32      F
##     5: 720 ILCS 5 12-3.2(a)(2) [10418   LW             49      M
##    ---                                                          
## 19203:                    95.5-11-501   WH             31      M
## 19204:                      56.5-1402   LW             28      M
## 19205:                      56.5-1402   BK             36      M
## 19206:                        38-10-5   LW             23      M
## 19207:                      56.5-1401   LW             27      M
##        booking_date      jail_id bail_status housing_location charges
##     1:   2013-01-20 2013-0120171          NA              05-      NA
##     2:   2013-01-20 2013-0120170          NA              05-      NA
##     3:   2013-01-20 2013-0120169          NA       05-L-2-2-1      NA
##     4:   2013-01-20 2013-0120167          NA      17-WR-N-A-2      NA
##     5:   2013-01-20 2013-0120165          NA              05-      NA
##    ---                                                               
## 19203:   1995-05-09 1995-9532061     NO BOND          15-DRAW      NA
## 19204:   1994-09-22 1994-9459745          NA          15-EMAW      NA
## 19205:   1993-09-24 1993-9357382          NA          15-DRAW      NA
## 19206:   1993-05-07 1993-9326844          NA          15-EMAW      NA
## 19207:   1993-01-16 1993-9303175          NA          15-EMAW      NA
##        bail_amount discharge_date_earliest
##     1:        5000                      NA
##     2:       10000                      NA
##     3:        5000                      NA
##     4:       50000                      NA
##     5:        5000                      NA
##    ---                                    
## 19203:          NA                      NA
## 19204:      250000                      NA
## 19205:       10000                      NA
## 19206:       60000                      NA
## 19207:      100000                      NA

The same is done for discharge_date_earliest:

dat[, `:=`(discharge_date_earliest, ExtractIsoTime(dat$discharge_date_earliest))]
##                      charges_citation race age_at_booking gender
##     1: 720 ILCS 5 12-3.4(a)(2) [16145   WH             26      M
##     2:       625 ILCS 5 6-101 [12935]   LW             37      M
##     3:  720 ILCS 5 12-3(a)(1) [10529]   BK             18      M
##     4:    720 ILCS 550 5(c) [5020200]   BK             32      F
##     5: 720 ILCS 5 12-3.2(a)(2) [10418   LW             49      M
##    ---                                                          
## 19203:                    95.5-11-501   WH             31      M
## 19204:                      56.5-1402   LW             28      M
## 19205:                      56.5-1402   BK             36      M
## 19206:                        38-10-5   LW             23      M
## 19207:                      56.5-1401   LW             27      M
##        booking_date      jail_id bail_status housing_location charges
##     1:   2013-01-20 2013-0120171          NA              05-      NA
##     2:   2013-01-20 2013-0120170          NA              05-      NA
##     3:   2013-01-20 2013-0120169          NA       05-L-2-2-1      NA
##     4:   2013-01-20 2013-0120167          NA      17-WR-N-A-2      NA
##     5:   2013-01-20 2013-0120165          NA              05-      NA
##    ---                                                               
## 19203:   1995-05-09 1995-9532061     NO BOND          15-DRAW      NA
## 19204:   1994-09-22 1994-9459745          NA          15-EMAW      NA
## 19205:   1993-09-24 1993-9357382          NA          15-DRAW      NA
## 19206:   1993-05-07 1993-9326844          NA          15-EMAW      NA
## 19207:   1993-01-16 1993-9303175          NA          15-EMAW      NA
##        bail_amount discharge_date_earliest
##     1:        5000                    <NA>
##     2:       10000                    <NA>
##     3:        5000                    <NA>
##     4:       50000                    <NA>
##     5:        5000                    <NA>
##    ---                                    
## 19203:          NA                    <NA>
## 19204:      250000                    <NA>
## 19205:       10000                    <NA>
## 19206:       60000                    <NA>
## 19207:      100000                    <NA>
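
The helper ExtractIsoTime() is sourced from functions/ExtractIsoTime.R and its body is not shown here. The following is only a guess at what such a helper might do: parse_iso_time is a hypothetical name, and the UTC time zone is an assumption.

```r
library(data.table)

# Hypothetical stand-in for ExtractIsoTime(); the real helper is not shown.
# Parses strings such as "2013-01-20T00:00:00" into POSIXct (UTC assumed).
parse_iso_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

toy <- data.table(booking_date = c("2013-01-20T00:00:00", NA))

# Update the column in place; missing strings stay NA after parsing.
toy[, booking_date := parse_iso_time(booking_date)]
str(toy)
```

Inside `[...]` the column can be referred to by its bare name, so the `dat$` prefix is not required.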

Subsetting examples with data.table.

Suppose we want to extract only the race variable, which is the 2nd column:

## Wrong way (1)
dat[, 2]
## [1] 2
## Wrong way (2)
dat[, "race"]
## [1] "race"
## This is the correct way
dat[1:10, race]
##  [1] "WH" "LW" "BK" "BK" "LW" "BK" "BK" "BK" "LW" "BK"
## Or, to pick a column by position (here column 3, age_at_booking), add with = FALSE
dat[1:10, 3, with = F]
##     age_at_booking
##  1:             26
##  2:             37
##  3:             18
##  4:             32
##  5:             49
##  6:             26
##  7:             41
##  8:             56
##  9:             40
## 10:             20

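When subsetting a data.table, the row selector goes in the first position (i) and the column selector in the second (j). The key point is that j is evaluated as an expression inside the data.table, not as a column index: a bare number or a quoted name simply evaluates to itself, which is what the first two attempts show. To pick a column, write its unquoted name, or add with = FALSE when you want to give the column by position or as a character string. Note that this output comes from an older data.table release; in later releases (1.9.8 onward) dat[, 2] and dat[, "race"] return the column itself.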
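
The column-selection rules can be tried on a small table. This is a minimal sketch with made-up rows (toy is not part of the jail data); it also shows a row filter in i combined with a column in j, for example the ages of women only.

```r
library(data.table)

# Made-up rows that mimic a few columns of dat.
toy <- data.table(
  charges        = c("a", "b", "c", "d"),
  race           = c("WH", "BK", "LW", "BK"),
  age_at_booking = c(26L, 18L, 37L, 32L),
  gender         = c("M", "F", "M", "F")
)

# Unquoted column name in j: returns the column as a vector.
toy[, race]

# Position or quoted name: with = FALSE returns a one-column data.table.
toy[, 2, with = FALSE]
toy[, "race", with = FALSE]

# Logical expression in i, column in j: ages of the women only.
toy[gender == "F", age_at_booking]
```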

An exception.

dat[1]
##                  charges_citation race age_at_booking gender booking_date
## 1: 720 ILCS 5 12-3.4(a)(2) [16145   WH             26      M   2013-01-20
##         jail_id bail_status housing_location charges bail_amount
## 1: 2013-0120171          NA              05-      NA        5000
##    discharge_date_earliest
## 1:                    <NA>

The code above shows that if you specify the rows you want and leave out the column, all variables for those rows are returned.

Grouping and aggregating examples with data.table.

## Grouping is simple (1)
dat[, mean(age_at_booking), by = race]
##    race    V1
## 1:   WH 35.04
## 2:   LW 31.45
## 3:   BK 31.64
## 4:   LT 29.41
## 5:   AS 33.20
## 6:    W 29.89
## 7:    B 29.86
## 8:   LB 31.97
## 9:   IN 27.00

The code above computes the mean age for each of the 9 race values and returns it as a new column (named V1 by default).

## (2)
dat[, age_at_booking, by = race]
##        race age_at_booking
##     1:   WH             26
##     2:   WH             52
##     3:   WH             58
##     4:   WH             39
##     5:   WH             26
##    ---                    
## 19203:   IN             37
## 19204:   IN             35
## 19205:   IN             19
## 19206:   IN             27
## 19207:   IN             18

The code above groups by race and lists the age of every observation within each group.

## (3)
dat[i = TRUE, j = list(mean = mean(age_at_booking), sd = sd(age_at_booking)), 
    by = race]
##    race  mean    sd
## 1:   WH 35.04 11.90
## 2:   LW 31.45 10.05
## 3:   BK 31.64 12.02
## 4:   LT 29.41 10.35
## 5:   AS 33.20 12.65
## 6:    W 29.89 10.54
## 7:    B 29.86 10.63
## 8:   LB 31.97 12.08
## 9:   IN 27.00 10.25

The code above creates new columns holding the mean and the standard deviation of age for each race.
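
To check by hand what a grouped summary returns, the same call can be run on a few made-up rows where the answer is easy to compute. The toy table below is an illustration only, and the comparison with tapply() is just one possible cross-check.

```r
library(data.table)

# Made-up ages: WH has mean 25, BK has mean 35.
toy <- data.table(
  race           = c("WH", "WH", "BK", "BK", "BK"),
  age_at_booking = c(20L, 30L, 25L, 35L, 45L)
)

# One row per group, with named summary columns.
res <- toy[, list(mean = mean(age_at_booking), sd = sd(age_at_booking)),
           by = race]
res

# Cross-check of the group means with base R.
tapply(toy$age_at_booking, toy$race, mean)
```

Groups appear in the result in order of first appearance, so WH comes before BK here.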

## (4)
dat[i = TRUE, j = list(mean = mean(age_at_booking), sd = sd(age_at_booking), 
    age_at_booking), by = race]
##        race  mean    sd age_at_booking
##     1:   WH 35.04 11.90             26
##     2:   WH 35.04 11.90             52
##     3:   WH 35.04 11.90             58
##     4:   WH 35.04 11.90             39
##     5:   WH 35.04 11.90             26
##    ---                                
## 19203:   IN 27.00 10.25             37
## 19204:   IN 27.00 10.25             35
## 19205:   IN 27.00 10.25             19
## 19206:   IN 27.00 10.25             27
## 19207:   IN 27.00 10.25             18

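In the result above, age_at_booking is included even though no aggregating function is applied to it. data.table automatically recycles the per-group mean and sd to the length of each group, which expands the result to one row per observation.

J (join) and CJ (cross join)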
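
The same recycling is what makes per-group calculations such as centering convenient. A minimal sketch on made-up rows follows; the column name age_centered is chosen here for illustration.

```r
library(data.table)

toy <- data.table(
  race           = c("WH", "WH", "BK", "BK", "BK"),
  age_at_booking = c(20L, 30L, 25L, 35L, 45L)
)

# The length-1 group mean is recycled next to every age in the group.
expanded <- toy[, list(mean = mean(age_at_booking), age_at_booking), by = race]
expanded

# The same idea with :=, which adds a column by reference:
# each age minus the mean age of its own race.
toy[, age_centered := age_at_booking - mean(age_at_booking), by = race]
toy
```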

datSmall = dat[, list(race, gender, charges_citation, housing_location)]
## Count of observations by race (1)
datSmall[, .N, by = race]
##    race     N
## 1:   WH  2015
## 2:   LW  1239
## 3:   BK 13879
## 4:   LT  1803
## 5:   AS   117
## 6:    W    44
## 7:    B    29
## 8:   LB    68
## 9:   IN    13

The code above uses .N to show how many observations there are for each race.
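
Counts from .N can be processed further by chaining a second `[...]`. The sketch below uses made-up rows, sorts the groups by size and adds each group's share; the column name share is an assumption made for this example.

```r
library(data.table)

toy <- data.table(race = c("WH", "BK", "BK", "AS", "WH", "BK"))

# Count rows per race, then sort the counts in decreasing order.
counts <- toy[, .N, by = race][order(-N)]

# Add each group's share of all rows.
counts[, share := N / sum(N)]
counts
```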

## 'setkey' sorts the table by race and marks race as the key (2)
setkey(datSmall, "race")

## Simply put the value to join on (i.e. what you want to look up) inside the brackets.
datSmall["W"]
##     race gender               charges_citation housing_location
##  1:    W      M 720 ILCS 5 12-3.4(a)(1) [16128         08-2N-DR
##  2:    W      M    625 ILCS 5 6-303(a) [13526]      11-AH-3-411
##  3:    W      M   625 ILCS 5 11-501(a) [14041]              02-
##  4:    W      M    625 ILCS 5 6-303(a) [13526]      02-D4-MU-1-
##  5:    W      M 720 ILCS 5 12-13(a)(3) [995700       05-D-1-2-1
##  6:    W      M                            000      02-D1-H-3-H
##  7:    W      M     720 ILCS 5 12-3.2 [930200]       03-A-2-4-2
##  8:    W      M 720 ILCS 570 401(a)(2)(D) [509      11-AF-3-311
##  9:    W      M 720 ILCS 5 12-3.05(d)(4) [1610       01-B-2-3-1
## 10:    W      M 720 ILCS 5 12-3.2(a)(1) [10416      02-D3-HH-3-
## 11:    W      M 720 ILCS 570 401(c)(1) [13009]      03-AX-D3-1-
## 12:    W      M    720 ILCS 5 16A-3(a) [15599]      02-D4-QU-1-
## 13:    W      M  720 ILCS 570 402(c) [5101110]      03-AX-B1-1-
## 14:    W      M                            000       03-A-3-4-1
## 15:    W      M    720 ILCS 5 12-1(a) [920000]      02-D4-RL-1-
## 16:    W      M  720 ILCS 5 12-3(a)(2) [10530]       10-A-3-8-2
## 17:    W      M  625 ILCS 5 6-303(d) [5883000]      02-D1-A-2-A
## 18:    W      M  720 ILCS 5 16A-3(a) [1060000]      02-D2-W-2-W
## 19:    W      M       720 ILCS 5 12-3 [930000]      02-D4-ML-1-
## 20:    W      M  720 ILCS 570 402(c) [5101110]      14-B4-4-42-
## 21:    W      F   720 ILCS 5 16-3(a) [1025000]      04-J-1-11-1
## 22:    W      M   720 ILCS 5 19-1(a) [1110000]      02-D1-A-2-A
## 23:    W      M 720 ILCS 5 12-3.2(a)(1) [10416      03-A-2-19-1
## 24:    W      M 720 ILCS 5 12-3.2(a)(2) [10418      02-D2-U-3-U
## 25:    W      F   625 ILCS 5 11-501(a) [14039]          17-SFFP
## 26:    W      M  720 ILCS 570 402(c) [5101110]      03-A-1-26-1
## 27:    W      M   625 ILCS 5 11-501(a) [12809]      06-H-2-19-2
## 28:    W      M  720 ILCS 570 402(c) [5101110]      11-AC-1-208
## 29:    W      M 625 ILCS 5 11-501(a)(2) [11309            15-EM
## 30:    W      M             720 ILCS 5/19-1(a)       01-E-1-1-1
## 31:    W      M                        UNKNOWN      01-G-3-16-1
## 32:    W      M            625 ILCS 5/6-303(a)            15-EM
## 33:    W      M           625 ILCS 5/11-501(a)       06-D-1-3-2
## 34:    W      M             720 ILCS 5/19-1(a)            15-EM
## 35:    W      F           625 ILCS 5/11-501(a)      17-WR-N-C-2
## 36:    W      M           720 ILCS 5/9-1(a)(1)       01-A-3-3-2
## 37:    W      M                38-24-3.1(a)(6)      06-H-1-12-2
## 38:    W      M           625 ILCS 5/11-501(a)      02-D1-H-3-H
## 39:    W      M               720 ILCS 570/402            15-DR
## 40:    W      M              720 ILCS 550/5(g)       10-C-1-7-1
## 41:    W      M       720 ILCS 5/12-14.1(a)(1)          C DISCH
## 42:    W      M           720 ILCS 5/12-4.3(a)      11-DH-3-411
## 43:    W      M              720 ILCS 5/24-1.1      06-C-1-13-1
## 44:    W      M                  38-10-2(a)(3)      01-H-1-13-2
##     race gender               charges_citation housing_location
## The following code gives the same result, but the form above is simpler.
datSmall[J("W")]
##     race gender               charges_citation housing_location
##  1:    W      M 720 ILCS 5 12-3.4(a)(1) [16128         08-2N-DR
##  2:    W      M    625 ILCS 5 6-303(a) [13526]      11-AH-3-411
##  3:    W      M   625 ILCS 5 11-501(a) [14041]              02-
##  4:    W      M    625 ILCS 5 6-303(a) [13526]      02-D4-MU-1-
##  5:    W      M 720 ILCS 5 12-13(a)(3) [995700       05-D-1-2-1
##  6:    W      M                            000      02-D1-H-3-H
##  7:    W      M     720 ILCS 5 12-3.2 [930200]       03-A-2-4-2
##  8:    W      M 720 ILCS 570 401(a)(2)(D) [509      11-AF-3-311
##  9:    W      M 720 ILCS 5 12-3.05(d)(4) [1610       01-B-2-3-1
## 10:    W      M 720 ILCS 5 12-3.2(a)(1) [10416      02-D3-HH-3-
## 11:    W      M 720 ILCS 570 401(c)(1) [13009]      03-AX-D3-1-
## 12:    W      M    720 ILCS 5 16A-3(a) [15599]      02-D4-QU-1-
## 13:    W      M  720 ILCS 570 402(c) [5101110]      03-AX-B1-1-
## 14:    W      M                            000       03-A-3-4-1
## 15:    W      M    720 ILCS 5 12-1(a) [920000]      02-D4-RL-1-
## 16:    W      M  720 ILCS 5 12-3(a)(2) [10530]       10-A-3-8-2
## 17:    W      M  625 ILCS 5 6-303(d) [5883000]      02-D1-A-2-A
## 18:    W      M  720 ILCS 5 16A-3(a) [1060000]      02-D2-W-2-W
## 19:    W      M       720 ILCS 5 12-3 [930000]      02-D4-ML-1-
## 20:    W      M  720 ILCS 570 402(c) [5101110]      14-B4-4-42-
## 21:    W      F   720 ILCS 5 16-3(a) [1025000]      04-J-1-11-1
## 22:    W      M   720 ILCS 5 19-1(a) [1110000]      02-D1-A-2-A
## 23:    W      M 720 ILCS 5 12-3.2(a)(1) [10416      03-A-2-19-1
## 24:    W      M 720 ILCS 5 12-3.2(a)(2) [10418      02-D2-U-3-U
## 25:    W      F   625 ILCS 5 11-501(a) [14039]          17-SFFP
## 26:    W      M  720 ILCS 570 402(c) [5101110]      03-A-1-26-1
## 27:    W      M   625 ILCS 5 11-501(a) [12809]      06-H-2-19-2
## 28:    W      M  720 ILCS 570 402(c) [5101110]      11-AC-1-208
## 29:    W      M 625 ILCS 5 11-501(a)(2) [11309            15-EM
## 30:    W      M             720 ILCS 5/19-1(a)       01-E-1-1-1
## 31:    W      M                        UNKNOWN      01-G-3-16-1
## 32:    W      M            625 ILCS 5/6-303(a)            15-EM
## 33:    W      M           625 ILCS 5/11-501(a)       06-D-1-3-2
## 34:    W      M             720 ILCS 5/19-1(a)            15-EM
## 35:    W      F           625 ILCS 5/11-501(a)      17-WR-N-C-2
## 36:    W      M           720 ILCS 5/9-1(a)(1)       01-A-3-3-2
## 37:    W      M                38-24-3.1(a)(6)      06-H-1-12-2
## 38:    W      M           625 ILCS 5/11-501(a)      02-D1-H-3-H
## 39:    W      M               720 ILCS 570/402            15-DR
## 40:    W      M              720 ILCS 550/5(g)       10-C-1-7-1
## 41:    W      M       720 ILCS 5/12-14.1(a)(1)          C DISCH
## 42:    W      M           720 ILCS 5/12-4.3(a)      11-DH-3-411
## 43:    W      M              720 ILCS 5/24-1.1      06-C-1-13-1
## 44:    W      M                  38-10-2(a)(3)      01-H-1-13-2
##     race gender               charges_citation housing_location

setkey sorts datSmall by the race variable (whichever variable the analyst chooses as the key), and the lookup then returns only the observations whose race is "W". A very short piece of code is enough, but setkey must have been called first.
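
A small made-up table makes it easy to see what setkey changes and that a key lookup returns the same rows as an ordinary logical filter. This is a sketch for illustration only.

```r
library(data.table)

toy <- data.table(
  race   = c("WH", "W", "BK", "W"),
  gender = c("M", "F", "M", "M")
)

# setkey() sorts toy by race in place and records race as the key.
setkey(toy, race)
key(toy)

# Lookup through the key ...
by_key <- toy["W"]

# ... and the equivalent logical filter, which needs no key.
by_filter <- toy[race == "W"]

by_key
by_filter
```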

## You can also look up two or more key values. (3)
datSmall[c("W", "WH")]
##       race gender               charges_citation housing_location
##    1:    W      M 720 ILCS 5 12-3.4(a)(1) [16128         08-2N-DR
##    2:    W      M    625 ILCS 5 6-303(a) [13526]      11-AH-3-411
##    3:    W      M   625 ILCS 5 11-501(a) [14041]              02-
##    4:    W      M    625 ILCS 5 6-303(a) [13526]      02-D4-MU-1-
##    5:    W      M 720 ILCS 5 12-13(a)(3) [995700       05-D-1-2-1
##   ---                                                            
## 2055:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1
## 2056:   WH      M                         38-9-1          15-EMAW
## 2057:   WH      M                        38-19-3          15-EMAW
## 2058:   WH      M                       56.5-704          15-EMAW
## 2059:   WH      M                    95.5-11-501          15-DRAW

## Equivalently:
datSmall[J(c("W", "WH"))]
##       race gender               charges_citation housing_location
##    1:    W      M 720 ILCS 5 12-3.4(a)(1) [16128         08-2N-DR
##    2:    W      M    625 ILCS 5 6-303(a) [13526]      11-AH-3-411
##    3:    W      M   625 ILCS 5 11-501(a) [14041]              02-
##    4:    W      M    625 ILCS 5 6-303(a) [13526]      02-D4-MU-1-
##    5:    W      M 720 ILCS 5 12-13(a)(3) [995700       05-D-1-2-1
##   ---                                                            
## 2055:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1
## 2056:   WH      M                         38-9-1          15-EMAW
## 2057:   WH      M                        38-19-3          15-EMAW
## 2058:   WH      M                       56.5-704          15-EMAW
## 2059:   WH      M                    95.5-11-501          15-DRAW

Again, because the lookup is on race, the key must have been set to race with setkey.

## (4)
datSmall[, .N, keyby = list(race, gender)]
##     race gender     N
##  1:   AS      F     6
##  2:   AS      M   111
##  3:    B      F     3
##  4:    B      M    26
##  5:   BK      F  1209
##  6:   BK      M 12670
##  7:   IN      F     6
##  8:   IN      M     7
##  9:   LB      F     9
## 10:   LB      M    59
## 11:   LT      F    73
## 12:   LT      M  1730
## 13:   LW      F   100
## 14:   LW      M  1139
## 15:    W      F     3
## 16:    W      M    41
## 17:   WH      F   333
## 18:   WH      M  1682
## (5)
datSmall[c("WH", "M")]
##       race gender               charges_citation housing_location
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05-
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2
##   ---                                                            
## 2012:   WH      M                         38-9-1          15-EMAW
## 2013:   WH      M                        38-19-3          15-EMAW
## 2014:   WH      M                       56.5-704          15-EMAW
## 2015:   WH      M                    95.5-11-501          15-DRAW
## 2016:    M     NA                             NA               NA
datSmall[c("WH", "M")][, .N, list(race, gender)]
##    race gender    N
## 1:   WH      M 1682
## 2:   WH      F  333
## 3:    M     NA    1

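Both calls rely on the key set by setkey. The first one looks up the values "WH" and "M" in race: it returns every observation whose race is "WH", plus one row filled with NA because no race equals "M". A character vector in i is matched against the key column only, so "M" is not interpreted as a gender. The second call counts those rows by race and gender, which is why the output shows both M and F for "WH" as well as the single unmatched row with NA.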
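
The NA row for an unmatched key value can be reproduced on made-up data, and it can be dropped with the nomatch argument. The sketch assumes a reasonably recent data.table, where nomatch = NULL is accepted; older releases use nomatch = 0L for the same purpose.

```r
library(data.table)

toy <- data.table(
  race   = c("WH", "W", "BK", "WH"),
  gender = c("M", "F", "M", "F")
)
setkey(toy, race)

# "M" is looked up in race, finds nothing, and yields one NA row.
with_na <- toy[c("WH", "M")]
with_na

# nomatch = NULL drops key values that have no match.
no_na <- toy[c("WH", "M"), nomatch = NULL]
no_na
```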

## (6)
datSmall[data.table("WH", "M")]
##       race gender               charges_citation housing_location M
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05- M
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2 M
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1 M
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR M
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2 M
##   ---                                                              
## 2011:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1 M
## 2012:   WH      M                         38-9-1          15-EMAW M
## 2013:   WH      M                        38-19-3          15-EMAW M
## 2014:   WH      M                       56.5-704          15-EMAW M
## 2015:   WH      M                    95.5-11-501          15-DRAW M
datSmall[data.table("WH", "M")][, .N, list(race, gender)]
##    race gender    N
## 1:   WH      M 1682
## 2:   WH      F  333

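Here a data.table is passed as i. Because the key of datSmall is race alone, only the first column of i is used for the join; the second column is carried into the result unchanged instead of filtering on gender, so both M and F rows appear.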
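
If the goal is to match on race and gender at the same time, one option is to name both columns in the on argument, which works without setting a key. A minimal sketch on made-up rows (the id column exists only to identify rows in this example):

```r
library(data.table)

toy <- data.table(
  race   = c("WH", "WH", "W", "BK"),
  gender = c("M", "F", "M", "M"),
  id     = 1:4
)

# Match "WH" against race and "M" against gender; no setkey() required.
wh_men <- toy[.("WH", "M"), on = c("race", "gender")]
wh_men
```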

## (7)
datSmall[J("WH", "M")]
##       race gender               charges_citation housing_location V2
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05-  M
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2  M
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1  M
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR  M
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2  M
##   ---                                                               
## 2011:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1  M
## 2012:   WH      M                         38-9-1          15-EMAW  M
## 2013:   WH      M                        38-19-3          15-EMAW  M
## 2014:   WH      M                       56.5-704          15-EMAW  M
## 2015:   WH      M                    95.5-11-501          15-DRAW  M
datSmall[CJ("WH", "M")]
##       race gender               charges_citation housing_location V2
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05-  M
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2  M
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1  M
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR  M
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2  M
##   ---                                                               
## 2011:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1  M
## 2012:   WH      M                         38-9-1          15-EMAW  M
## 2013:   WH      M                        38-19-3          15-EMAW  M
## 2014:   WH      M                       56.5-704          15-EMAW  M
## 2015:   WH      M                    95.5-11-501          15-DRAW  M
## (8)
datSmall[J("WH", "W")]
##       race gender               charges_citation housing_location V2
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05-  W
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2  W
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1  W
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR  W
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2  W
##   ---                                                               
## 2011:   WH      M            720 ILCS 5/32-10(a)       01-H-1-6-1  W
## 2012:   WH      M                         38-9-1          15-EMAW  W
## 2013:   WH      M                        38-19-3          15-EMAW  W
## 2014:   WH      M                       56.5-704          15-EMAW  W
## 2015:   WH      M                    95.5-11-501          15-DRAW  W

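In J("WH", "W") the second argument is treated as a second join column rather than a second race value, so only the "WH" rows are returned. The following code, however, looks up two key values within the same variable: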

datSmall[J(c("WH", "W"))]
##       race gender               charges_citation housing_location
##    1:   WH      M 720 ILCS 5 12-3.4(a)(2) [16145              05-
##    2:   WH      M     720 ILCS 5 12-3.2 [930200]       05-L-2-1-2
##    3:   WH      F 720 ILCS 5 12-3.2(a)(2) [10418      04-Q-1-11-1
##    4:   WH      M 720 ILCS 5 12-3.2(a)(1) [10416         08-2N-DR
##    5:   WH      F    720 ILCS 5 16A-3(a) [15601]      17-WR-N-A-2
##   ---                                                            
## 2055:    W      M              720 ILCS 550/5(g)       10-C-1-7-1
## 2056:    W      M       720 ILCS 5/12-14.1(a)(1)          C DISCH
## 2057:    W      M           720 ILCS 5/12-4.3(a)      11-DH-3-411
## 2058:    W      M              720 ILCS 5/24-1.1      06-C-1-13-1
## 2059:    W      M                  38-10-2(a)(3)      01-H-1-13-2
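
The difference between J and CJ becomes visible once the key has two columns. The sketch below is an illustration on made-up rows, not output from the jail data: J pairs its arguments element by element, while CJ builds every combination of them.

```r
library(data.table)

toy <- data.table(
  race   = c("WH", "WH", "W", "W", "BK"),
  gender = c("M", "F", "M", "F", "M"),
  id     = 1:5
)

# A two-column key lets the second argument be matched against gender.
setkey(toy, race, gender)

# J pairs values element-wise: (WH, M) and (W, F).
paired <- toy[J(c("WH", "W"), c("M", "F"))]
paired

# CJ forms all combinations: (W, F), (W, M), (WH, F), (WH, M).
crossed <- toy[CJ(c("WH", "W"), c("M", "F"))]
crossed
```

CJ sorts its combinations, which is why the W rows come before the WH rows in the second result.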

Hankuk University of Foreign Studies.

Dept of Statistics. Daewoo Choi Lab. Jaemyung Kwon

e-mail: jaemyung.kw@gmail.com