First, set the working directory to the folder that contains the downloaded files. If data.table and ggplot2 are not installed yet, install both packages before continuing.
setwd("C:/Users/kwon/Documents/dt_ex")
rm(list = ls())
library(data.table)
library(ggplot2)
source("functions/ExtractIsoTime.R")
source("functions/wtf.R")
## Read the CSV; the strings "None" and "" are treated as missing values
rawdat = read.table(file = "Database 2013-01-21 (8zQ4cW7T).csv", sep = ",",
quote = "\"", flush = FALSE, header = TRUE, nrows = -1, fill = FALSE,
stringsAsFactors = FALSE, na.strings = c("None", ""))
str(rawdat)
## 'data.frame': 19207 obs. of 11 variables:
## $ charges_citation : chr "720 ILCS 5 12-3.4(a)(2) [16145" "625 ILCS 5 6-101 [12935]" "720 ILCS 5 12-3(a)(1) [10529]" "720 ILCS 550 5(c) [5020200]" ...
## $ race : chr "WH" "LW" "BK" "BK" ...
## $ age_at_booking : int 26 37 18 32 49 26 41 56 40 20 ...
## $ gender : chr "M" "M" "M" "F" ...
## $ booking_date : chr "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" ...
## $ jail_id : chr "2013-0120171" "2013-0120170" "2013-0120169" "2013-0120167" ...
## $ bail_status : chr NA NA NA NA ...
## $ housing_location : chr "05-" "05-" "05-L-2-2-1" "17-WR-N-A-2" ...
## $ charges : chr NA NA NA NA ...
## $ bail_amount : int 5000 10000 5000 50000 5000 5000 25000 5000 25000 10000 ...
## $ discharge_date_earliest: chr NA NA NA NA ...
Next, convert rawdat to a data.table.
dat = as.data.table(rawdat)
str(dat)
## Classes 'data.table' and 'data.frame': 19207 obs. of 11 variables:
## $ charges_citation : chr "720 ILCS 5 12-3.4(a)(2) [16145" "625 ILCS 5 6-101 [12935]" "720 ILCS 5 12-3(a)(1) [10529]" "720 ILCS 550 5(c) [5020200]" ...
## $ race : chr "WH" "LW" "BK" "BK" ...
## $ age_at_booking : int 26 37 18 32 49 26 41 56 40 20 ...
## $ gender : chr "M" "M" "M" "F" ...
## $ booking_date : chr "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" "2013-01-20T00:00:00" ...
## $ jail_id : chr "2013-0120171" "2013-0120170" "2013-0120169" "2013-0120167" ...
## $ bail_status : chr NA NA NA NA ...
## $ housing_location : chr "05-" "05-" "05-L-2-2-1" "17-WR-N-A-2" ...
## $ charges : chr NA NA NA NA ...
## $ bail_amount : int 5000 10000 5000 50000 5000 5000 25000 5000 25000 10000 ...
## $ discharge_date_earliest: chr NA NA NA NA ...
## - attr(*, ".internal.selfref")=<externalptr>
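As an aside, data.table also provides fread, which reads a delimited file straight into a data.table. The sketch below compares the two routes on a tiny temporary CSV. The file contents are made up for illustration, and whether fread parses the real file exactly as read.table does is not verified here.

```r
library(data.table)

# Hypothetical miniature of the CSV used in this post (values are made up).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "race,age_at_booking,bail_status",
  "WH,26,None",
  "BK,18,",
  "LW,37,NO BOND"
), tmp)

# Route 1: read.table, then convert the data.frame to a data.table.
raw_demo <- read.table(tmp, sep = ",", header = TRUE,
                       stringsAsFactors = FALSE, na.strings = c("None", ""))
dt1 <- as.data.table(raw_demo)

# Route 2: fread returns a data.table directly.
dt2 <- fread(tmp, na.strings = c("None", ""))

class(dt1)
class(dt2)
```

Both objects carry the classes data.table and data.frame, so either route can feed the steps that follow.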
Convert booking_date from ISO-formatted text to a date-time using the ExtractIsoTime helper sourced above.
ex) 2013-01-20T00:00:00 -> 2013-01-20 00:00:00
dat[, booking_date := ExtractIsoTime(booking_date)]
## charges_citation race age_at_booking gender
## 1: 720 ILCS 5 12-3.4(a)(2) [16145 WH 26 M
## 2: 625 ILCS 5 6-101 [12935] LW 37 M
## 3: 720 ILCS 5 12-3(a)(1) [10529] BK 18 M
## 4: 720 ILCS 550 5(c) [5020200] BK 32 F
## 5: 720 ILCS 5 12-3.2(a)(2) [10418 LW 49 M
## ---
## 19203: 95.5-11-501 WH 31 M
## 19204: 56.5-1402 LW 28 M
## 19205: 56.5-1402 BK 36 M
## 19206: 38-10-5 LW 23 M
## 19207: 56.5-1401 LW 27 M
## booking_date jail_id bail_status housing_location charges
## 1: 2013-01-20 2013-0120171 NA 05- NA
## 2: 2013-01-20 2013-0120170 NA 05- NA
## 3: 2013-01-20 2013-0120169 NA 05-L-2-2-1 NA
## 4: 2013-01-20 2013-0120167 NA 17-WR-N-A-2 NA
## 5: 2013-01-20 2013-0120165 NA 05- NA
## ---
## 19203: 1995-05-09 1995-9532061 NO BOND 15-DRAW NA
## 19204: 1994-09-22 1994-9459745 NA 15-EMAW NA
## 19205: 1993-09-24 1993-9357382 NA 15-DRAW NA
## 19206: 1993-05-07 1993-9326844 NA 15-EMAW NA
## 19207: 1993-01-16 1993-9303175 NA 15-EMAW NA
## bail_amount discharge_date_earliest
## 1: 5000 NA
## 2: 10000 NA
## 3: 5000 NA
## 4: 50000 NA
## 5: 5000 NA
## ---
## 19203: NA NA
## 19204: 250000 NA
## 19205: 10000 NA
## 19206: 60000 NA
## 19207: 100000 NA
The same conversion is applied to discharge_date_earliest, just as for booking_date.
dat[, discharge_date_earliest := ExtractIsoTime(discharge_date_earliest)]
## charges_citation race age_at_booking gender
## 1: 720 ILCS 5 12-3.4(a)(2) [16145 WH 26 M
## 2: 625 ILCS 5 6-101 [12935] LW 37 M
## 3: 720 ILCS 5 12-3(a)(1) [10529] BK 18 M
## 4: 720 ILCS 550 5(c) [5020200] BK 32 F
## 5: 720 ILCS 5 12-3.2(a)(2) [10418 LW 49 M
## ---
## 19203: 95.5-11-501 WH 31 M
## 19204: 56.5-1402 LW 28 M
## 19205: 56.5-1402 BK 36 M
## 19206: 38-10-5 LW 23 M
## 19207: 56.5-1401 LW 27 M
## booking_date jail_id bail_status housing_location charges
## 1: 2013-01-20 2013-0120171 NA 05- NA
## 2: 2013-01-20 2013-0120170 NA 05- NA
## 3: 2013-01-20 2013-0120169 NA 05-L-2-2-1 NA
## 4: 2013-01-20 2013-0120167 NA 17-WR-N-A-2 NA
## 5: 2013-01-20 2013-0120165 NA 05- NA
## ---
## 19203: 1995-05-09 1995-9532061 NO BOND 15-DRAW NA
## 19204: 1994-09-22 1994-9459745 NA 15-EMAW NA
## 19205: 1993-09-24 1993-9357382 NA 15-DRAW NA
## 19206: 1993-05-07 1993-9326844 NA 15-EMAW NA
## 19207: 1993-01-16 1993-9303175 NA 15-EMAW NA
## bail_amount discharge_date_earliest
## 1: 5000 <NA>
## 2: 10000 <NA>
## 3: 5000 <NA>
## 4: 50000 <NA>
## 5: 5000 <NA>
## ---
## 19203: NA <NA>
## 19204: 250000 <NA>
## 19205: 10000 <NA>
## 19206: 60000 <NA>
## 19207: 100000 <NA>
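ExtractIsoTime is sourced from functions/ExtractIsoTime.R and its body is not shown in this post. The sketch below is a hypothetical stand-in written in base R, assuming the input always follows the pattern YYYY-MM-DDTHH:MM:SS and that UTC is an acceptable time zone. It is not the original helper.

```r
library(data.table)

# Hypothetical stand-in for ExtractIsoTime (the original is not shown).
# Assumes strings like "2013-01-20T00:00:00"; NA stays NA.
parse_iso_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

x <- c("2013-01-20T00:00:00", NA, "1995-05-09T13:45:10")
parsed <- parse_iso_time(x)
format(parsed, "%Y-%m-%d %H:%M:%S")

# The same function can be used with := to overwrite a column in place.
demo <- data.table(booking_date = x)
demo[, booking_date := parse_iso_time(booking_date)]
class(demo$booking_date)
```

Missing values pass through unchanged, which matches the <NA> entries printed for discharge_date_earliest above.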
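Suppose we want to extract only the values of race, the variable in the second column. In the data.table version used for this post, j is evaluated as an expression, so a bare number or string is simply returned as-is. (Since data.table 1.9.8, dat[, 2] and dat[, "race"] return the column as a data.table instead.)
## Wrong approach (1)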
dat[, 2]
## [1] 2
## Wrong approach (2)
dat[, "race"]
## [1] "race"
## This is the correct way
dat[1:10, race]
## [1] "WH" "LW" "BK" "BK" "LW" "BK" "BK" "BK" "LW" "BK"
## Or select the age variable by its column position, which requires with = FALSE
dat[1:10, 3, with = FALSE]
## age_at_booking
## 1: 26
## 2: 37
## 3: 18
## 4: 32
## 5: 49
## 6: 26
## 7: 41
## 8: 56
## 9: 40
## 10: 20
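To subset a data.table this way, specify the rows in i and the columns in j. The important point is that j is evaluated as an expression inside the data.table, not as a column index. To select a column by its numeric position, as above, you must also pass the logical argument with = FALSE; a bare number in j is not treated as a column position.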
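The selection forms discussed above can be tried on a small table. This is a minimal sketch on made-up data (the table toy and its values are hypothetical), using only forms that behave the same in old and new data.table versions.

```r
library(data.table)

# Hypothetical toy data; column names mirror the post, values are invented.
toy <- data.table(
  charges        = c("a", "b", "c", "d"),
  race           = c("WH", "LW", "BK", "BK"),
  age_at_booking = c(26L, 37L, 18L, 32L)
)

by_name  <- toy[1:3, race]                        # unquoted name: a plain vector
by_pos   <- toy[1:3, 3, with = FALSE]             # position: one-column data.table
two_cols <- toy[1:3, list(race, age_at_booking)]  # list(): data.table with two columns
whole    <- toy[["race"]]                         # [[ ]]: the whole column as a vector

by_name
by_pos
```

Using the unquoted name returns a vector, while with = FALSE and list() keep the result as a data.table.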
dat[1]
## charges_citation race age_at_booking gender booking_date
## 1: 720 ILCS 5 12-3.4(a)(2) [16145 WH 26 M 2013-01-20
## jail_id bail_status housing_location charges bail_amount
## 1: 2013-0120171 NA 05- NA 5000
## discharge_date_earliest
## 1: <NA>
As the code above shows, if you specify the rows you want to see but no columns, all variables are returned for those rows.
## Grouping is simple (1)
dat[, mean(age_at_booking), by = race]
## race V1
## 1: WH 35.04
## 2: LW 31.45
## 3: BK 31.64
## 4: LT 29.41
## 5: AS 33.20
## 6: W 29.89
## 7: B 29.86
## 8: LB 31.97
## 9: IN 27.00
The code above computes the mean age for each of the 9 race values and returns it as a new column, which is named V1 by default.
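To give the aggregated column a readable name instead of V1, wrap the expression in list() with a name. A minimal sketch on made-up data (toy and mean_age are hypothetical names):

```r
library(data.table)

# Hypothetical toy data with two groups.
toy <- data.table(
  race           = c("WH", "WH", "BK", "BK", "BK"),
  age_at_booking = c(26, 52, 18, 32, 49)
)

unnamed <- toy[, mean(age_at_booking), by = race]                    # column is called V1
named   <- toy[, list(mean_age = mean(age_at_booking)), by = race]   # column is called mean_age

unnamed
named
```

Groups appear in order of first occurrence, so WH comes before BK here.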
## (2)
dat[, age_at_booking, by = race]
## race age_at_booking
## 1: WH 26
## 2: WH 52
## 3: WH 58
## 4: WH 39
## 5: WH 26
## ---
## 19203: IN 37
## 19204: IN 35
## 19205: IN 19
## 19206: IN 27
## 19207: IN 18
The code above groups by race and lists the age of every observation within each group.
## (3)
dat[i = TRUE, j = list(mean = mean(age_at_booking), sd = sd(age_at_booking)),
by = race]
## race mean sd
## 1: WH 35.04 11.90
## 2: LW 31.45 10.05
## 3: BK 31.64 12.02
## 4: LT 29.41 10.35
## 5: AS 33.20 12.65
## 6: W 29.89 10.54
## 7: B 29.86 10.63
## 8: LB 31.97 12.08
## 9: IN 27.00 10.25
The code above returns, for each race, new columns holding the mean and the standard deviation of age.
## (4)
dat[i = TRUE, j = list(mean = mean(age_at_booking), sd = sd(age_at_booking),
age_at_booking), by = race]
## race mean sd age_at_booking
## 1: WH 35.04 11.90 26
## 2: WH 35.04 11.90 52
## 3: WH 35.04 11.90 58
## 4: WH 35.04 11.90 39
## 5: WH 35.04 11.90 26
## ---
## 19203: IN 27.00 10.25 37
## 19204: IN 27.00 10.25 35
## 19205: IN 27.00 10.25 19
## 19206: IN 27.00 10.25 27
## 19207: IN 27.00 10.25 18
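In the result above, age_at_booking is included even though no aggregating function was applied to it. data.table expands the result automatically: within each group, the single mean and sd values are repeated to match the length of age_at_booking.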
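The expansion described above can be checked on a small example. The sketch below uses made-up data (toy is hypothetical) and also shows an alternative, := with by, which adds the group mean to the table itself rather than building a new one.

```r
library(data.table)

# Hypothetical toy data with two groups.
toy <- data.table(
  race           = c("WH", "WH", "BK", "BK", "BK"),
  age_at_booking = c(26, 52, 18, 32, 49)
)

# One aggregated value per group is repeated alongside every raw row.
expanded <- toy[, list(mean = mean(age_at_booking), age_at_booking), by = race]
expanded

# Alternative: add the group mean as a new column by reference.
toy[, group_mean := mean(age_at_booking), by = race]
toy
```

The expanded result has as many rows as the input, one per observation, with the group mean repeated.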
datSmall = dat[, list(race, gender, charges_citation, housing_location)]
## Count of observations by race (1)
datSmall[, .N, by = race]
## race N
## 1: WH 2015
## 2: LW 1239
## 3: BK 13879
## 4: LT 1803
## 5: AS 117
## 6: W 44
## 7: B 29
## 8: LB 68
## 9: IN 13
The code above shows, for each race, how many observations there are; .N is the number of rows in each group.
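As a quick check of what .N returns, the sketch below counts rows per group on made-up data (toy is hypothetical) and sorts the counts in descending order.

```r
library(data.table)

# Hypothetical toy data.
toy <- data.table(race = c("WH", "BK", "BK", "WH", "BK"))

counts <- toy[, .N, by = race]   # one row per race, N = number of rows in that group
counts

sorted_counts <- counts[order(-N)]   # largest group first
sorted_counts
```

The counts add up to the number of rows in the table, which makes .N a convenient sanity check after grouping.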
## setkey re-sorts the table by race (2)
setkey(datSmall, "race")
## A join is then just the value you want to look up, placed inside the brackets
datSmall["W"]
## race gender charges_citation housing_location
## 1: W M 720 ILCS 5 12-3.4(a)(1) [16128 08-2N-DR
## 2: W M 625 ILCS 5 6-303(a) [13526] 11-AH-3-411
## 3: W M 625 ILCS 5 11-501(a) [14041] 02-
## 4: W M 625 ILCS 5 6-303(a) [13526] 02-D4-MU-1-
## 5: W M 720 ILCS 5 12-13(a)(3) [995700 05-D-1-2-1
## 6: W M 000 02-D1-H-3-H
## 7: W M 720 ILCS 5 12-3.2 [930200] 03-A-2-4-2
## 8: W M 720 ILCS 570 401(a)(2)(D) [509 11-AF-3-311
## 9: W M 720 ILCS 5 12-3.05(d)(4) [1610 01-B-2-3-1
## 10: W M 720 ILCS 5 12-3.2(a)(1) [10416 02-D3-HH-3-
## 11: W M 720 ILCS 570 401(c)(1) [13009] 03-AX-D3-1-
## 12: W M 720 ILCS 5 16A-3(a) [15599] 02-D4-QU-1-
## 13: W M 720 ILCS 570 402(c) [5101110] 03-AX-B1-1-
## 14: W M 000 03-A-3-4-1
## 15: W M 720 ILCS 5 12-1(a) [920000] 02-D4-RL-1-
## 16: W M 720 ILCS 5 12-3(a)(2) [10530] 10-A-3-8-2
## 17: W M 625 ILCS 5 6-303(d) [5883000] 02-D1-A-2-A
## 18: W M 720 ILCS 5 16A-3(a) [1060000] 02-D2-W-2-W
## 19: W M 720 ILCS 5 12-3 [930000] 02-D4-ML-1-
## 20: W M 720 ILCS 570 402(c) [5101110] 14-B4-4-42-
## 21: W F 720 ILCS 5 16-3(a) [1025000] 04-J-1-11-1
## 22: W M 720 ILCS 5 19-1(a) [1110000] 02-D1-A-2-A
## 23: W M 720 ILCS 5 12-3.2(a)(1) [10416 03-A-2-19-1
## 24: W M 720 ILCS 5 12-3.2(a)(2) [10418 02-D2-U-3-U
## 25: W F 625 ILCS 5 11-501(a) [14039] 17-SFFP
## 26: W M 720 ILCS 570 402(c) [5101110] 03-A-1-26-1
## 27: W M 625 ILCS 5 11-501(a) [12809] 06-H-2-19-2
## 28: W M 720 ILCS 570 402(c) [5101110] 11-AC-1-208
## 29: W M 625 ILCS 5 11-501(a)(2) [11309 15-EM
## 30: W M 720 ILCS 5/19-1(a) 01-E-1-1-1
## 31: W M UNKNOWN 01-G-3-16-1
## 32: W M 625 ILCS 5/6-303(a) 15-EM
## 33: W M 625 ILCS 5/11-501(a) 06-D-1-3-2
## 34: W M 720 ILCS 5/19-1(a) 15-EM
## 35: W F 625 ILCS 5/11-501(a) 17-WR-N-C-2
## 36: W M 720 ILCS 5/9-1(a)(1) 01-A-3-3-2
## 37: W M 38-24-3.1(a)(6) 06-H-1-12-2
## 38: W M 625 ILCS 5/11-501(a) 02-D1-H-3-H
## 39: W M 720 ILCS 570/402 15-DR
## 40: W M 720 ILCS 550/5(g) 10-C-1-7-1
## 41: W M 720 ILCS 5/12-14.1(a)(1) C DISCH
## 42: W M 720 ILCS 5/12-4.3(a) 11-DH-3-411
## 43: W M 720 ILCS 5/24-1.1 06-C-1-13-1
## 44: W M 38-10-2(a)(3) 01-H-1-13-2
## race gender charges_citation housing_location
## This code gives the same result, but the form above is simpler.
datSmall[J("W")]
## race gender charges_citation housing_location
## 1: W M 720 ILCS 5 12-3.4(a)(1) [16128 08-2N-DR
## 2: W M 625 ILCS 5 6-303(a) [13526] 11-AH-3-411
## 3: W M 625 ILCS 5 11-501(a) [14041] 02-
## 4: W M 625 ILCS 5 6-303(a) [13526] 02-D4-MU-1-
## 5: W M 720 ILCS 5 12-13(a)(3) [995700 05-D-1-2-1
## 6: W M 000 02-D1-H-3-H
## 7: W M 720 ILCS 5 12-3.2 [930200] 03-A-2-4-2
## 8: W M 720 ILCS 570 401(a)(2)(D) [509 11-AF-3-311
## 9: W M 720 ILCS 5 12-3.05(d)(4) [1610 01-B-2-3-1
## 10: W M 720 ILCS 5 12-3.2(a)(1) [10416 02-D3-HH-3-
## 11: W M 720 ILCS 570 401(c)(1) [13009] 03-AX-D3-1-
## 12: W M 720 ILCS 5 16A-3(a) [15599] 02-D4-QU-1-
## 13: W M 720 ILCS 570 402(c) [5101110] 03-AX-B1-1-
## 14: W M 000 03-A-3-4-1
## 15: W M 720 ILCS 5 12-1(a) [920000] 02-D4-RL-1-
## 16: W M 720 ILCS 5 12-3(a)(2) [10530] 10-A-3-8-2
## 17: W M 625 ILCS 5 6-303(d) [5883000] 02-D1-A-2-A
## 18: W M 720 ILCS 5 16A-3(a) [1060000] 02-D2-W-2-W
## 19: W M 720 ILCS 5 12-3 [930000] 02-D4-ML-1-
## 20: W M 720 ILCS 570 402(c) [5101110] 14-B4-4-42-
## 21: W F 720 ILCS 5 16-3(a) [1025000] 04-J-1-11-1
## 22: W M 720 ILCS 5 19-1(a) [1110000] 02-D1-A-2-A
## 23: W M 720 ILCS 5 12-3.2(a)(1) [10416 03-A-2-19-1
## 24: W M 720 ILCS 5 12-3.2(a)(2) [10418 02-D2-U-3-U
## 25: W F 625 ILCS 5 11-501(a) [14039] 17-SFFP
## 26: W M 720 ILCS 570 402(c) [5101110] 03-A-1-26-1
## 27: W M 625 ILCS 5 11-501(a) [12809] 06-H-2-19-2
## 28: W M 720 ILCS 570 402(c) [5101110] 11-AC-1-208
## 29: W M 625 ILCS 5 11-501(a)(2) [11309 15-EM
## 30: W M 720 ILCS 5/19-1(a) 01-E-1-1-1
## 31: W M UNKNOWN 01-G-3-16-1
## 32: W M 625 ILCS 5/6-303(a) 15-EM
## 33: W M 625 ILCS 5/11-501(a) 06-D-1-3-2
## 34: W M 720 ILCS 5/19-1(a) 15-EM
## 35: W F 625 ILCS 5/11-501(a) 17-WR-N-C-2
## 36: W M 720 ILCS 5/9-1(a)(1) 01-A-3-3-2
## 37: W M 38-24-3.1(a)(6) 06-H-1-12-2
## 38: W M 625 ILCS 5/11-501(a) 02-D1-H-3-H
## 39: W M 720 ILCS 570/402 15-DR
## 40: W M 720 ILCS 550/5(g) 10-C-1-7-1
## 41: W M 720 ILCS 5/12-14.1(a)(1) C DISCH
## 42: W M 720 ILCS 5/12-4.3(a) 11-DH-3-411
## 43: W M 720 ILCS 5/24-1.1 06-C-1-13-1
## 44: W M 38-10-2(a)(3) 01-H-1-13-2
## race gender charges_citation housing_location
Within datSmall, setkey re-sorts the data by race (or whichever variable the analyst chooses), and then only the rows of datSmall whose race is "W" are printed. A very short expression is enough, but setkey must have been called first.
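The key lookup can be reproduced on a small table. This is a minimal sketch on made-up data (toy is hypothetical). It also shows the on argument, which in newer data.table versions allows the same lookup without setting a key first.

```r
library(data.table)

# Hypothetical toy data.
toy <- data.table(
  race   = c("WH", "W", "BK", "W"),
  gender = c("M", "F", "M", "M")
)

# Lookup without a key, available in newer data.table versions.
via_on <- toy["W", on = "race"]

# Keyed lookup as in the post: setkey sorts toy by race and marks race as the key.
setkey(toy, race)
key(toy)

short_form <- toy["W"]           # character i is matched against the key
j_form     <- toy[J("W")]        # same lookup written with J()
filter     <- toy[race == "W"]   # ordinary logical filter for comparison

short_form
```

All four forms return the two rows whose race is "W".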
## You can also look up two or more key values (3)
datSmall[c("W", "WH")]
## race gender charges_citation housing_location
## 1: W M 720 ILCS 5 12-3.4(a)(1) [16128 08-2N-DR
## 2: W M 625 ILCS 5 6-303(a) [13526] 11-AH-3-411
## 3: W M 625 ILCS 5 11-501(a) [14041] 02-
## 4: W M 625 ILCS 5 6-303(a) [13526] 02-D4-MU-1-
## 5: W M 720 ILCS 5 12-13(a)(3) [995700 05-D-1-2-1
## ---
## 2055: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1
## 2056: WH M 38-9-1 15-EMAW
## 2057: WH M 38-19-3 15-EMAW
## 2058: WH M 56.5-704 15-EMAW
## 2059: WH M 95.5-11-501 15-DRAW
## Equivalently:
datSmall[J(c("W", "WH"))]
## race gender charges_citation housing_location
## 1: W M 720 ILCS 5 12-3.4(a)(1) [16128 08-2N-DR
## 2: W M 625 ILCS 5 6-303(a) [13526] 11-AH-3-411
## 3: W M 625 ILCS 5 11-501(a) [14041] 02-
## 4: W M 625 ILCS 5 6-303(a) [13526] 02-D4-MU-1-
## 5: W M 720 ILCS 5 12-13(a)(3) [995700 05-D-1-2-1
## ---
## 2055: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1
## 2056: WH M 38-9-1 15-EMAW
## 2057: WH M 38-19-3 15-EMAW
## 2058: WH M 56.5-704 15-EMAW
## 2059: WH M 95.5-11-501 15-DRAW
Likewise, because the lookup is on race, the key must have been set on race with setkey.
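A lookup with several key values selects the same rows as a filter with %in%. The sketch below checks this on made-up data (toy and id are hypothetical).

```r
library(data.table)

# Hypothetical toy data; id records the original row position.
toy <- data.table(race = c("WH", "W", "BK", "W", "WH"), id = 1:5)
setkey(toy, race)

by_join   <- toy[c("W", "WH")]             # rows come back in the order of the lookup values
by_filter <- toy[race %in% c("W", "WH")]   # rows come back in table (key) order

by_join
by_filter
```

The two results coincide here because "W" sorts before "WH". With the lookup values reversed, the join would list the WH rows first.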
## (4)
datSmall[, .N, keyby = list(race, gender)]
## race gender N
## 1: AS F 6
## 2: AS M 111
## 3: B F 3
## 4: B M 26
## 5: BK F 1209
## 6: BK M 12670
## 7: IN F 6
## 8: IN M 7
## 9: LB F 9
## 10: LB M 59
## 11: LT F 73
## 12: LT M 1730
## 13: LW F 100
## 14: LW M 1139
## 15: W F 3
## 16: W M 41
## 17: WH F 333
## 18: WH M 1682
## (5)
datSmall[c("WH", "M")]
## race gender charges_citation housing_location
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05-
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2
## ---
## 2012: WH M 38-9-1 15-EMAW
## 2013: WH M 38-19-3 15-EMAW
## 2014: WH M 56.5-704 15-EMAW
## 2015: WH M 95.5-11-501 15-DRAW
## 2016: M NA NA NA
datSmall[c("WH", "M")][, .N, list(race, gender)]
## race gender N
## 1: WH M 1682
## 2: WH F 333
## 3: M NA 1
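Both commands rely on the key set by setkey. The first looks up the values "WH" and "M" in race, the key column. Because the key is race only, "M" is not interpreted as a gender: it matches no race and comes back as a single row filled with NA. The second command counts the rows of that result by race and gender. The .N counts therefore include the female rows and the extra NA row, not just white males.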
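The NA row can be reproduced on a small table. This is a minimal sketch on made-up data (toy is hypothetical). The last line shows a plain logical filter as one way to get only the rows with race "WH" and gender "M".

```r
library(data.table)

# Hypothetical toy data.
toy <- data.table(
  race   = c("WH", "WH", "BK", "WH"),
  gender = c("M", "F", "M", "M")
)
setkey(toy, race)

# Both values are looked up in race; "M" matches nothing and yields an NA row.
res <- toy[c("WH", "M")]
res
res[, .N, by = list(race, gender)]

# A logical filter that actually restricts both columns.
wh_males <- toy[race == "WH" & gender == "M"]
wh_males
```

The join result contains every WH row regardless of gender, plus one row for the unmatched value "M".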
## (6)
datSmall[data.table("WH", "M")]
## race gender charges_citation housing_location M
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05- M
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2 M
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1 M
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR M
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2 M
## ---
## 2011: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1 M
## 2012: WH M 38-9-1 15-EMAW M
## 2013: WH M 38-19-3 15-EMAW M
## 2014: WH M 56.5-704 15-EMAW M
## 2015: WH M 95.5-11-501 15-DRAW M
datSmall[data.table("WH", "M")][, .N, list(race, gender)]
## race gender N
## 1: WH M 1682
## 2: WH F 333
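Here the lookup values are passed as a data.table. Only its first column is matched against the key race; the second column is carried along as an extra column, so female rows are still returned.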
## (7)
datSmall[J("WH", "M")]
## race gender charges_citation housing_location V2
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05- M
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2 M
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1 M
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR M
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2 M
## ---
## 2011: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1 M
## 2012: WH M 38-9-1 15-EMAW M
## 2013: WH M 38-19-3 15-EMAW M
## 2014: WH M 56.5-704 15-EMAW M
## 2015: WH M 95.5-11-501 15-DRAW M
datSmall[CJ("WH", "M")]
## race gender charges_citation housing_location V2
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05- M
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2 M
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1 M
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR M
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2 M
## ---
## 2011: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1 M
## 2012: WH M 38-9-1 15-EMAW M
## 2013: WH M 38-19-3 15-EMAW M
## 2014: WH M 56.5-704 15-EMAW M
## 2015: WH M 95.5-11-501 15-DRAW M
## (8)
datSmall[J("WH", "W")]
## race gender charges_citation housing_location V2
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05- W
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2 W
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1 W
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR W
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2 W
## ---
## 2011: WH M 720 ILCS 5/32-10(a) 01-H-1-6-1 W
## 2012: WH M 38-9-1 15-EMAW W
## 2013: WH M 38-19-3 15-EMAW W
## 2014: WH M 56.5-704 15-EMAW W
## 2015: WH M 95.5-11-501 15-DRAW W
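In (8), "W" is treated as a value for a second column rather than as a second race value, so only WH rows come back. To look up two values of the same key variable, pass them as a single vector: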
datSmall[J(c("WH", "W"))]
## race gender charges_citation housing_location
## 1: WH M 720 ILCS 5 12-3.4(a)(2) [16145 05-
## 2: WH M 720 ILCS 5 12-3.2 [930200] 05-L-2-1-2
## 3: WH F 720 ILCS 5 12-3.2(a)(2) [10418 04-Q-1-11-1
## 4: WH M 720 ILCS 5 12-3.2(a)(1) [10416 08-2N-DR
## 5: WH F 720 ILCS 5 16A-3(a) [15601] 17-WR-N-A-2
## ---
## 2055: W M 720 ILCS 550/5(g) 10-C-1-7-1
## 2056: W M 720 ILCS 5/12-14.1(a)(1) C DISCH
## 2057: W M 720 ILCS 5/12-4.3(a) 11-DH-3-411
## 2058: W M 720 ILCS 5/24-1.1 06-C-1-13-1
## 2059: W M 38-10-2(a)(3) 01-H-1-13-2
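For J("WH", "M") to restrict both race and gender, the key has to cover both columns. The sketch below sets a two-column key on made-up data (toy and id are hypothetical) and contrasts J, which pairs its arguments, with CJ, which builds every combination.

```r
library(data.table)

# Hypothetical toy data; id records the original row position.
toy <- data.table(
  race   = c("WH", "WH", "W", "W", "BK"),
  gender = c("M", "F", "M", "F", "M"),
  id     = 1:5
)
setkey(toy, race, gender)   # key on both columns

wh_m   <- toy[J("WH", "M")]                      # only race WH and gender M
males  <- toy[J(c("WH", "W"), "M")]              # "M" is recycled: (WH, M) and (W, M)
combos <- toy[CJ(c("WH", "W"), c("F", "M"))]     # all four race/gender combinations

wh_m
males
combos
```

With the two-column key, each argument of J is matched against its own key column, so no extra V2 column appears in the result.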
Hankuk University of Foreign Studies, Dept. of Statistics, Daewoo Choi Lab.
Jaemyung Kwon, e-mail: jaemyung.kw@gmail.com