Titanic_Example

I ran a few analyses on Titanic passenger data found on the internet.

The data set can be downloaded from http://dl.dropbox.com/u/8686172/titanic.csv.

The question is how the survival rate differs by passenger class, age, and sex.

Keep in mind that the two leads of the film Titanic travelled one in 1st class and the other in 3rd class, so the gap between their classes was large.

Downloading and loading the packages needed for the analysis

```r
library(data.table)
library(ggplot2)
```
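
The heading above mentions downloading the packages, but only the `library()` calls are shown. A minimal sketch of the install step, assuming the packages come from CRAN and a network connection is available. The helper name `ensure_packages` is hypothetical and not part of the original post; `scales` is included because it is loaded later for `muted()`.

```r
# Hypothetical helper (an assumption, not from the original post):
# install whatever is missing from CRAN, then attach every package.
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE when a package is not installed
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) install.packages(missing)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

ensure_packages(c("data.table", "ggplot2", "scales"))
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already present.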

Reading the CSV file


```r
# Load the file.
titanic <- read.csv("http://dl.dropboxusercontent.com/u/8686172/titanic.csv")

# Convert to a data.table for convenience.
titanic.dt <- as.data.table(titanic)
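
# Aside, not from the original post: data.table's fread() returns a
# data.table directly, so the separate conversion step is not needed.
# A minimal sketch on a throwaway CSV; the file and its rows are made up
# purely for illustration.
library(data.table)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("pclass,survived,sex,age",
             "1st,1,female,29",
             "3rd,0,male,NA"), tmp_csv)
toy_csv <- fread(tmp_csv)  # the literal NA in the age column is read as missing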

# Preview the structure of the data.
names(titanic.dt)
##  [1] "X"         "pclass"    "survived"  "name"      "sex"      
##  [6] "age"       "sibsp"     "parch"     "ticket"    "fare"     
## [11] "cabin"     "embarked"  "boat"      "body"      "home.dest"
head(titanic.dt)
##    X pclass survived                            name    sex     age sibsp
## 1: 1    1st        1   Allen, Miss. Elisabeth Walton female 29.0000     0
## 2: 2    1st        1  Allison, Master. Hudson Trevor   male  0.9167     1
## 3: 3    1st        0    Allison, Miss. Helen Loraine female  2.0000     1
## 4: 4    1st        0 Allison, Mr. Hudson Joshua Crei   male 30.0000     1
## 5: 5    1st        0 Allison, Mrs. Hudson J C (Bessi female 25.0000     1
## 6: 6    1st        1             Anderson, Mr. Harry   male 48.0000     0
##    parch ticket   fare   cabin    embarked boat body
## 1:     0  24160 211.34      B5 Southampton    2   NA
## 2:     2 113781 151.55 C22 C26 Southampton   11   NA
## 3:     2 113781 151.55 C22 C26 Southampton        NA
## 4:     2 113781 151.55 C22 C26 Southampton       135
## 5:     2 113781 151.55 C22 C26 Southampton        NA
## 6:     0  19952  26.55     E12 Southampton    3   NA
##                          home.dest
## 1:                    St Louis, MO
## 2: Montreal, PQ / Chesterville, ON
## 3: Montreal, PQ / Chesterville, ON
## 4: Montreal, PQ / Chesterville, ON
## 5: Montreal, PQ / Chesterville, ON
## 6:                    New York, NY

# Convert the variable 'survived' to a factor.
titanic.dt$survived <- as.factor(titanic.dt$survived)

```

Analyzing the survival rate by class and sex combination with ggplot

```r

# Treat passengers under 15 as children.

titanic.dt[, `:=`(isminor, "adult")]
##          X pclass survived                            name    sex     age
##    1:    1    1st        1   Allen, Miss. Elisabeth Walton female 29.0000
##    2:    2    1st        1  Allison, Master. Hudson Trevor   male  0.9167
##    3:    3    1st        0    Allison, Miss. Helen Loraine female  2.0000
##    4:    4    1st        0 Allison, Mr. Hudson Joshua Crei   male 30.0000
##    5:    5    1st        0 Allison, Mrs. Hudson J C (Bessi female 25.0000
##   ---                                                                    
## 1305: 1305    3rd        0            Zabour, Miss. Hileni female 14.5000
## 1306: 1306    3rd        0           Zabour, Miss. Thamine female      NA
## 1307: 1307    3rd        0       Zakarian, Mr. Mapriededer   male 26.5000
## 1308: 1308    3rd        0             Zakarian, Mr. Ortin   male 27.0000
## 1309: 1309    3rd        0              Zimmerman, Mr. Leo   male 29.0000
##       sibsp parch ticket    fare   cabin    embarked boat body
##    1:     0     0  24160 211.337      B5 Southampton    2   NA
##    2:     1     2 113781 151.550 C22 C26 Southampton   11   NA
##    3:     1     2 113781 151.550 C22 C26 Southampton        NA
##    4:     1     2 113781 151.550 C22 C26 Southampton       135
##    5:     1     2 113781 151.550 C22 C26 Southampton        NA
##   ---                                                         
## 1305:     1     0   2665  14.454           Cherbourg       328
## 1306:     1     0   2665  14.454           Cherbourg        NA
## 1307:     0     0   2656   7.225           Cherbourg       304
## 1308:     0     0   2670   7.225           Cherbourg        NA
## 1309:     0     0 315082   7.875         Southampton        NA
##                             home.dest isminor
##    1:                    St Louis, MO   adult
##    2: Montreal, PQ / Chesterville, ON   adult
##    3: Montreal, PQ / Chesterville, ON   adult
##    4: Montreal, PQ / Chesterville, ON   adult
##    5: Montreal, PQ / Chesterville, ON   adult
##   ---                                        
## 1305:                                   adult
## 1306:                                   adult
## 1307:                                   adult
## 1308:                                   adult
## 1309:                                   adult
titanic.dt[age < 15, `:=`(isminor, "child")]
##          X pclass survived                            name    sex     age
##    1:    1    1st        1   Allen, Miss. Elisabeth Walton female 29.0000
##    2:    2    1st        1  Allison, Master. Hudson Trevor   male  0.9167
##    3:    3    1st        0    Allison, Miss. Helen Loraine female  2.0000
##    4:    4    1st        0 Allison, Mr. Hudson Joshua Crei   male 30.0000
##    5:    5    1st        0 Allison, Mrs. Hudson J C (Bessi female 25.0000
##   ---                                                                    
## 1305: 1305    3rd        0            Zabour, Miss. Hileni female 14.5000
## 1306: 1306    3rd        0           Zabour, Miss. Thamine female      NA
## 1307: 1307    3rd        0       Zakarian, Mr. Mapriededer   male 26.5000
## 1308: 1308    3rd        0             Zakarian, Mr. Ortin   male 27.0000
## 1309: 1309    3rd        0              Zimmerman, Mr. Leo   male 29.0000
##       sibsp parch ticket    fare   cabin    embarked boat body
##    1:     0     0  24160 211.337      B5 Southampton    2   NA
##    2:     1     2 113781 151.550 C22 C26 Southampton   11   NA
##    3:     1     2 113781 151.550 C22 C26 Southampton        NA
##    4:     1     2 113781 151.550 C22 C26 Southampton       135
##    5:     1     2 113781 151.550 C22 C26 Southampton        NA
##   ---                                                         
## 1305:     1     0   2665  14.454           Cherbourg       328
## 1306:     1     0   2665  14.454           Cherbourg        NA
## 1307:     0     0   2656   7.225           Cherbourg       304
## 1308:     0     0   2670   7.225           Cherbourg        NA
## 1309:     0     0 315082   7.875         Southampton        NA
##                             home.dest isminor
##    1:                    St Louis, MO   adult
##    2: Montreal, PQ / Chesterville, ON   child
##    3: Montreal, PQ / Chesterville, ON   child
##    4: Montreal, PQ / Chesterville, ON   adult
##    5: Montreal, PQ / Chesterville, ON   adult
##   ---                                        
## 1305:                                   child
## 1306:                                   adult
## 1307:                                   adult
## 1308:                                   adult
## 1309:                                   adult
titanic.dt$isminor <- as.factor(titanic.dt$isminor)
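
# Aside, not from the original post: `age < 15` is NA for passengers whose
# age is missing, and data.table treats NA in a row filter as FALSE, so those
# rows keep the "adult" label (row 1306 in the output above). A minimal
# sketch of labelling them explicitly instead; the table `toy_age` and the
# "unknown" label are hypothetical.
library(data.table)
toy_age <- data.table(age = c(0.9167, 30, NA))
toy_age[, isminor := ifelse(is.na(age), "unknown",
                            ifelse(age < 15, "child", "adult"))]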

# Survival rate by class
titanic.dt[, length(which(survived == 1))/nrow(.SD), by = pclass]
##    pclass     V1
## 1:    1st 0.6192
## 2:    2nd 0.4296
## 3:    3rd 0.2553

# Survival rate by sex
titanic.dt[, length(which(survived == 1))/nrow(.SD), by = sex]
##       sex     V1
## 1: female 0.7275
## 2:   male 0.1910

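```

A `survived` value of 1 marks a passenger who survived, so counting the rows where it equals 1 and dividing that count by the total number of rows in the group gives the survival rate.

```r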
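# Aside, not from the original post: a minimal sketch of the same
# calculation written as the mean of a logical indicator. The table `toy`
# and its values are made up purely for illustration.
library(data.table)
toy <- data.table(pclass   = c("1st", "1st", "3rd", "3rd"),
                  survived = factor(c(1, 0, 0, 0)))
# mean(survived == 1) equals (number of 1s) / (number of rows in the group)
toy_rate <- toy[, list(psurvived = mean(survived == 1)), by = pclass]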

# Survival rate by class and sex
survived_pclass_sex <- titanic.dt[, list(cntsurv = length(which(survived == 
    1)), cntdie = length(which(survived == 0))), by = list(pclass, sex)][, list(psurvived = cntsurv/(cntsurv + 
    cntdie)), by = list(pclass, sex)]

```

Analyzing the survival rate by class, sex, and age combination with ggplot

```r

# Survival rate by sex, class, and age
survived_pclass_sex_isminor <- titanic.dt[, list(cntsurv = length(which(survived == 
    1)), cntdie = length(which(survived == 0))), by = list(pclass, sex, isminor)][, 
    list(psurvived = cntsurv/(cntsurv + cntdie)), by = list(pclass, sex, isminor)]

# Combine sex and age into one variable so the result can be drawn in two dimensions.
survived_pclass_sex_isminor$sex_age <- apply(survived_pclass_sex_isminor[, list(sex, 
    isminor)], 1, paste, collapse = "_")
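
# Aside, not from the original post: paste() is vectorised, so the same
# combined label can be built without the row-wise apply(). A minimal sketch
# on a made-up table; `toy_lab` is hypothetical.
library(data.table)
toy_lab <- data.table(sex     = c("female", "male"),
                      isminor = c("adult", "child"))
toy_lab[, sex_age := paste(sex, isminor, sep = "_")]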

library(scales)
ggplot(survived_pclass_sex_isminor, aes(pclass, sex_age)) + geom_tile(aes(fill = psurvived)) +
    scale_fill_gradient2("Survival rate", low = muted("white"), high = muted("blue")) +
    xlab("Passenger class") + ylab("Sex and age")

```

[Figure: heat map of survival rate (fill colour) by passenger class (x axis) and combined sex and age group (y axis)]