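Data preparation
- Read in the Gini coefficients of the OECD countries, separately for before-tax and after-tax figures. Note that sep="\t" is used because the fields in the data files are separated by tabs.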
Gini.b.tax<-read.table(file="../data/Gini_before_tax.txt", header=F, sep="\t")
Gini.a.tax<-read.table(file="../data/Gini_after_tax.txt", header=F, sep="\t")
str(Gini.b.tax)
## 'data.frame': 34 obs. of 8 variables:
## $ V1: chr "Australia" "Austria" "Belgium" "Canada" ...
## $ V2: num NA NA NA 0.385 NA NA NA NA 0.343 NA ...
## $ V3: num NA NA 0.449 0.395 NA NA 0.373 NA 0.387 0.38 ...
## $ V4: num NA NA NA 0.403 NA NA 0.396 NA NA 0.37 ...
## $ V5: num 0.467 NA 0.472 0.43 0.441 0.442 0.417 NA 0.479 0.473 ...
## $ V6: num 0.476 NA 0.464 0.44 NA 0.472 0.415 NA 0.478 0.49 ...
## $ V7: num 0.465 0.433 0.494 0.436 0.414 0.474 0.417 0.504 0.483 0.485 ...
## $ V8: num 0.468 0.472 0.469 0.441 0.426 0.444 0.416 0.458 0.465 0.483 ...
str(Gini.a.tax)
## 'data.frame': 34 obs. of 8 variables:
## $ V1: chr "Australia" "Austria" "Belgium" "Canada" ...
## $ V2: num NA NA NA 0.304 NA NA NA NA 0.235 NA ...
## $ V3: num NA 0.236 0.274 0.293 NA NA 0.221 NA 0.209 0.3 ...
## $ V4: num NA NA NA 0.287 NA 0.232 0.226 NA NA 0.29 ...
## $ V5: num 0.309 0.238 0.287 0.289 0.427 0.257 0.215 NA 0.218 0.277 ...
## $ V6: num 0.317 0.252 0.289 0.318 NA 0.26 0.226 NA 0.247 0.287 ...
## $ V7: num 0.315 0.265 0.271 0.317 0.403 0.268 0.232 0.349 0.254 0.288 ...
## $ V8: num 0.336 0.261 0.259 0.324 0.394 0.256 0.248 0.315 0.259 0.293 ...
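The effect of sep="\t" can be seen on a small inline sample. This is only a sketch: the two records below are made up for illustration (the values are borrowed from the tables in these notes), and `txt` and `sample.tab` are hypothetical names. It assumes R 4.0 or later, where strings are read as character rather than factor, as in the str() output above.

```r
# Two tab-separated records; the doubled tab in the first row is an empty field.
txt <- "Australia\t\t0.468\nCanada\t0.385\t0.441"

# With sep = "\t" every tab delimits a field, so the empty field is kept
# and read in as NA in the numeric column V2.
sample.tab <- read.table(text = txt, header = FALSE, sep = "\t")
str(sample.tab)
is.na(sample.tab$V2)

# With the default sep = "" (any run of white space), the doubled tab would
# count as a single separator and the first row would appear to have only
# two fields, so the columns would no longer line up.
```

This is why files with missing entries between tabs need the separator stated explicitly.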
- Build a new data frame from the late-2000s figures only.
(Gini.b.a<-data.frame(Country=Gini.b.tax$V1, Before=Gini.b.tax$V8, After=Gini.a.tax$V8))
## Country Before After
## 1 Australia 0.468 0.336
## 2 Austria 0.472 0.261
## 3 Belgium 0.469 0.259
## 4 Canada 0.441 0.324
## 5 Chile 0.426 0.394
## 6 Czech_Republic 0.444 0.256
## 7 Denmark 0.416 0.248
## 8 Estonia 0.458 0.315
## 9 Finland 0.465 0.259
## 10 France 0.483 0.293
## 11 Germany 0.504 0.295
## 12 Greece 0.436 0.307
## 13 Hungary 0.466 0.272
## 14 Iceland 0.382 0.301
## 15 Ireland NA 0.293
## 16 Israel 0.498 0.371
## 17 Italy 0.534 0.337
## 18 Japan 0.462 0.329
## 19 Luxembourg 0.482 0.288
## 20 Mexico 0.494 0.476
## 21 Netherlands 0.426 0.294
## 22 New_Zealand 0.455 0.330
## 23 Norway 0.410 0.250
## 24 Poland 0.470 0.305
## 25 Portugal 0.521 0.353
## 26 Slovak_Republic 0.416 0.257
## 27 Slovenia 0.423 0.236
## 28 South_Korea 0.344 0.315
## 29 Spain 0.461 0.317
## 30 Sweden 0.426 0.259
## 31 Switzerland 0.409 0.303
## 32 Turkey 0.470 0.409
## 33 United_Kingdom 0.456 0.345
## 34 United_States 0.486 0.378
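Combining columns from two data frames by position is only valid when both list the countries in the same order. The following sketch, which uses a three-country stand-in for the two tables (`before`, `after`, `combined`, and `merged` are hypothetical names), shows one way to check this, and an alternative based on merge() that matches rows by country name instead.

```r
# Small stand-ins for Gini.b.tax and Gini.a.tax (only V1 and V8 are kept).
before <- data.frame(V1 = c("Austria", "Ireland", "Mexico"),
                     V8 = c(0.472, NA, 0.494))
after  <- data.frame(V1 = c("Austria", "Ireland", "Mexico"),
                     V8 = c(0.261, 0.293, 0.476))

# Position-based combination: first confirm that row i is the same country in both.
stopifnot(identical(before$V1, after$V1))
combined <- data.frame(Country = before$V1, Before = before$V8, After = after$V8)

# Name-based combination: merge() pairs rows by the key column V1.
merged <- merge(before, after, by = "V1", suffixes = c(".before", ".after"))
names(merged) <- c("Country", "Before", "After")
merged
```

merge() sorts the result by the key, so the row order may differ from the input files when they are not already alphabetical.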
- Call the difference between the before-tax and after-tax Gini coefficients the Improvement.
Gini.b.a$Improvement<-Gini.b.a[,2]-Gini.b.a[,3]
Gini.b.a
## Country Before After Improvement
## 1 Australia 0.468 0.336 0.132
## 2 Austria 0.472 0.261 0.211
## 3 Belgium 0.469 0.259 0.210
## 4 Canada 0.441 0.324 0.117
## 5 Chile 0.426 0.394 0.032
## 6 Czech_Republic 0.444 0.256 0.188
## 7 Denmark 0.416 0.248 0.168
## 8 Estonia 0.458 0.315 0.143
## 9 Finland 0.465 0.259 0.206
## 10 France 0.483 0.293 0.190
## 11 Germany 0.504 0.295 0.209
## 12 Greece 0.436 0.307 0.129
## 13 Hungary 0.466 0.272 0.194
## 14 Iceland 0.382 0.301 0.081
## 15 Ireland NA 0.293 NA
## 16 Israel 0.498 0.371 0.127
## 17 Italy 0.534 0.337 0.197
## 18 Japan 0.462 0.329 0.133
## 19 Luxembourg 0.482 0.288 0.194
## 20 Mexico 0.494 0.476 0.018
## 21 Netherlands 0.426 0.294 0.132
## 22 New_Zealand 0.455 0.330 0.125
## 23 Norway 0.410 0.250 0.160
## 24 Poland 0.470 0.305 0.165
## 25 Portugal 0.521 0.353 0.168
## 26 Slovak_Republic 0.416 0.257 0.159
## 27 Slovenia 0.423 0.236 0.187
## 28 South_Korea 0.344 0.315 0.029
## 29 Spain 0.461 0.317 0.144
## 30 Sweden 0.426 0.259 0.167
## 31 Switzerland 0.409 0.303 0.106
## 32 Turkey 0.470 0.409 0.061
## 33 United_Kingdom 0.456 0.345 0.111
## 34 United_States 0.486 0.378 0.108
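The same computation can be written with column names instead of column positions, which keeps working if the columns are later reordered. A minimal sketch on a three-country subset (`g` is a hypothetical name) also shows why Ireland's improvement is NA: arithmetic with a missing value yields a missing value.

```r
g <- data.frame(Country = c("Austria", "Ireland", "Mexico"),
                Before  = c(0.472, NA, 0.494),
                After   = c(0.261, 0.293, 0.476))

# Refer to the columns by name rather than by index 2 and 3.
g$Improvement <- g$Before - g$After

# NA - 0.293 is NA, so Ireland has no improvement value.
g
```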
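- List the countries in increasing order of improvement. Ireland has no before-tax figure, so its improvement is NA, and order() places NA values last by default.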
Gini.b.a[order(Gini.b.a$Improvement), ]
## Country Before After Improvement
## 20 Mexico 0.494 0.476 0.018
## 28 South_Korea 0.344 0.315 0.029
## 5 Chile 0.426 0.394 0.032
## 32 Turkey 0.470 0.409 0.061
## 14 Iceland 0.382 0.301 0.081
## 31 Switzerland 0.409 0.303 0.106
## 34 United_States 0.486 0.378 0.108
## 33 United_Kingdom 0.456 0.345 0.111
## 4 Canada 0.441 0.324 0.117
## 22 New_Zealand 0.455 0.330 0.125
## 16 Israel 0.498 0.371 0.127
## 12 Greece 0.436 0.307 0.129
## 1 Australia 0.468 0.336 0.132
## 21 Netherlands 0.426 0.294 0.132
## 18 Japan 0.462 0.329 0.133
## 8 Estonia 0.458 0.315 0.143
## 29 Spain 0.461 0.317 0.144
## 26 Slovak_Republic 0.416 0.257 0.159
## 23 Norway 0.410 0.250 0.160
## 24 Poland 0.470 0.305 0.165
## 30 Sweden 0.426 0.259 0.167
## 7 Denmark 0.416 0.248 0.168
## 25 Portugal 0.521 0.353 0.168
## 27 Slovenia 0.423 0.236 0.187
## 6 Czech_Republic 0.444 0.256 0.188
## 10 France 0.483 0.293 0.190
## 13 Hungary 0.466 0.272 0.194
## 19 Luxembourg 0.482 0.288 0.194
## 17 Italy 0.534 0.337 0.197
## 9 Finland 0.465 0.259 0.206
## 11 Germany 0.504 0.295 0.209
## 3 Belgium 0.469 0.259 0.210
## 2 Austria 0.472 0.261 0.211
## 15 Ireland NA 0.293 NA
- To list the countries in decreasing order of improvement, add decreasing = TRUE.
Gini.b.a[order(Gini.b.a$Improvement, decreasing=TRUE), ]
## Country Before After Improvement
## 2 Austria 0.472 0.261 0.211
## 3 Belgium 0.469 0.259 0.210
## 11 Germany 0.504 0.295 0.209
## 9 Finland 0.465 0.259 0.206
## 17 Italy 0.534 0.337 0.197
## 13 Hungary 0.466 0.272 0.194
## 19 Luxembourg 0.482 0.288 0.194
## 10 France 0.483 0.293 0.190
## 6 Czech_Republic 0.444 0.256 0.188
## 27 Slovenia 0.423 0.236 0.187
## 25 Portugal 0.521 0.353 0.168
## 7 Denmark 0.416 0.248 0.168
## 30 Sweden 0.426 0.259 0.167
## 24 Poland 0.470 0.305 0.165
## 23 Norway 0.410 0.250 0.160
## 26 Slovak_Republic 0.416 0.257 0.159
## 29 Spain 0.461 0.317 0.144
## 8 Estonia 0.458 0.315 0.143
## 18 Japan 0.462 0.329 0.133
## 1 Australia 0.468 0.336 0.132
## 21 Netherlands 0.426 0.294 0.132
## 12 Greece 0.436 0.307 0.129
## 16 Israel 0.498 0.371 0.127
## 22 New_Zealand 0.455 0.330 0.125
## 4 Canada 0.441 0.324 0.117
## 33 United_Kingdom 0.456 0.345 0.111
## 34 United_States 0.486 0.378 0.108
## 31 Switzerland 0.409 0.303 0.106
## 14 Iceland 0.382 0.301 0.081
## 32 Turkey 0.470 0.409 0.061
## 5 Chile 0.426 0.394 0.032
## 28 South_Korea 0.344 0.315 0.029
## 20 Mexico 0.494 0.476 0.018
## 15 Ireland NA 0.293 NA
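How order() treats the missing value can be checked on a short named vector. The sketch below uses four improvement values taken from the table above; `x` is a hypothetical name. It covers the default, decreasing = TRUE, and the two other settings of na.last.

```r
x <- c(Australia = 0.132, Ireland = NA, Mexico = 0.018, Austria = 0.211)

names(x)[order(x)]                     # increasing; NA is placed last
# "Mexico" "Australia" "Austria" "Ireland"

names(x)[order(x, decreasing = TRUE)]  # decreasing; NA is still last
# "Austria" "Australia" "Mexico" "Ireland"

names(x)[order(x, na.last = FALSE)]    # increasing; NA is placed first
# "Ireland" "Mexico" "Australia" "Austria"

names(x)[order(x, na.last = NA)]       # NA is dropped from the ordering
# "Mexico" "Australia" "Austria"
```

Note that decreasing = TRUE reverses the order of the observed values only; the position of NA is controlled separately by na.last.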
Graphic representation
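- barplot() is well suited to comparing the before-tax and after-tax Gini coefficients visually. When height in barplot(height, ...) is a matrix, each bar corresponds to one column, with that column's elements stacked on top of one another, so the data are transposed with t() to give one column per country before barplot() is called. Note that t() applied to a data frame already returns a matrix, so the explicit as.matrix() is redundant here, though harmless. Because placing the before-tax and after-tax bars side by side makes them easier to compare than stacking them, beside=TRUE is set, and the country names are used as the bar labels.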
barplot(as.matrix(t(Gini.b.a[, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country)
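The shape that barplot() expects can be inspected on a small example. In this sketch (three countries from the table; `g`, `h`, and `mid` are hypothetical names), the transposed data form a 2 x 3 matrix with one column per country, and with beside = TRUE barplot() returns the bar midpoints as a matrix of the same shape.

```r
g <- data.frame(Country = c("Austria", "Mexico", "South_Korea"),
                Before  = c(0.472, 0.494, 0.344),
                After   = c(0.261, 0.476, 0.315))

# t() on a data frame returns a matrix: rows Before/After, one column per country.
h <- t(g[, 2:3])
is.matrix(h)
dim(h)

# Each column becomes one group of side-by-side bars.
mid <- barplot(h, beside = TRUE, names.arg = g$Country)
dim(mid)   # one midpoint per bar, arranged like h
```

The returned midpoints are useful later for placing text or reference marks at specific bars.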

- Store the improvement ordering (decreasing) in o.improvement so that it can be reused.
o.improvement<-order(Gini.b.a$Improvement, decreasing=TRUE)
Gini.b.a$Country[o.improvement]
## [1] "Austria" "Belgium" "Germany"
## [4] "Finland" "Italy" "Hungary"
## [7] "Luxembourg" "France" "Czech_Republic"
## [10] "Slovenia" "Portugal" "Denmark"
## [13] "Sweden" "Poland" "Norway"
## [16] "Slovak_Republic" "Spain" "Estonia"
## [19] "Japan" "Australia" "Netherlands"
## [22] "Greece" "Israel" "New_Zealand"
## [25] "Canada" "United_Kingdom" "United_States"
## [28] "Switzerland" "Iceland" "Turkey"
## [31] "Chile" "South_Korea" "Mexico"
## [34] "Ireland"
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country[o.improvement])

barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country[o.improvement], las=2)

- Adjust par("mai") so that the country names are not cut off.
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.5, 0.8, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country[o.improvement], las=2)

par(old.par)
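Because the save, adjust, and restore steps recur for every plot, they can be wrapped in a small function. The following is only a sketch: `bars_with_margin` is a hypothetical helper, not part of base R, and its default margins simply copy the values used above. It relies on par() returning the previous settings and on on.exit() to restore them when the function ends.

```r
bars_with_margin <- function(height, labels, mai = c(1.5, 0.8, 0.8, 0.4), ...) {
  old.par <- par(mai = mai)   # par() returns the previous value of what it changes
  on.exit(par(old.par))       # restore the margins even if barplot() fails
  barplot(height, names.arg = labels, las = 2, ...)
}

h <- rbind(Before = c(0.472, 0.494, 0.344),
           After  = c(0.261, 0.476, 0.315))
mai.before <- par("mai")
mid <- bars_with_margin(h, c("Austria", "Mexico", "South_Korea"), beside = TRUE)
par("mai")   # same as mai.before: the wider margin applied only inside the call
```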
- Drawing a boundary at a Gini coefficient of 0.4, the level above which inequality is regarded as severe:
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.5, 0.8, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country[o.improvement], las=2)
abline(h=0.4, lty=2, col="red")

par(old.par)
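What the dashed line shows can also be counted directly. The sketch below uses a five-country subset of the table (`g`, `threshold`, and `above` are hypothetical names), so the counts apply to this subset only, not to all 34 countries.

```r
g <- data.frame(Country = c("Austria", "Germany", "Mexico", "South_Korea", "Turkey"),
                Before  = c(0.472, 0.504, 0.494, 0.344, 0.470),
                After   = c(0.261, 0.295, 0.476, 0.315, 0.409))
threshold <- 0.4

# Number of countries above the threshold, before and after tax.
above <- colSums(g[, c("Before", "After")] > threshold, na.rm = TRUE)
above
# Before  After
#      4      2

# Countries that remain above the threshold after tax.
g$Country[g$After > threshold]
# "Mexico" "Turkey"
```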
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.5, 0.8, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, names.arg=Gini.b.a$Country[o.improvement], legend.text=c("Before Tax", "After Tax"), args.legend=list(x=105, y=0.62), las=2)
abline(h=0.4, lty=2, col="red")
title(main="Gini Coefficients of OECD Countries")

par(old.par)
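As an alternative to numeric legend coordinates, legend() also accepts a position keyword such as "topright", which barplot() passes through args.legend. This sketch uses three countries only; `h` and `mid` are hypothetical names, and the ylim and bty settings are assumptions chosen to leave room above the bars for the legend.

```r
h <- rbind(Before = c(0.472, 0.494, 0.344),
           After  = c(0.261, 0.476, 0.315))

mid <- barplot(h, beside = TRUE,
               names.arg = c("Austria", "Mexico", "South_Korea"),
               ylim = c(0, 0.65),                        # headroom for the legend
               legend.text = c("Before Tax", "After Tax"),
               args.legend = list(x = "topright", bty = "n"))
abline(h = 0.4, lty = 2, col = "red")
title(main = "Gini Coefficients of OECD Countries")
```

A keyword position does not depend on the number of bars, so it does not need to be adjusted when the data change.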
- Now consider laying the bars on their side. With horizontal bars and las = 1:
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, horiz=TRUE, names.arg=Gini.b.a$Country[o.improvement], las=1)

- Again, adjust par("mai") so that the country names are not cut off.
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.0, 1.5, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[o.improvement, 2:3])), beside=TRUE, horiz=TRUE, names.arg=Gini.b.a$Country[o.improvement], las=1)

par(old.par)
- Redrawing so that the countries run from the lowest improvement at the bottom upward:
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.0, 1.5, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[order(Gini.b.a$Improvement, na.last=FALSE), 2:3])), beside=TRUE, horiz=TRUE, names.arg=Gini.b.a$Country[order(Gini.b.a$Improvement, na.last=FALSE)], las=1)

par(old.par)
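With horiz = TRUE the first column of the matrix is drawn at the bottom, which is why an increasing ordering puts the lowest improvement at the bottom. The sketch below checks this on three countries (`g`, `o`, and `mid` are hypothetical names) by looking at the midpoints that barplot() returns, which are vertical positions in this case.

```r
g <- data.frame(Country = c("Austria", "Ireland", "Mexico"),
                Before  = c(0.472, NA, 0.494),
                After   = c(0.261, 0.293, 0.476))
g$Improvement <- g$Before - g$After

# Increasing order, with the NA row (Ireland) placed first.
o <- order(g$Improvement, na.last = FALSE)

mid <- barplot(t(g[o, c("Before", "After")]), beside = TRUE, horiz = TRUE,
               names.arg = g$Country[o], las = 1)

# Vertical centre of each country's pair of bars, listed bottom to top.
data.frame(Country = g$Country[o], y = colMeans(mid))
```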
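- na.last=FALSE was added above because Ireland, whose improvement is NA, would otherwise end up at the very top, which does not look good; with na.last=FALSE it is drawn at the bottom instead.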
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.0, 1.5, 0.8, 0.4))
barplot(as.matrix(t(Gini.b.a[order(Gini.b.a$Improvement, na.last=FALSE), 2:3])), beside=TRUE, horiz=TRUE, names.arg=Gini.b.a$Country[order(Gini.b.a$Improvement, na.last=FALSE)], las=1)
abline(v=0.4, lty=2, col="red")

par(old.par)
- Add a legend and a main title. Note that the legend coordinates were found by trial and error.
old.par<-par(no.readonly=TRUE)
par("mai")
## [1] 1.02 0.82 0.82 0.42
par("mai"= c(1.0, 1.5, 0.8, 0.8))
barplot(as.matrix(t(Gini.b.a[order(Gini.b.a$Improvement, na.last=FALSE), 2:3])), beside=TRUE, horiz=TRUE, names.arg=Gini.b.a$Country[order(Gini.b.a$Improvement, na.last=FALSE)], legend.text=c("Before Tax", "After Tax"), args.legend=list(x=0.67, y=110), las=1)
abline(v=0.4, lty=2, col="red")
title(main="Gini Coefficients of OECD Countries")

par(old.par)