US Top Income Share 2014

Data Preparation

The data come from TabFig2014prel.xls on Professor E. Saez's homepage, lightly reworked so that it loads easily into R.

The processed data are stored in US.top.income.shares.14; the structure of the data and the values of its first few columns are as follows.

Year	P90_100	P95_100	P99_100	P99.5_100	P99.9_100	P99.99_100	P90_95	P95_99	P99_99.5	P99.5_99.9	P99.9_99.99
1913	NA	NA	18.0	14.7	8.6	2.76	NA	NA	3.2	6.1	5.9
1914	NA	NA	18.2	15.1	8.6	2.73	NA	NA	3.1	6.5	5.9
1915	NA	NA	17.6	14.6	9.2	4.36	NA	NA	3.0	5.4	4.9
1916	NA	NA	19.3	16.4	10.5	4.78	NA	NA	2.9	5.9	5.7
1917	41	31	17.7	14.3	8.4	3.37	9.9	13	3.4	5.9	5.0
1918	40	29	16.0	12.4	6.7	2.45	10.6	14	3.5	5.7	4.3
1919	40	30	16.4	12.6	6.6	2.29	10.2	14	3.8	6.0	4.3
1920	39	28	14.8	11.1	5.4	1.66	10.7	13	3.7	5.8	3.7
1921	43	31	15.6	11.7	5.6	1.69	12.4	15	3.9	6.1	3.9
1922	44	32	17.1	13.1	6.6	2.27	11.8	15	4.0	6.4	4.4
1923	41	30	15.6	11.9	5.9	2.00	11.7	14	3.7	6.0	3.9
1924	44	32	17.4	13.4	6.8	2.32	12.3	15	4.0	6.6	4.5
1925	46	35	20.2	15.9	8.5	3.31	11.3	15	4.4	7.3	5.2
1926	46	35	19.9	15.6	8.5	3.36	11.1	15	4.4	7.1	5.1
1927	47	36	21.0	16.6	9.2	3.75	11.0	15	4.4	7.3	5.5
1928	49	39	23.9	19.4	11.5	5.02	10.7	15	4.5	7.9	6.5
1929	47	36	22.4	18.1	10.9	4.99	10.2	14	4.3	7.2	5.9
1930	44	32	17.2	13.2	7.1	2.84	11.8	15	4.0	6.1	4.2
1931	45	31	15.5	11.6	5.9	2.25	13.3	16	3.9	5.7	3.6
1932	46	33	15.6	11.6	6.0	1.99	13.7	17	3.9	5.7	4.0
1933	46	33	16.5	12.5	6.6	2.34	12.4	17	4.0	5.9	4.3
1934	46	34	16.4	12.3	6.1	2.07	12.1	17	4.1	6.2	4.1
1935	44	32	16.7	12.6	6.4	2.19	12.2	16	4.0	6.2	4.2
1936	47	35	19.3	14.9	7.6	2.54	12.0	15	4.4	7.3	5.0
1937	44	32	17.1	13.0	6.5	2.17	12.0	15	4.1	6.5	4.3
1938	44	31	15.8	11.8	5.9	2.19	12.7	16	4.0	5.9	3.7
1939	46	32	16.2	12.1	5.9	1.96	13.2	16	4.1	6.2	3.9
1940	45	32	16.5	12.3	6.0	2.04	13.1	16	4.1	6.3	4.0
1941	42	30	15.8	11.9	5.8	1.98	11.9	14	3.9	6.1	3.8
1942	36	26	13.4	10.1	4.8	1.55	10.3	12	3.4	5.3	3.3
1943	34	24	12.3	9.2	4.3	1.24	9.6	12	3.2	4.9	3.0
1944	33	23	11.3	8.3	3.8	1.16	9.8	11	3.0	4.5	2.6
1945	34	25	12.5	9.1	4.2	1.26	9.6	12	3.4	5.0	2.9
1946	37	27	13.3	9.6	4.4	1.47	9.9	13	3.7	5.2	2.9
1947	34	25	12.0	8.6	3.9	1.30	9.7	13	3.3	4.7	2.6
1948	35	25	12.2	8.9	4.1	1.31	10.0	13	3.3	4.8	2.8
1949	35	25	11.7	8.5	3.8	1.24	10.2	13	3.2	4.6	2.6
1950	36	26	12.8	9.4	4.4	1.22	10.0	13	3.5	5.0	3.2
1951	34	24	11.8	8.5	3.9	1.28	10.0	12	3.3	4.6	2.6
1952	33	23	10.8	7.7	3.4	1.09	10.1	12	3.0	4.3	2.3
1953	32	22	9.9	7.0	3.1	0.97	10.3	12	2.9	4.0	2.1
1954	34	23	10.8	7.7	3.5	1.17	10.3	13	3.1	4.2	2.3
1955	34	24	11.1	8.0	3.7	1.32	10.3	13	3.1	4.2	2.4
1956	33	23	10.7	7.7	3.5	1.20	10.3	12	3.0	4.2	2.3
1957	33	23	10.2	7.2	3.2	1.05	10.4	12	2.9	4.0	2.1
1958	34	23	10.2	7.3	3.2	1.08	10.6	13	2.9	4.0	2.1
1959	34	23	10.7	7.7	3.5	1.19	10.6	13	2.9	4.3	2.3
1960	33	23	10.0	7.1	3.2	1.17	10.9	13	2.9	3.9	2.1
1961	34	24	10.6	7.7	3.6	1.38	10.8	13	3.0	4.0	2.3
1962	34	23	9.9	7.1	3.2	1.16	10.9	13	2.9	3.9	2.0
1963	34	23	9.9	7.0	3.1	1.15	10.9	13	2.9	3.9	2.0
1964	34	24	10.5	7.4	3.4	1.30	10.9	13	3.1	4.0	2.1
1965	35	24	10.9	7.7	3.7	1.49	10.9	13	3.2	4.1	2.2
1966	34	23	10.2	7.2	3.4	1.29	10.8	13	3.0	3.8	2.1
1967	34	24	10.7	7.7	3.7	1.42	10.7	13	3.1	4.0	2.3
1968	35	24	11.2	8.1	4.0	1.61	10.7	13	3.1	4.1	2.4
1969	34	23	10.3	7.5	3.7	1.56	10.8	13	2.9	3.8	2.1
1970	33	22	9.0	6.2	2.8	1.00	11.0	13	2.8	3.5	1.8
1971	33	22	9.4	6.6	3.0	1.11	11.1	13	2.8	3.6	1.9
1972	34	23	9.6	6.8	3.1	1.18	11.1	13	2.9	3.6	1.9
1973	33	22	9.2	6.3	2.8	0.94	11.1	13	2.9	3.5	1.8
1974	33	22	9.1	6.3	2.7	0.88	11.2	13	2.8	3.6	1.9
1975	33	22	8.9	6.1	2.6	0.85	11.4	13	2.8	3.5	1.7
1976	33	22	8.9	6.1	2.6	0.86	11.4	13	2.8	3.5	1.7
1977	34	22	9.0	6.2	2.7	0.92	11.5	13	2.8	3.5	1.8
1978	33	22	8.9	6.2	2.6	0.86	11.4	13	2.8	3.5	1.8
1979	34	23	10.0	7.1	3.4	1.37	11.3	13	2.9	3.7	2.1
1980	35	23	10.0	7.2	3.4	1.28	11.5	13	2.9	3.7	2.1
1981	35	23	10.0	7.2	3.6	1.37	11.5	13	2.8	3.7	2.2
1982	35	24	10.8	8.0	4.2	1.73	11.5	13	2.8	3.8	2.4
1983	36	25	11.6	8.6	4.6	1.88	11.5	13	2.9	4.0	2.7
1984	37	25	12.0	9.0	5.0	2.15	11.4	13	3.0	4.1	2.8
1985	38	26	12.7	9.6	5.3	2.24	11.4	13	3.0	4.3	3.1
1986	41	29	15.9	12.6	7.4	3.34	11.1	14	3.3	5.2	4.0
1987	38	27	12.7	9.4	4.9	1.91	11.7	14	3.2	4.5	3.0
1988	41	29	15.5	12.1	6.8	2.86	11.3	14	3.4	5.3	3.9
1989	40	29	14.5	11.1	6.0	2.45	11.5	14	3.4	5.1	3.5
1990	40	28	14.3	10.9	5.8	2.33	11.6	14	3.4	5.1	3.5
1991	40	28	13.4	10.0	5.1	1.96	11.8	14	3.4	4.9	3.2
1992	41	29	14.7	11.2	6.0	2.46	11.8	14	3.5	5.2	3.6
1993	41	29	14.2	10.8	5.7	2.32	11.8	15	3.5	5.0	3.4
1994	41	29	14.2	10.7	5.7	2.29	11.9	15	3.5	5.0	3.4
1995	42	30	15.2	11.6	6.2	2.46	11.9	15	3.7	5.4	3.7
1996	43	32	16.7	12.9	7.2	3.06	11.7	15	3.8	5.7	4.2
1997	45	33	18.0	14.2	8.2	3.53	11.5	15	3.9	6.0	4.7
1998	45	34	19.1	15.2	9.0	3.92	11.3	15	3.9	6.2	5.1
1999	46	35	20.0	16.0	9.6	4.21	11.2	15	4.0	6.4	5.4
2000	48	37	21.5	17.4	10.9	5.07	11.0	15	4.1	6.6	5.8
2001	45	33	18.2	14.3	8.4	3.70	11.5	15	3.9	6.0	4.7
2002	44	32	16.9	13.0	7.3	3.14	11.8	15	3.8	5.7	4.2
2003	45	33	17.5	13.7	7.9	3.49	11.8	15	3.9	5.8	4.4
2004	46	35	19.8	15.7	9.5	4.34	11.4	15	4.0	6.3	5.1
2005	48	37	21.9	17.8	11.0	5.13	11.2	15	4.1	6.8	5.9
2006	49	38	22.8	18.6	11.6	5.46	11.2	15	4.2	7.0	6.1
2007	50	39	23.5	19.3	12.3	6.04	11.1	15	4.2	7.0	6.2
2008	48	37	20.9	16.9	10.4	5.03	11.7	16	4.1	6.5	5.4
2009	46	34	18.1	14.2	8.3	3.89	12.4	16	4.0	5.9	4.4
2010	48	36	19.9	15.8	9.7	4.78	12.2	16	4.0	6.2	4.9
2011	48	36	19.6	15.6	9.3	4.32	12.2	16	4.1	6.3	4.9
2012	51	39	22.8	18.6	11.7	5.81	11.8	16	4.2	6.9	5.9
2013	49	37	20.1	15.9	9.5	4.48	12.3	17	4.2	6.4	5.0
2014	50	38	21.2	17.0	10.3	4.89	12.1	17	4.3	6.7	5.4
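The table above is plain whitespace-delimited text, so base R can parse it directly. The following is a minimal sketch, assuming the table has been saved as text; three rows (1916, 1917, 2014) are supplied inline through the `text` argument so the block runs on its own, and the names `sample_text` and `shares_sample` are hypothetical rather than taken from the original workflow.

```r
# Column names exactly as they appear in the table header.
header <- c("Year", "P90_100", "P95_100", "P99_100", "P99.5_100", "P99.9_100",
            "P99.99_100", "P90_95", "P95_99", "P99_99.5", "P99.5_99.9",
            "P99.9_99.99")

# Three rows copied from the table; NA marks years with no estimate.
rows <- c(
  "1916 NA NA 19.3 16.4 10.5 4.78 NA NA 2.9 5.9 5.7",
  "1917 41 31 17.7 14.3 8.4 3.37 9.9 13 3.4 5.9 5.0",
  "2014 50 38 21.2 17.0 10.3 4.89 12.1 17 4.3 6.7 5.4"
)

sample_text <- paste(c(paste(header, collapse = " "), rows), collapse = "\n")

# The default separator of read.table() accepts spaces or tabs,
# and the string "NA" is read as a missing value.
shares_sample <- read.table(text = sample_text, header = TRUE)
str(shares_sample)
```

With a real file, the `text` argument would be replaced by the file path.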

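From these columns, let us split the top 10% of incomes (P90_100) into the top 1% (P99_100), the next 4% (P95_99), and the next 5% (P90_95), and look at how each group's share has changed. First, a plain plot of the top 1% income share alone: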

plot(P99_100 ~ Year, data = US.top.income.shares.14)
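As a quick check on the three-way split, the three component shares should add up to the top 10% share. The table prints P90_100 and P95_99 as whole numbers, so the sum can only be expected to agree to within about one percentage point. A minimal sketch using three rows copied from the table; the data frame name `check_rows` is hypothetical.

```r
# Rows for 1917, 1928 and 2014, copied from the table above.
check_rows <- data.frame(
  Year    = c(1917, 1928, 2014),
  P90_100 = c(41, 49, 50),
  P99_100 = c(17.7, 23.9, 21.2),
  P95_99  = c(13, 15, 17),
  P90_95  = c(9.9, 10.7, 12.1)
)

# Sum of the three components, and its gap from the published top 10% share.
check_rows$Sum <- with(check_rows, P99_100 + P95_99 + P90_95)
check_rows$Gap <- check_rows$P90_100 - check_rows$Sum
print(check_rows)
```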

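To convey at least a minimum of information, set the y-axis range to ylim = c(5, 25), chosen with the maximum and minimum of each series in mind, and leave the x-axis ticks blank for now (xaxt = "n"). Connect the points with lines (type = "b") and change the point symbol to a filled triangle (pch = 17). The axis labels "연도" and "소득점유(%)" mean "Year" and "Income share (%)".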

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5, 25), xaxt = "n", type = "b", pch = 17)
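The text does not say exactly how c(5, 25) was derived from the extremes. One plausible route, sketched here as an assumption, is to take the overall range of the three plotted series and widen it outward to multiples of 5. The rows below are copied from the table and include the smallest (8.9) and largest (23.9) plotted values; `series_sample`, `value_range`, and `y_limits` are hypothetical names.

```r
# Four rows copied from the table (1928, 1944, 1975, 2007).
series_sample <- data.frame(
  Year    = c(1928, 1944, 1975, 2007),
  P99_100 = c(23.9, 11.3, 8.9, 23.5),
  P95_99  = c(15, 11, 13, 15),
  P90_95  = c(10.7, 9.8, 11.4, 11.1)
)

# Overall minimum and maximum across the three series, ignoring NAs.
value_range <- range(unlist(series_sample[c("P99_100", "P95_99", "P90_95")]),
                     na.rm = TRUE)

# Widen outward to the nearest multiples of 5.
y_limits <- c(floor(value_range[1] / 5) * 5, ceiling(value_range[2] / 5) * 5)
y_limits
```

Applied to the full data frame, the same two lines would work unchanged, because `na.rm = TRUE` skips the missing early years.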

Now label the x-axis with years at 10-year intervals, and use lines() to add the income shares of the next 4% and the next 5% to the same plot.

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5, 25), xaxt = "n", type = "b", pch = 17)
axis(side = 1, at = seq(1910, 2010, by = 10), labels = seq(1910, 2010, by = 10))
lines(P95_99 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "red")
lines(P90_95 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "blue")

To make the chart easier to read, add grid lines across both the x-axis and the y-axis.

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5, 25), xaxt = "n", type = "b", pch = 17)
axis(side = 1, at = seq(1910, 2010, by = 10), labels = seq(1910, 2010, by = 10))
lines(P95_99 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "red")
lines(P90_95 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "blue")
abline(h = seq(5, 25, by = 5), lty = 2)
abline(v = seq(1910, 2010, by = 10), lty = 2)
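Base graphics are drawn incrementally, which is why every step repeats the whole plotting sequence. One way to cut the repetition is to wrap the common calls in a function. This is only a sketch: `draw_shares()` is a hypothetical helper, not part of the original code, the axis labels are given in English, and four rows from the table stand in for the full data so the block runs on its own.

```r
# Hypothetical helper: draws the three series, the decade axis and the grid.
draw_shares <- function(shares) {
  decades <- seq(1910, 2010, by = 10)
  plot(P99_100 ~ Year, data = shares, xlab = "Year", ylab = "Income share (%)",
       ylim = c(5, 25), xaxt = "n", type = "b", pch = 17)
  axis(side = 1, at = decades, labels = decades)
  lines(P95_99 ~ Year, data = shares, type = "b", pch = 17, col = "red")
  lines(P90_95 ~ Year, data = shares, type = "b", pch = 17, col = "blue")
  abline(h = seq(5, 25, by = 5), lty = 2)
  abline(v = decades, lty = 2)
  invisible(decades)  # return the tick positions without printing them
}

# Four rows copied from the table (1917, 1928, 1975, 2014).
shares_sample <- data.frame(
  Year    = c(1917, 1928, 1975, 2014),
  P99_100 = c(17.7, 23.9, 8.9, 21.2),
  P95_99  = c(13, 15, 13, 17),
  P90_95  = c(9.9, 10.7, 11.4, 12.1)
)

pdf(NULL)  # null device: nothing is written to disk; omit for on-screen use
drawn_at <- draw_shares(shares_sample)
dev.off()
```

Later steps such as legend(), title(), and text() can then be added after a single call to the helper.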

Build the legend from the information in the Table 0 sheet of TabFig2014prel.xls. In the legend text, "이상(2014년 기준)" means "or more (as of 2014)".

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5,25), xaxt = "n", type = "b", pch = 17)
axis(side = 1, at = seq(1910, 2010, by = 10), labels = seq(1910, 2010, by = 10))
lines(P95_99 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "red")
lines(P90_95 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "blue")
abline(h = seq(5, 25, by = 5), lty = 2)
abline(v = seq(1910, 2010, by = 10), lty = 2)
legend.text <- c("99-100%:$423,090 이상(2014년 기준)", "95-99%:$174,240-$423,090", "90-95%:$121,360-$174,240")
legend(x = 1945, y = 25, legend = legend.text, pch = 17, col = c("black", "red", "blue"))

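Add a main title and mark the years in which the top 1% income share peaked. The title string reads "Decomposition of the US top 10% income share".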

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5,25), xaxt = "n", type = "b", pch = 17)
axis(side = 1, at = seq(1910, 2010, by = 10), labels = seq(1910, 2010, by = 10))
lines(P95_99 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "red")
lines(P90_95 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "blue")
abline(h = seq(5, 25, by = 5), lty = 2)
abline(v = seq(1910, 2010, by = 10), lty = 2)
legend(x = 1945, y = 25, legend = legend.text, pch = 17, col = c("black", "red", "blue"))
main.title <- "미국 소득 상위 10%의 점유율 분할"
title(main = main.title)
text(x = c(1928, 2007), y = c(24, 23.5), labels = c("1928", "2007"), pos = 3)
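The two peak years can also be read off the data rather than hard-coded. A minimal sketch, assuming the series is split at 1950 to separate the early and the recent peak (that cut-off is an assumption for illustration); the rows are copied from the table, and `top1` and `peak_year()` are hypothetical names.

```r
# Top 1% shares around the two peaks, copied from the table.
top1 <- data.frame(
  Year    = c(1927, 1928, 1929, 2006, 2007, 2008),
  P99_100 = c(21.0, 23.9, 22.4, 22.8, 23.5, 20.9)
)

# Year in which the top 1% share is largest within a subset of rows.
peak_year <- function(d) d$Year[which.max(d$P99_100)]

early_peak  <- peak_year(top1[top1$Year < 1950, ])
recent_peak <- peak_year(top1[top1$Year >= 1950, ])
c(early_peak, recent_peak)
```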

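Add text showing what each period is commonly called. The labels "대공황", "대번영", and "대침체" mean "Great Depression", "Great Prosperity", and "Great Recession".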

plot(P99_100 ~ Year, data = US.top.income.shares.14, xlab = "연도", ylab = "소득점유(%)", ylim = c(5,25), xaxt = "n", type = "b", pch = 17)
axis(side = 1, at = seq(1910, 2010, by = 10), labels = seq(1910, 2010, by = 10))
lines(P95_99 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "red")
lines(P90_95 ~ Year, data = US.top.income.shares.14, type = "b", pch = 17, col = "blue")
abline(h = seq(5, 25, by = 5), lty = 2)
abline(v = seq(1910, 2010, by = 10), lty = 2)
legend(x = 1945, y = 25, legend = legend.text, pch = 17, col = c("black", "red", "blue"))
title(main = main.title)
text(x = c(1928, 2007), y = c(24, 23.5), labels = c("1928", "2007"), pos = 3)
times.label <- c("대공황", "대번영", "대침체")
text(x = c(1935, 1960, 2012), y = c(22, 8, 17.5), labels = times.label, cex = 2.0, col = "red")

ggplot

Data Reshaping

Use the reshape2 package to convert the data from wide format to long format.

library(reshape2)
data.1_10 <- US.top.income.shares.14[c("Year", "P99_100", "P95_99", "P90_95")]
data.1_10.melt <- melt(data.1_10, id.vars = "Year", measure.vars = c("P99_100", "P95_99", "P90_95"), variable.name = "Percentiles", value.name = "Share")
str(data.1_10.melt)

## 'data.frame':    306 obs. of  3 variables:
##  $ Year       : num  1913 1914 1915 1916 1917 ...
##  $ Percentiles: Factor w/ 3 levels "P99_100","P95_99",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Share      : num  18 18.2 17.6 19.3 17.7 ...
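To make the reshaping concrete, the same wide-to-long layout can be built by hand in base R: the Year column is repeated once per measure, and the measure columns are stacked one after another. This is an illustrative sketch on three rows copied from the table, not a replacement for melt(); `wide`, `measures`, and `long` are hypothetical names.

```r
# Three rows copied from the table (1913, 1917, 2014).
wide <- data.frame(
  Year    = c(1913, 1917, 2014),
  P99_100 = c(18.0, 17.7, 21.2),
  P95_99  = c(NA, 13, 17),
  P90_95  = c(NA, 9.9, 12.1)
)

measures <- c("P99_100", "P95_99", "P90_95")

# One row per (Year, measure) pair: 3 years x 3 measures = 9 rows.
long <- data.frame(
  Year        = rep(wide$Year, times = length(measures)),
  Percentiles = factor(rep(measures, each = nrow(wide)), levels = measures),
  Share       = unlist(wide[measures], use.names = FALSE)
)
str(long)
```

The full data set has 102 years, which is why the melted data frame above has 306 rows.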

Drawing the skeleton

library(ggplot2)
(g0 <- ggplot(data.1_10.melt, aes(x = Year, y = Share, colour = Percentiles)) + 
  geom_line(na.rm = TRUE) + 
  geom_point(shape = 24, aes(fill = Percentiles), size = 2, na.rm = TRUE) + 
  ylim(5, 25))

Applying theme_bw()

(g1 <- g0 + 
  theme_bw())

#(g1 <- g0 + theme_classic())
#(g1 <- g0 + theme_minimal())
#(g1 <- g0 + theme_grey())

Setting the grid

(g2 <- g1 + 
   theme(panel.grid.major = element_line(linetype = "dashed", colour = "black")))

Setting the x-axis tick positions

(g3 <- g2 + 
  scale_x_continuous(breaks = seq(1910, 2010, by = 10)))

Adding Korean text

Sourcing the Korean theme

source("./theme_kr.R")
ls()

##  [1] "data.1_10"               "data.1_10.melt"         
##  [3] "g0"                      "g1"                     
##  [5] "g2"                      "g3"                     
##  [7] "legend.text"             "main.title"             
##  [9] "theme.kr"                "times.label"            
## [11] "US.top.income.shares.14" "v.names"
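The contents of theme_kr.R are not shown here; the listing only confirms that sourcing it creates an object named theme.kr. As a purely hypothetical stand-in, a theme that sets a Korean-capable font family for all text might look like the sketch below. The font name is borrowed from the annotate() call used later in this document, and the object name `theme_kr_sketch` is invented for illustration so that it does not overwrite the real theme.kr.

```r
library(ggplot2)

# Hypothetical stand-in for theme_kr.R: use a Korean font for every text element.
# Assumes the font "HCR Dotum LVT" is installed and visible to the graphics device.
theme_kr_sketch <- theme(text = element_text(family = "HCR Dotum LVT"))
```

Such an object can be added to a plot with `+`, in the same way theme.kr is used in the next step.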

Applying the Korean theme and revising the x-axis and y-axis labels

(g4 <- g3 + 
   theme.kr + 
   xlab("연도") + 
   ylab("소득점유(%)"))

Adding the main title

(g5 <- g4 + 
   ggtitle(main.title) + 
   theme(plot.title = element_text(size = 20)))

Revising the legend title ("소득 분위" means "income percentile")

(g6 <- g5 + 
   labs(colour = "소득 분위", fill = "소득 분위") )

Revising the legend labels and colours, and removing the legend title

(g7 <- g6 + 
   scale_colour_manual(name = "", values = c("black", "red", "blue"), labels = legend.text) +
   scale_fill_manual(name = "", values = c("black", "red", "blue"), labels = legend.text))

Moving the legend inside the plot panel

(g8 <- g7 + 
   theme(legend.position =  c(0.5, 0.85)))
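From ggplot2 3.5.0 onward, passing a numeric vector to legend.position is deprecated in favour of legend.position = "inside" combined with legend.position.inside. A version-aware sketch of the same placement follows; the names `inside_pos` and `legend_inside` are hypothetical.

```r
library(ggplot2)

inside_pos <- c(0.5, 0.85)  # relative x and y position within the panel

# Pick the syntax that matches the installed ggplot2 version.
legend_inside <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = inside_pos)
} else {
  theme(legend.position = inside_pos)
}
```

Adding `legend_inside` to g7 in place of the theme() call above should give the same placement without a deprecation warning on newer versions.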

Removing the colour legend (this removes the lines that the colour aesthetic draws through the legend keys)

#(g9 <- g8 + guides(colour = guide_legend(title=NULL), fill = guide_legend(title=NULL)))
(g9 <- g8 + 
   guides(colour ="none"))

Drawing a box around the legend and removing the empty title slot

(g10 <- g9 + 
   theme(legend.title = element_blank(), legend.background = element_rect(fill = "white", colour = "black")))

Removing the boxes around the legend keys

(g11 <- g10 + 
   theme(legend.key = element_blank()))

Marking the historical peak years

(g12 <- g11 + 
   annotate("text", x = c(1928, 2007), y = c(24.5, 24), label = c(1928, 2007)))

Adding text that characterises each era

(g13 <- g12 + 
  annotate("text", x = c(1935, 1960, 2014), y = c(22, 8, 18), label = times.label, colour = "red", family = "HCR Dotum LVT", size = 8))

Cleanup

save.image(file="US_top_income_shares_2014_add.rda")
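save.image() writes every object in the global workspace to the named file, and load() restores them by name in a later session. The round trip can be sketched with save() on a single object and a temporary file, so that the example does not touch the reader's workspace; `demo.title`, `image_file`, and `restored` are hypothetical names.

```r
# A stand-in object to save; the real workspace holds the data and plots.
demo.title <- "demo title"

# Save to a temporary file rather than the working directory.
image_file <- tempfile(fileext = ".rda")
save(demo.title, file = image_file)

# Load into a fresh environment; load() returns the names it restored.
restored <- new.env()
loaded_names <- load(image_file, envir = restored)
loaded_names
```

Calling load() on the .rda file without the `envir` argument restores the saved objects into the current workspace instead.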