[14-02-12]

4.2.2 GEOM

4.2.3 STAT

Let's use the following code to see how the data generated by stat_bin can be used.

library(ggplot2)
ggplot(data = diamonds, aes(x = price)) + geom_bar()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(y = ..count..))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(y = ..density..))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(y = ..ncount..))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(y = ..ndensity..))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

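# "밀도" is Korean for "density"; the non-ASCII axis label is what triggers the conversion warnings below
ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(y = ..density..)) + ylab("밀도")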
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <eb>
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <b0>
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <80>
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <eb>
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <8f>
## Warning: conversion failure on '諛€攼㹢룄' in 'mbcsToSbcs': dot substituted for <84>

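To use a field of the data.frame generated by a stat, wrap its name in '..', as in ..count.. above. This keeps it from being confused with the field names of the original data (data.frame) you supplied.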
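To see the generated data.frame itself, something like the following sketch may help. It assumes a recent ggplot2 in which layer_data() is available; the binwidth of 1000 and the variable names are arbitrary choices for illustration, not something from the plots above. (Recent ggplot2 versions also accept after_stat(count) as a newer spelling of ..count...)

```r
library(ggplot2)

# Build the plot object without drawing it; binwidth = 1000 is an arbitrary choice
p <- ggplot(diamonds, aes(x = price)) + stat_bin(geom = "bar", binwidth = 1000)

# Data frame the stat produced for the first (and only) layer
binned <- layer_data(p, 1)
head(binned[, c("x", "count", "density", "ncount", "ndensity")])

# count:    number of observations in the bin
# density:  count / (total count * bin width), so the bar areas sum to 1
# ncount:   count rescaled so that the tallest bin is 1
# ndensity: density rescaled so that the tallest bin is 1
bin_width <- binned$xmax - binned$xmin
all.equal(binned$density, binned$count / sum(binned$count) / bin_width)
all.equal(binned$ncount, binned$count / max(binned$count))
all.equal(binned$ndensity, binned$density / max(binned$density))
```

Each column printed by head() is one of the fields that can be referred to with the '..' notation inside aes().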

For the available geom and stat objects, see http://docs.ggplot2.org/current/.

4.2.4 Position adjustment

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(fill = cut), binwidth = 3000)

This is a basic bar chart. Let's add a position adjustment to it.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(fill = cut), binwidth = 3000, 
    position = "identity")

With identity, no adjustment is applied: the bars for each cut are drawn at the same (identity) position, so they overlap.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(fill = cut), binwidth = 3000, 
    position = "dodge")

With dodge, the bars step aside (dodge) from one another and are drawn packed side by side instead of overlapping.

ggplot(data = diamonds, aes(x = price)) + geom_bar(aes(fill = cut), binwidth = 3000, 
    position = "fill")

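With fill, the total count in each price bin is treated as 1, so you can see the proportion taken by each cut. I think the name means the bars fill the whole plot… (or maybe not)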
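The proportions that position = "fill" displays can be approximated by hand. This is only a sketch in base R: the bin edges (0, 3000, 6000, ...) are my assumption and need not match the ones ggplot2 picks for binwidth = 3000, and the names price_bin, counts and share are made up for the example.

```r
library(ggplot2)  # only needed for the diamonds data

# Assumed bin edges; ggplot2 may place its 3000-wide bins differently
price_bin <- cut(diamonds$price, breaks = seq(0, 21000, by = 3000), dig.lab = 5)

# Counts per price bin (rows) and cut (columns)
counts <- table(price_bin, diamonds$cut)

# Divide each row by its total so that every price bin sums to 1
share <- prop.table(counts, margin = 1)
round(share, 2)
```

Each row of share corresponds to one full-height bar in the fill plot, and the values in the row are the heights of its coloured segments.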

The same data can also be shown with a technique called faceting.
Faceting means splitting the data into subsets by some criterion and drawing each subset in its own plot panel.

ggplot(data = diamonds, aes(x = price)) + geom_bar(binwidth = 3000) + facet_grid(. ~ 
    cut)

The result looks like the plot adjusted with the dodge argument, but with the subset for each cut pulled apart into its own panel.
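The formula given to facet_grid reads rows ~ columns, so . ~ cut lays the panels out side by side. A small sketch of the transposed layout follows. It assumes a recent ggplot2, which is why it uses stat_bin(geom = "bar") for the binning (there, geom_bar no longer bins a continuous x by default) and layer_data() to count the panels; p_rows and panels are names invented for the example.

```r
library(ggplot2)

# cut ~ . stacks one panel per cut in rows; . ~ cut would put them in columns
p_rows <- ggplot(diamonds, aes(x = price)) +
  stat_bin(geom = "bar", binwidth = 3000) +
  facet_grid(cut ~ .)

# Every row of the computed data is tagged with the panel it belongs to
panels <- unique(layer_data(p_rows, 1)$PANEL)
length(panels)  # one panel per level of cut

p_rows  # draw the plot
```

Stacking the panels in rows keeps the price axis aligned, which can make the subsets easier to compare than the side-by-side layout.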

ggplot(data = diamonds, aes(x = price)) + geom_bar(binwidth = 3000) + facet_wrap(~cut, 
    nrow = 3)

As shown above, there is also a function with a similar role, facet_wrap, which wraps the panels into a grid (here with nrow = 3).

4.2.5 Combining GEOM and STAT

d <- ggplot(diamonds, aes(price))
d + stat_bin(geom = "bar")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

d + stat_bin(geom = "area")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

d + stat_bin(aes(size = ..ndensity..), geom = "point")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## ymax not defined: adjusting position using y instead

d + stat_bin(aes(y = 1, fill = ..density..), geom = "tile")
## Mapping a variable to y and also using stat="bin".
##   With stat="bin", it will attempt to set the y value to the count of cases in each group.
##   This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
##   If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
##   If you want y to represent values in the data, use stat="identity".
##   See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: Stacking not well defined when ymin != 0

(Why is y set to 1 here, with ..density.. mapped to fill? I wonder.)

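Take a moment to think carefully about how plots like the ones above get drawn. If you think about what data and what processing each one needs, you can draw a much wider variety of plots.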
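One way to approach that question is to check that the stat does the computing and the geom only does the drawing. The sketch below assumes a recent ggplot2 with layer_data(); the binwidth of 1000 and the variable names are arbitrary, and the tile example is left out because I am not sure it still runs unchanged in current versions.

```r
library(ggplot2)

d <- ggplot(diamonds, aes(x = price))

# Same stat (bin), three different geoms
as_bar   <- d + stat_bin(geom = "bar",   binwidth = 1000)
as_area  <- d + stat_bin(geom = "area",  binwidth = 1000)
as_point <- d + stat_bin(geom = "point", binwidth = 1000)

# Counts the stat computed for each plot
bar_counts   <- layer_data(as_bar, 1)$count
area_counts  <- layer_data(as_area, 1)$count
point_counts <- layer_data(as_point, 1)$count

# The geom changes how the bins are drawn, not what was counted
all.equal(bar_counts, area_counts)
all.equal(bar_counts, point_counts)
```

If the counts agree, the three plots differ only in how the same binned data.frame is rendered.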