Day 4 Bar plot

data: pg_mean from gcookbook

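library(gcookbook)  # provides pg_mean and cabbage_exp
library(ggplot2)    # provides ggplot() and the diamonds data
head(pg_mean)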
##   group weight
## 1  ctrl  5.032
## 2  trt1  4.661
## 3  trt2  5.526
str(pg_mean)
## 'data.frame':    3 obs. of  2 variables:
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 2 3
##  $ weight: num  5.03 4.66 5.53
summary(pg_mean)
##   group       weight     
##  ctrl:1   Min.   :4.661  
##  trt1:1   1st Qu.:4.846  
##  trt2:1   Median :5.032  
##           Mean   :5.073  
##           3rd Qu.:5.279  
##           Max.   :5.526

bar plot with ggplot2

ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity")

group is a factor and weight is numeric. Here we want the bar height to be the weight value of each group, so we set stat = "identity". In current ggplot2 the default for geom_bar() is stat = "count" (older versions used stat = "bin"), which plots the number of rows in each group instead.
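As a small sketch of the difference, the plots below rebuild the pg_mean values by hand (the data frame pg and the plot names are only illustrative). It assumes ggplot2 2.2.0 or later, where geom_col() is available as shorthand for geom_bar(stat = "identity"); layer_data() is used only to read back the computed bar heights.

```r
library(ggplot2)

# Hand-typed copy of the pg_mean values printed above (illustrative name)
pg <- data.frame(
  group  = factor(c("ctrl", "trt1", "trt2")),
  weight = c(5.032, 4.661, 5.526)
)

# Bar height taken directly from the weight column
p_identity <- ggplot(pg, aes(x = group, y = weight)) +
  geom_bar(stat = "identity")

# Same plot written with geom_col()
p_col <- ggplot(pg, aes(x = group, y = weight)) +
  geom_col()

# No y mapping: geom_bar() counts the rows in each group
p_count <- ggplot(pg, aes(x = group)) +
  geom_bar()

layer_data(p_identity)$y   # bar heights equal to the weight column
layer_data(p_count)$count  # one row per group, so every bar has height 1
```

Since pg has exactly one row per group, the count version is not useful here, which is why the identity version is the one we want.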

color bar and bar outline

ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity", fill = "lightblue", colour = "black")

By default, bars are filled dark grey and have no outline color.

data: built-in BOD

head(BOD)
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## 6    7   19.8
str(BOD)
## 'data.frame':    6 obs. of  2 variables:
##  $ Time  : num  1 2 3 4 5 7
##  $ demand: num  8.3 10.3 19 16 15.6 19.8
##  - attr(*, "reference")= chr "A1.4, p. 270"
summary(BOD)
##       Time           demand     
##  Min.   :1.000   Min.   : 8.30  
##  1st Qu.:2.250   1st Qu.:11.62  
##  Median :3.500   Median :15.80  
##  Mean   :3.667   Mean   :14.83  
##  3rd Qu.:4.750   3rd Qu.:18.25  
##  Max.   :7.000   Max.   :19.80

display BOD in bar plot with ggplot2

ggplot(BOD, aes(x=Time, y=demand)) + geom_bar(stat = "identity")

Time and demand are both numeric in BOD. When x is numeric, geom_bar() uses a continuous x-axis that runs from the minimum to the maximum x value, so we see a gap at x = 6, where BOD has no row.

ggplot(BOD, aes(x = as.factor(Time), y=demand)) + geom_bar(stat="identity")

This time we convert Time to a factor. The gap at x = 6 disappears because the x-axis is now discrete and only shows the levels that are present in the data.
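A quick way to see why the gap disappears is to inspect the values directly. This sketch uses only base R and the built-in BOD data; the name time_f is illustrative.

```r
# Time is numeric and has no row for 6
6 %in% BOD$Time    # FALSE
range(BOD$Time)    # 1 7

# After conversion, each distinct value becomes one level,
# so the discrete axis has no slot for 6
time_f <- as.factor(BOD$Time)
levels(time_f)     # "1" "2" "3" "4" "5" "7"
nlevels(time_f)    # 6
```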

plot grouped bars

data: cabbage_exp from gcookbook

head(cabbage_exp)
##   Cultivar Date Weight        sd  n         se
## 1      c39  d16   3.18 0.9566144 10 0.30250803
## 2      c39  d20   2.80 0.2788867 10 0.08819171
## 3      c39  d21   2.74 0.9834181 10 0.31098410
## 4      c52  d16   2.26 0.4452215 10 0.14079141
## 5      c52  d20   3.11 0.7908505 10 0.25008887
## 6      c52  d21   1.47 0.2110819 10 0.06674995
str(cabbage_exp)
## 'data.frame':    6 obs. of  6 variables:
##  $ Cultivar: Factor w/ 2 levels "c39","c52": 1 1 1 2 2 2
##  $ Date    : Factor w/ 3 levels "d16","d20","d21": 1 2 3 1 2 3
##  $ Weight  : num  3.18 2.8 2.74 2.26 3.11 1.47
##  $ sd      : num  0.957 0.279 0.983 0.445 0.791 ...
##  $ n       : int  10 10 10 10 10 10
##  $ se      : num  0.3025 0.0882 0.311 0.1408 0.2501 ...

display barplot

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) + geom_bar(stat = "identity", position = "dodge")

We plot bars with x = Date and y = Weight, and split each Date by Cultivar into two groups, c39 and c52. Mapping fill to Cultivar gives each group its own color, and position = "dodge" places the two bars side by side at each Date.
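To see what position = "dodge" contributes, this sketch compares it with the default position, which stacks bars that share an x value. The data frame cab is a hand-typed copy of the Cultivar, Date and Weight columns shown above, and the plot names are illustrative.

```r
library(ggplot2)

# Hand-typed copy of the columns of cabbage_exp used in the plot
cab <- data.frame(
  Cultivar = factor(c("c39", "c39", "c39", "c52", "c52", "c52")),
  Date     = factor(c("d16", "d20", "d21", "d16", "d20", "d21")),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# Side-by-side bars: each bar starts at 0
p_dodge <- ggplot(cab, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = "dodge")

# Default position: the two cultivars are stacked on top of each other
p_stack <- ggplot(cab, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity")

max(layer_data(p_dodge)$ymax)  # height of the tallest single bar
max(layer_data(p_stack)$ymax)  # height of the tallest stack (both cultivars summed)
```

Dodged bars are easier to compare between cultivars, while stacked bars show the total per Date.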

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) + geom_bar(stat = "identity", position = "dodge", colour = "black")

We can add black outlines by setting colour = "black" in geom_bar().

ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) + 
        geom_bar(stat = "identity", position = "dodge", colour = "black") +
        scale_fill_brewer(palette = "Pastel1")

We can also change the fill colors by adding scale_fill_brewer(palette = ...); here we use the palette "Pastel1".

when some data is missing

data: cabbage_exp with the last row removed

ce <- cabbage_exp[1:5, ]
ce
##   Cultivar Date Weight        sd  n         se
## 1      c39  d16   3.18 0.9566144 10 0.30250803
## 2      c39  d20   2.80 0.2788867 10 0.08819171
## 3      c39  d21   2.74 0.9834181 10 0.31098410
## 4      c52  d16   2.26 0.4452215 10 0.14079141
## 5      c52  d20   3.11 0.7908505 10 0.25008887

We can see there is no row for cultivar c52 on date d21.

barplot

ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
        geom_bar(stat = "identity", position = "dodge", colour = "black") +
        scale_fill_brewer(palette = "Pastel1")

At x = d21, the c39 bar takes up the whole width, because there is no c52 bar to share the space with.
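One way to keep the c39 bar at its usual width is to add a row with an NA weight for the missing combination, so that every Date still has two groups. The sketch below does this with base R on a hand-typed copy of ce (only the columns used in the plot; all_combos, ce_full and the plot names are illustrative). The second plot assumes ggplot2 3.0.0 or later, where position_dodge() accepts preserve = "single" for the same purpose.

```r
library(ggplot2)

# Hand-typed copy of ce, restricted to the columns used in the plot
ce <- data.frame(
  Cultivar = factor(c("c39", "c39", "c39", "c52", "c52")),
  Date     = factor(c("d16", "d20", "d21", "d16", "d20")),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11)
)

# Every Cultivar x Date combination, then a left join so that
# the missing combination gets Weight = NA
all_combos <- expand.grid(Cultivar = levels(ce$Cultivar),
                          Date     = levels(ce$Date))
ce_full <- merge(all_combos, ce, all.x = TRUE)
ce_full[is.na(ce_full$Weight), ]   # the combination that was missing

# Option 1: plot the completed data (ggplot2 warns about the NA row)
p_full <- ggplot(ce_full, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black") +
  scale_fill_brewer(palette = "Pastel1")

# Option 2 (assumes ggplot2 >= 3.0.0): keep single-bar width without changing the data
p_single <- ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity",
           position = position_dodge(preserve = "single"),
           colour = "black") +
  scale_fill_brewer(palette = "Pastel1")
```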

bar plot displaying counts of cases

data: diamonds, built into ggplot2

head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

bar plot of the count of each cut

ggplot(diamonds, aes(x=cut)) + geom_bar()

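Because cut is a factor, geom_bar() draws one bar per level, with height equal to the number of rows in that level. If x is a continuous variable instead, current ggplot2 draws a bar at each unique x value, which looks similar to a histogram but is not binned; geom_histogram() is the function that bins the values.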

ggplot(diamonds, aes(x=carat)) + geom_bar()
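
As a rough check of what the two plots above compute, the sketch below compares the bar heights with table() and contrasts geom_bar() on carat with geom_histogram(). The binwidth of 0.1 is an arbitrary choice for illustration, the plot names are illustrative, and layer_data() is only used to read back the computed counts.

```r
library(ggplot2)

# Discrete x: one bar per level, height = number of rows in that level
p_cut <- ggplot(diamonds, aes(x = cut)) + geom_bar()
layer_data(p_cut)$count
table(diamonds$cut)        # same counts, computed directly

# Continuous x with geom_bar(): one bar per unique carat value
p_bar <- ggplot(diamonds, aes(x = carat)) + geom_bar()
nrow(layer_data(p_bar))
length(unique(diamonds$carat))

# Continuous x with geom_histogram(): values are grouped into bins first
p_hist <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 0.1)
nrow(layer_data(p_hist))   # far fewer bars than unique carat values
```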