This is a short article about a bar plot of means with error bars and bar labels.
I wanted to make a simple bar plot and add bar labels.
My problem was that the bar labels ended up in the wrong places.
library(ggplot2) # for plot
library(plyr) # for summarySE function
library(reshape2) # for converting from long to wide format
library(knitr) # for plot captions (knit_hooks)
## Summarizes data.
## http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
## data: a data frame.
## measurevar: the name of a column that contains the variable to be summarized
## groupvars: a vector containing names of columns that contain grouping variables
## na.rm: a boolean that indicates whether to ignore NA's
## conf.interval: the percent range of the confidence interval (default is 95%)
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
conf.interval=.95, .drop=TRUE) {
require(plyr)
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
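To make the arithmetic inside summarySE concrete, here is a minimal base-R sketch of the same per-group calculation for a single group. The vector `x` is a made-up toy sample (an assumption for illustration, not taken from the diamonds data); the formulas for `se` and `ci` mirror the ones in the function above.

```r
# Hypothetical toy sample; the values are made up for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n  <- length(x)                            # number of observations (N)
m  <- mean(x)                              # sample mean
s  <- sd(x)                                # sample standard deviation
se <- s / sqrt(n)                          # standard error of the mean
ci <- se * qt(0.95 / 2 + 0.5, df = n - 1)  # half-width of the 95% confidence interval

c(N = n, mean = m, sd = s, se = se, ci = ci)
```

summarySE does the same thing once per combination of the grouping variables and collects the results in one data frame.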
This chunk sets a few small options.
# 1. decimal separator
options(OutDec= ",") # I'm in Europe, where the decimal separator is a comma; this changes the decimal separator in the output.
options(width = 240) # output width for HTML
# 2. plot captions after the plot
# answer was found on:
# https://support.rstudio.com/hc/communities/public/questions/200635448-knitr-fig-cap-with-markdown
# code was copied from:
# https://github.com/yihui/knitr-examples/blob/master/063-html5-figure.Rmd
knit_hooks$set(plot = function(x, options) {
paste('<figure><img src="',
opts_knit$get('base.url'), paste(x, collapse = '.'),
'"><figcaption><center><b>', options$fig.cap, '</b></center></figcaption></figure>',
sep = '')
})
I will use the diamonds data set from the ggplot2 package.
I'm interested in the mean diamond price for every combination of cut and color.
First we need summary data. The simplest way is to use the summarySE function from the Cookbook for R.
(dt <- summarySE(data = diamonds, measurevar = "price", groupvars = c("cut", "color")))
## cut color N price sd se ci
## 1 Fair D 163 4291,061 3286,114 257,38833 508,26880
## 2 Fair E 224 3682,312 2976,652 198,88590 391,93629
## 3 Fair F 312 3827,003 3223,303 182,48358 359,05855
## 4 Fair G 314 4239,255 3609,644 203,70402 400,80232
## 5 Fair H 303 5135,683 3886,482 223,27255 439,36694
## 6 Fair I 175 4685,446 3730,271 281,98199 556,54544
## 7 Fair J 119 4975,655 4050,459 371,30496 735,28491
## 8 Good D 662 3405,382 3175,149 123,40566 242,31434
## 9 Good E 933 3423,644 3330,702 109,04229 213,99687
## 10 Good F 909 3495,750 3202,411 106,21727 208,45990
## 11 Good G 871 4123,482 3702,505 125,45459 246,22902
## 12 Good H 702 4276,255 4020,660 151,75005 297,93905
## 13 Good I 522 5078,533 4631,702 202,72410 398,25710
## 14 Good J 307 4574,173 3707,791 211,61480 416,40433
## 15 Very Good D 1513 3470,467 3523,753 90,59120 177,69774
## 16 Very Good E 2400 3214,652 3408,024 69,56599 136,41566
## 17 Very Good F 2164 3778,820 3786,124 81,38909 159,60900
## 18 Very Good G 2299 3872,754 3861,375 80,53275 157,92447
## 19 Very Good H 1824 4535,390 4185,798 98,00898 192,22169
## 20 Very Good I 1204 5255,880 4687,105 135,08011 265,01879
## 21 Very Good J 678 5103,513 4135,653 158,82879 311,85623
## 22 Premium D 1603 3631,293 3711,634 92,70398 181,83384
## 23 Premium E 2337 3538,914 3794,987 78,50204 153,94094
## 24 Premium F 2331 4324,890 4012,023 83,09832 162,95437
## 25 Premium G 2924 4500,742 4356,571 80,56680 157,97344
## 26 Premium H 2360 5216,707 4466,190 91,93506 180,28191
## 27 Premium I 1428 5946,181 5053,746 133,73630 262,34085
## 28 Premium J 808 6294,592 4788,937 168,47420 330,69935
## 29 Ideal D 2834 2629,095 3001,070 56,37365 110,53756
## 30 Ideal E 3903 2597,550 2956,007 47,31580 92,76604
## 31 Ideal F 3826 3374,939 3766,635 60,89492 119,38964
## 32 Ideal G 4884 3720,706 4006,262 57,32599 112,38473
## 33 Ideal H 3115 3889,335 4013,375 71,90858 140,99304
## 34 Ideal I 2093 4451,970 4505,150 98,47470 193,11860
## 35 Ideal J 896 4918,186 4476,207 149,53957 293,48908
# adding labels for the plots
dt$label <- format(round(dt$price, digits = 2),
nsmall = 2,
big.mark = ".",
decimal.mark = "," )
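The label formatting can be tried on its own. The sketch below uses a made-up vector `prices` (a hypothetical example, not the summary data) to show what `big.mark` and `decimal.mark` do. It also shows that `format()` pads its results to a common width, which only matters when the values differ in magnitude; `trimws()` removes the padding if it gets in the way.

```r
# Hypothetical prices, chosen only to show the formatting
prices <- c(4291.061, 983.5, 12500)

labels <- format(round(prices, digits = 2),
                 nsmall = 2,          # always show two decimals
                 big.mark = ".",      # thousands separator
                 decimal.mark = ",")  # decimal separator

labels          # strings padded to a common width
trimws(labels)  # the same strings without the padding
```

In the summary data all mean prices have four digits before the decimal mark, so the labels there are not padded.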
The summary above contains all the data, but it is hard to see which value belongs where.
Therefore, just for aesthetics, I will make a cross table of mean price by cut and color.
temp <- data.frame(cut = dt$cut,
color = dt$color,
price = format(round(dt$price, digits = 2),
nsmall = 2,
big.mark = ".",
decimal.mark = "," )) # making copy to separate variable
dcast(data = temp, formula = cut ~ color, value.var = "price")
## cut D E F G H I J
## 1 Fair 4.291,06 3.682,31 3.827,00 4.239,25 5.135,68 4.685,45 4.975,66
## 2 Good 3.405,38 3.423,64 3.495,75 4.123,48 4.276,25 5.078,53 4.574,17
## 3 Very Good 3.470,47 3.214,65 3.778,82 3.872,75 4.535,39 5.255,88 5.103,51
## 4 Premium 3.631,29 3.538,91 4.324,89 4.500,74 5.216,71 5.946,18 6.294,59
## 5 Ideal 2.629,09 2.597,55 3.374,94 3.720,71 3.889,33 4.451,97 4.918,19
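If reshape2 is not available, a similar cross table can be sketched in base R with `xtabs()`. The small data frame `long` below is a hypothetical stand-in that reuses four rounded values from the table above. Note that `xtabs()` sums the values in each cell, so this only reproduces the table when there is exactly one row per combination of cut and color, as is the case here.

```r
# Hypothetical long-format summary with one row per cut/color combination
long <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good"),
  color = c("D", "E", "D", "E"),
  price = c(4291.06, 3682.31, 3405.38, 3423.64)
)

# Rows become cut, columns become color, cells hold the price
wide <- xtabs(price ~ cut + color, data = long)
wide
```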
At least for me, it is hard to see a trend in such a table, so it is much better to make a bar plot.
First I will make the bar plot in a way that seemed logical to me, but is wrong.
ggplot(data = dt, aes(x = cut, y = price, fill = color)) +
geom_bar(stat="identity", position = position_dodge()) + # adding bar plot
geom_errorbar(aes(ymin=price-se, ymax=price+se),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
geom_text(aes(y = rep(2000, times = 35), # adding text to the plot
label = label,
angle = 90,
vjust = rep(c(-6, -4, -2, 0, 2, 4, 6), times = 5))
) +
coord_cartesian(ylim=c(1500, 6400)) + # changing of y axis
theme(axis.text = element_text(colour = "black", size = 15), # some theme options
axis.title.y = element_text(size = 15)) +
xlab("") + ylab("Mean price") # the y axis shows the mean price, not the cut
As we can see, the problem with this plot is the bar labels. I spent a lot of time trying to put the labels in the right position with vjust, but did not find the right technique. After some research on Google and Stack Overflow I found another way: let position_dodge() place the labels, just as it places the bars.
ggplot(data = dt, aes(x = cut, y = price, fill = color)) +
geom_bar(stat="identity", position = position_dodge()) + # adding bar plot
geom_errorbar(aes(ymin=price-se, ymax=price+se),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
geom_text(aes(y = 2000, # right way: labels at a fixed height, dodged like the bars
label = label),
angle = 90, # constant settings belong outside aes()
size = 5,
position = position_dodge(width = .9) # same dodge width as the bars and error bars
) +
coord_cartesian(ylim=c(1500, 6400)) +
theme(axis.text = element_text(colour = "black", size = 15),
axis.title.y = element_text(size = 15)) +
xlab("") + ylab("Mean price") # the y axis shows the mean price, not the cut
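The idea behind the working version is that the bars, the error bars and the labels are all shifted by the same dodge width. Here is a minimal sketch of that technique on a made-up data frame: `toy` and its columns `grp`, `shade`, `avg` and `se` are hypothetical names, and the numbers are invented. It assumes a ggplot2 version that provides `geom_col()`, which behaves like `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical toy summary: 2 groups x 3 fill levels, made-up means and errors
toy <- data.frame(
  grp   = rep(c("A", "B"), each = 3),
  shade = rep(c("x", "y", "z"), times = 2),
  avg   = c(3, 5, 4, 6, 2, 5),
  se    = c(0.3, 0.4, 0.2, 0.5, 0.3, 0.4)
)

dodge <- position_dodge(width = 0.9)  # one dodge setting shared by all layers

p <- ggplot(toy, aes(x = grp, y = avg, fill = shade)) +
  geom_col(position = dodge) +                       # the bars
  geom_errorbar(aes(ymin = avg - se, ymax = avg + se),
                width = 0.2, position = dodge) +     # the error bars
  geom_text(aes(y = 0.5, label = avg),               # labels at a fixed height
            angle = 90, position = dodge)
p
```

Because `fill` is mapped in the top-level `aes()`, the text layer inherits the same grouping as the bars, so every label lands in the middle of its own bar.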