Introduction to graphs
R is a great platform for building graphs. In a typical interactive session, you build a graph one statement at a time, adding features until you have what you want.
The chapter of the book on the base graphics system explains how to modify and customize graphs. Two other widely used systems that provide extensive options are lattice and ggplot2. We will mostly use the base graphics system and ggplot2.
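As a minimal sketch of this statement-by-statement workflow, the example below uses the built-in mtcars data set (an assumption made for illustration; it is not the data analyzed in this chapter). It starts with a scatter plot and then adds a regression line and a title, one statement at a time.

```r
# Sketch only: mtcars ships with R and stands in for real data here.

# Statement 1: start the graph with a scatter plot
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon")

# Statement 2: fit a linear model and add its line to the existing plot
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)

# Statement 3: add a title
title(main = "Fuel economy versus weight")
```

Each call after plot() draws on top of the current graph, which is why the order of the statements matters.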
The primary graph for a variable: the histogram
Histograms display the distribution of a continuous variable by dividing the range of scores into a specified number of bins on the x-axis and displaying the frequency of scores in each bin on the y-axis.
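The binning described above can be inspected without drawing anything. The sketch below uses simulated heights (hypothetical values, not the Melocactus data) and asks hist() to return the bins instead of plotting them.

```r
set.seed(1)
# Hypothetical plant heights in cm, simulated for illustration only
heights <- rnorm(100, mean = 20, sd = 5)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(heights, breaks = 12, plot = FALSE)

h$breaks   # bin boundaries on the x-axis
h$counts   # number of observations in each bin (the bar heights)

# Every observation falls in exactly one bin
sum(h$counts) == length(heights)

# A single number passed to breaks is only a suggestion: hist() rounds
# the boundaries to "pretty" values, so the bin count may differ from 12
length(h$counts)
```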
Basic histograms
The following chunk produces four histograms using the R base graphics system.
#reading data on Melocactus intortus
melodata <- read.csv("melocactus.csv", header = TRUE)
# arrange the four graphs in a 2x2 grid
par(mfrow=c(2,2))
# the most basic form
hist(melodata$alturatotal)
# controlling some aspects of the histogram
hist(melodata$alturatotal,
breaks=12,
col="green",
xlab="Plant height, cm",
main="Colored histogram with 12 bins")
# with a density curve and rug plot
hist(melodata$alturatotal,
freq=FALSE,
breaks=12,
col="green",
xlab="Plant height, cm",
main="Histogram, rug plot, density curve")
rug(jitter(melodata$alturatotal))
lines(density(melodata$alturatotal), col="red", lwd=2)
# including a normal curve based on the data
x <- melodata$alturatotal
h<-hist(x,
breaks=12,
col="green",
xlab="Plant height, cm",
main="Histogram with normal curve and box")
xfit <- seq(min(x), max(x), length.out = 80)
yfit<-dnorm(xfit, mean=mean(x), sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="red", lwd=2)
box()
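In the last histogram, the normal density is multiplied by the bin width and the number of observations before it is drawn. A minimal sketch of why, using simulated values (hypothetical, not the Melocactus data): with equal-width bins, a density-scale height times the bin width times n gives a height on the count scale.

```r
set.seed(2)
# Hypothetical heights in cm, simulated for illustration only
x <- rnorm(200, mean = 20, sd = 5)

h <- hist(x, breaks = 12, plot = FALSE)
bin_width <- diff(h$breaks[1:2])   # all bins have the same width here

# density = count / (n * bin width), so multiplying back recovers the counts
rescaled <- h$density * bin_width * length(x)
all.equal(rescaled, as.numeric(h$counts))
```

The same factor applied to dnorm() puts the normal curve on the frequency scale of the bars.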
Introducing ggplot2: building a histogram
Now we are going to build a histogram using ggplot2. ggplot2 provides a system for creating graphs based on the grammar of graphics. The intention of the ggplot2 package is to provide a comprehensive, grammar-based system for generating graphs in a unified and coherent manner, allowing users to create new and innovative data visualizations. The power of this approach has led ggplot2 to become an important tool for visualizing data using R.
First, let's see a basic ggplot2 histogram:
library(ggplot2)
melodata <- read.csv("melocactus.csv", header = TRUE)
ggplot(melodata, aes(alturatotal))+
geom_histogram(color="white", bins = 14)
Now a more detailed histogram, including several layers:
hist.melodata <- ggplot(melodata, aes(alturatotal)) +
geom_histogram(aes(y = after_stat(density)), bins = 14, colour="white", fill="green") +
geom_rug(sides = "b", color = "black") +
labs(x="Total plant height, cm", y = "Density") +
stat_function(fun = dnorm,
args = list(mean = mean(melodata$alturatotal, na.rm = TRUE),
sd = sd(melodata$alturatotal, na.rm = TRUE)),
colour = "red", linewidth = 1)
hist.melodata
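The layered construction used above can also be written one statement at a time. The sketch below assumes the ggplot2 package is installed and uses simulated heights (hypothetical values, not the Melocactus data); the data frame and column names are made up for illustration.

```r
library(ggplot2)

set.seed(3)
# Hypothetical heights in cm, simulated for illustration only
df <- data.frame(height = rnorm(100, mean = 20, sd = 5))

# Start with the data and the aesthetic mapping; nothing is drawn yet
p <- ggplot(df, aes(x = height))

# Add one layer (or set of labels) per statement
p <- p + geom_histogram(bins = 14, colour = "white", fill = "green")
p <- p + geom_rug(sides = "b")
p <- p + labs(x = "Height, cm", y = "Count")

# Printing the object draws the plot
print(p)
```

Because the plot is an ordinary R object, it can be stored, extended later, or passed to other functions before it is drawn.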
The anatomy of a box-and-whiskers plot
A box-and-whiskers plot describes the distribution of a continuous variable by plotting its five-number summary: the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum. It can also display observations that may be outliers: values more than 1.5*IQR below the lower quartile or above the upper quartile, where IQR is the interquartile range, defined as the upper quartile minus the lower quartile. By default, each whisker extends to the most extreme data point that lies no more than 1.5 times the interquartile range from the box. Values outside this range are depicted as dots.
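The quantities behind a box-and-whiskers plot can be computed by hand. The sketch below uses simulated values with one deliberately extreme observation (hypothetical data, not the Melocactus measurements) and the default quantile() definition; base R's boxplot() computes its hinges slightly differently, so small discrepancies are possible.

```r
set.seed(4)
# Hypothetical heights in cm, plus one deliberately extreme value
x <- c(rnorm(50, mean = 20, sd = 5), 60)

q <- quantile(x, c(0.25, 0.5, 0.75))   # lower quartile, median, upper quartile
iqr <- q[[3]] - q[[1]]                 # interquartile range

# Fences: 1.5 * IQR beyond the edges of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

# Observations outside the fences are drawn as individual dots
outliers <- x[x < lower_fence | x > upper_fence]

q
whiskers
outliers
```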

Now we are going to analyze the Melocactus data using a box-plot.
ggplot(melodata, aes(x = "0", y = alturatotal)) +
geom_boxplot(fill="cornflowerblue", color="black") +
geom_point(position="jitter", size = 0.5, color="blue", alpha=.5) +
labs(x = "Melocactus intortus", y = "Total plant height, cm")
Violin plots are related to box-plots; they provide more visual cues about the shape of the distribution of plant heights.
ggplot(melodata, aes(x="0", y=alturatotal)) +
geom_violin(fill="lightblue") +
geom_point(color = "blue", alpha = 0.3) +
labs(x = "Melocactus intortus", y = "Total plant height, cm")
Exercises
Build a graph, with the Melocactus data, that combines the box-plot and violin plot.
Bar graphs for categorical variables
Counts of cases for a categorical variable are usually presented using bar graphs. Here we use data from a clinical trial of a treatment for arthritis, comparing the outcomes for treated individuals versus individuals receiving a placebo.
load("Arthr.Rdata")
head(Arthritis)
library(cowplot)
A <- ggplot(Arthritis, aes(x=Treatment, fill=Improved)) +
geom_bar(position="stack")
B <- ggplot(Arthritis, aes(x=Treatment, fill=Improved)) +
geom_bar(position="dodge")
C <- ggplot(Arthritis, aes(x=Treatment, fill=Improved)) +
geom_bar(position="fill")
# using the package cowplot to group graphs built separately
plot_grid(A, B, C, ncol = 2, labels = "AUTO")
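The three position settings used above display the same cross-tabulation in different ways: "stack" and "dodge" show counts, while "fill" shows proportions within each treatment group. A minimal sketch of those numbers, using a small made-up data frame (hypothetical values, not the Arthritis data):

```r
# Hypothetical trial outcomes, invented for illustration only
trial <- data.frame(
  treatment = c("Placebo", "Placebo", "Placebo",
                "Treated", "Treated", "Treated", "Treated"),
  improved  = c("None", "None", "Some",
                "None", "Some", "Marked", "Marked")
)

# Counts per combination: the bar heights for "stack" and "dodge"
counts <- table(trial$improved, trial$treatment)
counts

# Proportions within each treatment: the bar segments for "fill"
props <- prop.table(counts, margin = 2)
props

# Each treatment column sums to 1, so every filled bar has the same height
colSums(props)
```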
Exercise
Try to improve the graphs by using smaller fonts.
