library(ggplot2)
data(algae, package="DMwR2")
# Plot on the left (Standard), P.99
freqOcc <- table(algae$season)
barplot(freqOcc, main='Frequency of the Seasons')

# Plot on the right (ggplot2), P.99
ggplot(algae, aes(x=season)) + geom_bar() + ggtitle("Frequency of the Seasons")

# To flip the coordinates, use the following code:

ggplot(algae, aes(x=season)) + geom_bar() + ggtitle("Frequency of the Seasons") + coord_flip()

Next, let’s look at the distribution of the values of a continuous variable using histograms and boxplots (pp. 99-100).

library(ggplot2)
data(iris)
# Plot on the left (standard). P.100
hist(iris$Petal.Length, xlab='Petal Length')

# Plot on the right (ggplot2). p.100
ggplot(iris, aes(x=Petal.Length)) + geom_histogram() + xlab("Petal Length")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
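The message above appears because geom_histogram() falls back to 30 bins when no bin specification is given. A minimal sketch of setting the bin width explicitly follows; the value 0.25 and the white bar outline are illustrative assumptions, not choices from the book.

```r
library(ggplot2)
data(iris)

# Petal.Length runs from about 1 to 7 cm, so 0.25 cm bins are an
# illustrative choice (an assumption, not a value from the source).
p <- ggplot(iris, aes(x = Petal.Length)) +
  geom_histogram(binwidth = 0.25, boundary = 0, colour = "white") +
  xlab("Petal Length")

print(p)  # no stat_bin() message is emitted once binwidth is set
```

Setting boundary = 0 aligns the bin edges to multiples of the bin width, which tends to make the axis easier to read.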

A different way of showing information on the distribution of the values of a continuous variable is through the boxplot. See below:

# Using the boxplot function on a continuous variable.
# Code: P. 100, Figure: P. 101

library(ggplot2)
data(iris)

## Plot on the left (standard). P. 101
boxplot(iris$Sepal.Width, ylab= 'Sepal Width')

## Plot on the right (ggplot2). P.101
# A single boxplot still needs an x value, hence the dummy factor(0); its axis label is hidden.
ggplot(iris, aes(x=factor(0), y=Sepal.Width)) +
  geom_boxplot() +
  xlab("") + ylab("Sepal Width") +
  theme(axis.text.x=element_blank())
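To see which numbers a boxplot summarizes, we can compute them directly with base R. This is a small sketch using fivenum() and boxplot.stats(); the variable names five and bs are ours.

```r
data(iris)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(iris$Sepal.Width)

# boxplot.stats() returns the values that boxplot() draws:
#   $stats - lower whisker end, lower hinge, median, upper hinge, upper whisker end
#   $out   - points beyond the whiskers, drawn individually
bs <- boxplot.stats(iris$Sepal.Width)

print(five)
print(bs$stats)
print(bs$out)
```

The whisker ends in bs$stats can differ from the minimum and maximum in five whenever some points are flagged as outliers.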

With plots for continuous variables under our belts, let’s turn to plots that compare subgroups of a dataset.

Conditioned plots are the plots that handle the task of plotting subgroups of datasets.

In standard graphics, boxplot() is the only function that readily handles this task of comparing behavior across subgroups, through its formula interface.

Even though conditioned plots pose some problems when comparing behavior across subgroups, we will work with them.

Within the ggplot ecosystem, the task of comparing the behaviors across subgroups is usually handled by “facets.”

Facets are variations of the same plot that are obtained with different subsets of a dataset.
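As a minimal sketch of this idea, the code below draws the same histogram once per species of the iris dataset with facet_wrap(). The choice of variable and the bin width of 0.25 are illustrative assumptions.

```r
library(ggplot2)
data(iris)

# One histogram panel per level of Species; every panel uses the same
# plot specification but only the rows belonging to that species.
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~ Species)

print(p)
```

By default the panels share the same axes, which makes the subgroups directly comparable.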

Figure 3.10 shows the distribution of the variable Sepal.Length for the plants of the different species.

# Code used to obtain Figure 3.10 (P.101)

library(ggplot2)
data(iris)

# Plot on the left (standard).  P.102
boxplot(Sepal.Length ~ Species, iris, ylab="Sepal.Length")

# Plot on the right (ggplot2).  P. 102
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot()

The ggplot graphics system provides better conditioning through facets. Below, we use histograms to check the distribution of alga “a1” for the different types of rivers (in terms of water speed and river size). We need as many histograms as there are combinations of river size and speed.

Below, we show these graphs:

library(ggplot2)
data(algae, package= "DMwR2")
ggplot(algae, aes(x=a1)) + geom_histogram() + facet_grid(size ~ speed)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
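To check how many panels to expect and how many observations fall in each, we can cross-tabulate the two conditioning variables before plotting. The sketch below also sets the bin width explicitly to avoid the message above; the value 10 is an illustrative assumption. The code is guarded so that it only runs when the DMwR2 package is installed.

```r
library(ggplot2)

if (requireNamespace("DMwR2", quietly = TRUE)) {
  data(algae, package = "DMwR2")

  # Rows are river sizes, columns are water speeds; each cell is the
  # number of observations that end up in the corresponding panel.
  print(table(algae$size, algae$speed))

  # Same faceted histogram as above, with an explicit (assumed) bin width
  p <- ggplot(algae, aes(x = a1)) +
    geom_histogram(binwidth = 10) +
    facet_grid(size ~ speed)
  print(p)
}
```

Panels backed by only a few observations should be read with care, since their histograms are based on very little data.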