Box Plot
Program: Generate a basic box plot with ggplot2, enhance it with notches and highlighted outliers, and group it by a categorical variable, using a built-in R dataset.
Steps
- Step 1: Load required packages and library - ggplot2
- Step 2: Use and explore built-in dataset - iris
- Step 3: Visualize box plot with notches and outliers - step by step
Step 1: Load required packages and library - ggplot2
We use the ggplot2 package for data visualization. If it is not already installed, install it with:
install.packages('ggplot2')
Then load it:
library(ggplot2)
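When the same script runs on several machines, a common pattern is to install ggplot2 only if it is missing. This is a minimal sketch: requireNamespace() checks whether the package can be found without attaching it, and the CRAN cloud mirror URL is an assumption about which repository you can reach.

```r
# Install ggplot2 only if it cannot be found, then attach it.
# Assumption: the CRAN cloud mirror below is reachable from this machine.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
library(ggplot2)

# Report which version of ggplot2 was loaded
ggplot2_version <- packageVersion("ggplot2")
print(ggplot2_version)
```

Running this a second time skips the installation, because requireNamespace() then returns TRUE.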
Step 2: Use and explore built-in dataset - iris
We will use the built-in iris dataset. It contains measurements (in centimeters) of sepal and petal length and width for 150 iris flowers, 50 from each of three species:
- Setosa
- Versicolor
- Virginica
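Before plotting, it can help to confirm the shape of the data. This sketch uses only base R to check that iris has 150 rows, four numeric measurement columns and a three-level Species factor.

```r
# Dimensions: rows (flowers) and columns (variables)
dim(iris)              # 150 5

# The three species are stored as the levels of a factor
levels(iris$Species)   # "setosa" "versicolor" "virginica"

# Class of every column: four numeric measurements and one factor
sapply(iris, class)
```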
data <- iris  # keep a copy of iris under the name data
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
table(data$Species)
setosa versicolor virginica
50 50 50
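Each box in the plot is drawn from group-wise quartiles. As a sketch, the quartiles of Sepal.Length for each species can be computed with split() and quantile(). ggplot2's geom_boxplot() uses quantile()'s default method (type 7), whereas base R's boxplot() uses Tukey's hinges from fivenum(), so the two can differ slightly for some data.

```r
# Split Sepal.Length by species and compute the 25th, 50th and 75th percentiles
sepal_by_species <- split(iris$Sepal.Length, iris$Species)
quartiles <- t(sapply(sepal_by_species, quantile, probs = c(0.25, 0.5, 0.75)))
print(quartiles)

# Interquartile range (the height of each box) per species
print(quartiles[, "75%"] - quartiles[, "25%"])
```

The "50%" column is the median line inside each box; the "25%" and "75%" columns are the lower and upper edges of the box.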
tail(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(data)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
Step 3: Visualize box plot with notches and outliers - step by step
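We now create a box plot of Sepal.Length grouped by Species, building it up one layer at a time. We'll enhance the plot using:
- Notches, which show a rough 95% confidence interval around each median (notch, notchwidth)
- Outlier highlighting using color and shape (outlier.color, outlier.shape)
- A fill color with transparency, descriptive labels and a minimal theme (fill, alpha, labs(), theme_minimal())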
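The notch and outlier rules can be made concrete. The ggplot2 documentation for geom_boxplot() states that notches extend 1.58 * IQR / sqrt(n) above and below the median, and that points more than 1.5 * IQR beyond the box edges are plotted individually as outliers. The function box_summary() below is a hypothetical helper, not part of ggplot2: a minimal base-R sketch of those two rules, assuming type-7 quartiles and the default coefficient of 1.5.

```r
# box_summary() is a hypothetical helper (not part of ggplot2): a base-R sketch of
# the notch and outlier rules described in the geom_boxplot() documentation.
box_summary <- function(y, coef = 1.5) {
  q <- quantile(y, probs = c(0.25, 0.5, 0.75), names = FALSE)  # type-7 quartiles
  iqr <- q[3] - q[1]                                           # box height (IQR)
  n <- length(y)
  notch_half <- 1.58 * iqr / sqrt(n)   # notch extends this far above and below the median
  lower_fence <- q[1] - coef * iqr     # points below this are outliers
  upper_fence <- q[3] + coef * iqr     # points above this are outliers
  list(
    median      = q[2],
    notch_lower = q[2] - notch_half,
    notch_upper = q[2] + notch_half,
    outliers    = y[y < lower_fence | y > upper_fence]
  )
}

# Apply the helper to Sepal.Length within each species
stats_by_species <- lapply(split(iris$Sepal.Length, iris$Species), box_summary)

# Notch limits per species (one row per species)
notches <- t(sapply(stats_by_species, function(s) {
  c(lower = s$notch_lower, median = s$median, upper = s$notch_upper)
}))
print(round(notches, 3))

# Values that would be drawn as individual outlier points
print(lapply(stats_by_species, `[[`, "outliers"))
```

When the notches of two boxes do not overlap, that suggests their medians differ; the printed limits let you check this numerically instead of by eye.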
# Base R box plot of Sepal.Length for all 150 flowers (no grouping), for comparison
boxplot(iris$Sepal.Length)

# Empty ggplot canvas: Species on the x-axis, Sepal.Length on the y-axis (no layers yet)
ggplot(iris, aes(x = Species, y = Sepal.Length))

# Basic box plot: one box per species
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# Add notches around each median
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE
  )

# Set the notch width relative to the box body (0.5 is the default)
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5
  )

# Draw outliers in red
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red"
  )

# Use shape 21 (a circle whose border takes outlier.color) for outliers
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21
  )

# Fill the boxes with sky blue
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21,
    fill = "skyblue"
  )

# Make the fill almost transparent
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21,
    fill = "skyblue",
    alpha = 0.1
  )

# Add a title and axis labels
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21,
    fill = "skyblue",
    alpha = 0.1
  ) +
  labs(
    title = "Sepal Length Distribution by Iris Species",
    x = "Iris Species",
    y = "Sepal Length (cm)"
  )

# Apply a minimal theme for a cleaner look
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21,
    fill = "skyblue",
    alpha = 0.1
  ) +
  labs(
    title = "Sepal Length Distribution by Iris Species",
    x = "Iris Species",
    y = "Sepal Length (cm)"
  ) +
  theme_minimal()
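To check what ggplot2 actually computed, the finished plot can be stored in an object and inspected with layer_data(), which builds the plot and returns the computed data for one layer. This sketch rebuilds a plot with the same geom_boxplot() settings; the object names p and box_stats and the label text are illustrative choices.

```r
library(ggplot2)

# Store the plot in an object instead of printing it immediately
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.5,
    outlier.color = "red",
    outlier.shape = 21,
    fill = "skyblue",
    alpha = 0.1
  ) +
  labs(title = "Sepal Length Distribution by Iris Species",
       x = "Iris Species", y = "Sepal Length (cm)") +
  theme_minimal()

# layer_data() returns the computed data for layer 1 (the box plot):
# one row per species, with quartiles, notch limits and outliers
box_stats <- layer_data(p, 1)
print(box_stats[, c("x", "lower", "middle", "upper", "notchlower", "notchupper")])

# The outliers column is a list: one numeric vector of outlying points per box
print(box_stats$outliers)
```

Printing p (or typing p at the console) draws the chart itself; notchlower and notchupper are the ends of each notch.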