#install.packages("ggplot2") # Uncomment if needed
library(ggplot2)
PROGRAM 11
Objective
To generate a basic box plot using ggplot2, enhanced with notches and outliers, and grouped by a categorical variable, using a built-in dataset in R.
Step 1: Load required package
We use the ggplot2 package for data visualization.
Step 2: Use a built-in dataset
We will use the built-in iris dataset. This dataset has 150 observations of 5 variables: four numeric measurements and the factor Species.
#load and preview the dataset
data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Step 3: Create a notched box plot grouped by species
We now create a box plot for Sepal.Length, grouped by Species. We’ll enhance the plot using:
- Notches to show the confidence interval around the median.
ggplot(iris, aes(x = Species, y = Sepal.Length))

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.6,
    outlier.colour = "red",
    outlier.shape = 16,
    fill = "skyblue",
    alpha = 0.7
  )

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.6,
    outlier.colour = "red",
    outlier.shape = 16,
    fill = "skyblue",
    alpha = 0.7
  ) +
  labs(
    title = "Sepal length distribution by iris Species",
    subtitle = "Box plot with notches and outlier highlighting",
    x = "Species",
    y = "Sepal.Length (cm)"
  ) +
  theme_minimal()

Box plot: Each box summarizes the distribution of Sepal.Length for a species, showing the interquartile range (IQR), the median, and potential outliers.
Notches: The notches give a rough 95% confidence interval around the median. If the notches of two boxes do not overlap, this is strong evidence that the medians differ.
Outliers: Points that fall more than 1.5 x IQR beyond the quartiles are considered outliers and shown in red.
Grouping: The plot groups values by the categorical variable Species, which helps compare groups.
Aesthetics: theme_minimal() provides a clean background, while the fill color and transparency make the plot easier to read.
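The notch limits and outliers can also be checked numerically. The sketch below uses base R's boxplot.stats(), which computes the notch as median ± 1.58 x IQR / sqrt(n) and flags points beyond 1.5 x IQR. The name notch_summary is only illustrative, and boxplot.stats() takes its quartiles from hinges, so the values may differ slightly from what ggplot2 draws.

```r
# Notch limits and outlier counts per species, computed with base R.
# `notch_summary` is an illustrative name, not part of ggplot2.
notch_summary <- do.call(rbind, lapply(
  split(iris$Sepal.Length, iris$Species),
  function(x) {
    s <- boxplot.stats(x)        # coef = 1.5 by default
    data.frame(
      median      = s$stats[3],
      notch_lower = s$conf[1],   # median - 1.58 * IQR / sqrt(n)
      notch_upper = s$conf[2],   # median + 1.58 * IQR / sqrt(n)
      n_outliers  = length(s$out)
    )
  }
))
print(notch_summary)
```

Two species whose notch_lower to notch_upper ranges do not overlap are likely to have different medians.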
#Box plots: six values are usually displayed: the lowest value, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), the highest value, and the mean.
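As a quick check, base R's summary() reports these same six values for a numeric vector. The sketch below applies it to Sepal.Length overall and per species. The names six_values and by_species are only illustrative.

```r
# summary() reports six values: minimum, Q1, median, mean, Q3, maximum.
six_values <- summary(iris$Sepal.Length)
print(six_values)

# The same summary, computed separately for each species.
by_species <- lapply(split(iris$Sepal.Length, iris$Species), summary)
print(by_species$setosa)
```

summary() computes quartiles by interpolation (R's default quantile method), so its Q1 and Q3 can differ slightly from hand-calculated values.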
##Percentile: the sample 100p-th percentile is a value such that at least 100p% of the observations are at or below this value, and at least 100(1 - p)% are at or above this value.
##The following rule simplifies the calculation of sample percentiles. To calculate the sample 100p-th percentile:
##1. Order the n observations from smallest to largest.
##2. Determine the product np. If np is not an integer, round it up to the next integer and find the corresponding ordered value. If np is an integer, say k, calculate the mean of the kth and (k+1)st ordered observations.
136 143 147 151 158 160 161 163 165 167 173 174 181 181 185 188 190 205
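The percentile rule can be sketched in R as follows, assuming the 18 values listed above are the sample it is meant to be applied to. The function name sample_percentile is hypothetical (it is not a base R function), and the sketch only handles 0 < p < 1.

```r
# Sample 100p-th percentile using the ordering rule described above.
# `sample_percentile` is a hypothetical helper, not a base R function.
sample_percentile <- function(x, p) {
  stopifnot(p > 0, p < 1)
  x  <- sort(x)                  # step 1: order the observations
  n  <- length(x)
  np <- n * p                    # step 2: compute the product np
  k  <- round(np)
  if (isTRUE(all.equal(np, k))) {
    # np is an integer k: average the k-th and (k+1)-st ordered values
    mean(x[c(k, k + 1)])
  } else {
    # np is not an integer: round up and take that ordered value
    x[ceiling(np)]
  }
}

obs <- c(136, 143, 147, 151, 158, 160, 161, 163, 165, 167,
         173, 174, 181, 181, 185, 188, 190, 205)

sample_percentile(obs, 0.25)  # first quartile: 158
sample_percentile(obs, 0.50)  # median: 166
sample_percentile(obs, 0.75)  # third quartile: 181
```

With n = 18, np is 4.5 for Q1 (round up to the 5th value), 9 for the median (mean of the 9th and 10th values), and 13.5 for Q3 (round up to the 14th value). The same results should come from quantile(obs, c(0.25, 0.5, 0.75), type = 2).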
# Petal.Length by species
ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.6,
    outlier.colour = "red",
    outlier.shape = 16,
    fill = "skyblue",
    alpha = 0.7
  ) +
  labs(
    title = "Petal length distribution by iris Species",
    subtitle = "Box plot with notches and outlier highlighting",
    x = "Species",
    y = "Petal.Length (cm)"
  ) +
  theme_minimal()

# Sepal.Width by species
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.6,
    outlier.colour = "red",
    outlier.shape = 16,
    fill = "skyblue",
    alpha = 0.7
  ) +
  labs(
    title = "Sepal width distribution by iris Species",
    subtitle = "Box plot with notches and outlier highlighting",
    x = "Species",
    y = "Sepal.Width (cm)"
  ) +
  theme_minimal()

# Petal.Width by species
ggplot(iris, aes(x = Species, y = Petal.Width)) +
  geom_boxplot(
    notch = TRUE,
    notchwidth = 0.6,
    outlier.colour = "red",
    outlier.shape = 16,
    fill = "skyblue",
    alpha = 0.7
  ) +
  labs(
    title = "Petal width distribution by iris Species",
    subtitle = "Box plot with notches and outlier highlighting",
    x = "Species",
    y = "Petal.Width (cm)"
  ) +
  theme_minimal()
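
Since the plots above differ only in the measured variable, one possible way to avoid repeating the code is a small helper function. This is a minimal sketch, not part of the original program: notched_boxplot is a hypothetical name, and it assumes a ggplot2 version that supports the .data pronoun inside aes().

```r
library(ggplot2)

# `notched_boxplot` is a hypothetical helper, not part of ggplot2.
# `measure` is the name of a numeric column of iris, given as a string.
notched_boxplot <- function(measure) {
  ggplot(iris, aes(x = Species, y = .data[[measure]])) +
    geom_boxplot(
      notch = TRUE,
      notchwidth = 0.6,
      outlier.colour = "red",
      outlier.shape = 16,
      fill = "skyblue",
      alpha = 0.7
    ) +
    labs(
      title = paste(measure, "distribution by iris Species"),
      subtitle = "Box plot with notches and outlier highlighting",
      x = "Species",
      y = paste(measure, "(cm)")
    ) +
    theme_minimal()
}

# Build one plot per numeric column of iris.
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
plots <- lapply(measures, notched_boxplot)
plots[[3]]  # display the Petal.Length plot
```

For some variables R may print a message that a notch went outside the hinges. This is not an error; it happens when the notch interval is wider than the box.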