Multiple Histograms
Creating multiple histograms with ggplot2 to visualize how a variable is distributed across different groups, using the built-in iris dataset.
Steps
- Step 1: Load the necessary packages
- Step 2: Load and explore the dataset
- Step 3: Create grouped histograms using ggplot
## Step 1: Load the necessary packages
# install.packages('ggplot2')
library(ggplot2)
## Step 2: Load and explore the dataset
We will use R's built-in iris dataset, which contains:
- 150 rows
- 4 numerical columns
- 1 categorical column (Species)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
str(iris)
'data.frame': 150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
table(iris$Species)
    setosa versicolor  virginica
        50         50         50
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
dim(iris)
[1] 150   5
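Because the histograms in the next step split Sepal.Length by species, it can help to look at that one column group by group first. The following is a minimal sketch using only base R; the name `sepal_stats` is illustrative and not part of the original walkthrough.

```r
# Split Sepal.Length into one numeric vector per species,
# then compute a few summary values for each group.
sepal_stats <- sapply(
  split(iris$Sepal.Length, iris$Species),
  function(x) c(min = min(x), mean = mean(x), max = max(x))
)

# Rows are the statistics, columns are the three species
round(sepal_stats, 2)
```

Species whose means sit further apart should show histograms that are shifted further from each other along the x-axis.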
## Step 3: Create grouped histograms using ggplot
Let us now create histograms of Sepal.Length for each species using ggplot2 and facet_wrap().
# Initialize the plot with the iris dataset and map Sepal.Length to the x-axis
p <- ggplot(iris, aes(x = Sepal.Length))
p

# Add the histogram layer with a bin width of 0.3, a black outline, and a sky-blue fill
p <- p + geom_histogram(binwidth = 0.3, fill = 'skyblue', color = 'black')
p

# facet_wrap(~ Species) creates a separate histogram for each species in a grid layout
p <- p + facet_wrap(~ Species)
p

# labs() sets the title and the x- and y-axis labels
p <- p + labs(title = 'Distribution of sepal length by species',
              x = 'Sepal Length (cm)',
              y = 'Frequency')
p

# theme_minimal() applies a minimal theme for a cleaner look
p <- p + theme_minimal()
p
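The step-by-step build above can also be written as a single expression. The sketch below is one possible variation, not a replacement for the walkthrough: it keeps the same layers but passes `ncol = 1` to facet_wrap() so the three panels are stacked vertically, which can make it easier to compare the distributions along a shared x-axis. The name `p_stacked` is illustrative.

```r
library(ggplot2)

# Same layers as the step-by-step build, combined into one expression.
# `p_stacked` is an illustrative name, not one used earlier in this tutorial.
p_stacked <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  # ncol = 1 stacks the three species panels in a single column
  facet_wrap(~ Species, ncol = 1) +
  labs(title = "Distribution of sepal length by species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()

p_stacked
```

Dropping `ncol = 1` returns the default layout, which for three groups places the panels side by side.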