Multiple Histograms

Author

Manoj

Creating multiple histograms with ggplot2 to visualize how a variable is distributed across different groups, using R's built-in iris dataset.

Steps

  • Step 1: Load the necessary packages
  • Step 2: Load and explore the dataset
  • Step 3: Create grouped histograms using ggplot

Step 1: Load the necessary packages

# install.packages('ggplot2')
library(ggplot2)
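The commented-out `install.packages()` line has to be toggled by hand. A common alternative, sketched here, installs ggplot2 only when it is missing. It assumes a CRAN mirror is reachable in case installation is actually needed.

```r
# Install ggplot2 only if it is not already available, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Report the installed version (useful when comparing output across machines)
packageVersion("ggplot2")
```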

Step 2: Load and explore the dataset

We will use the built-in iris dataset, which contains:

  • 150 rows
  • 4 numeric columns (sepal and petal measurements, in cm)
  • 1 categorical column (Species)
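The description above can be checked directly before looking at the printed output. This is a small base-R sketch; the variable names are illustrative.

```r
# Check the dataset description using base R only
n_rows <- nrow(iris)                              # number of observations
is_num <- vapply(iris, is.numeric, logical(1))    # TRUE for numeric columns
is_fac <- vapply(iris, is.factor, logical(1))     # TRUE for factor columns

n_rows               # 150
sum(is_num)          # 4
names(iris)[is_fac]  # "Species"
```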
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
table(iris$Species)

    setosa versicolor  virginica 
        50         50         50 
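`table()` confirms the three groups are balanced at 50 flowers each. Before plotting, a per-species numeric summary of `Sepal.Length` gives a rough preview of where each histogram should sit. This sketch uses base R's `tapply()`; `sepal_means` is an illustrative name.

```r
# Mean Sepal.Length for each species
sepal_means <- tapply(iris$Sepal.Length, iris$Species, mean)
sepal_means  # setosa 5.006, versicolor 5.936, virginica 6.588

# Minimum and maximum per species, to anticipate the spread of each histogram
tapply(iris$Sepal.Length, iris$Species, range)
```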
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
dim(iris)
[1] 150   5

Step 3: Create grouped histograms using ggplot

Let us now create a histogram of Sepal.Length for each species using ggplot2 and facet_wrap().

# Initialize the plot with the iris dataset and map Sepal.Length to the x-axis
# (no layers have been added yet, so no bars are drawn)
p <- ggplot(iris, aes(x = Sepal.Length))
p

# Add a histogram layer: bins 0.3 cm wide, sky-blue fill, black bar outlines
p <- p + geom_histogram(binwidth = 0.3, fill = 'skyblue', color = 'black')
p
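To see what `binwidth = 0.3` actually does, the computed bins can be inspected with ggplot2's `layer_data()`, which returns the data a layer draws after its statistic has run. This sketch rebuilds the same histogram layer so it runs on its own; `hist_plot` and `bins` are illustrative names.

```r
library(ggplot2)

# Same histogram layer as above, rebuilt so this snippet is self-contained
hist_plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black")

# One row per bin: xmin/xmax are the bin edges, count is the bar height
bins <- layer_data(hist_plot)
head(bins[, c("xmin", "xmax", "count")])

# Every observation falls into exactly one bin
sum(bins$count)  # 150
```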

# facet_wrap(~ Species) splits the data by species and draws a separate
# histogram panel for each one, with all panels sharing the same axes
p <- p + facet_wrap(~ Species)
p
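With three species, `facet_wrap()` places the panels side by side, so they share a y-axis. When the goal is to compare positions along the x-axis, stacking the panels in a single column with `ncol = 1` lines the x-axes up vertically. This is an optional variation, not part of the original walkthrough; `p_stacked` and `stacked_bins` are illustrative names.

```r
library(ggplot2)

# Same faceted histogram, but with the three species stacked in one column
p_stacked <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species, ncol = 1)
p_stacked

# Each panel holds one species, so the bar heights in each panel total 50
stacked_bins <- layer_data(p_stacked)
panel_counts <- tapply(stacked_bins$count, stacked_bins$PANEL, sum)
panel_counts
```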

# labs() sets the plot title and the x- and y-axis labels
p <- p + labs(title = 'Distribution of Sepal Length by Species',
              x = 'Sepal Length (cm)',
              y = 'Frequency')
p

# theme_minimal() applies a minimal theme that drops the grey panel background
p <- p + theme_minimal()
p
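The incremental `p <- p + ...` steps above can also be written as a single expression. The second plot is an alternative sketch, not taken from the original: it overlays the three species on one set of axes using `fill = Species`, with semi-transparent bars (`alpha = 0.5`) and `position = "identity"` so the bars are drawn over each other instead of stacked, which is the histogram default. `p_faceted` and `p_overlay` are illustrative names.

```r
library(ggplot2)

# The faceted histogram from Step 3 in one expression
p_faceted <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species) +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()
p_faceted

# Alternative: all species on one panel, distinguished by fill color.
# position = "identity" overlays the bars rather than stacking them.
p_overlay <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.3, alpha = 0.5, position = "identity",
                 color = "black") +
  labs(title = "Sepal Length by Species (overlaid)",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()
p_overlay
```

The faceted version keeps each distribution uncluttered, while the overlaid version makes overlap between species easier to see at the cost of some visual clutter.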

Explanation