Introduction

The Iris dataset is one of the best-known built-in datasets in R.
It contains measurements, in centimeters, of four features (sepal length, sepal width, petal length, and petal width) for 150 iris flowers from three species: setosa, versicolor, and virginica.

In this R Markdown document, we perform five descriptive analyses and five visualizations to explore the dataset's structure, the distribution of each measurement, and the differences between species.


Load Required Libraries

# Load essential packages
library(dplyr)
library(ggplot2)

Dataset Overview

# Load the built-in dataset
data("iris")

# Display the first six rows
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
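Before computing summaries, it can help to confirm that no values are missing, because functions such as `mean()` return `NA` when any input is `NA`. A minimal base-R check is sketched below; the variable name `missing_per_column` is an illustrative choice, not part of the original analysis.

```r
# Count missing values in each column of the built-in iris data frame
missing_per_column <- colSums(is.na(iris))
print(missing_per_column)

# Total number of missing cells across the whole data frame
print(sum(is.na(iris)))
```

For iris, every count should be zero, so no cleaning step is needed before the analyses that follow.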

1. Descriptive Statistics

1.1 Structure and Summary

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
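The quartiles in `summary()` come from `quantile()` with its default method (type 7), so the same numbers can be pulled out directly when they are needed as values rather than printed text. This short sketch, with the illustrative name `sepal_quartiles`, reproduces the Sepal.Length column above and then repeats the five-number view for all four numeric columns.

```r
# summary() builds its quartiles with quantile(), whose default method is
# type 7; calling quantile() directly reproduces the Sepal.Length column
sepal_quartiles <- quantile(iris$Sepal.Length)
print(sepal_quartiles)

# The same five-number view (min, Q1, median, Q3, max) for each numeric column
print(sapply(iris[1:4], quantile))
```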

1.2 Dimensions of the Dataset

cat("Number of Rows:", nrow(iris), "\n")
## Number of Rows: 150
cat("Number of Columns:", ncol(iris), "\n")
## Number of Columns: 5
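The row and column counts can also be read in one call with `dim()`, and `sapply()` over the columns shows the type of each variable, which matches what `str()` reported. A small sketch:

```r
# dim() returns the row and column counts together as an integer vector
print(dim(iris))

# sapply() over the columns reports each variable's class
print(sapply(iris, class))
```

The four measurements should come back as `"numeric"` and Species as `"factor"`.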

1.3 Basic Statistical Measures (Mean, Median, SD)

iris %>%
  summarise(
    Mean_Sepal_Length = mean(Sepal.Length),
    Median_Sepal_Length = median(Sepal.Length),
    SD_Sepal_Length = sd(Sepal.Length)
  )
##   Mean_Sepal_Length Median_Sepal_Length SD_Sepal_Length
## 1          5.843333                 5.8       0.8280661
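The same three statistics can be computed for all four numeric columns at once. The sketch below uses base R's `sapply()` instead of `summarise()` (a swapped-in approach, not the original code), and also checks that `sd()` is the sample standard deviation, which divides by $n - 1$. Names such as `basic_stats` and `manual_sd` are illustrative.

```r
# Mean, median and sample standard deviation for each numeric column;
# columns of the result are variables, rows are statistics
basic_stats <- sapply(iris[1:4], function(x) {
  c(Mean = mean(x), Median = median(x), SD = sd(x))
})
print(round(basic_stats, 3))

# sd() divides the sum of squared deviations by n - 1, not by n
sepal_length <- iris$Sepal.Length
manual_sd <- sqrt(sum((sepal_length - mean(sepal_length))^2) /
                  (length(sepal_length) - 1))
print(all.equal(manual_sd, sd(sepal_length)))
```

The Sepal.Length column of `basic_stats` should agree with the dplyr output above, and the final line should print `TRUE`.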

1.4 Descriptive Statistics by Species

iris %>%
  group_by(Species) %>%
  summarise(
    Mean_Sepal_Length = mean(Sepal.Length),
    Mean_Sepal_Width = mean(Sepal.Width),
    Mean_Petal_Length = mean(Petal.Length),
    Mean_Petal_Width = mean(Petal.Width)
  )
## # A tibble: 3 × 5
##   Species  Mean_Sepal_Length Mean_Sepal_Width Mean_Petal_Length Mean_Petal_Width
##   <fct>                <dbl>            <dbl>             <dbl>            <dbl>
## 1 setosa                5.01             3.43              1.46            0.246
## 2 versico…              5.94             2.77              4.26            1.33 
## 3 virgini…              6.59             2.97              5.55            2.03
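For readers who prefer base R, `aggregate()` produces the same grouped means without dplyr, and swapping the function gives the within-species spread as well. This is a sketch of an alternative, not the method used above; `species_means` and `species_sds` are illustrative names.

```r
# Base-R counterpart of the grouped summary: in the formula ". ~ Species",
# the dot stands for every other column, and FUN is applied within each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Swapping FUN shows the standard deviation within each species
species_sds <- aggregate(. ~ Species, data = iris, FUN = sd)
print(species_sds)
```

Unlike the dplyr version, the result is a plain data frame, so species names are not truncated when printed.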

1.5 Correlation Between Numeric Variables

# Pearson correlation matrix of the four numeric columns
cor(iris[, 1:4])
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
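The matrix above pools all three species. Because the species differ a lot in size, a pooled coefficient can hide, or even reverse, the relationship inside each group (an instance of Simpson's paradox). The sketch below compares the pooled correlation between sepal length and sepal width with the same correlation computed within each species; `pooled_r` and `within_r` are illustrative names.

```r
# Pearson correlation between sepal length and sepal width, pooled over all
# 150 flowers and then computed separately inside each species
pooled_r <- cor(iris$Sepal.Length, iris$Sepal.Width)
within_r <- sapply(split(iris, iris$Species),
                   function(d) cor(d$Sepal.Length, d$Sepal.Width))
print(round(pooled_r, 3))
print(round(within_r, 3))
```

The pooled value should match the -0.118 in the matrix, while each within-species value is expected to be positive.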

2. Data Visualization

Visual exploration helps identify patterns and relationships among variables.


2.1 Histogram of Sepal Length

ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 20, color = "black", alpha = 0.7) +
  labs(title = "Distribution of Sepal Length",
       x = "Sepal Length", y = "Frequency") +
  theme_minimal()
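Because `geom_histogram()` stacks groups by default, each species' bars sit on top of the others, which can make individual shapes hard to read. One alternative, sketched below under the assumption that ggplot2 is installed, gives each species its own panel with `facet_wrap()`. `layer_data()` then exposes the computed bin counts; `p_sepal_facet` and `sepal_bins` are illustrative names.

```r
library(ggplot2)

# One panel per species instead of stacked bars
p_sepal_facet <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 20, color = "black", alpha = 0.7) +
  facet_wrap(~ Species, ncol = 1) +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length", y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")
print(p_sepal_facet)

# layer_data() returns the computed bins; the counts should add up to 150
sepal_bins <- layer_data(p_sepal_facet)
print(sum(sepal_bins$count))
```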


2.2 Boxplot of Sepal Length by Species

ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.8) +
  labs(title = "Sepal Length Comparison Across Species",
       x = "Species", y = "Sepal Length") +
  theme_minimal()
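Each box spans the first to third quartile with a line at the median, and ggplot2 by default draws points beyond 1.5 times the interquartile range (IQR) from the box as outliers. The sketch below computes those numbers per species, assuming the defaults `coef = 1.5` and R's type 7 quantiles; `box_stats` and `sepal_box_stats` are hypothetical helper names.

```r
# Numbers behind each box: quartiles plus the 1.5 * IQR fences beyond which
# points are treated as outliers
box_stats <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[3] + 1.5 * iqr
  c(Q1 = q[1], Median = q[2], Q3 = q[3],
    Lower_Fence = lower, Upper_Fence = upper,
    N_Outliers = sum(x < lower | x > upper))
}

# One column per species, one row per statistic
sepal_box_stats <- sapply(split(iris$Sepal.Length, iris$Species), box_stats)
print(round(sepal_box_stats, 3))
```

The medians in the second row should increase from setosa to virginica, matching the order of the boxes.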


2.3 Scatter Plot of Sepal vs Petal Length

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(title = "Relationship between Sepal Length and Petal Length",
       x = "Sepal Length", y = "Petal Length") +
  theme_minimal()
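The scatter plot suggests a roughly linear relationship, so one way to quantify it is an ordinary least-squares fit with `lm()`. In a simple linear regression, R-squared equals the squared Pearson correlation, which ties the fit back to the 0.8717538 in the correlation matrix. The sketch below fits one line to all flowers regardless of species; `petal_fit` and `petal_r2` are illustrative names.

```r
# Least-squares line for petal length as a function of sepal length,
# fitted to all 150 flowers without regard to species
petal_fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
print(coef(petal_fit))

# R-squared of a simple linear regression is the squared correlation,
# here 0.8717538^2, roughly 0.76
petal_r2 <- summary(petal_fit)$r.squared
print(petal_r2)
```

To draw fitted lines on the plot itself, `geom_smooth(method = "lm")` can be added to the ggplot call; because `color = Species` also groups the data, that would fit one line per species rather than the single pooled line above.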


2.4 Pair Plot of All Numeric Variables

pairs(iris[1:4], main = "Pair Plot of Iris Numeric Features",
      pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
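The `bg` argument in the call above works by indexing a colour vector with the factor's integer codes (1 = setosa, 2 = versicolor, 3 = virginica). A variant, sketched below, looks colours up by species name instead, so the mapping stays correct even if the factor levels are ever reordered. `species_colours` and `point_colours` are illustrative names.

```r
# Named colour lookup: each point gets the colour of its species by name
species_colours <- c(setosa = "red", versicolor = "green3", virginica = "blue")
point_colours <- unname(species_colours[as.character(iris$Species)])

pairs(iris[1:4], main = "Pair Plot of Iris Numeric Features",
      pch = 21, bg = point_colours)
```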


2.5 Density Plot of Petal Width

ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_density(alpha = 0.6) +
  labs(title = "Density Plot of Petal Width by Species",
       x = "Petal Width", y = "Density") +
  theme_minimal()
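To put a number on where each curve peaks, the kernel density can be estimated directly with base R's `density()`, which defaults to a Gaussian kernel with the `bw.nrd0` bandwidth, the same defaults as `geom_density()`. The peaks below should therefore roughly match the plot, though the evaluation grid differs slightly. `density_peak` and `petal_width_peaks` are hypothetical names.

```r
# Location of the maximum of a kernel density estimate
density_peak <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

# One peak per species, in factor-level order
petal_width_peaks <- sapply(split(iris$Petal.Width, iris$Species), density_peak)
print(round(petal_width_peaks, 2))
```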


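3. Insights and Interpretation

The dataset is balanced, with 50 flowers of each species and no variable showing extreme values in the summary.
Petal measurements differ most between species: mean petal length rises from 1.46 (setosa) to 4.26 (versicolor) and 5.55 (virginica), and mean petal width from 0.246 to 1.33 and 2.03.
Setosa has the shortest sepals on average (mean length 5.01) but the widest (mean width 3.43).
Petal length and petal width are very strongly positively correlated (r = 0.96), and both are strongly correlated with sepal length (0.87 and 0.82).
When all species are pooled, sepal width is only weakly and negatively correlated with the other measurements (between -0.12 and -0.43).
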
Conclusion

This document demonstrated how to perform descriptive and visual analysis of the Iris dataset in R.
The same steps can be applied to other datasets for quick exploratory data analysis (EDA).