Introduction

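This analysis explores R's built-in iris dataset, which contains 150 observations of four measurements in centimeters (sepal length, sepal width, petal length, and petal width) for three species of Iris flowers: setosa, versicolor, and virginica, with 50 flowers each. We examine relationships between the variables and identify patterns in the data.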

Data Overview

Structure and Summary

# View dataset structure
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Summary statistics
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
# Display the first 10 rows as an interactive table, 5 rows per page
library(DT)  # provides datatable()
datatable(head(iris, 10), 
          options = list(pageLength = 5),
          caption = "First 10 rows of Iris Dataset")

Visualizations

Pair Plot

# Create pair plot
pairs(iris[, 1:4], 
      main = "Pair Plot of Iris Features",
      pch = 21,
      bg = c("red", "green3", "blue")[iris$Species])

Distribution Analysis

Boxplot of Sepal Length by Species

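# Compare the distribution of sepal length across the three species
library(ggplot2)  # provides ggplot(), geom_boxplot(), theme_minimal(), labs()
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Boxplot of Sepal Length by Species",
       y = "Sepal Length",
       x = "Species")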

Scatter Plot: Petal Length vs Sepal Length

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3) +
  theme_minimal() +
  labs(title = "Petal Length vs Sepal Length",
       x = "Sepal Length",
       y = "Petal Length")

Correlation Analysis

Correlation Matrix

# Calculate correlation matrix
cor_matrix <- cor(iris[, 1:4])
print(cor_matrix)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

Correlation Plot

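# Plot the upper triangle of the correlation matrix as circles
library(corrplot)  # provides corrplot()
corrplot(cor_matrix,
         method = "circle",
         type = "upper",
         tl.cex = 0.8,
         tl.col = "black")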

Key Findings

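  1. Petal Length and Petal Width are strongly positively correlated (r ≈ 0.96)
  2. Setosa is clearly separated from versicolor and virginica in the Petal Length vs Sepal Length scatter plot
  3. Sepal Width is negatively correlated with the other three features: weakly with Sepal Length (r ≈ -0.12) and moderately with Petal Length (r ≈ -0.43) and Petal Width (r ≈ -0.37)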
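
The three findings above can be checked numerically. The following is a minimal sketch in base R, not part of the original analysis; the variable names (petal_cor, setosa_max, others_min, sepal_width_cor) are illustrative assumptions.

```r
# Finding 1: pooled Pearson correlation between the two petal measurements
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)

# Finding 2: compare the largest setosa petal length with the smallest
# petal length among the other two species
setosa_max <- max(iris$Petal.Length[iris$Species == "setosa"])
others_min <- min(iris$Petal.Length[iris$Species != "setosa"])

# Finding 3: correlation of Sepal.Width with each of the other features
sepal_width_cor <- cor(iris[, 1:4])["Sepal.Width",
                                    c("Sepal.Length", "Petal.Length", "Petal.Width")]

print(round(petal_cor, 3))
print(c(setosa_max = setosa_max, others_min = others_min))
print(round(sepal_width_cor, 3))
```

If setosa_max is smaller than others_min, petal length alone is enough to separate setosa from the other two species in this sample.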

Conclusion

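The analysis reveals distinct patterns in Iris species measurements, particularly in petal dimensions: Petal Length and Petal Width are strongly correlated (r ≈ 0.96), and setosa separates clearly from the other two species. These patterns could be useful for species classification.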
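
To illustrate how petal dimensions might support classification, here is a rough sketch of a two-threshold rule in base R. The cut-off values (petal_length_cut, petal_width_cut) are hypothetical choices made for illustration and are not taken from the original analysis; the rule is also scored on the same data it was designed for, so the accuracy is optimistic.

```r
# Hypothetical cut-offs chosen for illustration only
petal_length_cut <- 2.5   # below this petal length: predict setosa
petal_width_cut  <- 1.75  # otherwise, below this petal width: predict versicolor

# Apply the two-step rule to every flower
predicted <- ifelse(iris$Petal.Length < petal_length_cut, "setosa",
             ifelse(iris$Petal.Width < petal_width_cut, "versicolor", "virginica"))
predicted <- factor(predicted, levels = levels(iris$Species))

# Cross-tabulate predicted against actual species
confusion <- table(Predicted = predicted, Actual = iris$Species)

# Share of flowers on the diagonal, i.e. classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

A proper evaluation would hold out part of the data or use cross-validation rather than scoring on the full sample.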


Analysis by Aryan B V