exercise06 - Iris Data

Author

Hangu Lee

1. Introduction

In this exercise, I will analyze the classic iris dataset. The objective of this report is to load the dataset, inspect its structure, calculate summary statistics, and visualize the data using a simple scatter plot.


2. Data Inspection

First, I will load the built-in iris dataset and display its structure to understand the variables I am working with.

# Display the structure of the iris dataset
str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

The output shows that the dataset contains 150 observations of 5 variables: four numeric measurements (sepal length, sepal width, petal length, and petal width) and Species, a factor with three levels.
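
The same facts can also be checked programmatically instead of being read off the printed structure. The following is a minimal sketch using only base R; the variable names dims, col_classes, and species_levels are my own and are used only for illustration.

```r
# Number of rows (observations) and columns (variables)
dims <- dim(iris)

# Class of each column: the four measurements should be numeric,
# and Species should be a factor
col_classes <- sapply(iris, class)

# The species labels stored in the factor
species_levels <- levels(iris$Species)

print(dims)
print(col_classes)
print(species_levels)
```

Checking the structure this way is useful when the result feeds a later step, since the values can be compared directly rather than inspected by eye.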

3. Summary Statistics

Next, I will compute the summary statistics for all variables in the dataset to examine the distribution, mean, and range of the flower measurements.

# Calculate summary statistics
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

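The summary shows that the measurements are on different scales across the 150 samples: sepal length ranges from 4.3 to 7.9, while petal width ranges only from 0.1 to 2.5. The three species are equally represented, with 50 samples each.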
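
Because summary() pools all three species, it can hide differences between them. As a rough sketch, the mean of each measurement can be computed per species with base R's aggregate(); the name species_means is my own choice.

```r
# Mean of every numeric measurement within each species
# The formula ". ~ Species" means: all other columns, grouped by Species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

print(species_means)
```

The result is a data frame with one row per species, which makes it easy to compare, for example, the average petal length of each group side by side.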

4. Data Visualization

To better understand the relationship between the flower characteristics, I will create a scatter plot. I will plot Petal Length against Petal Width, and color the points based on the different Species to see if they naturally form distinct groups.

# Create a scatter plot colored by species
plot(iris$Petal.Width, iris$Petal.Length, 
     col = as.integer(iris$Species), 
     pch = 16, 
     xlab = "Petal Width (cm)", 
     ylab = "Petal Length (cm)", 
     main = "Iris Petal by Species")

# Add a simple legend to identify species
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 16)

The plot shows that ‘setosa’ forms a clearly separate cluster with much smaller petals than ‘versicolor’ and ‘virginica’. The latter two species sit next to each other in the upper right of the plot and overlap slightly at their boundary.
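
What the plot suggests visually can be checked with numbers. The sketch below, which assumes only base R, compares the range of petal length for each species; the names petal_ranges and setosa_separate are my own and not part of the exercise.

```r
# Smallest and largest petal length observed for each species
# split() groups the values by species; range() returns c(min, max)
petal_ranges <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(petal_ranges) <- c("min", "max")
print(petal_ranges)

# TRUE if the longest setosa petal is still shorter than the
# shortest petal of the other two species
setosa_separate <- petal_ranges["max", "setosa"] <
  min(petal_ranges["min", c("versicolor", "virginica")])
print(setosa_separate)
```

If the versicolor maximum exceeds the virginica minimum, the two ranges overlap on this variable, which is worth keeping in mind when reading the plot.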