Introduction

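This report presents an exploratory data analysis (EDA) of the Iris and Mtcars datasets using R. For each dataset we inspect its structure and summary statistics, visualize key relationships, and run one statistical test: an ANOVA of Sepal Length across iris species, and a correlation test between Horsepower and Miles per Gallon in Mtcars.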

Data Loading and Preparation

We start by loading the necessary libraries and datasets.

# Load required libraries
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load datasets
data(iris)
data(mtcars)
# View the first few rows of each dataset
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
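
Before going further, it can be useful to confirm the size of each dataset and check for missing values. The following is a minimal sketch in base R; the names `iris_missing` and `mtcars_missing` are introduced here for illustration only and are not part of the original analysis.

```r
# Both datasets ship with base R (package 'datasets')
data(iris)
data(mtcars)

# Dimensions as rows x columns
dim(iris)
dim(mtcars)

# Number of missing values in each column (illustrative variable names)
iris_missing   <- colSums(is.na(iris))
mtcars_missing <- colSums(is.na(mtcars))
iris_missing
mtcars_missing
```

Both datasets are complete, so every count should be zero and no imputation or row filtering is needed before the analyses that follow.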

Analysis of Iris Dataset

Structure and Summary

We examine the structure and summary statistics of the Iris dataset.

# Display the structure of the Iris dataset
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Generate a summary of the dataset
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Visualizations

Pair Plot of Numeric Features

A pair plot shows the pairwise relationships among the four numeric variables (sepal and petal length and width).

pairs(iris[, 1:4], main = "Pair Plot of Iris Dataset")
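
The call above draws all points in one color, so the species cannot be told apart. As a sketch of one possible variation, the points can be colored by species using the `col` argument of `pairs()`. The color choices and the names `species_cols` and `point_cols` are assumptions made for this example.

```r
data(iris)

# Illustrative palette: one color per species level
species_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")

# Look up the color for every row, dropping the names from the result
point_cols <- unname(species_cols[as.character(iris$Species)])

# Same pair plot as before, with points colored by species
pairs(iris[, 1:4],
      main = "Pair Plot of Iris Dataset by Species",
      col  = point_cols,
      pch  = 19)
```

With this coloring, any separation between species in a given pair of features becomes visible directly in the plot.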

Boxplot of Sepal Length by Species

The boxplot shows the distribution of Sepal Length across the three species.

ggplot(iris, aes(x = Species, y = Sepal.Length)) +  
  geom_boxplot(fill = "lightblue") +  
  ggtitle("Boxplot of Sepal Length by Species") +  
  xlab("Species") +  
  ylab("Sepal Length (cm)")
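
The values that the boxplot draws can also be inspected numerically. The sketch below computes the quartiles of Sepal Length per species in base R. The name `sepal_quartiles` is illustrative, and the default `quantile()` method is assumed to match the hinges drawn by the plot.

```r
data(iris)

# Split sepal length by species, then compute the quartiles of each group
sepal_quartiles <- sapply(split(iris$Sepal.Length, iris$Species),
                          quantile, probs = c(0.25, 0.5, 0.75))
sepal_quartiles

# Group sizes, to confirm each box summarizes the same number of flowers
table(iris$Species)
```

The `50%` row holds the medians, which correspond to the center line of each box.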

Analysis of Mtcars Dataset

Structure and Summary

We analyze the structure and summary statistics of the Mtcars dataset.

# Structure and summary of Mtcars dataset
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
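
Several Mtcars columns (`cyl`, `vs`, `am`) are stored as numbers but encode categories, so their means and quartiles in the summary above are hard to interpret. One option, sketched here, is to convert them to factors on a copy of the data. The name `mtcars_f` is hypothetical; the level labels follow the dataset's help page (`?mtcars`).

```r
data(mtcars)

# Work on a copy so the original data frame is left untouched
mtcars_f <- mtcars

# Recode the categorical columns as factors with readable labels
mtcars_f$cyl <- factor(mtcars_f$cyl)
mtcars_f$am  <- factor(mtcars_f$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))
mtcars_f$vs  <- factor(mtcars_f$vs, levels = c(0, 1),
                       labels = c("V-shaped", "straight"))

# summary() now reports counts per level instead of quartiles
summary(mtcars_f[, c("cyl", "am", "vs")])
```

The rest of this report keeps the original numeric columns, so this conversion is optional.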

Visualizations

Scatter Plot of MPG vs Horsepower

The scatter plot visualizes the relationship between Miles per Gallon (MPG) and Horsepower (HP).

ggplot(mtcars, aes(x = hp, y = mpg)) +  
  geom_point(color = "blue") +  
  ggtitle("Scatter Plot: Miles per Gallon vs Horsepower") +  
  xlab("Horsepower (HP)") +  
  ylab("Miles per Gallon (MPG)")
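
To put a number on the trend in the scatter plot, a simple linear regression can be fitted. This is a sketch that goes beyond the original analysis: it assumes a straight-line relationship, and `fit` is an illustrative name. It uses base R graphics so that it runs without extra packages.

```r
data(mtcars)

# Simple linear regression of fuel efficiency on horsepower
fit <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope; the slope is the change in MPG per unit of horsepower
coef(fit)

# Proportion of the variance in MPG explained by horsepower
summary(fit)$r.squared

# Base R version of the scatter plot with the fitted line overlaid
plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower (HP)", ylab = "Miles per Gallon (MPG)",
     main = "MPG vs Horsepower with Linear Fit")
abline(fit, col = "red")
```

In the ggplot2 version, adding `geom_smooth(method = "lm")` to the plot gives a comparable overlay.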

Bar Graph: Average MPG by Number of Cylinders

The bar graph shows the average MPG for cars grouped by the number of cylinders.

# Compute the mean MPG per cylinder count, then plot one bar per group
mtcars %>%
  group_by(cyl) %>%
  summarise(average_mpg = mean(mpg)) %>%
  ggplot(aes(x = as.factor(cyl), y = average_mpg)) +
  geom_col(fill = "orange") +  # geom_col() plots the values as given
  ggtitle("Average MPG by Number of Cylinders") +
  xlab("Number of Cylinders") +
  ylab("Average MPG")
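
A bar of averages hides how many cars are in each group and how much they vary. As a sketch, the same grouped summary can be extended with a count and a standard deviation. The names `mpg_by_cyl`, `n`, and `sd_mpg` are illustrative.

```r
library(dplyr)
data(mtcars)

# Per-cylinder group size, mean, and standard deviation of MPG
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(),
            average_mpg = mean(mpg),
            sd_mpg = sd(mpg))
mpg_by_cyl
```

The group sizes are unequal, which is worth keeping in mind when comparing the bars.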

Statistical Tests

ANOVA Test for Iris Dataset

We perform a one-way ANOVA to test whether mean Sepal Length differs across the three species.

# ANOVA test for Sepal Length
anova_results <- aov(Sepal.Length ~ Species, data = iris)
summary(anova_results)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Species       2  63.21  31.606   119.3 <2e-16 ***
## Residuals   147  38.96   0.265                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

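Interpretation: The p-value (< 2e-16) is far below 0.05, so we reject the null hypothesis that mean Sepal Length is the same for all three species (F = 119.3 on 2 and 147 degrees of freedom). The ANOVA does not by itself show which species differ from each other.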
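
Two follow-ups help make the ANOVA table concrete. The sketch below recomputes the F statistic as the ratio of the two mean squares, then runs Tukey's HSD as one possible post-hoc test to compare the species pairwise. Tukey's HSD is an addition to the original analysis, and `anova_table`, `f_manual`, and `tukey_results` are illustrative names.

```r
data(iris)

# Refit the model so this example is self-contained
anova_results <- aov(Sepal.Length ~ Species, data = iris)

# The F statistic is the between-group mean square
# divided by the residual mean square
anova_table <- summary(anova_results)[[1]]
f_manual <- anova_table[["Mean Sq"]][1] / anova_table[["Mean Sq"]][2]
f_manual

# Pairwise comparisons between species with adjusted p-values
tukey_results <- TukeyHSD(anova_results)
tukey_results
```

Each row of the Tukey output gives the difference in mean Sepal Length for one pair of species, with a confidence interval and an adjusted p-value.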

Correlation Test for Mtcars Dataset

We test whether Horsepower (HP) and Miles per Gallon (MPG) are linearly correlated, using Pearson's product-moment correlation.

# Correlation test between HP and MPG
cor_test <- cor.test(mtcars$hp, mtcars$mpg)
cor_test
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$hp and mtcars$mpg
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684

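Interpretation: The correlation coefficient (r = -0.78, 95% confidence interval -0.89 to -0.59) indicates a strong negative linear relationship between horsepower and fuel efficiency, and the p-value (1.788e-07) shows that it is statistically significant. Correlation alone does not establish that higher horsepower causes lower MPG.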
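
The t statistic reported by the test follows directly from the correlation coefficient and the sample size: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below reproduces it by hand; `t_manual` and `p_manual` are illustrative names.

```r
data(mtcars)

r <- cor(mtcars$hp, mtcars$mpg)  # Pearson correlation coefficient
n <- nrow(mtcars)                # number of cars

# Test statistic for the null hypothesis of zero correlation
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_manual

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)
p_manual

# Compare with the values computed by cor.test()
cor_test <- cor.test(mtcars$hp, mtcars$mpg)
unname(cor_test$statistic)
cor_test$p.value
```

The manual values should agree with the `t` and `p-value` lines in the test output above.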

Results

Iris Dataset

  • The ANOVA test indicates significant differences in mean Sepal Length among species (F = 119.3, p < 2e-16).

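  • The boxplot shows Sepal Length increasing from setosa to versicolor to virginica, although the ranges overlap. The pair plot is not colored by species, so it shows relationships among the numeric features rather than per-species distributions.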

Mtcars Dataset

  • There is a strong negative correlation between Horsepower and Miles per Gallon (r = -0.78, p = 1.788e-07).

  • Cars with fewer cylinders tend to have higher fuel efficiency, as shown in the bar graph.

Conclusion

This analysis provided the following insights:

  1. Iris Dataset:

    • Statistically significant differences in Sepal Length among species.

  2. Mtcars Dataset:

    • A strong negative correlation exists between Horsepower and Miles per Gallon.

    • Cars with fewer cylinders are generally more fuel-efficient.