Part 1

Answers to Questions from Textbook

Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and geneticist who worked on a data set that contained sepal length and width, and petal length and width from three species of iris flowers (setosa, versicolor, and virginica). There were 50 flowers from each species in the data set.

  1. How many cases were included in the data?

    Answer: There were 50 flowers from each of the 3 species, for a total of 150 cases.

    nrow(iris)
    ## [1] 150
  2. How many numerical variables are included in the data? Indicate what they are, and if they are continuous or discrete.

    Answer: There are 4 numerical variables (Sepal Length, Sepal Width, Petal Length, and Petal Width). All four are continuous: each is a physical measurement that can take any value within a range, limited only by the precision of the measuring instrument.

  3. How many categorical variables are included in the data, and what are they? List the corresponding levels (categories).

    Answer: There is only one categorical variable, the species of the flower, and it is nominal. Its levels (categories) are setosa, versicolor, and virginica.
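The answers to questions 2 and 3 can be checked directly in R. The following is a minimal sketch using only base R and the built-in iris data frame; the variable names (col_classes, n_numeric, and so on) are illustrative choices, not part of the textbook exercise.

```r
# Class of each column: the numeric columns are the measurements,
# the factor column is the categorical variable
col_classes <- sapply(iris, class)
print(col_classes)

# Count the numeric and the categorical (factor) columns
n_numeric <- sum(sapply(iris, is.numeric))
n_factor <- sum(sapply(iris, is.factor))

# Levels of the categorical variable and the number of cases in each
species_levels <- levels(iris$Species)
species_counts <- table(iris$Species)
print(species_counts)
```

Note that is.numeric() only separates numbers from factors; whether a numeric variable is continuous or discrete is still a judgment about how it was measured.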

Analysis on Iris

Based on the tables and graphs below, the “Iris” data set consists of sepal and petal measurements (length and width) for 3 different species.

  • Setosa has the largest average sepal width (mean 3.428) but the smallest average sepal length (mean 5.006) of the 3 species. Its petals are by far the smallest (mean length 1.462, mean width 0.246).

  • Versicolor falls in the middle for sepal length and petal size, although it has the smallest average sepal width (mean 2.770).

  • Virginica has, on average, the longest sepals (mean 6.588) and the largest petals (mean length 5.552, mean width 2.026).
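The species comparison can also be read from a single table of group means instead of three separate summaries. This is a small sketch using base R's aggregate(); the names species_means, widest_sepal, and largest_petal are assumptions introduced here for illustration.

```r
# Mean of every measurement, grouped by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Species with the largest mean sepal width and the largest mean petal length
widest_sepal <- as.character(
  species_means$Species[which.max(species_means$Sepal.Width)]
)
largest_petal <- as.character(
  species_means$Species[which.max(species_means$Petal.Length)]
)
```

The same call with FUN = median would give a comparison that is less sensitive to outliers.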

summary(iris[, 1:5])
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
setosa_data <- subset(iris, Species == "setosa")
versicolor_data <- subset(iris, Species == "versicolor")
virginica_data <- subset(iris, Species == "virginica")
summary(setosa_data)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
##  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
##  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
##  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
##  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
##  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
##        Species  
##  setosa    :50  
##  versicolor: 0  
##  virginica : 0  
##                 
##                 
## 
summary(versicolor_data)
##   Sepal.Length    Sepal.Width     Petal.Length   Petal.Width          Species  
##  Min.   :4.900   Min.   :2.000   Min.   :3.00   Min.   :1.000   setosa    : 0  
##  1st Qu.:5.600   1st Qu.:2.525   1st Qu.:4.00   1st Qu.:1.200   versicolor:50  
##  Median :5.900   Median :2.800   Median :4.35   Median :1.300   virginica : 0  
##  Mean   :5.936   Mean   :2.770   Mean   :4.26   Mean   :1.326                  
##  3rd Qu.:6.300   3rd Qu.:3.000   3rd Qu.:4.60   3rd Qu.:1.500                  
##  Max.   :7.000   Max.   :3.400   Max.   :5.10   Max.   :1.800
summary(virginica_data)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.900   Min.   :2.200   Min.   :4.500   Min.   :1.400  
##  1st Qu.:6.225   1st Qu.:2.800   1st Qu.:5.100   1st Qu.:1.800  
##  Median :6.500   Median :3.000   Median :5.550   Median :2.000  
##  Mean   :6.588   Mean   :2.974   Mean   :5.552   Mean   :2.026  
##  3rd Qu.:6.900   3rd Qu.:3.175   3rd Qu.:5.875   3rd Qu.:2.300  
##  Max.   :7.900   Max.   :3.800   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    : 0  
##  versicolor: 0  
##  virginica :50  
##                 
##                 
## 
cor(iris[, 1:4])
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
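The correlation matrix above pools all 150 flowers, so it mixes differences between species with relationships inside each species. As a possible follow-up, the sketch below computes the sepal length and sepal width correlation separately for each species; the names sepal_cor_by_species and pooled_sepal_cor are hypothetical and used only for this illustration.

```r
# Split the data by species and compute the sepal length/width
# correlation within each group
sepal_cor_by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)
print(round(sepal_cor_by_species, 3))

# Pooled correlation across all species, for comparison
pooled_sepal_cor <- cor(iris$Sepal.Length, iris$Sepal.Width)
print(round(pooled_sepal_cor, 3))
```

Comparing the two printed results shows whether the weak pooled correlation reflects the individual species or is mostly a product of combining groups with different averages.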
boxplot(iris[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Boxplots of Iris Variables, All species")

Boxplots for each species are shown below.

boxplot(setosa_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Setosa")

boxplot(versicolor_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Versicolor")

boxplot(virginica_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Virginica")

# Create a scatterplot with different colors for each species
plot(iris$Sepal.Length, iris$Sepal.Width, 
     main = "Sepal Length vs. Sepal Width by Species", 
     xlab = "Sepal Length", ylab = "Sepal Width", 
     col = iris$Species)

# Add a legend to identify species
legend("topright", legend = unique(iris$Species), 
       col = unique(iris$Species), pch = 1)

# Create a scatterplot with different colors for each species
plot(iris$Petal.Length, iris$Petal.Width, 
     main = "Petal Length vs. Petal Width by Species", 
     xlab = "Petal Length", ylab = "Petal Width", 
     col = iris$Species)

# Add a legend to identify species; placed top-left so it does not
# cover the virginica points in the upper-right corner
legend("topleft", legend = unique(iris$Species), 
       col = unique(iris$Species), pch = 1)

Part 2

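The “quakes” data set is a cross-sectional data set: each row records a single earthquake, and there is no time variable tracking repeated measurements.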
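A quick structural check supports this reading. The sketch below uses only base R on the built-in quakes data frame; quake_dims and quake_classes are illustrative names.

```r
# Dimensions: rows are individual earthquakes, columns are recorded attributes
quake_dims <- dim(quakes)
print(quake_dims)

# Column names and types; none of them is a date or time
quake_classes <- sapply(quakes, class)
print(quake_classes)
```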

summary(quakes)
##       lat              long           depth            mag      
##  Min.   :-38.59   Min.   :165.7   Min.   : 40.0   Min.   :4.00  
##  1st Qu.:-23.47   1st Qu.:179.6   1st Qu.: 99.0   1st Qu.:4.30  
##  Median :-20.30   Median :181.4   Median :247.0   Median :4.60  
##  Mean   :-20.64   Mean   :179.5   Mean   :311.4   Mean   :4.62  
##  3rd Qu.:-17.64   3rd Qu.:183.2   3rd Qu.:543.0   3rd Qu.:4.90  
##  Max.   :-10.72   Max.   :188.1   Max.   :680.0   Max.   :6.40  
##     stations     
##  Min.   : 10.00  
##  1st Qu.: 18.00  
##  Median : 27.00  
##  Mean   : 33.42  
##  3rd Qu.: 42.00  
##  Max.   :132.00

Scatterplot

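# Map the number of reporting stations onto a light-to-dark blue gradient.
# Passing the raw counts to col would only cycle through the default palette.
station_cols <- colorRampPalette(c("lightblue", "darkblue"))(100)[
  cut(quakes$stations, breaks = 100, labels = FALSE)]

pairs(quakes[, 1:4], 
      col = station_cols, 
      pch = 19)

plot(quakes$mag, quakes$depth, 
     main = "Scatter Plot of Earthquake Magnitude vs. Depth",
     xlab = "Magnitude",
     ylab = "Depth (km)",
     pch = 19,
     col = station_cols)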