Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and geneticist who worked on a data set that contained sepal length and width, and petal length and width from three species of iris flowers (setosa, versicolor, and virginica). There were 50 flowers from each species in the data set.
How many cases were included in the data?
Answer: There were 50 flowers from each of the 3 species, for a total of 150 cases.
nrow(iris)
## [1] 150
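The per-species breakdown behind that total can also be checked directly. This is a minimal sketch, assuming the built-in `iris` data frame from the `datasets` package; `species_counts` is a name introduced here only for illustration.

```r
# Count the flowers recorded for each species (assumes the built-in iris data)
species_counts <- table(iris$Species)
species_counts

# The per-species counts should add up to the number of rows
sum(species_counts) == nrow(iris)
```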
How many numerical variables are included in the data? Indicate what they are, and if they are continuous or discrete.
Answer: There are 4 numerical variables (Sepal Length, Sepal Width, Petal Length, and Petal Width). Each is continuous: a length or width can take any value within a range, even though the measurements are recorded to one decimal place.
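One way to confirm the count is to ask R which columns are stored as numeric. This is a sketch only, and `numeric_cols` is a name introduced here for illustration. Note that `is.numeric()` reports the storage type, not whether a variable is continuous or discrete, so that judgment still rests on what the variable measures.

```r
# Flag which columns of iris are stored as numeric
numeric_cols <- sapply(iris, is.numeric)

# Names of the numeric columns
names(iris)[numeric_cols]

# Number of numeric columns
sum(numeric_cols)
```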
How many categorical variables are included in the data, and what are they? List the corresponding levels (categories).
Answer: There is only one categorical variable, Species, and it is nominal (its categories have no natural order). The corresponding levels are setosa, versicolor, and virginica.
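The same can be checked in R. This is a minimal sketch, assuming the built-in `iris` data; `factor_cols` is a hypothetical helper name. An unordered factor is how R represents a nominal variable, which is what `is.ordered()` tests for here.

```r
# Flag which columns of iris are stored as factors (categorical)
factor_cols <- sapply(iris, is.factor)
names(iris)[factor_cols]

# List the categories of the Species variable
levels(iris$Species)

# Check whether the factor carries an ordering (nominal factors do not)
is.ordered(iris$Species)
```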
The tables and graphs below show that the “Iris” data set consists of sepal and petal measurements (length and width) for 3 different species.
Setosa has the largest mean sepal width (3.428) but the smallest mean sepal length (5.006) of the 3 species. Its petals are by far the smallest.
Versicolor falls in the middle for sepal length and petal size, but it has the smallest mean sepal width (2.770).
Virginica has, on average, the longest sepals (6.588) and the largest petals. Its mean sepal width (2.974) lies between those of the other two species.
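The species comparison can be put side by side in one table rather than read across three separate summaries. This is a sketch using base R's `aggregate()`; the names `species_means` and `largest_by_variable` are introduced here for illustration.

```r
# Mean of every measurement, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means

# For each measurement, report which species has the largest mean
largest_by_variable <- sapply(
  species_means[, -1],
  function(m) as.character(species_means$Species[which.max(m)])
)
largest_by_variable
```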
summary(iris[, 1:5])
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
setosa_data <- subset(iris, Species == "setosa")
versicolor_data <- subset(iris, Species == "versicolor")
virginica_data <- subset(iris, Species == "virginica")
summary(setosa_data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.300 Min. :1.000 Min. :0.100
## 1st Qu.:4.800 1st Qu.:3.200 1st Qu.:1.400 1st Qu.:0.200
## Median :5.000 Median :3.400 Median :1.500 Median :0.200
## Mean :5.006 Mean :3.428 Mean :1.462 Mean :0.246
## 3rd Qu.:5.200 3rd Qu.:3.675 3rd Qu.:1.575 3rd Qu.:0.300
## Max. :5.800 Max. :4.400 Max. :1.900 Max. :0.600
## Species
## setosa :50
## versicolor: 0
## virginica : 0
##
##
##
summary(versicolor_data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## Min. :4.900 Min. :2.000 Min. :3.00 Min. :1.000 setosa : 0
## 1st Qu.:5.600 1st Qu.:2.525 1st Qu.:4.00 1st Qu.:1.200 versicolor:50
## Median :5.900 Median :2.800 Median :4.35 Median :1.300 virginica : 0
## Mean :5.936 Mean :2.770 Mean :4.26 Mean :1.326
## 3rd Qu.:6.300 3rd Qu.:3.000 3rd Qu.:4.60 3rd Qu.:1.500
## Max. :7.000 Max. :3.400 Max. :5.10 Max. :1.800
summary(virginica_data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.900 Min. :2.200 Min. :4.500 Min. :1.400
## 1st Qu.:6.225 1st Qu.:2.800 1st Qu.:5.100 1st Qu.:1.800
## Median :6.500 Median :3.000 Median :5.550 Median :2.000
## Mean :6.588 Mean :2.974 Mean :5.552 Mean :2.026
## 3rd Qu.:6.900 3rd Qu.:3.175 3rd Qu.:5.875 3rd Qu.:2.300
## Max. :7.900 Max. :3.800 Max. :6.900 Max. :2.500
## Species
## setosa : 0
## versicolor: 0
## virginica :50
##
##
##
cor(iris[, 1:4])
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
## Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
## Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
## Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
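The matrix above pools all three species, so a pooled coefficient such as the one between sepal length and sepal width need not reflect the relationship inside any single species. As a rough check, the same correlation can be computed within each species. This is a sketch only, and `within_species_cor` is a name introduced here for illustration.

```r
# Correlation between sepal length and sepal width,
# computed separately within each species
within_species_cor <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)
within_species_cor
```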
boxplot(iris[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Boxplots of Iris Variables, All species")
boxplot(setosa_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Setosa")
boxplot(versicolor_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Versicolor")
boxplot(virginica_data[, 1:4], col = c("red", "blue", "green", "yellow"), main = "Virginica")
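The four plots above each use their own axis scale, which makes species hard to compare directly. An alternative sketch, assuming the formula interface of base R's `boxplot()`, draws one variable for all three species on a shared axis. Petal length is chosen here only as an example, and `bp` is a name introduced for illustration.

```r
# One measurement, split by species, on a single shared axis
bp <- boxplot(Petal.Length ~ Species, data = iris,
              col = c("red", "blue", "green"),
              main = "Petal Length by Species",
              xlab = "Species", ylab = "Petal Length")
```

The same call can be repeated with any of the other three measurements on the left-hand side of the formula.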
# Create a scatterplot with different colors for each species
plot(iris$Sepal.Length, iris$Sepal.Width,
main = "Sepal Length vs. Sepal Width by Species",
xlab = "Sepal Length", ylab = "Sepal Width",
col = iris$Species)
# Add a legend to identify species
legend("topright", legend = unique(iris$Species),
col = unique(iris$Species), pch = 1)
# Create a scatterplot with different colors for each species
plot(iris$Petal.Length, iris$Petal.Width,
main = "Petal Length vs. Petal Width by Species",
xlab = "Petal Length", ylab = "Petal Width",
col = iris$Species)
# Add a legend to identify species (top left, clear of the virginica points)
legend("topleft", legend = unique(iris$Species),
col = unique(iris$Species), pch = 1)
summary(quakes)
## lat long depth mag
## Min. :-38.59 Min. :165.7 Min. : 40.0 Min. :4.00
## 1st Qu.:-23.47 1st Qu.:179.6 1st Qu.: 99.0 1st Qu.:4.30
## Median :-20.30 Median :181.4 Median :247.0 Median :4.60
## Mean :-20.64 Mean :179.5 Mean :311.4 Mean :4.62
## 3rd Qu.:-17.64 3rd Qu.:183.2 3rd Qu.:543.0 3rd Qu.:4.90
## Max. :-10.72 Max. :188.1 Max. :680.0 Max. :6.40
## stations
## Min. : 10.00
## 1st Qu.: 18.00
## Median : 27.00
## Mean : 33.42
## 3rd Qu.: 42.00
## Max. :132.00
Scatterplot
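# Raw station counts (10 to 132) passed as `col` only cycle through the
# default palette, so map them onto a blue-to-red gradient instead
station_cols <- colorRampPalette(c("blue", "red"))(100)[
cut(quakes$stations, breaks = 100, labels = FALSE)]
pairs(quakes[, 1:4],
col = station_cols,
pch = 19)
plot(quakes$mag, quakes$depth,
main = "Scatter Plot of Earthquake Magnitude vs. Depth",
xlab = "Magnitude",
ylab = "Depth (km)",
pch = 19,
col = station_cols)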