Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and geneticist who worked on a data set that contained sepal length and width, and petal length and width from three species of iris flowers (setosa, versicolor, and virginica). There were 50 flowers from each species in the data set.
The data set therefore contains 150 cases (50 flowers × 3 species).
nrow(iris)
## [1] 150
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
All four numerical variables are continuous measurements, as the rows displayed above show.
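The variable types can also be checked programmatically instead of being read off the printed rows. The following is a minimal sketch using base R only; the names `col_classes` and `numeric_cols` are introduced here for illustration. Note that a column being stored as numeric does not by itself prove it is continuous; that follows from the variables being physical measurements.

```r
# Class of every column in the built-in iris data frame
col_classes <- sapply(iris, class)
print(col_classes)

# Names of the columns stored as numeric (the four measurements)
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
print(numeric_cols)
```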
unique(iris[c("Species")])
##        Species
## 1       setosa
## 51  versicolor
## 101  virginica
There is only one categorical variable, Species, with three levels: setosa, versicolor, and virginica.
Virginica has the largest petal length, while setosa has the smallest.
Virginica also has the largest sepal length, while setosa generally has the shortest sepals, but the widest.
Versicolor mostly appears to have length and width ranging between those of virginica and setosa.
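These species comparisons can be checked numerically with per-species means. This is a small sketch using base R's `aggregate()`; the name `species_means` is introduced here for illustration, and the mean is only one possible summary (medians would work the same way).

```r
# Mean of each measurement within each species (built-in iris data)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

In these means, sepal width is the one measurement where versicolor does not sit between the other two species.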
require(psych)
## Loading required package: psych
Reference: https://www.statology.org/iris-dataset-r/
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
describe(iris)
##              vars   n mean   sd median trimmed  mad min max range  skew
## Sepal.Length    1 150 5.84 0.83   5.80    5.81 1.04 4.3 7.9   3.6  0.31
## Sepal.Width     2 150 3.06 0.44   3.00    3.04 0.44 2.0 4.4   2.4  0.31
## Petal.Length    3 150 3.76 1.77   4.35    3.76 1.85 1.0 6.9   5.9 -0.27
## Petal.Width     4 150 1.20 0.76   1.30    1.18 1.04 0.1 2.5   2.4 -0.10
## Species*        5 150 2.00 0.82   2.00    2.00 1.48 1.0 3.0   2.0  0.00
##              kurtosis   se
## Sepal.Length    -0.61 0.07
## Sepal.Width      0.14 0.04
## Petal.Length    -1.42 0.14
## Petal.Width     -1.36 0.06
## Species*        -1.52 0.07
plot(iris$Petal.Width, iris$Petal.Length,
     col=iris$Species,
     main='Species vs. Petal Length and Width',
     xlab='Petal Width in cm',
     ylab='Petal Length in cm',
     pch=19)
#Adding a legend in the top-left corner, where there are no points to cover
legend("topleft", legend = unique(iris$Species),
       col = unique(iris$Species), pch = 19)
plot(iris$Sepal.Width, iris$Sepal.Length,
     col=iris$Species,
     main='Species vs. Sepal Length and Width',
     xlab='Sepal Width in cm',
     ylab='Sepal Length in cm',
     pch=19)
#Adding a legend
legend("topright", legend = unique(iris$Species),
       col = unique(iris$Species), pch = 19)
Reference: https://datascienceplus.com/box-plots-identify-outliers/
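#Boxplot for comparing petal length in species
#One fill color per box; col = iris$Species would reuse only its first
#three values, which are all setosa, so every box would get the same color
boxplot(Petal.Length ~ Species, data=iris,
        main="Petal Length for each Species",
        col = 2:4,
        xlab="Species",
        ylab="Petal Length in cm")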
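Since box plots are used here to spot outliers, the flagged values can also be listed directly rather than read off the figure. This is a hedged sketch using `boxplot.stats()` from base R with its default whisker coefficient of 1.5, which is the same rule `boxplot()` applies by default; the name `petal_outliers` is introduced here for illustration.

```r
# Petal lengths lying beyond the boxplot whiskers, computed per species
petal_outliers <- tapply(iris$Petal.Length, iris$Species,
                         function(x) boxplot.stats(x)$out)
print(petal_outliers)
```

Each list element holds the values drawn as individual points in the corresponding box; an empty element means that species has no flagged points.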
Under the life-cycle savings hypothesis as developed by Franco Modigliani, the savings ratio (aggregate personal saving divided by disposable income) is explained by per-capita disposable income, the percentage rate of change in per-capita disposable income, and two demographic variables: the percentage of population less than 15 years old and the percentage of the population over 75 years old. The data are averaged over the decade 1960–1970 to remove the business cycle or other short-term fluctuations.
This is a cross-sectional dataset: each of the 50 rows describes one country.
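The hypothesis names one response (`sr`) and four explanatory variables (`pop15`, `pop75`, `dpi`, `ddpi`). One way to express that relationship in R is an ordinary linear regression. This is only an illustrative sketch: the choice of a linear model is an assumption made here, not something the text above specifies, and `savings_fit` is a name introduced for the example.

```r
# Linear model of the savings ratio on the four variables named in the hypothesis
savings_fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Coefficient estimates, standard errors, and overall fit
print(summary(savings_fit))
```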
describe(LifeCycleSavings)
##       vars  n    mean     sd median trimmed    mad   min     max   range  skew
## sr       1 50    9.67   4.48  10.51    9.68   4.07  0.60   21.10   20.50 -0.01
## pop15    2 50   35.09   9.15  32.58   35.15  13.24 21.44   47.64   26.20  0.00
## pop75    3 50    2.29   1.29   2.17    2.22   1.61  0.56    4.70    4.14  0.31
## dpi      4 50 1106.76 990.87 695.66  980.85 713.94 88.94 4001.89 3912.95  0.95
## ddpi     5 50    3.76   2.87   3.00    3.33   1.75  0.22   16.71   16.49  2.14
##       kurtosis     se
## sr       -0.32   0.63
## pop15    -1.68   1.29
## pop75    -1.33   0.18
## dpi      -0.09 140.13
## ddpi      6.40   0.41
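#Plotting a scatterplot, colored by four equal-width bands of pop75
#(pop75 is non-integer, so it cannot be used directly as a color index)
pop75_band <- cut(LifeCycleSavings$pop75, breaks = 4)
plot(LifeCycleSavings$sr, LifeCycleSavings$dpi,
     main = "Disposable income vs. savings ratio, by % of population over 75",
     xlab = "Savings ratio",
     ylab = "Per-capita disposable income",
     pch = 19,
     col = as.integer(pop75_band)
)
#Adding a legend for the pop75 bands
legend("topright", legend = levels(pop75_band),
       col = seq_len(nlevels(pop75_band)), pch = 19, title = "pop75 (%)")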
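To put numbers on what the scatterplot suggests, the pairwise correlations among the five variables can be tabulated. This is a minimal sketch using base R's `cor()` with its default Pearson method; `savings_cor` is a name introduced here for illustration, and rounding to two decimals is only for readability.

```r
# Pairwise Pearson correlations among sr, pop15, pop75, dpi, and ddpi
savings_cor <- round(cor(LifeCycleSavings), 2)
print(savings_cor)
```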