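We are doing some exploratory statistics on the Fisher Iris dataset. Start by reading the file, keeping Species as a factor so that summary() can count it and plot() can color by it:
iris = read.csv("iris.csv", stringsAsFactors = TRUE)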
Calculate summary stats on all columns:
summary(iris)
##   Sepal.Length   Sepal.Width    Petal.Length   Petal.Width
##  Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1
##  1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3
##  Median :5.80   Median :3.00   Median :4.35   Median :1.3
##  Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2
##  3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8
##  Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5
##        Species        Code
##  setosa    :50   Min.   :1
##  versicolor:50   1st Qu.:1
##  virginica :50   Median :2
##                  Mean   :2
##                  3rd Qu.:3
##                  Max.   :3
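The overall summary pools all three species, so it hides how they differ. As a rough sketch, per-species means can be computed with aggregate(). This assumes R's built-in copy of the data (datasets::iris) stands in for iris.csv; the built-in copy has no Code column, and the name iris_builtin is only illustrative.

```r
# Assumption: R's built-in iris data stands in for iris.csv (it lacks the Code column).
iris_builtin <- datasets::iris

# Mean of each of the four measurements within each species.
species_means <- aggregate(. ~ Species, data = iris_builtin, FUN = mean)
print(species_means)
```

The result has one row per species and one column per measurement, which makes the gap between setosa and the other two species easy to see.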
A scatterplot of sepal width against sepal length, colored by species:
plot(iris$Sepal.Length, iris$Sepal.Width,
     col=iris$Species, pch=16)
And a histogram of sepal length:
hist(iris$Sepal.Length, col="hotpink")
Next, the correlation matrix of the four measurements:
cor(iris[,1:4])
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length       1.0000     -0.1176       0.8718      0.8179
## Sepal.Width       -0.1176      1.0000      -0.4284     -0.3661
## Petal.Length       0.8718     -0.4284       1.0000      0.9629
## Petal.Width        0.8179     -0.3661       0.9629      1.0000
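This matrix pools the three species, so a correlation inside each species can look quite different. A minimal sketch of that check, again assuming R's built-in datasets::iris in place of iris.csv (iris_builtin and sepal_cor are illustrative names), computes the sepal length and sepal width correlation per species:

```r
# Assumption: R's built-in iris data stands in for iris.csv.
iris_builtin <- datasets::iris

# Split the rows by species, then correlate sepal length with sepal width in each group.
sepal_cor <- sapply(split(iris_builtin, iris_builtin$Species),
                    function(d) cor(d$Sepal.Length, d$Sepal.Width))
print(round(sepal_cor, 4))
```

On the built-in data each within-species value is positive, even though the pooled value above is -0.1176. The negative pooled figure comes from differences between species, not from the relationship within any one of them.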
Well, we have found out lots of things about the irises. Flowers with wide petals tend to have long petals (correlation 0.96). Who knew?
[Correlation!](http://xkcd.com/552/)