GEOG 6000 knitr Example

Introduction

We are doing some exploratory statistics on the Fisher Iris dataset. Start by reading the file:

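# stringsAsFactors = TRUE keeps Species as a factor (not the default in R >= 4.0)
iris <- read.csv("iris.csv", stringsAsFactors = TRUE)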
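
Whether Species arrives as a factor depends on the R version: since R 4.0.0, read.csv() leaves text columns as character unless stringsAsFactors = TRUE is given. The sketch below checks this. It assumes a temporary copy of the built-in datasets::iris as a stand-in for iris.csv (the built-in data has no Code column), and the names tmp, as_read and as_factor are illustrative only.

```r
# Assumption: a temporary CSV written from datasets::iris stands in for iris.csv
tmp <- tempfile(fileext = ".csv")
write.csv(datasets::iris, tmp, row.names = FALSE)

as_read   <- read.csv(tmp)                           # version-dependent default
as_factor <- read.csv(tmp, stringsAsFactors = TRUE)  # always a factor

# Compare how the Species column was imported in each case
print(class(as_read$Species))
print(class(as_factor$Species))

unlink(tmp)  # remove the temporary file
```

A factor is what lets summary() count the rows per species and lets plot() colour points by species.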

Summary statistics

Calculate summary stats on all columns:

summary(iris)

##   Sepal.Length   Sepal.Width    Petal.Length   Petal.Width 
##  Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1  
##  1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3  
##  Median :5.80   Median :3.00   Median :4.35   Median :1.3  
##  Mean   :5.84   Mean   :3.06   Mean   :3.76   Mean   :1.2  
##  3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8  
##  Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5  
##        Species        Code  
##  setosa    :50   Min.   :1  
##  versicolor:50   1st Qu.:1  
##  virginica :50   Median :2  
##                  Mean   :2  
##                  3rd Qu.:3  
##                  Max.   :3
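
The overall summary pools all three species. One possible next step is a per-species mean of each measurement. This is a minimal sketch, assuming the built-in datasets::iris as a stand-in for iris.csv (so there is no Code column); dat and species_means are illustrative names.

```r
# Assumption: datasets::iris stands in for the data read from iris.csv
dat <- datasets::iris

# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = dat, FUN = mean)
print(species_means)
```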

Make a plot

Just a scatterplot:

plot(iris$Sepal.Length, iris$Sepal.Width, 
     col=iris$Species, pch=16)
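
Passing a factor to col works because base graphics uses the factor's integer codes as indices into palette(). The sketch below makes that mapping explicit so a legend can reuse it. It assumes the built-in datasets::iris as a stand-in for iris.csv; the names dat, species_levels and species_cols, the axis labels and the legend position are illustrative choices.

```r
# Assumption: datasets::iris stands in for the data read from iris.csv
dat <- datasets::iris

# One palette colour per factor level, in level order
species_levels <- levels(dat$Species)
species_cols   <- palette()[seq_along(species_levels)]

# Same scatterplot, with the colour lookup written out
plot(dat$Sepal.Length, dat$Sepal.Width,
     col = species_cols[as.integer(dat$Species)], pch = 16,
     xlab = "Sepal length", ylab = "Sepal width")

# Legend built from the same level-to-colour mapping
legend("topright", legend = species_levels, col = species_cols, pch = 16)
```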

And a histogram:

hist(iris$Sepal.Length, col="hotpink")

Correlations

Start this with a list:

- Correlation using only the four measurement variables
- These are columns 1 to 4

cor(iris[,1:4])

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length       1.0000     -0.1176       0.8718      0.8179
## Sepal.Width       -0.1176      1.0000      -0.4284     -0.3661
## Petal.Length       0.8718     -0.4284       1.0000      0.9629
## Petal.Width        0.8179     -0.3661       0.9629      1.0000
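
The matrix above pools all three species, and a pooled correlation can differ from the correlation within each group. As a rough sketch, the same Sepal.Length and Sepal.Width pair can be recomputed per species. This assumes the built-in datasets::iris as a stand-in for iris.csv; dat, pooled and by_species are illustrative names.

```r
# Assumption: datasets::iris stands in for the data read from iris.csv
dat <- datasets::iris

# Correlation over all 150 flowers
pooled <- cor(dat$Sepal.Length, dat$Sepal.Width)

# The same correlation computed separately within each species
by_species <- sapply(split(dat, dat$Species),
                     function(d) cor(d$Sepal.Length, d$Sepal.Width))

print(round(pooled, 4))
print(round(by_species, 4))
```

Comparing the two printouts shows whether the weak pooled relationship also holds inside each species.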

Discussion

Well, we have found out lots of things about the iris data. Flowers with wide petals tend to have long petals (the correlation between Petal.Width and Petal.Length is 0.96). Who knew?

[Correlation!](http://xkcd.com/552/)