We are doing some exploratory statistics on the Penguin dataset. Start by reading the file, penguins.csv, which should be in the working directory:
penguin <- read.csv("penguins.csv")
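In R 4.0 and later, `read.csv()` keeps text columns as character vectors by default (`stringsAsFactors = FALSE`) and turns the string `NA` into a missing value. That is why `summary()` reports `species`, `island` and `sex` as `Class :character`. The minimal sketch below uses a few made-up rows passed through the `text =` argument instead of the real file, so it runs on its own. The rows and the names `csv_text`, `toy` and `toy_f` are illustrative only, not real penguin measurements.

```r
# Three made-up rows standing in for penguins.csv (illustrative values only)
csv_text <- "species,island,bill_length_mm,sex
Adelie,Torgersen,39.1,male
Gentoo,Biscoe,NA,NA
Chinstrap,Dream,46.5,female"

# Default: text columns stay character; the string "NA" becomes a missing value
toy <- read.csv(text = csv_text)
str(toy)

# Asking for factors up front avoids converting species later
toy_f <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(toy_f$species)   # "Adelie" "Chinstrap" "Gentoo"
```

Calling `str()` right after reading is a cheap way to confirm each column got the type you expect.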
Calculate summary statistics for every column:
summary(penguin)
##  species            island             bill_length_mm  bill_depth_mm
##  Length:344         Length:344         Min.   :32.10   Min.   :13.10
##  Class :character   Class :character   1st Qu.:39.23   1st Qu.:15.60
##  Mode  :character   Mode  :character   Median :44.45   Median :17.30
##                                        Mean   :43.92   Mean   :17.15
##                                        3rd Qu.:48.50   3rd Qu.:18.70
##                                        Max.   :59.60   Max.   :21.50
##                                        NA's   :2       NA's   :2
##  flipper_length_mm  body_mass_g     sex                year
##  Min.   :172.0      Min.   :2700    Length:344         Min.   :2007
##  1st Qu.:190.0      1st Qu.:3550    Class :character   1st Qu.:2007
##  Median :197.0      Median :4050    Mode  :character   Median :2008
##  Mean   :200.9      Mean   :4202                       Mean   :2008
##  3rd Qu.:213.0      3rd Qu.:4750                       3rd Qu.:2009
##  Max.   :231.0      Max.   :6300                       Max.   :2009
##  NA's   :2          NA's   :2
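`summary()` only reports `NA's` for numeric columns; for character columns such as `sex` it shows the length but not how many entries are missing. Counting missing values with `is.na()` and `colSums()` covers every column at once. A minimal sketch on a small made-up data frame (the values and the names `toy`, `na_counts` and `complete_rows` are illustrative, not taken from penguins.csv):

```r
# Small made-up data frame with gaps in a numeric and a character column
toy <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, NA, 47.5, 46.5),
  sex            = c("male", NA, NA, "female")
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts

# Keep only rows with no missing value in any column
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```

On the real data, `colSums(is.na(penguin))` gives the same per-column count.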
Just a scatterplot of bill depth against bill length, colored by species:
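# Species names are not valid colors, but as a factor their integer codes (1, 2, 3) index the palette
penguin$species <- as.factor(penguin$species)
plot(penguin$bill_length_mm, penguin$bill_depth_mm, col = penguin$species, pch = 16,
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
legend("bottomleft", legend = levels(penguin$species), col = seq_along(levels(penguin$species)), pch = 16)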
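Passing a factor as `col` works because its integer codes are used as indices into the current color palette. Levels are sorted alphabetically by default, so Adelie maps to color 1, Chinstrap to 2 and Gentoo to 3. A minimal sketch of that mapping on a few illustrative labels (`species`, `point_colors` and `legend_colors` are hypothetical names for this example):

```r
# A few species labels in data order (illustrative)
species <- factor(c("Gentoo", "Adelie", "Chinstrap", "Adelie"))

# Levels are sorted; each value is stored as an integer code into them
levels(species)        # "Adelie" "Chinstrap" "Gentoo"
as.integer(species)    # 3 1 2 1

# The same codes index palette(), so these are the colors the points get
point_colors  <- palette()[as.integer(species)]
legend_colors <- palette()[seq_along(levels(species))]
```

Building the legend from `levels()` keeps its labels in the same order as the color codes.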
And a histogram of bill length:
hist(penguin$bill_length_mm, col = "pink", main = "Bill length", xlab = "Bill length (mm)")
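By default `hist()` chooses its bins with Sturges' rule and silently drops missing values. Calling it with `plot = FALSE` returns the bin edges and counts instead of drawing, which helps when checking what the plot shows or picking a different bin width. A minimal sketch on made-up bill lengths (the values and the names `bill`, `h` and `h5` are illustrative):

```r
# Made-up bill lengths (mm) with one missing value, for illustration only
bill <- c(34.5, 38.2, 39.1, 41.0, 45.3, 46.5, 48.7, 50.2, NA)

# plot = FALSE returns the histogram object without drawing it
h <- hist(bill, plot = FALSE)
h$breaks   # bin edges chosen by the default "Sturges" rule
h$counts   # number of non-missing values in each bin

# Fixed-width 5 mm bins instead of the default
h5 <- hist(bill, breaks = seq(30, 55, by = 5), plot = FALSE)
h5$counts
```

Bins are right-closed by default, so a value of exactly 40 would fall in the 35 to 40 bin.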
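Next, look at how the numeric measurements correlate. Start by listing the column names to see which positions the measurements occupy: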
names(penguin)
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "year"
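`penguin[ ,3:6]` relies on the column order shown by `names()`. Selecting columns by name, or by type while leaving out `year` (numeric but not a measurement), keeps the code working if the columns move. A minimal sketch using a one-row data frame with the same column names and placeholder values (`toy`, `measure_cols` and `numeric_cols` are illustrative names):

```r
# One row with the same column names as penguins.csv (placeholder values)
toy <- data.frame(
  species = "Adelie", island = "Torgersen",
  bill_length_mm = 40, bill_depth_mm = 18,
  flipper_length_mm = 190, body_mass_g = 3800,
  sex = "female", year = 2008
)

# Option 1: list the measurement columns explicitly
measure_cols <- c("bill_length_mm", "bill_depth_mm",
                  "flipper_length_mm", "body_mass_g")

# Option 2: keep numeric columns, then drop year
numeric_cols <- names(toy)[sapply(toy, is.numeric)]
numeric_cols <- setdiff(numeric_cols, "year")

identical(measure_cols, numeric_cols)   # TRUE
match(measure_cols, names(toy))         # 3 4 5 6
```

Either vector can then replace `3:6`, for example `penguin[, measure_cols]`.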
# Columns 3-6 are the numeric measurements; each pair uses only rows where both are non-missing
cor(penguin[ ,3:6], use = "pairwise.complete.obs")
##                   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## bill_length_mm         1.0000000    -0.2350529         0.6561813   0.5951098
## bill_depth_mm         -0.2350529     1.0000000        -0.5838512  -0.4719156
## flipper_length_mm      0.6561813    -0.5838512         1.0000000   0.8712018
## body_mass_g            0.5951098    -0.4719156         0.8712018   1.0000000
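With `use = "pairwise.complete.obs"`, each cell of the matrix is computed from the rows where both variables in that pair are present, so different cells can rest on different rows. `use = "complete.obs"` instead drops every row that has a missing value anywhere. A minimal sketch on made-up data (columns `a`, `b`, `c` and all values are illustrative) showing that one pairwise cell equals a plain `cor()` on the rows complete for that pair:

```r
# Made-up measurements with gaps in different rows (illustrative only)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, NA, 8, 9, 13),
  c = c(6, NA, 4, 3, 2, 2)
)

pairwise <- cor(toy, use = "pairwise.complete.obs")
complete <- cor(toy, use = "complete.obs")

# The a-b cell uses every row where both a and b are present (rows 1, 2, 4, 5, 6)
ok_ab <- complete.cases(toy$a, toy$b)
manual_ab <- cor(toy$a[ok_ab], toy$b[ok_ab])
all.equal(pairwise["a", "b"], manual_ab)   # TRUE

# complete.obs keeps only rows 1, 4, 5, 6, so its a-b value differs here
pairwise["a", "b"] - complete["a", "b"]
```

Pairwise deletion keeps more data per cell, at the cost of each correlation being based on a slightly different subset.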
Well, we have found out lots of things about penguins. For example, birds with longer bills tend to have heavier bodies (a correlation of about 0.60 between bill length and body mass). Who knew?
[Correlation!](http://xkcd.com/552/)
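The correlation matrix is easier to scan as a ranked list of pairs. The sketch below re-enters the printed correlations by hand (so it runs without the CSV file), keeps each pair once from the upper triangle, and sorts by absolute value. On these numbers, flipper length and body mass come out on top, with bill length and body mass third. The names `vars`, `r`, `idx` and `pairs` are illustrative.

```r
# Correlation matrix copied from the printed output (values as shown)
vars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")
r <- matrix(c( 1.0000000, -0.2350529,  0.6561813,  0.5951098,
              -0.2350529,  1.0000000, -0.5838512, -0.4719156,
               0.6561813, -0.5838512,  1.0000000,  0.8712018,
               0.5951098, -0.4719156,  0.8712018,  1.0000000),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Row/column indices of the upper triangle: each pair appears once
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = vars[idx[, "row"]],
                    var2 = vars[idx[, "col"]],
                    r    = r[idx])

# Strongest relationships first, regardless of sign
pairs <- pairs[order(-abs(pairs$r)), ]
pairs
```

With the real data loaded, `r <- cor(penguin[ ,3:6], use = "pairwise.complete.obs")` can replace the hand-entered matrix.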