Introduction

We are doing some exploratory statistics on the penguins dataset. Start by reading the CSV file into a data frame:

penguin <- read.csv("penguins.csv")
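Since R 4.0.0, read.csv() leaves text columns as character by default. That is why summary() reports species, island and sex with Class character. If you would rather have factors from the start, you can pass stringsAsFactors = TRUE. The sketch below is illustrative only. It writes a tiny hypothetical CSV with the same column layout (made-up values, not rows from penguins.csv) to a temporary file and reads it back both ways.

```r
# Hypothetical CSV in the same layout as penguins.csv (values made up)
csv_text <- c(
  "species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year",
  "Adelie,Torgersen,39.0,18.5,185,3700,male,2007",
  "Gentoo,Biscoe,47.0,14.5,215,5000,female,2008",
  "Chinstrap,Dream,NA,NA,NA,NA,NA,2009"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Default since R 4.0.0: text columns stay character
as_chr <- read.csv(path)

# Ask for factors explicitly instead; "NA" still becomes a missing value
as_fct <- read.csv(path, stringsAsFactors = TRUE)

str(as_fct$species)
```

With the factor version, the later as.factor() step becomes unnecessary, although applying it to a factor does no harm.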

Summary statistics

Calculate summary stats on all columns:

summary(penguin)
##    species             island          bill_length_mm  bill_depth_mm  
##  Length:344         Length:344         Min.   :32.10   Min.   :13.10  
##  Class :character   Class :character   1st Qu.:39.23   1st Qu.:15.60  
##  Mode  :character   Mode  :character   Median :44.45   Median :17.30  
##                                        Mean   :43.92   Mean   :17.15  
##                                        3rd Qu.:48.50   3rd Qu.:18.70  
##                                        Max.   :59.60   Max.   :21.50  
##                                        NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex                 year     
##  Min.   :172.0     Min.   :2700   Length:344         Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   Class :character   1st Qu.:2007  
##  Median :197.0     Median :4050   Mode  :character   Median :2008  
##  Mean   :200.9     Mean   :4202                      Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                      3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                      Max.   :2009  
##  NA's   :2         NA's   :2
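summary() reports NA counts only for the numeric columns. For the character columns it prints Length, Class and Mode, so any missing species, island or sex values go unmentioned. A per-column count covers every column. The data frame below is a made-up stand-in; on the real data you would run colSums(is.na(penguin)).

```r
# Small made-up data frame standing in for penguin (illustration only)
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  bill_length_mm = c(39.1, NA, 46.5),
  sex = c("male", NA, "female")
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```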

Make a plot

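First, a scatterplot of bill depth against bill length, colored by species:

# Convert species to a factor so each species gets its own color
penguin$species <- as.factor(penguin$species)
plot(penguin$bill_length_mm, penguin$bill_depth_mm,
     col = penguin$species, pch = 16,
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
legend("bottomleft", legend = levels(penguin$species), col = seq_along(levels(penguin$species)), pch = 16)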
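Passing a factor as col colors each point by its integer level code. Levels are sorted by default (here, alphabetically), so Adelie, Chinstrap and Gentoo get codes 1, 2 and 3 and the first three colors of the current palette(). To pick the colors yourself, you can index a color vector with the factor. A sketch on hypothetical labels; the colors chosen here are arbitrary and not from the original:

```r
# Hypothetical species labels, in data order
species <- factor(c("Gentoo", "Adelie", "Chinstrap", "Adelie"))

# Levels are sorted, so each label gets a fixed integer code
codes <- as.integer(species)

# One color per level, listed in level order: indexing by a factor
# uses its integer codes, not the names of this vector
species_cols <- c(Adelie = "darkorange", Chinstrap = "purple", Gentoo = "cyan4")

# One color per point
point_cols <- unname(species_cols[species])
print(data.frame(species, codes, point_cols))

# On the real data this would be, for example:
# plot(penguin$bill_length_mm, penguin$bill_depth_mm,
#      col = species_cols[penguin$species], pch = 16)
```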

And a histogram of bill length:

hist(penguin$bill_length_mm, col = "pink",
     main = "Bill length", xlab = "Bill length (mm)")
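hist() can also return the binning without drawing anything, via plot = FALSE, which is handy for checking which breaks it chose and how many observations fell in each bin. Non-finite values such as NA are dropped before binning, so the counts add up to the number of non-missing measurements. A sketch on a few made-up bill lengths:

```r
# Made-up bill lengths (mm), including one missing value
bill <- c(34.5, 38.2, 39.9, 41.0, 45.3, 46.1, 49.8, 52.7, NA)

# Compute the histogram without drawing it
h <- hist(bill, plot = FALSE)

# Break points from the default (Sturges) rule, and the count in each bin
print(h$breaks)
print(h$counts)
```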

Correlations

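Some notes before computing correlations:

  • The correlations use only the numeric measurement variables (bill length, bill depth, flipper length and body mass).
  • These are columns 3 to 6, as names() confirms: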
names(penguin)
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
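Selecting columns 3 to 6 by position works for this file, but it silently picks different variables if the column order ever changes. A sketch of the same selection by name, on a made-up stand-in with the same column names. year is left out even though it is numeric, on the assumption that it records when a bird was sampled rather than a measurement of the bird.

```r
# Made-up stand-in with the same column names as penguin (illustration only)
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  island = c("Torgersen", "Biscoe", "Dream"),
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm = c(18.7, 15.0, 18.5),
  flipper_length_mm = c(181, 217, 196),
  body_mass_g = c(3750, 5200, 3800),
  sex = c("male", "female", "male"),
  year = c(2007L, 2008L, 2009L)
)

# Pick the measurement columns by name rather than by position 3:6
measure_cols <- c("bill_length_mm", "bill_depth_mm",
                  "flipper_length_mm", "body_mass_g")
measures <- toy[, measure_cols]

# Same result as toy[, 3:6], but only while the column order stays the same
stopifnot(identical(measures, toy[, 3:6]))
names(measures)
```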
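# Correlation matrix of the four measurement columns; each pair uses every row where both values are present
cor(penguin[ , 3:6], use = "pairwise.complete.obs")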
##                   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## bill_length_mm         1.0000000    -0.2350529         0.6561813   0.5951098
## bill_depth_mm         -0.2350529     1.0000000        -0.5838512  -0.4719156
## flipper_length_mm      0.6561813    -0.5838512         1.0000000   0.8712018
## body_mass_g            0.5951098    -0.4719156         0.8712018   1.0000000
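The use argument controls how cor() handles missing values. With "pairwise.complete.obs", each cell of the matrix is computed from the rows where that particular pair of variables is present, so different cells can rest on different rows. With "complete.obs", any row with a missing value in any selected column is dropped first, and every cell uses the same rows. The made-up data below has NAs in different rows so that the two options visibly differ:

```r
# Made-up data with missing values in different rows of different columns
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, NA),
  c = c(NA, 3, 1, 5, 4, 6)
)

# Each pair uses the rows where both of its columns are present
pairwise <- cor(toy, use = "pairwise.complete.obs")

# Rows with any NA (rows 1 and 6) are dropped before every correlation
complete <- cor(toy, use = "complete.obs")

round(pairwise, 3)
round(complete, 3)
```

Here the a-b correlation uses rows 1 to 5 in the pairwise version but only rows 2 to 5 in the complete-case version.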

Discussion

Well, we have found out lots of things about penguins. Birds with longer bills tend to have heavier bodies (the correlation between bill length and body mass is about 0.60). Who knew?
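That claim rests on a single cell of the correlation matrix. If you want a confidence interval and p-value for one pair, cor.test() reports both, and it drops incomplete pairs itself. A sketch on made-up numbers; on the real data the call would be cor.test(penguin$bill_length_mm, penguin$body_mass_g).

```r
# Made-up bill lengths (mm) and body masses (g); the last pair is incomplete
bill_length <- c(36.2, 38.9, 40.1, 42.5, 45.0, 46.8, 49.3, 50.5, NA)
body_mass   <- c(3300, 3650, 3500, 4000, 4300, 4700, 5100, 4900, 4200)

# Pearson correlation test; pairs with an NA are dropped automatically
ct <- cor.test(bill_length, body_mass)

ct$estimate   # sample correlation r
ct$conf.int   # 95% confidence interval for r
ct$p.value
```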

[Correlation!](http://xkcd.com/552/)