head(), summary(), and glimpse() give us a quick look at the shape, column types, and basic distribution of our data frame.
Base R plots such as hist() (histogram), boxplot(), and plot() allow us to visualize our basic data distribution.
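As a minimal sketch of the inspection calls, the block below rebuilds only the six rows printed by head() in this section, so that it runs on its own. The data frame name `penguins` is an assumption, and the outputs shown in this document come from the full 344-row data, not from this sample. glimpse() lives in dplyr, so the sketch only calls it if that package is installed; str() is the closest base R equivalent.

```r
# Stand-in sample: the six rows printed by head() in this section.
# `penguins` is an assumed name; swap in the full data frame.
penguins <- data.frame(
  species           = rep("Adelie", 6),
  island            = rep("Torgersen", 6),
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7, 39.3),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3, 20.6),
  flipper_length_mm = c(181, 186, 195, NA, 193, 190),
  body_mass_g       = c(3750, 3800, 3250, NA, 3450, 3650),
  sex               = c("male", "female", "female", "", "female", "male"),
  year              = rep(2007L, 6)
)

head(penguins)     # first six rows
summary(penguins)  # per-column summaries, including NA counts
str(penguins)      # base R overview of column types and first values

# glimpse() comes from dplyr; only call it when the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(penguins)
}
```

Note that a missing `sex` value is stored as an empty string in this data, so summary() does not count it as an NA.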
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18.0 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## sex year
## 1 male 2007
## 2 female 2007
## 3 female 2007
## 4 2007
## 5 female 2007
## 6 male 2007
## species island bill_length_mm bill_depth_mm
## Length:344 Length:344 Min. :32.10 Min. :13.10
## Class :character Class :character 1st Qu.:39.23 1st Qu.:15.60
## Mode :character Mode :character Median :44.45 Median :17.30
## Mean :43.92 Mean :17.15
## 3rd Qu.:48.50 3rd Qu.:18.70
## Max. :59.60 Max. :21.50
## NA's :2 NA's :2
## flipper_length_mm body_mass_g sex year
## Min. :172.0 Min. :2700 Length:344 Min. :2007
## 1st Qu.:190.0 1st Qu.:3550 Class :character 1st Qu.:2007
## Median :197.0 Median :4050 Mode :character Median :2008
## Mean :200.9 Mean :4202 Mean :2008
## 3rd Qu.:213.0 3rd Qu.:4750 3rd Qu.:2009
## Max. :231.0 Max. :6300 Max. :2009
## NA's :2 NA's :2
## Rows: 344
## Columns: 8
## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "A…
## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", …
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <chr> "male", "female", "female", "", "female", "male", "f…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
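The base R plotting functions mentioned at the top of this section could be called along the following lines. This is a sketch on a stand-in sample (the six rows from the head() output, under the assumed name `penguins`); the choice of columns for each plot is illustrative, not taken from the original analysis.

```r
# Stand-in sample built from the head() output; `penguins` is an assumed name.
penguins <- data.frame(
  species        = rep("Adelie", 6),
  bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3),
  bill_depth_mm  = c(18.7, 17.4, 18.0, NA, 19.3, 20.6),
  body_mass_g    = c(3750, 3800, 3250, NA, 3450, 3650)
)

# Histogram of one continuous variable (NA values are dropped automatically)
h <- hist(penguins$body_mass_g,
          main = "Body mass", xlab = "Body mass (g)")

# Box plot of a continuous variable split by a categorical one
b <- boxplot(body_mass_g ~ species, data = penguins,
             ylab = "Body mass (g)")

# Scatterplot of two continuous variables
plot(penguins$bill_length_mm, penguins$bill_depth_mm,
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
```

Both hist() and boxplot() return the statistics they computed, which is handy for checking how many observations actually went into the plot.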
Analysis of penguin species
ggplot2 (part of the tidyverse) allows us to take our visualizations to the next level.
Scatterplot using two continuous variables
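A scatterplot of this kind might be built as in the sketch below. It assumes ggplot2 is installed, uses the six head() rows as a stand-in under the assumed name `penguins`, and picks bill length against bill depth as an illustrative pair of variables; the original plot may have used a different pair.

```r
library(ggplot2)

# Stand-in sample built from the head() output; `penguins` is an assumed name.
penguins <- data.frame(
  species        = rep("Adelie", 6),
  bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7, 39.3),
  bill_depth_mm  = c(18.7, 17.4, 18.0, NA, 19.3, 20.6)
)

# Drop incomplete rows up front so geom_point() has nothing to discard
penguins_complete <- na.omit(penguins)

p <- ggplot(penguins_complete,
            aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point() +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)", colour = "Species")

print(p)
```

Mapping `colour` to `species` only becomes informative on the full data, where more than one species is present.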
Column chart using one categorical variable
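For a single categorical variable, geom_bar() counts the rows in each category and draws one column per category. The sketch below assumes ggplot2 is installed and uses `species` on the six-row stand-in sample (assumed name `penguins`), which contains only Adelie penguins and therefore yields a single column.

```r
library(ggplot2)

# Stand-in sample built from the head() output; `penguins` is an assumed name.
penguins <- data.frame(
  species = rep("Adelie", 6),
  island  = rep("Torgersen", 6)
)

# geom_bar() uses stat = "count" by default: bar height = rows per category
p <- ggplot(penguins, aes(x = species)) +
  geom_bar() +
  labs(x = "Species", y = "Number of penguins")

print(p)
```

When the counts are already aggregated into their own column, geom_col() with an explicit `y` aesthetic is the usual alternative.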
Density plot
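A density plot smooths the distribution of one continuous variable. The sketch below assumes ggplot2 is installed and plots `body_mass_g` filled by `species` on the six-row stand-in sample (assumed name `penguins`); both the variable and the grouping are illustrative choices.

```r
library(ggplot2)

# Stand-in sample built from the head() output; `penguins` is an assumed name.
penguins <- data.frame(
  species     = rep("Adelie", 6),
  body_mass_g = c(3750, 3800, 3250, NA, 3450, 3650)
)

# Remove the row with a missing measurement before estimating the density
penguins_complete <- na.omit(penguins)

# alpha makes overlapping densities readable once several species are present
p <- ggplot(penguins_complete, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Body mass (g)", y = "Density", fill = "Species")

print(p)
```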
Box plot
Solid line = median
Edges of box = lower quartile (25%) and upper quartile (75%)
Whiskers = most extreme values within 1.5 × IQR of the box edges (the default)
Points outside whiskers = outliers
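To connect the legend above to numbers, the sketch below computes the quartiles with quantile() and then draws the same variable with geom_boxplot(). It assumes ggplot2 is installed and uses the six-row stand-in sample under the assumed name `penguins`; `body_mass_g` by `species` is an illustrative choice.

```r
library(ggplot2)

# Stand-in sample built from the head() output; `penguins` is an assumed name.
penguins <- data.frame(
  species     = rep("Adelie", 6),
  body_mass_g = c(3750, 3800, 3250, NA, 3450, 3650)
)
penguins_complete <- na.omit(penguins)

# Lower quartile, median, and upper quartile: the box edges and the solid line
q <- quantile(penguins_complete$body_mass_g, probs = c(0.25, 0.5, 0.75))
print(q)

# Interquartile range; whiskers reach at most 1.5 * IQR beyond the box
iqr <- IQR(penguins_complete$body_mass_g)

p <- ggplot(penguins_complete, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)")

print(p)
```

With only five complete observations the box is a rough summary; on the full data each species gets its own box.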