Michael Hunt

18-01-2021

This exercise uses a set of data on birds from Mark Gardener.

library(tidyverse)
library(here)

Load the data

untidy_birds<-read_csv(here("data","bird.csv"))
## Parsed with column specification:
## cols(
##   Species = col_character(),
##   Garden = col_double(),
##   Hedgerow = col_double(),
##   Parkland = col_double(),
##   Pasture = col_double(),
##   Woodland = col_double()
## )
glimpse(untidy_birds)
## Rows: 6
## Columns: 6
## $ Species  <chr> "Blackbird", "Chaffinch", "Great Tit", "House Sparrow", "Rob…
## $ Garden   <dbl> 47, 19, 50, 46, 9, 4
## $ Hedgerow <dbl> 10, 3, 0, 16, 3, 0
## $ Parkland <dbl> 40, 5, 10, 8, 0, 6
## $ Pasture  <dbl> 2, 0, 7, 4, 0, 0
## $ Woodland <dbl> 2, 2, 0, 0, 2, 0
head(untidy_birds)

Tidy the data

We see that there is just one column containing species names, while the sightings values are spread across all the other columns, whose names are the various types of habitat. Habitat is therefore a variable stored in the column headers rather than in a column of its own, so this dataset is definitely untidy.

In the code below, !Species means ‘NOT the Species column’, i.e. every column except that one.

tidy_birds<-pivot_longer(untidy_birds,!Species,names_to="Habitat",values_to="Sightings")
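
As a rough illustration of what the reshaping does, here is a minimal sketch on a toy subset of the data: the first three species and two habitats, with values copied from the glimpse() output above. The names toy_birds, toy_long and toy_long_alt are made up for this example and are not part of the original exercise.

```r
library(tidyr)
library(tibble)

# Toy subset (hypothetical object name): three species, two habitats,
# values taken from the glimpse() output earlier in this document
toy_birds <- tribble(
  ~Species,    ~Garden, ~Hedgerow,
  "Blackbird",      47,        10,
  "Chaffinch",      19,         3,
  "Great Tit",      50,         0
)

# Select every column except Species, as in the main example
toy_long <- pivot_longer(toy_birds, cols = !Species,
                         names_to = "Habitat", values_to = "Sightings")

# The same selection written by naming the habitat columns explicitly
toy_long_alt <- pivot_longer(toy_birds, cols = c(Garden, Hedgerow),
                             names_to = "Habitat", values_to = "Sightings")

# 3 species x 2 habitats gives one row per species-habitat pair
nrow(toy_long)
toy_long
```

Each row of the long table now holds one species, one habitat and one sightings count. Negating the Species column is convenient here because it keeps working if more habitat columns are added later.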

Group by species

gp_by_species<-tidy_birds %>%
  group_by(Species) %>%
  summarise(Max=max(Sightings),Min=min(Sightings))
gp_by_species

Group by habitat

gp_by_habitat<-tidy_birds %>%
  group_by(Habitat) %>%
  summarise(Most_often_seen_species=Species[which.max(Sightings)],Sightings=max(Sightings))
gp_by_habitat

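Note here the use of which.max(column_name) to find the index of the row that contains the maximum value in that column, and then of column_name[index] to pick out the value in column <column name> at row <index>. Because this happens inside summarise() after group_by(Habitat), the index is found separately within each habitat.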
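
The indexing idiom can be tried on its own with plain vectors. The sketch below uses the first four Garden counts and species names from the glimpse() output above; the vector names sightings, species and idx are chosen just for this illustration.

```r
# First four Garden counts and the matching species names,
# copied from the glimpse() output earlier in this document
sightings <- c(47, 19, 50, 46)
species <- c("Blackbird", "Chaffinch", "Great Tit", "House Sparrow")

# which.max() returns the position of the largest value
idx <- which.max(sightings)
idx

# Use that position to look up the corresponding species
species[idx]

# When the maximum is tied, which.max() returns the first position only
which.max(c(2, 2, 0))
```

One thing to keep in mind is the tie behaviour shown in the last line: if two species share the highest count in a habitat, only the first of them is reported.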