This exercise uses a set of data on birds from Mark Gardener.
library(tidyverse)
library(here)
untidy_birds <- read_csv(here("data", "bird.csv"))
## Parsed with column specification:
## cols(
## Species = col_character(),
## Garden = col_double(),
## Hedgerow = col_double(),
## Parkland = col_double(),
## Pasture = col_double(),
## Woodland = col_double()
## )
glimpse(untidy_birds)
## Rows: 6
## Columns: 6
## $ Species <chr> "Blackbird", "Chaffinch", "Great Tit", "House Sparrow", "Rob…
## $ Garden <dbl> 47, 19, 50, 46, 9, 4
## $ Hedgerow <dbl> 10, 3, 0, 16, 3, 0
## $ Parkland <dbl> 40, 5, 10, 8, 0, 6
## $ Pasture <dbl> 2, 0, 7, 4, 0, 0
## $ Woodland <dbl> 2, 2, 0, 0, 2, 0
head(untidy_birds)
We see that there is just one column containing species names, while the sightings counts are spread across all the other columns, whose names are the various types of habitat. Habitat is therefore a variable stored in column names rather than in a column of its own, so this dataset is definitely untidy.
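In the code below, !Species means ‘NOT the Species column’, i.e. every column except this one. The names of those columns go into a new Habitat column and their values into a new Sightings column.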
tidy_birds <- pivot_longer(untidy_birds, !Species, names_to = "Habitat", values_to = "Sightings")
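To see what this reshaping does on a small scale, here is a minimal sketch on a two-row toy table. The names toy_wide and toy_long are hypothetical, and the counts are copied from the glimpse output above for two species and two habitats only, so this is an illustration rather than the full dataset.

```r
library(tidyr)
library(tibble)

# Hypothetical toy table: two species, two habitat columns
toy_wide <- tibble(
  Species  = c("Blackbird", "Chaffinch"),
  Garden   = c(47, 19),
  Woodland = c(2, 2)
)

# Every column except Species is stacked into Habitat/Sightings pairs
toy_long <- pivot_longer(
  toy_wide,
  !Species,
  names_to = "Habitat",
  values_to = "Sightings"
)

toy_long
```

Each of the two species rows is repeated once per habitat column, so toy_long has four rows and three columns: Species, Habitat and Sightings.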
# For each species, find the largest and smallest count across habitats
gp_by_species <- tidy_birds %>%
  group_by(Species) %>%
  summarise(Max = max(Sightings), Min = min(Sightings))
gp_by_species
# For each habitat, find the most often seen species and its count
gp_by_habitat <- tidy_birds %>%
  group_by(Habitat) %>%
  summarise(Most_often_seen_species = Species[which.max(Sightings)], Sightings = max(Sightings))
gp_by_habitat
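Note here the use of which.max(column_name) to find the position of the maximum value in that column (within each group, since the data are grouped), and then of other_column[index] to pick out the value of another column at that position. Here, Species[which.max(Sightings)] returns the species with the most sightings in each habitat.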
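The indexing idiom can be tried outside a pipeline on two plain vectors. This is a minimal sketch: the vectors species and sightings and the variable idx are hypothetical names, with values taken from the first three entries of the Garden column shown earlier.

```r
# Hypothetical vectors for illustration
species   <- c("Blackbird", "Chaffinch", "Great Tit")
sightings <- c(47, 19, 50)

# which.max() returns the position of the (first) maximum value
idx <- which.max(sightings)
idx           # 3

# Use that position to index the other vector
species[idx]  # "Great Tit"
```

If several values tie for the maximum, which.max() returns only the first position, so only one species would be reported per habitat.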