11/28/2017
OrdwayBirds is a table of records of birds captured and released at the Katharine Ordway Natural History Study Area.
The data contain entry mistakes. The variable SpeciesName identifies the species of each bird, but the spelling of some species names varies, which leads to misclassification of birds. The Month and Day variables have issues as well.
The data table OrdwaySpeciesNames collects the different spellings of the species names. The goal of the assignment is to create a guide that tells birders the best time of year to visit Ordway to see a particular species.
Getting the data table:
OrdwayBirds <- OrdwayBirds %>%
  select(SpeciesName, Month, Day) %>%
  mutate(Month = as.numeric(as.character(Month)),
         Day = as.numeric(as.character(Day)))
The mutate() call converts Month and Day to numeric variables.
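The conversion passes through as.character() first, most likely because calling as.numeric() directly on a factor returns the internal level codes rather than the values as written. A minimal sketch with a made-up factor (the values are hypothetical and not taken from OrdwayBirds, and it assumes the columns are stored as factors):

```r
# Hypothetical month values stored as a factor, as can happen after data entry
month_factor <- factor(c("11", "3", "11", "7"))

# Direct conversion returns the integer level codes, not the months
codes <- as.numeric(month_factor)

# Converting to character first recovers the values as they were typed
months <- as.numeric(as.character(month_factor))

print(codes)
print(months)
```

If the columns are already character vectors, the as.character() step does nothing harmful, so the two-step form is a safe default.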
Including misspellings, how many different species are there in the OrdwayBirds data?
Make a data table that gives the number of distinct species in the SpeciesNameCleaned variable in OrdwaySpeciesNames. n_distinct() is helpful here: it counts the number of unique values in a variable.
OrdwayBirds %>% summarise(count = n_distinct(SpeciesName))
##   count
## 1   275
New data table:
OrdwaySpecNameCount <- OrdwaySpeciesNames %>%
  summarise(count = n_distinct(SpeciesNameCleaned))
OrdwaySpecNameCount
##   count
## 1   109
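To see why the raw count is so much larger than the cleaned one, here is a small sketch of how n_distinct() treats variant spellings. The data frame toy_birds and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical capture records with inconsistent spellings
toy_birds <- data.frame(
  SpeciesName = c("Tree Swallow", "Tree swallow", "Field Sparrow",
                  "Tree Swallow", "Field Sparow"),
  stringsAsFactors = FALSE
)

# n_distinct() compares strings exactly, so each variant spelling
# (different case, missing letter) is counted as its own species
toy_count <- toy_birds %>%
  summarise(count = n_distinct(SpeciesName))

print(toy_count)
```

Only two real species appear in the toy data, but every spelling variant adds to the count, which is the same effect seen in OrdwayBirds.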
Use the OrdwaySpeciesNames table to create a new data table that corrects the misspellings in SpeciesName. This can be done easily with the inner_join() data verb.
Corrected <- OrdwayBirds %>%
  inner_join(OrdwaySpeciesNames) %>%
  select(Species = SpeciesNameCleaned, Month, Day) %>%
  na.omit()  # Cleans up the missing ones
Look at the names of the variables in OrdwaySpeciesNames and OrdwayBirds:
Which variable was used for matching cases?
Which variables were added?
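One way to explore these questions is to run the join on a toy example. The sketch below assumes the lookup table pairs a SpeciesName column with SpeciesNameCleaned. The tables toy_birds and toy_names and all their values are hypothetical. Passing by = explicitly is an addition here, not part of the original command, which lets inner_join() pick the shared column itself.

```r
library(dplyr)

# Hypothetical raw records; "Unknown bird" has no entry in the lookup table
toy_birds <- data.frame(
  SpeciesName = c("Tree Swallow", "Tree swallow", "Field Sparow", "Unknown bird"),
  Month = c(5, 6, 4, 7),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table in the assumed shape of OrdwaySpeciesNames
toy_names <- data.frame(
  SpeciesName = c("Tree Swallow", "Tree swallow", "Field Sparow"),
  SpeciesNameCleaned = c("Tree Swallow", "Tree Swallow", "Field Sparrow"),
  stringsAsFactors = FALSE
)

# Naming the key makes the matching variable explicit; rows without a
# match in the lookup table are dropped by an inner join
toy_corrected <- toy_birds %>%
  inner_join(toy_names, by = "SpeciesName") %>%
  select(Species = SpeciesNameCleaned, Month)

print(toy_corrected)
```

In this sketch the unmatched record disappears from the result, so an inner join both corrects spellings and silently removes names that are missing from the lookup table.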
Count how many bird captures there are of each of the corrected species. Call the data table that contains the counts CountCorrect. Arrange it in descending order, starting with the species with the most birds, and look through the list.
CountCorrect <- Corrected %>% group_by(Species) %>% summarise(count=n()) %>% arrange(desc(count))
Define for yourself a "major species" as a species with more than a particular threshold count. Set your threshold so that 5 or 6 species are designated as major.
Filter to produce a data table with only the birds that belong to a major species. Save the output in a table called Majors. Here the six species with the highest counts are taken as the major species.
topSixSpec <- CountCorrect %>% head(n = 6) %>% .$Species
topSixSpec
## [1] "Slate-colored Junco"    "Tree Swallow"
## [3] "Black-capped Chickadee" "American Goldfinch"
## [5] "Field Sparrow"          "Lincoln's Sparrow"
Majors <- Corrected %>% filter(Species %in% topSixSpec)
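The commands above pick the major species by rank rather than by an explicit cutoff. A threshold-based version, closer to the wording of the task, might look like the sketch below. The table toy_counts, the species labels, and the cutoff of 500 are all hypothetical:

```r
library(dplyr)

# Hypothetical counts in the shape of CountCorrect
toy_counts <- data.frame(
  Species = c("A", "B", "C", "D"),
  count = c(900, 650, 120, 15),
  stringsAsFactors = FALSE
)

# Assumed cutoff; in practice it would be tuned until 5 or 6 species remain
toy_threshold <- 500

# Keep only species whose count exceeds the cutoff, then extract the names
major_species <- toy_counts %>%
  filter(count > toy_threshold) %>%
  pull(Species)

print(major_species)
```

The resulting vector can be used with %in% in the same way as topSixSpec.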
Write a command that produces the month-by-month count of each of the major species. Call this table ByMonth.
ByMonth <- Majors %>% group_by(Species, Month) %>% summarise(count = n()) %>% arrange(Month)
Display this month-by-month count with a bar chart arranged in a way that will tell the story of what time of year the various species appear.
CrazyMonth <- monthAbbr <- with(Majors, plyr::mapvalues(Month, from = 1:12, to = month.abb))
CrazyMonth <- factor(CrazyMonth, levels = month.abb)
Majors$Month <- CrazyMonth
Majors %>%
  ggplot(aes(x = Month)) +
  geom_bar() +
  facet_wrap(~ Species) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The chart shows what time of year the various species appear.
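The relabelling of month numbers as abbreviations could probably also be done in base R by indexing month.abb directly. This is an alternative sketch on a made-up vector (toy_month is hypothetical), not the method used above:

```r
# Hypothetical numeric months, including a missing value
toy_month <- c(1, 12, 5, 5, NA)

# Index the built-in month.abb vector by month number, then fix the level
# order so that plots show months chronologically instead of alphabetically
toy_month_f <- factor(month.abb[toy_month], levels = month.abb)

print(toy_month_f)
```

Listing all twelve levels keeps months with no captures in their calendar position, which matters when the factor is used on the x axis of a bar chart.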
Overall, correcting the data lets people see which months are best for visiting Ordway to see particular species, especially the major species. A future extension would be to define minor species with a second threshold and compare their months with those of the major species.