The two tables can be joined on scientificName. The data contains latitude/longitude, date and time (with time zones), state, tribe and subfamily. The column classes appear to have been detected correctly, so the data probably needs little or no cleaning, although commonName and secondary_commonNames have missing values.
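The checks behind these observations are shared join keys, column classes and gaps in the name columns. They can be run in a few lines of base R, sketched below. The two data frames are tiny hypothetical stand-ins with placeholder species names. On the real data, the same three checks would be run on frog_names and frogID_data.

```r
# Hypothetical miniature versions of the two tables (placeholder species names)
frog_names <- data.frame(
  scientificName        = c("Species A", "Species B", "Species C"),
  commonName            = c("Frog A", "Frog B", NA),
  secondary_commonNames = c(NA, "Froglet B", NA)
)
frogID_data <- data.frame(
  scientificName = c("Species B", "Species B", "Species A"),
  stateProvince  = c("Victoria", "New South Wales", "New South Wales")
)

# Column classes, to confirm they were detected as expected
print(sapply(frog_names, class))

# Number of missing values in each column of the names table
missing_per_column <- colSums(is.na(frog_names))
print(missing_per_column)

# Share of observation rows whose scientificName appears in frog_names
key_match_rate <- mean(frogID_data$scientificName %in% frog_names$scientificName)
print(key_match_rate)
```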
In this case local time matters more than universal time, because both frog calling and people's observing follow the local clock, so no time-zone conversion is needed.
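Consider a single instant recorded in UTC. The base R sketch below formats it as local clock time in Sydney and in Perth; the date is a hypothetical example. During Sydney's daylight-saving period that moment is 9pm in Sydney but 6pm in Perth. Binning by UTC would therefore mix evening and night calls from different parts of the country, which is why the local clock is the better scale here.

```r
# One hypothetical instant, recorded in UTC (mid-January, so Sydney is on daylight saving time)
instant_utc <- as.POSIXct("2023-01-15 10:00:00", tz = "UTC")

# The same instant expressed as local clock time in two Australian time zones
sydney_clock <- format(instant_utc, "%H:%M", tz = "Australia/Sydney")
perth_clock  <- format(instant_utc, "%H:%M", tz = "Australia/Perth")

print(c(Sydney = sydney_clock, Perth = perth_clock))
```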
Other relevant questions the data might help answer:
Can a link be found to photos of each species?
Can a database of species discoveries be found? Who identified the most species? When did most discoveries happen?
Where is frog biodiversity highest?
Are there biome maps that could be compared with species distributions? Are any species generalists?
Do species distributions match biomes or watersheds more closely?
Some data cleaning is needed.
129 records listing “Other Territories” as their stateProvince were located around Jervis Bay, so they were relabelled as New South Wales.
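Before relabelling, it is worth confirming that the “Other Territories” records really do sit around Jervis Bay. The base R sketch below does this with a rough bounding box around Jervis Bay and three hypothetical records. The box coordinates are an illustrative assumption, not an official boundary.

```r
# Hypothetical rows: two "Other Territories" records and one ordinary record
obs <- data.frame(
  stateProvince    = c("Other Territories", "Other Territories", "Victoria"),
  decimalLatitude  = c(-35.13, -35.05, -37.81),
  decimalLongitude = c(150.69, 150.74, 144.96)
)

# Assumed rough bounding box around Jervis Bay (illustrative, not surveyed)
in_jervis_box <- obs$decimalLatitude  > -35.3 & obs$decimalLatitude  < -34.9 &
                 obs$decimalLongitude > 150.5 & obs$decimalLongitude < 150.9

other_territories <- obs$stateProvince == "Other Territories"

# Stop if any "Other Territories" record lies outside the box
stopifnot(all(in_jervis_box[other_territories]))

# Relabel only records that pass both tests
obs$stateProvince[other_territories & in_jervis_box] <- "New South Wales"
print(table(obs$stateProvince))
```

On the real data, the same test on frogID_data would flag any such record lying outside the box instead of relabelling it silently.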
list_of_species_with_no_id_records <- setdiff(frog_names$scientificName, frogID_data$scientificName)
list_of_species_ided_without_a_name_match <- setdiff(frogID_data$scientificName, frog_names$scientificName)

cat("Some notes about cleaning the data\n")
cat(length(list_of_species_with_no_id_records), "/", length(unique(frog_names$scientificName)),
    "species from the list of frog names do not appear in the observation data set.\n")
cat("A look over these species shows some are extinct (Rheobatrachus silus), some are newly discovered (Philoria knowlesi), and some are considered rare and may not have been found during the survey period.\n")

# Observations whose scientificName has no match in frog_names
observations <- frogID_data %>%
  filter(scientificName %in% list_of_species_ided_without_a_name_match)
cat(length(list_of_species_ided_without_a_name_match),
    "species in the observation data do not appear in the frog_names data set, accounting for",
    nrow(observations), "/", nrow(frogID_data), "observations.\n")

# Share of the unmatched observations contributed by each species
quicksum <- observations %>%
  group_by(scientificName) %>%
  summarise(count = n()) %>%
  mutate(pc = count / sum(count) * 100)

cat("For our purposes these will be matched by common names.\n")

# Rename the six unmatched species to the names used in frog_names
frogID_data <- frogID_data %>%
  mutate(scientificName = recode(scientificName,
    "Limnodynastes dumerilii"  = "Limnodynastes dumerilii dumerilii",
    "Litoria verreauxii"       = "Litoria verreauxii verreauxii",
    "Cyclorana platycephala"   = "Cyclorana platycephalus",
    "Lechriodus fletcheri"     = "Platyplectrum fletcheri",
    "Philoria sphagnicola"     = "Philoria sphagnicolus",
    "Heleioporus australiacus" = "Heleioporus australiacus australiacus"
  ))

# "Other Territories" records lie around Jervis Bay; treat them as New South Wales
frogID_data <- frogID_data %>%
  mutate(stateProvince = ifelse(stateProvince == "Other Territories", "New South Wales", stateProvince))

# Drop name rows with any missing field, and the Uperoleia mjobergii row whose secondary name is "—"
frog_names <- frog_names %>%
  na.omit() %>%
  filter(scientificName != "Uperoleia mjobergii" | secondary_commonNames != "—")

# Re-check which observed names still lack a match after renaming
list_of_species_ided_without_a_name_match <- setdiff(frogID_data$scientificName, frog_names$scientificName)

# na.omit() after left_join() also drops species with no observations,
# so this behaves like an inner join that additionally removes incomplete rows
df <- left_join(frog_names, frogID_data, by = "scientificName") %>%
  na.omit()
#df <- df %>% mutate(across(where(is.character), as.factor))
Some notes about cleaning the data
113 / 293 species from the list of frog names do not appear in the observation data set.
A look over these species shows some are extinct (Rheobatrachus silus), some are newly discovered (Philoria knowlesi), and some are considered rare and may not have been found during the survey period.
6 species in the observation data do not appear in the frog_names data set, accounting for 9158 / 136621 observations.
For our purposes these will be matched by common names.
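The cleaning code hard-codes the six renames. A table-driven alternative in base R is sketched below. It keeps every rename pair in one named vector, using the six pairs from the cleaning code, and then re-checks for leftover unmatched names. The helper recode_names() and the sample vectors are hypothetical.

```r
# Lookup table: name used in the observation data -> name used in frog_names
name_fixes <- c(
  "Limnodynastes dumerilii"  = "Limnodynastes dumerilii dumerilii",
  "Litoria verreauxii"       = "Litoria verreauxii verreauxii",
  "Cyclorana platycephala"   = "Cyclorana platycephalus",
  "Lechriodus fletcheri"     = "Platyplectrum fletcheri",
  "Philoria sphagnicola"     = "Philoria sphagnicolus",
  "Heleioporus australiacus" = "Heleioporus australiacus australiacus"
)

# Hypothetical helper: replace any name found in the lookup table, leave others unchanged
recode_names <- function(x, fixes) {
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

# Hypothetical observed names and accepted names; the last observed name needs no change
observed <- c("Lechriodus fletcheri", "Litoria verreauxii", "Crinia signifera")
accepted <- c("Platyplectrum fletcheri", "Litoria verreauxii verreauxii", "Crinia signifera")

fixed <- recode_names(observed, name_fixes)
still_unmatched <- setdiff(fixed, accepted)
print(fixed)
print(still_unmatched)
```

Adding a future rename then means adding one pair to name_fixes rather than another mutate() call.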
1.2 Observation times
Observations are not spread evenly through the day. Two factors shape the pattern: the hours when people make observations and the hours when frogs are active. Latitude also seems to have an effect: low latitudes (closer to the equator) might be associated with lower activity in both frogs and people. The large spikes in observations at 10am and 8pm may reflect the times people actively look for frogs more than frog activity levels. The dip in observations around 6pm may correspond to the time many people have dinner.
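The plotting code bins each observation by time of day and by latitude. The base R sketch below shows the same idea with hypothetical values. Times are converted to seconds since midnight and floored into 300-second bins (86400 s / 288 bins = 5 minutes). Latitudes are split at quantiles into equal-count bands, similar in spirit to ggplot2::cut_number(). This is a hand-rolled illustration, not the algorithm used by time_cut().

```r
# Hypothetical clock times (HH:MM:SS) and latitudes for six observations
event_time <- c("19:58:10", "20:03:45", "20:04:59", "10:01:00", "18:31:20", "22:15:00")
latitude   <- c(-12.4, -19.3, -27.5, -33.9, -37.8, -42.9)

# Seconds since local midnight
parts <- do.call(rbind, strsplit(event_time, ":", fixed = TRUE))
secs  <- as.integer(parts[, 1]) * 3600 + as.integer(parts[, 2]) * 60 + as.integer(parts[, 3])

# 288 bins per day gives 300-second (5-minute) bins; keep the start of each bin
bin_width <- 86400 / 288
time_bin  <- floor(secs / bin_width) * bin_width

# Three equal-count latitude bands using quantile breaks
lat_bin <- cut(latitude,
               breaks = quantile(latitude, probs = seq(0, 1, length.out = 4)),
               include.lowest = TRUE)

print(table(time_bin))
print(table(lat_bin))
```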
lat_bin <- cut_number(df$decimalLatitude, n = 10)          # 10 latitude bands with roughly equal counts
time_bin <- time_cut(as.numeric(df$eventTime), n = 288)    # about 288 bins across the day (~5 minutes each)

time_of_day_plot <- ggplot(mapping = aes(x = time_bin, colour = lat_bin, fill = lat_bin)) +
  geom_bar(position = "stack") +
  labs(
    title = "Histogram of frog observations by time of day",
    subtitle = "Colour shows the latitude of the observation",
    fill = "Latitude\nSouth is Red\nNorth is Magenta",
    alt = "This is a chart showing increases in observations around 10am and 8pm. There is a small dip at 6:30pm.",
    x = "Time of day",
    y = "Number of observations"
  ) +
  # tz = "UTC" only stops format() from shifting the labels; the data are not converted
  scale_x_time(labels = function(x) format(as_datetime(x, tz = "UTC"), "%H:%M:%S")) +
  guides(colour = "none")
time_of_day_plot
time_of_day_plot <- ggplot(mapping = aes(x = time_bin, colour = lat_bin, fill = lat_bin)) +
  geom_bar(position = "fill") +   # each time bin is rescaled to a total of 1
  labs(
    title = "Proportion of frog observations at given latitudes, by time of day",
    subtitle = "Colour shows the latitude of the observation",
    fill = "Latitude\nSouth is Red\nNorth is Magenta",
    x = "Time of day",
    y = "Proportion of observations"
  ) +
  scale_x_time(labels = function(x) format(as_datetime(x, tz = "UTC"), "%H:%M:%S")) +
  guides(colour = "none")
time_of_day_plot
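The proportion chart uses position = "fill". This rescales each time bin to a total of 1, so the latitude mix can be compared across the day regardless of how many observations fall in each bin. The same proportions are the row proportions of a time-by-latitude contingency table. The base R sketch below uses hypothetical binned data.

```r
# Hypothetical binned data: time bin and latitude band for each observation
time_bin <- c("20:00", "20:00", "20:00", "20:00", "10:00", "10:00")
lat_bin  <- c("south", "south", "north", "south", "north", "north")

# Counts per time bin and latitude band (what position = "stack" draws)
counts <- table(time_bin, lat_bin)

# Row proportions: each time bin sums to 1 (what position = "fill" draws)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

Bins with only a handful of observations give noisy proportions, so the fill chart is best read alongside the stacked counts.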
Let’s compare scientificName with stateProvince for a coarse examination of whether species are highly endemic.
# Count the distinct states in which each species was observed
species_count_states <- df %>%
  distinct(scientificName, stateProvince, tribe, subfamily) %>%
  count(scientificName, tribe, subfamily, name = "Number_of_states")

p <- species_count_states %>%
  ggplot(mapping = aes(x = Number_of_states, fill = subfamily)) +
  geom_bar() +
  labs(title = "Number of States in which each species was observed",
       x = "Number of States", y = "Number of species")
p
Most species were observed in only one or two states. A sizable subfamily, Microhylidae, was observed only in Queensland.
Even a coarse examination of the data suggests frogs are generally endemic.
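The same per-species state count can be used to list the species recorded in just one state. These are the strongest candidates for narrow endemism at this coarse resolution. A base R sketch with hypothetical observation rows:

```r
# Hypothetical observation rows (one row per sighting)
obs <- data.frame(
  scientificName = c("Species A", "Species A", "Species A", "Species B", "Species C", "Species C"),
  stateProvince  = c("Queensland", "Queensland", "New South Wales", "Queensland", "Victoria", "Victoria")
)

# Drop repeat sightings so each species/state pair counts once
pairs <- unique(obs[, c("scientificName", "stateProvince")])

# Number of distinct states per species
n_states <- table(pairs$scientificName)

# Species seen in exactly one state
single_state <- names(n_states)[n_states == 1]
print(n_states)
print(single_state)
```

A single state can still cover a very large area, so this only flags candidates for the closer look at distributions.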
Let’s look closer at their distributions.
We need a little preparation first. To make a responsive map of observation locations, we will fetch the data for whichever species the user chooses. We will build this Leaflet map in JavaScript for extra flexibility.
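The preparation step itself is not shown in this section. One possible approach, sketched below, is to split the observations by species and write one small file per species. The JavaScript side then only downloads the points for the species a user selects. The output directory, the file-naming scheme and the use of CSV are hypothetical choices, not taken from the original.

```r
# Hypothetical observations with coordinates
obs <- data.frame(
  scientificName   = c("Species A", "Species A", "Species B"),
  decimalLatitude  = c(-27.47, -28.00, -33.87),
  decimalLongitude = c(153.03, 153.43, 151.21)
)

# Hypothetical output folder for the per-species files
out_dir <- file.path(tempdir(), "species_points")
dir.create(out_dir, showWarnings = FALSE)

# One data frame of coordinates per species
by_species <- split(obs[, c("decimalLatitude", "decimalLongitude")], obs$scientificName)

for (sp in names(by_species)) {
  # Hypothetical naming scheme: spaces replaced with underscores
  file_name <- paste0(gsub(" ", "_", sp), ".csv")
  write.csv(by_species[[sp]], file.path(out_dir, file_name), row.names = FALSE)
}

print(list.files(out_dir))
```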
1.3.2 Do different frog species have distinct calling seasons?
Below are three species found in similar areas, but the seasonal distribution of their sightings makes it clear that they call at different times of the year.
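Calling seasons can also be compared numerically. Each species' sightings are tabulated by calendar month and scaled so that each species' row sums to 1. The sketch below assumes eventDate holds ISO dates (YYYY-MM-DD); the sightings are hypothetical.

```r
# Hypothetical sightings: species and date of each call recording
obs <- data.frame(
  scientificName = c("Species A", "Species A", "Species A", "Species B", "Species B", "Species B"),
  eventDate      = c("2023-01-10", "2023-02-03", "2023-01-28", "2023-07-14", "2023-08-01", "2023-07-30")
)

# Calendar month of each sighting, as a factor so every month appears in the table
month <- factor(as.integer(format(as.Date(obs$eventDate), "%m")), levels = 1:12)

# Share of each species' sightings falling in each month (rows sum to 1)
seasonality <- prop.table(table(obs$scientificName, month), margin = 1)

# Month with the most sightings for each species
peak_month <- apply(seasonality, 1, which.max)
print(round(seasonality, 2))
print(peak_month)
```

As with time of day, these shares mix frog behaviour with observer effort. Differences between species in the same area are therefore more telling than the raw monthly totals.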