After looking through multiple datasets today, I decided we should look at one of the most important datasets in New York’s database: the 2018 Central Park Squirrel Census. Per New York City’s OpenData, “The Squirrel Census [1] is a multimedia science, design, and storytelling project focusing on the Eastern gray (Sciurus carolinensis). They count squirrels and present their findings to the public. This table contains squirrel data for each of the 3,023 sightings, including location coordinates, age, primary and secondary fur color, elevation, activities, communications, and interactions between squirrels and with humans.” [2]
Now that we have the most important data, let’s import it.
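library(tidyverse)  # read_csv(), dplyr, tidyr, ggplot2
library(here)       # here()
library(DT)         # datatable()
library(patchwork)  # combining ggplots
library(leaflet)    # interactive map
sq <- read_csv(here("_data", "2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv"))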
## Rows: 3023 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): Unique Squirrel ID, Hectare, Shift, Age, Primary Fur Color, Highli...
## dbl (4): X, Y, Date, Hectare Squirrel Number
## lgl (13): Running, Chasing, Climbing, Eating, Foraging, Kuks, Quaas, Moans, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(sq, class = 'table-bordered',
          caption = '2018 Central Park Squirrels',
          width = '100%',
          options = list(scrollX = TRUE, pageLength = 3, compact = TRUE))
As you can see, we have a lot of information on the squirrels. There are many ways to look at squirrel data, but I think I’ll focus on some of the “easier” data for now. There wasn’t much cleaning to do, so I just shortened the column names and made them all lowercase.
sq <- rename(sq,
  "long" = "X",
  "lat" = "Y",
  "unique.id" = "Unique Squirrel ID",
  "primary.color" = "Primary Fur Color",
  "highlight.color" = "Highlight Fur Color",
  "combo.primary.highlight.color" = "Combination of Primary and Highlight Color",
  "color.notes" = "Color notes",
  "above.ground.sight.measure" = "Above Ground Sighter Measurement",
  "specific.location" = "Specific Location",
  "other.activites" = "Other Activities",
  "tail.twitches" = "Tail twitches",
  "tail.flags" = "Tail flags",
  "runs.from" = "Runs from",
  "other.interactions" = "Other Interactions",
  "lat.long" = "Lat/Long",
  "hectare.num" = "Hectare Squirrel Number"
)
colnames(sq) <- tolower(colnames(sq))
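For anyone new to rename(), here is a minimal sketch of the same two cleaning steps on a toy table. The three columns and their values are made up for illustration, and rename_with() is swapped in as one possible alternative to assigning to colnames() directly.

```r
library(dplyr)

# Toy stand-in for the census table (made-up values, three columns only)
toy <- tibble(
  X = c(-73.96, -73.95),
  `Primary Fur Color` = c("Gray", "Cinnamon"),
  Shift = c("AM", "PM")
)

# rename() takes new_name = old_name pairs
toy <- rename(toy,
  long = X,
  primary.color = `Primary Fur Color`
)

# rename_with() applies a function to every column name
toy <- rename_with(toy, tolower)

colnames(toy)
```

Either approach to lowercasing should leave the same names; rename_with() just keeps the step inside a pipeline.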
I think every column here is interesting, but I would like to break the data down into more manageable tables. For the first table, I want to see what each squirrel was doing when it was spotted. The code below gathers the activity columns into one long table, with one row per squirrel per observed activity:
sq.spotting <- sq %>%
  select(unique.id, shift, age, running, chasing, climbing,
         eating, foraging, kuks, quaas, moans,
         tail.flags, tail.twitches, approaches, runs.from) %>%
  # one row per squirrel per activity flag
  pivot_longer(cols = -c(unique.id, shift, age),
               names_to = "activites",
               values_to = "T.F",
               values_drop_na = TRUE) %>%
  # keep only the activities that were actually observed
  filter(T.F) %>%
  mutate(age = replace_na(age, "?"))
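To make the reshaping step easier to follow, here is a minimal sketch on a toy table. The two squirrels and the column names activity and seen are made up for illustration; only the pivot-then-filter pattern mirrors the code above.

```r
library(dplyr)
library(tidyr)

# Toy data: two made-up squirrels with two activity flags each
toy <- tibble(
  unique.id = c("A-1", "B-2"),
  running   = c(TRUE, FALSE),
  eating    = c(TRUE, TRUE)
)

toy.long <- toy %>%
  # wide flags become one row per squirrel per activity
  pivot_longer(cols = -unique.id,
               names_to = "activity",
               values_to = "seen") %>%
  # keep only the activities that were observed
  filter(seen)

toy.long
```

Squirrel A-1 ends up with two rows and B-2 with one, which is why one sighting can show up in more than one bar of the plots.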
p1 <- ggplot(sq.spotting, aes(x = age, fill = activites)) +
  geom_bar(position = position_dodge()) +
  labs(title = "Squirrel Activities Count",
       caption = "Squirrels may do more than one activity at a time",
       x = "age", y = "count") +
  theme_minimal() +
  theme(legend.key.size = unit(3, "mm"), legend.key.width = unit(9, "mm"),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3")
p1
p2 <- ggplot(sq.spotting, aes(x = shift, fill = activites)) +
  geom_bar(position = position_dodge()) +
  labs(title = "Squirrel Activities during AM or PM",
       caption = "Squirrels may do more than one activity at a time",
       x = "AM or PM", y = "count") +
  theme_minimal() +
  theme(legend.key.size = unit(3, "mm"), legend.key.width = unit(9, "mm"),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3")
p2
p3 <- p1 + facet_wrap(~activites)
p4 <- p2 + facet_wrap(~activites)
p5 <- p3 + p4
p5 & theme(legend.position = "none")
On the x-axis, you will notice I have broken up the squirrels by age group: adult squirrels, juvenile squirrels, and “?”, which covers squirrels the researcher marked as unknown or for which no age was filled in.
As you can see, foraging was the most common activity, followed by eating. Interestingly, running and runs from are both high on the list as well. Since each squirrel may do more than one activity at a time, a single sighting can be counted in several bars, which may skew the results slightly.
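One way to gauge how much the overlap matters is to count how many activities each sighting contributes. This is only a sketch on made-up rows shaped like sq.spotting; the name n.activities is a hypothetical column I chose for the example.

```r
library(dplyr)

# Toy long-format table in the shape of sq.spotting (made-up rows)
toy.spotting <- tibble(
  unique.id = c("A-1", "A-1", "B-2", "C-3", "C-3", "C-3"),
  activites = c("running", "eating", "foraging",
                "running", "climbing", "foraging")
)

# Number of activities logged for each sighting
per.squirrel <- count(toy.spotting, unique.id, name = "n.activities")

# Share of sightings that appear in more than one bar
share.multi <- mean(per.squirrel$n.activities > 1)
share.multi
```

Running the same count on the real sq.spotting table would show how many sightings are represented more than once in the bar charts.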
Broken out by AM or PM, we see similar patterns throughout the day. Foraging and eating are the highest, while running and runs from seem to be slightly higher in the afternoon than in the morning. My guess is that more people are around in the afternoon, so squirrels have to be more on the lookout.
Interestingly, in both the age and the AM or PM breakdowns, tail flags and tail twitches are very low to non-existent. This may be because squirrels are used to humans, so they neither see the researchers as a threat nor are curious about them, and these behaviors may simply not happen while researchers are around.
I have also used facet_wrap() to look at the graphs more closely. For the last one I combined them with patchwork; unfortunately, the x-axis for ages became a bit messed up.
If you’re interested, below you will find a table of more specific comments on what the researchers saw the squirrels do, such as “runs from (dogs!)” and “made a back-door escape from dog off-leash”.
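If the x-axis problem is crowded age labels in the small facets (an assumption on my part, I have not confirmed that this is the cause), one possible remedy is to rotate the axis text. A minimal sketch on made-up data:

```r
library(ggplot2)

# Made-up rows; only the age and activites columns matter here
toy <- data.frame(
  age = c("Adult", "Juvenile", "?", "Adult"),
  activites = c("running", "running", "eating", "eating")
)

p <- ggplot(toy, aes(x = age)) +
  geom_bar() +
  facet_wrap(~activites) +
  theme_minimal() +
  # tilt the age labels so they do not overlap in narrow panels
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

Stacking the two plots vertically with patchwork's / operator instead of + is another option that gives each panel more width.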
sq.comments <- sq %>%
  select(other.activites, other.interactions) %>%
  # keep rows where both comment columns are filled in
  filter(if_all(c(other.activites, other.interactions), ~ !is.na(.x)))
datatable(sq.comments, class = 'table-bordered',
          caption = '2018 Central Park Squirrel Observations',
          width = '100%',
          options = list(scrollX = TRUE, pageLength = 5, compact = TRUE))
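The filter above only keeps rows where both comment columns are filled in. Here is a small sketch of how that differs from keeping rows where either column has a comment, using dplyr's if_all() and if_any(). The comments in the toy table are made up.

```r
library(dplyr)

# Made-up comments; NA means the researcher left the field blank
toy <- tibble(
  other.activites    = c("digging", NA, "grooming", NA),
  other.interactions = c("watches dog", "stares", NA, NA)
)

# Rows where BOTH columns have a comment
both <- filter(toy, if_all(everything(), ~ !is.na(.x)))

# Rows where AT LEAST ONE column has a comment
either <- filter(toy, if_any(everything(), ~ !is.na(.x)))

nrow(both)
nrow(either)
```

Which version is preferable depends on whether a comment in just one column is worth showing in the table.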
Let’s also look at the different colors of the squirrels.
sq.color <- sq %>%
  select(contains("color")) %>%
  # drop sightings with no recorded primary color
  filter(!is.na(primary.color))
p6 <- ggplot(sq.color, aes(x = primary.color, y = highlight.color,
                           color = combo.primary.highlight.color)) +
  geom_point() +
  labs(title = "Squirrel Fur Colors",
       caption = "Squirrels may have more than one highlight color",
       x = "Primary Color", y = "Highlight Color",
       color = "Combos") +
  theme_minimal() +
  theme(legend.key.size = unit(.5, "mm"), legend.key.width = unit(.5, "mm"),
        legend.position = "none")
p6
Here we can see a basic visualization of the color combinations a squirrel’s fur can have. The primary color is the main color, and the highlight color is any secondary color in the fur. An NA for highlight color means the squirrel did not have a highlight color or the researcher did not see the squirrel long enough to record one.
Looking at the plot, I can tell that in 2018 the researchers saw gray as the primary color paired with the widest range of highlight colors, followed by cinnamon. This is not surprising, since the census focuses on the Eastern gray squirrel.
Finally, let’s focus on where the squirrels were found.
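The same comparison can be made with numbers rather than by eye. Below is a sketch that counts distinct highlight colors per primary color on a toy table; the rows are made up and n.highlights is a hypothetical column name.

```r
library(dplyr)

# Made-up color pairs in the shape of sq.color
toy.color <- tibble(
  primary.color   = c("Gray", "Gray", "Gray", "Cinnamon", "Cinnamon", "Black"),
  highlight.color = c("White", "Cinnamon", NA, "Gray", "Gray", NA)
)

res <- toy.color %>%
  # ignore sightings without a recorded highlight
  filter(!is.na(highlight.color)) %>%
  group_by(primary.color) %>%
  summarise(n.highlights = n_distinct(highlight.color)) %>%
  arrange(desc(n.highlights))

res
```

Note that a primary color with no recorded highlights at all drops out of this summary entirely.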
sq.gps <- sq %>%
  select(long, lat)
leaflet(sq.gps) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircles() %>%
  addMarkers(lng = -73.96743, lat = 40.78297, popup = "The first squirrel data point of 2018") %>%
  addMarkers(lng = -73.95467, lat = 40.79476, popup = "The last squirrel data point of 2018")
As you can see if you zoom in, the researchers were thorough, and squirrels were seen almost everywhere throughout the park. The places where squirrels were mostly not spotted were either water or meadows. For this I only used the Leaflet package, as all of my data is centralized to Central Park (pun), but there are some nice ideas for combining it with the Maps package if that would be helpful!