Picking the Data

After looking through multiple datasets today, I decided we should look at one of the most important datasets in New York’s database: the 2018 Central Park Squirrel Census. Per New York City’s OpenData, “The Squirrel Census [1] is a multimedia science, design, and storytelling project focusing on the Eastern gray (Sciurus carolinensis). They count squirrels and present their findings to the public. This table contains squirrel data for each of the 3,023 sightings, including location coordinates, age, primary and secondary fur color, elevation, activities, communications, and interactions between squirrels and with humans.” [2]

Data

Importing Data

Now that we have the most important data, let’s import it.

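# Packages used throughout this post
library(tidyverse)  # readr, dplyr, tidyr, ggplot2
library(here)       # project-relative file paths
library(DT)         # datatable()
library(patchwork)  # combining ggplots
library(leaflet)    # interactive maps

sq <- read_csv(here("_data", "2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv"))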
## Rows: 3023 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): Unique Squirrel ID, Hectare, Shift, Age, Primary Fur Color, Highli...
## dbl  (4): X, Y, Date, Hectare Squirrel Number
## lgl (13): Running, Chasing, Climbing, Eating, Foraging, Kuks, Quaas, Moans, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(sq,
          class = 'table-bordered',
          caption = '2018 Central Park Squirrels',
          width = '100%',
          options = list(scrollX = TRUE, pageLength = 3, compact = TRUE))

As you can see, we have a lot of information on the squirrels. There are many ways to look at squirrel data, but I think I’ll focus on some of the “easier” data for now. Since there wasn’t much cleaning to do, I just shortened the column names and made them all lowercase.

Cleaning the Data

sq <- rename(sq,
             "long" = "X",
             "lat" = "Y",
             "unique.id" = "Unique Squirrel ID",
             "primary.color" = "Primary Fur Color",
             "highlight.color" = "Highlight Fur Color",
             "combo.primary.highlight.color" = "Combination of Primary and Highlight Color",
             "color.notes" = "Color notes",
             "above.ground.sight.measure" = "Above Ground Sighter Measurement",
             "specific.location" = "Specific Location",
             "other.activites" = "Other Activities",
             "tail.twitches" = "Tail twitches",
             "tail.flags" = "Tail flags",
             "runs.from" = "Runs from",
             "other.interactions" = "Other Interactions",
             "lat.long" = "Lat/Long",
             "hectare.num" = "Hectare Squirrel Number"
             )

colnames(sq) <- tolower(colnames(sq))
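
When shorter hand-picked names are not needed, a single rule applied to every column can do the same kind of cleanup. The following is only a sketch on an invented one-row table (`toy` and its values are made up for illustration), and it produces slightly different names than the manual rename above, for example `unique.squirrel.id` rather than `unique.id`.

```r
library(dplyr)

# Hypothetical one-row table that mimics a few of the raw column names
toy <- tibble(
  `Unique Squirrel ID` = "A-1",
  `Lat/Long`           = "POINT (0 0)",
  Age                  = "Adult"
)

clean <- toy %>%
  # Replace each run of non-alphanumeric characters with a dot, then lowercase
  rename_with(~ tolower(gsub("[^A-Za-z0-9]+", ".", .x)))

names(clean)
# "unique.squirrel.id" "lat.long" "age"
```

The manual rename is still the better choice when the goal is short, readable names, since a rule like this keeps every word of the original header.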

Separating the Data

Spot that Squirrel!

I think every part of the data is interesting, but I would like to break it down into more manageable tables. For the first table, I want to see what each squirrel was doing when it was spotted. Per the data, here is what each activity means:

  • Running - Squirrel was seen running.
  • Chasing - Squirrel was seen chasing another squirrel.
  • Climbing - Squirrel was seen climbing a tree or other environmental landmark.
  • Eating - Squirrel was seen eating.
  • Foraging - Squirrel was seen foraging for food.
  • Kuks - Squirrel was heard kukking, a chirpy vocal communication used for a variety of reasons.
  • Quaas - Squirrel was heard quaaing, an elongated vocal communication which can indicate the presence of a ground predator such as a dog.
  • Moans - Squirrel was heard moaning, a high-pitched vocal communication which can indicate the presence of an air predator such as a hawk.
  • Tail Flags - Squirrel was seen flagging its tail. Flagging is a whipping motion used to exaggerate squirrel’s size and confuse rivals or predators. Looks as if the squirrel is scribbling with tail into the air.
  • Tail Twitches - Squirrel was seen twitching its tail. Looks like a wave running through the tail, like a breakdancer doing the arm wave. Often used to communicate interest, curiosity.
  • Approaches - Squirrel was seen approaching human, seeking food.
  • Indifferent - Squirrel was indifferent to human presence.
  • Runs From - Squirrel was seen running from humans, seeing them as a threat.

sq.spotting <- sq %>%
  select(unique.id, shift, age, running, chasing, climbing,
         eating, foraging, kuks, quaas, moans,
         tail.flags, tail.twitches, approaches, runs.from) %>%
  # Reshape to one row per squirrel-activity pair; T.F says whether it was observed
  pivot_longer(cols = -c(unique.id, shift, age),
               names_to = "activites",
               values_to = "T.F",
               values_drop_na = TRUE) %>%
  # Keep only the activities that were actually observed
  filter(T.F) %>%
  # Treat a missing age the same as the researchers' "?" (unknown)
  mutate(age = replace_na(age, "?"))
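
To make the reshaping step concrete, here is a minimal sketch of the same wide-to-long idea on a made-up three-row table. The IDs, values, and the names `toy`, `activity`, and `observed` are invented for illustration and are not taken from the census.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three sightings with two activity flags each
toy <- tibble(
  unique.id = c("A-1", "B-2", "C-3"),
  running   = c(TRUE, FALSE, TRUE),
  eating    = c(FALSE, FALSE, TRUE)
)

toy.long <- toy %>%
  # Wide to long: one row per sighting-activity pair
  pivot_longer(cols = -unique.id,
               names_to = "activity",
               values_to = "observed") %>%
  # Drop the pairs that were not observed
  filter(observed)

toy.long
```

Sighting `C-3` ends up with two rows because it was doing two things, while `B-2` drops out entirely because none of its flags were TRUE. The same happens in the real table to any sighting with no recorded activity.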

p1 <- ggplot(sq.spotting, aes(x = age, fill = activites)) +
  geom_bar(position = position_dodge()) +
  labs(title = "Squirrel Activities Count",
       caption = "Squirrels may do more than one activity at a time",
       x = "age", y = "count") +
  theme_minimal() +
  theme(legend.key.size = unit(3, "mm"), legend.key.width = unit(9, "mm"),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3")

p1

p2 <- ggplot(sq.spotting, aes(x = shift, fill = activites)) +
  geom_bar(position = position_dodge()) +
  labs(title = "Squirrel Activities during AM or PM",
       caption = "Squirrels may do more than one activity at a time",
       x = "AM or PM", y = "count") +
  theme_minimal() +
  theme(legend.key.size = unit(3, "mm"), legend.key.width = unit(9, "mm"),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3")

p2

p3 <- p1 + facet_wrap(~activites) 

p4 <- p2 + facet_wrap(~activites)

p5 <- p3 + p4

p5 &  theme(legend.position = "none")

On the x-axis you will notice I have broken up the squirrels by age group. There are adult squirrels, juvenile squirrels, and “?”, which covers squirrels that the researcher marked as unknown or for which no age was filled in.

As you can see, squirrels were most often foraging, followed by eating. Interestingly, running and runs from are both high on the list as well. Since each squirrel may do more than one activity at a time, a single sighting can be counted under several activities, which may skew the results slightly.
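
One way to gauge how much this overlap matters is to count how many activities were recorded per sighting. The sketch below uses invented rows (`toy.spotting` is hypothetical; only the column names mirror the long table built earlier).

```r
library(dplyr)

# Hypothetical long-format data: one row per sighting-activity pair
toy.spotting <- tibble(
  unique.id = c("A-1", "C-3", "C-3", "D-4", "D-4", "D-4"),
  activites = c("running", "running", "eating",
                "foraging", "eating", "climbing")
)

# Number of activities recorded for each sighting
per.sighting <- toy.spotting %>%
  count(unique.id, name = "n.activities")

# How many sightings had 1, 2, 3, ... activities
overlap <- per.sighting %>%
  count(n.activities, name = "n.sightings")

overlap
```

If most sightings have a single activity, the bar heights are close to squirrel counts; if many have two or three, the bars are better read as counts of observed behaviors.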

For AM or PM we can see similar patterns throughout the day. Foraging and eating are the highest, while running and runs from seem to be slightly higher in the afternoon than in the morning. This may be because more people are around in the afternoon and the squirrels have to be more on the lookout.
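
Raw counts can differ between AM and PM simply because one shift has more recorded rows overall, so comparing each activity's share within its own shift is one way to check the pattern. This is a sketch on made-up counts (`toy.shift` is hypothetical), not a result from the census.

```r
library(dplyr)

# Hypothetical data in which PM has twice as many rows as AM
toy.shift <- tibble(
  shift     = c("AM", "AM", "AM",
                "PM", "PM", "PM", "PM", "PM", "PM"),
  activites = c("foraging", "foraging", "running",
                "foraging", "foraging", "foraging", "foraging",
                "running", "running")
)

shift.share <- toy.shift %>%
  count(shift, activites) %>%
  group_by(shift) %>%
  # Share of each activity within its own shift
  mutate(share = n / sum(n)) %>%
  ungroup()

shift.share
```

In this toy table running is seen twice in the PM and once in the AM, yet it makes up one third of the rows in both shifts, so the higher PM count reflects more observations rather than a change in behavior.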

Interestingly, in both the age and the AM or PM breakdowns, tail flags and tail twitches are very low to non-existent. This may be because these behaviors are less likely when the researchers are around: squirrels that are used to humans may neither see them as a threat nor be curious about them.

I have also used facet_wrap() to look at the graphs more closely. For the last one I combined them with patchwork; unfortunately, the x-axis for ages became a bit messed up.

If you’re interested, below you will find a table of the more specific comments the researchers recorded about what they saw the squirrels do, such as “runs from (dogs!)” and “made a back-door escape from dog off-leash”.

sq.comments <- sq %>%
  select(other.activites, other.interactions) %>%
  # Keep sightings with a comment in at least one of the two columns
  # (assumed intent; use if_all() instead to require a comment in both)
  filter(if_any(c(other.activites, other.interactions), ~ !is.na(.x)))

datatable(sq.comments,
          class = 'table-bordered',
          caption = '2018 Central Park Squirrel Observations',
          width = '100%',
          options = list(scrollX = TRUE, pageLength = 5, compact = TRUE))
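
When filtering out missing values across two comment columns, it matters whether a row needs a value in either column or in both. Here is a small sketch with dplyr's if_any() and if_all() on invented rows (`toy.comments` and most of its values are made up for illustration).

```r
library(dplyr)

# Hypothetical comments: some rows have one comment, both, or none
toy.comments <- tibble(
  other.activites    = c("digging", NA, NA, "grooming"),
  other.interactions = c(NA, "runs from (dogs!)", NA, "watching")
)

# Rows with a comment in at least one column
either <- toy.comments %>%
  filter(if_any(everything(), ~ !is.na(.x)))

# Rows with a comment in both columns
both <- toy.comments %>%
  filter(if_all(everything(), ~ !is.na(.x)))

nrow(either)  # 3
nrow(both)    # 1
```

The "either" version keeps more of the researchers' notes, at the cost of showing NA in one of the two columns for many rows.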

The Most Colorful Squirrel

Let’s also look at the different colors of the squirrels.

sq.color <- sq %>%
  select(contains("color")) %>%
  # Keep only sightings with a recorded primary fur color
  filter(!is.na(primary.color))

p6 <- ggplot(sq.color, aes(x = primary.color, y = highlight.color,
                           color = combo.primary.highlight.color)) +
  geom_point() +
  labs(title = "Squirrel's Fur Colors",
       caption = "Squirrels may have more than one highlight color",
       x = "Primary Color", y = "Highlight Color",
       color = "Combos") +
  theme_minimal() +
  theme(legend.key.size = unit(.5, "mm"), legend.key.width = unit(.5, "mm"),
        legend.position = "none")

p6

Here we can see a basic visualization of the fur color combinations a squirrel can have. The primary color is the main color, and the highlight color is any additional color in the fur. An NA for highlight color means the squirrel did not have a highlight color or the researcher did not see the squirrel long enough.

After looking at the plot, I can tell that in 2018 the researchers saw gray squirrels with the widest variety of highlight colors, followed by cinnamon squirrels. This is not surprising, since the census focuses on the Eastern gray squirrel.
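
The same comparison can be read off as numbers rather than from the plot by counting distinct highlight colors per primary color. This is a minimal sketch on invented rows (`toy.color` is hypothetical and far simpler than the real color columns).

```r
library(dplyr)

# Hypothetical fur colors for six sightings
toy.color <- tibble(
  primary.color   = c("Gray", "Gray", "Gray", "Cinnamon", "Cinnamon", "Black"),
  highlight.color = c("White", "Cinnamon", "White", "Gray", NA, NA)
)

highlight.variety <- toy.color %>%
  # Ignore sightings without a recorded highlight color
  filter(!is.na(highlight.color)) %>%
  group_by(primary.color) %>%
  # Number of different highlight colors seen with each primary color
  summarise(n.highlights = n_distinct(highlight.color)) %>%
  arrange(desc(n.highlights))

highlight.variety
```

In the toy table Gray comes first with two distinct highlight colors, and Black does not appear at all because it never has a highlight recorded.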

Squirrel GPS

Finally, let’s focus on where squirrels were found.

sq.gps <- sq %>%
  select(long, lat)

leaflet(sq.gps) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircles() %>%
  addMarkers(lng = -73.96743, lat = 40.78297, popup = "The first squirrel data point of 2018") %>%
  addMarkers(lng = -73.95467, lat = 40.79476, popup = "The last squirrel data point of 2018")

If you zoom in, you can see that the researchers were thorough and squirrels were seen almost everywhere throughout the park. The places where squirrels were mostly not spotted were either water or meadows. For this I only used the Leaflet package, as all of my data is centralized to Central Park (pun); however, there are some nice ideas for using it with the Maps package if that would be helpful!