Overview of the Project and Background

The dataset I chose to analyze covers squirrels in New York City's Central Park and comes from the 2018 Central Park Squirrel Census.

This dataset includes both physical and behavioral characteristics of observed squirrels within the park. Additional emphasis is placed on coordinates and time-of-day observations.

In this report, I focus on variables related to fur color and age, observed behaviors (running, climbing, foraging, etc.), coordinates (X/Y), shift (AM/PM), and location (ground vs above ground). To explore these relationships, I use basic descriptive statistics and five visualizations that connect squirrel behavior to geography and observation context within Central Park.

The purpose of this analysis is to provide an ecological perspective on some of the other inhabitants of one of the most populous cities in the United States. Describing behaviors and the relationships between them allows for a deeper understanding of the general stability of the park's squirrel population, as well as whether there are any major differences between population groups. Behavior within a hyper-urban environment is particularly interesting because “natural” stimuli are minimized while interactions with humans are heightened.

Libraries and Data Import

library(lubridate)
library(scales)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
library(leaflet)
library(tidyr)

nycsquirrels_raw <- read.csv(
  "/Users/loganvarra/Downloads/2018_Central_Park_Squirrel_Census_-_Squirrel_Data_20260203.csv",
  stringsAsFactors = FALSE
)

# Quick structure checks
dim(nycsquirrels_raw)

The dataset contains 3,023 observations across 31 variables.

I imported the dataset from the publicly available 2018 Central Park Squirrel Census. Data manipulation packages such as dplyr and tidyr support the data wrangling, lubridate handles the date conversion, and ggplot2 and leaflet produce the visualizations. These basic structural checks helped establish the scope of the data.
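The structural checks can be extended to count missing values per column. One detail matters for this file: read.csv() reads an empty field in a text column as an empty string "", not NA, so is.na() alone can miss blank entries. The sketch below counts both on a small inline CSV; the column names mirror the census file, but the values are invented, and count_missing_or_blank() is a hypothetical helper.

```r
# Hypothetical helper: count values that are NA or blank ("" after trimming)
# in each column of a data frame.
count_missing_or_blank <- function(df) {
  sapply(df, function(col) sum(is.na(col) | trimws(as.character(col)) == ""))
}

# Toy CSV with the same column names as the census file (values invented).
csv_text <- "X,Y,Age,Primary.Fur.Color
-73.95,40.79,Adult,Gray
-73.96,40.78,,Cinnamon
-73.97,40.77,Juvenile,
-73.94,NA,Adult,Black"

toy_raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Blank text fields are read as "", so is.na() does not flag them.
sum(is.na(toy_raw$Age))            # 0
count_missing_or_blank(toy_raw)    # X = 0, Y = 1, Age = 1, Primary.Fur.Color = 1
```

On the real data, the same call would be count_missing_or_blank(nycsquirrels_raw).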

Cleaning and Preparing the Data for Analysis

# Convert Date from Numeric to Date Type
nycsquirrels_raw$Date <- mdy(as.character(nycsquirrels_raw$Date))

# Remove sightings with missing coordinates, fur color, or age, and store Age and Primary.Fur.Color as factors
nycsquirrels_clean <- nycsquirrels_raw %>%
  filter(
    !is.na(X),
    !is.na(Y),
    !is.na(Primary.Fur.Color),
    !is.na(Age)
  ) %>%
  mutate(
    Age = as.factor(Age),
    Primary.Fur.Color = as.factor(Primary.Fur.Color)
  )

#Storing Age and Primary Fur Color as factors (above) keeps them categorical, with a fixed set of
#levels that ggplot2 and leaflet use for grouping and color mapping. Next, keep only the expected
#categories. This also removes blank entries, which read.csv keeps as "" rather than NA.
nycsquirrels_clean <- nycsquirrels_clean %>%
  filter(Age %in% c("Adult", "Juvenile"))
nycsquirrels_clean$Age <- droplevels(nycsquirrels_clean$Age)

nycsquirrels_clean <- nycsquirrels_clean %>%
  filter(Primary.Fur.Color %in% c("Black", "Cinnamon", "Gray"))
nycsquirrels_clean$Primary.Fur.Color <- droplevels(nycsquirrels_clean$Primary.Fur.Color)

levels(nycsquirrels_clean$Age)
levels(nycsquirrels_clean$Primary.Fur.Color)

# Behavior columns used throughout visuals 2–5
behavior_cols <- c(
  "Running", "Chasing", "Climbing", "Eating", "Foraging",
  "Kuks", "Quaas", "Moans", "Tail.flags", "Tail.twitches",
  "Approaches", "Indifferent", "Runs.from"
)

# Create behavior_count = number of TRUE behaviors per sighting
nycsquirrels_clean <- nycsquirrels_clean %>%
  mutate(
    behavior_count = rowSums(
      sapply(select(., all_of(behavior_cols)), \(x) tolower(as.character(x)) == "true"),
      na.rm = TRUE
    )
  )

#General summary check
summary(nycsquirrels_clean$behavior_count)
table(nycsquirrels_clean$behavior_count)

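Before creating visuals, it was crucial to clean the data, because the coordinate and demographic fields needed to be checked for missing or unusable values. Sightings without coordinates, fur color, or age were removed, and only the expected categories were kept (Adult and Juvenile for age; Black, Cinnamon, and Gray for primary fur color). Age and primary fur color were also stored as factors so that they behave as categorical variables. Finally, a new variable, behavior_count, was created by summing the number of behaviors recorded as true for each sighting.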
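The behavior_count step can be checked on a few made-up rows before trusting it on the full data. The comparison tolower(as.character(x)) == "true" treats a logical TRUE and the strings "true" or "TRUE" the same way, and na.rm = TRUE counts a missing value as zero. This sketch uses dplyr's across() instead of sapply(), an equivalent alternative rather than the report's exact code; the toy columns and values are invented. The last line applies the same kind of check to the mdy() date conversion, assuming Date values stored as MMDDYYYY numbers.

```r
library(dplyr)
library(lubridate)

# Invented rows mixing logical and text encodings of true/false, plus an NA.
toy <- data.frame(
  Running  = c(TRUE, FALSE, NA),
  Foraging = c("true", "true", "false"),
  Eating   = c("false", "TRUE", "true"),
  stringsAsFactors = FALSE
)
toy_behaviors <- c("Running", "Foraging", "Eating")

toy <- toy %>%
  mutate(
    # Each behavior becomes TRUE/FALSE/NA; rowSums() counts the TRUEs per row.
    behavior_count = rowSums(
      across(all_of(toy_behaviors), ~ tolower(as.character(.x)) == "true"),
      na.rm = TRUE
    )
  )
toy$behavior_count   # 2 2 1

# Date check: a numeric MMDDYYYY value parsed with mdy().
mdy(as.character(10142018))   # "2018-10-14"
```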

Visualizations

Leaflet Map Showing Squirrel Sightings in Central Park

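# colorFactor() assigns palette colors to the factor levels in order. The levels are
# alphabetical (Black, Cinnamon, Gray), so the colors are listed in that order.
pal <- colorFactor(
  palette = c("black", "saddlebrown", "gray60"),
  domain = nycsquirrels_clean$Primary.Fur.Color
)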
visual1 <- leaflet(nycsquirrels_clean) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    lng = ~X,
    lat = ~Y,
    fillColor = ~pal(Primary.Fur.Color),
    color = ~pal(Primary.Fur.Color),
    radius = 3,
    stroke = FALSE,
    fillOpacity = 0.6
  ) %>%
  addLegend(
    "bottomright",
    pal = pal,
    values = ~Primary.Fur.Color,
    title = "Primary Fur Color"
  )

visual1
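leaflet's colorFactor() matches a palette vector to the factor's levels in order, and factor() sorts levels alphabetically by default (Black, Cinnamon, Gray), so a hand-ordered palette must follow that order. One way to make the mapping explicit is a named lookup vector. fur_colors and palette_in_level_order are hypothetical names; the sketch uses only base R to show the ordering and leaves the colorFactor() call as a comment.

```r
# Levels are sorted alphabetically when a factor is created.
fur <- factor(c("Gray", "Cinnamon", "Black", "Gray"))
levels(fur)   # "Black" "Cinnamon" "Gray"

# Hypothetical named lookup: each color is tied to its fur-color level by name,
# so the order it is written in no longer matters.
fur_colors <- c(Gray = "gray60", Cinnamon = "saddlebrown", Black = "black")

# Reorder the colors to follow the factor levels.
palette_in_level_order <- unname(fur_colors[levels(fur)])
palette_in_level_order   # "black" "saddlebrown" "gray60"

# In the map, this vector would replace the hand-ordered palette:
# pal <- colorFactor(palette = palette_in_level_order,
#                    domain = nycsquirrels_clean$Primary.Fur.Color)
```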

Visual 1 Analysis

This map shows squirrel sightings throughout Central Park in New York, NY. Each circle marker represents an individual sighting and is colored by the squirrel's primary fur color. Gray squirrels appear to be the most prevalent across the majority of the park. Sightings appear densest in areas with frequent cross-park access, such as walkways and nearby bus routes, and this holds for cinnamon and black squirrels as well as gray ones. On its own, however, the map cannot show whether squirrels naturally congregate in these areas or whether more sightings simply reflect more foot traffic, and therefore more observers; the pattern is a correlation, not evidence of a cause. Overall, the map is a helpful first view of how squirrels are distributed across the park.

Boxplot of Age and Observed Behavior Count

#I will now create a boxplot to describe the relationship between age and behavior.
#Color was also changed for personal preference and theme. 
visual2 <- ggplot(nycsquirrels_clean, aes(x = Age, y = behavior_count, fill = Age)) +
  geom_boxplot(outlier.alpha = 0.3, linewidth = 0.8) +
  scale_fill_manual(values = c("Adult" = "#355E3B", "Juvenile" = "#8B5A2B")) +
  labs(
    title = "Squirrel Behavior Count by Age Group",
    subtitle = "Behavior count is the number of observed behaviors",
    x = "Age Group",
    y = "Behavior Count"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  )

visual2

Visual 2 Analysis

This boxplot compares the number of observed behaviors between the two age groups in the dataset: adults and juveniles. The two groups have similar median behavior counts, which suggests no clear difference in overall activity level. The adult group shows more outliers, which could support the hypothesis that adults display a broader range of behaviors; however, a larger group produces more extreme values by chance alone, so this should be weighed against the number of sightings in each group. Overall, there is no substantial difference between the two groups.
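Group sizes, medians, and outlier counts can be tabulated directly, which makes the outlier comparison easier to weigh. The sketch counts outliers with the same rule geom_boxplot() uses to draw them (points beyond 1.5 times the IQR from the quartiles, with quantile()'s default type 7). count_outliers() and summarise_behavior() are hypothetical helpers, and the toy values are invented.

```r
library(dplyr)

# Count values outside the 1.5 * IQR whiskers, the rule geom_boxplot() uses
# to mark outlier points.
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[[2]] - q[[1]])
  sum(x < q[[1]] - fence | x > q[[2]] + fence, na.rm = TRUE)
}

# Hypothetical helper: group size, median, and outlier count of behavior_count.
summarise_behavior <- function(df, group_col) {
  df %>%
    group_by(.data[[group_col]]) %>%
    summarise(
      n = n(),
      median_count = median(behavior_count),
      n_outliers = count_outliers(behavior_count),
      .groups = "drop"
    )
}

# Invented values for illustration only; not census data.
toy <- data.frame(
  Age = c(rep("Adult", 6), rep("Juvenile", 3)),
  behavior_count = c(1, 2, 2, 2, 3, 7, 1, 2, 3)
)
summarise_behavior(toy, "Age")
```

With the cleaned data, summarise_behavior(nycsquirrels_clean, "Age") would report each age group's size next to its outlier count.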

Density Plot of Behavior Counts Throughout the Day

visual3 <- nycsquirrels_clean %>%
  filter(Shift %in% c("AM", "PM")) %>%
  ggplot(aes(x = behavior_count, fill = Shift)) +
  geom_density(
    alpha = 0.45,
    adjust = 1.2,
    color = NA
  ) +
  scale_fill_manual(values = c("AM" = "#355E3B", "PM" = "#8B5A2B")) +
  scale_x_continuous(breaks = 0:8) +
  labs(
    title = "Distribution of Behavior Counts by Observation Shift",
    subtitle = "Comparing activity levels between AM and PM observations",
    x = "Behavior Count",
    y = "Density",
    fill = "AM or PM"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right",
    panel.grid.minor = element_blank()
  )

visual3

Visual 3 Analysis

So far, I have examined where squirrels were sighted and how behavior counts vary by age, but not yet by time of day. New York City is known as the “City that Never Sleeps,” so I was curious whether that also applied to our rodent companions. Both shifts (AM and PM) peak at 2 behaviors, which suggests a similar level of activity in the two time periods. Overall, time of day does not appear to meaningfully influence the number of behaviors observed per sighting. One possible explanation is that urban squirrels have adapted to steady human activity throughout the day, including heavy afternoon and evening foot traffic from NYC residents, rather than concentrating their activity in one part of the day.
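Because behavior_count only takes whole-number values, a density curve spreads probability over values in between (such as 2.5 behaviors). A table of the share of sightings at each count within each shift gives the same AM/PM comparison without smoothing. shift_count_props() is a hypothetical helper, and the toy rows are invented; on the real data it would be called as shift_count_props(nycsquirrels_clean).

```r
library(dplyr)

# Hypothetical helper: share of sightings at each behavior count within each shift.
shift_count_props <- function(df) {
  df %>%
    filter(Shift %in% c("AM", "PM")) %>%
    count(Shift, behavior_count) %>%   # sightings per (shift, count) pair
    group_by(Shift) %>%
    mutate(prop = n / sum(n)) %>%      # proportions sum to 1 within a shift
    ungroup()
}

# Invented values for illustration only; not census data.
toy <- data.frame(
  Shift = c("AM", "AM", "AM", "AM", "PM", "PM", "PM", "PM"),
  behavior_count = c(1, 2, 2, 3, 2, 2, 2, 4)
)
shift_count_props(toy)
```

If a formal comparison were wanted, a rank-based test such as wilcox.test(behavior_count ~ Shift, data = ...) is one option; with many tied counts it reports an approximate p-value.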

Heatmap of Behavior Prevalence by Location

#First, convert the behavior columns to TRUE/FALSE so the percentage of sightings showing each behavior can be computed by location.
heat_df <- nycsquirrels_clean %>%
  filter(Location %in% c("Ground Plane", "Above Ground")) %>%   # the two location categories compared here
  mutate(across(all_of(behavior_cols), ~tolower(as.character(.x)) == "true")) %>%
  pivot_longer(cols = all_of(behavior_cols), names_to = "Behavior", values_to = "Observed") %>%
  group_by(Location, Behavior) %>%
  summarise(pct = mean(Observed, na.rm = TRUE) * 100, .groups = "drop")
#Ordering behaviors
heat_df$Behavior <- factor(heat_df$Behavior, levels = behavior_cols)
visual4 <- ggplot(heat_df, aes(x = Behavior, y = Location, fill = pct)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(pct, 0)), size = 3) +
  scale_fill_gradient(low = "#F7FCF5", high = "#00441B", limits = c(0, 70)) +
  labs(
    title = "Behavior Prevalence by Location",
    subtitle = "Percentage of sightings in which behavior was observed",
    x = "",
    y = "",
    fill = "% Observed"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid = element_blank()
  )

visual4

Visual 4 Analysis

Eating and foraging appear to be more common on the ground than in above-ground sightings. Running is more evenly split between ground-level and above-ground observations, which may indicate that both areas offer comparable space for running. Vocalizations (kuks, quaas, and moans) and non-verbal signals such as tail flags vary only slightly between locations. Overall, this suggests that being on the ground versus above ground is more strongly associated with survival behaviors, such as feeding, than with social behaviors.
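The heatmap comparison can also be reduced to a single number per behavior: the percentage-point gap between ground-plane and above-ground sightings. Sorting by the absolute gap puts the behaviors that differ most between locations first, which makes a claim such as "eating and foraging are more common on the ground" easy to check. location_gaps() is a hypothetical helper that expects the long Location/Behavior/pct layout of heat_df; the toy percentages are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: percentage-point gap between ground and above-ground
# sightings for each behavior, largest absolute differences first.
location_gaps <- function(heat_df) {
  heat_df %>%
    pivot_wider(names_from = Location, values_from = pct) %>%
    mutate(gap = `Ground Plane` - `Above Ground`) %>%
    arrange(desc(abs(gap)))
}

# Invented percentages in the same long format as heat_df; not census data.
toy_heat <- data.frame(
  Location = rep(c("Above Ground", "Ground Plane"), each = 3),
  Behavior = rep(c("Foraging", "Running", "Kuks"), times = 2),
  pct = c(20, 25, 3, 60, 22, 4)
)
location_gaps(toy_heat)
```

On the real data, location_gaps(heat_df) would list the thirteen behaviors in order of how strongly they differ by location.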

Donut Chart of the Top Three Squirrel Behaviors

#For the last visual, we will be creating a donut chart summarizing behavior composition
#First, compute the behavior percentages across the data
behavior_overall <- nycsquirrels_clean %>%
  mutate(across(all_of(behavior_cols),
                ~tolower(as.character(.x)) == "true")) %>%
  summarise(across(all_of(behavior_cols),
                   ~mean(.x, na.rm = TRUE) * 100)) %>%
  #pivot_longer reshapes the one-row summary into one row per behavior, with its percentage in the Percent column
  pivot_longer(cols = everything(),
               names_to = "Behavior",
               values_to = "Percent") %>%
  arrange(desc(Percent))
#Taking the top three behaviors
top_behaviors <- behavior_overall %>%
  slice_max(Percent, n = 3)
#creating the visual
visual5 <- ggplot(top_behaviors,
                  aes(x = 2, y = Percent, fill = Behavior)) +
  geom_col(color = "white", width = 1) +
  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +
  scale_fill_manual(
    values = c(
      "Foraging" = "#355E3B",
      "Eating" = "#8B5A2B",
      "Indifferent" = "#9CAF50"
    )
  ) +
  theme_void() +
  labs(
    title = "Top 3 Most Common Squirrel Behaviors",
    subtitle = "Overall prevalence across all sightings"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right",
    panel.background = element_rect(fill = "white"),
    plot.background  = element_rect(fill = "white")
  )
# Add percentage labels to the stored plot so visual5 includes them
visual5 <- visual5 +
  geom_text(aes(label = paste0(round(Percent, 1), "%")),
            position = position_stack(vjust = 0.5),
            size = 4)

visual5

Visual 5 Analysis

This donut chart, chosen for visual ease, summarizes the three behaviors most frequently observed among squirrels in Central Park. Foraging and indifference toward human presence are the dominant behaviors, which suggests that most squirrels observed in Central Park are engaged in basic behaviors rather than overtly social ones. Eating is the third most prevalent behavior, a logical extension of foraging. Because a single sighting can record several behaviors, the slices show each behavior's share of the three combined, while the labels give the percentage of all sightings in which it was observed. Together with the earlier visuals, this chart indicates that the squirrels observed are usually engaged in food-related behaviors.
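Because the census records each behavior as its own true/false column, one sighting can count toward foraging, eating, and indifference at the same time. The small base-R sketch below, on invented rows, shows why the top-three percentages need not add up to 100 and how to measure the overlap between two behaviors directly; the same calculation would apply to the converted behavior columns of the real data.

```r
# Invented sightings: each row can record several behaviors at once.
toy <- data.frame(
  Foraging    = c(TRUE, TRUE, FALSE, TRUE),
  Eating      = c(TRUE, FALSE, FALSE, TRUE),
  Indifferent = c(FALSE, TRUE, TRUE, FALSE)
)

# Percentage of sightings showing each behavior (what the donut labels report).
pct_each <- sapply(toy, mean) * 100
pct_each            # Foraging 75, Eating 50, Indifferent 50

# The three percentages sum to more than 100 because behaviors overlap.
sum(pct_each)       # 175

# Share of sightings where foraging and eating were both recorded.
mean(toy$Foraging & toy$Eating) * 100   # 50
```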

Brief Conclusion

Across the five visualizations, the squirrels that inhabit Central Park are most often seen engaging in survival-oriented behaviors, particularly foraging and eating. Differences by time of day and age are modest. Location matters more for movement and feeding than for social behaviors. Overall, these findings suggest that, in this urban environment, observed squirrel activity is dominated by survival-based behaviors rather than social ones. These patterns hold across age groups; fur color was mapped but not compared directly against behavior, so whether they extend across fur colors remains to be checked.
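The fur-color question can be examined the same way as the age comparison: compute the share of sightings with foraging and eating recorded within each primary fur color. survival_by_group() is a hypothetical helper and the toy rows are invented; with the cleaned data it would be called as survival_by_group(nycsquirrels_clean, "Primary.Fur.Color").

```r
library(dplyr)

# Hypothetical helper: group size and percentage of sightings with foraging
# and with eating recorded, within each level of a grouping column.
survival_by_group <- function(df, group_col) {
  df %>%
    mutate(across(c(Foraging, Eating), ~ tolower(as.character(.x)) == "true")) %>%
    group_by(.data[[group_col]]) %>%
    summarise(
      n = n(),
      pct_foraging = mean(Foraging, na.rm = TRUE) * 100,
      pct_eating = mean(Eating, na.rm = TRUE) * 100,
      .groups = "drop"
    )
}

# Invented values for illustration only; not census data.
toy <- data.frame(
  Primary.Fur.Color = c("Gray", "Gray", "Cinnamon", "Black"),
  Foraging = c("true", "false", "true", "false"),
  Eating = c("false", "false", "true", "true")
)
survival_by_group(toy, "Primary.Fur.Color")
```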