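library(tidyverse)  # readr, dplyr, ggplot2 (dplyr is already included here)
library(ggmap)      # Google Maps base layers
library(gridExtra)  # grid.arrange() for side-by-side plots

# Central Park squirrel census data
nyc <- read_csv("Squirel.csv")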

DESCRIPTION

COLUMNS

Each column represents a different variable recorded for a squirrel sighted in New York City's Central Park.

These variables include:

  • X and Y coordinates
  • Date and Time
  • Age (adult or juvenile)
  • Primary fur color
  • Highlight fur color
  • Location (Ground, above ground)
  • Running, chasing, climbing, eating, foraging (TRUE or FALSE)

ROWS

  • Each row indicates a specific squirrel that was recorded
  • There are 3023 rows

MISSING VALUES

  • There are no missing values in the core variables such as X, Y, Shift, and Hectare
  • Variables that depend on the observer, such as Age, Fur Color, and Location, have missing values due to human error
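A quick way to check the missing-value pattern described above is to count NA values per column. This is a minimal base-R sketch; count_missing is a hypothetical helper name, and the toy data frame stands in for nyc (its values are made up, not census data).

```r
# Count missing values in every column of a data frame.
# count_missing() is a hypothetical helper written for this example.
count_missing <- function(df) {
  # is.na() on a data frame returns a logical matrix; colSums() tallies the TRUEs
  colSums(is.na(df))
}

# Toy stand-in for the squirrel data (not real census values)
toy <- data.frame(
  X     = c(-73.97, -73.96, -73.95),
  Age   = c("Adult", NA, "Juvenile"),
  Color = c("Gray", "Cinnamon", NA)
)
count_missing(toy)
```

On the real data the call would simply be count_missing(nyc), which also works on the tibble returned by read_csv().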

QUESTIONS

  1. Is there a statistically significant time of the day (AM/PM) when squirrels are most frequently observed?
  2. What fur color is most prevalent in Central Park?
  3. What region of Central Park has the greatest concentration of squirrel sightings?

QUESTION 1

Is there a statistically significant time of the day (AM/PM) when squirrels are most frequently observed?

I used a bar chart to count the squirrel sightings on each day, with the bars ordered from fewest to most sightings so the variation between days is easy to see.

nyc %>%
  group_by(Date) %>%
  summarise(count = n()) %>%
  ggplot() +
  geom_col(aes(x = reorder(Date,count), y = count),
           color = "blue",
           fill = "lightblue") +
  labs(x = "Date",
       y = "Count",
       title = "Number of Squirrels Per Day")

This chart makes it easy to see which days had the most sightings. I will focus on the top two days, October 7 and October 13, 2018, both of which fall on a weekend.
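The Date column appears to store dates as MMDDYYYY integers (for example 10072018, the value used in the filters further down). Assuming that encoding, a base-R sketch like the following converts them to Date objects and confirms the weekend claim; squirrel_date is a hypothetical helper name.

```r
# Convert MMDDYYYY integers (e.g. 10072018) into Date objects.
# Assumes every value has exactly 8 digits once zero-padded.
squirrel_date <- function(mmddyyyy) {
  as.Date(sprintf("%08d", as.integer(mmddyyyy)), format = "%m%d%Y")
}

top_days <- squirrel_date(c(10072018, 10132018))

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday) and does not depend on locale
wday <- as.POSIXlt(top_days)$wday
is_weekend <- wday %in% c(0, 6)
is_weekend
```

Converting the whole column this way would also allow a truly chronological x axis, although the Date == 10072018 style filters used later would then need to compare against Date values instead.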

Next, I plotted each squirrel's X and Y coordinates as points on a Google Maps image of Central Park, coloring each point by shift (AM or PM).

# Read the API key from an environment variable instead of hard-coding it
# (GOOGLE_MAPS_API_KEY is an assumed variable name; set it before running)
ggmap::register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))

CentralParkHybrid <- get_map(location = "Central Park",
                             zoom = 14,
                             source = "google",
                             maptype = "hybrid")
CentralParkTerrain <- get_map(location = "Central Park",
                              zoom = 14,
                              source = "google",
                              maptype = "terrain")

# One point per sighting, colored by shift (AM or PM)
ggmap(CentralParkHybrid) +
  geom_point(data = nyc,
             aes(x = X, y = Y,
                 color = Shift),
             size = 0.75) +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Central Park Scatterplot")

The sightings are spread fairly evenly across the park, and there is no obvious visual difference between the AM and PM shifts. This is a visual impression rather than a significance test.
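The map compares the shifts visually but does not test significance. One simple option, which is an assumption on my part rather than part of the analysis above, is an exact binomial test of whether the share of AM sightings differs from 50%. It treats sightings as independent and assumes equal survey effort in the AM and PM shifts, which the census design may not guarantee. test_shift_balance is a hypothetical helper, and the inputs below are toy vectors.

```r
# Exact binomial test: does the proportion of AM sightings differ from p0?
# Assumes independent sightings and equal survey effort across shifts.
test_shift_balance <- function(shift, p0 = 0.5) {
  shift <- shift[!is.na(shift)]   # drop sightings with no recorded shift
  n_am  <- sum(shift == "AM")     # "successes" are AM sightings
  binom.test(n_am, length(shift), p = p0)
}

# Toy example: 50 AM and 50 PM sightings (perfectly balanced)
balanced <- test_shift_balance(rep(c("AM", "PM"), each = 50))
balanced$p.value

# Toy example: 70 AM and 30 PM sightings
skewed <- test_shift_balance(c(rep("AM", 70), rep("PM", 30)))
skewed$p.value
```

On the real data this would be test_shift_balance(nyc$Shift); a small p-value would suggest one shift is over-represented, subject to the equal-effort assumption.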

To compare the top two sighting days, I used grid.arrange() to place their primary fur color bar charts side by side.

g1 <- nyc %>%
  filter(Date == 10072018) %>%
  filter(`Primary Fur Color` %in% c("Black","Cinnamon", "Gray")) %>%
  ggplot() +
  geom_bar(aes(x = `Primary Fur Color`),
           fill = "lightblue") +
  labs(y = "Count",
       title = "10-07-2018")

g2 <- nyc %>%
  filter(Date == 10132018) %>%
  filter(`Primary Fur Color` %in% c("Black","Cinnamon", "Gray")) %>%
  ggplot() +
  geom_bar(aes(x = `Primary Fur Color`),
           fill = "purple") +
  labs(y = "Count",
       title = "10-13-2018")

grid.arrange(g1, g2, ncol = 2)

This visualization shows that gray is by far the most common primary fur color on both days, with around 320 sightings each.

I then faceted the previous chart by date and flipped the axes so that all days can be compared at a glance.

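nyc %>%
  filter(`Primary Fur Color` %in% c("Black", "Cinnamon", "Gray")) %>%
  ggplot() +
  # A fixed fill outside aes() would override any fill mapping, so none is mapped
  geom_bar(aes(x = `Primary Fur Color`),
           fill = "red") +
  coord_flip() +
  facet_wrap(~Date, ncol = 2) +
  # labs(x = ...) still refers to the fur color aesthetic after coord_flip()
  labs(x = "Primary Fur Color",
       y = "Count",
       title = "Fur Colors by Day")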

This faceted view shows that the counts for each fur color rise and fall depending on the day.
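Because the total number of sightings differs from day to day, raw counts can rise and fall even when the color mix stays the same. A base-R sketch that converts the counts into within-day proportions makes that comparison direct. It assumes vectors shaped like the Date and Primary Fur Color columns; color_share_by_day is a hypothetical helper and the toy values are made up.

```r
# Within-day share of each primary fur color.
# table() drops NA values by default; prop.table(..., margin = 1) makes each row sum to 1.
color_share_by_day <- function(date, color) {
  prop.table(table(date, color), margin = 1)
}

toy_date  <- c(1, 1, 1, 1, 2, 2)
toy_color <- c("Gray", "Gray", "Gray", "Black", "Gray", "Cinnamon")
shares <- color_share_by_day(toy_date, toy_color)
round(shares, 2)
```

With the real data, color_share_by_day(nyc$Date, nyc$`Primary Fur Color`) gives one row per day, so changes in the color mix can be separated from changes in total sightings.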

QUESTION 2

What fur color is most prevalent in Central Park?

For this, I grouped the data by Primary Fur Color, counted the number of squirrels in each group, and displayed the counts in a bar chart.

nyc %>%
  filter(!is.na(`Primary Fur Color`)) %>%  # missing values are NA, not the string "NA"
  group_by(`Primary Fur Color`) %>%
  summarise(Fur_count = n()) %>%
  ggplot() +
  geom_col(aes(x = `Primary Fur Color`, y = Fur_count,
               color = `Primary Fur Color`,
               fill = `Primary Fur Color`),
           alpha = 0.5) +
  labs(y = "Count",
       title = "Fur Color Numbers in Central Park")

This visualization shows that gray is far more prevalent than cinnamon or black, with about 2,500 gray squirrels compared with only about 500 black and cinnamon squirrels combined.
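For reference, the same group-and-count can be sketched in base R. Missing fur colors are read as true NA values rather than the string "NA", so is.na() is the reliable test, and table() already excludes NA by default. The toy vector is illustrative only.

```r
# Count squirrels per primary fur color, ignoring missing values.
fur <- c("Gray", "Gray", "Cinnamon", NA, "Black", "Gray")

fur_counts <- table(fur)                           # NA excluded by default
fur_counts <- sort(fur_counts, decreasing = TRUE)  # most common color first
fur_counts

# Number of sightings with a recorded color, mirroring filter(!is.na(...)) in dplyr
sum(!is.na(fur))
```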

The next three maps plot sightings on the Google Maps image of Central Park, each filtered to show a single primary fur color.

ggmap(CentralParkHybrid) +
  geom_point(data = nyc %>% filter(`Primary Fur Color` == "Black"),
             aes(x = X, y = Y),
             color = "white") +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Black Squirrel Sightings")

ggmap(CentralParkHybrid) +
  geom_point(data = nyc %>% filter(`Primary Fur Color` == "Cinnamon"),
             aes(x = X, y = Y),
             color = "brown") +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Cinnamon Squirrel Sightings")

ggmap(CentralParkHybrid) +
  geom_point(data = nyc %>% filter(`Primary Fur Color` == "Gray"),
             aes(x = X, y = Y),
             color = "gray") +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Gray Squirrel Sightings")

As noted above, gray squirrels are far more abundant than the other colors and are distributed evenly throughout the park. Black squirrels appear mostly around the edges of the park, with only a few sightings in the central area. Cinnamon squirrels are denser in the southern (bottom) part of the park and sparser in the north.
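The north/south pattern described above is a visual impression. One way to put a rough number on it, assuming Y is latitude (as the map axes suggest), is to compare the median latitude of each color group: a lower median means sightings sit further south, toward the bottom of the map. median_lat_by_color is a hypothetical helper, and the coordinates below are made-up toy values.

```r
# Median latitude (Y) of sightings for each fur color.
# Higher values = further north, i.e. nearer the top of the map.
median_lat_by_color <- function(y, color) {
  keep <- !is.na(y) & !is.na(color)   # drop rows missing either value
  tapply(y[keep], color[keep], median)
}

toy_y     <- c(40.770, 40.772, 40.790, 40.795, 40.768, 40.771)
toy_color <- c("Cinnamon", "Cinnamon", "Gray", "Gray", "Cinnamon", "Gray")
median_lat_by_color(toy_y, toy_color)
```

Latitude is only a proxy for "top" and "bottom", since Central Park runs at an angle to true north; the edge-versus-center pattern for black squirrels would need a different measure.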

QUESTION 3

What region of Central Park has the greatest concentration of squirrel sightings?

I used a 2D density (contour) plot to show the density of all squirrel sightings in Central Park.

ggmap(CentralParkTerrain) +
  geom_density_2d(data = nyc,
                  aes(x = X, y = Y)) +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Central Park Density Plot")
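The contour plot shows where density is high but does not report the peak location. A sketch that estimates it directly uses MASS::kde2d(), the kernel density estimator that ggplot2's 2D density layers rely on; MASS ships with standard R installations. Calling it as MASS::kde2d() instead of library(MASS) avoids masking dplyr::select(). density_peak is a hypothetical helper, the bandwidth is left at kde2d's default, and the simulated points are purely illustrative.

```r
# Locate the grid cell with the highest 2D kernel density estimate.
density_peak <- function(x, y, n = 100) {
  est  <- MASS::kde2d(x, y, n = n)   # list: grid vectors x, y and density matrix z
  cell <- which(est$z == max(est$z), arr.ind = TRUE)[1, ]
  c(x = est$x[cell[1]], y = est$y[cell[2]])   # z is indexed [x index, y index]
}

# Simulated sightings: a tight cluster near (2, 5) plus a diffuse background
set.seed(42)
sim_x <- c(rnorm(300, mean = 2, sd = 0.2), runif(100, 0, 10))
sim_y <- c(rnorm(300, mean = 5, sd = 0.2), runif(100, 0, 10))
density_peak(sim_x, sim_y)
```

Applied as density_peak(nyc$X, nyc$Y), this would return the longitude and latitude of the densest area, which speaks directly to Question 3.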

I then used a binned heat map (geom_bin2d() with 40 bins), filled with the continuous viridis color scale, to show the density of all squirrel sightings in Central Park.

ggmap(CentralParkTerrain) +  
  geom_bin2d(data = nyc,
             aes(x = X, y = Y),
             bins = 40,
             alpha = 0.7) +
  scale_fill_viridis_c() +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Central Park Heatmap Density Plot")

Both density plots show how squirrel sightings are distributed across Central Park. The heat map makes this especially clear: on the viridis scale, the bright yellow bins mark where the sighting counts are highest.
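The brightest bins can also be identified numerically. The base-R sketch below approximates geom_bin2d(bins = 40) by cutting each axis into equal-width intervals with cut() and counting points per cell. ggplot2's own bin edges may differ slightly, so treat it as an approximation. densest_bin is a hypothetical helper and the toy points are illustrative.

```r
# Count points in a regular grid and return the busiest cell.
# Approximates geom_bin2d(bins = 40); exact bin edges may differ from ggplot2's.
densest_bin <- function(x, y, bins = 40) {
  x_bin  <- cut(x, breaks = bins)   # equal-width intervals over range(x)
  y_bin  <- cut(y, breaks = bins)
  counts <- table(x_bin, y_bin)
  cell   <- which(counts == max(counts), arr.ind = TRUE)[1, ]
  list(x_interval = rownames(counts)[cell[1]],
       y_interval = colnames(counts)[cell[2]],
       count      = max(counts))
}

toy_x <- c(0, 1, 1, 1, 2, 10)
toy_y <- c(0, 5, 5, 5, 8, 10)
densest_bin(toy_x, toy_y, bins = 5)
```

Running densest_bin(nyc$X, nyc$Y) would give the longitude and latitude ranges of the single busiest bin, a concrete answer to which region has the greatest concentration of sightings.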