library(tidyverse)
library(ggmap)
library(dplyr)
library(gridExtra)
nyc <- read_csv("Squirel.csv")
Each row is one squirrel sighting in NYC, and each column is a different variable recorded for that sighting.
These variables include:
Is there a statistically significant time of the day (AM/PM) when squirrels are most frequently observed?
I used a bar chart to count the squirrel sightings per day. The bars are ordered from fewest to most sightings (`reorder(Date, count)`) so the variation between days is easy to see.
nyc %>%
group_by(Date) %>%
summarise(count = n()) %>%
ggplot() +
geom_col(aes(x = reorder(Date,count), y = count),
color = "blue",
fill = "lightblue") +
labs(x = "Date",
y = "Count",
title = "Number of Squirrels Per Day")
This visualization shows which days had more sightings than others. I will
focus on the top two days (10/07/2018 and 10/13/2018), both of which fall
on a weekend.
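The chart above orders the bars by count. A minimal sketch for putting the days in calendar order and flagging weekends, assuming the `Date` column stores dates as MMDDYYYY numbers (e.g. `10072018`), as the filters further down suggest. `daily_counts()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: count sightings per day, assuming `Date` holds
# MMDDYYYY numbers such as 10072018.
daily_counts <- function(dates) {
  # Zero-pad to 8 digits so a single-digit month would still parse
  parsed <- as.Date(sprintf("%08d", as.integer(dates)), format = "%m%d%Y")
  tab <- table(parsed)
  out <- data.frame(date = as.Date(names(tab)), count = as.integer(tab))
  # %u is the ISO weekday number: 1 = Monday ... 7 = Sunday
  out$weekend <- as.integer(format(out$date, "%u")) >= 6
  out <- out[order(out$date), ]
  rownames(out) <- NULL
  out
}

# Toy input for illustration only (not the census data)
daily_counts(c(10072018, 10072018, 10132018, 10082018))
```

On the real data this would be `daily_counts(nyc$Date)`; plotting the result with `geom_col(aes(x = date, y = count))` places the bars in chronological order because `date` is a true `Date`.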
I used a scatter plot of the X (longitude) and Y (latitude) coordinates to place each squirrel on a Google map of Central Park, coloring each point by the shift (AM or PM) in which it was sighted.
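# Read the Google Maps API key from an environment variable instead of hard-coding it
# (GOOGLE_MAPS_API_KEY is an assumed variable name; set it in ~/.Renviron before running)
ggmap::register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))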
CentralParkHybrid <- get_map(location = "Central Park",
zoom = 14,
source = "google",
maptype = "hybrid")
CentralParkTerrain <- get_map(location = "Central Park",
zoom = 14,
source = "google",
maptype = "terrain")
ggmap(CentralParkHybrid) +
geom_point(data = nyc,
aes(x = X, y = Y,
color = Shift),
size = 0.75) +
labs(x = "Longitude",
y = "Latitude",
title = "Squirrel Sightings by Shift (AM/PM)")
This shows that squirrels are spread fairly
evenly around the park, with no obvious visual difference between AM and
PM sightings.
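A map cannot show whether a difference is statistically significant. A minimal sketch of one possible check, assuming the `Shift` column contains only `"AM"` and `"PM"`: an exact binomial test (base R's `binom.test()`) of whether the share of AM sightings differs from 50%. `shift_test()` is a hypothetical helper name. A small p-value would only show that the counts differ; it cannot separate squirrel activity from differences in survey effort between the two shifts.

```r
# Hypothetical helper: exact binomial test of AM vs. PM sighting counts.
# Null hypothesis: a sighting is equally likely to fall in either shift.
shift_test <- function(shift) {
  shift <- shift[!is.na(shift)]
  n_am <- sum(shift == "AM")
  n_pm <- sum(shift == "PM")
  binom.test(n_am, n_am + n_pm, p = 0.5)
}

# Toy input for illustration only (not the census data)
toy_shift <- c(rep("AM", 40), rep("PM", 60))
result <- shift_test(toy_shift)
result$p.value
```

On the real data this would be `shift_test(nyc$Shift)`.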
I used `grid.arrange()` to place the top two sighting days side by side, with each day's counts split by primary fur color.
g1 <- nyc %>%
filter(Date == 10072018) %>%
filter(`Primary Fur Color` %in% c("Black","Cinnamon", "Gray")) %>%
ggplot() +
geom_bar(aes(x = `Primary Fur Color`),
fill = "lightblue") +
labs(y = "Count",
title = "10-07-2018")
g2 <- nyc %>%
filter(Date == 10132018) %>%
filter(`Primary Fur Color` %in% c("Black","Cinnamon", "Gray")) %>%
ggplot() +
geom_bar(aes(x = `Primary Fur Color`),
fill = "purple") +
labs(y = "Count",
title = "10-13-2018")
grid.arrange(g1, g2, ncol = 2)
This visualization shows that gray is by far the most
common color, with around 320 gray sightings on each of the two days.
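To read exact values instead of estimating bar heights, the same subset can be cross-tabulated. A minimal base-R sketch, assuming `Date` holds MMDDYYYY numbers as in the filters above; `fur_by_day()` is a hypothetical helper name.

```r
# Hypothetical helper: count sightings by day and fur color for selected days.
fur_by_day <- function(date, fur, days,
                       colors = c("Black", "Cinnamon", "Gray")) {
  # %in% returns FALSE for NA, so missing fur colors are dropped here
  keep <- date %in% days & fur %in% colors
  table(Date = date[keep], Fur = factor(fur[keep], levels = colors))
}

# Toy input for illustration only (not the census data)
toy_date <- c(10072018, 10072018, 10132018, 10132018, 10062018)
toy_fur  <- c("Gray", "Black", "Gray", NA, "Gray")
fur_by_day(toy_date, toy_fur, days = c(10072018, 10132018))
```

On the real data this would be ``fur_by_day(nyc$Date, nyc$`Primary Fur Color`, days = c(10072018, 10132018))``.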
I then faceted the previous graph by date and flipped the axes so that all days can be compared easily.
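nyc %>%
filter(`Primary Fur Color` %in% c("Black","Cinnamon", "Gray")) %>%
ggplot() +
# A single fixed fill color; a fill = Date mapping would be overridden by it anyway
geom_bar(aes(x = `Primary Fur Color`),
fill = "red") +
coord_flip() +
facet_wrap(~Date, ncol = 2) +
# labs() names aesthetics, so the count axis is still "y" after coord_flip()
labs(y = "Count",
title = "Fur Colors by Day")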
This view shows that the counts rise and fall
depending on the day of the week.
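One way to check the day-of-week pattern is to average the daily sighting counts by weekday. A minimal sketch, again assuming MMDDYYYY numeric dates; `weekday_means()` is a hypothetical helper name. Because the data cover only a small number of dates, each weekday may appear only once or twice, so these averages are rough.

```r
# Hypothetical helper: mean number of sightings per day, grouped by weekday.
weekday_means <- function(dates) {
  parsed <- as.Date(sprintf("%08d", as.integer(dates)), format = "%m%d%Y")
  per_day <- table(parsed)
  day <- as.Date(names(per_day))
  # %u is the ISO weekday number (1 = Monday ... 7 = Sunday)
  wd <- factor(format(day, "%u"), levels = as.character(1:7),
               labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
  means <- tapply(as.integer(per_day), wd, mean)
  # Weekdays with no survey dates come back as NA
  setNames(as.vector(means), names(means))
}

# Toy input for illustration only (not the census data)
toy_dates <- c(rep(10072018, 3), rep(10142018, 5), rep(10082018, 2))
weekday_means(toy_dates)
```

On the real data this would be `weekday_means(nyc$Date)`.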
What fur color is most prevalent in Central Park?
For this, I removed sightings with no recorded fur color, grouped the data by Primary Fur Color, and counted the sightings in each group. I then used a bar chart to show the counts.
nyc %>%
# Drop sightings with no recorded fur color (test with is.na(), not against the string "NA")
filter(!is.na(`Primary Fur Color`)) %>%
group_by(`Primary Fur Color`) %>%
summarise(Fur_count = n()) %>%
ggplot() +
geom_col(aes(x = `Primary Fur Color`, y = Fur_count,
color = `Primary Fur Color`,
fill = `Primary Fur Color`),
alpha = 0.5) +
labs(y = "Count",
title = "Fur Color Numbers in Central Park")
This visualization shows how much more prevalent
gray is than cinnamon and black: about 2,500 gray sightings versus only
about 500 black and cinnamon combined.
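The approximate counts above can also be expressed as exact percentages. A minimal base-R sketch; `fur_share()` is a hypothetical helper name, and like the chart it leaves out sightings with a missing fur color.

```r
# Hypothetical helper: percentage of sightings for each fur color.
fur_share <- function(fur) {
  counts <- table(fur)  # table() leaves out NA values by default
  pct <- 100 * as.numeric(counts) / sum(counts)
  names(pct) <- names(counts)
  sort(round(pct, 1), decreasing = TRUE)
}

# Toy input for illustration only (not the census data)
toy_fur <- c(rep("Gray", 8), "Black", "Cinnamon", NA)
fur_share(toy_fur)
```

On the real data this would be ``fur_share(nyc$`Primary Fur Color`)``.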
The next three graphs plot sightings on the Google map of Central Park, each filtered to show a single fur color.
ggmap(CentralParkHybrid) +
geom_point(data = nyc %>% filter(`Primary Fur Color` == "Black"),
aes(x = X, y = Y),
color = "white") +
labs(x = "Longitude",
y = "Latitude",
title = "Black Squirrel Sightings")
ggmap(CentralParkHybrid) +
geom_point(data = nyc %>% filter(`Primary Fur Color` == "Cinnamon"),
aes(x = X, y = Y),
color = "brown") +
labs(x = "Longitude",
y = "Latitude",
title = "Cinnamon Squirrel Sightings")
ggmap(CentralParkHybrid) +
geom_point(data = nyc %>% filter(`Primary Fur Color` == "Gray"),
aes(x = X, y = Y),
color = "gray") +
labs(x = "Longitude",
y = "Latitude",
title = "Gray Squirrel Sightings")
As noted earlier, gray squirrels are far more
abundant, and they are evenly distributed around the park. Black
squirrels are more scattered around the edges of the park, with only a
few sightings in the central area. Cinnamon squirrels are denser at the
bottom (southern end) of the park and sparser at the top.
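The north/south part of these observations can be checked numerically by comparing the median latitude (`Y`) of each fur color: a lower median means the sightings sit further south. A minimal sketch; `median_lat_by_color()` is a hypothetical helper name, and it says nothing about the edge-versus-center pattern, which would need the park boundary.

```r
# Hypothetical helper: median latitude (Y) of sightings for each fur color.
median_lat_by_color <- function(y, fur,
                                colors = c("Black", "Cinnamon", "Gray")) {
  keep <- fur %in% colors & !is.na(y)
  med <- tapply(y[keep], factor(fur[keep], levels = colors), median)
  setNames(as.vector(med), names(med))
}

# Toy coordinates for illustration only (not the census data)
toy_y   <- c(40.770, 40.780, 40.760, 40.770, 40.790)
toy_fur <- c("Gray", "Gray", "Cinnamon", "Cinnamon", "Black")
median_lat_by_color(toy_y, toy_fur)
```

On the real data this would be ``median_lat_by_color(nyc$Y, nyc$`Primary Fur Color`)``.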
What region of Central Park has the greatest concentration of squirrel sightings?
I used a 2D density (contour) plot to show how all squirrel sightings are distributed across Central Park.
ggmap(CentralParkTerrain) +
geom_density_2d(data = nyc,
aes(x = X, y = Y)) +
labs(x = "Longitude",
y = "Latitude",
title = "Central Park Density Plot")
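The contour plot shows where density is highest but does not report that location. A minimal sketch that estimates the peak directly with `MASS::kde2d()`, the 2D kernel density estimator that ggplot2's `stat_density_2d()` is built on (ggplot2's bandwidth and grid settings may differ from the defaults used here). `density_peak()` is a hypothetical helper name; calling `MASS::kde2d()` with `::` avoids attaching MASS, whose `select()` would mask dplyr's.

```r
# Hypothetical helper: location of the highest 2D kernel density estimate.
density_peak <- function(x, y, n = 100) {
  keep <- !is.na(x) & !is.na(y)
  # kde2d() evaluates the density on an n x n grid spanning the data range
  est <- MASS::kde2d(x[keep], y[keep], n = n)
  # est$z[i, j] is the density at (est$x[i], est$y[j])
  idx <- which(est$z == max(est$z), arr.ind = TRUE)[1, ]
  c(x = est$x[idx[[1]]], y = est$y[idx[[2]]])
}

# Toy data for illustration only: a tight cluster near (0, 0) plus noise
set.seed(1)
toy_x <- c(rnorm(200, 0, 0.1), runif(50, -2, 2))
toy_y <- c(rnorm(200, 0, 0.1), runif(50, -2, 2))
density_peak(toy_x, toy_y)
```

On the real data, `density_peak(nyc$X, nyc$Y)` would return the approximate longitude and latitude of the densest area.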
I then used a binned heat map (40 bins along each axis), filled with the continuous viridis color scale, to show the density of all squirrel sightings in Central Park.
ggmap(CentralParkTerrain) +
geom_bin2d(data = nyc,
aes(x = X, y = Y),
bins = 40,
alpha = 0.7) +
scale_fill_viridis_c() +
labs(x = "Longitude",
y = "Latitude",
title = "Central Park Heatmap Density Plot")
Both density plots show how squirrel sightings
are distributed across Central Park. The heat map makes this especially
clear: the bright yellow cells mark where the counts are highest.
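To name the single busiest area instead of reading it off the colors, the binning can be done numerically. A minimal base-R sketch; `densest_cell()` is a hypothetical helper name, and because `cut()` places its bin edges slightly differently from `geom_bin2d()`, the cell it reports may not match the heat map exactly.

```r
# Hypothetical helper: find the 2D bin containing the most sightings.
densest_cell <- function(x, y, bins = 40) {
  keep <- !is.na(x) & !is.na(y)
  # Equal-width bins over the range of each coordinate
  xb <- cut(x[keep], breaks = bins)
  yb <- cut(y[keep], breaks = bins)
  counts <- table(xb, yb)
  idx <- which(counts == max(counts), arr.ind = TRUE)[1, ]
  list(x_bin = rownames(counts)[idx[[1]]],
       y_bin = colnames(counts)[idx[[2]]],
       count = max(counts))
}

# Toy data for illustration only (not the census data)
toy_x <- c(0, 0.1, 0.1, 0.1, 1)
toy_y <- c(0, 0.1, 0.1, 0.1, 1)
densest_cell(toy_x, toy_y, bins = 2)
```

On the real data, `densest_cell(nyc$X, nyc$Y)` would return the longitude and latitude intervals of the busiest cell along with its count.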