Assignment #1 - Spatial Analysis & Visualization

Nature is so closely bound up with human life that, rather than devoting every lot to large buildings and maximum productivity, cities have turned land worth millions of dollars into parks. This is apparent in New York, where numerous skyscrapers stand alongside countless green spaces. However, one of the biggest deterrents to what would otherwise be a relaxing and inviting space is the amount of crime committed within parks. This is why it’s important to know which parks need to be made safer and more inviting. Thankfully, the NYPD publishes this information at https://www1.nyc.gov/site/nypd/stats/crime-statistics/park-crime-stats.page, which makes it possible to work out the thirty most dangerous parks in NYC.

First, before digging into the data, we need to set up our workspace in R by loading our libraries and running the default R Markdown setup code.
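The setup chunk itself is not shown in this write-up, so the following is only a sketch of what it might look like. The package list is an assumption inferred from the functions called later (`bind_rows()`, `clean_names()`, `fct_reorder()`, `ggplot()`); the original may have loaded them differently, for example through `tidyverse`.

```r
# Default R Markdown setup: echo code chunks in the knitted output
knitr::opts_chunk$set(echo = TRUE)

# Assumed package list, inferred from the functions used in this analysis
library(dplyr)    # bind_rows(), group_by(), summarise(), filter(), arrange()
library(ggplot2)  # ggplot(), geom_bar(), labs(), theme()
library(janitor)  # clean_names()
library(forcats)  # fct_reorder()
```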

Since the NYPD breaks its park crime data into quarters, examining 2020 so far means importing the datasets for both quarter one and quarter two, combining them, and collapsing the entries that appear in both quarters by summing their totals. As a starting point, it’s useful to graph how much crime happens in the parks of each borough, so we trim the data down to the boroughs and the total crime within their parks.

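# Read the quarterly park crime statistics published by the NYPD
q2v2parkcrime <- read.csv("nyc-park-crime-stats-q2-2020-v2.csv")
q1v2parkcrime <- read.csv("nyc-park-crime-stats-q1-2020-v2.csv")

# Stack quarter one and quarter two into a single data frame
newdataframe <- bind_rows(q1v2parkcrime,q2v2parkcrime)

# Standardize the column names, then total the crimes per borough
cleandata <- clean_names(newdataframe)
mixedandfixed2 <- cleandata %>%
  group_by(borough) %>%
  summarise(
    total_crime = sum(total))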

## `summarise()` ungrouping output (override with `.groups` argument)

# Drop the first row of the summary, then count the borough rows that remain
crimebyborough <- slice(mixedandfixed2, -c(1))
count(crimebyborough[1:2], vars = "total_crime")

## # A tibble: 1 x 2
##   vars            n
##   <chr>       <int>
## 1 total_crime     6

summary(crimebyborough)

##    borough           total_crime    
##  Length:6           Min.   :  3.00  
##  Class :character   1st Qu.: 14.75  
##  Mode  :character   Median : 54.00  
##                     Mean   : 48.67  
##                     3rd Qu.: 67.75  
##                     Max.   :107.00

Once the data is cut down to the boroughs and the crime within their parks, we can graph it. A bar graph should do a good job of showing how the boroughs compare to one another.

firstgraph <- ggplot(crimebyborough, aes(x=borough, y=total_crime)) + 
  geom_bar(stat='identity',
           width = 0.75,
           color="blue",
           fill=rgb(0.1,0.4,0.5,0.7))
plot(firstgraph)

While this graph shows the total number of major felonies in the parks of each borough, it is hard to read and could be misleading, since boroughs contain different numbers of parks and those parks range in size. Instead, let’s examine the thirty most dangerous parks across all of the boroughs, counted by the number of major felonies committed within them. This gives clear, concise information about individual parks rather than totals skewed by how many parks each borough has.

To do this, we first need to set up our data: group the parks by borough and cut the data down to the thirty most felony-ridden parks in NYC. Cutting down the data involved some difficulties, but both the members of NYU Wagner’s Data Analysis Slack 2020 and their GitHub page, https://wagner-mspp-2020.github.io/r-demos/r-demo.html#making-graphs-with-ggplot2, were extraordinarily helpful in reducing this data set.

cleandata <- clean_names(newdataframe)
mixedandfixed3 <- cleandata %>%
  group_by(park, borough) %>%
  summarise(
    total_crime = sum(total))

## `summarise()` regrouping output by 'park' (override with `.groups` argument)

boroughandparkcrim <- mixedandfixed3 %>%
  filter(total_crime<31)
count(boroughandparkcrim[1:3], vars = "total_crime")

## # A tibble: 1,154 x 3
## # Groups:   park [1,154]
##    park                                              vars            n
##    <chr>                                             <chr>       <int>
##  1 "\"UNCLE\" VITO E. MARANZANO GLENDALE PLAYGROUND" total_crime     1
##  2 "100% PLAYGROUND"                                 total_crime     1
##  3 "A.R.R.O.W. FIELD HOUSE"                          total_crime     1
##  4 "ABC PLAYGROUND"                                  total_crime     1
##  5 "ABE STARK SKATING RINK"                          total_crime     1
##  6 "ABIGAIL PLAYGROUND"                              total_crime     1
##  7 "ABRAHAM LINCOLN PLAYGROUND"                      total_crime     1
##  8 "ABYSSINIAN TOT LOT"                              total_crime     1
##  9 "ADAM CLAYTON POWELL JR. MALLS"                   total_crime     1
## 10 "ADAM YAUCH PARK"                                 total_crime     1
## # … with 1,144 more rows

parks_30_slice  <- filter(boroughandparkcrim, total_crime > 2) %>%
  arrange(borough) %>%
  group_by(borough)


parksfinal <- parks_30_slice %>%
  select(park, borough, total_crime) %>%
  mutate(
    parks = fct_reorder(park, borough)
  ) %>%
  arrange(borough)

summary(parksfinal)

##      park             borough           total_crime                  parks   
##  Length:30          Length:30          Min.   : 3.000   AQUEDUCT WALK   : 1  
##  Class :character   Class :character   1st Qu.: 3.000   BRONX PARK      : 1  
##  Mode  :character   Mode  :character   Median : 4.000   CLAREMONT PARK  : 1  
##                                        Mean   : 5.467   CROTONA PARK    : 1  
##                                        3rd Qu.: 5.000   DEVOE PARK      : 1  
##                                        Max.   :30.000   FRANZ SIGEL PARK: 1  
##                                                         (Other)         :24
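The selection above reaches thirty parks through a threshold (`total_crime > 2`) that happens to leave exactly thirty rows in this data. As a hedged alternative, the sketch below picks a fixed number of rows directly with dplyr's `slice_max()`. The toy data frame `park_totals` and its values are hypothetical stand-ins for the per-park totals, and `n = 3` is used only to keep the example small.

```r
library(dplyr)

# Hypothetical toy data standing in for the per-park totals
park_totals <- tibble(
  park = c("PARK A", "PARK B", "PARK C", "PARK D", "PARK E"),
  borough = c("BRONX", "BRONX", "QUEENS", "MANHATTAN", "BROOKLYN"),
  total_crime = c(5, 1, 3, 30, 0)
)

# Keep the n parks with the highest totals, then sort them by borough.
# ungroup() ensures the top n is taken across all parks, not per group;
# with_ties = FALSE returns exactly n rows even when totals are tied.
top_parks <- park_totals %>%
  ungroup() %>%
  slice_max(total_crime, n = 3, with_ties = FALSE) %>%
  arrange(borough)

top_parks
```

With the real data, `n = 30` would take the place of the threshold. Note that `with_ties = FALSE` breaks ties by row order, so parks tied at the cutoff may be dropped arbitrarily.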

Now that we have our data, we’re ready to plot our graph. Different colors represent each borough, the parks of each borough are grouped together, and each park is individually labeled. The graph should give a decent picture of which parks are the most dangerous in all of NYC.

finalplot <- ggplot(parksfinal, aes(x=parks, y=total_crime, fill = borough, group = borough)) + 
  geom_bar(stat='identity', position = 'dodge', width = 0.75)+
  scale_fill_discrete(labels = c("QUEENS" = "Queens", "BROOKLYN" = "Brooklyn",  "BRONX" = "Bronx", "MANHATTAN" = "Manhattan"))+
  labs(title = "The 30 Most Dangerous Parks in NYC in 2020",
       subtitle = "Shown by # of Major Felonies",
       caption = "Data Source: NYC Open Data",
       x = "Park by Name",
       y = "Amount of Major Felonies",
       fill = "Borough"
  )+
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))

plot(finalplot)
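The grouping of bars by borough depends on the order of the factor levels on the x axis. As a sketch of one way to make that order explicit, the rows can be sorted first and the factor levels then taken in row order with `fct_inorder()` from forcats. The toy data frame `parks_toy` is hypothetical, and this is an alternative to the `fct_reorder()` call used above, not a description of it.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for the thirty selected parks
parks_toy <- tibble(
  park = c("PARK C", "PARK A", "PARK D", "PARK B"),
  borough = c("QUEENS", "BRONX", "BRONX", "MANHATTAN"),
  total_crime = c(3, 5, 4, 30)
)

# Sort by borough, then by descending felony count within each borough,
# and freeze that row order as the factor level order used for plotting
parks_ordered <- parks_toy %>%
  arrange(borough, desc(total_crime)) %>%
  mutate(parks = fct_inorder(park))

levels(parks_ordered$parks)
```

Passing `parks` as the x aesthetic would then draw the bars borough by borough, with the highest counts first within each borough.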

The data here may be shocking, with many parks standing out as dangerous and some outliers, the most staggering of which was Washington Square Park with 30 felonies. It’s important to remember, though, that these statistics cover only the first two quarters of 2020 and don’t represent the year as a whole. Thanks to both NYC Open Data and NYU Wagner’s Spatial Analysis class of 2020 for contributing to the creation of these graphs and for helping to synthesize the data.

James Wilson-Schutter

9/27/2020