Part 1: Choosing a Dataset


Basic Information

The dataset I will be working on for the remainder of the class is called NYC Park Crime Statistics. It spans 2014 to 2024 and reports which types of crimes are committed in parks throughout the five boroughs and how frequently they occur. The data are collected by the New York City Police Department (NYPD) in partnership with the NYC Department of Parks & Recreation. The dataset is available in Excel format and can be accessed by the public through NYC Open Data and the NYPD website.

Research Questions

1. Which NYC borough (Queens, Brooklyn, Bronx, Manhattan, Staten Island) has the most park crime and which has the least in each year from 2014 through 2024?

2. Has the park crime rate increased or decreased over time?

3. Have there been any noticeable spikes or declines in NYC park crime rates during specific years, particularly during significant events such as the COVID-19 pandemic and the Black Lives Matter (BLM) movement?

4. Which types of crime (murder, rape, robbery, felony assault, burglary, and grand larceny) are most and least prevalent in NYC parks?

5. Which park in NYC has the most crime and which has the least?

6. Is crime more prevalent in larger parks than in smaller parks?

Variables I am Interested in Analyzing

1. Park: List of NYC Parks

2. Borough: The borough in which each park is located (Queens, Brooklyn, Bronx, Manhattan, Staten Island).

3. Size (Acres): One Acre or Larger, Playground Less Than One Acre, Basketball and Playground Less Than One Acre, Pool and Recreation Center Less Than One Acre.

4. Murder: # of Murders

5. Rape: # of Rapes

6. Felony Assaults: # of Felony Assaults

7. Robbery: # of Robberies

8. Burglary: # of Burglaries

9. Grand Larceny: # of Grand Larcenies

10. Grand Larceny of Motor Vehicle: # of Grand Larcenies of Motor Vehicles

Analytical Techniques

1. Descriptive Statistics - Calculate the frequency of crimes occurring in parks.

2. Create Visuals - Use bar graphs and line graphs to gain a better understanding of my variables.

3. Correlation Analysis/Scatter Plots - I want to see if there is a relationship between Park Size and the number of crimes committed. Correlation analysis and scatter plots will show whether there is a significant relationship between the two variables and whether the correlation is positive or negative.

4. Linear Regression - I will use linear regression to model the relationship between Park Size and the number of crimes committed. The fitted model will help me predict the number of crimes that will occur in a park based on its size.
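The four techniques above could be sketched in R roughly as follows. This is only an illustration under stated assumptions: the `parks` tibble, its column names (`size_acres`, `total_crimes`), and every value in it are invented for the example and are not taken from the NYPD file, whose actual layout may differ.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: names and values are made up for illustration only
parks <- tibble(
  park         = c("Park A", "Park B", "Park C", "Park D", "Park E"),
  borough      = c("Queens", "Brooklyn", "Bronx", "Manhattan", "Queens"),
  size_acres   = c(2, 15, 40, 120, 300),
  total_crimes = c(1, 3, 4, 9, 20)
)

# 1. Descriptive statistics: total number of crimes per borough
crimes_by_borough <- parks %>%
  group_by(borough) %>%
  summarise(total_crimes = sum(total_crimes), .groups = "drop")

# 2. Visual: bar graph of crimes per borough (object is built, not yet printed)
crime_plot <- ggplot(crimes_by_borough, aes(x = borough, y = total_crimes)) +
  geom_col()

# 3. Correlation: Pearson correlation between park size and crime count,
#    plus a significance test for that correlation
size_crime_cor  <- cor(parks$size_acres, parks$total_crimes)
size_crime_test <- cor.test(parks$size_acres, parks$total_crimes)

# 4. Linear regression: model crime count as a function of park size,
#    then predict the count for a hypothetical 50-acre park
size_model      <- lm(total_crimes ~ size_acres, data = parks)
predicted_50_ac <- predict(size_model, newdata = data.frame(size_acres = 50))
```

With the real data, printing `crime_plot` would draw the bar graph, and `summary(size_model)` would report the slope, its p-value, and the R-squared of the fit.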

Part 2: Pivot Wider and Pivot Longer Exercise

knitr::opts_chunk$set(echo = TRUE)

#Start Session
rm(list = ls())
gc()
##           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells  542674 29.0    1200377 64.2         NA   700240 37.4
## Vcells 1003399  7.7    8388608 64.0      16384  1963185 15.0
setwd("/Users/kevingregov/Desktop")

#Load Necessary Packages
library(datasets)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# dplyr is already attached by library(tidyverse), so no separate library(dplyr) call is needed

#Load HairEyeColor Data
data("HairEyeColor")
HairEyeColor <- as_tibble(HairEyeColor)
glimpse(HairEyeColor)
## Rows: 32
## Columns: 4
## $ Hair <chr> "Black", "Brown", "Red", "Blond", "Black", "Brown", "Red", "Blond…
## $ Eye  <chr> "Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Blue", "Blue…
## $ Sex  <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "…
## $ n    <dbl> 32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8, 36, 66,…
head(HairEyeColor)
## # A tibble: 6 × 4
##   Hair  Eye   Sex       n
##   <chr> <chr> <chr> <dbl>
## 1 Black Brown Male     32
## 2 Brown Brown Male     53
## 3 Red   Brown Male     10
## 4 Blond Brown Male      3
## 5 Black Blue  Male     11
## 6 Brown Blue  Male     50
#Convert Data From Long Format to Wide Using Pivot Wider
#Convert Hair observations and Eye observations into column names

HairEyeColor_Wide <- HairEyeColor %>% pivot_wider(names_from = c(Hair, Eye), values_from = n)
head(HairEyeColor_Wide)
## # A tibble: 2 × 17
##   Sex    Black_Brown Brown_Brown Red_Brown Blond_Brown Black_Blue Brown_Blue
##   <chr>        <dbl>       <dbl>     <dbl>       <dbl>      <dbl>      <dbl>
## 1 Male            32          53        10           3         11         50
## 2 Female          36          66        16           4          9         34
## # ℹ 10 more variables: Red_Blue <dbl>, Blond_Blue <dbl>, Black_Hazel <dbl>,
## #   Brown_Hazel <dbl>, Red_Hazel <dbl>, Blond_Hazel <dbl>, Black_Green <dbl>,
## #   Brown_Green <dbl>, Red_Green <dbl>, Blond_Green <dbl>
#How to read the wide table: each row is a sex and each remaining column holds the number of people with one hair/eye color combination. For example, there are 32 males with black hair and brown eyes, 53 males with brown hair and brown eyes, 10 males with red hair and brown eyes, and so on.

#Convert Data back to Long Format Using Pivot Longer
#Convert Hair and Eye column names back into values/observations
HairEyeColor_Long <- HairEyeColor_Wide %>%
  pivot_longer(cols = -Sex, # Exclude Sex column
               names_to = c("Hair", "Eye"),
               names_sep = "_", # Split the column names into Hair and Eye
               values_to = "n")

head(HairEyeColor_Long)
## # A tibble: 6 × 4
##   Sex   Hair  Eye       n
##   <chr> <chr> <chr> <dbl>
## 1 Male  Black Brown    32
## 2 Male  Brown Brown    53
## 3 Male  Red   Brown    10
## 4 Male  Blond Brown     3
## 5 Male  Black Blue     11
## 6 Male  Brown Blue     50


pivot_wider() and pivot_longer() are functions from the tidyr package, which is loaded as part of the tidyverse. They reshape data between long and wide formats: pivot_wider() turns the values of one or more columns into new column names, while pivot_longer() turns column names back into values. To demonstrate them, I used the HairEyeColor dataset from the datasets package. After converting it to a tibble, the data were in long format, so I used pivot_wider() to spread the Hair values (Black, Blond, Brown, Red) and the Eye values (Hazel, Green, Brown, Blue) into 16 separate columns, one per hair and eye color combination, leaving one row per sex. This layout makes it easy to compare the counts for each combination between males and females. I then used pivot_longer() to convert those column names back into values under the Hair and Eye variables.
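One way to confirm that the two reshapes undo each other is a round-trip check, sketched below. The object names (`hair_eye`, `hair_eye_wide`, `hair_eye_long`) are chosen for this example only, and sorting both tables before comparing them is an assumption made so that row order does not affect the result.

```r
library(dplyr)
library(tidyr)

# Start from the long-format tibble (columns: Hair, Eye, Sex, n)
hair_eye <- as_tibble(datasets::HairEyeColor)

# Long -> wide: one column per Hair_Eye combination, one row per Sex
hair_eye_wide <- hair_eye %>%
  pivot_wider(names_from = c(Hair, Eye), values_from = n)

# Wide -> long: split each "Hair_Eye" column name back into two variables
hair_eye_long <- hair_eye_wide %>%
  pivot_longer(cols = -Sex,
               names_to = c("Hair", "Eye"),
               names_sep = "_",
               values_to = "n")

# Put columns and rows in the same order before comparing the two tables
original_sorted <- hair_eye %>%
  arrange(Sex, Hair, Eye) %>%
  as.data.frame()
round_trip_sorted <- hair_eye_long %>%
  select(Hair, Eye, Sex, n) %>%
  arrange(Sex, Hair, Eye) %>%
  as.data.frame()

# TRUE when the round trip reproduces the original data
round_trip_ok <- isTRUE(all.equal(original_sorted, round_trip_sorted))
```

Splitting on "_" works here because none of the hair or eye color names contain an underscore; a dataset whose values do contain one would need a different separator in both steps.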