The dataset I will be working on for the remainder of the class is called NYC Park Crime Statistics. It spans 2014 to 2024 and reports the types of crimes committed in parks across the five boroughs and how frequently they occur. The data is collected by the New York City Police Department (NYPD) in partnership with the NYC Department of Parks & Recreation. It is available in Excel format and can be accessed by the public through NYC Open Data and the NYPD website.
1. Which NYC borough (Queens, Brooklyn, Bronx, Manhattan, Staten Island) has the highest amount of park crime and which borough has the lowest amount of park crime in each year from 2014 through 2024?
2. Has the park crime rate increased or decreased over time?
3. Have there been any noticeable spikes or declines in NYC park crime rates during specific years, particularly during significant events such as the COVID-19 pandemic and the Black Lives Matter (BLM) movement?
4. Which types of crimes are most and least prevalent in NYC parks (Murders, Rapes, Robberies, Felony Assaults, Burglaries, and Grand Larcenies)?
5. Which park in NYC has the most crime and which park has the least crime?
6. Is crime more prevalent in larger parks than in smaller parks?
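As a rough illustration of how the first question could be approached, the sketch below finds the borough with the highest and lowest crime total in each year. The data frame, its column names (`year`, `borough`, `total`), and all of the numbers are made up for illustration; they are not taken from the NYC Park Crime Statistics dataset.

```r
library(dplyr)

# Hypothetical toy data: invented totals, NOT real NYPD figures
crimes <- tibble(
  year    = c(2014, 2014, 2014, 2015, 2015, 2015),
  borough = c("Bronx", "Brooklyn", "Queens", "Bronx", "Brooklyn", "Queens"),
  total   = c(40, 65, 30, 55, 50, 20)
)

# For each year, pick the borough with the largest and smallest total
borough_extremes <- crimes %>%
  group_by(year) %>%
  summarise(
    highest = borough[which.max(total)],
    lowest  = borough[which.min(total)],
    .groups = "drop"
  )

print(borough_extremes)
```

With the real data, the per-park counts would first need to be summed by year and borough before this step.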
1. Park: Name of the NYC park
2. Borough: Borough in which the park is located (Queens, Brooklyn, Bronx, Manhattan, Staten Island)
3. Size (Acres): One Acre or Larger, Playground Less Than One Acre, Basketball and Playground Less Than One Acre, Pool and Recreation Center Less Than One Acre
4. Murder: Number of murders
5. Rape: Number of rapes
6. Felony Assaults: Number of felony assaults
7. Robbery: Number of robberies
8. Burglary: Number of burglaries
9. Grand Larceny: Number of grand larcenies
10. Grand Larceny of Motor Vehicle: Number of grand larcenies of a motor vehicle
1. Descriptive Statistics - Calculate the frequency of crimes occurring in parks.
2. Create Visuals - Use bar graphs and line graphs to gain a better understanding of my variables.
3. Correlation Analysis/Scatter Plots - I want to see whether there is a relationship between park size and the number of crimes committed. Correlation analysis and scatter plots will show whether a significant relationship exists between the two variables and whether the correlation is positive or negative.
4. Linear Regression - I will use linear regression to model the relationship between park size and the number of crimes committed. Linear regression will help me predict the number of crimes that will occur in a park based on its size.
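The correlation and regression steps above could look roughly like the following in base R. This is only a sketch: the `parks` data frame, its column names, and its values are invented, and it assumes park size is available as a numeric acreage rather than as a size category.

```r
# Hypothetical toy data: invented values, NOT the real dataset.
# Assumes size is numeric (acres), which may not match the actual column.
parks <- data.frame(
  size_acres   = c(0.5, 2, 10, 45, 120, 840),
  total_crimes = c(0, 1, 1, 3, 6, 20)
)

# Descriptive statistics for the crime counts
print(summary(parks$total_crimes))

# Pearson correlation and its significance test
r  <- cor(parks$size_acres, parks$total_crimes)
ct <- cor.test(parks$size_acres, parks$total_crimes)
print(r)
print(ct$p.value)

# Simple linear regression: crimes as a function of park size
fit <- lm(total_crimes ~ size_acres, data = parks)
print(coef(fit))

# Predicted number of crimes for a hypothetical 50-acre park
predicted <- predict(fit, newdata = data.frame(size_acres = 50))
print(predicted)
```

The sign of `r` indicates whether the correlation is positive or negative, and the p-value from `cor.test()` indicates whether it is statistically significant. A scatter plot of the two columns is worth checking first, since a few very large parks can dominate both the correlation and the fitted line.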
knitr::opts_chunk$set(echo = TRUE)
# Start session: clear the workspace and run garbage collection
rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 542674 29.0 1200377 64.2 NA 700240 37.4
## Vcells 1003399 7.7 8388608 64.0 16384 1963185 15.0
setwd("/Users/kevingregov/Desktop")
# Load necessary packages
library(datasets)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
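library(dplyr) # already attached by library(tidyverse); loading it again is harmless
# Load the HairEyeColor data (a contingency table) and convert it to a long-format tibble
data("HairEyeColor")
HairEyeColor <- as_tibble(HairEyeColor)
glimpse(HairEyeColor)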
## Rows: 32
## Columns: 4
## $ Hair <chr> "Black", "Brown", "Red", "Blond", "Black", "Brown", "Red", "Blond…
## $ Eye <chr> "Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Blue", "Blue…
## $ Sex <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "…
## $ n <dbl> 32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8, 36, 66,…
head(HairEyeColor)
## # A tibble: 6 × 4
## Hair Eye Sex n
## <chr> <chr> <chr> <dbl>
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
# Convert the data from long format to wide format using pivot_wider()
# Each Hair/Eye combination becomes a column name, filled with the counts from n
HairEyeColor_Wide <- HairEyeColor %>%
  pivot_wider(names_from = c(Hair, Eye), values_from = n)
head(HairEyeColor_Wide)
## # A tibble: 2 × 17
## Sex Black_Brown Brown_Brown Red_Brown Blond_Brown Black_Blue Brown_Blue
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Male 32 53 10 3 11 50
## 2 Female 36 66 16 4 9 34
## # ℹ 10 more variables: Red_Blue <dbl>, Blond_Blue <dbl>, Black_Hazel <dbl>,
## # Brown_Hazel <dbl>, Red_Hazel <dbl>, Blond_Hazel <dbl>, Black_Green <dbl>,
## # Brown_Green <dbl>, Red_Green <dbl>, Blond_Green <dbl>
# How to read the wide table: each column is a hair color/eye color combination (Hair_Eye),
# and each cell is the number of people of that sex with that combination. For example,
# there are 32 males with black hair and brown eyes, 53 males with brown hair and brown eyes,
# 10 males with red hair and brown eyes, and so on.
# Convert the data back to long format using pivot_longer()
# The Hair_Eye column names are turned back into values of the Hair and Eye variables
HairEyeColor_Long <- HairEyeColor_Wide %>%
  pivot_longer(cols = -Sex,                 # pivot every column except Sex
               names_to = c("Hair", "Eye"),
               names_sep = "_",             # split each column name into Hair and Eye
               values_to = "n")
head(HairEyeColor_Long)
## # A tibble: 6 × 4
## Sex Hair Eye n
## <chr> <chr> <chr> <dbl>
## 1 Male Black Brown 32
## 2 Male Brown Brown 53
## 3 Male Red Brown 10
## 4 Male Blond Brown 3
## 5 Male Black Blue 11
## 6 Male Brown Blue 50
pivot_wider() and pivot_longer() are functions from the tidyr package, which is loaded as part of the tidyverse. Their main purpose is to reshape data between long and wide formats. pivot_wider() converts data from a long format into a wide format by turning values/observations into column names, while pivot_longer() converts data from a wide format into a long format by turning column names back into values/observations. The dataset I used to demonstrate these functions is HairEyeColor, which is included in the datasets package. After I converted it to a tibble, the data was in a long format, so I used pivot_wider() to convert it into a wide format. In other words, each combination of a Hair value (Black, Blond, Brown, Red) and an Eye value (Hazel, Green, Brown, Blue) became its own column, such as Black_Brown, leaving one row per sex and making it easy to compare counts for each hair and eye color combination side by side. I then used pivot_longer() to convert the hair color/eye color column names back into values/observations under the Hair and Eye variables.
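The same round trip can be seen on a much smaller table, where only one variable is spread into columns. The sketch below uses an invented `scores_long` tibble (the names and values are hypothetical) and checks that reshaping to wide and back to long recovers the original rows.

```r
library(tidyr)

# Hypothetical toy data in long format: one row per student/subject pair
scores_long <- tibble(
  student = c("A", "A", "B", "B"),
  subject = c("math", "art", "math", "art"),
  score   = c(90, 85, 70, 95)
)

# Long -> wide: the values of subject become column names
scores_wide <- scores_long %>%
  pivot_wider(names_from = subject, values_from = score)
print(scores_wide)

# Wide -> long: the subject columns become values again
scores_back <- scores_wide %>%
  pivot_longer(cols = -student, names_to = "subject", values_to = "score")
print(scores_back)

# Compare the round-tripped data with the original
round_trip_ok <- isTRUE(all.equal(as.data.frame(scores_long),
                                  as.data.frame(scores_back)))
print(round_trip_ok)
```

Because only one variable is pivoted here, `names_sep` is not needed. It is required in the HairEyeColor example because each column name there encodes two variables at once.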