Welcome to my Homework Assignment!

Hello! My name is Joyce Escatel-Flores, and I am converting my Assignment 3 R script into an R Markdown document. I had fun creating this, even though it took a while to figure out why R Markdown was not working at first.

API call to load NYPD Shootings Dataset

In this section, I ran some code to load the freely available data set of NYC shootings through its API.

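# NYC Open Data endpoint for the NYPD shootings data set
endpoint <- "https://data.cityofnewyork.us/resource/833y-fsy8.json"
# Request up to 30,000 rows, newest first
resp <- httr::GET(endpoint, query = list("$limit" = 30000, "$order" = "occur_date DESC"))
httr::stop_for_status(resp)  # stop here if the request failed
# Parse the JSON text into a flat data frame
shooting_data <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)
head(shooting_data)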
##   incident_key              occur_date occur_time     boro loc_of_occur_desc
## 1    298699604 2024-12-31T00:00:00.000   19:16:00 BROOKLYN           OUTSIDE
## 2    298699604 2024-12-31T00:00:00.000   19:16:00 BROOKLYN           OUTSIDE
## 3    298672096 2024-12-30T00:00:00.000   16:45:00    BRONX           OUTSIDE
## 4    298672094 2024-12-30T00:00:00.000   12:15:00    BRONX           OUTSIDE
## 5    298672097 2024-12-30T00:00:00.000   18:48:00 BROOKLYN           OUTSIDE
## 6    298672096 2024-12-30T00:00:00.000   16:45:00    BRONX           OUTSIDE
##   precinct jurisdiction_code loc_classfctn_desc             location_desc
## 1       69                 0             STREET                    (null)
## 2       69                 0             STREET                    (null)
## 3       47                 0             STREET                    (null)
## 4       52                 0             STREET                    (null)
## 5       60                 2            HOUSING MULTI DWELL - PUBLIC HOUS
## 6       47                 0             STREET                    (null)
##   statistical_murder_flag perp_age_group perp_sex perp_race vic_age_group
## 1                   FALSE          25-44        M     BLACK         18-24
## 2                   FALSE          25-44        M     BLACK         25-44
## 3                   FALSE         (null)   (null)    (null)         18-24
## 4                   FALSE          45-64        M     BLACK         25-44
## 5                   FALSE          25-44        M     BLACK         45-64
## 6                   FALSE         (null)   (null)    (null)         25-44
##   vic_sex       vic_race x_coord_cd y_coord_cd  latitude  longitude
## 1       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 2       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 3       M          BLACK  1,021,316    259,277 40.878261 -73.865964
## 4       M          WHITE  1,017,719    260,875 40.882661 -73.878964
## 5       M          BLACK    989,372    155,205 40.592685 -73.981557
## 6       F WHITE HISPANIC  1,021,316    259,277 40.878261 -73.865964
##   geocoded_column.type geocoded_column.coordinates
## 1                Point         -73.88876, 40.64387
## 2                Point         -73.88876, 40.64387
## 3                Point         -73.86596, 40.87826
## 4                Point         -73.87896, 40.88266
## 5                Point         -73.98156, 40.59269
## 6                Point         -73.86596, 40.87826
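
The occur_date column in the output above comes back as text such as 2024-12-31T00:00:00.000. As a minimal sketch, assuming only the date part is needed, it could be converted to a Date as shown here. The toy data frame and the occur_day column are hypothetical names used only for illustration, and this step is not part of the original script.

```r
# Toy rows copied from the occur_date values printed above
toy <- data.frame(
  occur_date = c("2024-12-31T00:00:00.000", "2024-12-30T00:00:00.000"),
  stringsAsFactors = FALSE
)

# Keep the first 10 characters ("YYYY-MM-DD") and parse them as a Date.
# occur_day is a hypothetical column name, not one from the data set.
toy$occur_day <- as.Date(substr(toy$occur_date, 1, 10))

# With a real Date column, sorting and date arithmetic become possible
range(toy$occur_day)
```

Keeping the original text column and adding a new one leaves the raw API value available for comparison.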

Data Cleaning

In the first piece of code, I removed rows with missing data in the column titled “perp_age_group”. The filter drops rows where the value is NA. Values stored as the text “(null)” are not NA, so those rows stay in the data (see rows 3 and 6 of the output below).

In the next piece of code, I added a new column titled “time_of_day”. This new variable tells us when the shooting occurred: during the morning, afternoon, or night. Now we know!

In the last piece of code in the R chunk, I changed the column “boro” because I did not like that it was all capitalized. I wrote code to make every value in the column lowercase.

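library(dplyr)      # %>%, filter(), mutate(), case_when()
library(hms)        # as_hms()
library(lubridate)  # hour()

# Drop rows where perp_age_group is NA
shooting_data <- shooting_data %>% filter(!is.na(perp_age_group))

shooting_data <- shooting_data %>%
  mutate(
    occur_time = as_hms(occur_time),
    time_of_day = case_when(
      hour(occur_time) >= 0 & hour(occur_time) < 12 ~ "morning",
      # `> 12` skips the 12:00 hour, so 12:00-12:59 falls through to "night";
      # `>= 12` would count it as afternoon
      hour(occur_time) > 12 & hour(occur_time) < 20 ~ "afternoon",
      TRUE ~ "night"
    ))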

# Recode each borough name to lowercase
shooting_data <- shooting_data %>% mutate(boro = case_when(
  boro == "BROOKLYN" ~ "brooklyn",
  boro == "MANHATTAN" ~ "manhattan",
  boro == "STATEN ISLAND" ~ "staten island",
  boro == "BRONX" ~ "bronx",
  boro == "QUEENS" ~ "queens"
))

head(shooting_data)
##   incident_key              occur_date occur_time     boro loc_of_occur_desc
## 1    298699604 2024-12-31T00:00:00.000   19:16:00 brooklyn           OUTSIDE
## 2    298699604 2024-12-31T00:00:00.000   19:16:00 brooklyn           OUTSIDE
## 3    298672096 2024-12-30T00:00:00.000   16:45:00    bronx           OUTSIDE
## 4    298672094 2024-12-30T00:00:00.000   12:15:00    bronx           OUTSIDE
## 5    298672097 2024-12-30T00:00:00.000   18:48:00 brooklyn           OUTSIDE
## 6    298672096 2024-12-30T00:00:00.000   16:45:00    bronx           OUTSIDE
##   precinct jurisdiction_code loc_classfctn_desc             location_desc
## 1       69                 0             STREET                    (null)
## 2       69                 0             STREET                    (null)
## 3       47                 0             STREET                    (null)
## 4       52                 0             STREET                    (null)
## 5       60                 2            HOUSING MULTI DWELL - PUBLIC HOUS
## 6       47                 0             STREET                    (null)
##   statistical_murder_flag perp_age_group perp_sex perp_race vic_age_group
## 1                   FALSE          25-44        M     BLACK         18-24
## 2                   FALSE          25-44        M     BLACK         25-44
## 3                   FALSE         (null)   (null)    (null)         18-24
## 4                   FALSE          45-64        M     BLACK         25-44
## 5                   FALSE          25-44        M     BLACK         45-64
## 6                   FALSE         (null)   (null)    (null)         25-44
##   vic_sex       vic_race x_coord_cd y_coord_cd  latitude  longitude
## 1       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 2       M          BLACK  1,015,120    173,870 40.643866 -73.888761
## 3       M          BLACK  1,021,316    259,277 40.878261 -73.865964
## 4       M          WHITE  1,017,719    260,875 40.882661 -73.878964
## 5       M          BLACK    989,372    155,205 40.592685 -73.981557
## 6       F WHITE HISPANIC  1,021,316    259,277 40.878261 -73.865964
##   geocoded_column.type geocoded_column.coordinates time_of_day
## 1                Point         -73.88876, 40.64387   afternoon
## 2                Point         -73.88876, 40.64387   afternoon
## 3                Point         -73.86596, 40.87826   afternoon
## 4                Point         -73.87896, 40.88266       night
## 5                Point         -73.98156, 40.59269   afternoon
## 6                Point         -73.86596, 40.87826   afternoon
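
The output above still contains “(null)” rows, and the 12:15 shooting in row 4 is labeled night. One possible variation of the cleaning steps, sketched here on made-up toy rows, treats “(null)” as missing, counts the noon hour as afternoon, and lowercases the borough with tolower(). The toy data frame and the occur_hour column are hypothetical. This is an alternative under those assumptions, not the code that produced the results in this report, and it would change the counts.

```r
library(dplyr)

# Made-up rows standing in for the real data (for illustration only)
toy <- data.frame(
  boro           = c("BROOKLYN", "BRONX", "STATEN ISLAND", "QUEENS", "BRONX"),
  occur_time     = c("19:16:00", "16:45:00", "03:05:00", "21:40:00", "12:15:00"),
  perp_age_group = c("25-44", "(null)", NA, "45-64", "45-64"),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  # Turn the literal text "(null)" into NA, then drop the missing rows
  mutate(perp_age_group = na_if(perp_age_group, "(null)")) %>%
  filter(!is.na(perp_age_group)) %>%
  mutate(
    # occur_hour is a hypothetical helper column: the "HH" part of "HH:MM:SS"
    occur_hour = as.integer(substr(occur_time, 1, 2)),
    time_of_day = case_when(
      occur_hour < 12 ~ "morning",
      occur_hour < 20 ~ "afternoon",  # 12:00-19:59, noon hour included
      TRUE ~ "night"
    ),
    # Lowercase every borough name in one step
    boro = tolower(boro)
  )

cleaned
```

Unlike the case_when() recode, tolower() also handles any borough spelling that is not listed explicitly, instead of turning it into NA.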

Insights

The first chunk of code shows how many shootings occurred during each time of day. The output shows a total of 8,177 morning shootings, 6,328 night shootings, and 5,895 afternoon shootings.

The second chunk of code shows how many shootings occurred during each time of day per borough, in order from highest to lowest. Number one is Brooklyn in the morning with a total of 2,786 shootings, and number 15 is Staten Island at night with a total of 173 shootings.

# Shootings per time of day, highest count first
Insight_1 <- shooting_data %>%
  count(time_of_day) %>%
  arrange(desc(n))
Insight_1
##   time_of_day    n
## 1     morning 8177
## 2       night 6328
## 3   afternoon 5895
# Shootings per time of day and borough, highest count first
Insight_2 <- shooting_data %>%
  count(time_of_day, boro) %>%
  arrange(desc(n))
Insight_2
##    time_of_day          boro    n
## 1      morning      brooklyn 2786
## 2      morning         bronx 2433
## 3    afternoon      brooklyn 2318
## 4        night      brooklyn 2290
## 5        night         bronx 2104
## 6    afternoon         bronx 1785
## 7      morning        queens 1420
## 8      morning     manhattan 1241
## 9        night     manhattan  911
## 10       night        queens  850
## 11   afternoon     manhattan  795
## 12   afternoon        queens  790
## 13     morning staten island  297
## 14   afternoon staten island  207
## 15       night staten island  173
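
As a quick consistency check, the borough counts in the second insight should add up to the totals in the first. The sketch below retypes the numbers from the Insight_2 output above into a small data frame (the lowercase name insight_2 is used here only to keep it separate from the real object) and sums them by time of day.

```r
# Counts copied from the Insight_2 output above
insight_2 <- data.frame(
  time_of_day = c("morning", "morning", "afternoon", "night", "night",
                  "afternoon", "morning", "morning", "night", "night",
                  "afternoon", "afternoon", "morning", "afternoon", "night"),
  boro = c("brooklyn", "bronx", "brooklyn", "brooklyn", "bronx",
           "bronx", "queens", "manhattan", "manhattan", "queens",
           "manhattan", "queens", "staten island", "staten island",
           "staten island"),
  n = c(2786, 2433, 2318, 2290, 2104, 1785, 1420, 1241, 911, 850,
        795, 790, 297, 207, 173),
  stringsAsFactors = FALSE
)

# Add up the borough counts within each time of day
totals <- tapply(insight_2$n, insight_2$time_of_day, sum)
totals[c("morning", "night", "afternoon")]
# morning 8177, night 6328, afternoon 5895 (the same as Insight_1)

sum(insight_2$n)  # 20400 shootings in total
```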

Graphs and Table

The first chunk of code created a plot for the first insight (see the section above). I passed the NYPD data to ggplot and put time of day on the x axis. The aes() call also maps fill to time of day, but the fixed red fill set inside geom_bar() overrides it, so every bar is red with a black outline. I thought flipping the axes was cool, so I did that as well. I added the plot title and the x and y axis titles, changed their size and font, and made the plot title bold.

The second chunk of code created a plot for the second insight (see the section above). I used facet_wrap to show a separate graph for each borough, which I thought was neater and more visually appealing. I styled it the same way as the first graph, and it gave me this pretty output :)

For the table, I simply created a table for the second insight!

ggplot(shooting_data, aes(x = time_of_day, fill = time_of_day)) +
  # the constant fill here overrides the fill mapping in aes()
  geom_bar(fill = "red", color = "black") +
  coord_flip() +
  labs(
    title = "Shootings by Time of Day",
    x = "Time of Day",
    y = "Number of Shootings") +
  theme(plot.title = element_text(size = 30, family = "serif", face = "bold"),
        axis.title = element_text(size = 15, family = "serif"),
        axis.text = element_text(size = 10, family = "serif"))

ggplot(shooting_data, aes(x = time_of_day, fill = boro)) +
  geom_bar() +
  labs(title = "Shootings that Occurred per Borough by Time of Day",
       x = "Time of shootings",
       y = "Number of shootings",
       fill = "Borough") +
  theme(plot.title = element_text(size = 20, family = "serif", face = "bold"),
        axis.title = element_text(size = 15, family = "serif"),
        axis.text = element_text(size = 8, family = "serif")) +
  facet_wrap(~ boro)

knitr::kable(Insight_2)
|time_of_day |boro          |    n|
|:-----------|:-------------|----:|
|morning     |brooklyn      | 2786|
|morning     |bronx         | 2433|
|afternoon   |brooklyn      | 2318|
|night       |brooklyn      | 2290|
|night       |bronx         | 2104|
|afternoon   |bronx         | 1785|
|morning     |queens        | 1420|
|morning     |manhattan     | 1241|
|night       |manhattan     |  911|
|night       |queens        |  850|
|afternoon   |manhattan     |  795|
|afternoon   |queens        |  790|
|morning     |staten island |  297|
|afternoon   |staten island |  207|
|night       |staten island |  173|
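
One optional tweak for the plots above: ggplot2 orders a text column alphabetically (afternoon, morning, night). A minimal sketch, assuming a chronological order is preferred, is to turn time_of_day into a factor with explicit levels before plotting. The toy data frame below is made up for illustration.

```r
library(ggplot2)

# Made-up rows for illustration only
toy <- data.frame(
  time_of_day = c("night", "morning", "afternoon", "morning"),
  stringsAsFactors = FALSE
)

# Set the level order by hand so the bars follow the day, not the alphabet
toy$time_of_day <- factor(toy$time_of_day,
                          levels = c("morning", "afternoon", "night"))

p <- ggplot(toy, aes(x = time_of_day)) +
  geom_bar(fill = "red", color = "black") +
  coord_flip() +
  labs(title = "Shootings by Time of Day",
       x = "Time of Day",
       y = "Number of Shootings")
p
```

The same factor() call could be applied to shooting_data before either plot, and count() would then list the times of day in that order as well.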

If you wish to explore the data set on your own, the API endpoint used above (https://data.cityofnewyork.us/resource/833y-fsy8.json) returns it as JSON.

Reflection

This assignment, and being asked to reflect on my finished product, made me realize that R Markdown can be beneficial for my own thesis research. It is quite tiring to do all the manual work in SPSS. For example, cleaning data gets simpler the more I learn how to do it with a few easy coding steps, instead of manually going through each column and row when I have thousands of participants and a 20-minute survey. R Markdown also lets me write down what I did and what the code does. This way, I stay organized: I do not have to go back and forth saving thousands of files, and I can have it all in one R Markdown document. I also notice that the graphs made here are absolutely beautiful and that I can make them my own. I no longer have to rely on just one generic theme. Although it is not bad to have generic graphs, I do like my graphs to have a bit of personality :)