Hello! My name is Joyce Escatel-Flores, and I am converting my Assignment 3 R script into R Markdown. I had fun creating this, even though it took a while to figure out why R Markdown was not working at first.
In this section, I ran some code to load the freely available NYC shootings data set.
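# NYC Open Data endpoint for the shootings data set
endpoint <- "https://data.cityofnewyork.us/resource/833y-fsy8.json"
# Request up to 30,000 rows, most recent first
resp <- httr::GET(endpoint, query = list("$limit" = 30000, "$order" = "occur_date DESC"))
# Parse the JSON body; flatten = TRUE spreads nested fields into ordinary columns
shooting_data <- jsonlite::fromJSON(httr::content(resp, as = "text"), flatten = TRUE)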
head(shooting_data)
## incident_key occur_date occur_time boro loc_of_occur_desc
## 1 298699604 2024-12-31T00:00:00.000 19:16:00 BROOKLYN OUTSIDE
## 2 298699604 2024-12-31T00:00:00.000 19:16:00 BROOKLYN OUTSIDE
## 3 298672096 2024-12-30T00:00:00.000 16:45:00 BRONX OUTSIDE
## 4 298672094 2024-12-30T00:00:00.000 12:15:00 BRONX OUTSIDE
## 5 298672097 2024-12-30T00:00:00.000 18:48:00 BROOKLYN OUTSIDE
## 6 298672096 2024-12-30T00:00:00.000 16:45:00 BRONX OUTSIDE
## precinct jurisdiction_code loc_classfctn_desc location_desc
## 1 69 0 STREET (null)
## 2 69 0 STREET (null)
## 3 47 0 STREET (null)
## 4 52 0 STREET (null)
## 5 60 2 HOUSING MULTI DWELL - PUBLIC HOUS
## 6 47 0 STREET (null)
## statistical_murder_flag perp_age_group perp_sex perp_race vic_age_group
## 1 FALSE 25-44 M BLACK 18-24
## 2 FALSE 25-44 M BLACK 25-44
## 3 FALSE (null) (null) (null) 18-24
## 4 FALSE 45-64 M BLACK 25-44
## 5 FALSE 25-44 M BLACK 45-64
## 6 FALSE (null) (null) (null) 25-44
## vic_sex vic_race x_coord_cd y_coord_cd latitude longitude
## 1 M BLACK 1,015,120 173,870 40.643866 -73.888761
## 2 M BLACK 1,015,120 173,870 40.643866 -73.888761
## 3 M BLACK 1,021,316 259,277 40.878261 -73.865964
## 4 M WHITE 1,017,719 260,875 40.882661 -73.878964
## 5 M BLACK 989,372 155,205 40.592685 -73.981557
## 6 F WHITE HISPANIC 1,021,316 259,277 40.878261 -73.865964
## geocoded_column.type geocoded_column.coordinates
## 1 Point -73.88876, 40.64387
## 2 Point -73.88876, 40.64387
## 3 Point -73.86596, 40.87826
## 4 Point -73.87896, 40.88266
## 5 Point -73.98156, 40.59269
## 6 Point -73.86596, 40.87826
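The column names geocoded_column.type and geocoded_column.coordinates in the output above come from flatten = TRUE, which spreads a nested JSON object into ordinary columns. Here is a minimal sketch of that behavior, using a made-up one-row JSON string (sample_json is an assumption for illustration, not the real API response):

```r
library(jsonlite)

# Hypothetical one-row sample shaped like the API response (not real data)
sample_json <- '[{"incident_key": "1", "boro": "BROOKLYN",
  "geocoded_column": {"type": "Point", "coordinates": [-73.9, 40.6]}}]'

# Without flattening, geocoded_column is a nested data frame held in one column
nested <- fromJSON(sample_json)
names(nested)

# With flatten = TRUE, each nested field becomes its own column,
# named with the parent column as a prefix
flat <- fromJSON(sample_json, flatten = TRUE)
names(flat)
```

Comparing the two names() results shows the single nested column replaced by the prefixed columns.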
In the first piece of code, I removed rows with missing data in the column titled “perp_age_group”. The filter drops rows where the value is NA. Rows that contain the text “(null)” are not treated as missing, so they still appear in the output.
In the next piece of code, I added a new column titled “time_of_day”. This new variable tells us when the shooting occurred: morning, afternoon, or night. As coded, morning covers hours 0 through 11, afternoon covers hours 13 through 19, and everything else, including the 12 o'clock hour, is labeled night. Now we know!
In the last piece of code in the R chunk, I changed the “boro” column because I did not like that its values were all capitalized. I wrote code to make every value in that column lowercase.
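library(dplyr)      # filter(), mutate(), case_when(), %>%
library(hms)        # as_hms()
library(lubridate)  # hour()

# Drop rows where perp_age_group is NA (the text "(null)" is not NA, so those rows stay)
shooting_data <- shooting_data %>% filter(!is.na(perp_age_group))

# Parse occur_time and label each shooting by time of day
shooting_data <- shooting_data %>%
  mutate(
    occur_time = as_hms(occur_time),
    time_of_day = case_when(
      hour(occur_time) >= 0 & hour(occur_time) < 12 ~ "morning",
      # Hour 12 is not matched here (> 12), so it falls through to "night"
      hour(occur_time) > 12 & hour(occur_time) < 20 ~ "afternoon",
      TRUE ~ "night"
    )
  )

# Recode the borough names to lowercase
shooting_data <- shooting_data %>% mutate(boro = case_when(
  boro == "BROOKLYN" ~ "brooklyn",
  boro == "MANHATTAN" ~ "manhattan",
  boro == "STATEN ISLAND" ~ "staten island",
  boro == "BRONX" ~ "bronx",
  boro == "QUEENS" ~ "queens"
))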
head(shooting_data)
## incident_key occur_date occur_time boro loc_of_occur_desc
## 1 298699604 2024-12-31T00:00:00.000 19:16:00 brooklyn OUTSIDE
## 2 298699604 2024-12-31T00:00:00.000 19:16:00 brooklyn OUTSIDE
## 3 298672096 2024-12-30T00:00:00.000 16:45:00 bronx OUTSIDE
## 4 298672094 2024-12-30T00:00:00.000 12:15:00 bronx OUTSIDE
## 5 298672097 2024-12-30T00:00:00.000 18:48:00 brooklyn OUTSIDE
## 6 298672096 2024-12-30T00:00:00.000 16:45:00 bronx OUTSIDE
## precinct jurisdiction_code loc_classfctn_desc location_desc
## 1 69 0 STREET (null)
## 2 69 0 STREET (null)
## 3 47 0 STREET (null)
## 4 52 0 STREET (null)
## 5 60 2 HOUSING MULTI DWELL - PUBLIC HOUS
## 6 47 0 STREET (null)
## statistical_murder_flag perp_age_group perp_sex perp_race vic_age_group
## 1 FALSE 25-44 M BLACK 18-24
## 2 FALSE 25-44 M BLACK 25-44
## 3 FALSE (null) (null) (null) 18-24
## 4 FALSE 45-64 M BLACK 25-44
## 5 FALSE 25-44 M BLACK 45-64
## 6 FALSE (null) (null) (null) 25-44
## vic_sex vic_race x_coord_cd y_coord_cd latitude longitude
## 1 M BLACK 1,015,120 173,870 40.643866 -73.888761
## 2 M BLACK 1,015,120 173,870 40.643866 -73.888761
## 3 M BLACK 1,021,316 259,277 40.878261 -73.865964
## 4 M WHITE 1,017,719 260,875 40.882661 -73.878964
## 5 M BLACK 989,372 155,205 40.592685 -73.981557
## 6 F WHITE HISPANIC 1,021,316 259,277 40.878261 -73.865964
## geocoded_column.type geocoded_column.coordinates time_of_day
## 1 Point -73.88876, 40.64387 afternoon
## 2 Point -73.88876, 40.64387 afternoon
## 3 Point -73.86596, 40.87826 afternoon
## 4 Point -73.87896, 40.88266 night
## 5 Point -73.98156, 40.59269 afternoon
## 6 Point -73.86596, 40.87826 afternoon
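In the output above, rows 3 and 6 still show “(null)” in perp_age_group, because that text is a placeholder string rather than a true NA. One possible way to handle it, sketched here on a small made-up data frame (toy_data and its values are assumptions for illustration), is to convert the placeholder to NA with dplyr::na_if() before filtering. The same sketch uses tolower() as a shorter alternative to listing each borough in case_when().

```r
library(dplyr)

# Hypothetical toy rows (not taken from the real data set)
toy_data <- data.frame(
  boro = c("BROOKLYN", "BRONX", "QUEENS", "STATEN ISLAND"),
  perp_age_group = c("25-44", "(null)", NA, "45-64"),
  stringsAsFactors = FALSE
)

cleaned <- toy_data %>%
  # Turn the "(null)" placeholder into a real NA
  mutate(perp_age_group = na_if(perp_age_group, "(null)")) %>%
  # Now the NA filter removes both kinds of missing value
  filter(!is.na(perp_age_group)) %>%
  # Lowercase every borough name in one call
  mutate(boro = tolower(boro))

cleaned
```

Applying this to the real data would remove more rows than the original filter does, so the counts reported in this document would change.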
The first chunk of code gives insight into how many shootings occurred during each time of day. The output shows a total of 8,177 morning shootings, 6,328 night shootings, and 5,895 afternoon shootings.
The second chunk of code gives insight into how many shootings occurred during each time of day per borough, ordered from greatest to least. Number one is Brooklyn in the morning with a total of 2,786 shootings, and number 15 is Staten Island at night with a total of 173 shootings.
Insight_1 <- shooting_data %>%
  count(time_of_day) %>%
  arrange(desc(n))
Insight_1
## time_of_day n
## 1 morning 8177
## 2 night 6328
## 3 afternoon 5895
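These counts depend on where the hour boundaries are drawn. As a small check, the sketch below applies the same case_when() rules to a few example hours and compares them with a variant that counts hour 12 as afternoon. The helper names bins_as_written and bins_with_noon are made up for this illustration, and the variant is only one possible choice, not the rule used for the results in this document.

```r
library(dplyr)

# Same rules as the time_of_day code in this document, applied to integer hours
bins_as_written <- function(h) {
  case_when(
    h >= 0 & h < 12 ~ "morning",
    h > 12 & h < 20 ~ "afternoon",
    TRUE ~ "night"
  )
}

# Hypothetical variant: hour 12 counts as afternoon
bins_with_noon <- function(h) {
  case_when(
    h < 12 ~ "morning",
    h < 20 ~ "afternoon",
    TRUE ~ "night"
  )
}

# Compare both rules side by side on a few boundary hours
example_hours <- c(0, 11, 12, 13, 19, 20, 23)
data.frame(
  hour = example_hours,
  as_written = bins_as_written(example_hours),
  with_noon = bins_with_noon(example_hours)
)
```

Only the 12 o'clock hour differs between the two rules, which matches the 12:15:00 shooting labeled night in the earlier output.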
Insight_2 <- shooting_data %>%
  count(time_of_day, boro) %>%
  arrange(desc(n))
Insight_2
## time_of_day boro n
## 1 morning brooklyn 2786
## 2 morning bronx 2433
## 3 afternoon brooklyn 2318
## 4 night brooklyn 2290
## 5 night bronx 2104
## 6 afternoon bronx 1785
## 7 morning queens 1420
## 8 morning manhattan 1241
## 9 night manhattan 911
## 10 night queens 850
## 11 afternoon manhattan 795
## 12 afternoon queens 790
## 13 morning staten island 297
## 14 afternoon staten island 207
## 15 night staten island 173
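Raw counts favor the boroughs with the most shootings overall, so it can help to look at shares within each borough as well. Here is a minimal sketch, assuming the counts for two boroughs are retyped from the output above (counts and shares are names made up for this illustration):

```r
library(dplyr)

# Counts retyped from the output above for two boroughs
counts <- data.frame(
  time_of_day = c("morning", "afternoon", "night", "morning", "afternoon", "night"),
  boro = c(rep("brooklyn", 3), rep("staten island", 3)),
  n = c(2786, 2318, 2290, 297, 207, 173)
)

shares <- counts %>%
  group_by(boro) %>%
  # Share of each borough's shootings that fall in each time of day
  mutate(share = n / sum(n)) %>%
  ungroup()

shares
```

Within each borough the shares add up to 1, so boroughs of very different sizes can be compared on the same scale.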
The first chunk of code created a plot for the first insight (see the section above). I passed the NYPD data to ggplot and put time of day on the x axis. The aes() call also maps fill to time of day, but geom_bar(fill = "red") overrides it, so every bar is red with a black outline. I thought flipping the axes was cool, so I did that as well. I added the plot title and axis titles and changed their size and font, with the plot title in bold.
The second chunk of code created a plot for the second insight (see the section above). I used facet_wrap to show a separate graph for each borough, which I thought was neater and more visually appealing. I styled it the same way as the first graph, and it gave me this pretty output :)
For the table, I simply created a table for the second insight!
library(ggplot2)

# Horizontal bar chart of shootings by time of day
ggplot(shooting_data, aes(x = time_of_day, fill = time_of_day)) +
  # The fixed fill = "red" here overrides the fill mapping in aes()
  geom_bar(fill = "red", color = "black") +
  coord_flip() +
  labs(
    title = "Shootings by Time of Day",
    x = "Time of Day",
    y = "Number of Shootings"
  ) +
  theme(
    plot.title = element_text(size = 30, family = "serif", face = "bold"),
    axis.title = element_text(size = 15, family = "serif"),
    axis.text = element_text(size = 10, family = "serif")
  )
ggplot(shooting_data, aes(x = time_of_day, fill = boro)) +
  geom_bar() +
  labs(
    title = "Shootings that Occurred per Borough by Time of Day",
    x = "Time of shootings",
    y = "Number of shootings",
    fill = "Borough"
  ) +
  theme(
    plot.title = element_text(size = 20, family = "serif", face = "bold"),
    axis.title = element_text(size = 15, family = "serif"),
    axis.text = element_text(size = 8, family = "serif")
  ) +
  facet_wrap(~ boro)
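Because time_of_day is stored as text, ggplot2 places the categories in alphabetical order (afternoon, morning, night). One possible tweak, sketched on a made-up data frame (toy_times is an assumption for illustration), is to turn the column into a factor whose levels follow the order of the day:

```r
library(ggplot2)

# Hypothetical toy column (not the real data)
toy_times <- data.frame(
  time_of_day = c("night", "morning", "afternoon", "morning"),
  stringsAsFactors = FALSE
)

# Set the level order explicitly so the axis reads morning, afternoon, night
toy_times$time_of_day <- factor(
  toy_times$time_of_day,
  levels = c("morning", "afternoon", "night")
)

# The bars now follow the factor levels instead of alphabetical order
toy_plot <- ggplot(toy_times, aes(x = time_of_day)) +
  geom_bar()
toy_plot
```

The same factor() step could be applied to shooting_data before either plot is drawn.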
knitr::kable(Insight_2)
| time_of_day | boro | n |
|---|---|---|
| morning | brooklyn | 2786 |
| morning | bronx | 2433 |
| afternoon | brooklyn | 2318 |
| night | brooklyn | 2290 |
| night | bronx | 2104 |
| afternoon | bronx | 1785 |
| morning | queens | 1420 |
| morning | manhattan | 1241 |
| night | manhattan | 911 |
| night | queens | 850 |
| afternoon | manhattan | 795 |
| afternoon | queens | 790 |
| morning | staten island | 297 |
| afternoon | staten island | 207 |
| night | staten island | 173 |
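The table could also be given reader-friendly headers and a caption. Here is a minimal sketch, assuming a small stand-in data frame (insight_sample holds the first three rows of the table above, and the column labels and caption are illustrative choices):

```r
library(knitr)

# First three rows of the table above, retyped as a stand-in for Insight_2
insight_sample <- data.frame(
  time_of_day = c("morning", "morning", "afternoon"),
  boro = c("brooklyn", "bronx", "brooklyn"),
  n = c(2786, 2433, 2318)
)

# col.names relabels the headers; caption adds a title to the table
labeled_table <- kable(
  insight_sample,
  col.names = c("Time of day", "Borough", "Shootings"),
  caption = "Shootings by time of day and borough"
)
labeled_table
```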
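If you wish to explore the data set on your own, it is available from the same NYC Open Data endpoint used in the loading code above: https://data.cityofnewyork.us/resource/833y-fsy8.json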
This assignment, and being asked to reflect on my finished product, made me realize that R Markdown can be beneficial for my own thesis research. It is quite tiring to do all the manual work in SPSS. For example, cleaning data becomes simpler the more I learn to do it with a few easy coding steps, instead of manually going through each column and row when I have thousands of participants and a 20-minute survey. R Markdown also lets me write down what I did and what the code does. This way, I stay organized and do not have to go back and forth saving thousands of files, because everything is in one R Markdown document. I also notice that the graphs made here are absolutely beautiful and that I can make them my own. I no longer have to rely on just one generic theme. Although it is not bad to have generic graphs, I do like my graphs to have a bit of personality :)