---
title: "Maude LaVaute Final"
output: 
  html_document:
    df_print: paged
    code_download: true
---

```{r setup, include=FALSE}
#importing necessary libraries
library(tidyverse)
library(tidycensus)
library(sf)
library(scales)
library(viridis)
library(dplyr)
library(lubridate)
library(plotly)
library(tigris)
```
##### Introduction & Background:
Graffiti has become synonymous with New York City, and it is nearly impossible to visit any part of the city without encountering some form of it. It is a form of expression that grew out of tensions specific to New York in the 1960s, and it has become part of the city's identity. The graffiti we see today developed as writers perfected their practice and shared techniques along the way.

However, because of graffiti’s ties to an impoverished, grimy New York, and its often-illegal nature, it is an art form that is heavily policed. Many pieces that today's work grew from can now only be seen through photographic documentation, if at all. Although many original works are gone, they were still integral in the formation of graffiti that exists today, and it is interesting to consider the tensions that both cultivated and hindered the art form over the years.

Today, [Graffiti-Free NYC](https://edc.nyc/program/graffiti-free-nyc) is a city task force (a cooperative effort among the NYC Economic Development Corporation, the NYC Department of Sanitation, and the Office of the Mayor) that cleans reported instances of illegal graffiti across the city. Since 2013, it has maintained a database that records the location of each reported graffiti incident, the date it was reported, the status of its cleaning, and the date it was cleaned. *The Graffiti-Free task force is currently paused so the city can allocate more resources to fighting COVID-19.*

This data (from January 2019 to January 2020) is available on [NYC Open Data](https://data.cityofnewyork.us/City-Government/DSNY-Graffiti-Tracking/gpwd-npar), and I wanted to see what it could tell us about graffiti today. 

Considering the history of graffiti, I wanted to think about things like:

  *   What does a year's worth of reported graffiti look like in New York City? Does it still dominate the city like it did in the 1960s and 1970s?
  *   Where is graffiti happening the most across the city?
  *   What do the demographics of neighborhoods where graffiti is heavily reported look like?
  *   Does the length of time it takes Graffiti-Free NYC to clean reported graffiti change from neighborhood to neighborhood?

Through the use of data visualizations and maps, I was able to answer some of these questions, and think about how this research could grow in the future. 

```{r setup4, include=FALSE}
#importing necessary data frames
graffiti_raw_shp <- st_read("data/graffiti_raw.shp")
#creating a datafile without geometry
graffitti_raw <- st_drop_geometry(graffiti_raw_shp)
#downloading from bytes of the big apple
census_data <- st_read("data/nyct2010_21d/nyct2010.shp")
```

```{r, echo=FALSE, message=F, warning=F}
#creating data table where incidents actually existed
real_graffiti <- graffitti_raw %>%
  filter(RESOLUTION != 'Cleaning crew dispatched. No graffiti on property.') %>%
  filter(RESOLUTION != 'Mail returned / wrong address') %>%
  filter(RESOLUTION != 'Cleaning crew dispatched.  Cannot locate property.') %>%
  filter(BOROUGH != 'Unspecified')
```
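The same three "no graffiti found" resolutions are filtered out again in several later chunks. As a minimal sketch, the filter could live in one helper. The column names `RESOLUTION` and `BOROUGH` come from the chunk above; the helper name `keep_real_incidents`, the vector `non_incident_resolutions`, and the toy rows (including the "Site cleaned" value) are hypothetical and only for illustration.

```r
library(dplyr)

# Resolution strings that mean no graffiti was actually found
# (copied from the filters above; the last one contains a double space)
non_incident_resolutions <- c(
  "Cleaning crew dispatched. No graffiti on property.",
  "Mail returned / wrong address",
  "Cleaning crew dispatched.  Cannot locate property."
)

# Hypothetical helper: keeps only rows that describe a real incident
keep_real_incidents <- function(df) {
  df %>%
    filter(!RESOLUTION %in% non_incident_resolutions,
           BOROUGH != "Unspecified")
}

# Toy data standing in for graffitti_raw ("Site cleaned" is an invented value)
toy_raw <- tibble(
  BOROUGH    = c("BROOKLYN", "QUEENS", "Unspecified", "BRONX"),
  RESOLUTION = c("Site cleaned",
                 "Mail returned / wrong address",
                 "Site cleaned",
                 "Cleaning crew dispatched. No graffiti on property.")
)

toy_real <- keep_real_incidents(toy_raw)
toy_real
```

One behavioral difference to be aware of: `!RESOLUTION %in% ...` keeps rows where `RESOLUTION` is `NA`, while the chained `!=` filters drop them, so the two versions can return different row counts on data with missing resolutions.
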

```{r, echo=FALSE, message=F, warning=F}
#making data tables to map:
#making data frames for each borough
#choosing each row in the BOROUGH col based on each borough
manhattan_graffitti <- subset(real_graffiti, BOROUGH == "MANHATTAN") #4530
brooklyn_graffitti <- subset(real_graffiti, BOROUGH == "BROOKLYN") #9131
queens_graffitti <- subset(real_graffiti, BOROUGH == "QUEENS") #2512
bronx_graffitti <- subset(real_graffiti, BOROUGH == "BRONX") #3467
staten_island_graffitti <- subset(real_graffiti, BOROUGH == "STATEN ISLAND") #263
```

```{r, echo=FALSE, message=F, warning=F}
#data frame with the count of each cleaned/uncleaned incident in each borough 
#manhattan example: pulling from manhattan only table
x_manhattan_graffitti_status <- manhattan_graffitti %>%
  #defining the group as STATUS to work from only that col
  group_by(STATUS) %>%
  #summarise() creates a new data frame, counting the rows for each value of STATUS
  summarise(count=n()) %>%
  #renamed to avoid confusion when joining 
  rename(manhattan_count = count)
x_brooklyn_graffitti_status <- brooklyn_graffitti %>%
  group_by(STATUS) %>%
  summarise(count=n()) %>%
  rename(brooklyn_count = count) %>%
  right_join(x_manhattan_graffitti_status, by = "STATUS")
#made one of these little tables for each borough and joined them all
x_queens_graffitti_status <- queens_graffitti %>%
  group_by(STATUS) %>%
  summarise(count=n()) %>%
  rename(queens_count = count) %>%
  right_join(x_brooklyn_graffitti_status, by = "STATUS")
x_bronx_graffitti_status <- bronx_graffitti %>%
  group_by(STATUS) %>%
  summarise(count=n()) %>%
  rename(bronx_count = count) %>%
  right_join(x_queens_graffitti_status, by = "STATUS")
x_staten_island_graffitti_status <- staten_island_graffitti %>%
  group_by(STATUS) %>%
  summarise(count=n()) %>%
  rename(staten_island_count = count) %>%
  right_join(x_bronx_graffitti_status, by = "STATUS")
#all boros joined with total number from original data frame
graffiti_count_boro <- real_graffiti %>%
  group_by(STATUS) %>%
  summarise(count=n()) %>%
  rename(total_count = count) %>%
  right_join(x_staten_island_graffitti_status, by = "STATUS")
```
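The chain of per-borough tables and joins above can also be expressed as a single count followed by a reshape. This is a sketch on toy data, not a drop-in replacement: `toy_incidents`, `status_long`, and `status_wide` are hypothetical names, and the resulting columns are named after the boroughs rather than `brooklyn_count`, `manhattan_count`, and so on.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for real_graffiti
toy_incidents <- tibble(
  BOROUGH = c("BROOKLYN", "BROOKLYN", "BROOKLYN",
              "MANHATTAN", "MANHATTAN", "QUEENS"),
  STATUS  = c("Closed", "Closed", "Open",
              "Closed", "Open", "Closed")
)

# Long format: one row per borough/status pair
status_long <- toy_incidents %>%
  count(BOROUGH, STATUS, name = "count")

# Wide format: one row per status, one column per borough;
# combinations with no incidents are filled with 0
status_wide <- status_long %>%
  pivot_wider(names_from = BOROUGH, values_from = count, values_fill = 0)

status_wide
```

The long table is usually the easier one to plot with `ggplot2`, while the wide table is closer in shape to `graffiti_count_boro`.
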

#### Initial Visualizations:
This set of initial graphs and maps helped me understand the data and directed the rest of my research.

##### Fig. 1:
This bar chart shows the total count of real graffiti incidents reported in New York City from January 2019 to January 2020, categorized by borough. The raw data set included reports where no graffiti was found, so I filtered these out. The chart shows that Brooklyn had the most graffiti reported (9,131 incidents) and Staten Island had the least (263). However, it is important to remember that this count represents only the graffiti *reported* to the task force, and there are likely other incidents that were never reported.
```{r}
ggplot(data = real_graffiti) +
  #fill by borough so the manual colors below match the breaks
  geom_bar(mapping = aes(x = BOROUGH, fill = BOROUGH)) +
  theme_bw()+
  scale_fill_manual(name = "BOROUGH",
                    values = c("#2abcc9", "#c46ec3", "#f27d07", "#6de373", "#f5e340"),
                    breaks = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND")) +
  labs(x = "Borough", y = "Count of Incidents",
       title = "Reported Graffiti Incidents in New York City 2019-2020",
       #subtitle = "Average of number of days for each incident grouped by borough",
       caption = "Source: NYC Open Data") 
```


##### Fig. 2:
This bar chart breaks the data down further, separating incidents into "Closed" (the graffiti was cleaned and the case closed) and "Open" (the case is still unresolved). The split between open and closed cases in each borough tracks that borough's total number of cases; no borough has a disproportionate share of open cases.
```{r, echo=FALSE, message=F, warning=F}
#TODO: run summary statistics to compare cleaned vs. open incidents

ggplot(data = graffiti_count_boro) + 
  geom_col(mapping = aes(x = STATUS, y = brooklyn_count, fill = "Brooklyn")) +
  geom_col(mapping = aes(x = STATUS, y = manhattan_count, fill = "Manhattan")) +
  geom_col(mapping = aes(x = STATUS, y = bronx_count, fill = "Bronx")) +
  geom_col(mapping = aes(x = STATUS, y = queens_count, fill = "Queens")) +
  geom_col(mapping = aes(x = STATUS, y = staten_island_count, fill = "Staten Island")) +
    scale_fill_manual(name = "Borough",
                    values = c("#2abcc9", "#c46ec3", "#f27d07", "#6de373", "#f5e340"),
                    breaks = c("Brooklyn", "Manhattan", "Bronx", "Queens", "Staten Island")) +
    labs(x = "Status", y = "Count",
       title = "Graffiti Incidents by Borough",
       caption = "Source: NYC Open Data") +
  theme_bw()
```
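The chunk above draws five `geom_col` layers at the same x positions, so the bars sit on top of one another rather than side by side, and the result can be misread as a stacked chart. As an alternative sketch, a long table with one row per borough and status lets a single layer place the bars next to each other. The toy data and the names `toy_incidents`, `status_counts`, and `status_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for real_graffiti
toy_incidents <- data.frame(
  BOROUGH = c("BROOKLYN", "BROOKLYN", "BROOKLYN",
              "MANHATTAN", "MANHATTAN", "QUEENS"),
  STATUS  = c("Closed", "Closed", "Open",
              "Closed", "Open", "Closed")
)

# One row per borough/status pair
status_counts <- count(toy_incidents, BOROUGH, STATUS, name = "count")

# A single layer; position = "dodge" places boroughs side by side
status_plot <- ggplot(status_counts,
                      aes(x = STATUS, y = count, fill = BOROUGH)) +
  geom_col(position = "dodge") +
  labs(x = "Status", y = "Count", fill = "Borough") +
  theme_bw()

status_plot
```

With dodged bars, each borough's open and closed counts can be read directly from the y axis.
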



```{r, echo=FALSE, message=F, warning=F}
cleaned_graffitti <- real_graffiti %>%
  filter(STATUS == "Closed")

#attempting to create numeric cols to base calculations on
#date_diff = CREATED_DA - CLOSED_DAT
dates <- cleaned_graffitti %>%
  select(INCIDENT_A, BOROUGH, CREATED_DA, CLOSED_DAT, CENSUS_TRA) %>%
  mutate(date_diff = difftime((as.Date(CREATED_DA, "%m/%d/%y")), 
                             (as.Date(CLOSED_DAT, "%m/%d/%y")), units = 'days')) %>%
  #removing the word 'days' from the col
  mutate_at("date_diff", str_replace, "days", "") %>%
  mutate(date_diff = as.numeric(date_diff)) %>%
  mutate(date_diff = if_else(date_diff > 1, (date_diff - 365), date_diff)) %>%
  mutate(date_diff = date_diff * (-1))

cleaned_graffitti_simple <- cleaned_graffitti %>%
  group_by(BOROUGH) %>%
  summarise(cleaned_incident_count=n())

#Making a plot of count of each cleaned graffiti
# ggplot(data = dates) +
#   geom_bar(mapping = aes(x = BOROUGH, fill = "count")) +
#   theme_bw()+
#   scale_fill_manual(name = "BOROUGH",
#                     values = c("#2abcc9", "#c46ec3", "#f27d07", "#6de373", "#f5e340"),
#                     breaks = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND")) +
#   labs(x = "Borough", y = "Cleaned (closed) Graffiti Incidents",
#        title = "Cleaned Graffiti Incidents in New York City",
#        #subtitle = "Average of number of days for each incident grouped by borough",
#        caption = "Source: NYC Open Data") 

# ggplot(data = cleaned_graffitti) +
#   geom_col(mapping = aes(x = BOROUGH, y = cleaned_incident_count, fill = "cleaned_incident_count")) +
#   theme_bw()+
#   scale_fill_manual(name = "BOROUGH",
#                     values = c("#2abcc9", "#c46ec3", "#f27d07", "#6de373", "#f5e340"),
#                     breaks = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND")) +
#   labs(x = "Borough", y = "Cleaned (closed) Graffiti Incidents",
#        title = "Cleaned Graffiti Incidents in New York City",
#        #subtitle = "Average of number of days for each incident grouped by borough",
#        caption = "Source: NYC Open Data")

# cleaned_by_boro <- graffiti_count_boro %>%
#   filter(STATUS == "Closed")
# 
# ggplot(data = cleaned_by_boro) + 
#   scale_fill_manual(name = "Borough",
#                     values = c("#2abcc9", "#c46ec3", "#f27d07", "#6de373", "#f5e340"),
#                     breaks = c("Brooklyn", "Manhattan", "Bronx", "Queens", "Staten Island")) +
#   geom_col(mapping = aes(y = brooklyn_count, x = STATUS, fill = "Brooklyn")) +
#   geom_col(mapping = aes(y = manhattan_count, x = STATUS, fill = "Manhattan")) +
#   geom_col(mapping = aes(y = bronx_count, x = STATUS, fill = "Bronx")) +
#   geom_col(mapping = aes(y = queens_count, x = STATUS, fill = "Queens")) +
#   geom_col(mapping = aes(y = staten_island_count, x = STATUS, fill = "Staten Island")) +
#     labs(x = "Status", y = "Count",
#        title = "Graffiti Incidents by Burough",
#        caption = "Source: NYC Open Data") +
#   theme_bw()

```
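The `date_diff` pipeline above subtracts the closed date from the created date, strips the unit text, applies a 365-day correction, and then flips the sign. A simpler sketch is to subtract in the other order and convert the result with `as.numeric()`. This assumes the date columns are text in `MM/DD/YYYY` form, which has not been checked against the data; if that assumption holds, the `"%m/%d/%y"` format would read only the first two digits of the year, which may be why the 365-day correction was needed. The helper name `add_days_to_clean`, the column `days_to_clean`, and the toy rows are hypothetical.

```r
# Hypothetical helper: adds days_to_clean = closed date - created date.
# Assumes CREATED_DA and CLOSED_DAT are text dates with a four-digit year.
add_days_to_clean <- function(df, date_format = "%m/%d/%Y") {
  created <- as.Date(df$CREATED_DA, format = date_format)
  closed  <- as.Date(df$CLOSED_DAT, format = date_format)
  # difftime() returns a difftime object; as.numeric() keeps just the number
  df$days_to_clean <- as.numeric(difftime(closed, created, units = "days"))
  df
}

# Toy rows, including one incident that spans the new year
toy_dates <- data.frame(
  INCIDENT_A = c("1 Example St", "2 Example Ave"),
  CREATED_DA = c("12/30/2019", "01/05/2019"),
  CLOSED_DAT = c("01/04/2020", "01/15/2019"),
  stringsAsFactors = FALSE
)

toy_result <- add_days_to_clean(toy_dates)
toy_result$days_to_clean
```

Because the calculation is written once, the same helper could be reused wherever the pipeline is repeated for the borough, Manhattan, and spatial versions of the data.
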

##### Fig. 3:
This bar chart shows the average number of days it took to clean a reported graffiti incident in each borough.

```{r, echo=FALSE, message=F, warning=F}
#creating a data table with the average number of days it took to clean in each borough
dates_ave <- dates %>%
  group_by(BOROUGH) %>%
  summarise(avg_date_diff = mean(date_diff, na.rm = TRUE))

#Plot of the average number of days for each borough
dates_ave_plot <- ggplot() +
  geom_col(data = dates_ave, mapping = aes(x = BOROUGH, y = avg_date_diff)) +
  theme_bw() +
  labs(x = "Borough", y = "Average Time to Clean (Days)",
        title = "Time to Clean Graffiti (days) in New York City",
        subtitle = "Average of number of days for each incident grouped by borough",
        caption = "Source: NYC Open Data")
dates_ave_plot
```

```{r, echo=FALSE, message=F, warning=F}
#making a copy of each data frame with geometry attached in order to map
real_graffiti_shp <- graffiti_raw_shp %>%
  filter(RESOLUTION != 'Cleaning crew dispatched. No graffiti on property.') %>%
  filter(RESOLUTION != 'Mail returned / wrong address') %>%
  filter(RESOLUTION != 'Cleaning crew dispatched.  Cannot locate property.')


cleaned_graffitti_shp <- real_graffiti_shp %>%
  filter(STATUS == "Closed")

dates_shp <- cleaned_graffitti_shp %>%
  select(INCIDENT_A, BOROUGH, CREATED_DA, CLOSED_DAT) %>%
  mutate(date_diff = difftime((as.Date(CREATED_DA, "%m/%d/%y")),
                             (as.Date(CLOSED_DAT, "%m/%d/%y")), units = 'days')) %>%
  mutate_at("date_diff", str_replace, "days", "") %>%
  mutate(date_diff = as.numeric(date_diff)) %>%
  mutate(date_diff = if_else(date_diff > 1, (date_diff - 365), date_diff)) %>%
  mutate(date_diff = date_diff * (-1))

dates_ave_shp <- dates_shp %>%
  group_by(BOROUGH) %>%
  summarise(avg_date_diff = mean(date_diff, na.rm = TRUE))

count_per_boro_shp <- cleaned_graffitti_shp %>%
  group_by(BOROUGH) %>%
  summarise(incident_count=n())

borough_shape <- census_data %>%
  group_by(BoroName) %>%
  summarise(count=n()) %>%
  st_join(count_per_boro_shp, join = st_intersects)

#map of all counts of data (using all data frames with shp attached)
ggplot() +
  geom_sf(data = borough_shape, fill = NA) +
  geom_sf(data = dates_shp, mapping = aes(color = date_diff),
          show.legend = "point") +
  scale_color_viridis(name = "days", direction = -1) +
  theme_void() +
  labs(title = "Graffiti Length of Cleaning (days)",
       subtitle = "All Cleaned Incidents in New York City",
       caption = "Source: NYC OpenData 2021")
```

```{r, echo=FALSE, message=F, warning=F}
#using spatial data to map Manhattan
#subset - selecting just Manhattan
manhattan_cleaned_shp <- subset(cleaned_graffitti_shp, BOROUGH == "MANHATTAN") %>%
  #mutate to deal with census tract numbers that are decimals... might not be necessary with Bytes of the Big Apple?
  mutate(CENSUS_TRA = if_else(CENSUS_TRA > 310, (CENSUS_TRA/100), CENSUS_TRA))

#same dataframe with out the geometry
manhattan_cleaned <- subset(cleaned_graffitti, BOROUGH == "MANHATTAN") %>%
    mutate(CENSUS_TRA = if_else(CENSUS_TRA > 310, (CENSUS_TRA/100), CENSUS_TRA))

#date_diff col with geometry
dates_M_shp <- manhattan_cleaned_shp %>%
  mutate(date_diff = difftime((as.Date(CREATED_DA, "%m/%d/%y")),
                             (as.Date(CLOSED_DAT, "%m/%d/%y")), units = 'days')) %>%
  mutate_at("date_diff", str_replace, "days", "") %>%
  mutate(date_diff = as.numeric(date_diff)) %>%
  mutate(date_diff = if_else(date_diff > 1, (date_diff - 365), date_diff)) %>%
  mutate(date_diff = date_diff * (-1)) %>%
  mutate(CENSUS_TRA = as.character(CENSUS_TRA, na.rm = TRUE))
  #select(INCIDENT_A, CREATED_DA, CLOSED_DAT, date_diff, CENSUS_TRA)

#census tract with average date col
dates_ave_CT_M_shp <- dates_M_shp %>%
  group_by(CENSUS_TRA) %>%
  summarise(avg_date_diff = round(mean(date_diff, na.rm = TRUE), 2))

#date_diff col with out geometry
dates_M <- manhattan_cleaned %>%
  mutate(date_diff = difftime((as.Date(CREATED_DA, "%m/%d/%y")), 
                             (as.Date(CLOSED_DAT, "%m/%d/%y")), units = 'days')) %>%
  #removing the word 'days' from the col
  mutate_at("date_diff", str_replace, "days", "") %>%
  mutate(date_diff = as.numeric(date_diff)) %>%
  mutate(date_diff = if_else(date_diff > 1, (date_diff - 365), date_diff)) %>%
  mutate(date_diff = date_diff * (-1)) %>%
  mutate(CENSUS_TRA = as.character(CENSUS_TRA, na.rm = TRUE)) 

manhattan_date_diff <- ggplot() +
  geom_col(data = dates_M, mapping = aes(x = INCIDENT_A, y = date_diff)) +
  theme_bw() +
  labs(x = "Manhattan Incidents", y = "Length of Time (Days)")

#manhattan_date_diff

dates_ave_CT_M <- dates_M %>%
  group_by(CENSUS_TRA) %>%
  summarise(avg_date_diff = mean(date_diff, na.rm = TRUE))

# ggplot(data = dates_ave_CT_M) +
#   geom_col(mapping = aes(x = CENSUS_TRA, y = avg_date_diff)) +
#   theme_bw()

#col1 <- c('#bc131e', '#eb4956', '#c36e9e', '#7279db', '#3c6ebf', '#1f4bae')

#making an outline of just Manhattan
#census data = 2010 bytes of the big apple
M_census_data_shp <- census_data %>%
  filter(BoroName == "Manhattan")

real_graffiti_M_shp <- graffiti_raw_shp %>%
  filter(RESOLUTION != 'Cleaning crew dispatched. No graffiti on property.') %>%
  filter(RESOLUTION != 'Mail returned / wrong address') %>%
  filter(RESOLUTION != 'Cleaning crew dispatched.  Cannot locate property.') %>%
  filter(BOROUGH == 'MANHATTAN')

c_d_M_graffiti <- real_graffiti_M_shp %>%
  mutate(cleaned = if_else(STATUS == "Closed", "yes", "no")) %>%
  select(INCIDENT_A, STATUS, NTA, cleaned, geometry)

```

```{r, echo=FALSE, message=F, warning=F}
#map of bytes of big apple 2010 census tracts with all incidents of graffiti in manhattan
ggplot() +
  geom_sf(data = M_census_data_shp, fill = NA) +
  geom_sf(data = c_d_M_graffiti, aes(color = cleaned)) +
  scale_color_manual(values = c('yes' = 'black', 'no' = 'red'), name = "Graffiti Cleaned?") +
  theme_void()
```

```{r setup2, include=FALSE}
#working with census data
#census_api_key("YOUR_CENSUS_API_KEY", install = TRUE, overwrite = TRUE)
#sf1_2010 <- load_variables(2010, "sf1", cache = TRUE)
#importing all variables for 2019 acs data
acs19 <- load_variables(2019, "acs5")

#importing 2019 acs data for Manhattan 
#B19013_001 - Estimate!!Median household income in the past 12 months (in 2019 inflation-adjusted dollars)
raw_income <- get_acs(geography = "tract", 
              variables = "B19013_001", 
              state = "NY", 
              county = "New York", 
              geometry = TRUE,
              year = 2019,
              cb = FALSE)

#checking the projection of the census data frame
#st_crs(raw_income) #4269
#changing the projection of the census data frame
raw_income_2263 <- raw_income %>% 
  st_transform(2263)

#creating function and data sets to clip Manhattan to shore lines
#Using just raw census data looks weird
st_erase <- function(x, y) {
  st_difference(x, st_union(y))
}
#data set of water bodies in NYC
M_water <- area_water("NY", "New York", class = "sf") %>%
  st_transform(2263)
#clipping the raw income dataset by the water bodies
M_erase <- st_erase(raw_income_2263, M_water)
```

```{r, echo=FALSE, message=F, warning=F}
#mapping the "erase" data set that is clipped by the function, 
#fill = estimate of median household income
ggplot() + 
  geom_sf(data = M_erase, aes(fill = estimate)) + 
  theme_void() + 
  scale_fill_viridis_c(name = "Median Household Income ($)", labels = scales::dollar) +
  geom_sf(data = manhattan_cleaned_shp, size = .5, alpha = .25) +
  labs(
    title = "Manhattan Median Household Income by Census Tract",
    subtitle = "(overlaid with cleaned graffiti incidents)",
    caption = "Source: ACS 2019 | Graffiti: NYC OpenData"
  )

```

```{r, echo=FALSE, message=F, warning=F}
#data table based on acs2019 of Manhattan mhi
manhattan_mhi_shp <- M_erase %>%
  #getting rid of extra info
  mutate_at("NAME", str_replace, "Census Tract ", "") %>%
  mutate_at("NAME", str_replace, ", New York County, New York", "") %>%
  #still unsure if this should be numeric...
  #mutate(CENSUS_TRA = as.numeric(NAME)) %>%
  rename(total_mhi = estimate,
         total_moe = moe,
         CENSUS_TRA = NAME)
#dropping geometry from acs2019 data
manhattan_mhi <- st_drop_geometry(manhattan_mhi_shp)
```

```{r, echo=FALSE, message=F, warning=F, fig.width=10,fig.height=11}
#table for average length of cleaning by NTA
dates_ave_NTA_M <- dates_M %>%
  group_by(NTA) %>%
  summarise(avg_date_diff_NTA = mean(date_diff, na.rm = TRUE))
#dropping geometry from bytes of the big apple data
M_census_data <- st_drop_geometry(M_census_data_shp) #NTA names
#creating a data table to have NTA name and mhi on same table
M_compiled_census_data <- manhattan_mhi %>%
  right_join(M_census_data, by = c("CENSUS_TRA" = "CTLabel")) %>%
  group_by(NTAName) %>%
  #NTA_ave_mhi = average MHI by NTA
  mutate(NTA_ave_mhi = mean(total_mhi, na.rm = TRUE)) %>%
  right_join(dates_ave_NTA_M, by = c("NTAName" = "NTA")) %>%
  select(CENSUS_TRA, total_mhi, NTACode, NTAName, NTA_ave_mhi, avg_date_diff_NTA)

#attempting a spatial join
dates_ave_NTA_M_shp <- dates_M_shp %>%
  group_by(NTA) %>%
  summarise(avg_date_diff_NTA = round(mean(date_diff, na.rm = TRUE), 2))

#checking each of the projections
# st_crs(dates_ave_NTA_M_shp) #2263
# st_crs(M_census_data_shp) #2263
# st_crs(manhattan_mhi_shp) #2263

#Just geometry and outline of NTAs
nta_outline <- M_census_data_shp %>%
  group_by(NTAName) %>%
  summarise(count=n()) %>%
  select(NTAName)
#testing to see if the map outline worked
# ggplot() +
#   geom_sf(data = nta_outline, color = "black", fill = NA) +
#   theme_void()

#1667 data points, joining acs table with bytes of the big apple and then graffiti looks the same as the one below but a little better?
#when mapping, divided by CT
manhattan_shp_mhi_nta <- manhattan_mhi_shp %>%
  st_join(M_census_data_shp, join = st_intersects) %>%
  group_by(NTAName) %>%
  #NTA_ave_mhi = average MHI by NTA
  mutate(NTA_ave_mhi = mean(total_mhi, na.rm = TRUE)) %>%
  st_join(dates_ave_NTA_M_shp, join = st_intersects) %>%
  select(CENSUS_TRA, total_mhi, NTACode, NTAName, NTA_ave_mhi, avg_date_diff_NTA, geometry)

#684 data points, joining outline(bytes of big apple) with acs then graffiti
#when mapping, divided by NTA
M_NTA_MHI_DA_shp <- nta_outline %>%
  st_join(manhattan_mhi_shp, join = st_intersects) %>%
  group_by(NTAName) %>%
  mutate(NTA_mhi = round(mean(total_mhi, na.rm = TRUE), 2)) %>%
  st_join(dates_ave_NTA_M_shp, join = st_intersects)

# #st_crs(M_NTA_MHI_DA_shp) #2263
# 
real_graffiti_M_shp <- graffiti_raw_shp %>%
  filter(RESOLUTION != 'Cleaning crew dispatched. No graffiti on property.') %>%
  filter(RESOLUTION != 'Mail returned / wrong address') %>%
  filter(RESOLUTION != 'Cleaning crew dispatched.  Cannot locate property.') %>%
  filter(BOROUGH == 'MANHATTAN')
#st_crs(real_graffiti_M_shp)

c_d_M_graffiti <- real_graffiti_M_shp %>%
  mutate(cleaned = if_else(STATUS == "Closed", "yes", "no")) %>%
  select(INCIDENT_A, STATUS, NTA, cleaned, geometry)

M_NTA_MHI_DA_shp = sf::st_cast(M_NTA_MHI_DA_shp, "MULTIPOLYGON")
NTA_mhi_plot <- ggplot() +
  geom_sf(data = M_NTA_MHI_DA_shp, aes(fill = NTA_mhi,
                                       text = paste("NTA Name: ", NTAName,
                                                    "<br>MHI: ", NTA_mhi,
                                                    "<br>Average Length of Cleaning: ", avg_date_diff_NTA))) +
  theme_void() +
  scale_fill_viridis_c(name = "Median Household Income ($)", labels = scales::dollar) +
  #alpha is a fixed value, so it is set outside aes()
  geom_sf(data = c_d_M_graffiti, aes(color = cleaned),
          alpha = .25, show.legend = FALSE) +
  scale_color_manual(values = c('yes' = 'black', 'no' = 'grey77'), name = "Graffiti Cleaned?") +
  labs(
    title = "Manhattan Median Household Income (MHI)\nby Neighborhood Tabulation Area (NTA)",
    subtitle = "(overlaid with reported graffiti incidents)",
    caption = "Source: ACS 2019 | Graffiti: NYC OpenData")

NTA_mhi_plot
# #ggplotly(NTA_mhi_plot, tooltip = "text")


```

```{r, echo=FALSE, message=F, warning=F, fig.width=10,fig.height=11}
# acs10 <- load_variables(2010, "acs5")
# 
# #importing 2019 acs data for Manhattan 
# #B19013_001 - Estimate!!Median household income in the past 12 months (in 2019 inflation-adjusted dollars)
# raw_income <- get_acs(geography = "tract", 
#               variables = "B19013_001", 
#               state = "NY", 
#               county = "New York", 
#               geometry = TRUE,
#               year = 2010,
#               cb = FALSE)
# 
# #checking the projection of the census data frame
# #st_crs(raw_income) #4269
# #changing the projection of the census data frame
# raw_income_10_2263 <- raw_income %>% 
#   st_transform(2263)
# 
# dataframe_shp <- raw_income_10_2263 %>% #acs
#   st_join(M_census_data_shp, join = st_intersects) %>% #BOTBA/2010
#   st_join(real_graffiti_shp, join = st_intersects) #NYCD
# 
# dataframe <- dataframe_shp %>%
#   rename(CT_acs = NAME,
#          CT_2010 = CTLabel,
#          mhi = estimate,
#          CT_NYCD = CENSUS_TRA) %>%
#   filter(BOROUGH == 'MANHATTAN') %>%
#   select(CT_acs, CT_NYCD, CT_2010, NTAName, INCIDENT_A, CREATED_DA, CLOSED_DAT, RESOLUTION, STATUS, mhi) %>%
#   mutate(date_diff = difftime((as.Date(CREATED_DA, "%m/%d/%y")), 
#                              (as.Date(CLOSED_DAT, "%m/%d/%y")), units = 'days')) %>%
#   #removing the word 'dates' from the col
#   mutate_at("date_diff", str_replace, "days", "") %>%
#   mutate(date_diff = as.numeric(date_diff)) %>%
#   mutate(date_diff = if_else(date_diff > 1, (date_diff - 365), date_diff)) %>%
#   mutate(date_diff = date_diff * (-1)) %>%
#   group_by(NTAName) %>%
#   mutate(NTA_mhi = round(mean(mhi, na.rm = TRUE), 2)) %>%
#   mutate(avg_date_diff_NTA = mean(date_diff, na.rm = TRUE)) %>%
#   mutate(cleaned = if_else(STATUS == "Closed", "yes", "no"))
# 
# dataframe = sf::st_cast(dataframe, "MULTIPOLYGON")
# dataframe_plot <- ggplot() + 
#   geom_sf(data = dataframe, aes(fill = NTA_mhi, na.rm = TRUE,
#                                        text = paste("NTA Name: ", NTAName,
#                                                     "<br>MHI: ", NTA_mhi,
#                                                     "<br>Average Length of Cleaning: ", avg_date_diff_NTA))) + 
#   theme_void() + 
#   scale_fill_viridis_c(name = "Median Hosehold Income ($)", labels = scales::dollar, na.value = "transparent") +
#   geom_sf(data = c_d_M_graffiti, aes(color = cleaned,
#                                      alpha = .25), show.legend = FALSE) +
#   scale_color_manual(values = c('yes' = 'black', 'no' = 'grey77'), name = "Graffiti Cleaned?") +
#   labs(
#     title = "Manhattan Median Household Income (MHI) 
#     by Neigborhood Tabulation Area (NTA)",
#     subtitle = "(overlayed with reported graffiti incidents)",
#     caption = "Source: ACS 2019 | Graffitti: NYC OpenData")
# 
# dataframe_plot
#ggplotly(dataframe_plot, tooltip = "text")

  
```


```{r, echo=FALSE, message=F, warning=F}
# M_NTA_MHI_DA_shp = sf::st_cast(M_NTA_MHI_DA_shp, "MULTIPOLYGON")
# NTA_mhi_plot <- ggplot() +
#   geom_sf(data = M_NTA_MHI_DA_shp, aes(fill = NTA_mhi,
#                                        text = paste("NTA Name: ", NTAName,
#                                                     "<br>MHI: ", NTA_mhi,
#                                                     "<br>Average Length of Cleaning: ", avg_date_diff_NTA))) +
#   theme_void() +
#   scale_fill_viridis_c(name = "Median Hosehold Income ($)", labels = scales::dollar) +
#   geom_sf(data = real_graffiti_M_shp, size = .5, alpha = .5, color = "black") +
#   labs(
#     title = "Manhattan Median Household Income (MHI)
#     by Neigborhood Tabulation Area (NTA)",
#     subtitle = "(overlayed with reported graffiti incidents)",
#     caption = "Source: ACS 2019 | Graffitti: NYC OpenData")
# 
# NTA_mhi_plot
#ggplotly(NTA_mhi_plot, tooltip = "text")
```


```{r setup3, include=FALSE}
#working on getting a count of incidents per NTA


#count per NTA area
count_per_nta_shp <-real_graffiti_M_shp %>%
  group_by(NTA) %>%
  summarise(count=n())

#probably clipped by NTA
manhattan_big1_shp <- M_NTA_MHI_DA_shp %>%
  st_join(count_per_nta_shp, join = st_intersects) %>%
  mutate(count = as.numeric(count, na.rm = TRUE))
#probably clipped by CT
manhattan_big2_shp <- manhattan_shp_mhi_nta %>%
  st_join(count_per_nta_shp, join = st_intersects)

#dropping geometry to do table joins
M_NTA_MHI_DA <- st_drop_geometry(M_NTA_MHI_DA_shp)
count_per_nta <- st_drop_geometry(count_per_nta_shp) %>%
  filter(NTA != "NA")
#making datatables w/o shape to do math
#probably clipped by NTA
manhattan_big1 <- M_NTA_MHI_DA %>%
  right_join(count_per_nta, by = "NTA")
#probably clipped by CT
manhattan_big2 <- M_compiled_census_data %>%
  right_join(count_per_nta, by = c("NTAName" = "NTA"))

#working on a map w tool tip to show count
# nta_mhi_plot1 <- ggplot() +
#   geom_sf(data = manhattan_big1_shp, aes(fill = NTA_mhi, text = paste("Number of Incidents: ", count))) +
#   theme_void() +
#   scale_fill_viridis(name="MHI by NTA", direction=-1)
# 
# ggplotly(nta_mhi_plot1, tooltip = "text")
```

```{r, echo=FALSE, message=F, warning=F}
nta_mhi_plot1<-ggplot(data=count_per_nta, aes(x=reorder(NTA,count), y=count)) +
  geom_col() +
  theme(axis.text.x=element_text(angle=75,hjust=1,vjust=1)) +
  labs(x = "Neighborhood", y = "Count",
       title = "Count of Graffiti Incidents",
       subtitle = "Reported by NTA in Manhattan",
       caption = "Source: Bytes of the Big Apple | NYC OpenData")
nta_mhi_plot1

nta_mhi_plot2<-ggplot(data=count_per_nta, aes(y=reorder(NTA,count), x=count)) +
  geom_col() +
  theme_bw() +
  #axes are flipped in this version: counts on x, neighborhoods on y
  labs(x = "Count", y = "Neighborhood",
       title = "Count of Graffiti Incidents",
       subtitle = "Reported by NTA in Manhattan",
       caption = "Source: Bytes of the Big Apple | NYC OpenData")
nta_mhi_plot2
```

```{r, echo=FALSE, message=F, warning=F}
ggplot(data = manhattan_big1) +
  geom_col(mapping = aes(x = NTAName, y = count), group = 1) +
  geom_col(mapping = aes(x = NTAName, y = NTA_mhi), group = 1)+
  theme(axis.text.x=element_text(angle=75,hjust=1,vjust=1))
```

```{r, echo=FALSE, message=F, warning=F}
ggplot(data = manhattan_big1) +
  geom_bar(mapping = aes(x = count, fill = NTAName))
ggplot(data = manhattan_big1) +
  geom_bar(mapping = aes(x = NTAName, fill = avg_date_diff_NTA)) +
  theme(axis.text.x=element_text(angle=75,hjust=1,vjust=1))
```

