Pansy Schulman, DUE Methods I, 2021

Project Statement and Background

For this project, I wanted to answer the question: How did deindustrialization affect New York demographically and geographically? How do once-industrial areas reflect their history in terms of demographics, income, and geographic makeup?

The purpose of this research is to show how an economic shift manifested itself in New York’s landscape from 1970 to today. I looked at three geographic levels in my data analysis: first, for the entire city, or “place” in census definition, next for each individual county in the city, before finally zooming in on the neighborhoods of Red Hook, Sunset Park, and Gowanus in Brooklyn, alll of which were once major industry hubs and have seen major economic and demographic change over this time.

Red Hook and Sunset Park are on the western waterfront of Brooklyn, and thus historically their ports attracted an influx of manufacturing and industrial businesses in the late 19th and 20th century. All three of these neighborhoods were hubs for Brooklyn’s maritime shipping and industrial activites; they have hosted cement works, paint factories, tanneries, coal yards, warehouses, mills, and textile production amongst many other industrial sites in their history. The Gowanus Canal, which runs through the neighborhood, was used to conduit building materials to and from various factories in the area. Sunset Park was home to the Brooklyn Army Terminal, which was a major manufacturing hub for army supplies, as well as many other piers and factories. Immigrants, who made up a large part of the city’s industrial workforce, settled in these areas of Brooklyn with their families.

Manufacturing employment in New York City peaked in the late 1940s. In 1953, manufacturing accounted for 40% of all jobs, by 1994 this number dipped to 17%. New York City lost more than 700,000 manufacturing jobs between 1953 and 1995. This economic change was guided by city policy as well as larger global and national economic trends.

An industrial map of New York City from 1919, courtesy of the New York Public Library.

Rezoning industrial areas began in the 1950s and continued and escalated through the end of the 20th century to today. “Planned shrinkage” in the 1970s cut off municipal resources to poor and working-class areas and factories moved out of the city. Financial and real estate operations toppled the once-industrial city, making New York a global financial hub instead.

I chose this topic because I was interested in how the evolution of a neighborhood and city’s working class was involved in gentrification and other processes that have changed the demographic makeup of New York city communities over the past 50 years.

For the purposes of this project, I chose to incorporate not only those working in manufacturing, but in industry-related occupations, such as warehousing, wholesale trade, and trucking, so as to encompass all the people impacted by this economic shift.

The first variable I looked at was the percentage of manufacturing jobs in relation to the total working population. Generally, I used census data from the National Historical Geographical Information System for data from before 2010, and tidycensus to pull data from the 5-year American Community Survey for the most recent data.

Data and Methodology

#Importing and processing 1970 data from NHGIS :
##I used raw employment numbers from "Sex by Age" in employed persons over the age of 16 to get the total working population of areas to account for people having multiple jobs in the Occupation and Industry tables, I noticed a discrepancy when I compared the totals from the two tables I used.
placeworkforceraw70 <- read.csv("data/raw/nhgis0005_ds99_1970_place.csv") %>% 
  filter(AREANAME == "New York City") %>% 
  select(GISJOIN, STATE, PLACE, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

placeworkforce70 <- placeworkforceraw70 %>% 
  select(PLACE, totalworkerpop)

countyworkforceraw70 <- read.csv("data/raw/nhgis0006_ds99_1970_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

countyworkforce70 <- countyworkforceraw70 %>% 
  select(COUNTY, totalworkerpop)

tractworkforceraw70 <- read.csv("data/raw/nhgis0006_ds99_1970_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

tractworkforce70 <- tractworkforceraw70 %>% 
  select(GISJOIN, totalworkerpop) %>% 
  drop_na()

## For 70s employment figures I used the Industry Table for Employed Persons 16 years and over and isolated the following variables:
## C3B004: Furniture and lumber and wood products
## C3B005: Primary metal industries
## C3B006: Fabricated metal industries (including not specified metals)
## C3B007: Machinery, except electrical
## C3B008: Electrical machinery, equipment and supplies
## C3B009: Motor vehicles and other transportation equipment
## C3B010: Other durable goods
## C3B011: Food and kindred products
## C3B012: Textile mill and other fabricated textile products
## C3B013: Printing, publishing, and allied industries
## C3B014: Chemical and allied products
## C3B015: Other nondurable goods (including specified manufacturing industries)
## C3B017: Trucking service and warehousing
## C3B021: Wholesale trade

raw70laborplace <- read.csv("data/raw/1970/nhgis0011_ds99_1970_place.csv") %>% 
  filter(AREANAME == "New York City") %>% 
  select(GISJOIN, STATE, PLACE, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021)

labor70place <- raw70laborplace %>%
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(placeworkforce70, by = "PLACE") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1970' = pct_man) %>% 
  mutate(PLACE = str_remove_all(PLACE, " City"))

raw70countylabor <- read.csv("data/raw/1970/nhgis0011_ds99_1970_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens")

county70labortrue <-  raw70countylabor %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021) %>% 
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(countyworkforce70, by = "COUNTY") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1970' = pct_man)

raw70tractlabor <-read.csv("data/raw/1970/nhgis0011_ds99_1970_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings")

tract70labortrue <- raw70tractlabor %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021) %>% 
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(tractworkforce70, by = "GISJOIN") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(GISJOIN, pct_man, total_man, totalworkerpop) %>% 
  rename(totalworkerpop_70 = totalworkerpop, 
         totalman_70 = total_man,
         pctman_70 = pct_man, 
         GISJOIN.y = GISJOIN)

##Importing and processing 1980 data for NHGIS
##Using labor force totals from Table "Persons 16 yrs and over by labor force," variable used:
##B84AD1980:1980:Persons: 16 years and over ~ In labor force--Civilian--Employed
##I realized there was no longer a discrepency in the data so I stopped finding the workforce totals seperately from industry tables

##Using Industry table for industrial employment totals, I isolated the following variables: 
## DIA003: Manufacturing: Nondurable goods (codes 100-222)
## DIA004: Manufacturing: Durable goods (codes 230-392)
## DIA007: Wholesale trade (codes 500-571)

raw80laborplace <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_place.csv") %>% 
  filter(PLACE == "New York") %>% 
  mutate(total_worker_pop = sum(c_across(DIA001:DIA015)))

labor80place <- raw80laborplace %>% 
  select(GISJOIN, STATE, PLACE, AREANAME, DIA003, DIA004, DIA007, total_worker_pop) %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1980' = pct_man)

raw80laborcounty <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, DIA001, DIA002, DIA003, DIA004, DIA005, DIA006, DIA007, DIA008, DIA009, DIA010, DIA011, DIA012, DIA013, DIA014, DIA015) %>% 
  mutate(total_worker_pop = rowSums(across(where(is.numeric))))

labor80county <- raw80laborcounty %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1980' = pct_man) 

raw80labortract <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_tract.csv") %>% 
  filter(STATE == "New York",COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, COUNTY, DIA001, DIA002, DIA003, DIA004, DIA005, DIA006, DIA007, DIA008, DIA009, DIA010, DIA011, DIA012, DIA013, DIA014, DIA015) %>% 
  mutate(total_worker_pop = rowSums(across(where(is.numeric))))

labor80tract <- raw80labortract %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  rename(totalworkerpop_80 = total_worker_pop,
         totalman_80 = total_man,
         pctman_80 = pct_man) %>% 
  select(GISJOIN, totalman_80, totalworkerpop_80, pctman_80) %>% 
  rename(GISJOIN.y = GISJOIN)

##Importing and processing 1990 data from NHGIS
##Imported tables combine "Sex by Employment status" (variables E4I001- E4I008) and "Industry" (variables E4P001-E4P017), selecting totals from following columns to calculate total industrial employment:
## E4P004: Manufacturing, nondurable goods (100-229)
## E4P005: Manufacturing, durable goods (230-399)
## E4P008: Wholesale trade (500-579)
##Calculating total working population come from sum of:
## E4I002: Male >> In labor force: Civilian: Employed
## E4I006: Female >> In labor force: Civilian: Employed

raw90placelabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, STATE, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004,E4P005, E4P008)

placelabor90 <- raw90placelabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1990' = pct_man)

raw90countylabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004, E4P005, E4P008)

countylabor90 <- raw90countylabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1990' = pct_man)

raw90tractlabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, ANPSADPI, COUNTY, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004, E4P005, E4P008)

tractlabor90 <- raw90tractlabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  filter(pct_man != "NaN") %>% 
  select(GISJOIN, ANPSADPI, total_man, total_worker_pop, pct_man) %>% 
  rename(NAME = ANPSADPI) %>% 
  rename(totalworkerpop_90 = total_worker_pop,
         totalman_90 = total_man, 
         pctman_90 = pct_man)

##Importing and processing 2000 data,  Again I noticed a discrepancy with the total working population which distorted the data so imported table so I imported table: "Total Workers 16 years and Over to get working total first: 

totalworkforce00place <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(PLACE, GJU001 ) %>% 
  rename(total_worker_pop = GJU001)

totalworkforce00county <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_county.csv")%>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>%
  select(COUNTY, GJU001 ) %>% 
  rename(total_worker_pop = GJU001)

totalworkforce00tract <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, GJU001) %>% 
  rename(total_worker_pop =GJU001)


##Industrial employement data from table "Sex by Industry Type: Employed Civilian Persons 16 Years and Over in Selected Industry Categories:"
## GMH003:Male >> Manufacturing
## GMH004:Male >> Wholesale trade
## GMH016:Female >> Manufacturing
## GMH017:Female >> Wholesale trade
raw2000laborplace <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, STATE, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000place <- raw2000laborplace %>% 
  left_join(totalworkforce00place, by = "PLACE") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('2000' = pct_man)


raw2000laborcounty <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000county <- raw2000laborcounty %>% 
  left_join(totalworkforce00county, by = "COUNTY") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('2000' = pct_man)

raw2000labortract <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, COUNTY, NAME, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000tract <- raw2000labortract %>% 
  left_join(totalworkforce00tract, by = "GISJOIN") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  filter(pct_man != "NaN") %>% 
  select(GISJOIN, NAME, total_man, pct_man, total_worker_pop) 

##Importing and processing 2010 data from table "Means of Transportation to Work by Industry,Universe:Workers 16 years and over:" Variable for total working pop is:
## JZNE001:Total
##Total manufacturing employment from:
## JZNE004:Manufacturing
## JZNE005:Wholesale trade

rawtestlabor10place <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_place.csv") %>% 
filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10place <- rawtestlabor10place %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop))

rawlabor10county <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York County" | COUNTY == "Bronx County" | COUNTY == "Kings County" | COUNTY == "Queens County") %>% 
  select(GISJOIN, STATE, COUNTY, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10county <- rawlabor10county %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop))

rawlabor10tract <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_tract.csv") %>% 
  filter(STATE == "New York", COUNTY ==  "Kings County") %>% 
  select(GISJOIN, STATE, COUNTY, NAME_E, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10tract <- rawlabor10tract %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop)) %>% 
  rename(NAME = NAME_E)

Results:

##First for the city as a whole:
totalaborplace <- labor10place %>% 
  select(GISJOIN, PLACE, total_worker_pop, total_man, pct_man) %>% 
  rename('2010' = pct_man) %>% 
  left_join(labor2000place, by = "PLACE") %>% 
  left_join(placelabor90, by = "PLACE") %>% 
  select(PLACE, '2010', '2000', '1990') %>% 
  mutate(PLACE = str_remove_all(PLACE, " city")) %>% 
  left_join(labor80place, by = "PLACE") %>% 
  left_join(labor70place, by = "PLACE") %>% 
  pivot_longer(-c(PLACE), names_to = "year")

  
## Next for the counties: 
totallaborcounty <- labor10county %>% 
  select(COUNTY, pct_man) %>% 
  rename('2010' = pct_man) %>%
  mutate(COUNTY = str_remove_all(COUNTY, " County")) %>% 
  left_join(labor2000county, by = "COUNTY") %>% 
  left_join(countylabor90, by = "COUNTY") %>% 
  left_join(labor80county, by = "COUNTY") %>% 
  left_join(county70labortrue, by = "COUNTY") %>% 
  pivot_longer(-c(COUNTY), names_to = "year")


##Finally for tract, in order to isolate the neighborhoods I wanted, I used a spatial join with the Neighborhood Tabulation tables from NYC Open Data. I had to pull just the shape files from the census in order to match everything up.

##Importing and processing Neighborhood Tabulation Tables:

raw_nabes <- st_read("data/raw/nynta2010_21d/nynta2010.shp")
## Reading layer `nynta2010' from data source 
##   `/Users/pansyschulman/Dropbox/Mac/Desktop/Grad School/Methods/Final Project/data/raw/nynta2010_21d/nynta2010.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 195 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 913175.1 ymin: 120128.4 xmax: 1067383 ymax: 272844.3
## Projected CRS: NAD83 / New York Long Island (ftUS)
nabes <- raw_nabes %>% 
  filter(BoroName == "Brooklyn") %>% 
  filter(NTAName == "Carroll Gardens-Columbia Street-Red Hook" | NTAName == "Park Slope-Gowanus" | NTAName == "Sunset Park East" | NTAName == "Sunset Park West") %>% 
  select(BoroName, CountyFIPS, NTAName)

##importing geometry of 2010 tracts from Tidy Census and transforming in order to fit the NABES file.

rawlabor10tractshp <- get_acs(geography = "tract",
                              variables = "B24012_001",
                              year = 2010, 
                              state = "NY",
                              county = "Kings",
                              geometry = TRUE) %>% 
  select(NAME, geometry) %>% 
  st_transform(2263)
## Getting data from the 2006-2010 5-year ACS
labor10tractshp <- rawlabor10tractshp %>% 
  right_join(labor10tract, by = "NAME") %>% 
  select(GISJOIN, NAME, total_man, pct_man, total_worker_pop, geometry) %>% 
  st_join(nabes, join = st_intersects) %>% 
  drop_na() %>% 
  rename(totalman_10 = total_man,
         totalworkerpop_10 = total_worker_pop,
         pctman_10 = pct_man)

##Finally putting together total historical data for the tract geographical level:

totaltractlabordata <- labor10tractshp %>%
  left_join(labor2000tract, by = "GISJOIN") %>% 
  rename(totalworkerpop_00 = total_worker_pop,
         totalman_00 = total_man,
         pctman_00 = pct_man) %>% 
  mutate(NAME = str_remove_all(NAME.y, "Census ")) %>% 
  left_join(tractlabor90, by = "NAME") %>% 
  left_join(labor80tract, by = "GISJOIN.y") %>% 
  left_join(tract70labortrue, by = "GISJOIN.y") %>% 
  drop_na() %>% 
  select(GISJOIN.y, NAME.x, NTAName, totalworkerpop_70, totalworkerpop_80, totalworkerpop_90, totalworkerpop_00, totalworkerpop_10, totalman_70, totalman_80, totalman_90, totalman_00, totalman_10, pctman_70, pctman_80, pctman_90, pctman_00, pctman_10, geometry) %>% 
  rename(GISJOIN = GISJOIN.y,
         NAME = NAME.x)
PLOTtotaltractlabot <- st_drop_geometry(totaltractlabordata) %>% 
  mutate('1970' = mean(pctman_70),
         '1980' = mean(pctman_80),
         '1990' = mean(pctman_90), 
         '2000' = mean(pctman_00),
         '2010' = mean(pctman_10)) %>% 
  select(NAME, '1970', '1980', '1990', '2000', '2010') %>% 
  filter(NAME == "Census Tract 104, Kings County, New York") %>% 
  pivot_longer(-c(NAME), names_to = "year")

PLOTtotaltract2 <- st_drop_geometry(totaltractlabordata) %>% 
  select(NAME, pctman_70, pctman_80, pctman_90, pctman_00, pctman_10) %>% 
  rename('1970' = pctman_70,
         '1980' = pctman_80,
         '1990' = pctman_90, 
         '2000' = pctman_00,
         '2010' = pctman_10) %>% 
  pivot_longer(-c(NAME), names_to = "year")

summary(totalaborplace)
##     PLACE               year               value        
##  Length:5           Length:5           Min.   :0.07037  
##  Class :character   Class :character   1st Qu.:0.10006  
##  Mode  :character   Mode  :character   Median :0.15452  
##                                        Mean   :0.16048  
##                                        3rd Qu.:0.22164  
##                                        Max.   :0.25580
summary(totallaborcounty)
##     COUNTY              year               value        
##  Length:20          Length:20          Min.   :0.06221  
##  Class :character   Class :character   1st Qu.:0.09111  
##  Mode  :character   Mode  :character   Median :0.15597  
##                                        Mean   :0.16112  
##                                        3rd Qu.:0.22629  
##                                        Max.   :0.27618
summary(PLOTtotaltract2)
##      NAME               year               value         
##  Length:515         Length:515         Min.   :0.007401  
##  Class :character   Class :character   1st Qu.:0.111621  
##  Mode  :character   Mode  :character   Median :0.199712  
##                                        Mean   :0.198194  
##                                        3rd Qu.:0.269133  
##                                        Max.   :0.593810

Visualization:

##My map to compare the decline in industrial employment since 1970.
totallaboryears <- ggplot() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  geom_point(data = PLOTtotaltractlabot, mapping = aes(x = year, y = value), color = "green") + 
  geom_line(data = PLOTtotaltractlabot,  mapping = aes(x= year, y = value,), color = "green", group = 1)+ 
  geom_text(data = subset(PLOTtotaltractlabot, year == "1990"), mapping =aes(x= year, y = value), color = "green", label = "Red Hook/Gowanus/Sunset Park", hjust = -.2) +
  geom_point(data = totalaborplace, mapping = aes(x = year, y = value), color = "red", size = 2) +
  geom_line(data = totalaborplace, mapping = aes(x= year, y = value,), color = "red", group = 1, size = 5, alpha = .5) +
  geom_text(data = subset(totalaborplace, year == "1980"), aes(x= year, y = value), label = "New York City", color = "red", hjust = -.2) +
  geom_point(data = totallaborcounty, mapping = aes(x = year, y = value, color = COUNTY)) + 
  geom_line(data = totallaborcounty, mapping = aes(x = year, y = value, color = COUNTY, group = COUNTY), linetype = "dotdash") +
  geom_text(data = subset(totallaborcounty, year == "1970"), aes(label = COUNTY, x= year, y = value, color = COUNTY), hjust = 1.1) + 
  scale_color_manual(values = c("tomato3", "steelblue2",
                                "seagreen3", "orchid1")) +
  labs(x = "Year", y = "Industrial Employment (%)", title = "Industrial Employment in New York City since 1970", caption = "Sources: The US Census and American Community Survey", color = "County")
ggplotly(totallaboryears) 

Observations:

This chart shows a clear and steady decline in industrial employment since the 1970s, confirming the background historical information that I gathered. The three neighborhoods I isolated had a much larger share of industrial employment than the entire city but saw a much steeper drop between 1980 and 1990. Of the four boroughs, Brooklyn had an initially higher share of industrial employment, but by 2010 it was eclipsed by Queens with a small but significant margin.

Next I looked at contemporary geographic data for the neighborhoods I selected to better contextualize my historical data.
#Processing the demographic data of median income and median home value in Red Hook, Gowanus, and Sunset Park from 2010.
rawmedianincome <- get_acs(geography = "tract",
                           variables = "B19013_001",
                           year = 2010, 
                           state = "NY", 
                           county = "Kings", 
                           geometry = TRUE)
## Getting data from the 2006-2010 5-year ACS
medianincome <- rawmedianincome %>% 
  select(GEOID, NAME, estimate, geometry) %>% 
  rename(medianincome = estimate)

medianincomecorrectedshp <- medianincome %>% 
  st_transform(2263)

median_income_nabes <- medianincomecorrectedshp %>%
  st_join(nabes, join = st_intersects) %>% 
  drop_na()

median_incomemap <- ggplot()  + 
  geom_sf(data = median_income_nabes, mapping = aes(fill = medianincome), color = "#ffffff") + 
  scale_fill_gradient2(name="Median Income ($)", labels=dollar_format(accuracy = 1L)) +
  labs(title = "Median Income in Red Hook, Sunset Park, and Gowanus (2010)", caption = "Sources: American Community Survey") +
  theme_void()

rawhomevalue <- get_acs(geography = "tract",
                        variables = "B25077_001",
                        year = 2010, 
                        state = "NY", 
                        county = "Kings", 
                        geometry = TRUE)
## Getting data from the 2006-2010 5-year ACS
medianhomevalue <- rawhomevalue %>% 
  select(GEOID, NAME, estimate, geometry) %>% 
  rename(medianvalue = estimate)
medianvalueecorrectedshp <- medianhomevalue %>% 
  st_transform(2263)
median_value_nabes <- medianvalueecorrectedshp %>%
  st_join(nabes, join = st_intersects) %>% 
  drop_na()
options(scipen = 100)
median_valuemap <- ggplot()  + 
  geom_sf(data = median_value_nabes, mapping = aes(fill = medianvalue), color = "#ffffff") +
  scale_fill_gradient(name="Median Home Value ($)", labels=dollar_format(accuracy = 1L)) +
  labs(title = "Median Home Value in Red Hook, Sunset Park, and Gowanus (2010)", direction = 1, caption = "Sources: American Community Survey") +
  theme_void()

median_incomemap

median_valuemap

Observations:

These two maps show the current economic condition of these neighborhoods. They both reveal the prevalent gentrification in the neighborhoods of Red Hook and Gowanus, as higher median incomes and home values are clustered there. Sunset Park, however has maintained more of its working-class character. In the 1980s, Sunset Park became the borough’s first Chinatown; the neighborhood’s population of immigrants from China is still growing. As of 2010, the neighborhood’s population is 14.5% White, 35.2% Asian, and 46.4% Latino.

Next I pulled the historical data from previous years to see the change that resulted over time to the current state illustrated by the above maps:
##I used a table series table for median income only, as median home value wasn't recorded consistently according to NHGIS. In retrospect, I did not adjust for inflation so unsure how well this series can inform my findings.
rawmedianincome_8090000 <- read_csv("data/raw/medianincome/nhgis0018_ts_nominal_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings County")
## Rows: 105385 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): NHGISCODE, GJOIN1980, GJOIN1990, GJOIN2000, GJOIN2012, STATE, STAT...
## dbl  (5): B79AA1980, B79AA1990, B79AA2000, B79AA125, B79AA125M
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
medianincome_8090000 <- rawmedianincome_8090000 %>% 
  rename(medianincome_80 = B79AA1980,
         medianincome_90 = B79AA1990,
         medianincome_00 = B79AA2000) %>% 
  select(NAME2012, medianincome_80, medianincome_90, medianincome_00) %>%
  drop_na() %>% 
  rename(NAME = NAME2012)

median_income_nabesyears <- st_drop_geometry(median_income_nabes) %>% 
  left_join(medianincome_8090000, by = "NAME") %>% 
  rename(medianincome_10 = medianincome) %>% 
  drop_na() %>% 
  mutate('1980' = medianincome_80,
         '1990' = medianincome_90, 
         '2000' = medianincome_00,
         '2010' = medianincome_10) %>% 
  select(NAME, NTAName, '1980', '1990', '2000', '2010')%>% 
  pivot_longer(-c(NAME, NTAName), names_to = "year")

medianincomePLOTyears <- median_income_nabesyears %>% 
  mutate('1980' = mean('1980'),
         '1990' = mean('1990'), 
         '2000' = mean('2000'),
         '2010' = mean('2010')) %>% 
  select(NAME,'1980', '1990', '2000', '2010') %>% 
  pivot_longer(-c(NAME), names_to = "year") %>% 
  filter(NAME == "Census Tract 104, Kings County, New York")
## Warning in mean.default("1980"): argument is not numeric or logical: returning
## NA
## Warning in mean.default("1990"): argument is not numeric or logical: returning
## NA
## Warning in mean.default("2000"): argument is not numeric or logical: returning
## NA
## Warning in mean.default("2010"): argument is not numeric or logical: returning
## NA
medianincometime <- ggplot(aes(x = year, y = value), data = median_income_nabesyears) + 
  scale_y_continuous(labels = dollar_format(accuracy = 1)) +
  scale_x_discrete('year') +
  scale_color_discrete(name = "Neighborhood", labels = c("Red Hook", "Gowanus", "Sunset Park East", "Sunset Park West")) +
  geom_smooth(aes(group = NTAName, color = NTAName), method = 'lm', alpha = .2, formula =  y ~x) +
  geom_point(mapping = aes(color = NTAName)) + 
  labs(x = "Year", y = "Median Income ($)", title = "Median Income in Industrial Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey")

  medianincometime

## Observations:

As illustrated by the previous maps, it seems as if median income in the neighhborhoods of Red Hook and Gownanus have risen far more sharply than in Sunset Park. Currently, Red Hook and Gowanus are popular neighborhoods for restaurants and for going out. As compared to Sunset Park, the racial makeup of Red Hook and Gowanus is far whiter (60.9% and 46% respectively) in addition to being more affluent. Additionally, Sunset Park is a little further away from Manhattan and other more gentrified sections of Brooklyn, which may also explain why the young urban professionals who make up the current gentrifying class may be less likely to settle there.

The last variable I looked at it is how people’s commutes have changed over time. Previously, industrial workers would live close to their workplaces, in the post-industrial era there is often a greater physical seperation between ones home and workplace, especially as rents have dramatically increased in downtown business districts.
Looking at how this variable changed over time may complement the data of declining industrial employment in these neighborhoods.
rawtraveltime <- get_acs(geography = "tract",
                         table = "B08012",
                         year = 2010, 
                         state = "NY", 
                         county = "Kings", 
                         geometry = FALSE)
## Getting data from the 2006-2010 5-year ACS
## Loading ACS5 variables for 2010 from table B08012. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.
acs5variable <- load_variables(2010, "acs5", cache = FALSE)


traveltime <- rawtraveltime %>% 
  pivot_wider(names_from = variable, values_from = c(estimate, moe)) %>% 
  select(GEOID, NAME, estimate_B08012_001, estimate_B08012_002, estimate_B08012_003, estimate_B08012_004, estimate_B08012_005, estimate_B08012_006, estimate_B08012_007, estimate_B08012_008, estimate_B08012_009, estimate_B08012_010, estimate_B08012_011, estimate_B08012_012, estimate_B08012_013) %>% 
  rename(totalworkerpop = estimate_B08012_001,
         lessthan5 = estimate_B08012_002, 
         fivetonine = estimate_B08012_003, 
         ten_fourteen = estimate_B08012_004,
         fifteen_nineteen = estimate_B08012_005, 
         twenty_twenfour = estimate_B08012_006, 
         twenfive_twennine = estimate_B08012_007, 
         thirty_thirfour = estimate_B08012_008, 
         thirfive_thirnine = estimate_B08012_009,
         forty_fourtfour = estimate_B08012_010,
         fourtfive_fiftnine = estimate_B08012_011,
         sixt_eightnine = estimate_B08012_012,
         ninety_plus = estimate_B08012_013) %>% 
  mutate(lessthan30 = (lessthan5 + fivetonine + ten_fourteen + fifteen_nineteen + twenty_twenfour + twenfive_twennine),
         thirty_fourfour = (thirty_thirfour + thirfive_thirnine + forty_fourtfour),
         fourtfiveplue = (fourtfive_fiftnine + sixt_eightnine + ninety_plus)) %>% 
  mutate(perlessthan30_10 = (lessthan30/totalworkerpop),
         per304_10 = (thirty_fourfour/totalworkerpop), 
         per45_10 = (fourtfiveplue/totalworkerpop))

#Importing historical travel time data from NHGIS
traveltime8090 <- read_csv("data/raw/traveltime/nhgis0017_ts_nominal_tract.csv") %>% 
  filter(STATE == "New York",
         COUNTY == "Kings County")
## Rows: 106145 Columns: 56
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): NHGISCODE, GJOIN1980, GJOIN1990, GJOIN2000, GJOIN2012, STATE, STAT...
## dbl (40): C50AA1980, C50AA1990, C50AA2000, C50AA125, C50AA125M, C50AB1980, C...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
traveltime8090proc <- traveltime8090 %>%  
  mutate(lessthan30_80 = (C50AA1980 + C50AB1980 + C50AC1980 + C50AD1980 + C50AE1980),
         thirty_fourtyfour_80 = C50AF1980, 
         fourtyfiveplue_80 = (C50AG1980 + C50AH1980)) %>%
  mutate(totalworkerpop_80 = (C50AA1980 + C50AB1980 + C50AC1980 + C50AD1980 + C50AE1980 + C50AF1980 + C50AG1980 + C50AH1980)) %>% 
  mutate(lessthan30_90 = (C50AA1990 + C50AB1990 + C50AC1990 + C50AD1990 + C50AE1990),
         thirty_fourtyfour_90 = C50AF1990,
         fourtyfiveplue_90 = (C50AG1990 + C50AH1990)) %>% 
  mutate(totalworkerpop_90 = (C50AA1990 + C50AB1990 + C50AC1990 + C50AD1990 + C50AE1990 + C50AF1990 + C50AG1990 + C50AH1990)) %>% 
  mutate(lessthan30_00 = (C50AG2000 + C50AH2000),
         thirty_fourtyfour_00 = C50AF2000,
         fourtyfiveplue_00 = (C50AG2000 + C50AH2000)) %>% 
  mutate(totalworkerpop_00 = (C50AA2000 + C50AB2000 + C50AC2000 + C50AD2000 + C50AE2000 + C50AF2000 + C50AG2000 + C50AH2000))
  
perctraveltime809000 <- traveltime8090proc %>% 
  select(NAME2012, lessthan30_80, lessthan30_90, lessthan30_00, totalworkerpop_80, totalworkerpop_90, totalworkerpop_00, thirty_fourtyfour_80, thirty_fourtyfour_90, thirty_fourtyfour_00, fourtyfiveplue_80, fourtyfiveplue_90, fourtyfiveplue_00) %>% 
  mutate(perlessthan30_80 = (lessthan30_80/totalworkerpop_80),
         per304_80 = (thirty_fourtyfour_80/totalworkerpop_80),
         per45_80 = (fourtyfiveplue_80/totalworkerpop_80)) %>% 
  mutate(perlessthan30_90 = (lessthan30_90/totalworkerpop_90),
         per304_90 = (thirty_fourtyfour_90/totalworkerpop_90),
         per45_90 = (fourtyfiveplue_90/totalworkerpop_90)) %>% 
  mutate(perlessthan30_00 = (lessthan30_00/totalworkerpop_00),
         per304_00 = (thirty_fourtyfour_00/totalworkerpop_00),
         per45_00 = (fourtyfiveplue_00/totalworkerpop_00)) 

traveltime809000 <- perctraveltime809000 %>% 
  select(NAME2012, perlessthan30_80, per304_80, per45_80, perlessthan30_90, per304_90, per45_90, perlessthan30_00, per304_00, per45_00) %>% 
  drop_na() %>% 
  rename(NAME = NAME2012)

traveltimeyears <- traveltime %>% 
  select(NAME, perlessthan30_10, per304_10, per45_10) %>% 
  left_join(traveltime809000, by = "NAME") %>% 
  drop_na() 

traveltimeshp <- get_acs(geography = "tract",
                         variables = "B08012_001",
                         year = 2010, 
                         state = "NY", 
                         county = "Kings", 
                         geometry = TRUE)
## Getting data from the 2006-2010 5-year ACS
traveltimeyearsfinal <- traveltimeshp %>% 
  select(NAME, geometry) %>% 
  right_join(traveltimeyears, by = "NAME") %>% 
  st_transform(2263) %>% 
  st_join(nabes, join = st_intersects) %>% 
  drop_na()

traveltimeyearsfinalPLOT <- st_drop_geometry(traveltimeyearsfinal) %>% 
  select(NAME, NTAName, perlessthan30_80, per304_80, per45_80, perlessthan30_90, per304_90, per45_90, perlessthan30_00, per304_00, per45_00, perlessthan30_10, per304_10, per45_10) %>% 
  rename('029.2010' = perlessthan30_10,
         '029.2000' = perlessthan30_00,
         '029.1990' = perlessthan30_90,
         '029.1980' = perlessthan30_80,
         '3044.2010' = per304_10,
         '3044.2000' = per304_00,
         '3044.1990' = per304_90,
         '3044.1980' = per304_80,
         '4500.2010' = per45_10,
         '4500.2000' = per45_00,
         '4500.1990' = per45_90,
         '4500.1980' = per45_80) %>% 
  pivot_longer(-c(NAME, NTAName), names_to = "year") %>% 
  separate(year, c('Time', 'Year')) %>% 
  arrange(Time)

commuteline <- ggplot(aes(x = Year, y = value), data = traveltimeyearsfinalPLOT) + 
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_color_discrete(name = "Commute Time", labels = c("< 30 Minutes", "30-44 minutes", "45+ minutes")) +
  labs(x = "Year", y = "Percentage of Working Population (%)", title = "Commuting Times in West Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey") +
  geom_smooth(mapping = aes(group = Time, color = Time), method = 'lm', alpha = .2, formula =  y ~x) + 
  facet_wrap(~ NTAName, nrow= 2)

commutebar <- ggplot(data = subset(traveltimeyearsfinalPLOT), mapping = aes(x= Year, y = value, fill = Time)) +
  labs(x = "Year", y = "Percentage of Working Population (%)", title = "Commuting Times in West Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey") +
  scale_fill_discrete(name = "Commute Time", labels = c("< 30 Minutes", "30-44 minutes", "45+ minutes")) +
  geom_col(position = "dodge")  

commutebar

ggplotly(commuteline)

Observations:

Looking at the change in these four areas, the most dramatic shift is in the population share with a less than 30 minute commute in the Red Hook area (from over 40% in 1980 to ~32% by 2010) This could be directly attributed to the loss of industrial employment as factories left the area in the latter half of the 20th century. Longer commutes rose in conjunction with this fall. The Gowanus area shows a similar trend, but it is less pronounced. Sunset Park seems to have always had a higher share of longer than 45 minute commutes, this may be due to its industrial nature devolving more quickly than the other neighborhoods, as the Brooklyn Army Terminal, its main industrial hub, largely went into disuse after World War II. It is interesting that the population share with shorter commutes has remained somewhat stagnant. I speculate that may be in part due to the large immigrant population who own and work at local businesses.

Conclusion and Next Steps:

This research project demonstrated the dramatic destruction of New York City’s industrial nature over the second half of the 20th century. It marks how the industrial nature of specific neighborhoods, who are currently undergoing dramatic transformation, can still persist in demographic data.

For next steps it would be interesting to dig even deeper into these three neighborhoods, further looking at economic and demographic data, to dig out the complex character of their change over time. Zoning maps would serve as a useful resource to mark exactly when factories and warehouses moved out of these areas.

Another project idea would be to look at the actual physical industrial structures that have lingered in the neighborhoods. Projects like the Highline in Manhattan show how adaptive reuse can anchor a neighborhood’s industrial past in the present, while exponentially boosting nearby real estate values. The redevelopment of post-industrial neighborhoods has been a prevalent theme in recent urbanism.

Another remaining question is, where did industrial workers go? There are some lingering industrial remnants in Queens it seems, but it would be interesting to move this analysis to New Jersey, for instance, where many factories eventually moved to.

---
title: "Deindustrialization in New York City 1970 to 2010"
output:
  html_document:
    df_print: paged
    code_download: true
---

`````{r, echo= FALSE, message=F, warning = F}
knitr::opts_chunk$set(echo = TRUE)
library(tidycensus)
library(tidyverse)
options(tigris_use_cache = TRUE)
library(sf)
library(scales)
library(RColorBrewer)
library(viridis)
library(ggplot2)
library(plotly)
```
###### Pansy Schulman, DUE Methods I, 2021

# Project Statement and Background

  For this project, I wanted to answer the question: How did deindustrialization affect New York demographically and geographically? How do once-industrial areas reflect their history in terms of demographics, income, and geographic makeup?
  
  The purpose of this research is to show how an economic shift manifested itself in New York's landscape from 1970 to today.  I looked at three geographic levels in my data analysis: first, for the entire city, or "place" in census definition, next for each individual county in the city, before finally zooming in on the neighborhoods of Red Hook, Sunset Park, and Gowanus in Brooklyn, alll of which were once major industry hubs and have seen major economic and demographic change over this time.
  
  Red Hook and Sunset Park are on the western waterfront of Brooklyn, and thus historically their ports attracted an influx of manufacturing and industrial businesses in the late 19th and 20th century. All three of these neighborhoods were hubs for Brooklyn's maritime shipping and industrial activites; they have hosted cement works, paint factories, tanneries, coal yards, warehouses, mills, and textile production amongst many other industrial sites in their history. The Gowanus Canal, which runs through the neighborhood, was used to conduit building materials to and from various factories in the area. Sunset Park was home to the Brooklyn Army Terminal, which was a major manufacturing hub for army supplies, as well as many other piers and factories. Immigrants, who made up a large part of the city's industrial workforce, settled in these areas of Brooklyn with their families.
  
 Manufacturing employment in New York City peaked in the late 1940s. In 1953, manufacturing accounted for 40% of all jobs, by 1994 this number dipped to 17%. New York City lost more than 700,000 manufacturing jobs between 1953 and 1995. This economic change was guided by city policy as well as larger global and national economic trends. 
 
 ![*An industrial map of New York City from 1919, courtesy of the New York Public Library.*](manufacturingmap.jpeg)

  
  Rezoning industrial areas began in the 1950s and continued and escalated through the end of the 20th century to today. "Planned shrinkage" in the 1970s cut off municipal resources to poor and working-class areas and factories moved out of the city. Financial and real estate operations toppled the once-industrial city, making New York a global financial hub instead. 
  
  I chose this topic because I was interested in how the evolution of a neighborhood and city's working class was involved in gentrification and other processes that have changed the demographic makeup of New York city communities over the past 50 years.
  
  For the purposes of this project, I chose to incorporate not only those working in manufacturing, but in industry-related occupations, such as warehousing, wholesale trade, and trucking, so as to encompass all the people impacted by this economic shift.
  
  The first variable I looked at was the percentage of manufacturing jobs in relation to the total working population.
Generally, I used census data from the National Historical Geographical Information System for data from before 2010, and tidycensus to pull data from the 5-year American Community Survey for the most recent data.

# Data and Methodology

```{r}
#Importing and processing 1970 data from NHGIS :
##I used raw employment numbers from "Sex by Age" in employed persons over the age of 16 to get the total working population of areas to account for people having multiple jobs in the Occupation and Industry tables, I noticed a discrepancy when I compared the totals from the two tables I used.
placeworkforceraw70 <- read.csv("data/raw/nhgis0005_ds99_1970_place.csv") %>% 
  filter(AREANAME == "New York City") %>% 
  select(GISJOIN, STATE, PLACE, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

placeworkforce70 <- placeworkforceraw70 %>% 
  select(PLACE, totalworkerpop)

countyworkforceraw70 <- read.csv("data/raw/nhgis0006_ds99_1970_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

countyworkforce70 <- countyworkforceraw70 %>% 
  select(COUNTY, totalworkerpop)

tractworkforceraw70 <- read.csv("data/raw/nhgis0006_ds99_1970_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C24001,C24002,C24003,C24004,C24005,C24006,C24007,C24008,C24009,C24010,C24011,C24012,C24013,C24014,C24015,C24016) %>% 
  mutate(totalworkerpop = rowSums(across(where(is.numeric))))

tractworkforce70 <- tractworkforceraw70 %>% 
  select(GISJOIN, totalworkerpop) %>% 
  drop_na()

## For 70s employment figures I used the Industry Table for Employed Persons 16 years and over and isolated the following variables:
## C3B004: Furniture and lumber and wood products
## C3B005: Primary metal industries
## C3B006: Fabricated metal industries (including not specified metals)
## C3B007: Machinery, except electrical
## C3B008: Electrical machinery, equipment and supplies
## C3B009: Motor vehicles and other transportation equipment
## C3B010: Other durable goods
## C3B011: Food and kindred products
## C3B012: Textile mill and other fabricated textile products
## C3B013: Printing, publishing, and allied industries
## C3B014: Chemical and allied products
## C3B015: Other nondurable goods (including specified manufacturing industries)
## C3B017: Trucking service and warehousing
## C3B021: Wholesale trade

raw70laborplace <- read.csv("data/raw/1970/nhgis0011_ds99_1970_place.csv") %>% 
  filter(AREANAME == "New York City") %>% 
  select(GISJOIN, STATE, PLACE, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021)

labor70place <- raw70laborplace %>%
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(placeworkforce70, by = "PLACE") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1970' = pct_man) %>% 
  mutate(PLACE = str_remove_all(PLACE, " City"))

raw70countylabor <- read.csv("data/raw/1970/nhgis0011_ds99_1970_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens")

county70labortrue <-  raw70countylabor %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021) %>% 
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(countyworkforce70, by = "COUNTY") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1970' = pct_man)

raw70tractlabor <-read.csv("data/raw/1970/nhgis0011_ds99_1970_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings")

tract70labortrue <- raw70tractlabor %>% 
  select(GISJOIN, STATE, COUNTY, AREANAME, C3B004,C3B005, C3B006, C3B007, C3B008, C3B009, C3B010, C3B011, C3B012, C3B013, C3B014, C3B015, C3B017, C3B021) %>% 
  mutate(total_man = rowSums(across(where(is.numeric)))) %>% 
  left_join(tractworkforce70, by = "GISJOIN") %>% 
  mutate(pct_man = (total_man/totalworkerpop)) %>% 
  select(GISJOIN, pct_man, total_man, totalworkerpop) %>% 
  rename(totalworkerpop_70 = totalworkerpop, 
         totalman_70 = total_man,
         pctman_70 = pct_man, 
         GISJOIN.y = GISJOIN)

##Importing and processing 1980 data for NHGIS
##Using labor force totals from Table "Persons 16 yrs and over by labor force," variable used:
##B84AD1980:1980:Persons: 16 years and over ~ In labor force--Civilian--Employed
##I realized there was no longer a discrepency in the data so I stopped finding the workforce totals seperately from industry tables

##Using Industry table for industrial employment totals, I isolated the following variables: 
## DIA003: Manufacturing: Nondurable goods (codes 100-222)
## DIA004: Manufacturing: Durable goods (codes 230-392)
## DIA007: Wholesale trade (codes 500-571)

raw80laborplace <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_place.csv") %>% 
  filter(PLACE == "New York") %>% 
  mutate(total_worker_pop = sum(c_across(DIA001:DIA015)))

labor80place <- raw80laborplace %>% 
  select(GISJOIN, STATE, PLACE, AREANAME, DIA003, DIA004, DIA007, total_worker_pop) %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1980' = pct_man)

raw80laborcounty <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, DIA001, DIA002, DIA003, DIA004, DIA005, DIA006, DIA007, DIA008, DIA009, DIA010, DIA011, DIA012, DIA013, DIA014, DIA015) %>% 
  mutate(total_worker_pop = rowSums(across(where(is.numeric))))

labor80county <- raw80laborcounty %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1980' = pct_man) 

raw80labortract <- read.csv("data/raw/1980/nhgis0013_csv/nhgis0013_ds107_1980_tract.csv") %>% 
  filter(STATE == "New York",COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, COUNTY, DIA001, DIA002, DIA003, DIA004, DIA005, DIA006, DIA007, DIA008, DIA009, DIA010, DIA011, DIA012, DIA013, DIA014, DIA015) %>% 
  mutate(total_worker_pop = rowSums(across(where(is.numeric))))

labor80tract <- raw80labortract %>% 
  mutate(total_man = (DIA003 + DIA004 + DIA007)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  rename(totalworkerpop_80 = total_worker_pop,
         totalman_80 = total_man,
         pctman_80 = pct_man) %>% 
  select(GISJOIN, totalman_80, totalworkerpop_80, pctman_80) %>% 
  rename(GISJOIN.y = GISJOIN)

##Importing and processing 1990 data from NHGIS
##Imported tables combine "Sex by Employment status" (variables E4I001- E4I008) and "Industry" (variables E4P001-E4P017), selecting totals from following columns to calculate total industrial employment:
## E4P004: Manufacturing, nondurable goods (100-229)
## E4P005: Manufacturing, durable goods (230-399)
## E4P008: Wholesale trade (500-579)
##Calculating total working population come from sum of:
## E4I002: Male >> In labor force: Civilian: Employed
## E4I006: Female >> In labor force: Civilian: Employed

raw90placelabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, STATE, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004,E4P005, E4P008)

placelabor90 <- raw90placelabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('1990' = pct_man)

raw90countylabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004, E4P005, E4P008)

countylabor90 <- raw90countylabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('1990' = pct_man)

raw90tractlabor <- read.csv("data/raw/1990/nhgis0008_ds123_1990_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, ANPSADPI, COUNTY, E4I001, E4I002, E4I003, E4I005, E4I006, E4P004, E4P005, E4P008)

tractlabor90 <- raw90tractlabor %>% 
  mutate(total_worker_pop = (E4I002 +E4I006)) %>% 
  mutate(total_man = (E4P004 + E4P005 + E4P008)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  filter(pct_man != "NaN") %>% 
  select(GISJOIN, ANPSADPI, total_man, total_worker_pop, pct_man) %>% 
  rename(NAME = ANPSADPI) %>% 
  rename(totalworkerpop_90 = total_worker_pop,
         totalman_90 = total_man, 
         pctman_90 = pct_man)

##Importing and processing 2000 data,  Again I noticed a discrepancy with the total working population which distorted the data so imported table so I imported table: "Total Workers 16 years and Over to get working total first: 

totalworkforce00place <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(PLACE, GJU001 ) %>% 
  rename(total_worker_pop = GJU001)

totalworkforce00county <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_county.csv")%>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>%
  select(COUNTY, GJU001 ) %>% 
  rename(total_worker_pop = GJU001)

totalworkforce00tract <- read.csv("data/raw/2000/nhgis0019_csv/nhgis0019_ds151_2000_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, GJU001) %>% 
  rename(total_worker_pop =GJU001)


##Industrial employement data from table "Sex by Industry Type: Employed Civilian Persons 16 Years and Over in Selected Industry Categories:"
## GMH003:Male >> Manufacturing
## GMH004:Male >> Wholesale trade
## GMH016:Female >> Manufacturing
## GMH017:Female >> Wholesale trade
raw2000laborplace <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_place.csv") %>% 
  filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, STATE, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000place <- raw2000laborplace %>% 
  left_join(totalworkforce00place, by = "PLACE") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(PLACE, pct_man) %>% 
  rename('2000' = pct_man)


raw2000laborcounty <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York" | COUNTY == "Bronx" | COUNTY == "Kings" | COUNTY == "Queens") %>% 
  select(GISJOIN, STATE, COUNTY, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000county <- raw2000laborcounty %>% 
  left_join(totalworkforce00county, by = "COUNTY") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  select(COUNTY, pct_man) %>% 
  rename('2000' = pct_man)

raw2000labortract <- read.csv("data/raw/2000/nhgis0016_csv/nhgis0016_ds151_2000_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings") %>% 
  select(GISJOIN, STATE, COUNTY, NAME, GMH001, GMH002, GMH003, GMH004, GMH005, GMH006, GMH007, GMH008, GMH009, GMH010, GMH011, GMH012, GMH013, GMH014, GMH015, GMH016, GMH017, GMH018, GMH019)

labor2000tract <- raw2000labortract %>% 
  left_join(totalworkforce00tract, by = "GISJOIN") %>% 
  mutate(total_man = (GMH003 + GMH004 + GMH016 + GMH017)) %>% 
  mutate(pct_man = (total_man/total_worker_pop)) %>% 
  filter(pct_man != "NaN") %>% 
  select(GISJOIN, NAME, total_man, pct_man, total_worker_pop) 

##Importing and processing 2010 data from table "Means of Transportation to Work by Industry,Universe:Workers 16 years and over:" Variable for total working pop is:
## JZNE001:Total
##Total manufacturing employment from:
## JZNE004:Manufacturing
## JZNE005:Wholesale trade

rawtestlabor10place <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_place.csv") %>% 
filter(PLACE == "New York city") %>% 
  select(GISJOIN, PLACE, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10place <- rawtestlabor10place %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop))

rawlabor10county <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_county.csv") %>% 
  filter(STATE == "New York", COUNTY == "New York County" | COUNTY == "Bronx County" | COUNTY == "Kings County" | COUNTY == "Queens County") %>% 
  select(GISJOIN, STATE, COUNTY, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10county <- rawlabor10county %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop))

rawlabor10tract <- read.csv("data/raw/2010/nhgis0010_csv/nhgis0010_ds177_20105_tract.csv") %>% 
  filter(STATE == "New York", COUNTY ==  "Kings County") %>% 
  select(GISJOIN, STATE, COUNTY, NAME_E, JZNE001, JZNE002, JZNE003, JZNE004, JZNE005, JZNE006, JZNE007, JZNE008, JZNE009, JZNE010, JZNE011, JZNE012, JZNE013, JZNE014, JZNE015, JZNE016)

labor10tract <- rawlabor10tract %>% 
  rename(total_worker_pop = JZNE001) %>% 
  mutate(total_man = (JZNE004 + JZNE005),
         pct_man = (total_man/total_worker_pop)) %>% 
  rename(NAME = NAME_E)
```


## Results:

```{r}
##First for the city as a whole:
totalaborplace <- labor10place %>% 
  select(GISJOIN, PLACE, total_worker_pop, total_man, pct_man) %>% 
  rename('2010' = pct_man) %>% 
  left_join(labor2000place, by = "PLACE") %>% 
  left_join(placelabor90, by = "PLACE") %>% 
  select(PLACE, '2010', '2000', '1990') %>% 
  mutate(PLACE = str_remove_all(PLACE, " city")) %>% 
  left_join(labor80place, by = "PLACE") %>% 
  left_join(labor70place, by = "PLACE") %>% 
  pivot_longer(-c(PLACE), names_to = "year")

  
## Next for the counties: 
totallaborcounty <- labor10county %>% 
  select(COUNTY, pct_man) %>% 
  rename('2010' = pct_man) %>%
  mutate(COUNTY = str_remove_all(COUNTY, " County")) %>% 
  left_join(labor2000county, by = "COUNTY") %>% 
  left_join(countylabor90, by = "COUNTY") %>% 
  left_join(labor80county, by = "COUNTY") %>% 
  left_join(county70labortrue, by = "COUNTY") %>% 
  pivot_longer(-c(COUNTY), names_to = "year")


##Finally for tract, in order to isolate the neighborhoods I wanted, I used a spatial join with the Neighborhood Tabulation tables from NYC Open Data. I had to pull just the shape files from the census in order to match everything up.

##Importing and processing Neighborhood Tabulation Tables:

raw_nabes <- st_read("data/raw/nynta2010_21d/nynta2010.shp")

nabes <- raw_nabes %>% 
  filter(BoroName == "Brooklyn") %>% 
  filter(NTAName == "Carroll Gardens-Columbia Street-Red Hook" | NTAName == "Park Slope-Gowanus" | NTAName == "Sunset Park East" | NTAName == "Sunset Park West") %>% 
  select(BoroName, CountyFIPS, NTAName)

##importing geometry of 2010 tracts from Tidy Census and transforming in order to fit the NABES file.

rawlabor10tractshp <- get_acs(geography = "tract",
                              variables = "B24012_001",
                              year = 2010, 
                              state = "NY",
                              county = "Kings",
                              geometry = TRUE) %>% 
  select(NAME, geometry) %>% 
  st_transform(2263)

labor10tractshp <- rawlabor10tractshp %>% 
  right_join(labor10tract, by = "NAME") %>% 
  select(GISJOIN, NAME, total_man, pct_man, total_worker_pop, geometry) %>% 
  st_join(nabes, join = st_intersects) %>% 
  drop_na() %>% 
  rename(totalman_10 = total_man,
         totalworkerpop_10 = total_worker_pop,
         pctman_10 = pct_man)

##Finally putting together total historical data for the tract geographical level:

totaltractlabordata <- labor10tractshp %>%
  left_join(labor2000tract, by = "GISJOIN") %>% 
  rename(totalworkerpop_00 = total_worker_pop,
         totalman_00 = total_man,
         pctman_00 = pct_man) %>% 
  mutate(NAME = str_remove_all(NAME.y, "Census ")) %>% 
  left_join(tractlabor90, by = "NAME") %>% 
  left_join(labor80tract, by = "GISJOIN.y") %>% 
  left_join(tract70labortrue, by = "GISJOIN.y") %>% 
  drop_na() %>% 
  select(GISJOIN.y, NAME.x, NTAName, totalworkerpop_70, totalworkerpop_80, totalworkerpop_90, totalworkerpop_00, totalworkerpop_10, totalman_70, totalman_80, totalman_90, totalman_00, totalman_10, pctman_70, pctman_80, pctman_90, pctman_00, pctman_10, geometry) %>% 
  rename(GISJOIN = GISJOIN.y,
         NAME = NAME.x)
PLOTtotaltractlabot <- st_drop_geometry(totaltractlabordata) %>% 
  mutate('1970' = mean(pctman_70),
         '1980' = mean(pctman_80),
         '1990' = mean(pctman_90), 
         '2000' = mean(pctman_00),
         '2010' = mean(pctman_10)) %>% 
  select(NAME, '1970', '1980', '1990', '2000', '2010') %>% 
  filter(NAME == "Census Tract 104, Kings County, New York") %>% 
  pivot_longer(-c(NAME), names_to = "year")

PLOTtotaltract2 <- st_drop_geometry(totaltractlabordata) %>% 
  select(NAME, pctman_70, pctman_80, pctman_90, pctman_00, pctman_10) %>% 
  rename('1970' = pctman_70,
         '1980' = pctman_80,
         '1990' = pctman_90, 
         '2000' = pctman_00,
         '2010' = pctman_10) %>% 
  pivot_longer(-c(NAME), names_to = "year")

summary(totalaborplace)
summary(totallaborcounty)
summary(PLOTtotaltract2)

```

## Visualization:

```{r}

##My map to compare the decline in industrial employment since 1970.
totallaboryears <- ggplot() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  geom_point(data = PLOTtotaltractlabot, mapping = aes(x = year, y = value), color = "green") + 
  geom_line(data = PLOTtotaltractlabot,  mapping = aes(x= year, y = value,), color = "green", group = 1)+ 
  geom_text(data = subset(PLOTtotaltractlabot, year == "1990"), mapping =aes(x= year, y = value), color = "green", label = "Red Hook/Gowanus/Sunset Park", hjust = -.2) +
  geom_point(data = totalaborplace, mapping = aes(x = year, y = value), color = "red", size = 2) +
  geom_line(data = totalaborplace, mapping = aes(x= year, y = value,), color = "red", group = 1, size = 5, alpha = .5) +
  geom_text(data = subset(totalaborplace, year == "1980"), aes(x= year, y = value), label = "New York City", color = "red", hjust = -.2) +
  geom_point(data = totallaborcounty, mapping = aes(x = year, y = value, color = COUNTY)) + 
  geom_line(data = totallaborcounty, mapping = aes(x = year, y = value, color = COUNTY, group = COUNTY), linetype = "dotdash") +
  geom_text(data = subset(totallaborcounty, year == "1970"), aes(label = COUNTY, x= year, y = value, color = COUNTY), hjust = 1.1) + 
  scale_color_manual(values = c("tomato3", "steelblue2",
                                "seagreen3", "orchid1")) +
  labs(x = "Year", y = "Industrial Employment (%)", title = "Industrial Employment in New York City since 1970", caption = "Sources: The US Census and American Community Survey", color = "County")
ggplotly(totallaboryears) 
```

## Observations: 
This chart shows a clear and steady decline in industrial employment since the 1970s, confirming the background historical information that I gathered. The three neighborhoods I isolated had a much larger share of industrial employment than the entire city but saw a much steeper drop between 1980 and 1990. Of the four boroughs, Brooklyn had an initially higher share of industrial employment, but by 2010 it was eclipsed by Queens with a small but significant margin.


###### Next I looked at contemporary geographic data for the neighborhoods I selected to better contextualize my historical data. 


```{r}
#Processing the demographic data of median income and median home value in Red Hook, Gowanus, and Sunset Park from 2010.
rawmedianincome <- get_acs(geography = "tract",
                           variables = "B19013_001",
                           year = 2010, 
                           state = "NY", 
                           county = "Kings", 
                           geometry = TRUE)

medianincome <- rawmedianincome %>% 
  select(GEOID, NAME, estimate, geometry) %>% 
  rename(medianincome = estimate)

medianincomecorrectedshp <- medianincome %>% 
  st_transform(2263)

median_income_nabes <- medianincomecorrectedshp %>%
  st_join(nabes, join = st_intersects) %>% 
  drop_na()

median_incomemap <- ggplot()  + 
  geom_sf(data = median_income_nabes, mapping = aes(fill = medianincome), color = "#ffffff") + 
  scale_fill_gradient2(name="Median Income ($)", labels=dollar_format(accuracy = 1L)) +
  labs(title = "Median Income in Red Hook, Sunset Park, and Gowanus (2010)", caption = "Sources: American Community Survey") +
  theme_void()

rawhomevalue <- get_acs(geography = "tract",
                        variables = "B25077_001",
                        year = 2010, 
                        state = "NY", 
                        county = "Kings", 
                        geometry = TRUE)

medianhomevalue <- rawhomevalue %>% 
  select(GEOID, NAME, estimate, geometry) %>% 
  rename(medianvalue = estimate)
medianvalueecorrectedshp <- medianhomevalue %>% 
  st_transform(2263)
median_value_nabes <- medianvalueecorrectedshp %>%
  st_join(nabes, join = st_intersects) %>% 
  drop_na()
options(scipen = 100)
median_valuemap <- ggplot()  + 
  geom_sf(data = median_value_nabes, mapping = aes(fill = medianvalue), color = "#ffffff") +
  scale_fill_gradient(name="Median Home Value ($)", labels=dollar_format(accuracy = 1L)) +
  labs(title = "Median Home Value in Red Hook, Sunset Park, and Gowanus (2010)", direction = 1, caption = "Sources: American Community Survey") +
  theme_void()

median_incomemap
median_valuemap
```

## Observations: 
These two maps show the current economic condition of these neighborhoods. They both reveal the prevalent gentrification in the neighborhoods of Red Hook and Gowanus, as higher median incomes and home values are clustered there. Sunset Park, however has maintained more of its working-class character. In the 1980s, Sunset Park became the borough's first Chinatown; the neighborhood's population of immigrants from China is still growing. As of 2010, the neighborhood's population is 14.5% White, 35.2% Asian, and 46.4% Latino. 


###### Next I pulled the historical data from previous years to see the change that resulted over time to the current state illustrated by the above maps:

```{r}
##I used a table series table for median income only, as median home value wasn't recorded consistently according to NHGIS. In retrospect, I did not adjust for inflation so unsure how well this series can inform my findings.
rawmedianincome_8090000 <- read_csv("data/raw/medianincome/nhgis0018_ts_nominal_tract.csv") %>% 
  filter(STATE == "New York", COUNTY == "Kings County")

medianincome_8090000 <- rawmedianincome_8090000 %>% 
  rename(medianincome_80 = B79AA1980,
         medianincome_90 = B79AA1990,
         medianincome_00 = B79AA2000) %>% 
  select(NAME2012, medianincome_80, medianincome_90, medianincome_00) %>%
  drop_na() %>% 
  rename(NAME = NAME2012)

median_income_nabesyears <- st_drop_geometry(median_income_nabes) %>% 
  left_join(medianincome_8090000, by = "NAME") %>% 
  rename(medianincome_10 = medianincome) %>% 
  drop_na() %>% 
  mutate('1980' = medianincome_80,
         '1990' = medianincome_90, 
         '2000' = medianincome_00,
         '2010' = medianincome_10) %>% 
  select(NAME, NTAName, '1980', '1990', '2000', '2010')%>% 
  pivot_longer(-c(NAME, NTAName), names_to = "year")

medianincomePLOTyears <- median_income_nabesyears %>% 
  mutate('1980' = mean('1980'),
         '1990' = mean('1990'), 
         '2000' = mean('2000'),
         '2010' = mean('2010')) %>% 
  select(NAME,'1980', '1990', '2000', '2010') %>% 
  pivot_longer(-c(NAME), names_to = "year") %>% 
  filter(NAME == "Census Tract 104, Kings County, New York")

medianincometime <- ggplot(aes(x = year, y = value), data = median_income_nabesyears) + 
  scale_y_continuous(labels = dollar_format(accuracy = 1)) +
  scale_x_discrete('year') +
  scale_color_discrete(name = "Neighborhood", labels = c("Red Hook", "Gowanus", "Sunset Park East", "Sunset Park West")) +
  geom_smooth(aes(group = NTAName, color = NTAName), method = 'lm', alpha = .2, formula =  y ~x) +
  geom_point(mapping = aes(color = NTAName)) + 
  labs(x = "Year", y = "Median Income ($)", title = "Median Income in Industrial Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey")

  medianincometime
  
```
## Observations: 

As illustrated by the previous maps, it seems as if median income in the neighhborhoods of Red Hook and Gownanus have risen far more sharply than in Sunset Park. Currently, Red Hook and Gowanus are popular neighborhoods for restaurants and for going out. As compared to Sunset Park, the racial makeup of Red Hook and Gowanus is far whiter (60.9% and 46% respectively) in addition to being more affluent. Additionally, Sunset Park is a little further away from Manhattan and other more gentrified sections of Brooklyn, which may also explain why the young urban professionals who make up the current gentrifying class may be less likely to settle there. 

###### The last  variable I looked at it is how people's commutes have changed over time. Previously, industrial workers would live close to their workplaces, in the post-industrial era there is often a greater physical seperation between ones home and workplace, especially as rents have dramatically increased in downtown business districts.

###### Looking at how this variable changed over time may complement the data of declining industrial employment in these neighborhoods.

```{r}

rawtraveltime <- get_acs(geography = "tract",
                         table = "B08012",
                         year = 2010, 
                         state = "NY", 
                         county = "Kings", 
                         geometry = FALSE)
acs5variable <- load_variables(2010, "acs5", cache = FALSE)


traveltime <- rawtraveltime %>% 
  pivot_wider(names_from = variable, values_from = c(estimate, moe)) %>% 
  select(GEOID, NAME, estimate_B08012_001, estimate_B08012_002, estimate_B08012_003, estimate_B08012_004, estimate_B08012_005, estimate_B08012_006, estimate_B08012_007, estimate_B08012_008, estimate_B08012_009, estimate_B08012_010, estimate_B08012_011, estimate_B08012_012, estimate_B08012_013) %>% 
  rename(totalworkerpop = estimate_B08012_001,
         lessthan5 = estimate_B08012_002, 
         fivetonine = estimate_B08012_003, 
         ten_fourteen = estimate_B08012_004,
         fifteen_nineteen = estimate_B08012_005, 
         twenty_twenfour = estimate_B08012_006, 
         twenfive_twennine = estimate_B08012_007, 
         thirty_thirfour = estimate_B08012_008, 
         thirfive_thirnine = estimate_B08012_009,
         forty_fourtfour = estimate_B08012_010,
         fourtfive_fiftnine = estimate_B08012_011,
         sixt_eightnine = estimate_B08012_012,
         ninety_plus = estimate_B08012_013) %>% 
  mutate(lessthan30 = (lessthan5 + fivetonine + ten_fourteen + fifteen_nineteen + twenty_twenfour + twenfive_twennine),
         thirty_fourfour = (thirty_thirfour + thirfive_thirnine + forty_fourtfour),
         fourtfiveplue = (fourtfive_fiftnine + sixt_eightnine + ninety_plus)) %>% 
  mutate(perlessthan30_10 = (lessthan30/totalworkerpop),
         per304_10 = (thirty_fourfour/totalworkerpop), 
         per45_10 = (fourtfiveplue/totalworkerpop))

#Importing historical travel time data from NHGIS
traveltime8090 <- read_csv("data/raw/traveltime/nhgis0017_ts_nominal_tract.csv") %>% 
  filter(STATE == "New York",
         COUNTY == "Kings County")

traveltime8090proc <- traveltime8090 %>%  
  mutate(lessthan30_80 = (C50AA1980 + C50AB1980 + C50AC1980 + C50AD1980 + C50AE1980),
         thirty_fourtyfour_80 = C50AF1980, 
         fourtyfiveplue_80 = (C50AG1980 + C50AH1980)) %>%
  mutate(totalworkerpop_80 = (C50AA1980 + C50AB1980 + C50AC1980 + C50AD1980 + C50AE1980 + C50AF1980 + C50AG1980 + C50AH1980)) %>% 
  mutate(lessthan30_90 = (C50AA1990 + C50AB1990 + C50AC1990 + C50AD1990 + C50AE1990),
         thirty_fourtyfour_90 = C50AF1990,
         fourtyfiveplue_90 = (C50AG1990 + C50AH1990)) %>% 
  mutate(totalworkerpop_90 = (C50AA1990 + C50AB1990 + C50AC1990 + C50AD1990 + C50AE1990 + C50AF1990 + C50AG1990 + C50AH1990)) %>% 
  mutate(lessthan30_00 = (C50AG2000 + C50AH2000),
         thirty_fourtyfour_00 = C50AF2000,
         fourtyfiveplue_00 = (C50AG2000 + C50AH2000)) %>% 
  mutate(totalworkerpop_00 = (C50AA2000 + C50AB2000 + C50AC2000 + C50AD2000 + C50AE2000 + C50AF2000 + C50AG2000 + C50AH2000))
  
perctraveltime809000 <- traveltime8090proc %>% 
  select(NAME2012, lessthan30_80, lessthan30_90, lessthan30_00, totalworkerpop_80, totalworkerpop_90, totalworkerpop_00, thirty_fourtyfour_80, thirty_fourtyfour_90, thirty_fourtyfour_00, fourtyfiveplue_80, fourtyfiveplue_90, fourtyfiveplue_00) %>% 
  mutate(perlessthan30_80 = (lessthan30_80/totalworkerpop_80),
         per304_80 = (thirty_fourtyfour_80/totalworkerpop_80),
         per45_80 = (fourtyfiveplue_80/totalworkerpop_80)) %>% 
  mutate(perlessthan30_90 = (lessthan30_90/totalworkerpop_90),
         per304_90 = (thirty_fourtyfour_90/totalworkerpop_90),
         per45_90 = (fourtyfiveplue_90/totalworkerpop_90)) %>% 
  mutate(perlessthan30_00 = (lessthan30_00/totalworkerpop_00),
         per304_00 = (thirty_fourtyfour_00/totalworkerpop_00),
         per45_00 = (fourtyfiveplue_00/totalworkerpop_00)) 

traveltime809000 <- perctraveltime809000 %>% 
  select(NAME2012, perlessthan30_80, per304_80, per45_80, perlessthan30_90, per304_90, per45_90, perlessthan30_00, per304_00, per45_00) %>% 
  drop_na() %>% 
  rename(NAME = NAME2012)

traveltimeyears <- traveltime %>% 
  select(NAME, perlessthan30_10, per304_10, per45_10) %>% 
  left_join(traveltime809000, by = "NAME") %>% 
  drop_na() 

traveltimeshp <- get_acs(geography = "tract",
                         variables = "B08012_001",
                         year = 2010, 
                         state = "NY", 
                         county = "Kings", 
                         geometry = TRUE)

traveltimeyearsfinal <- traveltimeshp %>% 
  select(NAME, geometry) %>% 
  right_join(traveltimeyears, by = "NAME") %>% 
  st_transform(2263) %>% 
  st_join(nabes, join = st_intersects) %>% 
  drop_na()

traveltimeyearsfinalPLOT <- st_drop_geometry(traveltimeyearsfinal) %>% 
  select(NAME, NTAName, perlessthan30_80, per304_80, per45_80, perlessthan30_90, per304_90, per45_90, perlessthan30_00, per304_00, per45_00, perlessthan30_10, per304_10, per45_10) %>% 
  rename('029.2010' = perlessthan30_10,
         '029.2000' = perlessthan30_00,
         '029.1990' = perlessthan30_90,
         '029.1980' = perlessthan30_80,
         '3044.2010' = per304_10,
         '3044.2000' = per304_00,
         '3044.1990' = per304_90,
         '3044.1980' = per304_80,
         '4500.2010' = per45_10,
         '4500.2000' = per45_00,
         '4500.1990' = per45_90,
         '4500.1980' = per45_80) %>% 
  pivot_longer(-c(NAME, NTAName), names_to = "year") %>% 
  separate(year, c('Time', 'Year')) %>% 
  arrange(Time)

commuteline <- ggplot(aes(x = Year, y = value), data = traveltimeyearsfinalPLOT) + 
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_color_discrete(name = "Commute Time", labels = c("< 30 Minutes", "30-44 minutes", "45+ minutes")) +
  labs(x = "Year", y = "Percentage of Working Population (%)", title = "Commuting Times in West Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey") +
  geom_smooth(mapping = aes(group = Time, color = Time), method = 'lm', alpha = .2, formula =  y ~x) + 
  facet_wrap(~ NTAName, nrow= 2)

commutebar <- ggplot(data = subset(traveltimeyearsfinalPLOT), mapping = aes(x= Year, y = value, fill = Time)) +
  labs(x = "Year", y = "Percentage of Working Population (%)", title = "Commuting Times in West Brooklyn (1980-2010)", caption = "Sources: US Census and American Community Survey") +
  scale_fill_discrete(name = "Commute Time", labels = c("< 30 Minutes", "30-44 minutes", "45+ minutes")) +
  geom_col(position = "dodge")  

commutebar
ggplotly(commuteline)
```

## Observations: 
Looking at the change in these four areas, the most dramatic shift is in the population share with a less than 30 minute commute in the Red Hook area (from over 40% in 1980 to ~32% by 2010) This could be directly attributed to the loss of industrial employment as factories left the area in the latter half of the 20th century. Longer commutes rose in conjunction with this fall. The Gowanus area shows a similar trend, but it is less pronounced. Sunset Park seems to have always had a higher share of longer than 45 minute commutes, this may be due to its industrial nature devolving more quickly than the other neighborhoods, as the Brooklyn Army Terminal, its main industrial hub, largely went into disuse after World War II. It is interesting that the population share with shorter commutes has remained somewhat stagnant. I speculate that may be in part due to the large immigrant population who own and work at local businesses.


# Conclusion and Next Steps:

This research project demonstrated the dramatic destruction of New York City's industrial nature over the second half of the 20th century. It marks how the industrial nature of specific neighborhoods, who are currently undergoing dramatic transformation, can still persist in demographic data. 

For next steps it would be interesting to dig even deeper into these three neighborhoods, further looking at economic and demographic data, to dig out the complex character of their change over time. Zoning maps would serve as a useful resource to mark exactly when factories and warehouses moved out of these areas. 

Another project idea would be to look at the actual physical industrial structures that have lingered in the neighborhoods. Projects like the Highline in Manhattan show how adaptive reuse can anchor a neighborhood's industrial past in the present, while exponentially boosting nearby real estate values. The redevelopment of post-industrial neighborhoods has been a prevalent theme in recent urbanism. 

Another remaining question is, where did industrial workers go? There are some lingering industrial remnants in Queens it seems, but it would be interesting to move this analysis to New Jersey, for instance, where many factories eventually moved to.
