Overview

I want to look at trends in median income by borough in NYC over 10+ years. Since the American Community Survey (ACS) conducts their surveys in 1-year, 3-year, and 5-year increments, I will investigate two 5-year measures: (1) data from the 2013-2017 survey and (2) data from the 2005-2009 survey.

I am particularly interested in looking at which boroughs have a significantly higher median income over time. I am also interested in exploring what this might suggest about that borough, the people moving in or out of it, etc.

Import libraries

Note: tidycensus is needed to extract data from the census

library(tidycensus)
library(tidyverse)
library(viridis)

Input API code

In order to access census data, you must retrieve and use an API key. I retrieved a key by going to the following website: http://api.census.gov/data/key_signup.html.

Citation: Adapted code from the github page on tidycensus

census_api_key("4ce84da33024270337d2c7a57c4b2cbf6bc78f6c")
install = TRUE

Data from ACS 2017

To access any census data, you need to choose a variable ID which requires first searching through the data to determine the variable you want to focus on.

Use View(v17) at the end of the code to view the dataset

Citation: Adapted code from github

v17 <- load_variables(2017, "acs5", cache = TRUE)

The two main types of types of information I can access from the census are:

The U.S. Decennial Census

The American Community Survey (ACS) information

“The primary purpose of the American Community Survey (ACS) is to measure the changing social and economic characteristics of the U.S. population — our education, housing, jobs, and more.” - U.S. Census Bureau. We are looking for the median income, so I will use ACS data.

Now, I need to find the the ACS code that represents an estimation of median incomes by borough; this code is: B06011_001. Please Note: for the code below, the year defaults to 2017 so there is no need to specify.

Citation: Used code adapted from github

ny17 <- get_acs(geography = "county", 
              variables = c(medincome = "B06011_001"), 
              state = "NY")
ny17

## # A tibble: 62 x 5
##    GEOID NAME                         variable  estimate   moe
##    <chr> <chr>                        <chr>        <dbl> <dbl>
##  1 36001 Albany County, New York      medincome    32452   615
##  2 36003 Allegany County, New York    medincome    21607   527
##  3 36005 Bronx County, New York       medincome    20576   204
##  4 36007 Broome County, New York      medincome    25008   530
##  5 36009 Cattaraugus County, New York medincome    24740   459
##  6 36011 Cayuga County, New York      medincome    28586  1103
##  7 36013 Chautauqua County, New York  medincome    23202   423
##  8 36015 Chemung County, New York     medincome    27930   899
##  9 36017 Chenango County, New York    medincome    25398   815
## 10 36019 Clinton County, New York     medincome    26915   781
## # … with 52 more rows

GEOIDs for NYC Boroughs

Based on the ny17 dataset, we know the following information which will help us filter this dataset down to only the NYC boroughs. GEOIDs for NYC boroughs are as follows:

Bronx County - 36005

Kings County (Brooklyn) - 36047

New York County (Manhattan) - 36061

Richmond County (Staten Island) - 36085

Queens County - 36081

Filter

The following code will work to filter out everything but the NYC boroughs.

Citation: Adapted code from online resources and Maxwell Austensen

nyc17_boroughs <- filter(ny17, GEOID %in% c(36005, 36047, 36081, 36085, 36061))
nyc17_boroughs

## # A tibble: 5 x 5
##   GEOID NAME                      variable  estimate   moe
##   <chr> <chr>                     <chr>        <dbl> <dbl>
## 1 36005 Bronx County, New York    medincome    20576   204
## 2 36047 Kings County, New York    medincome    28238   281
## 3 36061 New York County, New York medincome    46152   605
## 4 36081 Queens County, New York   medincome    29706   306
## 5 36085 Richmond County, New York medincome    36459   488

Order

Next, I need to order the dataset by greatest median income to least median income by borough. This value is represented by the “estimate” column. With this in mind, we can use arrange(des(column)) to single out the factor we want to order by. Then, we use %>% with select(c(desired_column1, desired_column2, desired_column3)) to list all the columns you want to view within this new order.

Citation: Adapted code from stackoverflow - link:https://stackoverflow.com/questions/51255688/sort-a-data-frame-in-r-largest-to-smallest-from-value-in-a-column

boroughs17_ordered <- nyc17_boroughs %>% 
 # desc orders from greatest to least
 arrange(desc(estimate)) %>%
 # select the columns you want
 select(c("NAME", "estimate", "moe"))
boroughs17_ordered

## # A tibble: 5 x 3
##   NAME                      estimate   moe
##   <chr>                        <dbl> <dbl>
## 1 New York County, New York    46152   605
## 2 Richmond County, New York    36459   488
## 3 Queens County, New York      29706   306
## 4 Kings County, New York       28238   281
## 5 Bronx County, New York       20576   204

Then, I need to shorten borough names. For example, I want to shorten “Queens County, New York” to simply “Queens”. I do this by using mutate() and factor() to create a new column titled “labels” in the dataset.

Citation: Adapted a mixture of code from stackoverflow and Maxwell Austensen via Slack

boroughs17_names <- boroughs17_ordered %>%
  mutate(borough_names = factor(NAME, levels = c('NYCNY', 'RCNY', 'QCNY', 'KCNY', 'BCNY')),
  labels = c('Manhattan', 'Staten Island', 'Queens', 'Brooklyn', 'Bronx'))
boroughs17_names

## # A tibble: 5 x 5
##   NAME                      estimate   moe borough_names labels       
##   <chr>                        <dbl> <dbl> <fct>         <chr>        
## 1 New York County, New York    46152   605 <NA>          Manhattan    
## 2 Richmond County, New York    36459   488 <NA>          Staten Island
## 3 Queens County, New York      29706   306 <NA>          Queens       
## 4 Kings County, New York       28238   281 <NA>          Brooklyn     
## 5 Bronx County, New York       20576   204 <NA>          Bronx

Bar Graph for 2017 Data

Below is the bar graph for Median Incomes by Borough in NYC, 2017.

Citations: For color - used UCSB’s R color cheat sheet For ggplot - from R for Data Science Manual and adapted from Maxwell Austensen’s class demo

bgraph17_NYC <- ggplot(boroughs17_names) +
  aes(x = reorder(labels, -estimate), y = estimate) +
  geom_bar(stat = "identity", fill = 'seagreen4') +
  theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                   vjust = 0.5)) +
  labs(
    title = "Income by Borough",
    subtitle = "NYC, 2017",
    y = "Median Income",
    x = "Borough",
    caption = "Source: American Community Survey, 2013-2017"
  )

bgraph17_NYC

Add Column Values

Download ggrepel which will help us create labels for the column values

library(ggrepel)

Citations: Adapted code from cran r project, github, and r statistics; see links below:

cran r project - https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html#align-labels-on-the-top-or-bottom-edge

github - https://github.com/slowkow/ggrepel/issues/82

r statistics - http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html#1.%20Adding%20Plot%20and%20Axis%20Titles

bargraph17_labels <- bgraph17_NYC + geom_label_repel(aes(label=estimate), 
                                   size=4, data=boroughs17_names,
                                   colour = 'darkgreen',
                                   box.padding = unit(0.35, "lines"),
                                   segment.color = 'grey50',
                                   nudge_y = 36)
bargraph17_labels

Refine Bar Graph

I touched up the bar graph w/ 2017 data to add clarity.

Citation: Used code from r-statistics.co

bgraph17_final <-  bargraph17_labels +
  theme(plot.title=element_text(size=18, 
                                    face="bold", 
                                    family="American Typewriter",
                                    color="black",
                                    hjust=0.5,
                                    lineheight=1.2),  # title
            plot.subtitle=element_text(size=15, 
                                       family="American Typewriter",
                                       hjust=0.5),
            plot.caption=element_text(size=10, 
                                    family="American Typewriter",
                                    color="grey36"),
            axis.title.x=element_text(vjust=0.5,  
                                      size=15),  # X axis title
            axis.title.y=element_text(size=15,
                                      vjust=3),  # Y axis title)
            axis.text.x=element_text(size=12, 
                                     vjust=1),  # X axis text
            axis.text.y=element_text(size=12,
                                     angle=0))  # Y axis text
bgraph17_final

Plot Graph for 2017 Data

Below is the plot graph for Median Incomes by Borough in NYC, 2017.

For each ACS calculation, there is a margin of error (moe). The moe for ACS is set at a default of 90% accuracy with a 10% moe. I kept this default setting so this is the moe for the data represented in both 2017 graphs. Since the moe is valuable to know in order to distinguish the potential range of values for each borough, I want to create a plot graph that can show the moe for each borough calculation.

plotgraph17_NYC <- ggplot(boroughs17_names) +
  aes(x = estimate, y = reorder(labels, -estimate)) +
  geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe, 
                   y = reorder(labels, -estimate))) +
  geom_point(color = "seagreen4", size = 3) +
  theme(axis.text.y = element_text(angle = 15, hjust = 1, 
                                   vjust = 0.5)) +
  labs(
    title = "Income by Borough",
    subtitle = "NYC, 2017",
    y = "Borough",
    x = "Median Income",
    caption = "Source: American Community Survey, 2013-2017"
  )
plotgraph17_NYC

Refine 2017 Plot Graph

I touched up the plot graph w/ 2017 data to add clarity.

Citation: Used code from r-statistics.co

plotgraph17_final <- plotgraph17_NYC +
  theme(plot.title=element_text(size=18, 
                                    face="bold", 
                                    family="American Typewriter",
                                    color="black",
                                    hjust=0.5,
                                    lineheight=1.2),  # title
            plot.subtitle=element_text(size=15, 
                                       family="American Typewriter",
                                       hjust=0.5),
            plot.caption=element_text(size=10, 
                                    family="American Typewriter",
                                    color="grey36"),
            axis.title.x=element_text(size=15,
                                      vjust=1),  # Y axis title)
            axis.title.y=element_text(vjust=1,  
                                      size=15),  # X axis title
            axis.text.x=element_text(size=12, 
                                     vjust=1),  # X axis text
            axis.text.y=element_text(size=12,
                                     angle=10))  # Y axis text
plotgraph17_final

Extract 2012 ACS Data

Using the same steps as above, I will now pull the 2012 data from ACS.

Please Note: for the code below, the year defaults to 2017 so I adjusted it to 2012.

v12 <- load_variables(2012, "acs5", cache = TRUE)

Similar to above, I need to find the the ACS code that represents an estimation of median incomes by borough; this code is still: B06011_001.

Citation: Used code adapted from github

ny12 <- get_acs(geography = "county", year = 2012, 
              variables = c(medincome = "B06011_001"), 
              state = "NY")
ny12

## # A tibble: 62 x 5
##    GEOID NAME                         variable  estimate   moe
##    <chr> <chr>                        <chr>        <dbl> <dbl>
##  1 36001 Albany County, New York      medincome    31095   465
##  2 36003 Allegany County, New York    medincome    19942   701
##  3 36005 Bronx County, New York       medincome    19756   252
##  4 36007 Broome County, New York      medincome    23007   583
##  5 36009 Cattaraugus County, New York medincome    22346   484
##  6 36011 Cayuga County, New York      medincome    26004   541
##  7 36013 Chautauqua County, New York  medincome    21513   414
##  8 36015 Chemung County, New York     medincome    24876   614
##  9 36017 Chenango County, New York    medincome    23131   892
## 10 36019 Clinton County, New York     medincome    25252   923
## # … with 52 more rows

Filter

Filter 2012 info to only the NYC boroughs

Citation: Adapted code from stackoverflow and Maxwell Austensen via Slack

nyc12_boroughs <- filter(ny12, GEOID %in% c(36005, 36047, 36081, 36085, 36061))
nyc12_boroughs

## # A tibble: 5 x 5
##   GEOID NAME                      variable  estimate   moe
##   <chr> <chr>                     <chr>        <dbl> <dbl>
## 1 36005 Bronx County, New York    medincome    19756   252
## 2 36047 Kings County, New York    medincome    25670   133
## 3 36061 New York County, New York medincome    40414   470
## 4 36081 Queens County, New York   medincome    27626   272
## 5 36085 Richmond County, New York medincome    35112   569

Order

Order 2012 boroughs from greatest to least based on median income, using the same code as used above for 2017 data

Citation: Adapted code from stackoverflow and Maxwell Austensen via Slack

boroughs12_ordered <- nyc12_boroughs %>% 
 # desc orders from largest to smallest
 arrange(desc(estimate)) %>%
 # select subsets the columns you want
 select(c("NAME", "estimate", "moe"))
boroughs12_ordered

## # A tibble: 5 x 3
##   NAME                      estimate   moe
##   <chr>                        <dbl> <dbl>
## 1 New York County, New York    40414   470
## 2 Richmond County, New York    35112   569
## 3 Queens County, New York      27626   272
## 4 Kings County, New York       25670   133
## 5 Bronx County, New York       19756   252

Column Names

Change column names to shorter borough names by creating a new column titled “labels” using mutate() and factor()

Citation: Used a mixture of code from stackoverflow and Maxwell Austensen via Slack

boroughs12_names <- boroughs12_ordered %>%
  mutate(borough_names = factor(NAME, levels = c('NYCNY', 'RCNY', 'QCNY', 'KCNY', 'BCNY')),
  labels = c('Manhattan', 'Staten Island', 'Queens', 'Brooklyn', 'Bronx'))
boroughs12_names

## # A tibble: 5 x 5
##   NAME                      estimate   moe borough_names labels       
##   <chr>                        <dbl> <dbl> <fct>         <chr>        
## 1 New York County, New York    40414   470 <NA>          Manhattan    
## 2 Richmond County, New York    35112   569 <NA>          Staten Island
## 3 Queens County, New York      27626   272 <NA>          Queens       
## 4 Kings County, New York       25670   133 <NA>          Brooklyn     
## 5 Bronx County, New York       19756   252 <NA>          Bronx

Ensuring Accuracy

Inflation

The 2003-2017 data has been inflation-adjusted to 2017 dollar estimates. However, the 2008-2012 data has not. These calculations can be done by multiplying " the 2008-2012 dollar estimates by 1.06962963 (CPI-U-RS)" according to the U.S. Census Bureau.

Please see the chart below for the inflation-adjusted values for 2008-2012. The column name for this value is new_vals. On average the 2012 median incomes, after inflation-adjusted, increased by approx. 2,000-3,000 dollars.

Citation: U.S. Census Bureau - https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2017/5-year-comparison.html

boroughs12_newvals <- boroughs12_names %>%
  mutate(old_vals = factor(estimate),
         new_vals = c(estimate*1.06962963))
boroughs12_newvals

## # A tibble: 5 x 7
##   NAME             estimate   moe borough_names labels    old_vals new_vals
##   <chr>               <dbl> <dbl> <fct>         <chr>     <fct>       <dbl>
## 1 New York County…    40414   470 <NA>          Manhattan 40414      43228.
## 2 Richmond County…    35112   569 <NA>          Staten I… 35112      37557.
## 3 Queens County, …    27626   272 <NA>          Queens    27626      29550.
## 4 Kings County, N…    25670   133 <NA>          Brooklyn  25670      27457.
## 5 Bronx County, N…    19756   252 <NA>          Bronx     19756      21132.

Filter

Cleaned up dataset by filtering out everything except for the needed columns

boroughs12_columns <- boroughs12_newvals %>% 
 select(c("labels", "new_vals", "moe"))
boroughs12_columns

## # A tibble: 5 x 3
##   labels        new_vals   moe
##   <chr>            <dbl> <dbl>
## 1 Manhattan       43228.   470
## 2 Staten Island   37557.   569
## 3 Queens          29550.   272
## 4 Brooklyn        27457.   133
## 5 Bronx           21132.   252

Graph for 2012 Data

Below is the bar graph for Median Incomes by Borough in NYC, 2012.

Citations: For color - used UCSB’s R color cheat sheet

boroughs12_graph <- ggplot(boroughs12_columns) +
  aes(x = reorder(labels,-new_vals), y = new_vals) +
  geom_bar(stat = "identity", fill = 'palegreen3') +
  theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                   vjust = 0.5)) +
  labs(
    title = "Income by Borough",
    subtitle = "NYC, 2012",
    y = "Median Income",
    x = "Borough",
    caption = "Source: American Community Survey, 2008-2012"
  )
boroughs12_graph

Add Column Values

Citations: Adapted code from cran r project, github, and r statistics; see links below:

cran r project - https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html#align-labels-on-the-top-or-bottom-edge

github - https://github.com/slowkow/ggrepel/issues/82

r statistics - http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html#1.%20Adding%20Plot%20and%20Axis%20Titles

r documentation - https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/Round

bargraph12_labels <- boroughs12_graph + geom_label_repel(aes(label=round(new_vals, digits=0)
),
                                   size=4, data=boroughs12_columns,
                                   colour = 'darkgreen',
                                   box.padding = unit(0.35, "lines"),
                                   segment.color = 'grey50',
                                   nudge_y = 36)
bargraph12_labels

Touch Up

I touched up the bar graph w/ 2012 data to add clarity.

Citation: Used code from r-statistics.co

graph12_final <- bargraph12_labels + theme(plot.title=element_text(size=18, 
                                    face="bold", 
                                    family="American Typewriter",
                                    color="black",
                                    hjust=0.5,
                                    lineheight=1.2),  # title
            plot.subtitle=element_text(size=15, 
                                       family="American Typewriter",
                                       hjust=0.5),
            plot.caption=element_text(size=10, 
                                    family="American Typewriter",
                                    color="grey36"),
            axis.title.x=element_text(vjust=0.5,  
                                      size=15),  # X axis title
            axis.title.y=element_text(size=15,
                                      vjust=3),  # Y axis title)
            axis.text.x=element_text(size=12, 
                                     vjust=1),  # X axis text
            axis.text.y=element_text(size=12,
                                     angle=0))  # Y axis text
graph12_final

Plot Graph for 2012 Data

Below is the plot graph for Median Incomes by Borough in NYC, 2012.

As explained in the 2017 Plot Graph section - for each ACS calculation, there is a margin of error (moe). The moe for ACS is set at a default of 90% accuracy with a 10% moe. I kept this default setting for the 2012 data represented in both graphs.

plotgraph12_NYC <- ggplot(boroughs12_columns) +
  aes(x = new_vals, y = reorder(labels, -new_vals)) +
  geom_errorbarh(aes(xmin = new_vals - moe, xmax = new_vals + moe, 
                   y = reorder(labels, -new_vals))) +
  geom_point(color = "palegreen3", size = 3) +
  theme(axis.text.y = element_text(angle = 15, hjust = 1, 
                                   vjust = 0.5)) +
  labs(
    title = "Income by Borough",
    subtitle = "NYC, 2012",
    y = "Borough",
    x = "Median Income",
    caption = "Source: American Community Survey, 2008-2012"
  )
plotgraph12_NYC

Touch Up

I touched up the plot graph w/ 2012 data to add clarity.

Citation: Used code from r-statistics.co

plotgraph12_final <- plotgraph12_NYC +
  theme(plot.title=element_text(size=18, 
                                    face="bold", 
                                    family="American Typewriter",
                                    color="black",
                                    hjust=0.5,
                                    lineheight=1.2),  # title
            plot.subtitle=element_text(size=15, 
                                       family="American Typewriter",
                                       hjust=0.5),
            plot.caption=element_text(size=10, 
                                    family="American Typewriter",
                                    color="grey36"),
            axis.title.x=element_text(size=15,
                                      vjust=1),  # Y axis title)
            axis.title.y=element_text(vjust=1,  
                                      size=15),  # X axis title
            axis.text.x=element_text(size=12, 
                                     vjust=1),  # X axis text
            axis.text.y=element_text(size=12,
                                     angle=10))  # Y axis text
plotgraph12_final

Compare 2012 & 2017 Bar Graphs

First, you need to download the necessary packages.

library(grid)
library(gridExtra)
library(downloader)
library(grDevices)

Citation: To place graphs side by side, I adapted the code below from github - https://sebastiansauer.github.io/two-plots-rmd/

grid.arrange(graph12_final, bgraph17_final, ncol = 2)

Prep Dataset to Combine 2012 & 2017 Bar Graphs

Join the two datasets using left_join()

acs_12_17 <- left_join(boroughs12_columns, boroughs17_names, by = "labels")
acs_12_17

## # A tibble: 5 x 7
##   labels      new_vals moe.x NAME              estimate moe.y borough_names
##   <chr>          <dbl> <dbl> <chr>                <dbl> <dbl> <fct>        
## 1 Manhattan     43228.   470 New York County,…    46152   605 <NA>         
## 2 Staten Isl…   37557.   569 Richmond County,…    36459   488 <NA>         
## 3 Queens        29550.   272 Queens County, N…    29706   306 <NA>         
## 4 Brooklyn      27457.   133 Kings County, Ne…    28238   281 <NA>         
## 5 Bronx         21132.   252 Bronx County, Ne…    20576   204 <NA>

Filter Columns

Clean dataset by filtering down to only the needed columns for graphing.

acsCLEAN_12_17 <- acs_12_17  %>% 
 select(c("labels", "new_vals", "estimate"))
acsCLEAN_12_17

## # A tibble: 5 x 3
##   labels        new_vals estimate
##   <chr>            <dbl>    <dbl>
## 1 Manhattan       43228.    46152
## 2 Staten Island   37557.    36459
## 3 Queens          29550.    29706
## 4 Brooklyn        27457.    28238
## 5 Bronx           21132.    20576

Rename Columns

Rename columns before pivoting in order to keep data clean, organized, and easy to read.

Citations: Code adapted from Sam Raby and from R Documentation: https://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/rename

proper_names <- plyr::rename(acsCLEAN_12_17, c("new_vals" = "income2012", 
                      "estimate" = "income2017"))
names(proper_names)

## [1] "labels"     "income2012" "income2017"

proper_names

## # A tibble: 5 x 3
##   labels        income2012 income2017
##   <chr>              <dbl>      <dbl>
## 1 Manhattan         43228.      46152
## 2 Staten Island     37557.      36459
## 3 Queens            29550.      29706
## 4 Brooklyn          27457.      28238
## 5 Bronx             21132.      20576

Pivot Data

Use pivot_longer() to add columns - need to repeat Boroughs and differentiate which are in 2012 and which are in 2017 - repeat labels and add medincome columns and then distinguish values by year - woohoo

Use the following to pivot the two new coloumns

acs_long <- pivot_longer(proper_names, c("income2012","income2017"))
acs_long

## # A tibble: 10 x 3
##    labels        name        value
##    <chr>         <chr>       <dbl>
##  1 Manhattan     income2012 43228.
##  2 Manhattan     income2017 46152 
##  3 Staten Island income2012 37557.
##  4 Staten Island income2017 36459 
##  5 Queens        income2012 29550.
##  6 Queens        income2017 29706 
##  7 Brooklyn      income2012 27457.
##  8 Brooklyn      income2017 28238 
##  9 Bronx         income2012 21132.
## 10 Bronx         income2017 20576

Clean Names Again

proper_namesAgain <- plyr::rename(acs_long, c("labels"="Boroughs", "name" = "Year", 
                      "value" = "Median_Income"))
names(proper_names)

## [1] "labels"     "income2012" "income2017"

proper_namesAgain

## # A tibble: 10 x 3
##    Boroughs      Year       Median_Income
##    <chr>         <chr>              <dbl>
##  1 Manhattan     income2012        43228.
##  2 Manhattan     income2017        46152 
##  3 Staten Island income2012        37557.
##  4 Staten Island income2017        36459 
##  5 Queens        income2012        29550.
##  6 Queens        income2017        29706 
##  7 Brooklyn      income2012        27457.
##  8 Brooklyn      income2017        28238 
##  9 Bronx         income2012        21132.
## 10 Bronx         income2017        20576

Combine 2012 & 2017 Bar Graphs

Combine and order x-axis (Boroughs) based on the value of y (Median Incomes)

Citation: Github - https://sebastiansauer.github.io/ordering-bars/

combinedGraph <- ggplot(proper_namesAgain, aes(x = reorder(Boroughs, -Median_Income),
                                               y = Median_Income, fill=Year)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                   vjust = 0.5)) +
  labs(
    title = "Income by Borough",
    subtitle = "NYC, 2012 & 2017",
    y = "Median Income",
    x = "Borough",
    caption = "Source: American Community Survey, 2008-2012 & 2013-2017"
  )
combinedGraph

combinedGraph_labels <- combinedGraph + geom_label_repel(aes(label=round(Median_Income,
                                                                         digits=0)),
                                   size=3, data=proper_namesAgain,
                                   colour = 'black',
                                   box.padding = unit(0.35, "lines"),
                                   segment.color = 'grey50',
                                   nudge_x = 0,
                                   nudge_y = 0)
combinedGraph_labels

Color by Year

Citation: http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually

combinedGraph_colors <- combinedGraph_labels + scale_fill_manual(values=c('palegreen3', 'seagreen4'))
combinedGraph_colors

Touch Up Combined Graph

combinedGraph_final <- combinedGraph_colors +
  theme(plot.title=element_text(size=18, 
                                    face="bold", 
                                    family="American Typewriter",
                                    color="black",
                                    hjust=0.5,
                                    lineheight=1.2),  # title
            plot.subtitle=element_text(size=15, 
                                       family="American Typewriter",
                                       hjust=0.5),
            plot.caption=element_text(size=10, 
                                    family="American Typewriter",
                                    color="grey36"),
            axis.title.x=element_text(vjust=0.5,  
                                      size=15),  # X axis title
            axis.title.y=element_text(size=15,
                                      vjust=3),  # Y axis title)
            axis.text.x=element_text(size=12, 
                                     vjust=1),  # X axis text
            axis.text.y=element_text(size=12,
                                     angle=0))  # Y axis text
combinedGraph_final

Inferences

Based on the graphs above, we see that:

The order of highest median income to lowest across Boroughs remains the same

Manhattan experienced the greatest increase in median income from 2012 to 2017

Given the rate of inflation, Bronx actually experienced a slight decrease in the median income from 2012 to 2017

Given the above observations, to some degree, we can infer that boroughs with higher median incomes are increasing and boroughs with lower median incomes are either remaining the same or decreasing over time. Understanding why will take further research. With more time, I would be interested in exploring changes in population within each borough from 2012 to 2017. I would also be interested in exploring changes in the cost of rent in each borough. I believe these pieces of information would aid in figuring out what may be causing the overarching trends mentioned above.

Citation: U.S. Census Bureau - https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2017/5-year-comparison.html

Compare 2012 & 2017 Plot Graphs

grid.arrange(plotgraph12_final, plotgraph17_final, ncol = 2)

Inferences

Since these plot graphs show the margin of error (moe) for each value represented, we can see that Manhattan and Staten Island had the greatest moe or standard deviation from the median. In other words, this means that these boroughs contain the largest variety in income relative to the median. In that Manhattan and Staten Island contain the highest median incomes, we might infer the following:

Boroughs with a larger range of income levels typically have overall higher incomes

The above inference may be a result of greater job diversity and potentially even greater opportunity for employment or growth within pre-existing employment. One would need to pursue further research to be sure.

NYC Borough Incomes

Annmarie Rodriguez

10/7/2019

Overview

Import libraries

Input API code

Data from ACS 2017

GEOIDs for NYC Boroughs

Filter

Order

Bar Graph for 2017 Data

Add Column Values

Refine Bar Graph

Plot Graph for 2017 Data

Refine 2017 Plot Graph

Extract 2012 ACS Data

Filter

Order

Column Names

Ensuring Accuracy

Filter

Graph for 2012 Data

Add Column Values

Touch Up

Plot Graph for 2012 Data

Touch Up

Compare 2012 & 2017 Bar Graphs

Prep Dataset to Combine 2012 & 2017 Bar Graphs

Filter Columns

Rename Columns

Pivot Data

Clean Names Again

Combine 2012 & 2017 Bar Graphs

Color by Year

Touch Up Combined Graph

Inferences

Compare 2012 & 2017 Plot Graphs

Inferences