Overview

I want to look at trends in median income by borough in NYC over 10+ years. Since the American Community Survey (ACS) conducts their surveys in 1-year, 3-year, and 5-year increments, I will investigate two 5-year measures: (1) data from the 2013-2017 survey and (2) data from the 2005-2009 survey.

I am particularly interested in looking at which boroughs have a significantly higher median income over time. I am also interested in exploring what this might suggest about that borough, the people moving in or out of it, etc.

Import libraries

Note: tidycensus is needed to extract data from the census

library(tidycensus)
library(tidyverse)
library(viridis)

Input API code

In order to access census data, you must retrieve and use an API key. I retrieved a key by going to the following website: http://api.census.gov/data/key_signup.html.

Citation: Adapted code from the github page on tidycensus

census_api_key("4ce84da33024270337d2c7a57c4b2cbf6bc78f6c")
install = TRUE

Data from ACS 2017

To access any census data, you need to choose a variable ID which requires first searching through the data to determine the variable you want to focus on.

Use View(v17) at the end of the code to view the dataset

Citation: Adapted code from github

v17 <- load_variables(2017, "acs5", cache = TRUE)
The two main types of types of information I can access from the census are:
  • The U.S. Decennial Census
  • The American Community Survey (ACS) information
  • “The primary purpose of the American Community Survey (ACS) is to measure the changing social and economic characteristics of the U.S. population — our education, housing, jobs, and more.” - U.S. Census Bureau. We are looking for the median income, so I will use ACS data.

    Now, I need to find the the ACS code that represents an estimation of median incomes by borough; this code is: B06011_001. Please Note: for the code below, the year defaults to 2017 so there is no need to specify.

    Citation: Used code adapted from github

    ny17 <- get_acs(geography = "county", 
                  variables = c(medincome = "B06011_001"), 
                  state = "NY")
    ny17
    ## # A tibble: 62 x 5
    ##    GEOID NAME                         variable  estimate   moe
    ##    <chr> <chr>                        <chr>        <dbl> <dbl>
    ##  1 36001 Albany County, New York      medincome    32452   615
    ##  2 36003 Allegany County, New York    medincome    21607   527
    ##  3 36005 Bronx County, New York       medincome    20576   204
    ##  4 36007 Broome County, New York      medincome    25008   530
    ##  5 36009 Cattaraugus County, New York medincome    24740   459
    ##  6 36011 Cayuga County, New York      medincome    28586  1103
    ##  7 36013 Chautauqua County, New York  medincome    23202   423
    ##  8 36015 Chemung County, New York     medincome    27930   899
    ##  9 36017 Chenango County, New York    medincome    25398   815
    ## 10 36019 Clinton County, New York     medincome    26915   781
    ## # … with 52 more rows

    GEOIDs for NYC Boroughs

    Based on the ny17 dataset, we know the following information which will help us filter this dataset down to only the NYC boroughs. GEOIDs for NYC boroughs are as follows:
  • Bronx County - 36005
  • Kings County (Brooklyn) - 36047
  • New York County (Manhattan) - 36061
  • Richmond County (Staten Island) - 36085
  • Queens County - 36081
  • Filter

    The following code will work to filter out everything but the NYC boroughs.

    Citation: Adapted code from online resources and Maxwell Austensen

    nyc17_boroughs <- filter(ny17, GEOID %in% c(36005, 36047, 36081, 36085, 36061))
    nyc17_boroughs
    ## # A tibble: 5 x 5
    ##   GEOID NAME                      variable  estimate   moe
    ##   <chr> <chr>                     <chr>        <dbl> <dbl>
    ## 1 36005 Bronx County, New York    medincome    20576   204
    ## 2 36047 Kings County, New York    medincome    28238   281
    ## 3 36061 New York County, New York medincome    46152   605
    ## 4 36081 Queens County, New York   medincome    29706   306
    ## 5 36085 Richmond County, New York medincome    36459   488

    Order

    Next, I need to order the dataset by greatest median income to least median income by borough. This value is represented by the “estimate” column. With this in mind, we can use arrange(des(column)) to single out the factor we want to order by. Then, we use %>% with select(c(desired_column1, desired_column2, desired_column3)) to list all the columns you want to view within this new order.

    Citation: Adapted code from stackoverflow - link:https://stackoverflow.com/questions/51255688/sort-a-data-frame-in-r-largest-to-smallest-from-value-in-a-column

    boroughs17_ordered <- nyc17_boroughs %>% 
     # desc orders from greatest to least
     arrange(desc(estimate)) %>%
     # select the columns you want
     select(c("NAME", "estimate", "moe"))
    boroughs17_ordered
    ## # A tibble: 5 x 3
    ##   NAME                      estimate   moe
    ##   <chr>                        <dbl> <dbl>
    ## 1 New York County, New York    46152   605
    ## 2 Richmond County, New York    36459   488
    ## 3 Queens County, New York      29706   306
    ## 4 Kings County, New York       28238   281
    ## 5 Bronx County, New York       20576   204

    Then, I need to shorten borough names. For example, I want to shorten “Queens County, New York” to simply “Queens”. I do this by using mutate() and factor() to create a new column titled “labels” in the dataset.

    Citation: Adapted a mixture of code from stackoverflow and Maxwell Austensen via Slack

    boroughs17_names <- boroughs17_ordered %>%
      mutate(borough_names = factor(NAME, levels = c('NYCNY', 'RCNY', 'QCNY', 'KCNY', 'BCNY')),
      labels = c('Manhattan', 'Staten Island', 'Queens', 'Brooklyn', 'Bronx'))
    boroughs17_names
    ## # A tibble: 5 x 5
    ##   NAME                      estimate   moe borough_names labels       
    ##   <chr>                        <dbl> <dbl> <fct>         <chr>        
    ## 1 New York County, New York    46152   605 <NA>          Manhattan    
    ## 2 Richmond County, New York    36459   488 <NA>          Staten Island
    ## 3 Queens County, New York      29706   306 <NA>          Queens       
    ## 4 Kings County, New York       28238   281 <NA>          Brooklyn     
    ## 5 Bronx County, New York       20576   204 <NA>          Bronx

    Bar Graph for 2017 Data

    Below is the bar graph for Median Incomes by Borough in NYC, 2017.

    Citations: For color - used UCSB’s R color cheat sheet For ggplot - from R for Data Science Manual and adapted from Maxwell Austensen’s class demo

    bgraph17_NYC <- ggplot(boroughs17_names) +
      aes(x = reorder(labels, -estimate), y = estimate) +
      geom_bar(stat = "identity", fill = 'seagreen4') +
      theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                       vjust = 0.5)) +
      labs(
        title = "Income by Borough",
        subtitle = "NYC, 2017",
        y = "Median Income",
        x = "Borough",
        caption = "Source: American Community Survey, 2013-2017"
      )
    
    bgraph17_NYC

    Add Column Values

    Download ggrepel which will help us create labels for the column values

    library(ggrepel)
    Citations: Adapted code from cran r project, github, and r statistics; see links below:
  • cran r project - https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html#align-labels-on-the-top-or-bottom-edge
  • github - https://github.com/slowkow/ggrepel/issues/82
  • r statistics - http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html#1.%20Adding%20Plot%20and%20Axis%20Titles
  • bargraph17_labels <- bgraph17_NYC + geom_label_repel(aes(label=estimate), 
                                       size=4, data=boroughs17_names,
                                       colour = 'darkgreen',
                                       box.padding = unit(0.35, "lines"),
                                       segment.color = 'grey50',
                                       nudge_y = 36)
    bargraph17_labels

    Refine Bar Graph

    I touched up the bar graph w/ 2017 data to add clarity.

    Citation: Used code from r-statistics.co

    bgraph17_final <-  bargraph17_labels +
      theme(plot.title=element_text(size=18, 
                                        face="bold", 
                                        family="American Typewriter",
                                        color="black",
                                        hjust=0.5,
                                        lineheight=1.2),  # title
                plot.subtitle=element_text(size=15, 
                                           family="American Typewriter",
                                           hjust=0.5),
                plot.caption=element_text(size=10, 
                                        family="American Typewriter",
                                        color="grey36"),
                axis.title.x=element_text(vjust=0.5,  
                                          size=15),  # X axis title
                axis.title.y=element_text(size=15,
                                          vjust=3),  # Y axis title)
                axis.text.x=element_text(size=12, 
                                         vjust=1),  # X axis text
                axis.text.y=element_text(size=12,
                                         angle=0))  # Y axis text
    bgraph17_final 

    Plot Graph for 2017 Data

    Below is the plot graph for Median Incomes by Borough in NYC, 2017.

    For each ACS calculation, there is a margin of error (moe). The moe for ACS is set at a default of 90% accuracy with a 10% moe. I kept this default setting so this is the moe for the data represented in both 2017 graphs. Since the moe is valuable to know in order to distinguish the potential range of values for each borough, I want to create a plot graph that can show the moe for each borough calculation.

    plotgraph17_NYC <- ggplot(boroughs17_names) +
      aes(x = estimate, y = reorder(labels, -estimate)) +
      geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe, 
                       y = reorder(labels, -estimate))) +
      geom_point(color = "seagreen4", size = 3) +
      theme(axis.text.y = element_text(angle = 15, hjust = 1, 
                                       vjust = 0.5)) +
      labs(
        title = "Income by Borough",
        subtitle = "NYC, 2017",
        y = "Borough",
        x = "Median Income",
        caption = "Source: American Community Survey, 2013-2017"
      )
    plotgraph17_NYC

    Refine 2017 Plot Graph

    I touched up the plot graph w/ 2017 data to add clarity.

    Citation: Used code from r-statistics.co

    plotgraph17_final <- plotgraph17_NYC +
      theme(plot.title=element_text(size=18, 
                                        face="bold", 
                                        family="American Typewriter",
                                        color="black",
                                        hjust=0.5,
                                        lineheight=1.2),  # title
                plot.subtitle=element_text(size=15, 
                                           family="American Typewriter",
                                           hjust=0.5),
                plot.caption=element_text(size=10, 
                                        family="American Typewriter",
                                        color="grey36"),
                axis.title.x=element_text(size=15,
                                          vjust=1),  # Y axis title)
                axis.title.y=element_text(vjust=1,  
                                          size=15),  # X axis title
                axis.text.x=element_text(size=12, 
                                         vjust=1),  # X axis text
                axis.text.y=element_text(size=12,
                                         angle=10))  # Y axis text
    plotgraph17_final 

    Extract 2012 ACS Data

    Using the same steps as above, I will now pull the 2012 data from ACS.

    Please Note: for the code below, the year defaults to 2017 so I adjusted it to 2012.

    v12 <- load_variables(2012, "acs5", cache = TRUE)

    Similar to above, I need to find the the ACS code that represents an estimation of median incomes by borough; this code is still: B06011_001.

    Citation: Used code adapted from github

    ny12 <- get_acs(geography = "county", year = 2012, 
                  variables = c(medincome = "B06011_001"), 
                  state = "NY")
    ny12
    ## # A tibble: 62 x 5
    ##    GEOID NAME                         variable  estimate   moe
    ##    <chr> <chr>                        <chr>        <dbl> <dbl>
    ##  1 36001 Albany County, New York      medincome    31095   465
    ##  2 36003 Allegany County, New York    medincome    19942   701
    ##  3 36005 Bronx County, New York       medincome    19756   252
    ##  4 36007 Broome County, New York      medincome    23007   583
    ##  5 36009 Cattaraugus County, New York medincome    22346   484
    ##  6 36011 Cayuga County, New York      medincome    26004   541
    ##  7 36013 Chautauqua County, New York  medincome    21513   414
    ##  8 36015 Chemung County, New York     medincome    24876   614
    ##  9 36017 Chenango County, New York    medincome    23131   892
    ## 10 36019 Clinton County, New York     medincome    25252   923
    ## # … with 52 more rows

    Filter

    Filter 2012 info to only the NYC boroughs

    Citation: Adapted code from stackoverflow and Maxwell Austensen via Slack

    nyc12_boroughs <- filter(ny12, GEOID %in% c(36005, 36047, 36081, 36085, 36061))
    nyc12_boroughs
    ## # A tibble: 5 x 5
    ##   GEOID NAME                      variable  estimate   moe
    ##   <chr> <chr>                     <chr>        <dbl> <dbl>
    ## 1 36005 Bronx County, New York    medincome    19756   252
    ## 2 36047 Kings County, New York    medincome    25670   133
    ## 3 36061 New York County, New York medincome    40414   470
    ## 4 36081 Queens County, New York   medincome    27626   272
    ## 5 36085 Richmond County, New York medincome    35112   569

    Order

    Order 2012 boroughs from greatest to least based on median income, using the same code as used above for 2017 data

    Citation: Adapted code from stackoverflow and Maxwell Austensen via Slack

    boroughs12_ordered <- nyc12_boroughs %>% 
     # desc orders from largest to smallest
     arrange(desc(estimate)) %>%
     # select subsets the columns you want
     select(c("NAME", "estimate", "moe"))
    boroughs12_ordered
    ## # A tibble: 5 x 3
    ##   NAME                      estimate   moe
    ##   <chr>                        <dbl> <dbl>
    ## 1 New York County, New York    40414   470
    ## 2 Richmond County, New York    35112   569
    ## 3 Queens County, New York      27626   272
    ## 4 Kings County, New York       25670   133
    ## 5 Bronx County, New York       19756   252

    Column Names

    Change column names to shorter borough names by creating a new column titled “labels” using mutate() and factor()

    Citation: Used a mixture of code from stackoverflow and Maxwell Austensen via Slack

    boroughs12_names <- boroughs12_ordered %>%
      mutate(borough_names = factor(NAME, levels = c('NYCNY', 'RCNY', 'QCNY', 'KCNY', 'BCNY')),
      labels = c('Manhattan', 'Staten Island', 'Queens', 'Brooklyn', 'Bronx'))
    boroughs12_names
    ## # A tibble: 5 x 5
    ##   NAME                      estimate   moe borough_names labels       
    ##   <chr>                        <dbl> <dbl> <fct>         <chr>        
    ## 1 New York County, New York    40414   470 <NA>          Manhattan    
    ## 2 Richmond County, New York    35112   569 <NA>          Staten Island
    ## 3 Queens County, New York      27626   272 <NA>          Queens       
    ## 4 Kings County, New York       25670   133 <NA>          Brooklyn     
    ## 5 Bronx County, New York       19756   252 <NA>          Bronx

    Ensuring Accuracy

    Inflation

    The 2003-2017 data has been inflation-adjusted to 2017 dollar estimates. However, the 2008-2012 data has not. These calculations can be done by multiplying " the 2008-2012 dollar estimates by 1.06962963 (CPI-U-RS)" according to the U.S. Census Bureau.

    Please see the chart below for the inflation-adjusted values for 2008-2012. The column name for this value is new_vals. On average the 2012 median incomes, after inflation-adjusted, increased by approx. 2,000-3,000 dollars.

    Citation: U.S. Census Bureau - https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2017/5-year-comparison.html

    boroughs12_newvals <- boroughs12_names %>%
      mutate(old_vals = factor(estimate),
             new_vals = c(estimate*1.06962963))
    boroughs12_newvals 
    ## # A tibble: 5 x 7
    ##   NAME             estimate   moe borough_names labels    old_vals new_vals
    ##   <chr>               <dbl> <dbl> <fct>         <chr>     <fct>       <dbl>
    ## 1 New York County…    40414   470 <NA>          Manhattan 40414      43228.
    ## 2 Richmond County…    35112   569 <NA>          Staten I… 35112      37557.
    ## 3 Queens County, …    27626   272 <NA>          Queens    27626      29550.
    ## 4 Kings County, N…    25670   133 <NA>          Brooklyn  25670      27457.
    ## 5 Bronx County, N…    19756   252 <NA>          Bronx     19756      21132.

    Filter

    Cleaned up dataset by filtering out everything except for the needed columns

    boroughs12_columns <- boroughs12_newvals %>% 
     select(c("labels", "new_vals", "moe"))
    boroughs12_columns
    ## # A tibble: 5 x 3
    ##   labels        new_vals   moe
    ##   <chr>            <dbl> <dbl>
    ## 1 Manhattan       43228.   470
    ## 2 Staten Island   37557.   569
    ## 3 Queens          29550.   272
    ## 4 Brooklyn        27457.   133
    ## 5 Bronx           21132.   252

    Graph for 2012 Data

    Below is the bar graph for Median Incomes by Borough in NYC, 2012.

    Citations: For color - used UCSB’s R color cheat sheet

    boroughs12_graph <- ggplot(boroughs12_columns) +
      aes(x = reorder(labels,-new_vals), y = new_vals) +
      geom_bar(stat = "identity", fill = 'palegreen3') +
      theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                       vjust = 0.5)) +
      labs(
        title = "Income by Borough",
        subtitle = "NYC, 2012",
        y = "Median Income",
        x = "Borough",
        caption = "Source: American Community Survey, 2008-2012"
      )
    boroughs12_graph

    Add Column Values

    Citations: Adapted code from cran r project, github, and r statistics; see links below:
  • cran r project - https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html#align-labels-on-the-top-or-bottom-edge
  • github - https://github.com/slowkow/ggrepel/issues/82
  • r statistics - http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html#1.%20Adding%20Plot%20and%20Axis%20Titles
  • r documentation - https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/Round
  • bargraph12_labels <- boroughs12_graph + geom_label_repel(aes(label=round(new_vals, digits=0)
    ),
                                       size=4, data=boroughs12_columns,
                                       colour = 'darkgreen',
                                       box.padding = unit(0.35, "lines"),
                                       segment.color = 'grey50',
                                       nudge_y = 36)
    bargraph12_labels

    Touch Up

    I touched up the bar graph w/ 2012 data to add clarity.

    Citation: Used code from r-statistics.co

    graph12_final <- bargraph12_labels + theme(plot.title=element_text(size=18, 
                                        face="bold", 
                                        family="American Typewriter",
                                        color="black",
                                        hjust=0.5,
                                        lineheight=1.2),  # title
                plot.subtitle=element_text(size=15, 
                                           family="American Typewriter",
                                           hjust=0.5),
                plot.caption=element_text(size=10, 
                                        family="American Typewriter",
                                        color="grey36"),
                axis.title.x=element_text(vjust=0.5,  
                                          size=15),  # X axis title
                axis.title.y=element_text(size=15,
                                          vjust=3),  # Y axis title)
                axis.text.x=element_text(size=12, 
                                         vjust=1),  # X axis text
                axis.text.y=element_text(size=12,
                                         angle=0))  # Y axis text
    graph12_final

    Plot Graph for 2012 Data

    Below is the plot graph for Median Incomes by Borough in NYC, 2012.

    As explained in the 2017 Plot Graph section - for each ACS calculation, there is a margin of error (moe). The moe for ACS is set at a default of 90% accuracy with a 10% moe. I kept this default setting for the 2012 data represented in both graphs.

    plotgraph12_NYC <- ggplot(boroughs12_columns) +
      aes(x = new_vals, y = reorder(labels, -new_vals)) +
      geom_errorbarh(aes(xmin = new_vals - moe, xmax = new_vals + moe, 
                       y = reorder(labels, -new_vals))) +
      geom_point(color = "palegreen3", size = 3) +
      theme(axis.text.y = element_text(angle = 15, hjust = 1, 
                                       vjust = 0.5)) +
      labs(
        title = "Income by Borough",
        subtitle = "NYC, 2012",
        y = "Borough",
        x = "Median Income",
        caption = "Source: American Community Survey, 2008-2012"
      )
    plotgraph12_NYC

    Touch Up

    I touched up the plot graph w/ 2012 data to add clarity.

    Citation: Used code from r-statistics.co

    plotgraph12_final <- plotgraph12_NYC +
      theme(plot.title=element_text(size=18, 
                                        face="bold", 
                                        family="American Typewriter",
                                        color="black",
                                        hjust=0.5,
                                        lineheight=1.2),  # title
                plot.subtitle=element_text(size=15, 
                                           family="American Typewriter",
                                           hjust=0.5),
                plot.caption=element_text(size=10, 
                                        family="American Typewriter",
                                        color="grey36"),
                axis.title.x=element_text(size=15,
                                          vjust=1),  # Y axis title)
                axis.title.y=element_text(vjust=1,  
                                          size=15),  # X axis title
                axis.text.x=element_text(size=12, 
                                         vjust=1),  # X axis text
                axis.text.y=element_text(size=12,
                                         angle=10))  # Y axis text
    plotgraph12_final 

    Compare 2012 & 2017 Bar Graphs

    First, you need to download the necessary packages.

    library(grid)
    library(gridExtra)
    library(downloader)
    library(grDevices)

    Citation: To place graphs side by side, I adapted the code below from github - https://sebastiansauer.github.io/two-plots-rmd/

    grid.arrange(graph12_final, bgraph17_final, ncol = 2)

    Prep Dataset to Combine 2012 & 2017 Bar Graphs

    Join the two datasets using left_join()

    acs_12_17 <- left_join(boroughs12_columns, boroughs17_names, by = "labels")
    acs_12_17
    ## # A tibble: 5 x 7
    ##   labels      new_vals moe.x NAME              estimate moe.y borough_names
    ##   <chr>          <dbl> <dbl> <chr>                <dbl> <dbl> <fct>        
    ## 1 Manhattan     43228.   470 New York County,…    46152   605 <NA>         
    ## 2 Staten Isl…   37557.   569 Richmond County,…    36459   488 <NA>         
    ## 3 Queens        29550.   272 Queens County, N…    29706   306 <NA>         
    ## 4 Brooklyn      27457.   133 Kings County, Ne…    28238   281 <NA>         
    ## 5 Bronx         21132.   252 Bronx County, Ne…    20576   204 <NA>

    Filter Columns

    Clean dataset by filtering down to only the needed columns for graphing.

    acsCLEAN_12_17 <- acs_12_17  %>% 
     select(c("labels", "new_vals", "estimate"))
    acsCLEAN_12_17
    ## # A tibble: 5 x 3
    ##   labels        new_vals estimate
    ##   <chr>            <dbl>    <dbl>
    ## 1 Manhattan       43228.    46152
    ## 2 Staten Island   37557.    36459
    ## 3 Queens          29550.    29706
    ## 4 Brooklyn        27457.    28238
    ## 5 Bronx           21132.    20576

    Rename Columns

    Rename columns before pivoting in order to keep data clean, organized, and easy to read.

    Citations: Code adapted from Sam Raby and from R Documentation: https://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/rename

    proper_names <- plyr::rename(acsCLEAN_12_17, c("new_vals" = "income2012", 
                          "estimate" = "income2017"))
    names(proper_names)
    ## [1] "labels"     "income2012" "income2017"
    proper_names
    ## # A tibble: 5 x 3
    ##   labels        income2012 income2017
    ##   <chr>              <dbl>      <dbl>
    ## 1 Manhattan         43228.      46152
    ## 2 Staten Island     37557.      36459
    ## 3 Queens            29550.      29706
    ## 4 Brooklyn          27457.      28238
    ## 5 Bronx             21132.      20576

    Pivot Data

    Use pivot_longer() to add columns - need to repeat Boroughs and differentiate which are in 2012 and which are in 2017 - repeat labels and add medincome columns and then distinguish values by year - woohoo

    Use the following to pivot the two new coloumns

    acs_long <- pivot_longer(proper_names, c("income2012","income2017"))
    acs_long
    ## # A tibble: 10 x 3
    ##    labels        name        value
    ##    <chr>         <chr>       <dbl>
    ##  1 Manhattan     income2012 43228.
    ##  2 Manhattan     income2017 46152 
    ##  3 Staten Island income2012 37557.
    ##  4 Staten Island income2017 36459 
    ##  5 Queens        income2012 29550.
    ##  6 Queens        income2017 29706 
    ##  7 Brooklyn      income2012 27457.
    ##  8 Brooklyn      income2017 28238 
    ##  9 Bronx         income2012 21132.
    ## 10 Bronx         income2017 20576

    Clean Names Again

    proper_namesAgain <- plyr::rename(acs_long, c("labels"="Boroughs", "name" = "Year", 
                          "value" = "Median_Income"))
    names(proper_names)
    ## [1] "labels"     "income2012" "income2017"
    proper_namesAgain
    ## # A tibble: 10 x 3
    ##    Boroughs      Year       Median_Income
    ##    <chr>         <chr>              <dbl>
    ##  1 Manhattan     income2012        43228.
    ##  2 Manhattan     income2017        46152 
    ##  3 Staten Island income2012        37557.
    ##  4 Staten Island income2017        36459 
    ##  5 Queens        income2012        29550.
    ##  6 Queens        income2017        29706 
    ##  7 Brooklyn      income2012        27457.
    ##  8 Brooklyn      income2017        28238 
    ##  9 Bronx         income2012        21132.
    ## 10 Bronx         income2017        20576

    Combine 2012 & 2017 Bar Graphs

    Combine and order x-axis (Boroughs) based on the value of y (Median Incomes)

    Citation: Github - https://sebastiansauer.github.io/ordering-bars/

    combinedGraph <- ggplot(proper_namesAgain, aes(x = reorder(Boroughs, -Median_Income),
                                                   y = Median_Income, fill=Year)) +
      geom_bar(stat = "identity", position = "dodge") +
      theme(axis.text.x = element_text(angle = 15, hjust = 1, 
                                       vjust = 0.5)) +
      labs(
        title = "Income by Borough",
        subtitle = "NYC, 2012 & 2017",
        y = "Median Income",
        x = "Borough",
        caption = "Source: American Community Survey, 2008-2012 & 2013-2017"
      )
    combinedGraph

    combinedGraph_labels <- combinedGraph + geom_label_repel(aes(label=round(Median_Income,
                                                                             digits=0)),
                                       size=3, data=proper_namesAgain,
                                       colour = 'black',
                                       box.padding = unit(0.35, "lines"),
                                       segment.color = 'grey50',
                                       nudge_x = 0,
                                       nudge_y = 0)
    combinedGraph_labels

    Color by Year

    Citation: http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually

    combinedGraph_colors <- combinedGraph_labels + scale_fill_manual(values=c('palegreen3', 'seagreen4'))
    combinedGraph_colors

    Touch Up Combined Graph

    combinedGraph_final <- combinedGraph_colors +
      theme(plot.title=element_text(size=18, 
                                        face="bold", 
                                        family="American Typewriter",
                                        color="black",
                                        hjust=0.5,
                                        lineheight=1.2),  # title
                plot.subtitle=element_text(size=15, 
                                           family="American Typewriter",
                                           hjust=0.5),
                plot.caption=element_text(size=10, 
                                        family="American Typewriter",
                                        color="grey36"),
                axis.title.x=element_text(vjust=0.5,  
                                          size=15),  # X axis title
                axis.title.y=element_text(size=15,
                                          vjust=3),  # Y axis title)
                axis.text.x=element_text(size=12, 
                                         vjust=1),  # X axis text
                axis.text.y=element_text(size=12,
                                         angle=0))  # Y axis text
    combinedGraph_final 

    Inferences

    Based on the graphs above, we see that:
  • The order of highest median income to lowest across Boroughs remains the same
  • Manhattan experienced the greatest increase in median income from 2012 to 2017
  • Given the rate of inflation, Bronx actually experienced a slight decrease in the median income from 2012 to 2017
  • Given the above observations, to some degree, we can infer that boroughs with higher median incomes are increasing and boroughs with lower median incomes are either remaining the same or decreasing over time. Understanding why will take further research. With more time, I would be interested in exploring changes in population within each borough from 2012 to 2017. I would also be interested in exploring changes in the cost of rent in each borough. I believe these pieces of information would aid in figuring out what may be causing the overarching trends mentioned above.

    Citation: U.S. Census Bureau - https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data/2017/5-year-comparison.html

    Compare 2012 & 2017 Plot Graphs

    grid.arrange(plotgraph12_final, plotgraph17_final, ncol = 2)

    Inferences

    Since these plot graphs show the margin of error (moe) for each value represented, we can see that Manhattan and Staten Island had the greatest moe or standard deviation from the median. In other words, this means that these boroughs contain the largest variety in income relative to the median. In that Manhattan and Staten Island contain the highest median incomes, we might infer the following:
  • Boroughs with a larger range of income levels typically have overall higher incomes
  • The above inference may be a result of greater job diversity and potentially even greater opportunity for employment or growth within pre-existing employment. One would need to pursue further research to be sure.