Wankelman Module 11

The project undertaken was an ambitious and insightful endeavor to harness the power of the U.S. Census Bureau’s API, creating an R Shiny application capable of visualizing changes in the Gini Index and median household income across U.S. counties from 2015 to 2020. The goal was to present a dynamic, data-driven narrative that could illuminate socio-economic trends over this period. However, the journey was not without its hurdles and learning moments. Though not fully unraveled, the challenges encountered pointed to the complexity and nuances inherent in working with large-scale, real-world data sets. While impeding the final realization of the project’s full potential, these challenges provided valuable insights into the intricacies of data handling and application development in R.

Initially, the project started with extracting data from CSV files from the Census Bureau. While successful in creating a foundational Shiny application, this preliminary approach revealed limitations in terms of data comprehensiveness. Many counties were missing from the dataset, prompting a pivot to directly accessing the Census Bureau’s API for a more exhaustive dataset. This shift marked a significant step forward, enabling the creation of static visualizations that more accurately represented the socio-economic landscape of the United States at a county level. The process involved meticulously joining multiple datasets to form a cohesive and comprehensive view of the Gini Index and median household income trends. The resulting visualizations offered a static but insightful glimpse into the economic disparities and shifts across the nation.

However, transitioning from static visualizations to a dynamic Shiny application introduced a new set of challenges. The development was frequently hindered by unexpected errors, particularly in rendering the maps, which became a recurring issue. Efforts to troubleshoot included extensive debugging and strategically removing NA values from the dataset to stabilize the application. Despite these efforts, the application fell short of its full interactive potential, with issues in map display persisting. This project phase underscored the complexities involved in developing a fully functional Shiny application, particularly when handling large and intricate datasets. The experience, though marked by setbacks, was a profound learning journey in data manipulation, API utilization, and interactive application development in R.

## Reading layer `county' from data source 
##   `/Users/jcw81/Desktop/JHU/Semester 1/Data Visualization/Class Project/Module 11/county.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 3207 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -170 ymin: 17.6804 xmax: -64.6086 ymax: 71.3526
## Geodetic CRS:  WGS 84

The intent of this code is to set environment variables for GDAL and PROJ library paths, which are essential for spatial data processing in R, especially when using libraries like rgdal or sf.

Sys.setenv(GDAL_DATA = "/usr/local/Cellar/gdal/3.7.2_1/share/gdal")
Sys.setenv(PROJ_LIB = "/usr/local/opt/proj")

df1 <- read.csv("~/Desktop/JHU/Semester 1/Data Visualization/Class Project/Module 11/County Gini Dataset.csv",header=TRUE, stringsAsFactors=FALSE)

head(spdf, n=20)

## Simple feature collection with 20 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -170 ymin: 31.1922 xmax: -84.9015 ymax: 56.6142
## Geodetic CRS:  WGS 84
## First 10 features:
##         COUNTY_STATE_NAME  COUNTY_STATE_CODE GEOID FIPS_CODE
## 1        Barbour, Alabama        Barbour, AL  1005    01-005
## 2        Choctaw, Alabama        Choctaw, AL  1023    01-023
## 3        Conecuh, Alabama        Conecuh, AL  1035    01-035
## 4         Elmore, Alabama         Elmore, AL  1051    01-051
## 5           Hale, Alabama           Hale, AL  1065    01-065
## 6           Pike, Alabama           Pike, AL  1109    01-109
## 7        Russell, Alabama        Russell, AL  1113    01-113
## 8         Shelby, Alabama         Shelby, AL  1117    01-117
## 9  Aleutians West, Alaska Aleutians West, AK  2016    02-016
## 10        Mohave, Arizona         Mohave, AZ  4015    04-015
##                          geometry
## 1  MULTIPOLYGON (((-85.6577 31...
## 2  MULTIPOLYGON (((-88.4732 31...
## 3  MULTIPOLYGON (((-86.9059 31...
## 4  MULTIPOLYGON (((-86.375 32....
## 5  MULTIPOLYGON (((-87.7157 33...
## 6  MULTIPOLYGON (((-86.1914 31...
## 7  MULTIPOLYGON (((-85.434 32....
## 8  MULTIPOLYGON (((-87.0268 33...
## 9  MULTIPOLYGON (((-166.3104 5...
## 10 MULTIPOLYGON (((-114.0506 3...

head(df1)

##           GEO_ID    NAME    LSAD    STATE X2010  X2011  X2012  X2013  X2014
## 1 0500000US01003 Baldwin  County  Alabama 0.459 0.4324 0.4371 0.4727 0.4648
## 2 0500000US01015 Calhoun  County  Alabama 0.460 0.4530 0.4563 0.4670 0.4561
## 3 0500000US01043 Cullman  County  Alabama 0.473 0.4478 0.4453 0.4543 0.4533
## 4 0500000US01049  DeKalb  County  Alabama 0.465 0.4857 0.4131 0.3940 0.4754
## 5 0500000US01051  Elmore  County  Alabama 0.403 0.4233 0.3744 0.4206 0.4160
## 6 0500000US01055  Etowah  County  Alabama 0.449 0.4539 0.4467 0.4351 0.4374
##    X2015  X2016  X2017  X2018  X2019  X2021  X2022
## 1 0.4722 0.4498 0.4363 0.4725 0.4630 0.4539 0.4727
## 2 0.4219 0.4692 0.4892 0.4650 0.4572 0.4532 0.4600
## 3 0.4438 0.4518 0.4121 0.4981 0.3961 0.4274 0.4701
## 4 0.4639 0.4528 0.4975 0.5003 0.4459 0.5274 0.4363
## 5 0.4620 0.4535 0.4307 0.4162 0.4468 0.4370 0.4079
## 6 0.4125 0.4477 0.4657 0.4469 0.5008 0.4450 0.4437

The purpose of this code is to remove the prefix “US” from the GEO_ID column in the dataframe df1 and convert the modified GEO_ID values to numeric format, verifying the changes by viewing the first few rows.

# This removes everything before and including "US"
df1$GEO_ID <- gsub(".*US", "", df1$GEO_ID)

# Convert GEO_ID to numeric if necessary
df1$GEO_ID <- as.numeric(df1$GEO_ID)

# You can check the first few rows to confirm
head(df1)

##   GEO_ID    NAME    LSAD    STATE X2010  X2011  X2012  X2013  X2014  X2015
## 1   1003 Baldwin  County  Alabama 0.459 0.4324 0.4371 0.4727 0.4648 0.4722
## 2   1015 Calhoun  County  Alabama 0.460 0.4530 0.4563 0.4670 0.4561 0.4219
## 3   1043 Cullman  County  Alabama 0.473 0.4478 0.4453 0.4543 0.4533 0.4438
## 4   1049  DeKalb  County  Alabama 0.465 0.4857 0.4131 0.3940 0.4754 0.4639
## 5   1051  Elmore  County  Alabama 0.403 0.4233 0.3744 0.4206 0.4160 0.4620
## 6   1055  Etowah  County  Alabama 0.449 0.4539 0.4467 0.4351 0.4374 0.4125
##    X2016  X2017  X2018  X2019  X2021  X2022
## 1 0.4498 0.4363 0.4725 0.4630 0.4539 0.4727
## 2 0.4692 0.4892 0.4650 0.4572 0.4532 0.4600
## 3 0.4518 0.4121 0.4981 0.3961 0.4274 0.4701
## 4 0.4528 0.4975 0.5003 0.4459 0.5274 0.4363
## 5 0.4535 0.4307 0.4162 0.4468 0.4370 0.4079
## 6 0.4477 0.4657 0.4469 0.5008 0.4450 0.4437

library(usmap)
us_states <- us_map(regions = "states")

# Check column names in both data frames
colnames(spdf)

## [1] "COUNTY_STATE_NAME" "COUNTY_STATE_CODE" "GEOID"            
## [4] "FIPS_CODE"         "geometry"

colnames(df1)

##  [1] "GEO_ID" "NAME"   "LSAD"   "STATE"  "X2010"  "X2011"  "X2012"  "X2013" 
##  [9] "X2014"  "X2015"  "X2016"  "X2017"  "X2018"  "X2019"  "X2021"  "X2022"

# Prepare the data by making GEO_IDs match
df1$GEO_ID <- as.numeric(sub("0500000US", "", df1$GEO_ID))

# Merge the datasets on the GEOID
merged_df <- left_join(spdf, df1, by = c("GEOID" = "GEO_ID"))

# Check the head to ensure the merge went correctly
head(merged_df, n = 10)

## Simple feature collection with 10 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -170 ymin: 31.1922 xmax: -84.9015 ymax: 56.6142
## Geodetic CRS:  WGS 84
##         COUNTY_STATE_NAME  COUNTY_STATE_CODE GEOID FIPS_CODE   NAME    LSAD
## 1        Barbour, Alabama        Barbour, AL  1005    01-005   <NA>    <NA>
## 2        Choctaw, Alabama        Choctaw, AL  1023    01-023   <NA>    <NA>
## 3        Conecuh, Alabama        Conecuh, AL  1035    01-035   <NA>    <NA>
## 4         Elmore, Alabama         Elmore, AL  1051    01-051 Elmore  County
## 5           Hale, Alabama           Hale, AL  1065    01-065   <NA>    <NA>
## 6           Pike, Alabama           Pike, AL  1109    01-109   <NA>    <NA>
## 7        Russell, Alabama        Russell, AL  1113    01-113   <NA>    <NA>
## 8         Shelby, Alabama         Shelby, AL  1117    01-117 Shelby  County
## 9  Aleutians West, Alaska Aleutians West, AK  2016    02-016   <NA>    <NA>
## 10        Mohave, Arizona         Mohave, AZ  4015    04-015 Mohave  County
##       STATE X2010  X2011  X2012  X2013  X2014  X2015  X2016  X2017  X2018
## 1      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 2      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 3      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4   Alabama 0.403 0.4233 0.3744 0.4206 0.4160 0.4620 0.4535 0.4307 0.4162
## 5      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 6      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 7      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 8   Alabama 0.426 0.4008 0.4477 0.4250 0.4405 0.4424 0.4305 0.4208 0.4589
## 9      <NA>    NA     NA     NA     NA     NA     NA     NA     NA     NA
## 10  Arizona 0.432 0.4265 0.4554 0.4339 0.4257 0.4286 0.4645 0.4551 0.4334
##     X2019  X2021  X2022                       geometry
## 1      NA     NA     NA MULTIPOLYGON (((-85.6577 31...
## 2      NA     NA     NA MULTIPOLYGON (((-88.4732 31...
## 3      NA     NA     NA MULTIPOLYGON (((-86.9059 31...
## 4  0.4468 0.4370 0.4079 MULTIPOLYGON (((-86.375 32....
## 5      NA     NA     NA MULTIPOLYGON (((-87.7157 33...
## 6      NA     NA     NA MULTIPOLYGON (((-86.1914 31...
## 7      NA     NA     NA MULTIPOLYGON (((-85.434 32....
## 8  0.4527 0.4470 0.4134 MULTIPOLYGON (((-87.0268 33...
## 9      NA     NA     NA MULTIPOLYGON (((-166.3104 5...
## 10 0.4214 0.4834 0.5238 MULTIPOLYGON (((-114.0506 3...

# Verify column names
print(names(merged_df))

##  [1] "COUNTY_STATE_NAME" "COUNTY_STATE_CODE" "GEOID"            
##  [4] "FIPS_CODE"         "NAME"              "LSAD"             
##  [7] "STATE"             "X2010"             "X2011"            
## [10] "X2012"             "X2013"             "X2014"            
## [13] "X2015"             "X2016"             "X2017"            
## [16] "X2018"             "X2019"             "X2021"            
## [19] "X2022"             "geometry"

# Check for NA values in the column of interest
summary(merged_df$X2021)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.3447  0.4300  0.4515  0.4533  0.4768  0.6272    2375

# Plot the choropleth map using the correct column name
ggplot(data = merged_df) +
  geom_sf(aes(fill = X2021)) + # Make sure this is the correct column name
  scale_fill_viridis_c() +
  labs(title = "Gini Ratio 2021", fill = "Index Value") +
  theme_minimal()

The creation of a Shiny application to visualize the Gini Index across U.S. counties, despite the challenges with incomplete data, showcases a successful implementation of data visualization techniques in R. The application, built using libraries such as shiny, leaflet, viridis, and dplyr, provides an interactive map for exploring income inequality as measured by the Gini Index from 2015 to 2022.

In this application, the user interface (UI) is thoughtfully designed with a sidebar containing a slider for year selection and a reset button to restore the map to its initial state. The leaflet package is used to render the choropleth map, while the viridis color palette enhances visual appeal and clarity. The server function is where the application’s logic resides. It includes a reactive expression to filter data based on the selected year and dynamically updates the color palette.

library(shiny)
library(leaflet)
library(viridis)

## Loading required package: viridisLite

## 
## Attaching package: 'viridis'

## The following object is masked from 'package:maps':
## 
##     unemp

## The following object is masked from 'package:scales':
## 
##     viridis_pal

library(dplyr)

# Define the user interface
ui <- fluidPage(
  titlePanel("Choropleth Map of Gini Index by County"),

  sidebarLayout(
    sidebarPanel(
      sliderInput("year", "Year:", min = 2015, max = 2022, value = 2022, step = 1, sep = ""),
      actionButton("reset", "Reset Map")
    ),
    mainPanel(leafletOutput("choroplethMap"))
  )
)

server <- function(input, output, session) {

  # Store map view state
  mapViewState <- reactiveValues(lat = 37.8, lng = -96, zoom = 4)

  # Reactive expression to filter the data
    filteredData <- reactive({
      year_column <- paste0("X", input$year)
  
      # Check if the year column exists in the dataframe
      if (!year_column %in% names(merged_df)) {
        return(NULL)
      }
      # Add COUNTY_STATE_NAME to the selection in the dataframe
      df <- merged_df %>%
        # Ensure there is at least one non-NA value in the year column
        filter(!all(is.na(.[[year_column]]))) %>%
        select(GEOID, NAME, COUNTY_STATE_NAME, geometry, !!sym(year_column))
    
      if (nrow(df) == 0) {
        return(NULL)
      }
      df
    })

  # Reactive expression for color palette
  pal <- reactive({
    df <- filteredData()
    if (is.null(df)) {
      return(NULL)
    }
    colorNumeric(palette = "viridis", domain = df[[paste0("X", input$year)]])
  })

  # Render the initial map
  output$choroplethMap <- renderLeaflet({
    leaflet() %>%
      addProviderTiles(providers$CartoDB.Positron) %>%
      setView(lng = mapViewState$lng, lat = mapViewState$lat, zoom = mapViewState$zoom)
  })

 # Update the map when the year changes
  observeEvent(input$year, {
    df <- filteredData()
    if (is.null(df)) {
      leafletProxy("choroplethMap") %>%
        clearShapes() %>%
        removeControl("legendID")
      return()
    }
    color_palette <- pal()
    
    leafletProxy("choroplethMap", data = df) %>%
      clearShapes() %>%
      removeControl("legendID") %>%
      addPolygons(
        fillColor = ~ifelse(is.na(df[[paste0("X", input$year)]]), 
                            NA, 
                            color_palette(df[[paste0("X", input$year)]])),
        fillOpacity = 0.7,
        color = "#BDBDC3",
        weight = 0.2,
        smoothFactor = 0.5,
        highlightOptions = highlightOptions(
          weight = 2,
          color = "#667",
          fillOpacity = 0.7,
          bringToFront = TRUE
        ),
        label = ~paste(NAME),
        labelOptions = labelOptions(
          style = list("font-weight" = "normal", padding = "3px 8px"),
          textsize = "15px",
          direction = "auto"
        ),
        popup = ~paste("<strong>County:</strong>", NAME,
                       "<br><strong>State:</strong>", COUNTY_STATE_NAME, # Ensure this line uses the correct variable
                       "<br><strong>Gini Value:</strong>", df[[paste0("X", input$year)]])
      ) %>%
      addLegend(
        pal = color_palette,
        values = ~df[[paste0("X", input$year)]],
        title = "Gini Index Value",
        position = "bottomright",
        layerId = "legendID"
      )
  })


  # Reset Map View
  observeEvent(input$reset, {
    mapViewState$lat <- 37.8
    mapViewState$lng <- -96
    mapViewState$zoom <- 4
    leafletProxy("choroplethMap") %>%
      setView(lng = mapViewState$lng, lat = mapViewState$lat, zoom = mapViewState$zoom)
  })
}

# Run the Shiny application
shinyApp(ui, server)

## 
## Listening on http://127.0.0.1:4291

#Load the Census API Key
census_api_key("40281d2187403a277b174e30172129f2e2263059", install = TRUE, overwrite = TRUE)

## Your original .Renviron will be backed up and stored in your R HOME directory if needed.

## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`

## [1] "40281d2187403a277b174e30172129f2e2263059"

#Load readRenviron to access API key
readRenviron("~/.Renviron")

#Assign the value of the API key to the variable api_key
api_key <- Sys.getenv("CENSUS_API_KEY")

library(purrr)
income_inequality <- get_acs(geography = "county", 
                             variables = "B19083_001", 
                             year = 2020, 
                             survey = "acs5")

## Getting data from the 2016-2020 5-year ACS

# Define a vector of years for which you want to pull data
years <- 2015:2020

# Modify income_inequality to include year
income_inequality <- map_df(years, function(year) {
  year_data <- get_acs(geography = "county", 
                       variables = "B19083_001", 
                       year = year, 
                       survey = "acs5")
  mutate(year_data, year = year)
})

## Getting data from the 2011-2015 5-year ACS

## Getting data from the 2012-2016 5-year ACS

## Getting data from the 2013-2017 5-year ACS

## Getting data from the 2014-2018 5-year ACS

## Getting data from the 2015-2019 5-year ACS

## Getting data from the 2016-2020 5-year ACS

# View the resulting data
head(income_inequality)

## # A tibble: 6 × 6
##   GEOID NAME                    variable   estimate    moe  year
##   <chr> <chr>                   <chr>         <dbl>  <dbl> <int>
## 1 01001 Autauga County, Alabama B19083_001    0.423 0.0175  2015
## 2 01003 Baldwin County, Alabama B19083_001    0.456 0.0108  2015
## 3 01005 Barbour County, Alabama B19083_001    0.464 0.015   2015
## 4 01007 Bibb County, Alabama    B19083_001    0.441 0.0475  2015
## 5 01009 Blount County, Alabama  B19083_001    0.404 0.0145  2015
## 6 01011 Bullock County, Alabama B19083_001    0.465 0.0414  2015

median_income <- get_acs(geography = "county", 
                         variables = "B19013_001", 
                         year = 2020, 
                         survey = "acs5")

## Getting data from the 2016-2020 5-year ACS

# Define a vector of years for which you want to pull data
years <- 2015:2020

# Modify median_income to include year
median_income <- map_df(years, function(year) {
  year_data <- get_acs(geography = "county", 
                       variables = "B19013_001", 
                       year = year, 
                       survey = "acs5")
  mutate(year_data, year = year)
})

## Getting data from the 2011-2015 5-year ACS

## Getting data from the 2012-2016 5-year ACS

## Getting data from the 2013-2017 5-year ACS

## Getting data from the 2014-2018 5-year ACS

## Getting data from the 2015-2019 5-year ACS

## Getting data from the 2016-2020 5-year ACS

# View the resulting data
head(median_income)

## # A tibble: 6 × 6
##   GEOID NAME                    variable   estimate   moe  year
##   <chr> <chr>                   <chr>         <dbl> <dbl> <int>
## 1 01001 Autauga County, Alabama B19013_001    51281  2391  2015
## 2 01003 Baldwin County, Alabama B19013_001    50254  1263  2015
## 3 01005 Barbour County, Alabama B19013_001    32964  2973  2015
## 4 01007 Bibb County, Alabama    B19013_001    38678  3995  2015
## 5 01009 Blount County, Alabama  B19013_001    45813  3141  2015
## 6 01011 Bullock County, Alabama B19013_001    31938  5884  2015

# Perform the join with corrected column names and including year
combined_data <- income_inequality %>%
  left_join(median_income, by = c("GEOID", "year"))

head(combined_data)

## # A tibble: 6 × 10
##   GEOID NAME.x   variable.x estimate.x  moe.x  year NAME.y variable.y estimate.y
##   <chr> <chr>    <chr>           <dbl>  <dbl> <int> <chr>  <chr>           <dbl>
## 1 01001 Autauga… B19083_001      0.423 0.0175  2015 Autau… B19013_001      51281
## 2 01003 Baldwin… B19083_001      0.456 0.0108  2015 Baldw… B19013_001      50254
## 3 01005 Barbour… B19083_001      0.464 0.015   2015 Barbo… B19013_001      32964
## 4 01007 Bibb Co… B19083_001      0.441 0.0475  2015 Bibb … B19013_001      38678
## 5 01009 Blount … B19083_001      0.404 0.0145  2015 Bloun… B19013_001      45813
## 6 01011 Bullock… B19083_001      0.465 0.0414  2015 Bullo… B19013_001      31938
## # ℹ 1 more variable: moe.y <dbl>

combined_data <- combined_data %>%
  rename(
    Gini_Index = estimate.x,
    Median_Income = estimate.y
  )

head(combined_data)

## # A tibble: 6 × 10
##   GEOID NAME.x              variable.x Gini_Index  moe.x  year NAME.y variable.y
##   <chr> <chr>               <chr>           <dbl>  <dbl> <int> <chr>  <chr>     
## 1 01001 Autauga County, Al… B19083_001      0.423 0.0175  2015 Autau… B19013_001
## 2 01003 Baldwin County, Al… B19083_001      0.456 0.0108  2015 Baldw… B19013_001
## 3 01005 Barbour County, Al… B19083_001      0.464 0.015   2015 Barbo… B19013_001
## 4 01007 Bibb County, Alaba… B19083_001      0.441 0.0475  2015 Bibb … B19013_001
## 5 01009 Blount County, Ala… B19083_001      0.404 0.0145  2015 Bloun… B19013_001
## 6 01011 Bullock County, Al… B19083_001      0.465 0.0414  2015 Bullo… B19013_001
## # ℹ 2 more variables: Median_Income <dbl>, moe.y <dbl>

# Load US map data
us_counties <- sf::st_as_sf(maps::map("county", plot = FALSE, fill = TRUE))

print(colnames(us_counties))

## [1] "ID"   "geom"

head(us_counties)

## Simple feature collection with 6 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                              ID                           geom
## alabama,autauga alabama,autauga MULTIPOLYGON (((-86.50517 3...
## alabama,baldwin alabama,baldwin MULTIPOLYGON (((-87.93757 3...
## alabama,barbour alabama,barbour MULTIPOLYGON (((-85.42801 3...
## alabama,bibb       alabama,bibb MULTIPOLYGON (((-87.02083 3...
## alabama,blount   alabama,blount MULTIPOLYGON (((-86.9578 33...
## alabama,bullock alabama,bullock MULTIPOLYGON (((-85.66866 3...

# Split the ID at the comma to separate the state and county
us_counties_split <- strsplit(as.character(us_counties$ID), ",")

# Create a new GEOID by pasting the county and state together in the format "County, State"
us_counties$GEOID <- sapply(us_counties_split, function(x) paste(trimws(x[2]), trimws(x[1]), sep = ", "))

head(us_counties)

## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                              ID                           geom            GEOID
## alabama,autauga alabama,autauga MULTIPOLYGON (((-86.50517 3... autauga, alabama
## alabama,baldwin alabama,baldwin MULTIPOLYGON (((-87.93757 3... baldwin, alabama
## alabama,barbour alabama,barbour MULTIPOLYGON (((-85.42801 3... barbour, alabama
## alabama,bibb       alabama,bibb MULTIPOLYGON (((-87.02083 3...    bibb, alabama
## alabama,blount   alabama,blount MULTIPOLYGON (((-86.9578 33...  blount, alabama
## alabama,bullock alabama,bullock MULTIPOLYGON (((-85.66866 3... bullock, alabama

library(tools)

# Assuming that 'ID' in us_counties is formatted as 'state,county' and we need it as 'County, State'
us_counties$GEOID <- sapply(strsplit(as.character(us_counties$ID), ","), 
                            function(x) {
                              # Capitalize each part and paste them together
                              paste(toTitleCase(trimws(x[2])), "County,", toTitleCase(trimws(x[1])))
                            })

head(us_counties)

## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                              ID                           geom
## alabama,autauga alabama,autauga MULTIPOLYGON (((-86.50517 3...
## alabama,baldwin alabama,baldwin MULTIPOLYGON (((-87.93757 3...
## alabama,barbour alabama,barbour MULTIPOLYGON (((-85.42801 3...
## alabama,bibb       alabama,bibb MULTIPOLYGON (((-87.02083 3...
## alabama,blount   alabama,blount MULTIPOLYGON (((-86.9578 33...
## alabama,bullock alabama,bullock MULTIPOLYGON (((-85.66866 3...
##                                   GEOID
## alabama,autauga Autauga County, Alabama
## alabama,baldwin Baldwin County, Alabama
## alabama,barbour Barbour County, Alabama
## alabama,bibb       Bibb County, Alabama
## alabama,blount   Blount County, Alabama
## alabama,bullock Bullock County, Alabama

# Since combined_data already has a NAME column with "County, State" format, we can use that as the GEOID
combined_data$GEOID <- combined_data$NAME.x

head(combined_data)

## # A tibble: 6 × 10
##   GEOID              NAME.x variable.x Gini_Index  moe.x  year NAME.y variable.y
##   <chr>              <chr>  <chr>           <dbl>  <dbl> <int> <chr>  <chr>     
## 1 Autauga County, A… Autau… B19083_001      0.423 0.0175  2015 Autau… B19013_001
## 2 Baldwin County, A… Baldw… B19083_001      0.456 0.0108  2015 Baldw… B19013_001
## 3 Barbour County, A… Barbo… B19083_001      0.464 0.015   2015 Barbo… B19013_001
## 4 Bibb County, Alab… Bibb … B19083_001      0.441 0.0475  2015 Bibb … B19013_001
## 5 Blount County, Al… Bloun… B19083_001      0.404 0.0145  2015 Bloun… B19013_001
## 6 Bullock County, A… Bullo… B19083_001      0.465 0.0414  2015 Bullo… B19013_001
## # ℹ 2 more variables: Median_Income <dbl>, moe.y <dbl>

head(map_data)

##                                                                      
## 1 function (map, region = ".", exact = FALSE, ...)                   
## 2 {                                                                  
## 3     check_installed("maps", reason = "for `map_data()`")           
## 4     map_obj <- maps::map(map, region, exact = exact, plot = FALSE, 
## 5         fill = TRUE, ...)                                          
## 6     fortify(map_obj)

# Perform the merge using the new GEOID columns
map_data <- left_join(us_counties, combined_data, by = "GEOID")

# Check column names
print(colnames(combined_data))

##  [1] "GEOID"         "NAME.x"        "variable.x"    "Gini_Index"   
##  [5] "moe.x"         "year"          "NAME.y"        "variable.y"   
##  [9] "Median_Income" "moe.y"

print(colnames(income_inequality))

## [1] "GEOID"    "NAME"     "variable" "estimate" "moe"      "year"

print(colnames(median_income))

## [1] "GEOID"    "NAME"     "variable" "estimate" "moe"      "year"

print(colnames(us_counties))

## [1] "ID"    "geom"  "GEOID"

print(colnames(map_data))

##  [1] "ID"            "GEOID"         "NAME.x"        "variable.x"   
##  [5] "Gini_Index"    "moe.x"         "year"          "NAME.y"       
##  [9] "variable.y"    "Median_Income" "moe.y"         "geom"

# Since combined_data has a NAME column with "County, State" format, we will use that for merging
combined_data$GEOID <- combined_data$NAME.x

# check if combined_data has a column that matches GEOID in us_counties
if("NAME.x" %in% names(combined_data)) {
  # If yes, use that for GEOID
  combined_data$GEOID <- combined_data$NAME.x
} else {
}

# Now perform the join using the GEOID columns
map_data <- left_join(us_counties, combined_data, by = "GEOID")

ggplot(map_data) +
  geom_sf(aes(fill = Gini_Index)) + 
  labs(title = "Income Inequality by County",
       fill = "Gini Index") +
  theme_minimal()

ggplot(map_data) +
  geom_sf(aes(fill = Gini_Index), color = "white", size = 0.1) + 
  scale_fill_viridis_c(
    name = "Gini Index",
    labels = scales::label_number(suffix = ""),
    option = "D"
  ) +
  labs(
    title = "Income Inequality by County in 2020",
    subtitle = "Visualized using Gini Index values",
    caption = "Source: American Community Survey 2016-2020"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 8)
  )

# Filter for Louisiana entries in combined_data
louisiana_combined <- combined_data[grep(", Louisiana", combined_data$NAME.x), ]
# Check the first few entries to see the naming convention
head(louisiana_combined)

## # A tibble: 6 × 10
##   GEOID              NAME.x variable.x Gini_Index  moe.x  year NAME.y variable.y
##   <chr>              <chr>  <chr>           <dbl>  <dbl> <int> <chr>  <chr>     
## 1 Acadia Parish, Lo… Acadi… B19083_001      0.468 0.0126  2015 Acadi… B19013_001
## 2 Allen Parish, Lou… Allen… B19083_001      0.469 0.0376  2015 Allen… B19013_001
## 3 Ascension Parish,… Ascen… B19083_001      0.404 0.0101  2015 Ascen… B19013_001
## 4 Assumption Parish… Assum… B19083_001      0.450 0.0284  2015 Assum… B19013_001
## 5 Avoyelles Parish,… Avoye… B19083_001      0.488 0.0189  2015 Avoye… B19013_001
## 6 Beauregard Parish… Beaur… B19083_001      0.449 0.0227  2015 Beaur… B19013_001
## # ℹ 2 more variables: Median_Income <dbl>, moe.y <dbl>

# Filter for Louisiana entries in us_counties
louisiana_counties <- us_counties[grep("louisiana", us_counties$ID, ignore.case = TRUE), ]
# Check the first few entries to see the naming convention
head(louisiana_counties)

## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -93.74163 ymin: 29.62765 xmax: -90.63619 ymax: 31.35225
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                                        ID                           geom
## louisiana,acadia         louisiana,acadia MULTIPOLYGON (((-92.61863 3...
## louisiana,allen           louisiana,allen MULTIPOLYGON (((-92.5957 30...
## louisiana,ascension   louisiana,ascension MULTIPOLYGON (((-91.12894 3...
## louisiana,assumption louisiana,assumption MULTIPOLYGON (((-91.23207 3...
## louisiana,avoyelles   louisiana,avoyelles MULTIPOLYGON (((-92.32642 3...
## louisiana,beauregard louisiana,beauregard MULTIPOLYGON (((-93.12856 3...
##                                             GEOID
## louisiana,acadia         Acadia County, Louisiana
## louisiana,allen           Allen County, Louisiana
## louisiana,ascension   Ascension County, Louisiana
## louisiana,assumption Assumption County, Louisiana
## louisiana,avoyelles   Avoyelles County, Louisiana
## louisiana,beauregard Beauregard County, Louisiana

# Create a new column GEOID with updated names
us_counties$GEOID <- ifelse(grepl("louisiana", us_counties$ID, ignore.case = TRUE),
                            gsub("County", "Parish", us_counties$GEOID),
                            us_counties$GEOID)

# Filter for Louisiana entries in us_counties
louisiana_counties <- us_counties[grep("louisiana", us_counties$ID, ignore.case = TRUE), ]
# Check the first few entries to see the naming convention
head(louisiana_counties)

## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -93.74163 ymin: 29.62765 xmax: -90.63619 ymax: 31.35225
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                                        ID                           geom
## louisiana,acadia         louisiana,acadia MULTIPOLYGON (((-92.61863 3...
## louisiana,allen           louisiana,allen MULTIPOLYGON (((-92.5957 30...
## louisiana,ascension   louisiana,ascension MULTIPOLYGON (((-91.12894 3...
## louisiana,assumption louisiana,assumption MULTIPOLYGON (((-91.23207 3...
## louisiana,avoyelles   louisiana,avoyelles MULTIPOLYGON (((-92.32642 3...
## louisiana,beauregard louisiana,beauregard MULTIPOLYGON (((-93.12856 3...
##                                             GEOID
## louisiana,acadia         Acadia Parish, Louisiana
## louisiana,allen           Allen Parish, Louisiana
## louisiana,ascension   Ascension Parish, Louisiana
## louisiana,assumption Assumption Parish, Louisiana
## louisiana,avoyelles   Avoyelles Parish, Louisiana
## louisiana,beauregard Beauregard Parish, Louisiana

# Since combined_data has a NAME column with "County, State" format, we will use that for merging
combined_data$GEOID <- combined_data$NAME.x

# Perform the merge using the new GEOID columns
map_data <- left_join(us_counties, combined_data, by = "GEOID")

ggplot(map_data) +
  geom_sf(aes(fill = Gini_Index), color = "white", size = 0.1) + # Use 'estimate' for fill
  scale_fill_viridis_c(
    name = "Gini Index",
    labels = scales::label_number(suffix = ""),
    option = "D"
  ) +
  labs(
    title = "Income Inequality by County in 2020",
    subtitle = "Visualized using Gini Index values",
    caption = "Source: American Community Survey 2016-2020"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 8)
  )

ggplot(map_data) +
  geom_sf(aes(fill = Median_Income), color = "white", size = 0.1) + # Use 'estimate' for fill
  scale_fill_viridis_c(
    name = "Median Income",
    labels = scales::label_number(suffix = ""),
    option = "D"
  ) +
  labs(
    title = "Median Household Income by County in 2020",
    subtitle = "Visualized using Gini Index values",
    caption = "Source: American Community Survey 2016-2020"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 8)
  )

head(combined_data)

## # A tibble: 6 × 10
##   GEOID              NAME.x variable.x Gini_Index  moe.x  year NAME.y variable.y
##   <chr>              <chr>  <chr>           <dbl>  <dbl> <int> <chr>  <chr>     
## 1 Autauga County, A… Autau… B19083_001      0.423 0.0175  2015 Autau… B19013_001
## 2 Baldwin County, A… Baldw… B19083_001      0.456 0.0108  2015 Baldw… B19013_001
## 3 Barbour County, A… Barbo… B19083_001      0.464 0.015   2015 Barbo… B19013_001
## 4 Bibb County, Alab… Bibb … B19083_001      0.441 0.0475  2015 Bibb … B19013_001
## 5 Blount County, Al… Bloun… B19083_001      0.404 0.0145  2015 Bloun… B19013_001
## 6 Bullock County, A… Bullo… B19083_001      0.465 0.0414  2015 Bullo… B19013_001
## # ℹ 2 more variables: Median_Income <dbl>, moe.y <dbl>

str(combined_data)

## tibble [19,321 × 10] (S3: tbl_df/tbl/data.frame)
##  $ GEOID        : chr [1:19321] "Autauga County, Alabama" "Baldwin County, Alabama" "Barbour County, Alabama" "Bibb County, Alabama" ...
##  $ NAME.x       : chr [1:19321] "Autauga County, Alabama" "Baldwin County, Alabama" "Barbour County, Alabama" "Bibb County, Alabama" ...
##  $ variable.x   : chr [1:19321] "B19083_001" "B19083_001" "B19083_001" "B19083_001" ...
##  $ Gini_Index   : num [1:19321] 0.423 0.456 0.464 0.441 0.404 ...
##  $ moe.x        : num [1:19321] 0.0175 0.0108 0.015 0.0475 0.0145 0.0414 0.0184 0.0111 0.0356 0.0243 ...
##  $ year         : int [1:19321] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ NAME.y       : chr [1:19321] "Autauga County, Alabama" "Baldwin County, Alabama" "Barbour County, Alabama" "Bibb County, Alabama" ...
##  $ variable.y   : chr [1:19321] "B19013_001" "B19013_001" "B19013_001" "B19013_001" ...
##  $ Median_Income: num [1:19321] 51281 50254 32964 38678 45813 ...
##  $ moe.y        : num [1:19321] 2391 1263 2973 3995 3141 ...

str(map_data)

## Classes 'sf' and 'data.frame':   18001 obs. of  12 variables:
##  $ ID           : chr  "alabama,autauga" "alabama,autauga" "alabama,autauga" "alabama,autauga" ...
##  $ GEOID        : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ NAME.x       : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ variable.x   : chr  "B19083_001" "B19083_001" "B19083_001" "B19083_001" ...
##  $ Gini_Index   : num  0.423 0.429 0.45 0.46 0.454 ...
##  $ moe.x        : num  0.0175 0.0177 0.0391 0.041 0.0392 0.0326 0.0108 0.0098 0.01 0.0097 ...
##  $ year         : int  2015 2016 2017 2018 2019 2020 2015 2016 2017 2018 ...
##  $ NAME.y       : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ variable.y   : chr  "B19013_001" "B19013_001" "B19013_001" "B19013_001" ...
##  $ Median_Income: num  51281 53099 55317 58786 58731 ...
##  $ moe.y        : num  2391 2631 2838 2972 4410 ...
##  $ geom         :sfc_MULTIPOLYGON of length 18001; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:51, 1:2] -86.5 -86.5 -86.5 -86.6 -86.6 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geom"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:11] "ID" "GEOID" "NAME.x" "variable.x" ...

head(us_counties)

## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
##                              ID                           geom
## alabama,autauga alabama,autauga MULTIPOLYGON (((-86.50517 3...
## alabama,baldwin alabama,baldwin MULTIPOLYGON (((-87.93757 3...
## alabama,barbour alabama,barbour MULTIPOLYGON (((-85.42801 3...
## alabama,bibb       alabama,bibb MULTIPOLYGON (((-87.02083 3...
## alabama,blount   alabama,blount MULTIPOLYGON (((-86.9578 33...
## alabama,bullock alabama,bullock MULTIPOLYGON (((-85.66866 3...
##                                   GEOID
## alabama,autauga Autauga County, Alabama
## alabama,baldwin Baldwin County, Alabama
## alabama,barbour Barbour County, Alabama
## alabama,bibb       Bibb County, Alabama
## alabama,blount   Blount County, Alabama
## alabama,bullock Bullock County, Alabama

# Load US map data
us_counties <- sf::st_as_sf(maps::map("county", plot = FALSE, fill = TRUE))

# Correct the GEOID in us_counties to match the format in combined_data
us_counties$GEOID <- sapply(strsplit(as.character(us_counties$ID), ","), 
                            function(x) {
                              paste(trimws(x[2]), "County,", trimws(x[1]))
                            })
# Convert to title case and ensure whitespace is managed correctly
us_counties$GEOID <- gsub(" ,", ", ", toTitleCase(us_counties$GEOID))

# Join geographic data with combined data
map_data <- left_join(us_counties, combined_data, by = "GEOID")

# Remove rows with NA in any of the key columns
map_data <- map_data %>%
  filter(!is.na(Gini_Index), !is.na(Median_Income))

# Optional: Check the structure and head of the final data
str(map_data)

## Classes 'sf' and 'data.frame':   17583 obs. of  12 variables:
##  $ ID           : chr  "alabama,autauga" "alabama,autauga" "alabama,autauga" "alabama,autauga" ...
##  $ GEOID        : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ NAME.x       : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ variable.x   : chr  "B19083_001" "B19083_001" "B19083_001" "B19083_001" ...
##  $ Gini_Index   : num  0.423 0.429 0.45 0.46 0.454 ...
##  $ moe.x        : num  0.0175 0.0177 0.0391 0.041 0.0392 0.0326 0.0108 0.0098 0.01 0.0097 ...
##  $ year         : int  2015 2016 2017 2018 2019 2020 2015 2016 2017 2018 ...
##  $ NAME.y       : chr  "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" "Autauga County, Alabama" ...
##  $ variable.y   : chr  "B19013_001" "B19013_001" "B19013_001" "B19013_001" ...
##  $ Median_Income: num  51281 53099 55317 58786 58731 ...
##  $ moe.y        : num  2391 2631 2838 2972 4410 ...
##  $ geom         :sfc_MULTIPOLYGON of length 17583; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:51, 1:2] -86.5 -86.5 -86.5 -86.6 -86.6 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geom"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:11] "ID" "GEOID" "NAME.x" "variable.x" ...

head(map_data, n = 20)

## Simple feature collection with 20 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 33.24874
## Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
## First 10 features:
##                 ID                   GEOID                  NAME.x variable.x
## 1  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 2  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 3  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 4  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 5  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 6  alabama,autauga Autauga County, Alabama Autauga County, Alabama B19083_001
## 7  alabama,baldwin Baldwin County, Alabama Baldwin County, Alabama B19083_001
## 8  alabama,baldwin Baldwin County, Alabama Baldwin County, Alabama B19083_001
## 9  alabama,baldwin Baldwin County, Alabama Baldwin County, Alabama B19083_001
## 10 alabama,baldwin Baldwin County, Alabama Baldwin County, Alabama B19083_001
##    Gini_Index  moe.x year                  NAME.y variable.y Median_Income
## 1      0.4227 0.0175 2015 Autauga County, Alabama B19013_001         51281
## 2      0.4295 0.0177 2016 Autauga County, Alabama B19013_001         53099
## 3      0.4501 0.0391 2017 Autauga County, Alabama B19013_001         55317
## 4      0.4602 0.0410 2018 Autauga County, Alabama B19013_001         58786
## 5      0.4542 0.0392 2019 Autauga County, Alabama B19013_001         58731
## 6      0.4552 0.0326 2020 Autauga County, Alabama B19013_001         57982
## 7      0.4564 0.0108 2015 Baldwin County, Alabama B19013_001         50254
## 8      0.4608 0.0098 2016 Baldwin County, Alabama B19013_001         51365
## 9      0.4618 0.0100 2017 Baldwin County, Alabama B19013_001         52562
## 10     0.4609 0.0097 2018 Baldwin County, Alabama B19013_001         55962
##    moe.y                           geom
## 1   2391 MULTIPOLYGON (((-86.50517 3...
## 2   2631 MULTIPOLYGON (((-86.50517 3...
## 3   2838 MULTIPOLYGON (((-86.50517 3...
## 4   2972 MULTIPOLYGON (((-86.50517 3...
## 5   4410 MULTIPOLYGON (((-86.50517 3...
## 6   4839 MULTIPOLYGON (((-86.50517 3...
## 7   1263 MULTIPOLYGON (((-87.93757 3...
## 8    991 MULTIPOLYGON (((-87.93757 3...
## 9   1348 MULTIPOLYGON (((-87.93757 3...
## 10  1204 MULTIPOLYGON (((-87.93757 3...

print(summary(map_data))

##       ID               GEOID              NAME.x           variable.x       
##  Length:17583       Length:17583       Length:17583       Length:17583      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Gini_Index         moe.x              year         NAME.y         
##  Min.   :0.2567   Min.   :0.00120   Min.   :2015   Length:17583      
##  1st Qu.:0.4200   1st Qu.:0.01440   1st Qu.:2016   Class :character  
##  Median :0.4418   Median :0.02180   Median :2017   Mode  :character  
##  Mean   :0.4440   Mean   :0.02494   Mean   :2018                     
##  3rd Qu.:0.4655   3rd Qu.:0.03140   3rd Qu.:2019                     
##  Max.   :0.6962   Max.   :0.26180   Max.   :2020                     
##   variable.y        Median_Income        moe.y                  geom      
##  Length:17583       Min.   : 19501   Min.   :  247   MULTIPOLYGON :17583  
##  Class :character   1st Qu.: 41706   1st Qu.: 1865   epsg:NA      :    0  
##  Mode  :character   Median : 49067   Median : 2723   +proj=long...:    0  
##                     Mean   : 50621   Mean   : 3304                        
##                     3rd Qu.: 56762   3rd Qu.: 4018                        
##                     Max.   :147111   Max.   :65791

The project to create a Shiny application for visualizing income inequality and median household income across U.S. counties encountered significant challenges, particularly in implementing the R script. The goal was to integrate data from the American Community Survey (ACS) from 2015 to 2020 using R packages such as shiny, ggplot2, sf, dplyr, tidycensus, maps, scales, and rlang. The user interface was thoughtfully designed to allow a selection of years, data display options, and visualization settings. However, difficulties arose during the rendering phase of the application. Despite debugging attempts and checks on the data structure, the persistent error “non-numeric argument to binary operator” indicated unresolved data manipulation or plotting issues.

This setback highlighted the complexities of working with diverse data sources and interactive web frameworks. Despite the logical structure of the code and its apparent potential for creating a dynamic and informative map, the application still needs to materialize as envisioned. The process, though fraught with obstacles, offered valuable insights into data handling, dataset merging, and the intricacies of interactive visualization in R. It underscored the ongoing nature of the project and the continuous journey in data science and application development, emphasizing that resolution of these challenges is still a work in progress.

ui <- fluidPage(
  titlePanel("Median Household Income by US County"),
  
  sidebarLayout(
    sidebarPanel(
      selectInput("yearInput", "Select Year", choices = 2015:2020),
      hr(),
      helpText("Visualization Settings:"),
      sliderInput("opacityInput", 
                  "Map Opacity:", 
                  min = 0, 
                  max = 1, 
                  value = 0.7),
      selectInput("colorSchemeInput", 
                  "Select Color Scheme:", 
                  choices = c("Blue", "Red", "Green", "Purple", "Orange"),
                  selected = "Orange")
    ),
    
    mainPanel(
      plotOutput("mapOutput")
    )
  )
)

server <- function(input, output) {
  # Reactive expression for combined data
  data_reactive <- reactive({
    req(input$yearInput) # Ensure this runs only when a year is selected

    # Get ACS data for the selected year
    median_income <- get_acs(geography = "county", 
                             variables = "B19013_001", 
                             year = input$yearInput, 
                             survey = "acs5")

    # Rename the estimate to Median_Income and ensure it's numeric
    median_income <- median_income %>%
      mutate(Median_Income = as.numeric(estimate)) %>%
      select(GEOID, Median_Income, year)

    # Load and prepare US counties map data
    us_counties <- sf::st_as_sf(maps::map("county", plot = FALSE, fill = TRUE))
    us_counties <- us_counties %>%
      mutate(GEOID = sapply(strsplit(as.character(ID), ","), 
                            function(x) paste(trimws(x[2]), "County,", trimws(x[1])))) %>%
      mutate(GEOID = gsub(" ,", ", ", toTitleCase(GEOID)))

    # Merge map data with ACS data
    map_data <- left_join(us_counties, median_income, by = "GEOID")

    # Check data structure and handle NA values
    map_data <- map_data %>%
      drop_na(Median_Income) %>%
      mutate(Median_Income = as.numeric(Median_Income))

    str(map_data) # Check the structure of map_data
    return(map_data)
  })

  # Render the map
  output$mapOutput <- renderPlot({
    # Retrieve the data
    map_data <- data_reactive()
    
    # Plotting code
    ggplot(map_data) +
      geom_sf(aes(fill = Median_Income), color = "white", size = 0.1) +
      scale_fill_viridis_c(
        na.value = "grey50",
        labels = label_number(suffix = ""),
        option = "D"
      ) +
      labs(
        title = paste("Median Household Income by County in", as.character(input$yearInput)),
        caption = "Source: American Community Survey"
      ) +
      theme_minimal() +
      theme(
        legend.position = "right",
        plot.title = element_text(size = 14, face = "bold"),
        plot.caption = element_text(size = 8)
      ) +
      guides(fill = guide_legend(override.aes = list(alpha = input$opacityInput))) # Set the legend's opacity based on input
  })
}

# Run the application
shinyApp(ui = ui, server = server)

## 
## Listening on http://127.0.0.1:6198

## Warning: Error in -: non-numeric argument to binary operator

Wankelman Module 11

2023-11-29