Set up libraries

(Don’t need to show the libraries).

Load the saved data frames from the data preparation phase

In Mini Assignment 1, I saved the data frames yelp_bank, yelp_carpenters, and yelp_final (the combined data) into a single .RData file. Here, I load that file into this project.

# Saved from Mini_Assignment_1.Rmd
# save(yelp_carpenters, yelp_bank, yelp_final, file = 'data_all.RData')
load('data_all.RData')
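A quick check after loading can confirm that the expected objects actually arrived. The sketch below is only an illustration: it uses small stand-in data frames and a temporary file (both assumptions, not the real Yelp data or data_all.RData), and it loads into a separate environment so nothing in the workspace is overwritten.

```r
# Stand-in data frames; the real ones come from Mini_Assignment_1.Rmd
yelp_bank       <- data.frame(id = c("a", "b"), business_type = "bank")
yelp_carpenters <- data.frame(id = c("c", "d"), business_type = "carpenter")
yelp_final      <- rbind(yelp_bank, yelp_carpenters)

# Save all three objects into one file (a temp file here, not data_all.RData)
rdata_path <- tempfile(fileext = ".RData")
save(yelp_carpenters, yelp_bank, yelp_final, file = rdata_path)

# Load into a fresh environment; load() returns the names it restored
loaded_env   <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)

# Stop early if any expected object is missing
expected <- c("yelp_bank", "yelp_carpenters", "yelp_final")
stopifnot(all(expected %in% loaded_names))
nrow(loaded_env$yelp_final)
```

Checking the returned names this way catches a stale or incomplete .RData file before any tidying starts.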

Tidy the data frames

There are a few steps to clean up the way the data is presented, so it’s easier to work with and more reliable for visualizations.

See if duplicate columns exist

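Check for duplicates in yelp_final, which is the combined data. Note that distinct() compares entire rows, so this checks for duplicated rows rather than duplicated columns.

yelp_final %>% distinct(.) # returns one copy of each distinct row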
## # A tibble: 428 × 16
##    id       alias name  image_url is_closed url   review_count categories rating
##    <chr>    <chr> <chr> <chr>     <lgl>     <chr>        <int> <list>      <dbl>
##  1 Rd5vXW3… citi… Citi… "https:/… FALSE     http…            2 <df>          5  
##  2 sFLTL1s… chas… Chas… "https:/… FALSE     http…            2 <df>          2  
##  3 8EzNyE_… well… Well… "https:/… FALSE     http…            4 <df>          2  
##  4 YFs1mwU… well… Well… "https:/… FALSE     http…           22 <df>          2.5
##  5 tDmsP7e… bond… BOND… "https:/… FALSE     http…           12 <df>          2.5
##  6 I4ajnYe… chas… Chas… "https:/… FALSE     http…           12 <df>          2.5
##  7 C6iSJji… bank… Bank… "https:/… FALSE     http…           26 <df>          2  
##  8 aiSmo01… syno… Syno… "https:/… FALSE     http…            5 <df>          3  
##  9 Rd5vXW3… citi… Citi… "https:/… FALSE     http…            2 <df>          5  
## 10 FBDOrMp… bank… Bank… ""        FALSE     http…            3 <df>          2.5
## # ℹ 418 more rows
## # ℹ 7 more variables: coordinates <df[,2]>, transactions <list>,
## #   location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>,
## #   business_type <chr>
# dupl_df <- data.frame(yelp_bank = c("id", "alias", "name"),
#                       yelp_carpenters = c("id", "alias", "name"))
#
# duplicated(as.list(dupl_df)) # TRUE marks a column that repeats an earlier one
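In the printed output above, rows 1 and 9 appear to share the same (truncated) id, Rd5vXW3…, so it may be worth checking for duplicates on the id key as well as on whole rows. The sketch below uses a made-up toy data frame (the values are hypothetical) to show how the two checks can disagree.

```r
# Toy data: the id "x1" appears twice, but distance differs (hypothetical values)
toy <- data.frame(
  id       = c("x1", "x2", "x1"),
  name     = c("Citi", "Chase", "Citi"),
  distance = c(100, 250, 180)
)

# Whole-row check: no two rows match on every column
any_dup_rows <- any(duplicated(toy))

# Key-based check: flags the repeated id even though distance differs
any_dup_ids <- any(duplicated(toy$id))

# Keep the first row for each id
toy_unique <- toy[!duplicated(toy$id), ]
nrow(toy_unique)
```

With dplyr, the same key-based de-duplication can be written as distinct(id, .keep_all = TRUE).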

Delete duplicate columns

There are no duplicated columns, but this is an example of how to delete them.

# Drop columns that repeat an earlier column.

# dupl_df[, !duplicated(as.list(dupl_df)), drop = FALSE]
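As a runnable illustration of removing duplicated columns, here is a minimal base R sketch. The toy data frame and its column names are assumptions made up for this example; they are not part of the Yelp data.

```r
# Toy data frame in which column b repeats column a (hypothetical)
dupl_df <- data.frame(a = c("id", "alias", "name"),
                      b = c("id", "alias", "name"),
                      c = c("x", "y", "z"))

# duplicated() on a list compares whole columns; TRUE marks a repeat
is_dup_col <- duplicated(as.list(dupl_df))

# Drop the repeated columns, keeping the result a data frame
dedup_df <- dupl_df[, !is_dup_col, drop = FALSE]
names(dedup_df)
```

This compares column contents only, so two columns with different names but identical values count as duplicates.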

Separate any data that is in nested columns

Flatten the nested columns.

yelp_flat <- yelp_final %>% unnest_wider(categories, names_sep = "_") %>% # as a new data frame using a new name
  unnest_wider(coordinates, names_sep = "_") %>% # use _ separator to replace the $ in original data set
  unnest_wider(location, names_sep = "_")

The new yelp_flat contains 25 variables, while yelp_final contains 16. I use yelp_flat from here forward.
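To see what the flattening step does on a small scale, the sketch below applies unnest_wider() to a toy tibble. The data values are made up, and the nested column is built as a list of named lists, which is an assumption about shape rather than a copy of the Yelp structure.

```r
library(tibble)
library(tidyr)

# Toy stand-in for a nested column (values are made up)
nested <- tibble(
  id = c("a", "b"),
  coordinates = list(
    list(latitude = 33.77, longitude = -84.29),
    list(latitude = 33.81, longitude = -84.17)
  )
)

# Each named element becomes its own column, prefixed with the parent name
flat <- unnest_wider(nested, coordinates, names_sep = "_")
names(flat)
```

The names_sep argument is what produces names such as coordinates_latitude, which keeps the origin of each new column visible.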

Drop empty elements

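Drop rows with NA in the new coordinates_latitude and coordinates_longitude columns. The result below is printed rather than assigned, so yelp_flat itself is unchanged.

yelp_flat %>% 
  filter(!is.na(coordinates_latitude)) %>% 
  filter(!is.na(coordinates_longitude)) # these two columns can have NA, from looking into the yelp_flat data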
## # A tibble: 428 × 25
##    id        alias name  image_url is_closed url   review_count categories_alias
##    <chr>     <chr> <chr> <chr>     <lgl>     <chr>        <int>      <list<chr>>
##  1 Rd5vXW3G… citi… Citi… "https:/… FALSE     http…            2              [1]
##  2 sFLTL1sr… chas… Chas… "https:/… FALSE     http…            2              [1]
##  3 8EzNyE_7… well… Well… "https:/… FALSE     http…            4              [1]
##  4 YFs1mwUA… well… Well… "https:/… FALSE     http…           22              [1]
##  5 tDmsP7eT… bond… BOND… "https:/… FALSE     http…           12              [2]
##  6 I4ajnYe2… chas… Chas… "https:/… FALSE     http…           12              [1]
##  7 C6iSJji5… bank… Bank… "https:/… FALSE     http…           26              [1]
##  8 aiSmo01K… syno… Syno… "https:/… FALSE     http…            5              [1]
##  9 Rd5vXW3G… citi… Citi… "https:/… FALSE     http…            2              [1]
## 10 FBDOrMpN… bank… Bank… ""        FALSE     http…            3              [1]
## # ℹ 418 more rows
## # ℹ 17 more variables: categories_title <list<chr>>, rating <dbl>,
## #   coordinates_latitude <dbl>, coordinates_longitude <dbl>,
## #   transactions <list>, location_address1 <chr>, location_address2 <chr>,
## #   location_address3 <chr>, location_city <chr>, location_zip_code <chr>,
## #   location_country <chr>, location_state <chr>,
## #   location_display_address <list<list>>, phone <chr>, display_phone <chr>, …
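Filtering only changes the data if the result is assigned to a name. The sketch below shows one way to drop rows with missing coordinates and keep the result, using tidyr::drop_na() on a toy tibble (the rows are made up for illustration).

```r
library(tibble)
library(tidyr)

# Toy data with one missing latitude and one missing longitude
toy_flat <- tibble(
  id = c("a", "b", "c"),
  coordinates_latitude  = c(33.77, NA, 33.81),
  coordinates_longitude = c(-84.29, -84.20, NA)
)

rows_before <- nrow(toy_flat)

# Assign the result back so later steps see the cleaned data
toy_flat <- drop_na(toy_flat, coordinates_latitude, coordinates_longitude)

# Number of rows removed
rows_dropped <- rows_before - nrow(toy_flat)
rows_dropped
```

Comparing the row count before and after is a simple way to confirm whether any rows were actually removed.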

Constrain data to the census tract boundary

Finally, we need to remove rows that are not inside the census tract boundary. Instead of importing it from another .Rmd file, I’m just going to re-initialize the census tract here.

# need to get polygon data here, choose a different variable
tract <- tidycensus::get_acs(geography = "tract",
                            state = "GA",
                            county = "Dekalb",
                            variables = c(population = "B01003_001",
                                          medianincome = "B19013_001"),
                            year = 2019,
                            survey = "acs5",
                            geometry = TRUE, # returns sf objects
                            output = "wide")
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
# atlanta <- places('GA') %>%
#   filter(NAME %in% c('Stone Mountain', 'Atlanta')) ##Dekalb county stretches into two cities
# 
# tract <- tract[atlanta,]

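# Use all DeKalb County tracts as the boundary. Tract NAME values look like
# "Census Tract 212.13, DeKalb County, Georgia", so filtering NAME on
# 'Stone Mountain' or 'Atlanta' would match no rows.
atlanta <- tract

# Make yelp_flat an sf object
yelp_sf <- yelp_flat %>%
  st_as_sf(coords = c("coordinates_longitude", "coordinates_latitude"), crs = st_crs(tract))

# Spatial join between yelp_sf and the boundary; left = FALSE keeps only
# businesses that fall inside a tract
filtered_yelp <- st_join(yelp_sf, atlanta, left = FALSE)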


## View acs data
tract
## Simple feature collection with 145 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.35022 ymin: 33.61467 xmax: -84.02371 ymax: 33.97088
## Geodetic CRS:  NAD83
## First 10 features:
##          GEOID                                        NAME populationE
## 1  13089021213 Census Tract 212.13, DeKalb County, Georgia        3526
## 2  13089023506 Census Tract 235.06, DeKalb County, Georgia        6465
## 3  13089021305 Census Tract 213.05, DeKalb County, Georgia        4970
## 4  13089023313 Census Tract 233.13, DeKalb County, Georgia        5294
## 5  13089021604 Census Tract 216.04, DeKalb County, Georgia        3237
## 6  13089021913 Census Tract 219.13, DeKalb County, Georgia        4450
## 7  13089021906 Census Tract 219.06, DeKalb County, Georgia        5572
## 8  13089021413 Census Tract 214.13, DeKalb County, Georgia        4081
## 9  13089021911 Census Tract 219.11, DeKalb County, Georgia        1569
## 10 13089023114 Census Tract 231.14, DeKalb County, Georgia        2901
##    populationM medianincomeE medianincomeM                       geometry
## 1          204        154063         19674 MULTIPOLYGON (((-84.34783 3...
## 2          927         45924         13793 MULTIPOLYGON (((-84.25237 3...
## 3          391         55109          4607 MULTIPOLYGON (((-84.28811 3...
## 4          576         55143          5672 MULTIPOLYGON (((-84.14593 3...
## 5          254        159306         38073 MULTIPOLYGON (((-84.31051 3...
## 6          559         32983          3760 MULTIPOLYGON (((-84.1905 33...
## 7          570         46448          4613 MULTIPOLYGON (((-84.187 33....
## 8          481         47885         10004 MULTIPOLYGON (((-84.32911 3...
## 9          348         27835          5106 MULTIPOLYGON (((-84.19619 3...
## 10         322         51105          3293 MULTIPOLYGON (((-84.24137 3...
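Whether a spatial join actually removes points depends on its left argument. The sketch below illustrates this with a toy square boundary and two made-up points; the geometry, names, and CRS are all assumptions for illustration and do not come from the tract data.

```r
library(sf)

# Hypothetical boundary: a 1 x 1 degree square standing in for the tracts
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
boundary <- st_sf(name = "toy boundary", geometry = st_sfc(square, crs = 4326))

# Two made-up businesses: one inside the square, one outside
pts <- st_as_sf(
  data.frame(id = c("inside", "outside"), lon = c(0.5, 2), lat = c(0.5, 2)),
  coords = c("lon", "lat"), crs = 4326
)

# Default left join keeps both points; the outside one just gets NA attributes
joined_all <- st_join(pts, boundary)
nrow(joined_all)

# left = FALSE makes it an inner join and drops the outside point
kept <- st_join(pts, boundary, left = FALSE)
kept$id
```

sf also provides st_filter(), which keeps the points that intersect the boundary without attaching the boundary's attributes.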

The original map

The original map included businesses outside the tract boundary of DeKalb County.

The updated map

This static map shows the business locations within the DeKalb County boundary.

The map does not render in this document, so a saved image is included instead.

knitr::include_graphics("assignment_2_map.png")
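For reference, a static map of this kind could be drawn with ggplot2 and sf. The sketch below is not the code that produced assignment_2_map.png; it uses a toy boundary and made-up points, and the object names are hypothetical.

```r
library(sf)
library(ggplot2)

# Hypothetical boundary and points, for illustration only
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
boundary <- st_sf(name = "toy boundary", geometry = st_sfc(square, crs = 4326))

pts <- st_as_sf(
  data.frame(business_type = c("bank", "bank", "carpenter"),
             lon = c(0.2, 0.3, 0.8),
             lat = c(0.5, 0.6, 0.4)),
  coords = c("lon", "lat"), crs = 4326
)

# Boundary outline first, then points colored by business type
static_map <- ggplot() +
  geom_sf(data = boundary, fill = NA, color = "grey40") +
  geom_sf(data = pts, aes(color = business_type), size = 2) +
  labs(color = "Business type") +
  theme_minimal()

# ggsave("toy_map.png", static_map)  # uncomment to write an image file
```

Saving the plot with ggsave() gives an image file that knitr::include_graphics() can then display.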

Findings

During the data preparation phase, I noticed that the bank data seemed denser at the borders of the tract area (DeKalb County) and around major streets and interstates. These were assumptions that I would want to validate if that information seemed useful.

Looking at the new map, I notice that many of the banks are in the central-western portion of DeKalb County, which falls more within the city of Atlanta. The cities in that part of the map also seem smaller and more tightly packed. The carpenters’ businesses seem sparser and more likely to be located in cities that cover larger geographic portions of the county.

Carpenters do not outnumber banks in the tract area, but they tend to sit farther east than the dense clusters of banks, and so farther from Atlanta. I don’t see a trustworthy way to validate some of my hypotheses now, such as whether the carpenters are over-represented by self-owned businesses with self-reported alternate addresses, so I did not check for that.

The data is much tidier, and the trend of heavy clusters of banks to the west and sparse carpenters is now more obvious.
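The east-west pattern could also be checked numerically by comparing longitudes across business types. The sketch below uses made-up coordinates purely to show the approach; the numbers are not results from the Yelp data, and the column names are assumptions.

```r
# Made-up coordinates for illustration only; not the real Yelp results
toy <- data.frame(
  business_type = c("bank", "bank", "bank", "carpenter", "carpenter"),
  longitude     = c(-84.33, -84.31, -84.29, -84.18, -84.12)
)

# Larger (less negative) longitude means farther east
median_lon <- tapply(toy$longitude, toy$business_type, median)
median_lon

# Count of businesses per type
type_counts <- table(toy$business_type)
type_counts
```

A higher median longitude for one business type would support the visual impression that it sits farther east.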