Notes

City of Choice: City of Duluth, Georgia
POI Type of Choice: Brunch Restaurants & Cafes

1. Tidying POI Data

1.1 Import data

library(tidyverse); library(sf); library(tmap)  # packages used throughout these notes
poi <- read_rds("duluth_google_poi_data.rds")

1.2 Tidy data

1.2.1 Remove duplicated rows

poi_unique <- poi %>% distinct(places.id, .keep_all = TRUE)

Show how the number of rows has changed after removing duplicates.

glue::glue("Before dropping duplicated rows, there were {nrow(poi)} rows. After dropping them, there are {nrow(poi_unique)} rows.")
## Before dropping duplicated rows, there were 161 rows. After dropping them, there are 70 rows.
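
As a minimal sketch of what distinct() with .keep_all = TRUE does, the example below uses a made-up three-row table. The toy tibble and its values are hypothetical and are not taken from the Duluth data. Only the first row for each places.id is kept, and all other columns are retained.

```r
library(dplyr)

# Hypothetical toy data: the id "a" appears twice
toy <- tibble(
  places.id = c("a", "a", "b"),
  places.rating = c(4.5, 4.5, 3.9)
)

# Keep the first row for each places.id and retain all other columns
toy_unique <- toy %>% distinct(places.id, .keep_all = TRUE)

nrow(toy)         # 3
nrow(toy_unique)  # 2
```

Without .keep_all = TRUE, distinct(places.id) would return only the places.id column.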

1.2.2 Flatten/unnest list-columns

# Collapse the places.types list-column so that each element is a single comma-separated string
poi_flat <- poi_unique %>%
  mutate(places.types = map_chr(places.types, ~ str_c(.x, collapse = ",")))
# Check that no list-columns remain
glimpse(poi_flat)
## Rows: 70
## Columns: 10
## $ places.id                       <chr> "ChIJdxvWExq99YgReZJFbpnZ1xo", "ChIJRe…
## $ places.types                    <chr> "bakery,coffee_shop,wholesaler,cafe,fo…
## $ places.formattedAddress         <chr> "1290 Old Peachtree Rd NW, Duluth, GA …
## $ places.rating                   <dbl> 4.7, 3.8, 4.3, 4.2, 4.1, 3.7, 3.6, NA,…
## $ places.priceLevel               <chr> "PRICE_LEVEL_MODERATE", "PRICE_LEVEL_M…
## $ places.userRatingCount          <int> 264, 224, 573, 314, 842, 931, 14, NA, …
## $ places.location.latitude        <dbl> 34.00390, 33.99795, 34.00381, 34.00515…
## $ places.location.longitude       <dbl> -84.08548, -84.08994, -84.08333, -84.0…
## $ places.displayName.text         <chr> "Paris Baguette", "Starbucks", "Hansel…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
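
Reading the glimpse() output by eye works for ten columns, but the same check can be done programmatically. The sketch below uses a hypothetical two-row tibble (not the real data) to show the collapse step and a test for any remaining list-columns.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical toy data with a list-column of type tags
toy <- tibble(
  places.id = c("a", "b"),
  places.types = list(c("cafe", "bakery"), "brunch_restaurant")
)

# Collapse each list element into one comma-separated string
toy_flat <- toy %>%
  mutate(places.types = map_chr(places.types, ~ str_c(.x, collapse = ",")))

# Names of any columns that are still lists (empty if everything is flat)
remaining_list_cols <- names(toy_flat)[map_lgl(toy_flat, is.list)]
remaining_list_cols
```

An empty result means every column holds atomic values and the table is safe to pass to functions that do not accept list-columns.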

1.2.3 Handle missing values

#Drop rows that have missing values in columns: rating, rating count, price level
poi_dropna <- poi_flat %>% 
  drop_na(c(places.rating, places.userRatingCount, places.priceLevel))

Explain your reasoning

I checked the Rating, Rating Count, and Price Level fields for NA values because these three fields provide the most useful information about a POI beyond its name and location.

Report how many rows remain after dropping missing values

glue::glue("Before further dropping rows with missing values, there were {nrow(poi_flat)} rows. After dropping them, there are {nrow(poi_dropna)} rows.")
## Before further dropping rows with missing values, there were 70 rows. After dropping them, there are 41 rows.
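
Because this step removes a large share of the rows, it can help to see which field accounts for the missing values before dropping anything. The sketch below counts NAs per column on a hypothetical three-row table. The values are illustrative only, and the notes above do not state which column drives the drop in the real data.

```r
library(dplyr)

# Hypothetical toy data with missing values in different columns
toy <- tibble(
  places.rating = c(4.7, NA, 4.3),
  places.userRatingCount = c(264L, NA, 573L),
  places.priceLevel = c("PRICE_LEVEL_MODERATE", NA, NA)
)

# Count missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Running the same summarise() call on poi_flat would show how many rows each of the three fields is responsible for removing.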

1.2.4 Filter by location

#Remove rows that fall outside the city boundary
#City of Duluth boundary
duluth <- tigris::places('GA', progress_bar = FALSE) %>% 
  filter(NAME == 'Duluth') %>%
  st_transform(4326)
#Converting poi_dropna into a sf object
poi_sf <- poi_dropna %>% 
  st_as_sf(coords=c("places.location.longitude", "places.location.latitude"), 
           crs = 4326)
#POIs within the City of Duluth boundary
poi_sf_in <- poi_sf[duluth, ]
# Create a separate column for the restaurant type (brunch takes priority over cafe)
poi_sf_in <- poi_sf_in %>%
  mutate(type = case_when(
    str_detect(places.types, "brunch_restaurant") ~ "brunch",
    str_detect(places.types, "cafe") ~ "cafe"
  ))

Show how the number of rows changes after filtering

glue::glue("Before further dropping rows outside of City of Duluth Boundary, there were {nrow(poi_sf)} rows. After dropping them, there are {nrow(poi_sf_in)} rows.")
## Before further dropping rows outside of City of Duluth Boundary, there were 41 rows. After dropping them, there are 17 rows.
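
The bracket syntax poi_sf[duluth, ] is a spatial subset: it keeps the features of the first object that intersect the second. As a minimal sketch, the example below reproduces this on a hypothetical square boundary and two made-up points, and compares it with st_filter(), which expresses the same operation as a function call.

```r
library(sf)

# Hypothetical square "city" boundary in WGS84
boundary <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
  crs = 4326
)

# Hypothetical points: the first lies inside the square, the second outside
pts <- st_as_sf(
  data.frame(name = c("inside", "outside"), lon = c(0.5, 2), lat = c(0.5, 2)),
  coords = c("lon", "lat"), crs = 4326
)

pts_in  <- pts[boundary, ]           # bracket subsetting, as in the notes
pts_in2 <- st_filter(pts, boundary)  # same filter, default predicate st_intersects
```

Both objects must share a CRS, which is why the boundary is transformed to EPSG:4326 before the subset in the notes.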

1.3 Show cleaned POI data

Show how the number of rows changes after the entire cleaning process

glue::glue("Before cleaning the POIs for duplicates, missing values, and locations outside the city boundary, there were {nrow(poi)} rows. After cleaning, there are {nrow(poi_sf_in)} rows.")
## Before cleaning the POIs for duplicates, missing values, and locations outside the city boundary, there were 161 rows. After cleaning, there are 17 rows.

2. Explore and report findings

2.1 Plot map of brunch restaurant and cafe locations & ratings in Duluth

tmap_mode("view")
tm_shape(duluth) + tm_fill(col = "#d6d6d6", border.col = "#9e9e9e", lwd=1.5, alpha=0.7) +
  tm_shape(poi_sf_in) + 
  tm_dots(col = "type", palette = c("#4c97d4", "#de5252"),
          size = "places.rating", shape=16, alpha=0.7,
          scale = 2,
          title.size = "Rating") +
  tm_title("Interactive Map of Brunch Restaurant and Cafe Locations and Their Ratings - City of Duluth, Georgia") +
  tm_layout(component.autoscale = FALSE)

What are the most noticeable differences between the two POI types?

The most noticeable difference between the two types is that Duluth has more cafes than brunch restaurants. In addition, cafes are scattered around the city, while brunch restaurants are concentrated near the city center.

Do POIs tend to cluster in specific neighborhoods, or are they spread evenly across the city?

Across all POIs, many are scattered, but there are still notable clusters in west, central, and south Duluth.

2.2 Plot map of POI ratings and rating counts in Duluth

tm_shape(duluth) + tm_fill(col = "#d6d6d6", border.col = "#9e9e9e", lwd=1.5, alpha=0.7) +
  tm_shape(poi_sf_in) + 
  tm_dots(col = "places.rating", 
          size = "places.userRatingCount",
          scale = 2,
          palette = "Teal",
          title.size = "Rating Count")

If you had to choose one area to visit based on the dataset, which would you pick and why?

If I had to choose one area to visit, I would choose the city center, because it has a high concentration of highly rated restaurants and offers more variety, with both brunch restaurants and cafes to choose from.

Is there an association between rating score and count?

There does not seem to be a strong association between rating score and rating count: restaurants with fewer than 500 ratings and restaurants with more than 3,500 ratings both have high ratings.
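
The visual impression from the map could be checked with a rank correlation. The sketch below is only an illustration of the calculation: its input is the first five rating and rating-count pairs visible in the glimpse() output earlier in these notes, not the 17 cleaned POIs, so the number it produces says nothing about the final dataset. Spearman's method is an assumption chosen here because rating counts are heavily skewed.

```r
# Sample input only: first five complete rating/count pairs from the glimpse() output
toy <- data.frame(
  rating = c(4.7, 3.8, 4.3, 4.2, 4.1),
  count  = c(264, 224, 573, 314, 842)
)

# Spearman rank correlation between rating and rating count
rho <- cor(toy$rating, toy$count, method = "spearman")
rho

# On the cleaned data the equivalent call would be:
# cor(poi_sf_in$places.rating, poi_sf_in$places.userRatingCount, method = "spearman")
```

A value near 0 would support the reading above, while a value near 1 or -1 would indicate a strong monotonic relationship.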