Tidying your POI data

1. Import your data

Load the Google Places POI data you downloaded for Mini-Assignment 1. As a reminder, state the city you selected and the two POI types you chose in the previous assignment.

I chose the ‘coffee_shop’ and ‘bar’ POI types in Buford, Georgia, for my assignment.

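# Packages used throughout this document
library(tidyverse)  # dplyr, purrr, stringr
library(sf)
library(tmap)
library(knitr)      # kable()

coffee_all <- readRDS(here::here("buford_poi_coffee_shops.rds"))
bar_all    <- readRDS(here::here("buford_poi_bars.rds"))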

# City boundary
buford <- tigris::places('GA') %>% filter(NAME == 'Buford')
## Retrieving data for the year 2022
# Convert the data to an sf object using XY coordinates
coffee_all_sf <- coffee_all %>%
  rename(x = places.location.longitude, y = places.location.latitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

bar_all_sf <- bar_all %>%
  rename(x = places.location.longitude, y = places.location.latitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

coffee_all_sf$type <- "Coffee"
bar_all_sf$type    <- "Bar"

poi_sf <- rbind(coffee_all_sf, bar_all_sf)

# Map
tm_shape(poi_sf) + 
  tm_dots(col = "type", 
          size = "places.userRatingCount",
          palette = c("Coffee" = "brown", "Bar" = "darkblue"),
          border.lwd=0.5,
          popup.vars = c("Name" = "places.displayName.text",
                         "Address" = "places.formattedAddress",
                         "Rating" = "places.rating",
                         "Rating Count" = "places.userRatingCount",
                         "Price Level" = "places.priceLevel")) +
  tm_shape(buford) + 
  tm_borders()

2. Tidy your data

Work through the following steps to clean and prepare your dataset:

  • Remove duplicated rows. Show how the number of rows has changed after removing.

  • Flatten/unnest list-columns. Collapse the places.types column so that each element contains a single string value. If your data includes list-columns other than places.types, handle them appropriately while ensuring each row still represents a unique POI.

  • Handle missing values. Remove rows with NA values in columns that you consider important. Explain your reasoning. Report how many rows remain after this step.

  • Filter by location. Remove rows that fall outside the city boundary. Show how the number of rows changes after filtering.

# Check the number of rows before removing duplicates
print(paste("Previous number:", nrow(poi_sf)))
## [1] "Previous number: 60"
# Keep only the first occurrence of each "ID"
poi_sf_clean <- poi_sf[!duplicated(poi_sf$places.id), ]
print(paste("Cleaned row:", nrow(poi_sf_clean)))
## [1] "Cleaned row: 60"
glimpse(poi_sf_clean)
## Rows: 60
## Columns: 11
## $ places.id                       <chr> "ChIJ02mJktiV9YgRM_efrnJN5TU", "ChIJz1…
## $ places.types                    <list> <"coffee_shop", "cafe", "point_of_int…
## $ places.formattedAddress         <chr> "1600 Mall of Georgia Blvd, Buford, GA…
## $ places.rating                   <dbl> 4.8, 3.2, 4.0, 4.5, 3.5, 1.7, 3.9, 4.3…
## $ places.userRatingCount          <int> 21, 2443, 632, 270, 1272, 15, 306, 520…
## $ places.priceLevel               <chr> NA, "PRICE_LEVEL_INEXPENSIVE", "PRICE_…
## $ places.displayName.text         <chr> "Brush n’ Bean (Inside PAINTED TREE BO…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
## $ source_type                     <chr> "coffee_shop", "coffee_shop", "coffee_…
## $ type                            <chr> "Coffee", "Coffee", "Coffee", "Coffee"…
## $ geometry                        <POINT [°]> POINT (-83.99489 34.06405), POIN…
# Flatten / unnest list-columns (places.types)
poi_sf_flt <- poi_sf_clean %>%
  mutate(places.types.unnest = places.types %>%
           map_chr(., ~str_c(.x, collapse=", ")))

glimpse(poi_sf_flt)
## Rows: 60
## Columns: 12
## $ places.id                       <chr> "ChIJ02mJktiV9YgRM_efrnJN5TU", "ChIJz1…
## $ places.types                    <list> <"coffee_shop", "cafe", "point_of_int…
## $ places.formattedAddress         <chr> "1600 Mall of Georgia Blvd, Buford, GA…
## $ places.rating                   <dbl> 4.8, 3.2, 4.0, 4.5, 3.5, 1.7, 3.9, 4.3…
## $ places.userRatingCount          <int> 21, 2443, 632, 270, 1272, 15, 306, 520…
## $ places.priceLevel               <chr> NA, "PRICE_LEVEL_INEXPENSIVE", "PRICE_…
## $ places.displayName.text         <chr> "Brush n’ Bean (Inside PAINTED TREE BO…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
## $ source_type                     <chr> "coffee_shop", "coffee_shop", "coffee_…
## $ type                            <chr> "Coffee", "Coffee", "Coffee", "Coffee"…
## $ geometry                        <POINT [°]> POINT (-83.99489 34.06405), POIN…
## $ places.types.unnest             <chr> "coffee_shop, cafe, point_of_interest,…
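The task also asks about list-columns other than places.types. A minimal sketch of one way to check for them and collapse each one in a single step, using a toy table with made-up values (`toy`, `list_cols`, and `toy_flat` are hypothetical names, not objects from this assignment). Unlike the code above, it overwrites the list-column in place rather than adding a new column:

```r
library(dplyr)
library(purrr)
library(stringr)

# Toy stand-in for the POI table (hypothetical values, not the real data)
toy <- tibble(
  places.id     = c("a", "b"),
  places.types  = list(c("coffee_shop", "cafe"), "bar"),
  places.rating = c(4.5, 3.9)
)

# Names of all list-columns that still need flattening
list_cols <- toy %>% select(where(is.list)) %>% names()

# Collapse every list-column to one comma-separated string per row
toy_flat <- toy %>%
  mutate(across(all_of(list_cols), ~ map_chr(.x, str_c, collapse = ", ")))

list_cols
toy_flat
```

On an sf object the geometry column is itself stored as a list, so it is probably safest to run the check on `sf::st_drop_geometry()` output and leave the geometry untouched.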
# Handle missing values
# Drop rows with NA in places.rating or places.priceLevel: I treat both as key
# indicators of a place's quality, so rows without them are not useful here.
poi_drop_na <- poi_sf_flt %>% 
  filter(!is.na(places.rating)) %>%
  filter(!is.na(places.priceLevel))
# city boundary
buford <- tigris::places("GA", progress_bar = FALSE) %>% 
  filter(NAME == 'Buford') %>% 
  st_transform(4326)
## Retrieving data for the year 2022
# poi_drop_na is already an sf object in EPSG:4326 (converted during import),
# so no further conversion is needed
poi_sf <- poi_drop_na

# keep only POIs inside boundary
poi_sf_in <- poi_sf[buford, ]

print(paste0("Before: ", nrow(poi_sf)))
## [1] "Before: 45"
print(paste0("After: ", nrow(poi_sf_in)))
## [1] "After: 24"
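The bracket subset `poi_sf[buford, ]` keeps the features that intersect the boundary, and it relies on both layers sharing the same CRS (hence the `st_transform(4326)` above). A small sketch on a hypothetical square boundary and three made-up points, suggesting that `sf::st_filter()` is an equivalent and arguably more readable way to write the same step:

```r
library(sf)

# Hypothetical square "city boundary" (not the real Buford polygon)
boundary <- st_sf(
  NAME = "Toy city",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 4326
  )
)

# Three made-up points: two inside the square, one outside
pts <- st_as_sf(
  data.frame(id = c("in_1", "in_2", "out_1"),
             x  = c(0.2, 0.8, 1.5),
             y  = c(0.3, 0.6, 0.5)),
  coords = c("x", "y"), crs = 4326
)

inside_bracket <- pts[boundary, ]          # same idiom as poi_sf[buford, ]
inside_filter  <- st_filter(pts, boundary) # default predicate is st_intersects

print(paste0("Before: ", nrow(pts)))
print(paste0("After: ", nrow(inside_filter)))
```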

3. Show your cleaned POI data

Print the first 10 rows of your final dataset using either print() or kableExtra::kable().

poi_sf_in %>% 
  slice(1:10) %>% 
  kable()
| places.id | places.types | places.formattedAddress | places.rating | places.userRatingCount | places.priceLevel | places.displayName.text | places.displayName.languageCode | source_type | type | places.types.unnest | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ChIJdZp2eAaV9YgRqYKODOS3Yt8 | coffee_shop, cafe, dessert_shop, confectionery, tea_house, food_store, food, point_of_interest, store, establishment | 3480 Financial Center Way Ste M1000, Buford, GA 30519, USA | 4.3 | 520 | PRICE_LEVEL_INEXPENSIVE | Kung Fu Tea | en | coffee_shop | Coffee | coffee_shop, cafe, dessert_shop, confectionery, tea_house, food_store, food, point_of_interest, store, establishment | POINT (-83.98664 34.0742) |
| ChIJG0jmzmeU9YgRrRsawq-ISz0 | bagel_shop, coffee_shop, breakfast_restaurant, bakery, cafe, food_store, restaurant, food, point_of_interest, store, establishment | 3410 Buford Dr Ste G400, Buford, GA 30519, USA | 4.0 | 550 | PRICE_LEVEL_INEXPENSIVE | Einstein Bros. Bagels | en | coffee_shop | Coffee | bagel_shop, coffee_shop, breakfast_restaurant, bakery, cafe, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98455 34.07575) |
| ChIJOYMTz2eU9YgR4HdekqpEB_o | coffee_shop, breakfast_restaurant, internet_cafe, cafe, food_store, restaurant, food, point_of_interest, store, establishment | 3380 Buford Dr, Buford, GA 30519, USA | 4.1 | 1456 | PRICE_LEVEL_MODERATE | Starbucks | en | coffee_shop | Coffee | coffee_shop, breakfast_restaurant, internet_cafe, cafe, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98466 34.07435) |
| ChIJaevs7WeU9YgR2cekuffZWKQ | donut_shop, fast_food_restaurant, coffee_shop, bakery, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | 3387 Buford Dr, Buford, GA 30519, USA | 3.4 | 2019 | PRICE_LEVEL_INEXPENSIVE | Krispy Kreme | en | coffee_shop | Coffee | donut_shop, fast_food_restaurant, coffee_shop, bakery, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98342 34.07361) |
| ChIJb3rrIFOV9YgRGXF84KHZKgI | coffee_shop, donut_shop, fast_food_restaurant, breakfast_restaurant, bagel_shop, bakery, cafe, food_store, meal_takeaway, restaurant, food, point_of_interest, store, establishment | 3687 Buford Dr, Buford, GA 30519, USA | 3.0 | 470 | PRICE_LEVEL_INEXPENSIVE | Dunkin’ | en | coffee_shop | Coffee | coffee_shop, donut_shop, fast_food_restaurant, breakfast_restaurant, bagel_shop, bakery, cafe, food_store, meal_takeaway, restaurant, food, point_of_interest, store, establishment | POINT (-83.98571 34.0806) |
| ChIJg0L-uwCU9YgRzouIV8k3O6E | fast_food_restaurant, hamburger_restaurant, sandwich_shop, coffee_shop, cafe, breakfast_restaurant, american_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | 4358 Buford Dr, Buford, GA 30518, USA | 3.3 | 1675 | PRICE_LEVEL_INEXPENSIVE | McDonald’s | en | coffee_shop | Coffee | fast_food_restaurant, hamburger_restaurant, sandwich_shop, coffee_shop, cafe, breakfast_restaurant, american_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | POINT (-84.01207 34.09448) |
| ChIJUxfnTEiW9YgR0PpGO36pt88 | bakery, coffee_shop, breakfast_restaurant, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | 4360 S Lee St, Buford, GA 30518, USA | 4.7 | 1076 | PRICE_LEVEL_INEXPENSIVE | The Baking Grounds Bakery Cafe | en | coffee_shop | Coffee | bakery, coffee_shop, breakfast_restaurant, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-84.00249 34.10295) |
| ChIJmeE8IsOT9YgRF8O3ZhqP5wg | cafe, coffee_shop, food_store, store, food, point_of_interest, establishment | 179 E Moreno St Suite C, Buford, GA 30518, USA | 4.8 | 302 | PRICE_LEVEL_MODERATE | Tchin Tchin Coffee | en | coffee_shop | Coffee | cafe, coffee_shop, food_store, store, food, point_of_interest, establishment | POINT (-84.00355 34.12005) |
| ChIJ6fil2cGT9YgRWJtFTjI5xQo | coffee_shop, internet_cafe, cafe, breakfast_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | 4942 Bristol Industrial Way, Buford, GA 30518, USA | 4.0 | 643 | PRICE_LEVEL_MODERATE | Starbucks | en | coffee_shop | Coffee | coffee_shop, internet_cafe, cafe, breakfast_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | POINT (-83.95604 34.1434) |
| ChIJdZb3_d2S9YgRQ3Hq6b19URg | convenience_store, gas_station, coffee_shop, atm, public_bathroom, meal_takeaway, cafe, finance, food_store, restaurant, food, store, point_of_interest, establishment | 4809 Golden Pkwy, Buford, GA 30518, USA | 3.3 | 27 | PRICE_LEVEL_INEXPENSIVE | Circle K | en | coffee_shop | Coffee | convenience_store, gas_station, coffee_shop, atm, public_bathroom, meal_takeaway, cafe, finance, food_store, restaurant, food, store, point_of_interest, establishment | POINT (-83.95288 34.1435) |
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(buford) +
  tm_borders() +
  tm_shape(poi_sf_in) +
  tm_dots(
    shape = 21,
    col = "places.rating",         # fill color mapped to rating
    palette = "magma",             # color palette
    size = "places.userRatingCount",
    border.col = "black",          # outline color
    border.lwd = 0.5,              # outline thickness
    popup.vars = c(
      "Name" = "places.displayName.text",
      "Rating" = "places.rating",
      "Rating Count" = "places.userRatingCount"
    )
  )
## Legend for symbol sizes not available in view mode.

4. Explore and report findings

Write about at least four interesting observations you discovered (maximum 200 words). Include plots or maps if helpful. Example questions you might explore include:

  • What are the most noticeable differences between the two POI types?

Around the Mall of Georgia, bars outnumber coffee shops, reflecting its role as a shopping destination where dining and nightlife dominate. In contrast, Sugar Hill and Golden Parkway have more coffee shops than bars, serving as community hubs with stronger daytime and neighborhood interactions. Overall, coffee shops cluster near community centers while bars line major roads, highlighting the contrasting rhythms of suburban day-life and nightlife.

  • What is the average rating score?

poi_sf_in %>%
  summarise(avg_rating = mean(places.rating, na.rm = TRUE))
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOINT
## Dimension:     XY
## Bounding box:  xmin: -84.01263 ymin: 34.07283 xmax: -83.95242 ymax: 34.14479
## Geodetic CRS:  WGS 84
##   avg_rating                       geometry
## 1      4.025 MULTIPOINT ((-83.95288 34.1...

The average rating across the 24 POIs inside the city boundary is 4.025.
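Because poi_sf_in is an sf object, `summarise()` also unions the geometries, which is why the output above is a one-row MULTIPOINT feature rather than a plain number. A minimal sketch, on toy data with hypothetical values, of dropping the geometry first and comparing the two POI types in the same step (`toy_sf` and `avg_by_type` are illustrative names):

```r
library(sf)
library(dplyr)

# Toy sf object with made-up ratings (not the real Buford data)
toy_sf <- st_as_sf(
  data.frame(
    type          = c("Coffee", "Coffee", "Bar"),
    places.rating = c(4.0, 4.5, 3.5),
    x             = c(-83.99, -83.98, -84.00),
    y             = c(34.07, 34.08, 34.10)
  ),
  coords = c("x", "y"), crs = 4326
)

# Drop the geometry so summarise() returns an ordinary table
avg_by_type <- toy_sf %>%
  st_drop_geometry() %>%
  group_by(type) %>%
  summarise(avg_rating = mean(places.rating, na.rm = TRUE),
            n = n(),
            .groups = "drop")

avg_by_type
```

Reporting `n` next to each mean helps show how many places each average is based on.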

  • Does it seem related to the number of ratings?

The number of ratings is positively correlated with the rating score: places with more user ratings tend to have slightly higher ratings. The slope is shallow, so the effect is modest, but the trend is consistent. Most POIs cluster around 4.0 to 4.5 stars, regardless of the number of ratings.

## `geom_smooth()` using formula = 'y ~ x'
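The code behind the fitted line is not shown in the output. A minimal sketch of how such a plot could be produced, assuming a linear fit and a log-scaled x-axis (both are assumptions, not necessarily what was used here), with made-up values standing in for poi_sf_in:

```r
library(ggplot2)

# Made-up rating counts and scores (hypothetical, for illustration only)
toy <- data.frame(
  places.userRatingCount = c(15, 27, 302, 470, 520, 1076, 1456, 2019),
  places.rating          = c(3.1, 3.3, 4.4, 3.6, 4.3, 4.7, 4.1, 3.9)
)

# Scatter plot with a linear trend line; rating counts are right-skewed,
# so a log10 x-axis spreads the points out
p <- ggplot(toy, aes(x = places.userRatingCount, y = places.rating)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  labs(x = "Number of ratings (log scale)", y = "Rating")

# Rank correlation as a single-number summary of the association
rho <- cor(toy$places.userRatingCount, toy$places.rating, method = "spearman")

print(p)
rho
```

Spearman's rank correlation is used in this sketch because a few places with very many ratings would otherwise dominate a Pearson correlation.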

  • Is there an association between price level and rating score?

Places at the ‘moderate’ price level appear to have a higher average rating than places at the ‘inexpensive’ price level. The IQR of ratings is also larger for ‘inexpensive’ places than for ‘moderate’ ones, which suggests that ratings of inexpensive places are more spread out.
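The comparison above can be backed by a small summary table. A sketch on made-up ratings (the values and the `toy` and `price_summary` names are hypothetical) that reports the count, mean, median, and IQR for each price level:

```r
library(dplyr)

# Made-up ratings per price level (hypothetical, for illustration only)
toy <- tibble(
  places.priceLevel = c(rep("PRICE_LEVEL_INEXPENSIVE", 4),
                        rep("PRICE_LEVEL_MODERATE", 3)),
  places.rating     = c(3.0, 3.4, 4.0, 4.7, 4.0, 4.1, 4.8)
)

# One row per price level with centre and spread of the ratings
price_summary <- toy %>%
  group_by(places.priceLevel) %>%
  summarise(n             = n(),
            mean_rating   = mean(places.rating),
            median_rating = median(places.rating),
            iqr_rating    = IQR(places.rating),
            .groups = "drop")

price_summary
```

With only a few places per group, differences in mean or IQR are best read as suggestive rather than conclusive.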

Note: The questions above are only examples. Feel free to be creative!