Load the Google Places POI data you downloaded for Mini-Assignment 1. As a reminder, state the city you selected and the two POI types you chose in the previous assignment.
I chose ‘coffee_shop’ and ‘bar’ in the city of Buford, Georgia, for my assignment.
coffee_all <- readRDS(here::here("buford_poi_coffee_shops.rds"))
bar_all <- readRDS(here::here("buford_poi_bars.rds"))
# City boundary
buford <- tigris::places('GA') %>% filter(NAME == 'Buford')
## Retrieving data for the year 2022
# Convert the data to an sf object using XY coordinates
coffee_all_sf <- coffee_all %>%
rename(x = places.location.longitude, y = places.location.latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
bar_all_sf <- bar_all %>%
rename(x = places.location.longitude, y = places.location.latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
coffee_all_sf$type <- "Coffee"
bar_all_sf$type <- "Bar"
poi_sf <- rbind(coffee_all_sf, bar_all_sf)
# Map
tm_shape(poi_sf) +
tm_dots(col = "type",
size = "places.userRatingCount",
palette = c("Coffee" = "brown", "Bar" = "darkblue"),
border.lwd=0.5,
popup.vars = c("Name" = "places.displayName.text",
"Address" = "places.formattedAddress",
"Rating" = "places.rating",
"Rating Count" = "places.userRatingCount",
"Price Level" = "places.priceLevel")) +
tm_shape(buford) +
tm_borders()
Remove duplicated rows. Show how the number of rows has changed after removing.
Flatten/unnest list-columns. Collapse the places.types column so that each element contains a single string value. If your data includes list-columns other than places.types, handle them appropriately while ensuring each row still represents a unique POI.
Handle missing values. Remove rows with NA values in columns that you consider important. Explain your reasoning. Report how many rows remain after this step.
Filter by location. Remove rows that fall outside the city boundary. Show how the number of rows changes after filtering.
# Check the number of rows before removing duplicates
print(paste("Previous number:", nrow(poi_sf)))
## [1] "Previous number: 60"
# Keep only the first occurrence of each "ID"
poi_sf_clean <- poi_sf[!duplicated(poi_sf$places.id), ]
print(paste("Cleaned row:", nrow(poi_sf_clean)))
## [1] "Cleaned row: 60"
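The count stays at 60 because no `places.id` value is repeated in this data set. To show what the same step does when a place is returned by both searches, here is a minimal sketch on a hypothetical toy data frame (the IDs and types below are made up, not taken from the Buford data):

```r
# Toy data frame with one repeated ID (hypothetical values)
toy <- data.frame(
  places.id = c("a", "b", "a", "c"),
  type      = c("Coffee", "Bar", "Bar", "Bar"),
  stringsAsFactors = FALSE
)

n_before <- nrow(toy)

# duplicated() flags the second and later occurrences of each ID,
# so negating it keeps only the first occurrence
toy_clean <- toy[!duplicated(toy$places.id), ]
n_after <- nrow(toy_clean)

print(paste("Rows before:", n_before,
            "| after:", n_after,
            "| removed:", n_before - n_after))
```

Because only the first occurrence is kept, a place found under both searches keeps the `type` label of whichever data set was bound first.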
glimpse(poi_sf_clean)
## Rows: 60
## Columns: 11
## $ places.id <chr> "ChIJ02mJktiV9YgRM_efrnJN5TU", "ChIJz1…
## $ places.types <list> <"coffee_shop", "cafe", "point_of_int…
## $ places.formattedAddress <chr> "1600 Mall of Georgia Blvd, Buford, GA…
## $ places.rating <dbl> 4.8, 3.2, 4.0, 4.5, 3.5, 1.7, 3.9, 4.3…
## $ places.userRatingCount <int> 21, 2443, 632, 270, 1272, 15, 306, 520…
## $ places.priceLevel <chr> NA, "PRICE_LEVEL_INEXPENSIVE", "PRICE_…
## $ places.displayName.text <chr> "Brush n’ Bean (Inside PAINTED TREE BO…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
## $ source_type <chr> "coffee_shop", "coffee_shop", "coffee_…
## $ type <chr> "Coffee", "Coffee", "Coffee", "Coffee"…
## $ geometry <POINT [°]> POINT (-83.99489 34.06405), POIN…
# Flatten / unnest list-columns (places.types)
poi_sf_flt <- poi_sf_clean %>%
mutate(places.types.unnest = places.types %>%
map_chr(., ~str_c(.x, collapse=", ")))
glimpse(poi_sf_flt)
## Rows: 60
## Columns: 12
## $ places.id <chr> "ChIJ02mJktiV9YgRM_efrnJN5TU", "ChIJz1…
## $ places.types <list> <"coffee_shop", "cafe", "point_of_int…
## $ places.formattedAddress <chr> "1600 Mall of Georgia Blvd, Buford, GA…
## $ places.rating <dbl> 4.8, 3.2, 4.0, 4.5, 3.5, 1.7, 3.9, 4.3…
## $ places.userRatingCount <int> 21, 2443, 632, 270, 1272, 15, 306, 520…
## $ places.priceLevel <chr> NA, "PRICE_LEVEL_INEXPENSIVE", "PRICE_…
## $ places.displayName.text <chr> "Brush n’ Bean (Inside PAINTED TREE BO…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
## $ source_type <chr> "coffee_shop", "coffee_shop", "coffee_…
## $ type <chr> "Coffee", "Coffee", "Coffee", "Coffee"…
## $ geometry <POINT [°]> POINT (-83.99489 34.06405), POIN…
## $ places.types.unnest <chr> "coffee_shop, cafe, point_of_interest,…
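The `glimpse()` output shows that the original `places.types` list-column is still present next to the new character column. If the goal is a table with no list-columns left, one option is to drop the original after collapsing it. The sketch below uses base R (`vapply()` and `paste()`) as a stand-in for the `map_chr()`/`str_c()` call above, on hypothetical toy values:

```r
# Toy data with a list-column (hypothetical values)
toy <- data.frame(places.id = c("a", "b"), stringsAsFactors = FALSE)
toy$places.types <- list(c("coffee_shop", "cafe"), c("bar"))

# Collapse each list element into one comma-separated string;
# vapply() guarantees exactly one character value per row
toy$places.types.flat <- vapply(
  toy$places.types,
  function(x) paste(x, collapse = ", "),
  character(1)
)

# Drop the original list-column so every column is atomic
toy$places.types <- NULL

# Check that no list-columns remain
print(vapply(toy, is.list, logical(1)))
```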
# Handle missing values
# Drop rows with NA in places.rating or places.priceLevel: I use both as indicators
# of the quality of a place, so rows missing either one cannot be compared.
poi_drop_na <- poi_sf_flt %>%
filter(!is.na(places.rating)) %>%
filter(!is.na(places.priceLevel))
# city boundary
buford <- tigris::places("GA", progress_bar = FALSE) %>%
filter(NAME == 'Buford') %>%
st_transform(4326)
## Retrieving data for the year 2022
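The missing-value step also asks for the number of rows that remain. A minimal sketch of how that could be reported, including how many NAs each column contributes, is shown here on a hypothetical toy data frame (the values are made up; only the column names follow the data above):

```r
# Toy data with missing values (hypothetical)
toy <- data.frame(
  places.rating     = c(4.5, NA, 3.9, 4.1),
  places.priceLevel = c("PRICE_LEVEL_MODERATE", "PRICE_LEVEL_INEXPENSIVE",
                        NA, "PRICE_LEVEL_INEXPENSIVE"),
  stringsAsFactors = FALSE
)

# Number of NA values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep rows that are complete in the two columns of interest
keep <- complete.cases(toy[, c("places.rating", "places.priceLevel")])
toy_complete <- toy[keep, ]

print(paste("Rows before:", nrow(toy), "| after:", nrow(toy_complete)))
```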
# poi_drop_na is already an sf object in EPSG:4326 (the coordinates were converted
# to a geometry column earlier), so no second st_as_sf() call is needed
poi_sf <- poi_drop_na
# keep only POIs inside boundary
poi_sf_in <- poi_sf[buford, ]
print(paste0("Before: ", nrow(poi_sf)))
## [1] "Before: 45"
print(paste0("After: ", nrow(poi_sf_in)))
## [1] "After: 24"
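Subsetting an sf object with another sf object, as in `poi_sf[buford, ]`, keeps the rows whose geometry intersects the boundary, and it assumes both layers share the same CRS. The sketch below illustrates this on a hypothetical square boundary and two made-up points; it is a toy example, not the Buford data:

```r
library(sf)

# Hypothetical square "city" boundary in EPSG:4326
boundary <- st_sf(
  NAME = "Toy city",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 4326
  )
)

# Two hypothetical points: one inside the square, one outside
pts <- st_as_sf(
  data.frame(id = c("inside", "outside"), x = c(0.5, 2), y = c(0.5, 2)),
  coords = c("x", "y"),
  crs = 4326
)

# The spatial subset is only meaningful when the CRS matches
stopifnot(st_crs(pts) == st_crs(boundary))

# Keep only the points that intersect the boundary polygon
pts_in <- pts[boundary, ]

print(paste0("Before: ", nrow(pts)))
print(paste0("After: ", nrow(pts_in)))
```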
poi_sf_in %>%
slice(1:10) %>%
kable()
| places.id | places.types | places.formattedAddress | places.rating | places.userRatingCount | places.priceLevel | places.displayName.text | places.displayName.languageCode | source_type | type | places.types.unnest | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ChIJdZp2eAaV9YgRqYKODOS3Yt8 | coffee_shop , cafe , dessert_shop , confectionery , tea_house , food_store , food , point_of_interest, store , establishment | 3480 Financial Center Way Ste M1000, Buford, GA 30519, USA | 4.3 | 520 | PRICE_LEVEL_INEXPENSIVE | Kung Fu Tea | en | coffee_shop | Coffee | coffee_shop, cafe, dessert_shop, confectionery, tea_house, food_store, food, point_of_interest, store, establishment | POINT (-83.98664 34.0742) |
| ChIJG0jmzmeU9YgRrRsawq-ISz0 | bagel_shop , coffee_shop , breakfast_restaurant, bakery , cafe , food_store , restaurant , food , point_of_interest , store , establishment | 3410 Buford Dr Ste G400, Buford, GA 30519, USA | 4.0 | 550 | PRICE_LEVEL_INEXPENSIVE | Einstein Bros. Bagels | en | coffee_shop | Coffee | bagel_shop, coffee_shop, breakfast_restaurant, bakery, cafe, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98455 34.07575) |
| ChIJOYMTz2eU9YgR4HdekqpEB_o | coffee_shop , breakfast_restaurant, internet_cafe , cafe , food_store , restaurant , food , point_of_interest , store , establishment | 3380 Buford Dr, Buford, GA 30519, USA | 4.1 | 1456 | PRICE_LEVEL_MODERATE | Starbucks | en | coffee_shop | Coffee | coffee_shop, breakfast_restaurant, internet_cafe, cafe, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98466 34.07435) |
| ChIJaevs7WeU9YgR2cekuffZWKQ | donut_shop , fast_food_restaurant, coffee_shop , bakery , cafe , dessert_shop , confectionery , food_store , restaurant , food , point_of_interest , store , establishment | 3387 Buford Dr, Buford, GA 30519, USA | 3.4 | 2019 | PRICE_LEVEL_INEXPENSIVE | Krispy Kreme | en | coffee_shop | Coffee | donut_shop, fast_food_restaurant, coffee_shop, bakery, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-83.98342 34.07361) |
| ChIJb3rrIFOV9YgRGXF84KHZKgI | coffee_shop , donut_shop , fast_food_restaurant, breakfast_restaurant, bagel_shop , bakery , cafe , food_store , meal_takeaway , restaurant , food , point_of_interest , store , establishment | 3687 Buford Dr, Buford, GA 30519, USA | 3.0 | 470 | PRICE_LEVEL_INEXPENSIVE | Dunkin’ | en | coffee_shop | Coffee | coffee_shop, donut_shop, fast_food_restaurant, breakfast_restaurant, bagel_shop, bakery, cafe, food_store, meal_takeaway, restaurant, food, point_of_interest, store, establishment | POINT (-83.98571 34.0806) |
| ChIJg0L-uwCU9YgRzouIV8k3O6E | fast_food_restaurant, hamburger_restaurant, sandwich_shop , coffee_shop , cafe , breakfast_restaurant, american_restaurant , restaurant , point_of_interest , food_store , food , store , establishment | 4358 Buford Dr, Buford, GA 30518, USA | 3.3 | 1675 | PRICE_LEVEL_INEXPENSIVE | McDonald’s | en | coffee_shop | Coffee | fast_food_restaurant, hamburger_restaurant, sandwich_shop, coffee_shop, cafe, breakfast_restaurant, american_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | POINT (-84.01207 34.09448) |
| ChIJUxfnTEiW9YgR0PpGO36pt88 | bakery , coffee_shop , breakfast_restaurant, cafe , dessert_shop , confectionery , food_store , restaurant , food , point_of_interest , store , establishment | 4360 S Lee St, Buford, GA 30518, USA | 4.7 | 1076 | PRICE_LEVEL_INEXPENSIVE | The Baking Grounds Bakery Cafe | en | coffee_shop | Coffee | bakery, coffee_shop, breakfast_restaurant, cafe, dessert_shop, confectionery, food_store, restaurant, food, point_of_interest, store, establishment | POINT (-84.00249 34.10295) |
| ChIJmeE8IsOT9YgRF8O3ZhqP5wg | cafe , coffee_shop , food_store , store , food , point_of_interest, establishment | 179 E Moreno St Suite C, Buford, GA 30518, USA | 4.8 | 302 | PRICE_LEVEL_MODERATE | Tchin Tchin Coffee | en | coffee_shop | Coffee | cafe, coffee_shop, food_store, store, food, point_of_interest, establishment | POINT (-84.00355 34.12005) |
| ChIJ6fil2cGT9YgRWJtFTjI5xQo | coffee_shop , internet_cafe , cafe , breakfast_restaurant, restaurant , point_of_interest , food_store , food , store , establishment | 4942 Bristol Industrial Way, Buford, GA 30518, USA | 4.0 | 643 | PRICE_LEVEL_MODERATE | Starbucks | en | coffee_shop | Coffee | coffee_shop, internet_cafe, cafe, breakfast_restaurant, restaurant, point_of_interest, food_store, food, store, establishment | POINT (-83.95604 34.1434) |
| ChIJdZb3_d2S9YgRQ3Hq6b19URg | convenience_store, gas_station , coffee_shop , atm , public_bathroom , meal_takeaway , cafe , finance , food_store , restaurant , food , store , point_of_interest, establishment | 4809 Golden Pkwy, Buford, GA 30518, USA | 3.3 | 27 | PRICE_LEVEL_INEXPENSIVE | Circle K | en | coffee_shop | Coffee | convenience_store, gas_station, coffee_shop, atm, public_bathroom, meal_takeaway, cafe, finance, food_store, restaurant, food, store, point_of_interest, establishment | POINT (-83.95288 34.1435) |
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(buford) +
tm_borders() +
tm_shape(poi_sf_in) +
tm_dots(
shape = 21,
col = "places.rating", # fill color mapped to rating
palette = "magma", # color palette
size = "places.userRatingCount",
border.col = "black", # outline color
border.lwd = 0.5, # outline thickness
popup.vars = c(
"Name" = "places.displayName.text",
"Rating" = "places.rating",
"Rating Count" = "places.userRatingCount"
)
)
## Legend for symbol sizes not available in view mode.
Around the Mall of Georgia, bars outnumber coffee shops, reflecting its role as a shopping destination where dining and nightlife dominate. In contrast, Sugar Hill and Golden Parkway have more coffee shops than bars, serving as community hubs with stronger daytime and neighborhood interactions. Overall, coffee shops cluster near community centers while bars line major roads, highlighting the contrasting rhythms of suburban day-life and nightlife.
poi_sf_in %>%
summarise(avg_rating = mean(places.rating, na.rm = TRUE))
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOINT
## Dimension: XY
## Bounding box: xmin: -84.01263 ymin: 34.07283 xmax: -83.95242 ymax: 34.14479
## Geodetic CRS: WGS 84
## avg_rating geometry
## 1 4.025 MULTIPOINT ((-83.95288 34.1...
The average rating of the 24 POIs inside the city boundary is 4.025.
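Calling `summarise()` on an sf object also combines the point geometries, which is why the output above carries a MULTIPOINT column. When only the number is needed, the rating column can be averaged directly (for an sf object, `sf::st_drop_geometry()` could be applied first). The sketch below uses a plain toy data frame with hypothetical values and also splits the mean by POI type:

```r
# Toy ratings (hypothetical values)
toy <- data.frame(
  type          = c("Coffee", "Coffee", "Bar", "Bar"),
  places.rating = c(4.0, 4.5, 3.5, NA),
  stringsAsFactors = FALSE
)

# Overall mean, ignoring missing ratings
overall <- mean(toy$places.rating, na.rm = TRUE)

# Mean rating per POI type
by_type <- tapply(toy$places.rating, toy$type, mean, na.rm = TRUE)

print(overall)
print(by_type)
```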
The number of ratings is positively correlated with the average rating: POIs with more user ratings tend to have slightly higher average ratings. The slope is not steep, so the effect is modest but consistent. Most POIs cluster around 4.0-4.5 stars, regardless of the number of ratings.
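One way to put a number on this relationship is a rank correlation together with the slope of a simple linear fit. The sketch below is an assumption about how that could be done, not the code behind the plot; the toy values are hypothetical, and the log scale is a choice made here because rating counts are strongly right-skewed:

```r
# Toy data (hypothetical values)
toy <- data.frame(
  places.userRatingCount = c(20, 150, 400, 900, 1500, 2400),
  places.rating          = c(3.8, 4.0, 3.9, 4.2, 4.3, 4.4)
)

# Spearman rank correlation is robust to the skew in rating counts
rho <- cor(toy$places.userRatingCount, toy$places.rating,
           method = "spearman")

# Linear fit of rating on log10(rating count);
# the second coefficient is the slope
fit <- lm(places.rating ~ log10(places.userRatingCount), data = toy)
slope <- coef(fit)[[2]]

print(rho)
print(slope)
```

A positive `rho` with a small `slope` would match the description above: a consistent but modest increase in rating as the number of ratings grows.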
## `geom_smooth()` using formula = 'y ~ x'
POIs at the ‘moderate’ price level appear to receive a higher average rating than those at the ‘inexpensive’ price level. The IQR for ‘inexpensive’ is larger than that for ‘moderate’, which suggests that ratings vary more among inexpensive places than among moderate ones.
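The comparison read off the box plot can also be tabulated. The following sketch computes the median and IQR of the rating per price level on a hypothetical toy data frame (made-up values; only the column names and price-level labels follow the data above):

```r
# Toy data (hypothetical values)
toy <- data.frame(
  places.priceLevel = c(rep("PRICE_LEVEL_INEXPENSIVE", 4),
                        rep("PRICE_LEVEL_MODERATE", 4)),
  places.rating     = c(3.0, 3.4, 4.0, 4.6, 4.0, 4.1, 4.2, 4.3),
  stringsAsFactors = FALSE
)

# Median and interquartile range of the rating per price level
med <- tapply(toy$places.rating, toy$places.priceLevel, median)
iqr <- tapply(toy$places.rating, toy$places.priceLevel, IQR)

spread <- data.frame(
  price_level = names(med),
  median      = as.vector(med),
  iqr         = as.vector(iqr)
)
print(spread)
```

The IQR describes the spread of the middle half of the ratings, so it is less sensitive to a few extreme values than the variance.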
Note: The questions above are only examples–feel free to be creative!