Tidying Google POI data

R Code Steps

  1. Loading all the necessary R packages that will be used throughout the workflow
library(kableExtra)
library(tidycensus)
library(sf)
library(tmap)
library(jsonlite)
library(tidyverse)
library(httr)
library(reshape2)
library(here)
library(knitr)
  2. Importing and previewing POI data
poi <- readRDS("google_poi_data.rds")

poi %>% 
  select(-places.displayName.languageCode) %>% 
  head(5) %>% 
  kable()
| places.id | places.types | places.formattedAddress | places.rating | places.priceLevel | places.userRatingCount | places.location.latitude | places.location.longitude | places.displayName.text |
|---|---|---|---|---|---|---|---|---|
| ChIJhS0bgblh9IgRXsWcGEVHojQ | pharmacy, drugstore, point_of_interest, health, store, establishment | 1524 GA-16, Griffin, GA 30223, USA | 2.3 | PRICE_LEVEL_MODERATE | 100 | 33.24501 | -84.29424 | Kroger Pharmacy |
| ChIJYZw7t5KK9IgRaeKqe94IPwM | drugstore, convenience_store, food_store, clothing_store, food, point_of_interest, health, store, establishment | 1602 N Expy, Griffin, GA 30223, USA | 3.2 | PRICE_LEVEL_MODERATE | 92 | 33.27481 | -84.29009 | Walgreens |
| ChIJPaQo9xOL9IgRxL1mjhK7DKQ | pharmacy, point_of_interest, health, store, establishment | 1602 N Expy, Griffin, GA 30223, USA | 4.0 | NA | 1 | 33.27482 | -84.29017 | COVID-19 Drive-Thru Testing at Walgreens |
| ChIJfal1vJKK9IgRRTAUE1xcvg8 | pharmacy, point_of_interest, health, store, establishment | 1602 N Expy, Griffin, GA 30223, USA | 3.5 | PRICE_LEVEL_MODERATE | 20 | 33.27484 | -84.29031 | Walgreens Pharmacy |
| ChIJa-71qmGK9IgRGEKqF_wWEa4 | veterinary_care, pharmacy, point_of_interest, health, store, establishment | 656 N Expy, Griffin, GA 30223, USA | 4.8 | NA | 556 | 33.25984 | -84.28817 | Griffin Animal Care |
  3. Removing duplicated rows
poi_unique <- poi %>% distinct(places.id, .keep_all = TRUE)
glue::glue("Before dropping duplicated rows, there were {nrow(poi)} rows. After dropping them, there are {nrow(poi_unique)} rows.")
## Before dropping duplicated rows, there were 116 rows. After dropping them, there are 34 rows.
  4. Multiple variables in one column: splitting the ZIP code out of the formatted address
str_split_fixed(poi_unique$places.formattedAddress, pattern = "GA |, USA", n = 3) %>% head(10)
##       [,1]                                 [,2]    [,3]
##  [1,] "1524 GA-16, Griffin, "              "30223" ""  
##  [2,] "1602 N Expy, Griffin, "             "30223" ""  
##  [3,] "1602 N Expy, Griffin, "             "30223" ""  
##  [4,] "1602 N Expy, Griffin, "             "30223" ""  
##  [5,] "656 N Expy, Griffin, "              "30223" ""  
##  [6,] "1569 N Expy, Griffin, "             "30223" ""  
##  [7,] "1665 W McIntosh Rd, Griffin, "      "30223" ""  
##  [8,] "1523 A, 1523 Zebulon Rd, Griffin, " "30224" ""  
##  [9,] "1655 Zebulon Rd, Griffin, "         "30224" ""  
## [10,] "1655 Zebulon Rd, Griffin, "         "30224" ""
str_split_fixed(poi_unique$places.formattedAddress, pattern = "GA |, USA", n = 3) %>% .[,2]
##  [1] "30223" "30223" "30223" "30223" "30223" "30223" "30223" "30224" "30224"
## [10] "30224" "30224" "30223" "30224" "30223" "30223" "30223" "30223" "30224"
## [19] "30224" "30224" "30224" "30224" "30224" "30224" "30224" "30224" "30224"
## [28] "30223" "30224" "30224" "30224" "30224" "30266" "30224"
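
Splitting on "GA " works here because every address in the data is in Georgia. As a possible alternative, the sketch below extracts the ZIP code and the city with a regular expression instead. The two sample addresses are copied from the preview above; the names `addresses`, `zip`, and `city` are illustrative assumptions and not part of the original workflow.

```r
library(stringr)

# Sample addresses copied from the preview above
addresses <- c(
  "1524 GA-16, Griffin, GA 30223, USA",
  "1523 A, 1523 Zebulon Rd, Griffin, GA 30224, USA"
)

# Capture the five digits that follow the state abbreviation "GA ".
# "GA-16" is skipped because the pattern requires a space after "GA".
zip <- str_match(addresses, "GA (\\d{5})")[, 2]

# The city is the comma-separated field directly before "GA <ZIP>"
city <- str_match(addresses, ",\\s*([^,]+),\\s*GA \\d{5}")[, 2]

print(data.frame(city = city, zip = zip))
```

Anchoring on the five-digit pattern rather than on the split position should keep the result stable even when the street part of the address itself contains commas.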
  5. Working with semi-structured data: flatten/unnest list columns. First, identify which columns in the dataset are list columns
for (col in colnames(poi_unique)){
  # is.list() is safer than comparing class(), which can return several values
  if (is.list(poi_unique[[col]])){
    print(col)
  }
}
## [1] "places.types"
poi_unique$places.types[[1]]
## [1] "pharmacy"          "drugstore"         "point_of_interest"
## [4] "health"            "store"             "establishment"
poi_flat <- poi_unique %>%
  mutate(places.types = places.types %>% 
           map_chr(~ paste(.x, collapse = ",")))

head(poi_flat$places.types)
## [1] "pharmacy,drugstore,point_of_interest,health,store,establishment"                                        
## [2] "drugstore,convenience_store,food_store,clothing_store,food,point_of_interest,health,store,establishment"
## [3] "pharmacy,point_of_interest,health,store,establishment"                                                  
## [4] "pharmacy,point_of_interest,health,store,establishment"                                                  
## [5] "veterinary_care,pharmacy,point_of_interest,health,store,establishment"                                  
## [6] "pharmacy,point_of_interest,health,store,establishment"
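
Collapsing the types into one string is convenient for display, but it makes it harder to count or filter by individual type. A minimal sketch of two alternatives follows, assuming a toy tibble (`toy`, with made-up IDs) in place of `poi_unique`: `tidyr::unnest_longer()` gives one row per place type, and `purrr::map_lgl()` builds logical flags for the two types of interest.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for poi_unique; the IDs and type lists are hypothetical
toy <- tibble(
  places.id = c("a", "b"),
  places.types = list(
    c("pharmacy", "drugstore", "store"),
    c("drugstore", "convenience_store", "store")
  )
)

# Option 1: one row per (place, type) pair, then count each type
long <- toy %>% unnest_longer(places.types)
type_counts <- long %>% count(places.types, sort = TRUE)

# Option 2: keep one row per place and add a flag per type of interest
flags <- toy %>%
  mutate(
    is_pharmacy  = map_lgl(places.types, ~ "pharmacy" %in% .x),
    is_drugstore = map_lgl(places.types, ~ "drugstore" %in% .x)
  )

print(type_counts)
print(select(flags, places.id, is_pharmacy, is_drugstore))
```

Flags like these could later be mapped to color or shape if the pharmacy and drugstore categories need to be told apart on the map.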
  Note: the dataset has no list column containing more complex semi-structured data, such as place reviews, so no further unnesting is needed.

  6. Handling missing values: This dataset has missing values in places.priceLevel, places.rating, and places.userRatingCount. Of the 34 rows, at least 7 would be dropped if I deleted every row with a missing value. Because rating, price level, and rating count have no major bearing on the analysis of the spatial distribution of drugstores and pharmacies in Griffin, I chose to retain all rows in the data frame.

Print the current number of rows in the data frame:

print(paste0("Number of rows: ", nrow(poi_flat)))
## [1] "Number of rows: 34"
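
One way to check how many rows a drop-NA rule would remove is to count missing values per column and the rows with at least one missing value. This is a minimal sketch on a three-row toy data frame (`toy`) whose values are copied from rows shown in this report; on the real data the same calls would be applied to `poi_flat`.

```r
library(dplyr)

# Three rows copied from the tables in this report (Kroger Pharmacy and the
# two COVID-19 testing sites); `toy` is a stand-in for poi_flat
toy <- tibble(
  places.rating          = c(2.3, 4.0, NA),
  places.priceLevel      = c("PRICE_LEVEL_MODERATE", NA, NA),
  places.userRatingCount = c(100L, 1L, NA)
)

# Number of missing values in each column
na_per_column <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of rows that would be lost by dropping every incomplete row
rows_with_any_na <- sum(!complete.cases(toy))

print(na_per_column)
print(rows_with_any_na)
```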
  7. Filtering out POIs outside the city boundary:
griffin <- tigris::places("GA", progress_bar = FALSE) %>% 
  filter(NAME == 'Griffin') %>% 
  st_transform(4326)

poi_sf <- poi_flat %>% 
  st_as_sf(coords=c("places.location.longitude", "places.location.latitude"), 
           crs = 4326)

poi_sf_in <- st_filter(poi_sf, griffin)

print(paste0("Before: ", nrow(poi_sf)))
## [1] "Before: 34"
print(paste0("After: ", nrow(poi_sf_in)))
## [1] "After: 26"
glue::glue("number of rows before: {nrow(poi)} -> after: {nrow(poi_sf_in)} \n
number of columns before: {ncol(poi)} -> after: {ncol(poi_sf_in)} \n")
## number of rows before: 116 -> after: 26 
## 
## number of columns before: 10 -> after: 9
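
To make the spatial filter easier to follow, here is a small self-contained sketch of the same `st_filter()` call on made-up geometry. The square `boundary` and the two points are hypothetical and only stand in for the Griffin polygon and the POIs. By default `st_filter()` keeps the features of its first argument that intersect the second.

```r
library(sf)

# Hypothetical 1-degree square standing in for the city boundary polygon
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
boundary <- st_sf(name = "toy city", geometry = st_sfc(square, crs = 4326))

# Two hypothetical points: one inside the square, one outside
pts <- data.frame(
  id  = c("inside", "outside"),
  lon = c(0.5, 2.0),
  lat = c(0.5, 2.0)
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Both layers must share a CRS before filtering
stopifnot(st_crs(pts_sf) == st_crs(boundary))

# Keep only the points that intersect the boundary polygon
pts_in <- st_filter(pts_sf, boundary)

print(pts_in$id)
```

Converting the longitude and latitude columns into a single geometry column is also why the column count in the output above drops from 10 to 9.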

8. Visualize

tmap_mode("view")

tm_shape(griffin) + 
  tm_borders() + 
  tm_shape(poi_sf_in) + 
  tm_dots(shape = 21,
          col = "black", # if tmap v3, `border.col = "black"`
          lwd = 1, # if tmap v3, `border.lwd = 0.5`
          fill = "places.rating", # if tmap v3, `col = "places.rating"`
          fill.scale = tm_scale_continuous(values = "magma"), # if tmap v3, `palette = "magma"`
          size = "places.userRatingCount",
          popup.vars = c("Name" = "places.displayName.text",
                         "Rating" = "places.rating",
                         "Rating Count" = "places.userRatingCount"))

9. Showing cleaned POI data

kable(head(poi_flat, 10), booktabs = TRUE) %>%
  kable_styling(font_size = 8)
| places.id | places.types | places.formattedAddress | places.rating | places.priceLevel | places.userRatingCount | places.location.latitude | places.location.longitude | places.displayName.text | places.displayName.languageCode |
|---|---|---|---|---|---|---|---|---|---|
| ChIJhS0bgblh9IgRXsWcGEVHojQ | pharmacy,drugstore,point_of_interest,health,store,establishment | 1524 GA-16, Griffin, GA 30223, USA | 2.3 | PRICE_LEVEL_MODERATE | 100 | 33.24501 | -84.29424 | Kroger Pharmacy | en |
| ChIJYZw7t5KK9IgRaeKqe94IPwM | drugstore,convenience_store,food_store,clothing_store,food,point_of_interest,health,store,establishment | 1602 N Expy, Griffin, GA 30223, USA | 3.2 | PRICE_LEVEL_MODERATE | 92 | 33.27481 | -84.29009 | Walgreens | en |
| ChIJPaQo9xOL9IgRxL1mjhK7DKQ | pharmacy,point_of_interest,health,store,establishment | 1602 N Expy, Griffin, GA 30223, USA | 4.0 | NA | 1 | 33.27482 | -84.29017 | COVID-19 Drive-Thru Testing at Walgreens | en |
| ChIJfal1vJKK9IgRRTAUE1xcvg8 | pharmacy,point_of_interest,health,store,establishment | 1602 N Expy, Griffin, GA 30223, USA | 3.5 | PRICE_LEVEL_MODERATE | 20 | 33.27484 | -84.29031 | Walgreens Pharmacy | en |
| ChIJa-71qmGK9IgRGEKqF_wWEa4 | veterinary_care,pharmacy,point_of_interest,health,store,establishment | 656 N Expy, Griffin, GA 30223, USA | 4.8 | NA | 556 | 33.25984 | -84.28817 | Griffin Animal Care | en |
| ChIJw3LmIvKK9IgR9UMqBDlCXdc | pharmacy,point_of_interest,health,store,establishment | 1569 N Expy, Griffin, GA 30223, USA | 3.1 | PRICE_LEVEL_INEXPENSIVE | 50 | 33.27241 | -84.29667 | Walmart Pharmacy | en |
| ChIJx1ZtzfGK9IgRf1vDkLjfwRU | convenience_store,discount_store,gift_shop,drugstore,grocery_store,food_store,food,point_of_interest,health,store,establishment | 1665 W McIntosh Rd, Griffin, GA 30223, USA | 4.1 | PRICE_LEVEL_INEXPENSIVE | 435 | 33.27425 | -84.29778 | Dollar General | en |
| ChIJ6wFJIfCJ9IgR7C6W0dAjS-U | convenience_store,gift_shop,drugstore,discount_store,health,point_of_interest,grocery_store,food_store,food,store,establishment | 1523 A, 1523 Zebulon Rd, Griffin, GA 30224, USA | 4.0 | PRICE_LEVEL_INEXPENSIVE | 264 | 33.20417 | -84.28223 | Dollar General | en |
| ChIJVVWVqumJ9IgRPKkycm6SCEs | pharmacy,health,point_of_interest,store,establishment | 1655 Zebulon Rd, Griffin, GA 30224, USA | NA | NA | NA | 33.19787 | -84.28326 | COVID-19 Drive-Thru Testing at Walgreens | en |
| ChIJrTcFjsSJ9IgRoVIlWlCcy8I | pharmacy,health,point_of_interest,store,establishment | 1655 Zebulon Rd, Griffin, GA 30224, USA | 4.4 | NA | 18 | 33.19787 | -84.28326 | Walgreens Pharmacy | en |

Questions & Answers

1. What are the most noticeable differences between the two POI types?
Answer: The map itself does not clearly distinguish between pharmacies and drugstores, because in the dataset these POIs are tagged with several other place types as well, often because they sit inside larger commercial complexes. As a result, the most noticeable differences are not between “pharmacy” and “drugstore,” but in how individual POIs vary in rating (color), popularity (circle size), and spatial distribution across Griffin.
2. What is the average rating score? Does it seem related to the number of ratings?
Answer: Most ratings cluster around 3.5 to 4.5, so the average is roughly in that range. The relationship to the number of ratings is weak: larger circles are not always lighter (high rating) or darker (low rating). This suggests that more popular places don’t necessarily have higher scores.
3. Is there an association between price level and rating score?
Answer: The price level column has many missing values, so I chose not to visualize it on the map: there was too little data to suggest any meaningful pattern.
4. Do POIs tend to cluster in specific neighborhoods, or are they spread evenly across the city?
Answer: They cluster strongly in central Griffin. A few appear along major roads or on the city’s edge, but the spread is uneven. Pharmacies and drugstores are concentrated in the center, probably the downtown area, and become sparsely distributed toward the suburbs.
5. If you had to choose one POI to visit based on the dataset, which would you pick and why?
Answer: I’d pick one of the larger, lighter-colored circles in the city center because they represent POIs with both many reviews (indicating reliability) and higher ratings (indicating quality). That combination suggests a place that is both popular and well-liked.
6. Is accessibility influenced by major thoroughfares in the city?
Answer: The clustering of POIs along expressways is likely not random. Besides car accessibility, these locations benefit from high visibility to passing traffic, easier logistics for deliveries, and the tendency for commercial zoning to allow larger complexes at major road intersections. Such corridors attract chain drugstores and pharmacies that rely on steady commuter flows rather than walk-in neighborhood customers.
7. Are there under-served areas in Griffin with no nearby POIs?
Answer: Large parts of the western and southern city limits show no POIs, suggesting gaps in service coverage.
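
The answer to question 2 reads the average rating and its relationship with the rating count off the map. As a rough numeric check, the sketch below computes the mean rating and a Spearman rank correlation. The values are copied from the ten previewed rows of the cleaned table only, not from the full filtered dataset, so the result is illustrative; on the real data the same calls would use `poi_sf_in$places.rating` and `poi_sf_in$places.userRatingCount`.

```r
# Ratings and rating counts copied from the ten previewed rows of poi_flat
# (not the full filtered dataset, so the numbers are illustrative only)
rating <- c(2.3, 3.2, 4.0, 3.5, 4.8, 3.1, 4.1, 4.0, NA, 4.4)
rating_count <- c(100, 92, 1, 20, 556, 50, 435, 264, NA, 18)

# Average rating, ignoring the POI with no rating
mean_rating <- mean(rating, na.rm = TRUE)

# Rank correlation between rating and number of ratings; Spearman is used
# because the rating counts are heavily skewed
rho <- cor(rating, rating_count, use = "complete.obs", method = "spearman")

print(round(mean_rating, 2))
print(round(rho, 2))
```

A correlation near zero would support the reading that popularity and rating are only weakly related, though with so few POIs any estimate is very uncertain.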