1. Tidying POI Data

1.1 Import data

library(tidyverse) #read_rds(), dplyr, tidyr, purrr, stringr
poi <- read_rds("duluth_google_poi_data.rds")

1.2 Tidy data

1.2.1 Remove duplicated rows

poi_unique <- poi %>% distinct(places.id, .keep_all = TRUE)

Show how the number of rows has changed after removing duplicates.

glue::glue("Before dropping duplicated rows, there were {nrow(poi)} rows. After dropping them, there are {nrow(poi_unique)} rows.")

## Before dropping duplicated rows, there were 161 rows. After dropping them, there are 70 rows.
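
As a minimal sketch of what `distinct()` does in this step, the toy table below (made-up IDs and names, not the Duluth data) keeps only the first row for each `places.id`, while `.keep_all = TRUE` retains the remaining columns.

```r
library(dplyr)

# Hypothetical toy data: place "a" appears twice
toy <- tibble(
  places.id = c("a", "b", "a", "c"),
  name      = c("Cafe A", "Cafe B", "Cafe A (repeat)", "Cafe C")
)

# Keep the first row for each places.id; .keep_all = TRUE retains the other columns
toy_unique <- toy %>% distinct(places.id, .keep_all = TRUE)

nrow(toy)         # 4
nrow(toy_unique)  # 3
```

Because only `places.id` is passed to `distinct()`, two rows count as duplicates whenever their IDs match, even if other columns differ.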

1.2.2 Flatten/unnest list-columns

#Collapse the places.types column so that each element contains a single string value
poi_flat <- poi_unique %>%
  mutate(places.types = map_chr(places.types, ~ str_c(.x, collapse = ",")))
#Check that no list-columns remain
glimpse(poi_flat)

## Rows: 70
## Columns: 10
## $ places.id                       <chr> "ChIJdxvWExq99YgReZJFbpnZ1xo", "ChIJRe…
## $ places.types                    <chr> "bakery,coffee_shop,wholesaler,cafe,fo…
## $ places.formattedAddress         <chr> "1290 Old Peachtree Rd NW, Duluth, GA …
## $ places.rating                   <dbl> 4.7, 3.8, 4.3, 4.2, 4.1, 3.7, 3.6, NA,…
## $ places.priceLevel               <chr> "PRICE_LEVEL_MODERATE", "PRICE_LEVEL_M…
## $ places.userRatingCount          <int> 264, 224, 573, 314, 842, 931, 14, NA, …
## $ places.location.latitude        <dbl> 34.00390, 33.99795, 34.00381, 34.00515…
## $ places.location.longitude       <dbl> -84.08548, -84.08994, -84.08333, -84.0…
## $ places.displayName.text         <chr> "Paris Baguette", "Starbucks", "Hansel…
## $ places.displayName.languageCode <chr> "en", "en", "en", "en", "en", "en", "e…
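
The flattening step can be illustrated on a small example. The sketch below uses a hypothetical two-row table (not the Duluth data) and also shows one possible programmatic check for leftover list-columns, as an alternative to reading the `glimpse()` output by eye.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical toy data with a list-column of type tags
toy <- tibble(
  places.id    = c("a", "b"),
  places.types = list(c("cafe", "bakery"), "restaurant")
)

# Collapse each list element into one comma-separated string
toy_flat <- toy %>%
  mutate(places.types = map_chr(places.types, ~ str_c(.x, collapse = ",")))

toy_flat$places.types
# "cafe,bakery" "restaurant"

# Names of any remaining list-columns (empty if everything is flat)
list_cols <- names(toy_flat)[map_lgl(toy_flat, is.list)]
```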

1.2.3 Handle missing values

#Drop rows that have missing values in columns: rating, rating count, price level
poi_dropna <- poi_flat %>% 
  drop_na(c(places.rating, places.userRatingCount, places.priceLevel))

Explain your reasoning

I checked “Rating”, “Rating Count”, and “Price Level” for NA values because these three fields carry the most useful information beyond a POI's location and name.

Report how many rows remain after dropping missing values

glue::glue("Before further dropping rows with missing values, there were {nrow(poi_flat)} rows. After dropping them, there are {nrow(poi_dropna)} rows.")

## Before further dropping rows with missing values, there were 70 rows. After dropping them, there are 41 rows.
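
Before dropping rows, it can help to see how many values are missing in each column. The sketch below does this on a hypothetical four-row table (the values are made up, only the column names follow the dataset) and then applies the same `drop_na()` logic.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with a few missing values
toy <- tibble(
  places.rating          = c(4.7, NA, 4.3, 3.9),
  places.userRatingCount = c(264L, NA, 573L, 12L),
  places.priceLevel      = c("PRICE_LEVEL_MODERATE", NA, NA, "PRICE_LEVEL_INEXPENSIVE")
)

# Number of missing values in each column
na_counts <- toy %>% summarise(across(everything(), ~ sum(is.na(.x))))

# A row is dropped if it is missing a value in any of the three columns
toy_dropna <- toy %>%
  drop_na(places.rating, places.userRatingCount, places.priceLevel)

nrow(toy_dropna)  # 2
```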

1.2.4 Filter by location

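library(sf) #provides st_transform(), st_as_sf(), and spatial subsetting
#Remove rows that fall outside the city boundary
#City of Duluth boundary (Census place polygon from tigris), reprojected to WGS84 to match the POI coordinates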
duluth <- tigris::places('GA', progress_bar = FALSE) %>% 
  filter(NAME == 'Duluth') %>%
  st_transform(4326)

#Converting poi_dropna into a sf object
poi_sf <- poi_dropna %>% 
  st_as_sf(coords=c("places.location.longitude", "places.location.latitude"), 
           crs = 4326)

#POIs within the City of Duluth boundary
poi_sf_in <- poi_sf[duluth, ]

#Create a separate column holding the restaurant type
#(brunch takes precedence when a place is tagged as both brunch and cafe)
poi_sf_in$type <- NA_character_
for (i in seq_len(nrow(poi_sf_in))) {
  if (grepl("brunch_restaurant", poi_sf_in$places.types[i])) {
    poi_sf_in$type[i] <- "brunch"
  } else if (grepl("cafe", poi_sf_in$places.types[i])) {
    poi_sf_in$type[i] <- "cafe"
  }
}

Show how the number of rows changes after filtering

glue::glue("Before further dropping rows outside of City of Duluth Boundary, there were {nrow(poi_sf)} rows. After dropping them, there are {nrow(poi_sf_in)} rows.")

## Before further dropping rows outside of City of Duluth Boundary, there were 41 rows. After dropping them, there are 17 rows.
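
The spatial filter relies on sf's bracket subsetting, where `x[y, ]` keeps the features of `x` that intersect `y`. The sketch below shows the idea with a hypothetical square boundary and three made-up points; none of these coordinates come from the Duluth data.

```r
library(sf)

# Hypothetical boundary: a 1-degree square in WGS84 (EPSG:4326)
boundary <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
  crs = 4326
)

# Hypothetical points: two inside the square, one outside
pts <- st_as_sf(
  data.frame(id  = c("in_1", "in_2", "out_1"),
             lon = c(0.2, 0.8, 1.5),
             lat = c(0.3, 0.6, 0.5)),
  coords = c("lon", "lat"), crs = 4326
)

# x[y, ] keeps the features of x that intersect y
pts_in <- pts[boundary, ]
pts_in$id
# "in_1" "in_2"
```

Both layers need to share a coordinate reference system for this to work, which is why the boundary and the points are both set to EPSG:4326.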

1.3 Show cleaned POI data

Show how the number of rows changes after the entire cleaning process

glue::glue("Before cleaning POIs for duplicates, missing values, and locations outside the city boundary, there were {nrow(poi)} rows. After dropping them, there are {nrow(poi_sf_in)} rows.")

## Before cleaning POIs for duplicates, missing values, and locations outside the city boundary, there were 161 rows. After dropping them, there are 17 rows.

Print the first 10 rows of the final dataset

#All 17 rows are printed here; use head(poi_sf_in, 10) to show only the first 10
knitr::kable(poi_sf_in)

	places.id	places.types	places.formattedAddress	places.rating	places.priceLevel	places.userRatingCount	places.displayName.text	places.displayName.languageCode	geometry	type
10	ChIJaxQeiA6i9YgRwPOZ-xnbinw	coffee_shop,bagel_shop,fast_food_restaurant,donut_shop,breakfast_restaurant,cafe,bakery,meal_takeaway,food_store,store,restaurant,food,point_of_interest,establishment	3435 Peachtree Industrial Blvd, Duluth, GA 30096, USA	3.8	PRICE_LEVEL_INEXPENSIVE	601	Dunkin’	en	POINT (-84.17066 34.00697)	cafe
11	ChIJjy-rqDGi9YgRbfpdeqcGduc	fast_food_restaurant,hamburger_restaurant,sandwich_shop,breakfast_restaurant,coffee_shop,cafe,american_restaurant,food_store,store,restaurant,food,point_of_interest,establishment	3485 Peachtree Industrial Blvd, Duluth, GA 30096, USA	3.3	PRICE_LEVEL_INEXPENSIVE	1681	McDonald’s	en	POINT (-84.17092 34.00534)	cafe
13	ChIJMTmMjMWZ9YgRl9bvKt7CjuM	asian_restaurant,sandwich_shop,tea_house,coffee_shop,cafe,vietnamese_restaurant,seafood_restaurant,food_store,store,restaurant,food,point_of_interest,establishment	3095 Peachtree Industrial Blvd #120, Duluth, GA 30097, USA	4.3	PRICE_LEVEL_INEXPENSIVE	173	Lobster Banh Mi	en	POINT (-84.15758 34.02283)	cafe
14	ChIJFSqINDmj9YgRYppWyEyuU2c	donut_shop,fast_food_restaurant,breakfast_restaurant,coffee_shop,cafe,dessert_shop,bakery,confectionery,meal_takeaway,food_store,store,restaurant,food,point_of_interest,establishment	4165 Pleasant Hill Rd, Duluth, GA 30096, USA	5.0	PRICE_LEVEL_INEXPENSIVE	2	Shipley Do-Nuts	en	POINT (-84.17116 34.00384)	cafe
15	ChIJqUgf4xGi9YgRehwcXtzp5HE	coffee_shop,breakfast_restaurant,internet_cafe,cafe,food_store,store,restaurant,food,point_of_interest,establishment	3501 Peachtree Industrial Blvd, Duluth, GA 30096, USA	4.1	PRICE_LEVEL_MODERATE	1108	Starbucks	en	POINT (-84.17067 34.00379)	cafe
17	ChIJWbYMlFSi9YgReTXKTrCaHUM	coffee_shop,tea_house,cafe,point_of_interest,food_store,food,store,establishment	2628 Pleasant Hill Rd #100, Duluth, GA 30096, USA	4.6	PRICE_LEVEL_INEXPENSIVE	1035	Boba Mocha	en	POINT (-84.14764 33.97152)	cafe
18	ChIJC7rRz6qj9YgRL5EHmxg7YRY	gas_station,coffee_shop,convenience_store,fast_food_restaurant,liquor_store,cafe,meal_takeaway,dessert_shop,confectionery,restaurant,point_of_interest,food_store,food,store,establishment	2592 Pleasant Hill Rd, Duluth, GA 30098, USA	3.2	PRICE_LEVEL_INEXPENSIVE	248	RaceTrac	en	POINT (-84.14529 33.96979)	cafe
19	ChIJRfiUQtOj9YgRyiMsgL9WRCo	tea_house,cafe,point_of_interest,food,establishment	2570 Pleasant Hill Rd Suite #101, Duluth, GA 30096, USA	4.2	PRICE_LEVEL_INEXPENSIVE	455	Tiger Sugar [DULUTH]	en	POINT (-84.14501 33.96908)	cafe
20	ChIJiWdFPACj9YgR12Cmx4EHKqM	bakery,coffee_shop,cafe,breakfast_restaurant,wholesaler,dessert_shop,confectionery,restaurant,point_of_interest,food_store,food,store,establishment	2550 Pleasant Hill Rd bldg 300, Duluth, GA 30096, USA	4.2	PRICE_LEVEL_MODERATE	15	Tous Les Jours	en	POINT (-84.14289 33.9695)	cafe
24	ChIJMdpNq2Oi9YgR4qd6ab3U_8g	coffee_shop,donut_shop,bagel_shop,fast_food_restaurant,meal_takeaway,breakfast_restaurant,bakery,cafe,food_store,restaurant,food,store,point_of_interest,establishment	3185 Buford Hwy, Duluth, GA 30096, USA	3.9	PRICE_LEVEL_INEXPENSIVE	312	Dunkin’	en	POINT (-84.14618 33.99999)	cafe
25	ChIJzZj65uWj9YgRUCFRjhq_jas	brunch_restaurant,asian_restaurant,cafe,korean_restaurant,restaurant,food,point_of_interest,establishment	3455 Duluth Hwy Suite 1B, Duluth, GA 30096, USA	4.6	PRICE_LEVEL_INEXPENSIVE	437	The Cream	en	POINT (-84.14364 34.00159)	brunch
26	ChIJ4ytxgzC99YgRA6xOz---irY	restaurant,coffee_shop,brunch_restaurant,cafe,food_store,american_restaurant,food,store,point_of_interest,establishment	3550 W Lawrenceville St #210, Duluth, GA 30096, USA	4.4	PRICE_LEVEL_MODERATE	2164	Maple Street Biscuit Company	en	POINT (-84.14503 34.00363)	brunch
27	ChIJx0uk0Yqj9YgRSADpy4VNZ6g	coffee_shop,cafe,food_store,food,store,point_of_interest,establishment	2640 Old Peachtree Rd NW C, Duluth, GA 30097, USA	4.8	PRICE_LEVEL_INEXPENSIVE	168	Coffee That Matters by Phoenix Roasters	en	POINT (-84.12892 34.00774)	cafe
28	ChIJWcJuXpyi9YgRjQzDFlTtCVI	fast_food_restaurant,coffee_shop,hamburger_restaurant,sandwich_shop,breakfast_restaurant,cafe,food_store,american_restaurant,restaurant,food,store,point_of_interest,establishment	2695 Old Peachtree Rd NW, Duluth, GA 30097, USA	3.5	PRICE_LEVEL_INEXPENSIVE	997	McDonald’s	en	POINT (-84.12975 34.00898)	cafe
29	ChIJ____2xOi9YgRk4P9ImsuO9E	breakfast_restaurant,brunch_restaurant,shopping_mall,restaurant,food,point_of_interest,establishment	3585 Peachtree Industrial Blvd #122, Duluth, GA 30096, USA	4.8	PRICE_LEVEL_MODERATE	3498	The Breakfast Bar	en	POINT (-84.16956 34.00044)	brunch
30	ChIJO9Vg3hOi9YgRYnTIkRDoWVA	coffee_shop,dessert_shop,confectionery,cafe,food_store,restaurant,food,store,point_of_interest,establishment	3585 Peachtree Industrial Blvd #128, Duluth, GA 30096, USA	4.7	PRICE_LEVEL_INEXPENSIVE	344	Cafe Rothem 카페 로뎀	en	POINT (-84.16974 34.00045)	cafe
36	ChIJ77UJ6YiY9YgR2f5iJCpFZIY	gas_station,coffee_shop,convenience_store,fast_food_restaurant,liquor_store,cafe,meal_takeaway,dessert_shop,confectionery,restaurant,point_of_interest,food_store,food,store,establishment	2180 Peachtree Industrial Blvd, Duluth, GA 30097, USA	3.2	PRICE_LEVEL_INEXPENSIVE	222	RaceTrac	en	POINT (-84.11948 34.02124)	cafe

Mini Assignment 2

Vivian Lin

2025-09-25

Notes

2. Explore and report findings

2.1 Plot map of brunch restaurant and cafe locations & ratings in Duluth

What are the most noticeable differences between the two POI types?

Do POIs tend to cluster in specific neighborhoods, or are they spread evenly across the city?

2.2 Plot map of POI ratings and rating counts in Duluth

If you had to choose one area to visit based on the dataset, which would you pick and why?

Is there an association between rating score and count?