library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
library(here)
## here() starts at C:/Users/akaamah3/Documents/SCaRP Course Materials_Fall2025/Into_to_Urban_Analytics/CP8883_working_with_R
library(tmap)
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:purrr':
##
## flatten
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
## Instructions (tidying your POI data)
1. Import your data. Load the Google Places POI data you downloaded for Mini-Assignment 1. As a reminder, state the city you selected and the two POI types you chose in the previous assignment.
2. Tidy your data. Work through the following steps to clean and prepare your dataset:
   - Remove duplicated rows. Show how the number of rows has changed after removing them.
   - Flatten/unnest list-columns. Collapse the places.types column so that each element contains a single string value. If your data includes list-columns other than places.types, handle them appropriately while ensuring each row still represents a unique POI.
   - Handle missing values. Remove rows with NA values in columns that you consider important. Explain your reasoning. Report how many rows remain after this step.
   - Filter by location. Remove rows that fall outside the city boundary. Show how the number of rows changes after filtering.
3. Show your cleaned POI data. Print the first 10 rows of your final dataset using either print() or kableExtra::kable().
4. Explore and report findings. Write about at least four interesting observations you discovered (maximum 200 words). Include plots or maps if helpful. Example questions you might explore include:
   - What are the most noticeable differences between the two POI types?
   - What is the average rating score? Does it seem related to the number of ratings?
   - Is there an association between price level and rating score?
   - Is there any connection between POI rating scores and household income?
   - Do POIs tend to cluster in specific neighborhoods, or are they spread evenly across the city?
   - If you had to choose one POI to visit based on the dataset, which would you pick and why?

   Note: The questions above are only examples; feel free to be creative!
# 1. Importing the data
The chosen city is Winterville, located in Clarke County, Georgia.

The two chosen POI types were "bank" and "school".
google_places_poi_data<- read_rds("google_poi_data.rds") # loading saved google places poi data
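Before tidying, it may be worth confirming that the imported file actually holds the two chosen POI types: the first places.types entries printed in the tidying step describe Mexican restaurants rather than banks or schools. A minimal sketch of such a check, assuming only that the data has a places.primaryType column (the missing-value summary lists one), tabulates the primary types. The object toy_poi and its values are hypothetical stand-ins for google_places_poi_data.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for google_places_poi_data; the real object is read
# from google_poi_data.rds and has many more columns
toy_poi <- tibble(
  places.id = c("a", "b", "c", "d"),
  places.primaryType = c("mexican_restaurant", "bank", "bank", "school")
)

# Count the rows carrying each primary type, most frequent first
type_counts <- toy_poi %>% count(places.primaryType, sort = TRUE)
type_counts
```

On the real data, replacing toy_poi with google_places_poi_data would show at a glance whether banks and schools dominate the file.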
# 2. Tidying the loaded Google POI data
# Remove duplicate rows (rows that are identical in every column)
dupl_google_places_poi_data <- google_places_poi_data %>%
  distinct()
## Compare the number of rows before and after removing duplicates
nrow(google_places_poi_data)
## [1] 42
nrow(dupl_google_places_poi_data)
## [1] 23
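distinct() without column arguments drops only rows that are identical in every column, so two records of the same place that differ in a single field (for example, a rating refreshed between API calls) would both survive. Because places.id holds the Google place ID, deduplicating on that column is a stricter check that each row is a unique POI. The sketch below compares the two approaches on a small hypothetical tibble (toy_poi); whether such near-duplicates exist in the real data is not known here.

```r
library(tibble)
library(dplyr)

# Hypothetical rows: place "b" appears twice with slightly different ratings
toy_poi <- tibble(
  places.id = c("a", "b", "b", "c"),
  places.rating = c(4.5, 4.0, 4.1, 3.8)
)

# Whole-row comparison keeps both "b" rows because their ratings differ
n_all_cols <- nrow(distinct(toy_poi))
# Comparing on places.id keeps only the first row for each place
n_by_id <- nrow(distinct(toy_poi, places.id, .keep_all = TRUE))
c(all_columns = n_all_cols, by_id = n_by_id)
```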
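# Print the name of every list-column; is.list() avoids the error that
# class(x) == "list" raises for columns with more than one class
for (col in colnames(dupl_google_places_poi_data)) {
  if (is.list(dupl_google_places_poi_data[[col]])) {
    print(col)
  }
}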
## [1] "places.types"
## [1] "places.reviews"
dupl_google_places_poi_data$places.types[[1]]
## [1] "mexican_restaurant" "restaurant" "food"
## [4] "point_of_interest" "establishment"
### Flattening/unnesting list-columns
## Collapse places.types so that each element holds a single comma-separated string
coll_google_places_poi_data <- dupl_google_places_poi_data %>%
  mutate(places.types = map_chr(places.types, \(x) paste(x, collapse = ", ")))
## Collapse the remaining list-columns (here, places.reviews) in the same way
coll_google_places_poi_data <- coll_google_places_poi_data %>%
  mutate(across(where(is.list), \(col) map_chr(col, \(x) paste(x, collapse = ", "))))
head(coll_google_places_poi_data$places.types)
## [1] "mexican_restaurant, restaurant, food, point_of_interest, establishment"
## [2] "mexican_restaurant, meal_takeaway, restaurant, food, point_of_interest, establishment"
## [3] "mexican_restaurant, restaurant, food, point_of_interest, establishment"
## [4] "mexican_restaurant, restaurant, food, point_of_interest, establishment"
## [5] "mexican_restaurant, restaurant, food, point_of_interest, establishment"
## [6] "mexican_restaurant, restaurant, food, point_of_interest, establishment"
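Pasting places.reviews into a single string keeps one row per POI, but it turns each set of reviews into one long string that is hard to analyse. One alternative, sketched below under the assumption that each places.reviews element is a data frame of reviews with a rating column (or NULL when a place has no reviews), is to summarise the reviews into per-POI columns and then drop the list-column. The column names n_reviews and mean_review_rating and the toy data are hypothetical.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical list-columns: places.types holds character vectors and
# places.reviews holds one data frame of reviews per place (or NULL)
toy_poi <- tibble(
  places.id = c("a", "b"),
  places.types = list(c("bank", "finance", "point_of_interest"), c("school")),
  places.reviews = list(data.frame(rating = c(5, 4), text = c("Great", "Fine")), NULL)
)

tidy_poi <- toy_poi %>%
  mutate(
    # One comma-separated string per POI
    places.types = map_chr(places.types, \(x) paste(x, collapse = ", ")),
    # Number of reviews per POI (NROW(NULL) is 0)
    n_reviews = map_int(places.reviews, NROW),
    # Average review rating, NA when a POI has no reviews
    mean_review_rating = map_dbl(
      places.reviews,
      \(r) if (NROW(r) == 0) NA_real_ else mean(r$rating)
    )
  ) %>%
  select(-places.reviews)
tidy_poi
```

Each row still represents one POI, and the review information stays numeric and usable for the exploration step.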
### Removing missing values
coll_google_places_poi_data %>% map_dbl(., ~sum(is.na(.x)))
| Column | Missing values |
|---|---|
| places.id | 0 |
| places.types | 0 |
| places.formattedAddress | 0 |
| places.rating | 1 |
| places.businessStatus | 0 |
| places.userRatingCount | 1 |
| places.takeout | 0 |
| places.delivery | 1 |
| places.dineIn | 0 |
| places.primaryType | 0 |
| places.reviews | 0 |
| places.outdoorSeating | 3 |
| places.menuForChildren | 6 |
| places.location.latitude | 0 |
| places.location.longitude | 0 |
| places.displayName.text | 0 |
| places.displayName.languageCode | 0 |
| places.priceRange.startPrice.currencyCode | 1 |
| places.priceRange.startPrice.units | 1 |
| places.priceRange.endPrice.currencyCode | 1 |
| places.priceRange.endPrice.units | 1 |
| places.priceLevel | 6 |
| places.allowsDogs | 9 |
| places.reviewSummary.flagContentUri | 2 |
| places.reviewSummary.reviewsUri | 2 |
| places.reviewSummary.text.text | 2 |
| places.reviewSummary.text.languageCode | 2 |
| places.reviewSummary.disclosureText.text | 2 |
| places.reviewSummary.disclosureText.languageCode | 2 |
# Drop rows with a missing value in any of the four key columns:
# outdoor seating, children's menu, number of ratings, and price level
dropna_google_places_poi_data <- coll_google_places_poi_data %>%
filter(
!is.na(places.outdoorSeating) &
!is.na(places.menuForChildren) &
!is.na(places.userRatingCount) &
!is.na(places.priceLevel)
)
print(paste0("Before: ", nrow(coll_google_places_poi_data)))
## [1] "Before: 23"
print(paste0("After: ", nrow(dropna_google_places_poi_data)))
## [1] "After: 12"
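The assignment asks for the reasoning behind the choice of columns. One way to support that choice, shown here on hypothetical data, is to count how many rows each candidate column would remove on its own before combining them. tidyr::drop_na() restricted to the chosen columns gives the same result as the chained !is.na() filter. The key_cols vector and the toy values are illustrative only.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical data with scattered NAs in the columns used for filtering
toy_poi <- tibble(
  places.id = c("a", "b", "c", "d"),
  places.userRatingCount = c(120, NA, 35, 8),
  places.priceLevel = c("PRICE_LEVEL_MODERATE", "PRICE_LEVEL_INEXPENSIVE", NA, NA)
)

key_cols <- c("places.userRatingCount", "places.priceLevel")

# Rows each key column would remove if it were used on its own
lost_per_column <- colSums(is.na(toy_poi[key_cols]))

# Keep only rows with no NA in any key column
kept <- toy_poi %>% drop_na(all_of(key_cols))
lost_per_column
nrow(kept)
```

Reporting lost_per_column alongside the final count makes it clear which column drives most of the row loss.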
### Filtering by location
# Winterville city boundary (Census TIGER/Line places, downloaded with tigris)
Winterville <- tigris::places("GA", progress_bar = FALSE) %>%
filter(NAME == 'Winterville') %>%
st_transform(4326)
## Retrieving data for the year 2024
# Convert dropna_google_places_poi_data into an sf point object
google_POI_sf <- dropna_google_places_poi_data %>%
st_as_sf(coords=c("places.location.longitude", "places.location.latitude"),
crs = 4326)
# Keep only the POIs that fall within the city boundary
google_POI_sf_in <- google_POI_sf[lengths(st_within(google_POI_sf, Winterville)) > 0, ]
print(paste0("Before: ", nrow(google_POI_sf)))
## [1] "Before: 12"
print(paste0("After: ", nrow(google_POI_sf_in)))
## [1] "After: 0"
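Because no POIs remain after the boundary filter, a distance check can show whether the points sit just outside Winterville or far from it. The sketch below uses a hypothetical square boundary and three made-up points: sf::st_filter() with .predicate = st_within keeps the points inside the polygon, and st_distance() returns the distance in metres from each point to the polygon (0 for points inside). On the real data, google_POI_sf and Winterville would take the place of the hypothetical pois and boundary.

```r
library(sf)

# Hypothetical square "city boundary" in lon/lat (EPSG:4326)
boundary <- st_sf(
  NAME = "Toy City",
  geometry = st_sfc(st_polygon(list(rbind(
    c(-83.30, 33.95), c(-83.25, 33.95), c(-83.25, 34.00),
    c(-83.30, 34.00), c(-83.30, 33.95)
  ))), crs = 4326)
)

# Hypothetical POIs: one inside, one east of and one north of the boundary
pois <- st_as_sf(
  data.frame(id = c("in", "out_east", "out_north"),
             lon = c(-83.28, -83.20, -83.28),
             lat = c(33.97, 33.97, 34.05)),
  coords = c("lon", "lat"), crs = 4326
)

# Keep only the POIs that lie within the boundary
pois_in <- st_filter(pois, boundary, .predicate = st_within)

# Distance from each POI to the boundary polygon, in metres
dist_m <- as.numeric(st_distance(pois, boundary))
pois_in$id
dist_m
```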
# 3. Showing the cleaned POI data
google_POI_sf_in %>% head(10) %>% kableExtra::kable()
| places.id | places.types | places.formattedAddress | places.rating | places.businessStatus | places.userRatingCount | places.takeout | places.delivery | places.dineIn | places.primaryType | places.reviews | places.outdoorSeating | places.menuForChildren | places.displayName.text | places.displayName.languageCode | places.priceRange.startPrice.currencyCode | places.priceRange.startPrice.units | places.priceRange.endPrice.currencyCode | places.priceRange.endPrice.units | places.priceLevel | places.allowsDogs | places.reviewSummary.flagContentUri | places.reviewSummary.reviewsUri | places.reviewSummary.text.text | places.reviewSummary.text.languageCode | places.reviewSummary.disclosureText.text | places.reviewSummary.disclosureText.languageCode | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# 4. Exploring and reporting findings
The exploration of the dataset involved several stages of cleaning. The data frame saved from Mini-Assignment 1 (the Google Places POI data) was imported into RStudio and contained 42 observations. Removing duplicate rows reduced this to 23. The list-columns in the de-duplicated data frame were then collapsed so that each cell holds a single comma-separated string. Next, rows with missing values in the columns considered important (outdoor seating, children's menu, number of ratings, and price level) were removed, which reduced the number of observations from 23 to 12. Finally, the remaining POIs were filtered by location to check whether they lie within the Winterville city boundary, and none of them did: the number of observations fell from 12 to 0. This indicates that none of the POIs retrieved for Winterville in Mini-Assignment 1 actually lies within the city boundary once the data have been tidied.
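The row counts traced in the paragraph above can also be recorded as a small table, which makes the loss at each step explicit. The sketch below uses only the counts reported in this document; tidying_log is a hypothetical name.

```r
library(tibble)

# Row counts reported at each tidying step
tidying_log <- tibble(
  step = c("Imported", "Duplicates removed",
           "Rows with NA in key columns removed", "Inside city boundary"),
  rows = c(42L, 23L, 12L, 0L)
)

# Rows dropped at each step (NA for the initial import)
tidying_log$dropped <- c(NA, -diff(tidying_log$rows))
tidying_log
```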