In the first section, you need to select one Census Tract that you think is the most walkable and another one that you think is least walkable within Fulton and DeKalb Counties, GA. As long as the two Census Tracts are within the two counties, you can pick any two you want. If the area you want to use as walkable/unwalkable area is not well-covered by one Census Tract, you can select multiple tracts (e.g., selecting three adjacent tracts as one walkable area). The definition of ‘walkable’ can be your own - you can choose solely based on your experience (e.g., had best/worst walking experience), refer to Walk Score, or any other mix of criteria you want. After you make the selection, provide a short write-up with a map explaining why you chose those Census Tracts.
The second section is the main part of this assignment in which you prepare OSM data, download GSV images, apply computer vision (i.e., semantic segmentation).
In the third section, you will summarise and analyze the output and provide your findings. After you apply computer vision to the images, you will have the number of pixels in each image that represent 150 categories in your data. You will focus on the following categories in your analysis: building, sky, tree, road, and sidewalk. Specifically, you will (1) create maps to visualize the spatial distribution of different elements, (2) compare the mean of each category between the two Census Tract and (3) draw box plots to compare the distributions.
library(tidyverse)
library(tidycensus)
library(osmdata)
library(sfnetworks)
library(units)
library(sf)
library(tidygraph)
library(tmap)
library(here)
ttm()
tmap_mode("view")
# helper functions
preview_polygon <- function(polygon) {
map <- tm_shape(polygon) +
tm_borders(lwd = 2, col = "red")
map
}
preview_polyline <- function(polyline) {
map <- tm_basemap("CartoDB.Voyager")+
tm_shape(polyline) +
tm_lines(line_width = 2, line_color = "blue")
map
}
Select walkable Census Tract(s) and unwalkable Census Tract(s) within Fulton and DeKalb counties.
In the quest to search for Census Tracts, you can use an approach similar to what we did in Step 1 of ‘Module4_getting_GSV_images.Rmd’. This time, instead of cities, we are focusing on Census Tracts; and the search boundary is the two counties, instead of metro Atlanta.
Provide a brief description and visualization of your Census Tracts. Why do you think the Census Tracts are walkable and unwalkable? What were the contributing factors?
Based on personal walking experience, I have chosen census tract 10.01, also known as Tech Square, as the most walkable tract.
For the most un-walkable tract, I have selected 113.10, which is located near the Atlanta airport. Although I have not personally walked there, I believe it to be highly unwalkable based on my observations while passing by in cars.
Fill out the template to complete the script.
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Set up your api key here
# census_api_key("census_api_key")
tidycensus::census_api_key(Sys.getenv("census_api_key"))
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Download Census Tract polygon for Fulton and DeKalb
tract <- get_acs("tract",
variables = c('tot_pop' = 'B01001_001'),
year = 2020,
state = "GA",
county = c("Fulton", "DeKalb"),
geometry = TRUE)
# =========== NO MODIFY ZONE ENDS HERE ========================================
# preview_polygon(tract)
# TASK ////////////////////////////////////////////////////////////////////////
# The purpose of this TASK is to create one bounding box for walkable Census Tract and another bounding box for unwalkable Census Tract.
# As long as you generate what's needed for the subsequent codes, you are good. The numbered list of tasks below is to provide some hints.
# 1. Write the GEOID of walkable & unwalkable Census Tracts. e.g., tr1_ID <- c("13121001205", "13121001206")
# 2. Extract the selected Census Tracts using tr1_ID & tr2_ID
# 3. Create their bounding boxes using st_bbox(), and
# 4. Assign them to tract_1_bb and tract_1_bb, respectively.
# 5. Change the coordinate system to GCS, if necessary.
# For the walkable Census Tract(s)
# 1.
tr1_ID <- "13121001001" # **YOUR CODE HERE..** --> For example, tr1_ID <- c("13121001205", "13121001206").
# 2~4
tract_1 <- filter(tract, GEOID == tr1_ID)
tract_1_bb <- st_bbox(tract_1)
# **YOUR CODE HERE..**
# For the unwalkable Census Tract(s)
# 1.
tr2_ID <- "13121011310" # **YOUR CODE HERE..**
# 2~4
tract_2 <- filter(tract, GEOID == tr2_ID)
tract_2_bb <- st_bbox(tract_2)
# **YOUR CODE HERE..**
# //TASK //////////////////////////////////////////////////////////////////////
# preview_polygon(tract_2)
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get OSM data for the two bounding box
osm_1 <- opq(bbox = tract_1_bb) %>%
add_osm_feature(key = 'highway',
value = c("motorway", "trunk", "primary",
"secondary", "tertiary", "unclassified",
"residential")) %>%
osmdata_sf() %>%
osm_poly2line()
osm_2 <- opq(bbox = tract_2_bb) %>%
add_osm_feature(key = 'highway',
value = c("motorway", "trunk", "primary",
"secondary", "tertiary", "unclassified",
"residential")) %>%
osmdata_sf() %>%
osm_poly2line()
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_1 and osm_2 to sfnetworks objects (set directed = FALSE)
# 2. Clean the network by (1) deleting parallel lines and loops, (2) create missing nodes, and (3) remove pseudo nodes,
# 3. Add a new column named length using edge_length() function.
net1 <- osm_1$osm_lines %>%
select(osm_id, highway) %>%
sfnetworks::as_sfnetwork(directed = FALSE) %>%
activate("edges") %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(., sfnetworks::to_spatial_subdivision) %>%
convert(., sfnetworks::to_spatial_smooth) %>%
mutate(length = edge_length())
## Warning: to_spatial_subdivision assumes attributes are constant over geometries
net2 <- osm_2$osm_lines %>%
select(osm_id, highway) %>%
sfnetworks::as_sfnetwork(directed = FALSE) %>%
activate("edges") %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(., sfnetworks::to_spatial_subdivision) %>%
convert(., sfnetworks::to_spatial_smooth) %>%
mutate(length = edge_length())
## Warning: to_spatial_subdivision assumes attributes are constant over geometries
# **YOUR CODE HERE..**
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# OSM for the walkable part
edges_1 <- net1 %>%
# Extract 'edges'
st_as_sf("edges") %>%
# Drop redundant columns
select(osm_id, highway, length) %>%
# Drop segments that are too short (100m)
mutate(length = as.vector(length)) %>%
filter(length > 100) %>%
# Add a unique ID for each edge
mutate(edge_id = seq(1,nrow(.)),
is_walkable = "walkable")
# OSM for the unwalkable part
edges_2 <- net2 %>%
# Extract 'edges'
st_as_sf("edges") %>%
# Drop redundant columns
select(osm_id, highway, length) %>%
# Drop segments that are too short (100m)
mutate(length = as.vector(length)) %>%
filter(length > 100) %>%
# Add a unique ID for each edge
mutate(edge_id = seq(1,nrow(.)),
is_walkable = "unwalkable")
# Merge the two
edges <- bind_rows(edges_1, edges_2)
# =========== NO MODIFY ZONE ENDS HERE ========================================
plot(edges_1)
# preview_polyline(edges_1)
# preview_polyline(edges_2)
# preview_polyline(edges)
getAzi <- function(line){
# This function takes one edge (i.e., a street segment) as an input and
# outputs a data frame with four points (start, mid1, mid2, and end) and their azimuth.
# TASK ////////////////////////////////////////////////////////////////////////
# 1. From `line` object, extract the coordinates using st_coordinates() and extract the first two rows.
# 2. Use atan2() function to calculate the azimuth in degree.
# Make sure to adjust the value such that 0 is north, 90 is east, 180 is south, and 270 is west.
# 1 ----------------------------------------------
start_p <- line %>%
st_coordinates() %>%
.[1:2,1:2]
start_azi <- atan2(start_p[2,"X"] - start_p[1, "X"],
start_p[2,"Y"] - start_p[1, "Y"])*180/pi
# 2 ----------------------------------------------
end_p <- line %>%
st_coordinates() %>%
.[(nrow(.)-1):nrow(.),1:2]
end_azi <- atan2(end_p[2,"X"] - end_p[1, "X"],
end_p[2,"Y"] - end_p[1, "Y"])*180/pi
end_azi <- if (end_azi < 180) {end_azi + 180} else {end_azi - 180}
# mid point 1 ---------------------------------------------
mid_p3 <- line %>%
st_line_sample(sample = c(0.45, 0.5, 0.55)) %>%
st_cast("POINT") %>%
st_coordinates()
mid_p <- mid_p3[2,]
mid_azi <- atan2(mid_p3[3,"X"] - mid_p3[1, "X"],
mid_p3[3,"Y"] - mid_p3[1, "Y"])*180/pi
mid_azi2 <- ifelse(mid_azi < 180, mid_azi + 180, mid_azi - 180)
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
return(tribble(
~type, ~X, ~Y, ~azi,
"start", start_p[1,"X"], start_p[1,"Y"], start_azi,
"mid1", mid_p["X"], mid_p["Y"], mid_azi,
"mid2", mid_p["X"], mid_p["Y"], mid_azi2,
"end", end_p[2,"X"], end_p[2,"Y"], end_azi))
}
# =========== NO MODIFY ZONE ENDS HERE ========================================
We can apply getAzi() function to the edges object. We
finally append edges object to make use of the columns in
edges object (e.g., is_walkable column). When
you are finished with this code chunk, you will be ready to download GSV
images.
# TASK ////////////////////////////////////////////////////////////////////////
# Apply getAzi() function to all edges.
# Remember that you need to pass edges object to st_geometry() before you apply getAzi()
endp_azi <- edges %>%
st_geometry() %>%
map_df(getAzi, .progress = T)
# **YOUR CODE HERE..**
# save.image('1115-1054.RData')
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
endp <- endp_azi %>%
bind_cols(edges %>%
st_drop_geometry() %>%
slice(rep(1:nrow(edges),each=4))) %>%
st_as_sf(coords = c("X", "Y"), crs = 4326, remove=FALSE) %>%
mutate(node_id = seq(1, nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
# dsn stands for dsn 'data source name', that's weird!
# st_write(endp, dsn = 'endp.geojson')
get_image <- function(iterrow){
# This function takes one row of endp and downloads GSV image using the information from endp.
# TASK ////////////////////////////////////////////////////////////////////////
# Finish this function definition.
# 1. Extract required information from the row of endp, including
# type (i.e., start, mid1, mid2, end), location, heading, edge_id, node_id, source (i.e., outdoor vs. default) and key.
# 2. Format the full URL and store it in furl.
# 3. Format the full path (including the file name) of the image being downloaded and store it in fpath
type = iterrow$type
location <- paste0(iterrow$Y %>% round(5), ",", iterrow$X %>% round(5))
heading <- iterrow$azi %>% round(1)
edge_id <- iterrow$edge_id
node_id <- iterrow$node_id
# highway <- iterrow$highway
key <- Sys.getenv("google_api_key")
furl <- glue::glue("https://maps.googleapis.com/maps/api/streetview?size=640x640&location={location}&heading={heading}&fov=90&pitch=0&key={key}")
fname <- glue::glue("GSV-nid_{node_id}-eid_{edge_id}-type_{type}-Location_{location}-heading_{heading}.jpg") # Don't change this code for fname
fpath <- here('/home/rstudio/major3/download',fname)# **YOUR CODE HERE..**
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Download images
if (!file.exists(fpath)){
download.file(furl, fpath, mode = 'wb')
}
# =========== NO MODIFY ZONE ENDS HERE ========================================
}
Before you download GSV images, make sure
the row number of endp is not too large! The row number of
endp will be the number of GSV images you will be
downloading. Before you download images, always double-check your Google
Cloud Console’s Billing tab to make sure that you will not go above the
free credit of $200 each month. The price is $7 per 1000 images.
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Loop!
for (i in seq(1,nrow(endp))){
get_image(endp[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================
ZIP THE DOWNLOADED IMAGES AND NAME IT ‘gsv_images.zip’ FOR STEP 6.
Now, use Google Colab to apply the semantic segmentation model.
Merge the segmentation output to edges.
# Read the downloaded CSV file from Google Drive
seg_output <- read.csv("/home/rstudio/major3/seg_output.csv")
endp = st_read(dsn="endp.geojson")
## Reading layer `endp' from data source `/home/rstudio/major3/endp.geojson' using driver `GeoJSON'
## Simple feature collection with 1828 features and 10 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -84.49482 ymin: 33.65002 xmax: -84.38006 ymax: 33.79001
## Geodetic CRS: WGS 84
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Join the segmentation result to endp object.
seg_output_nodes <- endp %>% inner_join(seg_output, by=c("node_id"="img_id")) %>%
select(type, X, Y, node_id, building, sky, tree, road, sidewalk) %>%
mutate(across(c(building, sky, tree, road, sidewalk), function(x) x/(640*640)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
At the beginning of this assignment, you defined one Census Tract as walkable and the other as unwalkable. The key to the following analysis is the comparison between walkable/unwalkable Census Tracts.
Create maps of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. In total, you will have 10 maps.
Below the maps, provide a brief description of your findings from the maps.
# TASK ////////////////////////////////////////////////////////////////////////
# Create map(s) to visualize the `pspnet_nodes` objects.
# As long as you can deliver the message clearly, you can use any format/package you want.
plot(seg_output_nodes)
# //TASK //////////////////////////////////////////////////////////////////////
analysis_1 <- function(col, palette){
col <- as.character(substitute(col))
palette <- as.character(substitute(palette))
col_title <- tools::toTitleCase(col)
map1 <- tm_basemap("CartoDB.Voyager")+
tm_shape(seg_output_nodes, bbox = tract_1_bb) +
tm_dots(col = col, style="quantile", palette = palette) +
tm_layout(title = paste("Walkable Area"))
map2 <- tm_basemap("CartoDB.Voyager")+
tm_shape(seg_output_nodes, bbox = tract_2_bb) +
tm_dots(col = col, style="quantile", palette = palette)+
tm_layout(title = paste("Unwalkable Area"))
tmap_arrange(map1, map2, sync = T)
}
# building, sky, tree, road, and sidewalk
analysis_1(building, PuRd)
sky_distribution <- analysis_1(sky, Blues)
sky_distribution
tree_distribution <- analysis_1(tree, Greens)
tree_distribution
road_distribution <- analysis_1(road, Reds)
road_distribution
sidewalk_distribution <- analysis_1(sidewalk, Oranges)
sidewalk_distribution
Calculate the mean of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. In total, you will have 10 mean values.
After the calculation, provide a brief description of your findings.
# TASK ////////////////////////////////////////////////////////////////////////
# Perform the calculation as described above.
# As long as you can deliver the message clearly, you can use any format/package you want.
# save.image('1115-1625.RData')
seg_output_nodes <- st_set_crs(seg_output_nodes, 4326)
tract_1 <- st_set_crs(tract_1, 4326)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
tract_2 <- st_set_crs(tract_2, 4326)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
subset_tract_1 <- st_intersection(seg_output_nodes, tract_1)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
subset_tract_2 <- st_intersection(seg_output_nodes, tract_2)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
calculate_mean <- function(tract, categories) {
result <- data.frame(Category = character(), Mean = numeric(), stringsAsFactors = FALSE)
for (category in categories) {
mean_value <- mean(tract[[category]], na.rm = TRUE)
result <- rbind(result, data.frame(Category = category, Mean = mean_value, stringsAsFactors = FALSE))
}
return(result)
}
categories <- c("building", "sky", "tree", "sidewalk", "road")
mean_df_1 <- calculate_mean(subset_tract_1,categories = categories)
mean_df_2 <- calculate_mean(subset_tract_2,categories = categories)
# //TASK //////////////////////////////////////////////////////////////////////
combined_df <- bind_rows(mutate(mean_df_1, area = "Walkable Area"), mutate(mean_df_2, area = "Unwalkable Area"))
combined_histogram <- ggplot(data = combined_df, aes(x = Category, y = Mean, fill = area)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Mean Values by Category in Walkable and Unwalkable Areas", x = "Category", y = "Mean") +
theme_minimal()
combined_histogram
Draw box plots comparing the proportion of building, sky, tree, road, and sidewalk between walkable and unwalkable areas. Each plot presents two boxes: one for walkable areas and the other for unwalkable areas. In total, you will have 5 plots.
After the calculation, provide a brief description of your findings.
# TASK ////////////////////////////////////////////////////////////////////////
# Create box plot(s) using geom_boxplot() function from ggplot2 package.
# Use `seg_output_nodes` object to draw the box plots.
# You will find `pivot_longer` function useful.
combined_sf <- bind_rows(mutate(subset_tract_1, area = "Walkable Area"), mutate(subset_tract_2, area = "Unwalkable Area"))
plot_category <- function(category) {
bxplot <- ggplot(data = combined_sf) +
geom_boxplot(aes(x = area, y = .data[[category]]))
plotly::ggplotly(bxplot)
}
plot_category("building")
# //TASK //////////////////////////////////////////////////////////////////////
plot_category("sky")
plot_category("tree")
plot_category("road")
plot_category("sidewalk")
# save.image('1115-2308.RData')