Major Assignment 2 - Template

Sujin Lee

2024-11-15

How to use this template

You will see # TASK ///// throughout this template. This indicates the beginning of a task. Right below it will be instructions for the task. Each # TASK ///// will be paired with # //TASK ///// to indicate where that specific task ends.

Introduction to the assignment

This assignment consists of three main sections.

In the first section, you need to select one Census Tract that you think is the most walkable and another one that you think is least walkable within Fulton and DeKalb Counties, GA. As long as they are within the two counties, you can pick any two Census Tracts you want. If the area you want to use as walkable/unwalkable area is not well-covered by a single Census Tract, you can select multiple tracts (e.g., selecting three adjacent tracts as one walkable area). The definition of ‘walkable’ can be your own - you can choose solely based on your experience (e.g., had best/worst walking experience because …), refer to Walk Score, or any other mix of criteria you want. After you make the selection, provide a short write-up of why you chose those Census Tracts.

The second section is the main part of this assignment, in which you prepare OSM data, download GSV images, and apply the computer vision technique we learned in class (i.e., semantic segmentation).

In the third section, you will summarise and analyze the output and provide your findings. After you apply computer vision to the images, you will have, for each image, the number of pixels that fall into each of 150 categories. You will focus on the following categories in your analysis: building, sky, tree, road, and sidewalk. Specifically, you will (1) create maps to visualize the spatial distribution of different objects, (2) compare the mean of each category between the two Census Tracts, and (3) draw boxplots to compare the distributions.

Section 1. Choose your Census Tracts.

Provide a brief description of your census tracts. Why do you think the Census Tracts are walkable and unwalkable? What were the contributing factors?

Section 2. OSM, GSV, and computer vision.

Fill out the template to complete the script.

library(tidyverse)
library(tidycensus)
library(osmdata)
library(sfnetworks)
library(units)
library(sf)
library(tidygraph)
library(tmap)
library(here)

Step 1. Get OSM data and clean it.

The getbb() function, which we used in the class to download OSM data, isn’t suitable for downloading just two Census Tracts. We will instead use an alternative method.

  1. Using tidycensus package, download the Census Tract polygon for Fulton and DeKalb counties.
  2. Extract two Census Tracts, each of which will be your most walkable and least walkable Census Tracts.
  3. Using their bounding boxes, get OSM data.
  4. Convert the OSM data into an sfnetwork object and clean it.
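
The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used for this submission: the helper name get_tract_edges(), the list of highway values, the ACS variable and year, and the GEOID in the usage comment are all hypothetical choices made for illustration. Packages are called with pkg:: and the base pipe (R 4.1 or later), so defining the function does not require anything to be attached. Running it requires the packages to be installed, network access, and a Census API key.

```r
# Hypothetical helper (not part of the template): Census Tract polygon(s) -> cleaned street edges.
# `tracts` is an sf object of Census Tracts with a GEOID column; `geoids` is a character vector.
get_tract_edges <- function(tracts, geoids) {
  # Step 2: keep only the chosen tract(s)
  picked <- tracts[tracts$GEOID %in% geoids, ]

  # Step 3: bounding box as c(xmin, ymin, xmax, ymax), then download OSM street lines inside it
  bb <- as.numeric(sf::st_bbox(picked))
  osm <- osmdata::opq(bbox = bb) |>
    osmdata::add_osm_feature(
      key = "highway",
      # Assumed set of street types; adjust to your own definition
      value = c("primary", "secondary", "tertiary", "unclassified", "residential")
    ) |>
    osmdata::osmdata_sf() |>
    osmdata::osm_poly2line()

  # Step 4: convert to an sfnetwork and clean it
  net <- sfnetworks::as_sfnetwork(osm$osm_lines[, c("osm_id", "highway")],
                                  directed = FALSE) |>
    tidygraph::activate("edges") |>
    dplyr::filter(!tidygraph::edge_is_multiple()) |>           # drop duplicated edges
    dplyr::filter(!tidygraph::edge_is_loop()) |>               # drop self-loops
    tidygraph::convert(sfnetworks::to_spatial_subdivision) |>  # split edges at shared interior points
    tidygraph::convert(sfnetworks::to_spatial_smooth)          # remove pseudo nodes

  # Return the cleaned edges as an sf object
  sf::st_as_sf(net, "edges")
}

# Usage (not run; needs tidycensus::census_api_key() to be set first):
# tracts <- tidycensus::get_acs("tract", variables = "B01001_001", year = 2022,
#                               state = "GA", county = c("Fulton", "DeKalb"),
#                               geometry = TRUE)
# edges_walkable <- get_tract_edges(tracts, geoids = "13121000000")  # placeholder GEOID
```

Because the query uses a bounding box, it may return streets slightly outside the tract boundary. Whether to clip them to the polygon is a design choice left open here.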

The Reason I Chose Walkable and Unwalkable Areas

Based on my experience, the streets in Midtown are more walkable. Overall, the sidewalk conditions are better than in the unwalkable area. Additionally, with many stores around, it feels more lively. On the other hand, in the unwalkable area, the sidewalks have many cracks and are messy. The roads next to the sidewalks carry more traffic, and the cars move faster.

Step 2. Define getAzimuth() function.

getAzimuth <- function(line){
  # This function takes one edge (i.e., a street segment) as an input and
  # outputs a data frame with four points (start, mid1, mid2, and end) and their azimuth.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. From `line` object, extract the coordinates using st_coordinates() and extract the first two rows.
  # 2. Use atan2() function to calculate the azimuth in degrees. 
  #    Make sure to adjust the value such that 0 is north, 90 is east, 180 is south, and 270 is west.
  # 1
  start_p <- line %>% 
    st_coordinates() %>%
    .[1:2, 1:2]

  # 2
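  # atan2(dx, dy) is 0 at north and increases clockwise, but it returns (-180, 180];
  # %% 360 maps westward headings (e.g., -90) onto the 0-360 range (e.g., 270).
  start_azi <- (atan2(start_p[2, "X"] - start_p[1, "X"],
                      start_p[2, "Y"] - start_p[1, "Y"])*180/pi) %% 360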
  # //TASK //////////////////////////////////////////////////////////////////////
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # Repeat what you did above, but for last two rows (instead of the first two rows).
  # Remember to flip the azimuth so that the camera would be looking at the street that's being measured
  end_p <- line %>% 
    st_coordinates() %>% 
    .[(nrow(.)-1):nrow(.),1:2]
    # **YOUR CODE HERE..**
    
  end_azi <- (atan2(end_p[2, "X"] - end_p[1, "X"],
                    end_p[2, "Y"] - end_p[1, "Y"])*180/pi) %% 360
    
  # Flip by 180 degrees so the camera looks back along the street segment
  end_azi <- (end_azi + 180) %% 360
  # //TASK //////////////////////////////////////////////////////////////////////
  

  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. From `line` object, use st_line_sample() function to generate points at 0.45 and 0.55 locations. These two points will be used to calculate the azimuth.
  # 2. Use st_cast() function to convert 'MULTIPOINT' object to 'POINT' object.
  # 3. Extract coordinates using st_coordinates().
  # 4. Use atan2() function to calculate azimuth.
  # 5. Use st_line_sample() again to generate a point at 0.5 location and get its coordinates. This point will be the location at which GSV image will be downloaded.
  
  mid_p <- line %>% 
    st_line_sample(sample = c(0.45, 0.55)) %>%
    st_cast("POINT") %>%
    st_coordinates()
    # **YOUR CODE HERE..** --> For 0.45 & 0.55 points
  
  # %% 360 converts atan2()'s (-180, 180] output to a 0-360 compass heading
  mid_azi <- (atan2(mid_p[2, "X"]-mid_p[1, "X"],
                    mid_p[2, "Y"]-mid_p[1, "Y"])*180/pi) %% 360
  
  mid_p <- line  %>% 
    st_line_sample(sample = 0.5) %>%
    st_coordinates() %>%
    .[1,]
    # **YOUR CODE HERE..** --> For 0.5 point
  # //TASK //////////////////////////////////////////////////////////////////////
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  return(tribble(
    ~type,    ~X,            ~Y,             ~azi,
    "start",   start_p[1,"X"], start_p[1,"Y"], start_azi,
    "mid1",    mid_p["X"],   mid_p["Y"],   mid_azi,
    "mid2",    mid_p["X"],   mid_p["Y"],   ifelse(mid_azi < 180, mid_azi + 180, mid_azi - 180),
    "end",     end_p[2,"X"],   end_p[2,"Y"],   end_azi))
  # =========== NO MODIFY ZONE ENDS HERE ========================================

}
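
The heading convention the task asks for (0 = north, 90 = east, 180 = south, 270 = west) can be checked in isolation with a few unit vectors. The sketch below is a base R illustration only; the helper names azimuth_deg() and flip_deg() are hypothetical and not part of the template. It treats coordinates as planar, which is a reasonable approximation for short street segments.

```r
# Compass azimuth (degrees clockwise from north) of the vector from (x1, y1) to (x2, y2).
# Passing dx first and dy second makes atan2() return 0 for north and grow clockwise.
# Its raw range is (-180, 180], so %% 360 shifts negative (westward) angles into [0, 360).
azimuth_deg <- function(x1, y1, x2, y2) {
  (atan2(x2 - x1, y2 - y1) * 180 / pi) %% 360
}

# Reverse a heading, e.g., to look back along a segment from its end point.
flip_deg <- function(azi) (azi + 180) %% 360

headings <- c(
  north = azimuth_deg(0, 0, 0, 1),
  east  = azimuth_deg(0, 0, 1, 0),
  south = azimuth_deg(0, 0, 0, -1),
  west  = azimuth_deg(0, 0, -1, 0)
)
print(round(headings, 1))

# Reversing an eastward heading should point west
print(round(flip_deg(headings[["east"]]), 1))
```

Note that the parentheses around the atan2() expression matter: in R, %% binds more tightly than * and /, so without them only pi would be taken modulo 360.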

Step 3. Apply the function to all street segments

We can now apply the getAzimuth() function to every edge in the edges object. We then bind the columns of edges back onto the result so that its attributes (e.g., the is_walkable column) are available for each point. When you are finished with this code chunk, you will be ready to download GSV images.

# TASK ////////////////////////////////////////////////////////////////////////
# Apply getAzimuth() function to all edges.
# Remember that you need to pass edges object to st_geometry() before you apply getAzimuth()
edges_azi <- edges %>% 
  st_geometry() %>%
  map_df(getAzimuth, .progress = TRUE)
  # **YOUR CODE HERE..**

# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
edges_azi <- edges_azi %>% 
  bind_cols(edges %>% 
              st_drop_geometry() %>% 
              slice(rep(1:nrow(edges),each=4))) %>% 
  st_as_sf(coords = c("X", "Y"), crs = 4326, remove=FALSE) %>% 
  mutate(node_id = seq(1, nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
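
The bind_cols() step above works because getAzimuth() returns exactly four rows per edge, in edge order, so repeating each row of edges four times lines the attributes up with the points. A minimal base R sketch of that alignment, using a made-up three-edge table (edges_toy and its values are hypothetical):

```r
# Toy stand-in for `edges` (three street segments); values are made up.
edges_toy <- data.frame(
  edge_id = 1:3,
  is_walkable = c("walkable", "walkable", "unwalkable")
)

# Each edge yields four points (start, mid1, mid2, end), so repeat every row 4 times.
idx <- rep(seq_len(nrow(edges_toy)), each = 4)
expanded <- edges_toy[idx, ]
rownames(expanded) <- NULL

# Attach the point type and a running point id, as the pipeline does with node_id.
expanded$type <- rep(c("start", "mid1", "mid2", "end"), times = nrow(edges_toy))
expanded$node_id <- seq_len(nrow(expanded))

print(expanded)
```

If any edge returned fewer or more than four rows, the two tables would no longer line up, which is why the row counts are worth checking before binding.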

Step 4. Define a function that formats request URL and download images.

getImage <- function(iterrow){
  # This function takes one row of edges_azi and downloads GSV image using the information from edges_azi.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # Finish this function definition.
  # 1. Extract required information from the row of edges_azi, including 
  #    type (i.e., start, mid1, mid2, end), location, heading, edge_id, node_id, and key.
  # 2. Format the full URL and store it in `request`. Refer to this page: https://developers.google.com/maps/documentation/streetview/request-streetview
  # 3. Format the full path (including the file name) of the image being downloaded and store it in `fpath`
  type <- iterrow$type
  location <- paste0(iterrow$Y %>% round(5), ",", iterrow$X %>% round(5))
  heading <- iterrow$azi %>% round(1)
  edge_id <- iterrow$edge_id
  node_id <- iterrow$node_id
  highway <- iterrow$highway
  key <- Sys.getenv("google_api")
  
  endpoint <- "https://maps.googleapis.com/maps/api/streetview"
  
  furl <- glue::glue("{endpoint}?size=640x640&location={location}&heading={heading}&fov=90&pitch=0&key={key}")
  fname <- glue::glue("GSV-nid_{node_id}-eid_{edge_id}-type_{type}-Location_{location}-heading_{heading}.jpg") # Don't change this code for fname
  fpath <- file.path("/home/rstudio/project", fname)
  # //TASK //////////////////////////////////////////////////////////////////////

  
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  # Download images
  if (!file.exists(fpath)){
    download.file(furl, fpath, mode = 'wb') 
  }
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}
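
To see what a request looks like before spending any credit, the URL can be built and printed without downloading anything. The sketch below uses base R only; build_gsv_url() is a hypothetical helper, "YOUR_API_KEY" is a placeholder, and the coordinates and heading are made-up values.

```r
# Build a Street View Static API request URL from one point's attributes.
# The API expects `location` as latitude,longitude (i.e., Y first, then X).
build_gsv_url <- function(lat, lon, heading, key,
                          size = "640x640", fov = 90, pitch = 0) {
  endpoint <- "https://maps.googleapis.com/maps/api/streetview"
  sprintf("%s?size=%s&location=%s,%s&heading=%s&fov=%s&pitch=%s&key=%s",
          endpoint, size, round(lat, 5), round(lon, 5),
          round(heading, 1), fov, pitch, key)
}

# Made-up point; the key is a placeholder, not a working credential.
url <- build_gsv_url(lat = 33.78, lon = -84.38, heading = 271.3,
                     key = "YOUR_API_KEY")
print(url)
```

Printing one or two URLs and opening them in a browser is a cheap way to confirm that the heading points along the street before running the full loop.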

Step 5. Download GSV images

Before you download GSV images, make sure the number of rows in edges_azi is not too large! The number of rows in edges_azi is the number of GSV images you will download. Always double-check your Google Cloud Console’s Billing tab first to make sure that you will not go above the free credit of $200 each month. The price is $7 per 1000 images.
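
The cost check can be done with simple arithmetic before running the loop. The sketch below uses the rates quoted above ($7 per 1000 images, $200 free credit per month); estimate_gsv_cost() is a hypothetical helper and the edge count is a made-up example. In practice you would pass nrow(edges_azi) as the number of images.

```r
# Rough cost estimate for a batch of GSV images, using the rates quoted in the text.
estimate_gsv_cost <- function(n_images, price_per_1000 = 7, free_credit = 200) {
  cost <- n_images / 1000 * price_per_1000
  list(cost = cost, within_credit = cost <= free_credit)
}

# Largest number of images covered by the monthly free credit
max_free_images <- floor(200 / 7 * 1000)

# Example: 1,500 edges x 4 images per edge (made-up count)
est <- estimate_gsv_cost(4 * 1500)
print(est$cost)           # 42
print(est$within_credit)  # TRUE
print(max_free_images)    # 28571
```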

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Loop!
for (i in seq(1,nrow(edges_azi))){
  getImage(edges_azi[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================

ZIP THE DOWNLOADED IMAGES AND NAME IT ‘gsv_images.zip’ FOR STEP 6.

Step 6. Apply computer vision

Now, use Google Colab to apply the semantic segmentation model. Zip your images and upload the images to your Colab session.

Step 7. Merging the processed data back to R

Once all of the images are processed and saved in your Colab session as a CSV file, download the CSV file and merge it back to edges.

# TASK ////////////////////////////////////////////////////////////////////////
# Read the downloaded CSV file from Google Colab
seg_output <- read.csv('/home/rstudio/project/seg_output_main.csv')

# //TASK ////////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Join the seg_output object back to edges_azi object using node_id as the join key.
edges_seg_output <- edges_azi %>% 
  inner_join(seg_output, by=c("node_id" = "img_id")) %>% 
  select(type, X, Y, node_id, building, sky, tree, road, sidewalk, is_walkable) %>% 
  mutate(across(c(building, sky, tree, road, sidewalk), function(x) x/(640*640)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
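
Two things happen in the join above: an inner join keeps only the points that have a matching image, and the pixel counts are divided by 640 * 640 to become proportions of the image. A base R sketch with made-up numbers (seg_toy and pts_toy are hypothetical, and merge() stands in for inner_join()):

```r
# Made-up segmentation output: pixel counts per class for three 640x640 images.
seg_toy <- data.frame(
  img_id = c(1, 2, 4),
  road = c(204800, 102400, 0),
  sky  = c(102400, 204800, 409600)
)

# Made-up points: node 3 has no image, and image 4 has no matching point.
pts_toy <- data.frame(
  node_id = 1:3,
  is_walkable = c("walkable", "walkable", "unwalkable")
)

# Inner join: only ids present on both sides survive (here, 1 and 2).
joined <- merge(pts_toy, seg_toy, by.x = "node_id", by.y = "img_id")

# Convert pixel counts to the share of the image's 640 * 640 = 409,600 pixels.
px <- 640 * 640
joined[c("road", "sky")] <- joined[c("road", "sky")] / px

print(joined)
```

Comparing nrow() before and after the join shows how many points were dropped, for example because no image was processed for them.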

Section 3. Summarise and analyze the results.

At the beginning of this assignment, you defined one Census Tract as walkable and the other as unwalkable. The key to the following analysis is the comparison between walkable and unwalkable Census Tracts.

Analysis 1 - Create interactive map(s) to visualize the spatial distribution of the streetscape.

You need to create maps of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. In total, you will have 10 maps.

Provide a brief description of your findings from the maps.

# TASK ////////////////////////////////////////////////////////////////////////
# Create interactive map(s) to visualize the `edges_seg_output` objects. 
# As long as you can deliver the message clearly, you can use any format/package you want.


# Helper: dot map of one category for one walkability group
make_map <- function(walk_label, category_col) {
  tm_shape(edges_seg_output %>% filter(is_walkable == walk_label)) + 
    tm_dots(col = category_col, style = "quantile", palette = "viridis")
}

cate_map <- c("building", "sky", "tree", "road", "sidewalk")

# Five maps per group, in the same category order
walk_maps   <- map(cate_map, ~ make_map("walkable", .x))
unwalk_maps <- map(cate_map, ~ make_map("unwalkable", .x))

# Walkable maps first, then unwalkable maps
tmap_arrange(c(walk_maps, unwalk_maps))
# //TASK //////////////////////////////////////////////////////////////////////

In the walkable area, locations with a high proportion of trees also tend to have a high proportion of sidewalk. The yellow points (the highest values) for sidewalk and tree are especially concentrated in the middle of the area. In that part of the tract, the building and road proportions are low, and the sky proportion is low as well. Overall, the spatial pattern of sidewalk closely matches that of tree but differs from those of building and road.

On the other hand, in the unwalkable area, the spatial pattern of sidewalk is more similar to those of road and building and differs from that of tree. This suggests that the co-location of sidewalks with trees, and their separation from buildings and roads, may influence my perception of walkable and unwalkable areas.

Analysis 2 - Compare the means.

You need to calculate the mean of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. For example, you need to calculate the mean of building category for each of walkable and unwalkable Census Tracts. Then, you need to calculate the mean of sky category for each of walkable and unwalkable Census Tracts. In total, you will have 10 mean values. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Perform the calculation as described above.
# As long as you can deliver the message clearly, you can use any format/package you want.

mean_edge <- edges_seg_output %>%
  st_drop_geometry() %>% 
  group_by(is_walkable)%>%
  summarise(
    mean_building = mean(building, na.rm = TRUE),
    mean_sky = mean(sky, na.rm = TRUE),
    mean_tree = mean(tree, na.rm = TRUE),
    mean_road = mean(road, na.rm = TRUE),
    mean_sidewalk = mean(sidewalk, na.rm = TRUE),
  )

mean_edge
## # A tibble: 2 × 6
##   is_walkable mean_building mean_sky mean_tree mean_road mean_sidewalk
##   <chr>               <dbl>    <dbl>     <dbl>     <dbl>         <dbl>
## 1 unwalkable          0.130    0.239     0.142     0.360        0.0354
## 2 walkable            0.235    0.139     0.164     0.346        0.0653
# //TASK //////////////////////////////////////////////////////////////////////

Walkable areas have higher mean proportions of building, tree, and sidewalk than unwalkable areas, while unwalkable areas have higher mean proportions of sky and road. The differences in tree (0.164 vs. 0.142) and road (0.346 vs. 0.360) are small. In relative terms, the largest difference is in sidewalk (0.0653 vs. 0.0354, about 1.8 times higher in the walkable area), followed closely by building (0.235 vs. 0.130) and sky (0.139 vs. 0.239). These are descriptive comparisons of means rather than statistical tests, but they are consistent with the idea that the more sidewalk there is, the more walkable an area tends to be.
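
One way to rank the gaps between the two tracts is to compute both the absolute difference and the ratio of the means reported in the table above. The sketch below copies those ten means by hand. Ordering by the absolute log ratio is an assumption made here so that "twice as high" and "half as high" count as equally large; it is not a statistical test.

```r
# Means copied from the summary table above.
means <- data.frame(
  category   = c("building", "sky", "tree", "road", "sidewalk"),
  unwalkable = c(0.130, 0.239, 0.142, 0.360, 0.0354),
  walkable   = c(0.235, 0.139, 0.164, 0.346, 0.0653)
)

# Absolute gap (in share of image) and ratio of walkable to unwalkable.
means$diff  <- means$walkable - means$unwalkable
means$ratio <- means$walkable / means$unwalkable

# Sort by distance of the ratio from 1 on a log scale (largest relative gap first).
means <- means[order(-abs(log(means$ratio))), ]
rownames(means) <- NULL

print(transform(means, diff = round(diff, 3), ratio = round(ratio, 2)))
```

With these numbers the ordering comes out as sidewalk, building, sky, tree, road. In absolute terms, building and sky have the largest gaps, because sidewalk occupies only a small share of each image.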

Analysis 3 - Draw boxplot

You need to draw boxplots of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. For example, you need to draw one boxplot of the building category for each of the walkable and unwalkable Census Tracts, and then do the same for the sky category, and so on. In total, you will have 10 boxplots. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Create boxplot(s) using ggplot2 package.

cate <- c("building", "sky", "tree", "road", "sidewalk")

boxplot_edge <- edges_seg_output %>%
  st_drop_geometry() %>% 
  pivot_longer(cols = all_of(cate), names_to = "Category", values_to = "Proportion")

ggplot(boxplot_edge, aes(x = is_walkable, y = Proportion, fill = is_walkable)) + 
  geom_boxplot() +
  facet_wrap(~ Category, scales = "free_y") +  
  theme_minimal() +
  labs(title = "Comparison of Proportions for Walkable vs Unwalkable Areas",
       y = "Proportion",
       x = "Type (Walkable vs Unwalkable)") +
  theme(legend.position = "none")  

# //TASK //////////////////////////////////////////////////////////////////////

The proportion of trees shows a wider range than the other categories. Roads have a higher median proportion than the other categories in both walkable and unwalkable areas. The median proportions of building and sky differ notably between walkable and unwalkable areas.