This project analyzes images from Google Street View to evaluate the quality of streetscapes in and around the Atlanta BeltLine. Through this methodology, we set out to address the following question: which access points to the BeltLine (and the areas around them) exhibit a poor degree of walkability and pedestrian-oriented design? To perform this analysis, we identify a series of points around each of the major access points to the BeltLine; this process was done manually in ArcGIS. Each of these points represents a Google Street View image of the road corridor. Then, using the PSPNet computer vision model, we identify and calculate the various metrics that determine the quality of the streetscape. Across each series of images (grouped by BeltLine access point), we aggregate the calculated metrics and determine a final streetscape “score” for each sector. Once the analysis is complete, we plot each access point on a map alongside its calculated score. The goal of this study is to provide a spatial analysis of which parts of the BeltLine need enhanced pedestrian infrastructure, in order to reduce the heightened injury risk at points of access.
We hypothesize that access points with a greater percentage of positive metrics and a lower percentage of negative metrics, as identified by the computer vision model and calculated through the scoring criteria, will exhibit a higher degree of walkability and pedestrian-oriented design.
The inspiration for this project largely originated from two articles that address the question: ‘How can cities use technology to protect pedestrians?’ We used this literature to better understand how to begin our research, how to identify important POIs, which technologies to use, and the methodology for the later stages of the study.
Parmjit Dhillon and colleagues, in their article “Improving Pedestrian Safety Using Computer Vision, Machine Learning and Data Analytics”, utilized computer vision technology to assess the degree of safety of hundreds of intersections:
With more than 6,000 pedestrian deaths in the United States each year, there are several technologies and use cases that can help cities make roads and highways safer. The Smart Intersection proof of concept uses computer vision, edge and near-edge computing to detect and monitor pedestrian and vehicle movement at intersections to improve pedestrian safety. The identified data can help cities design and optimize intersections, focusing on optimizing pedestrian safety while streamlining traffic flow at intersections. Using the training dataset, we can train computer vision models and capture additional metadata on intersection traffic types so that cities can have richer insights and bring them closer to the Vision Zero goals. (Dhillon et al. 2021)
Bon Woo Koo, in his work “Measuring Street-Level Walkability Through Big Image Data and its Associations with Walking Behavior”, chose eight categories to represent the meso-scale streetscape, including buildings, houses, sidewalks, trees, roads, grass, cars, and plants.
“We developed three metrics. Building to Street Ratio: the ratio of buildings and houses to the sum of the ratios of sidewalks, roads and cars. Greenness: calculated as the sum of the ratios of trees, grass and plants. Sidewalk-to-street ratio: the ratio of sidewalks to the sum of the ratios of sidewalks, roads, and cars. These three variables are called streetscape factors” (Koo et al. 2021).
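These three streetscape factors reduce to simple column arithmetic once per-image pixel ratios are available, as in the PSPNet output we generate later. A minimal sketch of that calculation, assuming a data frame of pixel ratios with hypothetical column names for Koo's eight categories (building, house, sidewalk, road, car, tree, grass, plant); these names are placeholders, not the labels used by our PSPNet output.
library(dplyr) #for mutate() and the pipe
#Illustrative sketch only: compute Koo's three streetscape factors
#from per-image pixel ratios with assumed column names
streetscape_factors <- function(ratios){
  ratios %>%
    mutate(
      #Building-to-street ratio: buildings and houses over sidewalks, roads, and cars
      building_to_street = (building + house) / (sidewalk + road + car),
      #Greenness: sum of the tree, grass, and plant ratios
      greenness = tree + grass + plant,
      #Sidewalk-to-street ratio: sidewalks over sidewalks, roads, and cars
      sidewalk_to_street = sidewalk / (sidewalk + road + car))
}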
library(tidyverse)
library(tidycensus)
library(sf)
library(tmap)
library(units)
library(osmdata)
library(sfnetworks)
library(tidygraph)
library(wordcloud2)
#Set tmap options
tmap_options(
main.title.size = 0.8,
main.title.fontface = 'bold')
tmap_mode('plot')
Here, we load in the access point data and plot the distribution of access points in space.
Please note that, due to RAM constraints, we were not able to knit the maps in interactive (leaflet) format.
#All geographic analysis will use the NAD83 / Georgia West Projected Coordinate System
epsg_id <- 26967
#Read in raw dataframe of coordinate points
ap_raw <- read_csv('./data/beltline_points.csv')
## Rows: 77 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): OBJECTID, longitude, latitude
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Convert the spreadsheet to an SF object
ap <- ap_raw %>%
st_as_sf(coords = c('latitude', 'longitude'), crs = 4326) %>%
st_transform(epsg_id)
#Read in Atlanta BeltLine Shapefile
beltline <- st_read('./data/BeltLine_Trails.shp') %>%
st_transform(epsg_id)
## Reading layer `BeltLine_Trails' from data source
## `C:\Users\Sam\OneDrive - Georgia Institute of Technology\Intro to Urban Analytics\Final Project\data\BeltLine_Trails.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 6 fields
## Geometry type: MULTILINESTRING
## Dimension: XYZ
## Bounding box: xmin: -9399156 ymin: 3991181 xmax: -9390586 ymax: 4004786
## z_range: zmin: 0 zmax: 0
## Projected CRS: WGS 84 / Pseudo-Mercator
#Plot access points relative to the BeltLine
tmap_mode('plot')
## tmap mode set to plotting
tm_shape(beltline) +
tm_lines(lwd = 2, col = '#00820d', lty = 'dashed') +
tm_shape(ap) +
tm_dots(alpha = 0.75)
The next order of business is to load the OSM street networks, so that we can identify the streets surrounding the BeltLine for which we will collect images.
#Establish a buffer length for analysis
buffer <- set_units(20, 'meters')
#Set bounding box of the search area
bl_bbox <- st_bbox(beltline %>%
st_buffer(dist = set_units(1, 'mile')) %>%
st_transform(crs = 4326))
#Read in OSM street data for streets within the bounding box
osm_raw <- opq(bbox = bl_bbox) %>%
add_osm_feature(key = 'highway', value = c('motorway', 'trunk', 'primary', 'secondary', 'tertiary', 'unclassified','residential')) %>%
osmdata_sf() %>%
osm_poly2line()
#Get the street SF object
osm_streets <- osm_raw$osm_lines %>%
st_transform(epsg_id) %>%
filter(!is.na(osm_id))
#Convert to SF Network, and clean the edges
street_net <- osm_streets %>%
as_sfnetwork(directed = FALSE)
street_edges <- street_net %>%
activate('edges') %>%
mutate('length' = edge_length()) %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(to_spatial_subdivision) %>%
convert(to_spatial_smooth) %>%
mutate('length' = edge_length() %>% unclass()) %>%
st_as_sf()
## Warning: to_spatial_subdivision assumes attributes are constant over geometries
tm_shape(street_edges) +
tm_lines() +
tm_shape(beltline) +
tm_lines(col = 'red', lwd = 2) +
tm_layout(main.title = 'BeltLine and Surrounding Streets')
#Now clean the road networks to just those that intersect with the access points
street_segments <- street_edges %>%
select(c(osm_id, geometry)) %>%
st_join(ap %>% st_buffer(buffer), st_intersects) %>%
filter(!is.na(OBJECTID))
tm_shape(street_segments) +
tm_lines(lwd = 1.25) +
tm_shape(beltline) +
tm_lines(col = 'red', lwd = 2) +
tm_shape(ap) +
tm_dots(size = .05) +
tm_layout(main.title = 'BeltLine Access Point Streets')
Next, we divide each street segment into points at 10 m intervals along the segment. We then create a lookup table of the nearest access point geometry corresponding to each road point.
#Now we want the points on each street segment that are closest to the beltline.
street_points <- street_segments %>%
#Breakup lines into segments
st_line_sample(density = 1/10) %>%
st_as_sf() %>%
#Convert to points
st_cast(to = "POINT") %>%
#Reappend OSM IDs
st_join(street_segments %>% st_buffer(set_units(1,'meter')), st_intersects) %>%
st_transform(epsg_id)
#Now we need to get the access point geometry that corresponds to each street segment
street_nearest_ap <- street_points %>%
st_drop_geometry() %>%
left_join(ap, by = 'OBJECTID') %>%
st_as_sf() %>%
st_transform(epsg_id)
This chunk uses the lookup tables above to select the three road points closest to each access point. These points will be the locations for the Google Street View image queries.
ap_closest <- street_points %>%
#Create a line geometry of each road point to its corresponding access point.
st_nearest_points(street_nearest_ap, pairwise = TRUE) %>%
st_as_sf() %>%
#Calculate the distance of each road point to the access point
mutate(length = st_length(.) %>% unclass()) %>%
st_drop_geometry() %>%
bind_cols(street_points) %>%
#Find the 3 closest points to each access point
group_by(OBJECTID) %>%
arrange(length) %>%
slice(1:3) %>%
st_as_sf(crs = epsg_id)
tm_shape(street_segments) +
tm_lines() +
tm_shape(beltline) +
tm_lines(col = 'red', lwd = 2) +
tm_shape(ap) +
tm_dots(size = .075, col = 'blue') +
tm_shape(ap_closest) +
tm_dots(size = .05, col = 'green') +
tm_layout(main.title = 'BeltLine Access Point Streets')
Here we create helper functions to calculate the azimuth (the heading of the camera) and to convert the road point coordinates to WGS 1984.
#Function for getting the azimuth towards the beltline
get_azi <- function(point, o_id, unit = 10){
#Get the beltline access point
b <- ap %>%
filter(o_id == OBJECTID)
#Get a point x meters down from the beltline
a <- point
#Calculate the azimuth and return it
y1 <- a %>% st_coordinates() %>% .[,'Y']
y2 <- b %>% st_coordinates() %>% .[,'Y']
x1 <- a %>% st_coordinates() %>% .[,'X']
x2 <- b %>% st_coordinates() %>% .[,'X']
azi <- atan2(y1 - y2, x1 - x2) * (180/pi)
return(azi)
}
#Vectorize the azimuth function
get_azi_V <- Vectorize(get_azi)
#Function for getting the coordinates for imagery
get_gsv_coord <- function(point, unit = 10, epsg = 4326){
pt <- point %>%
st_transform(crs = epsg) %>%
st_coordinates()
coords <- paste0(pt[,'Y'] %>% round(4), ',', pt[,'X'] %>% round(4))
return(coords)
}
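One convention worth noting for the next step: the Google Street View Static API interprets the heading parameter as a compass bearing measured clockwise from north (0 = north, 90 = east), whereas atan2(Δy, Δx) returns a counterclockwise angle from due east. Below is a minimal sketch of a compass-bearing variant, assuming projected coordinates in meters and that the camera should face from the road point toward its access point; get_bearing is a hypothetical helper shown for illustration, not the function used above.
#Hypothetical helper, for illustration only: compass bearing in degrees
#clockwise from north, from a road point towards its BeltLine access point
get_bearing <- function(point, o_id){
  a <- point %>% st_coordinates() #camera location (road point)
  b <- ap %>% filter(OBJECTID == o_id) %>% st_coordinates() #access point
  #atan2(dx, dy) gives the clockwise-from-north angle of the vector a -> b
  bearing <- atan2(b[, 'X'] - a[, 'X'], b[, 'Y'] - a[, 'Y']) * (180/pi)
  return(bearing %% 360)
}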
Here we craft the URLs for each road point.
key <- Sys.getenv('gsv_api')
fpath <- './image exports/'
gsv_prepped <- ap_closest %>%
#Append the azimuth for the closest point to the beltline access point
mutate(azi = get_azi_V(point = `geometry`, o_id = `OBJECTID`, unit = 10) %>% round(1)) %>%
#Prep the coordinates for the API call
mutate(coord = get_gsv_coord(geometry)) %>%
ungroup() %>%
mutate(node_id = row_number()) %>%
#Craft URL
mutate(furl = glue::glue("https://maps.googleapis.com/maps/api/streetview?size=640x640&location={coord}&heading={azi}&fov=90&pitch=0&key={key}")) %>%
mutate(path = glue::glue("./image exports/GSV-nid_{node_id}-Location_{coord}-heading_{azi}.jpg"))
remove(key)
Now we download each image, skipping any files that already exist on disk.
for (i in 1:nrow(gsv_prepped)){
obs <- gsv_prepped[i,]
if(!file.exists(obs$path)){
download.file(obs$furl, obs$path, mode = 'wb')
}
}
Read in PSPNet output and merge it with original data.
#Read in PSPNet output
imagery_raw <- read_csv('./data/seg_output.csv')
## Rows: 216 Columns: 151
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (151): node_id, wall, building, sky, floor, tree, ceiling, road, bed, wi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Join with original data and normalize each field as a percentage of pixels
imagery <- imagery_raw %>%
left_join(gsv_prepped %>% select(c(node_id, osm_id, OBJECTID)), by = 'node_id') %>%
mutate(
across(-c(node_id, OBJECTID, osm_id, geometry), ~ .x/(640^2))
) %>%
st_drop_geometry() %>%
select(-geometry) %>%
#Join geometry
left_join(ap, by = 'OBJECTID') %>%
st_as_sf(crs = epsg_id)
#Remove gsv_prepped, which embeds the API key in its URLs, as it is no longer needed
remove(gsv_prepped)
#Write output
write_csv(imagery %>% st_drop_geometry(), './data/PSPNet_Output.csv')
st_write(imagery, './data/PSPNet_Output.shp', append = FALSE)
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver
## Deleting layer `PSPNet_Output' using driver `ESRI Shapefile'
## Writing layer `PSPNet_Output' to data source
## `./data/PSPNet_Output.shp' using driver `ESRI Shapefile'
## Writing 216 features with 153 fields and geometry type Point.
Create plots for aggregated PSPNet output.
imagery_long <- imagery %>%
st_drop_geometry() %>%
mutate(
across(-c(node_id, OBJECTID, osm_id), ~ .x*(640^2))
) %>%
pivot_longer(cols = -c(node_id, OBJECTID, osm_id), names_to = 'object') %>%
filter(value > 0)
words <- imagery_long %>%
group_by(object) %>%
summarize(freq = log(sum(value)))
ggplot(imagery_long, aes(x = reorder(object,-value), y = value)) +
geom_bar(stat = 'identity', fill = 'lightblue') +
labs(x = 'Object', y = '# of Pixels', title = 'What Does the Computer See?') +
ggdark::dark_mode() +
theme(axis.text.x = element_text(angle = -90, size = 7), title = element_text(face = 'bold'))
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().
wordcloud2(words, size = .45, color = 'random-light', background = 'black')
The total score was calculated as the sum of all of the preceding sub-criteria, normalized to a value between zero and one.
A total weighted score was also calculated using the following weights:
‘Sub Criteria Weights’
To provide a final score for each image, the vehicles sub-score is subtracted from the total weighted score, and the result is again normalized to a scale of zero to one.
This scoring was performed in ArcGIS Pro. Attached below are the output scores for each node, following an illustrative sketch of the calculation.
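The sketch below shows the equivalent arithmetic in R. The data frame sub_scores, the named weight vector weights, and the vehicle column are hypothetical stand-ins for the sub-criteria and weights described above; none of these names are drawn from the ArcGIS model.
#Illustrative sketch only: the actual scoring was performed in ArcGIS Pro.
#Assumes `sub_scores` has one row per image and one column per sub-criterion,
#`weights` is a named numeric vector over those columns, and `vehicle` is the
#vehicles sub-score. All of these names are hypothetical.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
sub_mat <- as.matrix(sub_scores[names(weights)])
final_scores <- sub_scores %>%
  mutate(
    #Total score: sum of all sub-criteria, normalized to [0, 1]
    total = rescale01(rowSums(sub_mat)),
    #Total weighted score: weighted sum using the sub-criterion weights
    total_weighted = as.numeric(sub_mat %*% weights),
    #Final score: subtract the vehicles sub-score, then renormalize to [0, 1]
    final = rescale01(total_weighted - vehicle))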
scores <- read_csv('./data/Final scoring.csv')
## Rows: 216 Columns: 63
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (63): FID, node_id, wall, buildng, sky, tree, road, windwpn, grass, side...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(scores %>% select(c(OBJECTI, starts_with('F'))) %>% select(-fence))
## # A tibble: 6 × 10
## OBJECTI FID Final Fbuilt Fnature Fvehicle Factive Fstreet Fsafety Fother
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0 39.5 0.0865 35.7 60.5 0 2.71 0.389 0.599
## 2 1 1 70.0 9.16 27.1 30.0 0 30.8 0.140 2.85
## 3 3 2 62.9 0.111 61.2 37.1 0 1.20 0.00414 0.466
## 4 1 3 63.0 0.432 39.2 37.0 0 21.7 0.0147 1.68
## 5 3 4 51.8 0 48.8 48.2 0 1.48 0.448 1.04
## 6 3 5 46.9 0.355 45.8 53.1 0 0.399 0.114 0.263
The actual scoring process was originally performed in ArcGIS Pro; the following description details the results of that process:
The “best” access points (highest positive factor score) were identified as the following:
‘Highest Scoring Access Points’
The “worst” access points were identified as:
‘Lowest Scoring Access Points’
Based on our scoring criteria and the metrics identified by the computer vision model, access points with a greater percentage of positive metrics and a lower percentage of negative metrics do indicate a higher degree of walkability and pedestrian-oriented design. The maps presented above show that the worst access points are primarily concentrated on the west side of the BeltLine loop, while the best access points are concentrated on the east side of the loop.

Atlanta’s Midtown neighborhood is a densely packed, vibrant, mixed-use environment that is conducive to entertainment, office, retail, and recreational uses; it is home to young professionals and families and is a visitor attraction. With a high number of POIs and a young demographic, the Eastside Trail experiences high usership and hence receives greater funding to create safe and accessible access points than the Westside Trail. The area around the Westside Trail has very low density, and its access points are overwhelmingly integrated with vehicle-based infrastructure. Few access points are integrated with pre-existing pedestrian infrastructure, as the surrounding land use is not conducive to pedestrian activity or active modes of transportation.

Understanding the context of the best and worst access points allows us to conclude that the high scores received by the best access points and the low scores received by the worst access points align with the intent of the scoring criteria. Additionally, the GSV images obtained at each access point were manually reviewed to confirm that the scoring aligns with how each access point appears in the images.
Dhillon, Parmjit, et al. “Improving Pedestrian Safety Using Computer Vision, Machine Learning and Data Analytics.” Fall Technical Forum, 2021, https://www.nctatechnicalpapers.com/Paper/2021/2021-improving-pedestrian-safety-using-computer-vision-machine-learning-and-data-analytics.
Koo, Bon Woo, et al. “How Are Neighborhood and Street-Level Walkability Factors Associated with Walking Behaviors? A Big Data Approach Using Street View Images.” Environment and Behavior, vol. 54, no. 1, 2021, pp. 211–241, https://doi.org/10.1177/00139165211014609.