Section 0. Packages

Importing the necessary packages is part of this assignment. Add any required packages to the code chunk below as you progress through the tasks.

library(tidycensus)

## Warning: package 'tidycensus' was built under R version 4.4.3

library(osmdata)

## Warning: package 'osmdata' was built under R version 4.4.3

## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.4.3

## Warning: package 'ggplot2' was built under R version 4.4.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tmap)

## Warning: package 'tmap' was built under R version 4.4.3

library(sf)

## Warning: package 'sf' was built under R version 4.4.3

## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE

library(sfnetworks)

## Warning: package 'sfnetworks' was built under R version 4.4.3

library(tidygraph)

## Warning: package 'tidygraph' was built under R version 4.4.3

## 
## Attaching package: 'tidygraph'
## 
## The following object is masked from 'package:stats':
## 
##     filter

library(ggplot2)       
library(broom)         
library(kableExtra)

## Warning: package 'kableExtra' was built under R version 4.4.3

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Section 1. Choose your Census Tracts.

Use the Census Tract map in the following code chunk to identify the GEOIDs of the tracts you consider walkable and unwalkable.

# TASK ////////////////////////////////////////////////////////////////////////
# Set up your api key here
census_api_key(
  Sys.getenv("CENSUS_API_KEY"),
  install = FALSE, overwrite = FALSE
)

## To install your API key for use in future sessions, run this function with `install = TRUE`.

# Set the Google API key; keep the real value out of the document (e.g., store it in ~/.Renviron)
Sys.setenv(GOOGLE_API_KEY = "YOUR_GOOGLE_API_KEY")

# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Download Census Tract polygon for Fulton and DeKalb
tract <- get_acs("tract", 
                 variables = c('pop' = 'B01001_001'),
                 year = 2023,
                 state = "GA", 
                 county = c("Fulton", "DeKalb"), 
                 geometry = TRUE)

## Getting data from the 2019-2023 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.


tmap_mode("view")

## ℹ tmap mode set to "view".

# Note: with the updated tmap package, `tm_mode()` had to be changed to `tmap_mode()` above for the code to run.
tm_basemap("OpenStreetMap") +
  tm_shape(tract) + 
  tm_polygons(fill_alpha = 0.2)

## Registered S3 method overwritten by 'jsonify':
##   method     from    
##   print.json jsonlite

# =========== NO MODIFY ZONE ENDS HERE ========================================
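The API keys configured at the top of the chunk above are read from environment variables. As a minimal sketch, a small helper can fail early with a readable message when a key is missing. The helper name `require_env_key()` and the variable `DEMO_API_KEY` are hypothetical and used only for illustration; they are not part of the assignment template.

```r
# Hypothetical helper (not part of the assignment): return the value of an
# environment variable, or stop with a clear message if it is unset or empty.
require_env_key <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set; add it to ~/.Renviron.")
  }
  value
}

# Demonstration with a dummy variable (an assumption, not a real key).
Sys.setenv(DEMO_API_KEY = "not-a-real-key")
demo_key <- require_env_key("DEMO_API_KEY")
```

Keeping keys in `~/.Renviron` means the knitted document never contains the secret itself.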

Once you have the GEOIDs, create two Census Tract objects – one representing your most walkable area and the other your least walkable area.
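As a quick sanity check on the GEOIDs read off the map, the sketch below splits an 11-digit tract GEOID into its state, county, and tract parts, assuming the standard layout of 2 + 3 + 6 digits. The function name `split_geoid()` is hypothetical; the two GEOIDs are taken from this assignment.

```r
# Hypothetical helper: split 11-digit tract GEOIDs into FIPS components.
# Assumes the layout state (2 digits) + county (3 digits) + tract (6 digits).
split_geoid <- function(geoid) {
  stopifnot(all(nchar(geoid) == 11))
  data.frame(
    geoid  = geoid,
    state  = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract  = substr(geoid, 6, 11),
    stringsAsFactors = FALSE
  )
}

parts <- split_geoid(c("13121001205", "13121002500"))
print(parts)
```

Here state `13` is Georgia and county `121` is Fulton, so a GEOID that does not start with `13121` or `13089` (DeKalb) would fall outside the downloaded tracts.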

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Specify the GEOIDs of your walkable and unwalkable Census Tracts. 
#    e.g., tr_id_walkable <- c("13121001205", "13121001206")
# 2. Extract the selected Census Tracts using `tr_id_walkable` and `tr_id_unwalkable`

# For the walkable Census Tract(s)
tr_id_walkable <- c("13121001205", "13121001206")

tract_walkable <- tract %>% 
  filter(GEOID %in% tr_id_walkable)

# For the unwalkable Census Tract(s)
tr_id_unwalkable <- c("13121002500", "13121002400")

tract_unwalkable <- tract %>% 
  filter(GEOID %in% tr_id_unwalkable)

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Create an interactive map showing `tract_walkable` and `tract_unwalkable`
# Green marks the tracts predicted to be walkable; red marks the tracts predicted to be unwalkable.
tmap_mode("view")

## ℹ tmap mode set to "view".

tm_shape(tract_walkable) +
  tm_polygons(fill = "lightgreen",  fill_alpha = 0.5,
              col = "black", lwd = 2) +
tm_shape(tract_unwalkable) +
  tm_polygons(fill = "red", fill_alpha = 0.5,
              col = "black", lwd = 2) +
tm_title("Chosen Tracts", size = 1.2) +
tm_add_legend(
  title  = "Legend",
  type   = "polygons",
  labels = c("Walkable", "Unwalkable"),
  fill   = c("lightgreen", "red")
) +
tm_legend(
  outside    = FALSE,
  position   = c("right", "bottom"),
  title.size = 1.0,
  text.size  = 0.8
) +
tm_view(set_view = c(zoom = 13))

## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_legend()`: use 'tm_legend()' inside a layer function, e.g.
## 'tm_polygons(..., fill.legend = tm_legend())'

# //TASK //////////////////////////////////////////////////////////////////////

Provide a brief description of your selected Census Tracts. Why do you consider these tracts walkable or unwalkable? What factors do you think contribute to their walkability?

Unwalkable Area: Bankhead. For the least walkable area, I chose the tracts along Donald Lee Hollowell Parkway because my GRA is currently launching a project that targets the corridor’s poor walkability and prioritizes its revitalization. Even though this is a major corridor, it isn’t very friendly to pedestrians: sidewalks appear and disappear, the lanes are wide, and traffic moves fast. Many destinations are spaced far apart, so walking between them feels neither convenient nor safe. The environment is clearly designed around cars first, and walking becomes more of a challenge than a comfortable option.

Walkable Area: Midtown. I picked the Midtown tracts as my “walkable” example because, as someone who lives in the heart of Midtown, I find it almost impossible not to notice how friendly this part of Atlanta is for people on foot. The sidewalks are continuous, wide enough to feel comfortable, and generally in good shape. Many destinations, such as restaurants, shops, offices, and parks, are clustered together, so you don’t need a car just to get from one place to another. The streets are mostly laid out in a grid, which makes navigation straightforward and makes the blocks feel shorter. Overall, walking here feels convenient, safe, and even enjoyable, which is why it stood out to me as a clearly walkable area.

Section 2. OSM, GSV, and Computer Vision.

Step 1. Get and clean OSM data.

To obtain the OSM network for your selected Census Tracts: (1) Create bounding boxes. (2) Use the bounding boxes to download OSM data. (3) Convert the data into an sfnetwork object and clean it.

# TASK ////////////////////////////////////////////////////////////////////////
# Create one bounding box (`tract_walkable_bb`) for your walkable Census Tract(s) and another (`tract_unwalkable_bb`) for your unwalkable Census Tract(s).

# For the walkable Census Tract(s)
tract_walkable_bb <- tract_walkable %>% 
  st_bbox()

# For the unwalkable Census Tract(s)  
tract_unwalkable_bb <- tract_unwalkable %>% 
  st_bbox()

# //TASK //////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get OSM data for the two bounding boxes
osm_walkable <- opq(bbox = tract_walkable_bb) %>%
  add_osm_feature(key = 'highway', 
                  value = c("primary", "secondary", "tertiary", "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()

osm_unwalkable <- opq(bbox = tract_unwalkable_bb) %>%
  add_osm_feature(key = 'highway', 
                  value = c("primary", "secondary", "tertiary", "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()
# =========== NO MODIFY ZONE ENDS HERE ========================================


# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert `osm_walkable` and `osm_unwalkable` into sfnetwork objects (as undirected networks),
# 2. Clean the network by (1) deleting parallel lines and loops, (2) creating missing nodes, and (3) removing pseudo nodes (make sure the `summarise_attributes` argument is set to 'first' when doing so).

net_walkable <- osm_walkable$osm_lines %>% 
  # Drop redundant columns 
  select(osm_id, highway) %>% 
  sfnetworks::as_sfnetwork(directed = FALSE) %>% 
  activate("edges") %>%
  filter(!edge_is_multiple()) %>% # remove duplicated (parallel) edges
  filter(!edge_is_loop()) %>% # remove loops
  convert(., sfnetworks::to_spatial_subdivision) %>%
  convert(., sfnetworks::to_spatial_smooth, summarise_attributes = "first")

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

net_unwalkable <- osm_unwalkable$osm_lines %>% 
  # Drop redundant columns 
  select(osm_id, highway) %>% 
  sfnetworks::as_sfnetwork(directed = FALSE) %>% 
  activate("edges") %>%
  filter(!edge_is_multiple()) %>% 
  filter(!edge_is_loop()) %>% 
  convert(., sfnetworks::to_spatial_subdivision) %>%
  convert(., sfnetworks::to_spatial_smooth, summarise_attributes = "first")

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

# //TASK //////////////////////////////////////////////////////////////////////
  
  
# TASK //////////////////////////////////////////////////////////////////////
# Using `net_walkable` and `net_unwalkable`,
# 1. Activate the edge component of each network.
# 2. Create a `length` column.
# 3. Filter out short (<300 feet) segments.
# 4. Randomly Sample 100 rows per road type.
# 5. Assign the results to `edges_walkable` and `edges_unwalkable`, respectively.

# OSM for the walkable part
set.seed(321)
edges_walkable <- net_walkable %>% 
  st_as_sf("edges") %>% 
  mutate(length = st_length(.) %>% unclass()) %>% 
  # st_length() returns meters here; 300 ft is about 91 m, so the 90 m threshold from the lab is used
  filter(length >= 90) %>%
  group_by(highway) %>% 
  # Sample up to 100 rows per road type (all rows if a type has fewer than 100)
  slice_sample(n = 100) %>%
  ungroup()

# OSM for the unwalkable part
edges_unwalkable <- net_unwalkable %>% 
  st_as_sf("edges") %>% 
  mutate(length = st_length(.) %>% unclass()) %>% 
  # Threshold lowered to 50 m because of the low density around the Hollowell corridor
  filter(length >= 50) %>% 
  group_by(highway) %>% 
  slice_sample(n = 100) %>%  
  ungroup()

# //TASK //////////////////////////////////////////////////////////////////////
  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Merge the two
edges <- bind_rows(edges_walkable %>% mutate(is_walkable = TRUE), 
                   edges_unwalkable %>% mutate(is_walkable = FALSE)) %>% 
  mutate(edge_id = seq(1,nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
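The task above states the length threshold in feet, while the `length` column is in meters. The sketch below shows the conversion and how a cutoff behaves on a few made-up segment lengths; the helper `feet_to_meters()` and the sample values are assumptions for illustration only.

```r
# Hypothetical helper: convert feet to meters (1 ft = 0.3048 m exactly).
feet_to_meters <- function(feet) feet * 0.3048

threshold_m <- feet_to_meters(300)  # about 91.44 m

# Made-up segment lengths in meters, not taken from the OSM data.
lengths_m <- c(40, 75, 90, 120, 300)

# Keep only segments at least as long as the 300 ft threshold.
kept <- lengths_m[lengths_m >= threshold_m]
print(kept)
```

Note that a 90 m segment passes the rounded 90 m cutoff used in the chunk above but would be dropped by the exact 300 ft conversion.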

Step 2. Define `getAzimuth()` function.

In this assignment, you will collect two GSV images per road segment, as illustrated in the figure below. To do this, you will define a function that extracts the coordinates of the midpoint and the azimuths in both directions.

If you can’t see this image, try changing the markdown editing mode from ‘Source’ to ‘Visual’ (you can find the buttons in the top-left corner of this source pane).

getAzimuth <- function(line){

  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. Use the `st_line_sample()` function to sample three points at locations 0.48, 0.5, and 0.52 along the line. These points will be used to calculate the azimuth.
  # 2. Use `st_cast()` function to convert the 'MULTIPOINT' object into a 'POINT' object.
  # 3. Extract coordinates using `st_coordinates()`.
  # 4. Assign the coordinates of the midpoint to `mid_p`.
  # 5. Calculate the azimuths from the midpoint in both directions and save them as `mid_azi_1` and `mid_azi_2`, respectively.
  
  # 1-3
  mid_p3 <- line %>% 
    st_line_sample(sample = c(0.48, 0.5, 0.52)) %>% 
    st_cast("POINT") %>% 
    st_coordinates()
  
  # 4
  mid_p <- mid_p3[2,]
  
  # 5
  mid_azi_1 <- atan2(mid_p3[1,"X"] - mid_p3[2, "X"],
                     mid_p3[1,"Y"] - mid_p3[2, "Y"])*180/pi
  
  mid_azi_2 <- atan2(mid_p3[3,"X"] - mid_p3[2, "X"],
                     mid_p3[3,"Y"] - mid_p3[2, "Y"])*180/pi
  
  # //TASK //////////////////////////////////////////////////////////////////////
 
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  return(tribble(
    ~type,    ~X,            ~Y,             ~azi,
    "mid1",    mid_p["X"],   mid_p["Y"],      mid_azi_1,
    "mid2",    mid_p["X"],   mid_p["Y"],      mid_azi_2, )
  )
  # =========== NO MODIFY ZONE ENDS HERE ========================================

}
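
The azimuth formula used in `getAzimuth()` can be verified on known directions. The sketch below is an illustration only: `bearing_deg` is a hypothetical helper, not part of the assignment code. `atan2(dx, dy)` returns an angle measured clockwise from north in the range (-180, 180], and the `%% 360` step maps it onto the 0 to 360 range that the Street View documentation lists for `heading`. Working directly with longitude and latitude differences is a planar approximation, which is reasonable here because the sampled points are very close together.

```r
# Bearing from point a to point b, in degrees clockwise from north.
# `bearing_deg` is a hypothetical helper for illustration.
bearing_deg <- function(a, b) {
  deg <- atan2(b["X"] - a["X"], b["Y"] - a["Y"]) * 180 / pi
  unname(deg %% 360)  # map (-180, 180] onto [0, 360)
}

mid   <- c(X = 0, Y = 0)
east  <- c(X = 1, Y = 0)
south <- c(X = 0, Y = -1)
west  <- c(X = -1, Y = 0)

bearing_deg(mid, east)   # approximately 90
bearing_deg(mid, south)  # approximately 180
bearing_deg(mid, west)   # approximately 270
```

For a straight segment, the two azimuths computed from the midpoint should differ by about 180 degrees, which gives a quick sanity check on the function output.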

Step 3. Apply the function to all street segments

Apply the getAzimuth() function to the edges object. Once this step is complete, your data will be ready for downloading GSV images.

# TASK ////////////////////////////////////////////////////////////////////////
# Apply getAzimuth() function to all edges.
# Remember that you need to pass edges object to st_geometry() before you apply getAzimuth()
edges_azi <- edges %>% 
  st_geometry() %>% 
  map_df(getAzimuth, .progress = TRUE)

# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
edges_azi <- edges_azi %>% 
  bind_cols(edges %>% 
              st_drop_geometry() %>% 
              slice(rep(1:nrow(edges),each=2))) %>% 
  st_as_sf(coords = c("X", "Y"), crs = 4326, remove=FALSE) %>% 
  mutate(img_id = seq(1, nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
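
The row-duplication step above works because `getAzimuth()` returns two rows per segment in edge order. A minimal base R sketch with toy tables (all names and values here are hypothetical) shows how repeating each attribute row twice keeps the two tables aligned:

```r
# Toy stand-in for `edges`: three segments with attributes (hypothetical values)
toy_edges <- data.frame(
  edge_id = 1:3,
  highway = c("primary", "residential", "service")
)

# Toy stand-in for the getAzimuth() output: two rows per segment, in edge order
toy_azi <- data.frame(
  type = rep(c("mid1", "mid2"), times = 3),
  azi  = c(10, -170, 95, -85, 40, -140)
)

# Repeat each attribute row twice so row i of `toy_azi` lines up with its edge
toy_attr <- toy_edges[rep(seq_len(nrow(toy_edges)), each = 2), ]
combined <- cbind(toy_azi, toy_attr)
combined$img_id <- seq_len(nrow(combined))

combined$edge_id  # 1 1 2 2 3 3
```

Each `edge_id` therefore maps to two consecutive `img_id` values, one per viewing direction.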

Step 4. Define a function that formats the request URL and downloads images.

getImage <- function(iterrow){
  # This function takes one row of `edges_azi` and downloads GSV image using the information from the row.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. Extract required information from the row of `edges_azi`
  # 2. Format the full URL and store it in `request`. Refer to this page: https://developers.google.com/maps/documentation/streetview/request-streetview
  # 3. Format the full path (including the file name) of the image being downloaded and store it in `fpath`
  type <- iterrow$type
  location <- paste0(iterrow$Y %>% round(5), ",", iterrow$X %>% round(5))
  heading <- iterrow$azi %>% round(1)
  edge_id <- iterrow$edge_id
  img_id <- iterrow$img_id
  key <- Sys.getenv("GOOGLE_API_KEY")

  
  endpoint <- "https://maps.googleapis.com/maps/api/streetview"
  
  #URLs are from the Sampling Lecture Lab (copied and pasted below)
  request <- glue::glue("{endpoint}?size=640x640&location={location}&heading={heading}&fov=90&pitch=0&key={key}")
  furl <- request
  fname <- glue::glue("GSV-nid_{img_id}-eid_{edge_id}-type_{type}-Location_{location}-heading_{heading}.jpg") 
  fpath <- file.path("C:/Users/jenny/OneDrive - Georgia Institute of Technology/Desktop/CP8883/gsv_images",fname)
  
  # //TASK //////////////////////////////////////////////////////////////////////

  
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  # Download images
  if (!file.exists(fpath)){
    download.file(furl, fpath, mode = 'wb') 
  }
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}
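
The request formatting can be tried without calling the API. This is a sketch under stated assumptions: the coordinates, heading, key placeholder, and the relative folder name `gsv_images` are all hypothetical. It shows the `latitude,longitude` order of the `location` parameter and one way to build the output path relative to the project rather than as an absolute path.

```r
# Hypothetical values for one image request; the key below is a placeholder
endpoint <- "https://maps.googleapis.com/maps/api/streetview"
lat <- 33.78123
lon <- -84.38345
heading <- 87.3
key <- "YOUR_API_KEY"

# The location parameter takes "latitude,longitude", so Y comes before X
location <- paste0(round(lat, 5), ",", round(lon, 5))

request <- sprintf(
  "%s?size=640x640&location=%s&heading=%s&fov=90&pitch=0&key=%s",
  endpoint, location, heading, key
)

# A folder relative to the project keeps the script portable across machines
out_dir <- "gsv_images"  # hypothetical folder name
fname <- sprintf("GSV-nid_%d-eid_%d-type_%s.jpg", 1L, 1L, "mid1")
fpath <- file.path(out_dir, fname)

request
fpath
```

Printing `request` for a single row before running the full loop is a cheap way to confirm the URL is well formed and that the key was actually read from the environment.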

Step 5. Download GSV images

Before you download GSV images, make sure the number of rows in edges_azi is not too large! Each row corresponds to one GSV image, so if the row count exceeds your API quota, consider selecting different Census Tracts.

You do not want to run the following code chunk more than once, so the code chunk option eval=FALSE is set to prevent the API call from executing again when knitting the script.

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
for (i in seq(1,nrow(edges_azi))){
  getImage(edges_azi[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================

ZIP THE DOWNLOADED IMAGES AND NAME THE FILE ‘gsv_images.zip’ FOR STEP 6.

Step 6. Apply computer vision

Use this Google Colab script to apply the pretrained semantic segmentation model to your GSV images.

Step 7. Merging the processed data back to R

Once all of the images are processed and saved in your Colab session as a CSV file, download the CSV file and merge it back to edges_azi.

# TASK ////////////////////////////////////////////////////////////////////////
# Read the downloaded CSV file containing the semantic segmentation results.
seg_output <- read.csv("C:/Users/jenny/OneDrive - Georgia Institute of Technology/Desktop/CP8883/seg_output.csv")

edges_azi$img_id

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
## [253] 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
## [271] 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288
## [289] 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324
## [325] 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342
## [343] 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
## [361] 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378
## [379] 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396
## [397] 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414
## [415] 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432
## [433] 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450

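# This replaces `img_id` with the row position, so it assumes the CSV rows
# are in the same order as `edges_azi`.
seg_output <- seg_output %>%
  mutate(img_id = row_number())

# Check column names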
names(seg_output)

##  [1] "img_id"        "road"          "sidewalk"      "building"     
##  [5] "wall"          "fence"         "pole"          "traffic.light"
##  [9] "traffic.sign"  "vegetation"    "terrain"       "sky"          
## [13] "person"        "rider"         "car"           "truck"        
## [17] "bus"           "train"         "motorcycle"    "bicycle"

head(edges_azi$img_id)

## [1] 1 2 3 4 5 6

summary(edges_azi$img_id)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0   113.2   225.5   225.5   337.8   450.0

head(seg_output$img_id)

## [1] 1 2 3 4 5 6

# //TASK ////////////////////////////////////////////////////////////////////////
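
Printing the first few ids shows that both columns start at 1, but not that every image has a match. A minimal sketch of a stricter check, using toy tables (`toy_azi` and `toy_seg` are hypothetical stand-ins for `edges_azi` and `seg_output`), confirms that keys are unique and lists images that are missing from the segmentation output:

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for the two tables being joined
toy_azi <- tibble(img_id = 1:6, edge_id = rep(1:3, each = 2))
toy_seg <- tibble(
  img_id = c(1L, 2L, 3L, 5L, 6L),
  road   = c(100, 120, 90, 80, 110)
)

# Keys must be unique on both sides for a one-to-one join
stopifnot(!anyDuplicated(toy_azi$img_id), !anyDuplicated(toy_seg$img_id))

# Images that were requested but have no segmentation result
missing_seg <- anti_join(toy_azi, toy_seg, by = "img_id")
missing_seg$img_id  # 4
```

An empty `missing_seg` would mean the `inner_join()` keeps every image; a non-empty one shows which rows would be silently dropped.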

# TASK ////////////////////////////////////////////////////////////////////////  
# 1. Join the `seg_output` data to `edges_azi`.
# 2. Calculate the proportion of predicted pixels for the following categories: `building`, `sky`, `road`, and `sidewalk`. If there are other categories you are interested in, feel free to include their proportions as well.
# 3. Calculate the proportion of greenness using the `vegetation` and `terrain` categories.
# 4. Calculate the building-to-street ratio. For the street, use `road` and `sidewalk` pixels; including `car` pixels is optional.

edges_seg_output <- edges_azi %>%
  inner_join(seg_output, by = "img_id") %>%
  mutate(
    pct_building   = building   / (768*768),
    pct_sky        = sky        / (768*768),
    pct_road       = road       / (768*768),
    pct_sidewalk   = sidewalk   / (768*768),
    pct_vegetation = vegetation / (768*768),
    pct_terrain    = terrain    / (768*768),
    pct_car        = car        / (768*768),
    pct_greenness  = (vegetation + terrain) / (768*768),
    building_to_street_ratio = building / (road + sidewalk + car)
  )
#check that there are enough entries for both
table(edges_seg_output$is_walkable)

## 
## FALSE  TRUE 
##   310   140
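
The proportion and ratio calculations above can be sketched on toy pixel counts. The values below are hypothetical, and the 768 x 768 image size is taken from the calculation above rather than verified against the segmentation output. The sketch also guards the building-to-street ratio against images with no street pixels, which would otherwise produce `Inf` or `NaN`.

```r
library(dplyr)

# Pixel count per image, assuming 768 x 768 segmentation output as above
n_px <- 768 * 768

# Toy pixel counts for three images (hypothetical values)
toy_seg <- tibble(
  img_id   = 1:3,
  building = c(100000, 0, 300000),
  road     = c(200000, 250000, 0),
  sidewalk = c(30000, 0, 0),
  car      = c(20000, 0, 0)
)

toy_seg <- toy_seg %>%
  mutate(
    pct_building = building / n_px,
    street_px    = road + sidewalk + car,
    # return NA instead of Inf when no street pixels are detected
    building_to_street_ratio = if_else(street_px > 0, building / street_px, NA_real_)
  )

toy_seg$building_to_street_ratio  # 0.4 0.0 NA
```

Storing the pixel count once in `n_px` also avoids repeating `768*768` on every line and makes the assumption about image size easy to change.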

# //TASK ////////////////////////////////////////////////////////////////////////

Section 3. Summarize and analyze the results.

At the beginning of this assignment, you specified walkable and unwalkable Census Tracts. The key focus of this section is the comparison between these two types of tracts.

Analysis 1 - Visualize Spatial Distribution

Create interactive maps showing the proportion of sidewalk, greenness, and the building-to-street ratio for both walkable and unwalkable areas. In total, you will produce 6 maps. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Plot interactive map(s)
# As long as you can deliver the message clearly, you can use any format/package you want.
tmap_mode("view")

## ℹ tmap mode set to "view".

edges_walk_map  <- edges_seg_output %>% filter(is_walkable)
edges_unwalk_map <- edges_seg_output %>% filter(!is_walkable)


#Walkable
tm_basemap("OpenStreetMap") +
  tm_shape(edges_walk_map) +
  tm_dots(
    fill = "pct_sidewalk",
    size = 0.7,
    fill.scale = tm_scale(values = RColorBrewer::brewer.pal(5, "Greens"))
  ) +
  tm_title("Sidewalk Presence: Walkable Area")

# Unwalkable
tm_basemap("OpenStreetMap") +
  tm_shape(edges_unwalk_map) +
  tm_dots(
    fill = "pct_sidewalk",
    size = 0.7,
    fill.scale = tm_scale(values = RColorBrewer::brewer.pal(5, "Reds"))
  ) +
  tm_title("Sidewalk Presence: Unwalkable Area")

# //TASK //////////////////////////////////////////////////////////////////////
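
The task asks for maps of three variables (sidewalk, greenness, and building-to-street ratio) in each area, while the chunk above maps sidewalk only. One possible way to generate the remaining maps is to loop over the variable names with the same tmap calls. This is a sketch on toy points (`toy_pts` and its values are hypothetical stand-ins for `edges_walk_map`), not the maps from this analysis.

```r
library(sf)
library(tmap)

# Toy stand-in for `edges_walk_map` (hypothetical coordinates and values)
toy_df <- data.frame(
  X = c(-84.385, -84.384, -84.383),
  Y = c(33.781, 33.782, 33.783),
  pct_sidewalk = c(0.02, 0.05, 0.08),
  pct_greenness = c(0.30, 0.25, 0.10),
  building_to_street_ratio = c(0.2, 0.4, 0.9)
)
toy_pts <- st_as_sf(toy_df, coords = c("X", "Y"), crs = 4326)

map_vars <- c("pct_sidewalk", "pct_greenness", "building_to_street_ratio")

# Build one map object per variable, reusing the styling from the chunk above
maps <- lapply(map_vars, function(v) {
  tm_basemap("OpenStreetMap") +
    tm_shape(toy_pts) +
    tm_dots(
      fill = v,
      size = 0.7,
      fill.scale = tm_scale(values = RColorBrewer::brewer.pal(5, "Greens"))
    ) +
    tm_title(paste(v, "(toy data)"))
})
names(maps) <- map_vars

# Print one map, e.g. maps$pct_greenness, to display it
```

Running the same loop once on the walkable points and once on the unwalkable points would give the six maps the task describes.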

Findings from Analysis 1

Comparing the sidewalk maps for my walkable and unwalkable tracts, the contrast is striking. Even before running any statistics, the difference is visible on the maps.

In the unwalkable area, most points appear in very light shades of red, with only a few darker spots mixed in. This suggests that sidewalks are either missing or patchy across most of the neighborhood. The street network also looks more fragmented, and many blocks do not appear to have consistent pedestrian infrastructure. This fits the reputation of the Hollowell corridor, which has long stretches of car-oriented design and few pedestrian-friendly amenities, making walking less intuitive and probably less safe.

In contrast, the walkable Midtown tract looks completely different. Sidewalk values there cluster toward the higher end, and the map fills with deeper green tones. What stood out to me is how continuous everything looks: long, uninterrupted stretches of sidewalk, smaller blocks, and plenty of street connections. Midtown also has transit stations, mixed land use, and many destinations packed closely together, so strong sidewalk coverage makes the area comfortable for walking. The green points highlight how well the pedestrian network ties together.

Putting the two side by side reinforces why I chose Midtown as the walkable tract and Hollowell as the unwalkable one. The maps suggest that walkability isn’t just about density or street layout; it’s about how consistently the environment supports someone on foot. Midtown provides that structure and Hollowell doesn’t, which helps explain why the experience of walking would feel so different between the two areas.

To sum up: the walkable tract feels connected and intentional, while the unwalkable tract feels disconnected and uneven, and the sidewalk maps illustrate that contrast.

Analysis 2 - Boxplot

Create boxplots for the proportion of each category (building, sky, road, sidewalk, greenness, and any additional categories of interest) and the building-to-street ratio for walkable and unwalkable tracts. Each plot should compare walkable and unwalkable tracts. In total, you will produce 6 or more boxplots. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Create boxplot(s) using ggplot2 package.

library(ggplot2)

vars_to_plot <- c(
  "pct_building", "pct_sky", "pct_road", "pct_sidewalk",
  "pct_greenness", "building_to_street_ratio"
)

# Loop through each variable and produce a boxplot
for (v in vars_to_plot) {
  print(
    ggplot(edges_seg_output, aes(
      x = factor(is_walkable, labels = c("Unwalkable", "Walkable")),
      y = .data[[v]],
      fill = factor(is_walkable, labels = c("Unwalkable", "Walkable"))
    )) +
      geom_boxplot(alpha = 0.7) +
      scale_fill_manual(values = c("Unwalkable" = "red", "Walkable" = "darkgreen")) +
      labs(
        title = paste("Distribution of", v, "by Walkability"),
        x = "Tract Type",
        y = v
      ) +
      theme_minimal(base_size = 12)
  )
}

# //TASK //////////////////////////////////////////////////////////////////////
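
As an alternative to printing one plot per loop iteration, the same comparison can be drawn as a single faceted figure by reshaping the data to long format. This is a minimal sketch on simulated values (the `toy` table is hypothetical), not the results reported here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)

# Simulated stand-in for `edges_seg_output` (hypothetical values)
toy <- tibble(
  is_walkable   = rep(c(TRUE, FALSE), each = 50),
  pct_sidewalk  = runif(100, 0, 0.1),
  pct_greenness = runif(100, 0, 0.6)
)

# One row per image and category, with a readable tract label
toy_long <- toy %>%
  mutate(tract = if_else(is_walkable, "Walkable", "Unwalkable")) %>%
  pivot_longer(starts_with("pct_"), names_to = "category", values_to = "value")

p <- ggplot(toy_long, aes(x = tract, y = value, fill = tract)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ category, scales = "free_y") +  # one panel per category
  scale_fill_manual(values = c(Unwalkable = "red", Walkable = "darkgreen")) +
  labs(x = "Tract Type", y = "Proportion of pixels") +
  theme_minimal(base_size = 12)

p
```

Free y-axis scales matter here because the categories differ a lot in magnitude; with a shared axis, the sidewalk panel would be compressed by the larger greenness values.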

Findings from Analysis 2 (6 Boxplots)

Looking across all six boxplots, a few patterns really stood out and helped me see how the visual environment differs between my walkable and unwalkable tracts.

Starting with building proportions, the unwalkable tract shows noticeably higher values and a wider spread. That tells me the Hollowell corridor tends to have larger, more continuous building footprints, the kind you’d expect in car-oriented commercial areas with big lots, long facades, or auto-centric uses. The walkable Midtown tract, on the other hand, has lower building coverage and a tighter distribution, which fits its more mixed and fine-grained urban layout.

The sky proportion was interesting because the two tracts look surprisingly similar. This suggests that both areas have roughly comparable building heights or openness when viewed from street level, even though the urban form and walkability differ a lot. It reminded me that sky visibility alone isn’t a strong predictor of walkability.

With road coverage, the walkable area showed slightly higher median values. At first that felt counterintuitive, but when I thought about it, it actually makes sense: Midtown’s street grid is finer, with more intersections and more visible roadway segments in GSV images. The unwalkable tract’s roads seem more spread out, with fewer connecting streets and bigger blocks.

The sidewalk proportion confirmed what I saw in the interactive maps earlier. Walkable areas still show more sidewalk presence overall, even though the spread overlaps the unwalkable area. Midtown’s consistent pedestrian infrastructure really shows through, while Hollowell’s sidewalks appear much more irregular and less prominent in the images.

For greenness, the two tracts again show similar medians, but the walkable tract has a slightly higher upper range. Midtown actually has more planted medians, street trees, and pocket parks than I expected, and the boxplot reflects that. The unwalkable tract shows vegetation too, but it’s less tied to walkable design; sometimes it’s just open lots or leftover green spaces rather than intentional pedestrian environments.

Finally, the building-to-street ratio shows a clear separation: the unwalkable tract has consistently higher values, meaning buildings dominate the visual field more relative to streets and sidewalks. This is a strong indicator of auto-orientation. In contrast, the walkable tract has a much lower ratio with fewer extreme outliers, which fits a denser, more pedestrian-friendly street network.

Overall, the boxplots reinforce what the maps hinted at: even when some categories overlap, the walkable tract consistently shows patterns tied to better pedestrian infrastructure, finer street networks, and more balanced visual environments. The unwalkable tract, meanwhile, leans toward heavier building presence, less consistent sidewalks, and a street layout that doesn’t support walking as naturally.

Analysis 3 - Mean Comparison (t-test)

Perform t-tests on the mean proportion of each category (building, sky, road, sidewalk, greenness, and any additional categories of interest) as well as the building-to-street ratio between street segments in the walkable and unwalkable tracts. This will result in 6 or more t-test results. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Perform t-tests and report both the differences in means and their statistical significance.
# As long as you can deliver the message clearly, you can use any format/package you want.

library(dplyr)

vars_to_test <- c(
  "pct_building", "pct_sky", "pct_road", "pct_sidewalk",
  "pct_greenness", "building_to_street_ratio"
)

for (v in vars_to_test) {
  cat("\n=============================\n")
  cat("T-test for", v, "\n")
  cat("=============================\n")
  
  t_res <- t.test(
    edges_seg_output[[v]] ~ edges_seg_output$is_walkable,
    var.equal = FALSE
  )
  
  print(t_res)
}

## 
## =============================
## T-test for pct_building 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = 1.0061, df = 265.14, p-value = 0.3153
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.01088752  0.03363936
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##          0.09479739          0.08342147 
## 
## 
## =============================
## T-test for pct_sky 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = 0.54679, df = 274.71, p-value = 0.585
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.01595941  0.02823421
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##           0.1863257           0.1801883 
## 
## 
## =============================
## T-test for pct_road 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = -1.6746, df = 210.58, p-value = 0.09549
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.043636727  0.003550503
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##           0.3564018           0.3764449 
## 
## 
## =============================
## T-test for pct_sidewalk 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = 1.7306, df = 282.88, p-value = 0.08461
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.0007184793  0.0111777393
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##          0.04119782          0.03596819 
## 
## 
## =============================
## T-test for pct_greenness 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = -0.44346, df = 251.42, p-value = 0.6578
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.03756843  0.02375923
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##           0.2817366           0.2886412 
## 
## 
## =============================
## T-test for building_to_street_ratio 
## =============================
## 
##  Welch Two Sample t-test
## 
## data:  edges_seg_output[[v]] by edges_seg_output$is_walkable
## t = 0.058091, df = 167.5, p-value = 0.9537
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -0.1182812  0.1254531
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##           0.2500591           0.2464732

# //TASK //////////////////////////////////////////////////////////////////////
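
The printed t-test blocks above are hard to compare at a glance. One option is to collect the results into a single table with `broom::tidy()`. This is a sketch on simulated data (the `toy` table and its values are hypothetical), so the numbers it produces are not the results reported here. For a two-sample test, `tidy()` returns the group means as `estimate1` and `estimate2` and their difference as `estimate`.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(1)

# Simulated stand-in for `edges_seg_output` (hypothetical values)
toy <- tibble(
  is_walkable   = rep(c(FALSE, TRUE), times = c(60, 40)),
  pct_sidewalk  = rnorm(100, mean = 0.04, sd = 0.02),
  pct_greenness = rnorm(100, mean = 0.28, sd = 0.10)
)

vars <- c("pct_sidewalk", "pct_greenness")

# One Welch t-test per variable, stacked into a single data frame
t_table <- map_dfr(vars, function(v) {
  t.test(reformulate("is_walkable", response = v), data = toy) %>%
    tidy() %>%
    transmute(
      variable        = v,
      mean_unwalkable = estimate1,  # group FALSE
      mean_walkable   = estimate2,  # group TRUE
      difference      = estimate,   # FALSE minus TRUE
      t               = statistic,
      df              = parameter,
      p_value         = p.value
    )
})

t_table
```

The resulting data frame could then be passed to `knitr::kable()` to render a compact summary table in the knitted report.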

Findings from Analysis 3 (Mean Comparison t-tests)

Once I looked at the t-test results across all of the visual categories, it became clear that the differences between the walkable and unwalkable tracts were much more subtle than I expected. Even though the maps and boxplots showed visual contrasts, the tests showed that the mean values in the two areas were not significantly different at the 0.05 level for any of the categories. This does not mean the areas are identical. Instead, it suggests that the distinctions between them are more about how the environment is arranged and experienced on the ground than about big shifts in the raw percentages of what appears in each image.

For building proportion, the unwalkable area had a slightly higher mean (0.095 vs. 0.083), but the difference was not statistically significant (p = 0.32). This suggests that while the Hollowell corridor feels heavier and more dominated by large structures, the actual amount of building surface captured in the Street View images was not dramatically different from what I saw in Midtown. The sky category told a similar story. The two areas had almost identical means (0.186 vs. 0.180), and the test confirmed that the difference was not significant (p = 0.59). This fits with the idea that both tracts have comparable building heights and similar levels of open sky when you stand in the street and look upward.

Road coverage showed a small difference, with the walkable area having a slightly higher mean (0.376 vs. 0.356). The p-value (0.095) was close to the 0.05 threshold but did not cross it. This result fits the real-world situation in Midtown, where the grid system and frequent intersections create more visible roadway in each frame. The unwalkable area has wider and more irregular roads, so the amount of roadway captured in individual images varies more. Sidewalk proportion was also similar between the two areas. Contrary to what I expected from the maps, the unwalkable tract had the slightly higher mean (0.041 vs. 0.036), but the difference was not statistically significant (p = 0.085). This connects back to what I saw earlier in the maps. Sidewalks matter a lot for walkability, but their continuity and consistency matter more than their raw percentage of pixels.

The greenness category was also very similar between the two areas (0.282 in the unwalkable tract vs. 0.289 in the walkable tract, p = 0.66). Even though Midtown sometimes feels greener and more shaded, the total vegetation captured in the sampled images did not differ much. This suggests that greenness alone does not separate walkable and unwalkable environments in a strict numerical way. The last category, the building-to-street ratio, also showed no statistically significant difference (0.250 vs. 0.246, p = 0.95). This was interesting because the ratio often reflects how enclosed or open a street feels. The p-value suggests that the visual balance between buildings and the street environment was similar enough across both areas that it did not stand out statistically.

Overall, the t-test results showed that both tracts share some broad visual characteristics, even though one is far more comfortable for pedestrians in practice. The differences that shape walkability are therefore more about the quality, arrangement, and predictability of these features rather than large numerical shifts in average pixel categories. Once I stepped back and thought about this, it made sense. Walkability usually hinges on things like sidewalk continuity, crossing safety, street width, and the rhythm of the built environment. None of those things show up clearly when you only look at category proportions. They show up when you combine the numbers with the on-the-ground patterns that make one place pleasant and another place difficult to move through on foot.
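
The direction of each difference can be read directly from the group means printed in the t-test output above. The sketch below copies those printed means and p-values into a small table (the table and column names are mine, for illustration) and flags which tract has the higher mean and whether the difference is significant at the 0.05 level.

```r
# Group means and p-values copied from the printed t-test output above
res <- data.frame(
  variable = c("pct_building", "pct_sky", "pct_road", "pct_sidewalk",
               "pct_greenness", "building_to_street_ratio"),
  mean_unwalkable = c(0.09479739, 0.1863257, 0.3564018, 0.04119782,
                      0.2817366, 0.2500591),
  mean_walkable   = c(0.08342147, 0.1801883, 0.3764449, 0.03596819,
                      0.2886412, 0.2464732),
  p_value         = c(0.3153, 0.585, 0.09549, 0.08461, 0.6578, 0.9537)
)

# Which tract has the higher mean, and is the difference significant?
res$higher_in <- ifelse(res$mean_walkable > res$mean_unwalkable,
                        "walkable", "unwalkable")
res$significant_05 <- res$p_value < 0.05

res[, c("variable", "higher_in", "significant_05")]
```

In this table, only road and greenness have higher means in the walkable tract, and no category reaches significance at the 0.05 level.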

My take on these results and a previous thesis I have explored (Urban Heat Island Effect):

UHI: There are also interesting connections to urban heat island behavior. Even though the tests did not show large numerical differences, the underlying environmental patterns still matter for heat. In the unwalkable area, the slightly higher building proportion hints at larger building footprints and heavier surface materials that tend to trap heat throughout the day and release it slowly at night. These kinds of surfaces often make unwalkable areas feel warmer and more exposed, especially for people waiting at bus stops or walking along long uninterrupted corridors.

Greenness did not differ significantly between the two tracts, but greenness in a walkable area tends to be arranged in ways that provide more shade and cooling at the pedestrian level. Midtown has street trees and planted medians that break up the exposure to sunlight, even if the amount of vegetation in total is similar to the unwalkable tract. A planted median in front of you cools you far more effectively than a patch of grass set back behind a parking lot.

The slightly higher road presence in the walkable area is interesting from a heat perspective because roads absorb heat very quickly. However, in a walkable grid system, the frequent shading from buildings and trees, combined with better airflow through narrower streets, can help offset the heat that roads collect. In the unwalkable area, the long wide roads and open asphalt surfaces tend to heat up quickly and stay hot for longer periods, especially if there is little shade or vegetation placed directly along the walking path.

Putting all of this together, the t-test findings and the UHI reflections point toward the same idea. Walkability is shaped by the design of the environment rather than large numerical differences in what appears in a Street View frame. The patterns that matter for heat are also patterns that matter for walkability. The walkable tract likely feels cooler and more comfortable at certain times of day even though the raw proportions of vegetation, buildings, or pavement are not dramatically different. It is the way these features are arranged that shapes both thermal comfort and the overall walking experience.