Task Description

Exploring Walkability Through Street View and Computer Vision

This assignment is divided into three main sections.

In the first section, you will select two Census Tracts within Fulton and DeKalb Counties, GA — one that you believe is the most walkable and another that is the least walkable. You may choose any tracts within these two counties. If the area you want to analyze is not well represented by a single tract, you may select multiple adjacent tracts (e.g., two contiguous tracts as one “walkable area”). The definition of walkable is up to you — it can be based on your personal experience (e.g., places where you’ve had particularly good or bad walking experiences), Walk Score data, or any combination of criteria. After making your selections, provide a brief explanation of why you chose those tracts.

The second section is the core of this assignment. You will prepare OpenStreetMap (OSM) data, download Google Street View (GSV) images, and apply the computer vision technique covered in class — semantic segmentation.

In the third section, you will summarize and analyze the results. After applying computer vision to the images, you will obtain pixel counts for 19 different object categories. Using the data, you will:

Create maps to visualize the spatial distribution of these objects,
Draw boxplots to compare their distributions between the walkable and unwalkable tracts, and
Perform t-tests to examine the differences in mean values and their statistical significance.

Section 0. Packages

Importing the necessary packages is part of this assignment. Add any required packages to the code chunk below as you progress through the tasks.

library(magrittr)
library(osmdata)

## Warning: package 'osmdata' was built under R version 4.5.2

## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright

library(sfnetworks)

## Warning: package 'sfnetworks' was built under R version 4.5.2

library(units)

## udunits database from C:/Users/xavier/AppData/Local/Programs/R/R-4.5.0/library/units/share/udunits/udunits2.xml

library(sf)

## Warning: package 'sf' was built under R version 4.5.2

## Linking to GEOS 3.13.1, GDAL 3.11.4, PROJ 9.7.0; sf_use_s2() is TRUE

library(tidygraph)

## Warning: package 'tidygraph' was built under R version 4.5.2

## 
## Attaching package: 'tidygraph'

## The following object is masked from 'package:stats':
## 
##     filter

library(tmap)

## Warning: package 'tmap' was built under R version 4.5.2

library(here)

## Warning: package 'here' was built under R version 4.5.2

## here() starts at C:/Users/xavier/OneDrive - Atlanta Regional Commission/Desktop/Personal/Urban Analytics

library(progress)
library(nominatimlite)

## Warning: package 'nominatimlite' was built under R version 4.5.2

library(tidycensus)

## Warning: package 'tidycensus' was built under R version 4.5.1

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.5.1

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(purrr)

## 
## Attaching package: 'purrr'

## The following object is masked from 'package:magrittr':
## 
##     set_names

library(ggplot2)
library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:magrittr':
## 
##     extract

Section 1. Choose your Census Tracts.

Use the Census Tract map in the following code chunk to identify the GEOIDs of the tracts you consider walkable and unwalkable.

key_path <- "C:/Users/xavier/OneDrive - Atlanta Regional Commission/Desktop/Personal/Urban Analytics/senses.txt"
api_key <- readLines(key_path, warn = FALSE)
census_api_key(api_key, install = TRUE, overwrite = TRUE)

## Your original .Renviron will be backed up and stored in your R HOME directory if needed.

## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`

# TASK ////////////////////////////////////////////////////////////////////////
# Set up your api key here

  # **YOUR CODE HERE..*
# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Download Census Tract polygon for Fulton and DeKalb
tract <- get_acs("tract", 
                 variables = c('pop' = 'B01001_001'),
                 year = 2023,
                 state = "GA", 
                 county = c("Fulton", "DeKalb"), 
                 geometry = TRUE)

## Getting data from the 2019-2023 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

tmap_mode("view")

## ℹ tmap modes "plot" - "view"

## ℹ toggle with `tmap::ttm()`
## This message is displayed once per session.

tm_basemap("OpenStreetMap") +
  tm_shape(tract) + 
  tm_polygons(fill_alpha = 0.2)
# =========== NO MODIFY ZONE ENDS HERE ========================================

Once you have the GEOIDs, create two Census Tract objects – one representing your most walkable area and the other your least walkable area.

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Specify the GEOIDs of your walkable and unwalkable Census Tracts.

# Walkable GEOID: 13121001902. Near the ARC and the convention centers, the area is very walkable and well suited to pedestrian activity.

# Unwalkable GEOID: 13089023802. South on Moreland and near Constitution, the area isn't walkable: traffic speeds are high all around and sidewalks are limited.

#    e.g., tr_id_walkable <- c("13121001205", "13121001206")
# 2. Extract the selected Census Tracts using `tr_id_walkable` and `tr_id_unwalkable`

# For the walkable Census Tract(s)
tr_id_walkable <- tract %>% 
  filter(GEOID %in% '13121001902')
  # **YOUR CODE HERE..**

# For the unwalkable Census Tract(s)
tr_id_unwalkable <- tract %>%
  filter(GEOID %in% '13089023802')

           # **YOUR CODE HERE..**

  # **YOUR CODE HERE..**

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Create an interactive map showing `tract_walkable` and `tract_unwalkable`
# You'll have to zoom out, since one tract is downtown and the other is in south Atlanta.
tmap_mode("view")

tm_shape(tr_id_unwalkable) +
  tm_borders(col = "red") +
  tm_shape(tr_id_walkable) +
  tm_borders(col = "green")

## ℹ tmap modes "plot" - "view"

# //TASK //////////////////////////////////////////////////////////////////////

Provide a brief description of your selected Census Tracts. Why do you consider these tracts walkable or unwalkable? What factors do you think contribute to their walkability?

For the walkable area, I chose a downtown tract that includes the Civic Center, Georgia State, and Peachtree Center. From experience, these places are quite walkable, with wide sidewalks and ample crossing points. The many hotels, convention centers, college students, and bars/clubs lend themselves to a more walkable environment. For the unwalkable area, I chose a tract near my home. From experience, it has large roadways with minimal sidewalks running north/south and east/west. It is very industrial and car-oriented, with roads of 4+ lanes that contribute to it being unwalkable.

Section 2. OSM, GSV, and Computer Vision.

Step 1. Get and clean OSM data.

To obtain the OSM network for your selected Census Tracts: (1) Create bounding boxes. (2) Use the bounding boxes to download OSM data. (3) Convert the data into an sfnetwork object and clean it.

# TASK ////////////////////////////////////////////////////////////////////////
# Create one bounding box (`tract_walkable_bb`) for your walkable Census Tract(s) and another (`tract_unwalkable_bb`) for your unwalkable Census Tract(s).

# For the walkable Census Tract(s)
tract_walkable_bb <- st_bbox(tr_id_walkable)


# **YOUR CODE HERE..**

# For the unwalkable Census Tract(s)  
tract_unwalkable_bb <- st_bbox(tr_id_unwalkable)  # **YOUR CODE HERE..**

tm_shape(st_as_sfc(tract_walkable_bb)) + 
  tm_borders(col = "green") +
tm_shape(st_as_sfc(tract_unwalkable_bb)) +
  tm_borders(col = "red")

# //TASK //////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get OSM data for the two bounding boxes
osm_walkable <- opq(bbox = tract_walkable_bb) %>%
  add_osm_feature(key = 'highway', 
                  value = c("primary", "secondary", "tertiary", "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()


set_overpass_url("https://overpass.kumi.systems/api/interpreter") # API Was timing out

osm_unwalkable <- opq(bbox = st_bbox(tr_id_unwalkable)) %>%
  add_osm_feature(key = 'highway', 
                  value = c("primary", "secondary", "tertiary", "residential")) %>%
  osmdata_sf() %>%
  osm_poly2line()
# =========== NO MODIFY ZONE ENDS HERE ========================================

#### PROBLEM IS HERE ####


# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert `osm_walkable` and `osm_unwalkable` into sfnetwork objects (as undirected networks),
# 2. Clean the network by (1) deleting parallel lines and loops, (2) creating missing nodes, and (3) removing pseudo nodes (make sure the `summarise_attributes` argument is set to 'first' when doing so).

# --- Build sfnetwork for walkable tracts ---
net_walkable <- osm_walkable$osm_lines %>%
  select(osm_id, highway) %>%
  as_sfnetwork(directed = FALSE) %>%                      # create undirected network
  activate("edges") %>%
  filter(!edge_is_multiple(), !edge_is_loop()) %>%        # remove parallel edges and loops
  convert(to_spatial_subdivision) %>%                     # create missing intersection nodes
  convert(to_spatial_smooth, summarise_attributes = "first")  # remove pseudo nodes

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

# --- Build sfnetwork for unwalkable tracts ---
net_unwalkable <- osm_unwalkable$osm_lines %>%
  select(osm_id, highway) %>%
  as_sfnetwork(directed = FALSE) %>%                      # create undirected network
  activate("edges") %>%
  filter(!edge_is_multiple(), !edge_is_loop()) %>%        # remove parallel edges and loops
  convert(to_spatial_subdivision) %>%                     # create missing intersection nodes
  convert(to_spatial_smooth, summarise_attributes = "first")  # remove pseudo nodes

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

# //TASK //////////////////////////////////////////////////////////////////////
  
  
# TASK activate //////////////////////////////////////////////////////////////////////
# Using `net_walkable` and`net_unwalkable`,
# 1. Activate the edge component of each network.
# 2. Create a `length` column.
# 3. Filter out short (<300 feet) segments.
# 4. Randomly Sample 100 rows per road type.
# 5. Assign the results to `edges_walkable` and `edges_unwalkable`, respectively.

# OSM for the walkable part
edges_walkable <- net_walkable %>%
  activate("edges") %>%
  mutate(length = as.numeric(st_length(geometry)) * 3.28084) %>%  # meters → feet
  filter(length >= 300) %>%
  group_by(highway) %>%
  slice_sample(n = 100) %>%
  ungroup() %>%
  st_as_sf() %>%
  select(-.tidygraph_edge_index)
  # **YOUR CODE HERE..**

# OSM for the unwalkable part
edges_unwalkable <- net_unwalkable %>%
  activate("edges") %>%
  mutate(length = as.numeric(st_length(geometry)) * 3.28084) %>%  # meters → feet
  filter(length >= 300) %>%
  group_by(highway) %>%
  slice_sample(n = 100) %>%
  ungroup() %>%
  st_as_sf() %>%
  select(-.tidygraph_edge_index)
  # **YOUR CODE HERE..**

# //TASK //////////////////////////////////////////////////////////////////////
  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Merge the two
edges <- bind_rows(edges_walkable %>% mutate(is_walkable = TRUE), 
                   edges_unwalkable %>% mutate(is_walkable = FALSE)) %>% 
  mutate(edge_id = seq(1,nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
tmap_mode("view")

## ℹ tmap modes "plot" - "view"

# Map edges, coloring by walkable/unwalkable
tm_shape(edges) +
  tm_lines(col = "is_walkable", 
           palette = c("red", "green"), 
           lwd = 2,
           title = "Walkable")

## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_tm_lines()`: migrate the argument(s) related to the scale of the
## visual variable `col` namely 'palette' (rename to 'values') to col.scale =
## tm_scale(<HERE>).[tm_lines()] Argument `title` unknown.

Step 2. Define `getAzimuth()` function.

In this assignment, you will collect two GSV images per road segment, as illustrated in the figure below. To do this, you will define a function that extracts the coordinates of the midpoint and the azimuths in both directions.

If you can’t see this image, try changing the markdown editing mode from ‘Source’ to ‘Visual’ (you can find the buttons in the top-left corner of this source pane).
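
To make the azimuth step concrete, here is a minimal base-R sketch. It assumes the sampled points are close enough together that longitude/latitude differences can be treated as planar offsets. The helper name `bearing_deg()` and the coordinates are hypothetical and not part of the assignment template. The angle is measured clockwise from north, which is the convention the Street View `heading` parameter uses.

```r
# Hypothetical helper: compass bearing (degrees clockwise from north)
# from point `from` to point `to`, each given as c(X = lon, Y = lat).
# Assumes the points are close enough that planar geometry is acceptable.
bearing_deg <- function(from, to) {
  dx <- to[["X"]] - from[["X"]]   # east-west offset
  dy <- to[["Y"]] - from[["Y"]]   # north-south offset
  # atan2(dx, dy) measures from the +Y axis (north), turning clockwise;
  # the modulo maps negative angles into the 0-360 range
  (atan2(dx, dy) * 180 / pi + 360) %% 360
}

# Made-up coordinates around a midpoint
mid   <- c(X = -84.39, Y = 33.75)
north <- c(X = -84.39, Y = 33.76)
east  <- c(X = -84.38, Y = 33.75)
south <- c(X = -84.39, Y = 33.74)

bearing_deg(mid, north)  # due north
bearing_deg(mid, east)   # due east
bearing_deg(mid, south)  # due south
```

Note the argument order: passing the X difference first and the Y difference second is what turns the usual mathematical angle (counterclockwise from east) into a compass bearing.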

getAzimuth <- function(line){

  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. Use the `st_line_sample()` function to sample three points at locations 0.48, 0.5, and 0.52 along the line. These points will be used to calculate the azimuth.
  # 2. Use `st_cast()` function to convert the 'MULTIPOINT' object into a 'POINT' object.
  # 3. Extract coordinates using `st_coordinates()`.
  # 4. Assign the coordinates of the midpoint to `mid_p`.
  # 5. Calculate the azimuths from the midpoint in both directions and save them as `mid_azi_1` and `mid_azi_2`, respectively.
  
  # Sample three points along the line
  mid_p3 <- line %>% 
    st_line_sample(sample = c(0.48, 0.5, 0.52)) %>% 
    st_cast("POINT") %>% 
    st_coordinates()

  # Assign midpoint
  mid_p <- mid_p3[2, ]

  # Calculate azimuths as compass bearings (degrees clockwise from north),
  # the convention of the GSV `heading` parameter: atan2(x2 - x1, y2 - y1)
  mid_azi_1 <- atan2(mid_p3[1,"X"] - mid_p3[2,"X"], 
                     mid_p3[1,"Y"] - mid_p3[2,"Y"]) * 180/pi
  mid_azi_2 <- atan2(mid_p3[3,"X"] - mid_p3[2,"X"], 
                     mid_p3[3,"Y"] - mid_p3[2,"Y"]) * 180/pi

  # Return as tribble
  return(tribble(
    ~type,    ~X,            ~Y,             ~azi,
    "mid1",    mid_p["X"],   mid_p["Y"],      mid_azi_1,
    "mid2",    mid_p["X"],   mid_p["Y"],      mid_azi_2
  ))
}
# **YOUR CODE HERE..**
  
  # //TASK //////////////////////////////////////////////////////////////////////
 
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================

  # =========== NO MODIFY ZONE ENDS HERE ========================================

Step 3. Apply the function to all street segments

Apply the getAzimuth() function to the edges object. Once this step is complete, your data will be ready for downloading GSV images.

# TASK ////////////////////////////////////////////////////////////////////////
# Apply getAzimuth() function to all edges.
# Remember that you need to pass edges object to st_geometry() before you apply getAzimuth()
edges_azi <- edges %>%
  st_geometry() %>% 
  map_df(getAzimuth, .progress = T)

  # **YOUR CODE HERE..**

# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
edges_azi <- edges_azi %>% 
  bind_cols(edges %>% 
              st_drop_geometry() %>% 
              slice(rep(1:nrow(edges),each=2))) %>% 
  st_as_sf(coords = c("X", "Y"), crs = 4326, remove=FALSE) %>% 
  mutate(img_id = seq(1, nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================
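
The `slice(rep(1:nrow(edges), each = 2))` step above works because `getAzimuth()` returns two rows per segment, in segment order. A small base-R sketch with made-up data (the names `edges_toy`, `azi_toy`, and `aligned` are hypothetical) shows how repeating each segment row twice keeps the attributes lined up with the two azimuth rows:

```r
# Toy stand-ins for `edges` (one row per segment) and the getAzimuth() output
# (two rows per segment). The values are made up for illustration.
edges_toy <- data.frame(edge_id = 1:3,
                        highway = c("primary", "residential", "tertiary"))
azi_toy   <- data.frame(type = rep(c("mid1", "mid2"), times = 3),
                        azi  = c(10, 190, 85, 265, 140, 320))

# Repeat every segment row twice, in order, so row i of `azi_toy`
# lines up with the segment it was computed from.
edges_rep <- edges_toy[rep(seq_len(nrow(edges_toy)), each = 2), ]
aligned   <- cbind(azi_toy, edges_rep)
aligned$img_id <- seq_len(nrow(aligned))   # one id per image

aligned[, c("img_id", "edge_id", "type", "azi")]
```

Each `edge_id` appears twice while `img_id` is unique, which is why `img_id`, not `edge_id`, identifies a single image later on.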

Step 4. Define a function that formats request URL and download images.

key <- readLines('Goog.txt')

## Warning in readLines("Goog.txt"): incomplete final line found on 'Goog.txt'

#### problem HERE UNIQUE IMAGE ID FOR EVERYONE
getImage <- function(iterrow){
  # This function takes one row of `edges_azi` and downloads GSV image using the information from the row.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. Extract required information from the row of `edges_azi`
  # 2. Format the full URL and store it in `request`. Refer to this page: https://developers.google.com/maps/documentation/streetview/request-streetview
  # 3. Format the full path (including the file name) of the image being downloaded and store it in `fpath`
   type <- iterrow$type
  location <- paste0(round(iterrow$Y, 5), ",", round(iterrow$X, 5))
  heading <- round(iterrow$azi, 1)
  edge_id <- iterrow$edge_id
  img_id <- iterrow$img_id          # unique image identifier

  # Google Street View API endpoint
  endpoint <- "https://maps.googleapis.com/maps/api/streetview"

  # Format the full request URL
  request <- paste0(endpoint,
                    "?size=640x640",
                    "&location=", location,
                    "&heading=", heading,
                    "&fov=90",
                    "&pitch=0",
                    "&key=", key)

  # Assign URL to furl
  furl <- request

  # Format file name and path
  fname <- paste0("GSV-nid_", img_id,
                  "-eid_", edge_id,
                  "-type_", type,
                  "-Location_", location,
                  "-heading_", heading, ".jpg")
  fpath <- file.path("C:\\Users\\xavier\\OneDrive - Atlanta Regional Commission\\Desktop\\Personal\\Urban Analytics\\gsv__", fname)

  Sys.sleep(1)
  # //TASK //////////////////////////////////////////////////////////////////////

  
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  # Download images
  if (!file.exists(fpath)){
    download.file(furl, fpath, mode = 'wb') 
  }
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}
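
Before spending quota on image requests, it may help to check whether imagery exists at a location. The sketch below only builds a request URL for the Street View metadata endpoint; the helper name `build_metadata_url()` and the coordinates are hypothetical, and the commented-out lines assume the `jsonlite` package and a valid key. Check the current Google documentation for billing and response details before relying on it.

```r
# Hypothetical helper (not part of the template): build a Street View
# metadata request URL for a latitude/longitude pair.
build_metadata_url <- function(lat, lon, api_key) {
  paste0("https://maps.googleapis.com/maps/api/streetview/metadata",
         "?location=", round(lat, 5), ",", round(lon, 5),
         "&key=", api_key)
}

url <- build_metadata_url(33.7592, -84.3880, api_key = "YOUR_API_KEY")
url

# To inspect the response (requires the jsonlite package and network access):
# meta <- jsonlite::fromJSON(url)
# meta$status   # "OK" when imagery is available near the location
```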

Step 5. Download GSV images

Before you download GSV images, make sure the row number in edges_azi is not too large! Each row corresponds to one GSV image, so if the row count exceeds your API quota, consider selecting different Census Tracts.

You do not want to run the following code chunk more than once, so the code chunk option eval=FALSE is set to prevent the API call from executing again when knitting the script.

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
for (i in seq(1,nrow(edges_azi))){
  getImage(edges_azi[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================

ZIP THE DOWNLOADED IMAGES AND NAME IT ‘gsv_images.zip’ FOR STEP 6.

Step 6. Apply computer vision

Use this Google Colab script to apply the pretrained semantic segmentation model to your GSV images.

Note: use batch inference for this step.

Step 7. Merging the processed data back to R

Once all of the images are processed and saved in your Colab session as a CSV file, download the CSV file and merge it back to edges_azi.
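
Because the merge depends on matching each CSV row to the right image, it can be useful to recover the ids from the image file names. This is a base-R sketch using made-up file names that follow the pattern built in `getImage()`; whether the Colab script derives its `img_id` column from the `nid_` part is an assumption to verify against your own CSV.

```r
# Example file names following the pattern used in getImage() (values are made up)
fnames <- c(
  "GSV-nid_1-eid_1-type_mid1-Location_33.75921,-84.38801-heading_12.3.jpg",
  "GSV-nid_2-eid_1-type_mid2-Location_33.75921,-84.38801-heading_192.3.jpg",
  "GSV-nid_3-eid_2-type_mid1-Location_33.76011,-84.39102-heading_88.jpg"
)

# Pull the integer that follows "nid_" (image id) and "eid_" (edge id)
img_id  <- as.integer(sub(".*nid_([0-9]+)-.*", "\\1", fnames))
edge_id <- as.integer(sub(".*eid_([0-9]+)-.*", "\\1", fnames))

data.frame(img_id, edge_id)
```

Comparing these ids with the `img_id` column in the CSV is a quick way to confirm which key the join should use.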

# TASK ////////////////////////////////////////////////////////////////////////
# Read the downloaded CSV file containing the semantic segmentation results.
seg_output <- read.csv('seg_output.csv')# **YOUR CODE HERE..**
# //TASK ////////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////  
# 1. Join the `seg_output` data to `edges_azi`.
# 2. Calculate the proportion of predicted pixels for the following categories: `building`, `sky`, `road`, and `sidewalk`. If there are other categories you are interested in, feel free to include their proportions as well.
# 3. Calculate the proportion of greenness using the `vegetation` and `terrain` categories.
# 4. Calculate the building-to-street ratio. For the street, use `road` and `sidewalk` pixels; including `car` pixels is optional.
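# Caution: `building:terrain` sums only the columns stored from `building` through `terrain`.
# If classes such as `road`, `sidewalk`, or `sky` sit outside that range in the CSV, they are
# missing from `total_pixels`, and the proportions below can exceed 100.
seg_output <- seg_output %>%
  rename(edge_id = img_id) %>%
  mutate(total_pixels = rowSums(across(building:terrain, ~ .x), na.rm = TRUE))

# Caution: assuming `img_id` in the CSV numbers the images, the rename above makes the join
# match image numbers against segment numbers in `edges_azi`, so rows are not paired image to image.
edges_seg_output <- edges_azi %>%
  left_join(seg_output, by = "edge_id", relationship = "many-to-many") %>%
  rowwise() %>%
  mutate(
    prop_building      = (building / total_pixels) * 100,
    prop_sky           = (sky / total_pixels) * 100,
    prop_road          = (road / total_pixels) * 100,
    prop_sidewalk      = (sidewalk / total_pixels) * 100,
    prop_car           = ifelse("car" %in% names(.), car / total_pixels * 100, 0),
    prop_green         = ((vegetation + terrain) / total_pixels) * 100,
    prop_building_to_street = (prop_building / (prop_road + prop_sidewalk + prop_car)) * 100  # ratio scaled by 100
  ) %>%
  ungroup()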
  # **YOUR CODE HERE..**
  
# //TASK ////////////////////////////////////////////////////////////////////////
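
As a cross-check on the calculations in this step, here is a minimal base-R sketch on made-up data. It assumes the CSV has one row per image, an `img_id` column that identifies the image, and one pixel-count column per class. Only five classes are shown, and the object names (`seg_toy`, `azi_toy`, `joined`) are hypothetical. It joins on `img_id`, sums every class column for the denominator, and returns `NA` rather than `Inf` when an image has no street pixels.

```r
# Toy segmentation output: one row per image, pixel counts per class.
# Only five classes are shown; the real output has 19. Values are made up.
seg_toy <- data.frame(
  img_id     = 1:3,
  road       = c(150000, 120000, 0),
  sidewalk   = c(20000, 0, 0),
  building   = c(100000, 10000, 5000),
  vegetation = c(39600, 179600, 300000),
  sky        = c(100000, 100000, 104600)
)

# Toy image table: two images for segment 1, one for segment 2
azi_toy <- data.frame(img_id = 1:3, edge_id = c(1, 1, 2))

# Join on the image id so each image gets its own segmentation row
joined <- merge(azi_toy, seg_toy, by = "img_id")

# Denominator: sum over every class column, not a positional range
class_cols <- setdiff(names(seg_toy), "img_id")
total <- rowSums(joined[, class_cols])

prop_road     <- joined$road / total * 100
prop_sidewalk <- joined$sidewalk / total * 100

# Building-to-street ratio; images with no street pixels get NA instead of Inf
street <- joined$road + joined$sidewalk
b2s <- ifelse(street > 0, joined$building / street, NA_real_)

round(data.frame(img_id = joined$img_id, prop_road, prop_sidewalk, b2s), 2)
```

With a denominator that covers all classes, no single proportion can exceed 100, which gives a simple sanity check for the real data.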

Section 3. Summarize and analyze the results.

At the beginning of this assignment, you specified walkable and unwalkable Census Tracts. The key focus of this section is the comparison between these two types of tracts.

Analysis 1 - Visualize Spatial Distribution

Create interactive maps showing the proportion of sidewalk, greenness, and the building-to-street ratio for both walkable and unwalkable areas. In total, you will produce 6 maps. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Plot interactive map(s)
edges_seg_output <- st_transform(edges_seg_output, st_crs(tr_id_walkable))

tmap_mode("view")

## ℹ tmap modes "plot" - "view"

sidewalkmap <- tm_shape(tr_id_walkable) +
  tm_borders(col = "green") +
  tm_shape(edges_seg_output %>% filter(is_walkable == TRUE)) +
  tm_dots(col = "prop_sidewalk", palette = "Blues", size = 0.5, style = "jenks") +
  tm_shape(tr_id_unwalkable) +
  tm_borders(col = "red") +
  tm_shape(edges_seg_output %>% filter(is_walkable == FALSE)) +
  tm_dots(col = "prop_sidewalk", palette = "Blues", size = 0.5, style = "jenks")

## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
##   'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_dots()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').

print(sidewalkmap)

## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
## 
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.

greenmap <- tm_shape(tr_id_walkable) +
  tm_borders(col = "green") +
tm_shape(edges_seg_output %>% filter(is_walkable == TRUE)) +
  tm_dots(col = "prop_green", palette = "Greens", size = 0.5, style = 'jenks') +
  tm_shape(tr_id_unwalkable) +
  tm_borders(col = 'red') +
  tm_shape(edges_seg_output %>% filter(is_walkable == FALSE)) +
  tm_dots(col = 'prop_green', palette = 'Greens', size = .5, style = 'jenks')

## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
##   'tm_scale_intervals(<HERE>)'

print(greenmap)

## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Greens" is named
## "brewer.greens"
## Multiple palettes called "greens" found: "brewer.greens", "matplotlib.greens". The first one, "brewer.greens", is returned.
## 
## Multiple palettes called "greens" found: "brewer.greens", "matplotlib.greens". The first one, "brewer.greens", is returned.

bldgmap <- tm_shape(tr_id_unwalkable) +
  tm_borders(col = "red") +
tm_shape(edges_seg_output %>% filter(is_walkable == FALSE)) +
  tm_dots(col = "prop_building_to_street", palette = "Oranges", size = 0.5, style = 'jenks') +
  tm_shape(tr_id_walkable) +
  tm_borders(col = "green") +
tm_shape(edges_seg_output %>% filter(is_walkable == TRUE)) +
  tm_dots(col = "prop_building_to_street", palette = "Oranges", size = 0.5, style = 'jenks')

## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
##   'tm_scale_intervals(<HERE>)'

print(bldgmap)

## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Oranges" is named
## "brewer.oranges"
## Multiple palettes called "oranges" found: "brewer.oranges", "matplotlib.oranges". The first one, "brewer.oranges", is returned.
## 
## Multiple palettes called "oranges" found: "brewer.oranges", "matplotlib.oranges". The first one, "brewer.oranges", is returned.

# For the building-to-street ratio, I was worried that nothing appeared for my unwalkable tract, but when I multiplied the values by 100 to read them as percentages, they never really went above about 2%.

# As long as you can deliver the message clearly, you can use any format/package you want.


# //TASK //////////////////////////////////////////////////////////////////////
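
The tmap messages above name the v4 replacements for the v3 arguments used in these maps: `fill` instead of `col` for the data colour, and `style`/`palette` moved into `tm_scale_intervals()` with `palette` renamed to `values`. The sketch below applies that to made-up points; the object names `pts` and `sidewalk_map_v4` are hypothetical, and it assumes tmap v4 with the `sf` package available.

```r
library(sf)
library(tmap)

# Toy points with a made-up sidewalk share; stands in for `edges_seg_output`
pts <- st_as_sf(
  data.frame(
    X = c(-84.390, -84.388, -84.386, -84.384, -84.382, -84.380),
    Y = c(33.750, 33.752, 33.754, 33.756, 33.758, 33.760),
    prop_sidewalk = c(2, 5, 8, 12, 15, 20)
  ),
  coords = c("X", "Y"), crs = 4326
)

# v4 style: `fill` carries the data colour; the scale takes `style` and `values`
sidewalk_map_v4 <- tm_shape(pts) +
  tm_dots(
    fill = "prop_sidewalk",
    fill.scale = tm_scale_intervals(style = "jenks", values = "brewer.blues"),
    size = 0.5
  )
```

Printing `sidewalk_map_v4` in view mode should draw the map without the migration messages.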

Analysis 2 - Boxplot

Create boxplots for the proportion of each category (building, sky, road, sidewalk, greenness, and any additional categories of interest) and the building-to-street ratio for walkable and unwalkable tracts. Each plot should compare walkable and unwalkable tracts. In total, you will produce 6 or more boxplots. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Create boxplot(s) using ggplot2 package.
edges_long <- edges_seg_output %>%
  st_drop_geometry() %>%   # Remove geometry column
  select(is_walkable, prop_building, prop_sky, prop_road, prop_sidewalk, prop_green, prop_building_to_street) %>%
  pivot_longer(
    cols = -is_walkable,        
    names_to = "category",      
    values_to = "proportion"    
  )

plot <- ggplot(edges_long, aes(x = as.factor(is_walkable), y = proportion, fill = as.factor(is_walkable))) +
  geom_boxplot() +
  facet_wrap(~category, scales = "free_y") +
  scale_fill_manual(values = c("TRUE" = "forestgreen", "FALSE" = "red")) +
  labs(
    x = "Walkable Tract",
    y = "Proportion / Ratio",
    fill = "Walkable",
    title = "Comparison of Streetscape Features by Walkable vs Unwalkable Tracts"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.5))

print(plot)

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

# //TASK //////////////////////////////////////////////////////////////////////

Based on these boxplots, there is likely an error in my calculation of prop_road and prop_building_to_street, since those values don't make sense (for example, prop_road values above 100, which a percentage of pixels cannot exceed). Beyond that, the plots suggest why the unwalkable tract would be considered unwalkable. It has a much lower share of buildings than the walkable tract, a high share of greenness, and a very low share of sidewalk. The walkable tract has more buildings and sidewalks, but less green. What is interesting to me is that the proportion of sky is almost equal in the two tracts, with somewhat similar outliers too. I wonder why that could be.

Analysis 3 - Mean Comparison (t-test)

Perform t-tests on the mean proportion of each category (building, sky, road, sidewalk, greenness, and any additional categories of interest) as well as the building-to-street ratio between street segments in the walkable and unwalkable tracts. This will result in 6 or more t-test results. Provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Perform t-tests and report both the differences in means and their statistical significance.
# As long as you can deliver the message clearly, you can use any format/package you want.
edges_numeric <- edges_seg_output %>%
  st_drop_geometry() %>%
  select(is_walkable, prop_building, prop_sky, prop_road, prop_sidewalk, prop_green, prop_building_to_street)

# Building proportion
t_building <- t.test(prop_building ~ is_walkable, data = edges_numeric)
print(t_building)

## 
##  Welch Two Sample t-test
## 
## data:  prop_building by is_walkable
## t = -12.976, df = 715.78, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -32.46677 -23.93309
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            24.01904            52.21897

# Sky proportion (not significant)
t_sky <- t.test(prop_sky ~ is_walkable, data = edges_numeric)
print(t_sky)

## 
##  Welch Two Sample t-test
## 
## data:  prop_sky by is_walkable
## t = -0.63017, df = 692.68, p-value = 0.5288
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -11.705708   6.017328
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            42.80132            45.64551

# Road proportion (not significant)
t_road <- t.test(prop_road ~ is_walkable, data = edges_numeric)
print(t_road)

## 
##  Welch Two Sample t-test
## 
## data:  prop_road by is_walkable
## t = -1.5083, df = 709.08, p-value = 0.1319
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -54.493732   7.142384
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            84.43138           108.10705

# Sidewalk proportion
t_sidewalk <- t.test(prop_sidewalk ~ is_walkable, data = edges_numeric)
print(t_sidewalk)

## 
##  Welch Two Sample t-test
## 
## data:  prop_sidewalk by is_walkable
## t = -6.2036, df = 712.18, p-value = 9.358e-10
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -5.883277 -3.054618
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            7.901111           12.370059

# Greenness
t_green <- t.test(prop_green ~ is_walkable, data = edges_numeric)
print(t_green)

## 
##  Welch Two Sample t-test
## 
## data:  prop_green by is_walkable
## t = 12.414, df = 689.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  24.30732 33.44100
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            68.61533            39.74117

# Building-to-street ratio (not significant)
t_building_to_street <- t.test(prop_building_to_street ~ is_walkable, data = edges_numeric)
print(t_building_to_street)

## 
##  Welch Two Sample t-test
## 
## data:  prop_building_to_street by is_walkable
## t = 1.4156, df = 345, p-value = 0.1578
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -8137.427 49934.661
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##         20957.61309            58.99608

# To visualize the comparisons together, I read about bar charts that show the mean and standard error. They are similar to a boxplot, but I thought this worked better than only printing the t-test results.

edges_numeric <- edges_seg_output %>%
  st_drop_geometry() %>%
  select(is_walkable, prop_building, prop_sky, prop_road, prop_sidewalk, prop_green, prop_building_to_street)

# Pivot longer for faceted plotting
edges_long <- edges_numeric %>%
  pivot_longer(
    cols = -is_walkable,
    names_to = "category",
    values_to = "value"
  )

# Compute mean and standard error per group
edges_summary <- edges_long %>%
  group_by(category, is_walkable) %>%
  summarise(
    mean_val = mean(value, na.rm = TRUE),
    se = sd(value, na.rm = TRUE) / sqrt(sum(!is.na(value))),  # count only non-missing values
    .groups = "drop"
  )

# Plot bar charts
ggplot(edges_summary, aes(x = as.factor(is_walkable), y = mean_val, fill = as.factor(is_walkable))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.7)) +
  geom_errorbar(aes(ymin = mean_val - se, ymax = mean_val + se),
                width = 0.2, position = position_dodge(width = 0.7)) +
  facet_wrap(~category, scales = "free_y") +
  scale_fill_manual(values = c("TRUE" = "forestgreen", "FALSE" = "red"),
                    labels = c("TRUE" = "Walkable", "FALSE" = "Unwalkable")) +
  labs(
    x = "Tract Type",
    y = "Mean Proportion / Ratio",
    fill = "Walkable?",
    title = "Comparison of Walkable vs Unwalkable Tracts"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, hjust = 0.5))

# //TASK //////////////////////////////////////////////////////////////////////

The results from the t-tests and the bar chart are consistent with the boxplots. The building-to-street ratio, sky proportion, and road proportion were not significant (p-values > 0.05), meaning the difference in means between the two tracts on these measures was not large enough to be statistically significant. Meanwhile, the greenness, sidewalk, and building proportions were all significant (p < 0.001), and the plots show the direction and rough size of each difference. I do think the building-to-street ratio is off, because the area I chose for my walkable tract should have a high building-to-street ratio.
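
As a compact alternative to printing each test separately, the per-variable Welch t-tests can be collected into one table of group means, differences, and p-values. This is a base-R sketch on simulated data, so the numbers it produces are not the assignment results; the names `toy`, `vars`, and `results` are hypothetical.

```r
set.seed(42)  # toy data only; the numbers are not the assignment results
toy <- data.frame(
  is_walkable   = rep(c(TRUE, FALSE), each = 50),
  prop_sidewalk = c(rnorm(50, mean = 12, sd = 4), rnorm(50, mean = 8, sd = 4)),
  prop_sky      = c(rnorm(50, mean = 30, sd = 8), rnorm(50, mean = 30, sd = 8))
)

vars <- c("prop_sidewalk", "prop_sky")

# Run one Welch t-test per variable and keep the key numbers from each
results <- do.call(rbind, lapply(vars, function(v) {
  unwalk <- toy[[v]][!toy$is_walkable]
  walk   <- toy[[v]][toy$is_walkable]
  tt <- t.test(unwalk, walk)   # Welch test is the default (var.equal = FALSE)
  data.frame(
    variable        = v,
    mean_unwalkable = mean(unwalk),
    mean_walkable   = mean(walk),
    diff            = mean(walk) - mean(unwalk),
    p_value         = tt$p.value
  )
}))

results
```

For the real data, `toy` would be replaced by `edges_numeric` and `vars` by the six proportion columns.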