Introduction to the assignment

This assignment consists of three main sections.

In the first section, you need to select one Census Tract that you think is the most walkable and another one that you think is least walkable within Fulton and DeKalb Counties, GA. As long as the two Census Tracts are within the two counties, you can pick any two you want. If the area you want to use as walkable/unwalkable area is not well-covered by one Census Tract, you can select multiple tracts (e.g., selecting three adjacent tracts as one walkable area). The definition of ‘walkable’ can be your own - you can choose solely based on your experience (e.g., had best/worst walking experience), refer to Walk Score, or any other mix of criteria you want. After you make the selection, provide a short write-up with a map explaining why you chose those Census Tracts.

The second section is the main part of this assignment, in which you prepare OSM data, download GSV images, and apply computer vision (i.e., semantic segmentation).

In the third section, you will summarise and analyze the output and provide your findings. After you apply computer vision to the images, you will have the number of pixels in each image that represent 150 categories in your data. You will focus on the following categories in your analysis: building, sky, tree, road, and sidewalk. Specifically, you will (1) create maps to visualize the spatial distribution of different elements, (2) compare the mean of each category between the two Census Tracts, and (3) draw box plots to compare the distributions.

library(tidyverse)
library(tidycensus)
library(osmdata)
library(sfnetworks)
library(units)
library(sf)
library(tidygraph)
library(tmap)
library(here)
library(progress)
library(tibble)
library(dplyr)
ttm()

Section 1. Choose your Census Tracts.

Select walkable Census Tract(s) and unwalkable Census Tract(s) within Fulton and DeKalb counties.

In the quest to search for Census Tracts, you can use an approach similar to what we did in Step 1 of ‘Module4_getting_GSV_images.Rmd’. This time, instead of cities, we are focusing on Census Tracts; and the search boundary is the two counties, instead of metro Atlanta.

Provide a brief description and visualization of your Census Tracts. Why do you think the Census Tracts are walkable and unwalkable? What were the contributing factors?

The most walkable Census Tract: Census Tract 35 in Fulton County (its Walk Score is above 90, which Walk Score labels a ‘Walker’s Paradise’). Street address: 18 Capitol Sq SW (GEOID: 13121003500, CENTLAT: +33.7509589, COUNTY CODE: 121, TRACT CODE: 003500)
The least walkable Census Tract: Census Tract 73.02 in Fulton County (its Walk Score is only 35, which indicates a poor environment for walkers). Street address: 2914 Browns Mill Road Southeast (GEOID: 13121007302, CENTLAT: +33.6739849, COUNTY CODE: 121, TRACT CODE: 007302)

Section 2. OSM, GSV, and computer vision.

Fill out the template to complete the script.

Step 1. Get OSM data and clean it.

Using tidycensus package, download the Census Tract polygon for Fulton and DeKalb counties.
Extract two Census Tracts, which will be your most and least walkable Census Tracts.
Using their bounding boxes, get OSM data.
Convert them into sfnetworks data and clean it.


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Download Census Tract polygon for Fulton and DeKalb
tract <- get_acs("tract", 
                 variables = c('tot_pop' = 'B01001_001'),
                 year = 2020, 
                 state = "GA", 
                 county = c("Fulton", "DeKalb"), 
                 geometry = TRUE)

## Getting data from the 2016-2020 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.


# =========== NO MODIFY ZONE ENDS HERE ========================================

# TASK ////////////////////////////////////////////////////////////////////////
# The purpose of this TASK is to create one bounding box for walkable Census Tract and another bounding box for unwalkable Census Tract.
# As long as you generate what's needed for the subsequent codes, you are good. The numbered list of tasks below is to provide some hints.
# 1. Write the GEOID of walkable & unwalkable Census Tracts. e.g., tr1_ID <- c("13121001205", "13121001206")
# 2. Extract the selected Census Tracts using tr1_ID & tr2_ID
# 3. Create their bounding boxes using st_bbox(), and 
# 4. Assign them to tract_1_bb and tract_2_bb, respectively.
# 5. Change the coordinate system to GCS, if necessary.

# For the walkable Census Tract(s)
# 1. 
tr1_ID <-  "13121003500"

# 2~4
tract_1_bb <- tract %>% 
  filter(GEOID %in% tr1_ID) %>% # %in% also works when tr1_ID holds several GEOIDs
  st_bbox()

# For the unwalkable Census Tract(s)  
# 1.
tr2_ID <- "13121007302"

# 2~4
tract_2_bb <- tract %>% 
  filter(GEOID %in% tr2_ID) %>% # %in% also works when tr2_ID holds several GEOIDs
  st_bbox() 

# //TASK //////////////////////////////////////////////////////////////////////

  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get OSM data for the two bounding box
osm_1 <- opq(bbox = tract_1_bb) %>%
  add_osm_feature(key = 'highway', 
                  value = c("motorway", "trunk", "primary", 
                            "secondary", "tertiary", "unclassified",
                            "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()

osm_2 <- opq(bbox = tract_2_bb) %>%
  add_osm_feature(key = 'highway', 
                  value = c("motorway", "trunk", "primary", 
                            "secondary", "tertiary", "unclassified",
                            "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()
# =========== NO MODIFY ZONE ENDS HERE ========================================

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_1 and osm_2 to sfnetworks objects (set directed = FALSE)
# 2. Clean the network by (1) deleting parallel lines and loops, (2) create missing nodes, and (3) remove pseudo nodes, 
# 3. Add a new column named length using edge_length() function.

net1 <- osm_1$osm_lines %>%
  as_sfnetwork(directed = FALSE) %>%
  activate("edges") %>%
  filter(!edge_is_multiple()) %>% # Remove parallel edges
  filter(!edge_is_loop()) %>% # Remove loops
  convert(sfnetworks::to_spatial_subdivision) %>% # Create missing nodes
  convert(sfnetworks::to_spatial_smooth) %>% # Remove pseudo nodes
  activate("edges") %>%
  mutate(length = edge_length()) # Add edge lengths after cleaning, so they match the final edges

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

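net2 <- osm_2$osm_lines %>%
  as_sfnetwork(directed = FALSE) %>% 
  activate("edges") %>%
  filter(!edge_is_multiple()) %>% # Remove parallel edges
  filter(!edge_is_loop()) %>% # Remove loops
  convert(sfnetworks::to_spatial_subdivision) %>% # Create missing nodes
  convert(sfnetworks::to_spatial_smooth) %>% # Remove pseudo nodes
  activate("edges") %>%
  mutate(length = edge_length()) # Add edge lengths after cleaning, so they match the final edges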

## Warning: to_spatial_subdivision assumes attributes are constant over geometries

# //TASK //////////////////////////////////////////////////////////////////////
  
  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# OSM for the walkable part
edges_1 <- net1 %>% 
  # Extract 'edges'
  st_as_sf("edges") %>% 
  # Drop redundant columns 
  select(osm_id, highway, length) %>% 
  # Drop segments that are too short (100m)
  mutate(length = as.vector(length)) %>% 
  filter(length > 100) %>% 
  # Add a unique ID for each edge
  mutate(edge_id = seq(1,nrow(.)),
         is_walkable = "walkable")

# OSM for the unwalkable part
edges_2 <- net2 %>% 
  # Extract 'edges'
  st_as_sf("edges") %>% 
  # Drop redundant columns 
  select(osm_id, highway, length) %>% 
  # Drop segments that are too short (100m)
  mutate(length = as.vector(length)) %>% 
  filter(length > 100) %>% 
  # Add a unique ID for each edge
  mutate(edge_id = seq(1,nrow(.)),
         is_walkable = "unwalkable")

# Merge the two
edges <- bind_rows(edges_1, edges_2)
# =========== NO MODIFY ZONE ENDS HERE ========================================

Step 2. Define a function that performs Step 3.

getAzimuth <- function(line){
  # This function takes one edge (i.e., a street segment) as an input and
  # outputs a data frame with four points (start, mid1, mid2, and end) and their azimuth.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. From `line` object, extract the coordinates using st_coordinates() and extract the first two rows.
  # 2. Use atan2() function to calculate the azimuth in degree. 
  #    Make sure to adjust the value such that 0 is north, 90 is east, 180 is south, and 270 is west.
  # 1
  start_p <- line %>% 
    st_coordinates() %>% 
    .[1:2,1:2]
  
  # 2
  # atan2(dx, dy) is measured clockwise from north; %% 360 maps it to [0, 360)
  start_azi <- (atan2(start_p[2,"X"] - start_p[1, "X"],
                      start_p[2,"Y"] - start_p[1, "Y"])*180/pi) %% 360
  # //TASK //////////////////////////////////////////////////////////////////////

    
  # TASK ////////////////////////////////////////////////////////////////////////
  # Repeat what you did above, but for last two rows (instead of the first two rows).
  # Remember to flip the azimuth so that the camera would be looking at the street that's being measured
  end_p <- line %>% 
    st_coordinates() %>% 
    .[(nrow(.)-1):nrow(.),1:2]
    
  end_azi <- atan2(end_p[2,"X"] - end_p[1, "X"],
                   end_p[2,"Y"] - end_p[1, "Y"])*180/pi
    
  end_azi <- if (end_azi < 180) {end_azi + 180} else {end_azi - 180}
  # //TASK //////////////////////////////////////////////////////////////////////
  
  

  # TASK ////////////////////////////////////////////////////////////////////////
  # 1. From `line` object, use st_line_sample() function to generate points at 45%, 50% and 55% locations. 
  # 2. Use st_cast() function to convert 'MULTIPOINT' object to 'POINT' object.
  # 3. Extract coordinates using st_coordinates().
  # 4. Use the 50% location to define `mid_p` object.
  # 5. Use the 45% and 55% points and atan2() function to calculate azimuth `mid_azi`.
  
  mid_p3 <- line %>% 
    st_line_sample(sample = c(0.45, 0.5, 0.55)) %>% 
    st_cast("POINT") %>% 
    st_coordinates()
  
  mid_p <- mid_p3[2,]
  
  # %% 360 maps the azimuth to [0, 360), so 270 (not -90) means west
  mid_azi <- (atan2(mid_p3[3,"X"] - mid_p3[1, "X"],
                    mid_p3[3,"Y"] - mid_p3[1, "Y"])*180/pi) %% 360
    
  # Opposite viewing direction at the midpoint
  mid_azi2 <- (mid_azi + 180) %% 360
  
  # //TASK //////////////////////////////////////////////////////////////////////
 
    
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
 return(tribble(
    ~type,    ~X,            ~Y,             ~azi,
    "start",   start_p[1,"X"], start_p[1,"Y"], start_azi,
    "mid1",    mid_p["X"],   mid_p["Y"],   mid_azi,
    "mid2",    mid_p["X"],   mid_p["Y"],   mid_azi2,
    "end",     end_p[2,"X"],   end_p[2,"Y"],   end_azi))
  # =========== NO MODIFY ZONE ENDS HERE ========================================

}
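
The azimuth convention used in getAzimuth() can be checked in isolation. The following is a minimal sketch, not part of the assignment template: the helper names azimuth_deg and flip_azimuth are hypothetical, and the coordinates are made-up planar X/Y values. It shows how atan2(dx, dy) combined with %% 360 yields 0 for north, 90 for east, 180 for south, and 270 for west.

```r
# Hypothetical helper (not part of the template):
# bearing in degrees from point 1 to point 2, with 0 = north and 90 = east.
azimuth_deg <- function(x1, y1, x2, y2) {
  # atan2(dx, dy) measures the angle clockwise from north, in (-180, 180]
  azi <- atan2(x2 - x1, y2 - y1) * 180 / pi
  # Wrap negative values into [0, 360)
  azi %% 360
}

# Hypothetical helper: reverse a bearing so the camera looks back along the segment
flip_azimuth <- function(azi) (azi + 180) %% 360

azimuth_deg(0, 0, 0, 1)   # segment pointing due north
azimuth_deg(0, 0, 1, 0)   # segment pointing due east
azimuth_deg(0, 0, 0, -1)  # segment pointing due south
azimuth_deg(0, 0, -1, 0)  # segment pointing due west
flip_azimuth(azimuth_deg(0, 0, 1, 0))  # east-pointing segment viewed from its far end
```

Without the %% 360 step, a west-pointing segment would produce a negative angle rather than 270.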

Step 3. Apply the function to all street segments

We can apply the getAzimuth() function to the edges object. We then append the edges object so that its columns (e.g., the is_walkable column) are available in the result. When you are finished with this code chunk, you will be ready to download GSV images.

# TASK ////////////////////////////////////////////////////////////////////////
# Apply getAzimuth() function to all edges.
# Remember that you need to pass edges object to st_geometry() before you apply getAzimuth()

endp_azi <- edges %>% 
  st_geometry() %>% 
  map_df(getAzimuth, .progress = TRUE)


# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
endp <- endp_azi %>% 
  bind_cols(edges %>% 
              st_drop_geometry() %>% 
              slice(rep(1:nrow(edges),each=4))) %>% 
  st_as_sf(coords = c("X", "Y"), crs = 4326, remove=FALSE) %>% 
  mutate(node_id = seq(1, nrow(.)))
# =========== NO MODIFY ZONE ENDS HERE ========================================

Step 4. Define a function that formats request URL and download images.

get_image <- function(iterrow){
  # This function takes one row of endp and downloads GSV image using the information from endp.
  
  # TASK ////////////////////////////////////////////////////////////////////////
  # Finish this function definition.
  # 1. Extract required information from the row of endp, including 
  #    type (i.e., start, mid1, mid2, end), location, heading, edge_id, node_id, source (i.e., outdoor vs. default) and key.
  # 2. Format the full URL and store it in furl.
  # 3. Format the full path (including the file name) of the image being downloaded and store it in fpath
  type <- iterrow$type
  location <- paste0(iterrow$Y %>% round(5), ",", iterrow$X %>% round(5))
  heading <- iterrow$azi %>% round(1)
  edge_id <- iterrow$edge_id
  node_id <- iterrow$node_id
  key <- google # `google` must hold the API key string; it is assumed to be defined earlier in the session
  
  furl <- glue::glue("https://maps.googleapis.com/maps/api/streetview?size=640x640&location={location}&heading={heading}&fov=90&pitch=0&key={key}")
  fname <- glue::glue("GSV-nid_{node_id}-eid_{edge_id}-type_{type}-Location_{location}-heading_{heading}.jpg") # Don't change this code for fname
  fpath <- here("major3", "images", fname)
  # //TASK //////////////////////////////////////////////////////////////////////

  
  
  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  # Download images
  if (!file.exists(fpath)){
    download.file(furl, fpath, mode = 'wb') 
  }
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}
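
Before spending API credit, it can help to build one request URL without downloading anything. The sketch below is a dry run under stated assumptions: the row values are made up, "YOUR_API_KEY" is a placeholder rather than a real key, and base sprintf() stands in for glue::glue() so that the example has no package dependencies. It follows the same URL pattern as get_image().

```r
# Hypothetical row standing in for one row of `endp`; all values are made up.
row <- list(type = "mid1", X = -84.38812, Y = 33.75096,
            azi = 123.456, edge_id = 7, node_id = 26)

# Placeholder only: never hard-code or print a real API key.
key <- "YOUR_API_KEY"

# The location parameter is written as "latitude,longitude", i.e. Y first, then X
location <- paste0(round(row$Y, 5), ",", round(row$X, 5))
heading <- round(row$azi, 1)

furl <- sprintf(
  "https://maps.googleapis.com/maps/api/streetview?size=640x640&location=%s&heading=%s&fov=90&pitch=0&key=%s",
  location, heading, key
)
print(furl)
```

Inspecting a URL like this for a few rows makes it easier to catch swapped coordinates or out-of-range headings before the download loop runs.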

Step 5. Download GSV images

Before you download GSV images, make sure the row number of endp is not too large! The row number of endp will be the number of GSV images you will be downloading. Before you download images, always double-check your Google Cloud Console’s Billing tab to make sure that you will not go above the free credit of $200 each month. The price is $7 per 1000 images.
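
The cost check described above is simple arithmetic and can be scripted. This is a minimal sketch: the function name estimate_gsv_cost is hypothetical, and the price ($7 per 1000 images) and credit ($200 per month) are taken from the paragraph above and should be verified against your own billing page.

```r
# Hypothetical helper: estimate the download cost for a given number of images.
# Defaults are the figures quoted in the text, not values fetched from Google.
estimate_gsv_cost <- function(n_images, price_per_1000 = 7, monthly_credit = 200) {
  cost <- n_images / 1000 * price_per_1000
  list(n_images = n_images,
       cost_usd = cost,
       within_credit = cost <= monthly_credit)
}

# Made-up example: 500 street segments with 4 images each
est <- estimate_gsv_cost(500 * 4)
print(est)
```

In the assignment itself, the number of images would be nrow(endp).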

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Loop!
for (i in seq(1,nrow(endp))){
  get_image(endp[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
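# Loop! Restart from row 2181 (presumably to resume an interrupted download; get_image() skips files that already exist)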
for (i in seq(2181,nrow(endp))){
  get_image(endp[i,])
}
# =========== NO MODIFY ZONE ENDS HERE ========================================

ZIP THE DOWNLOADED IMAGES AND NAME IT ‘gsv_images.zip’ FOR STEP 6.

Step 6. Apply computer vision

Now, use Google Colab to apply the semantic segmentation model.

Zip your images and upload the images to your Colab session.
Apply the semantic segmentation model to all the images.
Save the segmentation output as csv file and download it.

Step 7. Merging the processed data back to R

Merge the segmentation output to edges.

# Read the downloaded CSV file from Google Drive
seg_output <- read.csv("/Users/xy/Downloads/tutorial/major3/seg_output.csv")


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Join the segmentation result to endp object.
seg_output_nodes <- endp %>% inner_join(seg_output, by=c("node_id"="img_id")) %>% 
  select(type, X, Y, node_id, building, sky, tree, road, sidewalk) %>% 
  mutate(across(c(building, sky, tree, road, sidewalk), function(x) x/(640*640)))
# =========== NO MODIFY ZONE ENDS HERE ========================================

Section 3. Summarise and analyze the results.

At the beginning of this assignment, you defined one Census Tract as walkable and the other as unwalkable. The key to the following analysis is the comparison between walkable/unwalkable Census Tracts.

Analysis 1 - Create map(s) to visualize the spatial distribution of the streetscape.

Create maps of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. In total, you will have 10 maps.

Below the maps, provide a brief description of your findings from the maps.
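
Mapping the two areas separately requires knowing which nodes belong to which area. A minimal sketch of one way to do this by label rather than by row position is shown here. It uses toy tibbles (endp_toy and seg_toy are hypothetical stand-ins with made-up values), and it assumes that node_id identifies each image and that is_walkable is available on the table being joined. With the real sf objects, the geometry of the lookup table would need to be dropped first (e.g., with st_drop_geometry()).

```r
library(dplyr)

# Toy stand-in for `endp`: every node carries its area label
endp_toy <- tibble(
  node_id = 1:6,
  is_walkable = rep(c("walkable", "unwalkable"), each = 3)
)

# Toy stand-in for the segmentation result: some images may be missing
seg_toy <- tibble(
  node_id = c(1L, 2L, 4L, 6L),
  sidewalk = c(0.04, 0.03, 0.01, 0.00)
)

# Attach the label by node_id, then split on the label
seg_labeled <- seg_toy %>%
  left_join(endp_toy, by = "node_id")

out_walk <- seg_labeled %>% filter(is_walkable == "walkable")
out_unwalk <- seg_labeled %>% filter(is_walkable == "unwalkable")

print(out_walk)
print(out_unwalk)
```

Splitting on the label stays correct even if some images failed to download, whereas fixed row ranges would shift.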

# TASK ////////////////////////////////////////////////////////////////////////
# Create map(s) to visualize the `seg_output_nodes` object. 
# As long as you can deliver the message clearly, you can use any format/package you want.
# Map!

output1<-seg_output_nodes[1:1704,]
output2<-seg_output_nodes[1705:nrow(seg_output_nodes),]

## For Building
combined_buildings <- c(output1$building, output2$building)
quantile_breaks <- quantile(combined_buildings, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

t1 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output1) +
  tm_dots(col = "building", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Walkable")

t2 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output2) +
  tm_dots(col = "building", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Unwalkable")

tmap_arrange(t1, t2, sync = TRUE)

# //TASK //////////////////////////////////////////////////////////////////////

## For Sky
combined_sky <- c(output1$sky, output2$sky)
quantile_breaks <- quantile(combined_sky, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

t3 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output1) +
  tm_dots(col = "sky", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Walkable")

t4 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output2) +
  tm_dots(col = "sky", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Unwalkable")

tmap_arrange(t3, t4, sync = TRUE)

## For Tree
combined_tree <- c(output1$tree, output2$tree)
quantile_breaks <- quantile(combined_tree, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

t5 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output1) +
  tm_dots(col = "tree", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Walkable")

t6 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output2) +
  tm_dots(col = "tree", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Unwalkable")

tmap_arrange(t5, t6, sync = TRUE)

## For Road
combined_road <- c(output1$road, output2$road)

quantile_breaks <- quantile(combined_road, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

t7 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output1) +
  tm_dots(col = "road", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Walkable")

t8 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output2) +
  tm_dots(col = "road", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Unwalkable")

tmap_arrange(t7, t8, sync = TRUE)


## For Sidewalk
combined_sidewalk <- c(output1$sidewalk, output2$sidewalk)
quantile_breaks <- quantile(combined_sidewalk, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

t9 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output1) +
  tm_dots(col = "sidewalk", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Walkable")

t10 <- tm_basemap("CartoDB.Positron") +
  tm_shape(output2) +
  tm_dots(col = "sidewalk", style = "fixed", breaks = quantile_breaks, palette = 'viridis')+
  tm_layout(title = "Unwalkable")

tmap_arrange(t9, t10, sync = TRUE)

Findings for Analysis 1: In the most walkable tract, most street nodes show a higher percentage of sidewalk, road, and sky, while in the least walkable tract these percentages are low for most street nodes. The most walkable tract also has a higher percentage of buildings at most street nodes, whereas the least walkable tract has a higher percentage of trees, which may be due to differences in land use. However, the percentages of buildings and trees do not seem to affect the two tracts’ walkability as much as the percentage of sidewalk does.

Analysis 2 - Compare the means.

Calculate the mean of the proportion of building, sky, tree, road, and sidewalk for walkable and unwalkable areas. In total, you will have 10 mean values.

After the calculation, provide a brief description of your findings.
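
The ten means can also be computed in one grouped step. The sketch below is an illustration on toy data, not the assignment's required method: the tibble toy and its values are made up, only two of the five categories are shown, and it assumes a label column such as is_walkable is present on the data.

```r
library(dplyr)

# Toy data: per-image proportions with an area label (values are made up)
toy <- tibble(
  is_walkable = c("walkable", "walkable", "unwalkable", "unwalkable"),
  building = c(0.20, 0.10, 0.02, 0.00),
  sidewalk = c(0.04, 0.02, 0.01, 0.01)
)

# One row per area, one column of means per category
mean_by_area <- toy %>%
  group_by(is_walkable) %>%
  summarise(across(c(building, sidewalk), ~ mean(.x, na.rm = TRUE)))

print(mean_by_area)
```

Listing all five categories inside across() would give the full table of ten means in the same call.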

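# Install kableExtra only if it is not already available
if (!requireNamespace("kableExtra", quietly = TRUE)) install.packages("kableExtra")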

# TASK ////////////////////////////////////////////////////////////////////////
# Perform the calculation as described above.
# As long as you can deliver the message clearly, you can use any format/package you want.

# Calculate the mean values
output1_mean_building <- mean(output1$building, na.rm = TRUE)
output1_mean_sky <- mean(output1$sky, na.rm = TRUE)
output1_mean_tree <- mean(output1$tree, na.rm = TRUE)
output1_mean_road <- mean(output1$road, na.rm = TRUE)
output1_mean_sidewalk <- mean(output1$sidewalk, na.rm = TRUE)

output2_mean_building <- mean(output2$building, na.rm = TRUE)
output2_mean_sky <- mean(output2$sky, na.rm = TRUE)
output2_mean_tree <- mean(output2$tree, na.rm = TRUE)
output2_mean_road <- mean(output2$road, na.rm = TRUE)
output2_mean_sidewalk <- mean(output2$sidewalk, na.rm = TRUE)

library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

# Create a data frame of mean values
mean_values <- data.frame(
  Attribute = c("Building", "Sky", "Tree", "Road", "Sidewalk"),
  Walkable = c(output1_mean_building, output1_mean_sky, output1_mean_tree, output1_mean_road, output1_mean_sidewalk),
  Unwalkable = c(output2_mean_building, output2_mean_sky, output2_mean_tree, output2_mean_road, output2_mean_sidewalk)
)

# Create a table with kable
kable(mean_values, caption = "Mean Proportions of Features in Walkable vs Unwalkable Areas") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Mean Proportions of Features in Walkable vs Unwalkable Areas
Attribute	Walkable	Unwalkable
Building	0.1941033	0.0130495
Sky	0.2462187	0.3674739
Tree	0.0779915	0.2274384
Road	0.3606496	0.2744575
Sidewalk	0.0363764	0.0063892
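
The same ten means could also be computed in one grouped step rather than with ten separate `mean()` calls. The following is only a minimal sketch under stated assumptions: `walkable_toy` and `unwalkable_toy` are made-up stand-ins for `output1` and `output2`, and the column names are assumed to match the five categories used above.

```r
library(dplyr)

# Toy stand-ins for output1 (walkable) and output2 (unwalkable); values are made up.
walkable_toy <- data.frame(
  building = c(0.20, 0.10, NA),
  sky      = c(0.25, 0.35, 0.30),
  tree     = c(0.05, 0.15, 0.10),
  road     = c(0.40, 0.30, 0.35),
  sidewalk = c(0.04, 0.02, 0.03)
)
unwalkable_toy <- data.frame(
  building = c(0.01, 0.03),
  sky      = c(0.40, 0.30),
  tree     = c(0.20, 0.30),
  road     = c(0.25, 0.35),
  sidewalk = c(0.00, 0.01)
)

# Stack the two tables; .id records which table each row came from.
combined_toy <- bind_rows(
  walkable   = walkable_toy,
  unwalkable = unwalkable_toy,
  .id = "area"
)

# One mean per area and category (10 values in total), ignoring missing values.
mean_by_area <- combined_toy %>%
  group_by(area) %>%
  summarise(
    across(c(building, sky, tree, road, sidewalk), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(mean_by_area)
```

Labelling the rows with `.id` also keeps the area type attached to each observation, so the stacked table can be reused for plotting without relying on row positions.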

# Make a plot of the mean values
mean_values_df <- data.frame(
  Category = c("Building", "Sky", "Tree", "Road", "Sidewalk"),
  Walkable = c(output1_mean_building, output1_mean_sky, output1_mean_tree, output1_mean_road, output1_mean_sidewalk),
  Unwalkable = c(output2_mean_building, output2_mean_sky, output2_mean_tree, output2_mean_road, output2_mean_sidewalk)
)

# Convert from wide to long format for plotting with ggplot2
long_mean_values_df <- tidyr::pivot_longer(mean_values_df, cols = -Category, names_to = "Area", values_to = "Mean")

ggplot(long_mean_values_df, aes(x = Category, y = Mean, fill = Area)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(
    title = "Mean Proportion Comparison",
    x = "Feature",
    y = "Mean Proportion"
  ) +
  scale_fill_brewer(palette = "Pastel1") # for nice color shades

Findings for analysis2: The plot and table indicate that the walkable tract has higher mean values for buildings, sidewalks, and roads. Sidewalks stand out: their mean share is roughly six times higher in the walkable tract (about 0.036 versus 0.006), which suggests that well-defined pedestrian pathways are an important factor contributing to walkability. The unwalkable tract, in contrast, has a notably lower mean sidewalk proportion, implying limited infrastructure to support pedestrian activity. It also has higher mean values for trees (about 0.227 versus 0.078) and sky (about 0.367 versus 0.246), which may reflect more open space or a less densely built environment.

# //TASK //////////////////////////////////////////////////////////////////////
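
To make the size of each gap easier to read, the reported means can be turned into differences and ratios. This is a small sketch, not part of the required task: the numbers are copied from the table above, and the names `mean_values_reported`, `Difference`, and `Ratio` are introduced here for illustration only.

```r
# Means copied from the table reported above.
mean_values_reported <- data.frame(
  Attribute  = c("Building", "Sky", "Tree", "Road", "Sidewalk"),
  Walkable   = c(0.1941033, 0.2462187, 0.0779915, 0.3606496, 0.0363764),
  Unwalkable = c(0.0130495, 0.3674739, 0.2274384, 0.2744575, 0.0063892)
)

# Absolute gap and relative ratio (walkable / unwalkable) for each category.
mean_values_reported$Difference <-
  mean_values_reported$Walkable - mean_values_reported$Unwalkable
mean_values_reported$Ratio <-
  mean_values_reported$Walkable / mean_values_reported$Unwalkable

# Sort so that the categories most over-represented in the walkable tract come first.
print(mean_values_reported[order(-mean_values_reported$Ratio), ])
```

A ratio above 1 means the category takes up a larger share of the image in the walkable tract; a ratio below 1 means it is more prominent in the unwalkable tract.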

Analysis 3 - Draw boxplots.

Draw box plots comparing the proportion of building, sky, tree, road, and sidewalk between walkable and unwalkable areas. Each plot presents two boxes: one for walkable areas and the other for unwalkable areas. In total, you will have 5 plots.

After drawing the plots, provide a brief description of your findings.

# TASK ////////////////////////////////////////////////////////////////////////
# Create box plot(s) using geom_boxplot() function from ggplot2 package.
# Use `seg_output_nodes` object to draw the box plots.
# You will find `pivot_longer` function useful.
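# `seg_output_nodes` is the wide-format dataframe; the area type (walkable or unwalkable) is added as a column below.

# Add the 'area' column to the dataframe.
# This assumes the first 1704 rows come from the walkable tract and the remaining rows from the unwalkable tract.
n_walkable <- 1704
seg_output_nodes$area <- ifelse(seq_len(nrow(seg_output_nodes)) <= n_walkable, "walkable", "unwalkable")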

df_long <- seg_output_nodes %>%
  pivot_longer(
    cols = c(building, sky, tree, road, sidewalk), 
    names_to = "Feature",
    values_to = "Proportion"
  )

# Create box plots: one panel per feature, each with one box per area
ggplot(df_long, aes(x = area, y = Proportion, fill = area)) +
  geom_boxplot() +
  facet_wrap(~ Feature, scales = "free_y") +
  theme_minimal() +
  labs(
    title = "Distribution of Proportions by Walkability",
    x = "Area",
    y = "Proportion"
  ) +
  scale_fill_brewer(palette = "Set2")

The box plots reveal distinct patterns in the distribution of features between walkable and unwalkable areas. In the walkable tract, the median proportion of sidewalks is higher, which is consistent with the expectation that good pedestrian infrastructure correlates with walkability. Buildings also show a higher median in the walkable tract, potentially reflecting a more urban and developed environment that is conducive to walking. The proportion of trees is higher in the unwalkable tract, which could suggest a more suburban or rural setting that is typically less designed for pedestrian travel. Sky follows a similar pattern to trees, with the unwalkable tract showing a higher median proportion, which may be due to fewer tall buildings obstructing the view. The road feature does not exhibit a strong difference between walkable and unwalkable areas. In conclusion, the proportion of sidewalks appears to be a stronger indicator of an area’s walkability than the presence of trees or the amount of visible sky.

# //TASK //////////////////////////////////////////////////////////////////////
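
The medians read off the box plots can be checked numerically by summarising the long-format data per feature and area. The sketch below is illustrative only: `toy_nodes` is a made-up stand-in for `seg_output_nodes` with an `area` column, and `box_stats`, `q1`, `med`, and `q3` are names introduced here.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for seg_output_nodes with an `area` column; values are made up.
toy_nodes <- data.frame(
  area     = c("walkable", "walkable", "walkable",
               "unwalkable", "unwalkable", "unwalkable"),
  building = c(0.10, 0.20, 0.30, 0.00, 0.01, 0.02),
  sky      = c(0.20, 0.25, 0.30, 0.30, 0.35, 0.40),
  tree     = c(0.05, 0.08, 0.11, 0.15, 0.22, 0.29),
  road     = c(0.30, 0.35, 0.40, 0.25, 0.28, 0.31),
  sidewalk = c(0.02, 0.04, 0.06, 0.00, 0.01, 0.02)
)

# Reshape to long format, then compute the quartiles for each feature and area.
box_stats <- toy_nodes %>%
  pivot_longer(
    cols = c(building, sky, tree, road, sidewalk),
    names_to = "Feature",
    values_to = "Proportion"
  ) %>%
  group_by(Feature, area) %>%
  summarise(
    q1  = quantile(Proportion, 0.25, na.rm = TRUE, names = FALSE),
    med = median(Proportion, na.rm = TRUE),
    q3  = quantile(Proportion, 0.75, na.rm = TRUE, names = FALSE),
    .groups = "drop"
  )

print(box_stats)
```

Comparing `med` across the two areas within each feature gives a numeric counterpart to the visual comparison of the boxes.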