Introduction

My capstone project for Data 205 is an interactive map showing affordable housing and transporation options in Montgomery County. I knew I was going to do this GIS project coming into the semester. My sister-in-law, who works for a non-profit promoting alternative energy policies and programs, suggested the map of electric vehicle charging availability to low-to-middle income households. We had several conversations about the data to include on the map, affordable housing, public transit, bike sharing, car sharing, etc.

The first thing I did was take several online courses about working with geo-spatial data and using Leaflet in R to create maps. And I began gathering data before the semester started.

I was able to find data for all but one of the layers I had planned. I couldn’t find substantial Car Sharing data without joining multiple hourly car rental websites, which isn’t something I was willing to do. The data that Montgomery County has about Car Sharing locations is minimal.

Creating this report

I coded 8 separate .Rmd files for the project, including the file to create this report. There are five files for creating layers in the map.

  1. PubTrans.Rmd creates the public transportation marker data.
  2. ev_charging.Rmd creates the the Electric Vehicle charging marker data.
  3. bikeshare.Rmd creates the Bike Sharing marker data.
  4. Dwellings.Rmd creates the affordable dwelling marker data.
  5. ACS_Income.Rmd creates the income heat data.
  6. MoCo_TransportationEquityMap.Rmd creates the interactive map.
  7. MoCo_TransportationEquity.Rmd create the project presentation.
  8. MoCo_TransportationEquityReport.Rmd creates this report.

The first six files are imported into this report as child .Rmd files.

Marker layers

I created separate datasets from each of my source input files as I developed each layer in turn. I chose to use longitude and latitude as separate fields in tibbles for my marker data. There were a few instances where SpatialPoints were needed for a specific function, but the results of those functions were added to the tibble. I became more confident using the spatial data types as my work on the project progressed, but I chose to keep using tibbles rather than rework code that was performing well.

Public Transportation Data - PubTrans.Rmd

Ride-On Bus Stops

The Ride-On Bus stops data, Data Montgomery - Montgomery_County_Ride_On_Bus_Stops contains the bus stop name and the longitude and latitude point data. This data is used to create a marker layer in the map. The data needed minimal cleaning.

bus <- read_csv('./sourcedData/Montgomery_County_Ride_On_Bus_Stops.csv')  
names(bus) <- str_to_lower(str_replace_all(names(bus), ' ','_'))         # clean the names
bus$stopname <- str_to_title(bus$stopname)                               # clean the stop names
bus$stopname <- str_replace(bus$stopname, "& @", "@")                    # replace some odd looking punctuation

glimpse(bus)                                                             # look at the bus data  
## Rows: 1,000
## Columns: 2
## $ stopname <chr> "Rockledge Dr @6600", "Cheshire Dr & Old Georgetown Rd", "Che~
## $ point    <chr> "POINT (-77.13205700033012 39.028854000012245)", "POINT (-77.~

Filter stops close together

The data contains stops that are across the street from each other, or otherwise close to each other. I filtered the data so that the bus stops are at least 300 feet apart. I parsed out the longitude and latitude fields and created a matrix to use in the filterByProximity function from the rangeBuilder package.

This is an example of code that I could have refactored using SpatialPoints, but it gets the job done as it is.

p1 <- bus 
p1$xy <-  substr(p1$point, 8, str_locate(p1$point, "\\)") - 1)  # parse the point data into separate columns
options(digits=17)                                              # set the decimal digits for float fields
options(pillar.sigfig = 17)                                     # set significant digits for float fields
bus$lon <- substr(p1$xy, 1, str_locate(p1$xy, " ")) %>% as.numeric()       # parse the point data into lon
bus$lat <- substr(p1$xy, str_locate(p1$xy, " "), length(p1$xy)) %>% as.numeric()   # and lat

bus$category <- "Bus_Stop"                                            # add category column to the data
bus_all <- bus %>% dplyr::select(category, name = stopname, lon, lat) # select fields for marker layer

bmat <- cbind(bus$lon, bus$lat)                                       # create a matrix of lon & lat
b2 <- filterByProximity(bmat, 0.09, returnIndex = F) %>%              # remove duplicates that are less than
      as_tibble(.name_repair = "unique")                              # 300 ft apart
                                                                              

names(b2) <- c("lon", "lat")          # name the columns         

bus <- left_join(b2, bus_all)             # join to get the desired bus stops

Metro Bus Stops

The Metro Bus Stop data comes from Maryland Open Data, Metro Bus Stops, Maryland_Transit_-_WMATA_Metro_Bus_Stops

The Metro Bus Stops are in a shape file which contains all of the stops in the system. The Cartographic Reference System (CRS) of the shape data is Mercator which I transformed to WGS84 to be compatible with the rest of the data.

In order to get the stops that are in Montgomery County I use the Zipcodes shape file from Data Montgomery at the mcatlas.org website, Montgomery Planning - MC Atlas Zipcodes-Montgomery. The data is filtered to select the stops that are in Montgomery County using the point.in.poly function from the spatialEco package. Then the stop name, longitude and latitude are selected from the data and it’s filtered so that the stops are at least 300 feet apart.

# metro bus data

metro_bus <- raster::shapefile("./sourcedData/Maryland_Transit_-_WMATA_Metro_Bus_Stops/TRAN_WMATAMetroBusStops_WMATA.shp")

# zipcode polygons for Montgomery County

zipcode <- raster::shapefile("./shapefiles/zipcode.shp", GDAL1_integer64_policy = TRUE)

zipcode@data <- zipcode@data %>% dplyr::select(zipcode, mail_city, shape_area, shape_leng)

metro_bus <- sp::spTransform(metro_bus, proj4string(zipcode))  # transform CRS from mercator to WGS84

require(spatialEco)
mbus <- point.in.poly(metro_bus, zipcode, duplicate = F)   # filter for bus stops in MC

mbus <- mbus %>% as_tibble(.name_repair = "unique") %>%    # create a df
                 dplyr::filter(pid1 != "NA") %>%           # remove missing data
                 dplyr::arrange(AT_STR, ON_STR)            # sort by cross street names

mbus$category <-  "Metro_Bus_Stop"                         # add category column
mbus$name <- base::paste(mbus$ON_STR, "@", mbus$AT_STR)    # create the stop name from the cross streets


mbus_all <- mbus %>% dplyr::select(category,                     # select the fields for the marker data
               name,
               lon  = coords.x1,
               lat = coords.x2)

bmat <- cbind(mbus_all$lon, mbus_all$lat)                          # matrix of lon & lat
b2 <- filterByProximity(bmat, 0.09, returnIndex = F) %>%   # filter out stops less than 300 ft apart
          as_tibble(.name_repair = "unique" )

names(b2) <- c("lon", "lat")                               # rename the lon / lat columns

mbus <- dplyr::left_join(b2, mbus_all)                    # join to get the desired bus stops

Metro Rail Stations

There are 13 Red Line Metro Rail stations in Montgomery County. The data from Data Montgomery Metro-Rail-Stations is already tidy and includes WGS84 longitude and latitude attributes.

metro <- read_csv('./sourcedData/Metro_stations.csv')                 # read metro station data
names(metro) <- str_to_lower(str_replace_all(names(metro), ' ','_'))  # clean up the names

glimpse(metro)         # look at the file
## Rows: 13
## Columns: 14
## $ objectid  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
## $ category  <chr> "METRO STATIONS", "METRO STATIONS", "METRO STATIONS", "METRO~
## $ name      <chr> "Wheaton", "Forest Glen", "Silver Spring", "Friendship Heigh~
## $ address   <chr> "11171 Georgia Ave", "9730 Georgia Ave", "8400 Colesville Rd~
## $ city      <chr> "Wheaton", "Wheaton", "Silver Spring", "Friendship Heights",~
## $ zipcode   <dbl> 20902, 20902, 20910, 20815, 20814, 20892, 20852, 20852, 2085~
## $ phone     <chr> "202-637-7000", "202-637-7000", "202-637-7000", "202-637-700~
## $ url       <chr> "http://www.wmata.com", "http://www.wmata.com", "http://www.~
## $ data      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ point_x   <dbl> 1298033.3787258500, 1300013.8594956901, 1303492.1235629299, ~
## $ point_y   <dbl> 499621.71499564999, 491147.35620375001, 483379.39024451998, ~
## $ shape     <chr> "(39.038541001000056, -77.05034726599996)", "(39.01527587900~
## $ latitude  <dbl> 39.038533139999998, 39.015268030000001, 38.993943440000002, ~
## $ longitude <dbl> -77.050344989999999, -77.043358260000005, -77.03110723999999~
metro <- metro %>% dplyr::select(category,                            # the file already contains category
                                 name,                                # the station name
                                 address,                             # the station address
                                 lon = longitude, lat = latitude)     # geo point data
p1$category <- "Metro_station"                           

Combine public transportation data

I’ve assigned a category field to each of the three datasets, and I combine them to create a single file with all of the public transportation points. This file will be used to calculate the distance of affordable housing from public transportation.

The raster package has a select() method which causes conflicts when I try to select from a tibble. So I need to specify the package when I have both packages loaded. This is an instance where using spatial data types would have been a bit cleaner.

b1 <- rbind(bus_all, mbus_all)         # combine ride-on and metro bus stops

bmat <- cbind(b1$lon, b1$lat)                               # matrix of lon & lat
b2 <- filterByProximity(bmat, 0.09, returnIndex = F) %>%    # filter out stops closer than 300 feet
      as_tibble(.name_repair = "unique")

names(b2) <- c("lon", "lat")          # rename the lon & lat fields

b1 <- left_join(b2, b1)               # join to get desired bus stops

p1 <- metro %>% dplyr::select(!address)  # get the metro station geo points

public_transport <- rbind(p1, b1)     # bind into single file

pacman::p_unload(rgdal, GISTools, rangeBuilder)

The three marker files are written to the /markers folder and will be used for building the map. The combined pub_trans file is written to the /working directory.

# write_csv(bus, './markers/bus_stops.csv')
# write.csv(mbus, "./markers/metro_bus.csv")
# write.csv(metro, "./markers/metro.csv")
# write_csv(public_transport, './working/public_transport.csv')

Electric Vehicle Charging Data - ev_charging.Rmd

I found electric vehicle charging locations data from the Alternative Fuels Data Center (AFDC) website - Alternative Fuel Stations. The data is downloadable by state, so I selected “ELEC” as the fuel-type and MD for the state.

The alternative fuel data contains 65 fields most of which I don’t need. I select the category, address data, access_code (public/private) and geom point data.

evc <- evc %>% dplyr::select(category = fuel_type_code,      # select desired columns
                      address = street_address, 
                      city, 
                      zip, 
                      access_code,
                      l1_num = ev_level1_evse_num,
                      l2_num = ev_level2_evse_num,
                      dc_fast_num = ev_dc_fast_count,
                      lat = latitude,
                      lon = longitude                     
                      )

evc$l1_num <- replace_na(evc$l1_num, 0 )                   # replace n/a with zeros for counts
evc$l2_num <- replace_na(evc$l2_num, 0 )                   # in the charging station columns
evc$dc_fast_num <- replace_na(evc$dc_fast_num, 0 )

evc$evse_tot <- evc$l1_num + evc$l2_num + evc$dc_fast_num  # calculate the total station counts

The ev charging data was the first set of data I worked with, and I hadn’t found the zip code shape file yet. I used the zip code file from USPS and filtered it for Montgomery County zip codes.

zipcodes <- read_csv('./sourcedData/usazipcode-1512j.csv') %>% tibble()         # get the USPS zipcode data  
glimpse(zipcodes)
## Rows: 38,805
## Columns: 5
## $ zip            <chr> "501", "1001", "1002", "1003", "1004", "1005", "1007", ~
## $ `Zipcode name` <chr> "HOLTSVILLE, NY", "AGAWAM, MA", "AMHERST, MA", "AMHERST~
## $ City           <chr> "HOLTSVILLE", "AGAWAM", "AMHERST", "AMHERST", "AMHERST"~
## $ State          <chr> "NY", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "~
## $ `County Name`  <chr> "SUFFOLK", "HAMPDEN", "HAMPSHIRE", "HAMPSHIRE", "HAMPSH~
names(zipcodes) <- str_to_lower(str_replace_all(names(zipcodes), ' ','_'))      # clean the column names

moco_zip <-  zipcodes %>% 
  filter( state == "MD" & county_name == "MONTGOMERY") %>%             # filter for Montgomery County
  dplyr::select(zip)                                                   # select the zipcode column

I use a semi-join to get the rows in ev charging that have Montgomery County zip codes.

evc <- semi_join(evc, moco_zip, by = "zip")  # get ev records for Montgomery County

Then I sum up the total number of charging stations per location.

evc <- evc %>% 
  group_by(address, city, zip, access_code, lon, lat) %>% 
  summarise(evse_tot = sum(evse_tot))                      # sum the total stations

evc$address <- paste0(evc$address, " ", evc$city, ", MD ", evc$zip)  # create marker description

And write the EV Charging data file to the markers folder.

# write_csv(evc, './markers/ev_charging.csv')

Bike Sharing Data - bikeshare.Rmd

The bike sharing data is a tidy dataset, I can select the fields I want and do some simple reformatting.

bike <- read_csv('./sourcedData/Bikeshare.csv')
names(bike) <- str_to_lower(names(bike))           # clean the column names

glimpse(bike)
## Rows: 94
## Columns: 10
## $ category   <chr> "Bikeshare Station", "Bikeshare Station", "Bikeshare Statio~
## $ name       <chr> "Blueridge & Elkins", "Broschart & Blackwell", "Stewart & A~
## $ the_geom   <chr> "POINT (-77.0500623 39.0431378)", "POINT (-77.2003244 39.10~
## $ url        <chr> "http://www.montgomerycountymd.gov/bikeshare", "http://www.~
## $ city       <chr> "Wheaton", "Rockville", "Silver Spring", "Rockville", "Rock~
## $ objectid_1 <dbl> 70, 1, 89, 6, 11, 57, 43, 25, 38, 26, 53, 15, 21, 47, 94, 5~
## $ docks      <chr> "14 docks", "15 docks", "19 docks", "15 docks", "19 docks",~
## $ phone      <chr> "(240) 777-8380", "(240) 777-8380", "(240) 777-8380", "(240~
## $ objectid   <dbl> NA, 1, NA, 6, 12, NA, 45, 27, 40, 28, NA, 16, 23, 49, NA, N~
## $ zipcode    <dbl> 20902, 20850, 20904, 20850, 20850, 20902, 20910, 20815, 209~
# select the desired columns

p1 <- bike %>% dplyr::select(category, name, city, zipcode, the_geom) 

# reformat the geom point column into lon & lat
p1$xy <-  substr(p1$the_geom, 8, str_locate(p1$the_geom, "\\)") - 1)  # grab the point info string

options(digits=9)                                                     # set the desired digits
p1$lon <- substr(p1$xy, 1, str_locate(p1$xy, " ")) %>%                # grab the lon points
          as.numeric()                                                # change to numeric

p1$lat <- substr(p1$xy, str_locate(p1$xy, " "), 28) %>%               # grab the lat points
  as.numeric()

bike <- p1 %>% dplyr::select(category, name, city, zipcode, lon, lat) # select final columns

bike$description <- str_c(bike$name, '\n', bike$city, '\n', bike$zipcode) # add marker description

# write_csv(bike, './markers/bike.csv')

Affordable Dwelling Units - Dwellings.Rmd

Affordable and Moderately Priced housing in Montgomery County

Annual Rental Facility Survey

2021 Annual Rental Facility Occupancy Survey Facilities includes the average reported rent by bedroom count for Montgomery County facilities. This data is available from

https://data.montgomerycountymd.gov/Consumer-Housing/2021-Rental-Facility-Occupancy-Survey-Results/7wpt-giig

The affordable rents for 2016 through 2021 are separate columns in the data.

rents <- read_csv("./sourcedData/2021_Rental_Facility_Occupancy_Survey_Results.csv",
                  locale=locale(encoding = "windows-1252"))
names(rents) <- str_to_lower(str_replace_all(names(rents), ' ','_'))
glimpse(rents)
## Rows: 1,550
## Columns: 11
## $ community_name                                <chr> "8513 Flower", "Central"~
## $ community_address                             <chr> "8513 FLOWER AVE TAKOMA ~
## $ bedroom_types                                 <chr> "1 bedroom", "2 bedroom"~
## $ average_rent_2016                             <chr> NA, NA, NA, NA, "$   875~
## $ average_rent_2017                             <chr> NA, NA, "$   650", NA, "~
## $ average_rent_2018                             <chr> NA, "$   2,624", "$   65~
## $ average_rent_2019                             <chr> "$   800", "$   2,637", ~
## $ average_rent_2020                             <chr> "$   895", "$   2,644", ~
## $ average_rent_2021                             <chr> "$   895", "$   2,653", ~
## $ `percent_change_from_previous_year_2020-2021` <dbl> 0.0000, 0.0032, 0.0000, ~
## $ geo_location                                  <chr> NA, NA, NA, NA, NA, NA, ~
cat('\nunique addresses = ', length(unique(rents$community_address)),
    '\nunique community names', length(unique(rents$community_name)))
## 
## unique addresses =  706 
## unique community names 703

I’m interesed in the most recent information so I select the 2021 rent data, and look at the summary information.

#Select the name, address, bedroom type, 2021 average rent and the geo-location fields

rents_2021 <- rents %>% dplyr::select(starts_with(c('comm', 'bed')), average_rent_2021, geo_location)

#Create an integer bedroom count field from the first character of bedroom_type, assign 0 to efficiency 

rents_2021$bedroom_count <- str_sub(rents_2021$bedroom_types, 1,1) %>% str_replace('e', '0') %>% as.integer()

#Convert the average rent field to an integer and sort the data by bedroom count and average rent

rents_2021$average_rent_2021 <-  as.integer(gsub("[$,]", "", rents_2021$average_rent_2021))
rents_2021 <- arrange(rents_2021, bedroom_count, average_rent_2021)

#Look at the summary data for 2021 rents
summary(rents_2021)
##  community_name     community_address  bedroom_types      average_rent_2021 
##  Length:1550        Length:1550        Length:1550        Min.   :  155.00  
##  Class :character   Class :character   Class :character   1st Qu.: 1127.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 1422.00  
##                                                           Mean   : 1600.68  
##                                                           3rd Qu.: 1778.25  
##                                                           Max.   :13798.00  
##  geo_location       bedroom_count    
##  Length:1550        Min.   :0.00000  
##  Class :character   1st Qu.:1.00000  
##  Mode  :character   Median :2.00000  
##                     Mean   :1.56516  
##                     3rd Qu.:2.00000  
##                     Max.   :4.00000
rents_2021 %>% group_by(bedroom_count) %>% summarise(min(average_rent_2021), max(average_rent_2021))
## # A tibble: 5 x 3
##   bedroom_count `min(average_rent_2021)` `max(average_rent_2021)`
##           <int>                    <int>                    <int>
## 1             0                      500                     8211
## 2             1                      166                    11442
## 3             2                      155                    13798
## 4             3                      600                     6391
## 5             4                     1138                     6325

I create a boxplot to look at the spread of rents by bedroom counts.

rp <- ggplot(rents_2021,aes(x=bedroom_count, y=average_rent_2021, fill = factor(bedroom_count))) +
  geom_boxplot() +
  guides(fill = "none") +  
  scale_x_continuous(n.breaks = 6)
rp

There are large numbers of high outliers. The highest rents are for apartments at a senior living facility in Kensington that provides meal, assisted living and medical services. The lowest rents are less than $200 a month for 1 or 2 bedroom apartments in Gaithersburg and Takoma Park. These may be subsidized housing or data entry errors.

Affordable rents

According to the Montgomery County Department of Housing and Community Affairs annual report Montgomery County currently has more than 20,000 households who are severely housing cost burdened; they earn less than $31,000/ year and spend more than half their income on rent. And more than 80 percent of renters earning up to $70,000/year are housing cost burdened. https://www.montgomerycountymd.gov/DHCA/Resources/Files/director/publications/dhca_annual_report-fy20-21.pdf

HUD defines housing cost burden as over 30% of income spent on housing. https://www.huduser.gov/portal/pdredge/pdr-edge-featd-article-081417.html

#Filter for rents that are <= to 30% of a 70k income

affordable <- (70000 * .3) / 12
affordable_rents <- rents_2021 %>% 
  dplyr::filter(average_rent_2021 <= affordable)

#Look at the summary data for affordable rents
summary(affordable_rents)
##  community_name     community_address  bedroom_types      average_rent_2021
##  Length:1150        Length:1150        Length:1150        Min.   : 155.00  
##  Class :character   Class :character   Class :character   1st Qu.:1047.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :1273.00  
##                                                           Mean   :1256.63  
##                                                           3rd Qu.:1494.00  
##                                                           Max.   :1750.00  
##  geo_location       bedroom_count    
##  Length:1150        Min.   :0.00000  
##  Class :character   1st Qu.:1.00000  
##  Mode  :character   Median :1.00000  
##                     Mean   :1.37043  
##                     3rd Qu.:2.00000  
##                     Max.   :4.00000
affordable_rents %>% group_by(bedroom_count) %>% summarise(min(average_rent_2021), max(average_rent_2021))
## # A tibble: 5 x 3
##   bedroom_count `min(average_rent_2021)` `max(average_rent_2021)`
##           <int>                    <int>                    <int>
## 1             0                      500                     1743
## 2             1                      166                     1749
## 3             2                      155                     1746
## 4             3                      600                     1750
## 5             4                     1138                     1719
affordable_pct <- round(nrow(affordable_rents) / nrow(rents_2021), 2) * 100

The affordable threshold for for an income of $70,000/year is $1,750.00 a month. 74 percent of the rents in the dataset are affordable.

Facility geo-location data

geo_loc <- affordable_rents %>% 
   dplyr::filter(is.na(geo_location)) %>%                             # find the records w/o geoloc data
   dplyr::distinct(community_name, community_address, geo_location)

dplyr::filter(geo_loc, is.na(community_address))                      # find the records w/o address
## # A tibble: 2 x 3
##   community_name           community_address geo_location
##   <chr>                    <chr>             <chr>       
## 1 Boulevard Of Chevy Chase <NA>              <NA>        
## 2 Charter House            <NA>              <NA>

There are 424 facilities missing geo-location information, the majority can be geo-coded using the address field. Two facilities have missing addresses. The address for Charter House is 1316 Fenwick Lane Silver Spring, MD 20910. An obvious address doesn’t show up in a search for Boulevard Of Chevy Chase.

# assign the address for Charter House

affordable_rents[affordable_rents$community_name == 'Charter House', "community_address"] <- 
  '1316 Fenwick Lane Silver Spring, MD 20910'

affordable_rents <-  affordable_rents %>% 
                   dplyr::filter(!is.na(community_address))           # filter the remaining blank address

There are multiple rows in the file, one for each apartment type in each building. I only need one record for each facility, so I’ll filter out the duplicates.

dupes <- duplicated(affordable_rents$community_address)                     # find duplicate rows
affordable_rents <- affordable_rents %>% 
  dplyr::filter(!dupes) %>%                                              # filter the duplicates
  rename(name = community_name, address = community_address)      # rename the columns to match mpdu file 

Moderately Priced Dwelling Units data

The Moderately Prices Dwelling Units (MPDU) data is available from https://montgomeryplanning.org/tools/gis-and-mapping/data-downloads/. There are 71 apartment buildings listed in the file. The file does not contain the address of the building or geo-location data.

mpdu <- read_csv("./sourcedData/RentalsGrid.csv") %>% tibble()
names(mpdu)
##  [1] "Name"                    "City"                   
##  [3] "Unit Type"               "Efficiency"             
##  [5] "One Bedroom"             "One Bedroom and Den"    
##  [7] "2 Bedrooms, 1 Bathroom"  "2 Bedrooms, 2 Bathrooms"
##  [9] "3 Bedrooms"              "Phone"                  
## [11] "Email"                   "Web"

I manually updated the file with the addresses by copying them from the websites listed for each record and saved a new file in the working directory.

mpdu <- read_csv("./working/MPDU_address.csv") %>% tibble() %>%  # read the updated rental grid file
  dplyr::select(name, address)

glimpse(mpdu)
## Rows: 71
## Columns: 2
## $ name    <chr> "7001 Arlington at Bethesda", "7770 Norfolk Avenue", "Bainbrid~
## $ address <chr> "7001 ARLINGTON RD BETHESDA, MD 20814", "7770 Norfolk Ave Beth~

Add description headers for dwelling data and bind the two data sets together.

mpdu$description <- "Moderately Priced Dwelling Unit (MPDU)"

affordable_rents$description <- "Affordable Rent"

dwellings <- bind_rows(mpdu, affordable_rents)

Make sure there are no duplicated addresses between the two files.

dwellings$address <- str_to_upper(dwellings$address)     # change all the addresses to upper case

dupes <- duplicated(dwellings)                           # find duplicate rows

dwellings <- dwellings %>% 
  filter(!dupes)               # filter duplicate rows

Geo-code the addresses.

pacman::p_load(ggmap)
#dwellings <- mutate_geocode(dwellings,address)

# Write and read to avoid running geocode for repeat executions

# write_csv(dwellings, './working/dwellings_working.csv')

dwellings <- read_csv('./working/dwellings_working.csv',
                      locale=locale(encoding = "windows-1252")) %>% tibble()

Calculate the distance from dwelling locations to transportation options

Read the transportation files.

ptrans <- read_csv("./working/Public_Transport.csv")  # combined public transportation data from pubtrans.rmd
bikeshare <- read_csv("./markers/bike.csv")           # bike sharing marker layer file from bikeshare.rmd
ev_charging <- read_csv("./markers/ev_charging.csv") %>%  # ev charging marker layer file from ev_charging.rmd
  tibble() %>% 
  dplyr::filter(access_code == "public")                     # only use the public charging stations

spDists from the sp package computes the distance between two geo-location point and returns the value in kilometers, the kilometer to mile conversion factor = 0.6213712 is used to compute the distance in miles.

km2mi <- 0.6213712                               # set the kilometer to mile conversion factor

mmat <- cbind(dwellings$lon, dwellings$lat)            # create matrix of the dwelling lon/lat
pmat <- cbind(ptrans$lon, ptrans$lat)            # create matrix public trans lat & lon

d1 <- spDists(mmat, pmat, longlat = TRUE)          # compute the distance between dwelling and public trans
                                                  # output is a large matrix of all distances
dwellings$distance <- apply(d1, 1, min) * km2mi               # apply kilometer to mile conversion to the minimum distance

# take a look at the percentage > 1/4 mile from public transportation

d2 <- dwellings %>% dplyr::filter(distance > 0.25)         

pct_far <- round(count(d2) / count(dwellings), 2) %>% as.numeric() 
pct_far <- paste0(pct_far * 100, '%')

# compute distances for bikeshare and ev_charging

bmat <- cbind(bikeshare$lon, bikeshare$lat)                 # matrix of bikeshare lon/lat
d1 <- spDists(mmat, bmat, longlat = TRUE)                   # compute the distance
dwellings$distance_b <- apply(d1, 1, min) * km2mi              # convert the minimum distance to miles

emat <- cbind(ev_charging$lon, ev_charging$lat)             # matrix of charging station lon/lat
d1 <- spDists(mmat, emat, longlat = TRUE)                   # compute the distance
dwellings$distance_ev <- apply(d1, 1, min) * km2mi             # convert minimum distance to miles

42% of affordable dwellings are more than 1/4 mile (as the crow flies) from public transportation

Calculate the walking distance from dwellings to public transportation.

d2 <- spDists(cbind(dwellings$lon, dwellings$lat), cbind(ptrans$lon, ptrans$lat), longlat = TRUE)

d2_min <- apply(d2, 1, min)             # find the location with min geo-distance     

ind_m <- rep(0, nrow(d2))
for (i in 1:nrow(d2)) {
  ind_m[i] <- which(d2[i,] == d2_min[i], arr.ind = TRUE)  # use which() function to find index for min distance
} 

orig <- paste0(dwellings$lat, "+", dwellings$lon)
dest <- paste0(ptrans$lat[ind_m], "+", ptrans$lon[ind_m])

# pacman::p_load(gmapsdistance)
# 
# results <- gmapsdistance(origin = orig,
#                         destination = dest,
#                         combinations = "pairwise",
#                         mode = "walking")
# 
# 
# walking <- tibble(distance_w = results[["Distance"]][["Distance"]])
# 
# # Write and read to avoid running gmapsdistance for repeat executions
# write_csv(tibble(walking), './working/walking_distance.csv')

walking <- read_csv('./working/walking_distance.csv') %>% tibble()
 # convert meters to miles

dwellings$distance_w <- walking$distance_w / 1000

dwellings$distance_w <- dwellings$distance_w * km2mi

Create the pop-up text for the dwellings markers.

dwellings$address <- str_to_title(dwellings$address)
dwellings$description <- paste0("<b>", dwellings$name, "</b><br/>",
         dwellings$address,  "<br/>",
         dwellings$description,"<br/>",
         "Distance to Public Transportation: ", round(dwellings$distance,2)," mi.<br/>",
         "Walking Distance to Public Transportation: ", round(dwellings$distance_w,2)," mi.<br/>",
         "Distance to Electric Vehicle Charging: ", round(dwellings$distance_ev,2)," mi.<br/>",
         "Distance to Bikeshare: ",  round(dwellings$distance_b,2), " mi.")

# write_csv(dwellings, "./markers/dwellings.csv")

Income Heatmap - ACS_Income.Rmd

Create the Income polygon layer

The income map layer is a heat map polygon layer using the Census Block Group shape data and the American Community Survey Income data.

ACS Income data file

The ACS Income file contains counts of survey results for each census block group. There are sixteen income ranges going from up to $10,000 per year to over $200,000 per year. Each range has a count and a margin of error (ME) field. The ME field isn’t needed to create the map layer, the income counts, total count and id columns are selected.

ACS Median Income data file

The median income file contains the calculated median income and margin-of-error for each Census Block Group.

glimpse(med_income)
## Rows: 614
## Columns: 4
## $ median_income <chr> "156250", "222619", "135313", "113910", "218173", "17298~
## $ margin_of_err <chr> "13398", "69001", "30011", "12407", "79761", "22856", "8~
## $ geo_id        <chr> "1500000US240317001011", "1500000US240317001012", "15000~
## $ block_group   <chr> "Block Group 1, Census Tract 7001.01, Montgomery County,~

Create tidy income data table

The income data is pivoted longer into two columns and the income ranges are converted to numeric values. Then the median income for each block group is joined to the income data.

p1 <- income %>% 
    dplyr::select (cols = !starts_with("ME"))   # remove the Margin of Error columns

# rename the income range columns

names(p1) <- c('total',
               "$10,000", "$14,999", "$19,999", "$24,999", "$29,999", "$34,999", "$39,999", "$44,999", 
               "$49,999", "$59,999", "$74,999", "$99,999", "$124,999", "$149,999", "$199,999", "$200,000",
               "geo_id", "block_group" )

# pivot the income range counts into a longer dataset

p1 <- p1 %>%  pivot_longer(starts_with("$"), names_to = "income", values_to = "cnt")

# function to convert the currency character amounts into numeric

convertCurrency <- function(currency) {
  currency1 <- sub('$','',as.character(currency),fixed=TRUE)
  currency2 <- as.numeric(gsub('\\,','',as.character(currency1))) 
  currency2
}


p1$income <- convertCurrency(p1$income)           # convert the income range amounts to numeric

med_income <- dplyr::select(med_income, -margin_of_err)  # remove the ME column from median income 

p1 <- left_join(p1, med_income)                   # join median income data to income data

Heatmap data

The percentage of responses that are below 50% of Montgomery County’s median income is computed for each block group.

mc_med_income <- 110389                         # the median income for Montgomery County

p2 <- p1 %>% filter(income <= mc_med_income)    # get the records below median income
p3 <- p2 %>%                                    
  group_by(geo_id) %>%
  summarise(sum(cnt), 
            pct = sum((cnt/total)),             # compute the percentage below median income 
            median_income, 
            total, 
            block_group) %>%                    # per block group
  distinct()

p2 <- p1 %>% filter(income <= mc_med_income/2)  # get record below 50% median income

acs_income <- p2 %>%                            # acs_income will be the final dataset
  group_by(geo_id) %>%
  summarise(cnt_less= sum(cnt), 
            LMI50_pct = sum((cnt/total)),       # compute percentage below 50% median income
            median_income, 
            total, 
            block_group) %>%                    # per block group
  distinct()

acs_income$pct <- p3$pct                        # add pct below median income to acs_income
acs_income$bg2010 <- str_sub(acs_income$geo_id, 10)  # substring out the block group name

acs_income <- acs_income %>% 
              dplyr::select(bg2010, cnt_less, pct, LMI50_pct, median_income, total, block_group)

# write_csv(acs_income, './income.csv')

Census block group shape file

The Census Block Group polygon shape data is read from the shape file and joined to ACS income fields.

pacman::p_load(raster, sp)
census <- shapefile("./sourcedData/CensusBlockGroups2010/geo_export_eb847666-3b06-45be-88d3-a9f779b2211e.shp",
                    GDAL1_integer64_policy = TRUE)

p4 <- acs_income
p4$bg2010 <- as.character(p4$bg2010)    # convert block group ids to character

LMI50 <- p4 %>% dplyr::select(bg2010, LMI50_pct, i_total = total, cnt_less)  # select the < 50% median income data
census@data <- left_join(census@data, LMI50)                                 # join to the shape file data

# shapefile(census, filename = 'income.shp', overwrite = TRUE)     # write the census income shape file
detach("package:raster", unload = TRUE)

Heatmap

Test the income heatmap layer using leaflet.

pacman::p_load(leaflet)

#colorNumeric
nc_pal <- colorNumeric(palette = "Oranges",
                       domain = census@data$LMI50_pct)
m_income <- census %>%
  leaflet() %>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, fillOpacity = 0.5,
              color = ~nc_pal(LMI50_pct),
              label = ~paste0("Census Block ID: ", bg2010, "\nPct < 50% of median income: ",
                              round(LMI50_pct * 100, 2)),
              highlight = highlightOptions(weight = 3,color = "red",
                                           bringToFront = TRUE))

m_income

Map - MoCo_TransportationEquityMap.Rmd

Read the layer files

The shape files and data files for the map layers are added to the environment. The affordable dwelling data is split into two data frames:

  • dwellings <= 1/4 mile from public transportation
  • dwelling > 1/4 mile from public transportation

Create the palette for the income heatmap and custom icons for the marker layers.

# palette for income block group heat map
income_pal <- colorNumeric(palette = "Oranges", domain = acs_income@data$LMI50_pct)

# Icons for marker layers
near_icon <- makeIcon("./markers/home-purple.png", 18, 18)
far_icon <- makeIcon("./markers/home-red.png", 18, 18)
bike_icon <- makeIcon("./markers/bicycle-brown.png", 19, 19)
#car_icon <-  makeIcon("./markers/car.png", 18, 18)
bolt_icon <- makeIcon("./markers/bolt.png", 19, 19)
metro_icon <- makeIcon("./markers/metro.png", 16, 16)
bus_icon <- makeIcon("./markers/rideon_2.png", 10, 10)

image_list <- c("./markers/home-purple.png",
                "./markers/home-red.png", 
                "./markers/bolt.png", 
                "./markers/bicycle-brown.png",
                "./markers/rideon_2.png",
                "./markers/metro.png",
                "./markers/blue-circle.png")
image_labels <- c("<= 1/4 mi. / pub. trans", 
                  "> 1/4 mi. / pub. trans", 
                  "EV Charging", 
                  "BikeShare", 
                  "Ride-On Bus stop",
                  "Metro Station",
                  "Metro Bus stop")

Build the interactive map using leaflet:

  1. Two base layers using provider tiles
  • Gray tone map from CartoDB
  • Color map from OpenStreetMap
  1. polygon layers
  • Income data heatmap using Census Block Group shapes
  • Montgomery County Planning Area shapes
  • Zipcode shapes
  1. Marker layers
  • near affordable dwellings
  • far affordable dwellings
  • electric vehicle charging stations
  • Montgomery County Ride-On bus stops
  • WMATA Metro stations
  • WMATA Metro Bus Stops
  • Bike Sharing stations
  1. Legends
  • Income heatmap
  • Marker icons
  1. Search and map reset buttons
evc_map <- leaflet() %>%
  addProviderTiles("CartoDB", group = "Carto - gray") %>%
  addProviderTiles("OpenStreetMap.Mapnik", group = "OpenStreetMap - color") %>%
  setView(-77.1607098, 39.150, zoom = 11) %>%
  addPolygons(data = acs_income,
              weight = 1, fillOpacity = 0.5,
              color = ~income_pal(LMI50_pct),
              label = ~paste0("Census Block: ", bgLbls),
              popup = ~paste0("<b>American Community Survey (ACS)</b><br/> ",
                              "<b>Census.gov, Low-to-Moderate Income</b><br/> ",
                              "Pct LMI: ", round(LMI50_pct * 100, 2),
                              "<br/>LMI Count: ", cnt_less,
                              "<br/>Survey total: ", i_total),
              highlight = highlightOptions(weight = 3,color = "red",
                                           bringToFront = TRUE),
              group = "Census Block (LMI)") %>%
  addPolygons(data = mc_plan,
              weight = 1, fillOpacity = 0,
              dashArray = "dash", #look up dash pattern 
              color = "grey",
              label = ~paste0("Planning Area: ", NAME),
              highlight = highlightOptions(weight = 3,color = "black",
                                           bringToFront = FALSE),
              group = "Planning Area") %>%
  addPolygons(data = zipcode,
              weight = 1, fillOpacity = 0,
              color = "blue",
              label = ~paste0("Zipcode: ", zipcode),
              highlight = highlightOptions(weight = 3,color = "red",
                                           bringToFront = FALSE),
              group = "Zipcode") %>%
  addMarkers(data = ev_charging,
             icon = bolt_icon,
                   label = ~paste0("EV Charging: ", address),
                   group = "EV Charging") %>%
  addMarkers(data = near,
                   icon = near_icon,
                   label = ~htmlEscape(address),
                   popup = ~paste0(description),
                   group = "Affordable Dwelling",
#             clusterOptions = markerClusterOptions()
             ) %>%
  addMarkers(data = far,
                   icon = far_icon,
                   label = ~htmlEscape(address),
                   popup = ~paste0(description),
                   group = "Affordable Dwelling",
#             clusterOptions = markerClusterOptions()
             ) %>%
  addMarkers(data = metro, 
             icon = metro_icon,
                   label = ~htmlEscape(paste0(name, " Metro Station",  ', ', address)),
                   group = "Metro") %>%
  addMarkers(data = bus,
             icon = bus_icon,
                   label = ~paste0(" Bus Stop: ", name),
                   group = "Ride-On Bus") %>%
  addCircleMarkers(data = metro_bus,
                   radius = 2,
                   fill = T,
                   label = ~paste0(" Bus Stop: ", name),
                   group = "Metro Bus") %>%
  addMarkers(data = bikeshare,
                   icon = bike_icon,
                   label = ~paste0(category, ": ", description),
                   group = "BikeShare") %>%
  addLegend(data = acs_income, pal = income_pal, values = ~LMI50_pct, 
            title = "% LMI",
            labFormat = labelFormat(prefix = "(", suffix = ")%", between = ", ",
                                    transform = function(x) 100 * x), 
            group = "Census Block (LMI)", position = "bottomleft" ) %>%
  addLegendImage(images = image_list,
                 labels = image_labels,
                 labelStyle = "font-size: 12px; vertical-align: middle;",
                 width = 16, 
                 height = 16,
                 orientation = 'vertical',
                 position = 'bottomleft',
                 group = "Legend") %>% 
  addLayersControl(baseGroups = c("Carto - gray","OpenStreetMap - color"),
                   overlayGroups = c("Zipcode", "Census Block (LMI)", "Planning Area",
                                     "Affordable Dwelling", "EV Charging", 
                                     "BikeShare", "Metro", "Ride-On Bus", "Metro Bus", "Legend"),
                   position = "topleft") %>% 
  addSearchOSM() %>% 
  addResetMapButton()


# htmlwidgets::saveWidget(evc_map, "./MoCo_TransportationEquityMap.html", title = "Transportation Equity in Montgomery County, MD")

# hideGroup(evc_map, c("Ride-On Bus", "Metro Bus", "Planning Area"))


# markdown::rpubsUpload("MoCo_TransportationEquityMap.html", "./MoCo_TransportationEquityMap.html", "https://api.rpubs.com/api/v1/document/891486/7c245b76630748f4a260643dc9094fe6")

Because I built the map in several iterations I researched how to upload the html file to rPubs without having it create a new link id. Using the rpubsUpload from the markdown package allows this once you have captured the ID from an initial upload.

Research

My research was prompted from my first conversations with my sister-in-law, I wanted to have references for each of the ideas we discussed. I found more information and references as I was researching data sources.

References

  1. The Leadership Conference Education Fund, “Getting to Work: Transportation Policy and Access to Job Opportunities”, (2011). https://civilrightsdocs.info/pdf/docs/transportation/getting-to-work-july20.pdf
  2. The White House Briefing Room, FACT SHEET: President Biden Announces Steps to Drive American Leadership Forward on Clean Cars and Trucks, (August 5, 2021). https://www.whitehouse.gov/briefing-room/statements-releases/2021/08/05/fact-sheet-president-biden-announces-steps-to-drive-american-leadership-forward-on-clean-cars-and-trucks/
  3. CNBC, General Motors plans to exclusively offer electric vehicles by 2035https, (January 28, 2021).//www.cnbc.com/2021/01/28/general-motors-plans-to-exclusively-offer-electric-vehicles-by-2035.html
  4. Department of Numbers, Montgomery County (2019). https://www.deptofnumbers.com/income/maryland/montgomery-county/
  5. DataUSA: Montgomery County, MD (2019). https://datausa.io/profile/geo/montgomery-county-md/
  6. Montgomery County Housing Needs Assessment, (2020). https://montgomeryplanning.org/wp-content/uploads/2020/07/MoCo-HNA-July-2020.pdf;
  7. Montgomery County Affordable Housing Preservation Study (2020). https://montgomeryplanning.org/wp-content/uploads/2020/11/200914-Montgomery-County-Preservation-Study.pdf
  8. Department of Energy, President Biden, DOE and DOT Announce $5 Billion over Five Years for National EV Charging Network, (February 10, 2022). https://www.energy.gov/articles/president-biden-doe-and-dot-announce-5-billion-over-five-years-national-ev-charging

Presentation - MoCo_TransportationEquity.Rmd

I tried several options for creating my presentation in R Markdown. I wanted reusable code since I knew I’d be going through several iterations in order to create the presentation. I finally settled on a Slidy presentation. The presentation themes for Slidy are simple, I didn’t have the time to explore using more interesting themes. In order to document the coding for the presentation, I’ve pulled the code into the report .Rmd file.

After receiving feedback on the presentation from the “dry-run”, I decided to edit and re-order the slides so that I can use them as I present the map layers. I added a few plots to visually present statistics. I pulled the link to the interactive map to the front of the presentation and added acknowledgments on the same slide.

Income

I begin the presentation by looking at income information for LMI residents and display the income heatmap layer of the map.

# pacman::p_unload(leaflet, raster, htmltools, leaflet.extras, leaflegend)
# detach("package:spatialEco", unload = TRUE)
# detach("raster", unload = TRUE, force = TRUE)
pacman::p_load(tidyverse, knitr, kableExtra)

# American Communities Survey Income data
income <- read_csv("./sourcedData/ACS_income_2019.csv") %>% tibble()


p1 <- income %>% 
  dplyr::select (cols = !starts_with("ME"))    # deselect the margin of error columns

# rename the columns

names(p1) <- c('total',
               "lt_10K", "lt_15K", "lt_20K", "lt_25K", "lt_30K", "lt_35K", "lt_40K", "lt_45K", 
               "lt_50K", "lt_60K", "lt_75K", "lt_100K", "lt_125K", "lt_150K", "lt_200K", "gt_200K",
               "geo_id", "block_group" )

I spent some time adjusting the format of the tables and plots in order for them to render well in the presentation.

Low-Moderate Income households
# Less than AMI % # Moderate Income % # Low Income % # Less than $30K %
170,414 46 46,858 13 78,990 21 42,708 12

Affordable Housing layer

The next slide looks at affordable housing information and the affordable housing marker layer of the map on top of the income heatmap. Showing how affordable housing aligns with the census blocks.

I used the boxplot from my EDA code, but turned it horizontally. I wanted the table of rents statistics and the plot side-by-side on the slide. So I did some research and found the gridExtra package to layout the slide the way I wanted. I couldn’t use kable objects grid.arrange, so I created the table with tableGrob.

pacman::p_load(gridExtra)

rents <- read_csv("./sourcedData/2021_Rental_Facility_Occupancy_Survey_Results.csv")
names(rents) <- str_to_lower(str_replace_all(names(rents), ' ','_'))

#Select the name, address, bedroom type, 2021 average rent

rents_2021 <- rents %>% dplyr::select(starts_with(c('comm', 'bed')), average_rent_2021)

rents_2021$bedroom_count <- str_sub(rents_2021$bedroom_types, 1,1) %>% str_replace('e', '0') %>% as.integer()

#Convert the average rent field to an integer and sort the data by bedroom count and average rent

rents_2021$average_rent_2021 <-  as.integer(gsub("[$,]", "", rents_2021$average_rent_2021))
rents_2021 <- arrange(rents_2021, bedroom_count, average_rent_2021)

r1 <- rents_2021 %>% group_by(bedroom_count) %>%                # calc max/min rents by bedroom count
                     summarise(min(average_rent_2021), 
                               max(average_rent_2021))

r1$avg <- rents_2021 %>%                                        # calc the average rent by bedroom count
        group_by(bedroom_count) %>% 
        summarise( avg = mean(average_rent_2021)) %>% 
        dplyr::select(avg) %>% 
        round(2)

# modify the table theme

rtt <- gridExtra::ttheme_minimal(core = list(fg_params = list(fontsize=14)),
                    colhead = list(fg_params = list(fontsize=14, fontface="bold")))

# create the table

rt <- gridExtra::tableGrob(r1, cols = c("# Bedrooms", "Min. Rent", "Max. Rent", "Avg. Rent"),
             theme=rtt, rows = NULL)  # transform into a tableGrob

# box plot of rents by bedroom count

rp <- ggplot(rents_2021,aes(y=bedroom_count, x=average_rent_2021, fill = factor(bedroom_count))) +
  geom_boxplot(show.legend = F) +
  scale_x_continuous(n.breaks = 6) +
  labs(x = "Average Rent 2021", y = "# Bedrooms") +
    theme_minimal() +
    theme(legend.text=element_text(size=14), legend.title = element_blank(), legend.position = c(.85, .85),
          axis.text=element_text(size=10), axis.title=element_text(size=14))

gridExtra::grid.arrange(rt, rp, nrow=1)  # arrange the table and plot for display

I added a small table showing the percentage of rents in the survey that are below $1,750 and 1,250.

r2 <- rents_2021 %>% filter(average_rent_2021 <= 1750) %>%       # filter moderately affordable rents
  summarise(le1750 = n(), pct1750 = n() / nrow(rents_2021))

r2$pct1750 <- round(r2$pct1750 * 100)                            # calc percentage moderately affordable

r3 <- rents_2021 %>% filter(average_rent_2021 <= 1250) %>%                 # filter for affordable rents
              summarise(le1250 = n(), pct1250 = n() / nrow(rents_2021))

r3$pct1250 <-  round(r3$pct1250 * 100)                                     # cals percentage affordable

cbind(r2, r3) %>%   kable(caption =                                        #format the table
                            "<caption font-size:28px>Affordable Rents</caption>",
             col.names = c("# <= $1750", "%",
                           "# <= $1250", "%"),
        format.args = list(big.mark = ",")) %>% 
  kable_styling(full_width = F, position = "left", font_size = 18) 
Affordable Rents
# <= $1750 % # <= $1250 %
1,150 74 550 35

Transportation layers

The next slide is the public transportation marker where I added a frequency polygon plot to illustrate the distances from affordable dwelling locations to the transportation options. With this slide I demonstrate the Ride-on bus, Metro station, Metro bus and bike sharing marker layers on the map.

mpdu <- read_csv("./markers/dwellings.csv")

# format the statistics for distances to trans options 

s2 <- tibble(trans  = c("Public Transit (Geom)", "Public Transit (Walking)", "Bike Sharing", "EV Charging"))
s2$avg <- c(paste(round(mean(mpdu$distance), 2), "mi"),
            paste(round(mean(mpdu$distance_w), 2), "mi"),
            paste(round(mean(mpdu$distance_b), 2), "mi"),
            paste(round(mean(mpdu$distance_ev), 2), "mi"))
s2$max <- c(paste(round(max(mpdu$distance), 2), "mi"),
            paste(round(max(mpdu$distance_w), 2), "mi"),
            paste(round(max(mpdu$distance_b), 2), "mi"),
            paste(round(max(mpdu$distance_ev), 2), "mi"))
s2$min <- c(paste(round(min(mpdu$distance), 2), "mi"),
            paste(round(min(mpdu$distance_w), 2), "mi"),
            paste(round(min(mpdu$distance_b), 2), "mi"),
            paste(round(min(mpdu$distance_ev), 2), "mi"))

# s2 %>% kable(caption = "Distance from Transportation Options",
#              col.names = c("Transportation",
#                            "Avg",
#                            "Max",
#                            "Min")) %>%
#              kable_styling(full_width = F)

hj <- matrix(c(0, 1, 1, 1), ncol=ncol(s2), nrow=nrow(s2), byrow=TRUE)      # set up format for the table header
x <- matrix(c(0.05, 1, 1, 1), ncol=ncol(s2), nrow=nrow(s2), byrow=TRUE)    # set up format for the table body 

tt1 <- ttheme_minimal(core = list(fg_params = list(hjust = as.vector(hj),            # modify the table theme
                      x = as.vector(x), fontsize=14)),                               # using the setup matrices
                      colhead = list(fg_params = list(fontsize=14, fontface="bold")))

t1 <- tableGrob(s2, cols = c("Transportation", "Avg", "Max","Min"),       # create the table
               theme=tt1, rows = NULL)  # transform into a tableGrob

# get the distance columns from the dwellings data

distances <- mpdu %>% dplyr::select(Geometric = distance, BikeShare = distance_b,    
                             EV_Charging = distance_ev, Walking = distance_w) %>%
                             pivot_longer(everything(), names_to = "category", values_to = "distance")

# create a frequency polygon line plot of the distance data

p_dist <- ggplot(distances, aes(distance, color = category)) +
  geom_freqpoly(size = 1.5) +
    scale_x_log10(breaks = c(0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0)) +
    labs(x = "Distance (miles)", y = NULL) +
    theme_minimal() +
    theme(legend.text=element_text(size=14), legend.title = element_blank(), legend.position = c(.85, .85),
          axis.text=element_text(size=10), axis.text.x=element_text(angle = 30), axis.title=element_text(size=14))

# arrange the table and plot in a single row

grid.arrange(t1, p_dist, nrow=1)

# subset the distance data by 1/4 mile, 1/2 mile and 1 mile distances.

distances <- mpdu %>% mutate(ptrans.25 = (distance > 0.25),
                             ptrans.50 = (distance > 0.5), ptrans1 = (distance > 1),
                             walk.25 = (distance_w > 0.25),
                             walk.50 = (distance_w > 0.5), walk1 = (distance > 1),
                             bike.25 = (distance_b > 0.25),
                             bike.50 = (distance_b > 0.5), bike1 = (distance_b > 1),
                             evchrg.25 = (distance_ev > 0.25),
                             evchrg.50 = (distance_ev > 0.5), evchrg1 = (distance_ev > 1))

cnt <- count(distances)   # get the number of distance records

ptrans = c(count(distances, wt = ptrans.25),                       # count the ptrans records by distance category
           count(distances, wt = ptrans.50),
           count(distances, wt = ptrans1)) %>% unlist() %>% unname()

ptrans.pct <- ptrans                                               # compute the ptrans percentage by category
for (i in 1:3) {ptrans.pct[i] = round(ptrans[i] / cnt * 100)}

walk = c(count(distances, wt = walk.25),                           # count the walking records by distance category
           count(distances, wt = walk.50),
           count(distances, wt = walk1)) %>% unlist() %>% unname()

walk.pct <- walk                                                   # compute the walking percentage by category
for (i in 1:3) {walk.pct[i] = round(walk[i] / cnt * 100)}

bike = c(count(distances, wt = bike.25),                           # count the bike share records by distance category
         count(distances, wt = bike.50),
         count(distances, wt = bike1))  %>% unlist() %>% unname()

bike.pct <- bike                                                   # compute the bike share percentage by category
for (i in 1:3) {bike.pct[i] = round(bike[i] / cnt * 100)}

evchrg = c(count(distances, wt = evchrg.25),                       # count the ev charging records by distance category
           count(distances, wt = evchrg.50),
           count(distances, wt = evchrg1)) %>% unlist() %>% unname()

evchrg.pct <- evchrg                                               # compute the ev charging pct by category
for (i in 1:3) {evchrg.pct[i] = round(evchrg[i] / cnt * 100)}

all = c(count(distances, wt = (ptrans.25 & bike.25 & evchrg.25)),            # combined counts
        count(distances, wt = (ptrans.50 & bike.50 & evchrg.50)),
        count(distances, wt = (ptrans1 & bike1 & evchrg1))) %>% unlist() %>% unname()

all.pct <- all                                                    # combined percentage     
for (i in 1:3) {all.pct[i] = round(all[i] / cnt * 100)}

s1 <- tibble(distance = c("1/4 mi", "1/2 mi", "1 mile"),
                          ptrans,
                          ptrans.pct,
                          walk,
                          walk.pct,
                          bike,
                          bike.pct,
                          evchrg,
                          evchrg.pct,
                          all,
                          all.pct)

# format the table 

s1 %>% kable(caption =
               "<caption font-size:24px>Distance from Transportation Options (Counts and Percentage)</caption>",
             col.names = c("Further Than",
                           "Public Trans", "%",
                           "Walking", "%",
                           "Bike Sharing", "%",
                           "EV Charging", "%",
                           "All",  "%")) %>%
             kable_styling(full_width = F, font_size = 18) %>%
             column_spec(c(1:9), width = "6em")
Distance from Transportation Options (Counts and Percentage)
Further Than Public Trans % Walking % Bike Sharing % EV Charging % All %
1/4 mi 290 41 408 58 390 55 485 69 205 29
1/2 mi 64 9 170 24 188 27 376 53 29 4
1 mile 15 2 15 2 137 19 75 11 13 2

Electric Vehicle charging layer

The final layer to demonstrate is the EV charging markers layer. I create a small table to show the counts of EV charging locations and stations using kable.

ev_charging <- read_csv("./markers/ev_charging.csv") %>% tibble()

ev_charging %>% group_by(access_code) %>%                 # summarise private and public ev charging counts
  summarise(n = n(), total = sum(evse_tot)) %>%           # display in a table
  kable(caption = 
          "Electric Vehicle Charging Locations in Montgomery County",
        col.names = c("Access", "# Locations", "# Stations"), font_size = 18) %>%
        kable_styling(full_width = F, position = "left") %>%
        column_spec(c(1:3), width = "8em")
Electric Vehicle Charging Locations in Montgomery County
Access # Locations # Stations
private 24 104
public 216 509

And I add my observations about access to public transportation to this slide.

Counts slides

The next two slides contain counts of the data for dwellings and transportation options within zipcodes and planning areas for Montgomery County. I use them to discuss the relative frequency of public transportation to affordable housing. First I convert the longitude and latitude data for the marker layers to spatial points and project them to the CRS for the zipcode polygons.

pacman::p_load(rgdal, GISTools)

# zipcode polygons for Montgomery County
zipcode <- raster::shapefile("./shapefiles/zipcode.shp", GDAL1_integer64_policy = TRUE)

# Montgomery County planning area shape files
mc_plan <- raster::shapefile("./shapefiles/planning_areas.shp", GDAL1_integer64_policy = TRUE)

pubtrans <- read_csv("./working/public_transport.csv")    # public transportation locations

bikeshare <- read_csv("./markers/bike.csv")               # Bike share locations

coordinates(mpdu) <- c("lon", "lat")                      # grab the dwellings lon/lat data 
mpdu <- as(mpdu, "SpatialPoints")                         # convert to spatial points
proj4string(mpdu) <- CRS("+proj=longlat +datum=WGS84")    # set a CRS for the points 
proj4string(mpdu) <- proj4string(zipcode)                 # project the CRS to zipcode polygons

coordinates(bikeshare) <- c("lon", "lat")                 # repeat for bikeshare
bikeshare <- as(bikeshare,"SpatialPoints")
proj4string(bikeshare) <- CRS("+proj=longlat +datum=WGS84")
proj4string(bikeshare) <- proj4string(zipcode)

coordinates(ev_charging) <- c("lon", "lat")               # repeat for ev charging
ev_charging <- as(ev_charging,"SpatialPoints")
proj4string(ev_charging) <- CRS("+proj=longlat +datum=WGS84")
proj4string(ev_charging) <- proj4string(zipcode)

coordinates(pubtrans) <- c("lon", "lat")                  # repeat for public transportation
pubtrans <- as(pubtrans,"SpatialPoints")
proj4string(pubtrans) <- CRS("+proj=longlat +datum=WGS84")
proj4string(pubtrans) <- proj4string(zipcode)

Counts by Zipcode

Then count the number of points in the zipcode polygons and create the table to display them.

mpdu_cnt <- GISTools::poly.counts(mpdu, zipcode)        # count dwellings in zipcode
bike_cnt <- GISTools::poly.counts(bikeshare, zipcode)   # count bike share in zipcode
evc_cnt <- GISTools::poly.counts(ev_charging, zipcode)  # count ev charging in zipcode
pubtrans_cnt <- GISTools::poly.counts(pubtrans, zipcode) # count public trans in zipcode

zip_counts <- tibble(zip = zipcode@data$zipcode,      # put zipcode and city name in a tibble  
                     city = zipcode@data$mail_city)   
zip_counts$dwelling <- mpdu_cnt                       # add the counts
zip_counts$ptrans <- pubtrans_cnt
zip_counts$bike <- bike_cnt
zip_counts$evc <- evc_cnt


zip_counts <-  zip_counts %>% dplyr::filter(dwelling + bike + evc + ptrans > 0)   # filter out zipcode with 0 counts
zip_counts %>%  dplyr::arrange(desc(dwelling)) %>%                                # sort desc by dwelling count
               head(6) %>%                                                        # select the first few rows  
               kable(caption = "Top 6 counts of Affordable Dwellings by Zipcode", # format the table for display
                 col.names = c("Zipcode", "City", "Dwellings",
                                   "Public Trans", "Bike Sharing",
                                   "EV Charging")) %>%
               kable_styling(full_width = F)
Top 6 counts of Affordable Dwellings by Zipcode
Zipcode City Dwellings Public Trans Bike Sharing EV Charging
20912 TAKOMA PARK 251 34 7 7
20910 SILVER SPRING 96 65 14 17
20814 BETHESDA 42 26 8 27
20877 GAITHERSBURG 39 9 0 3
20901 SILVER SPRING 34 53 4 0
20852 ROCKVILLE 27 40 10 35
zip_counts %>%  dplyr::arrange(desc(ptrans)) %>%                                  # re-sort the table the pub. trans counts
               head(6) %>%
               kable(caption = "Top 6 counts of Public Transit by Zipcode",
                 col.names = c("Zipcode", "City", "Dwellings",
                                   "Public Trans", "Bike Sharing",
                                   "EV Charging")) %>%
               kable_styling(full_width = F)
Top 6 counts of Public Transit by Zipcode
Zipcode City Dwellings Public Trans Bike Sharing EV Charging
20904 SILVER SPRING 15 141 7 15
20815 CHEVY CHASE 22 74 6 10
20906 SILVER SPRING 16 71 0 2
20902 SILVER SPRING 23 66 8 8
20910 SILVER SPRING 96 65 14 17
20850 ROCKVILLE 25 57 19 43

Counts by Planning Area

The planning area polygon shapes use the same CRS as the zipcodes, so it isn’t necessary to project the points data again. I count the points by planning area and format a table to put on the slide.

mpdu_cnt <- GISTools::poly.counts(mpdu, mc_plan)             # count dwelling points in planning area
bike_cnt <- GISTools::poly.counts(bikeshare, mc_plan)        # count bike share points in planning area
evc_cnt <- GISTools::poly.counts(ev_charging, mc_plan)       # count ev charging in planning area
pubtrans_cnt <- GISTools::poly.counts(pubtrans, mc_plan)     # count public trans in planning area

plan_counts <- tibble(plan = mc_plan@data$PLANNING,          # create a tibble with planning id and name
                      name = mc_plan@data$NAME)
plan_counts$dwelling <- mpdu_cnt                             # add counts to the tibble
plan_counts$ptrans <- pubtrans_cnt
plan_counts$bike <- bike_cnt
plan_counts$evc <- evc_cnt


plan_counts %>% dplyr::arrange(desc(dwelling)) %>%      # sort descending by dwelling count       
                head(10) %>%                            # select a few rows and format for display
                kable(caption =                      
                        "Top 10 counts of Affordable Dwellings by Planning Area",
                  col.names = c("Plan", "Name", "Dwellings",
                                    "Public Trans", "Bike Sharing",
                                    "EV Charging")) %>%
                kable_styling(full_width = F)
Top 10 counts of Affordable Dwellings by Planning Area
Plan Name Dwellings Public Trans Bike Sharing EV Charging
37 Takoma Park 271 44 7 7
36 Silver Spring 100 63 14 17
35 Bethesda/Chevy Chase 67 146 18 47
21 Gaithersburg City 40 16 0 10
31 Kensington/Wheaton 36 124 8 9
32 Kemp Mill/4 Corners 32 37 1 0
30 North Bethesda 29 49 7 30
20 Gaithersburg Vicinity 27 22 12 29
19 Germantown 20 68 0 15
26 Rockville 20 74 14 37

Concluding the map demonstration the next slide outlines my conclusions and recommendations for the Transportation Equity map and project. I include my data sources and references on two additional slides. Then I outline what my next steps would be if I were to continue with the project on the final slide.

Report Conclusion

This project was challenging in many ways. Taking a deep dive into working with spatial data I needed to do a lot of research and experimentation to get the results I wanted. Since I knew what my goal was early on I was able to focus my efforts towards it.

Creating this report itself was a challenge, particularly because of the overloading of functions, i.e. select(), between dplyr and raster. I wanted to minimize the amount of duplicate code between the report and the separate .Rmd files I had created. Getting the child documents to knit from the parent report document meant tweaking the child .Rmd files.

I did a bit statistical analysis for the project, but some of the statistics that I state in the presentation came from my research.

I’m very pleased and proud of the result. I enjoyed this challenge.

Datasets

  1. Affordable dwellings, Data Montgomery - 2021_Rental_Facility_Occupancy_Survey_Results

  2. Moderately Priced Dwelling Units (MPDU) Rentals List, apps.montgomerycountymd.gov/dhca-mpdu/Rental/List

  3. Bike Sharing stations, Data Montgomery - Bikeshare dataset

  4. Electric Vehicle Charging stations, Alternative Fuels Data Center - Alternative Fuel Stations

  5. Ride On Bus Stops, Data Montgomery - Montgomery_County_Ride_On_Bus_Stops

  6. Metro Rail Stations, Data Montgomery Metro-Rail-Stations

  7. Metro Bus Stops, Maryland_Transit_-_WMATA_Metro_Bus_Stops

  8. Low-Moderate Income American Community Surveys Income data