Introduction

New York City offers job opportunities across a wide range of industries and is one of the most popular tourist destinations in the world, so it attracts millions of residents and visitors each year. However, New York City is also known as the most expensive city in the United States. According to a biannual report by the Economist Intelligence Unit in 2021, the Big Apple ranks as the 6th most expensive city globally. What factors might affect property prices in New York City? With 56% of New York City residents relying on public transit (New York Public Transit Association), public transportation is the first factor to consider.

It is no secret that public transportation systems serve a primary function in modern cities and support their business, economic, and social development. Many studies find that public transport plays a significant role in property values. Having a public transit station nearby should make residential properties more valuable than those further afield. In other words, the closer a residential property is to a rail station, the more expensive it will be, provided that other factors known to influence residential property prices, such as property condition, crime rate, and tax rates, are similar.

This project investigates the relationship between property sale prices in Brooklyn and Queens and access to the subway in these two boroughs. The analysis uses geospatial data and a series of maps showing property prices in Brooklyn and Queens alongside the distribution of subway lines and subway stations in New York City. Two datasets from the New York City Department of Finance are used for this project (https://www.nyc.gov/site/finance/taxes/property-annualized-sales-update.page). These two datasets contain 2021 New York City property sales data for Brooklyn and Queens.

Before Importing

Before importing the datasets into R, I first deleted the columns in Excel that are not relevant to this project, including tax class at present, block, lot, easement, building class at present, apartment number, residential units, total units, land square feet, year built, tax class at the time of sale, building class at the time of sale, and sale date. I also created a new column in both datasets called sale price per gross square feet, calculated as sale price divided by gross square feet. After finding that this new column contained many missing values, I excluded every row in which it was missing.
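The same preparation could also be scripted in R so that it is reproducible. The following is a minimal sketch under stated assumptions, not the procedure actually used for this project: `raw_sales` is a made-up stand-in for the raw spreadsheet, and `LOT` stands in for the full list of dropped columns.

```r
library(dplyr)

# Hypothetical stand-in for the raw Department of Finance sheet (values are made up)
raw_sales <- tibble(
  `ZIP CODE` = c(11201, 11201, 11215, 11215),
  `SALE PRICE` = c(900000, 600000, 1200000, 750000),
  `GROSS SQUARE FEET` = c(1000, 1200, NA, 1500),
  LOT = c(1, 2, 3, 4)
)

clean_sales <- raw_sales %>%
  # Drop a column that is not needed for the analysis
  select(-LOT) %>%
  # Price per gross square foot = sale price / gross square feet
  mutate(`SALE PRICE PER GROSS SQUARE FEET` = `SALE PRICE` / `GROSS SQUARE FEET`) %>%
  # Remove rows where the new column is missing
  filter(!is.na(`SALE PRICE PER GROSS SQUARE FEET`))

print(clean_sales)
```

In this toy example the third row is removed because its gross square feet value is missing.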

Importing and Cleaning Datasets

I imported the two updated datasets into RStudio and excluded any rows with a sale price of 0 or gross square feet of 0.

library(readxl)

# Read the cleaned Brooklyn sheet, skipping the first 6 rows of the file
brooklyn_2021 <- read_excel("~/Desktop/MEA Independent Study/Data/2021_brooklyn_clean.xlsx", 
                            skip = 6)

# Check the column names, then rename columns 6 and 7
colnames(brooklyn_2021)
colnames(brooklyn_2021)[6] <- "RESIDENTIAL UNITS"
colnames(brooklyn_2021)[7] <- "GROSS SQUARE FEET"
View(brooklyn_2021)

# Repeat the same steps for Queens
queens_2021 <- read_excel("~/Desktop/MEA Independent Study/Data/2021_queens_clean.xlsx", 
                          skip = 6)

colnames(queens_2021)
colnames(queens_2021)[6] <- "RESIDENTIAL UNITS"
colnames(queens_2021)[7] <- "GROSS SQUARE FEET"
View(queens_2021)

library(dplyr)

# Keep only rows with a non-zero sale price and non-zero gross square feet
brooklyn_2021 <- brooklyn_2021 %>%
  filter(`SALE PRICE` != 0, `GROSS SQUARE FEET` != 0)
View(brooklyn_2021)

queens_2021 <- queens_2021 %>%
  filter(`SALE PRICE` != 0, `GROSS SQUARE FEET` != 0)
View(queens_2021)

Mean Sales Price based on Zip Code

I created two tables showing the mean sale price per gross square foot by zip code in Brooklyn and Queens. I also identified the highest and lowest zip-code averages in each borough.

Brooklyn

# Average sale price per gross square foot for each Brooklyn zip code
avg_brooklyn_2021 <- brooklyn_2021 %>%
  group_by(`ZIP CODE`) %>%
  summarise(`AVG PRICE PER GROSS SQUARE FEET` = mean(`SALE PRICE PER GROSS SQUARE FEET`))
View(avg_brooklyn_2021)
knitr::kable(avg_brooklyn_2021)
ZIP CODE AVG PRICE PER GROSS SQUARE FEET
11201 862.7936
11203 364.7076
11204 581.5831
11205 752.3495
11206 526.1113
11207 356.1703
11208 344.5969
11209 633.1645
11210 497.0832
11211 693.1178
11212 350.4087
11213 404.6472
11214 536.4122
11215 1066.6118
11216 895.3750
11217 1010.4147
11218 681.1163
11219 519.1548
11220 466.7412
11221 499.2058
11222 1229.4274
11223 679.9615
11224 327.7796
11225 629.5159
11226 553.5148
11228 592.0438
11229 530.0481
11230 692.1706
11231 1069.4003
11232 666.3988
11233 440.8665
11234 453.6013
11235 503.9871
11236 365.2623
11237 598.3377
11238 908.1895
11239 315.4400
11249 2550.8628
max(avg_brooklyn_2021$`AVG PRICE PER GROSS SQUARE FEET`)
## [1] 2550.863
min(avg_brooklyn_2021$`AVG PRICE PER GROSS SQUARE FEET`)
## [1] 315.44

In Brooklyn, the highest zip-code average sale price per gross square foot is 2550.863 (zip code 11249), and the lowest is 315.44 (zip code 11239).
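The max() and min() calls return only the values, not the zip codes they belong to. As a small sketch of one way to retrieve both at once, the example below applies dplyr's slice_max() and slice_min() to `avg_toy`, a hypothetical three-row excerpt copied from the Brooklyn table.

```r
library(dplyr)

# Three rows copied from the Brooklyn table above; the object name is hypothetical
avg_toy <- tibble(
  `ZIP CODE` = c("11201", "11239", "11249"),
  `AVG PRICE PER GROSS SQUARE FEET` = c(862.7936, 315.4400, 2550.8628)
)

# Row with the highest zip-code average
highest <- avg_toy %>% slice_max(`AVG PRICE PER GROSS SQUARE FEET`, n = 1)
# Row with the lowest zip-code average
lowest <- avg_toy %>% slice_min(`AVG PRICE PER GROSS SQUARE FEET`, n = 1)

print(highest)
print(lowest)
```

The same two calls should work unchanged on the full avg_brooklyn_2021 and avg_queens_2021 tables.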

Queens

# Average sale price per gross square foot for each Queens zip code
avg_queens_2021 <- queens_2021 %>%
  group_by(`ZIP CODE`) %>%
  summarise(`AVG PRICE PER GROSS SQUARE FEET` = mean(`SALE PRICE PER GROSS SQUARE FEET`))
View(avg_queens_2021)
knitr::kable(avg_queens_2021)
ZIP CODE AVG PRICE PER GROSS SQUARE FEET
11001 503.3111
11004 565.9519
11040 594.8567
11101 985.5806
11102 553.3970
11103 612.1626
11104 659.4729
11105 614.4458
11106 558.1881
11354 600.2637
11355 577.7160
11356 483.7733
11357 611.4370
11358 586.7649
11360 571.7917
11361 597.9003
11362 584.2241
11363 565.6871
11364 650.1251
11365 638.6149
11366 582.4770
11367 549.3428
11368 497.7642
11369 463.4678
11370 525.2667
11372 537.1479
11373 497.5879
11374 563.0498
11375 660.0694
11377 507.9552
11378 551.6695
11379 550.1387
11385 5398.3107
11411 398.2550
11412 369.9270
11413 406.3184
11414 416.5484
11415 436.2913
11416 406.0620
11417 457.1963
11418 388.2766
11419 456.9580
11420 482.9122
11421 426.5173
11422 407.2822
11423 435.4351
11426 519.6030
11427 506.2591
11428 489.2448
11429 405.9665
11430 346.7061
11432 567.7322
11433 405.4162
11434 415.5773
11435 562.8604
11436 471.5388
11691 358.2270
11692 301.4881
11693 364.6227
11694 487.0282
max(avg_queens_2021$`AVG PRICE PER GROSS SQUARE FEET`)
## [1] 5398.311
min(avg_queens_2021$`AVG PRICE PER GROSS SQUARE FEET`)
## [1] 301.4881

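In Queens, the highest zip-code average sale price per gross square foot is 5398.311 (zip code 11385), and the lowest is 301.4881 (zip code 11692). The 11385 average is more than five times the next-highest Queens value (985.5806 in zip code 11101), so it may be driven by a small number of unusual sales.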
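A zip-code mean can be pulled far upward by a single unusual sale. The sketch below uses made-up numbers, not the project data, to show how reporting the median and the number of sales next to the mean makes this visible. Whether this explains any particular value in the Queens table is an open question that this example does not answer.

```r
library(dplyr)

# Hypothetical sales: zip "A" contains one extreme value, zip "B" does not
toy_sales <- tibble(
  `ZIP CODE` = c("A", "A", "A", "B", "B", "B"),
  `SALE PRICE PER GROSS SQUARE FEET` = c(500, 550, 60000, 480, 520, 500)
)

toy_summary <- toy_sales %>%
  group_by(`ZIP CODE`) %>%
  summarise(
    mean_ppsf = mean(`SALE PRICE PER GROSS SQUARE FEET`),     # sensitive to outliers
    median_ppsf = median(`SALE PRICE PER GROSS SQUARE FEET`), # robust to outliers
    n_sales = n()                                             # sales behind each figure
  )

print(toy_summary)
```

For zip "A" the mean is 20350 while the median is 550, which shows how much a single sale can move the average.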

Maps

library(tigris)
library(tidyverse)
library(sf)
library(leaflet)
library(mapview)

Subway Lines and Subway Stations in New York City

read_sf("Borough Boundaries.geojson") -> boroughs
read_sf("Subway Stations.geojson") -> stations
read_sf("Subway Lines.geojson") -> lines

mapview(boroughs) + mapview(lines, color="red") + mapview(stations, color = "green")

This map provides an overview of New York City subway lines and subway stations.

New York City Boroughs

leaflet() %>% 
  addTiles() %>% 
  addPolygons(data = boroughs$geometry,
              color = 'purple',
              # stroke is a logical flag (draw the outline or not);
              # the outline colour is set by color
              stroke = TRUE,
              popup = boroughs$boro_name
  )

This is a map displaying the five boroughs of New York City. This project focuses on Brooklyn and Queens.

Distribution of Subway Stations in New York City

leaflet() %>% 
  addTiles() %>% 
  addCircleMarkers(data = stations$geometry,
                   radius = 2, 
                   color = 'purple',
                   # stroke is a logical flag; the outline colour is set by color
                   stroke = TRUE,
                   popup = paste0("Name: ",
                                  stations$name,
                                  "<br/>",
                                  "Line: ", stations$line
                   )
  )

Distribution of Subway Lines in New York City

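leaflet() %>% 
  addTiles() %>% 
  # The subway lines are assumed to be line geometries, so they are drawn
  # with addPolylines() rather than as filled polygons
  addPolylines(data = lines$geometry,
               popup = lines$name
  )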

Based on the distribution of subway stations and subway lines in New York City, most of Brooklyn's subway service runs through the west side of the borough, so residents there have easier access to the subway than residents in other parts of Brooklyn. In Queens, subway coverage is concentrated in the northwest, so residents in that part of the borough have better access to the subway.
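Access to the subway can also be expressed as a number, for example the distance from a property to its nearest station. The following is a rough sketch of that idea using sf's st_nearest_feature() and st_distance(). The coordinates and the objects `stations_toy` and `homes_toy` are hypothetical and are not taken from the project files.

```r
library(sf)

# Hypothetical station and property locations (longitude, latitude in WGS84)
stations_toy <- st_as_sf(
  data.frame(name = c("S1", "S2"), lon = c(-73.99, -73.80), lat = c(40.69, 40.70)),
  coords = c("lon", "lat"), crs = 4326
)
homes_toy <- st_as_sf(
  data.frame(id = c("h1", "h2"), lon = c(-73.98, -73.81), lat = c(40.69, 40.71)),
  coords = c("lon", "lat"), crs = 4326
)

# Index of the closest station for each property
nearest_idx <- st_nearest_feature(homes_toy, stations_toy)
homes_toy$nearest_station <- stations_toy$name[nearest_idx]

# Distance in metres from each property to its closest station
homes_toy$dist_m <- as.numeric(
  st_distance(homes_toy, stations_toy[nearest_idx, ], by_element = TRUE)
)

print(st_drop_geometry(homes_toy))
```

With real data, the property locations would first have to be geocoded, since the sales files used in this project identify properties by address and zip code rather than by coordinates.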

Average Property Sales Price Based on Zip Code for Brooklyn and Queens

read_sf("~/Desktop/MEA Independent Study/nyc_zip.geojson") -> nyc
colnames(nyc)

colnames(nyc)[2] <- "ZIP CODE"
as.character(avg_brooklyn_2021$`ZIP CODE`) -> avg_brooklyn_2021$`ZIP CODE`
as.character(avg_queens_2021$`ZIP CODE`) -> avg_queens_2021$`ZIP CODE`
avg_brooklyn_2021 %>% 
  full_join(avg_queens_2021) -> both
## Joining, by = c("ZIP CODE", "AVG PRICE PER GROSS SQUARE FEET")
nyc %>% 
  inner_join(both, by="ZIP CODE") -> merged


labels <- sprintf(
  "<strong>%s</strong><br/>  Average Price / SqFt: $ %g",
  merged$PO_NAME, merged$`AVG PRICE PER GROSS SQUARE FEET`
) %>% lapply(htmltools::HTML)

qpal <- colorQuantile(rev(viridis::viridis(10)), merged$`AVG PRICE PER GROSS SQUARE FEET`, n = 10)

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data=merged,
    label = labels,
    smoothFactor = 0.3, 
    fillOpacity = 0.7,
    weight=1,
    color = "white",
    fillColor = ~qpal(merged$`AVG PRICE PER GROSS SQUARE FEET`)
    
  )
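A choropleth like this one is easier to read with a legend. As a hedged sketch, the example below attaches a legend to a quantile palette with leaflet's addLegend(). The values, the "YlOrRd" palette, and the names `toy_values`, `toy_pal`, and `toy_map` are illustrative assumptions and do not reproduce the map above.

```r
library(leaflet)

# Hypothetical average prices per gross square foot
toy_values <- c(315, 500, 650, 900, 2550)

# Quantile palette with four classes
toy_pal <- colorQuantile("YlOrRd", toy_values, n = 4)

toy_map <- leaflet() %>%
  addTiles() %>%
  addLegend(
    position = "bottomright",
    pal = toy_pal,
    values = toy_values,
    title = "Average Price / SqFt (quantile)",
    opacity = 0.7
  )

toy_map
```

For a quantile palette the legend labels show percentile ranges rather than dollar amounts.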

The average property prices in these two boroughs follow a pattern similar to the distribution of subway stations and subway lines. Based on this map, average property prices on the west side of Brooklyn and in the northwest part of Queens are higher than in the rest of the two boroughs.
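The visual comparison could be complemented by a simple numeric check: count the stations that fall inside each zip code area and compare the counts with the average prices. The sketch below illustrates the idea with st_intersects() on invented data. The three square areas, the station points, and the prices are all hypothetical, and a correlation computed this way would describe an association, not a causal effect.

```r
library(sf)

# Helper that builds a 1 x 1 square polygon with its lower-left corner at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Three hypothetical zip code areas with made-up average prices
zips_toy <- st_sf(
  zip = c("Z1", "Z2", "Z3"),
  avg_ppsf = c(900, 600, 400),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0))
)

# Three hypothetical stations: two in Z1, one in Z2, none in Z3
stations_toy <- st_sf(
  name = c("a", "b", "c"),
  geometry = st_sfc(st_point(c(0.2, 0.2)), st_point(c(0.7, 0.5)), st_point(c(1.5, 0.5)))
)

# Number of stations inside each area
zips_toy$n_stations <- lengths(st_intersects(zips_toy, stations_toy))

# Rank correlation between station count and average price
rho <- cor(zips_toy$n_stations, zips_toy$avg_ppsf, method = "spearman")

print(st_drop_geometry(zips_toy))
print(rho)
```

Applied to the real data, the same approach would need the zip code polygons and the station points to share a coordinate reference system.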

Conclusion

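Based on my study, access to the subway appears to be associated with higher property prices in Brooklyn and Queens: the areas with denser subway coverage are also the areas with higher average prices per gross square foot. This is consistent with the expectation that a nearby public transit station makes residential properties more valuable than those further afield.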

Living near public transit can improve people's quality of life, and transit supports the business, economic, and social development of modern cities. Access to public transportation is therefore highly valued by people living in the New York City metropolitan area and in many other urban areas in the country. Studies throughout the U.S. also indicate that people are willing to pay more for housing near a public transit station because it reduces the time they spend commuting. A study of condominiums in San Diego found that distance to a transit station could be a significant predictor of property values (https://www.jstor.org/stable/43081599#metadata_info_tab_contents). The same pattern appears worldwide. According to a study of the impact of urban rail systems on property values in the city of Naples, light rail, metro, and other urban rail transit systems can directly influence the attractiveness of properties near their stations (https://www.sciencedirect.com/science/article/pii/S0966692310000153).