NYC Housing Data Analysis

Introduction

The NYC Department of Planning provides various data services that are crucial for understanding housing trends in the city. These services include essential datasets that help real estate professionals, planners, and analysts forecast market trends and make informed decisions.

In this report, we will explore the housing data to analyze trends in housing completions across different boroughs in New York City.

Loading the Data

We begin by loading the NYC Housing Database Post 2010 containing the information relevant for this analysis. This information includes number of added units, borough, and the status of each project.

# Load necessary packages
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)

# Reading the housing data
housing_data <- read_csv("HousingDB_post2010.csv", show_col_types = FALSE)

Let’s visualize the components of the housing database after 2010. This table was trimmed to save space. The entire database can be found in this link: NYC Planning Housing Database

# Showing the first few rows of the dataset for visualization and explanining with a sentence.

head(housing_data)

## # A tibble: 6 × 63
##   Job_Number Job_Type   ResidFlag   NonresFlag  Job_Status CompltYear PermitYear
##   <chr>      <chr>      <chr>       <chr>       <chr>           <dbl>      <dbl>
## 1 102133451  Alteration Residential <NA>        3. Permit…         NA       2003
## 2 102138820  Alteration Residential <NA>        5. Comple…       2012       2008
## 3 102151083  Alteration Residential <NA>        5. Comple…       2011       2002
## 4 102284172  Alteration Residential Non-Reside… 5. Comple…       2016       2001
## 5 102284804  Alteration Residential <NA>        5. Comple…       2022       2001
## 6 102292010  Alteration Residential Non-Reside… 5. Comple…       2022       2001
## # ℹ 56 more variables: ClassAInit <dbl>, ClassAProp <dbl>, ClassANet <dbl>,
## #   HotelInit <dbl>, HotelProp <dbl>, OtherBInit <dbl>, OtherBProp <dbl>,
## #   Units_CO <dbl>, Boro <dbl>, BIN <dbl>, BBL <dbl>, AddressNum <chr>,
## #   AddressSt <chr>, Occ_Init <chr>, Occ_Prop <chr>, Bldg_Class <chr>,
## #   Job_Desc <chr>, DateFiled <date>, DatePermit <date>, DateLstUpd <date>,
## #   DateComplt <date>, ZoningDst1 <chr>, ZoningDst2 <chr>, ZoningDst3 <chr>,
## #   SpeclDst1 <chr>, SpeclDst2 <chr>, Landmark <chr>, FloorsInit <dbl>, …

print(paste0("The dataset contains ", prettyNum(nrow(housing_data), big.mark = ","), " rows and ", prettyNum(ncol(housing_data), big.mark = ","), " columns."))

## [1] "The dataset contains 75,444 rows and 63 columns."

Data Cleaning

Let’s clean the data removing missing values and analyse some key statistics. We will also add proper labels to represent boroughs appropriately This data includes new developments and rehabilitation projects per Boro and city totals.

# Filtering out rows with NA values in Boro and ClassANet
housing_data <- housing_data %>%
  filter(!is.na(Boro), !is.na(ClassANet), !is.na(CompltYear))

# Converting the Boro column to a factor with proper labels
housing_data$Boro <- factor(housing_data$Boro, levels = c(1, 2, 3, 4, 5), 
                            labels = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"))

print(paste0("After cleaning, the dataset contains ", prettyNum(nrow(housing_data), big.mark = ","), " rows."))

## [1] "After cleaning, the dataset contains 59,970 rows."

summary(housing_data[, c("Boro", "CompltYear", "ClassANet")])

##             Boro         CompltYear     ClassANet       
##  Manhattan    : 6133   Min.   :2010   Min.   :-303.000  
##  Bronx        : 4811   1st Qu.:2013   1st Qu.:   0.000  
##  Brooklyn     :19054   Median :2016   Median :   1.000  
##  Queens       :20568   Mean   :2016   Mean   :   5.133  
##  Staten Island: 9404   3rd Qu.:2020   3rd Qu.:   2.000  
##                        Max.   :2024   Max.   :1175.000

Now that the data is clean, we will proceed to filter for housing projects that have been marked as “Completed Construction”. We hereby summarize the total number of units by borough and by year.

# Filtering the data for completed housing projects and select relevant columns
completed_housing <- housing_data %>%
  filter(Job_Status == "5. Completed Construction") %>%
  select(Boro, CompltYear, ClassANet)

# Summarizing the data by Boro and Completed Year (housing completions per borough per year)
completed_housing_summary_boro <- completed_housing %>%
  group_by(Boro, CompltYear) %>%
  summarise(Total_Completions = sum(ClassANet, na.rm = TRUE), .groups = 'drop')

# Summarizing data for the total completions in NYC per year
completed_housing_summary_nyc <- completed_housing %>%
  group_by(CompltYear) %>%
  summarise(Total_Completions_NYC = sum(ClassANet, na.rm = TRUE), .groups = 'drop')

# Viewing the summarized data (both per borough and NYC total)
print(paste0("The breakdown by borough is as follows:"))

## [1] "The breakdown by borough is as follows:"

print(completed_housing_summary_boro)

## # A tibble: 75 × 3
##    Boro      CompltYear Total_Completions
##    <fct>          <dbl>             <dbl>
##  1 Manhattan       2010              6946
##  2 Manhattan       2011              5779
##  3 Manhattan       2012              1505
##  4 Manhattan       2013              3423
##  5 Manhattan       2014              2774
##  6 Manhattan       2015              4248
##  7 Manhattan       2016              7569
##  8 Manhattan       2017              6071
##  9 Manhattan       2018              7446
## 10 Manhattan       2019              5423
## # ℹ 65 more rows

print(paste0("The total number of housing units added for NYC as a whole from ", min(completed_housing_summary_nyc$CompltYear), " to ", max(completed_housing_summary_nyc$CompltYear), " is summarized below."))

## [1] "The total number of housing units added for NYC as a whole from 2010 to 2024 is summarized below."

print(completed_housing_summary_nyc)

## # A tibble: 15 × 2
##    CompltYear Total_Completions_NYC
##         <dbl>                 <dbl>
##  1       2010                 23492
##  2       2011                 17910
##  3       2012                 10567
##  4       2013                 14559
##  5       2014                 12455
##  6       2015                 15590
##  7       2016                 23738
##  8       2017                 25844
##  9       2018                 29219
## 10       2019                 26730
## 11       2020                 21450
## 12       2021                 29077
## 13       2022                 26566
## 14       2023                 29945
## 15       2024                   121

These results give us a good idea of the net amount of units that have been added to the market. However, it does not reflect if the units were in a usable state and renewed or if the existing units were abandoned.

Let’s focus on new developments: those that had no units when the project was proposed and were added to the market after the completion of the project.

# Filtering for new developments (projects with 0 initial housing units that added new units)
new_developments <- housing_data %>%
  filter(ClassAInit == 0, ClassANet > 0)  # ClassAInit is 0, and new units were added

# Removing rows where CompltYear is NA
new_developments <- new_developments %>%
  filter(!is.na(CompltYear))  # Remove rows where CompltYear is NA

# Summarizing by year: count the new developments and sum the total units added
new_developments_summary <- new_developments %>%
  group_by(CompltYear) %>%
  summarise(
    New_Developments_Count = n(),              # Count of new developments
    Total_New_Units = sum(ClassANet, na.rm = TRUE), .groups = 'drop'  # Sum of new housing units added
  )

# Viewing the summarized data
print(paste0("The total number of new developments added across NYC from ", min(new_developments_summary$CompltYear), " to ", max(new_developments_summary$CompltYear), " is summarized below."))

## [1] "The total number of new developments added across NYC from 2010 to 2024 is summarized below."

print(new_developments_summary)

## # A tibble: 15 × 3
##    CompltYear New_Developments_Count Total_New_Units
##         <dbl>                  <int>           <dbl>
##  1       2010                   2295           24085
##  2       2011                   1778           18413
##  3       2012                   1431           11421
##  4       2013                   1382           15887
##  5       2014                   1338           14241
##  6       2015                   1489           17801
##  7       2016                   1787           25178
##  8       2017                   1730           27842
##  9       2018                   1732           30425
## 10       2019                   1743           27326
## 11       2020                   1370           21568
## 12       2021                   1427           29713
## 13       2022                   1275           26693
## 14       2023                   1371           29336
## 15       2024                     11             118

Finally, here is a visual of the number of new developments added each year via a bar chart.

ggplot(new_developments_summary, aes(x = CompltYear, y = New_Developments_Count)) +
  geom_bar(stat = "identity") +
  labs(title = "New Developments per Year", x = "Year", y = "Developments") +
  theme_minimal()

Conclusion

In this report, we analyzed housing completions and new developments across New York City from the dataset provided by the NYC Department of Planning. This analysis provides useful insights into the trends in housing development, both citywide and in individual boroughs.

NYC Housing Data Analysis

Miguel Lee

2024-09-04

Introduction

Loading the Data

Data Cleaning

Conclusion