The NYC Department of Planning provides various data services that are crucial for understanding housing trends in the city. These services include essential datasets that help real estate professionals, planners, and analysts forecast market trends and make informed decisions.
In this report, we will explore the housing data to analyze trends in housing completions across different boroughs in New York City.
We begin by loading the NYC Housing Database Post 2010 containing the information relevant for this analysis. This information includes number of added units, borough, and the status of each project.
# Load necessary packages
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
# Reading the housing data
housing_data <- read_csv("HousingDB_post2010.csv", show_col_types = FALSE)
Let’s visualize the components of the housing database after 2010. This table was trimmed to save space. The entire database can be found in this link: NYC Planning Housing Database
# Showing the first few rows of the dataset for visualization and explanining with a sentence.
head(housing_data)
## # A tibble: 6 × 63
## Job_Number Job_Type ResidFlag NonresFlag Job_Status CompltYear PermitYear
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 102133451 Alteration Residential <NA> 3. Permit… NA 2003
## 2 102138820 Alteration Residential <NA> 5. Comple… 2012 2008
## 3 102151083 Alteration Residential <NA> 5. Comple… 2011 2002
## 4 102284172 Alteration Residential Non-Reside… 5. Comple… 2016 2001
## 5 102284804 Alteration Residential <NA> 5. Comple… 2022 2001
## 6 102292010 Alteration Residential Non-Reside… 5. Comple… 2022 2001
## # ℹ 56 more variables: ClassAInit <dbl>, ClassAProp <dbl>, ClassANet <dbl>,
## # HotelInit <dbl>, HotelProp <dbl>, OtherBInit <dbl>, OtherBProp <dbl>,
## # Units_CO <dbl>, Boro <dbl>, BIN <dbl>, BBL <dbl>, AddressNum <chr>,
## # AddressSt <chr>, Occ_Init <chr>, Occ_Prop <chr>, Bldg_Class <chr>,
## # Job_Desc <chr>, DateFiled <date>, DatePermit <date>, DateLstUpd <date>,
## # DateComplt <date>, ZoningDst1 <chr>, ZoningDst2 <chr>, ZoningDst3 <chr>,
## # SpeclDst1 <chr>, SpeclDst2 <chr>, Landmark <chr>, FloorsInit <dbl>, …
print(paste0("The dataset contains ", prettyNum(nrow(housing_data), big.mark = ","), " rows and ", prettyNum(ncol(housing_data), big.mark = ","), " columns."))
## [1] "The dataset contains 75,444 rows and 63 columns."
Let’s clean the data removing missing values and analyse some key statistics. We will also add proper labels to represent boroughs appropriately This data includes new developments and rehabilitation projects per Boro and city totals.
# Filtering out rows with NA values in Boro and ClassANet
housing_data <- housing_data %>%
filter(!is.na(Boro), !is.na(ClassANet), !is.na(CompltYear))
# Converting the Boro column to a factor with proper labels
housing_data$Boro <- factor(housing_data$Boro, levels = c(1, 2, 3, 4, 5),
labels = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"))
print(paste0("After cleaning, the dataset contains ", prettyNum(nrow(housing_data), big.mark = ","), " rows."))
## [1] "After cleaning, the dataset contains 59,970 rows."
summary(housing_data[, c("Boro", "CompltYear", "ClassANet")])
## Boro CompltYear ClassANet
## Manhattan : 6133 Min. :2010 Min. :-303.000
## Bronx : 4811 1st Qu.:2013 1st Qu.: 0.000
## Brooklyn :19054 Median :2016 Median : 1.000
## Queens :20568 Mean :2016 Mean : 5.133
## Staten Island: 9404 3rd Qu.:2020 3rd Qu.: 2.000
## Max. :2024 Max. :1175.000
Now that the data is clean, we will proceed to filter for housing projects that have been marked as “Completed Construction”. We hereby summarize the total number of units by borough and by year.
# Filtering the data for completed housing projects and select relevant columns
completed_housing <- housing_data %>%
filter(Job_Status == "5. Completed Construction") %>%
select(Boro, CompltYear, ClassANet)
# Summarizing the data by Boro and Completed Year (housing completions per borough per year)
completed_housing_summary_boro <- completed_housing %>%
group_by(Boro, CompltYear) %>%
summarise(Total_Completions = sum(ClassANet, na.rm = TRUE), .groups = 'drop')
# Summarizing data for the total completions in NYC per year
completed_housing_summary_nyc <- completed_housing %>%
group_by(CompltYear) %>%
summarise(Total_Completions_NYC = sum(ClassANet, na.rm = TRUE), .groups = 'drop')
# Viewing the summarized data (both per borough and NYC total)
print(paste0("The breakdown by borough is as follows:"))
## [1] "The breakdown by borough is as follows:"
print(completed_housing_summary_boro)
## # A tibble: 75 × 3
## Boro CompltYear Total_Completions
## <fct> <dbl> <dbl>
## 1 Manhattan 2010 6946
## 2 Manhattan 2011 5779
## 3 Manhattan 2012 1505
## 4 Manhattan 2013 3423
## 5 Manhattan 2014 2774
## 6 Manhattan 2015 4248
## 7 Manhattan 2016 7569
## 8 Manhattan 2017 6071
## 9 Manhattan 2018 7446
## 10 Manhattan 2019 5423
## # ℹ 65 more rows
print(paste0("The total number of housing units added for NYC as a whole from ", min(completed_housing_summary_nyc$CompltYear), " to ", max(completed_housing_summary_nyc$CompltYear), " is summarized below."))
## [1] "The total number of housing units added for NYC as a whole from 2010 to 2024 is summarized below."
print(completed_housing_summary_nyc)
## # A tibble: 15 × 2
## CompltYear Total_Completions_NYC
## <dbl> <dbl>
## 1 2010 23492
## 2 2011 17910
## 3 2012 10567
## 4 2013 14559
## 5 2014 12455
## 6 2015 15590
## 7 2016 23738
## 8 2017 25844
## 9 2018 29219
## 10 2019 26730
## 11 2020 21450
## 12 2021 29077
## 13 2022 26566
## 14 2023 29945
## 15 2024 121
These results give us a good idea of the net amount of units that have been added to the market. However, it does not reflect if the units were in a usable state and renewed or if the existing units were abandoned.
Let’s focus on new developments: those that had no units when the project was proposed and were added to the market after the completion of the project.
# Filtering for new developments (projects with 0 initial housing units that added new units)
new_developments <- housing_data %>%
filter(ClassAInit == 0, ClassANet > 0) # ClassAInit is 0, and new units were added
# Removing rows where CompltYear is NA
new_developments <- new_developments %>%
filter(!is.na(CompltYear)) # Remove rows where CompltYear is NA
# Summarizing by year: count the new developments and sum the total units added
new_developments_summary <- new_developments %>%
group_by(CompltYear) %>%
summarise(
New_Developments_Count = n(), # Count of new developments
Total_New_Units = sum(ClassANet, na.rm = TRUE), .groups = 'drop' # Sum of new housing units added
)
# Viewing the summarized data
print(paste0("The total number of new developments added across NYC from ", min(new_developments_summary$CompltYear), " to ", max(new_developments_summary$CompltYear), " is summarized below."))
## [1] "The total number of new developments added across NYC from 2010 to 2024 is summarized below."
print(new_developments_summary)
## # A tibble: 15 × 3
## CompltYear New_Developments_Count Total_New_Units
## <dbl> <int> <dbl>
## 1 2010 2295 24085
## 2 2011 1778 18413
## 3 2012 1431 11421
## 4 2013 1382 15887
## 5 2014 1338 14241
## 6 2015 1489 17801
## 7 2016 1787 25178
## 8 2017 1730 27842
## 9 2018 1732 30425
## 10 2019 1743 27326
## 11 2020 1370 21568
## 12 2021 1427 29713
## 13 2022 1275 26693
## 14 2023 1371 29336
## 15 2024 11 118
Finally, here is a visual of the number of new developments added each year via a bar chart.
ggplot(new_developments_summary, aes(x = CompltYear, y = New_Developments_Count)) +
geom_bar(stat = "identity") +
labs(title = "New Developments per Year", x = "Year", y = "Developments") +
theme_minimal()
In this report, we analyzed housing completions and new developments across New York City from the dataset provided by the NYC Department of Planning. This analysis provides useful insights into the trends in housing development, both citywide and in individual boroughs.