Project Proposal

Where is the NYC landscape changing fastest in multiple-dwelling (residential rental unit) housing?

Origin of the Data set

The Department of Housing Preservation and Development (HPD) collects registration information from owners of residential rental units. The data set has 171708 observations and 16 variables, with records running from 1993 to the present. The data is provided by NYC OpenData | Department of Housing Preservation and Development (HPD) and is treated here as an observational study.

The dataset column names and descriptions:
Variable              Definition
RegistrationID        Unique identifier of registration
BuildingID            Unique identifier of building being registered
BoroID                Unique identifier of a borough
Boro                  Boro code (1 = Manhattan, 2 = Bronx, 3 = Brooklyn, 4 = Queens, 5 = Staten Island)
HouseNumber           Address information for the building
LowHouseNumber        Address information for the building
HighHouseNumber       Address information for the building
StreetName            Address information for the building
StreetCode            Address information for the building
Zip                   Address information for the building
Block                 Tax block for building
Lot                   Tax lot for building
BIN                   DCP Building Identification Number for building
CommunityBoard        Community Board for building
LastRegistrationDate  Date on which the registration information was processed
RegistrationEndDate   Expiration date of registration record

Data Preparation | Relevant summary statistics


Creating a new data frame

1. Removing variables with duplicate definitions and variables that do not influence the data visualization analysis.
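
The summary output in this section lists 12 of the original 16 variables, which suggests that RegistrationID, BuildingID, LowHouseNumber, and HighHouseNumber were the ones dropped. A minimal base R sketch of that step follows. The data frame name `registrations`, the toy values, and the use of `tolower()` for the lowercase column names are assumptions made for illustration, not the original code.

```r
# Toy stand-in for the HPD registration data (values are made up,
# and only a subset of the 16 columns is shown).
registrations <- data.frame(
  RegistrationID  = c(101L, 102L),
  BuildingID      = c(1L, 2L),
  BoroID          = c(3L, 4L),
  Boro            = c("BROOKLYN", "QUEENS"),
  HouseNumber     = c("10", "22"),
  LowHouseNumber  = c("10", "22"),
  HighHouseNumber = c("10", "22"),
  StreetName      = c("MAIN ST", "OAK AVE"),
  stringsAsFactors = FALSE
)

# Columns assumed to be redundant or not needed for the visualizations.
drop_cols <- c("RegistrationID", "BuildingID",
               "LowHouseNumber", "HighHouseNumber")

# Keep every column that is not in drop_cols.
reg <- registrations[, setdiff(names(registrations), drop_cols), drop = FALSE]

# Lowercase the names to match the summary output.
names(reg) <- tolower(names(reg))

names(reg)
```

Selecting by name with `setdiff()` keeps the step readable and avoids depending on column positions.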

2. Exploring the summary statistics for each variable in the data frame:

##      boroid          boro           housenumber         streetname       
##  Min.   :1.000   Length:171708      Length:171708      Length:171708     
##  1st Qu.:2.000   Class :character   Class :character   Class :character  
##  Median :3.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2.816                                                           
##  3rd Qu.:4.000                                                           
##  Max.   :5.000                                                           
##                                                                          
##    streetcode         zip             block            lot        
##  Min.   :    0   Min.   :  1138   Min.   :    1   Min.   :   1.0  
##  1st Qu.:15910   1st Qu.: 10461   1st Qu.: 1372   1st Qu.:  18.0  
##  Median :31320   Median : 11212   Median : 2707   Median :  37.0  
##  Mean   :35427   Mean   : 10930   Mean   : 3532   Mean   : 532.8  
##  3rd Qu.:53000   3rd Qu.: 11234   3rd Qu.: 4977   3rd Qu.:  62.0  
##  Max.   :99900   Max.   :112233   Max.   :16317   Max.   :9100.0  
##                  NA's   :711                                      
##       bin          communityboard   lastregistrationdate         
##  Min.   :      0   Min.   : 0.000   Min.   :1993-04-01 00:00:00  
##  1st Qu.:2060667   1st Qu.: 3.000   1st Qu.:2021-07-02 00:00:00  
##  Median :3077899   Median : 7.000   Median :2021-10-29 00:00:00  
##  Mean   :2935937   Mean   : 7.102   Mean   :2020-10-10 00:05:57  
##  3rd Qu.:4006507   3rd Qu.:11.000   3rd Qu.:2022-07-13 00:00:00  
##  Max.   :5861193   Max.   :86.000   Max.   :2022-08-31 00:00:00  
##  NA's   :36                         NA's   :1849                 
##  registrationenddate          
##  Min.   :1994-05-31 00:00:00  
##  1st Qu.:2021-09-01 00:00:00  
##  Median :2022-09-01 00:00:00  
##  Mean   :2021-09-03 20:47:36  
##  3rd Qu.:2023-09-01 00:00:00  
##  Max.   :2023-09-01 00:00:00  
## 

For numeric and date-time (POSIXct) variables, the summary() function shows the minimum value, the 1st quartile (25th percentile), the median, the mean, the 3rd quartile (75th percentile), the maximum value, and the number of NA's, if any. For character variables, it shows only the length, class, and mode.

The data frame contains missing values, NA's, for these variables:

  • zip
  • bin
  • lastregistrationdate

These missing values will remain in the data frame for now, because dropping the affected rows could have a negative impact on the data analysis. To avoid data reduction, we will explore the Kaggle boosters imputation techniques.
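
The NA counts reported by summary() can also be pulled out directly, which may be convenient before deciding on an imputation approach. The sketch below uses a small made-up data frame named `reg`. Only the column names come from the proposal.

```r
# Toy data frame with a few missing values (values are made up).
reg <- data.frame(
  zip = c(10461, NA, 11212),
  bin = c(2060667, 3077899, NA),
  lot = c(18, 37, 62)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(reg))

# Show only the columns that actually contain NA's.
na_counts[na_counts > 0]
```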


3. Removing the time component from the POSIXct variables lastregistrationdate and registrationenddate.
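
One way to drop the time component is to convert the POSIXct columns to Date with `as.Date()`. This is a sketch under assumptions: the toy values and the UTC time zone are illustrative, and the original analysis may have used a different approach.

```r
# Toy data frame with POSIXct columns (values are made up; UTC is assumed).
reg <- data.frame(
  lastregistrationdate = as.POSIXct(c("2021-07-02 00:00:00", NA), tz = "UTC"),
  registrationenddate  = as.POSIXct(c("2022-09-01 00:00:00",
                                      "2023-09-01 00:00:00"), tz = "UTC")
)

# Convert both columns to Date, which keeps the calendar day only.
# Passing the same tz avoids the day shifting across time zones.
date_cols <- c("lastregistrationdate", "registrationenddate")
reg[date_cols] <- lapply(reg[date_cols], as.Date, tz = "UTC")

str(reg)
```

Missing timestamps stay NA after the conversion, so this step does not change the NA counts.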



Visualizations

1. Viewing the distribution of residential rental dwellings among NYC boroughs.

2. A table of the total registration count for each borough:

boro           count
BRONX          23099
BROOKLYN       75316
MANHATTAN      28672
QUEENS         40318
STATEN ISLAND   4303
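
The five counts above sum to 171708, which matches the number of observations in the data set. A per-borough count of this kind could be produced in base R roughly as follows. The data frame `reg` and its values are made up for illustration.

```r
# Toy data frame with one row per registration (values are made up).
reg <- data.frame(
  boro = c("BROOKLYN", "QUEENS", "BROOKLYN", "BRONX"),
  stringsAsFactors = FALSE
)

# Tabulate registrations per borough and convert to a two-column data frame.
boro_counts <- as.data.frame(
  table(boro = reg$boro),
  responseName = "count",
  stringsAsFactors = FALSE
)

boro_counts

# Sanity check: the counts should add up to the number of rows.
sum(boro_counts$count) == nrow(reg)
```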


3. A time series of registration expiration dates for each borough.

4. A time series of processed registrations for each borough.
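
The data behind such a time series can be prepared by counting registrations per borough and year. The following is a minimal base R sketch, assuming the date columns have already been converted to Date. The data frame `reg`, its values, and the helper columns `year` and `count` are hypothetical.

```r
# Toy data frame (values are made up); one row has a missing date.
reg <- data.frame(
  boro = c("BROOKLYN", "BROOKLYN", "QUEENS", "BRONX", "QUEENS"),
  lastregistrationdate = as.Date(c("2021-07-02", "2021-10-29",
                                   "2021-07-02", "1993-04-01", NA)),
  stringsAsFactors = FALSE
)

# Extract the calendar year from each registration date.
reg$year <- as.integer(format(reg$lastregistrationdate, "%Y"))

# Helper column so that summing gives a row count.
reg$count <- 1L

# Count registrations per borough and year.
# Rows with a missing year are dropped by aggregate()'s default na.action.
yearly <- aggregate(count ~ boro + year, data = reg, FUN = sum)

yearly
```

Replacing `"%Y"` with `"%Y-%m"` would give monthly counts instead of yearly ones.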



Summary

The data indicates high growth in the registration of residential rental units in Brooklyn, followed by Queens. The time series visualizations indicate constant low activity for approximately 28 years, then a sudden increase during 2021. A more granular analysis will provide insights into the yearly and monthly quantitative distribution for each borough.

The analysis in this document may be reproduced in a static format as PDF output or in a dynamic format as a Shiny application.

RPubs Source