Where is the NYC landscape changing fastest in multi-dwelling (residential rental unit) housing?
The Department of Housing Preservation and Development (HPD) collects registration information from owners of residential rental units. The data set has 171,708 observations and 16 variables, with records running from 1993 to the present. The data is provided by NYC OpenData | Department of Housing Preservation and Development (HPD) and is treated as an observational study.
The dataset column names and descriptions:

| Variable | Definition |
|---|---|
| RegistrationID | Unique identifier of Registration |
| BuildingID | Unique identifier of building being registered |
| BoroID | Borough code (1 = Manhattan, 2 = Bronx, 3 = Brooklyn, 4 = Queens, 5 = Staten Island) |
| Boro | Borough name |
| HouseNumber | Address information for the building |
| LowHouseNumber | Address information for the building |
| HighHouseNumber | Address information for the building |
| StreetName | Address information for the building |
| StreetCode | Address information for the building |
| Zip | Address information for the building |
| Block | Tax block for building |
| Lot | Tax lot for building |
| BIN | DCP Building Identification Number for building |
| CommunityBoard | Community Board for building |
| LastRegistrationDate | Date on which the registration information was processed |
| RegistrationEndDate | Expiration date of registration record |
1. Removing variables with duplicate definitions and variables that do not influence the data visualization analysis.
2. Exploring the summary statistics for each variable in the data frame:
## boroid boro housenumber streetname
## Min. :1.000 Length:171708 Length:171708 Length:171708
## 1st Qu.:2.000 Class :character Class :character Class :character
## Median :3.000 Mode :character Mode :character Mode :character
## Mean :2.816
## 3rd Qu.:4.000
## Max. :5.000
##
## streetcode zip block lot
## Min. : 0 Min. : 1138 Min. : 1 Min. : 1.0
## 1st Qu.:15910 1st Qu.: 10461 1st Qu.: 1372 1st Qu.: 18.0
## Median :31320 Median : 11212 Median : 2707 Median : 37.0
## Mean :35427 Mean : 10930 Mean : 3532 Mean : 532.8
## 3rd Qu.:53000 3rd Qu.: 11234 3rd Qu.: 4977 3rd Qu.: 62.0
## Max. :99900 Max. :112233 Max. :16317 Max. :9100.0
## NA's :711
## bin communityboard lastregistrationdate
## Min. : 0 Min. : 0.000 Min. :1993-04-01 00:00:00
## 1st Qu.:2060667 1st Qu.: 3.000 1st Qu.:2021-07-02 00:00:00
## Median :3077899 Median : 7.000 Median :2021-10-29 00:00:00
## Mean :2935937 Mean : 7.102 Mean :2020-10-10 00:05:57
## 3rd Qu.:4006507 3rd Qu.:11.000 3rd Qu.:2022-07-13 00:00:00
## Max. :5861193 Max. :86.000 Max. :2022-08-31 00:00:00
## NA's :36 NA's :1849
## registrationenddate
## Min. :1994-05-31 00:00:00
## 1st Qu.:2021-09-01 00:00:00
## Median :2022-09-01 00:00:00
## Mean :2021-09-03 20:47:36
## 3rd Qu.:2023-09-01 00:00:00
## Max. :2023-09-01 00:00:00
##
The summary() output depends on each variable's class. For numeric and
POSIXct (date-time) variables it reports the minimum value, the 1st quartile
(25th percentile), the median, the mean, the 3rd quartile (75th percentile),
and the maximum value. For character variables it reports only the length,
class, and mode.
The data frame contains missing values (NA's) for some variables; the
summary above reports NA counts of 711, 36, and 1849.
These missing values will be retained, because removing the affected rows could negatively affect the data analysis. To avoid data reduction, we will explore the Kaggle boosters imputation techniques.
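
The NA counts can also be read directly from the data rather than from the summary() printout. The following is a minimal base R sketch under stated assumptions: the data frame name `registrations` and its toy values are hypothetical stand-ins for the HPD data, and only the column names are taken from the summary output.

```r
# Toy stand-in for the registration data; the values are made up for illustration.
registrations <- data.frame(
  boroid         = c(1, 2, 3, 4),
  zip            = c(10001, NA, 11212, NA),
  communityboard = c(3, 7, NA, 11)
)

# Count missing values per column, then keep only the columns that have any.
na_counts <- colSums(is.na(registrations))
na_counts <- na_counts[na_counts > 0]
print(na_counts)

# Share of rows that would be lost if every incomplete row were dropped.
lost_share <- mean(!complete.cases(registrations))
print(lost_share)
```

Comparing `lost_share` against the full row count is one way to judge whether dropping rows or imputing values is the smaller compromise.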
3. Removing the time component from the POSIXct variables
lastregistrationdate and registrationenddate.
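
The cleaning steps above can be sketched in base R. This is an illustration, not the original code. The summary output lists 12 of the 16 variables, which suggests that RegistrationID, BuildingID, LowHouseNumber, and HighHouseNumber were the ones removed. The lowercase column names, the `registrations` object, and the toy rows are assumptions.

```r
# Toy stand-in for the raw data; the values are made up for illustration.
registrations <- data.frame(
  registrationid       = c(101, 102),
  buildingid           = c(9001, 9002),
  lowhousenumber       = c("10", "22"),
  highhousenumber      = c("12", "22"),
  boro                 = c("BROOKLYN", "QUEENS"),
  lastregistrationdate = as.POSIXct(c("2021-07-02 00:00:00", "2022-07-13 00:00:00"), tz = "UTC"),
  registrationenddate  = as.POSIXct(c("2022-09-01 00:00:00", "2023-09-01 00:00:00"), tz = "UTC"),
  stringsAsFactors     = FALSE
)

# Step 1: drop the identifier and duplicate-address columns (assumed selection).
drop_cols <- c("registrationid", "buildingid", "lowhousenumber", "highhousenumber")
registrations <- registrations[, setdiff(names(registrations), drop_cols), drop = FALSE]

# Step 3: convert POSIXct date-times to Date, which discards the time component.
date_cols <- c("lastregistrationdate", "registrationenddate")
registrations[date_cols] <- lapply(registrations[date_cols], as.Date, tz = "UTC")

str(registrations)
```

Passing `tz` explicitly to as.Date() avoids a date shifting by one day when the stored time zone differs from the local one.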
1. Viewing the distribution of residential rental dwellings among NYC boroughs.
2. A table of the total count for each borough:
| boro | count |
|---|---|
| BRONX | 23099 |
| BROOKLYN | 75316 |
| MANHATTAN | 28672 |
| QUEENS | 40318 |
| STATEN ISLAND | 4303 |
3. A time series of registration expiration dates for each borough.
4. A time series of processed registrations for each borough.
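
As a rough sketch of how the borough table could be produced and checked, the base R snippet below tallies a toy `boro` column and then computes each borough's share from the counts published in the table above. The `registrations` object and its rows are hypothetical. The published counts are copied from the table.

```r
# Toy stand-in for the boro column; the values are made up for illustration.
registrations <- data.frame(
  boro = c("BROOKLYN", "QUEENS", "BROOKLYN", "BRONX", "MANHATTAN", "BROOKLYN"),
  stringsAsFactors = FALSE
)

# Tally registrations per borough and sort from largest to smallest.
boro_counts <- as.data.frame(table(boro = registrations$boro),
                             responseName = "count", stringsAsFactors = FALSE)
boro_counts <- boro_counts[order(-boro_counts$count), ]
print(boro_counts)

# Counts as published in the table above; they should add up to the row total.
published <- c(BRONX = 23099, BROOKLYN = 75316, MANHATTAN = 28672,
               QUEENS = 40318, `STATEN ISLAND` = 4303)
total <- sum(published)
shares <- round(100 * published / total, 1)  # percentage of all registrations
print(total)
print(shares)
```

The published counts sum to 171708, matching the number of observations, so no row has a missing borough.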
The data indicate high growth in the registration of residential
rental units in the borough of Brooklyn, followed by
Queens. The time series visualizations indicate constant
low activity for approximately 28 years, then a sudden increase during
2021. A more granular analysis will provide insights into the yearly and
monthly quantitative distribution for each borough.
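
One possible starting point for that granular analysis is to bucket registrations by year and by month before plotting. The base R sketch below assumes the dates have already been converted to the Date class. The `registrations` object and its rows are made up for illustration.

```r
# Toy stand-in for the cleaned data; the values are made up for illustration.
registrations <- data.frame(
  boro = c("BROOKLYN", "BROOKLYN", "QUEENS", "QUEENS", "BRONX"),
  lastregistrationdate = as.Date(c("2021-07-02", "2021-10-29", "2021-10-05",
                                   "2022-07-13", "1993-04-01")),
  stringsAsFactors = FALSE
)

# Derive year and year-month buckets from the processing date.
registrations$year  <- format(registrations$lastregistrationdate, "%Y")
registrations$month <- format(registrations$lastregistrationdate, "%Y-%m")

# Yearly counts as a borough-by-year contingency table.
yearly <- table(registrations$boro, registrations$year)
print(yearly)

# Monthly counts in long form, keeping only borough-month pairs that occur.
monthly <- as.data.frame(table(boro = registrations$boro, month = registrations$month),
                         responseName = "count", stringsAsFactors = FALSE)
monthly <- monthly[monthly$count > 0, ]
print(monthly)
```

The long-form `monthly` table is usually the more convenient input for plotting one line per borough.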
The analysis in this document may be reproduced in a static format (PDF
output) or a dynamic format (a Shiny application).