Abstract: I know the assignment asked for three data sets but I am using 6. I wanted to look at specifically rent increase over the past few years in nyc. I would add a 6th data set and that is cpi or use the inflation data if rent is not a part of the cpi calculation. The goal that I am trying to accomplish in this project is artificial inflation in the rent prices. In the past three years, nyc has experienced a population decrease. So why has rent increased if the basic principals of supply and demand are to take effect and rent prices are supposed to drop as we seen in the pandemic. Is there market manipulation happening here in the rent prices? My theory is that there is an extreme amount of market manipulation. I cannot form a conclusion in my analysis in this project because this could take months upon months of research. I will however try to get an idea and form a hypothesis at the end of analysis here. Over the past couple of years, nyc has seen an increase of new residential buildings being built, especially empty and vacated land area. So I will ask the question again, why is it that rent increases when there is a decrease in population in nyc residents that is including legal and illegal residents, along with an increase in apartment units in this city if the laws of supply and demand are to be applied. Are we in a fair and free market? Let’s find out.
Part 1: Reading the data
median_income_nyc_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/Median%20individual%20income%20in%20New%20York%20City.csv"
median_nyc_income <- read.csv(median_income_nyc_url)
head(median_nyc_income)
## Entity.DCID Entity.properties.isoCode Entity.properties.name
## 1 geoId/3651000 NA New York City
## 2 geoId/3651000 NA New York City
## 3 geoId/3651000 NA New York City
## 4 geoId/3651000 NA New York City
## 5 geoId/3651000 NA New York City
## 6 geoId/3651000 NA New York City
## Variable.DCID Variable.observation.date
## 1 Median_Income_Person 2011
## 2 Median_Income_Person 2012
## 3 Median_Income_Person 2013
## 4 Median_Income_Person 2014
## 5 Median_Income_Person 2015
## 6 Median_Income_Person 2016
## Variable.observation.metadata.importName
## 1 CensusACS5YearSurvey
## 2 CensusACS5YearSurvey
## 3 CensusACS5YearSurvey
## 4 CensusACS5YearSurvey
## 5 CensusACS5YearSurvey
## 6 CensusACS5YearSurvey
## Variable.observation.metadata.provenanceUrl
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## Variable.observation.metadata.scalingFactor
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## Variable.observation.metadata.unit
## 1 USDollar
## 2 USDollar
## 3 USDollar
## 4 USDollar
## 5 USDollar
## 6 USDollar
## Variable.observation.metadata.unitDisplayName Variable.observation.value
## 1 USD 27252
## 2 USD 27422
## 3 USD 27486
## 4 USD 27683
## 5 USD 28043
## 6 USD 29187
## Variable.properties.name
## 1 Median Income of a Population
## 2 Median Income of a Population
## 3 Median Income of a Population
## 4 Median Income of a Population
## 5 Median Income of a Population
## 6 Median Income of a Population
nyc_housing_count_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/myc_housing_count.csv"
nyc_housing_count_data <-read.csv(nyc_housing_count_url)
head(nyc_housing_count_data)
## placeDcid placeName Date.Count_HousingUnit Value.Count_HousingUnit
## 1 zip/10001 10001 2011 12161
## 2 zip/10001 10001 2012 12473
## 3 zip/10001 10001 2013 12617
## 4 zip/10001 10001 2014 12638
## 5 zip/10001 10001 2015 13305
## 6 zip/10001 10001 2016 13520
## Source.Count_HousingUnit
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
nyc_population_2020_2023_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_population_2020-2023.csv"
nyc_population_2020_2023_data <- read.csv(nyc_population_2020_2023_url)
head(nyc_population_2020_2023_data)
## Geographic.Area April.1..2020.Estimates.Base X2020 X2021 X2022
## 1 Adams village, New York 1,694 1,688 1,698 1,686
## 2 Addison village, New York 1,608 1,604 1,589 1,576
## 3 Afton village, New York 801 800 793 792
## 4 Airmont village, New York 10,135 10,065 10,133 10,148
## 5 Akron village, New York 2,912 2,909 2,903 2,893
## 6 Albany city, New York 99,239 98,682 100,710 100,884
## X2023
## 1 1,678
## 2 1,564
## 3 786
## 4 10,163
## 5 2,883
## 6 101,228
nyc_population_count_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_population_count.csv"
nyc_population_count_data <- read.csv(nyc_population_count_url)
head(nyc_population_count_data)
## placeDcid placeName Date.Count_Person Value.Count_Person
## 1 zip/10001 10001 2011 21097
## 2 zip/10001 10001 2012 20579
## 3 zip/10001 10001 2013 21966
## 4 zip/10001 10001 2014 22767
## 5 zip/10001 10001 2015 23537
## 6 zip/10001 10001 2016 23332
## Source.Count_Person
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
median_rent_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/Zip_zori_uc_sfrcondomfr_sm_month.csv"
nyc_median_rent_data <- read.csv(median_rent_url)
head(nyc_median_rent_data)
## RegionID SizeRank RegionName RegionType StateName State City
## 1 91982 1 77494 zip TX TX Katy
## 2 91940 3 77449 zip TX TX Katy
## 3 62080 4 11368 zip NY NY New York
## 4 91733 5 77084 zip TX TX Houston
## 5 93144 6 79936 zip TX TX El Paso
## 6 62093 7 11385 zip NY NY New York
## Metro CountyName X1.31.2015 X2.28.2015
## 1 Houston-The Woodlands-Sugar Land, TX Fort Bend County 1608.145 1612.486
## 2 Houston-The Woodlands-Sugar Land, TX Harris County 1323.000 1318.618
## 3 New York-Newark-Jersey City, NY-NJ-PA Queens County NA NA
## 4 Houston-The Woodlands-Sugar Land, TX Harris County 1249.562 1246.504
## 5 El Paso, TX El Paso County NA NA
## 6 New York-Newark-Jersey City, NY-NJ-PA Queens County NA NA
## X3.31.2015 X4.30.2015 X5.31.2015 X6.30.2015 X7.31.2015 X8.31.2015 X9.30.2015
## 1 1622.156 1629.322 1635.142 1637.402 1644.772 1641.979 1637.055
## 2 1326.726 1337.962 1348.856 1356.630 1359.682 1365.164 1368.848
## 3 NA NA NA NA NA NA NA
## 4 1261.545 1276.478 1287.994 1291.040 1289.377 1295.035 1299.514
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## X10.31.2015 X11.30.2015 X12.31.2015 X1.31.2016 X2.29.2016 X3.31.2016
## 1 1625.837 1618.156 1596.582 1589.528 1576.256 1588.424
## 2 1370.367 1367.053 1363.083 1356.252 1356.841 1365.354
## 3 NA NA NA NA NA NA
## 4 1295.799 1292.824 1275.386 1289.026 1283.337 1300.826
## 5 NA NA NA NA NA NA
## 6 NA NA NA NA NA NA
## X4.30.2016 X5.31.2016 X6.30.2016 X7.31.2016 X8.31.2016 X9.30.2016 X10.31.2016
## 1 1577.117 1577.575 1573.725 1574.991 1572.879 1561.637 1552.095
## 2 1375.994 1380.825 1372.580 1367.023 1366.093 1367.075 1369.247
## 3 NA NA NA NA NA NA NA
## 4 1292.692 1292.858 1284.914 1289.592 1297.683 1299.804 1298.137
## 5 NA NA NA NA NA NA NA
## 6 NA 2266.140 2292.023 2320.788 2343.339 2337.120 2345.339
## X11.30.2016 X12.31.2016 X1.31.2017 X2.28.2017 X3.31.2017 X4.30.2017
## 1 1534.646 1520.193 1519.930 1538.621 1556.170 1563.519
## 2 1360.313 1353.082 1346.414 1353.121 1361.972 1370.222
## 3 NA NA NA NA NA NA
## 4 1295.004 1298.242 1291.994 1292.831 1298.754 1306.661
## 5 NA NA NA NA NA NA
## 6 2320.083 2326.450 2310.855 2313.101 2316.134 2309.757
## X5.31.2017 X6.30.2017 X7.31.2017 X8.31.2017 X9.30.2017 X10.31.2017
## 1 1570.041 1577.817 1577.463 1566.557 1573.932 1587.188
## 2 1372.884 1376.444 1383.182 1384.290 1390.200 1388.765
## 3 NA NA NA NA NA NA
## 4 1314.942 1319.403 1326.232 1328.778 1344.557 1354.943
## 5 NA NA NA NA NA NA
## 6 2336.220 2327.477 2349.503 2354.110 2393.023 2363.952
## X11.30.2017 X12.31.2017 X1.31.2018 X2.28.2018 X3.31.2018 X4.30.2018
## 1 1602.971 1603.653 1613.512 1621.200 1625.153 1622.228
## 2 1396.765 1400.185 1413.150 1419.085 1420.260 1423.940
## 3 NA NA NA NA NA NA
## 4 1364.352 1350.103 1347.248 1345.481 1347.513 1357.129
## 5 NA NA NA NA NA NA
## 6 2360.901 2336.269 2331.220 2314.654 2317.821 2337.118
## X5.31.2018 X6.30.2018 X7.31.2018 X8.31.2018 X9.30.2018 X10.31.2018
## 1 1619.678 1619.904 1613.590 1601.379 1586.974 1581.381
## 2 1428.269 1436.775 1435.404 1436.542 1431.220 1428.874
## 3 NA NA NA NA NA NA
## 4 1360.322 1368.881 1363.022 1368.367 1370.118 1366.982
## 5 NA NA NA NA NA NA
## 6 2352.811 2362.330 2367.660 2376.613 2380.886 2402.277
## X11.30.2018 X12.31.2018 X1.31.2019 X2.28.2019 X3.31.2019 X4.30.2019
## 1 1586.398 1592.021 1602.337 1607.856 1619.725 1624.987
## 2 1435.280 1432.779 1432.576 1427.195 1435.062 1439.857
## 3 NA NA NA NA NA NA
## 4 1376.080 1377.496 1380.147 1366.159 1376.160 1392.823
## 5 NA NA NA NA NA NA
## 6 2393.917 2397.149 2374.795 2397.457 2381.838 2398.852
## X5.31.2019 X6.30.2019 X7.31.2019 X8.31.2019 X9.30.2019 X10.31.2019
## 1 1626.211 1623.705 1623.966 1633.453 1629.237 1628.901
## 2 1449.515 1454.263 1460.957 1455.115 1455.808 1453.984
## 3 NA NA NA NA NA NA
## 4 1413.991 1413.849 1419.157 1415.652 1410.156 1400.941
## 5 NA NA NA NA NA NA
## 6 2381.123 2440.723 2471.034 2495.005 2476.095 2464.924
## X11.30.2019 X12.31.2019 X1.31.2020 X2.29.2020 X3.31.2020 X4.30.2020
## 1 1623.093 1626.621 1631.052 1638.967 1644.800 1654.749
## 2 1451.602 1452.760 1458.677 1471.609 1483.701 1485.524
## 3 NA NA NA NA NA NA
## 4 1400.391 1405.868 1409.353 1416.134 1426.521 1429.873
## 5 NA NA NA NA NA NA
## 6 2488.450 2498.628 2509.859 2515.646 2501.732 2475.083
## X5.31.2020 X6.30.2020 X7.31.2020 X8.31.2020 X9.30.2020 X10.31.2020
## 1 1647.455 1648.247 1639.324 1649.510 1662.275 1659.069
## 2 1476.320 1477.038 1482.860 1500.223 1509.817 1515.313
## 3 NA NA NA NA NA NA
## 4 1435.425 1440.077 1449.173 1453.679 1461.254 1472.444
## 5 NA NA NA NA NA NA
## 6 2475.563 2469.365 2475.826 2427.843 2401.526 2368.533
## X11.30.2020 X12.31.2020 X1.31.2021 X2.28.2021 X3.31.2021 X4.30.2021
## 1 1668.940 1670.079 1690.808 1692.998 1700.745 1717.851
## 2 1523.964 1535.151 1542.668 1547.833 1554.156 1578.005
## 3 NA NA NA NA NA NA
## 4 1467.932 1467.848 1460.713 1468.667 1473.275 1488.798
## 5 NA NA NA NA NA NA
## 6 2337.224 2305.878 2292.884 2277.712 2282.339 2278.929
## X5.31.2021 X6.30.2021 X7.31.2021 X8.31.2021 X9.30.2021 X10.31.2021
## 1 1750.980 1795.362 1834.038 1863.494 1878.153 1891.734
## 2 1616.404 1651.964 1693.315 1707.366 1721.099 1731.629
## 3 NA NA NA NA NA NA
## 4 1519.331 1551.723 1607.096 1660.267 1706.364 1695.565
## 5 1169.709 1172.574 1172.414 1178.988 1200.197 1220.085
## 6 2322.021 2368.377 2420.586 2449.848 2482.230 2518.714
## X11.30.2021 X12.31.2021 X1.31.2022 X2.28.2022 X3.31.2022 X4.30.2022
## 1 1908.323 1918.317 1924.789 1929.261 1938.351 1960.820
## 2 1730.442 1747.786 1744.426 1779.906 1794.948 1806.075
## 3 NA NA NA NA NA NA
## 4 1673.322 1675.055 1696.101 1709.500 1708.509 1719.433
## 5 1239.429 1242.151 1242.191 1242.710 1266.113 1275.779
## 6 2556.784 2580.724 2592.727 2591.473 2607.010 2657.466
## X5.31.2022 X6.30.2022 X7.31.2022 X8.31.2022 X9.30.2022 X10.31.2022
## 1 1985.426 2011.009 2028.075 2031.906 2031.043 2026.256
## 2 1789.815 1785.972 1825.483 1875.844 1877.390 1863.603
## 3 NA NA NA NA NA NA
## 4 1746.314 1768.846 1782.872 1769.804 1763.618 1749.751
## 5 1284.918 1301.894 1340.722 1369.575 1384.930 1369.386
## 6 2727.357 2830.789 2893.427 2951.086 2957.775 2966.494
## X11.30.2022 X12.31.2022 X1.31.2023 X2.28.2023 X3.31.2023 X4.30.2023
## 1 2007.671 2003.009 2016.930 2027.761 2040.195 2021.600
## 2 1837.675 1844.636 1837.186 1840.907 1854.599 1870.026
## 3 NA NA NA NA NA NA
## 4 1753.449 1746.937 1755.166 1752.060 1764.965 1767.850
## 5 1378.816 1383.020 1417.562 1431.443 1431.708 1425.782
## 6 2937.695 2898.732 2851.205 2847.664 2865.429 2908.589
## X5.31.2023 X6.30.2023 X7.31.2023 X8.31.2023 X9.30.2023 X10.31.2023
## 1 2031.902 2039.850 2050.557 2056.771 2056.602 2053.728
## 2 1876.601 1884.102 1880.272 1891.205 1889.575 1880.798
## 3 NA NA NA NA NA NA
## 4 1784.556 1797.168 1797.136 1802.712 1799.171 1828.528
## 5 1430.076 1429.599 1450.118 1452.019 1456.963 1459.902
## 6 2950.697 2986.174 3004.259 3021.107 3039.225 3042.726
## X11.30.2023 X12.31.2023 X1.31.2024 X2.29.2024 X3.31.2024 X4.30.2024
## 1 2031.526 2017.766 2006.277 2026.201 2039.152 2064.044
## 2 1875.237 1883.844 1897.293 1909.784 1919.376 1929.857
## 3 NA NA NA NA NA NA
## 4 1825.755 1830.712 1814.003 1820.641 1822.235 1846.918
## 5 1460.857 1478.183 1479.396 1479.012 1471.360 1469.008
## 6 3022.294 2985.509 2976.598 2971.280 3022.816 3032.851
## X5.31.2024 X6.30.2024 X7.31.2024 X8.31.2024
## 1 2067.249 2082.409 2086.030 2096.459
## 2 1924.842 1925.223 1937.730 1949.967
## 3 NA NA 2322.418 2282.500
## 4 1859.796 1866.253 1859.417 1855.383
## 5 1484.655 1501.267 1514.330 1504.006
## 6 3102.032 3132.088 3203.117 3199.751
nyc_zip_code_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_zipcodes.csv"
nyc_zip_code_list <- read.csv(nyc_zip_code_url, header = FALSE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on
## 'https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_zipcodes.csv'
nyc_zipcodes_list <- as.list(as.data.frame(t(nyc_zip_code_list)))
#print(nyc_zipcodes_list)
Filtering columns/Renaming columns
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
colnames(median_nyc_income)
## [1] "Entity.DCID"
## [2] "Entity.properties.isoCode"
## [3] "Entity.properties.name"
## [4] "Variable.DCID"
## [5] "Variable.observation.date"
## [6] "Variable.observation.metadata.importName"
## [7] "Variable.observation.metadata.provenanceUrl"
## [8] "Variable.observation.metadata.scalingFactor"
## [9] "Variable.observation.metadata.unit"
## [10] "Variable.observation.metadata.unitDisplayName"
## [11] "Variable.observation.value"
## [12] "Variable.properties.name"
median_nyc_income <- median_nyc_income %>%
select("Variable.observation.date", "Variable.observation.value")
median_nyc_income <- median_nyc_income %>%
rename(
year = Variable.observation.date,
nyc_median_income = Variable.observation.value
)
#print(median_nyc_income)
colnames(nyc_housing_count_data)
## [1] "placeDcid" "placeName"
## [3] "Date.Count_HousingUnit" "Value.Count_HousingUnit"
## [5] "Source.Count_HousingUnit"
nyc_housing_count_data <- nyc_housing_count_data %>%
select("placeName", "Date.Count_HousingUnit", "Value.Count_HousingUnit")
nyc_housing_count_data <- nyc_housing_count_data %>%
rename(
zipcode = placeName,
year = Date.Count_HousingUnit,
total_units = Value.Count_HousingUnit
)
head(nyc_housing_count_data)
## zipcode year total_units
## 1 10001 2011 12161
## 2 10001 2012 12473
## 3 10001 2013 12617
## 4 10001 2014 12638
## 5 10001 2015 13305
## 6 10001 2016 13520
#need only one row from here and that is region = New York city, New York
colnames(nyc_population_2020_2023_data)
## [1] "Geographic.Area" "April.1..2020.Estimates.Base"
## [3] "X2020" "X2021"
## [5] "X2022" "X2023"
nyc_population_2020_2023_data <- nyc_population_2020_2023_data %>%
select("Geographic.Area", "X2020", "X2021", "X2022", "X2023")
nyc_population_2020_2023_data <- nyc_population_2020_2023_data %>%
rename(
region = Geographic.Area,
'2020' = X2020,
'2021' = X2021,
'2022' = X2022,
'2023' = X2023
)
#print(nyc_population_2020_2023_data)
colnames(nyc_population_count_data)
## [1] "placeDcid" "placeName" "Date.Count_Person"
## [4] "Value.Count_Person" "Source.Count_Person"
nyc_population_count_data <- nyc_population_count_data %>%
select("placeName", "Date.Count_Person", "Value.Count_Person")
nyc_population_count_data <- nyc_population_count_data %>%
rename(
zipcode = placeName,
year = Date.Count_Person,
population_total = Value.Count_Person
)
head(nyc_population_count_data)
## zipcode year population_total
## 1 10001 2011 21097
## 2 10001 2012 20579
## 3 10001 2013 21966
## 4 10001 2014 22767
## 5 10001 2015 23537
## 6 10001 2016 23332
Transforming and merging nyc population
library(tidyr)
nyc_population_2020_2023_data_filtered <- nyc_population_2020_2023_data %>%
filter(region == "New York city, New York")
#print(nyc_population_2020_2023_data_filtered)
nyc_population2020_2023_data <- nyc_population_2020_2023_data_filtered %>%
pivot_longer(
cols = `2020`:`2023`,
names_to = "year",
values_to = "total_population"
)
#there is no need for the region column
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
select("year", "total_population")
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
mutate(year = as.integer(year))
#print(nyc_population2020_2023_data)
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
mutate(total_population = gsub("[^0-9]", "", total_population))
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
mutate(total_population = as.integer(total_population))
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
filter(year %in% c(2021, 2022, 2023))
#print(nyc_population2020_2023_data)
total_population_by_year <- nyc_population_count_data %>%
group_by(year) %>%
summarize(total_population = sum(population_total, na.rm = TRUE))
total_population_by_year <- total_population_by_year %>%
mutate(year = as.integer(year))
total_population_by_year_filtered <- total_population_by_year %>%
filter(!year %in% c(2021, 2022))
#print(total_population_by_year_filtered)
merged_population_data <- bind_rows(total_population_by_year_filtered, nyc_population2020_2023_data)
head(merged_population_data)
## # A tibble: 6 Ă— 2
## year total_population
## <int> <int>
## 1 2011 8122397
## 2 2012 8192973
## 3 2013 8262986
## 4 2014 8348866
## 5 2015 8420477
## 6 2016 8455824
graphing the total population by year from 2011 - 2023
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
ggplot(merged_population_data, aes(x = year, y = total_population)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks = merged_population_data$year) +
scale_y_continuous(breaks = seq(8000000, 8700000, by = 100000)) +
labs(
title = "NYC Total Population 2011 - 2023",
x = "Year",
y = "Total Population"
)
Let me explain why I dropped the years 2021 - 2022 from the
total_population_by_year df. The total population did not make sense nor
was it consistent with the actual data and what was happening during
that time. This was during the pandemic and people were leaving the city
to either go upstate, back to other states where they were originally
from to work remotely, or completley moved to other states where the
cost of living was cheaper. During this time, landlords were struggling
to find tenants as rent prices did drop and businesses were struggling
to find employees. This was due to the laws of supply and demand. There
was a shock to the renter’s market where it couldnt handle the shock of
all these people leaving at once. The other dataframe that had the
population from 2020 - 2023 was more accurate and consistent with other
data i found online. That is why I did the filtering and merging of the
two datasets the way I did. I would like to find data dating all the way
back to the 1900’s for nyc population, but I found it was difficult to
find this data.
ggplot(median_nyc_income, aes(x = year, y = nyc_median_income)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks = median_nyc_income$year) +
labs(
title = "NYC median income 2011 - 2022",
x = "Year",
y = "Median Income"
)
#print(median_nyc_income)
I do not like this dataset because it does not capture the upper middle class income that is 80k and above. Especially the 6 figure incomes that I am aware are indeed outliers on the national level as well as in nyc. However, the oppurtunity of these kinds of income jobs are available here in nyc than any other part of the country that is why I would like to capture that data. Has that played a role in the rent price boom over the years. Especially during the height of the new era of internet tech companies. The rise of 4G made a big difference in our lives at the start of 2010s and completley changed our worlds. We can see the fast rise in income from the start at 2013 until now. However, I wanted to capture a dataset that had information for the years prior to 2011. Perhaps maybe all the way back to 1980s. That was difficult for me to find as well as government sites are not the friendliest to naviagate. Has the rise of the median household income a direct result of the minimum age rise as well under the DeBlasio administration? Or the big rise in tech jobs all across the city as the demand for tech jobs were hot and the recruitment firms for tech jobs made a killing on their revenue placing these tech workers. I can say that the rise in 2018 to 2020 was when the job market was hot and there was a lot of hiring during that time. This was due to the Tax cuts and jobs Act that lowered personal income tax, and corporate income tax. There was also regulation in doing business overseas in the financial sector, therefore a lot of dollars were moved here into the USA. We also so a second wave boom during the pandemic when it was the employees market vs the employor market. The employee had more negotiating power than the employer, and employers were having a hard time hiring. People were switiching jobs for more pay (including myself) during the pandemic. Companies were willing to compensate employees more not just give a minimum wage rate.
So comparing this chart to the population chart, we see the rise in wages but a decrease in population. Lets now start working with the suppy of housing data and do a comparison analysis.
grouping the data by year
nyc_housing_count_data <- nyc_housing_count_data %>%
group_by(year) %>%
summarize(total_units = sum(total_units, na.rm = TRUE))
head(nyc_housing_count_data)
## # A tibble: 6 Ă— 2
## year total_units
## <int> <int>
## 1 2011 3354535
## 2 2012 3369168
## 3 2013 3378223
## 4 2014 3405660
## 5 2015 3419891
## 6 2016 3433873
ggplot(nyc_housing_count_data, aes(x = year, y = total_units)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks = median_nyc_income$year) +
labs(
title = "NYC Total Units 2011 - 2022",
x = "Year",
y = "Total Units"
)
So far we see an increase in units, increase in wages, but a decrease in population.
Lets take a look at the median rent data. This one will be a bit tricky to work with as we are given the median rent per zipcode
colnames(nyc_median_rent_data)
## [1] "RegionID" "SizeRank" "RegionName" "RegionType" "StateName"
## [6] "State" "City" "Metro" "CountyName" "X1.31.2015"
## [11] "X2.28.2015" "X3.31.2015" "X4.30.2015" "X5.31.2015" "X6.30.2015"
## [16] "X7.31.2015" "X8.31.2015" "X9.30.2015" "X10.31.2015" "X11.30.2015"
## [21] "X12.31.2015" "X1.31.2016" "X2.29.2016" "X3.31.2016" "X4.30.2016"
## [26] "X5.31.2016" "X6.30.2016" "X7.31.2016" "X8.31.2016" "X9.30.2016"
## [31] "X10.31.2016" "X11.30.2016" "X12.31.2016" "X1.31.2017" "X2.28.2017"
## [36] "X3.31.2017" "X4.30.2017" "X5.31.2017" "X6.30.2017" "X7.31.2017"
## [41] "X8.31.2017" "X9.30.2017" "X10.31.2017" "X11.30.2017" "X12.31.2017"
## [46] "X1.31.2018" "X2.28.2018" "X3.31.2018" "X4.30.2018" "X5.31.2018"
## [51] "X6.30.2018" "X7.31.2018" "X8.31.2018" "X9.30.2018" "X10.31.2018"
## [56] "X11.30.2018" "X12.31.2018" "X1.31.2019" "X2.28.2019" "X3.31.2019"
## [61] "X4.30.2019" "X5.31.2019" "X6.30.2019" "X7.31.2019" "X8.31.2019"
## [66] "X9.30.2019" "X10.31.2019" "X11.30.2019" "X12.31.2019" "X1.31.2020"
## [71] "X2.29.2020" "X3.31.2020" "X4.30.2020" "X5.31.2020" "X6.30.2020"
## [76] "X7.31.2020" "X8.31.2020" "X9.30.2020" "X10.31.2020" "X11.30.2020"
## [81] "X12.31.2020" "X1.31.2021" "X2.28.2021" "X3.31.2021" "X4.30.2021"
## [86] "X5.31.2021" "X6.30.2021" "X7.31.2021" "X8.31.2021" "X9.30.2021"
## [91] "X10.31.2021" "X11.30.2021" "X12.31.2021" "X1.31.2022" "X2.28.2022"
## [96] "X3.31.2022" "X4.30.2022" "X5.31.2022" "X6.30.2022" "X7.31.2022"
## [101] "X8.31.2022" "X9.30.2022" "X10.31.2022" "X11.30.2022" "X12.31.2022"
## [106] "X1.31.2023" "X2.28.2023" "X3.31.2023" "X4.30.2023" "X5.31.2023"
## [111] "X6.30.2023" "X7.31.2023" "X8.31.2023" "X9.30.2023" "X10.31.2023"
## [116] "X11.30.2023" "X12.31.2023" "X1.31.2024" "X2.29.2024" "X3.31.2024"
## [121] "X4.30.2024" "X5.31.2024" "X6.30.2024" "X7.31.2024" "X8.31.2024"
nyc_zipcodes_vector <- unlist(nyc_zipcodes_list)
filtered_rent_data <- nyc_median_rent_data %>%
filter(RegionName %in% nyc_zipcodes_vector)
head(filtered_rent_data)
## RegionID SizeRank RegionName RegionType StateName State City
## 1 62080 4 11368 zip NY NY New York
## 2 62093 7 11385 zip NY NY New York
## 3 62019 9 11208 zip NY NY New York
## 4 62046 16 11236 zip NY NY New York
## 5 61807 17 10467 zip NY NY New York
## 6 62085 18 11373 zip NY NY New York
## Metro CountyName X1.31.2015 X2.28.2015
## 1 New York-Newark-Jersey City, NY-NJ-PA Queens County NA NA
## 2 New York-Newark-Jersey City, NY-NJ-PA Queens County NA NA
## 3 New York-Newark-Jersey City, NY-NJ-PA Kings County NA NA
## 4 New York-Newark-Jersey City, NY-NJ-PA Kings County NA NA
## 5 New York-Newark-Jersey City, NY-NJ-PA Bronx County NA NA
## 6 New York-Newark-Jersey City, NY-NJ-PA Queens County NA NA
## X3.31.2015 X4.30.2015 X5.31.2015 X6.30.2015 X7.31.2015 X8.31.2015 X9.30.2015
## 1 NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## X10.31.2015 X11.30.2015 X12.31.2015 X1.31.2016 X2.29.2016 X3.31.2016
## 1 NA NA NA NA NA NA
## 2 NA NA NA NA NA NA
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 NA NA NA NA NA NA
## 6 NA NA NA NA NA NA
## X4.30.2016 X5.31.2016 X6.30.2016 X7.31.2016 X8.31.2016 X9.30.2016 X10.31.2016
## 1 NA NA NA NA NA NA NA
## 2 NA 2266.140 2292.023 2320.788 2343.339 2337.12 2345.339
## 3 NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA
## 5 NA 1349.679 1327.656 1351.738 1313.098 1331.15 1317.193
## 6 NA NA NA NA NA NA NA
## X11.30.2016 X12.31.2016 X1.31.2017 X2.28.2017 X3.31.2017 X4.30.2017
## 1 NA NA NA NA NA NA
## 2 2320.083 2326.450 2310.855 2313.101 2316.134 2309.757
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1307.569 1294.087 1354.713 1381.230 1443.584 1427.754
## 6 NA NA NA NA NA NA
## X5.31.2017 X6.30.2017 X7.31.2017 X8.31.2017 X9.30.2017 X10.31.2017
## 1 NA NA NA NA NA NA
## 2 2336.22 2327.477 2349.503 2354.110 2393.023 2363.952
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 NA 1419.295 1427.326 1391.902 1388.167 1402.915
## 6 NA NA NA NA NA NA
## X11.30.2017 X12.31.2017 X1.31.2018 X2.28.2018 X3.31.2018 X4.30.2018
## 1 NA NA NA NA NA NA
## 2 2360.901 2336.269 2331.220 2314.654 2317.821 2337.118
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1443.141 1448.317 1415.071 1447.512 1475.749 1550.592
## 6 NA NA NA NA NA NA
## X5.31.2018 X6.30.2018 X7.31.2018 X8.31.2018 X9.30.2018 X10.31.2018
## 1 NA NA NA NA NA NA
## 2 2352.811 2362.330 2367.660 2376.613 2380.886 2402.277
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1557.876 1554.092 1551.874 1552.840 1580.368 1576.065
## 6 NA NA NA NA NA NA
## X11.30.2018 X12.31.2018 X1.31.2019 X2.28.2019 X3.31.2019 X4.30.2019
## 1 NA NA NA NA NA NA
## 2 2393.917 2397.149 2374.795 2397.457 2381.838 2398.852
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1554.715 1593.641 1588.411 1600.744 1578.046 1572.738
## 6 NA NA NA NA NA NA
## X5.31.2019 X6.30.2019 X7.31.2019 X8.31.2019 X9.30.2019 X10.31.2019
## 1 NA NA NA NA NA NA
## 2 2381.123 2440.723 2471.034 2495.005 2476.095 2464.924
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1587.081 1568.751 1613.196 1620.662 1685.971 1699.779
## 6 NA NA NA NA NA NA
## X11.30.2019 X12.31.2019 X1.31.2020 X2.29.2020 X3.31.2020 X4.30.2020
## 1 NA NA NA NA NA NA
## 2 2488.45 2498.628 2509.859 2515.646 2501.732 2475.083
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1695.05 1669.128 1649.561 1665.102 1668.471 1653.317
## 6 NA NA NA NA NA NA
## X5.31.2020 X6.30.2020 X7.31.2020 X8.31.2020 X9.30.2020 X10.31.2020
## 1 NA NA NA NA NA NA
## 2 2475.563 2469.365 2475.826 2427.843 2401.526 2368.533
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1612.794 1598.750 1655.226 1699.814 1745.707 1724.810
## 6 NA NA NA NA NA NA
## X11.30.2020 X12.31.2020 X1.31.2021 X2.28.2021 X3.31.2021 X4.30.2021
## 1 NA NA NA NA NA NA
## 2 2337.224 2305.878 2292.884 2277.712 2282.339 2278.929
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1774.450 1725.417 1744.514 1701.264 1737.235 1690.341
## 6 NA NA NA NA NA 1872.391
## X5.31.2021 X6.30.2021 X7.31.2021 X8.31.2021 X9.30.2021 X10.31.2021
## 1 NA NA NA NA NA NA
## 2 2322.021 2368.377 2420.586 2449.848 2482.230 2518.714
## 3 NA NA NA NA NA NA
## 4 NA NA NA NA NA NA
## 5 1715.512 1725.934 1751.695 1742.938 1772.303 1807.374
## 6 1875.980 1908.656 1930.109 1949.615 1966.494 1910.506
## X11.30.2021 X12.31.2021 X1.31.2022 X2.28.2022 X3.31.2022 X4.30.2022
## 1 NA NA NA NA NA NA
## 2 2556.784 2580.724 2592.727 2591.473 2607.010 2657.466
## 3 NA NA NA NA 2432.634 2468.666
## 4 NA NA NA NA NA NA
## 5 1812.226 1747.475 1728.976 1785.026 1831.988 1886.640
## 6 1934.053 1923.586 1927.948 1969.994 2023.199 2100.825
## X5.31.2022 X6.30.2022 X7.31.2022 X8.31.2022 X9.30.2022 X10.31.2022
## 1 NA NA NA NA NA NA
## 2 2727.357 2830.789 2893.427 2951.086 2957.775 2966.494
## 3 2553.099 2571.974 2541.777 2549.192 2533.571 2596.307
## 4 2328.869 2376.032 2347.321 2400.155 2397.660 2524.344
## 5 1881.483 1904.170 1937.096 1944.353 1931.878 1893.956
## 6 2107.368 2138.288 2166.511 2151.894 2144.004 2141.909
## X11.30.2022 X12.31.2022 X1.31.2023 X2.28.2023 X3.31.2023 X4.30.2023
## 1 NA NA NA NA NA NA
## 2 2937.695 2898.732 2851.205 2847.664 2865.429 2908.589
## 3 2624.750 2660.759 2683.873 2685.560 2657.591 2672.800
## 4 2502.490 2582.495 2432.282 2429.336 2391.820 2451.315
## 5 1891.447 1899.311 1929.648 1945.849 1958.650 1909.791
## 6 2163.435 2181.641 2213.842 2226.211 2214.526 2217.911
## X5.31.2023 X6.30.2023 X7.31.2023 X8.31.2023 X9.30.2023 X10.31.2023
## 1 NA NA NA NA NA NA
## 2 2950.697 2986.174 3004.259 3021.107 3039.225 3042.726
## 3 2699.051 2825.592 2817.253 2797.145 2773.578 2815.385
## 4 2501.957 2598.722 2642.102 2672.396 2715.256 2700.885
## 5 1963.970 2049.137 2084.984 2100.269 2182.744 2238.053
## 6 2218.745 2253.525 2261.120 2295.725 2275.544 2302.135
## X11.30.2023 X12.31.2023 X1.31.2024 X2.29.2024 X3.31.2024 X4.30.2024
## 1 NA NA NA NA NA NA
## 2 3022.294 2985.509 2976.598 2971.280 3022.816 3032.851
## 3 2934.593 2977.254 3015.684 2987.130 2963.105 2895.445
## 4 2743.458 2753.950 2848.811 2899.215 2977.176 2935.365
## 5 2317.342 2243.914 2269.451 2233.032 2244.841 2303.026
## 6 2293.275 2290.069 2283.986 2273.511 2281.456 2302.458
## X5.31.2024 X6.30.2024 X7.31.2024 X8.31.2024
## 1 NA NA 2322.418 2282.500
## 2 3102.032 3132.088 3203.117 3199.751
## 3 2817.406 2858.719 2816.037 2895.667
## 4 2959.498 2847.246 2862.954 2976.389
## 5 2282.180 2309.057 2294.031 2271.611
## 6 2377.143 2439.551 2491.626 2476.333
#creating a new df and taking the median of each date column
date_columns <- colnames(filtered_rent_data)[grepl("^X", colnames(filtered_rent_data))]
average_rent_nyc_df <- data.frame(
Date = gsub("^X", "", date_columns), # Remove the "X" from the column names to get the date
Median = sapply(filtered_rent_data[, date_columns], median, na.rm = TRUE) # Compute the median for each date column
)
average_rent_nyc_df$Mean <- sapply(filtered_rent_data[, date_columns], mean, na.rm = TRUE)
average_rent_nyc_df$Date <- as.Date(average_rent_nyc_df$Date, format = "%m.%d.%Y")
head(average_rent_nyc_df)
## Date Median Mean
## X1.31.2015 2015-01-31 2732.498 2727.223
## X2.28.2015 2015-02-28 2706.051 2726.790
## X3.31.2015 2015-03-31 2662.657 2722.277
## X4.30.2015 2015-04-30 2664.783 2725.026
## X5.31.2015 2015-05-31 2694.550 2733.493
## X6.30.2015 2015-06-30 2723.866 2739.232
ggplot(average_rent_nyc_df, aes(x = Date, y = Median)) +
geom_line() + # Line plot to connect the points
geom_point() + # Add points at each date
labs(title = "Median Rent in NYC Over Time",
x = "Date",
y = "Median Rent")
ggplot(average_rent_nyc_df, aes(x = Date, y = Mean)) +
geom_line() + # Line plot to connect the points
geom_point() + # Add points at each date
labs(title = "Mean Rent in NYC Over Time",
x = "Date",
y = "Mean Rent")
long_average_rent_nyc_df <- pivot_longer(average_rent_nyc_df,
cols = c("Median", "Mean"),
names_to = "Statistic",
values_to = "Value")
ggplot(long_average_rent_nyc_df, aes(x = Statistic, y = Value)) +
geom_boxplot() +
labs(title = "Box Plot of Median and Mean Rent Values",
x = "Statistic",
y = "Rent Values")
#colnames(filtered_rent_data) <- ifelse(grepl("^X", colnames(filtered_rent_data)), gsub("^X", "", #colnames(filtered_rent_data)), colnames(filtered_rent_data))
#nyc_median_rent_data <- nyc_median_rent_data %>%
# select("placeName", "Date.Monthly_Median_GrossRent_HousingUnit", "Value.Monthly_Median_GrossRent_HousingUnit")
#nyc_median_rent_data <- nyc_median_rent_data %>%
# rename(
# zipcode = placeName,
# year = Date.Monthly_Median_GrossRent_HousingUnit,
# median_gross_rent = Value.Monthly_Median_GrossRent_HousingUnit
# )
#print(nyc_zipcodes_list)
#head(nyc_median_rent_data)
long_rent_data <- pivot_longer(filtered_rent_data,
cols = starts_with("X"),
names_to = "Date",
values_to = "Rent")
long_rent_data$Date <- gsub("^X", "", long_rent_data$Date)
# Convert the Date column to proper date format
long_rent_data$Date <- as.Date(long_rent_data$Date, format = "%m.%d.%Y")
ggplot(long_rent_data, aes(x = Date, y = Rent)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "black") +
labs(title = "Scatterplot of Rent Values Over Time for all zipcodes in nyc",
x = "Date",
y = "Rent Values") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6575 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6575 rows containing missing values or values outside the scale range
## (`geom_point()`).
Final Thoughts after doing this data. Since the median rent price in nyc reached over $2800 in nyc and the median income is under $40,000, then more than half the population is homeless. But somehow that is not the case. What my data shows is a broad view of what is going on in the rent prices. Except obviously for my last chart where I decided to graph everything by brute force and see whats been going on per zipcode in all five of the boroughs. What I would like to talk about is my findings here. Despite the popultion decrease and the increase in the supply in units, there is an increase in the mean rent prices on a month by month basis. I want to hypothesize there is some market manipulation going on and I wouldn’t even account inflation as the problem here. The dollar is the strongest currency in the world, despite what many people think about money printing since the pandemic, we actually shrunk our money supply since then. Please check the M2 money supply on the FRED website to confirm.
We did have a lot of people return to nyc and return to the office since the start of 2023. However we still got a decrease in population. I would even argue a decrease in white colar wages, although I would need to add a 7th data set here to confirm. I am making the case for the decrease in wages because I was unemployed for 3 months this year, and I was applying, the jobs postings had a less hourly rate than they did back during the pandemic or even in 2019. If I were to go into more debth for this I would go look into vacant apartment units vs nonvacant in 2022-2024. The reason for this is that I want to confirm what is being told on the news with landlords keeping units vacant on purpose in order to drive up the rent prices. I would also try to look for salaries of white colar workers in nyc to see if they are also driving up the rent prices.
I want to add one final note about something I observed in real life that is not part of the data. Over the past 5 years I have seen many new construction of high rise buildings all around queens and brooklyn. I would also belive that the dataset for the total units are being under counted in the city.