Abstract: I know the assignment asked for three data sets but I am using 6. I wanted to look at specifically rent increase over the past few years in nyc. I would add a 6th data set and that is cpi or use the inflation data if rent is not a part of the cpi calculation. The goal that I am trying to accomplish in this project is artificial inflation in the rent prices. In the past three years, nyc has experienced a population decrease. So why has rent increased if the basic principals of supply and demand are to take effect and rent prices are supposed to drop as we seen in the pandemic. Is there market manipulation happening here in the rent prices? My theory is that there is an extreme amount of market manipulation. I cannot form a conclusion in my analysis in this project because this could take months upon months of research. I will however try to get an idea and form a hypothesis at the end of analysis here. Over the past couple of years, nyc has seen an increase of new residential buildings being built, especially empty and vacated land area. So I will ask the question again, why is it that rent increases when there is a decrease in population in nyc residents that is including legal and illegal residents, along with an increase in apartment units in this city if the laws of supply and demand are to be applied. Are we in a fair and free market? Let’s find out.

Part 1: Reading the data

median_income_nyc_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/Median%20individual%20income%20in%20New%20York%20City.csv"
median_nyc_income <- read.csv(median_income_nyc_url)
head(median_nyc_income)
##     Entity.DCID Entity.properties.isoCode Entity.properties.name
## 1 geoId/3651000                        NA          New York City
## 2 geoId/3651000                        NA          New York City
## 3 geoId/3651000                        NA          New York City
## 4 geoId/3651000                        NA          New York City
## 5 geoId/3651000                        NA          New York City
## 6 geoId/3651000                        NA          New York City
##          Variable.DCID Variable.observation.date
## 1 Median_Income_Person                      2011
## 2 Median_Income_Person                      2012
## 3 Median_Income_Person                      2013
## 4 Median_Income_Person                      2014
## 5 Median_Income_Person                      2015
## 6 Median_Income_Person                      2016
##   Variable.observation.metadata.importName
## 1                     CensusACS5YearSurvey
## 2                     CensusACS5YearSurvey
## 3                     CensusACS5YearSurvey
## 4                     CensusACS5YearSurvey
## 5                     CensusACS5YearSurvey
## 6                     CensusACS5YearSurvey
##                          Variable.observation.metadata.provenanceUrl
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
##   Variable.observation.metadata.scalingFactor
## 1                                          NA
## 2                                          NA
## 3                                          NA
## 4                                          NA
## 5                                          NA
## 6                                          NA
##   Variable.observation.metadata.unit
## 1                           USDollar
## 2                           USDollar
## 3                           USDollar
## 4                           USDollar
## 5                           USDollar
## 6                           USDollar
##   Variable.observation.metadata.unitDisplayName Variable.observation.value
## 1                                           USD                      27252
## 2                                           USD                      27422
## 3                                           USD                      27486
## 4                                           USD                      27683
## 5                                           USD                      28043
## 6                                           USD                      29187
##        Variable.properties.name
## 1 Median Income of a Population
## 2 Median Income of a Population
## 3 Median Income of a Population
## 4 Median Income of a Population
## 5 Median Income of a Population
## 6 Median Income of a Population
nyc_housing_count_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/myc_housing_count.csv"
nyc_housing_count_data <-read.csv(nyc_housing_count_url)
head(nyc_housing_count_data)
##   placeDcid placeName Date.Count_HousingUnit Value.Count_HousingUnit
## 1 zip/10001     10001                   2011                   12161
## 2 zip/10001     10001                   2012                   12473
## 3 zip/10001     10001                   2013                   12617
## 4 zip/10001     10001                   2014                   12638
## 5 zip/10001     10001                   2015                   13305
## 6 zip/10001     10001                   2016                   13520
##                                             Source.Count_HousingUnit
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
nyc_population_2020_2023_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_population_2020-2023.csv"
nyc_population_2020_2023_data <- read.csv(nyc_population_2020_2023_url)
head(nyc_population_2020_2023_data)
##             Geographic.Area April.1..2020.Estimates.Base  X2020   X2021   X2022
## 1   Adams village, New York                        1,694  1,688   1,698   1,686
## 2 Addison village, New York                        1,608  1,604   1,589   1,576
## 3   Afton village, New York                          801    800     793     792
## 4 Airmont village, New York                       10,135 10,065  10,133  10,148
## 5   Akron village, New York                        2,912  2,909   2,903   2,893
## 6     Albany city, New York                       99,239 98,682 100,710 100,884
##     X2023
## 1   1,678
## 2   1,564
## 3     786
## 4  10,163
## 5   2,883
## 6 101,228
nyc_population_count_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_population_count.csv"
nyc_population_count_data <- read.csv(nyc_population_count_url)
head(nyc_population_count_data)
##   placeDcid placeName Date.Count_Person Value.Count_Person
## 1 zip/10001     10001              2011              21097
## 2 zip/10001     10001              2012              20579
## 3 zip/10001     10001              2013              21966
## 4 zip/10001     10001              2014              22767
## 5 zip/10001     10001              2015              23537
## 6 zip/10001     10001              2016              23332
##                                                  Source.Count_Person
## 1 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 2 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 3 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 4 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 5 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
## 6 https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html
median_rent_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/Zip_zori_uc_sfrcondomfr_sm_month.csv"
nyc_median_rent_data <- read.csv(median_rent_url)
head(nyc_median_rent_data)
##   RegionID SizeRank RegionName RegionType StateName State     City
## 1    91982        1      77494        zip        TX    TX     Katy
## 2    91940        3      77449        zip        TX    TX     Katy
## 3    62080        4      11368        zip        NY    NY New York
## 4    91733        5      77084        zip        TX    TX  Houston
## 5    93144        6      79936        zip        TX    TX  El Paso
## 6    62093        7      11385        zip        NY    NY New York
##                                   Metro       CountyName X1.31.2015 X2.28.2015
## 1  Houston-The Woodlands-Sugar Land, TX Fort Bend County   1608.145   1612.486
## 2  Houston-The Woodlands-Sugar Land, TX    Harris County   1323.000   1318.618
## 3 New York-Newark-Jersey City, NY-NJ-PA    Queens County         NA         NA
## 4  Houston-The Woodlands-Sugar Land, TX    Harris County   1249.562   1246.504
## 5                           El Paso, TX   El Paso County         NA         NA
## 6 New York-Newark-Jersey City, NY-NJ-PA    Queens County         NA         NA
##   X3.31.2015 X4.30.2015 X5.31.2015 X6.30.2015 X7.31.2015 X8.31.2015 X9.30.2015
## 1   1622.156   1629.322   1635.142   1637.402   1644.772   1641.979   1637.055
## 2   1326.726   1337.962   1348.856   1356.630   1359.682   1365.164   1368.848
## 3         NA         NA         NA         NA         NA         NA         NA
## 4   1261.545   1276.478   1287.994   1291.040   1289.377   1295.035   1299.514
## 5         NA         NA         NA         NA         NA         NA         NA
## 6         NA         NA         NA         NA         NA         NA         NA
##   X10.31.2015 X11.30.2015 X12.31.2015 X1.31.2016 X2.29.2016 X3.31.2016
## 1    1625.837    1618.156    1596.582   1589.528   1576.256   1588.424
## 2    1370.367    1367.053    1363.083   1356.252   1356.841   1365.354
## 3          NA          NA          NA         NA         NA         NA
## 4    1295.799    1292.824    1275.386   1289.026   1283.337   1300.826
## 5          NA          NA          NA         NA         NA         NA
## 6          NA          NA          NA         NA         NA         NA
##   X4.30.2016 X5.31.2016 X6.30.2016 X7.31.2016 X8.31.2016 X9.30.2016 X10.31.2016
## 1   1577.117   1577.575   1573.725   1574.991   1572.879   1561.637    1552.095
## 2   1375.994   1380.825   1372.580   1367.023   1366.093   1367.075    1369.247
## 3         NA         NA         NA         NA         NA         NA          NA
## 4   1292.692   1292.858   1284.914   1289.592   1297.683   1299.804    1298.137
## 5         NA         NA         NA         NA         NA         NA          NA
## 6         NA   2266.140   2292.023   2320.788   2343.339   2337.120    2345.339
##   X11.30.2016 X12.31.2016 X1.31.2017 X2.28.2017 X3.31.2017 X4.30.2017
## 1    1534.646    1520.193   1519.930   1538.621   1556.170   1563.519
## 2    1360.313    1353.082   1346.414   1353.121   1361.972   1370.222
## 3          NA          NA         NA         NA         NA         NA
## 4    1295.004    1298.242   1291.994   1292.831   1298.754   1306.661
## 5          NA          NA         NA         NA         NA         NA
## 6    2320.083    2326.450   2310.855   2313.101   2316.134   2309.757
##   X5.31.2017 X6.30.2017 X7.31.2017 X8.31.2017 X9.30.2017 X10.31.2017
## 1   1570.041   1577.817   1577.463   1566.557   1573.932    1587.188
## 2   1372.884   1376.444   1383.182   1384.290   1390.200    1388.765
## 3         NA         NA         NA         NA         NA          NA
## 4   1314.942   1319.403   1326.232   1328.778   1344.557    1354.943
## 5         NA         NA         NA         NA         NA          NA
## 6   2336.220   2327.477   2349.503   2354.110   2393.023    2363.952
##   X11.30.2017 X12.31.2017 X1.31.2018 X2.28.2018 X3.31.2018 X4.30.2018
## 1    1602.971    1603.653   1613.512   1621.200   1625.153   1622.228
## 2    1396.765    1400.185   1413.150   1419.085   1420.260   1423.940
## 3          NA          NA         NA         NA         NA         NA
## 4    1364.352    1350.103   1347.248   1345.481   1347.513   1357.129
## 5          NA          NA         NA         NA         NA         NA
## 6    2360.901    2336.269   2331.220   2314.654   2317.821   2337.118
##   X5.31.2018 X6.30.2018 X7.31.2018 X8.31.2018 X9.30.2018 X10.31.2018
## 1   1619.678   1619.904   1613.590   1601.379   1586.974    1581.381
## 2   1428.269   1436.775   1435.404   1436.542   1431.220    1428.874
## 3         NA         NA         NA         NA         NA          NA
## 4   1360.322   1368.881   1363.022   1368.367   1370.118    1366.982
## 5         NA         NA         NA         NA         NA          NA
## 6   2352.811   2362.330   2367.660   2376.613   2380.886    2402.277
##   X11.30.2018 X12.31.2018 X1.31.2019 X2.28.2019 X3.31.2019 X4.30.2019
## 1    1586.398    1592.021   1602.337   1607.856   1619.725   1624.987
## 2    1435.280    1432.779   1432.576   1427.195   1435.062   1439.857
## 3          NA          NA         NA         NA         NA         NA
## 4    1376.080    1377.496   1380.147   1366.159   1376.160   1392.823
## 5          NA          NA         NA         NA         NA         NA
## 6    2393.917    2397.149   2374.795   2397.457   2381.838   2398.852
##   X5.31.2019 X6.30.2019 X7.31.2019 X8.31.2019 X9.30.2019 X10.31.2019
## 1   1626.211   1623.705   1623.966   1633.453   1629.237    1628.901
## 2   1449.515   1454.263   1460.957   1455.115   1455.808    1453.984
## 3         NA         NA         NA         NA         NA          NA
## 4   1413.991   1413.849   1419.157   1415.652   1410.156    1400.941
## 5         NA         NA         NA         NA         NA          NA
## 6   2381.123   2440.723   2471.034   2495.005   2476.095    2464.924
##   X11.30.2019 X12.31.2019 X1.31.2020 X2.29.2020 X3.31.2020 X4.30.2020
## 1    1623.093    1626.621   1631.052   1638.967   1644.800   1654.749
## 2    1451.602    1452.760   1458.677   1471.609   1483.701   1485.524
## 3          NA          NA         NA         NA         NA         NA
## 4    1400.391    1405.868   1409.353   1416.134   1426.521   1429.873
## 5          NA          NA         NA         NA         NA         NA
## 6    2488.450    2498.628   2509.859   2515.646   2501.732   2475.083
##   X5.31.2020 X6.30.2020 X7.31.2020 X8.31.2020 X9.30.2020 X10.31.2020
## 1   1647.455   1648.247   1639.324   1649.510   1662.275    1659.069
## 2   1476.320   1477.038   1482.860   1500.223   1509.817    1515.313
## 3         NA         NA         NA         NA         NA          NA
## 4   1435.425   1440.077   1449.173   1453.679   1461.254    1472.444
## 5         NA         NA         NA         NA         NA          NA
## 6   2475.563   2469.365   2475.826   2427.843   2401.526    2368.533
##   X11.30.2020 X12.31.2020 X1.31.2021 X2.28.2021 X3.31.2021 X4.30.2021
## 1    1668.940    1670.079   1690.808   1692.998   1700.745   1717.851
## 2    1523.964    1535.151   1542.668   1547.833   1554.156   1578.005
## 3          NA          NA         NA         NA         NA         NA
## 4    1467.932    1467.848   1460.713   1468.667   1473.275   1488.798
## 5          NA          NA         NA         NA         NA         NA
## 6    2337.224    2305.878   2292.884   2277.712   2282.339   2278.929
##   X5.31.2021 X6.30.2021 X7.31.2021 X8.31.2021 X9.30.2021 X10.31.2021
## 1   1750.980   1795.362   1834.038   1863.494   1878.153    1891.734
## 2   1616.404   1651.964   1693.315   1707.366   1721.099    1731.629
## 3         NA         NA         NA         NA         NA          NA
## 4   1519.331   1551.723   1607.096   1660.267   1706.364    1695.565
## 5   1169.709   1172.574   1172.414   1178.988   1200.197    1220.085
## 6   2322.021   2368.377   2420.586   2449.848   2482.230    2518.714
##   X11.30.2021 X12.31.2021 X1.31.2022 X2.28.2022 X3.31.2022 X4.30.2022
## 1    1908.323    1918.317   1924.789   1929.261   1938.351   1960.820
## 2    1730.442    1747.786   1744.426   1779.906   1794.948   1806.075
## 3          NA          NA         NA         NA         NA         NA
## 4    1673.322    1675.055   1696.101   1709.500   1708.509   1719.433
## 5    1239.429    1242.151   1242.191   1242.710   1266.113   1275.779
## 6    2556.784    2580.724   2592.727   2591.473   2607.010   2657.466
##   X5.31.2022 X6.30.2022 X7.31.2022 X8.31.2022 X9.30.2022 X10.31.2022
## 1   1985.426   2011.009   2028.075   2031.906   2031.043    2026.256
## 2   1789.815   1785.972   1825.483   1875.844   1877.390    1863.603
## 3         NA         NA         NA         NA         NA          NA
## 4   1746.314   1768.846   1782.872   1769.804   1763.618    1749.751
## 5   1284.918   1301.894   1340.722   1369.575   1384.930    1369.386
## 6   2727.357   2830.789   2893.427   2951.086   2957.775    2966.494
##   X11.30.2022 X12.31.2022 X1.31.2023 X2.28.2023 X3.31.2023 X4.30.2023
## 1    2007.671    2003.009   2016.930   2027.761   2040.195   2021.600
## 2    1837.675    1844.636   1837.186   1840.907   1854.599   1870.026
## 3          NA          NA         NA         NA         NA         NA
## 4    1753.449    1746.937   1755.166   1752.060   1764.965   1767.850
## 5    1378.816    1383.020   1417.562   1431.443   1431.708   1425.782
## 6    2937.695    2898.732   2851.205   2847.664   2865.429   2908.589
##   X5.31.2023 X6.30.2023 X7.31.2023 X8.31.2023 X9.30.2023 X10.31.2023
## 1   2031.902   2039.850   2050.557   2056.771   2056.602    2053.728
## 2   1876.601   1884.102   1880.272   1891.205   1889.575    1880.798
## 3         NA         NA         NA         NA         NA          NA
## 4   1784.556   1797.168   1797.136   1802.712   1799.171    1828.528
## 5   1430.076   1429.599   1450.118   1452.019   1456.963    1459.902
## 6   2950.697   2986.174   3004.259   3021.107   3039.225    3042.726
##   X11.30.2023 X12.31.2023 X1.31.2024 X2.29.2024 X3.31.2024 X4.30.2024
## 1    2031.526    2017.766   2006.277   2026.201   2039.152   2064.044
## 2    1875.237    1883.844   1897.293   1909.784   1919.376   1929.857
## 3          NA          NA         NA         NA         NA         NA
## 4    1825.755    1830.712   1814.003   1820.641   1822.235   1846.918
## 5    1460.857    1478.183   1479.396   1479.012   1471.360   1469.008
## 6    3022.294    2985.509   2976.598   2971.280   3022.816   3032.851
##   X5.31.2024 X6.30.2024 X7.31.2024 X8.31.2024
## 1   2067.249   2082.409   2086.030   2096.459
## 2   1924.842   1925.223   1937.730   1949.967
## 3         NA         NA   2322.418   2282.500
## 4   1859.796   1866.253   1859.417   1855.383
## 5   1484.655   1501.267   1514.330   1504.006
## 6   3102.032   3132.088   3203.117   3199.751
nyc_zip_code_url <- "https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_zipcodes.csv"
nyc_zip_code_list <- read.csv(nyc_zip_code_url, header = FALSE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on
## 'https://raw.githubusercontent.com/Ahmed572/Data607_HW_Assignments_Ahmed_Hassan/refs/heads/master/data/nyc_data/nyc_zipcodes.csv'
nyc_zipcodes_list <- as.list(as.data.frame(t(nyc_zip_code_list)))

#print(nyc_zipcodes_list)

Filtering columns/Renaming columns

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
colnames(median_nyc_income)
##  [1] "Entity.DCID"                                  
##  [2] "Entity.properties.isoCode"                    
##  [3] "Entity.properties.name"                       
##  [4] "Variable.DCID"                                
##  [5] "Variable.observation.date"                    
##  [6] "Variable.observation.metadata.importName"     
##  [7] "Variable.observation.metadata.provenanceUrl"  
##  [8] "Variable.observation.metadata.scalingFactor"  
##  [9] "Variable.observation.metadata.unit"           
## [10] "Variable.observation.metadata.unitDisplayName"
## [11] "Variable.observation.value"                   
## [12] "Variable.properties.name"
median_nyc_income <- median_nyc_income %>%
  select("Variable.observation.date", "Variable.observation.value")

median_nyc_income <- median_nyc_income %>%
  rename(
    year = Variable.observation.date,
    nyc_median_income = Variable.observation.value
  )

#print(median_nyc_income)



colnames(nyc_housing_count_data)
## [1] "placeDcid"                "placeName"               
## [3] "Date.Count_HousingUnit"   "Value.Count_HousingUnit" 
## [5] "Source.Count_HousingUnit"
nyc_housing_count_data <- nyc_housing_count_data %>%
  select("placeName", "Date.Count_HousingUnit", "Value.Count_HousingUnit")
nyc_housing_count_data <- nyc_housing_count_data %>%
  rename(
    zipcode = placeName,
    year = Date.Count_HousingUnit,
    total_units = Value.Count_HousingUnit
  )
head(nyc_housing_count_data)
##   zipcode year total_units
## 1   10001 2011       12161
## 2   10001 2012       12473
## 3   10001 2013       12617
## 4   10001 2014       12638
## 5   10001 2015       13305
## 6   10001 2016       13520
#need only one row from here and that is region = New York city, New York
colnames(nyc_population_2020_2023_data)
## [1] "Geographic.Area"              "April.1..2020.Estimates.Base"
## [3] "X2020"                        "X2021"                       
## [5] "X2022"                        "X2023"
nyc_population_2020_2023_data <- nyc_population_2020_2023_data %>%
  select("Geographic.Area", "X2020", "X2021", "X2022", "X2023")

nyc_population_2020_2023_data <- nyc_population_2020_2023_data %>%
  rename(
    region = Geographic.Area,
    '2020' = X2020,
    '2021' = X2021,
    '2022' = X2022,
    '2023' = X2023 
  )
#print(nyc_population_2020_2023_data)


colnames(nyc_population_count_data)
## [1] "placeDcid"           "placeName"           "Date.Count_Person"  
## [4] "Value.Count_Person"  "Source.Count_Person"
nyc_population_count_data <- nyc_population_count_data %>%
  select("placeName", "Date.Count_Person", "Value.Count_Person")

nyc_population_count_data <- nyc_population_count_data %>%
  rename(
    zipcode = placeName, 
    year = Date.Count_Person, 
    population_total = Value.Count_Person
  )

head(nyc_population_count_data)
##   zipcode year population_total
## 1   10001 2011            21097
## 2   10001 2012            20579
## 3   10001 2013            21966
## 4   10001 2014            22767
## 5   10001 2015            23537
## 6   10001 2016            23332

Transforming and merging nyc population

library(tidyr)


nyc_population_2020_2023_data_filtered <- nyc_population_2020_2023_data %>%
  filter(region == "New York city, New York")

#print(nyc_population_2020_2023_data_filtered)


nyc_population2020_2023_data <- nyc_population_2020_2023_data_filtered %>%
  pivot_longer(
    cols = `2020`:`2023`, 
    names_to = "year",    
    values_to = "total_population"  
  )

#there is no need for the region column 
nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
  select("year", "total_population")


nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
  mutate(year = as.integer(year))

#print(nyc_population2020_2023_data)


nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
  mutate(total_population = gsub("[^0-9]", "", total_population))

nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
  mutate(total_population = as.integer(total_population))


nyc_population2020_2023_data <- nyc_population2020_2023_data %>%
  filter(year %in% c(2021, 2022, 2023))


#print(nyc_population2020_2023_data)





total_population_by_year <- nyc_population_count_data %>%
  group_by(year) %>%
  summarize(total_population = sum(population_total, na.rm = TRUE))


total_population_by_year <- total_population_by_year %>%
  mutate(year = as.integer(year))


total_population_by_year_filtered <- total_population_by_year %>%
  filter(!year %in% c(2021, 2022))

#print(total_population_by_year_filtered)

merged_population_data <- bind_rows(total_population_by_year_filtered, nyc_population2020_2023_data)

head(merged_population_data)
## # A tibble: 6 Ă— 2
##    year total_population
##   <int>            <int>
## 1  2011          8122397
## 2  2012          8192973
## 3  2013          8262986
## 4  2014          8348866
## 5  2015          8420477
## 6  2016          8455824

graphing the total population by year from 2011 - 2023

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
ggplot(merged_population_data, aes(x = year, y = total_population)) +
  geom_line() +
  geom_point() +  
  scale_x_continuous(breaks = merged_population_data$year) +
  scale_y_continuous(breaks = seq(8000000, 8700000, by = 100000)) +  
  labs(
    title = "NYC Total Population 2011 - 2023",
    x = "Year",
    y = "Total Population"
  ) 

Let me explain why I dropped the years 2021 - 2022 from the total_population_by_year df. The total population did not make sense nor was it consistent with the actual data and what was happening during that time. This was during the pandemic and people were leaving the city to either go upstate, back to other states where they were originally from to work remotely, or completley moved to other states where the cost of living was cheaper. During this time, landlords were struggling to find tenants as rent prices did drop and businesses were struggling to find employees. This was due to the laws of supply and demand. There was a shock to the renter’s market where it couldnt handle the shock of all these people leaving at once. The other dataframe that had the population from 2020 - 2023 was more accurate and consistent with other data i found online. That is why I did the filtering and merging of the two datasets the way I did. I would like to find data dating all the way back to the 1900’s for nyc population, but I found it was difficult to find this data.

ggplot(median_nyc_income, aes(x = year, y = nyc_median_income)) +
  geom_line() +
  geom_point() +  
  scale_x_continuous(breaks = median_nyc_income$year) +
  labs(
    title = "NYC median income 2011 - 2022",
    x = "Year",
    y = "Median Income"
  ) 

#print(median_nyc_income)

I do not like this dataset because it does not capture the upper middle class income that is 80k and above. Especially the 6 figure incomes that I am aware are indeed outliers on the national level as well as in nyc. However, the oppurtunity of these kinds of income jobs are available here in nyc than any other part of the country that is why I would like to capture that data. Has that played a role in the rent price boom over the years. Especially during the height of the new era of internet tech companies. The rise of 4G made a big difference in our lives at the start of 2010s and completley changed our worlds. We can see the fast rise in income from the start at 2013 until now. However, I wanted to capture a dataset that had information for the years prior to 2011. Perhaps maybe all the way back to 1980s. That was difficult for me to find as well as government sites are not the friendliest to naviagate. Has the rise of the median household income a direct result of the minimum age rise as well under the DeBlasio administration? Or the big rise in tech jobs all across the city as the demand for tech jobs were hot and the recruitment firms for tech jobs made a killing on their revenue placing these tech workers. I can say that the rise in 2018 to 2020 was when the job market was hot and there was a lot of hiring during that time. This was due to the Tax cuts and jobs Act that lowered personal income tax, and corporate income tax. There was also regulation in doing business overseas in the financial sector, therefore a lot of dollars were moved here into the USA. We also so a second wave boom during the pandemic when it was the employees market vs the employor market. The employee had more negotiating power than the employer, and employers were having a hard time hiring. People were switiching jobs for more pay (including myself) during the pandemic. Companies were willing to compensate employees more not just give a minimum wage rate.

So comparing this chart to the population chart, we see the rise in wages but a decrease in population. Lets now start working with the suppy of housing data and do a comparison analysis.

grouping the data by year

nyc_housing_count_data <- nyc_housing_count_data %>%
  group_by(year) %>%
  summarize(total_units = sum(total_units, na.rm = TRUE))


head(nyc_housing_count_data)
## # A tibble: 6 Ă— 2
##    year total_units
##   <int>       <int>
## 1  2011     3354535
## 2  2012     3369168
## 3  2013     3378223
## 4  2014     3405660
## 5  2015     3419891
## 6  2016     3433873
ggplot(nyc_housing_count_data, aes(x = year, y = total_units)) +
  geom_line() +
  geom_point() +  
  scale_x_continuous(breaks = median_nyc_income$year) +
  labs(
    title = "NYC Total Units 2011 - 2022",
    x = "Year",
    y = "Total Units"
  ) 

So far we see an increase in units, increase in wages, but a decrease in population.

Lets take a look at the median rent data. This one will be a bit tricky to work with as we are given the median rent per zipcode

colnames(nyc_median_rent_data)
##   [1] "RegionID"    "SizeRank"    "RegionName"  "RegionType"  "StateName"  
##   [6] "State"       "City"        "Metro"       "CountyName"  "X1.31.2015" 
##  [11] "X2.28.2015"  "X3.31.2015"  "X4.30.2015"  "X5.31.2015"  "X6.30.2015" 
##  [16] "X7.31.2015"  "X8.31.2015"  "X9.30.2015"  "X10.31.2015" "X11.30.2015"
##  [21] "X12.31.2015" "X1.31.2016"  "X2.29.2016"  "X3.31.2016"  "X4.30.2016" 
##  [26] "X5.31.2016"  "X6.30.2016"  "X7.31.2016"  "X8.31.2016"  "X9.30.2016" 
##  [31] "X10.31.2016" "X11.30.2016" "X12.31.2016" "X1.31.2017"  "X2.28.2017" 
##  [36] "X3.31.2017"  "X4.30.2017"  "X5.31.2017"  "X6.30.2017"  "X7.31.2017" 
##  [41] "X8.31.2017"  "X9.30.2017"  "X10.31.2017" "X11.30.2017" "X12.31.2017"
##  [46] "X1.31.2018"  "X2.28.2018"  "X3.31.2018"  "X4.30.2018"  "X5.31.2018" 
##  [51] "X6.30.2018"  "X7.31.2018"  "X8.31.2018"  "X9.30.2018"  "X10.31.2018"
##  [56] "X11.30.2018" "X12.31.2018" "X1.31.2019"  "X2.28.2019"  "X3.31.2019" 
##  [61] "X4.30.2019"  "X5.31.2019"  "X6.30.2019"  "X7.31.2019"  "X8.31.2019" 
##  [66] "X9.30.2019"  "X10.31.2019" "X11.30.2019" "X12.31.2019" "X1.31.2020" 
##  [71] "X2.29.2020"  "X3.31.2020"  "X4.30.2020"  "X5.31.2020"  "X6.30.2020" 
##  [76] "X7.31.2020"  "X8.31.2020"  "X9.30.2020"  "X10.31.2020" "X11.30.2020"
##  [81] "X12.31.2020" "X1.31.2021"  "X2.28.2021"  "X3.31.2021"  "X4.30.2021" 
##  [86] "X5.31.2021"  "X6.30.2021"  "X7.31.2021"  "X8.31.2021"  "X9.30.2021" 
##  [91] "X10.31.2021" "X11.30.2021" "X12.31.2021" "X1.31.2022"  "X2.28.2022" 
##  [96] "X3.31.2022"  "X4.30.2022"  "X5.31.2022"  "X6.30.2022"  "X7.31.2022" 
## [101] "X8.31.2022"  "X9.30.2022"  "X10.31.2022" "X11.30.2022" "X12.31.2022"
## [106] "X1.31.2023"  "X2.28.2023"  "X3.31.2023"  "X4.30.2023"  "X5.31.2023" 
## [111] "X6.30.2023"  "X7.31.2023"  "X8.31.2023"  "X9.30.2023"  "X10.31.2023"
## [116] "X11.30.2023" "X12.31.2023" "X1.31.2024"  "X2.29.2024"  "X3.31.2024" 
## [121] "X4.30.2024"  "X5.31.2024"  "X6.30.2024"  "X7.31.2024"  "X8.31.2024"
nyc_zipcodes_vector <- unlist(nyc_zipcodes_list)

filtered_rent_data <- nyc_median_rent_data %>%
  filter(RegionName %in% nyc_zipcodes_vector)

head(filtered_rent_data)
##   RegionID SizeRank RegionName RegionType StateName State     City
## 1    62080        4      11368        zip        NY    NY New York
## 2    62093        7      11385        zip        NY    NY New York
## 3    62019        9      11208        zip        NY    NY New York
## 4    62046       16      11236        zip        NY    NY New York
## 5    61807       17      10467        zip        NY    NY New York
## 6    62085       18      11373        zip        NY    NY New York
##                                   Metro    CountyName X1.31.2015 X2.28.2015
## 1 New York-Newark-Jersey City, NY-NJ-PA Queens County         NA         NA
## 2 New York-Newark-Jersey City, NY-NJ-PA Queens County         NA         NA
## 3 New York-Newark-Jersey City, NY-NJ-PA  Kings County         NA         NA
## 4 New York-Newark-Jersey City, NY-NJ-PA  Kings County         NA         NA
## 5 New York-Newark-Jersey City, NY-NJ-PA  Bronx County         NA         NA
## 6 New York-Newark-Jersey City, NY-NJ-PA Queens County         NA         NA
##   X3.31.2015 X4.30.2015 X5.31.2015 X6.30.2015 X7.31.2015 X8.31.2015 X9.30.2015
## 1         NA         NA         NA         NA         NA         NA         NA
## 2         NA         NA         NA         NA         NA         NA         NA
## 3         NA         NA         NA         NA         NA         NA         NA
## 4         NA         NA         NA         NA         NA         NA         NA
## 5         NA         NA         NA         NA         NA         NA         NA
## 6         NA         NA         NA         NA         NA         NA         NA
##   X10.31.2015 X11.30.2015 X12.31.2015 X1.31.2016 X2.29.2016 X3.31.2016
## 1          NA          NA          NA         NA         NA         NA
## 2          NA          NA          NA         NA         NA         NA
## 3          NA          NA          NA         NA         NA         NA
## 4          NA          NA          NA         NA         NA         NA
## 5          NA          NA          NA         NA         NA         NA
## 6          NA          NA          NA         NA         NA         NA
##   X4.30.2016 X5.31.2016 X6.30.2016 X7.31.2016 X8.31.2016 X9.30.2016 X10.31.2016
## 1         NA         NA         NA         NA         NA         NA          NA
## 2         NA   2266.140   2292.023   2320.788   2343.339    2337.12    2345.339
## 3         NA         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA         NA          NA
## 5         NA   1349.679   1327.656   1351.738   1313.098    1331.15    1317.193
## 6         NA         NA         NA         NA         NA         NA          NA
##   X11.30.2016 X12.31.2016 X1.31.2017 X2.28.2017 X3.31.2017 X4.30.2017
## 1          NA          NA         NA         NA         NA         NA
## 2    2320.083    2326.450   2310.855   2313.101   2316.134   2309.757
## 3          NA          NA         NA         NA         NA         NA
## 4          NA          NA         NA         NA         NA         NA
## 5    1307.569    1294.087   1354.713   1381.230   1443.584   1427.754
## 6          NA          NA         NA         NA         NA         NA
##   X5.31.2017 X6.30.2017 X7.31.2017 X8.31.2017 X9.30.2017 X10.31.2017
## 1         NA         NA         NA         NA         NA          NA
## 2    2336.22   2327.477   2349.503   2354.110   2393.023    2363.952
## 3         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA          NA
## 5         NA   1419.295   1427.326   1391.902   1388.167    1402.915
## 6         NA         NA         NA         NA         NA          NA
##   X11.30.2017 X12.31.2017 X1.31.2018 X2.28.2018 X3.31.2018 X4.30.2018
## 1          NA          NA         NA         NA         NA         NA
## 2    2360.901    2336.269   2331.220   2314.654   2317.821   2337.118
## 3          NA          NA         NA         NA         NA         NA
## 4          NA          NA         NA         NA         NA         NA
## 5    1443.141    1448.317   1415.071   1447.512   1475.749   1550.592
## 6          NA          NA         NA         NA         NA         NA
##   X5.31.2018 X6.30.2018 X7.31.2018 X8.31.2018 X9.30.2018 X10.31.2018
## 1         NA         NA         NA         NA         NA          NA
## 2   2352.811   2362.330   2367.660   2376.613   2380.886    2402.277
## 3         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA          NA
## 5   1557.876   1554.092   1551.874   1552.840   1580.368    1576.065
## 6         NA         NA         NA         NA         NA          NA
##   X11.30.2018 X12.31.2018 X1.31.2019 X2.28.2019 X3.31.2019 X4.30.2019
## 1          NA          NA         NA         NA         NA         NA
## 2    2393.917    2397.149   2374.795   2397.457   2381.838   2398.852
## 3          NA          NA         NA         NA         NA         NA
## 4          NA          NA         NA         NA         NA         NA
## 5    1554.715    1593.641   1588.411   1600.744   1578.046   1572.738
## 6          NA          NA         NA         NA         NA         NA
##   X5.31.2019 X6.30.2019 X7.31.2019 X8.31.2019 X9.30.2019 X10.31.2019
## 1         NA         NA         NA         NA         NA          NA
## 2   2381.123   2440.723   2471.034   2495.005   2476.095    2464.924
## 3         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA          NA
## 5   1587.081   1568.751   1613.196   1620.662   1685.971    1699.779
## 6         NA         NA         NA         NA         NA          NA
##   X11.30.2019 X12.31.2019 X1.31.2020 X2.29.2020 X3.31.2020 X4.30.2020
## 1          NA          NA         NA         NA         NA         NA
## 2     2488.45    2498.628   2509.859   2515.646   2501.732   2475.083
## 3          NA          NA         NA         NA         NA         NA
## 4          NA          NA         NA         NA         NA         NA
## 5     1695.05    1669.128   1649.561   1665.102   1668.471   1653.317
## 6          NA          NA         NA         NA         NA         NA
##   X5.31.2020 X6.30.2020 X7.31.2020 X8.31.2020 X9.30.2020 X10.31.2020
## 1         NA         NA         NA         NA         NA          NA
## 2   2475.563   2469.365   2475.826   2427.843   2401.526    2368.533
## 3         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA          NA
## 5   1612.794   1598.750   1655.226   1699.814   1745.707    1724.810
## 6         NA         NA         NA         NA         NA          NA
##   X11.30.2020 X12.31.2020 X1.31.2021 X2.28.2021 X3.31.2021 X4.30.2021
## 1          NA          NA         NA         NA         NA         NA
## 2    2337.224    2305.878   2292.884   2277.712   2282.339   2278.929
## 3          NA          NA         NA         NA         NA         NA
## 4          NA          NA         NA         NA         NA         NA
## 5    1774.450    1725.417   1744.514   1701.264   1737.235   1690.341
## 6          NA          NA         NA         NA         NA   1872.391
##   X5.31.2021 X6.30.2021 X7.31.2021 X8.31.2021 X9.30.2021 X10.31.2021
## 1         NA         NA         NA         NA         NA          NA
## 2   2322.021   2368.377   2420.586   2449.848   2482.230    2518.714
## 3         NA         NA         NA         NA         NA          NA
## 4         NA         NA         NA         NA         NA          NA
## 5   1715.512   1725.934   1751.695   1742.938   1772.303    1807.374
## 6   1875.980   1908.656   1930.109   1949.615   1966.494    1910.506
##   X11.30.2021 X12.31.2021 X1.31.2022 X2.28.2022 X3.31.2022 X4.30.2022
## 1          NA          NA         NA         NA         NA         NA
## 2    2556.784    2580.724   2592.727   2591.473   2607.010   2657.466
## 3          NA          NA         NA         NA   2432.634   2468.666
## 4          NA          NA         NA         NA         NA         NA
## 5    1812.226    1747.475   1728.976   1785.026   1831.988   1886.640
## 6    1934.053    1923.586   1927.948   1969.994   2023.199   2100.825
##   X5.31.2022 X6.30.2022 X7.31.2022 X8.31.2022 X9.30.2022 X10.31.2022
## 1         NA         NA         NA         NA         NA          NA
## 2   2727.357   2830.789   2893.427   2951.086   2957.775    2966.494
## 3   2553.099   2571.974   2541.777   2549.192   2533.571    2596.307
## 4   2328.869   2376.032   2347.321   2400.155   2397.660    2524.344
## 5   1881.483   1904.170   1937.096   1944.353   1931.878    1893.956
## 6   2107.368   2138.288   2166.511   2151.894   2144.004    2141.909
##   X11.30.2022 X12.31.2022 X1.31.2023 X2.28.2023 X3.31.2023 X4.30.2023
## 1          NA          NA         NA         NA         NA         NA
## 2    2937.695    2898.732   2851.205   2847.664   2865.429   2908.589
## 3    2624.750    2660.759   2683.873   2685.560   2657.591   2672.800
## 4    2502.490    2582.495   2432.282   2429.336   2391.820   2451.315
## 5    1891.447    1899.311   1929.648   1945.849   1958.650   1909.791
## 6    2163.435    2181.641   2213.842   2226.211   2214.526   2217.911
##   X5.31.2023 X6.30.2023 X7.31.2023 X8.31.2023 X9.30.2023 X10.31.2023
## 1         NA         NA         NA         NA         NA          NA
## 2   2950.697   2986.174   3004.259   3021.107   3039.225    3042.726
## 3   2699.051   2825.592   2817.253   2797.145   2773.578    2815.385
## 4   2501.957   2598.722   2642.102   2672.396   2715.256    2700.885
## 5   1963.970   2049.137   2084.984   2100.269   2182.744    2238.053
## 6   2218.745   2253.525   2261.120   2295.725   2275.544    2302.135
##   X11.30.2023 X12.31.2023 X1.31.2024 X2.29.2024 X3.31.2024 X4.30.2024
## 1          NA          NA         NA         NA         NA         NA
## 2    3022.294    2985.509   2976.598   2971.280   3022.816   3032.851
## 3    2934.593    2977.254   3015.684   2987.130   2963.105   2895.445
## 4    2743.458    2753.950   2848.811   2899.215   2977.176   2935.365
## 5    2317.342    2243.914   2269.451   2233.032   2244.841   2303.026
## 6    2293.275    2290.069   2283.986   2273.511   2281.456   2302.458
##   X5.31.2024 X6.30.2024 X7.31.2024 X8.31.2024
## 1         NA         NA   2322.418   2282.500
## 2   3102.032   3132.088   3203.117   3199.751
## 3   2817.406   2858.719   2816.037   2895.667
## 4   2959.498   2847.246   2862.954   2976.389
## 5   2282.180   2309.057   2294.031   2271.611
## 6   2377.143   2439.551   2491.626   2476.333
#creating a new df and taking the median of each date column 

date_columns <- colnames(filtered_rent_data)[grepl("^X", colnames(filtered_rent_data))]

average_rent_nyc_df <- data.frame(
  Date = gsub("^X", "", date_columns),  # Remove the "X" from the column names to get the date
  Median = sapply(filtered_rent_data[, date_columns], median, na.rm = TRUE)  # Compute the median for each date column
)
average_rent_nyc_df$Mean <- sapply(filtered_rent_data[, date_columns], mean, na.rm = TRUE)


average_rent_nyc_df$Date <- as.Date(average_rent_nyc_df$Date, format = "%m.%d.%Y")

head(average_rent_nyc_df)
##                  Date   Median     Mean
## X1.31.2015 2015-01-31 2732.498 2727.223
## X2.28.2015 2015-02-28 2706.051 2726.790
## X3.31.2015 2015-03-31 2662.657 2722.277
## X4.30.2015 2015-04-30 2664.783 2725.026
## X5.31.2015 2015-05-31 2694.550 2733.493
## X6.30.2015 2015-06-30 2723.866 2739.232
ggplot(average_rent_nyc_df, aes(x = Date, y = Median)) +
  geom_line() +  # Line plot to connect the points
  geom_point() +  # Add points at each date
  labs(title = "Median Rent in NYC Over Time",
       x = "Date",
       y = "Median Rent")

ggplot(average_rent_nyc_df, aes(x = Date, y = Mean)) +
  geom_line() +  # Line plot to connect the points
  geom_point() +  # Add points at each date
  labs(title = "Mean Rent in NYC Over Time",
       x = "Date",
       y = "Mean Rent")

long_average_rent_nyc_df <- pivot_longer(average_rent_nyc_df, 
                               cols = c("Median", "Mean"), 
                               names_to = "Statistic", 
                               values_to = "Value")

ggplot(long_average_rent_nyc_df, aes(x = Statistic, y = Value)) +
  geom_boxplot() +
  labs(title = "Box Plot of Median and Mean Rent Values",
       x = "Statistic",
       y = "Rent Values")

#colnames(filtered_rent_data) <- ifelse(grepl("^X", colnames(filtered_rent_data)), gsub("^X", "", #colnames(filtered_rent_data)), colnames(filtered_rent_data))





#nyc_median_rent_data <- nyc_median_rent_data %>%
#  select("placeName", "Date.Monthly_Median_GrossRent_HousingUnit", "Value.Monthly_Median_GrossRent_HousingUnit")
#nyc_median_rent_data <- nyc_median_rent_data %>%
#  rename(
#    zipcode = placeName,
#    year = Date.Monthly_Median_GrossRent_HousingUnit,
#    median_gross_rent = Value.Monthly_Median_GrossRent_HousingUnit
#  )


#print(nyc_zipcodes_list)

#head(nyc_median_rent_data)
long_rent_data <- pivot_longer(filtered_rent_data, 
                               cols = starts_with("X"), 
                               names_to = "Date", 
                               values_to = "Rent")

long_rent_data$Date <- gsub("^X", "", long_rent_data$Date)

# Convert the Date column to proper date format
long_rent_data$Date <- as.Date(long_rent_data$Date, format = "%m.%d.%Y")


ggplot(long_rent_data, aes(x = Date, y = Rent)) +
  geom_point(color = "blue") + 
  geom_smooth(method = "lm", se = FALSE, color = "black") + 
  labs(title = "Scatterplot of Rent Values Over Time for all zipcodes in nyc",
       x = "Date",
       y = "Rent Values") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6575 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6575 rows containing missing values or values outside the scale range
## (`geom_point()`).

Final Thoughts after doing this data. Since the median rent price in nyc reached over $2800 in nyc and the median income is under $40,000, then more than half the population is homeless. But somehow that is not the case. What my data shows is a broad view of what is going on in the rent prices. Except obviously for my last chart where I decided to graph everything by brute force and see whats been going on per zipcode in all five of the boroughs. What I would like to talk about is my findings here. Despite the popultion decrease and the increase in the supply in units, there is an increase in the mean rent prices on a month by month basis. I want to hypothesize there is some market manipulation going on and I wouldn’t even account inflation as the problem here. The dollar is the strongest currency in the world, despite what many people think about money printing since the pandemic, we actually shrunk our money supply since then. Please check the M2 money supply on the FRED website to confirm.

We did have a lot of people return to nyc and return to the office since the start of 2023. However we still got a decrease in population. I would even argue a decrease in white colar wages, although I would need to add a 7th data set here to confirm. I am making the case for the decrease in wages because I was unemployed for 3 months this year, and I was applying, the jobs postings had a less hourly rate than they did back during the pandemic or even in 2019. If I were to go into more debth for this I would go look into vacant apartment units vs nonvacant in 2022-2024. The reason for this is that I want to confirm what is being told on the news with landlords keeping units vacant on purpose in order to drive up the rent prices. I would also try to look for salaries of white colar workers in nyc to see if they are also driving up the rent prices.

I want to add one final note about something I observed in real life that is not part of the data. Over the past 5 years I have seen many new construction of high rise buildings all around queens and brooklyn. I would also belive that the dataset for the total units are being under counted in the city.