This Zillow dataset caught my interest because it has many interesting features and good depth. I think it shows how the numbers differ state by state before and after 2008. I downloaded it from the Zillow Research data.
I built some illustrative data visualizations with short analyses, and I also fit a linear regression on New York house prices.
library(dplyr)
library(ggplot2)
library(reshape)
## Warning: package 'reshape' was built under R version 4.1.2
library(wesanderson)
## Warning: package 'wesanderson' was built under R version 4.1.2
library(stringr)
## Warning: package 'stringr' was built under R version 4.1.2
library(doBy)
## Warning: package 'doBy' was built under R version 4.1.2
library(plotly)
## Warning: package 'plotly' was built under R version 4.1.2
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.1.2
library(RColorBrewer)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.1.2
data=read.csv("C:/Users/neeya/Downloads/State_time_series.csv")
sprintf("The data set has %d rows and %d columns", nrow(data), ncol(data) )
## [1] "The data set has 13212 rows and 82 columns"
The data set has 13212 rows and 82 columns. We will get an overview of it using str() and summary().
We find:
str(data)
## 'data.frame': 13212 obs. of 82 variables:
## $ Date : chr "1996-04-30" "1996-04-30" "1996-04-30" "1996-04-30" ...
## $ RegionName : chr "Alabama" "Arizona" "Arkansas" "California" ...
## $ DaysOnZillow_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ InventorySeasonallyAdjusted_AllHomes : int NA NA NA NA NA NA NA NA NA NA ...
## $ InventoryRaw_AllHomes : int NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_1Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_2Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_3Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_4Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_5BedroomOrMore : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_DuplexTriplex : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPricePerSqft_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_1Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_2Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_3Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_4Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_5BedroomOrMore : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_DuplexTriplex : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianListingPrice_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPctOfPriceReduction_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPctOfPriceReduction_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPctOfPriceReduction_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPriceCutDollar_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPriceCutDollar_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianPriceCutDollar_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_1Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_2Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_3Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_4Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_5BedroomOrMore : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_DuplexTriplex : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPricePerSqft_Studio : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_1Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_2Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_3Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_4Bedroom : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_5BedroomOrMore : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_DuplexTriplex : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_MultiFamilyResidence5PlusUnits : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ MedianRentalPrice_Studio : num NA NA NA NA NA NA NA NA NA NA ...
## $ ZHVIPerSqft_AllHomes : int 50 62 42 102 82 85 71 56 55 185 ...
## $ PctOfHomesDecreasingInValues_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfHomesIncreasingInValues_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfHomesSellingForGain_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfHomesSellingForLoss_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductionsSeasAdj_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductionsSeasAdj_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence: num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductions_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductions_CondoCoop : num NA NA NA NA NA NA NA NA NA NA ...
## $ PctOfListingsWithPriceReductions_SingleFamilyResidence : num NA NA NA NA NA NA NA NA NA NA ...
## $ PriceToRentRatio_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ Sale_Counts : num NA NA NA NA NA NA NA NA NA NA ...
## $ Sale_Counts_Seas_Adj : num NA NA NA NA NA NA NA NA NA NA ...
## $ Sale_Prices : num NA NA NA NA NA NA NA NA NA NA ...
## $ ZHVI_1bedroom : int 61500 59200 53000 93700 77800 64700 90100 45400 74900 152300 ...
## $ ZHVI_2bedroom : int 48900 86400 54500 123400 97500 97000 88200 65400 64700 186600 ...
## $ ZHVI_3bedroom : int 78200 96100 76800 150900 129000 130400 103500 89100 88000 231800 ...
## $ ZHVI_4bedroom : int 146500 128400 135100 196100 176100 194800 157800 133600 149700 303400 ...
## $ ZHVI_5BedroomOrMore : int 206300 190500 186000 265300 212900 299800 176100 199900 212800 345500 ...
## $ ZHVI_AllHomes : int 79500 103600 64400 157900 128100 132000 106800 86300 92000 227400 ...
## $ ZHVI_BottomTier : int 45600 67100 38400 95100 82700 83700 77200 52500 57200 144500 ...
## $ ZHVI_CondoCoop : int 99500 78900 70300 136100 99400 85000 NA 70600 89300 177000 ...
## $ ZHVI_MiddleTier : int 79500 103600 64400 157900 128100 132000 106800 86300 92000 227400 ...
## $ ZHVI_SingleFamilyResidence : int 79000 107500 64500 162000 133600 141000 107400 92100 92400 262600 ...
## $ ZHVI_TopTier : int 140200 168700 115200 270600 209300 231600 161600 155300 163900 374700 ...
## $ ZRI_AllHomes : int NA NA NA NA NA NA NA NA NA NA ...
## $ ZRI_AllHomesPlusMultifamily : int NA NA NA NA NA NA NA NA NA NA ...
## $ ZriPerSqft_AllHomes : num NA NA NA NA NA NA NA NA NA NA ...
## $ Zri_MultiFamilyResidenceRental : int NA NA NA NA NA NA NA NA NA NA ...
## $ Zri_SingleFamilyResidenceRental : int NA NA NA NA NA NA NA NA NA NA ...
summary(data)
## Date RegionName DaysOnZillow_AllHomes
## Length:13212 Length:13212 Min. : 49.25
## Class :character Class :character 1st Qu.: 90.25
## Mode :character Mode :character Median :108.50
## Mean :110.12
## 3rd Qu.:126.75
## Max. :251.62
## NA's :8367
## InventorySeasonallyAdjusted_AllHomes InventoryRaw_AllHomes
## Min. : 972 Min. : 911
## 1st Qu.: 9828 1st Qu.: 9756
## Median : 21713 Median : 21289
## Mean : 33293 Mean : 33299
## 3rd Qu.: 47453 3rd Qu.: 46891
## Max. :260687 Max. :268055
## NA's :8316 NA's :8316
## MedianListingPricePerSqft_1Bedroom MedianListingPricePerSqft_2Bedroom
## Min. : 57.14 Min. : 60.00
## 1st Qu.:125.69 1st Qu.: 92.16
## Median :162.75 Median :121.30
## Mean :182.47 Mean :135.49
## 3rd Qu.:202.63 3rd Qu.:152.24
## Max. :627.55 Max. :550.64
## NA's :9626 NA's :8678
## MedianListingPricePerSqft_3Bedroom MedianListingPricePerSqft_4Bedroom
## Min. : 56.48 Min. : 61.8
## 1st Qu.: 93.28 1st Qu.: 99.6
## Median :116.63 Median :119.8
## Mean :129.45 Mean :133.5
## 3rd Qu.:143.19 3rd Qu.:144.8
## Max. :460.46 Max. :480.8
## NA's :8605 NA's :8535
## MedianListingPricePerSqft_5BedroomOrMore MedianListingPricePerSqft_AllHomes
## Min. : 63.78 Min. : 62.14
## 1st Qu.: 99.08 1st Qu.: 96.01
## Median :119.75 Median :120.58
## Mean :135.67 Mean :136.66
## 3rd Qu.:149.35 3rd Qu.:153.44
## Max. :617.96 Max. :520.72
## NA's :8643 NA's :8538
## MedianListingPricePerSqft_CondoCoop MedianListingPricePerSqft_DuplexTriplex
## Min. : 61.92 Min. : 32.14
## 1st Qu.: 113.08 1st Qu.: 60.89
## Median : 141.00 Median : 81.21
## Mean : 163.55 Mean : 97.09
## 3rd Qu.: 177.07 3rd Qu.:113.52
## Max. :1000.00 Max. :446.43
## NA's :9063 NA's :9248
## MedianListingPricePerSqft_SingleFamilyResidence MedianListingPrice_1Bedroom
## Min. : 63.27 Min. : 49900
## 1st Qu.: 95.55 1st Qu.: 99000
## Median :120.09 Median :130000
## Mean :133.37 Mean :147083
## 3rd Qu.:149.84 3rd Qu.:169900
## Max. :475.36 Max. :399000
## NA's :8573 NA's :10205
## MedianListingPrice_2Bedroom MedianListingPrice_3Bedroom
## Min. : 57000 Min. :109900
## 1st Qu.:109500 1st Qu.:149000
## Median :147000 Median :189900
## Mean :158873 Mean :209226
## 3rd Qu.:179900 3rd Qu.:240000
## Max. :599000 Max. :687000
## NA's :8839 NA's :8842
## MedianListingPrice_4Bedroom MedianListingPrice_5BedroomOrMore
## Min. :169000 Min. : 159900
## 1st Qu.:238745 1st Qu.: 310000
## Median :283020 Median : 369700
## Mean :310850 Mean : 416375
## 3rd Qu.:339900 3rd Qu.: 452173
## Max. :950000 Max. :1847500
## NA's :8876 NA's :8989
## MedianListingPrice_AllHomes MedianListingPrice_CondoCoop
## Min. :112944 Min. : 82500
## 1st Qu.:159900 1st Qu.:152363
## Median :209000 Median :184900
## Mean :223379 Mean :202333
## 3rd Qu.:259900 3rd Qu.:228000
## Max. :610000 Max. :754500
## NA's :8966 NA's :9402
## MedianListingPrice_DuplexTriplex MedianListingPrice_SingleFamilyResidence
## Min. : 64900 Min. :112900
## 1st Qu.:129900 1st Qu.:159900
## Median :178900 Median :209900
## Mean :207475 Mean :228170
## 3rd Qu.:245000 3rd Qu.:265000
## Max. :939000 Max. :725000
## NA's :9323 NA's :9082
## MedianPctOfPriceReduction_AllHomes MedianPctOfPriceReduction_CondoCoop
## Min. :1.744 Min. : 1.676
## 1st Qu.:3.239 1st Qu.: 3.261
## Median :3.718 Median : 3.807
## Mean :3.848 Mean : 4.015
## 3rd Qu.:4.350 3rd Qu.: 4.561
## Max. :8.340 Max. :10.000
## NA's :8724 NA's :9340
## MedianPctOfPriceReduction_SingleFamilyResidence MedianPriceCutDollar_AllHomes
## Min. :1.716 Min. : 5000
## 1st Qu.:3.233 1st Qu.: 5100
## Median :3.737 Median : 7500
## Mean :3.847 Mean : 8034
## 3rd Qu.:4.349 3rd Qu.:10000
## Max. :8.347 Max. :24000
## NA's :8724 NA's :8724
## MedianPriceCutDollar_CondoCoop MedianPriceCutDollar_SingleFamilyResidence
## Min. : 2050 Min. : 5000
## 1st Qu.: 5000 1st Qu.: 5300
## Median : 6800 Median : 7900
## Mean : 7453 Mean : 8245
## 3rd Qu.:10000 3rd Qu.:10000
## Max. :27754 Max. :26000
## NA's :9340 NA's :8724
## MedianRentalPricePerSqft_1Bedroom MedianRentalPricePerSqft_2Bedroom
## Min. :0.716 Min. :0.521
## 1st Qu.:0.992 1st Qu.:0.806
## Median :1.211 Median :0.952
## Mean :1.397 Mean :1.106
## 3rd Qu.:1.606 3rd Qu.:1.262
## Max. :3.369 Max. :3.113
## NA's :9588 NA's :9065
## MedianRentalPricePerSqft_3Bedroom MedianRentalPricePerSqft_4Bedroom
## Min. :0.533 Min. :0.489
## 1st Qu.:0.751 1st Qu.:0.678
## Median :0.863 Median :0.781
## Mean :0.968 Mean :0.847
## 3rd Qu.:1.073 3rd Qu.:0.938
## Max. :2.453 Max. :2.134
## NA's :8985 NA's :9808
## MedianRentalPricePerSqft_5BedroomOrMore MedianRentalPricePerSqft_AllHomes
## Min. :0.363 Min. :0.580
## 1st Qu.:0.629 1st Qu.:0.760
## Median :0.745 Median :0.880
## Mean :0.763 Mean :1.015
## 3rd Qu.:0.891 3rd Qu.:1.125
## Max. :1.267 Max. :3.147
## NA's :11752 NA's :8864
## MedianRentalPricePerSqft_CondoCoop MedianRentalPricePerSqft_DuplexTriplex
## Min. :0.637 Min. :0.499
## 1st Qu.:0.896 1st Qu.:0.740
## Median :1.075 Median :0.871
## Mean :1.297 Mean :1.066
## 3rd Qu.:1.414 3rd Qu.:1.119
## Max. :4.824 Max. :3.862
## NA's :10004 NA's :10293
## MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits
## Min. :0.585
## 1st Qu.:0.837
## Median :1.000
## Mean :1.170
## 3rd Qu.:1.300
## Max. :3.380
## NA's :9189
## MedianRentalPricePerSqft_SingleFamilyResidence MedianRentalPricePerSqft_Studio
## Min. :0.581 Min. :0.565
## 1st Qu.:0.749 1st Qu.:0.739
## Median :0.861 Median :0.853
## Mean :0.939 Mean :1.176
## 3rd Qu.:1.031 3rd Qu.:1.260
## Max. :2.440 Max. :3.982
## NA's :8923 NA's :10875
## MedianRentalPrice_1Bedroom MedianRentalPrice_2Bedroom
## Min. : 495.0 Min. : 575
## 1st Qu.: 650.0 1st Qu.: 775
## Median : 860.0 Median : 925
## Mean : 978.6 Mean :1098
## 3rd Qu.:1195.0 3rd Qu.:1279
## Max. :2690.0 Max. :3215
## NA's :9686 NA's :9168
## MedianRentalPrice_3Bedroom MedianRentalPrice_4Bedroom
## Min. : 750 Min. : 950
## 1st Qu.:1050 1st Qu.:1395
## Median :1200 Median :1595
## Mean :1357 Mean :1722
## 3rd Qu.:1515 3rd Qu.:1950
## Max. :3550 Max. :3850
## NA's :9075 NA's :9856
## MedianRentalPrice_5BedroomOrMore MedianRentalPrice_AllHomes
## Min. : 795 Min. : 750
## 1st Qu.:1750 1st Qu.:1050
## Median :1995 Median :1200
## Mean :2139 Mean :1362
## 3rd Qu.:2495 3rd Qu.:1590
## Max. :4500 Max. :3600
## NA's :11994 NA's :9060
## MedianRentalPrice_CondoCoop MedianRentalPrice_DuplexTriplex
## Min. : 697.5 Min. : 500.0
## 1st Qu.:1050.0 1st Qu.: 685.0
## Median :1295.0 Median : 800.0
## Mean :1410.2 Mean : 963.6
## 3rd Qu.:1595.0 3rd Qu.:1100.0
## Max. :3200.0 Max. :2895.0
## NA's :10437 NA's :10068
## MedianRentalPrice_MultiFamilyResidence5PlusUnits
## Min. : 550
## 1st Qu.: 750
## Median : 950
## Mean :1091
## 3rd Qu.:1296
## Max. :2895
## NA's :9029
## MedianRentalPrice_SingleFamilyResidence MedianRentalPrice_Studio
## Min. : 750 Min. : 490
## 1st Qu.:1050 1st Qu.: 975
## Median :1205 Median :1150
## Mean :1363 Mean :1221
## 3rd Qu.:1582 3rd Qu.:1400
## Max. :3400 Max. :2500
## NA's :9120 NA's :10211
## ZHVIPerSqft_AllHomes PctOfHomesDecreasingInValues_AllHomes
## Min. : 35.0 Min. : 0.18
## 1st Qu.: 77.0 1st Qu.:16.98
## Median : 98.0 Median :27.45
## Mean :116.4 Mean :33.29
## 3rd Qu.:141.0 3rd Qu.:45.08
## Max. :499.0 Max. :99.38
## NA's :620 NA's :4292
## PctOfHomesIncreasingInValues_AllHomes PctOfHomesSellingForGain_AllHomes
## Min. : 0.47 Min. : 50.99
## 1st Qu.:44.22 1st Qu.: 79.08
## Median :63.54 Median : 89.97
## Mean :59.03 Mean : 85.88
## 3rd Qu.:76.30 3rd Qu.: 95.64
## Max. :99.76 Max. :100.00
## NA's :4292 NA's :12609
## PctOfHomesSellingForLoss_AllHomes
## Min. : 0.000
## 1st Qu.: 4.365
## Median :10.030
## Mean :14.121
## 3rd Qu.:20.915
## Max. :49.010
## NA's :12609
## PctOfListingsWithPriceReductionsSeasAdj_AllHomes
## Min. : 4.509
## 1st Qu.:10.908
## Median :12.309
## Mean :12.382
## 3rd Qu.:13.917
## Max. :20.636
## NA's :8724
## PctOfListingsWithPriceReductionsSeasAdj_CondoCoop
## Min. : 2.233
## 1st Qu.: 8.528
## Median :10.381
## Mean :10.394
## 3rd Qu.:12.189
## Max. :19.608
## NA's :9164
## PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence
## Min. : 4.438
## 1st Qu.:11.094
## Median :12.536
## Mean :12.602
## 3rd Qu.:14.114
## Max. :20.762
## NA's :8724
## PctOfListingsWithPriceReductions_AllHomes
## Min. : 3.564
## 1st Qu.:10.225
## Median :12.302
## Mean :12.364
## 3rd Qu.:14.366
## Max. :20.918
## NA's :8724
## PctOfListingsWithPriceReductions_CondoCoop
## Min. : 1.788
## 1st Qu.: 8.246
## Median :10.249
## Mean :10.379
## 3rd Qu.:12.381
## Max. :22.904
## NA's :9164
## PctOfListingsWithPriceReductions_SingleFamilyResidence
## Min. : 3.745
## 1st Qu.:10.423
## Median :12.498
## Mean :12.584
## 3rd Qu.:14.624
## Max. :21.689
## NA's :8724
## PriceToRentRatio_AllHomes Sale_Counts Sale_Counts_Seas_Adj
## Min. : 7.05 Min. : 130 Min. : 242
## 1st Qu.: 9.73 1st Qu.: 1672 1st Qu.: 1713
## Median :11.14 Median : 4546 Median : 4764
## Mean :11.44 Mean : 7066 Mean : 7049
## 3rd Qu.:12.74 3rd Qu.: 9247 3rd Qu.: 9393
## Max. :21.55 Max. :50275 Max. :41779
## NA's :8912 NA's :7837 NA's :7837
## Sale_Prices ZHVI_1bedroom ZHVI_2bedroom ZHVI_3bedroom
## Min. : 83800 Min. : 30900 Min. : 32800 Min. : 49600
## 1st Qu.:137000 1st Qu.: 74600 1st Qu.: 86700 1st Qu.:116400
## Median :180900 Median :100400 Median :115400 Median :141200
## Mean :194552 Mean :117060 Mean :135169 Mean :167063
## 3rd Qu.:235775 3rd Qu.:142300 3rd Qu.:166800 3rd Qu.:204400
## Max. :543100 Max. :390200 Max. :542400 Max. :639700
## NA's :9218 NA's :2607 NA's :1467 NA's :425
## ZHVI_4bedroom ZHVI_5BedroomOrMore ZHVI_AllHomes ZHVI_BottomTier
## Min. : 64700 Min. : 68600 Min. : 38200 Min. : 32600
## 1st Qu.:174900 1st Qu.: 217900 1st Qu.:114500 1st Qu.: 66600
## Median :218000 Median : 288000 Median :144750 Median : 87400
## Mean :243830 Mean : 323734 Mean :169753 Mean :102670
## 3rd Qu.:281000 3rd Qu.: 365300 3rd Qu.:207600 3rd Qu.:128200
## Max. :850400 Max. :1497000 Max. :620400 Max. :335600
## NA's :853 NA's :1398 NA's :774 NA's :896
## ZHVI_CondoCoop ZHVI_MiddleTier ZHVI_SingleFamilyResidence ZHVI_TopTier
## Min. : 42200 Min. : 38200 Min. : 37900 Min. : 70900
## 1st Qu.:111300 1st Qu.:114500 1st Qu.:115000 1st Qu.:194700
## Median :134700 Median :144750 Median :147300 Median :251100
## Mean :156770 Mean :169753 Mean :174154 Mean :293974
## 3rd Qu.:175800 3rd Qu.:207600 3rd Qu.:211775 3rd Qu.:349400
## Max. :782900 Max. :620400 Max. :737500 Max. :988100
## NA's :1530 NA's :774 NA's :774 NA's :688
## ZRI_AllHomes ZRI_AllHomesPlusMultifamily ZriPerSqft_AllHomes
## Min. : 799 Min. : 799 Min. :0.560
## 1st Qu.:1047 1st Qu.:1036 1st Qu.:0.728
## Median :1210 Median :1210 Median :0.862
## Mean :1321 Mean :1318 Mean :0.930
## 3rd Qu.:1474 3rd Qu.:1477 3rd Qu.:1.074
## Max. :2690 Max. :2653 Max. :2.294
## NA's :8958 NA's :8876 NA's :8876
## Zri_MultiFamilyResidenceRental Zri_SingleFamilyResidenceRental
## Min. : 713.0 Min. : 799
## 1st Qu.: 959.8 1st Qu.:1039
## Median :1126.0 Median :1220
## Mean :1233.0 Mean :1328
## 3rd Qu.:1399.5 3rd Qu.:1468
## Max. :2606.0 Max. :2754
## NA's :8876 NA's :8958
Now, we will investigate missing values in detail.
rbind( table(is.na(data)) , round(prop.table(table(is.na(data)))*100,1) )
## FALSE TRUE
## [1,] 449546.0 633838.0
## [2,] 41.5 58.5
Overall, 58.5% of the values are missing! The next plot breaks this down by feature.
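The overall share can also be computed directly, because the mean of a logical matrix is the proportion of TRUE values. Here is a minimal sketch on a small made-up data frame (`toy` and its columns are hypothetical, not the Zillow data):

```r
# Toy data frame standing in for the Zillow table (values are made up)
toy <- data.frame(
  price = c(100, NA, 120, NA),
  rent  = c(NA, NA, 9, 10),
  zhvi  = c(1, 2, 3, 4)
)

# is.na() returns a logical matrix; its mean is the overall missing share
overall_missing_pct <- round(mean(is.na(toy)) * 100, 1)

# colMeans() gives the same share column by column
missing_pct_by_col <- round(colMeans(is.na(toy)) * 100, 1)

print(overall_missing_pct)
print(missing_pct_by_col)
```

On the real table, the same two calls applied to `data` should reproduce the overall percentage and the per-feature percentages.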
missing_value <- lapply(data, function(x) { round((sum(is.na(x)) / length(x)) * 100, 1) })
melt(data.frame(missing_value))%>%
ggplot(aes(x= reorder(variable, -value), y= value))+
geom_col(width=1, fill= "darkred", alpha=0.7 )+
labs(y="Missing pct(%)", x=NULL, title = "Missing Values by feature")+
theme(axis.ticks.x=element_line(colour="gray90"),
axis.text.y=element_text(size=5.5))+
coord_flip()
Now we are ready to start visualizing the data. Let's start with some overview plots of individual features to explore the data set.
gbp1<-wes_palette("GrandBudapest2")[1]
ggplot(data, aes(x=Sale_Prices))+
geom_histogram(fill=gbp1, alpha=.9, binwidth=10000)+
labs(x=NULL, y=NULL, title = "Histogram of Sale Price")+
scale_x_continuous(breaks= seq(0,600000, by=100000))+
theme_minimal() + theme(plot.title=element_text(vjust=3, size=15) )
As you can see, the sale prices are right-skewed. This means we might need to transform this variable to make it closer to normally distributed before modeling.
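One common option for a right-skewed price variable is a log transform. The sketch below uses simulated log-normal prices (not the Zillow data) and a hand-written `skewness()` helper, both assumptions made for illustration, to show how the transform can reduce the skew:

```r
set.seed(1)

# Simulated right-skewed prices (log-normal); a stand-in for Sale_Prices
sim_price <- rlnorm(500, meanlog = 12, sdlog = 0.5)

# Moment-based sample skewness: 0 means symmetric, > 0 means right-skewed
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / (mean((x - mean(x))^2))^1.5
}

skew_raw <- skewness(sim_price)
skew_log <- skewness(log(sim_price))

print(c(raw = skew_raw, log = skew_log))
```

Whether a log transform suits the actual Sale_Prices column would need to be checked on the real data.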
gbp2<-wes_palette("GrandBudapest2")[4]
ggplot(data, aes(x=MedianRentalPrice_AllHomes))+
geom_histogram(fill=gbp2, alpha=.9, binwidth=100)+
labs(x=NULL, y=NULL, title = "Histogram of Rental Price")+
scale_x_continuous(breaks= seq(0,4000, by=500))+
# Apply the complete theme first so the title settings are not overridden
theme_minimal() + theme( plot.title=element_text(vjust=3, size=15) )
As you can see, the rental prices are also right-skewed.
subset(data, select=c(Sale_Counts, Sale_Prices, MedianRentalPrice_AllHomes))%>%
na.omit()%>% cor()%>%
corrplot.mixed(lower = "number", upper = "ellipse", tl.cex=0.7,
cl.ratio=0.2, cl.cex=0.6, tl.col="black")
From this we can guess that house sale prices and rental prices move in the same direction.
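Note that `na.omit()` keeps only the rows where all three columns are present. A possible alternative is pairwise deletion via `cor(use = "pairwise.complete.obs")`, which uses every row available for each pair of columns. The toy columns below are hypothetical and only illustrate the difference:

```r
# Toy data with NAs in different rows (values are made up)
toy <- data.frame(
  sale_price = c(100, 120, 140, 160, NA, 200),
  rent       = c(10, 12, NA, 16, 18, 20),
  count      = c(5, NA, 7, 6, 9, 8)
)

# Listwise deletion: only rows complete in every column are used
cor_listwise <- cor(na.omit(toy))

# Pairwise deletion: each pair uses all rows where both values exist
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

print(nrow(na.omit(toy)))      # rows surviving listwise deletion
print(round(cor_pairwise, 2))
```

With as many missing values as this data set has, the two approaches can be based on quite different numbers of rows, so it may be worth comparing them.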
state<- read.csv("C:/Users/neeya/Downloads/State_time_series.csv")
state_new_york <- state %>%
filter(RegionName=='NewYork')
state_new_york %>%
ggplot(aes(x=ZHVI_BottomTier))+
geom_histogram()
na_count <- function(x){sum(is.na(x))}
names_of_columns_to_keep <- unlist(apply(X=state_new_york,FUN = na_count,MARGIN = 2))
names_of_columns_to_keep <- names_of_columns_to_keep[names_of_columns_to_keep==0]
names_of_columns_to_keep <- names(names_of_columns_to_keep)
names_of_columns_to_keep
## [1] "Date" "RegionName" "ZHVI_2bedroom"
## [4] "ZHVI_3bedroom" "ZHVI_5BedroomOrMore" "ZHVI_BottomTier"
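The same selection of fully observed columns can be written more compactly with `colSums(is.na(...))`. Here is a small sketch on a made-up data frame (the column names are hypothetical):

```r
# Toy data frame (values are made up); only `a` and `c` are complete
toy <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 2, 3),
  c = c("x", "y", "z")
)

# Count NAs per column, then keep the names of columns with none
na_per_column <- colSums(is.na(toy))
complete_cols <- names(na_per_column)[na_per_column == 0]

toy_complete <- toy[complete_cols]
print(complete_cols)
```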
state_new_york_lm <- state_new_york[names_of_columns_to_keep]
# Remove the RegionName
state_new_york_lm <- state_new_york_lm %>% select(-RegionName)
state_new_york_lm_date <- state_new_york_lm %>%
select(-Date)
mod_lm <-lm(data=state_new_york_lm_date,ZHVI_BottomTier~.)
summary(mod_lm)
##
## Call:
## lm(formula = ZHVI_BottomTier ~ ., data = state_new_york_lm_date)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3080.75 -917.90 34.26 713.69 3078.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.927e+03 8.467e+02 -9.362 <2e-16 ***
## ZHVI_2bedroom -3.097e-02 5.628e-02 -0.550 0.583
## ZHVI_3bedroom 1.173e+00 5.851e-02 20.047 <2e-16 ***
## ZHVI_5BedroomOrMore -1.509e-01 4.294e-03 -35.133 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1357 on 257 degrees of freedom
## Multiple R-squared: 0.9938, Adjusted R-squared: 0.9937
## F-statistic: 1.364e+04 on 3 and 257 DF, p-value: < 2.2e-16
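The figures printed by `summary()` can also be pulled out programmatically. The sketch below fits a model on simulated data (the variables `x1`, `x2` and `y` are hypothetical and unrelated to the Zillow columns) and extracts the coefficient table and R-squared:

```r
set.seed(42)

# Simulated predictors and response; not the New York data
x1 <- seq(100000, 200000, length.out = 50)
x2 <- 1.5 * x1 + rnorm(50, sd = 5000)
y  <- 0.6 * x1 + rnorm(50, sd = 2000)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

fit <- lm(y ~ ., data = sim)
fit_summary <- summary(fit)

# Coefficient table: estimate, std. error, t value, p-value per term
coef_table <- coef(fit_summary)

# Share of variance explained by the model
r_squared <- fit_summary$r.squared

print(round(coef_table, 4))
print(r_squared)
```

When predictors are as strongly related as `x1` and `x2` are here, individual coefficients can be unstable even if R-squared is high. That may be worth keeping in mind when reading the ZHVI coefficients in the output above.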
THANK YOU