This dataset from Zillow seemed interesting to me as I can see very interesting features. Also, the data has good depth. I think it shows us how the numbers differ before and after 2008 state-wise. I downloaded int from the Zillow researd data.

I tried to do some illustrative data visualizations with short analysis. I have also performed Linear regression on the New york state, ZHVI_Allhome prices.

1 Preparations

1.1 Load libraries

library(dplyr)
library(ggplot2)
library(reshape)
## Warning: package 'reshape' was built under R version 4.1.2
library(wesanderson)
## Warning: package 'wesanderson' was built under R version 4.1.2
library(stringr)
## Warning: package 'stringr' was built under R version 4.1.2
library(doBy)
## Warning: package 'doBy' was built under R version 4.1.2
library(plotly)
## Warning: package 'plotly' was built under R version 4.1.2
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.1.2
library(wesanderson)
library(RColorBrewer)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.1.2

1.2 Load data

data=read.csv("C:/Users/neeya/Downloads/State_time_series.csv")
sprintf("The data set has %d rows and %d columns", nrow(data), ncol(data) )
## [1] "The data set has 13212 rows and 82 columns"

2 Overview

The data set has 13212 rows and 82 columns. We will get an overview of the data set using the str, and summary.

We find :

2.1 Structure

str(data)
## 'data.frame':    13212 obs. of  82 variables:
##  $ Date                                                         : chr  "1996-04-30" "1996-04-30" "1996-04-30" "1996-04-30" ...
##  $ RegionName                                                   : chr  "Alabama" "Arizona" "Arkansas" "California" ...
##  $ DaysOnZillow_AllHomes                                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ InventorySeasonallyAdjusted_AllHomes                         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ InventoryRaw_AllHomes                                        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_1Bedroom                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_2Bedroom                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_3Bedroom                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_4Bedroom                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_5BedroomOrMore                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_AllHomes                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_CondoCoop                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_DuplexTriplex                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPricePerSqft_SingleFamilyResidence              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_1Bedroom                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_2Bedroom                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_3Bedroom                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_4Bedroom                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_5BedroomOrMore                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_AllHomes                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_CondoCoop                                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_DuplexTriplex                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianListingPrice_SingleFamilyResidence                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPctOfPriceReduction_AllHomes                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPctOfPriceReduction_CondoCoop                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPctOfPriceReduction_SingleFamilyResidence              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPriceCutDollar_AllHomes                                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPriceCutDollar_CondoCoop                               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianPriceCutDollar_SingleFamilyResidence                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_1Bedroom                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_2Bedroom                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_3Bedroom                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_4Bedroom                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_5BedroomOrMore                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_AllHomes                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_CondoCoop                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_DuplexTriplex                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_SingleFamilyResidence               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPricePerSqft_Studio                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_1Bedroom                                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_2Bedroom                                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_3Bedroom                                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_4Bedroom                                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_5BedroomOrMore                             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_AllHomes                                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_CondoCoop                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_DuplexTriplex                              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_MultiFamilyResidence5PlusUnits             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_SingleFamilyResidence                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MedianRentalPrice_Studio                                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ZHVIPerSqft_AllHomes                                         : int  50 62 42 102 82 85 71 56 55 185 ...
##  $ PctOfHomesDecreasingInValues_AllHomes                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfHomesIncreasingInValues_AllHomes                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfHomesSellingForGain_AllHomes                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfHomesSellingForLoss_AllHomes                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductionsSeasAdj_AllHomes             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductionsSeasAdj_CondoCoop            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductions_AllHomes                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductions_CondoCoop                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PctOfListingsWithPriceReductions_SingleFamilyResidence       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ PriceToRentRatio_AllHomes                                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Sale_Counts                                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Sale_Counts_Seas_Adj                                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Sale_Prices                                                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ ZHVI_1bedroom                                                : int  61500 59200 53000 93700 77800 64700 90100 45400 74900 152300 ...
##  $ ZHVI_2bedroom                                                : int  48900 86400 54500 123400 97500 97000 88200 65400 64700 186600 ...
##  $ ZHVI_3bedroom                                                : int  78200 96100 76800 150900 129000 130400 103500 89100 88000 231800 ...
##  $ ZHVI_4bedroom                                                : int  146500 128400 135100 196100 176100 194800 157800 133600 149700 303400 ...
##  $ ZHVI_5BedroomOrMore                                          : int  206300 190500 186000 265300 212900 299800 176100 199900 212800 345500 ...
##  $ ZHVI_AllHomes                                                : int  79500 103600 64400 157900 128100 132000 106800 86300 92000 227400 ...
##  $ ZHVI_BottomTier                                              : int  45600 67100 38400 95100 82700 83700 77200 52500 57200 144500 ...
##  $ ZHVI_CondoCoop                                               : int  99500 78900 70300 136100 99400 85000 NA 70600 89300 177000 ...
##  $ ZHVI_MiddleTier                                              : int  79500 103600 64400 157900 128100 132000 106800 86300 92000 227400 ...
##  $ ZHVI_SingleFamilyResidence                                   : int  79000 107500 64500 162000 133600 141000 107400 92100 92400 262600 ...
##  $ ZHVI_TopTier                                                 : int  140200 168700 115200 270600 209300 231600 161600 155300 163900 374700 ...
##  $ ZRI_AllHomes                                                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ZRI_AllHomesPlusMultifamily                                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ ZriPerSqft_AllHomes                                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Zri_MultiFamilyResidenceRental                               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Zri_SingleFamilyResidenceRental                              : int  NA NA NA NA NA NA NA NA NA NA ...

2.2 Summary

summary(data)
##      Date            RegionName        DaysOnZillow_AllHomes
##  Length:13212       Length:13212       Min.   : 49.25       
##  Class :character   Class :character   1st Qu.: 90.25       
##  Mode  :character   Mode  :character   Median :108.50       
##                                        Mean   :110.12       
##                                        3rd Qu.:126.75       
##                                        Max.   :251.62       
##                                        NA's   :8367         
##  InventorySeasonallyAdjusted_AllHomes InventoryRaw_AllHomes
##  Min.   :   972                       Min.   :   911       
##  1st Qu.:  9828                       1st Qu.:  9756       
##  Median : 21713                       Median : 21289       
##  Mean   : 33293                       Mean   : 33299       
##  3rd Qu.: 47453                       3rd Qu.: 46891       
##  Max.   :260687                       Max.   :268055       
##  NA's   :8316                         NA's   :8316         
##  MedianListingPricePerSqft_1Bedroom MedianListingPricePerSqft_2Bedroom
##  Min.   : 57.14                     Min.   : 60.00                    
##  1st Qu.:125.69                     1st Qu.: 92.16                    
##  Median :162.75                     Median :121.30                    
##  Mean   :182.47                     Mean   :135.49                    
##  3rd Qu.:202.63                     3rd Qu.:152.24                    
##  Max.   :627.55                     Max.   :550.64                    
##  NA's   :9626                       NA's   :8678                      
##  MedianListingPricePerSqft_3Bedroom MedianListingPricePerSqft_4Bedroom
##  Min.   : 56.48                     Min.   : 61.8                     
##  1st Qu.: 93.28                     1st Qu.: 99.6                     
##  Median :116.63                     Median :119.8                     
##  Mean   :129.45                     Mean   :133.5                     
##  3rd Qu.:143.19                     3rd Qu.:144.8                     
##  Max.   :460.46                     Max.   :480.8                     
##  NA's   :8605                       NA's   :8535                      
##  MedianListingPricePerSqft_5BedroomOrMore MedianListingPricePerSqft_AllHomes
##  Min.   : 63.78                           Min.   : 62.14                    
##  1st Qu.: 99.08                           1st Qu.: 96.01                    
##  Median :119.75                           Median :120.58                    
##  Mean   :135.67                           Mean   :136.66                    
##  3rd Qu.:149.35                           3rd Qu.:153.44                    
##  Max.   :617.96                           Max.   :520.72                    
##  NA's   :8643                             NA's   :8538                      
##  MedianListingPricePerSqft_CondoCoop MedianListingPricePerSqft_DuplexTriplex
##  Min.   :  61.92                     Min.   : 32.14                         
##  1st Qu.: 113.08                     1st Qu.: 60.89                         
##  Median : 141.00                     Median : 81.21                         
##  Mean   : 163.55                     Mean   : 97.09                         
##  3rd Qu.: 177.07                     3rd Qu.:113.52                         
##  Max.   :1000.00                     Max.   :446.43                         
##  NA's   :9063                        NA's   :9248                           
##  MedianListingPricePerSqft_SingleFamilyResidence MedianListingPrice_1Bedroom
##  Min.   : 63.27                                  Min.   : 49900             
##  1st Qu.: 95.55                                  1st Qu.: 99000             
##  Median :120.09                                  Median :130000             
##  Mean   :133.37                                  Mean   :147083             
##  3rd Qu.:149.84                                  3rd Qu.:169900             
##  Max.   :475.36                                  Max.   :399000             
##  NA's   :8573                                    NA's   :10205              
##  MedianListingPrice_2Bedroom MedianListingPrice_3Bedroom
##  Min.   : 57000              Min.   :109900             
##  1st Qu.:109500              1st Qu.:149000             
##  Median :147000              Median :189900             
##  Mean   :158873              Mean   :209226             
##  3rd Qu.:179900              3rd Qu.:240000             
##  Max.   :599000              Max.   :687000             
##  NA's   :8839                NA's   :8842               
##  MedianListingPrice_4Bedroom MedianListingPrice_5BedroomOrMore
##  Min.   :169000              Min.   : 159900                  
##  1st Qu.:238745              1st Qu.: 310000                  
##  Median :283020              Median : 369700                  
##  Mean   :310850              Mean   : 416375                  
##  3rd Qu.:339900              3rd Qu.: 452173                  
##  Max.   :950000              Max.   :1847500                  
##  NA's   :8876                NA's   :8989                     
##  MedianListingPrice_AllHomes MedianListingPrice_CondoCoop
##  Min.   :112944              Min.   : 82500              
##  1st Qu.:159900              1st Qu.:152363              
##  Median :209000              Median :184900              
##  Mean   :223379              Mean   :202333              
##  3rd Qu.:259900              3rd Qu.:228000              
##  Max.   :610000              Max.   :754500              
##  NA's   :8966                NA's   :9402                
##  MedianListingPrice_DuplexTriplex MedianListingPrice_SingleFamilyResidence
##  Min.   : 64900                   Min.   :112900                          
##  1st Qu.:129900                   1st Qu.:159900                          
##  Median :178900                   Median :209900                          
##  Mean   :207475                   Mean   :228170                          
##  3rd Qu.:245000                   3rd Qu.:265000                          
##  Max.   :939000                   Max.   :725000                          
##  NA's   :9323                     NA's   :9082                            
##  MedianPctOfPriceReduction_AllHomes MedianPctOfPriceReduction_CondoCoop
##  Min.   :1.744                      Min.   : 1.676                     
##  1st Qu.:3.239                      1st Qu.: 3.261                     
##  Median :3.718                      Median : 3.807                     
##  Mean   :3.848                      Mean   : 4.015                     
##  3rd Qu.:4.350                      3rd Qu.: 4.561                     
##  Max.   :8.340                      Max.   :10.000                     
##  NA's   :8724                       NA's   :9340                       
##  MedianPctOfPriceReduction_SingleFamilyResidence MedianPriceCutDollar_AllHomes
##  Min.   :1.716                                   Min.   : 5000                
##  1st Qu.:3.233                                   1st Qu.: 5100                
##  Median :3.737                                   Median : 7500                
##  Mean   :3.847                                   Mean   : 8034                
##  3rd Qu.:4.349                                   3rd Qu.:10000                
##  Max.   :8.347                                   Max.   :24000                
##  NA's   :8724                                    NA's   :8724                 
##  MedianPriceCutDollar_CondoCoop MedianPriceCutDollar_SingleFamilyResidence
##  Min.   : 2050                  Min.   : 5000                             
##  1st Qu.: 5000                  1st Qu.: 5300                             
##  Median : 6800                  Median : 7900                             
##  Mean   : 7453                  Mean   : 8245                             
##  3rd Qu.:10000                  3rd Qu.:10000                             
##  Max.   :27754                  Max.   :26000                             
##  NA's   :9340                   NA's   :8724                              
##  MedianRentalPricePerSqft_1Bedroom MedianRentalPricePerSqft_2Bedroom
##  Min.   :0.716                     Min.   :0.521                    
##  1st Qu.:0.992                     1st Qu.:0.806                    
##  Median :1.211                     Median :0.952                    
##  Mean   :1.397                     Mean   :1.106                    
##  3rd Qu.:1.606                     3rd Qu.:1.262                    
##  Max.   :3.369                     Max.   :3.113                    
##  NA's   :9588                      NA's   :9065                     
##  MedianRentalPricePerSqft_3Bedroom MedianRentalPricePerSqft_4Bedroom
##  Min.   :0.533                     Min.   :0.489                    
##  1st Qu.:0.751                     1st Qu.:0.678                    
##  Median :0.863                     Median :0.781                    
##  Mean   :0.968                     Mean   :0.847                    
##  3rd Qu.:1.073                     3rd Qu.:0.938                    
##  Max.   :2.453                     Max.   :2.134                    
##  NA's   :8985                      NA's   :9808                     
##  MedianRentalPricePerSqft_5BedroomOrMore MedianRentalPricePerSqft_AllHomes
##  Min.   :0.363                           Min.   :0.580                    
##  1st Qu.:0.629                           1st Qu.:0.760                    
##  Median :0.745                           Median :0.880                    
##  Mean   :0.763                           Mean   :1.015                    
##  3rd Qu.:0.891                           3rd Qu.:1.125                    
##  Max.   :1.267                           Max.   :3.147                    
##  NA's   :11752                           NA's   :8864                     
##  MedianRentalPricePerSqft_CondoCoop MedianRentalPricePerSqft_DuplexTriplex
##  Min.   :0.637                      Min.   :0.499                         
##  1st Qu.:0.896                      1st Qu.:0.740                         
##  Median :1.075                      Median :0.871                         
##  Mean   :1.297                      Mean   :1.066                         
##  3rd Qu.:1.414                      3rd Qu.:1.119                         
##  Max.   :4.824                      Max.   :3.862                         
##  NA's   :10004                      NA's   :10293                         
##  MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits
##  Min.   :0.585                                          
##  1st Qu.:0.837                                          
##  Median :1.000                                          
##  Mean   :1.170                                          
##  3rd Qu.:1.300                                          
##  Max.   :3.380                                          
##  NA's   :9189                                           
##  MedianRentalPricePerSqft_SingleFamilyResidence MedianRentalPricePerSqft_Studio
##  Min.   :0.581                                  Min.   :0.565                  
##  1st Qu.:0.749                                  1st Qu.:0.739                  
##  Median :0.861                                  Median :0.853                  
##  Mean   :0.939                                  Mean   :1.176                  
##  3rd Qu.:1.031                                  3rd Qu.:1.260                  
##  Max.   :2.440                                  Max.   :3.982                  
##  NA's   :8923                                   NA's   :10875                  
##  MedianRentalPrice_1Bedroom MedianRentalPrice_2Bedroom
##  Min.   : 495.0             Min.   : 575              
##  1st Qu.: 650.0             1st Qu.: 775              
##  Median : 860.0             Median : 925              
##  Mean   : 978.6             Mean   :1098              
##  3rd Qu.:1195.0             3rd Qu.:1279              
##  Max.   :2690.0             Max.   :3215              
##  NA's   :9686               NA's   :9168              
##  MedianRentalPrice_3Bedroom MedianRentalPrice_4Bedroom
##  Min.   : 750               Min.   : 950              
##  1st Qu.:1050               1st Qu.:1395              
##  Median :1200               Median :1595              
##  Mean   :1357               Mean   :1722              
##  3rd Qu.:1515               3rd Qu.:1950              
##  Max.   :3550               Max.   :3850              
##  NA's   :9075               NA's   :9856              
##  MedianRentalPrice_5BedroomOrMore MedianRentalPrice_AllHomes
##  Min.   : 795                     Min.   : 750              
##  1st Qu.:1750                     1st Qu.:1050              
##  Median :1995                     Median :1200              
##  Mean   :2139                     Mean   :1362              
##  3rd Qu.:2495                     3rd Qu.:1590              
##  Max.   :4500                     Max.   :3600              
##  NA's   :11994                    NA's   :9060              
##  MedianRentalPrice_CondoCoop MedianRentalPrice_DuplexTriplex
##  Min.   : 697.5              Min.   : 500.0                 
##  1st Qu.:1050.0              1st Qu.: 685.0                 
##  Median :1295.0              Median : 800.0                 
##  Mean   :1410.2              Mean   : 963.6                 
##  3rd Qu.:1595.0              3rd Qu.:1100.0                 
##  Max.   :3200.0              Max.   :2895.0                 
##  NA's   :10437               NA's   :10068                  
##  MedianRentalPrice_MultiFamilyResidence5PlusUnits
##  Min.   : 550                                    
##  1st Qu.: 750                                    
##  Median : 950                                    
##  Mean   :1091                                    
##  3rd Qu.:1296                                    
##  Max.   :2895                                    
##  NA's   :9029                                    
##  MedianRentalPrice_SingleFamilyResidence MedianRentalPrice_Studio
##  Min.   : 750                            Min.   : 490            
##  1st Qu.:1050                            1st Qu.: 975            
##  Median :1205                            Median :1150            
##  Mean   :1363                            Mean   :1221            
##  3rd Qu.:1582                            3rd Qu.:1400            
##  Max.   :3400                            Max.   :2500            
##  NA's   :9120                            NA's   :10211           
##  ZHVIPerSqft_AllHomes PctOfHomesDecreasingInValues_AllHomes
##  Min.   : 35.0        Min.   : 0.18                        
##  1st Qu.: 77.0        1st Qu.:16.98                        
##  Median : 98.0        Median :27.45                        
##  Mean   :116.4        Mean   :33.29                        
##  3rd Qu.:141.0        3rd Qu.:45.08                        
##  Max.   :499.0        Max.   :99.38                        
##  NA's   :620          NA's   :4292                         
##  PctOfHomesIncreasingInValues_AllHomes PctOfHomesSellingForGain_AllHomes
##  Min.   : 0.47                         Min.   : 50.99                   
##  1st Qu.:44.22                         1st Qu.: 79.08                   
##  Median :63.54                         Median : 89.97                   
##  Mean   :59.03                         Mean   : 85.88                   
##  3rd Qu.:76.30                         3rd Qu.: 95.64                   
##  Max.   :99.76                         Max.   :100.00                   
##  NA's   :4292                          NA's   :12609                    
##  PctOfHomesSellingForLoss_AllHomes
##  Min.   : 0.000                   
##  1st Qu.: 4.365                   
##  Median :10.030                   
##  Mean   :14.121                   
##  3rd Qu.:20.915                   
##  Max.   :49.010                   
##  NA's   :12609                    
##  PctOfListingsWithPriceReductionsSeasAdj_AllHomes
##  Min.   : 4.509                                  
##  1st Qu.:10.908                                  
##  Median :12.309                                  
##  Mean   :12.382                                  
##  3rd Qu.:13.917                                  
##  Max.   :20.636                                  
##  NA's   :8724                                    
##  PctOfListingsWithPriceReductionsSeasAdj_CondoCoop
##  Min.   : 2.233                                   
##  1st Qu.: 8.528                                   
##  Median :10.381                                   
##  Mean   :10.394                                   
##  3rd Qu.:12.189                                   
##  Max.   :19.608                                   
##  NA's   :9164                                     
##  PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence
##  Min.   : 4.438                                               
##  1st Qu.:11.094                                               
##  Median :12.536                                               
##  Mean   :12.602                                               
##  3rd Qu.:14.114                                               
##  Max.   :20.762                                               
##  NA's   :8724                                                 
##  PctOfListingsWithPriceReductions_AllHomes
##  Min.   : 3.564                           
##  1st Qu.:10.225                           
##  Median :12.302                           
##  Mean   :12.364                           
##  3rd Qu.:14.366                           
##  Max.   :20.918                           
##  NA's   :8724                             
##  PctOfListingsWithPriceReductions_CondoCoop
##  Min.   : 1.788                            
##  1st Qu.: 8.246                            
##  Median :10.249                            
##  Mean   :10.379                            
##  3rd Qu.:12.381                            
##  Max.   :22.904                            
##  NA's   :9164                              
##  PctOfListingsWithPriceReductions_SingleFamilyResidence
##  Min.   : 3.745                                        
##  1st Qu.:10.423                                        
##  Median :12.498                                        
##  Mean   :12.584                                        
##  3rd Qu.:14.624                                        
##  Max.   :21.689                                        
##  NA's   :8724                                          
##  PriceToRentRatio_AllHomes  Sale_Counts    Sale_Counts_Seas_Adj
##  Min.   : 7.05             Min.   :  130   Min.   :  242       
##  1st Qu.: 9.73             1st Qu.: 1672   1st Qu.: 1713       
##  Median :11.14             Median : 4546   Median : 4764       
##  Mean   :11.44             Mean   : 7066   Mean   : 7049       
##  3rd Qu.:12.74             3rd Qu.: 9247   3rd Qu.: 9393       
##  Max.   :21.55             Max.   :50275   Max.   :41779       
##  NA's   :8912              NA's   :7837    NA's   :7837        
##   Sale_Prices     ZHVI_1bedroom    ZHVI_2bedroom    ZHVI_3bedroom   
##  Min.   : 83800   Min.   : 30900   Min.   : 32800   Min.   : 49600  
##  1st Qu.:137000   1st Qu.: 74600   1st Qu.: 86700   1st Qu.:116400  
##  Median :180900   Median :100400   Median :115400   Median :141200  
##  Mean   :194552   Mean   :117060   Mean   :135169   Mean   :167063  
##  3rd Qu.:235775   3rd Qu.:142300   3rd Qu.:166800   3rd Qu.:204400  
##  Max.   :543100   Max.   :390200   Max.   :542400   Max.   :639700  
##  NA's   :9218     NA's   :2607     NA's   :1467     NA's   :425     
##  ZHVI_4bedroom    ZHVI_5BedroomOrMore ZHVI_AllHomes    ZHVI_BottomTier 
##  Min.   : 64700   Min.   :  68600     Min.   : 38200   Min.   : 32600  
##  1st Qu.:174900   1st Qu.: 217900     1st Qu.:114500   1st Qu.: 66600  
##  Median :218000   Median : 288000     Median :144750   Median : 87400  
##  Mean   :243830   Mean   : 323734     Mean   :169753   Mean   :102670  
##  3rd Qu.:281000   3rd Qu.: 365300     3rd Qu.:207600   3rd Qu.:128200  
##  Max.   :850400   Max.   :1497000     Max.   :620400   Max.   :335600  
##  NA's   :853      NA's   :1398        NA's   :774      NA's   :896     
##  ZHVI_CondoCoop   ZHVI_MiddleTier  ZHVI_SingleFamilyResidence  ZHVI_TopTier   
##  Min.   : 42200   Min.   : 38200   Min.   : 37900             Min.   : 70900  
##  1st Qu.:111300   1st Qu.:114500   1st Qu.:115000             1st Qu.:194700  
##  Median :134700   Median :144750   Median :147300             Median :251100  
##  Mean   :156770   Mean   :169753   Mean   :174154             Mean   :293974  
##  3rd Qu.:175800   3rd Qu.:207600   3rd Qu.:211775             3rd Qu.:349400  
##  Max.   :782900   Max.   :620400   Max.   :737500             Max.   :988100  
##  NA's   :1530     NA's   :774      NA's   :774                NA's   :688     
##   ZRI_AllHomes  ZRI_AllHomesPlusMultifamily ZriPerSqft_AllHomes
##  Min.   : 799   Min.   : 799                Min.   :0.560      
##  1st Qu.:1047   1st Qu.:1036                1st Qu.:0.728      
##  Median :1210   Median :1210                Median :0.862      
##  Mean   :1321   Mean   :1318                Mean   :0.930      
##  3rd Qu.:1474   3rd Qu.:1477                3rd Qu.:1.074      
##  Max.   :2690   Max.   :2653                Max.   :2.294      
##  NA's   :8958   NA's   :8876                NA's   :8876       
##  Zri_MultiFamilyResidenceRental Zri_SingleFamilyResidenceRental
##  Min.   : 713.0                 Min.   : 799                   
##  1st Qu.: 959.8                 1st Qu.:1039                   
##  Median :1126.0                 Median :1220                   
##  Mean   :1233.0                 Mean   :1328                   
##  3rd Qu.:1399.5                 3rd Qu.:1468                   
##  Max.   :2606.0                 Max.   :2754                   
##  NA's   :8876                   NA's   :8958

3 Missing Values

Now, we will investigate missing values in detail.

3.1 Total

rbind( table(is.na(data)) , round(prop.table(table(is.na(data)))*100,1) )
##         FALSE     TRUE
## [1,] 449546.0 633838.0
## [2,]     41.5     58.5

58.5% of values are missing !

3.2 By feature

missing_value <- lapply(data, function(x) { round((sum(is.na(x)) / length(x)) * 100, 1) })
melt(data.frame(missing_value))%>%
    ggplot(aes(x= reorder(variable, -value), y= value))+
    geom_col(width=1, fill= "darkred", alpha=0.7 )+
    labs(y="Missing pct(%)", x=NULL, title = "Missing Values by feature")+
    theme(axis.ticks.x=element_line(colour="gray90"),
          axis.text.y=element_text(size=5.5))+
    coord_flip()

4 Individual feature visualisations

Now we are ready to start visualizing the data. Let’s start with some overview plots for individual features, to explore the data set.

library(lubridate)
data$Year<-year(ymd(data$Date))

Now we are ready to start visualizing the data. Let’s start with some overview plots for individual features, to explore the data set.

4.1 All states

4.1.1 Sale Price

4.1.1.1 Histogram

gbp1<-wes_palette("GrandBudapest2")[1]

ggplot(data, aes(x=Sale_Prices))+
    geom_histogram(fill=gbp1, alpha=.9, binwidth=10000)+
    labs(x=NULL, y=NULL, title = "Histogram of Sale Price")+
    scale_x_continuous(breaks= seq(0,600000, by=100000))+
    theme_minimal() + theme(plot.title=element_text(vjust=3, size=15) )

As you can see, the sale prices are right-skewed. It means we might need to transform this variable and make it more normally distributed before modeling.

4.1.2 Rental Price

4.1.2.1 Histogram

gbp2<-wes_palette("GrandBudapest2")[4]

ggplot(data, aes(x=MedianRentalPrice_AllHomes))+
    geom_histogram(fill=gbp2, alpha=.9, binwidth=100)+
    labs(x=NULL, y=NULL, title = "Histograms of Rental Price")+
    scale_x_continuous(breaks= seq(0,4000, by=500))+
    theme( plot.title=element_text(vjust=3, size=15) ) + theme_minimal()

As you can see, the rental prices are right-skewed.

4.1.3 Correlation plot; sale·rental price and counts

subset(data, select=c(Sale_Counts, Sale_Prices, MedianRentalPrice_AllHomes))%>%
    na.omit()%>% cor()%>% 
    corrplot.mixed(lower = "number", upper = "ellipse", tl.cex=0.7,
                   cl.ratio=0.2, cl.cex=0.6, tl.col="black")

We can guess that house value for sale and rent will go the same way.

4.2 How is the dependent variable distributed?

state<- read.csv("C:/Users/neeya/Downloads/State_time_series.csv")
state_new_york <- state %>%
  filter(RegionName=='NewYork') 

state_new_york %>%
  ggplot(aes(x=ZHVI_AllHomes))+
  geom_histogram()

na_count <- function(x){sum(is.na(x))}

names_of_columns_to_keep <- unlist(apply(X=state_new_york,FUN = na_count,MARGIN = 2))
names_of_columns_to_keep <- names_of_columns_to_keep[names_of_columns_to_keep==0]
names_of_columns_to_keep <- names(names_of_columns_to_keep)

names_of_columns_to_keep
## [1] "Date"                "RegionName"          "ZHVI_2bedroom"      
## [4] "ZHVI_3bedroom"       "ZHVI_5BedroomOrMore" "ZHVI_BottomTier"

5 Simple Linear Regression

state_new_york_lm <- state_new_york[names_of_columns_to_keep]

# Remove the RegionName

state_new_york_lm <- state_new_york_lm %>% select(-RegionName)

state_new_york_lm_date <- state_new_york_lm %>% 
  select(-Date)

mod_lm <-lm(data=state_new_york_lm_date,ZHVI_BottomTier~.)
summary(mod_lm)
## 
## Call:
## lm(formula = ZHVI_BottomTier ~ ., data = state_new_york_lm_date)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3080.75  -917.90    34.26   713.69  3078.79 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -7.927e+03  8.467e+02  -9.362   <2e-16 ***
## ZHVI_2bedroom       -3.097e-02  5.628e-02  -0.550    0.583    
## ZHVI_3bedroom        1.173e+00  5.851e-02  20.047   <2e-16 ***
## ZHVI_5BedroomOrMore -1.509e-01  4.294e-03 -35.133   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1357 on 257 degrees of freedom
## Multiple R-squared:  0.9938, Adjusted R-squared:  0.9937 
## F-statistic: 1.364e+04 on 3 and 257 DF,  p-value: < 2.2e-16

6 Arima modeling

Steps to be followed for ARIMA modeling:

We need to understand the three components of a time series data. I used the following R code to find out the components of this time series:

train2 = na.omit(with(data,data.frame( ds = Date, y= log(ZHVI_AllHomes), rn = RegionName)))
m_train = aggregate(y ~ds, train2, mean)
NY<-ts(train2$y[train2$rn=="NewYork"], start=c(1996,04), freq=12)
plot(decompose(NY), color)

m_train
##             ds        y
## 1   1996-04-30 11.47763
## 2   1996-05-31 11.47665
## 3   1996-06-30 11.47869
## 4   1996-07-31 11.48061
## 5   1996-08-31 11.48241
## 6   1996-09-30 11.48456
## 7   1996-10-31 11.48653
## 8   1996-11-30 11.48931
## 9   1996-12-31 11.49239
## 10  1997-01-31 11.49657
## 11  1997-02-28 11.49656
## 12  1997-03-31 11.50119
## 13  1997-04-30 11.50453
## 14  1997-05-31 11.50751
## 15  1997-06-30 11.51054
## 16  1997-07-31 11.51418
## 17  1997-08-31 11.51757
## 18  1997-09-30 11.52125
## 19  1997-10-31 11.52530
## 20  1997-11-30 11.52955
## 21  1997-12-31 11.53364
## 22  1998-01-31 11.53650
## 23  1998-02-28 11.54119
## 24  1998-03-31 11.54560
## 25  1998-04-30 11.54889
## 26  1998-05-31 11.55140
## 27  1998-06-30 11.55425
## 28  1998-07-31 11.56962
## 29  1998-08-31 11.57267
## 30  1998-09-30 11.57546
## 31  1998-10-31 11.57804
## 32  1998-11-30 11.58077
## 33  1998-12-31 11.58480
## 34  1999-01-31 11.58969
## 35  1999-02-28 11.59504
## 36  1999-03-31 11.60033
## 37  1999-04-30 11.60592
## 38  1999-05-31 11.61132
## 39  1999-06-30 11.61934
## 40  1999-07-31 11.62403
## 41  1999-08-31 11.62794
## 42  1999-09-30 11.63238
## 43  1999-10-31 11.63746
## 44  1999-11-30 11.64251
## 45  1999-12-31 11.64695
## 46  2000-01-31 11.65220
## 47  2000-02-29 11.65809
## 48  2000-03-31 11.66380
## 49  2000-04-30 11.66849
## 50  2000-05-31 11.67407
## 51  2000-06-30 11.68006
## 52  2000-07-31 11.68558
## 53  2000-08-31 11.69121
## 54  2000-09-30 11.69656
## 55  2000-10-31 11.70122
## 56  2000-11-30 11.70555
## 57  2000-12-31 11.71104
## 58  2001-01-31 11.71754
## 59  2001-02-28 11.72355
## 60  2001-03-31 11.72899
## 61  2001-04-30 11.73493
## 62  2001-05-31 11.74131
## 63  2001-06-30 11.74733
## 64  2001-07-31 11.75282
## 65  2001-08-31 11.75754
## 66  2001-09-30 11.76242
## 67  2001-10-31 11.76736
## 68  2001-11-30 11.77199
## 69  2001-12-31 11.77668
## 70  2002-01-31 11.78203
## 71  2002-02-28 11.78749
## 72  2002-03-31 11.79258
## 73  2002-04-30 11.79734
## 74  2002-05-31 11.80197
## 75  2002-06-30 11.80694
## 76  2002-07-31 11.81281
## 77  2002-08-31 11.81935
## 78  2002-09-30 11.82621
## 79  2002-10-31 11.83270
## 80  2002-11-30 11.83916
## 81  2002-12-31 11.84505
## 82  2003-01-31 11.85087
## 83  2003-02-28 11.85731
## 84  2003-03-31 11.86389
## 85  2003-04-30 11.86993
## 86  2003-05-31 11.87613
## 87  2003-06-30 11.88247
## 88  2003-07-31 11.88859
## 89  2003-08-31 11.89464
## 90  2003-09-30 11.90480
## 91  2003-10-31 11.91114
## 92  2003-11-30 11.91816
## 93  2003-12-31 11.92560
## 94  2004-01-31 11.93240
## 95  2004-02-29 11.93905
## 96  2004-03-31 11.94650
## 97  2004-04-30 11.95448
## 98  2004-05-31 11.96270
## 99  2004-06-30 11.97120
## 100 2004-07-31 11.98010
## 101 2004-08-31 11.98992
## 102 2004-09-30 11.99933
## 103 2004-10-31 12.00158
## 104 2004-11-30 12.00997
## 105 2004-12-31 12.01900
## 106 2005-01-31 12.02763
## 107 2005-02-28 12.03580
## 108 2005-03-31 12.04390
## 109 2005-04-30 12.05285
## 110 2005-05-31 12.06233
## 111 2005-06-30 12.07197
## 112 2005-07-31 12.08091
## 113 2005-08-31 12.08832
## 114 2005-09-30 12.09488
## 115 2005-10-31 12.10194
## 116 2005-11-30 12.10882
## 117 2005-12-31 12.11507
## 118 2006-01-31 12.12034
## 119 2006-02-28 12.12462
## 120 2006-03-31 12.12824
## 121 2006-04-30 12.13167
## 122 2006-05-31 12.13457
## 123 2006-06-30 12.13676
## 124 2006-07-31 12.13819
## 125 2006-08-31 12.13923
## 126 2006-09-30 12.14066
## 127 2006-10-31 12.14211
## 128 2006-11-30 12.14233
## 129 2006-12-31 12.14181
## 130 2007-01-31 12.14223
## 131 2007-02-28 12.14311
## 132 2007-03-31 12.14365
## 133 2007-04-30 12.14426
## 134 2007-05-31 12.14479
## 135 2007-06-30 12.14461
## 136 2007-07-31 12.14399
## 137 2007-08-31 12.15475
## 138 2007-09-30 12.15328
## 139 2007-10-31 12.15118
## 140 2007-11-30 12.14847
## 141 2007-12-31 12.14549
## 142 2008-01-31 12.14242
## 143 2008-02-29 12.13827
## 144 2008-03-31 12.13370
## 145 2008-04-30 12.12890
## 146 2008-05-31 12.12368
## 147 2008-06-30 12.11813
## 148 2008-07-31 12.11334
## 149 2008-08-31 12.09963
## 150 2008-09-30 12.09829
## 151 2008-10-31 12.09686
## 152 2008-11-30 12.09536
## 153 2008-12-31 12.09472
## 154 2009-01-31 12.09468
## 155 2009-02-28 12.09398
## 156 2009-03-31 12.09192
## 157 2009-04-30 12.08896
## 158 2009-05-31 12.08536
## 159 2009-06-30 12.08121
## 160 2009-07-31 12.07715
## 161 2009-08-31 12.07354
## 162 2009-09-30 12.07006
## 163 2009-10-31 12.06700
## 164 2009-11-30 12.06545
## 165 2009-12-31 12.05926
## 166 2010-01-31 12.04927
## 167 2010-02-28 12.04215
## 168 2010-03-31 12.03978
## 169 2010-04-30 12.03714
## 170 2010-05-31 12.03447
## 171 2010-06-30 12.03125
## 172 2010-07-31 12.02740
## 173 2010-08-31 12.02278
## 174 2010-09-30 12.01743
## 175 2010-10-31 12.01233
## 176 2010-11-30 12.00799
## 177 2010-12-31 12.00527
## 178 2011-01-31 12.00314
## 179 2011-02-28 12.00046
## 180 2011-03-31 11.99679
## 181 2011-04-30 11.99271
## 182 2011-05-31 11.98858
## 183 2011-06-30 11.98516
## 184 2011-07-31 11.98197
## 185 2011-08-31 11.97886
## 186 2011-09-30 11.97587
## 187 2011-10-31 11.97299
## 188 2011-11-30 11.96994
## 189 2011-12-31 11.96873
## 190 2012-01-31 11.96895
## 191 2012-02-29 11.96926
## 192 2012-03-31 11.96905
## 193 2012-04-30 11.96964
## 194 2012-05-31 11.97094
## 195 2012-06-30 11.97301
## 196 2012-07-31 11.97565
## 197 2012-08-31 11.97840
## 198 2012-09-30 11.98133
## 199 2012-10-31 11.98410
## 200 2012-11-30 11.98677
## 201 2012-12-31 11.99035
## 202 2013-01-31 11.99481
## 203 2013-02-28 11.99886
## 204 2013-03-31 12.00251
## 205 2013-04-30 12.00660
## 206 2013-05-31 12.01116
## 207 2013-06-30 12.01626
## 208 2013-07-31 12.02150
## 209 2013-08-31 12.02597
## 210 2013-09-30 12.02998
## 211 2013-10-31 12.03375
## 212 2013-11-30 12.03678
## 213 2013-12-31 12.03991
## 214 2014-01-31 12.04336
## 215 2014-02-28 12.04649
## 216 2014-03-31 12.04929
## 217 2014-04-30 12.05235
## 218 2014-05-31 12.05556
## 219 2014-06-30 12.05872
## 220 2014-07-31 12.06198
## 221 2014-08-31 12.06487
## 222 2014-09-30 12.06716
## 223 2014-10-31 12.06921
## 224 2014-11-30 12.07119
## 225 2014-12-31 12.07354
## 226 2015-01-31 12.07637
## 227 2015-02-28 12.08004
## 228 2015-03-31 12.08447
## 229 2015-04-30 12.08935
## 230 2015-05-31 12.09411
## 231 2015-06-30 12.09871
## 232 2015-07-31 12.10361
## 233 2015-08-31 12.10817
## 234 2015-09-30 12.11224
## 235 2015-10-31 12.11591
## 236 2015-11-30 12.11936
## 237 2015-12-31 12.12298
## 238 2016-01-31 12.12683
## 239 2016-02-29 12.13107
## 240 2016-03-31 12.13579
## 241 2016-04-30 12.14025
## 242 2016-05-31 12.14393
## 243 2016-06-30 12.14802
## 244 2016-07-31 12.15248
## 245 2016-08-31 12.15678
## 246 2016-09-30 12.16159
## 247 2016-10-31 12.16699
## 248 2016-11-30 12.17218
## 249 2016-12-31 12.17737
## 250 2017-01-31 12.18266
## 251 2017-02-28 12.18806
## 252 2017-03-31 12.19305
## 253 2017-04-30 12.19767
## 254 2017-05-31 12.20201
## 255 2017-06-30 12.20559
## 256 2017-07-31 12.20846
## 257 2017-08-31 12.21206
## 258 2017-09-30 12.21700
## 259 2017-10-31 12.22196
## 260 2017-11-30 12.22722
## 261 2017-12-31 12.23260

Here we get 4 components:

Observed – the actual data plot Trend – the overall upward or downward movement of the data points Seasonal – any monthly/yearly pattern of the data points Random – unexplainable part of the data Observing these 4 graphs closely, we can find out if the data satisfies all the assumptions of ARIMA modeling, mainly, stationarity and seasonality.

Various plots and functions that help in detecting seasonality:

A seasonal subseries plot

library(forecast)
NY_adjusted <- NY- (decompose(NY))$seasonal
ndiffs(NY_adjusted)
## [1] 2

The autocorrelation function (acf()) gives the autocorrelation at all possible lags. The autocorrelation at lag 0 is included by default which always takes the value 1 as it represents the correlation between the data and themselves.

6.1 ADF and KPSS Tests for Stationarity

acf(NY,lag.max=40)

pacf(NY,lag.max=40)

library(tseries)
kpss.test(NY,lshort = TRUE)
## 
##  KPSS Test for Level Stationarity
## 
## data:  NY
## KPSS Level = 0.76198, Truncation lag parameter = 4, p-value = 0.01

As we can infer from the acf graph above, the autocorrelation continues to exponentially decrease as the lag increases, confirming that there is no linear association between observations separated by larger lags.

pacf() at lag k is autocorrelation function which describes the correlation between all data points that are exactly k steps apart- after accounting for their correlation with the data between those k steps. It helps to identify the number of autoregression (AR) coefficients(p-value) in an ARIMA model.

fitARIMA <- arima(NY, order=c(1,1,1),seasonal = list(order = c(1,0,0), period = 12),method="ML")
library(lmtest)
coeftest(fitARIMA) 
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.852615   0.045789 18.6206 < 2.2e-16 ***
## ma1  0.664270   0.052247 12.7141 < 2.2e-16 ***
## sar1 0.376175   0.105187  3.5763 0.0003485 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(fitARIMA)
##          2.5 %    97.5 %
## ar1  0.7628706 0.9423589
## ma1  0.5618685 0.7666711
## sar1 0.1700133 0.5823377

6.2 Choosing the best model

I picked the model with the lowest AIC values.

mod1 = arima(NY,order=c(0,1,1))
mod2 = arima(NY,order=c(0,1,0))
mod3 = arima(NY,order=c(1,1,0))
summary(mod1)
## 
## Call:
## arima(x = NY, order = c(0, 1, 1))
## 
## Coefficients:
##          ma1
##       0.8836
## s.e.  0.0276
## 
## sigma^2 estimated as 1.232e-05:  log likelihood = 524.16,  aic = -1044.32
## 
## Training set error measures:
##                         ME        RMSE         MAE          MPE      MAPE
## Training set -0.0002381121 0.003676344 0.002659507 -0.001881159 0.0212097
##                  MASE      ACF1
## Training set 0.589279 0.6812555
summary(mod2)
## 
## Call:
## arima(x = NY, order = c(0, 1, 0))
## 
## 
## sigma^2 estimated as 3.851e-05:  log likelihood = 454.25,  aic = -906.51
## 
## Training set error measures:
##                        ME        RMSE         MAE          MPE       MAPE
## Training set -0.000546988 0.006284733 0.004578796 -0.004329216 0.03650425
##                  MASE      ACF1
## Training set 1.014544 0.9006195
summary(mod3)
## 
## Call:
## arima(x = NY, order = c(1, 1, 0))
## 
## Coefficients:
##          ar1
##       0.9303
## s.e.  0.0301
## 
## sigma^2 estimated as 4.757e-06:  log likelihood = 582.91,  aic = -1161.83
## 
## Training set error measures:
##                       ME        RMSE         MAE          MPE       MAPE
## Training set 8.11102e-05 0.002452145 0.001594108 0.0006579105 0.01270284
##                   MASE      ACF1
## Training set 0.3532137 0.2761833

Try Out auto.arima() Auto ARIMA says MA3 performs the best, with 2 orders of differencing Note that KPSS test suggested that a second difference may be preferred as well (still non-stationary after one difference)

mod_auto = auto.arima(NY,stepwise=FALSE,approximation=FALSE,stationary=FALSE)
summary(mod_auto)
## Series: NY 
## ARIMA(0,2,3)(0,0,1)[12] 
## 
## Coefficients:
##          ma1      ma2      ma3    sma1
##       0.7762  -0.1869  -0.4297  0.3512
## s.e.  0.0894   0.1143   0.0785  0.1164
## 
## sigma^2 = 5.343e-06:  log likelihood = 616.39
## AIC=-1222.78   AICc=-1222.26   BIC=-1208.72
## 
## Training set error measures:
##                         ME        RMSE         MAE           MPE       MAPE
## Training set -7.666097e-05 0.002255378 0.001337053 -0.0005862646 0.01065657
##                    MASE       ACF1
## Training set 0.02532907 -0.1590147

7 Box-Ljung test

It is a test of independence at all lags up to the one specified. Instead of testing randomness at each distinct lag, it tests the “overall” randomness based on a number of lags It is applied to the residuals of a fitted ARIMA model, not the original series, and in such applications the hypothesis actually being tested is that the residuals from the ARIMA model have no autocorrelation.

R code to obtain the box test results:

checkresiduals(mod_auto)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)(0,0,1)[12]
## Q* = 12.285, df = 20, p-value = 0.9059
## 
## Model df: 4.   Total lags used: 24
Box.test(mod_auto$residuals,type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  mod_auto$residuals
## X-squared = 3.2372, df = 1, p-value = 0.07198
Box.test(mod_auto$residuals,type = 'Ljung-Box',lag=5)
## 
##  Box-Ljung test
## 
## data:  mod_auto$residuals
## X-squared = 6.0532, df = 5, p-value = 0.3011

I calculated the RMSE value

in_sample_pred = mod_auto$residuals+NY

RMSE = sqrt(mean((in_sample_pred - NY)^2,na.rm=T))
RMSE
## [1] 0.002255378

8 Forecasting using an ARIMA model

The parameters of that ARIMA model can be used as a predictive model for making forecasts for future values of the time series once the best-suited model is selected for time series data. There is a function called predict() which is used for predictions from the results of various model fitting functions. It takes an argument n.ahead() specifying how many time steps ahead to predict.

predict(mod_auto,n.ahead = 5)
## $pred
##           Jan Feb Mar Apr May Jun Jul Aug      Sep      Oct      Nov      Dec
## 2006                                      12.64041 12.64502 12.65156 12.66003
## 2007 12.66780                                                                
## 
## $se
##              Jan Feb Mar Apr May Jun Jul Aug         Sep         Oct
## 2006                                         0.002311537 0.006820873
## 2007 0.023457124                                                    
##              Nov         Dec
## 2006 0.012180012 0.017648411
## 2007
library(forecast)
futurVal <- forecast(mod_auto,h=10, level=c(99.5))
plot(futurVal)

The forecasts are shown as a blue line, with the 80% prediction intervals as a dark shaded area, and the 95% prediction intervals as a light shaded area.

THANK YOU