MAIN DATA SET
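The summary below shows that the data run from 1 January 2013 to 15 August 2017, with id as the record key. Note that id ranges only from 0 to 3,000,887 while the table has 3,054,348 rows, so id is not strictly unique in this merged table. There are a lot of NA values in the oil prices (dcoilwtico): 955,152 rows, roughly 31% of the data. The transferred column is also NA for 2,551,824 rows.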
summary(df_main)
##        id               date              store_nbr       family
##  Min.   :      0   Min.   :2013-01-01   Min.   : 1.0   Length:3054348
##  1st Qu.: 754677   1st Qu.:2014-03-01   1st Qu.:14.0   Class :character
##  Median :1507572   Median :2015-04-28   Median :27.5   Mode  :character
##  Mean   :1504277   Mean   :2015-04-26   Mean   :27.5
##  3rd Qu.:2255120   3rd Qu.:2016-06-22   3rd Qu.:41.0
##  Max.   :3000887   Max.   :2017-08-15   Max.   :54.0
##
##      sales         onpromotion        dcoilwtico        type.x
##  Min.   :     0   Min.   :  0.000   Min.   : 26.2    Length:3054348
##  1st Qu.:     0   1st Qu.:  0.000   1st Qu.: 46.4    Class :character
##  Median :    11   Median :  0.000   Median : 53.4    Mode  :character
##  Mean   :   359   Mean   :  2.618   Mean   : 68.0
##  3rd Qu.:   196   3rd Qu.:  0.000   3rd Qu.: 95.8
##  Max.   :124717   Max.   :741.000   Max.   :110.6
##                                     NA's   :955152
##     locale          locale_name        description        transferred
##  Length:3054348     Length:3054348     Length:3054348     Mode :logical
##  Class :character   Class :character   Class :character   FALSE:486486
##  Mode  :character   Mode  :character   Mode  :character   TRUE :16038
##                                                           NA's :2551824
##
##      city              state              type.y             cluster
##  Length:3054348     Length:3054348     Length:3054348     Min.   : 1.000
##  Class :character   Class :character   Class :character   1st Qu.: 4.000
##  Mode  :character   Mode  :character   Mode  :character   Median : 8.500
##                                                           Mean   : 8.481
##                                                           3rd Qu.:13.000
##                                                           Max.   :17.000
##
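The points read off the summary above (the date range, the NA counts, and whether id is unique) can also be checked directly instead of by eye. The following is a minimal sketch in base R. It uses a small made-up data frame, df_toy, as a stand-in for df_main; the values in it are hypothetical and only the column names are taken from the summary.

```r
# Hypothetical stand-in for df_main: same column names, invented values.
# One id is deliberately duplicated and some values are missing.
df_toy <- data.frame(
  id          = c(0, 1, 2, 2, 3),
  date        = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02",
                          "2013-01-02", "2013-01-03")),
  sales       = c(0, 11, 196, 196, 359),
  dcoilwtico  = c(NA, NA, 93.14, 93.14, 92.97),
  transferred = c(NA, NA, FALSE, FALSE, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df_toy))

# Fraction of rows missing in each column
na_share <- na_counts / nrow(df_toy)

# First and last date covered by the data
date_range <- range(df_toy$date)

# TRUE only if every id occurs exactly once
# (anyDuplicated() returns 0 when there are no duplicates)
id_is_unique <- anyDuplicated(df_toy$id) == 0

print(na_counts)
print(round(na_share, 2))
print(date_range)
print(id_is_unique)
```

Replacing df_toy with df_main would run the same checks on the real data. If id turns out not to be unique there, inspecting the rows returned by df_main[duplicated(df_main$id), ] would be a reasonable next step.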