Data Analytics in Marketing Project
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.2
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## Warning: package 'tibble' was built under R version 3.5.3
## Warning: package 'tidyr' was built under R version 3.5.3
## Warning: package 'readr' was built under R version 3.5.3
## Warning: package 'purrr' was built under R version 3.5.3
## Warning: package 'dplyr' was built under R version 3.5.3
## Warning: package 'stringr' was built under R version 3.5.3
## Warning: package 'forcats' was built under R version 3.5.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'psych' was built under R version 3.5.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## Warning: package 'rattle' was built under R version 3.5.3
## Rattle: A free graphical interface for data science with R.
## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
## Warning: package 'gplots' was built under R version 3.5.3
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
## corrplot 0.84 loaded
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
##
## Attaching package: 'corrgram'
## The following object is masked from 'package:plyr':
##
## baseball
## Warning: package 'MASS' was built under R version 3.5.3
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
Data Cleaning
The first stage of our project is to identify missing values and anomalies in the data and clean them up.
## [1] 24973 9
# To understand the data better, we first look at the variables it contains
colnames(df)
## [1] "Brand" "Condition" "Fuel"
## [4] "KMs.Driven" "Model" "Price"
## [7] "Registered.City" "Transaction.Type" "Year"
Next we determine which variables have missing values.
## [1] 0
## [1] 0
## [1] 0
## [1] 2286
## [1] 0
## [1] 0
## [1] 0
## [1] 2284
These checks show missing values in KMs.Driven and Year, but we want to investigate further whether these are the only empty fields or whether there are others too.
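The per-column checks can also be done in a single call. The following is a minimal sketch on a toy data frame; the object `toy` and its values are made up for illustration and are not taken from the dataset.

```r
# Toy data frame standing in for df; values are illustrative only
toy <- data.frame(
  KMs.Driven = c(1000, NA, 52000, NA),
  Price      = c(380000, 450000, 490000, 535000),
  Year       = c(2006, 1998, NA, 2010)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Applied to the real data, `colSums(is.na(df))` would give one count per variable in column order.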
## 'data.frame': 24973 obs. of 9 variables:
## $ Brand : Factor w/ 24 levels "","Audi","BMW",..: 24 23 23 23 24 24 23 24 23 10 ...
## $ Condition : Factor w/ 3 levels "","New","Used": 3 3 3 3 3 3 2 2 3 3 ...
## $ Fuel : Factor w/ 6 levels "","CNG","Diesel",..: 3 6 2 6 6 6 2 6 2 6 ...
## $ KMs.Driven : int 1 100000 12345 94000 100000 80000 65000 10241 83000 1 ...
## $ Model : Factor w/ 304 levels "","120 Y","2 Series",..: 221 49 49 31 98 100 109 100 31 83 ...
## $ Price : int 2100000 380000 340000 535000 1430000 1620000 450000 2900000 490000 480000 ...
## $ Registered.City : Factor w/ 62 levels "","Abbottabad",..: 26 26 26 26 26 26 26 26 26 26 ...
## $ Transaction.Type: Factor w/ 3 levels "","Cash","Installment/Leasing": 2 2 2 2 2 2 2 2 2 2 ...
## $ Year : int 1997 2006 1998 2010 2013 2012 2006 2017 2009 1997 ...
Here we can see that many of the factor variables have "" as their first level. These are empty strings that were originally missing values, so we need to clean them up as well.
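As a sketch of the idea, the empty-string level can be turned into NA for every factor column in one loop. The toy data frame below is hypothetical; `stringsAsFactors = TRUE` is set explicitly so that the text columns are factors, as they are in the `str()` output above.

```r
# Toy data frame; text columns are read as factors
toy <- data.frame(
  Brand = c("Toyota", "", "Suzuki", ""),
  Fuel  = c("CNG", "Petrol", "", "Petrol"),
  Price = c(380000, 450000, 490000, 535000),
  stringsAsFactors = TRUE
)

# For every factor column, turn the "" level into NA
for (col in names(toy)) {
  if (is.factor(toy[[col]])) {
    levels(toy[[col]])[levels(toy[[col]]) == ""] <- NA
  }
}

# Missing values are now visible to is.na()
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Columns without a "" level are left unchanged, so the loop is safe to run over the whole data frame.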
# First we change the empty-string level "" to NA
omit1 <- which( levels(df$Brand) == "" )
levels(df$Brand)[omit1] <- NA
omit2 <- which( levels(df$Condition) == "" )
levels(df$Condition)[omit2] <- NA
omit3 <- which( levels(df$Fuel) == "" )
levels(df$Fuel)[omit3] <- NA
omit4 <- which( levels(df$Model) == "" )
levels(df$Model)[omit4] <- NA
omit5 <- which( levels(df$Registered.City) == "" )
levels(df$Registered.City)[omit5] <- NA
omit6 <- which( levels(df$Transaction.Type) == "" )
levels(df$Transaction.Type)[omit6] <- NA
# After this step, the previously hidden missing values become visible
sum(is.na(df$Brand))
## [1] 2137
## [1] 2136
## [1] 2445
## [1] 2286
## [1] 2448
## [1] 4636
## [1] 2445
## [1] 2284
The counts above show that there were many hidden missing values. Next we fix the continuous variables first, replacing the missing entries with a relevant value.
# We would start off with fixing the KMs Driven Variable
# Counting unique, missing, median and mean values in KMs Driven
df %>% summarise(n = n_distinct(KMs.Driven),
                 na = sum(is.na(KMs.Driven)),
                 m = median(KMs.Driven, na.rm = TRUE),
                 x = mean(KMs.Driven, na.rm = TRUE))
## n na m x
## 1 3146 2286 66510 127811.2
# Fill the missing values with the median so that the distribution is not affected much
# Replace missing values
df <- df %>%
  mutate(KMs.Driven = replace(KMs.Driven,
                              is.na(KMs.Driven),
                              median(KMs.Driven, na.rm = TRUE)))
# Next we do the same for the other variable, Year
# Counting unique, missing, median and mean values in Year
df %>% summarise(n = n_distinct(Year),
                 na = sum(is.na(Year)),
                 m = median(Year, na.rm = TRUE),
                 x = mean(Year, na.rm = TRUE))
## n na m x
## 1 66 2284 2008 2005.902
# Fill the missing values with the median so that the distribution is not affected much
# Replace missing values
df <- df %>%
  mutate(Year = replace(Year,
                        is.na(Year),
                        median(Year, na.rm = TRUE)))
Now that the continuous variables are fixed, we next reshape the data to suit our research.
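The median fill used for KMs.Driven and Year above follows the same pattern both times, so it could be wrapped in a small function. This is only a sketch: `impute_median` is a hypothetical helper name and the values are illustrative.

```r
# Hypothetical helper: replace NAs in a numeric vector with its median
impute_median <- function(x) {
  replace(x, is.na(x), median(x, na.rm = TRUE))
}

# Illustrative values, not taken from the dataset
kms <- c(1, 100000, NA, 94000, 80000, NA)
kms_filled <- impute_median(kms)
print(kms_filled)
```

With such a helper, each column would be filled by a call of the form `df$Year <- impute_median(df$Year)`.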
# To see what the factor levels look like, we inspect them
levels(df$Registered.City)
## [1] "Abbottabad" "Ali Masjid" "Askoley"
## [4] "Attock" "Badin" "Bagh"
## [7] "Bahawalnagar" "Bahawalpur" "Bela"
## [10] "Bhimber" "Burewala" "Chilas"
## [13] "Chiniot" "Chitral" "Dera Ghazi Khan"
## [16] "Dera Ismail Khan" "Faisalabad" "Gujranwala"
## [19] "Gujrat" "Haripur" "Hyderabad"
## [22] "Islamabad" "Jhelum" "Kandhura"
## [25] "Karachi" "Karak" "Kasur"
## [28] "Khairpur" "Khanewal" "Khanpur"
## [31] "Khaplu" "Khushab" "Kohat"
## [34] "Lahore" "Larkana" "Lasbela"
## [37] "Mandi Bahauddin" "Mardan" "Mirpur"
## [40] "Multan" "Muzaffarabad" "Muzaffargarh"
## [43] "Nawabshah" "Nowshera" "Okara"
## [46] "Pakpattan" "Peshawar" "Quetta"
## [49] "Rahimyar Khan" "Rawalpindi" "Sahiwal"
## [52] "Sargodha" "Sheikhüpura" "Sialkot"
## [55] "Sukkar" "Sukkur" "Swabi"
## [58] "Swat" "Tank" "Vehari"
## [61] "Wah"
# To make the city variable meaningful for this project, we divide the cities into capital and non-capital cities
revalue(df$Registered.City, c("Muzaffarabad" = "Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Quetta" = "Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Islamabad" = "Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Peshawar" = "Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Lahore" = "Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Karachi" = "Capital")) -> df$Registered.City
## [1] "Abbottabad" "Ali Masjid" "Askoley"
## [4] "Attock" "Badin" "Bagh"
## [7] "Bahawalnagar" "Bahawalpur" "Bela"
## [10] "Bhimber" "Burewala" "Chilas"
## [13] "Chiniot" "Chitral" "Dera Ghazi Khan"
## [16] "Dera Ismail Khan" "Faisalabad" "Gujranwala"
## [19] "Gujrat" "Haripur" "Hyderabad"
## [22] "Capital" "Jhelum" "Kandhura"
## [25] "Karak" "Kasur" "Khairpur"
## [28] "Khanewal" "Khanpur" "Khaplu"
## [31] "Khushab" "Kohat" "Larkana"
## [34] "Lasbela" "Mandi Bahauddin" "Mardan"
## [37] "Mirpur" "Multan" "Muzaffargarh"
## [40] "Nawabshah" "Nowshera" "Okara"
## [43] "Pakpattan" "Rahimyar Khan" "Rawalpindi"
## [46] "Sahiwal" "Sargodha" "Sheikhüpura"
## [49] "Sialkot" "Sukkar" "Sukkur"
## [52] "Swabi" "Swat" "Tank"
## [55] "Vehari" "Wah"
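The capital versus non-capital split can also be expressed without naming every city. The sketch below uses base R on a hypothetical city vector; only the six city names in `capitals` come from the report, and the other object names are assumptions made for illustration.

```r
# The six cities recoded as "Capital" in the report
capitals <- c("Muzaffarabad", "Quetta", "Islamabad",
              "Peshawar", "Lahore", "Karachi")

# Illustrative city column, including a missing value
city <- factor(c("Lahore", "Multan", "Karachi", "Swat", NA, "Quetta"))

city_chr <- as.character(city)
group <- ifelse(city_chr %in% capitals, "Capital", "Non-Capital")
group[is.na(city_chr)] <- NA   # keep missing cities missing
city_group <- factor(group, levels = c("Non-Capital", "Capital"))

print(table(city_group, useNA = "ifany"))
```

Any city not listed in `capitals` falls into "Non-Capital" automatically, so spelling variants such as "Sukkar" and "Sukkur" need no separate handling.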
# Now that the capital cities are recoded, we recode the non-capital cities as well
revalue(df$Registered.City, c("Abbottabad" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Ali Masjid" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Askoley" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Attock" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Badin" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Bagh" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Bahawalnagar" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Bahawalpur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Bela" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Bhimber" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Burewala" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Chilas" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Chiniot" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Chitral" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Dera Ghazi Khan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Dera Ismail Khan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Faisalabad" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Gujranwala" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Gujrat" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Haripur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Hyderabad" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Jhelum" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Kandhura" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Karak" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Kasur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Khairpur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Khanewal" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Khanpur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Khaplu" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Khushab" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Kohat" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Larkana" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Lasbela" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Mandi Bahauddin" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Mardan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Mirpur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Multan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Muzaffargarh" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Nawabshah" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Nowshera" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Okara" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Pakpattan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Rahimyar Khan" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Rawalpindi" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sahiwal" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sargodha" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sheikhüpura" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sialkot" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sukkar" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Sukkur" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Swabi" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Swat" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Tank" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Vehari" = "Non-Capital")) -> df$Registered.City
revalue(df$Registered.City, c("Wah" = "Non-Capital")) -> df$Registered.City
Now that Registered.City is fixed, we recode the car models into body types as well.
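One compact way to express a recode of model names into body types is a named lookup vector that is applied by indexing. This is a sketch under stated assumptions: the three model-to-type pairs are taken from this report, while `body_type`, `models` and the "Unknown" entry are hypothetical.

```r
# Hypothetical lookup table: names are models, values are body types
body_type <- c("Alto" = "Hatchback", "Camry" = "Sedan", "Hilux" = "Truck")

# Illustrative model column; "Unknown" has no entry in the lookup
models <- c("Camry", "Alto", "Hilux", "Alto", "Unknown")

# Indexing by name looks up each model; unmatched names give NA
recoded <- factor(unname(body_type[models]))
print(table(recoded, useNA = "ifany"))
```

Keeping the mapping in one object makes it easier to review individual classifications and to spot models that have no body type assigned.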
revalue(df$Model, c("120 Y" = "Coupe")) -> df$Model
revalue(df$Model, c("2 Series" = "Sedan")) -> df$Model
revalue(df$Model, c("200 D" = "Sedan")) -> df$Model
revalue(df$Model, c("240 Gd" = "SUV")) -> df$Model
revalue(df$Model, c("250 D" = "Hatchback")) -> df$Model
revalue(df$Model, c("3 Series" = "Sedan")) -> df$Model
revalue(df$Model, c("323" = "Sedan")) -> df$Model
revalue(df$Model, c("350Z" = "Coupe")) -> df$Model
revalue(df$Model, c("5 Series" = "Sedan")) -> df$Model
revalue(df$Model, c("6 Series" = "Coupe")) -> df$Model
revalue(df$Model, c("626" = "Sedan")) -> df$Model
revalue(df$Model, c("7 Series" = "Sedan")) -> df$Model
revalue(df$Model, c("808" = "Sedan")) -> df$Model
revalue(df$Model, c("86" = "Coupe")) -> df$Model
revalue(df$Model, c("929" = "Sedan")) -> df$Model
revalue(df$Model, c("A Class" = "Hatchback")) -> df$Model
revalue(df$Model, c("A1" = "Hatchback")) -> df$Model
revalue(df$Model, c("A3" = "Hatchback")) -> df$Model
revalue(df$Model, c("A4" = "Sedan")) -> df$Model
revalue(df$Model, c("A5" = "Sedan")) -> df$Model
revalue(df$Model, c("A6" = "Sedan")) -> df$Model
revalue(df$Model, c("Accent" = "Sedan")) -> df$Model
revalue(df$Model, c("Accord" = "Sedan")) -> df$Model
revalue(df$Model, c("Acty" = "Truck")) -> df$Model
revalue(df$Model, c("Acura" = "Hatchback")) -> df$Model
revalue(df$Model, c("AD Van" = "SUV")) -> df$Model
revalue(df$Model, c("Airwave" = "SUV")) -> df$Model
revalue(df$Model, c("Allion" = "Sedan")) -> df$Model
revalue(df$Model, c("Alphard Hybrid" = "Hatchback")) -> df$Model
revalue(df$Model, c("Alto" = "Hatchback")) -> df$Model
revalue(df$Model, c("Alto Lapin" = "SUV")) -> df$Model
revalue(df$Model, c("APV" = "SUV")) -> df$Model
revalue(df$Model, c("Aqua" = "Hatchback")) -> df$Model
revalue(df$Model, c("Atrai Wagon" = "SUV")) -> df$Model
revalue(df$Model, c("Auris" = "Hatchback")) -> df$Model
revalue(df$Model, c("Avanza" = "Hatchback")) -> df$Model
revalue(df$Model, c("Axela" = "Hatchback")) -> df$Model
revalue(df$Model, c("Aygo" = "Hatchback")) -> df$Model
revalue(df$Model, c("Azwagon" = "SUV")) -> df$Model
revalue(df$Model, c("B B" = "SUV")) -> df$Model
revalue(df$Model, c("B2200" = "Truck")) -> df$Model
revalue(df$Model, c("Baleno" = "Hatchback")) -> df$Model
revalue(df$Model, c("Beat" = "Hatchback")) -> df$Model
revalue(df$Model, c("Bego" = "SUV")) -> df$Model
revalue(df$Model, c("Belta" = "Sedan")) -> df$Model
revalue(df$Model, c("Blue Bird" = "Sedan")) -> df$Model
revalue(df$Model, c("Bluebird Sylphy" = "Sedan")) -> df$Model
revalue(df$Model, c("Bolan" = "SUV")) -> df$Model
revalue(df$Model, c("Boon" = "Hatchback")) -> df$Model
revalue(df$Model, c("BR-V" = "Hatchback")) -> df$Model
revalue(df$Model, c("C-HR" = "Hatchback")) -> df$Model
revalue(df$Model, c("C Class" = "Sedan")) -> df$Model
revalue(df$Model, c("Cami" = "SUV")) -> df$Model
revalue(df$Model, c("Camry" = "Sedan")) -> df$Model
revalue(df$Model, c("Cappuccino" = "Coupe")) -> df$Model
revalue(df$Model, c("Carisma" = "Sedan")) -> df$Model
revalue(df$Model, c("Carol" = "Hatchback")) -> df$Model
revalue(df$Model, c("Carol Eco" = "Hatchback")) -> df$Model
revalue(df$Model, c("Carrier" = "Hatchback")) -> df$Model
revalue(df$Model, c("Carry" = "Truck")) -> df$Model
revalue(df$Model, c("cars-other-23" = "SUV")) -> df$Model
revalue(df$Model, c("cars-other-37" = "SUV")) -> df$Model
revalue(df$Model, c("cars-other-5" = "SUV")) -> df$Model
revalue(df$Model, c("cars-other-7" = "SUV")) -> df$Model
revalue(df$Model, c("cars-suzuki-86" = "Sedan")) -> df$Model
revalue(df$Model, c("Cast" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cayenne" = "Hatchback")) -> df$Model
revalue(df$Model, c("Celica" = "Sedan")) -> df$Model
revalue(df$Model, c("Cervo" = "Hatchback")) -> df$Model
revalue(df$Model, c("Charade" = "Sedan")) -> df$Model
revalue(df$Model, c("Charmant" = "Sedan")) -> df$Model
revalue(df$Model, c("Chitral" = "Sedan")) -> df$Model
revalue(df$Model, c("Ciaz" = "Sedan")) -> df$Model
revalue(df$Model, c("Cielo" = "Sedan")) -> df$Model
revalue(df$Model, c("City Aspire" = "Sedan")) -> df$Model
revalue(df$Model, c("City IDSI" = "Sedan")) -> df$Model
revalue(df$Model, c("City IVTEC" = "Sedan")) -> df$Model
revalue(df$Model, c("City Vario" = "Sedan")) -> df$Model
revalue(df$Model, c("Civic EXi" = "Sedan")) -> df$Model
revalue(df$Model, c("Civic Hybrid" = "Sedan")) -> df$Model
revalue(df$Model, c("Civic Prosmetic" = "Sedan")) -> df$Model
revalue(df$Model, c("Civic VTi" = "Hatchback")) -> df$Model
revalue(df$Model, c("Civic VTi Oriel" = "Sedan")) -> df$Model
revalue(df$Model, c("Civic VTi Oriel Prosmatec" = "Sedan")) -> df$Model
revalue(df$Model, c("Classic" = "Coupe")) -> df$Model
revalue(df$Model, c("Clipper" = "Coupe")) -> df$Model
revalue(df$Model, c("CLK Class" = "Coupe")) -> df$Model
revalue(df$Model, c("Coaster" = "Truck")) -> df$Model
revalue(df$Model, c("Colt" = "Hatchback")) -> df$Model
revalue(df$Model, c("Copen" = "Hatchback")) -> df$Model
revalue(df$Model, c("Corolla 2.0 D" = "Sedan")) -> df$Model
revalue(df$Model, c("Corolla Assista" = "Sedan")) -> df$Model
revalue(df$Model, c("Corolla Axio" = "Sedan")) -> df$Model
revalue(df$Model, c("Corolla Fielder" = "Hatchback")) -> df$Model
revalue(df$Model, c("Corolla GLI" = "Sedan")) -> df$Model
revalue(df$Model, c("Corolla XE" = "Sedan")) -> df$Model
revalue(df$Model, c("Corolla XLI" = "Sedan")) -> df$Model
revalue(df$Model, c("Corona" = "Sedan")) -> df$Model
revalue(df$Model, c("Corrolla Altis" = "Sedan")) -> df$Model
revalue(df$Model, c("CR-V" = "Hatchback")) -> df$Model
revalue(df$Model, c("CR-Z" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cressida" = "Sedan")) -> df$Model
revalue(df$Model, c("Cross Road" = "SUV")) -> df$Model
revalue(df$Model, c("Crown" = "Sedan")) -> df$Model
revalue(df$Model, c("Cruze" = "Sedan")) -> df$Model
revalue(df$Model, c("CT200h" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cultus VX" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cultus VXL" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cultus VXR" = "Hatchback")) -> df$Model
revalue(df$Model, c("Cuore" = "Hatchback")) -> df$Model
revalue(df$Model, c("D Series" = "Truck")) -> df$Model
revalue(df$Model, c("Dayz" = "Hatchback")) -> df$Model
revalue(df$Model, c("Dayz Highway Star" = "Hatchback")) -> df$Model
revalue(df$Model, c("Demio" = "Hatchback")) -> df$Model
revalue(df$Model, c("Duet" = "Hatchback")) -> df$Model
revalue(df$Model, c("E Class" = "Coupe")) -> df$Model
revalue(df$Model, c("Echo" = "Sedan")) -> df$Model
revalue(df$Model, c("EK Custom" = "Hatchback")) -> df$Model
revalue(df$Model, c("EK Space Custom" = "Hatchback")) -> df$Model
revalue(df$Model, c("Ek Sport" = "Hatchback")) -> df$Model
revalue(df$Model, c("Ek Wagon" = "Hatchback")) -> df$Model
revalue(df$Model, c("Elantra" = "Sedan")) -> df$Model
revalue(df$Model, c("Escudo" = "SUV")) -> df$Model
revalue(df$Model, c("Esse" = "Hatchback")) -> df$Model
revalue(df$Model, c("Estima" = "SUV")) -> df$Model
revalue(df$Model, c("Every" = "SUV")) -> df$Model
revalue(df$Model, c("Every Wagon" = "Hatchback")) -> df$Model
revalue(df$Model, c("Excel" = "Coupe")) -> df$Model
revalue(df$Model, c("Exclusive" = "Coupe")) -> df$Model
revalue(df$Model, c("Familia Van" = "SUV")) -> df$Model
revalue(df$Model, c("Figaro" = "Coupe")) -> df$Model
revalue(df$Model, c("Fit" = "Hatchback")) -> df$Model
revalue(df$Model, c("Fj Cruiser" = "SUV")) -> df$Model
revalue(df$Model, c("Flair" = "Hatchback")) -> df$Model
revalue(df$Model, c("Flair Wagon" = "Hatchback")) -> df$Model
revalue(df$Model, c("Fortuner" = "SUV")) -> df$Model
revalue(df$Model, c("Freed" = "Hatchback")) -> df$Model
revalue(df$Model, c("FX" = "Hatchback")) -> df$Model
revalue(df$Model, c("Galant" = "Hatchback")) -> df$Model
revalue(df$Model, c("Gilgit" = "Sedan")) -> df$Model
revalue(df$Model, c("Grace Hybrid" = "Sedan")) -> df$Model
revalue(df$Model, c("Gran" = "Coupe")) -> df$Model
revalue(df$Model, c("Gx Series" = "SUV")) -> df$Model
revalue(df$Model, c("Harrier" = "SUV")) -> df$Model
revalue(df$Model, c("Hiace" = "SUV")) -> df$Model
revalue(df$Model, c("Hijet" = "SUV")) -> df$Model
revalue(df$Model, c("Hilux" = "Truck")) -> df$Model
revalue(df$Model, c("HR-V" = "Hatchback")) -> df$Model
revalue(df$Model, c("Hse 4.6" = "SUV")) -> df$Model
revalue(df$Model, c("Hustler" = "SUV")) -> df$Model
revalue(df$Model, c("I" = "Coupe")) -> df$Model
revalue(df$Model, c("I Mivec" = "Hatchback")) -> df$Model
revalue(df$Model, c("i8" = "Coupe")) -> df$Model
revalue(df$Model, c("Ignis" = "Hatchback")) -> df$Model
revalue(df$Model, c("Infinity" = "Sedan")) -> df$Model
revalue(df$Model, c("Insight" = "Sedan")) -> df$Model
revalue(df$Model, c("iQ" = "Hatchback")) -> df$Model
revalue(df$Model, c("Is Series" = "Coupe")) -> df$Model
revalue(df$Model, c("ISIS" = "SUV")) -> df$Model
revalue(df$Model, c("IST" = "Hatchback")) -> df$Model
revalue(df$Model, c("Jade Hybrid" = "SUV")) -> df$Model
revalue(df$Model, c("Jimny" = "SUV")) -> df$Model
revalue(df$Model, c("Jimny Sierra" = "SUV")) -> df$Model
revalue(df$Model, c("Joy" = "Hatchback")) -> df$Model
revalue(df$Model, c("Juke" = "Hatchback")) -> df$Model
revalue(df$Model, c("Kalam" = "SUV")) -> df$Model
revalue(df$Model, c("Kei" = "SUV")) -> df$Model
revalue(df$Model, c("Khyber" = "Hatchback")) -> df$Model
revalue(df$Model, c("Kix" = "SUV")) -> df$Model
revalue(df$Model, c("Kizashi" = "Sedan")) -> df$Model
revalue(df$Model, c("L200" = "Truck")) -> df$Model
revalue(df$Model, c("L300" = "SUV")) -> df$Model
revalue(df$Model, c("Lancer" = "Sedan")) -> df$Model
revalue(df$Model, c("Lancer Evolution" = "Sedan")) -> df$Model
revalue(df$Model, c("Land Cruiser" = "SUV")) -> df$Model
revalue(df$Model, c("Liana" = "Sedan")) -> df$Model
revalue(df$Model, c("Life" = "Hatchback")) -> df$Model
revalue(df$Model, c("Lite Ace" = "SUV")) -> df$Model
revalue(df$Model, c("Luce" = "Coupe")) -> df$Model
revalue(df$Model, c("Lucida" = "Hatchback")) -> df$Model
revalue(df$Model, c("LX Series" = "SUV")) -> df$Model
revalue(df$Model, c("March" = "Hatchback")) -> df$Model
revalue(df$Model, c("Margalla" = "Sedan")) -> df$Model
revalue(df$Model, c("Mark II" = "Sedan")) -> df$Model
revalue(df$Model, c("Mark X" = "Sedan")) -> df$Model
revalue(df$Model, c("Matiz" = "Hatchback")) -> df$Model
revalue(df$Model, c("Mega Carry Xtra" = "Truck")) -> df$Model
revalue(df$Model, c("Mehran VX" = "Hatchback")) -> df$Model
revalue(df$Model, c("Mehran VXR" = "Hatchback")) -> df$Model
revalue(df$Model, c("Minica" = "Hatchback")) -> df$Model
revalue(df$Model, c("Minicab Bravo" = "SUV")) -> df$Model
revalue(df$Model, c("Mira" = "Hatchback")) -> df$Model
revalue(df$Model, c("Mira Cocoa" = "Hatchback")) -> df$Model
revalue(df$Model, c("Mirage" = "Hatchback")) -> df$Model
revalue(df$Model, c("Moco" = "Hatchback")) -> df$Model
revalue(df$Model, c("Move" = "Hatchback")) -> df$Model
revalue(df$Model, c("MR Wagon" = "Hatchback")) -> df$Model
revalue(df$Model, c("MR2" = "Coupe")) -> df$Model
revalue(df$Model, c("Murrano" = "Hatchback")) -> df$Model
revalue(df$Model, c("N Box" = "Hatchback")) -> df$Model
revalue(df$Model, c("N One" = "Hatchback")) -> df$Model
revalue(df$Model, c("N Wgn" = "Hatchback")) -> df$Model
revalue(df$Model, c("Noah" = "Hatchback")) -> df$Model
revalue(df$Model, c("Note" = "Hatchback")) -> df$Model
revalue(df$Model, c("Optra" = "Sedan")) -> df$Model
revalue(df$Model, c("Other" = "Sedan")) -> df$Model
revalue(df$Model, c("Otti" = "Hatchback")) -> df$Model
revalue(df$Model, c("Pajero" = "SUV")) -> df$Model
revalue(df$Model, c("Pajero Mini" = "Hatchback")) -> df$Model
revalue(df$Model, c("Palette" = "SUV")) -> df$Model
revalue(df$Model, c("Palette Sw" = "SUV")) -> df$Model
revalue(df$Model, c("Passo" = "Hatchback")) -> df$Model
revalue(df$Model, c("Patrol" = "SUV")) -> df$Model
revalue(df$Model, c("Pickup" = "Truck")) -> df$Model
revalue(df$Model, c("Pino" = "Hatchback")) -> df$Model
revalue(df$Model, c("Pixis Epoch" = "Hatchback")) -> df$Model
revalue(df$Model, c("Platz" = "Sedan")) -> df$Model
revalue(df$Model, c("Porte" = "Hatchback")) -> df$Model
revalue(df$Model, c("Potohar" = "SUV")) -> df$Model
revalue(df$Model, c("Prado" = "SUV")) -> df$Model
revalue(df$Model, c("Premio" = "Sedan")) -> df$Model
revalue(df$Model, c("President" = "SUV")) -> df$Model
revalue(df$Model, c("Previa" = "SUV")) -> df$Model
revalue(df$Model, c("Pride" = "Sedan")) -> df$Model
revalue(df$Model, c("Prius" = "Sedan")) -> df$Model
revalue(df$Model, c("Prius Alpha" = "Hatchback")) -> df$Model
revalue(df$Model, c("Probox" = "SUV")) -> df$Model
revalue(df$Model, c("Pulsar" = "Hatchback")) -> df$Model
revalue(df$Model, c("Q7" = "SUV")) -> df$Model
revalue(df$Model, c("Qashqai" = "Hatchback")) -> df$Model
revalue(df$Model, c("Racer" = "Hatchback")) -> df$Model
revalue(df$Model, c("Ractis" = "Hatchback")) -> df$Model
revalue(df$Model, c("Raum" = "Hatchback")) -> df$Model
revalue(df$Model, c("Rav4" = "Hatchback")) -> df$Model
revalue(df$Model, c("Ravi" = "Truck")) -> df$Model
revalue(df$Model, c("Rocky" = "SUV")) -> df$Model
revalue(df$Model, c("Roox" = "Hatchback")) -> df$Model
revalue(df$Model, c("Rush" = "SUV")) -> df$Model
revalue(df$Model, c("Rvr" = "Hatchback")) -> df$Model
revalue(df$Model, c("RX Series" = "Hatchback")) -> df$Model
revalue(df$Model, c("RX8" = "Coupe")) -> df$Model
revalue(df$Model, c("S Class" = "Sedan")) -> df$Model
revalue(df$Model, c("S660" = "Coupe")) -> df$Model
revalue(df$Model, c("Safari" = "SUV")) -> df$Model
revalue(df$Model, c("Santro" = "Hatchback")) -> df$Model
revalue(df$Model, c("Scrum" = "Truck")) -> df$Model
revalue(df$Model, c("Scrum Wagon" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sera" = "Coupe")) -> df$Model
revalue(df$Model, c("Shehzore" = "Truck")) -> df$Model
revalue(df$Model, c("Shogun" = "SUV")) -> df$Model
revalue(df$Model, c("Sienta" = "Hatchback")) -> df$Model
revalue(df$Model, c("Silverado" = "Truck")) -> df$Model
revalue(df$Model, c("Sirion" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sirius" = "Hatchback")) -> df$Model
revalue(df$Model, c("Smart" = "Hatchback")) -> df$Model
revalue(df$Model, c("Solio" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sonata" = "Sedan")) -> df$Model
revalue(df$Model, c("Sonica" = "Hatchback")) -> df$Model
revalue(df$Model, c("Spacia" = "Hatchback")) -> df$Model
revalue(df$Model, c("Spark" = "Hatchback")) -> df$Model
revalue(df$Model, c("Spectra" = "Sedan")) -> df$Model
revalue(df$Model, c("Spike" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sport" = "Coupe")) -> df$Model
revalue(df$Model, c("Sportage" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sprinter" = "Hatchback")) -> df$Model
revalue(df$Model, c("Starlet" = "Hatchback")) -> df$Model
revalue(df$Model, c("Stream" = "Hatchback")) -> df$Model
revalue(df$Model, c("Succeed" = "SUV")) -> df$Model
revalue(df$Model, c("Sunny" = "Sedan")) -> df$Model
revalue(df$Model, c("Supra" = "Coupe")) -> df$Model
revalue(df$Model, c("Surf" = "SUV")) -> df$Model
revalue(df$Model, c("Swift" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sx4" = "Hatchback")) -> df$Model
revalue(df$Model, c("Sylphy" = "Sedan")) -> df$Model
revalue(df$Model, c("Tanto" = "Hatchback")) -> df$Model
revalue(df$Model, c("Terios Kid" = "Hatchback")) -> df$Model
revalue(df$Model, c("Thats" = "Hatchback")) -> df$Model
revalue(df$Model, c("Tiida" = "Hatchback")) -> df$Model
revalue(df$Model, c("Toppo" = "Hatchback")) -> df$Model
revalue(df$Model, c("Town Ace" = "Hatchback")) -> df$Model
revalue(df$Model, c("Toyo Ace" = "Truck")) -> df$Model
revalue(df$Model, c("Tundra" = "Truck")) -> df$Model
revalue(df$Model, c("V2" = "Hatchback")) -> df$Model
revalue(df$Model, c("Vamos" = "Hatchback")) -> df$Model
revalue(df$Model, c("Van" = "Hatchback")) -> df$Model
revalue(df$Model, c("Vanette" = "Hatchback")) -> df$Model
revalue(df$Model, c("Verossa" = "Sedan")) -> df$Model
revalue(df$Model, c("Vezel" = "Hatchback")) -> df$Model
revalue(df$Model, c("Vitara" = "Hatchback")) -> df$Model
revalue(df$Model, c("Vitz" = "Hatchback")) -> df$Model
revalue(df$Model, c("Vogue" = "SUV")) -> df$Model
revalue(df$Model, c("Wagon R" = "Hatchback")) -> df$Model
revalue(df$Model, c("Wagon R Stingray" = "Hatchback")) -> df$Model
revalue(df$Model, c("Wake" = "Hatchback")) -> df$Model
revalue(df$Model, c("Wingroad" = "Hatchback")) -> df$Model
revalue(df$Model, c("Wish" = "Hatchback")) -> df$Model
revalue(df$Model, c("X-PV" = "Hatchback")) -> df$Model
revalue(df$Model, c("X Trail" = "Hatchback")) -> df$Model
revalue(df$Model, c("X1" = "Hatchback")) -> df$Model
revalue(df$Model, c("X5 Series" = "Hatchback")) -> df$Model
revalue(df$Model, c("Yaris" = "Hatchback")) -> df$Model
revalue(df$Model, c("Zest" = "Sedan")) -> df$Model
revalue(df$Model, c("Zest Spark" = "Hatchback")) -> df$Model
Next we look at the distribution of the cars by brand.
# Detach plyr so that its functions no longer mask dplyr's summarise() and mutate()
detach("package:plyr", unload = TRUE)
# Counts of cars by Brand
group <- group_by(df, Brand)
## Warning: Factor `Brand` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 24 x 3
## Brand Cars PercentCars
## <fct> <int> <dbl>
## 1 Audi 18 0.0721
## 2 BMW 31 0.124
## 3 Changan 9 0.0360
## 4 Chevrolet 47 0.188
## 5 Classic & Antiques 13 0.0521
## 6 Daewoo 72 0.288
## 7 Daihatsu 2476 9.91
## 8 FAW 80 0.320
## 9 Honda 3324 13.3
## 10 Hyundai 268 1.07
## # ... with 14 more rows
We cannot determine the brand or model of the cars where these fields are missing, so we leave them as they are for now.
## Warning: Factor `Condition` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 3
## Condition Cars PercentCars
## <fct> <int> <dbl>
## 1 New 4365 17.5
## 2 Used 18472 74.0
## 3 <NA> 2136 8.55
Since we do not know what condition these cars were in, we leave the missing values as they are for now as well.
## Warning: Factor `Transaction.Type` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 3
## Transaction.Type Cars PercentCars
## <fct> <int> <dbl>
## 1 Cash 21513 86.1
## 2 Installment/Leasing 1015 4.06
## 3 <NA> 2445 9.79
Here we assume that the cars with a missing transaction type were sold for cash rather than on installments, given the lack of paperwork. We apply this assumption to the dataset.
# Replace missing transaction types with "Cash"
df <- df %>%
  mutate(Transaction.Type = replace(Transaction.Type,
                                    is.na(Transaction.Type),
                                    "Cash"))
Next, since we cannot do anything with the remaining empty observations, we remove them.
## [1] 0
## [1] 20334 9
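The report does not show the code for this removal step. One common way to drop rows that contain any missing value is `complete.cases()`; the sketch below is an assumption about the approach, not the authors' actual code, and it uses a made-up data frame.

```r
# Toy data frame with one missing value in each of two rows
toy <- data.frame(
  Brand = factor(c("Toyota", NA, "Suzuki", "Honda")),
  Price = c(380000, 450000, NA, 535000)
)

# complete.cases() is TRUE only for rows with no NA in any column
toy_clean <- toy[complete.cases(toy), ]

print(sum(is.na(toy_clean)))  # total missing values left
print(dim(toy_clean))         # rows and columns remaining
```

`na.omit(toy)` would keep the same rows; the explicit mask simply makes the filtering condition visible.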
Next we will remove outliers from Year and Price
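For reference, `boxplot(x, plot = FALSE)$out` returns the points that lie beyond the whiskers, which by default means more than 1.5 times the interquartile range beyond the hinges. The sketch below uses `boxplot.stats()`, the function that computes these statistics, on illustrative values. It filters with a logical mask rather than negative indexing with `which()`, since the latter returns an empty result when no outliers are found.

```r
# Illustrative prices with one extreme value
price <- c(10, 12, 11, 13, 12, 100)

# $out holds the points beyond the whiskers (coef = 1.5 by default)
out <- boxplot.stats(price)$out

# Logical mask: TRUE for values to keep
price_kept <- price[!(price %in% out)]
print(price_kept)

# With no outliers, -which() would select nothing at all
clean <- c(1, 2, 3, 4, 5)
no_out <- boxplot.stats(clean)$out
print(length(clean[-which(clean %in% no_out)]))  # negative indexing pitfall
print(length(clean[!(clean %in% no_out)]))       # mask keeps every value
```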
outliers1 <- boxplot(df$Price, plot=FALSE)$out
outliers2 <- boxplot(df$Year, plot=FALSE)$out
# A logical mask keeps all rows even if no outliers are found
df <- df[!(df$Price %in% outliers1), ]
df <- df[!(df$Year %in% outliers2), ]
Descriptive Statistics
## vars n mean sd median trimmed
## Brand* 1 19092 17.94 6.32 22 18.70
## Condition* 2 19092 1.84 0.37 2 1.93
## Fuel* 3 19092 3.58 1.87 5 3.73
## KMs.Driven 4 19092 137100.95 632337.21 71000 67732.67
## Model* 5 19092 3.21 0.97 4 3.27
## Price 6 19092 719432.71 461309.88 600000 669353.02
## Registered.City* 7 19092 1.97 0.18 2 2.00
## Transaction.Type* 8 19092 1.05 0.22 1 1.00
## Year 9 19092 2005.57 8.81 2007 2006.47
## mad min max range skew kurtosis se
## Brand* 1.48 1 23 22 -0.87 -1.07 0.05
## Condition* 0.00 1 2 1 -1.86 1.46 0.00
## Fuel* 0.00 1 5 4 -0.60 -1.59 0.01
## KMs.Driven 60649.46 1 10000000 9999999 13.67 200.85 4576.39
## Model* 0.00 1 5 4 -0.48 -1.46 0.01
## Price 437367.00 50000 2145000 2095000 0.87 0.12 3338.62
## Registered.City* 0.00 1 2 1 -5.14 24.45 0.00
## Transaction.Type* 0.00 1 2 1 4.14 15.15 0.00
## Year 7.41 1983 2020 37 -0.83 -0.10 0.06
Counts of cars by Brand
group <- group_by(df,Brand)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 23 x 3
## Brand Cars PercentCars
## <fct> <int> <dbl>
## 1 Audi 8 0.0419
## 2 BMW 14 0.0733
## 3 Changan 7 0.0367
## 4 Chevrolet 47 0.246
## 5 Classic & Antiques 1 0.00524
## 6 Daewoo 68 0.356
## 7 Daihatsu 2037 10.7
## 8 FAW 70 0.367
## 9 Honda 2766 14.5
## 10 Hyundai 255 1.34
## # ... with 13 more rows
Counts of cars by Condition
group <- group_by(df,Condition)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 2 x 3
## Condition Cars PercentCars
## <fct> <int> <dbl>
## 1 New 3043 15.9
## 2 Used 16049 84.1
Counts of cars by Fuel
group <- group_by(df,Fuel)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 5 x 3
## Fuel Cars PercentCars
## <fct> <int> <dbl>
## 1 CNG 6288 32.9
## 2 Diesel 251 1.31
## 3 Hybrid 578 3.03
## 4 LPG 15 0.0786
## 5 Petrol 11960 62.6
Counts of cars by Model
group <- group_by(df,Model)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 5 x 3
## Model Cars PercentCars
## <fct> <int> <dbl>
## 1 Coupe 287 1.50
## 2 Sedan 6440 33.7
## 3 SUV 1531 8.02
## 4 Hatchback 10699 56.0
## 5 Truck 135 0.707
Counts of cars by Registered City
group <- group_by(df,Registered.City)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 2 x 3
## Registered.City Cars PercentCars
## <fct> <int> <dbl>
## 1 Non-Capital 649 3.40
## 2 Capital 18443 96.6
Counts of cars by Transaction Type
group <- group_by(df,Transaction.Type)
summarise(group,
Cars = n()) %>%
mutate(PercentCars = Cars / sum(Cars)*100)
## # A tibble: 2 x 3
## Transaction.Type Cars PercentCars
## <fct> <int> <dbl>
## 1 Cash 18142 95.0
## 2 Installment/Leasing 950 4.98
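The six count tables above repeat the same group-and-summarise pattern with a different column each time. A hedged sketch of how that repetition could be folded into one helper, written in base R; `count_share` and the toy data are hypothetical names introduced here, not part of the original analysis:

```r
# Hypothetical helper: counts and percentage share for one categorical column
count_share <- function(data, column) {
  counts <- table(data[[column]], useNA = "no")   # frequency of each level
  out <- data.frame(
    Level = names(counts),
    Cars = as.integer(counts),
    stringsAsFactors = FALSE
  )
  out$PercentCars <- out$Cars / sum(out$Cars) * 100
  out
}

# Toy data for illustration only
toy <- data.frame(Fuel = c("Petrol", "CNG", "Petrol", "Diesel", "Petrol"))
res <- count_share(toy, "Fuel")
res
```

With the real data, `count_share(df, "Brand")`, `count_share(df, "Fuel")` and so on would be expected to reproduce the `Cars` and `PercentCars` columns shown above.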
Summary of whole data broken down by Brand
group0 <- group_by(df,Brand)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 23 x 5
## Brand AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Audi 860625 528437. 2010. 8.9
## 2 BMW 1030000 593866. 1998. 8.66
## 3 Changan 382857. 108277. 2007. 1.8
## 4 Chevrolet 395723. 212917. 2007. 3.69
## 5 Classic & Antiques 850000 NA 2007 NA
## 6 Daewoo 296088. 314554. 1996. 5.41
## 7 Daihatsu 621990. 369999. 2004. 11.2
## 8 FAW 582400 369193. 2016. 3.91
## 9 Honda 927890. 477297. 2006. 7.81
## 10 Hyundai 407235. 138847. 2003. 5.18
## # ... with 13 more rows
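The `NA` in `SdPrice` and `SdYear` for Classic & Antiques is expected rather than a data problem: the counts table shows that only one car of that brand remains after outlier removal, and the sample standard deviation is undefined for a single observation. A small sketch (the second price is made up for contrast):

```r
# sd() uses the n - 1 denominator, so a single observation yields NA
single_price <- c(850000)
sd_single <- sd(single_price)
sd_single

# With two or more observations the sample standard deviation is defined
sd_pair <- sd(c(850000, 900000))
sd_pair
```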
Summary of whole data broken down by Condition
group0 <- group_by(df,Condition)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 2 x 5
## Condition AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 New 912940. 509318. 2011 7.44
## 2 Used 682742. 442200. 2005. 8.67
Summary of whole data broken down by Fuel
group0 <- group_by(df,Fuel)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 5 x 5
## Fuel AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 CNG 414938. 231834. 2000. 8.66
## 2 Diesel 968964. 532718. 1999. 10.2
## 3 Hybrid 909262. 658613. 2006. 8.45
## 4 LPG 454267. 485706. 1997. 11.2
## 5 Petrol 865443. 459476. 2008. 7.44
Summary of whole data broken down by Model
group0 <- group_by(df,Model)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 5 x 5
## Model AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Coupe 538888. 408609. 2002. 10.7
## 2 Sedan 880246. 546761. 2004. 9.58
## 3 SUV 683794. 378265. 2006. 9.62
## 4 Hatchback 631816. 384980. 2007. 7.88
## 5 Truck 779807. 432624. 2006. 10.2
Summary of whole data broken down by Registered City
group0 <- group_by(df,Registered.City)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 2 x 5
## Registered.City AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Non-Capital 593484. 402712. 2005. 9.08
## 2 Capital 723865. 462622. 2006. 8.8
Summary of whole data broken down by Transaction Type
group0 <- group_by(df,Transaction.Type)
summarise(group0,
AvgPrice = round(mean(Price), 2),
SdPrice = round(sd(Price), 2),
AvgYear = round(mean(Year), 2),
SdYear = round(sd(Year), 2))
## # A tibble: 2 x 5
## Transaction.Type AvgPrice SdPrice AvgYear SdYear
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Cash 737206. 461343. 2005. 8.72
## 2 Installment/Leasing 380026. 301797. 2014. 5.38
Data types of the column data
## 'data.frame': 19092 obs. of 9 variables:
## $ Brand : Factor w/ 23 levels "Audi","BMW","Changan",..: 23 22 22 22 23 23 22 22 9 22 ...
## $ Condition : Factor w/ 2 levels "New","Used": 2 2 2 2 2 2 1 2 2 2 ...
## $ Fuel : Factor w/ 5 levels "CNG","Diesel",..: 2 5 1 5 5 5 1 1 5 1 ...
## $ KMs.Driven : int 1 100000 12345 94000 100000 80000 65000 83000 1 123 ...
## $ Model : Factor w/ 5 levels "Coupe","Sedan",..: 3 3 3 4 2 2 4 4 4 4 ...
## $ Price : int 2100000 380000 340000 535000 1430000 1620000 450000 490000 480000 230000 ...
## $ Registered.City : Factor w/ 2 levels "Non-Capital",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Transaction.Type: Factor w/ 2 levels "Cash","Installment/Leasing": 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 1997 2006 1998 2010 2013 2012 2006 2009 1997 1994 ...
## - attr(*, "na.action")= 'omit' Named int 1060 2839 3047 3565 3566 3567 3570 3571 3572 3573 ...
## ..- attr(*, "names")= chr "1060" "2839" "3047" "3565" ...
Visualizing Discrete Variables
Bar Charts of Different types of Brand
# frequency table
tab1 <- prop.table(table((df$Brand)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Brand", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("LightBlue"),
beside = TRUE,
ylim = c(0, 60))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Bar Charts of Different types of Condition
# frequency table
tab1 <- prop.table(table((df$Condition)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Condition", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("Orange"),
beside = TRUE,
ylim = c(0, 100))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Bar Charts of Different types of Fuel
# frequency table
tab1 <- prop.table(table((df$Fuel)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Fuel", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("LightGreen"),
beside = TRUE,
ylim = c(0, 80))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Bar Charts of Different types of Models
# frequency table
tab1 <- prop.table(table((df$Model)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Model", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("Red"),
beside = TRUE,
ylim = c(0, 60))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Bar Charts of Different types of Registered City
# frequency table
tab1 <- prop.table(table((df$Registered.City)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Registered City", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("Blue"),
beside = TRUE,
ylim = c(0, 100))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Bar Charts of Different types of Transaction Type
# frequency table
tab1 <- prop.table(table((df$Transaction.Type)))
tab2 <- round(tab1*100,2)
# bar-plot
bp <- barplot(tab2,
xlab = "Transaction Type", ylab = "Percent (%)",
main = "Distribution of Cars",
col = c("Green"),
beside = TRUE,
ylim = c(0, 100))
text(bp, 0, round(tab2, 2),cex=1,pos=3)
Visualizing the Mean Price for Each Type of Brand
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Brand, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing the Mean Price for Each Type of Condition
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Condition, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing the Mean Price for Each Type of Fuel
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Fuel, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing the Mean Price for Each Type of Model
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Model, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing the Mean Price for Each Type of Registered City
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Registered.City, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing the Mean Price for Each Type of Transaction Type
# summarises the y values for each unique x value
ggplot(data = df) +
stat_summary(
mapping = aes(x = Transaction.Type, y = Price),
fun.ymin = min,
fun.ymax = max,
fun.y = mean)
Visualizing Continuous Variables
Plotting Box Plot for Selling Price of Vehicle
# Plotting box plot for the Price
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price), width = 0.3)
Plotting Box Plot for Selling Price with Brand
# Plotting box plot for the Price with Brand
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Brand), width = 0.3)
Plotting Box Plot for Selling Price with Condition
# Plotting box plot for the Price with Condition
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Condition), width = 0.3)
Plotting Box Plot for Selling Price with Fuel
# Plotting box plot for the Price with Fuel
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Fuel), width = 0.3)
Plotting Box Plot for Selling Price with Model
# Plotting box plot for the Price with Model
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Model), width = 0.3)
Plotting Box Plot for Selling Price with Registered City
# Plotting box plot for the Price with Registered City
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Registered.City), width = 0.3)
Plotting Box Plot for Selling Price with Transaction Type
# Plotting box plot for the Price with Transaction Type
ggplot(data = df) +
  geom_boxplot(mapping = aes(y = Price, x = Transaction.Type), width = 0.3)
Scatter Plot between Selling Price and KMs Driven
# plotting scatter plot between selling price and KMs Driven
ggplot(data = df) +
  geom_point(mapping = aes(x = KMs.Driven, y = Price))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and KMs Driven coloured by Car Condition
# smoothed trend of selling price against KMs Driven, coloured by Car Condition
ggplot(data = df) +
  geom_smooth(mapping = aes(x = KMs.Driven, y = Price, color = Condition))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and KMs Driven coloured by Fuel Type
# smoothed trend of selling price against KMs Driven, coloured by Fuel Type
ggplot(data = df) +
  geom_smooth(mapping = aes(x = KMs.Driven, y = Price, color = Fuel))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and KMs Driven coloured by Car Models
# smoothed trend of selling price against KMs Driven, coloured by Car Models
ggplot(data = df) +
  geom_smooth(mapping = aes(x = KMs.Driven, y = Price, color = Model))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and Year
# plotting scatter plot between selling price and Year
ggplot(data = df) +
  geom_point(mapping = aes(x = Year, y = Price))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and Year coloured by Car Condition
# smoothed trend of selling price against Year, coloured by Car Condition
ggplot(data = df) +
  geom_smooth(mapping = aes(x = Year, y = Price, color = Condition))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and Year coloured by Fuel Type
# smoothed trend of selling price against Year, coloured by Fuel Type
ggplot(data = df) +
  geom_smooth(mapping = aes(x = Year, y = Price, color = Fuel))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Scatter Plot between Selling Price and Year coloured by Car Models
# smoothed trend of selling price against Year, coloured by Car Models
ggplot(data = df) +
  geom_smooth(mapping = aes(x = Year, y = Price, color = Model))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Correlation Matrix
CarSubset <- df[,c('KMs.Driven','Price','Year')]
corMat <- cor(CarSubset, use = "complete")
round(corMat, 3)
##            KMs.Driven Price   Year
## KMs.Driven      1.000 -0.07 -0.116
## Price          -0.070  1.00  0.590
## Year           -0.116  0.59  1.000
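The matrix reports point estimates only. A sketch of how a confidence interval and p-value could be attached to a single pair with `cor.test()`; it uses R's built-in `mtcars` data as a stand-in, so the numbers it prints say nothing about the car listings analysed here:

```r
# Stand-in data: built-in mtcars (not the car listings analysed here)
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

ct$estimate   # Pearson correlation coefficient
ct$conf.int   # 95% confidence interval for the coefficient
ct$p.value    # p-value for H0: the true correlation is zero
```

On the project data the equivalent call would presumably be `cor.test(df$Price, df$Year)`; with roughly 19,000 rows even weak correlations such as -0.07 may come out as statistically significant, so the size of the coefficient matters more than the p-value.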
Visualizing Correlation Matrix
corrgram(CarSubset,
lower.panel=panel.shade,
upper.panel=panel.conf,
text.panel=panel.txt,main="corrgram")
Model <- Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City + Transaction.Type + Year
fit <- lm(Model, data = df)
summary(fit)
##
## Call:
## lm(formula = Model, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1451115 -150567 -17597 121112 1741934
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -5.885e+07 4.933e+05 -119.307
## BrandBMW 3.009e+05 1.078e+05 2.792
## BrandChangan -5.563e+05 1.259e+05 -4.419
## BrandChevrolet -4.375e+05 9.306e+04 -4.702
## BrandClassic & Antiques -1.867e+05 2.577e+05 -0.724
## BrandDaewoo -2.301e+05 9.099e+04 -2.529
## BrandDaihatsu -2.311e+05 8.618e+04 -2.681
## BrandFAW -3.570e+05 9.075e+04 -3.933
## BrandHonda -1.117e+04 8.610e+04 -0.130
## BrandHyundai -3.434e+05 8.737e+04 -3.930
## BrandKIA -3.529e+05 8.883e+04 -3.973
## BrandLand Rover -3.503e+05 2.578e+05 -1.359
## BrandLexus 1.061e+05 1.130e+05 0.939
## BrandMazda -1.375e+05 9.059e+04 -1.518
## BrandMercedes 4.132e+05 9.300e+04 4.443
## BrandMitsubishi -1.382e+05 8.688e+04 -1.591
## BrandNissan -1.360e+05 8.661e+04 -1.570
## BrandOther Brands -3.422e+05 8.891e+04 -3.849
## BrandPorsche 6.288e+05 2.578e+05 2.439
## BrandRange Rover -1.269e+05 1.645e+05 -0.771
## BrandSubaru -2.850e+05 1.025e+05 -2.781
## BrandSuzuki -3.283e+05 8.608e+04 -3.814
## BrandToyota 1.623e+05 8.605e+04 1.886
## ConditionUsed -5.529e+04 5.039e+03 -10.973
## FuelDiesel 3.116e+05 1.646e+04 18.937
## FuelHybrid 1.873e+05 1.076e+04 17.411
## FuelLPG 1.219e+05 6.286e+04 1.940
## FuelPetrol 1.150e+05 4.426e+03 25.992
## KMs.Driven -4.333e-03 2.805e-03 -1.545
## ModelSedan 1.726e+05 1.574e+04 10.967
## ModelSUV 1.668e+05 1.692e+04 9.859
## ModelHatchback 9.226e+04 1.584e+04 5.823
## ModelTruck 9.194e+04 2.631e+04 3.494
## Registered.CityCapital 4.876e+04 9.768e+03 4.992
## Transaction.TypeInstallment/Leasing -6.963e+05 8.445e+03 -82.450
## Year 2.970e+04 2.422e+02 122.653
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## BrandBMW 0.005240 **
## BrandChangan 9.95e-06 ***
## BrandChevrolet 2.60e-06 ***
## BrandClassic & Antiques 0.468773
## BrandDaewoo 0.011461 *
## BrandDaihatsu 0.007341 **
## BrandFAW 8.40e-05 ***
## BrandHonda 0.896765
## BrandHyundai 8.52e-05 ***
## BrandKIA 7.12e-05 ***
## BrandLand Rover 0.174158
## BrandLexus 0.347702
## BrandMazda 0.128991
## BrandMercedes 8.91e-06 ***
## BrandMitsubishi 0.111583
## BrandNissan 0.116513
## BrandOther Brands 0.000119 ***
## BrandPorsche 0.014734 *
## BrandRange Rover 0.440653
## BrandSubaru 0.005418 **
## BrandSuzuki 0.000137 ***
## BrandToyota 0.059309 .
## ConditionUsed < 2e-16 ***
## FuelDiesel < 2e-16 ***
## FuelHybrid < 2e-16 ***
## FuelLPG 0.052449 .
## FuelPetrol < 2e-16 ***
## KMs.Driven 0.122401
## ModelSedan < 2e-16 ***
## ModelSUV < 2e-16 ***
## ModelHatchback 5.87e-09 ***
## ModelTruck 0.000476 ***
## Registered.CityCapital 6.04e-07 ***
## Transaction.TypeInstallment/Leasing < 2e-16 ***
## Year < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 242900 on 19056 degrees of freedom
## Multiple R-squared: 0.7232, Adjusted R-squared: 0.7227
## F-statistic: 1422 on 35 and 19056 DF, p-value: < 2.2e-16
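Rather than scanning the printed table by eye, the non-significant terms can be pulled out of a fitted model programmatically. A minimal sketch, using R's built-in `mtcars` data as a stand-in; `fit_demo`, `coefs` and `not_significant` are illustrative names, not objects from the report:

```r
# Stand-in model on built-in data (not the report's `fit`)
fit_demo <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit_demo)$coefficients

# Names of the terms whose p-value is at or above the 0.05 threshold
not_significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] >= 0.05]
not_significant
```

Applied to `fit`, the same indexing would be expected to list terms such as `KMs.Driven` and `FuelLPG`, whose printed p-values are above 0.05.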
Question 1: Explore interactions among independent variables
# In the model above, KMs.Driven (p = 0.122) and FuelLPG (p = 0.052) are not significant at the 0.05 level, and neither are several Brand levels (for example Honda, Lexus and Mazda). Registered.CityCapital and FuelPetrol are both significant.
Model1 <- Price ~ Brand*KMs.Driven + Condition*KMs.Driven + Fuel*KMs.Driven + Model*KMs.Driven + Registered.City*KMs.Driven + Transaction.Type*KMs.Driven + Brand*Year + Condition*Year + Fuel*Year + Model*Year + Registered.City*Year + Transaction.Type*Year
fit1 <- lm(Model1, data = df)
summary(fit1)
##
## Call:
## lm(formula = Model1, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1647660 -118762 -8471 99396 1791897
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error
## (Intercept) -1.731e+07 2.647e+07
## BrandBMW -9.788e+06 2.986e+07
## BrandChangan -6.801e+07 1.043e+08
## BrandChevrolet -8.381e+07 3.200e+07
## BrandClassic & Antiques -1.736e+05 2.376e+05
## BrandDaewoo -2.731e+07 2.817e+07
## BrandDaihatsu 6.108e+06 2.625e+07
## BrandFAW -1.542e+06 2.986e+07
## BrandHonda -2.575e+07 2.625e+07
## BrandHyundai 2.054e+06 2.682e+07
## BrandKIA -1.548e+07 2.773e+07
## BrandLand Rover -3.861e+05 2.396e+05
## BrandLexus 1.302e+08 3.554e+07
## BrandMazda 8.490e+06 2.686e+07
## BrandMercedes -4.448e+07 2.727e+07
## BrandMitsubishi 1.520e+07 2.633e+07
## BrandNissan 3.026e+06 2.629e+07
## BrandOther Brands 3.037e+07 2.652e+07
## BrandPorsche 4.023e+05 2.529e+05
## BrandRange Rover -1.625e+09 6.845e+08
## BrandSubaru 2.509e+08 1.010e+08
## BrandSuzuki 6.725e+06 2.625e+07
## BrandToyota -1.515e+07 2.625e+07
## KMs.Driven -5.827e-01 1.265e+00
## ConditionUsed 3.205e+06 1.206e+06
## FuelDiesel 7.394e+06 3.047e+06
## FuelHybrid -3.972e+07 2.331e+06
## FuelLPG -1.986e+07 1.131e+07
## FuelPetrol -2.424e+07 9.074e+05
## ModelSedan -2.562e+07 2.804e+06
## ModelSUV 5.991e+06 3.024e+06
## ModelHatchback -1.932e+07 2.845e+06
## ModelTruck 1.782e+07 4.747e+06
## Registered.CityCapital -1.293e+07 1.987e+06
## Transaction.TypeInstallment/Leasing 5.252e+07 2.854e+06
## Year 8.980e+03 1.314e+04
## BrandBMW:KMs.Driven 2.543e+00 1.539e+00
## BrandChangan:KMs.Driven 4.900e-01 1.265e+00
## BrandChevrolet:KMs.Driven 4.006e-01 1.290e+00
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 5.412e-01 1.265e+00
## BrandDaihatsu:KMs.Driven 5.014e-01 1.264e+00
## BrandFAW:KMs.Driven -2.582e-01 1.343e+00
## BrandHonda:KMs.Driven 5.015e-01 1.264e+00
## BrandHyundai:KMs.Driven 5.133e-01 1.265e+00
## BrandKIA:KMs.Driven 5.411e-01 1.265e+00
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -5.942e-01 1.304e+00
## BrandMazda:KMs.Driven 3.596e-01 1.284e+00
## BrandMercedes:KMs.Driven 3.128e-01 1.408e+00
## BrandMitsubishi:KMs.Driven 5.342e-01 1.265e+00
## BrandNissan:KMs.Driven 4.917e-01 1.264e+00
## BrandOther Brands:KMs.Driven 4.945e-01 1.265e+00
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.043e+02 8.355e+01
## BrandSubaru:KMs.Driven -5.480e+00 3.017e+00
## BrandSuzuki:KMs.Driven 4.979e-01 1.264e+00
## BrandToyota:KMs.Driven 5.028e-01 1.264e+00
## KMs.Driven:ConditionUsed 1.863e-02 1.332e-02
## KMs.Driven:FuelDiesel -8.861e-02 3.551e-02
## KMs.Driven:FuelHybrid -6.922e-05 1.415e-02
## KMs.Driven:FuelLPG -3.813e-01 8.503e-01
## KMs.Driven:FuelPetrol -1.434e-02 6.661e-03
## KMs.Driven:ModelSedan 2.738e-02 2.072e-02
## KMs.Driven:ModelSUV 3.633e-02 2.168e-02
## KMs.Driven:ModelHatchback 3.452e-02 2.026e-02
## KMs.Driven:ModelTruck 5.198e-03 3.026e-02
## KMs.Driven:Registered.CityCapital 3.294e-02 2.172e-02
## KMs.Driven:Transaction.TypeInstallment/Leasing 2.587e-01 7.889e-02
## BrandBMW:Year 5.009e+03 1.485e+04
## BrandChangan:Year 3.364e+04 5.196e+04
## BrandChevrolet:Year 4.153e+04 1.590e+04
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 1.354e+04 1.400e+04
## BrandDaihatsu:Year -3.172e+03 1.303e+04
## BrandFAW:Year 5.879e+02 1.482e+04
## BrandHonda:Year 1.282e+04 1.303e+04
## BrandHyundai:Year -1.213e+03 1.332e+04
## BrandKIA:Year 7.542e+03 1.377e+04
## BrandLand Rover:Year NA NA
## BrandLexus:Year -6.468e+04 1.767e+04
## BrandMazda:Year -4.297e+03 1.334e+04
## BrandMercedes:Year 2.245e+04 1.354e+04
## BrandMitsubishi:Year -7.687e+03 1.307e+04
## BrandNissan:Year -1.590e+03 1.305e+04
## BrandOther Brands:Year -1.534e+04 1.316e+04
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 8.054e+05 3.393e+05
## BrandSubaru:Year -1.246e+05 5.009e+04
## BrandSuzuki:Year -3.529e+03 1.303e+04
## BrandToyota:Year 7.599e+03 1.303e+04
## ConditionUsed:Year -1.620e+03 6.001e+02
## FuelDiesel:Year -3.546e+03 1.524e+03
## FuelHybrid:Year 1.990e+04 1.162e+03
## FuelLPG:Year 9.958e+03 5.653e+03
## FuelPetrol:Year 1.216e+04 4.529e+02
## ModelSedan:Year 1.286e+04 1.400e+03
## ModelSUV:Year -2.899e+03 1.510e+03
## ModelHatchback:Year 9.683e+03 1.420e+03
## ModelTruck:Year -8.829e+03 2.367e+03
## Registered.CityCapital:Year 6.468e+03 9.909e+02
## Transaction.TypeInstallment/Leasing:Year -2.644e+04 1.417e+03
## t value Pr(>|t|)
## (Intercept) -0.654 0.513214
## BrandBMW -0.328 0.743059
## BrandChangan -0.652 0.514291
## BrandChevrolet -2.619 0.008822 **
## BrandClassic & Antiques -0.731 0.464937
## BrandDaewoo -0.969 0.332411
## BrandDaihatsu 0.233 0.816019
## BrandFAW -0.052 0.958821
## BrandHonda -0.981 0.326652
## BrandHyundai 0.077 0.938956
## BrandKIA -0.558 0.576595
## BrandLand Rover -1.611 0.107113
## BrandLexus 3.662 0.000250 ***
## BrandMazda 0.316 0.751953
## BrandMercedes -1.631 0.102854
## BrandMitsubishi 0.577 0.563673
## BrandNissan 0.115 0.908357
## BrandOther Brands 1.145 0.252152
## BrandPorsche 1.591 0.111649
## BrandRange Rover -2.375 0.017582 *
## BrandSubaru 2.484 0.013011 *
## BrandSuzuki 0.256 0.797813
## BrandToyota -0.577 0.563833
## KMs.Driven -0.461 0.644989
## ConditionUsed 2.657 0.007893 **
## FuelDiesel 2.427 0.015246 *
## FuelHybrid -17.039 < 2e-16 ***
## FuelLPG -1.756 0.079074 .
## FuelPetrol -26.708 < 2e-16 ***
## ModelSedan -9.137 < 2e-16 ***
## ModelSUV 1.981 0.047564 *
## ModelHatchback -6.793 1.13e-11 ***
## ModelTruck 3.755 0.000174 ***
## Registered.CityCapital -6.507 7.87e-11 ***
## Transaction.TypeInstallment/Leasing 18.401 < 2e-16 ***
## Year 0.683 0.494458
## BrandBMW:KMs.Driven 1.652 0.098470 .
## BrandChangan:KMs.Driven 0.387 0.698404
## BrandChevrolet:KMs.Driven 0.311 0.756111
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 0.428 0.668738
## BrandDaihatsu:KMs.Driven 0.397 0.691726
## BrandFAW:KMs.Driven -0.192 0.847534
## BrandHonda:KMs.Driven 0.397 0.691640
## BrandHyundai:KMs.Driven 0.406 0.684821
## BrandKIA:KMs.Driven 0.428 0.668737
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -0.456 0.648580
## BrandMazda:KMs.Driven 0.280 0.779532
## BrandMercedes:KMs.Driven 0.222 0.824167
## BrandMitsubishi:KMs.Driven 0.422 0.672792
## BrandNissan:KMs.Driven 0.389 0.697391
## BrandOther Brands:KMs.Driven 0.391 0.695766
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.445 0.014478 *
## BrandSubaru:KMs.Driven -1.816 0.069321 .
## BrandSuzuki:KMs.Driven 0.394 0.693754
## BrandToyota:KMs.Driven 0.398 0.690899
## KMs.Driven:ConditionUsed 1.398 0.162103
## KMs.Driven:FuelDiesel -2.496 0.012582 *
## KMs.Driven:FuelHybrid -0.005 0.996098
## KMs.Driven:FuelLPG -0.448 0.653877
## KMs.Driven:FuelPetrol -2.154 0.031289 *
## KMs.Driven:ModelSedan 1.321 0.186404
## KMs.Driven:ModelSUV 1.675 0.093899 .
## KMs.Driven:ModelHatchback 1.704 0.088348 .
## KMs.Driven:ModelTruck 0.172 0.863615
## KMs.Driven:Registered.CityCapital 1.516 0.129439
## KMs.Driven:Transaction.TypeInstallment/Leasing 3.280 0.001040 **
## BrandBMW:Year 0.337 0.735914
## BrandChangan:Year 0.647 0.517388
## BrandChevrolet:Year 2.612 0.009015 **
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 0.967 0.333482
## BrandDaihatsu:Year -0.243 0.807666
## BrandFAW:Year 0.040 0.968360
## BrandHonda:Year 0.984 0.325323
## BrandHyundai:Year -0.091 0.927454
## BrandKIA:Year 0.548 0.584031
## BrandLand Rover:Year NA NA
## BrandLexus:Year -3.661 0.000252 ***
## BrandMazda:Year -0.322 0.747319
## BrandMercedes:Year 1.658 0.097417 .
## BrandMitsubishi:Year -0.588 0.556462
## BrandNissan:Year -0.122 0.903017
## BrandOther Brands:Year -1.165 0.244069
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 2.374 0.017604 *
## BrandSubaru:Year -2.488 0.012854 *
## BrandSuzuki:Year -0.271 0.786539
## BrandToyota:Year 0.583 0.559745
## ConditionUsed:Year -2.700 0.006939 **
## FuelDiesel:Year -2.327 0.019959 *
## FuelHybrid:Year 17.124 < 2e-16 ***
## FuelLPG:Year 1.761 0.078171 .
## FuelPetrol:Year 26.840 < 2e-16 ***
## ModelSedan:Year 9.186 < 2e-16 ***
## ModelSUV:Year -1.920 0.054857 .
## ModelHatchback:Year 6.818 9.54e-12 ***
## ModelTruck:Year -3.730 0.000192 ***
## Registered.CityCapital:Year 6.528 6.85e-11 ***
## Transaction.TypeInstallment/Leasing:Year -18.658 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221200 on 18996 degrees of freedom
## Multiple R-squared: 0.7711, Adjusted R-squared: 0.77
## F-statistic: 673.8 on 95 and 18996 DF, p-value: < 2.2e-16
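Adding the interactions raises the adjusted R-squared from 0.7227 to 0.77 at the cost of 60 additional model terms (95 versus 35 degrees of freedom). Whether such a gain is more than noise can be checked with an F-test between the nested models. A sketch on the built-in `mtcars` data, with `base_fit` and `inter_fit` as illustrative stand-ins for `fit` and `fit1`:

```r
# Stand-in nested models on built-in data (not the report's fit and fit1)
base_fit  <- lm(mpg ~ wt + hp, data = mtcars)   # main effects only
inter_fit <- lm(mpg ~ wt * hp, data = mtcars)   # main effects plus wt:hp

# F-test of the null hypothesis that the extra interaction term adds nothing
cmp <- anova(base_fit, inter_fit)
cmp

# Adjusted R-squared of both models side by side
adj <- c(base = summary(base_fit)$adj.r.squared,
         interaction = summary(inter_fit)$adj.r.squared)
adj
```

For the report's models the analogous call would presumably be `anova(fit, fit1)`, which is valid here because every term of `Model` also appears in `Model1`.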
Explore interactions among independent variables
##
## Call:
## lm(formula = Model2, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1054825 -238590 -49736 178067 1750108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.138e+07 6.277e+05 -97.790 <2e-16 ***
## KMs.Driven 8.592e-01 9.635e-01 0.892 0.373
## Year 3.097e+04 3.130e+02 98.939 <2e-16 ***
## KMs.Driven:Year -4.309e-04 4.829e-04 -0.892 0.372
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 372400 on 19088 degrees of freedom
## Multiple R-squared: 0.3484, Adjusted R-squared: 0.3483
## F-statistic: 3403 on 3 and 19088 DF, p-value: < 2.2e-16
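The formula behind `Model2` is not shown in the report. The coefficient table (KMs.Driven, Year and KMs.Driven:Year) is consistent with `Price ~ KMs.Driven * Year`, but that is an assumption. The sketch below only illustrates how R expands the `*` operator in a formula, using placeholder variable names:

```r
# In an R formula, a * b is shorthand for a + b + a:b
f <- y ~ x1 * x2
term_labels <- attr(terms(f), "term.labels")
term_labels

# The explicit form expands to the same set of terms
f_explicit <- y ~ x1 + x2 + x1:x2
identical(term_labels, attr(terms(f_explicit), "term.labels"))
```

This is also why the longer formulas in this report can repeat `KMs.Driven` and `Year` in several `*` products without duplicating their main effects: R keeps each term once.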
Question 3: Explore the possibility of having quadratic terms in the model
# From the Normal Q-Q plot we see that the data are clearly not normally distributed.
# In the residuals vs fitted plot the line is close to straight but the observations are widely spread around it. This suggests the relationship is roughly linear but the data contain many outliers.
Model3 <- Price ~ Year + KMs.Driven + I(KMs.Driven^2) + I(Year^2)
fit3 <- lm(Model3, data = df)
summary(fit3)
##
## Call:
## lm(formula = Model3, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1162926 -210297 -61358 182524 1740173
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.966e+09 1.266e+08 15.530 < 2e-16 ***
## Year -1.994e+06 1.265e+05 -15.770 < 2e-16 ***
## KMs.Driven 7.332e-02 1.573e-02 4.663 3.14e-06 ***
## I(KMs.Driven^2) -8.274e-09 1.691e-09 -4.892 1.01e-06 ***
## I(Year^2) 5.059e+02 3.159e+01 16.016 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 369800 on 19087 degrees of freedom
## Multiple R-squared: 0.3575, Adjusted R-squared: 0.3573
## F-statistic: 2655 on 4 and 19087 DF, p-value: < 2.2e-16
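The negative coefficient on Year does not mean newer cars are cheaper: with a quadratic term the effect of Year depends on the year itself. A worked sketch using the coefficients as printed above (rounded to four significant digits, so the results are approximate); the variable names are illustrative:

```r
# Coefficients copied from the printed summary of fit3 (rounded values)
b_year  <- -1.994e+06   # coefficient on Year
b_year2 <-  5.059e+02   # coefficient on I(Year^2)

# Turning point of b_year * Year + b_year2 * Year^2, i.e. -b / (2c)
turning_year <- -b_year / (2 * b_year2)
turning_year

# Marginal effect of one more model year, evaluated at the median Year (2007)
marginal_2007 <- b_year + 2 * b_year2 * 2007
marginal_2007
```

The turning point falls at roughly 1971, before the earliest model year in the data (1983), so within the observed range the fitted price rises with Year, and rises faster for newer cars, holding KMs.Driven fixed.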
Combining all of the information
Model4 <- Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City + Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand*KMs.Driven + Condition*KMs.Driven + Fuel*KMs.Driven + Model*KMs.Driven + Registered.City*KMs.Driven + Transaction.Type*KMs.Driven + Brand*Year + Condition*Year + Fuel*Year + Model*Year + Registered.City*Year + Transaction.Type*Year
fit4 <- lm(Model4, data = df)
summary(fit4)
##
## Call:
## lm(formula = Model4, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1754950 -98987 -3800 91953 1797933
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 3.205e+09 1.010e+08
## BrandBMW -3.391e+07 2.904e+07
## BrandChangan -5.473e+07 1.014e+08
## BrandChevrolet -8.923e+07 3.111e+07
## BrandClassic & Antiques -1.210e+05 2.310e+05
## BrandDaewoo -4.225e+07 2.739e+07
## BrandDaihatsu -1.636e+07 2.553e+07
## BrandFAW -1.241e+07 2.904e+07
## BrandHonda -3.991e+07 2.552e+07
## BrandHyundai -9.218e+06 2.608e+07
## BrandKIA -2.392e+07 2.696e+07
## BrandLand Rover -3.537e+05 2.329e+05
## BrandLexus 1.298e+08 3.455e+07
## BrandMazda -1.160e+07 2.612e+07
## BrandMercedes -7.196e+07 2.652e+07
## BrandMitsubishi -3.023e+06 2.560e+07
## BrandNissan -1.672e+07 2.557e+07
## BrandOther Brands 1.083e+07 2.579e+07
## BrandPorsche 4.023e+05 2.459e+05
## BrandRange Rover -1.610e+09 6.654e+08
## BrandSubaru 3.201e+08 9.822e+07
## BrandSuzuki -1.057e+07 2.553e+07
## BrandToyota -2.717e+07 2.552e+07
## ConditionUsed -4.129e+06 1.193e+06
## FuelDiesel 1.163e+07 2.965e+06
## FuelHybrid -3.096e+07 2.281e+06
## FuelLPG -2.037e+07 1.099e+07
## FuelPetrol -1.165e+07 9.604e+05
## KMs.Driven -9.443e-01 1.230e+00
## ModelSedan -2.177e+07 2.728e+06
## ModelSUV 1.256e+07 2.947e+06
## ModelHatchback -1.072e+07 2.778e+06
## ModelTruck 2.228e+07 4.617e+06
## Registered.CityCapital -1.238e+07 1.932e+06
## Transaction.TypeInstallment/Leasing 6.511e+07 2.803e+06
## Year -3.206e+06 9.826e+04
## I(Year^2) 8.021e+02 2.431e+01
## I(KMs.Driven^2) 1.676e-09 1.042e-09
## BrandBMW:KMs.Driven 3.255e+00 1.497e+00
## BrandChangan:KMs.Driven 8.480e-01 1.230e+00
## BrandChevrolet:KMs.Driven 7.710e-01 1.254e+00
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 9.059e-01 1.230e+00
## BrandDaihatsu:KMs.Driven 8.622e-01 1.229e+00
## BrandFAW:KMs.Driven 8.081e-02 1.306e+00
## BrandHonda:KMs.Driven 8.613e-01 1.229e+00
## BrandHyundai:KMs.Driven 8.744e-01 1.229e+00
## BrandKIA:KMs.Driven 8.979e-01 1.229e+00
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -2.421e-01 1.268e+00
## BrandMazda:KMs.Driven 7.275e-01 1.249e+00
## BrandMercedes:KMs.Driven 9.997e-01 1.369e+00
## BrandMitsubishi:KMs.Driven 9.133e-01 1.230e+00
## BrandNissan:KMs.Driven 8.526e-01 1.229e+00
## BrandOther Brands:KMs.Driven 8.643e-01 1.229e+00
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.014e+02 8.123e+01
## BrandSubaru:KMs.Driven -7.642e+00 2.934e+00
## BrandSuzuki:KMs.Driven 8.589e-01 1.229e+00
## BrandToyota:KMs.Driven 8.613e-01 1.229e+00
## ConditionUsed:KMs.Driven 1.278e-02 1.302e-02
## FuelDiesel:KMs.Driven -9.044e-02 3.485e-02
## FuelHybrid:KMs.Driven 6.504e-03 1.376e-02
## FuelLPG:KMs.Driven -3.417e-01 8.267e-01
## FuelPetrol:KMs.Driven -1.011e-02 6.486e-03
## KMs.Driven:ModelSedan 2.327e-02 2.024e-02
## KMs.Driven:ModelSUV 3.002e-02 2.122e-02
## KMs.Driven:ModelHatchback 2.805e-02 1.977e-02
## KMs.Driven:ModelTruck -2.738e-03 2.955e-02
## KMs.Driven:Registered.CityCapital 2.900e-02 2.113e-02
## KMs.Driven:Transaction.TypeInstallment/Leasing 2.629e-01 7.711e-02
## BrandBMW:Year 1.702e+04 1.444e+04
## BrandChangan:Year 2.701e+04 5.051e+04
## BrandChevrolet:Year 4.424e+04 1.546e+04
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 2.098e+04 1.361e+04
## BrandDaihatsu:Year 8.004e+03 1.267e+04
## BrandFAW:Year 5.990e+03 1.441e+04
## BrandHonda:Year 1.987e+04 1.267e+04
## BrandHyundai:Year 4.409e+03 1.295e+04
## BrandKIA:Year 1.175e+04 1.339e+04
## BrandLand Rover:Year NA NA
## BrandLexus:Year -6.451e+04 1.717e+04
## BrandMazda:Year 5.697e+03 1.297e+04
## BrandMercedes:Year 3.615e+04 1.318e+04
## BrandMitsubishi:Year 1.382e+03 1.271e+04
## BrandNissan:Year 8.233e+03 1.269e+04
## BrandOther Brands:Year -5.622e+03 1.280e+04
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 7.980e+05 3.298e+05
## BrandSubaru:Year -1.589e+05 4.871e+04
## BrandSuzuki:Year 5.076e+03 1.267e+04
## BrandToyota:Year 1.358e+04 1.267e+04
## ConditionUsed:Year 2.042e+03 5.938e+02
## FuelDiesel:Year -5.669e+03 1.483e+03
## FuelHybrid:Year 1.552e+04 1.137e+03
## FuelLPG:Year 1.020e+04 5.496e+03
## FuelPetrol:Year 5.865e+03 4.794e+02
## ModelSedan:Year 1.096e+04 1.363e+03
## ModelSUV:Year -6.162e+03 1.471e+03
## ModelHatchback:Year 5.410e+03 1.387e+03
## ModelTruck:Year -1.104e+04 2.302e+03
## Registered.CityCapital:Year 6.198e+03 9.634e+02
## Transaction.TypeInstallment/Leasing:Year -3.271e+04 1.392e+03
## t value Pr(>|t|)
## (Intercept) 31.739 < 2e-16 ***
## BrandBMW -1.168 0.242919
## BrandChangan -0.540 0.589355
## BrandChevrolet -2.868 0.004131 **
## BrandClassic & Antiques -0.524 0.600340
## BrandDaewoo -1.542 0.122992
## BrandDaihatsu -0.641 0.521725
## BrandFAW -0.427 0.669024
## BrandHonda -1.564 0.117920
## BrandHyundai -0.353 0.723735
## BrandKIA -0.887 0.374900
## BrandLand Rover -1.519 0.128880
## BrandLexus 3.758 0.000172 ***
## BrandMazda -0.444 0.657142
## BrandMercedes -2.713 0.006671 **
## BrandMitsubishi -0.118 0.905998
## BrandNissan -0.654 0.513137
## BrandOther Brands 0.420 0.674441
## BrandPorsche 1.636 0.101804
## BrandRange Rover -2.420 0.015531 *
## BrandSubaru 3.259 0.001122 **
## BrandSuzuki -0.414 0.678777
## BrandToyota -1.065 0.287051
## ConditionUsed -3.460 0.000542 ***
## FuelDiesel 3.921 8.85e-05 ***
## FuelHybrid -13.570 < 2e-16 ***
## FuelLPG -1.852 0.063991 .
## FuelPetrol -12.132 < 2e-16 ***
## KMs.Driven -0.768 0.442535
## ModelSedan -7.979 1.56e-15 ***
## ModelSUV 4.261 2.04e-05 ***
## ModelHatchback -3.860 0.000114 ***
## ModelTruck 4.826 1.41e-06 ***
## Registered.CityCapital -6.410 1.49e-10 ***
## Transaction.TypeInstallment/Leasing 23.231 < 2e-16 ***
## Year -32.629 < 2e-16 ***
## I(Year^2) 33.000 < 2e-16 ***
## I(KMs.Driven^2) 1.609 0.107675
## BrandBMW:KMs.Driven 2.175 0.029652 *
## BrandChangan:KMs.Driven 0.690 0.490393
## BrandChevrolet:KMs.Driven 0.615 0.538634
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 0.737 0.461311
## BrandDaihatsu:KMs.Driven 0.701 0.483072
## BrandFAW:KMs.Driven 0.062 0.950648
## BrandHonda:KMs.Driven 0.701 0.483555
## BrandHyundai:KMs.Driven 0.711 0.476967
## BrandKIA:KMs.Driven 0.730 0.465212
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -0.191 0.848544
## BrandMazda:KMs.Driven 0.583 0.560191
## BrandMercedes:KMs.Driven 0.730 0.465213
## BrandMitsubishi:KMs.Driven 0.743 0.457665
## BrandNissan:KMs.Driven 0.694 0.487994
## BrandOther Brands:KMs.Driven 0.703 0.482070
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.479 0.013167 *
## BrandSubaru:KMs.Driven -2.605 0.009199 **
## BrandSuzuki:KMs.Driven 0.699 0.484768
## BrandToyota:KMs.Driven 0.701 0.483555
## ConditionUsed:KMs.Driven 0.982 0.326113
## FuelDiesel:KMs.Driven -2.595 0.009460 **
## FuelHybrid:KMs.Driven 0.473 0.636508
## FuelLPG:KMs.Driven -0.413 0.679330
## FuelPetrol:KMs.Driven -1.559 0.119041
## KMs.Driven:ModelSedan 1.149 0.250405
## KMs.Driven:ModelSUV 1.415 0.157170
## KMs.Driven:ModelHatchback 1.419 0.156035
## KMs.Driven:ModelTruck -0.093 0.926167
## KMs.Driven:Registered.CityCapital 1.373 0.169883
## KMs.Driven:Transaction.TypeInstallment/Leasing 3.409 0.000653 ***
## BrandBMW:Year 1.178 0.238688
## BrandChangan:Year 0.535 0.592894
## BrandChevrolet:Year 2.862 0.004217 **
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 1.541 0.123398
## BrandDaihatsu:Year 0.632 0.527681
## BrandFAW:Year 0.416 0.677641
## BrandHonda:Year 1.568 0.116903
## BrandHyundai:Year 0.340 0.733495
## BrandKIA:Year 0.878 0.380092
## BrandLand Rover:Year NA NA
## BrandLexus:Year -3.756 0.000173 ***
## BrandMazda:Year 0.439 0.660488
## BrandMercedes:Year 2.744 0.006074 **
## BrandMitsubishi:Year 0.109 0.913411
## BrandNissan:Year 0.649 0.516531
## BrandOther Brands:Year -0.439 0.660547
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 2.419 0.015553 *
## BrandSubaru:Year -3.263 0.001106 **
## BrandSuzuki:Year 0.401 0.688706
## BrandToyota:Year 1.072 0.283814
## ConditionUsed:Year 3.439 0.000586 ***
## FuelDiesel:Year -3.824 0.000132 ***
## FuelHybrid:Year 13.643 < 2e-16 ***
## FuelLPG:Year 1.855 0.063616 .
## FuelPetrol:Year 12.234 < 2e-16 ***
## ModelSedan:Year 8.043 9.30e-16 ***
## ModelSUV:Year -4.189 2.81e-05 ***
## ModelHatchback:Year 3.901 9.62e-05 ***
## ModelTruck:Year -4.797 1.62e-06 ***
## Registered.CityCapital:Year 6.434 1.28e-10 ***
## Transaction.TypeInstallment/Leasing:Year -23.507 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 215100 on 18994 degrees of freedom
## Multiple R-squared: 0.7837, Adjusted R-squared: 0.7826
## F-statistic: 709.5 on 97 and 18994 DF, p-value: < 2.2e-16
# The model above is poor: many of its terms are not statistically significant.
# We next check for linearity.
plot(fit4, 1)
# In the residuals-vs-fitted plot the red line is roughly horizontal, which suggests that the linearity assumption holds.
# Next we check for normality.
plot(fit4, 2)
# The plot shows that the model is inadequate because of heteroscedasticity, so we will try to address this problem with the weighted least squares (WLS) method.
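The diagnostic checks above can be sketched on a small built-in dataset. This is a minimal illustration, not the analysis in this report: the `mtcars` data, the formula, and the names `demo_fit` and `sw` are assumptions chosen only to make the block self-contained. In `plot()` for an `lm` object, `which = 1` draws residuals vs fitted (linearity), `which = 2` the normal Q-Q plot (normality), and `which = 3` the scale-location plot, which is usually the more direct view of heteroscedasticity.

```r
# Hypothetical demo model on a built-in dataset (not the car-price data)
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw the three standard diagnostic plots side by side
op <- par(mfrow = c(1, 3))
plot(demo_fit, which = 1)  # residuals vs fitted: look for a flat red line
plot(demo_fit, which = 2)  # normal Q-Q: points should follow the reference line
plot(demo_fit, which = 3)  # scale-location: a rising trend suggests heteroscedasticity
par(op)                    # restore the previous plotting layout

# A numeric companion to the Q-Q plot: Shapiro-Wilk test on the residuals
sw <- shapiro.test(resid(demo_fit))
print(sw$p.value)
```

A small p-value from `shapiro.test()` points to non-normal residuals, but with samples as large as the one used in this report the test tends to flag even minor departures, so the plots are often the more useful guide.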
# fit4 is currently the complete (full) model, so we run the stepAIC function with backward selection on it to look for a better model.
# Running backward selection with stepAIC
BackwardStepAIC <- stepAIC(fit4, direction = "backward")
## Start: AIC=468951.8
## Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City +
## Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand *
## KMs.Driven + Condition * KMs.Driven + Fuel * KMs.Driven +
## Model * KMs.Driven + Registered.City * KMs.Driven + Transaction.Type *
## KMs.Driven + Brand * Year + Condition * Year + Fuel * Year +
## Model * Year + Registered.City * Year + Transaction.Type *
## Year
##
## Df Sum of Sq RSS AIC
## - KMs.Driven:Model 4 1.8949e+11 8.7892e+14 468948
## - Condition:KMs.Driven 1 4.4613e+10 8.7877e+14 468951
## - KMs.Driven:Registered.City 1 8.7166e+10 8.7882e+14 468952
## <none> 8.7873e+14 468952
## - I(KMs.Driven^2) 1 1.1974e+11 8.7885e+14 468952
## - Fuel:KMs.Driven 4 4.3651e+11 8.7916e+14 468953
## - KMs.Driven:Transaction.Type 1 5.3771e+11 8.7927e+14 468961
## - Condition:Year 1 5.4703e+11 8.7928e+14 468962
## - Brand:KMs.Driven 19 2.3688e+12 8.8110e+14 468965
## - Registered.City:Year 1 1.9150e+12 8.8064e+14 468991
## - Fuel:Year 4 1.4625e+13 8.9335e+14 469259
## - Transaction.Type:Year 1 2.5564e+13 9.0429e+14 469497
## - Model:Year 4 2.8202e+13 9.0693e+14 469547
## - Brand:Year 19 3.4000e+13 9.1273e+14 469639
## - I(Year^2) 1 5.0382e+13 9.2911e+14 470014
##
## Step: AIC=468947.9
## Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City +
## Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand:KMs.Driven +
## Condition:KMs.Driven + Fuel:KMs.Driven + KMs.Driven:Registered.City +
## KMs.Driven:Transaction.Type + Brand:Year + Condition:Year +
## Fuel:Year + Model:Year + Registered.City:Year + Transaction.Type:Year
##
## Df Sum of Sq RSS AIC
## - Condition:KMs.Driven 1 3.2974e+10 8.7895e+14 468947
## - KMs.Driven:Registered.City 1 8.0576e+10 8.7900e+14 468948
## <none> 8.7892e+14 468948
## - Fuel:KMs.Driven 4 3.7716e+11 8.7929e+14 468948
## - I(KMs.Driven^2) 1 1.3522e+11 8.7905e+14 468949
## - KMs.Driven:Transaction.Type 1 5.3522e+11 8.7945e+14 468958
## - Condition:Year 1 5.4150e+11 8.7946e+14 468958
## - Brand:KMs.Driven 19 2.3270e+12 8.8124e+14 468960
## - Registered.City:Year 1 1.9240e+12 8.8084e+14 468988
## - Fuel:Year 4 1.4677e+13 8.9359e+14 469256
## - Transaction.Type:Year 1 2.5582e+13 9.0450e+14 469494
## - Model:Year 4 2.8288e+13 9.0721e+14 469545
## - Brand:Year 19 3.3966e+13 9.1288e+14 469634
## - I(Year^2) 1 5.0402e+13 9.2932e+14 470011
##
## Step: AIC=468946.6
## Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City +
## Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand:KMs.Driven +
## Fuel:KMs.Driven + KMs.Driven:Registered.City + KMs.Driven:Transaction.Type +
## Brand:Year + Condition:Year + Fuel:Year + Model:Year + Registered.City:Year +
## Transaction.Type:Year
##
## Df Sum of Sq RSS AIC
## - KMs.Driven:Registered.City 1 8.0490e+10 8.7903e+14 468946
## <none> 8.7895e+14 468947
## - Fuel:KMs.Driven 4 3.7520e+11 8.7933e+14 468947
## - I(KMs.Driven^2) 1 1.4771e+11 8.7910e+14 468948
## - Condition:Year 1 5.1772e+11 8.7947e+14 468956
## - KMs.Driven:Transaction.Type 1 5.3669e+11 8.7949e+14 468956
## - Brand:KMs.Driven 19 2.3430e+12 8.8129e+14 468959
## - Registered.City:Year 1 1.9213e+12 8.8087e+14 468986
## - Fuel:Year 4 1.4681e+13 8.9363e+14 469255
## - Transaction.Type:Year 1 2.5554e+13 9.0451e+14 469492
## - Model:Year 4 2.8353e+13 9.0730e+14 469545
## - Brand:Year 19 3.3951e+13 9.1290e+14 469632
## - I(Year^2) 1 5.0410e+13 9.2936e+14 470009
##
## Step: AIC=468946.4
## Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City +
## Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand:KMs.Driven +
## Fuel:KMs.Driven + KMs.Driven:Transaction.Type + Brand:Year +
## Condition:Year + Fuel:Year + Model:Year + Registered.City:Year +
## Transaction.Type:Year
##
## Df Sum of Sq RSS AIC
## <none> 8.7903e+14 468946
## - Fuel:KMs.Driven 4 3.7461e+11 8.7941e+14 468947
## - I(KMs.Driven^2) 1 1.4369e+11 8.7917e+14 468947
## - KMs.Driven:Transaction.Type 1 5.0040e+11 8.7953e+14 468955
## - Condition:Year 1 5.1690e+11 8.7955e+14 468956
## - Brand:KMs.Driven 19 2.3748e+12 8.8141e+14 468960
## - Registered.City:Year 1 1.8661e+12 8.8090e+14 468985
## - Fuel:Year 4 1.4686e+13 8.9372e+14 469255
## - Transaction.Type:Year 1 2.5630e+13 9.0466e+14 469493
## - Model:Year 4 2.8365e+13 9.0740e+14 469545
## - Brand:Year 19 3.3977e+13 9.1301e+14 469632
## - I(Year^2) 1 5.0442e+13 9.2947e+14 470010
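The AIC column in the trace above can be related to the RSS column. As a sketch, assuming the stepwise search scores linear models through `extractAIC()`, the value is `n * log(RSS / n) + 2 * edf`, where `edf` is the number of estimated coefficients. The demo model on `mtcars` and the names `demo_fit`, `aic_manual`, `aic_extract`, and `report_aic` are illustrative assumptions.

```r
# Hypothetical demo model on a built-in dataset
demo_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)              # number of observations
rss <- sum(resid(demo_fit)^2)    # residual sum of squares
edf <- length(coef(demo_fit))    # estimated coefficients, including the intercept

# AIC up to an additive constant, as used when comparing lm fits
aic_manual  <- n * log(rss / n) + 2 * edf
aic_extract <- extractAIC(demo_fit)[2]
print(c(manual = aic_manual, extractAIC = aic_extract))

# The same formula applied to the rounded figures reported in this document:
# RSS = 8.7873e+14, 18994 residual df + 98 estimated coefficients = 19092 rows
report_aic <- 19092 * log(8.7873e14 / 19092) + 2 * 98
print(report_aic)
```

Because the RSS in the trace is rounded to five significant figures, the last line only approximately reproduces the reported `Start: AIC` value. The formula also shows why dropping a term lowers the AIC only when the increase in RSS is small relative to the two-point penalty per removed coefficient.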
# Model5 restates the full formula, identical to the starting formula of the stepAIC run above
Model5 <- Price ~ Brand + Condition + Fuel + KMs.Driven + Model + Registered.City +
  Transaction.Type + Year + I(Year^2) + I(KMs.Driven^2) + Brand *
  KMs.Driven + Condition * KMs.Driven + Fuel * KMs.Driven +
  Model * KMs.Driven + Registered.City * KMs.Driven + Transaction.Type *
  KMs.Driven + Brand * Year + Condition * Year + Fuel * Year +
  Model * Year + Registered.City * Year + Transaction.Type *
  Year
fit5 <- lm(Model5, data = df)
summary(fit5)
##
## Call:
## lm(formula = Model5, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1754950 -98987 -3800 91953 1797933
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 3.205e+09 1.010e+08
## BrandBMW -3.391e+07 2.904e+07
## BrandChangan -5.473e+07 1.014e+08
## BrandChevrolet -8.923e+07 3.111e+07
## BrandClassic & Antiques -1.210e+05 2.310e+05
## BrandDaewoo -4.225e+07 2.739e+07
## BrandDaihatsu -1.636e+07 2.553e+07
## BrandFAW -1.241e+07 2.904e+07
## BrandHonda -3.991e+07 2.552e+07
## BrandHyundai -9.218e+06 2.608e+07
## BrandKIA -2.392e+07 2.696e+07
## BrandLand Rover -3.537e+05 2.329e+05
## BrandLexus 1.298e+08 3.455e+07
## BrandMazda -1.160e+07 2.612e+07
## BrandMercedes -7.196e+07 2.652e+07
## BrandMitsubishi -3.023e+06 2.560e+07
## BrandNissan -1.672e+07 2.557e+07
## BrandOther Brands 1.083e+07 2.579e+07
## BrandPorsche 4.023e+05 2.459e+05
## BrandRange Rover -1.610e+09 6.654e+08
## BrandSubaru 3.201e+08 9.822e+07
## BrandSuzuki -1.057e+07 2.553e+07
## BrandToyota -2.717e+07 2.552e+07
## ConditionUsed -4.129e+06 1.193e+06
## FuelDiesel 1.163e+07 2.965e+06
## FuelHybrid -3.096e+07 2.281e+06
## FuelLPG -2.037e+07 1.099e+07
## FuelPetrol -1.165e+07 9.604e+05
## KMs.Driven -9.443e-01 1.230e+00
## ModelSedan -2.177e+07 2.728e+06
## ModelSUV 1.256e+07 2.947e+06
## ModelHatchback -1.072e+07 2.778e+06
## ModelTruck 2.228e+07 4.617e+06
## Registered.CityCapital -1.238e+07 1.932e+06
## Transaction.TypeInstallment/Leasing 6.511e+07 2.803e+06
## Year -3.206e+06 9.826e+04
## I(Year^2) 8.021e+02 2.431e+01
## I(KMs.Driven^2) 1.676e-09 1.042e-09
## BrandBMW:KMs.Driven 3.255e+00 1.497e+00
## BrandChangan:KMs.Driven 8.480e-01 1.230e+00
## BrandChevrolet:KMs.Driven 7.710e-01 1.254e+00
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 9.059e-01 1.230e+00
## BrandDaihatsu:KMs.Driven 8.622e-01 1.229e+00
## BrandFAW:KMs.Driven 8.081e-02 1.306e+00
## BrandHonda:KMs.Driven 8.613e-01 1.229e+00
## BrandHyundai:KMs.Driven 8.744e-01 1.229e+00
## BrandKIA:KMs.Driven 8.979e-01 1.229e+00
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -2.421e-01 1.268e+00
## BrandMazda:KMs.Driven 7.275e-01 1.249e+00
## BrandMercedes:KMs.Driven 9.997e-01 1.369e+00
## BrandMitsubishi:KMs.Driven 9.133e-01 1.230e+00
## BrandNissan:KMs.Driven 8.526e-01 1.229e+00
## BrandOther Brands:KMs.Driven 8.643e-01 1.229e+00
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.014e+02 8.123e+01
## BrandSubaru:KMs.Driven -7.642e+00 2.934e+00
## BrandSuzuki:KMs.Driven 8.589e-01 1.229e+00
## BrandToyota:KMs.Driven 8.613e-01 1.229e+00
## ConditionUsed:KMs.Driven 1.278e-02 1.302e-02
## FuelDiesel:KMs.Driven -9.044e-02 3.485e-02
## FuelHybrid:KMs.Driven 6.504e-03 1.376e-02
## FuelLPG:KMs.Driven -3.417e-01 8.267e-01
## FuelPetrol:KMs.Driven -1.011e-02 6.486e-03
## KMs.Driven:ModelSedan 2.327e-02 2.024e-02
## KMs.Driven:ModelSUV 3.002e-02 2.122e-02
## KMs.Driven:ModelHatchback 2.805e-02 1.977e-02
## KMs.Driven:ModelTruck -2.738e-03 2.955e-02
## KMs.Driven:Registered.CityCapital 2.900e-02 2.113e-02
## KMs.Driven:Transaction.TypeInstallment/Leasing 2.629e-01 7.711e-02
## BrandBMW:Year 1.702e+04 1.444e+04
## BrandChangan:Year 2.701e+04 5.051e+04
## BrandChevrolet:Year 4.424e+04 1.546e+04
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 2.098e+04 1.361e+04
## BrandDaihatsu:Year 8.004e+03 1.267e+04
## BrandFAW:Year 5.990e+03 1.441e+04
## BrandHonda:Year 1.987e+04 1.267e+04
## BrandHyundai:Year 4.409e+03 1.295e+04
## BrandKIA:Year 1.175e+04 1.339e+04
## BrandLand Rover:Year NA NA
## BrandLexus:Year -6.451e+04 1.717e+04
## BrandMazda:Year 5.697e+03 1.297e+04
## BrandMercedes:Year 3.615e+04 1.318e+04
## BrandMitsubishi:Year 1.382e+03 1.271e+04
## BrandNissan:Year 8.233e+03 1.269e+04
## BrandOther Brands:Year -5.622e+03 1.280e+04
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 7.980e+05 3.298e+05
## BrandSubaru:Year -1.589e+05 4.871e+04
## BrandSuzuki:Year 5.076e+03 1.267e+04
## BrandToyota:Year 1.358e+04 1.267e+04
## ConditionUsed:Year 2.042e+03 5.938e+02
## FuelDiesel:Year -5.669e+03 1.483e+03
## FuelHybrid:Year 1.552e+04 1.137e+03
## FuelLPG:Year 1.020e+04 5.496e+03
## FuelPetrol:Year 5.865e+03 4.794e+02
## ModelSedan:Year 1.096e+04 1.363e+03
## ModelSUV:Year -6.162e+03 1.471e+03
## ModelHatchback:Year 5.410e+03 1.387e+03
## ModelTruck:Year -1.104e+04 2.302e+03
## Registered.CityCapital:Year 6.198e+03 9.634e+02
## Transaction.TypeInstallment/Leasing:Year -3.271e+04 1.392e+03
## t value Pr(>|t|)
## (Intercept) 31.739 < 2e-16 ***
## BrandBMW -1.168 0.242919
## BrandChangan -0.540 0.589355
## BrandChevrolet -2.868 0.004131 **
## BrandClassic & Antiques -0.524 0.600340
## BrandDaewoo -1.542 0.122992
## BrandDaihatsu -0.641 0.521725
## BrandFAW -0.427 0.669024
## BrandHonda -1.564 0.117920
## BrandHyundai -0.353 0.723735
## BrandKIA -0.887 0.374900
## BrandLand Rover -1.519 0.128880
## BrandLexus 3.758 0.000172 ***
## BrandMazda -0.444 0.657142
## BrandMercedes -2.713 0.006671 **
## BrandMitsubishi -0.118 0.905998
## BrandNissan -0.654 0.513137
## BrandOther Brands 0.420 0.674441
## BrandPorsche 1.636 0.101804
## BrandRange Rover -2.420 0.015531 *
## BrandSubaru 3.259 0.001122 **
## BrandSuzuki -0.414 0.678777
## BrandToyota -1.065 0.287051
## ConditionUsed -3.460 0.000542 ***
## FuelDiesel 3.921 8.85e-05 ***
## FuelHybrid -13.570 < 2e-16 ***
## FuelLPG -1.852 0.063991 .
## FuelPetrol -12.132 < 2e-16 ***
## KMs.Driven -0.768 0.442535
## ModelSedan -7.979 1.56e-15 ***
## ModelSUV 4.261 2.04e-05 ***
## ModelHatchback -3.860 0.000114 ***
## ModelTruck 4.826 1.41e-06 ***
## Registered.CityCapital -6.410 1.49e-10 ***
## Transaction.TypeInstallment/Leasing 23.231 < 2e-16 ***
## Year -32.629 < 2e-16 ***
## I(Year^2) 33.000 < 2e-16 ***
## I(KMs.Driven^2) 1.609 0.107675
## BrandBMW:KMs.Driven 2.175 0.029652 *
## BrandChangan:KMs.Driven 0.690 0.490393
## BrandChevrolet:KMs.Driven 0.615 0.538634
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 0.737 0.461311
## BrandDaihatsu:KMs.Driven 0.701 0.483072
## BrandFAW:KMs.Driven 0.062 0.950648
## BrandHonda:KMs.Driven 0.701 0.483555
## BrandHyundai:KMs.Driven 0.711 0.476967
## BrandKIA:KMs.Driven 0.730 0.465212
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -0.191 0.848544
## BrandMazda:KMs.Driven 0.583 0.560191
## BrandMercedes:KMs.Driven 0.730 0.465213
## BrandMitsubishi:KMs.Driven 0.743 0.457665
## BrandNissan:KMs.Driven 0.694 0.487994
## BrandOther Brands:KMs.Driven 0.703 0.482070
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.479 0.013167 *
## BrandSubaru:KMs.Driven -2.605 0.009199 **
## BrandSuzuki:KMs.Driven 0.699 0.484768
## BrandToyota:KMs.Driven 0.701 0.483555
## ConditionUsed:KMs.Driven 0.982 0.326113
## FuelDiesel:KMs.Driven -2.595 0.009460 **
## FuelHybrid:KMs.Driven 0.473 0.636508
## FuelLPG:KMs.Driven -0.413 0.679330
## FuelPetrol:KMs.Driven -1.559 0.119041
## KMs.Driven:ModelSedan 1.149 0.250405
## KMs.Driven:ModelSUV 1.415 0.157170
## KMs.Driven:ModelHatchback 1.419 0.156035
## KMs.Driven:ModelTruck -0.093 0.926167
## KMs.Driven:Registered.CityCapital 1.373 0.169883
## KMs.Driven:Transaction.TypeInstallment/Leasing 3.409 0.000653 ***
## BrandBMW:Year 1.178 0.238688
## BrandChangan:Year 0.535 0.592894
## BrandChevrolet:Year 2.862 0.004217 **
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 1.541 0.123398
## BrandDaihatsu:Year 0.632 0.527681
## BrandFAW:Year 0.416 0.677641
## BrandHonda:Year 1.568 0.116903
## BrandHyundai:Year 0.340 0.733495
## BrandKIA:Year 0.878 0.380092
## BrandLand Rover:Year NA NA
## BrandLexus:Year -3.756 0.000173 ***
## BrandMazda:Year 0.439 0.660488
## BrandMercedes:Year 2.744 0.006074 **
## BrandMitsubishi:Year 0.109 0.913411
## BrandNissan:Year 0.649 0.516531
## BrandOther Brands:Year -0.439 0.660547
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 2.419 0.015553 *
## BrandSubaru:Year -3.263 0.001106 **
## BrandSuzuki:Year 0.401 0.688706
## BrandToyota:Year 1.072 0.283814
## ConditionUsed:Year 3.439 0.000586 ***
## FuelDiesel:Year -3.824 0.000132 ***
## FuelHybrid:Year 13.643 < 2e-16 ***
## FuelLPG:Year 1.855 0.063616 .
## FuelPetrol:Year 12.234 < 2e-16 ***
## ModelSedan:Year 8.043 9.30e-16 ***
## ModelSUV:Year -4.189 2.81e-05 ***
## ModelHatchback:Year 3.901 9.62e-05 ***
## ModelTruck:Year -4.797 1.62e-06 ***
## Registered.CityCapital:Year 6.434 1.28e-10 ***
## Transaction.TypeInstallment/Leasing:Year -23.507 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 215100 on 18994 degrees of freedom
## Multiple R-squared: 0.7837, Adjusted R-squared: 0.7826
## F-statistic: 709.5 on 97 and 18994 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Price ~ Brand + Condition + Fuel + KMs.Driven +
## Model + Registered.City + Transaction.Type + Year + I(Year^2) +
## I(KMs.Driven^2) + Brand:KMs.Driven + Fuel:KMs.Driven + KMs.Driven:Transaction.Type +
## Brand:Year + Condition:Year + Fuel:Year + Model:Year + Registered.City:Year +
## Transaction.Type:Year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1752898 -99114 -3796 91814 1798060
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 3.205e+09 1.010e+08
## BrandBMW -3.377e+07 2.904e+07
## BrandChangan -5.457e+07 1.014e+08
## BrandChevrolet -8.894e+07 3.111e+07
## BrandClassic & Antiques -1.211e+05 2.310e+05
## BrandDaewoo -4.234e+07 2.739e+07
## BrandDaihatsu -1.628e+07 2.553e+07
## BrandFAW -1.230e+07 2.904e+07
## BrandHonda -3.981e+07 2.552e+07
## BrandHyundai -9.269e+06 2.608e+07
## BrandKIA -2.381e+07 2.696e+07
## BrandLand Rover -3.539e+05 2.329e+05
## BrandLexus 1.303e+08 3.455e+07
## BrandMazda -1.142e+07 2.612e+07
## BrandMercedes -7.172e+07 2.652e+07
## BrandMitsubishi -2.960e+06 2.560e+07
## BrandNissan -1.664e+07 2.557e+07
## BrandOther Brands 1.093e+07 2.579e+07
## BrandPorsche 4.021e+05 2.459e+05
## BrandRange Rover -1.611e+09 6.655e+08
## BrandSubaru 3.197e+08 9.822e+07
## BrandSuzuki -1.054e+07 2.553e+07
## BrandToyota -2.702e+07 2.552e+07
## ConditionUsed -3.984e+06 1.185e+06
## FuelDiesel 1.168e+07 2.964e+06
## FuelHybrid -3.101e+07 2.281e+06
## FuelLPG -2.041e+07 1.099e+07
## FuelPetrol -1.167e+07 9.602e+05
## KMs.Driven -8.787e-01 1.229e+00
## ModelSedan -2.139e+07 2.710e+06
## ModelSUV 1.305e+07 2.928e+06
## ModelHatchback -1.026e+07 2.759e+06
## ModelTruck 2.172e+07 4.548e+06
## Registered.CityCapital -1.217e+07 1.924e+06
## Transaction.TypeInstallment/Leasing 6.516e+07 2.801e+06
## Year -3.207e+06 9.825e+04
## I(Year^2) 8.025e+02 2.430e+01
## I(KMs.Driven^2) 1.814e-09 1.029e-09
## BrandBMW:KMs.Driven 3.254e+00 1.497e+00
## BrandChangan:KMs.Driven 8.417e-01 1.230e+00
## BrandChevrolet:KMs.Driven 7.641e-01 1.254e+00
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 9.090e-01 1.230e+00
## BrandDaihatsu:KMs.Driven 8.585e-01 1.229e+00
## BrandFAW:KMs.Driven 7.082e-02 1.306e+00
## BrandHonda:KMs.Driven 8.579e-01 1.229e+00
## BrandHyundai:KMs.Driven 8.668e-01 1.229e+00
## BrandKIA:KMs.Driven 8.847e-01 1.229e+00
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -2.647e-01 1.267e+00
## BrandMazda:KMs.Driven 7.205e-01 1.249e+00
## BrandMercedes:KMs.Driven 9.961e-01 1.369e+00
## BrandMitsubishi:KMs.Driven 9.125e-01 1.230e+00
## BrandNissan:KMs.Driven 8.507e-01 1.229e+00
## BrandOther Brands:KMs.Driven 8.623e-01 1.230e+00
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.014e+02 8.123e+01
## BrandSubaru:KMs.Driven -7.633e+00 2.934e+00
## BrandSuzuki:KMs.Driven 8.589e-01 1.229e+00
## BrandToyota:KMs.Driven 8.545e-01 1.229e+00
## FuelDiesel:KMs.Driven -8.823e-02 3.479e-02
## FuelHybrid:KMs.Driven 1.028e-02 1.343e-02
## FuelLPG:KMs.Driven -3.426e-01 8.267e-01
## FuelPetrol:KMs.Driven -5.708e-03 6.038e-03
## KMs.Driven:Transaction.TypeInstallment/Leasing 2.521e-01 7.666e-02
## BrandBMW:Year 1.695e+04 1.444e+04
## BrandChangan:Year 2.693e+04 5.051e+04
## BrandChevrolet:Year 4.410e+04 1.546e+04
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 2.102e+04 1.361e+04
## BrandDaihatsu:Year 7.964e+03 1.267e+04
## BrandFAW:Year 5.936e+03 1.441e+04
## BrandHonda:Year 1.982e+04 1.267e+04
## BrandHyundai:Year 4.435e+03 1.295e+04
## BrandKIA:Year 1.170e+04 1.339e+04
## BrandLand Rover:Year NA NA
## BrandLexus:Year -6.476e+04 1.717e+04
## BrandMazda:Year 5.610e+03 1.297e+04
## BrandMercedes:Year 3.603e+04 1.318e+04
## BrandMitsubishi:Year 1.351e+03 1.271e+04
## BrandNissan:Year 8.192e+03 1.269e+04
## BrandOther Brands:Year -5.670e+03 1.280e+04
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 7.981e+05 3.298e+05
## BrandSubaru:Year -1.587e+05 4.871e+04
## BrandSuzuki:Year 5.060e+03 1.267e+04
## BrandToyota:Year 1.350e+04 1.267e+04
## ConditionUsed:Year 1.971e+03 5.896e+02
## FuelDiesel:Year -5.696e+03 1.482e+03
## FuelHybrid:Year 1.554e+04 1.137e+03
## FuelLPG:Year 1.022e+04 5.496e+03
## FuelPetrol:Year 5.874e+03 4.793e+02
## ModelSedan:Year 1.077e+04 1.353e+03
## ModelSUV:Year -6.406e+03 1.462e+03
## ModelHatchback:Year 5.183e+03 1.378e+03
## ModelTruck:Year -1.077e+04 2.268e+03
## Registered.CityCapital:Year 6.093e+03 9.594e+02
## Transaction.TypeInstallment/Leasing:Year -3.274e+04 1.391e+03
## t value Pr(>|t|)
## (Intercept) 31.749 < 2e-16 ***
## BrandBMW -1.163 0.244936
## BrandChangan -0.538 0.590422
## BrandChevrolet -2.859 0.004256 **
## BrandClassic & Antiques -0.524 0.600247
## BrandDaewoo -1.546 0.122212
## BrandDaihatsu -0.638 0.523785
## BrandFAW -0.424 0.671772
## BrandHonda -1.560 0.118834
## BrandHyundai -0.355 0.722272
## BrandKIA -0.883 0.377164
## BrandLand Rover -1.519 0.128742
## BrandLexus 3.773 0.000162 ***
## BrandMazda -0.437 0.661962
## BrandMercedes -2.704 0.006854 **
## BrandMitsubishi -0.116 0.907955
## BrandNissan -0.651 0.515240
## BrandOther Brands 0.424 0.671735
## BrandPorsche 1.635 0.101961
## BrandRange Rover -2.420 0.015516 *
## BrandSubaru 3.255 0.001137 **
## BrandSuzuki -0.413 0.679693
## BrandToyota -1.059 0.289774
## ConditionUsed -3.363 0.000772 ***
## FuelDiesel 3.940 8.16e-05 ***
## FuelHybrid -13.595 < 2e-16 ***
## FuelLPG -1.857 0.063364 .
## FuelPetrol -12.153 < 2e-16 ***
## KMs.Driven -0.715 0.474741
## ModelSedan -7.894 3.09e-15 ***
## ModelSUV 4.456 8.39e-06 ***
## ModelHatchback -3.719 0.000201 ***
## ModelTruck 4.776 1.80e-06 ***
## Registered.CityCapital -6.326 2.57e-10 ***
## Transaction.TypeInstallment/Leasing 23.261 < 2e-16 ***
## Year -32.643 < 2e-16 ***
## I(Year^2) 33.020 < 2e-16 ***
## I(KMs.Driven^2) 1.762 0.078031 .
## BrandBMW:KMs.Driven 2.174 0.029698 *
## BrandChangan:KMs.Driven 0.685 0.493634
## BrandChevrolet:KMs.Driven 0.609 0.542282
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 0.739 0.459768
## BrandDaihatsu:KMs.Driven 0.698 0.484968
## BrandFAW:KMs.Driven 0.054 0.956743
## BrandHonda:KMs.Driven 0.698 0.485279
## BrandHyundai:KMs.Driven 0.705 0.480784
## BrandKIA:KMs.Driven 0.720 0.471757
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -0.209 0.834548
## BrandMazda:KMs.Driven 0.577 0.563957
## BrandMercedes:KMs.Driven 0.728 0.466820
## BrandMitsubishi:KMs.Driven 0.742 0.458098
## BrandNissan:KMs.Driven 0.692 0.488950
## BrandOther Brands:KMs.Driven 0.701 0.483120
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.480 0.013155 *
## BrandSubaru:KMs.Driven -2.602 0.009276 **
## BrandSuzuki:KMs.Driven 0.699 0.484755
## BrandToyota:KMs.Driven 0.695 0.486995
## FuelDiesel:KMs.Driven -2.536 0.011211 *
## FuelHybrid:KMs.Driven 0.766 0.443844
## FuelLPG:KMs.Driven -0.414 0.678593
## FuelPetrol:KMs.Driven -0.945 0.344547
## KMs.Driven:Transaction.TypeInstallment/Leasing 3.289 0.001008 **
## BrandBMW:Year 1.173 0.240681
## BrandChangan:Year 0.533 0.593962
## BrandChevrolet:Year 2.852 0.004344 **
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 1.544 0.122610
## BrandDaihatsu:Year 0.628 0.529753
## BrandFAW:Year 0.412 0.680398
## BrandHonda:Year 1.564 0.117812
## BrandHyundai:Year 0.342 0.731988
## BrandKIA:Year 0.874 0.382336
## BrandLand Rover:Year NA NA
## BrandLexus:Year -3.771 0.000163 ***
## BrandMazda:Year 0.433 0.665309
## BrandMercedes:Year 2.735 0.006243 **
## BrandMitsubishi:Year 0.106 0.915378
## BrandNissan:Year 0.645 0.518644
## BrandOther Brands:Year -0.443 0.657865
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 2.420 0.015537 *
## BrandSubaru:Year -3.259 0.001121 **
## BrandSuzuki:Year 0.399 0.689632
## BrandToyota:Year 1.066 0.286514
## ConditionUsed:Year 3.343 0.000832 ***
## FuelDiesel:Year -3.843 0.000122 ***
## FuelHybrid:Year 13.668 < 2e-16 ***
## FuelLPG:Year 1.859 0.062992 .
## FuelPetrol:Year 12.255 < 2e-16 ***
## ModelSedan:Year 7.958 1.85e-15 ***
## ModelSUV:Year -4.381 1.19e-05 ***
## ModelHatchback:Year 3.761 0.000170 ***
## ModelTruck:Year -4.747 2.08e-06 ***
## Registered.CityCapital:Year 6.351 2.19e-10 ***
## Transaction.TypeInstallment/Leasing:Year -23.537 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 215100 on 19000 degrees of freedom
## Multiple R-squared: 0.7836, Adjusted R-squared: 0.7826
## F-statistic: 756.2 on 91 and 19000 DF, p-value: < 2.2e-16
# Since there is a problem of heteroscedasticity, we use a WLS model to rectify it
# extracting the residuals from the OLS regression model
residuals <- fit5$residuals
# taking the absolute value of the residuals
PstvRsiduals <- abs(residuals)
# regress the absolute residuals on the fitted values of the OLS model
FonR <- lm(PstvRsiduals ~ fitted(fit5))
# calculating the weights as the inverse of the squared fitted spread
wts <- 1/(FonR$fitted.values)^2
# building the weighted least squares regression model
WLSModel <- lm(Model5, data = df, weights = wts)
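The weighting steps above can be tried end to end on simulated data where the error spread is known to grow with the predictor. This is a minimal sketch under stated assumptions: the simulated data frame `sim`, the seed, and the names `ols`, `abs_res`, `spread_fit`, `w`, and `wls` are hypothetical and do not come from the car-price data.

```r
set.seed(42)

# Simulated data: the noise standard deviation increases with x
n   <- 500
x   <- runif(n, 1, 10)
y   <- 3 + 2 * x + rnorm(n, sd = 0.5 * x)
sim <- data.frame(x = x, y = y)

# Step 1: ordinary least squares fit
ols <- lm(y ~ x, data = sim)

# Step 2: absolute residuals as a rough measure of local spread
abs_res <- abs(resid(ols))

# Step 3: model the spread as a function of the OLS fitted values
spread_fit <- lm(abs_res ~ fitted(ols))

# Step 4: weights are the inverse of the squared estimated spread
w <- 1 / fitted(spread_fit)^2

# Step 5: refit the same formula with the weights
wls <- lm(y ~ x, data = sim, weights = w)

# Compare coefficient estimates and standard errors of both fits
print(summary(ols)$coefficients)
print(summary(wls)$coefficients)
```

One caveat with this recipe: if the fitted spread from step 3 is close to zero for some observations, their weights become very large and can dominate the fit, so it is worth inspecting `range(w)` before relying on the weighted model.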
summary(WLSModel)
##
## Call:
## lm(formula = Model5, data = df, weights = wts)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -6.6101 -0.7495 -0.0537 0.6903 22.8586
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 2.901e+09 8.151e+07
## BrandBMW 4.842e+06 3.159e+07
## BrandChangan -2.703e+07 7.471e+07
## BrandChevrolet 7.800e+05 2.926e+07
## BrandClassic & Antiques -1.381e+05 2.430e+05
## BrandDaewoo -2.823e+07 2.876e+07
## BrandDaihatsu 1.185e+07 2.721e+07
## BrandFAW 2.369e+07 2.821e+07
## BrandHonda -1.118e+07 2.720e+07
## BrandHyundai 1.889e+07 2.737e+07
## BrandKIA -3.310e+06 2.771e+07
## BrandLand Rover -5.579e+05 1.567e+05
## BrandLexus 1.198e+08 3.978e+07
## BrandMazda 1.266e+07 2.748e+07
## BrandMercedes -3.234e+07 2.861e+07
## BrandMitsubishi 1.953e+07 2.725e+07
## BrandNissan 1.039e+07 2.723e+07
## BrandOther Brands 2.581e+07 2.729e+07
## BrandPorsche 3.021e+05 2.440e+05
## BrandRange Rover -1.244e+09 6.301e+08
## BrandSubaru 2.402e+08 9.255e+07
## BrandSuzuki 1.745e+07 2.721e+07
## BrandToyota 1.155e+06 2.720e+07
## ConditionUsed -3.289e+06 8.330e+05
## FuelDiesel 8.752e+06 2.929e+06
## FuelHybrid -1.529e+07 1.555e+06
## FuelLPG -7.880e+06 8.341e+06
## FuelPetrol -7.747e+06 6.106e+05
## KMs.Driven -6.820e-01 1.256e+00
## ModelSedan -1.022e+07 2.218e+06
## ModelSUV 1.106e+07 2.360e+06
## ModelHatchback -4.885e+06 2.207e+06
## ModelTruck 1.032e+07 3.721e+06
## Registered.CityCapital -9.826e+06 1.285e+06
## Transaction.TypeInstallment/Leasing 6.331e+07 1.692e+06
## Year -2.922e+06 7.777e+04
## I(Year^2) 7.359e+02 1.909e+01
## I(KMs.Driven^2) 1.251e-10 6.357e-10
## BrandBMW:KMs.Driven 2.716e+00 1.662e+00
## BrandChangan:KMs.Driven 6.297e-01 1.256e+00
## BrandChevrolet:KMs.Driven 4.352e-01 1.268e+00
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 6.648e-01 1.256e+00
## BrandDaihatsu:KMs.Driven 6.452e-01 1.256e+00
## BrandFAW:KMs.Driven 4.844e-01 1.301e+00
## BrandHonda:KMs.Driven 6.456e-01 1.256e+00
## BrandHyundai:KMs.Driven 6.670e-01 1.256e+00
## BrandKIA:KMs.Driven 6.740e-01 1.256e+00
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -5.759e-01 1.269e+00
## BrandMazda:KMs.Driven 6.206e-01 1.262e+00
## BrandMercedes:KMs.Driven 2.038e-01 1.433e+00
## BrandMitsubishi:KMs.Driven 6.919e-01 1.256e+00
## BrandNissan:KMs.Driven 6.355e-01 1.256e+00
## BrandOther Brands:KMs.Driven 6.446e-01 1.256e+00
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 1.608e+02 7.703e+01
## BrandSubaru:KMs.Driven -5.225e+00 2.804e+00
## BrandSuzuki:KMs.Driven 6.399e-01 1.256e+00
## BrandToyota:KMs.Driven 6.487e-01 1.256e+00
## ConditionUsed:KMs.Driven 2.008e-03 7.822e-03
## FuelDiesel:KMs.Driven -8.075e-02 2.947e-02
## FuelHybrid:KMs.Driven 1.109e-03 9.310e-03
## FuelLPG:KMs.Driven -4.604e-01 4.181e-01
## FuelPetrol:KMs.Driven -5.852e-03 3.538e-03
## KMs.Driven:ModelSedan 2.279e-02 1.284e-02
## KMs.Driven:ModelSUV 2.999e-02 1.370e-02
## KMs.Driven:ModelHatchback 2.892e-02 1.255e-02
## KMs.Driven:ModelTruck 1.417e-02 1.651e-02
## KMs.Driven:Registered.CityCapital 9.798e-03 6.611e-03
## KMs.Driven:Transaction.TypeInstallment/Leasing 2.891e-01 5.422e-02
## BrandBMW:Year -2.344e+03 1.573e+04
## BrandChangan:Year 1.320e+04 3.722e+04
## BrandChevrolet:Year -5.518e+02 1.454e+04
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 1.403e+04 1.430e+04
## BrandDaihatsu:Year -6.061e+03 1.351e+04
## BrandFAW:Year -1.200e+04 1.401e+04
## BrandHonda:Year 5.536e+03 1.351e+04
## BrandHyundai:Year -9.604e+03 1.360e+04
## BrandKIA:Year 1.476e+03 1.377e+04
## BrandLand Rover:Year NA NA
## BrandLexus:Year -5.957e+04 1.978e+04
## BrandMazda:Year -6.423e+03 1.365e+04
## BrandMercedes:Year 1.639e+04 1.423e+04
## BrandMitsubishi:Year -9.863e+03 1.354e+04
## BrandNissan:Year -5.299e+03 1.352e+04
## BrandOther Brands:Year -1.306e+04 1.356e+04
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 6.165e+05 3.123e+05
## BrandSubaru:Year -1.193e+05 4.589e+04
## BrandSuzuki:Year -8.887e+03 1.352e+04
## BrandToyota:Year -5.539e+02 1.351e+04
## ConditionUsed:Year 1.629e+03 4.151e+02
## FuelDiesel:Year -4.235e+03 1.466e+03
## FuelHybrid:Year 7.684e+03 7.786e+02
## FuelLPG:Year 3.965e+03 4.186e+03
## FuelPetrol:Year 3.910e+03 3.053e+02
## ModelSedan:Year 5.157e+03 1.109e+03
## ModelSUV:Year -5.458e+03 1.180e+03
## ModelHatchback:Year 2.459e+03 1.104e+03
## ModelTruck:Year -5.092e+03 1.855e+03
## Registered.CityCapital:Year 4.919e+03 6.414e+02
## Transaction.TypeInstallment/Leasing:Year -3.176e+04 8.407e+02
## t value Pr(>|t|)
## (Intercept) 35.585 < 2e-16 ***
## BrandBMW 0.153 0.878170
## BrandChangan -0.362 0.717478
## BrandChevrolet 0.027 0.978733
## BrandClassic & Antiques -0.568 0.569812
## BrandDaewoo -0.982 0.326307
## BrandDaihatsu 0.435 0.663280
## BrandFAW 0.840 0.400971
## BrandHonda -0.411 0.681075
## BrandHyundai 0.690 0.490206
## BrandKIA -0.119 0.904908
## BrandLand Rover -3.560 0.000371 ***
## BrandLexus 3.011 0.002608 **
## BrandMazda 0.461 0.644937
## BrandMercedes -1.130 0.258336
## BrandMitsubishi 0.716 0.473712
## BrandNissan 0.382 0.702695
## BrandOther Brands 0.946 0.344200
## BrandPorsche 1.238 0.215643
## BrandRange Rover -1.975 0.048286 *
## BrandSubaru 2.595 0.009473 **
## BrandSuzuki 0.641 0.521304
## BrandToyota 0.042 0.966131
## ConditionUsed -3.949 7.88e-05 ***
## FuelDiesel 2.988 0.002816 **
## FuelHybrid -9.830 < 2e-16 ***
## FuelLPG -0.945 0.344823
## FuelPetrol -12.688 < 2e-16 ***
## KMs.Driven -0.543 0.587103
## ModelSedan -4.608 4.08e-06 ***
## ModelSUV 4.687 2.79e-06 ***
## ModelHatchback -2.213 0.026903 *
## ModelTruck 2.773 0.005568 **
## Registered.CityCapital -7.647 2.16e-14 ***
## Transaction.TypeInstallment/Leasing 37.429 < 2e-16 ***
## Year -37.570 < 2e-16 ***
## I(Year^2) 38.550 < 2e-16 ***
## I(KMs.Driven^2) 0.197 0.843926
## BrandBMW:KMs.Driven 1.634 0.102270
## BrandChangan:KMs.Driven 0.501 0.616077
## BrandChevrolet:KMs.Driven 0.343 0.731364
## BrandClassic & Antiques:KMs.Driven NA NA
## BrandDaewoo:KMs.Driven 0.529 0.596580
## BrandDaihatsu:KMs.Driven 0.514 0.607427
## BrandFAW:KMs.Driven 0.372 0.709565
## BrandHonda:KMs.Driven 0.514 0.607154
## BrandHyundai:KMs.Driven 0.531 0.595332
## BrandKIA:KMs.Driven 0.537 0.591474
## BrandLand Rover:KMs.Driven NA NA
## BrandLexus:KMs.Driven -0.454 0.650050
## BrandMazda:KMs.Driven 0.492 0.622988
## BrandMercedes:KMs.Driven 0.142 0.886911
## BrandMitsubishi:KMs.Driven 0.551 0.581729
## BrandNissan:KMs.Driven 0.506 0.612824
## BrandOther Brands:KMs.Driven 0.513 0.607775
## BrandPorsche:KMs.Driven NA NA
## BrandRange Rover:KMs.Driven 2.088 0.036847 *
## BrandSubaru:KMs.Driven -1.863 0.062441 .
## BrandSuzuki:KMs.Driven 0.510 0.610371
## BrandToyota:KMs.Driven 0.517 0.605477
## ConditionUsed:KMs.Driven 0.257 0.797433
## FuelDiesel:KMs.Driven -2.740 0.006154 **
## FuelHybrid:KMs.Driven 0.119 0.905180
## FuelLPG:KMs.Driven -1.101 0.270792
## FuelPetrol:KMs.Driven -1.654 0.098163 .
## KMs.Driven:ModelSedan 1.775 0.075873 .
## KMs.Driven:ModelSUV 2.188 0.028662 *
## KMs.Driven:ModelHatchback 2.305 0.021186 *
## KMs.Driven:ModelTruck 0.858 0.390908
## KMs.Driven:Registered.CityCapital 1.482 0.138309
## KMs.Driven:Transaction.TypeInstallment/Leasing 5.331 9.85e-08 ***
## BrandBMW:Year -0.149 0.881537
## BrandChangan:Year 0.355 0.722858
## BrandChevrolet:Year -0.038 0.969732
## BrandClassic & Antiques:Year NA NA
## BrandDaewoo:Year 0.981 0.326465
## BrandDaihatsu:Year -0.448 0.653800
## BrandFAW:Year -0.856 0.391801
## BrandHonda:Year 0.410 0.681970
## BrandHyundai:Year -0.706 0.480007
## BrandKIA:Year 0.107 0.914610
## BrandLand Rover:Year NA NA
## BrandLexus:Year -3.013 0.002594 **
## BrandMazda:Year -0.471 0.637996
## BrandMercedes:Year 1.152 0.249352
## BrandMitsubishi:Year -0.728 0.466323
## BrandNissan:Year -0.392 0.695200
## BrandOther Brands:Year -0.964 0.335211
## BrandPorsche:Year NA NA
## BrandRange Rover:Year 1.974 0.048382 *
## BrandSubaru:Year -2.600 0.009324 **
## BrandSuzuki:Year -0.658 0.510806
## BrandToyota:Year -0.041 0.967309
## ConditionUsed:Year 3.923 8.78e-05 ***
## FuelDiesel:Year -2.889 0.003872 **
## FuelHybrid:Year 9.869 < 2e-16 ***
## FuelLPG:Year 0.947 0.343617
## FuelPetrol:Year 12.806 < 2e-16 ***
## ModelSedan:Year 4.648 3.37e-06 ***
## ModelSUV:Year -4.627 3.73e-06 ***
## ModelHatchback:Year 2.228 0.025898 *
## ModelTruck:Year -2.745 0.006057 **
## Registered.CityCapital:Year 7.670 1.81e-14 ***
## Transaction.TypeInstallment/Leasing:Year -37.775 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.458 on 18994 degrees of freedom
## Multiple R-squared: 0.7718, Adjusted R-squared: 0.7706
## F-statistic: 662.2 on 97 and 18994 DF, p-value: < 2.2e-16
## Warning: not plotting observations with leverage one:
## 3236, 3492, 4776, 16574, 16764