This exercise performs an exploratory analysis of a Toyota Corolla vehicle dataset with 1436 instances and 37 attributes.
The goal is to build a linear regression model with acceptable results, interpreting each step of the reasoning needed to get there.
library(fastDummies) # library for dummy (one-hot) encoding
library(car)
Loading required package: carData
Registered S3 method overwritten by 'data.table':
method from
print.data.table
library(corrplot) # library for visualizing the correlation between variables
corrplot 0.84 loaded
library(mctest) # library for computing TOL and VIF
library(tidyverse) # library for data cleaning and formatting
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.1     v purrr   0.3.2
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   1.0.0     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
x dplyr::recode() masks car::recode()
x purrr::some()   masks car::some()
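# stringsAsFactors = TRUE keeps text columns as factors on R >= 4.0, matching the str() output below
a_raw_data <- read.csv("ToyotaCorolla.csv", stringsAsFactors = TRUE)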
a_raw_data
str(a_raw_data)
'data.frame': 1436 obs. of 37 variables:
$ Id : int 1 2 3 4 5 6 7 8 9 10 ...
$ Model : Factor w/ 372 levels "?TOYOTA Corolla 1.3 16V HATCHB G6 2/3-Doors",..: 332 332 67 332 331 331 64 326 62 59 ...
$ Price : int 13500 13750 13950 14950 13750 12950 16900 18600 21500 12950 ...
$ Age_08_04 : int 23 23 24 26 30 32 27 30 27 23 ...
$ Mfg_Month : int 10 10 9 7 3 1 6 3 6 10 ...
$ Mfg_Year : int 2002 2002 2002 2002 2002 2002 2002 2002 2002 2002 ...
$ KM : int 46986 72937 41711 48000 38500 61000 94612 75889 19700 71138 ...
$ Fuel_Type : Factor w/ 3 levels "CNG","Diesel",..: 2 2 2 2 2 2 2 2 3 2 ...
$ HP : int 90 90 90 90 90 90 90 90 192 69 ...
$ Met_Color : int 1 1 1 0 0 0 1 1 0 0 ...
$ Automatic : int 0 0 0 0 0 0 0 0 0 0 ...
$ cc : int 2000 2000 2000 2000 2000 2000 2000 2000 1800 1900 ...
$ Doors : int 3 3 3 3 3 3 3 3 3 3 ...
$ Cylinders : int 4 4 4 4 4 4 4 4 4 4 ...
$ Gears : int 5 5 5 5 5 5 5 5 5 5 ...
$ Quarterly_Tax : int 210 210 210 210 210 210 210 210 100 185 ...
$ Weight : int 1165 1165 1165 1165 1170 1170 1245 1245 1185 1105 ...
$ Mfr_Guarantee : int 0 0 1 1 1 0 0 1 0 0 ...
$ BOVAG_Guarantee : int 1 1 1 1 1 1 1 1 1 1 ...
$ Guarantee_Period: int 3 3 3 3 3 3 3 3 3 3 ...
$ ABS : int 1 1 1 1 1 1 1 1 1 1 ...
$ Airbag_1 : int 1 1 1 1 1 1 1 1 1 1 ...
$ Airbag_2 : int 1 1 1 1 1 1 1 1 0 1 ...
$ Airco : int 0 1 0 0 1 1 1 1 1 1 ...
$ Automatic_airco : int 0 0 0 0 0 0 0 0 0 0 ...
$ Boardcomputer : int 1 1 1 1 1 1 1 1 0 1 ...
$ CD_Player : int 0 1 0 0 0 0 0 1 0 0 ...
$ Central_Lock : int 1 1 0 0 1 1 1 1 1 0 ...
$ Powered_Windows : int 1 0 0 0 1 1 1 1 1 0 ...
$ Power_Steering : int 1 1 1 1 1 1 1 1 1 1 ...
$ Radio : int 0 0 0 0 0 0 0 0 1 0 ...
$ Mistlamps : int 0 0 0 0 1 1 0 0 0 0 ...
$ Sport_Model : int 0 0 0 0 0 0 1 0 0 0 ...
$ Backseat_Divider: int 1 1 1 1 1 1 1 1 0 1 ...
$ Metallic_Rim : int 0 0 0 0 0 0 0 0 1 0 ...
$ Radio_cassette : int 0 0 0 0 0 0 0 0 1 0 ...
$ Tow_Bar : int 0 0 0 0 0 0 0 0 0 0 ...
summary(a_raw_data)
Id Model Price
Min. : 1.0 TOYOTA Corolla 1.6 16V HATCHB LINEA TERRA 2/3-Doors: 107 Min. : 4350
1st Qu.: 361.8 TOYOTA Corolla 1.3 16V HATCHB LINEA TERRA 2/3-Doors: 83 1st Qu.: 8450
Median : 721.5 TOYOTA Corolla 1.6 16V LIFTB LINEA LUNA 4/5-Doors : 79 Median : 9900
Mean : 721.6 TOYOTA Corolla 1.6 16V LIFTB LINEA TERRA 4/5-Doors : 70 Mean :10731
3rd Qu.:1081.2 TOYOTA Corolla 1.6 16V SEDAN LINEA TERRA 4/5-Doors : 43 3rd Qu.:11950
Max. :1442.0 TOYOTA Corolla 1.4 16V VVT I HATCHB TERRA 2/3-Doors: 42 Max. :32500
(Other) :1012
Age_08_04 Mfg_Month Mfg_Year KM Fuel_Type HP
Min. : 1.00 Min. : 1.000 Min. :1998 Min. : 1 CNG : 17 Min. : 69.0
1st Qu.:44.00 1st Qu.: 3.000 1st Qu.:1998 1st Qu.: 43000 Diesel: 155 1st Qu.: 90.0
Median :61.00 Median : 5.000 Median :1999 Median : 63390 Petrol:1264 Median :110.0
Mean :55.95 Mean : 5.549 Mean :2000 Mean : 68533 Mean :101.5
3rd Qu.:70.00 3rd Qu.: 8.000 3rd Qu.:2001 3rd Qu.: 87021 3rd Qu.:110.0
Max. :80.00 Max. :12.000 Max. :2004 Max. :243000 Max. :192.0
Met_Color Automatic cc Doors Cylinders Gears
Min. :0.0000 Min. :0.00000 Min. : 1300 Min. :2.000 Min. :4 Min. :3.000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.: 1400 1st Qu.:3.000 1st Qu.:4 1st Qu.:5.000
Median :1.0000 Median :0.00000 Median : 1600 Median :4.000 Median :4 Median :5.000
Mean :0.6748 Mean :0.05571 Mean : 1577 Mean :4.033 Mean :4 Mean :5.026
3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.: 1600 3rd Qu.:5.000 3rd Qu.:4 3rd Qu.:5.000
Max. :1.0000 Max. :1.00000 Max. :16000 Max. :5.000 Max. :4 Max. :6.000
Quarterly_Tax Weight Mfr_Guarantee BOVAG_Guarantee Guarantee_Period ABS
Min. : 19.00 Min. :1000 Min. :0.0000 Min. :0.0000 Min. : 3.000 Min. :0.0000
1st Qu.: 69.00 1st Qu.:1040 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.: 3.000 1st Qu.:1.0000
Median : 85.00 Median :1070 Median :0.0000 Median :1.0000 Median : 3.000 Median :1.0000
Mean : 87.12 Mean :1072 Mean :0.4095 Mean :0.8955 Mean : 3.815 Mean :0.8134
3rd Qu.: 85.00 3rd Qu.:1085 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 3.000 3rd Qu.:1.0000
Max. :283.00 Max. :1615 Max. :1.0000 Max. :1.0000 Max. :36.000 Max. :1.0000
Airbag_1 Airbag_2 Airco Automatic_airco Boardcomputer CD_Player
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :1.0000 Median :1.0000 Median :0.00000 Median :0.0000 Median :0.0000
Mean :0.9708 Mean :0.7228 Mean :0.5084 Mean :0.05641 Mean :0.2946 Mean :0.2187
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
Central_Lock Powered_Windows Power_Steering Radio Mistlamps Sport_Model
Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
Median :1.0000 Median :1.000 Median :1.0000 Median :0.0000 Median :0.000 Median :0.0000
Mean :0.5801 Mean :0.562 Mean :0.9777 Mean :0.1462 Mean :0.257 Mean :0.3001
3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.000 Max. :1.0000
Backseat_Divider Metallic_Rim Radio_cassette Tow_Bar
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.7702 Mean :0.2047 Mean :0.1455 Mean :0.2779
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
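Observations
* The maximum value of cc is 16000, which is implausibly high compared with the mean (1577) and the third quartile (1600); it is possibly a data-entry error.
* The Fuel_Type attribute is categorical (a factor with 3 levels: CNG, Diesel, Petrol) and will require encoding.
* The value of Cylinders is constant (always 4), so it carries no information for the model.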
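The three observations above can also be checked programmatically rather than by reading the summary. The following is a minimal sketch on a toy data frame; the values and the name `toy` are illustrative assumptions, not rows from the dataset.

```r
# Toy data frame standing in for a_raw_data (values are illustrative only).
toy <- data.frame(
  cc        = c(1300, 1400, 1600, 1600, 16000),
  Cylinders = c(4, 4, 4, 4, 4),
  Fuel_Type = factor(c("Petrol", "Diesel", "Petrol", "CNG", "Petrol"))
)

# Columns with a single distinct value carry no information for a regression.
constant_cols <- names(toy)[sapply(toy, function(col) length(unique(col)) == 1)]

# Non-numeric columns must be encoded before they can enter cor().
categorical_cols <- names(toy)[!sapply(toy, is.numeric)]

# Ratio of the maximum to the median: a crude flag for implausible maxima.
max_to_median <- max(toy$cc) / median(toy$cc)

print(constant_cols)
print(categorical_cols)
print(max_to_median)
```

On the real data the same three expressions could be applied to `a_raw_data` directly; the max-to-median ratio is only a rough heuristic, and any cut-off on it would be a judgment call.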
par(mfrow=c(1,1))
boxplot(a_raw_data$Price,main = "Toyota Corolla Vehicle Prices",
ylab = "Price ($)", notch = TRUE)
par(mfrow=c(1,2))
boxplot(a_raw_data$Age_08_04,main = "Age_08_04")
boxplot(a_raw_data$Mfg_Year,main = "Mfg_Year")
par(mfrow=c(1,2))
boxplot(a_raw_data$KM,main = "KM",
ylab = "KM", notch = TRUE)
boxplot(a_raw_data$HP,main = "HP",
ylab = "HP", notch = FALSE)
The “HP” attribute shows an outlier above 180. According to sources external to the dataset, this value does correspond to a real Toyota Corolla model, so it is not a data error.
The “KM” attribute also shows outliers, most notably a group of values above 200000.
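The points that a boxplot draws individually can also be listed explicitly. The sketch below uses base R's `boxplot.stats()` on a small toy vector; the mileage values are illustrative assumptions rather than a sample from the dataset.

```r
# Toy mileage values (illustrative only, not taken from the dataset).
km <- c(38500, 41711, 46986, 48000, 61000, 63390, 72937, 87021, 243000)

# boxplot.stats() returns the points lying more than 1.5 * IQR beyond the
# hinges, which are the same points a boxplot draws individually.
outliers <- boxplot.stats(km)$out

print(outliers)
```

Applied to `a_raw_data$KM` or `a_raw_data$HP`, the same call would give the exact values behind the outliers seen in the plots, which makes it easier to decide case by case whether they are errors or legitimate vehicles.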
## cc plot
boxplot(a_raw_data$cc,main = "Engine displacement",
ylab = "CC", notch = FALSE)
## Quarterly_Tax and Weight plots
par(mfrow=c(1,2))
boxplot(a_raw_data$Quarterly_Tax, main="Quarterly_Tax")
boxplot(a_raw_data$Weight, main="Weight (kg)")
lbls <- c("0: Absent", "1: Present")
par(mfrow=c(1,2))
barplot(table(as.factor(a_raw_data$Fuel_Type)), main="Fuel_Type")
pie(x = table(a_raw_data$Radio_cassette), labels = lbls, main="Radio_cassette")
par(mfrow=c(1,2))
pie(x = table(a_raw_data$Metallic_Rim), labels = lbls, main="Metallic Rim")
pie(x = table(a_raw_data$Backseat_Divider) , labels = lbls, main="Backseat_Divider")
par(mfrow=c(1,3))
pie(x = table(a_raw_data$Mistlamps) , labels = lbls, main="Mistlamps")
pie(x = table(a_raw_data$Radio), labels = lbls, main="Radio")
pie(x = table(a_raw_data$Sport_Model), labels = lbls, main="Sport_Model")
par(mfrow=c(1,3))
pie(x = table(a_raw_data$Central_Lock), labels = lbls, main="Central_Lock")
pie(x = table(a_raw_data$CD_Player), labels = lbls, main="CD_Player")
pie(x = table(a_raw_data$Boardcomputer), labels = lbls, main="Boardcomputer")
par(mfrow=c(1,3))
pie(x = table(a_raw_data$Airco), labels = lbls, main="Airco")
pie(x = table(a_raw_data$Airbag_2), labels = lbls, main="Airbag_2")
pie(x = table(a_raw_data$Airbag_1), labels = lbls,main="Airbag_1")
par(mfrow=c(1,2))
barplot(table(as.factor(a_raw_data$Guarantee_Period)), main="Guarantee_Period")
pie(x = table(a_raw_data$Automatic_airco), labels = lbls, main="Automatic_airco")
par(mfrow=c(1,3))
pie(x = table(a_raw_data$Mfr_Guarantee), labels = lbls, main="Mfr_Guarantee")
barplot(table(as.factor(a_raw_data$Gears)), main="Gears")
pie(x = table(a_raw_data$BOVAG_Guarantee), labels = lbls, main="BOVAG_Guarantee")
par(mfrow=c(1,3))
barplot(table(as.factor(a_raw_data$Doors)), main="Doors")
pie(x = table(a_raw_data$Automatic),labels = lbls, main="Automatic")
pie(x = table(a_raw_data$ABS),labels = lbls, main="ABS")
hist(a_raw_data$Price, col="blue", breaks = 60, freq = F)
lines(density(a_raw_data$Price), col = "red", lwd=2)
rug(a_raw_data$Price)
plot(Price~., data=a_raw_data,col="blue")
data_set <- select(a_raw_data, -c("Model","Id"))
data_set <- dummy_cols(data_set, select_columns = "Fuel_Type")
data_set <- select(data_set, -c("Fuel_Type"))
corrplot(cor(data_set), type="upper", method="pie")
the standard deviation is zero
cor(data_set)
the standard deviation is zero
Price Age_08_04 Mfg_Month Mfg_Year KM HP Met_Color
Price 1.00000000 -0.876590497 -0.018138222 0.885159220 -0.569960165 0.31498983 0.108904755
Age_08_04 -0.87659050 1.000000000 -0.123255398 -0.983661157 0.505672180 -0.15662202 -0.108149585
Mfg_Month -0.01813822 -0.123255398 1.000000000 -0.057415518 -0.020629897 -0.03931242 0.030265828
Mfg_Year 0.88515922 -0.983661157 -0.057415518 1.000000000 -0.504974450 0.16469687 0.103310169
KM -0.56996016 0.505672180 -0.020629897 -0.504974450 1.000000000 -0.33353795 -0.080502926
HP 0.31498983 -0.156622020 -0.039312420 0.164696875 -0.333537948 1.00000000 0.058711703
Met_Color 0.10890475 -0.108149585 0.030265828 0.103310169 -0.080502926 0.05871170 1.000000000
Automatic 0.03308069 0.031716772 0.009146095 -0.033566969 -0.081854083 0.01314403 -0.019335450
cc 0.12638920 -0.098083739 0.037386567 0.091891919 0.102682891 0.03585580 0.031812068
Doors 0.18532555 -0.148359215 -0.012068863 0.151441979 -0.036196614 0.09242450 0.085242826
Cylinders NA NA NA NA NA NA NA
Gears 0.06310386 -0.005363947 -0.013063024 0.007766049 0.015023328 0.20947715 0.018600646
Quarterly_Tax 0.21919691 -0.198430508 0.031372634 0.193933911 0.278164697 -0.29843172 0.011325559
Weight 0.58119759 -0.470253184 -0.002167494 0.473477930 -0.028598457 0.08961406 0.057928835
Mfr_Guarantee 0.19780199 -0.164658304 -0.005770789 0.166696657 -0.212850802 0.14002631 0.154849992
BOVAG_Guarantee 0.02813304 0.006864920 -0.003862534 -0.006205542 0.001437579 0.02270078 0.010783490
Guarantee_Period 0.14662661 -0.152562534 0.029009539 0.148218449 -0.138941822 0.07616270 0.009294540
ABS 0.30613784 -0.412887311 0.072532362 0.402215110 -0.177203384 0.05783181 0.022297508
Airbag_1 0.09358787 -0.105405927 0.003756160 0.105359191 -0.018012148 0.02513692 0.100054833
Airbag_2 0.24897390 -0.329017479 0.076749474 0.317075233 -0.139275138 0.01764356 0.038415730
Airco 0.42925943 -0.403600048 0.057088375 0.395673667 -0.133056812 0.24113427 0.114190482
Automatic_airco 0.58826200 -0.426259145 -0.049017448 0.437718185 -0.258221494 0.24495736 0.027977182
Boardcomputer 0.60129196 -0.719448710 0.017714587 0.720567067 -0.353862248 0.12971474 0.089885891
CD_Player 0.48137444 -0.510895187 -0.016735694 0.517007513 -0.266826156 0.10229988 0.198219962
Central_Lock 0.34345757 -0.279630639 0.010055129 0.279490247 -0.125177013 0.25012219 0.153307153
Powered_Windows 0.35651823 -0.283855826 0.025184550 0.280996200 -0.156241578 0.26559348 0.145147102
Power_Steering 0.06427537 -0.069191857 -0.055495374 0.079676069 0.007396622 0.04885045 0.086543865
Automatic cc Doors Cylinders Gears Quarterly_Tax
Price 0.033080694 0.1263891974 0.185325550 NA 0.063103857 0.219196911
Age_08_04 0.031716772 -0.0980837391 -0.148359215 NA -0.005363947 -0.198430508
Mfg_Month 0.009146095 0.0373865668 -0.012068863 NA -0.013063024 0.031372634
Mfg_Year -0.033566969 0.0918919186 0.151441979 NA 0.007766049 0.193933911
KM -0.081854083 0.1026828910 -0.036196614 NA 0.015023328 0.278164697
HP 0.013144031 0.0358558027 0.092424496 NA 0.209477146 -0.298431717
Met_Color -0.019335450 0.0318120676 0.085242826 NA 0.018600646 0.011325559
Automatic 1.000000000 0.0667403090 -0.027653817 NA -0.098555054 -0.055370791
cc 0.066740309 1.0000000000 0.079903296 NA 0.014629352 0.306995798
Doors -0.027653817 0.0799032965 1.000000000 NA -0.160141430 0.109363225
Cylinders NA NA NA 1 NA NA
Gears -0.098555054 0.0146293521 -0.160141430 NA 1.000000000 -0.005451955
Quarterly_Tax -0.055370791 0.3069957983 0.109363225 NA -0.005451955 1.000000000
Weight 0.057248510 0.3356373992 0.302617644 NA 0.020613284 0.626133733
Mfr_Guarantee 0.026193798 -0.0574065681 0.037689328 NA 0.010822444 -0.022150154
BOVAG_Guarantee 0.023393188 -0.0817250929 -0.014311384 NA 0.072123611 0.094330856
Guarantee_Period -0.002256229 -0.0176829456 0.053653901 NA -0.030677543 -0.163438345
ABS -0.016128172 0.0378055951 0.063732629 NA 0.086234835 0.080287748
Airbag_1 -0.011895357 0.0226780779 0.053827968 NA 0.002443573 0.082340476
Airbag_2 0.001171485 0.0247384005 0.021733788 NA 0.095209774 0.200172594
Airco -0.028352763 0.1198880518 0.170543978 NA 0.145489368 0.118225262
Automatic_airco 0.059056613 0.1626688293 0.054808873 NA 0.077791331 0.123125182
Boardcomputer -0.037068561 0.0093119139 0.089606069 NA -0.025889315 0.141534096
CD_Player -0.010967021 0.0577868309 0.094652527 NA -0.047466045 0.090868341
Central_Lock -0.002501825 0.0726344348 0.132091580 NA 0.126963598 0.032084382
Powered_Windows -0.005863832 0.0552988754 0.107625618 NA 0.131423084 0.003826656
Power_Steering -0.004469112 0.0329326048 0.059791778 NA 0.021200029 0.047956107
Weight Mfr_Guarantee BOVAG_Guarantee Guarantee_Period ABS Airbag_1
Price 0.581197589 0.197801991 0.028133044 0.146626614 0.30613784 0.0935878711
Age_08_04 -0.470253184 -0.164658304 0.006864920 -0.152562534 -0.41288731 -0.1054059274
Mfg_Month -0.002167494 -0.005770789 -0.003862534 0.029009539 0.07253236 0.0037561596
Mfg_Year 0.473477930 0.166696657 -0.006205542 0.148218449 0.40221511 0.1053591914
KM -0.028598457 -0.212850802 0.001437579 -0.138941822 -0.17720338 -0.0180121479
HP 0.089614059 0.140026308 0.022700778 0.076162697 0.05783181 0.0251369203
Met_Color 0.057928835 0.154849992 0.010783490 0.009294540 0.02229751 0.1000548326
Automatic 0.057248510 0.026193798 0.023393188 -0.002256229 -0.01612817 -0.0118953566
cc 0.335637399 -0.057406568 -0.081725093 -0.017682946 0.03780560 0.0226780779
Doors 0.302617644 0.037689328 -0.014311384 0.053653901 0.06373263 0.0538279678
Cylinders NA NA NA NA NA NA
Gears 0.020613284 0.010822444 0.072123611 -0.030677543 0.08623483 0.0024435733
Quarterly_Tax 0.626133733 -0.022150154 0.094330856 -0.163438345 0.08028775 0.0823404757
Weight 1.000000000 -0.008564636 -0.056076982 -0.012913462 0.10261557 0.0301817619
Mfr_Guarantee -0.008564636 1.000000000 0.233458589 -0.098563061 0.11899626 0.0520891638
BOVAG_Guarantee -0.056076982 0.233458589 1.000000000 -0.300062902 0.13444124 0.2244788766
Guarantee_Period -0.012913462 -0.098563061 -0.300062902 1.000000000 -0.06083998 -0.1424531130
ABS 0.102615574 0.118996262 0.134441245 -0.060839975 1.00000000 0.2775065414
Airbag_1 0.030181762 0.052089164 0.224478877 -0.142453113 0.27750654 1.0000000000
Airbag_2 0.078494306 0.202394935 0.287030993 -0.322769418 0.66176551 0.2803176212
Airco 0.310061953 0.051233618 0.005708771 0.026246244 0.22609484 0.0938356338
Automatic_airco 0.430478501 0.072634969 -0.015188469 -0.039162519 0.11711670 0.0424390956
Boardcomputer 0.274324106 0.198185513 0.115804372 -0.056305124 0.30953619 0.1121653509
CD_Player 0.247056066 0.155637001 0.059487871 -0.003948487 0.19286592 0.0718280455
Central_Lock 0.234644220 0.039915427 -0.023008433 0.058934226 0.09945414 0.1202756810
Powered_Windows 0.213356016 0.041550927 -0.012405873 0.040533587 0.09946502 0.1216410915
Power_Steering 0.047848786 0.029771454 0.164391857 -0.118973564 0.25462559 0.5617703725
Airbag_2 Airco Automatic_airco Boardcomputer CD_Player Central_Lock
Price 0.248973897 0.429259430 0.58826200 0.6012919565 0.4813744379 0.343457572
Age_08_04 -0.329017479 -0.403600048 -0.42625915 -0.7194487099 -0.5108951869 -0.279630639
Mfg_Month 0.076749474 0.057088375 -0.04901745 0.0177145869 -0.0167356937 0.010055129
Mfg_Year 0.317075233 0.395673667 0.43771818 0.7205670674 0.5170075127 0.279490247
KM -0.139275138 -0.133056812 -0.25822149 -0.3538622479 -0.2668261563 -0.125177013
HP 0.017643556 0.241134272 0.24495736 0.1297147413 0.1022998776 0.250122190
Met_Color 0.038415730 0.114190482 0.02797718 0.0898858909 0.1982199624 0.153307153
Automatic 0.001171485 -0.028352763 0.05905661 -0.0370685613 -0.0109670214 -0.002501825
cc 0.024738401 0.119888052 0.16266883 0.0093119139 0.0577868309 0.072634435
Doors 0.021733788 0.170543978 0.05480887 0.0896060689 0.0946525275 0.132091580
Cylinders NA NA NA NA NA NA
Gears 0.095209774 0.145489368 0.07779133 -0.0258893149 -0.0474660452 0.126963598
Quarterly_Tax 0.200172594 0.118225262 0.12312518 0.1415340959 0.0908683412 0.032084382
Weight 0.078494306 0.310061953 0.43047850 0.2743241058 0.2470560657 0.234644220
Mfr_Guarantee 0.202394935 0.051233618 0.07263497 0.1981855134 0.1556370005 0.039915427
BOVAG_Guarantee 0.287030993 0.005708771 -0.01518847 0.1158043718 0.0594878710 -0.023008433
Guarantee_Period -0.322769418 0.026246244 -0.03916252 -0.0563051243 -0.0039484870 0.058934226
ABS 0.661765511 0.226094835 0.11711670 0.3095361928 0.1928659238 0.099454144
Airbag_1 0.280317621 0.093835634 0.04243910 0.1121653509 0.0718280455 0.120275681
Airbag_2 1.000000000 0.184626833 0.09070263 0.3762454725 0.2372386822 0.024819190
Airco 0.184626833 1.000000000 0.24044391 0.2932442060 0.2573869774 0.540588387
Automatic_airco 0.090702625 0.240443910 1.00000000 0.2724152210 0.2503957737 0.195790059
Boardcomputer 0.376245473 0.293244206 0.27241522 1.0000000000 0.4897251626 0.203125940
CD_Player 0.237238682 0.257386977 0.25039577 0.4897251626 1.0000000000 0.194075729
Central_Lock 0.024819190 0.540588387 0.19579006 0.2031259404 0.1940757289 1.000000000
Powered_Windows 0.049129417 0.543981749 0.20368691 0.2133274783 0.1953859749 0.875552474
Power_Steering 0.212187253 0.096893133 0.03691172 0.0872071153 0.0684516980 0.129646282
Powered_Windows Power_Steering Radio Mistlamps Sport_Model Backseat_Divider
Price 0.3565182258 0.064275368 -0.0418873522 0.222082519 0.1641209622 0.10256915
Age_08_04 -0.2838558256 -0.069191857 0.0137914024 -0.126894569 -0.1109883188 -0.11675106
Mfg_Month 0.0251845496 -0.055495374 0.0316014980 -0.033503771 0.0527892029 0.02324544
Mfg_Year 0.2809961995 0.079676069 -0.0196073695 0.133736662 0.1020799610 0.11323703
KM -0.1562415784 0.007396622 0.0136611034 -0.074326655 -0.0447838761 -0.04565758
HP 0.2655934848 0.048850452 0.0209981381 0.210571265 -0.0060266503 0.01090798
Met_Color 0.1451471025 0.086543865 0.0727564442 0.023821349 0.0037788003 0.03774104
Automatic -0.0058638318 -0.004469112 -0.0146002437 0.003077421 0.0131753360 -0.01887627
cc 0.0552988754 0.032932605 -0.0003610891 0.017326122 -0.0351951669 -0.05571083
Doors 0.1076256182 0.059791778 -0.0083180738 0.064704827 -0.1298805666 -0.02254186
Cylinders NA NA NA NA NA NA
Gears 0.1314230843 0.021200029 0.0150902447 0.238788846 0.1741171356 0.07670513
Quarterly_Tax 0.0038266562 0.047956107 -0.0318162601 0.024024007 0.0675251905 0.19841873
Weight 0.2133560160 0.047848786 -0.0384073722 0.135235745 0.1259738904 0.03644617
Mfr_Guarantee 0.0415509270 0.029771454 -0.0520575752 0.083957782 0.0541294147 0.25624925
BOVAG_Guarantee -0.0124058729 0.164391857 -0.0390748594 0.117472133 0.1739778084 0.45746808
Guarantee_Period 0.0405335874 -0.118973564 0.1988855994 -0.118021352 -0.1728737542 -0.48442685
ABS 0.0994650204 0.254625594 -0.0546698668 0.179432532 0.2005960321 0.25665879
Airbag_1 0.1216410915 0.561770372 -0.0100345937 0.092618058 0.1136707859 0.30794725
Airbag_2 0.0491294169 0.212187253 -0.2236631719 0.228843121 0.3002735426 0.58998743
Airco 0.5439817486 0.096893133 -0.0305710644 0.466750824 0.0027302616 0.10514887
Automatic_airco 0.2036869067 0.036911721 -0.0926473776 0.312140517 0.2152873511 0.01875632
Boardcomputer 0.2133274783 0.087207115 -0.1290936057 0.147900225 0.0368018520 0.28761493
CD_Player 0.1953859749 0.068451698 -0.1855677207 0.124589102 0.0579195080 0.14880586
Central_Lock 0.8755524744 0.129646282 -0.0112509667 0.487425901 -0.0031278972 0.05844882
Powered_Windows 1.0000000000 0.123457779 -0.0358111799 0.496696773 -0.0006504532 0.07824375
Power_Steering 0.1234577794 1.000000000 0.0090747860 0.077984550 0.0988660173 0.26516943
Metallic_Rim Radio_cassette Tow_Bar Fuel_Type_CNG Fuel_Type_Diesel Fuel_Type_Petrol
Price 0.10856398 -0.043178988 -0.172368602 -0.0395362811 0.0540842283 -0.0385164095
Age_08_04 -0.04004538 0.012857260 0.188719528 0.0023892110 -0.0977404115 0.0926105871
Mfg_Month 0.02350647 0.032576240 -0.042169981 0.0012890926 0.0515007047 -0.0496464779
Mfg_Year 0.03602212 -0.018844434 -0.182205679 -0.0026374554 0.0889860345 -0.0841617029
KM -0.01359877 0.015770423 0.084153196 0.1440158195 0.4030599249 -0.4331596083
HP 0.20678416 0.019919200 0.068271274 0.0621088646 -0.5334531033 0.4891102880
Met_Color 0.05382944 0.071529685 0.148536237 0.0210086689 -0.0124203059 0.0048715287
Automatic -0.07809465 -0.014149521 0.018785917 0.0014856725 -0.0844902184 0.0802489005
cc 0.00323562 -0.000469534 0.002724607 0.0059408788 0.3277228757 -0.3151700204
Doors -0.03955479 -0.008265241 0.102291795 0.0096796305 0.0254947130 -0.0275885443
Cylinders NA NA NA NA NA NA
Gears 0.29507704 0.015397437 -0.029356649 -0.0495366325 -0.0488468198 0.0631816471
Quarterly_Tax -0.01196455 -0.031008861 -0.004987518 0.2337906099 0.7927262329 -0.8354516723
Weight 0.05384674 -0.037265380 -0.074931667 0.0527564752 0.5680868663 -0.5604702648
Mfr_Guarantee 0.02672792 -0.054532147 -0.023328109 -0.0125827868 -0.1527413987 0.1501599491
BOVAG_Guarantee 0.06043430 -0.039826683 -0.006718221 -0.0257713458 -0.0499624245 0.0563315074
Guarantee_Period -0.04400287 0.193910054 0.008590048 -0.0104018096 -0.0651612322 0.0657367438
ABS 0.07915186 -0.055724076 -0.065976495 -0.0301966909 0.0341430554 -0.0225705022
Airbag_1 0.05734470 -0.022116573 0.052311749 -0.0574221897 0.0204230626 -0.0003899949
Airbag_2 0.07512782 -0.220917181 -0.063955554 -0.0185317254 0.0198530039 -0.0127997246
Airco 0.23316610 -0.036523590 -0.024361999 0.0174886051 0.0323421048 -0.0367335003
Automatic_airco 0.10036278 -0.092348095 -0.117966859 -0.0267612333 0.0511371894 -0.0399554476
Boardcomputer -0.02499875 -0.128073275 -0.128001147 -0.0001081937 0.0213757390 -0.0203918636
CD_Player 0.04473025 -0.189668131 -0.079910736 0.0355575671 0.0005822628 -0.0124007590
Central_Lock 0.28133382 -0.016954462 -0.007727796 0.0018078245 -0.0496221000 0.0468195828
Powered_Windows 0.29141941 -0.037625504 -0.013251510 -0.0071843814 -0.0909472155 0.0893076003
Power_Steering 0.05321656 -0.004583647 0.030452502 0.0165243619 0.0221070879 -0.0266311292
[ reached getOption("max.print") -- omitted 10 rows ]
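The "standard deviation is zero" warning and the NA row and column for Cylinders both come from a constant column, and a matrix this large is hard to scan by eye. A possible approach, sketched here on toy data with hypothetical column names and an arbitrary 0.8 threshold, is to drop zero-variance columns first and then list only the strongly correlated pairs.

```r
# Toy data (illustrative only): b is nearly a copy of a, k is constant.
set.seed(1)
a <- rnorm(50)
toy <- data.frame(a = a, b = a + rnorm(50, sd = 0.1), z = rnorm(50), k = 4)

# Drop zero-variance columns so cor() does not return NA with a warning.
keep <- sapply(toy, function(col) sd(col) > 0)
cm <- cor(toy[, keep])

# List variable pairs whose absolute correlation exceeds a threshold.
# The 0.8 cut-off is an arbitrary choice for this sketch.
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
strong_pairs <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                           var2 = colnames(cm)[idx[, "col"]],
                           r    = cm[idx])
print(strong_pairs)
```

In the matrix printed above, pairs such as Age_08_04 and Mfg_Year (-0.98) or Central_Lock and Powered_Windows (0.88) are the kind of entries such a filter would surface.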
imcdiag(select(data_set, -c("Price")), data_set$Price)
Call:
imcdiag(x = select(data_set, -c("Price")), y = data_set$Price)
All Individual Multicollinearity Diagnostics Result
VIF TOL Wi Fi Leamer CVIF Klein
Age_08_04 Inf 0.0000 Inf Inf 0.0000 -Inf 1
Mfg_Month Inf 0.0000 Inf Inf 0.0000 -Inf 1
Mfg_Year Inf 0.0000 Inf Inf 0.0000 -Inf 1
KM 2.0862 0.4793 43.4499 44.7598 0.6923 -0.0602 0
HP 2.5533 0.3916 62.1333 64.0064 0.6258 -0.0736 0
Met_Color 1.1438 0.8743 5.7511 5.9245 0.9350 -0.0330 0
Automatic 1.1161 0.8960 4.6423 4.7822 0.9466 -0.0322 0
cc 1.2567 0.7957 10.2682 10.5777 0.8920 -0.0362 0
Doors 1.3512 0.7401 14.0483 14.4718 0.8603 -0.0390 0
Cylinders 2.0000 0.5000 39.9983 41.2041 NA -0.0577 0
Gears 1.2711 0.7867 10.8445 11.1715 0.8870 -0.0367 0
Quarterly_Tax 5.2040 0.1922 168.1594 173.2289 0.4384 -0.1501 0
Weight 4.2108 0.2375 128.4331 132.3049 0.4873 -0.1214 0
Mfr_Guarantee 1.2005 0.8330 8.0215 8.2634 0.9127 -0.0346 0
BOVAG_Guarantee 1.3737 0.7280 14.9466 15.3972 0.8532 -0.0396 0
Guarantee_Period 1.5401 0.6493 21.6051 22.2564 0.8058 -0.0444 0
ABS 2.2675 0.4410 50.6996 52.2280 0.6641 -0.0654 0
Airbag_1 1.6124 0.6202 24.4970 25.2355 0.7875 -0.0465 0
Airbag_2 3.1059 0.3220 84.2375 86.7770 0.5674 -0.0896 0
Airco 1.8382 0.5440 33.5291 34.5399 0.7376 -0.0530 0
Automatic_airco 1.7455 0.5729 29.8215 30.7206 0.7569 -0.0503 0
Boardcomputer 2.6347 0.3796 65.3876 67.3588 0.6161 -0.0760 0
CD_Player 1.5582 0.6418 22.3288 23.0020 0.8011 -0.0449 0
Central_Lock 4.5916 0.2178 143.6625 147.9935 0.4667 -0.1324 0
Powered_Windows 4.6403 0.2155 145.6129 150.0027 0.4642 -0.1338 0
Power_Steering 1.5828 0.6318 23.3118 24.0146 0.7949 -0.0456 0
Radio 62.3145 0.0160 2452.5793 2526.5173 0.1267 -1.7969 1
Mistlamps 2.0764 0.4816 43.0577 44.3558 0.6940 -0.0599 0
Sport_Model 1.4939 0.6694 19.7557 20.3513 0.8182 -0.0431 0
Backseat_Divider 2.6918 0.3715 67.6718 69.7119 0.6095 -0.0776 0
Metallic_Rim 1.3439 0.7441 13.7553 14.1700 0.8626 -0.0388 0
Radio_cassette 62.1305 0.0161 2445.2194 2518.9356 0.1269 -1.7916 1
Tow_Bar 1.1468 0.8720 5.8734 6.0505 0.9338 -0.0331 0
Fuel_Type_CNG Inf 0.0000 Inf Inf 0.0000 -Inf 1
Fuel_Type_Diesel Inf 0.0000 Inf Inf 0.0000 -Inf 1
Fuel_Type_Petrol Inf 0.0000 Inf Inf 0.0000 -Inf 1
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
HP , Automatic , cc , Doors , Guarantee_Period , ABS , Boardcomputer , Central_Lock , Powered_Windows , Power_Steering , Backseat_Divider , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.9087
* use method argument to check which regressors may be the reason of collinearity
===================================
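An infinite VIF means the predictor is an exact linear combination of other predictors. For the date columns this can be seen in the first ten rows printed by str(): they satisfy Age_08_04 = 12 * (2004 - Mfg_Year) + 9 - Mfg_Month. The sketch below checks that candidate relation on those ten rows only; that it holds for all 1436 rows is an assumption, although the Inf values reported above are consistent with it. The helper `vif_from_r2` is a hypothetical name introduced for illustration.

```r
# First ten rows as printed by str(a_raw_data) above.
age   <- c(23, 23, 24, 26, 30, 32, 27, 30, 27, 23)
month <- c(10, 10, 9, 7, 3, 1, 6, 3, 6, 10)
year  <- rep(2002, 10)

# Candidate relation inferred from these rows.
reconstructed <- 12 * (2004 - year) + 9 - month
print(all(reconstructed == age))

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on the remaining predictors. When R_j^2 is 1 the VIF is infinite.
vif_from_r2 <- function(r2) 1 / (1 - r2)
print(vif_from_r2(c(0.5, 0.9, 1)))
```

The three Fuel_Type dummies show Inf for a similar reason: they always sum to 1, so any one of them is determined by the other two.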
linearMod <- lm(formula = Price ~ ., data=data_set)
summary(linearMod)
Call:
lm(formula = Price ~ ., data = data_set)
Residuals:
Min 1Q Median 3Q Max
-8047.3 -645.4 -35.0 641.4 5528.8
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.668e+03 1.445e+03 1.847 0.065014 .
Age_08_04 -1.190e+02 3.475e+00 -34.236 < 2e-16 ***
Mfg_Month -9.553e+01 9.079e+00 -10.522 < 2e-16 ***
Mfg_Year NA NA NA NA
KM -1.638e-02 1.127e-03 -14.528 < 2e-16 ***
HP 1.910e+01 3.122e+00 6.119 1.22e-09 ***
Met_Color -1.686e+00 6.680e+01 -0.025 0.979865
Automatic 3.460e+02 1.348e+02 2.567 0.010354 *
cc -1.136e-01 7.731e-02 -1.470 0.141829
Doors 3.779e+01 3.571e+01 1.058 0.290110
Cylinders NA NA NA NA
Gears 1.539e+02 1.750e+02 0.879 0.379433
Quarterly_Tax 1.442e+01 1.623e+00 8.882 < 2e-16 ***
Weight 1.095e+01 1.141e+00 9.598 < 2e-16 ***
Mfr_Guarantee 2.258e+02 6.519e+01 3.464 0.000548 ***
BOVAG_Guarantee 4.914e+02 1.121e+02 4.383 1.26e-05 ***
Guarantee_Period 6.602e+01 1.206e+01 5.473 5.25e-08 ***
ABS -2.686e+02 1.131e+02 -2.376 0.017650 *
Airbag_1 1.187e+02 2.205e+02 0.538 0.590485
Airbag_2 -7.726e+01 1.152e+02 -0.671 0.502527
Airco 1.984e+02 7.935e+01 2.501 0.012499 *
Automatic_airco 2.441e+03 1.676e+02 14.571 < 2e-16 ***
Boardcomputer -2.670e+02 1.042e+02 -2.563 0.010474 *
CD_Player 2.091e+02 8.836e+01 2.367 0.018087 *
Central_Lock -8.865e+01 1.270e+02 -0.698 0.485364
Powered_Windows 4.189e+02 1.270e+02 3.297 0.001000 ***
Power_Steering -4.368e+01 2.494e+02 -0.175 0.860981
Radio 5.382e+02 6.536e+02 0.823 0.410384
Mistlamps -5.027e+01 9.649e+01 -0.521 0.602447
Sport_Model 3.046e+02 7.803e+01 3.904 9.92e-05 ***
Backseat_Divider -2.649e+02 1.141e+02 -2.321 0.020410 *
Metallic_Rim 2.060e+02 8.406e+01 2.451 0.014368 *
Radio_cassette -6.395e+02 6.540e+02 -0.978 0.328314
Tow_Bar -2.035e+02 6.995e+01 -2.909 0.003677 **
Fuel_Type_CNG -2.122e+03 3.299e+02 -6.431 1.74e-10 ***
Fuel_Type_Diesel -1.163e+03 2.665e+02 -4.364 1.37e-05 ***
Fuel_Type_Petrol NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1109 on 1402 degrees of freedom
Multiple R-squared: 0.9087, Adjusted R-squared: 0.9066
F-statistic: 422.9 on 33 and 1402 DF, p-value: < 2.2e-16
From this linear regression model, I draw the following observations:
* The variables with a large absolute t value (likely the most significant) include Age_08_04, Mfg_Month, KM, Weight and Automatic_airco, among others.
* Some variables have a t value very close to zero and are probably not very relevant to the model: Met_Color, Doors, Gears, Airbag_1, Airbag_2, Central_Lock, Power_Steering, Mistlamps, Radio_cassette.
* The intercept is 2.668e+03. It is the predicted price when every predictor equals zero, which does not describe a real vehicle, so it should not be read literally as a base price.
* The model has an R-squared of 0.9087 (adjusted 0.9066), which is a good fit. However, there are collinearity problems.
* Action is needed on Mfg_Year and Fuel_Type_Petrol, whose coefficients are NA because of collinearity (Cylinders is also NA because it is constant).
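The NA coefficient for Fuel_Type_Petrol is what lm() reports when a category is encoded with one indicator column per level and an intercept is also present. A minimal sketch of the effect on simulated data follows; the prices, effect sizes and variable names are illustrative assumptions, and base R's `model.matrix()` is used here in place of `dummy_cols()` to keep the example dependency-free.

```r
# Toy data (illustrative only).
set.seed(42)
fuel  <- factor(rep(c("CNG", "Diesel", "Petrol"), times = 10))
price <- 10000 + 500 * (fuel == "Diesel") - 800 * (fuel == "CNG") + rnorm(30, sd = 50)

# One indicator column per level: the three columns always sum to 1,
# which duplicates the intercept (the "dummy variable trap").
full <- as.data.frame(model.matrix(~ fuel - 1))
full$price <- price
fit_full <- lm(price ~ ., data = full)

# Dropping one level (the reference category) removes the redundancy.
fit_ref <- lm(price ~ ., data = full[, names(full) != "fuelPetrol"])

print(coef(fit_full))  # the last dummy is reported as NA
print(coef(fit_ref))   # every coefficient is estimated
```

With one level dropped, the remaining dummy coefficients are read as differences from the omitted category. Which level to omit does not change the fitted values, only the interpretation of the coefficients.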
par(mfrow=c(2,2))
plot(linearMod)
From the residuals of this linear regression model, I observe the following:
residuos = resid(linearMod)
hist(residuos, col="blue", breaks = 60, freq = F)
lines(density(residuos), col = "red", lwd=2)
rug(residuos)
The residuals of the linear regression model fitted on the original dataset, without any treatment of outliers, follow a distribution close to normal, with a skew to the left.
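The visual impression from the histogram can be backed by a number. The sketch below computes the sample skewness and runs a Shapiro-Wilk test on simulated residuals; the data are an illustrative assumption (a symmetric bulk plus a few large negative values), not the residuals of `linearMod`.

```r
# Toy residuals (illustrative only).
set.seed(7)
res <- c(rnorm(200, sd = 1000), -8000, -6500, -5000)

# Sample skewness: the third standardized moment. Negative values indicate
# a longer left tail.
skewness <- mean((res - mean(res))^3) / sd(res)^3

# Shapiro-Wilk test of normality (base R, stats package).
sw <- shapiro.test(res)

print(skewness)
print(sw$p.value)
```

The same two steps could be applied to `resid(linearMod)`. With well over a thousand observations, a normality test tends to reject even for mild departures, so the skewness value and the Q-Q plot are usually more informative than the p-value alone.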
The linear regression model fitted on the original dataset reaches an R-squared of 0.9087. However, the original dataset has collinearity problems.
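Because R-squared never decreases when predictors are added, the adjusted value is the fairer figure for a model with this many columns. As a worked check, the adjusted R-squared can be recomputed from the numbers printed in the model summary, taking n = 1436 observations and p = 33 estimated slopes (the F-statistic's numerator degrees of freedom).

```r
r2 <- 0.9087   # Multiple R-squared reported by summary(linearMod)
n  <- 1436     # observations
p  <- 33       # estimated slope coefficients (F-statistic: 33 and 1402 DF)

# Adjusted R-squared penalizes the fit for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(adj_r2)
```

The result agrees, to within rounding of the inputs, with the Adjusted R-squared of 0.9066 in the summary. The small gap between the two figures suggests that the fit is not being inflated much by the number of predictors.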
# Drop collinear, constant and low-relevance columns before refitting
data_set_1 <- select(data_set, -c("Mfg_Month", "Mfg_Year", "Cylinders",
"CD_Player", "Met_Color", "Doors", "Gears",
"Airbag_1", "Airbag_2", "Central_Lock", "Power_Steering",
"Mistlamps", "Radio_cassette", "Fuel_Type_Diesel"))
summary(data_set_1)
Price Age_08_04 KM HP Automatic cc
Min. : 4350 Min. : 1.00 Min. : 1 Min. : 69.0 Min. :0.00000 Min. : 1300
1st Qu.: 8450 1st Qu.:44.00 1st Qu.: 43000 1st Qu.: 90.0 1st Qu.:0.00000 1st Qu.: 1400
Median : 9900 Median :61.00 Median : 63390 Median :110.0 Median :0.00000 Median : 1600
Mean :10731 Mean :55.95 Mean : 68533 Mean :101.5 Mean :0.05571 Mean : 1577
3rd Qu.:11950 3rd Qu.:70.00 3rd Qu.: 87021 3rd Qu.:110.0 3rd Qu.:0.00000 3rd Qu.: 1600
Max. :32500 Max. :80.00 Max. :243000 Max. :192.0 Max. :1.00000 Max. :16000
Quarterly_Tax Weight Mfr_Guarantee BOVAG_Guarantee Guarantee_Period ABS
Min. : 19.00 Min. :1000 Min. :0.0000 Min. :0.0000 Min. : 3.000 Min. :0.0000
1st Qu.: 69.00 1st Qu.:1040 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.: 3.000 1st Qu.:1.0000
Median : 85.00 Median :1070 Median :0.0000 Median :1.0000 Median : 3.000 Median :1.0000
Mean : 87.12 Mean :1072 Mean :0.4095 Mean :0.8955 Mean : 3.815 Mean :0.8134
3rd Qu.: 85.00 3rd Qu.:1085 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 3.000 3rd Qu.:1.0000
Max. :283.00 Max. :1615 Max. :1.0000 Max. :1.0000 Max. :36.000 Max. :1.0000
Airco Automatic_airco Boardcomputer Powered_Windows Radio Sport_Model
Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.00000 Median :0.0000 Median :1.000 Median :0.0000 Median :0.0000
Mean :0.5084 Mean :0.05641 Mean :0.2946 Mean :0.562 Mean :0.1462 Mean :0.3001
3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
Backseat_Divider Metallic_Rim Tow_Bar Fuel_Type_CNG Fuel_Type_Petrol
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.0000
Median :1.0000 Median :0.0000 Median :0.0000 Median :0.00000 Median :1.0000
Mean :0.7702 Mean :0.2047 Mean :0.2779 Mean :0.01184 Mean :0.8802
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
corrplot(cor(data_set_1), type="upper", method="pie")
cor(data_set_1)
Price Age_08_04 KM HP Automatic cc
Price 1.00000000 -0.876590497 -0.569960165 0.31498983 0.033080694 0.1263891974
Age_08_04 -0.87659050 1.000000000 0.505672180 -0.15662202 0.031716772 -0.0980837391
KM -0.56996016 0.505672180 1.000000000 -0.33353795 -0.081854083 0.1026828910
HP 0.31498983 -0.156622020 -0.333537948 1.00000000 0.013144031 0.0358558027
Automatic 0.03308069 0.031716772 -0.081854083 0.01314403 1.000000000 0.0667403090
cc 0.12638920 -0.098083739 0.102682891 0.03585580 0.066740309 1.0000000000
Quarterly_Tax 0.21919691 -0.198430508 0.278164697 -0.29843172 -0.055370791 0.3069957983
Weight 0.58119759 -0.470253184 -0.028598457 0.08961406 0.057248510 0.3356373992
Mfr_Guarantee 0.19780199 -0.164658304 -0.212850802 0.14002631 0.026193798 -0.0574065681
BOVAG_Guarantee 0.02813304 0.006864920 0.001437579 0.02270078 0.023393188 -0.0817250929
Guarantee_Period 0.14662661 -0.152562534 -0.138941822 0.07616270 -0.002256229 -0.0176829456
ABS 0.30613784 -0.412887311 -0.177203384 0.05783181 -0.016128172 0.0378055951
Airco 0.42925943 -0.403600048 -0.133056812 0.24113427 -0.028352763 0.1198880518
Automatic_airco 0.58826200 -0.426259145 -0.258221494 0.24495736 0.059056613 0.1626688293
Boardcomputer 0.60129196 -0.719448710 -0.353862248 0.12971474 -0.037068561 0.0093119139
Powered_Windows 0.35651823 -0.283855826 -0.156241578 0.26559348 -0.005863832 0.0552988754
Radio -0.04188735 0.013791402 0.013661103 0.02099814 -0.014600244 -0.0003610891
Sport_Model 0.16412096 -0.110988319 -0.044783876 -0.00602665 0.013175336 -0.0351951669
Backseat_Divider 0.10256915 -0.116751059 -0.045657583 0.01090798 -0.018876271 -0.0557108268
Metallic_Rim 0.10856398 -0.040045378 -0.013598770 0.20678416 -0.078094651 0.0032356199
Tow_Bar -0.17236860 0.188719528 0.084153196 0.06827127 0.018785917 0.0027246074
Fuel_Type_CNG -0.03953628 0.002389211 0.144015820 0.06210886 0.001485673 0.0059408788
Fuel_Type_Petrol -0.03851641 0.092610587 -0.433159608 0.48911029 0.080248901 -0.3151700204
Quarterly_Tax Weight Mfr_Guarantee BOVAG_Guarantee Guarantee_Period ABS
Price 0.219196911 0.581197589 0.197801991 0.028133044 0.146626614 0.30613784
Age_08_04 -0.198430508 -0.470253184 -0.164658304 0.006864920 -0.152562534 -0.41288731
KM 0.278164697 -0.028598457 -0.212850802 0.001437579 -0.138941822 -0.17720338
HP -0.298431717 0.089614059 0.140026308 0.022700778 0.076162697 0.05783181
Automatic -0.055370791 0.057248510 0.026193798 0.023393188 -0.002256229 -0.01612817
cc 0.306995798 0.335637399 -0.057406568 -0.081725093 -0.017682946 0.03780560
Quarterly_Tax 1.000000000 0.626133733 -0.022150154 0.094330856 -0.163438345 0.08028775
Weight 0.626133733 1.000000000 -0.008564636 -0.056076982 -0.012913462 0.10261557
Mfr_Guarantee -0.022150154 -0.008564636 1.000000000 0.233458589 -0.098563061 0.11899626
BOVAG_Guarantee 0.094330856 -0.056076982 0.233458589 1.000000000 -0.300062902 0.13444124
Guarantee_Period -0.163438345 -0.012913462 -0.098563061 -0.300062902 1.000000000 -0.06083998
ABS 0.080287748 0.102615574 0.118996262 0.134441245 -0.060839975 1.00000000
Airco 0.118225262 0.310061953 0.051233618 0.005708771 0.026246244 0.22609484
Automatic_airco 0.123125182 0.430478501 0.072634969 -0.015188469 -0.039162519 0.11711670
Boardcomputer 0.141534096 0.274324106 0.198185513 0.115804372 -0.056305124 0.30953619
Powered_Windows 0.003826656 0.213356016 0.041550927 -0.012405873 0.040533587 0.09946502
Radio -0.031816260 -0.038407372 -0.052057575 -0.039074859 0.198885599 -0.05466987
Sport_Model 0.067525190 0.125973890 0.054129415 0.173977808 -0.172873754 0.20059603
Backseat_Divider 0.198418730 0.036446167 0.256249254 0.457468081 -0.484426852 0.25665879
Metallic_Rim -0.011964548 0.053846741 0.026727922 0.060434301 -0.044002874 0.07915186
Tow_Bar -0.004987518 -0.074931667 -0.023328109 -0.006718221 0.008590048 -0.06597649
Fuel_Type_CNG 0.233790610 0.052756475 -0.012582787 -0.025771346 -0.010401810 -0.03019669
Fuel_Type_Petrol -0.835451672 -0.560470265 0.150159949 0.056331507 0.065736744 -0.02257050
Airco Automatic_airco Boardcomputer Powered_Windows Radio Sport_Model
Price 0.429259430 0.58826200 0.6012919565 0.3565182258 -0.0418873522 0.1641209622
Age_08_04 -0.403600048 -0.42625915 -0.7194487099 -0.2838558256 0.0137914024 -0.1109883188
KM -0.133056812 -0.25822149 -0.3538622479 -0.1562415784 0.0136611034 -0.0447838761
HP 0.241134272 0.24495736 0.1297147413 0.2655934848 0.0209981381 -0.0060266503
Automatic -0.028352763 0.05905661 -0.0370685613 -0.0058638318 -0.0146002437 0.0131753360
cc 0.119888052 0.16266883 0.0093119139 0.0552988754 -0.0003610891 -0.0351951669
Quarterly_Tax 0.118225262 0.12312518 0.1415340959 0.0038266562 -0.0318162601 0.0675251905
Weight 0.310061953 0.43047850 0.2743241058 0.2133560160 -0.0384073722 0.1259738904
Mfr_Guarantee 0.051233618 0.07263497 0.1981855134 0.0415509270 -0.0520575752 0.0541294147
BOVAG_Guarantee 0.005708771 -0.01518847 0.1158043718 -0.0124058729 -0.0390748594 0.1739778084
Guarantee_Period 0.026246244 -0.03916252 -0.0563051243 0.0405335874 0.1988855994 -0.1728737542
ABS 0.226094835 0.11711670 0.3095361928 0.0994650204 -0.0546698668 0.2005960321
Airco 1.000000000 0.24044391 0.2932442060 0.5439817486 -0.0305710644 0.0027302616
Automatic_airco 0.240443910 1.00000000 0.2724152210 0.2036869067 -0.0926473776 0.2152873511
Boardcomputer 0.293244206 0.27241522 1.0000000000 0.2133274783 -0.1290936057 0.0368018520
Powered_Windows 0.543981749 0.20368691 0.2133274783 1.0000000000 -0.0358111799 -0.0006504532
Radio -0.030571064 -0.09264738 -0.1290936057 -0.0358111799 1.0000000000 -0.1377287584
Sport_Model 0.002730262 0.21528735 0.0368018520 -0.0006504532 -0.1377287584 1.0000000000
Backseat_Divider 0.105148873 0.01875632 0.2876149262 0.0782437522 -0.2095898260 0.3577132365
Metallic_Rim 0.233166104 0.10036278 -0.0249987520 0.2914194137 -0.0341621429 0.0518101200
Tow_Bar -0.024361999 -0.11796686 -0.1280011466 -0.0132515099 0.1436522639 -0.0941471363
Fuel_Type_CNG 0.017488605 -0.02676123 -0.0001081937 -0.0071843814 0.0093645265 -0.0576304085
Fuel_Type_Petrol -0.036733500 -0.03995545 -0.0203918636 0.0893076003 -0.0051397374 0.0309964827
Backseat_Divider Metallic_Rim Tow_Bar Fuel_Type_CNG Fuel_Type_Petrol
Price 0.10256915 0.10856398 -0.172368602 -0.0395362811 -0.038516409
Age_08_04 -0.11675106 -0.04004538 0.188719528 0.0023892110 0.092610587
KM -0.04565758 -0.01359877 0.084153196 0.1440158195 -0.433159608
HP 0.01090798 0.20678416 0.068271274 0.0621088646 0.489110288
Automatic -0.01887627 -0.07809465 0.018785917 0.0014856725 0.080248901
cc -0.05571083 0.00323562 0.002724607 0.0059408788 -0.315170020
Quarterly_Tax 0.19841873 -0.01196455 -0.004987518 0.2337906099 -0.835451672
Weight 0.03644617 0.05384674 -0.074931667 0.0527564752 -0.560470265
Mfr_Guarantee 0.25624925 0.02672792 -0.023328109 -0.0125827868 0.150159949
BOVAG_Guarantee 0.45746808 0.06043430 -0.006718221 -0.0257713458 0.056331507
Guarantee_Period -0.48442685 -0.04400287 0.008590048 -0.0104018096 0.065736744
ABS 0.25665879 0.07915186 -0.065976495 -0.0301966909 -0.022570502
Airco 0.10514887 0.23316610 -0.024361999 0.0174886051 -0.036733500
Automatic_airco 0.01875632 0.10036278 -0.117966859 -0.0267612333 -0.039955448
Boardcomputer 0.28761493 -0.02499875 -0.128001147 -0.0001081937 -0.020391864
Powered_Windows 0.07824375 0.29141941 -0.013251510 -0.0071843814 0.089307600
Radio -0.20958983 -0.03416214 0.143652264 0.0093645265 -0.005139737
Sport_Model 0.35771324 0.05181012 -0.094147136 -0.0576304085 0.030996483
Backseat_Divider 1.00000000 0.10486232 -0.049175675 -0.0473400119 0.053391920
Metallic_Rim 0.10486232 1.00000000 -0.037330986 0.0402018637 0.059605905
Tow_Bar -0.04917567 -0.03733099 1.000000000 0.0327207640 0.042090434
Fuel_Type_CNG -0.04734001 0.04020186 0.032720764 1.0000000000 -0.296717101
Fuel_Type_Petrol 0.05339192 0.05960590 0.042090434 -0.2967171005 1.000000000
imcdiag(select(data_set_1, -c("Price")), data_set_1$Price)
Call:
imcdiag(x = select(data_set_1, -c("Price")), y = data_set_1$Price)
All Individual Multicollinearity Diagnostics Result
VIF TOL Wi Fi Leamer CVIF Klein
Age_08_04 4.3590 0.2294 226.1754 237.6521 0.4790 -0.2340 0
KM 2.0563 0.4863 71.1230 74.7319 0.6974 -0.1104 0
HP 2.3954 0.4175 93.9552 98.7227 0.6461 -0.1286 0
Automatic 1.0877 0.9194 5.9040 6.2035 0.9588 -0.0584 0
cc 1.2468 0.8021 16.6180 17.4612 0.8956 -0.0669 0
Quarterly_Tax 4.9128 0.2036 263.4604 276.8290 0.4512 -0.2637 0
Weight 3.5150 0.2845 169.3437 177.9366 0.5334 -0.1887 0
Mfr_Guarantee 1.1665 0.8573 11.2120 11.7810 0.9259 -0.0626 0
BOVAG_Guarantee 1.3502 0.7406 23.5826 24.7792 0.8606 -0.0725 0
Guarantee_Period 1.5054 0.6643 34.0311 35.7579 0.8150 -0.0808 0
ABS 1.3685 0.7307 24.8104 26.0694 0.8548 -0.0734 0
Airco 1.6779 0.5960 45.6446 47.9607 0.7720 -0.0901 0
Automatic_airco 1.5320 0.6527 35.8214 37.6391 0.8079 -0.0822 0
Boardcomputer 2.5231 0.3963 102.5527 107.7565 0.6296 -0.1354 0
Powered_Windows 1.5612 0.6405 37.7894 39.7069 0.8003 -0.0838 0
Radio 1.1215 0.8917 8.1779 8.5929 0.9443 -0.0602 0
Sport_Model 1.3315 0.7510 22.3216 23.4542 0.8666 -0.0715 0
Backseat_Divider 2.2185 0.4508 82.0467 86.2099 0.6714 -0.1191 0
Metallic_Rim 1.1816 0.8463 12.2252 12.8455 0.9200 -0.0634 0
Tow_Bar 1.0956 0.9128 6.4349 6.7614 0.9554 -0.0588 0
Fuel_Type_CNG 1.3667 0.7317 24.6919 25.9448 0.8554 -0.0734 0
Fuel_Type_Petrol 7.8849 0.1268 463.5805 487.1037 0.3561 -0.4232 0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
Automatic , cc , Boardcomputer , Radio , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.9005
* use method argument to check which regressors may be the reason of collinearity
===================================
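After attribute selection, the collinearity problem appears to be resolved in data_set_1: the largest VIF is 7.88 (Fuel_Type_Petrol), below the common threshold of 10, and the Klein test flags no regressor.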
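The VIF and TOL columns reported by `imcdiag` can be understood from their definition: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors, and TOL_j = 1 / VIF_j. The sketch below implements that definition with base R only; `manual_vif` is a hypothetical helper and the toy data is an assumption, not the Corolla dataset.

```r
# Hypothetical helper: VIF of every column of a numeric data frame of predictors.
manual_vif <- function(predictors) {
  sapply(names(predictors), function(name) {
    # Auxiliary regression of predictor `name` on all remaining predictors.
    others <- setdiff(names(predictors), name)
    aux <- lm(reformulate(others, response = name), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Toy predictors (assumption): x3 is nearly a copy of x1, x2 is independent.
set.seed(7)
toy <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
toy$x3 <- toy$x1 + rnorm(300, sd = 0.1)

vifs <- manual_vif(toy)
print(round(vifs, 2))
print(round(1 / vifs, 4))   # tolerance (TOL)
```

On the notebook's data the equivalent call would presumably be `manual_vif(select(data_set_1, -c("Price")))`, which should agree with the VIF column above.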
par(mfrow=c(1,2))
boxplot(data_set_1$Price, main="price")
boxplot(data_set_1$KM, main="KM")
par(mfrow=c(1,2))
boxplot(data_set_1$Weight, main="Weight")
boxplot(data_set_1$HP, main="HP")
The cc attribute has an outlier (atypical value) of cc = 16000. That value is not plausible for a Toyota Corolla. It is most likely a data-entry error with one extra zero, so I take the correct value to be 1600.
The Guarantee_Period attribute has an outlier of Guarantee_Period = 13. I consider it a probable error and impute the value 12.
data_set_1[which(data_set_1$cc == 16000), "cc"] <- 1600
data_set_1[which(data_set_1$Guarantee_Period == 13), "Guarantee_Period"] <- 12
# Keep only the rows inside the chosen range of each numeric attribute.
data_set_1 <- data_set_1 %>%
  filter(KM > 10000, KM < 120000) %>%
  filter(Weight < 1150) %>%
  filter(Age_08_04 > 25) %>%
  filter(HP > 80, HP < 120) %>%
  filter(Price > 6500, Price < 14500)
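The cut-off values used in the filters are the author's own choices. One common way to derive comparable bounds is Tukey's rule, the same rule a boxplot uses to draw its whiskers: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers. The sketch below is only an illustration; `tukey_fences` is a hypothetical helper and the mileage vector is toy data.

```r
# Hypothetical helper: Tukey fences [Q1 - k * IQR, Q3 + k * IQR] of a numeric vector.
tukey_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Toy mileage vector (assumption) with one extreme reading.
km <- c(38500, 41711, 46986, 48000, 61000, 71138, 72937, 75889, 94612, 243000)

fences <- tukey_fences(km)
outliers <- km[km < fences["lower"] | km > fences["upper"]]
print(fences)
print(outliers)
```

The resulting fences could then feed a `filter()` call instead of hand-picked constants, although hand-picked limits may still be preferable when domain knowledge is available.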
par(mfrow=c(1,2))
boxplot(data_set_1$Price, main="price")
boxplot(data_set_1$KM, main="KM")
par(mfrow=c(1,2))
boxplot(data_set_1$Weight, main="Weight")
boxplot(data_set_1$HP, main="HP")
corrplot(cor(data_set_1), type="upper", method="pie")
cor(data_set_1)
Price Age_08_04 KM HP Automatic cc
Price 1.000000000 -0.81086438 -0.435235606 0.239783342 0.0237901939 0.210977921
Age_08_04 -0.810864381 1.00000000 0.397917080 -0.126926355 0.0712622678 -0.083829576
KM -0.435235606 0.39791708 1.000000000 0.028951107 -0.0544686728 0.049562785
HP 0.239783342 -0.12692636 0.028951107 1.000000000 -0.0650888935 0.994410006
Automatic 0.023790194 0.07126227 -0.054468673 -0.065088893 1.0000000000 -0.033880839
cc 0.210977921 -0.08382958 0.049562785 0.994410006 -0.0338808390 1.000000000
Quarterly_Tax 0.173941104 -0.11326464 0.046061595 0.232331268 -0.0054740273 0.230926217
Weight 0.307188125 -0.12668928 0.024109991 0.680511130 0.1857467403 0.685818658
Mfr_Guarantee 0.209950366 -0.16307255 -0.048186377 0.035914774 0.0170620003 0.033505374
BOVAG_Guarantee 0.095236152 -0.05372062 -0.017835157 -0.005639483 0.0138840169 -0.009814461
Guarantee_Period 0.074821866 -0.02608321 -0.028775942 0.021439871 0.0282380708 0.019553773
ABS 0.342044231 -0.43348939 -0.119286718 0.068129936 -0.0196186920 0.051396197
Airco 0.386881506 -0.27148013 -0.004902131 0.321174563 -0.0438821897 0.317986513
Automatic_airco -0.001912387 0.02333268 0.006127251 0.033721110 0.0327451085 0.031109680
Boardcomputer 0.550072533 -0.65778398 -0.267149783 0.103867288 -0.0795007521 0.061005976
Powered_Windows 0.214774868 -0.07853545 0.050774834 0.237633841 -0.0119825576 0.243497709
Radio 0.031718411 -0.05495276 0.007948905 0.043575090 0.0004786372 0.038732273
Sport_Model -0.250139747 0.22645226 0.119482811 -0.101201424 0.0091588378 -0.077899171
Backseat_Divider 0.137403685 -0.15616546 -0.033824004 0.013955368 -0.0495385171 0.010900605
Metallic_Rim 0.047770538 0.02854947 0.049458769 0.105356647 -0.0787599589 0.109304126
Tow_Bar -0.062610185 0.07429896 0.045145220 0.143090217 0.0119462953 0.147782724
Fuel_Type_CNG 0.026799745 -0.05117355 0.080712160 0.058240636 0.0226909697 0.058867811
Fuel_Type_Petrol -0.026799745 0.05117355 -0.080712160 -0.058240636 -0.0226909697 -0.058867811
Quarterly_Tax Weight Mfr_Guarantee BOVAG_Guarantee Guarantee_Period ABS
Price 0.173941104 0.30718812 0.209950366 0.0952361518 0.07482187 0.34204423
Age_08_04 -0.113264643 -0.12668928 -0.163072546 -0.0537206180 -0.02608321 -0.43348939
KM 0.046061595 0.02410999 -0.048186377 -0.0178351571 -0.02877594 -0.11928672
HP 0.232331268 0.68051113 0.035914774 -0.0056394828 0.02143987 0.06812994
Automatic -0.005474027 0.18574674 0.017062000 0.0138840169 0.02823807 -0.01961869
cc 0.230926217 0.68581866 0.033505374 -0.0098144607 0.01955377 0.05139620
Quarterly_Tax 1.000000000 0.32848864 0.175256650 0.1603729376 -0.16520159 0.07077024
Weight 0.328488641 1.00000000 0.030339916 -0.0273485931 0.00397248 0.07342586
Mfr_Guarantee 0.175256650 0.03033992 1.000000000 0.2251127899 -0.07943866 0.11808102
BOVAG_Guarantee 0.160372938 -0.02734859 0.225112790 1.0000000000 -0.29857430 0.13094973
Guarantee_Period -0.165201593 0.00397248 -0.079438656 -0.2985742987 1.00000000 -0.14787809
ABS 0.070770239 0.07342586 0.118081020 0.1309497298 -0.14787809 1.00000000
Airco 0.099036881 0.32482993 0.022658162 0.0104244215 -0.03614292 0.20488068
Automatic_airco -0.238162545 0.06933205 -0.068155096 -0.0195238487 0.02646674 0.03817055
Boardcomputer 0.137186814 0.07200263 0.162696204 0.1051130583 -0.15057739 0.26559356
Powered_Windows 0.095237718 0.26654767 -0.026077572 -0.0002696174 -0.02657053 0.06950573
Radio -0.057094414 0.02689977 -0.046382255 -0.0211878890 0.23932909 -0.02765671
Sport_Model 0.003388081 -0.07721750 -0.002711477 0.1277471579 -0.17828211 0.15465659
Backseat_Divider 0.332765087 0.08129328 0.230440807 0.3980766666 -0.58692617 0.28544272
Metallic_Rim 0.114696229 0.11499733 0.012205084 0.0671345412 -0.07408261 0.09823569
Tow_Bar 0.085564984 0.10877411 -0.008820041 -0.0469709756 0.06247537 -0.03141409
Fuel_Type_CNG 0.528301627 0.18852759 0.032082482 -0.0098909747 -0.01384006 0.01659740
Fuel_Type_Petrol -0.528301627 -0.18852759 -0.032082482 0.0098909747 0.01384006 -0.01659740
Airco Automatic_airco Boardcomputer Powered_Windows Radio Sport_Model
Price 0.386881506 -0.001912387 0.550072533 0.2147748682 0.0317184108 -0.250139747
Age_08_04 -0.271480134 0.023332684 -0.657783975 -0.0785354480 -0.0549527641 0.226452262
KM -0.004902131 0.006127251 -0.267149783 0.0507748340 0.0079489050 0.119482811
HP 0.321174563 0.033721110 0.103867288 0.2376338410 0.0435750903 -0.101201424
Automatic -0.043882190 0.032745108 -0.079500752 -0.0119825576 0.0004786372 0.009158838
cc 0.317986513 0.031109680 0.061005976 0.2434977092 0.0387322731 -0.077899171
Quarterly_Tax 0.099036881 -0.238162545 0.137186814 0.0952377179 -0.0570944140 0.003388081
Weight 0.324829931 0.069332045 0.072002628 0.2665476676 0.0268997746 -0.077217496
Mfr_Guarantee 0.022658162 -0.068155096 0.162696204 -0.0260775725 -0.0463822553 -0.002711477
BOVAG_Guarantee 0.010424422 -0.019523849 0.105113058 -0.0002696174 -0.0211878890 0.127747158
Guarantee_Period -0.036142923 0.026466740 -0.150577387 -0.0265705336 0.2393290909 -0.178282113
ABS 0.204880681 0.038170548 0.265593557 0.0695057285 -0.0276567128 0.154656588
Airco 1.000000000 0.083659806 0.175476309 0.5195104818 -0.0115375896 -0.109702801
Automatic_airco 0.083659806 1.000000000 -0.040453927 0.0462194895 -0.0324294737 -0.041815756
Boardcomputer 0.175476309 -0.040453927 1.000000000 0.0791329013 -0.0512018641 -0.246803158
Powered_Windows 0.519510482 0.046219490 0.079132901 1.0000000000 -0.0343132382 -0.111716998
Radio -0.011537590 -0.032429474 -0.051201864 -0.0343132382 1.0000000000 -0.068285872
Sport_Model -0.109702801 -0.041815756 -0.246803158 -0.1117169977 -0.0682858722 1.000000000
Backseat_Divider 0.101972512 -0.140561686 0.260042283 0.0749633565 -0.1841207075 0.297490425
Metallic_Rim 0.259695833 -0.006077912 -0.059793611 0.2983940303 -0.0640819666 0.053682400
Tow_Bar 0.015445477 -0.051941036 -0.056341978 0.0388232637 0.1414457236 -0.054361558
Fuel_Type_CNG -0.013941585 -0.006719890 0.006459918 -0.0054442252 -0.0068953966 -0.048331531
Fuel_Type_Petrol 0.013941585 0.006719890 -0.006459918 0.0054442252 0.0068953966 0.048331531
Backseat_Divider Metallic_Rim Tow_Bar Fuel_Type_CNG Fuel_Type_Petrol
Price 0.13740368 0.047770538 -0.062610185 0.026799745 -0.026799745
Age_08_04 -0.15616546 0.028549466 0.074298962 -0.051173550 0.051173550
KM -0.03382400 0.049458769 0.045145220 0.080712160 -0.080712160
HP 0.01395537 0.105356647 0.143090217 0.058240636 -0.058240636
Automatic -0.04953852 -0.078759959 0.011946295 0.022690970 -0.022690970
cc 0.01090061 0.109304126 0.147782724 0.058867811 -0.058867811
Quarterly_Tax 0.33276509 0.114696229 0.085564984 0.528301627 -0.528301627
Weight 0.08129328 0.114997327 0.108774110 0.188527589 -0.188527589
Mfr_Guarantee 0.23044081 0.012205084 -0.008820041 0.032082482 -0.032082482
BOVAG_Guarantee 0.39807667 0.067134541 -0.046970976 -0.009890975 0.009890975
Guarantee_Period -0.58692617 -0.074082614 0.062475366 -0.013840061 0.013840061
ABS 0.28544272 0.098235694 -0.031414086 0.016597398 -0.016597398
Airco 0.10197251 0.259695833 0.015445477 -0.013941585 0.013941585
Automatic_airco -0.14056169 -0.006077912 -0.051941036 -0.006719890 0.006719890
Boardcomputer 0.26004228 -0.059793611 -0.056341978 0.006459918 -0.006459918
Powered_Windows 0.07496336 0.298394030 0.038823264 -0.005444225 0.005444225
Radio -0.18412071 -0.064081967 0.141445724 -0.006895397 0.006895397
Sport_Model 0.29749043 0.053682400 -0.054361558 -0.048331531 0.048331531
Backseat_Divider 1.00000000 0.143134542 -0.070141379 -0.057328375 0.057328375
Metallic_Rim 0.14313454 1.000000000 -0.048242624 0.011388982 -0.011388982
Tow_Bar -0.07014138 -0.048242624 1.000000000 0.010994168 -0.010994168
Fuel_Type_CNG -0.05732837 0.011388982 0.010994168 1.000000000 -1.000000000
Fuel_Type_Petrol 0.05732837 -0.011388982 -0.010994168 -1.000000000 1.000000000
imcdiag(select(data_set_1, -c("Price")), data_set_1$Price)
Call:
imcdiag(x = select(data_set_1, -c("Price")), y = data_set_1$Price)
All Individual Multicollinearity Diagnostics Result
VIF TOL Wi Fi Leamer CVIF Klein
Age_08_04 2.6390 0.3789 79.2976 83.3444 0.6156 -0.7420 0
KM 1.2469 0.8020 11.9448 12.5544 0.8955 -0.3506 0
HP 129.7755 0.0077 6230.2792 6548.2319 0.0878 -36.4900 1
Automatic 1.2580 0.7949 12.4844 13.1215 0.8916 -0.3537 0
cc 128.6272 0.0078 6174.7277 6489.8455 0.0882 -36.1671 1
Quarterly_Tax 2.0672 0.4837 51.6322 54.2671 0.6955 -0.5813 0
Weight 2.3868 0.4190 67.0938 70.5179 0.6473 -0.6711 0
Mfr_Guarantee 1.1312 0.8840 6.3459 6.6698 0.9402 -0.3181 0
BOVAG_Guarantee 1.2600 0.7937 12.5771 13.2189 0.8909 -0.3543 0
Guarantee_Period 1.6754 0.5969 32.6776 34.3453 0.7726 -0.4711 0
ABS 1.4314 0.6986 20.8739 21.9392 0.8358 -0.4025 0
Airco 1.6383 0.6104 30.8836 32.4597 0.7813 -0.4607 0
Automatic_airco 1.1419 0.8758 6.8630 7.2132 0.9358 -0.3211 0
Boardcomputer 2.0898 0.4785 52.7273 55.4182 0.6917 -0.5876 0
Powered_Windows 1.4851 0.6734 23.4685 24.6661 0.8206 -0.4176 0
Radio 1.1107 0.9003 5.3574 5.6308 0.9488 -0.3123 0
Sport_Model 1.3852 0.7219 18.6381 19.5893 0.8496 -0.3895 0
Backseat_Divider 2.4211 0.4130 68.7544 72.2632 0.6427 -0.6808 0
Metallic_Rim 1.1840 0.8446 8.8999 9.3541 0.9190 -0.3329 0
Tow_Bar 1.0760 0.9293 3.6781 3.8658 0.9640 -0.3026 0
Fuel_Type_CNG Inf 0.0000 Inf Inf 0.0000 -Inf 1
Fuel_Type_Petrol Inf 0.0000 Inf Inf 0.0000 -Inf 1
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
Automatic , ABS , Automatic_airco , Boardcomputer , Radio , Backseat_Divider , Metallic_Rim , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.7643
* use method argument to check which regressors may be the reason of collinearity
===================================
linearMod_1 <- lm(formula = Price ~ ., data=data_set_1)
summary(linearMod_1)
Call:
lm(formula = Price ~ ., data = data_set_1)
Residuals:
Min 1Q Median 3Q Max
-2603.7 -535.6 -20.1 524.1 3341.5
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.344e+03 1.551e+03 0.867 0.38622
Age_08_04 -8.830e+01 3.328e+00 -26.536 < 2e-16 ***
KM -1.188e-02 1.246e-03 -9.534 < 2e-16 ***
HP -7.426e+01 3.009e+01 -2.468 0.01376 *
Automatic 1.582e+02 1.212e+02 1.305 0.19224
cc 5.497e+00 2.333e+00 2.356 0.01868 *
Quarterly_Tax 6.516e+00 2.114e+00 3.082 0.00211 **
Weight 1.194e+01 1.647e+00 7.251 8.20e-13 ***
Mfr_Guarantee 2.396e+02 5.605e+01 4.275 2.09e-05 ***
BOVAG_Guarantee 3.981e+02 1.014e+02 3.924 9.28e-05 ***
Guarantee_Period 6.801e+01 1.519e+01 4.478 8.40e-06 ***
ABS -5.034e+00 7.826e+01 -0.064 0.94873
Airco 3.860e+02 6.732e+01 5.734 1.29e-08 ***
Automatic_airco 1.406e+02 3.691e+02 0.381 0.70334
Boardcomputer 1.521e+02 9.143e+01 1.663 0.09655 .
Powered_Windows 2.034e+02 6.394e+01 3.182 0.00151 **
Radio -6.234e+01 7.662e+01 -0.814 0.41603
Sport_Model -1.919e+02 7.309e+01 -2.625 0.00879 **
Backseat_Divider -6.167e+01 9.720e+01 -0.634 0.52596
Metallic_Rim 4.031e+01 7.143e+01 0.564 0.57266
Tow_Bar -1.257e+02 5.837e+01 -2.153 0.03154 *
Fuel_Type_CNG -1.232e+03 3.819e+02 -3.225 0.00130 **
Fuel_Type_Petrol NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 843.6 on 1016 degrees of freedom
Multiple R-squared: 0.7643, Adjusted R-squared: 0.7594
F-statistic: 156.9 on 21 and 1016 DF, p-value: < 2.2e-16
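The `Inf` VIF values and the `NA` coefficient for `Fuel_Type_Petrol` are consistent with the correlation of exactly -1 between the two fuel dummies in the matrix above: within the filtered rows, one dummy is fully determined by the other, so `lm()` cannot estimate both. The sketch below reproduces the effect on toy data (an assumption, not the Corolla data) and uses `alias()` from the stats package to expose the dependency.

```r
# Toy data (assumption): two dummies that always sum to 1.
set.seed(3)
toy <- data.frame(age = runif(100, 20, 80))
toy$fuel_cng    <- rbinom(100, 1, 0.2)
toy$fuel_petrol <- 1 - toy$fuel_cng   # fully determined by fuel_cng
toy$price <- 15000 - 90 * toy$age - 1000 * toy$fuel_cng + rnorm(100, sd = 500)

fit <- lm(price ~ age + fuel_cng + fuel_petrol, data = toy)

# lm() drops the redundant term and reports NA for its coefficient.
print(coef(fit))

# alias() expresses the dropped term as a combination of the retained ones.
print(alias(fit)$Complete)
```

Dropping one of the two dummies before fitting removes the singularity without changing the fitted values.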
par(mfrow=c(2,2))
plot(linearMod_1)
residuos = resid(linearMod_1)
hist(residuos, col = "blue", freq = F)
lines(density(residuos), col = "red", lwd=2)
rug(residuos)
data_set_2 <- select(data_set_1, -c("Fuel_Type_Petrol", "Fuel_Type_CNG", "Automatic" , "Quarterly_Tax" , "ABS" , "Automatic_airco" , "Boardcomputer" , "Radio" , "Backseat_Divider" , "Metallic_Rim", "HP", "cc", "Tow_Bar", "Sport_Model", "BOVAG_Guarantee", "Guarantee_Period"))
corrplot(cor(data_set_2), type="upper", method="pie")
cor(data_set_2)
Price Age_08_04 KM Weight Mfr_Guarantee Airco
Price 1.0000000 -0.81086438 -0.435235606 0.30718812 0.20995037 0.386881506
Age_08_04 -0.8108644 1.00000000 0.397917080 -0.12668928 -0.16307255 -0.271480134
KM -0.4352356 0.39791708 1.000000000 0.02410999 -0.04818638 -0.004902131
Weight 0.3071881 -0.12668928 0.024109991 1.00000000 0.03033992 0.324829931
Mfr_Guarantee 0.2099504 -0.16307255 -0.048186377 0.03033992 1.00000000 0.022658162
Airco 0.3868815 -0.27148013 -0.004902131 0.32482993 0.02265816 1.000000000
Powered_Windows 0.2147749 -0.07853545 0.050774834 0.26654767 -0.02607757 0.519510482
Powered_Windows
Price 0.21477487
Age_08_04 -0.07853545
KM 0.05077483
Weight 0.26654767
Mfr_Guarantee -0.02607757
Airco 0.51951048
Powered_Windows 1.00000000
imcdiag(select(data_set_2, -c("Price")), data_set_2$Price)
Call:
imcdiag(x = select(data_set_2, -c("Price")), y = data_set_2$Price)
All Individual Multicollinearity Diagnostics Result
VIF TOL Wi Fi Leamer CVIF Klein
Age_08_04 1.3418 0.7453 70.5517 88.2751 0.8633 -1.8736 0
KM 1.2090 0.8271 43.1341 53.9699 0.9095 -1.6881 0
Weight 1.1405 0.8768 28.9939 36.2775 0.9364 -1.5924 0
Mfr_Guarantee 1.0299 0.9710 6.1715 7.7218 0.9854 -1.4380 0
Airco 1.5512 0.6447 113.7681 142.3479 0.8029 -2.1659 0
Powered_Windows 1.4023 0.7131 83.0291 103.8869 0.8445 -1.9580 0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
* all coefficients have significant t-ratios
R-square of y on all x: 0.747
* use method argument to check which regressors may be the reason of collinearity
===================================
linearMod_2 <- lm(formula = Price ~ ., data=data_set_2)
summary(linearMod_2)
Call:
lm(formula = Price ~ ., data = data_set_2)
Residuals:
Min 1Q Median 3Q Max
-2917.3 -567.6 -3.0 569.4 3284.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.231e+03 1.237e+03 2.611 0.009167 **
Age_08_04 -9.037e+01 2.440e+00 -37.034 < 2e-16 ***
KM -1.252e-02 1.262e-03 -9.917 < 2e-16 ***
Weight 1.182e+01 1.171e+00 10.100 < 2e-16 ***
Mfr_Guarantee 2.988e+02 5.500e+01 5.433 6.92e-08 ***
Airco 3.799e+02 6.736e+01 5.640 2.20e-08 ***
Powered_Windows 2.436e+02 6.389e+01 3.813 0.000145 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 867.5 on 1031 degrees of freedom
Multiple R-squared: 0.747, Adjusted R-squared: 0.7455
F-statistic: 507.4 on 6 and 1031 DF, p-value: < 2.2e-16
From this linear regression model, I observe the following:
par(mfrow=c(2,2))
plot(linearMod_2)
residuos = resid(linearMod_2)
hist(residuos, col = "blue", freq = F)
lines(density(residuos), col = "red", lwd=2)
rug(residuos)
The histogram of the residuals shows a distribution very close to the normal distribution, which is desirable.
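The histogram can be complemented with a formal test and a Q-Q plot. The sketch below uses `shapiro.test()`, `qqnorm()` and `qqline()` from base R on simulated values that stand in for `resid(linearMod_2)`; the simulated vector and its parameters are assumptions. With samples of this size the Shapiro-Wilk test can reject normality for small deviations, so the Q-Q plot is usually the more informative of the two.

```r
# Simulated stand-in for resid(linearMod_2) (assumption, not the real residuals).
set.seed(11)
residuals_sim <- rnorm(500, mean = 0, sd = 850)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed.
sw <- shapiro.test(residuals_sim)
print(sw)

# Q-Q plot: points close to the reference line support approximate normality.
qqnorm(residuals_sim)
qqline(residuals_sim, col = "red", lwd = 2)
```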