As part of my assignment for Algoritma Data Science School, we will build a linear regression model to predict the Manufacturer’s Suggested Retail Price (MSRP) of car models from different brands, using the specifications of each model as predictors. We will then make predictions with the model, validate it to decide whether it is acceptable or needs adjustment, and interpret the result.
As beginners in machine learning who have just learned linear regression, we were taught in class using data made up mostly of numerical variables. Real-world data, however, often contains many categorical variables. That raised a question for us: how accurately can a model that is designed to predict a numerical target perform when most of the predictors are categorical?
This car dataset was scraped from Edmunds and Twitter. It covers the year each car was made, its market price, and its specifications, in a total of 16 columns.
Before building the model, we take a look at the data, inspect it, and clean or rename columns and values where necessary.
library(MLmetrics)
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
## Recall
library(stats)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(nortest)
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following objects are masked from 'package:MLmetrics':
##
## MAE, RMSE
library(alookr)
## Loading required package: randomForest
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
library(car)
## Loading required package: carData
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following object is masked from 'package:randomForest':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
car_dataset <- read.csv("Car_Dataset/data.csv")
car_dataset
| Make | : The company that produces the car; the car is branded with the company’s name |
| Model | : Car model released by the company |
| Year | : Year in which the car model was released |
| Engine.Fuel.Type | : Type of fuel required to run the engine |
| Engine.HP | : Power of the engine, in horsepower |
| Engine.Cylinders | : Number of cylinders in the engine |
| Transmission.Type | : Type of transmission fitted to the model |
| Driven.Wheels | : Which wheels are driven by the engine |
| Number.of.Doors | : Number of doors on the car |
| Market.Category | : Market categories the car falls under |
| Vehicle.Size | : Size of the vehicle |
| Vehicle.Style | : Type of the vehicle |
| Highway.MPG | : Miles the car can travel per gallon of fuel on the highway |
| city.mpg | : Miles the car can travel per gallon of fuel on city roads |
| Popularity | : Popularity of the car brand |
| MSRP | : Manufacturer’s Suggested Retail Price |
colSums(is.na(car_dataset))
## Make Model Year Engine.Fuel.Type
## 0 0 0 0
## Engine.HP Engine.Cylinders Transmission.Type Driven_Wheels
## 69 30 0 0
## Number.of.Doors Market.Category Vehicle.Size Vehicle.Style
## 6 0 0 0
## highway.MPG city.mpg Popularity MSRP
## 0 0 0 0
car_dataset <- car_dataset %>%
  filter(!is.na(Engine.HP),
         !is.na(Engine.Cylinders),
         !is.na(Number.of.Doors))
colSums(is.na(car_dataset))
## Make Model Year Engine.Fuel.Type
## 0 0 0 0
## Engine.HP Engine.Cylinders Transmission.Type Driven_Wheels
## 0 0 0 0
## Number.of.Doors Market.Category Vehicle.Size Vehicle.Style
## 0 0 0 0
## highway.MPG city.mpg Popularity MSRP
## 0 0 0 0
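The missing-value check and the row filter above can be tried on a small scale first. The following is a minimal sketch in base R, assuming a made-up data frame `toy` that only borrows the column names of the car data; it also reports the share of rows lost, which is worth knowing before choosing to drop rows rather than impute them.

```r
# Toy data frame standing in for car_dataset (all values are made up).
toy <- data.frame(
  Engine.HP        = c(335, NA, 300, 230),
  Engine.Cylinders = c(6, 6, NA, 4),
  Number.of.Doors  = c(2, 4, 2, NA),
  MSRP             = c(46135, 40650, 36350, 29450)
)

# Count missing values per column, as colSums(is.na(...)) does above.
na_counts <- colSums(is.na(toy))

# Keep only the rows that are complete in the three columns of interest.
cols <- c("Engine.HP", "Engine.Cylinders", "Number.of.Doors")
toy_clean <- toy[complete.cases(toy[, cols]), ]

# Share of rows lost by dropping instead of imputing.
dropped_share <- 1 - nrow(toy_clean) / nrow(toy)
```

If the dropped share were large, imputing the missing values might be preferable to removing the rows.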
car_dataset %>%
  filter(duplicated(car_dataset))
car_dataset <- car_dataset %>%
  distinct()
car_dataset %>%
  filter(duplicated(car_dataset))
glimpse(car_dataset)
## Rows: 11,100
## Columns: 16
## $ Make <chr> "BMW", "BMW", "BMW", "BMW", "BMW", "BMW", "BMW", "BM~
## $ Model <chr> "1 Series M", "1 Series", "1 Series", "1 Series", "1~
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012~
## $ Engine.Fuel.Type <chr> "premium unleaded (required)", "premium unleaded (re~
## $ Engine.HP <int> 335, 300, 300, 230, 230, 230, 300, 300, 230, 230, 30~
## $ Engine.Cylinders <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6~
## $ Transmission.Type <chr> "MANUAL", "MANUAL", "MANUAL", "MANUAL", "MANUAL", "M~
## $ Driven_Wheels <chr> "rear wheel drive", "rear wheel drive", "rear wheel ~
## $ Number.of.Doors <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4~
## $ Market.Category <chr> "Factory Tuner,Luxury,High-Performance", "Luxury,Per~
## $ Vehicle.Size <chr> "Compact", "Compact", "Compact", "Compact", "Compact~
## $ Vehicle.Style <chr> "Coupe", "Convertible", "Coupe", "Coupe", "Convertib~
## $ highway.MPG <int> 26, 28, 28, 28, 28, 28, 26, 28, 28, 27, 28, 28, 28, ~
## $ city.mpg <int> 19, 19, 20, 18, 18, 18, 17, 20, 18, 18, 20, 19, 19, ~
## $ Popularity <int> 3916, 3916, 3916, 3916, 3916, 3916, 3916, 3916, 3916~
## $ MSRP <int> 46135, 40650, 36350, 29450, 34500, 31200, 44100, 393~
To make the column names easier to understand, we rename a few of them, and we convert the character columns to factors for better memory efficiency.
car_dataset <- car_dataset %>%
  mutate_if(is.character, as.factor) %>%
  rename(Brands = Make,
         Driven.Wheels = Driven_Wheels,
         Vehicle.Type = Vehicle.Style,
         Highway.MpG = highway.MPG,
         City.MpG = city.mpg)
unique(car_dataset$Transmission.Type)
## [1] MANUAL AUTOMATIC AUTOMATED_MANUAL UNKNOWN
## [5] DIRECT_DRIVE
## Levels: AUTOMATED_MANUAL AUTOMATIC DIRECT_DRIVE MANUAL UNKNOWN
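The memory argument for converting character columns to factors can be checked directly. This is a rough sketch, assuming a hypothetical column of 100,000 labels rather than the real car data: on a typical 64-bit build of R a character vector holds one pointer per element, while a factor holds one integer code per element plus a short table of levels.

```r
set.seed(1)

# Hypothetical column: 100,000 labels drawn from three transmission types.
labels_chr <- sample(c("MANUAL", "AUTOMATIC", "DIRECT_DRIVE"), 1e5, replace = TRUE)
labels_fct <- as.factor(labels_chr)

# Compare the in-memory size of the two representations, in bytes.
size_chr <- as.numeric(utils::object.size(labels_chr))
size_fct <- as.numeric(utils::object.size(labels_fct))
ratio <- size_fct / size_chr
```

The saving depends on how many distinct values the column has, so the ratio here should be read as an illustration only.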
The Transmission.Type variable contains the value “UNKNOWN” for some car models. When buying a car, we want to know every detail possible, especially core details such as the type of transmission. Where possible, we therefore replace “UNKNOWN” by looking up the detailed specification of each affected model on the internet.
# Inspect the rows whose transmission type is unknown
car_dataset %>%
  filter(Transmission.Type == "UNKNOWN")
# Unknown rows for the models we relabel as automatic
car_dataset_unknown_auto <- car_dataset %>%
  filter(Model %in% c("Achieva", "Firebird", "Le Baron"),
         Transmission.Type == "UNKNOWN") %>%
  mutate(Transmission.Type = case_when(Transmission.Type == "UNKNOWN" ~ "AUTOMATIC"))
# Unknown rows for the models we relabel as manual
car_dataset_unknown_man <- car_dataset %>%
  filter(Model %in% c("Jimmy", "RAM 150"),
         Transmission.Type == "UNKNOWN") %>%
  mutate(Transmission.Type = case_when(Transmission.Type == "UNKNOWN" ~ "MANUAL"))
# Append the corrected rows, then drop the original "UNKNOWN" rows
car_dataset <- rbind(car_dataset, car_dataset_unknown_auto, car_dataset_unknown_man)
car_dataset <- car_dataset %>%
  filter(Transmission.Type != "UNKNOWN")
unique(car_dataset$Transmission.Type)
## [1] MANUAL AUTOMATIC AUTOMATED_MANUAL DIRECT_DRIVE
## Levels: AUTOMATED_MANUAL AUTOMATIC DIRECT_DRIVE MANUAL UNKNOWN
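The same relabelling can also be done in place, without binding corrected copies and filtering afterwards. The sketch below is an alternative under stated assumptions, not the approach used above: `toy` is a made-up data frame, and only the model lists are taken from the code above. Rebuilding the factor at the end also removes the unused "UNKNOWN" level that is still listed in the output above.

```r
# Toy stand-in for car_dataset; rows and values are made up for illustration.
toy <- data.frame(
  Model = c("Achieva", "Jimmy", "1 Series", "Firebird"),
  Transmission.Type = factor(c("UNKNOWN", "UNKNOWN", "MANUAL", "UNKNOWN")),
  stringsAsFactors = FALSE
)

auto_models   <- c("Achieva", "Firebird", "Le Baron")
manual_models <- c("Jimmy", "RAM 150")

# Work on a character copy so new values never hit a missing factor level.
tt <- as.character(toy$Transmission.Type)
is_unknown <- tt == "UNKNOWN"
tt[is_unknown & toy$Model %in% auto_models]   <- "AUTOMATIC"
tt[is_unknown & toy$Model %in% manual_models] <- "MANUAL"

# Rebuilding the factor drops any level that is no longer used.
toy$Transmission.Type <- factor(tt)
```

Unlike the code above, this sketch leaves any "UNKNOWN" row whose model is in neither list untouched, so such rows would still need to be filtered out separately.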
After cleaning the data, we check for outliers or noise and decide whether to remove them, so that the regression model is as accurate as possible.
hist(car_dataset$MSRP)
The histogram above shows that outliers stretch the distribution so far that it cannot be interpreted properly. To inspect the outliers, we can draw a box plot as follows:
boxplot(car_dataset$MSRP)
The major outliers lie above $100,000, so let us remove them.
car_dataset <- car_dataset %>%
  filter(MSRP <= 100000)
hist(car_dataset$MSRP)
At this point we can interpret the histogram properly: most car prices fall between 20,000 and 40,000 US Dollars.
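The $100,000 cutoff above was chosen by eye from the box plot. As a point of comparison, the sketch below derives a cutoff from the data with `boxplot.stats()`, which applies the same 1.5 × IQR whisker rule that `boxplot()` draws. The prices in `msrp` are made up, and this rule is an alternative rather than the method used in this analysis.

```r
# Made-up prices: most are ordinary, two are extreme.
msrp <- c(18000, 22000, 25000, 27000, 30000, 32000, 35000, 38000, 250000, 500000)

# boxplot.stats() returns the five box plot statistics and the flagged outliers.
stats <- boxplot.stats(msrp)
outliers <- stats$out
upper_whisker <- stats$stats[5]

# Keep only the prices at or below the upper whisker.
msrp_trimmed <- msrp[msrp <= upper_whisker]
```

On strongly right-skewed data such as car prices, this rule may flag many more rows than a hand-picked threshold, so the two cutoffs are worth comparing before removing anything.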
Now that we have cleaned and inspected the data thoroughly, we can explore it further, from observing the correlation between columns to building the best linear regression model we can. Our target is MSRP, since we want a model that predicts the price of the cars listed in our data; the other columns, which describe the specification of each car, are our predictors.
ggcorr(car_dataset, label = T, hjust = 1)
## Warning in ggcorr(car_dataset, label = T, hjust = 1): data in column(s)
## 'Brands', 'Model', 'Engine.Fuel.Type', 'Transmission.Type', 'Driven.Wheels',
## 'Market.Category', 'Vehicle.Size', 'Vehicle.Type' are not numeric and were
## ignored
According to the correlation chart, the specification with the highest correlation to our target is Engine.HP, which is also quite strongly correlated with Engine.Cylinders; Engine.Cylinders itself is less strongly correlated with MSRP than Engine.HP is. Highway.MpG and City.MpG are also strongly correlated with each other. Strongly correlated predictors cause multicollinearity problems in the model later on, so one of each pair should be removed. Between Engine.HP and Engine.Cylinders, we remove Engine.Cylinders because it has the lower correlation with MSRP. Highway.MpG and City.MpG have the same correlation with MSRP, so we instead fit a multiple linear regression model and compare the significance level of the two predictors.
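The screening described above (rank predictors by their correlation with the target, then look for predictor pairs that are strongly correlated with each other) can be sketched numerically. The example uses R's built-in `mtcars` data, with `mpg` standing in for the target, rather than the car data; the 0.8 threshold is an assumption chosen for illustration.

```r
# mtcars ships with R; hp, cyl, disp and wt play the role of related predictors.
num_cols <- mtcars[, c("mpg", "hp", "cyl", "disp", "wt")]
cor_mat <- cor(num_cols)

# Correlation of each candidate predictor with the target (mpg here).
target_cor <- cor_mat[-1, "mpg"]

# List predictor pairs whose absolute correlation exceeds the threshold.
pred_cor <- cor_mat[-1, -1]
high <- which(abs(pred_cor) > 0.8 & upper.tri(pred_cor), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(pred_cor)[high[, "row"]],
  var2 = colnames(pred_cor)[high[, "col"]],
  r    = pred_cor[high]
)
```

For each pair listed in `pairs`, the predictor with the weaker entry in `target_cor` is the natural candidate to drop. Once a model is fitted, `car::vif()` offers a complementary check.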
car_dataset <- car_dataset %>%
  select(-Engine.Cylinders)
model_HP <- lm(MSRP ~ Engine.HP, car_dataset)
summary(model_HP)
##
## Call:
## lm(formula = MSRP ~ Engine.HP, data = car_dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54581 -5961 1079 5978 59589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6778.215 333.878 -20.3 <2e-16 ***
## Engine.HP 159.998 1.318 121.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11930 on 10471 degrees of freedom
## Multiple R-squared: 0.5846, Adjusted R-squared: 0.5845
## F-statistic: 1.473e+04 on 1 and 10471 DF, p-value: < 2.2e-16
plot(car_dataset$Engine.HP, car_dataset$MSRP)
abline(model_HP$coefficients[1], model_HP$coefficients[2], col = "red", lwd = 2)
With a Multiple R-squared of 0.5846, this model explains about 58% of the variance in MSRP. The remaining 42% is not captured by this model and may be explained by predictors that were not included. For every increase of 1 engine horsepower, the predicted price increases by about 160 US Dollars (159.998).
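To make the interpretation concrete, the sketch below plugs the coefficients from the summary output above into the regression equation for a hypothetical 300 HP car, and then reproduces R-squared by hand as 1 - SSE/SST. The second part uses the built-in `mtcars` data, not the car data, so it illustrates the formula only.

```r
# Coefficients copied from the summary(model_HP) output above.
intercept <- -6778.215
slope_hp  <- 159.998

# Predicted MSRP for a hypothetical 300 HP car.
pred_300 <- intercept + slope_hp * 300  # 41221.185

# R-squared by hand on a small built-in example (mtcars, not the car data).
fit <- lm(mpg ~ hp, data = mtcars)
sse <- sum(residuals(fit)^2)                     # unexplained variation
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total variation
r2_manual <- 1 - sse / sst
```

`r2_manual` matches `summary(fit)$r.squared`, which is the same quantity reported as Multiple R-squared in the output above.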
Since we have a large number of potential predictors, we can build the model with the stepwise method, which selects predictors automatically. Stepwise selection can run in three directions (forward, backward, and both), so we fit a model with each direction and compare them. As baselines, we also fit a model that uses all of the potential predictors in the data and a stepwise model that uses only the numerical variables, and compare these with the automatically built stepwise models.
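The three stepwise directions can be sketched with `step()` from the stats package, which by default adds or removes terms according to AIC. The example runs on the built-in `mtcars` data with `mpg` as the target, so the selected terms say nothing about the car data; the comparison table at the end is one possible way to line the candidates up.

```r
# Scope: from an intercept-only model up to a model with every predictor.
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ ., data = mtcars)
scope <- list(lower = ~ 1, upper = formula(full_model))

# Backward starts from the full model; forward and both start from the null model.
step_backward <- step(full_model, direction = "backward", trace = 0)
step_forward  <- step(null_model, direction = "forward", scope = scope, trace = 0)
step_both     <- step(null_model, direction = "both", scope = scope, trace = 0)

# Compare the candidates on adjusted R-squared and AIC.
models <- list(full_model, step_backward, step_forward, step_both)
comparison <- data.frame(
  model  = c("full", "backward", "forward", "both"),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  aic    = sapply(models, AIC)
)
```

The directions do not necessarily agree on the same set of predictors, which is why it can be useful to compare all of them against the full model.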
model_data <- car_dataset %>%
  select(-c(Brands, Model))
model_all <- lm(MSRP ~ ., model_data)
options("scipen"=100, "digits"=4)
summary(model_all)
##
## Call:
## lm(formula = MSRP ~ ., data = model_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32816 -4169 -218 3823 44282
##
## Coefficients: (1 not defined because of singularities)
## Estimate
## (Intercept) -1843098.6161
## Year 923.0860
## Engine.Fuel.Typediesel 457.5212
## Engine.Fuel.Typeelectric 18212.9480
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) -16417.7832
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 3924.4277
## Engine.Fuel.Typeflex-fuel (unleaded/E85) -7389.8629
## Engine.Fuel.Typenatural gas 2326.0020
## Engine.Fuel.Typepremium unleaded (recommended) -3266.9013
## Engine.Fuel.Typepremium unleaded (required) 2294.4914
## Engine.Fuel.Typeregular unleaded -6780.5568
## Engine.HP 89.3439
## Transmission.TypeAUTOMATIC -290.7810
## Transmission.TypeDIRECT_DRIVE -693.0108
## Transmission.TypeMANUAL -1785.0454
## Driven.Wheelsfour wheel drive -1300.3672
## Driven.Wheelsfront wheel drive -1195.7539
## Driven.Wheelsrear wheel drive -4031.8958
## Number.of.Doors 89.6660
## Market.CategoryCrossover,Diesel 15428.7435
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 18382.1676
## Market.CategoryCrossover,Exotic,Luxury,Performance 14393.4447
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 14011.4222
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 5584.3031
## Market.CategoryCrossover,Factory Tuner,Performance -837.8015
## Market.CategoryCrossover,Flex Fuel 3238.4533
## Market.CategoryCrossover,Flex Fuel,Luxury 18023.5839
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 35029.2018
## Market.CategoryCrossover,Flex Fuel,Performance 1446.8278
## Market.CategoryCrossover,Hatchback 4405.7130
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance -4238.7659
## Market.CategoryCrossover,Hatchback,Luxury 14287.8047
## Market.CategoryCrossover,Hatchback,Performance 764.8238
## Market.CategoryCrossover,Hybrid 9012.0055
## Market.CategoryCrossover,Luxury 5566.6732
## Market.CategoryCrossover,Luxury,Diesel 14561.5039
## Market.CategoryCrossover,Luxury,High-Performance 16111.6363
## Market.CategoryCrossover,Luxury,Hybrid 12299.7654
## Market.CategoryCrossover,Luxury,Performance 6557.0979
## Market.CategoryCrossover,Luxury,Performance,Hybrid 29513.3794
## Market.CategoryCrossover,Performance 1515.3365
## Market.CategoryDiesel 3639.9467
## Market.CategoryDiesel,Luxury 19885.6262
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 57836.7444
## Market.CategoryExotic,High-Performance 35143.1689
## Market.CategoryExotic,Luxury,High-Performance 45354.6119
## Market.CategoryFactory Tuner,High-Performance 786.9383
## Market.CategoryFactory Tuner,Luxury,High-Performance 14638.4289
## Market.CategoryFactory Tuner,Luxury,Performance 1315.9048
## Market.CategoryFactory Tuner,Performance -3324.9355
## Market.CategoryFlex Fuel 5172.3270
## Market.CategoryFlex Fuel,Diesel 8129.7023
## Market.CategoryFlex Fuel,Hybrid 12400.4290
## Market.CategoryFlex Fuel,Luxury 26113.5591
## Market.CategoryFlex Fuel,Luxury,High-Performance 17570.6763
## Market.CategoryFlex Fuel,Luxury,Performance 29312.6830
## Market.CategoryFlex Fuel,Performance 6212.2071
## Market.CategoryFlex Fuel,Performance,Hybrid 475.9915
## Market.CategoryHatchback 4553.3053
## Market.CategoryHatchback,Diesel 1951.5189
## Market.CategoryHatchback,Factory Tuner,High-Performance 148.9705
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 793.1411
## Market.CategoryHatchback,Factory Tuner,Performance -4568.0904
## Market.CategoryHatchback,Flex Fuel 2120.2187
## Market.CategoryHatchback,Hybrid 11519.2295
## Market.CategoryHatchback,Luxury 6222.5975
## Market.CategoryHatchback,Luxury,Hybrid 17204.3637
## Market.CategoryHatchback,Luxury,Performance 5462.3512
## Market.CategoryHatchback,Performance 1455.9407
## Market.CategoryHigh-Performance 3965.1372
## Market.CategoryHybrid 12265.2570
## Market.CategoryLuxury 10055.8749
## Market.CategoryLuxury,High-Performance 21512.6499
## Market.CategoryLuxury,High-Performance,Hybrid 13167.8424
## Market.CategoryLuxury,Hybrid 25175.4771
## Market.CategoryLuxury,Performance 13753.1919
## Market.CategoryLuxury,Performance,Hybrid 26708.4500
## Market.CategoryN/A 5272.2468
## Market.CategoryPerformance 3673.5071
## Market.CategoryPerformance,Hybrid 1959.5169
## Vehicle.SizeLarge 2116.9936
## Vehicle.SizeMidsize -1977.4405
## Vehicle.Type2dr SUV 2379.5828
## Vehicle.Type4dr Hatchback -1724.0878
## Vehicle.Type4dr SUV 5095.5883
## Vehicle.TypeCargo Minivan -680.0619
## Vehicle.TypeCargo Van -2792.8188
## Vehicle.TypeConvertible 4587.4118
## Vehicle.TypeConvertible SUV 7906.8713
## Vehicle.TypeCoupe -2163.6980
## Vehicle.TypeCrew Cab Pickup -1153.2916
## Vehicle.TypeExtended Cab Pickup -4143.7812
## Vehicle.TypePassenger Minivan 1282.2370
## Vehicle.TypePassenger Van 596.9037
## Vehicle.TypeRegular Cab Pickup -4417.6117
## Vehicle.TypeSedan -2311.2938
## Vehicle.TypeWagon NA
## Highway.MpG 46.8939
## City.MpG -180.1212
## Popularity 0.1089
## Std. Error
## (Intercept) 32731.7969
## Year 16.5194
## Engine.Fuel.Typediesel 4416.2261
## Engine.Fuel.Typeelectric 7376.1620
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) 4637.5394
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 4625.9365
## Engine.Fuel.Typeflex-fuel (unleaded/E85) 4135.8028
## Engine.Fuel.Typenatural gas 6475.0382
## Engine.Fuel.Typepremium unleaded (recommended) 4109.6380
## Engine.Fuel.Typepremium unleaded (required) 4110.4739
## Engine.Fuel.Typeregular unleaded 4099.3705
## Engine.HP 1.9919
## Transmission.TypeAUTOMATIC 431.1711
## Transmission.TypeDIRECT_DRIVE 5075.9545
## Transmission.TypeMANUAL 440.5604
## Driven.Wheelsfour wheel drive 363.1740
## Driven.Wheelsfront wheel drive 237.2331
## Driven.Wheelsrear wheel drive 278.1471
## Number.of.Doors 345.8576
## Market.CategoryCrossover,Diesel 3148.5736
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 7101.4367
## Market.CategoryCrossover,Exotic,Luxury,Performance 7098.4894
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 1967.5414
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 3195.1981
## Market.CategoryCrossover,Factory Tuner,Performance 3557.0063
## Market.CategoryCrossover,Flex Fuel 1033.4243
## Market.CategoryCrossover,Flex Fuel,Luxury 2535.8241
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 3649.1990
## Market.CategoryCrossover,Flex Fuel,Performance 2933.3678
## Market.CategoryCrossover,Hatchback 1254.6671
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance 3055.6370
## Market.CategoryCrossover,Hatchback,Luxury 2860.5131
## Market.CategoryCrossover,Hatchback,Performance 3049.6843
## Market.CategoryCrossover,Hybrid 1167.5744
## Market.CategoryCrossover,Luxury 441.2828
## Market.CategoryCrossover,Luxury,Diesel 2062.5716
## Market.CategoryCrossover,Luxury,High-Performance 2718.3630
## Market.CategoryCrossover,Luxury,Hybrid 1514.1993
## Market.CategoryCrossover,Luxury,Performance 748.0633
## Market.CategoryCrossover,Luxury,Performance,Hybrid 5048.4985
## Market.CategoryCrossover,Performance 902.0672
## Market.CategoryDiesel 1288.1802
## Market.CategoryDiesel,Luxury 1978.1042
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 4161.4939
## Market.CategoryExotic,High-Performance 1357.0157
## Market.CategoryExotic,Luxury,High-Performance 1567.5611
## Market.CategoryFactory Tuner,High-Performance 1005.4762
## Market.CategoryFactory Tuner,Luxury,High-Performance 870.0279
## Market.CategoryFactory Tuner,Luxury,Performance 1372.5306
## Market.CategoryFactory Tuner,Performance 906.7883
## Market.CategoryFlex Fuel 623.7253
## Market.CategoryFlex Fuel,Diesel 2008.8447
## Market.CategoryFlex Fuel,Hybrid 5031.4555
## Market.CategoryFlex Fuel,Luxury 1603.1342
## Market.CategoryFlex Fuel,Luxury,High-Performance 1843.7863
## Market.CategoryFlex Fuel,Luxury,Performance 1538.0407
## Market.CategoryFlex Fuel,Performance 1021.1929
## Market.CategoryFlex Fuel,Performance,Hybrid 5061.2492
## Market.CategoryHatchback 900.6487
## Market.CategoryHatchback,Diesel 2609.0619
## Market.CategoryHatchback,Factory Tuner,High-Performance 2217.4317
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 2505.9447
## Market.CategoryHatchback,Factory Tuner,Performance 1774.6460
## Market.CategoryHatchback,Flex Fuel 2903.1701
## Market.CategoryHatchback,Hybrid 1380.0765
## Market.CategoryHatchback,Luxury 1391.4300
## Market.CategoryHatchback,Luxury,Hybrid 4232.9808
## Market.CategoryHatchback,Luxury,Performance 1457.3393
## Market.CategoryHatchback,Performance 996.6584
## Market.CategoryHigh-Performance 817.1806
## Market.CategoryHybrid 885.3231
## Market.CategoryLuxury 484.3925
## Market.CategoryLuxury,High-Performance 743.1068
## Market.CategoryLuxury,High-Performance,Hybrid 2458.4514
## Market.CategoryLuxury,Hybrid 1128.0121
## Market.CategoryLuxury,Performance 557.3432
## Market.CategoryLuxury,Performance,Hybrid 2236.0458
## Market.CategoryN/A 384.4053
## Market.CategoryPerformance 551.3122
## Market.CategoryPerformance,Hybrid 7104.2276
## Vehicle.SizeLarge 281.9431
## Vehicle.SizeMidsize 211.6869
## Vehicle.Type2dr SUV 1075.0267
## Vehicle.Type4dr Hatchback 818.8356
## Vehicle.Type4dr SUV 417.1570
## Vehicle.TypeCargo Minivan 1006.2427
## Vehicle.TypeCargo Van 959.0931
## Vehicle.TypeConvertible 830.1547
## Vehicle.TypeConvertible SUV 1546.1282
## Vehicle.TypeCoupe 804.5947
## Vehicle.TypeCrew Cab Pickup 514.5871
## Vehicle.TypeExtended Cab Pickup 536.9430
## Vehicle.TypePassenger Minivan 506.7778
## Vehicle.TypePassenger Van 953.6105
## Vehicle.TypeRegular Cab Pickup 891.6942
## Vehicle.TypeSedan 359.6699
## Vehicle.TypeWagon NA
## Highway.MpG 20.0694
## City.MpG 35.3822
## Popularity 0.0526
## t value
## (Intercept) -56.31
## Year 55.88
## Engine.Fuel.Typediesel 0.10
## Engine.Fuel.Typeelectric 2.47
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) -3.54
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 0.85
## Engine.Fuel.Typeflex-fuel (unleaded/E85) -1.79
## Engine.Fuel.Typenatural gas 0.36
## Engine.Fuel.Typepremium unleaded (recommended) -0.79
## Engine.Fuel.Typepremium unleaded (required) 0.56
## Engine.Fuel.Typeregular unleaded -1.65
## Engine.HP 44.85
## Transmission.TypeAUTOMATIC -0.67
## Transmission.TypeDIRECT_DRIVE -0.14
## Transmission.TypeMANUAL -4.05
## Driven.Wheelsfour wheel drive -3.58
## Driven.Wheelsfront wheel drive -5.04
## Driven.Wheelsrear wheel drive -14.50
## Number.of.Doors 0.26
## Market.CategoryCrossover,Diesel 4.90
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 2.59
## Market.CategoryCrossover,Exotic,Luxury,Performance 2.03
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 7.12
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 1.75
## Market.CategoryCrossover,Factory Tuner,Performance -0.24
## Market.CategoryCrossover,Flex Fuel 3.13
## Market.CategoryCrossover,Flex Fuel,Luxury 7.11
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 9.60
## Market.CategoryCrossover,Flex Fuel,Performance 0.49
## Market.CategoryCrossover,Hatchback 3.51
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance -1.39
## Market.CategoryCrossover,Hatchback,Luxury 4.99
## Market.CategoryCrossover,Hatchback,Performance 0.25
## Market.CategoryCrossover,Hybrid 7.72
## Market.CategoryCrossover,Luxury 12.61
## Market.CategoryCrossover,Luxury,Diesel 7.06
## Market.CategoryCrossover,Luxury,High-Performance 5.93
## Market.CategoryCrossover,Luxury,Hybrid 8.12
## Market.CategoryCrossover,Luxury,Performance 8.77
## Market.CategoryCrossover,Luxury,Performance,Hybrid 5.85
## Market.CategoryCrossover,Performance 1.68
## Market.CategoryDiesel 2.83
## Market.CategoryDiesel,Luxury 10.05
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 13.90
## Market.CategoryExotic,High-Performance 25.90
## Market.CategoryExotic,Luxury,High-Performance 28.93
## Market.CategoryFactory Tuner,High-Performance 0.78
## Market.CategoryFactory Tuner,Luxury,High-Performance 16.83
## Market.CategoryFactory Tuner,Luxury,Performance 0.96
## Market.CategoryFactory Tuner,Performance -3.67
## Market.CategoryFlex Fuel 8.29
## Market.CategoryFlex Fuel,Diesel 4.05
## Market.CategoryFlex Fuel,Hybrid 2.46
## Market.CategoryFlex Fuel,Luxury 16.29
## Market.CategoryFlex Fuel,Luxury,High-Performance 9.53
## Market.CategoryFlex Fuel,Luxury,Performance 19.06
## Market.CategoryFlex Fuel,Performance 6.08
## Market.CategoryFlex Fuel,Performance,Hybrid 0.09
## Market.CategoryHatchback 5.06
## Market.CategoryHatchback,Diesel 0.75
## Market.CategoryHatchback,Factory Tuner,High-Performance 0.07
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 0.32
## Market.CategoryHatchback,Factory Tuner,Performance -2.57
## Market.CategoryHatchback,Flex Fuel 0.73
## Market.CategoryHatchback,Hybrid 8.35
## Market.CategoryHatchback,Luxury 4.47
## Market.CategoryHatchback,Luxury,Hybrid 4.06
## Market.CategoryHatchback,Luxury,Performance 3.75
## Market.CategoryHatchback,Performance 1.46
## Market.CategoryHigh-Performance 4.85
## Market.CategoryHybrid 13.85
## Market.CategoryLuxury 20.76
## Market.CategoryLuxury,High-Performance 28.95
## Market.CategoryLuxury,High-Performance,Hybrid 5.36
## Market.CategoryLuxury,Hybrid 22.32
## Market.CategoryLuxury,Performance 24.68
## Market.CategoryLuxury,Performance,Hybrid 11.94
## Market.CategoryN/A 13.72
## Market.CategoryPerformance 6.66
## Market.CategoryPerformance,Hybrid 0.28
## Vehicle.SizeLarge 7.51
## Vehicle.SizeMidsize -9.34
## Vehicle.Type2dr SUV 2.21
## Vehicle.Type4dr Hatchback -2.11
## Vehicle.Type4dr SUV 12.22
## Vehicle.TypeCargo Minivan -0.68
## Vehicle.TypeCargo Van -2.91
## Vehicle.TypeConvertible 5.53
## Vehicle.TypeConvertible SUV 5.11
## Vehicle.TypeCoupe -2.69
## Vehicle.TypeCrew Cab Pickup -2.24
## Vehicle.TypeExtended Cab Pickup -7.72
## Vehicle.TypePassenger Minivan 2.53
## Vehicle.TypePassenger Van 0.63
## Vehicle.TypeRegular Cab Pickup -4.95
## Vehicle.TypeSedan -6.43
## Vehicle.TypeWagon NA
## Highway.MpG 2.34
## City.MpG -5.09
## Popularity 2.07
## Pr(>|t|)
## (Intercept) < 0.0000000000000002
## Year < 0.0000000000000002
## Engine.Fuel.Typediesel 0.91749
## Engine.Fuel.Typeelectric 0.01356
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) 0.00040
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 0.39626
## Engine.Fuel.Typeflex-fuel (unleaded/E85) 0.07400
## Engine.Fuel.Typenatural gas 0.71943
## Engine.Fuel.Typepremium unleaded (recommended) 0.42667
## Engine.Fuel.Typepremium unleaded (required) 0.57672
## Engine.Fuel.Typeregular unleaded 0.09815
## Engine.HP < 0.0000000000000002
## Transmission.TypeAUTOMATIC 0.50007
## Transmission.TypeDIRECT_DRIVE 0.89141
## Transmission.TypeMANUAL 0.00005120405006348
## Driven.Wheelsfour wheel drive 0.00034
## Driven.Wheelsfront wheel drive 0.00000047235345592
## Driven.Wheelsrear wheel drive < 0.0000000000000002
## Number.of.Doors 0.79544
## Market.CategoryCrossover,Diesel 0.00000097170960470
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 0.00965
## Market.CategoryCrossover,Exotic,Luxury,Performance 0.04262
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 0.00000000000114015
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 0.08054
## Market.CategoryCrossover,Factory Tuner,Performance 0.81380
## Market.CategoryCrossover,Flex Fuel 0.00173
## Market.CategoryCrossover,Flex Fuel,Luxury 0.00000000000125860
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance < 0.0000000000000002
## Market.CategoryCrossover,Flex Fuel,Performance 0.62186
## Market.CategoryCrossover,Hatchback 0.00045
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance 0.16541
## Market.CategoryCrossover,Hatchback,Luxury 0.00000059843833690
## Market.CategoryCrossover,Hatchback,Performance 0.80198
## Market.CategoryCrossover,Hybrid 0.00000000000001285
## Market.CategoryCrossover,Luxury < 0.0000000000000002
## Market.CategoryCrossover,Luxury,Diesel 0.00000000000177317
## Market.CategoryCrossover,Luxury,High-Performance 0.00000000318414166
## Market.CategoryCrossover,Luxury,Hybrid 0.00000000000000051
## Market.CategoryCrossover,Luxury,Performance < 0.0000000000000002
## Market.CategoryCrossover,Luxury,Performance,Hybrid 0.00000000518806344
## Market.CategoryCrossover,Performance 0.09302
## Market.CategoryDiesel 0.00473
## Market.CategoryDiesel,Luxury < 0.0000000000000002
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryExotic,High-Performance < 0.0000000000000002
## Market.CategoryExotic,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryFactory Tuner,High-Performance 0.43385
## Market.CategoryFactory Tuner,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryFactory Tuner,Luxury,Performance 0.33771
## Market.CategoryFactory Tuner,Performance 0.00025
## Market.CategoryFlex Fuel < 0.0000000000000002
## Market.CategoryFlex Fuel,Diesel 0.00005226492162296
## Market.CategoryFlex Fuel,Hybrid 0.01373
## Market.CategoryFlex Fuel,Luxury < 0.0000000000000002
## Market.CategoryFlex Fuel,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryFlex Fuel,Luxury,Performance < 0.0000000000000002
## Market.CategoryFlex Fuel,Performance 0.00000000121900704
## Market.CategoryFlex Fuel,Performance,Hybrid 0.92507
## Market.CategoryHatchback 0.00000043640140776
## Market.CategoryHatchback,Diesel 0.45449
## Market.CategoryHatchback,Factory Tuner,High-Performance 0.94644
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 0.75163
## Market.CategoryHatchback,Factory Tuner,Performance 0.01006
## Market.CategoryHatchback,Flex Fuel 0.46522
## Market.CategoryHatchback,Hybrid < 0.0000000000000002
## Market.CategoryHatchback,Luxury 0.00000782827710830
## Market.CategoryHatchback,Luxury,Hybrid 0.00004851938201439
## Market.CategoryHatchback,Luxury,Performance 0.00018
## Market.CategoryHatchback,Performance 0.14409
## Market.CategoryHigh-Performance 0.00000123866815708
## Market.CategoryHybrid < 0.0000000000000002
## Market.CategoryLuxury < 0.0000000000000002
## Market.CategoryLuxury,High-Performance < 0.0000000000000002
## Market.CategoryLuxury,High-Performance,Hybrid 0.00000008683034509
## Market.CategoryLuxury,Hybrid < 0.0000000000000002
## Market.CategoryLuxury,Performance < 0.0000000000000002
## Market.CategoryLuxury,Performance,Hybrid < 0.0000000000000002
## Market.CategoryN/A < 0.0000000000000002
## Market.CategoryPerformance 0.00000000002815078
## Market.CategoryPerformance,Hybrid 0.78269
## Vehicle.SizeLarge 0.00000000000006468
## Vehicle.SizeMidsize < 0.0000000000000002
## Vehicle.Type2dr SUV 0.02688
## Vehicle.Type4dr Hatchback 0.03527
## Vehicle.Type4dr SUV < 0.0000000000000002
## Vehicle.TypeCargo Minivan 0.49916
## Vehicle.TypeCargo Van 0.00360
## Vehicle.TypeConvertible 0.00000003355839501
## Vehicle.TypeConvertible SUV 0.00000032106948621
## Vehicle.TypeCoupe 0.00717
## Vehicle.TypeCrew Cab Pickup 0.02503
## Vehicle.TypeExtended Cab Pickup 0.00000000000001297
## Vehicle.TypePassenger Minivan 0.01142
## Vehicle.TypePassenger Van 0.53137
## Vehicle.TypeRegular Cab Pickup 0.00000073783026518
## Vehicle.TypeSedan 0.00000000013661848
## Vehicle.TypeWagon NA
## Highway.MpG 0.01948
## City.MpG 0.00000036293758923
## Popularity 0.03851
##
## (Intercept) ***
## Year ***
## Engine.Fuel.Typediesel
## Engine.Fuel.Typeelectric *
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) ***
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85)
## Engine.Fuel.Typeflex-fuel (unleaded/E85) .
## Engine.Fuel.Typenatural gas
## Engine.Fuel.Typepremium unleaded (recommended)
## Engine.Fuel.Typepremium unleaded (required)
## Engine.Fuel.Typeregular unleaded .
## Engine.HP ***
## Transmission.TypeAUTOMATIC
## Transmission.TypeDIRECT_DRIVE
## Transmission.TypeMANUAL ***
## Driven.Wheelsfour wheel drive ***
## Driven.Wheelsfront wheel drive ***
## Driven.Wheelsrear wheel drive ***
## Number.of.Doors
## Market.CategoryCrossover,Diesel ***
## Market.CategoryCrossover,Exotic,Luxury,High-Performance **
## Market.CategoryCrossover,Exotic,Luxury,Performance *
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance ***
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance .
## Market.CategoryCrossover,Factory Tuner,Performance
## Market.CategoryCrossover,Flex Fuel **
## Market.CategoryCrossover,Flex Fuel,Luxury ***
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance ***
## Market.CategoryCrossover,Flex Fuel,Performance
## Market.CategoryCrossover,Hatchback ***
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance
## Market.CategoryCrossover,Hatchback,Luxury ***
## Market.CategoryCrossover,Hatchback,Performance
## Market.CategoryCrossover,Hybrid ***
## Market.CategoryCrossover,Luxury ***
## Market.CategoryCrossover,Luxury,Diesel ***
## Market.CategoryCrossover,Luxury,High-Performance ***
## Market.CategoryCrossover,Luxury,Hybrid ***
## Market.CategoryCrossover,Luxury,Performance ***
## Market.CategoryCrossover,Luxury,Performance,Hybrid ***
## Market.CategoryCrossover,Performance .
## Market.CategoryDiesel **
## Market.CategoryDiesel,Luxury ***
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance ***
## Market.CategoryExotic,High-Performance ***
## Market.CategoryExotic,Luxury,High-Performance ***
## Market.CategoryFactory Tuner,High-Performance
## Market.CategoryFactory Tuner,Luxury,High-Performance ***
## Market.CategoryFactory Tuner,Luxury,Performance
## Market.CategoryFactory Tuner,Performance ***
## Market.CategoryFlex Fuel ***
## Market.CategoryFlex Fuel,Diesel ***
## Market.CategoryFlex Fuel,Hybrid *
## Market.CategoryFlex Fuel,Luxury ***
## Market.CategoryFlex Fuel,Luxury,High-Performance ***
## Market.CategoryFlex Fuel,Luxury,Performance ***
## Market.CategoryFlex Fuel,Performance ***
## Market.CategoryFlex Fuel,Performance,Hybrid
## Market.CategoryHatchback ***
## Market.CategoryHatchback,Diesel
## Market.CategoryHatchback,Factory Tuner,High-Performance
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance
## Market.CategoryHatchback,Factory Tuner,Performance *
## Market.CategoryHatchback,Flex Fuel
## Market.CategoryHatchback,Hybrid ***
## Market.CategoryHatchback,Luxury ***
## Market.CategoryHatchback,Luxury,Hybrid ***
## Market.CategoryHatchback,Luxury,Performance ***
## Market.CategoryHatchback,Performance
## Market.CategoryHigh-Performance ***
## Market.CategoryHybrid ***
## Market.CategoryLuxury ***
## Market.CategoryLuxury,High-Performance ***
## Market.CategoryLuxury,High-Performance,Hybrid ***
## Market.CategoryLuxury,Hybrid ***
## Market.CategoryLuxury,Performance ***
## Market.CategoryLuxury,Performance,Hybrid ***
## Market.CategoryN/A ***
## Market.CategoryPerformance ***
## Market.CategoryPerformance,Hybrid
## Vehicle.SizeLarge ***
## Vehicle.SizeMidsize ***
## Vehicle.Type2dr SUV *
## Vehicle.Type4dr Hatchback *
## Vehicle.Type4dr SUV ***
## Vehicle.TypeCargo Minivan
## Vehicle.TypeCargo Van **
## Vehicle.TypeConvertible ***
## Vehicle.TypeConvertible SUV ***
## Vehicle.TypeCoupe **
## Vehicle.TypeCrew Cab Pickup *
## Vehicle.TypeExtended Cab Pickup ***
## Vehicle.TypePassenger Minivan *
## Vehicle.TypePassenger Van
## Vehicle.TypeRegular Cab Pickup ***
## Vehicle.TypeSedan ***
## Vehicle.TypeWagon
## Highway.MpG *
## City.MpG ***
## Popularity *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7090 on 10374 degrees of freedom
## Multiple R-squared: 0.855, Adjusted R-squared: 0.853
## F-statistic: 623 on 98 and 10374 DF, p-value: <0.0000000000000002
It appears we have run into a problem with our model. As seen in the summary above, the last level of the Vehicle.Type predictor (Vehicle.TypeWagon) returns NA for its coefficient, which indicates a singularity: that column is an exact linear combination of other columns in the model. To confirm this, we can check the model with the alias() function.
alias(model_all)
## Model :
## MSRP ~ Year + Engine.Fuel.Type + Engine.HP + Transmission.Type +
## Driven.Wheels + Number.of.Doors + Market.Category + Vehicle.Size +
## Vehicle.Type + Highway.MpG + City.MpG + Popularity
##
## Complete :
## (Intercept) Year Engine.Fuel.Typediesel
## Vehicle.TypeWagon 1 0 0
## Engine.Fuel.Typeelectric
## Vehicle.TypeWagon 0
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85)
## Vehicle.TypeWagon 0
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85)
## Vehicle.TypeWagon 0
## Engine.Fuel.Typeflex-fuel (unleaded/E85)
## Vehicle.TypeWagon 0
## Engine.Fuel.Typenatural gas
## Vehicle.TypeWagon 0
## Engine.Fuel.Typepremium unleaded (recommended)
## Vehicle.TypeWagon 0
## Engine.Fuel.Typepremium unleaded (required)
## Vehicle.TypeWagon 0
## Engine.Fuel.Typeregular unleaded Engine.HP
## Vehicle.TypeWagon 0 0
## Transmission.TypeAUTOMATIC Transmission.TypeDIRECT_DRIVE
## Vehicle.TypeWagon 0 0
## Transmission.TypeMANUAL Driven.Wheelsfour wheel drive
## Vehicle.TypeWagon 0 0
## Driven.Wheelsfront wheel drive Driven.Wheelsrear wheel drive
## Vehicle.TypeWagon 0 0
## Number.of.Doors Market.CategoryCrossover,Diesel
## Vehicle.TypeWagon 0 0
## Market.CategoryCrossover,Exotic,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Exotic,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Factory Tuner,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Flex Fuel
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Flex Fuel,Luxury
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Flex Fuel,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Hatchback
## Vehicle.TypeWagon -1
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance
## Vehicle.TypeWagon -1
## Market.CategoryCrossover,Hatchback,Luxury
## Vehicle.TypeWagon -1
## Market.CategoryCrossover,Hatchback,Performance
## Vehicle.TypeWagon -1
## Market.CategoryCrossover,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury,Diesel
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Luxury,Performance,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryCrossover,Performance Market.CategoryDiesel
## Vehicle.TypeWagon 0 0
## Market.CategoryDiesel,Luxury
## Vehicle.TypeWagon 0
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryExotic,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryExotic,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryFactory Tuner,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryFactory Tuner,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryFactory Tuner,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryFactory Tuner,Performance
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel Market.CategoryFlex Fuel,Diesel
## Vehicle.TypeWagon 0 0
## Market.CategoryFlex Fuel,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel,Luxury
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel,Luxury,High-Performance
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel,Luxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel,Performance
## Vehicle.TypeWagon 0
## Market.CategoryFlex Fuel,Performance,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryHatchback Market.CategoryHatchback,Diesel
## Vehicle.TypeWagon -1 -1
## Market.CategoryHatchback,Factory Tuner,High-Performance
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Factory Tuner,Performance
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Flex Fuel
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Hybrid
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Luxury
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Luxury,Hybrid
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Luxury,Performance
## Vehicle.TypeWagon -1
## Market.CategoryHatchback,Performance
## Vehicle.TypeWagon -1
## Market.CategoryHigh-Performance Market.CategoryHybrid
## Vehicle.TypeWagon 0 0
## Market.CategoryLuxury Market.CategoryLuxury,High-Performance
## Vehicle.TypeWagon 0 0
## Market.CategoryLuxury,High-Performance,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryLuxury,Hybrid
## Vehicle.TypeWagon 0
## Market.CategoryLuxury,Performance
## Vehicle.TypeWagon 0
## Market.CategoryLuxury,Performance,Hybrid Market.CategoryN/A
## Vehicle.TypeWagon 0 0
## Market.CategoryPerformance Market.CategoryPerformance,Hybrid
## Vehicle.TypeWagon 0 0
## Vehicle.SizeLarge Vehicle.SizeMidsize Vehicle.Type2dr SUV
## Vehicle.TypeWagon 0 0 -1
## Vehicle.Type4dr Hatchback Vehicle.Type4dr SUV
## Vehicle.TypeWagon 0 -1
## Vehicle.TypeCargo Minivan Vehicle.TypeCargo Van
## Vehicle.TypeWagon -1 -1
## Vehicle.TypeConvertible Vehicle.TypeConvertible SUV
## Vehicle.TypeWagon -1 -1
## Vehicle.TypeCoupe Vehicle.TypeCrew Cab Pickup
## Vehicle.TypeWagon -1 -1
## Vehicle.TypeExtended Cab Pickup Vehicle.TypePassenger Minivan
## Vehicle.TypeWagon -1 -1
## Vehicle.TypePassenger Van Vehicle.TypeRegular Cab Pickup
## Vehicle.TypeWagon -1 -1
## Vehicle.TypeSedan Highway.MpG City.MpG Popularity
## Vehicle.TypeWagon -1 0 0 0
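The NA coefficient and the alias() table above can be reproduced on a small scale. The following is a minimal sketch on made-up data, not the car data set used here: the data frame `toy` and the columns `body`, `is_hatch` and `price` are hypothetical names chosen for illustration. The flag `is_hatch` repeats information that the factor `body` already carries, so one dummy column becomes redundant.

```r
# Toy data (hypothetical): a body-type factor plus a flag that duplicates part of it.
set.seed(1)
toy <- data.frame(
  body = factor(rep(c("hatchback", "sedan", "wagon"), each = 10))
)
toy$is_hatch <- as.numeric(toy$body == "hatchback")  # redundant with `body`
toy$price <- 20000 +
  3000 * (toy$body == "sedan") +
  5000 * (toy$body == "wagon") +
  rnorm(nrow(toy), sd = 500)

# Design columns: (Intercept), is_hatch, bodysedan, bodywagon.
# bodywagon = 1 - is_hatch - bodysedan, so lm() cannot estimate it.
fit <- lm(price ~ is_hatch + body, data = toy)

print(coef(fit))            # the redundant dummy is reported as NA
print(alias(fit)$Complete)  # the row names the aliased column, the entries give the combination
```

Reading the alias table works the same way as in the output above: every nonzero entry marks a column that takes part in the exact linear dependence.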
The alias() output shows that Vehicle.TypeWagon is an exact linear combination of the intercept, the Hatchback levels of Market.Category, and most of the other Vehicle.Type levels (the nonzero entries in the table). In other words, Vehicle.Type overlaps with information already carried by Market.Category, so we remove the Vehicle.Type predictor from our model. In addition, recall from the correlation plot discussed earlier that there is a strong correlation between the Highway.MpG and City.MpG predictors. Although they are different measurements, both describe how economical a car is, namely how many miles it can travel per gallon of fuel. For this reason we keep only one of them. We choose City.MpG because it has the lower p-value in the summary above (about 0.0000004, versus 0.019 for Highway.MpG), and remove Highway.MpG from our model.
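The reason for keeping only one of two strongly correlated predictors can be illustrated with a minimal sketch on simulated data. The variables `city_mpg`, `highway_mpg` and `price` below are hypothetical stand-ins generated for this example, not columns of our data set. The sketch compares the standard error of the `city_mpg` coefficient with and without its near-duplicate in the model.

```r
# Simulated data (hypothetical): highway mileage is almost a rescaled copy of city mileage.
set.seed(42)
n <- 200
city_mpg    <- rnorm(n, mean = 20, sd = 5)
highway_mpg <- 1.3 * city_mpg + rnorm(n, sd = 1)
price       <- 60000 - 800 * city_mpg + rnorm(n, sd = 3000)

# Pairwise correlation between the two mileage measures.
r <- cor(city_mpg, highway_mpg)

# Fit the model with both predictors and with city_mpg only.
fit_both <- lm(price ~ city_mpg + highway_mpg)
fit_city <- lm(price ~ city_mpg)

# Standard error of the city_mpg coefficient in each model.
se_both <- summary(fit_both)$coefficients["city_mpg", "Std. Error"]
se_city <- summary(fit_city)$coefficients["city_mpg", "Std. Error"]

print(c(correlation = r, se_with_both = se_both, se_city_only = se_city))
```

When two predictors carry nearly the same information, the model cannot separate their effects well, which shows up as a larger standard error for each coefficient.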
model_data <- model_data %>%
  select(-c(Vehicle.Type, Highway.MpG))
model_all <- lm(MSRP ~ ., model_data)
model_fwd <- step(lm(MSRP ~ 1, model_data),
                  scope = list(lower = lm(MSRP ~ 1, model_data),
                               upper = lm(MSRP ~ ., model_data)),
                  direction = "forward")
## Start: AIC=205812
## MSRP ~ 1
##
## Df Sum of Sq RSS AIC
## + Engine.HP 1 2096262177374 1489778586322 196615
## + Market.Category 61 1695116344793 1890924418903 199232
## + Year 1 1331909844208 2254130919488 200952
## + Engine.Fuel.Type 9 1166846233190 2419194530507 201708
## + Driven.Wheels 3 520118087971 3065922675725 204177
## + Vehicle.Size 2 400762781697 3185277981999 204575
## + Transmission.Type 3 377722049149 3208318714547 204653
## + City.MpG 1 70245818475 3515794945221 205607
## + Number.of.Doors 1 57313005881 3528727757815 205646
## + Popularity 1 8221255711 3577819507985 205790
## <none> 3586040763696 205812
##
## Step: AIC=196615
## MSRP ~ Engine.HP
##
## Df Sum of Sq RSS AIC
## + Year 1 437616201226 1052162385096 192974
## + Market.Category 61 435489305929 1054289280394 193115
## + Engine.Fuel.Type 9 260443334211 1229335252111 194620
## + City.MpG 1 167815649539 1321962936783 195365
## + Driven.Wheels 3 119687829611 1370090756711 195743
## + Transmission.Type 3 104395132505 1385383453817 195860
## + Number.of.Doors 1 43618048742 1446160537581 196305
## + Vehicle.Size 2 19397585270 1470381001052 196481
## + Popularity 1 2908327896 1486870258426 196596
## <none> 1489778586322 196615
##
## Step: AIC=192974
## MSRP ~ Engine.HP + Year
##
## Df Sum of Sq RSS AIC
## + Market.Category 61 380593004688 671569380409 188394
## + Engine.Fuel.Type 9 200664823655 851497561441 190776
## + Driven.Wheels 3 37671652384 1014490732712 192598
## + Transmission.Type 3 20301722330 1031860662766 192776
## + City.MpG 1 10437591983 1041724793113 192872
## + Popularity 1 9539829673 1042622555423 192881
## + Vehicle.Size 2 2787920653 1049374464443 192951
## + Number.of.Doors 1 257074703 1051905310393 192974
## <none> 1052162385096 192974
##
## Step: AIC=188394
## MSRP ~ Engine.HP + Year + Market.Category
##
## Df Sum of Sq RSS AIC
## + Engine.Fuel.Type 9 58859141778 612710238630 187451
## + Driven.Wheels 3 13933444241 657635936167 188181
## + Vehicle.Size 2 7381835880 664187544528 188282
## + Transmission.Type 3 6996345497 664573034911 188290
## + City.MpG 1 154157347 671415223062 188394
## <none> 671569380409 188394
## + Number.of.Doors 1 94360971 671475019438 188395
## + Popularity 1 60261230 671509119179 188395
##
## Step: AIC=187451
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type
##
## Df Sum of Sq RSS AIC
## + Driven.Wheels 3 13082281579 599627957052 187231
## + Vehicle.Size 2 8984622961 603725615669 187301
## + Transmission.Type 3 5490069274 607220169356 187363
## + City.MpG 1 3683227880 609027010751 187390
## <none> 612710238630 187451
## + Number.of.Doors 1 60043150 612650195480 187452
## + Popularity 1 23153316 612687085314 187453
##
## Step: AIC=187231
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels
##
## Df Sum of Sq RSS AIC
## + Vehicle.Size 2 9627980605 589999976447 187066
## + Transmission.Type 3 5171791941 594456165110 187147
## + City.MpG 1 3519927092 596108029959 187172
## <none> 599627957052 187231
## + Number.of.Doors 1 50699277 599577257775 187233
## + Popularity 1 435876 599627521175 187233
##
## Step: AIC=187066
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size
##
## Df Sum of Sq RSS AIC
## + Transmission.Type 3 6498070581 583501905866 186956
## + City.MpG 1 3757269800 586242706647 187001
## <none> 589999976447 187066
## + Popularity 1 25969435 589974007012 187067
## + Number.of.Doors 1 18570045 589981406402 187068
##
## Step: AIC=186956
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type
##
## Df Sum of Sq RSS AIC
## + City.MpG 1 3135704215 580366201651 186901
## + Number.of.Doors 1 375222610 583126683256 186951
## <none> 583501905866 186956
## + Popularity 1 1981231 583499924635 186958
##
## Step: AIC=186901
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type + City.MpG
##
## Df Sum of Sq RSS AIC
## + Number.of.Doors 1 354992827 580011208824 186897
## <none> 580366201651 186901
## + Popularity 1 6347645 580359854006 186903
##
## Step: AIC=186897
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type + City.MpG +
## Number.of.Doors
##
## Df Sum of Sq RSS AIC
## <none> 580011208824 186897
## + Popularity 1 377595 580010831229 186899
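The AIC column in the step() trace can be recomputed by hand, which helps when reading the tables above. For linear models, step() relies on extractAIC(), which uses n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks this on the built-in mtcars data, which is used here only as a stand-in for illustration.

```r
# Toy model on built-in data (not the car data set of this article).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)   # residual sum of squares, the RSS column in the trace
edf <- length(coef(fit))       # number of estimated coefficients

aic_by_hand <- n * log(rss / n) + 2 * edf
aic_step    <- extractAIC(fit)[2]   # the value step() compares between candidate models

print(c(by_hand = aic_by_hand, extractAIC = aic_step))
print(all.equal(aic_by_hand, aic_step))
```

As a rough check against the trace above, and assuming n = 10473 observations (10374 residual degrees of freedom plus 99 estimated coefficients in the earlier full-model summary), the intercept-only model gives 10473 * log(3586040763696 / 10473) + 2, which is approximately 205812, the starting AIC shown. Note that this value differs from the one returned by AIC() by an additive constant, so it should only be compared between models fitted to the same data.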
model_bwd <- step(lm(MSRP ~ ., model_data), direction = "backward")
## Start: AIC=186899
## MSRP ~ Year + Engine.Fuel.Type + Engine.HP + Transmission.Type +
## Driven.Wheels + Number.of.Doors + Market.Category + Vehicle.Size +
## City.MpG + Popularity
##
## Df Sum of Sq RSS AIC
## - Popularity 1 377595 580011208824 186897
## <none> 580010831229 186899
## - Number.of.Doors 1 349022777 580359854006 186903
## - City.MpG 1 3115546335 583126377564 186953
## - Transmission.Type 3 6194427509 586205258738 187004
## - Vehicle.Size 2 11006581643 591017412872 187092
## - Driven.Wheels 3 13774200193 593785031422 187139
## - Engine.Fuel.Type 9 60408721262 640419552491 187919
## - Engine.HP 1 124187198228 704198029457 188929
## - Year 1 177111737302 757122568530 189688
## - Market.Category 61 221549513475 801560344704 190165
##
## Step: AIC=186897
## MSRP ~ Year + Engine.Fuel.Type + Engine.HP + Transmission.Type +
## Driven.Wheels + Number.of.Doors + Market.Category + Vehicle.Size +
## City.MpG
##
## Df Sum of Sq RSS AIC
## <none> 580011208824 186897
## - Number.of.Doors 1 354992827 580366201651 186901
## - City.MpG 1 3115474432 583126683256 186951
## - Transmission.Type 3 6209045562 586220254386 187003
## - Vehicle.Size 2 11013587504 591024796328 187090
## - Driven.Wheels 3 13789215848 593800424672 187137
## - Engine.Fuel.Type 9 60462957929 640474166753 187918
## - Engine.HP 1 124318243091 704329451915 188929
## - Year 1 178406873287 758418082111 189704
## - Market.Category 61 223478508833 803489717658 190188
model_both <- step(lm(MSRP ~ 1, model_data),
                   scope = list(lower = lm(MSRP ~ 1, model_data),
                                upper = lm(MSRP ~ ., model_data)),
                   direction = "both")
## Start: AIC=205812
## MSRP ~ 1
##
## Df Sum of Sq RSS AIC
## + Engine.HP 1 2096262177374 1489778586322 196615
## + Market.Category 61 1695116344793 1890924418903 199232
## + Year 1 1331909844208 2254130919488 200952
## + Engine.Fuel.Type 9 1166846233190 2419194530507 201708
## + Driven.Wheels 3 520118087971 3065922675725 204177
## + Vehicle.Size 2 400762781697 3185277981999 204575
## + Transmission.Type 3 377722049149 3208318714547 204653
## + City.MpG 1 70245818475 3515794945221 205607
## + Number.of.Doors 1 57313005881 3528727757815 205646
## + Popularity 1 8221255711 3577819507985 205790
## <none> 3586040763696 205812
##
## Step: AIC=196615
## MSRP ~ Engine.HP
##
## Df Sum of Sq RSS AIC
## + Year 1 437616201226 1052162385096 192974
## + Market.Category 61 435489305929 1054289280394 193115
## + Engine.Fuel.Type 9 260443334211 1229335252111 194620
## + City.MpG 1 167815649539 1321962936783 195365
## + Driven.Wheels 3 119687829611 1370090756711 195743
## + Transmission.Type 3 104395132505 1385383453817 195860
## + Number.of.Doors 1 43618048742 1446160537581 196305
## + Vehicle.Size 2 19397585270 1470381001052 196481
## + Popularity 1 2908327896 1486870258426 196596
## <none> 1489778586322 196615
## - Engine.HP 1 2096262177374 3586040763696 205812
##
## Step: AIC=192974
## MSRP ~ Engine.HP + Year
##
## Df Sum of Sq RSS AIC
## + Market.Category 61 380593004688 671569380409 188394
## + Engine.Fuel.Type 9 200664823655 851497561441 190776
## + Driven.Wheels 3 37671652384 1014490732712 192598
## + Transmission.Type 3 20301722330 1031860662766 192776
## + City.MpG 1 10437591983 1041724793113 192872
## + Popularity 1 9539829673 1042622555423 192881
## + Vehicle.Size 2 2787920653 1049374464443 192951
## + Number.of.Doors 1 257074703 1051905310393 192974
## <none> 1052162385096 192974
## - Year 1 437616201226 1489778586322 196615
## - Engine.HP 1 1201968534392 2254130919488 200952
##
## Step: AIC=188394
## MSRP ~ Engine.HP + Year + Market.Category
##
## Df Sum of Sq RSS AIC
## + Engine.Fuel.Type 9 58859141778 612710238630 187451
## + Driven.Wheels 3 13933444241 657635936167 188181
## + Vehicle.Size 2 7381835880 664187544528 188282
## + Transmission.Type 3 6996345497 664573034911 188290
## + City.MpG 1 154157347 671415223062 188394
## <none> 671569380409 188394
## + Number.of.Doors 1 94360971 671475019438 188395
## + Popularity 1 60261230 671509119179 188395
## - Market.Category 61 380593004688 1052162385096 192974
## - Year 1 382719899985 1054289280394 193115
## - Engine.HP 1 421665509303 1093234889712 193495
##
## Step: AIC=187451
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type
##
## Df Sum of Sq RSS AIC
## + Driven.Wheels 3 13082281579 599627957052 187231
## + Vehicle.Size 2 8984622961 603725615669 187301
## + Transmission.Type 3 5490069274 607220169356 187363
## + City.MpG 1 3683227880 609027010751 187390
## <none> 612710238630 187451
## + Number.of.Doors 1 60043150 612650195480 187452
## + Popularity 1 23153316 612687085314 187453
## - Engine.Fuel.Type 9 58859141778 671569380409 188394
## - Market.Category 61 238787322811 851497561441 190776
## - Year 1 315679721285 928389959915 191802
## - Engine.HP 1 410337420061 1023047658692 192818
##
## Step: AIC=187231
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels
##
## Df Sum of Sq RSS AIC
## + Vehicle.Size 2 9627980605 589999976447 187066
## + Transmission.Type 3 5171791941 594456165110 187147
## + City.MpG 1 3519927092 596108029959 187172
## <none> 599627957052 187231
## + Number.of.Doors 1 50699277 599577257775 187233
## + Popularity 1 435876 599627521175 187233
## - Driven.Wheels 3 13082281579 612710238630 187451
## - Engine.Fuel.Type 9 58007979116 657635936167 188181
## - Market.Category 61 232581066958 832209024009 190542
## - Year 1 281574121952 881202079003 191261
## - Engine.HP 1 331167015958 930794973009 191835
##
## Step: AIC=187066
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size
##
## Df Sum of Sq RSS AIC
## + Transmission.Type 3 6498070581 583501905866 186956
## + City.MpG 1 3757269800 586242706647 187001
## <none> 589999976447 187066
## + Popularity 1 25969435 589974007012 187067
## + Number.of.Doors 1 18570045 589981406402 187068
## - Vehicle.Size 2 9627980605 599627957052 187231
## - Driven.Wheels 3 13725639222 603725615669 187301
## - Engine.Fuel.Type 9 60051209892 650051186339 188063
## - Engine.HP 1 220207884208 810207860655 190386
## - Market.Category 61 235646500912 825646477359 190463
## - Year 1 278090271491 868090247938 191108
##
## Step: AIC=186956
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type
##
## Df Sum of Sq RSS AIC
## + City.MpG 1 3135704215 580366201651 186901
## + Number.of.Doors 1 375222610 583126683256 186951
## <none> 583501905866 186956
## + Popularity 1 1981231 583499924635 186958
## - Transmission.Type 3 6498070581 589999976447 187066
## - Vehicle.Size 2 10954259245 594456165110 187147
## - Driven.Wheels 3 13218358475 596720264341 187184
## - Engine.Fuel.Type 9 59229769736 642731675602 187950
## - Engine.HP 1 208373569078 791875474944 190152
## - Market.Category 61 222751569089 806253474955 190220
## - Year 1 245580885790 829082791656 190633
##
## Step: AIC=186901
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type + City.MpG
##
## Df Sum of Sq RSS AIC
## + Number.of.Doors 1 354992827 580011208824 186897
## <none> 580366201651 186901
## + Popularity 1 6347645 580359854006 186903
## - City.MpG 1 3135704215 583501905866 186956
## - Transmission.Type 3 5876504996 586242706647 187001
## - Vehicle.Size 2 11129778252 591495979903 187096
## - Driven.Wheels 3 13434232219 593800433870 187135
## - Engine.Fuel.Type 9 61035390612 641401592263 187931
## - Engine.HP 1 124275608729 704641810380 188932
## - Year 1 180784617518 761150819169 189739
## - Market.Category 61 223188385761 803554587412 190187
##
## Step: AIC=186897
## MSRP ~ Engine.HP + Year + Market.Category + Engine.Fuel.Type +
## Driven.Wheels + Vehicle.Size + Transmission.Type + City.MpG +
## Number.of.Doors
##
## Df Sum of Sq RSS AIC
## <none> 580011208824 186897
## + Popularity 1 377595 580010831229 186899
## - Number.of.Doors 1 354992827 580366201651 186901
## - City.MpG 1 3115474432 583126683256 186951
## - Transmission.Type 3 6209045562 586220254386 187003
## - Vehicle.Size 2 11013587504 591024796328 187090
## - Driven.Wheels 3 13789215848 593800424672 187137
## - Engine.Fuel.Type 9 60462957929 640474166753 187918
## - Engine.HP 1 124318243091 704329451915 188929
## - Year 1 178406873287 758418082111 189704
## - Market.Category 61 223478508833 803489717658 190188
summary(model_fwd)$adj.r.squared
## [1] 0.837
summary(model_bwd)$adj.r.squared
## [1] 0.837
summary(model_both)$adj.r.squared
## [1] 0.837
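The results show that all three search directions arrive at the same final model (the same nine predictors, with Popularity excluded, at AIC = 186897), so the adjusted R-squared is identical at about 0.837, or roughly 84%. Since the choice makes no difference here, we use the backward model: starting from the full model, it reached the final model after a single removal step (dropping Popularity), which makes it the cheapest of the three searches in this case.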
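Whether the three directions really select the same predictors can also be checked programmatically instead of reading the traces. The following is a minimal sketch on the built-in mtcars data, used only as a stand-in; the helper `selected_terms()` is a hypothetical name introduced for this example. On other data the three searches are not guaranteed to agree.

```r
# Toy illustration on built-in data (not the car data set of this article).
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
search_scope <- list(lower = ~ 1, upper = ~ wt + hp + disp + qsec)

# trace = 0 suppresses the step-by-step tables.
fit_fwd  <- step(null_fit, scope = search_scope, direction = "forward", trace = 0)
fit_bwd  <- step(full_fit, direction = "backward", trace = 0)
fit_both <- step(null_fit, scope = search_scope, direction = "both", trace = 0)

# Hypothetical helper: the predictors kept by a fitted model, in sorted order.
selected_terms <- function(fit) sort(attr(terms(fit), "term.labels"))

fits   <- list(forward = fit_fwd, backward = fit_bwd, both = fit_both)
chosen <- lapply(fits, selected_terms)

# TRUE only if all three searches ended on the same set of predictors.
same_terms <- identical(chosen$forward, chosen$backward) &&
  identical(chosen$forward, chosen$both)

adj_r2 <- sapply(fits, function(fit) summary(fit)$adj.r.squared)

print(chosen)
print(same_terms)
print(adj_r2)
```

If `same_terms` is TRUE, identical adjusted R-squared values follow automatically, because the three objects describe the same fitted model.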
options("scipen"=100, "digits"=4)
summary(model_bwd)
##
## Call:
## lm(formula = MSRP ~ Year + Engine.Fuel.Type + Engine.HP + Transmission.Type +
## Driven.Wheels + Number.of.Doors + Market.Category + Vehicle.Size +
## City.MpG, data = model_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29299 -4315 -282 3995 47431
##
## Coefficients:
## Estimate
## (Intercept) -1871373.9
## Year 939.9
## Engine.Fuel.Typediesel 5610.5
## Engine.Fuel.Typeelectric 26456.9
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) -14265.8
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 8178.9
## Engine.Fuel.Typeflex-fuel (unleaded/E85) -5662.0
## Engine.Fuel.Typenatural gas 3400.5
## Engine.Fuel.Typepremium unleaded (recommended) -976.1
## Engine.Fuel.Typepremium unleaded (required) 4385.2
## Engine.Fuel.Typeregular unleaded -5156.3
## Engine.HP 94.6
## Transmission.TypeAUTOMATIC -348.5
## Transmission.TypeDIRECT_DRIVE -905.1
## Transmission.TypeMANUAL -2593.0
## Driven.Wheelsfour wheel drive -73.7
## Driven.Wheelsfront wheel drive -670.0
## Driven.Wheelsrear wheel drive -3337.6
## Number.of.Doors -272.0
## Market.CategoryCrossover,Diesel 12542.8
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 18218.3
## Market.CategoryCrossover,Exotic,Luxury,Performance 14690.3
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 12697.3
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 4953.2
## Market.CategoryCrossover,Factory Tuner,Performance -1123.9
## Market.CategoryCrossover,Flex Fuel 1428.4
## Market.CategoryCrossover,Flex Fuel,Luxury 12376.1
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 28811.3
## Market.CategoryCrossover,Flex Fuel,Performance 1704.9
## Market.CategoryCrossover,Hatchback -1826.9
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance -10444.8
## Market.CategoryCrossover,Hatchback,Luxury 6840.8
## Market.CategoryCrossover,Hatchback,Performance -5339.0
## Market.CategoryCrossover,Hybrid 10276.0
## Market.CategoryCrossover,Luxury 4877.8
## Market.CategoryCrossover,Luxury,Diesel 12007.1
## Market.CategoryCrossover,Luxury,High-Performance 14899.4
## Market.CategoryCrossover,Luxury,Hybrid 12862.0
## Market.CategoryCrossover,Luxury,Performance 5707.5
## Market.CategoryCrossover,Luxury,Performance,Hybrid 28250.4
## Market.CategoryCrossover,Performance -120.7
## Market.CategoryDiesel -2665.3
## Market.CategoryDiesel,Luxury 11529.8
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 49712.5
## Market.CategoryExotic,High-Performance 27750.3
## Market.CategoryExotic,Luxury,High-Performance 37660.8
## Market.CategoryFactory Tuner,High-Performance -6712.2
## Market.CategoryFactory Tuner,Luxury,High-Performance 7198.7
## Market.CategoryFactory Tuner,Luxury,Performance -4814.0
## Market.CategoryFactory Tuner,Performance -8636.6
## Market.CategoryFlex Fuel -1865.4
## Market.CategoryFlex Fuel,Diesel 3449.3
## Market.CategoryFlex Fuel,Hybrid 7319.6
## Market.CategoryFlex Fuel,Luxury 22573.4
## Market.CategoryFlex Fuel,Luxury,High-Performance 10216.2
## Market.CategoryFlex Fuel,Luxury,Performance 27612.1
## Market.CategoryFlex Fuel,Performance 2939.3
## Market.CategoryFlex Fuel,Performance,Hybrid -6031.9
## Market.CategoryHatchback -363.1
## Market.CategoryHatchback,Diesel -6002.5
## Market.CategoryHatchback,Factory Tuner,High-Performance -6702.8
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance -4293.8
## Market.CategoryHatchback,Factory Tuner,Performance -10262.9
## Market.CategoryHatchback,Flex Fuel -3035.3
## Market.CategoryHatchback,Hybrid 7611.3
## Market.CategoryHatchback,Luxury 770.4
## Market.CategoryHatchback,Luxury,Hybrid 12285.7
## Market.CategoryHatchback,Luxury,Performance -34.3
## Market.CategoryHatchback,Performance -4430.7
## Market.CategoryHigh-Performance -2427.1
## Market.CategoryHybrid 7366.7
## Market.CategoryLuxury 4774.3
## Market.CategoryLuxury,High-Performance 14697.0
## Market.CategoryLuxury,High-Performance,Hybrid 5206.9
## Market.CategoryLuxury,Hybrid 22523.5
## Market.CategoryLuxury,Performance 7383.6
## Market.CategoryLuxury,Performance,Hybrid 19658.5
## Market.CategoryN/A -113.6
## Market.CategoryPerformance -2171.6
## Market.CategoryPerformance,Hybrid -6097.2
## Vehicle.SizeLarge 484.8
## Vehicle.SizeMidsize -2168.9
## City.MpG -221.9
## Std. Error
## (Intercept) 33143.0
## Year 16.6
## Engine.Fuel.Typediesel 4639.0
## Engine.Fuel.Typeelectric 7673.9
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) 4883.8
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 4867.9
## Engine.Fuel.Typeflex-fuel (unleaded/E85) 4356.3
## Engine.Fuel.Typenatural gas 6826.1
## Engine.Fuel.Typepremium unleaded (recommended) 4330.4
## Engine.Fuel.Typepremium unleaded (required) 4332.0
## Engine.Fuel.Typeregular unleaded 4319.8
## Engine.HP 2.0
## Transmission.TypeAUTOMATIC 450.4
## Transmission.TypeDIRECT_DRIVE 5349.6
## Transmission.TypeMANUAL 461.1
## Driven.Wheelsfour wheel drive 324.8
## Driven.Wheelsfront wheel drive 243.0
## Driven.Wheelsrear wheel drive 265.4
## Number.of.Doors 107.9
## Market.CategoryCrossover,Diesel 3300.7
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 7486.8
## Market.CategoryCrossover,Exotic,Luxury,Performance 7484.1
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 2071.6
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 3368.5
## Market.CategoryCrossover,Factory Tuner,Performance 3749.2
## Market.CategoryCrossover,Flex Fuel 1079.5
## Market.CategoryCrossover,Flex Fuel,Luxury 2654.9
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 3831.4
## Market.CategoryCrossover,Flex Fuel,Performance 3084.2
## Market.CategoryCrossover,Hatchback 913.5
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance 3077.7
## Market.CategoryCrossover,Hatchback,Luxury 2850.5
## Market.CategoryCrossover,Hatchback,Performance 3074.8
## Market.CategoryCrossover,Hybrid 1216.7
## Market.CategoryCrossover,Luxury 462.0
## Market.CategoryCrossover,Luxury,Diesel 2143.3
## Market.CategoryCrossover,Luxury,High-Performance 2864.8
## Market.CategoryCrossover,Luxury,Hybrid 1582.7
## Market.CategoryCrossover,Luxury,Performance 785.5
## Market.CategoryCrossover,Luxury,Performance,Hybrid 5320.6
## Market.CategoryCrossover,Performance 934.8
## Market.CategoryDiesel 1280.4
## Market.CategoryDiesel,Luxury 2013.8
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 4353.8
## Market.CategoryExotic,High-Performance 1336.2
## Market.CategoryExotic,Luxury,High-Performance 1568.2
## Market.CategoryFactory Tuner,High-Performance 932.0
## Market.CategoryFactory Tuner,Luxury,High-Performance 790.3
## Market.CategoryFactory Tuner,Luxury,Performance 1395.4
## Market.CategoryFactory Tuner,Performance 902.4
## Market.CategoryFlex Fuel 578.8
## Market.CategoryFlex Fuel,Diesel 1977.8
## Market.CategoryFlex Fuel,Hybrid 5295.6
## Market.CategoryFlex Fuel,Luxury 1655.4
## Market.CategoryFlex Fuel,Luxury,High-Performance 1890.5
## Market.CategoryFlex Fuel,Luxury,Performance 1604.3
## Market.CategoryFlex Fuel,Performance 1043.3
## Market.CategoryFlex Fuel,Performance,Hybrid 5323.2
## Market.CategoryHatchback 439.1
## Market.CategoryHatchback,Diesel 2608.6
## Market.CategoryHatchback,Factory Tuner,High-Performance 2121.7
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 2520.4
## Market.CategoryHatchback,Factory Tuner,Performance 1669.0
## Market.CategoryHatchback,Flex Fuel 2911.6
## Market.CategoryHatchback,Hybrid 1152.8
## Market.CategoryHatchback,Luxury 1199.8
## Market.CategoryHatchback,Luxury,Hybrid 4357.7
## Market.CategoryHatchback,Luxury,Performance 1291.3
## Market.CategoryHatchback,Performance 634.4
## Market.CategoryHigh-Performance 703.8
## Market.CategoryHybrid 873.3
## Market.CategoryLuxury 388.9
## Market.CategoryLuxury,High-Performance 639.2
## Market.CategoryLuxury,High-Performance,Hybrid 2551.4
## Market.CategoryLuxury,Hybrid 1160.9
## Market.CategoryLuxury,Performance 433.0
## Market.CategoryLuxury,Performance,Hybrid 2313.1
## Market.CategoryN/A 292.7
## Market.CategoryPerformance 444.8
## Market.CategoryPerformance,Hybrid 7481.2
## Vehicle.SizeLarge 278.2
## Vehicle.SizeMidsize 208.8
## City.MpG 29.7
## t value
## (Intercept) -56.46
## Year 56.53
## Engine.Fuel.Typediesel 1.21
## Engine.Fuel.Typeelectric 3.45
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) -2.92
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 1.68
## Engine.Fuel.Typeflex-fuel (unleaded/E85) -1.30
## Engine.Fuel.Typenatural gas 0.50
## Engine.Fuel.Typepremium unleaded (recommended) -0.23
## Engine.Fuel.Typepremium unleaded (required) 1.01
## Engine.Fuel.Typeregular unleaded -1.19
## Engine.HP 47.19
## Transmission.TypeAUTOMATIC -0.77
## Transmission.TypeDIRECT_DRIVE -0.17
## Transmission.TypeMANUAL -5.62
## Driven.Wheelsfour wheel drive -0.23
## Driven.Wheelsfront wheel drive -2.76
## Driven.Wheelsrear wheel drive -12.57
## Number.of.Doors -2.52
## Market.CategoryCrossover,Diesel 3.80
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 2.43
## Market.CategoryCrossover,Exotic,Luxury,Performance 1.96
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 6.13
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 1.47
## Market.CategoryCrossover,Factory Tuner,Performance -0.30
## Market.CategoryCrossover,Flex Fuel 1.32
## Market.CategoryCrossover,Flex Fuel,Luxury 4.66
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 7.52
## Market.CategoryCrossover,Flex Fuel,Performance 0.55
## Market.CategoryCrossover,Hatchback -2.00
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance -3.39
## Market.CategoryCrossover,Hatchback,Luxury 2.40
## Market.CategoryCrossover,Hatchback,Performance -1.74
## Market.CategoryCrossover,Hybrid 8.45
## Market.CategoryCrossover,Luxury 10.56
## Market.CategoryCrossover,Luxury,Diesel 5.60
## Market.CategoryCrossover,Luxury,High-Performance 5.20
## Market.CategoryCrossover,Luxury,Hybrid 8.13
## Market.CategoryCrossover,Luxury,Performance 7.27
## Market.CategoryCrossover,Luxury,Performance,Hybrid 5.31
## Market.CategoryCrossover,Performance -0.13
## Market.CategoryDiesel -2.08
## Market.CategoryDiesel,Luxury 5.73
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance 11.42
## Market.CategoryExotic,High-Performance 20.77
## Market.CategoryExotic,Luxury,High-Performance 24.02
## Market.CategoryFactory Tuner,High-Performance -7.20
## Market.CategoryFactory Tuner,Luxury,High-Performance 9.11
## Market.CategoryFactory Tuner,Luxury,Performance -3.45
## Market.CategoryFactory Tuner,Performance -9.57
## Market.CategoryFlex Fuel -3.22
## Market.CategoryFlex Fuel,Diesel 1.74
## Market.CategoryFlex Fuel,Hybrid 1.38
## Market.CategoryFlex Fuel,Luxury 13.64
## Market.CategoryFlex Fuel,Luxury,High-Performance 5.40
## Market.CategoryFlex Fuel,Luxury,Performance 17.21
## Market.CategoryFlex Fuel,Performance 2.82
## Market.CategoryFlex Fuel,Performance,Hybrid -1.13
## Market.CategoryHatchback -0.83
## Market.CategoryHatchback,Diesel -2.30
## Market.CategoryHatchback,Factory Tuner,High-Performance -3.16
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance -1.70
## Market.CategoryHatchback,Factory Tuner,Performance -6.15
## Market.CategoryHatchback,Flex Fuel -1.04
## Market.CategoryHatchback,Hybrid 6.60
## Market.CategoryHatchback,Luxury 0.64
## Market.CategoryHatchback,Luxury,Hybrid 2.82
## Market.CategoryHatchback,Luxury,Performance -0.03
## Market.CategoryHatchback,Performance -6.98
## Market.CategoryHigh-Performance -3.45
## Market.CategoryHybrid 8.44
## Market.CategoryLuxury 12.28
## Market.CategoryLuxury,High-Performance 22.99
## Market.CategoryLuxury,High-Performance,Hybrid 2.04
## Market.CategoryLuxury,Hybrid 19.40
## Market.CategoryLuxury,Performance 17.05
## Market.CategoryLuxury,Performance,Hybrid 8.50
## Market.CategoryN/A -0.39
## Market.CategoryPerformance -4.88
## Market.CategoryPerformance,Hybrid -0.82
## Vehicle.SizeLarge 1.74
## Vehicle.SizeMidsize -10.39
## City.MpG -7.47
## Pr(>|t|)
## (Intercept) < 0.0000000000000002
## Year < 0.0000000000000002
## Engine.Fuel.Typediesel 0.22652
## Engine.Fuel.Typeelectric 0.00057
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) 0.00350
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 0.09296
## Engine.Fuel.Typeflex-fuel (unleaded/E85) 0.19372
## Engine.Fuel.Typenatural gas 0.61838
## Engine.Fuel.Typepremium unleaded (recommended) 0.82166
## Engine.Fuel.Typepremium unleaded (required) 0.31142
## Engine.Fuel.Typeregular unleaded 0.23264
## Engine.HP < 0.0000000000000002
## Transmission.TypeAUTOMATIC 0.43909
## Transmission.TypeDIRECT_DRIVE 0.86565
## Transmission.TypeMANUAL 0.00000001925471916
## Driven.Wheelsfour wheel drive 0.82063
## Driven.Wheelsfront wheel drive 0.00584
## Driven.Wheelsrear wheel drive < 0.0000000000000002
## Number.of.Doors 0.01169
## Market.CategoryCrossover,Diesel 0.00015
## Market.CategoryCrossover,Exotic,Luxury,High-Performance 0.01497
## Market.CategoryCrossover,Exotic,Luxury,Performance 0.04969
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance 0.00000000091456396
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance 0.14147
## Market.CategoryCrossover,Factory Tuner,Performance 0.76435
## Market.CategoryCrossover,Flex Fuel 0.18582
## Market.CategoryCrossover,Flex Fuel,Luxury 0.00000317717953205
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance 0.00000000000005941
## Market.CategoryCrossover,Flex Fuel,Performance 0.58042
## Market.CategoryCrossover,Hatchback 0.04555
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance 0.00069
## Market.CategoryCrossover,Hatchback,Luxury 0.01642
## Market.CategoryCrossover,Hatchback,Performance 0.08252
## Market.CategoryCrossover,Hybrid < 0.0000000000000002
## Market.CategoryCrossover,Luxury < 0.0000000000000002
## Market.CategoryCrossover,Luxury,Diesel 0.00000002169450558
## Market.CategoryCrossover,Luxury,High-Performance 0.00000020211811529
## Market.CategoryCrossover,Luxury,Hybrid 0.00000000000000049
## Market.CategoryCrossover,Luxury,Performance 0.00000000000039815
## Market.CategoryCrossover,Luxury,Performance,Hybrid 0.00000011210373084
## Market.CategoryCrossover,Performance 0.89726
## Market.CategoryDiesel 0.03740
## Market.CategoryDiesel,Luxury 0.00000001059972076
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryExotic,High-Performance < 0.0000000000000002
## Market.CategoryExotic,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryFactory Tuner,High-Performance 0.00000000000063352
## Market.CategoryFactory Tuner,Luxury,High-Performance < 0.0000000000000002
## Market.CategoryFactory Tuner,Luxury,Performance 0.00056
## Market.CategoryFactory Tuner,Performance < 0.0000000000000002
## Market.CategoryFlex Fuel 0.00127
## Market.CategoryFlex Fuel,Diesel 0.08118
## Market.CategoryFlex Fuel,Hybrid 0.16694
## Market.CategoryFlex Fuel,Luxury < 0.0000000000000002
## Market.CategoryFlex Fuel,Luxury,High-Performance 0.00000006662959107
## Market.CategoryFlex Fuel,Luxury,Performance < 0.0000000000000002
## Market.CategoryFlex Fuel,Performance 0.00485
## Market.CategoryFlex Fuel,Performance,Hybrid 0.25718
## Market.CategoryHatchback 0.40835
## Market.CategoryHatchback,Diesel 0.02141
## Market.CategoryHatchback,Factory Tuner,High-Performance 0.00159
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance 0.08848
## Market.CategoryHatchback,Factory Tuner,Performance 0.00000000080748625
## Market.CategoryHatchback,Flex Fuel 0.29720
## Market.CategoryHatchback,Hybrid 0.00000000004243056
## Market.CategoryHatchback,Luxury 0.52081
## Market.CategoryHatchback,Luxury,Hybrid 0.00482
## Market.CategoryHatchback,Luxury,Performance 0.97879
## Market.CategoryHatchback,Performance 0.00000000000303147
## Market.CategoryHigh-Performance 0.00057
## Market.CategoryHybrid < 0.0000000000000002
## Market.CategoryLuxury < 0.0000000000000002
## Market.CategoryLuxury,High-Performance < 0.0000000000000002
## Market.CategoryLuxury,High-Performance,Hybrid 0.04130
## Market.CategoryLuxury,Hybrid < 0.0000000000000002
## Market.CategoryLuxury,Performance < 0.0000000000000002
## Market.CategoryLuxury,Performance,Hybrid < 0.0000000000000002
## Market.CategoryN/A 0.69796
## Market.CategoryPerformance 0.00000106693015318
## Market.CategoryPerformance,Hybrid 0.41509
## Vehicle.SizeLarge 0.08147
## Vehicle.SizeMidsize < 0.0000000000000002
## City.MpG 0.00000000000008629
##
## (Intercept) ***
## Year ***
## Engine.Fuel.Typediesel
## Engine.Fuel.Typeelectric ***
## Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) **
## Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) .
## Engine.Fuel.Typeflex-fuel (unleaded/E85)
## Engine.Fuel.Typenatural gas
## Engine.Fuel.Typepremium unleaded (recommended)
## Engine.Fuel.Typepremium unleaded (required)
## Engine.Fuel.Typeregular unleaded
## Engine.HP ***
## Transmission.TypeAUTOMATIC
## Transmission.TypeDIRECT_DRIVE
## Transmission.TypeMANUAL ***
## Driven.Wheelsfour wheel drive
## Driven.Wheelsfront wheel drive **
## Driven.Wheelsrear wheel drive ***
## Number.of.Doors *
## Market.CategoryCrossover,Diesel ***
## Market.CategoryCrossover,Exotic,Luxury,High-Performance *
## Market.CategoryCrossover,Exotic,Luxury,Performance *
## Market.CategoryCrossover,Factory Tuner,Luxury,High-Performance ***
## Market.CategoryCrossover,Factory Tuner,Luxury,Performance
## Market.CategoryCrossover,Factory Tuner,Performance
## Market.CategoryCrossover,Flex Fuel
## Market.CategoryCrossover,Flex Fuel,Luxury ***
## Market.CategoryCrossover,Flex Fuel,Luxury,Performance ***
## Market.CategoryCrossover,Flex Fuel,Performance
## Market.CategoryCrossover,Hatchback *
## Market.CategoryCrossover,Hatchback,Factory Tuner,Performance ***
## Market.CategoryCrossover,Hatchback,Luxury *
## Market.CategoryCrossover,Hatchback,Performance .
## Market.CategoryCrossover,Hybrid ***
## Market.CategoryCrossover,Luxury ***
## Market.CategoryCrossover,Luxury,Diesel ***
## Market.CategoryCrossover,Luxury,High-Performance ***
## Market.CategoryCrossover,Luxury,Hybrid ***
## Market.CategoryCrossover,Luxury,Performance ***
## Market.CategoryCrossover,Luxury,Performance,Hybrid ***
## Market.CategoryCrossover,Performance
## Market.CategoryDiesel *
## Market.CategoryDiesel,Luxury ***
## Market.CategoryExotic,Factory Tuner,Luxury,High-Performance ***
## Market.CategoryExotic,High-Performance ***
## Market.CategoryExotic,Luxury,High-Performance ***
## Market.CategoryFactory Tuner,High-Performance ***
## Market.CategoryFactory Tuner,Luxury,High-Performance ***
## Market.CategoryFactory Tuner,Luxury,Performance ***
## Market.CategoryFactory Tuner,Performance ***
## Market.CategoryFlex Fuel **
## Market.CategoryFlex Fuel,Diesel .
## Market.CategoryFlex Fuel,Hybrid
## Market.CategoryFlex Fuel,Luxury ***
## Market.CategoryFlex Fuel,Luxury,High-Performance ***
## Market.CategoryFlex Fuel,Luxury,Performance ***
## Market.CategoryFlex Fuel,Performance **
## Market.CategoryFlex Fuel,Performance,Hybrid
## Market.CategoryHatchback
## Market.CategoryHatchback,Diesel *
## Market.CategoryHatchback,Factory Tuner,High-Performance **
## Market.CategoryHatchback,Factory Tuner,Luxury,Performance .
## Market.CategoryHatchback,Factory Tuner,Performance ***
## Market.CategoryHatchback,Flex Fuel
## Market.CategoryHatchback,Hybrid ***
## Market.CategoryHatchback,Luxury
## Market.CategoryHatchback,Luxury,Hybrid **
## Market.CategoryHatchback,Luxury,Performance
## Market.CategoryHatchback,Performance ***
## Market.CategoryHigh-Performance ***
## Market.CategoryHybrid ***
## Market.CategoryLuxury ***
## Market.CategoryLuxury,High-Performance ***
## Market.CategoryLuxury,High-Performance,Hybrid *
## Market.CategoryLuxury,Hybrid ***
## Market.CategoryLuxury,Performance ***
## Market.CategoryLuxury,Performance,Hybrid ***
## Market.CategoryN/A
## Market.CategoryPerformance ***
## Market.CategoryPerformance,Hybrid
## Vehicle.SizeLarge .
## Vehicle.SizeMidsize ***
## City.MpG ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7470 on 10390 degrees of freedom
## Multiple R-squared: 0.838, Adjusted R-squared: 0.837
## F-statistic: 657 on 82 and 10390 DF, p-value: <0.0000000000000002
Because the model includes categorical variables, each level of a categorical predictor becomes its own coefficient, which is why the summary lists so many terms. The Pr(>|t|) column shows how significant each term is, and the stars beside the values give a quicker reading of the same information. Each p-value comes from a t-test with the following hypotheses:
- H0: the coefficient equals 0 (the term has no linear effect on MSRP)
- H1: the coefficient is not equal to 0 (the term has a linear effect on MSRP)
We reject H0 when the p-value < alpha, with alpha = 0.05.
Terms with a p-value above alpha do not show a statistically significant relationship with the price of the car. However, this does not mean that every such term is irrelevant. For the best possible result, we should look at the business side or consult experts who understand the data to decide whether a predictor can really be dropped or should stay in the model.
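As a minimal sketch of the significance check described above, the p-values can be pulled out of a model summary and compared with alpha programmatically. The built-in mtcars data and the predictors wt and hp are stand-ins chosen for illustration, not the car dataset of this article.

```r
# Illustrative only: mtcars stands in for the car dataset used in this article.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficient table holds the estimate, standard error, t value and p-value.
coef_table <- summary(fit)$coefficients
alpha <- 0.05

# Flag each term whose p-value is below alpha.
significance <- data.frame(
  term = rownames(coef_table),
  p_value = coef_table[, "Pr(>|t|)"],
  significant = coef_table[, "Pr(>|t|)"] < alpha,
  row.names = NULL
)
print(significance)
```

A table like this is easier to filter than the printed summary when a model has dozens of terms.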
As for the result itself, the model explains the MSRP much better than the single predictor with the strongest correlation to MSRP: the R-squared is about 84%, compared with 59%. In other words, the model accounts for a much larger share of the variation in MSRP than a single predictor can. For this reason, it is important to include as many significant predictors as possible in our model.
To understand the numerical predictors, we can take the multiple linear regression formula and substitute the coefficients obtained from our linear regression model. The formula is as follows:
\[\hat{y} = \beta_0 + \beta_1*x_1 + \beta_2*x_2 + ... + \beta_n*x_n\]
numerical predictors:
- Year
- Engine.HP
- City.MpG
formula based on the numerical predictors:
\[\hat{y} = -1861117.92 + 931.79*Year + 85.80*Engine.HP + -163.33*City.MpG\]
Take one of the predictors, such as Engine.HP. With the other predictors held constant, every increase of 1 horsepower raises the MSRP by 94.60 US Dollars, and every decrease of 1 horsepower lowers it by the same amount. The same interpretation applies to the other numerical predictors. The coefficient of City.MpG is negative: every increase of 1 in City.MpG lowers the MSRP by 221.90 US Dollars, and every decrease of 1 raises it by 221.90 US Dollars. In other words, the more powerful a car is, the higher its price, and the more economical it is, the cheaper it is.
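The "one unit more" reading of a coefficient can be checked with a small sketch. It uses the built-in mtcars data as a stand-in, and the two cars below are hypothetical: they differ by exactly 1 horsepower, so the gap between their predictions should equal the hp coefficient.

```r
# Illustrative only: mtcars stands in for the car dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars that differ by exactly 1 horsepower.
base_car <- data.frame(wt = 3, hp = 150)
plus_one <- data.frame(wt = 3, hp = 151)

# The gap between the two predictions equals the hp coefficient.
difference <- unname(predict(fit, plus_one) - predict(fit, base_car))
print(difference)
print(unname(coef(fit)["hp"]))
```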
Categorical predictors are read differently: each coefficient belongs to one level of the predictor and is measured against that predictor's baseline level, which does not appear in the summary. For example, within Engine.Fuel.Type, a car with the “electric” level has an MSRP 26456.90 US Dollars higher than a car with the baseline fuel type, all else being equal. Keep in mind that each car can take only one level per categorical predictor.
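The baseline idea can be made concrete with a small sketch. The toy data frame below is hypothetical; its level names only mimic Engine.Fuel.Type and its prices are made up for illustration.

```r
# Hypothetical toy data; the level names only mimic Engine.Fuel.Type.
toy <- data.frame(
  fuel = factor(c("diesel", "electric", "regular unleaded", "electric")),
  price = c(30000, 52000, 24000, 55000)
)

# lm() expands a factor into 0/1 indicator columns and drops the first level,
# which becomes the baseline that every other level is compared with.
design <- model.matrix(price ~ fuel, data = toy)
print(levels(toy$fuel)[1])   # baseline level
print(colnames(design))      # one column per non-baseline level

fit <- lm(price ~ fuel, data = toy)
# Each coefficient is the mean price gap between that level and the baseline.
print(coef(fit))
```

Here the baseline is simply the first level in alphabetical order, which is the default behaviour of factor().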
Because the categorical predictors in this data are very specific, we will also fit a multiple linear regression model that uses only the numerical columns of our dataset. Comparing it with the previous model lets us see how much the categorical predictors contribute.
data_num <- model_data %>%
  select(where(is.numeric))
model_num <- step(lm(MSRP ~ ., data_num), direction = "backward")
## Start: AIC=192781
## MSRP ~ Year + Engine.HP + Number.of.Doors + City.MpG + Popularity
##
## Df Sum of Sq RSS AIC
## - Number.of.Doors 1 113533250 1032399753653 192780
## <none> 1032286220403 192781
## - Popularity 1 9028686609 1041314907012 192870
## - City.MpG 1 10295843306 1042582063709 192883
## - Year 1 259639857771 1291926078174 195128
## - Engine.HP 1 900331830001 1932618050404 199346
##
## Step: AIC=192780
## MSRP ~ Year + Engine.HP + City.MpG + Popularity
##
## Df Sum of Sq RSS AIC
## <none> 1032399753653 192780
## - Popularity 1 9325039460 1041724793113 192872
## - City.MpG 1 10222801770 1042622555423 192881
## - Year 1 285581088484 1317980842136 195335
## - Engine.HP 1 906604943545 1939004697198 199379
summary(model_num)
##
## Call:
## lm(formula = MSRP ~ Year + Engine.HP + City.MpG + Popularity,
## data = data_num)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41124 -5850 -926 4839 61292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1763652.3355 32316.8754 -54.57 <0.0000000000000002 ***
## Year 874.8201 16.2572 53.81 <0.0000000000000002 ***
## Engine.HP 140.0783 1.4610 95.88 <0.0000000000000002 ***
## City.MpG 190.8609 18.7467 10.18 <0.0000000000000002 ***
## Popularity -0.6525 0.0671 -9.72 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9930 on 10468 degrees of freedom
## Multiple R-squared: 0.712, Adjusted R-squared: 0.712
## F-statistic: 6.47e+03 on 4 and 10468 DF, p-value: <0.0000000000000002
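The R-squared value of around 71% is about 13 percentage points lower than that of the stepwise model with categorical predictors (around 84%). This means the categorical predictors account for roughly 13 percentage points of the variation in MSRP that the numerical predictors alone cannot explain.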
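The comparison of the two R-squared values can be sketched as follows. The built-in mtcars data is a stand-in, and treating cyl and gear as categorical predictors is an assumption made only for this illustration.

```r
# Illustrative only: mtcars stands in for the car dataset,
# with cyl and gear treated as categorical predictors.
toy_cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

fit_num <- lm(mpg ~ wt + hp, data = toy_cars)                # numerical only
fit_full <- lm(mpg ~ wt + hp + cyl + gear, data = toy_cars)  # plus categorical

r2_num <- summary(fit_num)$r.squared
r2_full <- summary(fit_full)$r.squared

# Extra share of variance explained by the categorical predictors,
# expressed in percentage points.
gain_points <- (r2_full - r2_num) * 100
print(round(c(numerical = r2_num, full = r2_full), 3))
print(round(gain_points, 1))
```

Multiple R-squared never decreases when predictors are added, so the adjusted R-squared from the same summary is worth reading alongside it.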
Let us now predict the MSRP with our models. Since we built 4 models, we can make 4 sets of predictions and compare the performance of each model by checking its error.
pred_HP <- predict(model_HP, car_dataset, interval = "confidence", level = 0.95)
pred_all <- predict(model_all, car_dataset, interval = "confidence", level = 0.95)
pred_bwd <- predict(model_bwd, car_dataset, interval = "confidence", level = 0.95)
pred_num <- predict(model_num, car_dataset, interval = "confidence", level = 0.95)
print(paste0("MAPE with single predictor: ",
             round(MAPE(pred_HP, car_dataset$MSRP) * 100, digits = 3), "%"))
## [1] "MAPE with single predictor: 113.577%"
print(paste0("MAPE with all predictors: ",
             round(MAPE(pred_all, car_dataset$MSRP) * 100, digits = 3), "%"))
## [1] "MAPE with all predictors: 51.806%"
print(paste0("MAPE with multiple predictors: ",
             round(MAPE(pred_bwd, car_dataset$MSRP) * 100, digits = 3), "%"))
## [1] "MAPE with multiple predictors: 51.799%"
print(paste0("MAPE with numerical predictors: ",
             round(MAPE(pred_num, car_dataset$MSRP) * 100, digits = 3), "%"))
## [1] "MAPE with numerical predictors: 57.753%"
According to the MAPE (Mean Absolute Percentage Error), the stepwise model with multiple predictors gives the best predictions of the MSRP: on average its predictions are off by about 51.799% of the actual price. That is far lower than the 113.577% of the single-predictor model, lower than the 57.753% of the numerical-only model, and 0.007 percentage points lower than the 51.806% of the model that uses all of the variables as predictors. We can therefore say that the model built with the stepwise function, with categorical predictors included, performs better than the others.
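To make the MAPE calculation explicit, here is a minimal sketch with the formula written out by hand. The built-in mtcars data is a stand-in, and the helper mape() is a hypothetical name defined here, not a function from MLmetrics. Note that predict() with interval = "confidence" returns a matrix, and the point predictions sit in its fit column.

```r
# Illustrative only: mtcars stands in for the car dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# With interval = "confidence", predict() returns a matrix with the columns
# fit (point prediction), lwr and upr (interval bounds).
pred <- predict(fit, mtcars, interval = "confidence", level = 0.95)
print(colnames(pred))

# MAPE written out: mean of |actual - predicted| / |actual|.
mape <- function(predicted, actual) {
  mean(abs((actual - predicted) / actual))
}

# Only the "fit" column is a prediction of the target itself.
mape_fit <- mape(pred[, "fit"], mtcars$mpg)
print(paste0("MAPE: ", round(mape_fit * 100, digits = 3), "%"))
```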
Since we model our data with linear regression, it is important to check that the relationship between the predictors and the target is linear. We can check linearity with a scatter plot of the residuals against the fitted values, with a smooth line through the points to help us see any pattern.
linear_plot_HP <- data.frame(residual = model_HP$residuals, fitted = model_HP$fitted.values)
linear_plot_HP %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth() +
  geom_hline(yintercept = 0, col = "red", size = 1) +
  labs(x = "Fitted Values", y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
linear_plot_all <- data.frame(residual = model_all$residuals, fitted = model_all$fitted.values)
linear_plot_all %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth() +
  geom_hline(yintercept = 0, col = "red", size = 1) +
  labs(x = "Fitted Values", y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
linear_plot_bwd <- data.frame(residual = model_bwd$residuals, fitted = model_bwd$fitted.values)
linear_plot_bwd %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth() +
  geom_hline(yintercept = 0, col = "red", size = 1) +
  labs(x = "Fitted Values", y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
linear_plot_num <- data.frame(residual = model_num$residuals, fitted = model_num$fitted.values)
linear_plot_num %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth() +
  geom_hline(yintercept = 0, col = "red", size = 1) +
  labs(x = "Fitted Values", y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
When we look at the 4 charts of our models, the smooth lines are relatively stable and stay close to the zero line. Although the lines are not perfectly straight, the models can still be considered linear. The single-predictor model arguably has the straightest line, but all the models perform more or less the same on this check.
To determine whether the residuals of our models are normally distributed, we test them with a normality test. The test compares the p-value against the following hypotheses:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
When the p-value < 0.05, we reject H0, which means that the residuals of our model are not normally distributed.
ad.test(model_HP$residuals)
##
## Anderson-Darling normality test
##
## data: model_HP$residuals
## A = 116, p-value <0.0000000000000002
ad.test(model_all$residuals)
##
## Anderson-Darling normality test
##
## data: model_all$residuals
## A = 48, p-value <0.0000000000000002
ad.test(model_bwd$residuals)
##
## Anderson-Darling normality test
##
## data: model_bwd$residuals
## A = 48, p-value <0.0000000000000002
ad.test(model_num$residuals)
##
## Anderson-Darling normality test
##
## data: model_num$residuals
## A = 117, p-value <0.0000000000000002
At Algoritma Data Science School, we were taught to use shapiro.test() as the basic normality test for the residuals of a model. However, shapiro.test() accepts at most 5000 observations, so for a model with a larger sample we have to choose another function such as ad.test(). When our models were tested with the Anderson-Darling normality test, every p-value was lower than 0.05, so we reject H0: the residuals of all four models are not normally distributed.
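Both points, the sample-size limit of shapiro.test() and the direction of the decision rule, can be seen in a small base R sketch. The simulated values below are an assumption made for illustration and are not the residuals of the models in this article.

```r
set.seed(42)

# shapiro.test() stops with an error once the sample exceeds 5000 values.
too_large <- tryCatch(
  shapiro.test(rnorm(6000)),
  error = function(e) conditionMessage(e)
)
print(too_large)

# Simulated, strongly right-skewed "residuals" (an assumption for illustration).
skewed <- rexp(1000) - 1
result <- shapiro.test(skewed)

# A p-value below alpha rejects H0, i.e. the values are NOT normally distributed.
alpha <- 0.05
is_normal <- result$p.value >= alpha
print(is_normal)
```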
For the models we have made, we also want to make sure that the spread of the errors does not follow a pattern. If there is a pattern, there might be outliers that we have to deal with. The condition in which the error variance does follow a pattern is called heteroscedasticity. We can test for it with the Breusch-Pagan test, using the following hypotheses:
H0: the model is homoscedastic
H1: the model is heteroscedastic
When the p-value < 0.05, we reject H0, which means that our model is heteroscedastic.
bptest(model_HP)
##
## studentized Breusch-Pagan test
##
## data: model_HP
## BP = 643, df = 1, p-value <0.0000000000000002
bptest(model_all)
##
## studentized Breusch-Pagan test
##
## data: model_all
## BP = 2075, df = 83, p-value <0.0000000000000002
bptest(model_bwd)
##
## studentized Breusch-Pagan test
##
## data: model_bwd
## BP = 2073, df = 82, p-value <0.0000000000000002
bptest(model_num)
##
## studentized Breusch-Pagan test
##
## data: model_num
## BP = 1485, df = 4, p-value <0.0000000000000002
The results show that all our models have a p-value of less than 0.05, so we reject H0: all of the models are heteroscedastic. If we were to plot the residuals against the fitted MSRP, we would expect to see a pattern in their spread rather than an even band.
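To show what the studentized Breusch-Pagan test measures, the statistic can be rebuilt by hand in base R. This is a sketch on simulated data whose noise grows with the predictor (an assumption made for illustration), not a replacement for bptest().

```r
set.seed(123)

# Simulated data whose noise grows with x (heteroscedastic by construction).
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictors.
aux <- lm(residuals(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic: n * R-squared of the auxiliary model,
# compared with a chi-squared distribution (df = number of predictors).
bp_stat <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p_value = bp_p))
# p_value < 0.05 rejects H0 (homoscedasticity).
```

The more the squared residuals can be explained by the predictors, the larger the statistic and the smaller the p-value.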
It is important to make sure that the predictors in our model are not strongly correlated with one another. If a predictor is correlated with the others, the estimate of its individual effect on our target, the MSRP, becomes less reliable and its significance can be understated. For a model with a single predictor such as model_HP, this test is not necessary.
vif(model_all)
##                       GVIF Df GVIF^(1/(2*Df))
## Year                 2.835  1           1.684
## Engine.Fuel.Type  2463.299  9           1.543
## Engine.HP            5.898  1           2.429
## Transmission.Type   15.946  3           1.587
## Driven.Wheels        3.373  3           1.225
## Number.of.Doors      1.574  1           1.254
## Market.Category   3067.786 61           1.068
## Vehicle.Size         2.744  2           1.287
## City.MpG             7.169  1           2.678
## Popularity           1.168  1           1.081
vif(model_bwd)
##                       GVIF Df GVIF^(1/(2*Df))
## Year                 2.815  1           1.678
## Engine.Fuel.Type  2436.564  9           1.542
## Engine.HP            5.893  1           2.427
## Transmission.Type   15.800  3           1.584
## Driven.Wheels        3.356  3           1.224
## Number.of.Doors      1.558  1           1.248
## Market.Category   2865.424 61           1.067
## Vehicle.Size         2.720  2           1.284
## City.MpG             7.166  1           2.677
vif(model_num)
##       Year  Engine.HP   City.MpG Popularity
##      1.524      1.772      1.615      1.015
The result of the multicollinearity test shows that 3 predictors in the model that uses all variables, and the same 3 predictors in the stepwise model, have a GVIF value above 10: Engine.Fuel.Type, Transmission.Type and Market.Category. Furthermore, Market.Category and Engine.Fuel.Type have GVIF values above 2000. This might be caused by the number of levels inside those categorical predictors. The more levels a categorical predictor has, and the more of those levels correlate with other predictors, the higher its GVIF value will be. The GVIF^(1/(2*Df)) column adjusts for the number of levels, and on that scale every predictor stays below 3.
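For numerical predictors with a single degree of freedom, the variance inflation factor can be computed by hand, which shows what the test measures. This is a base R sketch on the built-in mtcars data (a stand-in). It covers only the plain VIF, not the generalized GVIF that vif() reports for multi-level factors.

```r
# Illustrative only: mtcars stands in for the car dataset.
predictors <- mtcars[, c("wt", "hp", "disp")]

# VIF of one predictor: regress it on the remaining predictors,
# then compute 1 / (1 - R-squared) of that regression.
vif_by_hand <- sapply(names(predictors), function(name) {
  others <- setdiff(names(predictors), name)
  aux <- lm(reformulate(others, response = name), data = predictors)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_by_hand, 3))
```

A value close to 1 means the predictor is barely explained by the others, while large values signal overlap between predictors.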
As we worked through the data, we realized that choosing the right data is very important. Data with many categorical variables, especially very specific ones such as Market.Category, is likely to produce many categorical levels that correlate with other predictors, which is something we want to avoid when modelling. Unfortunately, using only the numerical predictors lowers the R-squared value, because the categorical variables are also part of a car model's specification. In our case, the R-squared dropped by around 13 percentage points compared with the model that includes categorical predictors. For a linear regression model, we recommend using data in which numerical variables dominate. If categorical variables are present, we would like them to hold general categories rather than very specific ones such as Market.Category in the dataset used in this markdown.
However, this does not mean that our model is completely useless. If we have no option other than to use this data, we can apply feature engineering to the problematic variables where possible. The results show that our model needs improvement, as 3 predictors were correlated with other predictors according to the multicollinearity test. Furthermore, 2 of those predictors have GVIF values unreasonably far above 10. Further data processing or feature engineering might be the solution to this problem.
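One possible form of such feature engineering, which was not applied in this article, is to split the comma-separated Market.Category labels into separate 0/1 indicator columns, so that the model gets one term per tag instead of one term per combination. The sketch below uses a hypothetical toy vector whose values only mimic the labels in the dataset.

```r
# Hypothetical toy values that mimic the comma-separated Market.Category labels.
market_category <- c("Crossover,Luxury", "Hatchback,Performance",
                     "Luxury,Performance,Hybrid", "N/A")

# Split every label into its individual tags.
tags <- strsplit(market_category, ",", fixed = TRUE)
all_tags <- sort(unique(unlist(tags)))

# One 0/1 indicator column per tag instead of one level per combination.
indicators <- t(sapply(tags, function(x) as.integer(all_tags %in% x)))
colnames(indicators) <- all_tags
print(indicators)
```

Whether this actually lowers the GVIF values or improves the fit would have to be checked by refitting the model on the real data.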
For finding unknown transmission type:
- https://www.auto123.com/en/new-cars/technical-specs/oldsmobile/achieva/1997/2-dr/sc/
- https://www.autoblog.com/buy/2000-Pontiac-Firebird-Trans_Am__2dr_Coupe/specs/
- https://www.edmunds.com/gmc/jimmy/1999/features-specs/
- https://www.edmunds.com/chrysler/le-baron/1993/features-specs/
- https://cars-specs.com/dodge/1991-ram-150-regular-cab/