Project 'Carify': estimating car price and predicting its reliability

Yuri K
30 01 2017

Intro

This project demonstrates how the Shiny R package can be used to build interactive web products on top of the statistical learning tools provided by R. The output is an online prediction engine for a car model's price and reliability score.

The system uses a model for car price fitted to the car.test.frame dataset, which ships with the rpart package. We apply generalized linear model theory to model the car price (the response) given a combination of predictors.

The reliability prediction is based on a supervised machine learning technique, recursive partitioning, provided by the rpart function of the package of the same name.

In the final product, the user enters the car's country of origin, weight and other factors, and the app draws the resulting trend line of price against horsepower. In the real world, similar products might help used-car retailers price a rare car (e.g. Pontiac cars are rare in Europe, but we can estimate a Pontiac's value from its horsepower, car type and country of origin).
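The interaction described above could be wired up roughly as follows. This is only a minimal sketch, not the actual Carify application: the helper `price_trend`, the input ids and the widget choices are all assumptions made for illustration, and the price model is the one described in the next section.

```r
library(shiny)
library(rpart)  # ships the car.test.frame dataset

dt <- car.test.frame
model_price <- glm(Price ~ factor(Country) + factor(Type) + HP + Weight,
                   data = dt, family = poisson(link = "log"))

# Hypothetical helper: predicted price along a grid of horsepower values
# for one fixed combination of country, car type and weight
price_trend <- function(country, type, weight) {
  grid <- data.frame(Country = country, Type = type, Weight = weight,
                     HP = seq(min(dt$HP), max(dt$HP), by = 5))
  grid$Price <- predict(model_price, newdata = grid, type = "response")
  grid
}

# Assumed layout: three inputs and one plot
ui <- fluidPage(
  selectInput("country", "Country of origin",
              choices = sort(unique(as.character(dt$Country)))),
  selectInput("type", "Car type",
              choices = sort(unique(as.character(dt$Type)))),
  sliderInput("weight", "Weight",
              min = min(dt$Weight), max = max(dt$Weight),
              value = median(dt$Weight)),
  plotOutput("trend")
)

server <- function(input, output) {
  # Redraw the trend line whenever any input changes
  output$trend <- renderPlot({
    trend <- price_trend(input$country, input$type, input$weight)
    plot(trend$HP, trend$Price, type = "l",
         xlab = "Horsepower", ylab = "Predicted price")
  })
}

# Build the app object; start it with runApp(app) in an interactive session
app <- shinyApp(ui, server)
```

Keeping the prediction logic in a plain function makes it possible to check the trend line without starting the web server.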

Assess the car price

We apply a generalized linear model with the Poisson distribution family and a log link. After a study of nested models, we chose the model with the factors [Country], [Car Type], [Horsepower] and [Weight]. The following code constructs the model and estimates the mean absolute error of the fitted prices relative to the actual prices:

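require(rpart); require(glm2); require(ggplot2); require(plotly)
dt <- car.test.frame  # dataset shipped with rpart
# Poisson GLM with log link for the car price
model_price <- glm(Price ~ factor(Country) + factor(Type) + HP + Weight,
                   data = dt, family = poisson(link = "log"))
# Fitted prices on the response scale, i.e. exp() of the linear predictor
dt$predicted <- predict(model_price, newdata = dt, type = "response")
# Mean absolute error relative to the actual price
mean(abs(dt$predicted - dt$Price) / dt$Price)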
[1] 0.09623792
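The nested models study mentioned above is not shown in the source. One common way to run such a comparison is an analysis of deviance over a sequence of nested fits; the sketch below assumes a particular order of adding terms, which is hypothetical and chosen only for illustration.

```r
library(rpart)  # ships the car.test.frame dataset

dt <- car.test.frame

# Hypothetical sequence of nested candidates, each adding one term
m1 <- glm(Price ~ HP, data = dt, family = poisson(link = "log"))
m2 <- update(m1, . ~ . + Weight)
m3 <- update(m2, . ~ . + factor(Type))
m4 <- update(m3, . ~ . + factor(Country))

# Analysis of deviance: each row tests the term added relative to the row above
nested_cmp <- anova(m1, m2, m3, m4, test = "Chisq")
print(nested_cmp)
```

A large drop in residual deviance for a small number of added parameters is the usual argument for keeping a term.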

Car price model accuracy

The next chart plots the observed price against the fitted one:

library(plotly)
# Observed vs. fitted price, one marker per car, colored by country
dt %>%
  plot_ly(x = ~Price, y = ~predicted, color = ~Country, text = rownames(dt),
          type = "scatter", mode = "markers", name = "Data",
          marker = list(size = 10, opacity = 0.9), showlegend = TRUE) %>%
  # Reference line y = x: a car on this line is priced exactly by the model
  add_trace(x = ~predicted, y = ~predicted,
            type = "scatter", mode = "lines+markers", name = "Smooth",
            marker = list(size = 0.01), line = list(width = 3, color = "black"),
            showlegend = FALSE) %>%
  layout(title = "Model accuracy", plot_bgcolor = "#e6e6e6",
         xaxis = list(title = "Actual car price"),
         yaxis = list(title = "Predicted car price"))

[Figure: "Model accuracy", predicted car price against actual car price, colored by country]

Estimating the reliability

The prediction model for reliability, based on the same data, is as follows:

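require(rpart.plot)
# Fit the tree on the cars with a known reliability score
model_reli <- rpart(Reliability ~ HP + Mileage + Type + Weight + Country,
                    data = subset(dt, !is.na(Reliability)))
# Draw the tree; varlen = 10 abbreviates long variable names in the split labels
rpart.plot(model_reli, varlen = 10)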

[Figure: the fitted rpart tree for Reliability]
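Once the tree is fitted, scoring a new car is a single predict call. The sketch below uses a made-up car (`new_car` and all of its values are assumptions for illustration). The form of the result depends on how Reliability is stored in the data: a numeric score yields a predicted mean, while a factor yields class probabilities.

```r
library(rpart)  # ships the car.test.frame dataset

dt <- car.test.frame
model_reli <- rpart(Reliability ~ HP + Mileage + Type + Weight + Country,
                    data = subset(dt, !is.na(Reliability)))

# Hypothetical car; factor levels are taken from the training data
new_car <- data.frame(
  HP      = 110,
  Mileage = 25,
  Type    = factor("Compact", levels = levels(dt$Type)),
  Weight  = 2800,
  Country = factor("Japan", levels = levels(dt$Country))
)

# Predicted reliability for the new car
pred_reli <- predict(model_reli, newdata = new_car)
print(pred_reli)
```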