Final Project Work of the "Developing Data Product" Specialization

Predict Your Car Price

MI
Head of Innovation Practice

1 The Car Price Prediction

This deck describes the model underlying the Shiny application submitted as the first part of the Project Work.
The model predicts the price (the outcome) of used cars by applying a multivariable linear regression model. The data set is the one used in the following article: "Introduction to Multiple Regression: How Much Is Your Car Worth?", Journal of Statistics Education.

##      Price Mileage Cylinder Doors Cruise Sound Leather Buick Cadillac
## 1 22661.05   20105        6     4      1     0       0     1        0
## 2 21725.01   13457        6     2      1     1       0     0        0
## 3 29142.71   31655        4     2      1     1       1     0        0
##   Chevy Pontiac Saab Saturn convertible coupe hatchback sedan wagon
## 1     0       0    0      0           0     0         0     1     0
## 2     1       0    0      0           0     1         0     0     0
## 3     0       0    1      0           1     0         0     0     0

2 Preprocessing

The data set has been pre-processed to prepare it for building the linear model.

# nearZeroVar(), findCorrelation() and findLinearCombos() come from caret
library(caret)
# Compute near-zero-variance metrics (nothing is dropped in this step)
nzv <- nearZeroVar(cars, saveMetrics = TRUE)
# Eliminate highly correlated variables
descrCor <- cor(cars)
highlyCorDescr <- findCorrelation(descrCor, cutoff = .75)
cars2 <- cars[, -highlyCorDescr]
# Eliminate linear dependencies
comboInfo <- findLinearCombos(cars2)
cars3 <- cars2[, -comboInfo$remove]

All of this code is executed in the server component of the Shiny app (i.e. server.R), before the shinyServer call. This means that it is executed only once, when the app is loaded.
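
One caveat about the column-dropping steps in the preprocessing code: in R, indexing with a negated empty vector selects no columns at all, so if nothing is flagged for removal the whole data frame would be emptied. The following is a minimal sketch of a guard, assuming a hypothetical helper called `drop_cols` (not part of the original app) and a toy data frame standing in for `cars`.

```r
# Hypothetical helper (not part of the original app): drop columns by index,
# leaving the data frame untouched when there is nothing to drop.
drop_cols <- function(df, idx) {
  if (length(idx) > 0) df[, -idx, drop = FALSE] else df
}

# Toy data standing in for `cars`
toy <- data.frame(a = 1:3, b = c(2, 4, 6), c = c(5, 1, 3))

ncol(toy[, -integer(0)])          # unguarded: every column is lost
ncol(drop_cols(toy, integer(0)))  # guarded: all columns are kept
names(drop_cols(toy, 2))          # drops column "b" only
```

With such a helper, the two drop steps could be written as `drop_cols(cars, highlyCorDescr)` and `drop_cols(cars2, comboInfo$remove)`.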

3 Model Training and prediction

Model training is again performed once, before any user interaction and therefore before the shinyServer call.

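# Note: caret's train() selects the algorithm through `method`; `model` is not
# a train() argument, so this call uses train()'s default method ("rf").
mod <- train(Price ~ ., model = "lm", data = cars3)
finMod <- mod$finalModel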

Prediction instead takes place once the user has entered the variables into the app and pressed the push button. The input data are then transformed to populate the "newdata" data.frame.

newdata <- data.frame(Mileage=1000, Cylinder=2, Cruise=1, Sound=1, Leather=0, Buick=1, Cadillac=0, Chevy=0, Pontiac=0, Saab=0, Saturn=0, convertible=0, coupe=1, hatchback=0, sedan=0)
predict(finMod,newdata)
##        1 
## 17140.36

To make the prediction system work and to give a better user experience, the data set variables encoded as dummies (e.g. Manufacturer) are presented to the user as ordinary, non-dummy variables.
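
The conversion from the user's categorical choices back to dummy columns could look roughly like the sketch below. The helper names `to_dummies` and `make_newdata` are hypothetical and are not taken from the app; the level names come from the column names in the data preview, with `wagon` left out because it does not appear in the "newdata" example.

```r
# Levels taken from the column names shown in the data preview
makes <- c("Buick", "Cadillac", "Chevy", "Pontiac", "Saab", "Saturn")
types <- c("convertible", "coupe", "hatchback", "sedan")

# Hypothetical helper: turn one categorical choice into a one-row 0/1 data frame
to_dummies <- function(choice, levels) {
  as.data.frame(as.list(setNames(as.integer(levels == choice), levels)))
}

# Hypothetical helper: assemble the one-row data frame passed to predict()
make_newdata <- function(mileage, cylinder, cruise, sound, leather, make, type) {
  cbind(
    data.frame(Mileage = mileage, Cylinder = cylinder,
               Cruise = cruise, Sound = sound, Leather = leather),
    to_dummies(make, makes),
    to_dummies(type, types)
  )
}

# Same values as the "newdata" example, built from non-dummy inputs
newdata <- make_newdata(1000, 2, 1, 1, 0, make = "Buick", type = "coupe")
newdata
```

In the app, the arguments would come from the input widgets (for instance a drop-down for the manufacturer), so the user never sees the 0/1 columns.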

4 What is the reliability and error profile of the model?

As you can see, the regression model is based on a random forest, with a sharp decrease in error as the number of trees increases.

# set up 2 x 2 panel plot
par(mfrow = c(2, 2))
# construct diagnostic plots for model
plot(finMod,pch=19,cex=0.5,col="#00000010")

[Figure: output of plot(finMod)]
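
A complementary way to summarise the error profile is the root mean squared error (RMSE) on rows held out from training. The sketch below is only an illustration: it uses synthetic data with made-up numbers (not the cars data set) and base R `lm()` as a stand-in for the trained model.

```r
set.seed(1)

# Synthetic stand-in for the cars data: price falls with mileage (made-up numbers)
n <- 200
toy <- data.frame(Mileage = runif(n, 0, 50000))
toy$Price <- 25000 - 0.15 * toy$Mileage + rnorm(n, sd = 2000)

# Hold out 25% of the rows for testing
test_idx  <- sample(n, size = n / 4)
train_set <- toy[-test_idx, ]
test_set  <- toy[test_idx, ]

# Fit on the training rows, predict on the held-out rows
fit  <- lm(Price ~ Mileage, data = train_set)
pred <- predict(fit, newdata = test_set)

# Root mean squared error on the held-out rows, in the same units as Price
rmse <- sqrt(mean((test_set$Price - pred)^2))
rmse
```

Because RMSE is expressed in the same units as the outcome, it can be read directly as a typical prediction error in price terms.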

5 The UI of the App

[Image: screenshot of the App UI]