Developing Data Products Course Project

Georgy Makarov
March 31, 2020

Car class predictor

This application is part of the course project for the Coursera course Developing Data Products: http://www.coursera.org/learn/data-products/

The application predicts the class of a car with a random forest model based on engine displacement, number of cylinders, city mileage, and highway mileage. The model was trained on the mpg dataset from the ggplot2 package: http://ggplot2.tidyverse.org/reference/mpg.html
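As a minimal sketch of the data the model works with, the snippet below pulls the response and the four predictors out of mpg. The object name `cars` is an assumption made for illustration and does not come from the app's source.

```r
library(ggplot2)  # provides the mpg dataset

# Keep the response (class) and the four predictors named above.
# `cars` is a hypothetical name, not taken from the app's source.
cars <- as.data.frame(mpg[, c("class", "displ", "cyl", "cty", "hwy")])

# Classification needs a factor response.
cars$class <- factor(cars$class)

str(cars)
levels(cars$class)
```

The call to str() shows the column types, and levels() lists the car classes the model can predict.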

This presentation is an R Presentation created in RStudio.

The Shiny app pitched by this presentation is at: http://georgymakarov.shinyapps.io/ddp_course_project/

The source code of the app is at: http://github.com/GeorgyMakarov/Shiny-car-predictor

Explore the dataset

Fuel efficiency decreases as engine displacement and the number of cylinders increase. This holds for every car class, and the classes differ in where they sit along this trend, which makes it possible to use the dataset for classification.
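One way to check this relationship is sketched below. It is not the original chunk behind the slide. The variable names `cor_displ`, `cor_cyl` and `p_explore`, the choice of highway mileage as the efficiency measure, and the per-class panels are all assumptions.

```r
library(ggplot2)

# Correlation of highway mileage with displacement and with cylinders;
# negative values mean mileage falls as the engine gets bigger.
cor_displ <- cor(mpg$displ, mpg$hwy)
cor_cyl   <- cor(mpg$cyl, mpg$hwy)
print(c(displ = cor_displ, cyl = cor_cyl))

# Highway mileage against displacement, one panel per car class,
# coloured by the number of cylinders.
p_explore <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ class) +
  labs(x = "Engine displacement (litres)",
       y = "Highway mileage (mpg)",
       colour = "Cylinders")
print(p_explore)
```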


Density plots

Density plots show how the distribution of each variable differs between the car classes in the dataset.
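A density plot of this kind could be drawn as follows. This is a sketch for one predictor only (highway mileage). The name `p_density` and the transparency setting are assumptions, and the original slide may have used different variables or styling.

```r
library(ggplot2)

# Overlaid density curves of highway mileage, one curve per car class.
# alpha makes the overlapping areas visible.
p_density <- ggplot(mpg, aes(x = hwy, fill = class)) +
  geom_density(alpha = 0.4) +
  labs(x = "Highway mileage (mpg)", y = "Density", fill = "Class")
print(p_density)
```

Replacing hwy with displ, cyl or cty gives the same view for the other predictors.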


How the model works

The model uses the random forest algorithm with 5-fold cross-validation.

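library(caret)  # provides trainControl(), train() and confusionMatrix()

# `seed` and `training` are assumed to be defined earlier; they are not shown here.
# 5-fold cross-validation is used to tune the model.
control <- trainControl(method = "cv", number = 5)
set.seed(seed)
# Fit a random forest that predicts class from all other columns.
fit.rf <- train(class ~ ., data = training, method = "rf", metric = "Accuracy", trControl = control)
# Note: accuracy here is measured on the training data itself.
pred.rf <- predict(fit.rf, training)
conf_m <- confusionMatrix(pred.rf, training$class)
conf_m$overall["Accuracy"]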
 Accuracy 
0.8829787
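The code above relies on `seed` and `training`, which are not shown in this presentation. The sketch below is one way they might be prepared, followed by a check on held-out rows and a prediction for a single car. The seed value, the 80/20 split, and the names `cars`, `in_train`, `testing`, `pred_test` and `new_car` are all assumptions, so the accuracy it prints will not necessarily match the figure above.

```r
library(caret)    # createDataPartition(), train(), confusionMatrix()
library(ggplot2)  # mpg dataset

seed <- 123  # hypothetical value; the original seed is not shown

# Response and the four predictors used by the app.
cars <- as.data.frame(mpg[, c("class", "displ", "cyl", "cty", "hwy")])
cars$class <- factor(cars$class)

# Stratified split on class; the 80% training share is an assumption.
set.seed(seed)
in_train <- createDataPartition(cars$class, p = 0.8, list = FALSE)
training <- cars[in_train, ]
testing  <- cars[-in_train, ]

# Same model specification as above (needs the randomForest package).
control <- trainControl(method = "cv", number = 5)
set.seed(seed)
fit.rf <- train(class ~ ., data = training, method = "rf",
                metric = "Accuracy", trControl = control)

# Accuracy on rows the model has not seen.
pred_test <- predict(fit.rf, testing)
print(confusionMatrix(pred_test, testing$class)$overall["Accuracy"])

# Prediction for one made-up car, as the app would do for user input.
new_car <- data.frame(displ = 2.0, cyl = 4, cty = 21, hwy = 29)
pred_new <- predict(fit.rf, new_car)
print(pred_new)
```

Accuracy on the held-out rows is usually a more conservative estimate than accuracy measured on the training data.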