Data products project

Gouthami Senthamaraikkannan

May 11, 2016

Introduction

This presentation describes an app that predicts the engine displacement of a car, given its mileage and number of cylinders.

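i.e., the regressors considered in the model are hwy (highway mileage), cty (city mileage) and cyl (number of cylinders),

and the output is displ (engine displacement).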

What the app does

The regression model used for prediction is fitted to the “mpg” dataset, which is based on fuel economy data released by the EPA and ships with the ggplot2 package. All the variables in the dataset are shown below.

##  [1] "manufacturer" "model"        "displ"        "year"        
##  [5] "cyl"          "trans"        "drv"          "cty"         
##  [9] "hwy"          "fl"           "class"
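
A listing like the one above can be obtained with `names()`. The following is a minimal sketch, assuming ggplot2 is installed; the object name `model_data` is hypothetical and only illustrates which columns the later slides rely on.

```r
library(ggplot2)  # the mpg dataset ships with ggplot2

# List every variable in the dataset
names(mpg)

# Keep only the three regressors and the response used by the model
# (model_data is an illustrative name, not taken from the original slides)
model_data <- mpg[, c("hwy", "cty", "cyl", "displ")]
str(model_data)
```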

Correlation between regressors and regressand

The following gives an idea of the correlation that exists between mileage, number of cylinders and the regressand, engine displacement.
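
One way to inspect these correlations is sketched below. It assumes ggplot2 is installed; drawing the matrix with corrplot and `method = "number"` is an assumption about presentation, so that step is skipped when the package is unavailable.

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlations between the regressors and the regressand
vars <- mpg[, c("hwy", "cty", "cyl", "displ")]
cm <- cor(vars)
print(round(cm, 2))

# Optional plot of the correlation matrix (assumed choice of package and method)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cm, method = "number")
}
```

A positive entry for cyl against displ and negative entries for hwy and cty against displ would match the signs of the coefficients in the linear model.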

Linear model

m <- lm(displ~hwy+cty+cyl, data = mpg)
m
## 
## Call:
## lm(formula = displ ~ hwy + cty + cyl, data = mpg)
## 
## Coefficients:
## (Intercept)          hwy          cty          cyl  
##    0.324047    -0.024000    -0.009646     0.657666
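
Written out, the coefficients printed above correspond to the following fitted equation. The hat notation for the predicted displacement is added here for illustration and does not appear in the original slides.

```latex
\widehat{\text{displ}} = 0.324047 - 0.024000\,\text{hwy} - 0.009646\,\text{cty} + 0.657666\,\text{cyl}
```

Read this way, each additional cylinder is associated with roughly 0.66 more units of displacement when mileage is held fixed, while higher highway or city mileage is associated with a slightly smaller displacement.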

Prediction using the linear model

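# Example inputs: highway mileage, city mileage and number of cylinders
hwy <- 7
cty <- 5
cyl <- 4
# Column names must match the regressor names used in the model formula
d <- data.frame(hwy, cty, cyl)
p <- predict(m, d)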

Thus, the predicted engine displacement for the given values of hwy, cty and cyl is

p
##        1 
## 2.738485
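
As a sanity check, the prediction above can be reproduced by hand from the model coefficients. This is a minimal sketch, assuming ggplot2 is installed; the names `b`, `manual` and `predicted` are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Refit the same model as in the slides
m <- lm(displ ~ hwy + cty + cyl, data = mpg)
b <- coef(m)

# Intercept plus each coefficient times its input value
manual <- b["(Intercept)"] + b["hwy"] * 7 + b["cty"] * 5 + b["cyl"] * 4

# Same inputs passed through predict()
predicted <- predict(m, data.frame(hwy = 7, cty = 5, cyl = 4))

# TRUE when the hand computation and predict() agree
all.equal(unname(manual), unname(predicted))
```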