Data Prep

The data comes from the Car Price Prediction dataset on Kaggle. It is used here to analyze how car price varies with individual vehicle attributes: body dimensions, curb weight, and fuel economy.

https://www.kaggle.com/hellbuoy/car-price-prediction

library(dplyr)
library(RCurl)
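# getURL() returns the contents of the CSV (a copy hosted on GitHub) as one string
file_url <- getURL("https://raw.githubusercontent.com/jey1987/DATA605/master/CarPrice_Assignment.csv")
# Parse that string into a data frame
df_input <- read.csv(text=file_url,header=TRUE)

# Keep only the six numeric predictors used in the regression model
df_final <- df_input %>%
  select(carlength,carwidth,carheight,curbweight,citympg,highwaympg)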

Data Visualization

pairs(df_final)
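
The scatterplot matrix can be complemented with a correlation matrix, which puts a number on each pairwise relationship. The following is a minimal sketch and not part of the original analysis. It uses R's built-in mtcars data as a stand-in so that it runs without a download, and the column choice (wt, disp, mpg) is purely illustrative.

```r
# Stand-in data: three numeric columns from the built-in mtcars dataset
# (assumption for illustration; the analysis itself uses df_final)
df_demo <- mtcars[, c("wt", "disp", "mpg")]

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(df_demo), 2)
print(cor_mat)
```

Applied to the data here, the equivalent call would be cor(df_final). Predictors that measure similar things, such as citympg and highwaympg, are likely candidates for high correlation, which can make their individual coefficients harder to interpret.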

Multiple Linear Regression Model

lm_price <- lm(price~(carlength+carwidth+carheight+curbweight+citympg+highwaympg),data=df_input)
summary(lm_price)
## 
## Call:
## lm(formula = price ~ (carlength + carwidth + carheight + curbweight + 
##     citympg + highwaympg), data = df_input)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8898.7 -2087.5  -457.9  1659.6 18578.4 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36991.104  16995.822  -2.176  0.03070 *  
## carlength     -160.798     63.201  -2.544  0.01171 *  
## carwidth       858.593    295.368   2.907  0.00407 ** 
## carheight     -177.097    158.140  -1.120  0.26412    
## curbweight      12.614      1.561   8.079 6.31e-14 ***
## citympg       -418.927    195.799  -2.140  0.03361 *  
## highwaympg     309.265    196.463   1.574  0.11705    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4174 on 198 degrees of freedom
## Multiple R-squared:  0.735,  Adjusted R-squared:  0.727 
## F-statistic: 91.54 on 6 and 198 DF,  p-value: < 2.2e-16
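
As a quick consistency check on the output above, the adjusted R-squared can be recomputed from the multiple R-squared and the degrees of freedom. The sketch below uses only numbers printed in the summary; the sample size n = 205 is inferred from the 198 residual degrees of freedom plus the 7 estimated coefficients.

```r
r2 <- 0.735            # Multiple R-squared from the summary
df_resid <- 198        # residual degrees of freedom from the summary
p <- 6                 # number of predictors, excluding the intercept
n <- df_resid + p + 1  # inferred sample size: 205

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
print(round(adj_r2, 3))  # 0.727, matching the summary
```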

The fitted model equation is:

price = -36991.104 - 160.798*carlength + 858.593*carwidth - 177.097*carheight + 12.614*curbweight - 418.927*citympg + 309.265*highwaympg
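
To make the equation concrete, the sketch below plugs one set of predictor values into it. The coefficients are copied from the summary output; the car itself is hypothetical, with values chosen only for illustration and not taken from the dataset.

```r
# Coefficients copied from the summary(lm_price) output
coefs <- c("(Intercept)" = -36991.104,
           carlength  = -160.798,
           carwidth   =  858.593,
           carheight  = -177.097,
           curbweight =   12.614,
           citympg    = -418.927,
           highwaympg =  309.265)

# Hypothetical car (illustrative values, not a row from the data)
car <- c(carlength = 170, carwidth = 65, carheight = 54,
         curbweight = 2500, citympg = 25, highwaympg = 30)

# Intercept plus the sum of coefficient * value over all predictors
predicted_price <- coefs[["(Intercept)"]] + sum(coefs[names(car)] * car)
print(round(predicted_price, 2))  # 12258.32
```

With the fitted object available, predict(lm_price, newdata = as.data.frame(t(car))) should give essentially the same value, differing only through the rounding of the printed coefficients.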

Residual Analysis

# Draw the four standard lm diagnostic plots in a 2x2 grid
par(mfrow=c(2,2))
plot(lm_price)
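
The diagnostic plots can be backed up with a couple of numeric checks. The following is a minimal sketch under stated assumptions: it fits a small stand-in model on R's built-in mtcars data so that it runs on its own, and the cutoff of 2 for standardized residuals is a common rule of thumb rather than something taken from the original analysis.

```r
# Stand-in model on the built-in mtcars data (assumption for illustration;
# the analysis itself would use lm_price)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Standardized residuals: values beyond roughly +/-2 deserve a closer look
std_res <- rstandard(fit_demo)
flagged <- names(std_res)[abs(std_res) > 2]
print(flagged)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
sw <- shapiro.test(residuals(fit_demo))
print(sw$p.value)
```

Replacing fit_demo with lm_price would apply the same checks to the car price model.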

Conclusion

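The model is significant overall (F = 91.54 on 6 and 198 DF, p < 2.2e-16) and explains about 73.5% of the variation in car price (adjusted R-squared 0.727). The predictors do not contribute equally. curbweight has by far the strongest effect (p = 6.31e-14); carwidth, carlength, and citympg are significant at the 5% level; carheight (p = 0.264) and highwaympg (p = 0.117) are not. The residuals range from -8898.7 to 18578.4 around a median of -457.9, which suggests the errors are skewed to the right and that the model underpredicts the price of some cars by a wide margin.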