Analysis of sale price for used cars

Factors influencing sale price

s3687114

Last updated: 23 October, 2017

Introduction

The aim of this analysis is to identify factors that influence the sale price of used BMW cars in Berlin. It does so by examining how the listed price of these premium vehicles relates to their characteristics.

knitr::include_graphics('C:/WorkingFolder/R/Assignment4/car-sales.jpg')

Introduction Cont.

The analysis was carried out in R using descriptive statistics and inferential tools such as linear regression.

Problem Statement

Data

library(readr)  # provides read_csv()
bmw <- read_csv("C:/WorkingFolder/R/Assignment4/Assign4.csv")

The data were collected on 1049 BMWs listed on a used car website in 2016.

Data Cont.

The data set contains the price of each vehicle and characteristics such as age, kilometres driven, fuel type and body style. The aim of this analysis is addressed by examining the relationships between these variables.

Descriptive Statistics and Visualisation

The relationship between car price (in Australian dollars) and power (PowerKW)

plot(PriceAUD ~ PowerKW, data = bmw, xlim = c(10, 800))

The graph shows that price varies with the variable Power.

Descriptive Statistics Cont.

The relationship between the vehicle type and price in a summary

library(dplyr)

# Refer to columns by bare name so that each statistic is computed within its
# vehicleType group; bmw$PriceAUD would bypass the grouping and return
# whole-data-set values in every row.
table1 <- bmw %>%
  group_by(vehicleType) %>%
  summarise(Min = min(PriceAUD, na.rm = TRUE),
            Q1 = quantile(PriceAUD, probs = .25, na.rm = TRUE),
            Median = median(PriceAUD, na.rm = TRUE),
            Q3 = quantile(PriceAUD, probs = .75, na.rm = TRUE),
            Max = max(PriceAUD, na.rm = TRUE),
            Mean = mean(PriceAUD, na.rm = TRUE),
            SD = sd(PriceAUD, na.rm = TRUE),
            n = n(),
            Missing = sum(is.na(PriceAUD)))
knitr::kable(table1)

vehicleType   Min   Q1    Median  Q3     Max    Mean      SD        n    Missing
convertible   735   4115  8820    18816  87465  12956.47  12228.86  102  0
coupe         735   4115  8820    18816  87465  12956.47  12228.86  123  0
hatchback     735   4115  8820    18816  87465  12956.47  12228.86  15   0
sedan         735   4115  8820    18816  87465  12956.47  12228.86  809  0

Every row shows the same price statistics because the rendered table was computed from bmw$PriceAUD, which ignores the grouping. These figures therefore describe all 1049 cars, and only n is specific to each vehicle type.
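The summary table above has identical price statistics for every vehicle type, which is what happens when a column is referenced as bmw$PriceAUD inside summarise() instead of by its bare name. The following is a minimal sketch of that difference on a small made-up data frame; the data frame `toy` and its values are hypothetical and are not taken from the BMW data.

```r
library(dplyr)

# Hypothetical toy data for illustration only (not from the BMW data set)
toy <- data.frame(
  vehicleType = c("coupe", "coupe", "sedan", "sedan"),
  PriceAUD    = c(10000, 20000, 1000, 3000)
)

# Bare column name: the mean is computed within each vehicleType group
grouped <- toy %>%
  group_by(vehicleType) %>%
  summarise(Mean = mean(PriceAUD), n = n())

# toy$PriceAUD: refers to the full column, so every group gets the overall mean
ungrouped <- toy %>%
  group_by(vehicleType) %>%
  summarise(Mean = mean(toy$PriceAUD), n = n())

print(grouped)
print(ungrouped)
```

In this sketch the grouped version gives a different mean for each vehicle type, while the second version repeats the overall mean in every row; the group counts are the same in both.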

Hypothesis Testing

model1 <- lm(bmw$yearOfRegistration ~ bmw$PriceAUD, data = bmw)
model1 %>% summary()
## 
## Call:
## lm(formula = bmw$yearOfRegistration ~ bmw$PriceAUD, data = bmw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.356  -2.169   0.419   2.927  11.880 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.999e+03  2.060e-01 9701.78   <2e-16 ***
## bmw$PriceAUD 3.492e-04  1.157e-05   30.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.579 on 1047 degrees of freedom
## Multiple R-squared:  0.4655, Adjusted R-squared:  0.4649 
## F-statistic: 911.7 on 1 and 1047 DF,  p-value: < 2.2e-16

Hypothesis Testing Cont.

H_0: There is no association in the population between price and year of registration

H_A: There is an association in the population between price and year of registration
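In terms of the fitted model (model1), these hypotheses can be read as a test on the slope of the regression line. The notation below is a sketch; the symbols \alpha, \beta and \varepsilon_i are introduced here for illustration and do not appear in the original output.

```latex
\begin{aligned}
\text{yearOfRegistration}_i &= \alpha + \beta \, \text{PriceAUD}_i + \varepsilon_i \\
H_0 &: \beta = 0 \\
H_A &: \beta \neq 0
\end{aligned}
```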

Discussion

The p-value (< 2.2e-16) is less than 0.05, so the null hypothesis is rejected. There is statistically significant evidence of an association between price and year of registration, the variable tested in the model.
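As a rough check of this conclusion, the test statistic and p-value can be recomputed from the figures reported in the regression output. This is a sketch that assumes the usual two-sided t-test on the slope with a 0.05 significance level; the variable names are chosen here for illustration.

```r
# Values copied from the summary(model1) output
estimate  <- 3.492e-04   # slope for bmw$PriceAUD
std_error <- 1.157e-05   # standard error of the slope
df_resid  <- 1047        # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Slope rescaled: change in year of registration per 10,000 AUD of price
years_per_10k <- estimate * 10000

alpha <- 0.05
reject_null <- p_value < alpha

print(t_value)
print(p_value)
print(years_per_10k)
print(reject_null)
```

The recomputed t statistic is close to the reported 30.19 (the small difference comes from rounding in the printed estimate and standard error). The rescaled slope suggests that, in this fit, cars priced 10,000 AUD higher were registered roughly 3.5 years later on average.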

References

https://astral-theory-157510.appspot.com/secured/MATH1324_Module_08.html
http://rmarkdown.rstudio.com/slidy_presentation_format.html#figure_options
https://astral-theory-157510.appspot.com/secured/RBootcamp_Course_04.html