s3687114
Last updated: 23 October, 2017
The link: http://rpubs.com/Clarine78/321589
The aim of this analysis is to find out whether there is a factor that influences sales of used BMW cars in Berlin. This is done by evaluating the prices of these premium vehicles.
knitr::include_graphics('C:/WorkingFolder/R/Assignment4/car-sales.jpg')
The data analysis was done in R using descriptive and inferential statistical tools such as linear regression.
The problem is to evaluate the price of premium vehicles.
To achieve the aim of the analysis, the variables are summarised and these summaries are then analysed.
library(readr)  # provides read_csv()
bmw <- read_csv("C:/WorkingFolder/R/Assignment4/Assign4.csv")
The data was collected on 1049 BMWs listed on a used car website in 2016.
The data contain the price of each vehicle and characteristics of the vehicle such as age, kilometres driven, fuel used and body style. Analysing the relationships between these variables addresses the aim of this analysis.
The relationship between the car price (in Australian dollars) and Power
plot(bmw$PriceAUD~bmw$PowerKW, xlim=c(10,800))
The graph shows that price varies with the variable Power.
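The visual impression from the scatterplot can be backed by a number. The following is a minimal sketch, assuming a Pearson correlation and a simple linear fit are appropriate summaries; the data frame `toy` and its values are invented for illustration and are not the report's data. Only the column names `PriceAUD` and `PowerKW` come from the analysis.

```r
# Toy data standing in for the real listings; all values are invented.
set.seed(1)
toy <- data.frame(PowerKW = runif(200, min = 60, max = 300))
toy$PriceAUD <- 50 * toy$PowerKW + rnorm(200, mean = 0, sd = 3000)

# Pearson correlation, with a test of H0: correlation = 0
power_test <- cor.test(toy$PriceAUD, toy$PowerKW)
power_test$estimate   # sample correlation r
power_test$p.value    # p-value of the test

# Slope of a simple linear fit: expected change in price per extra kW
power_fit <- lm(PriceAUD ~ PowerKW, data = toy)
coef(power_fit)["PowerKW"]
```

With the real data, replacing `toy` by `bmw` would give the corresponding figures for the listings.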
The relationship between vehicle type and price, in a summary table
library(dplyr)  # provides %>%, group_by(), summarise() and n()
# Use bare column names so that each statistic is computed within its group
bmwA <- bmw %>% group_by(vehicleType)
bmwA %>% summarise(Min = min(PriceAUD, na.rm = TRUE),
Q1 = quantile(PriceAUD, probs = .25, na.rm = TRUE),
Median = median(PriceAUD, na.rm = TRUE),
Q3 = quantile(PriceAUD, probs = .75, na.rm = TRUE),
Max = max(PriceAUD, na.rm = TRUE),
Mean = mean(PriceAUD, na.rm = TRUE),
SD = sd(PriceAUD, na.rm = TRUE),
n = n(),
Missing = sum(is.na(PriceAUD))) -> table1
knitr::kable(table1)

The recorded output below was produced with bmw$PriceAUD inside summarise(), which ignores the grouping. Min through SD are therefore the overall values for all 1049 cars, repeated in every row; only n is per vehicle type. Using the bare column name PriceAUD, as in the code above, gives per-type statistics.

| vehicleType | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| convertible | 735 | 4115 | 8820 | 18816 | 87465 | 12956.47 | 12228.86 | 102 | 0 |
| coupe | 735 | 4115 | 8820 | 18816 | 87465 | 12956.47 | 12228.86 | 123 | 0 |
| hatchback | 735 | 4115 | 8820 | 18816 | 87465 | 12956.47 | 12228.86 | 15 | 0 |
| sedan | 735 | 4115 | 8820 | 18816 | 87465 | 12956.47 | 12228.86 | 809 | 0 |
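A quick way to cross-check a grouped summary is to compute one statistic per group in base R and compare it with the overall value. The sketch below uses `tapply()` instead of dplyr, purely as an independent check; the data frame `toy` and its prices are invented for illustration, and only the column names follow the report.

```r
# Toy data; vehicle types match the report, prices are invented.
toy <- data.frame(
  vehicleType = c("sedan", "sedan", "sedan", "coupe", "coupe", "convertible"),
  PriceAUD    = c(5000, 7000, 9000, 12000, 16000, 30000)
)

# Per-group statistics: tapply() splits PriceAUD by vehicleType first
group_median <- tapply(toy$PriceAUD, toy$vehicleType, median)
group_n      <- tapply(toy$PriceAUD, toy$vehicleType, length)

# Overall statistic, ignoring vehicle type, for comparison
overall_median <- median(toy$PriceAUD)

group_median
group_n
overall_median
```

If every group reports the same median as `overall_median`, the grouping was most likely not applied.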
model1 <- lm(bmw$yearOfRegistration ~ bmw$PriceAUD, data = bmw)
model1 %>% summary()
##
## Call:
## lm(formula = bmw$yearOfRegistration ~ bmw$PriceAUD, data = bmw)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.356  -2.169   0.419   2.927  11.880
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.999e+03  2.060e-01 9701.78   <2e-16 ***
## bmw$PriceAUD 3.492e-04  1.157e-05   30.19   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.579 on 1047 degrees of freedom
## Multiple R-squared: 0.4655, Adjusted R-squared: 0.4649
## F-statistic: 911.7 on 1 and 1047 DF, p-value: < 2.2e-16
H_0: There is no linear association between price and year of registration in the population (the slope is zero)
H_A: There is a linear association between price and year of registration in the population (the slope is not zero)
The p-value for the slope (< 2.2e-16) is less than 0.05, so the null hypothesis is rejected. There is statistically significant evidence of an association between price and year of registration. The model explains about 47% of the variation in year of registration (R-squared = 0.4655).
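The decision above can also be read programmatically from the fitted model rather than from the printed summary. The following is a minimal sketch on invented data, assuming the same model form as `model1` (year of registration regressed on price); `toy` and the simulated coefficients are hypothetical and are not the report's results.

```r
# Toy data with a built-in positive slope; all values are invented.
set.seed(42)
toy <- data.frame(PriceAUD = runif(300, min = 1000, max = 60000))
toy$yearOfRegistration <- 1999 + 3e-4 * toy$PriceAUD + rnorm(300, sd = 4)

fit <- lm(yearOfRegistration ~ PriceAUD, data = toy)
fit_summary <- summary(fit)

# p-value for H0: slope = 0, taken from the coefficient table
slope_p <- fit_summary$coefficients["PriceAUD", "Pr(>|t|)"]

# 95% confidence interval for the slope
slope_ci <- confint(fit, "PriceAUD", level = 0.95)

# Proportion of variance in the response explained by the model
r_squared <- fit_summary$r.squared

# Decision at the 0.05 significance level
reject_h0 <- slope_p < 0.05
```

A confidence interval that excludes zero leads to the same decision as the p-value, and it also indicates the plausible size of the slope.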