Porshi Gupta-S3894438,
Dhruv Pathak-S3908797,
Unnimaya Stalin-S3861387
Last updated: 31 May, 2022
## 'data.frame': 17966 obs. of 9 variables:
## $ model : chr " Fiesta" " Focus" " Focus" " Fiesta" ...
## $ year : int 2017 2018 2017 2019 2019 2015 2019 2017 2019 2018 ...
## $ price : int 12000 14000 13000 17500 16500 10500 22500 9000 25500 10000 ...
## $ transmission: chr "Automatic" "Manual" "Manual" "Manual" ...
## $ mileage : int 15944 9083 12456 10460 1482 35432 2029 13054 6894 48141 ...
## $ fuelType : chr "Petrol" "Petrol" "Petrol" "Petrol" ...
## $ tax : int 150 150 150 145 145 145 145 145 145 145 ...
## $ mpg : num 57.7 57.7 57.7 40.3 48.7 47.9 50.4 54.3 42.2 61.4 ...
## $ engineSize : num 1 1 1 1.5 1 1.6 1 1.2 2 1 ...
ford$model = factor(ford$model)
ford$year = factor(ford$year, levels = levels(as.factor(ford$year)), ordered = TRUE)
ford$price = factor(ford$price, levels = levels(as.factor(ford$price)), ordered = TRUE)

mpg - Miles per gallon is a numeric variable that tells us how far a car can travel on each gallon of fuel it consumes.
mileage - The distance the car has travelled, in miles; it assists in gauging the car's financial and economic affordability.
engineSize - The engine size determines how much power the engine can produce, which affects the car's fuel consumption.
ford <- ford[ford$tax != 0,]
mileage_summary <- ford %>% summarise(Parameter = "mileage",
Min = min(mileage,na.rm = TRUE),
Q1 = quantile(mileage,probs = .25,na.rm = TRUE),
Median = median(mileage, na.rm = TRUE),
Q3 = quantile(mileage,probs = .75,na.rm = TRUE),
Max = max(mileage,na.rm = TRUE),
Mean = mean(mileage, na.rm = TRUE),
SD = sd(mileage, na.rm = TRUE),
n = n(),Missing = sum(is.na(mileage)))
mpg_summary <- ford %>% summarise(Parameter = "mpg",
Min = min(mpg,na.rm = TRUE),
Q1 = quantile(mpg,probs = .25,na.rm = TRUE),
Median = median(mpg, na.rm = TRUE),
Q3 = quantile(mpg,probs = .75,na.rm = TRUE),
Max = max(mpg,na.rm = TRUE),
Mean = mean(mpg, na.rm = TRUE),
SD = sd(mpg, na.rm = TRUE),
n = n(),Missing = sum(is.na(mpg)))
engineSize_summary <- ford %>% summarise(Parameter = "engineSize",
Min = min(engineSize,na.rm = TRUE),
Q1 = quantile(engineSize,probs = .25,na.rm = TRUE),
Median = median(engineSize, na.rm = TRUE),
Q3 = quantile(engineSize,probs = .75,na.rm = TRUE),
Max = max(engineSize,na.rm = TRUE),
Mean = mean(engineSize, na.rm = TRUE),
SD = sd(engineSize, na.rm = TRUE),
n = n(),Missing = sum(is.na(engineSize)))
table1<-ford %>% group_by(fuelType) %>% summarise(Parameter = "mpg",
Min = min(mpg,na.rm = TRUE),
Q1 = quantile(mpg, probs = .25,na.rm = TRUE),
Median = median(mpg, na.rm = TRUE),
Q3 = quantile(mpg,probs = .75,na.rm = TRUE),
Max = max(mpg,na.rm = TRUE),
Mean = mean(mpg, na.rm = TRUE),
SD = sd(mpg, na.rm = TRUE),
range = max(mpg,na.rm = TRUE) - min(mpg,na.rm = TRUE),
n = n(),Missing = sum(is.na(mpg)))
knitr::kable(table1)

| fuelType | Parameter | Min | Q1 | Median | Q3 | Max | Mean | SD | range | n | Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Diesel | mpg | 28.3 | 54.3 | 60.10 | 67.3 | 88.3 | 59.72766 | 10.050223 | 60.0 | 4888 | 0 |
| Hybrid | mpg | 46.3 | 47.1 | 49.15 | 201.8 | 201.8 | 96.82500 | 73.161652 | 155.5 | 16 | 0 |
| Petrol | mpg | 20.8 | 49.6 | 55.40 | 60.1 | 85.6 | 54.73053 | 8.239173 | 64.8 | 10909 | 0 |
The combined summary statistics of the Ford data set are as follows:
| Parameter | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| mileage | 1.0 | 9236.0 | 16816.0 | 29440.0 | 177644.0 | 22107.463037 | 19281.262605 | 15813 | 0 |
| mpg | 20.8 | 51.4 | 57.7 | 62.8 | 201.8 | 56.317802 | 9.493097 | 15813 | 0 |
| engineSize | 0.0 | 1.0 | 1.2 | 1.6 | 5.0 | 1.370467 | 0.446978 | 15813 | 0 |
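
The combined table above stacks one summary row per variable. A minimal sketch of one way to build such a table without repeating the summary call for each variable is given here. The helper `summarise_var()` and the small `toy` data frame are hypothetical stand-ins that do not appear in the original analysis; with the real data, `ford` would be passed in place of `toy`.

```r
# Hypothetical helper (not in the original report): one row of summary
# statistics for a single numeric column of a data frame.
summarise_var <- function(data, var_name) {
  x <- data[[var_name]]
  data.frame(
    Parameter = var_name,
    Min = min(x, na.rm = TRUE),
    Q1 = unname(quantile(x, probs = 0.25, na.rm = TRUE)),
    Median = median(x, na.rm = TRUE),
    Q3 = unname(quantile(x, probs = 0.75, na.rm = TRUE)),
    Max = max(x, na.rm = TRUE),
    Mean = mean(x, na.rm = TRUE),
    SD = sd(x, na.rm = TRUE),
    n = length(x),
    Missing = sum(is.na(x)),
    stringsAsFactors = FALSE
  )
}

# Illustrative values only; a stand-in for the real `ford` data frame.
toy <- data.frame(
  mileage = c(15944, 9083, 12456, 10460, 1482),
  mpg = c(57.7, 57.7, 57.7, 40.3, 48.7),
  engineSize = c(1, 1, 1, 1.5, 1)
)

# Build one row per variable and stack the rows into a single table.
vars <- c("mileage", "mpg", "engineSize")
combined <- do.call(rbind, lapply(vars, function(v) summarise_var(toy, v)))
print(combined)
```

The resulting data frame can be passed to `knitr::kable()` in the same way as `table1`.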
The scatter plot shows a steady trend: as mileage increases, mpg stays within the same range.
The majority of the cars have an mpg below 100, and this remains consistent even as mileage increases.
The R chunks used to produce the histograms and the scatter plot are given below.
# histogram
ford$mpg %>% hist(col="grey", ylim=c(0,7000), xlim=c(0,300), xlab="Number of mpg",main="Histogram of Number of Miles per Gallon")
ford$mileage %>% hist(col="grey",ylim=c(0,5000), xlim=c(0,200000), xlab="Mileage",main="Histogram of Number of Miles Travelled")
# Scatter plot
ford %>% plot(mpg ~ mileage, data = .,ylab="Miles per gallon", xlab="Mileage",col="blue",main="Mileage covered for every MPG ")
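
The flat trend read off the scatter plot could also be checked numerically. The sketch below is only an illustration on invented data: `mileage` and `mpg` here are synthetic vectors generated independently of each other, not the Ford data, and the correlation and slope it prints say nothing about the real data set. With the real data, `ford$mileage` and `ford$mpg` would be used instead.

```r
set.seed(1)  # fixed seed so the made-up data is reproducible

# Synthetic stand-ins for ford$mileage and ford$mpg (values are invented).
mileage <- runif(200, min = 0, max = 100000)
mpg <- rnorm(200, mean = 56, sd = 9)  # generated independently of mileage

# Pearson correlation: values near 0 indicate no linear trend.
r <- cor(mileage, mpg, use = "complete.obs")

# Slope of a simple linear fit: change in mpg per additional mile.
fit <- lm(mpg ~ mileage)
slope <- unname(coef(fit)["mileage"])

cat("correlation:", round(r, 3), "\n")
cat("slope (mpg per mile):", signif(slope, 3), "\n")
```

A correlation close to zero and a slope close to zero would support the reading that mpg does not change systematically with mileage.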
# Descriptive Statistics Cont.
Boxplot of miles per gallon (mpg) showcasing outliers:
Analysing the boxplot below, we can see one extreme high outlier, well separated from the few other high outliers, which sit just below 100 mpg.
In contrast, most of the outliers lie at the low end, below the 40 mpg mark, so some models have very low mpg. Several explanations are possible. One is that particular Ford models have very poor fuel economy, which would rank them lower in purchase or use preference and make them a less preferred choice for people wishing to buy a car for long-distance travel.
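
The outliers discussed above can also be listed directly rather than read off the plot. A minimal sketch follows, assuming the same whisker rule that `boxplot()` uses by default (points more than 1.5 times the box length beyond the hinges). The `mpg` vector is invented for illustration; with the real data, `ford$mpg` would be used instead.

```r
# Invented mpg values standing in for ford$mpg.
mpg <- c(57.7, 57.7, 40.3, 48.7, 47.9, 50.4, 54.3, 42.2, 61.4, 201.8, 20.8)

# boxplot.stats() returns the five-number summary used to draw the box
# (stats) and the points that fall beyond the whiskers (out).
stats <- boxplot.stats(mpg)
outliers <- stats$out

# Split the flagged points at the median (third element of stats$stats)
# into low and high outliers.
low_outliers <- outliers[outliers < stats$stats[3]]
high_outliers <- outliers[outliers > stats$stats[3]]

print(low_outliers)
print(high_outliers)
```

Counting the elements of `low_outliers` and `high_outliers` gives a quick check on whether most outliers really sit at the low end.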
The R chunk used to produce the boxplot is given below.
# boxplot
plot1 <- boxplot(ford$mpg, main="Boxplot of Rate of Mileage with outliers", ylab="Miles per gallon (mpg)")
##
## One Sample t-test
##
## data: ford$mpg
## t = 83.688, df = 15812, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 56.16983 56.46577
## sample estimates:
## mean of x
## 56.3178
\[H_0: \mu_1 = 50 \]
\[H_A: \mu_1 \ne 50 \]
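
The test output above is consistent with a two-sided one-sample t-test of mpg against a hypothesised mean of 50, presumably produced by a call such as `t.test(ford$mpg, mu = 50)`; that call is an assumption, since it is not shown in the report. As a cross-check, the sketch below recomputes the test statistic and the 95% confidence interval from the mean, standard deviation and sample size listed for mpg in the combined summary table.

```r
# Summary statistics for mpg taken from the combined summary table.
x_bar <- 56.317802  # sample mean
s <- 9.493097       # sample standard deviation
n <- 15813          # number of observations
mu0 <- 50           # hypothesised mean under H_0

# Standard error of the mean and the one-sample t statistic.
se <- s / sqrt(n)
t_stat <- (x_bar - mu0) / se
df <- n - 1

# Two-sided p-value and 95% confidence interval for the mean.
p_value <- 2 * pt(-abs(t_stat), df)
ci <- x_bar + c(-1, 1) * qt(0.975, df) * se

cat("t =", round(t_stat, 3), " df =", df, "\n")
cat("95% CI:", round(ci, 5), "\n")
```

Because the p-value is far below 0.05 and the interval excludes 50, the null hypothesis would be rejected at the 5% significance level.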
Australian Industry and Skills Committee 2018, Automotive, National Industry Insights Report.
Ford Car Price Prediction n.d., www.kaggle.com, https://www.kaggle.com/datasets/adhurimquku/ford-car-price-prediction
www.hiltongarage.co.uk. (n.d.). What makes a good MPG? - Hilton Garage Limited. [online] Available at: https://www.hiltongarage.co.uk/blog/what-makes-a-good-mpg
carwow.co.uk. (n.d.). What is mpg? How is it calculated? [online] Available at: https://www.carwow.co.uk/guides/running/what-is-mpg-0255#gref
stasha (n.d.). Good mileage for a used car | Policy Advice. [online] policyadvice.net. Available at: https://policyadvice.net/insurance/guides/good-mileage-for-used-car/ [Accessed 29 May 2022].
Car Keys. (n.d.). What does engine size mean? [online] Available at: https://www.carkeys.co.uk/guides/what-does-engine-size-mean [Accessed 29 May 2022].