Picture are taken from Google
Hello Everyone :)
Welcome to my Rmd.
This is my first HTML_Document which contains some of my analysis about used car’s dataset.
Enjoy reading!
This dataset is the stacked version of 100,000 UK Used Car Data set present in Kaggle. Here I have combined the used car information of 7 brands namely Audi, BMW, Skoda, Ford, Volkswagen, Toyota and Hyundai.
Data Source: https://www.kaggle.com/datasets/aishwaryamuthukumar/cars-dataset-audi-bmw-ford-hyundai-skoda-vw.
Note: more explanations will be written in the summary part.
Firts step : do not forget to put dataset in the same folder with our R project data.
cars <- read.csv(file="data_input/cars_mobil.csv")
Input or read data is done. This process is the beginning of pre-process data that we will use. Then let’s continue to next step.
head(cars)
## model year price transmission mileage fuelType tax mpg engineSize Make
## 1 A1 2017 12500 Manual 15735 Petrol 150 55.4 1.4 audi
## 2 A6 2016 16500 Automatic 36203 Diesel 20 64.2 2.0 audi
## 3 A1 2016 11000 Manual 29946 Petrol 30 55.4 1.4 audi
## 4 A4 2017 16800 Automatic 25952 Diesel 145 67.3 2.0 audi
## 5 A3 2019 17300 Manual 1998 Petrol 145 49.6 1.0 audi
## 6 A1 2016 13900 Automatic 32260 Petrol 30 58.9 1.4 audi
tail(cars)
## model year price transmission mileage fuelType tax mpg engineSize
## 72430 Santa Fe 2019 29995 Semi-Auto 1567 Diesel 145 39.8 2.2
## 72431 I30 2016 8680 Manual 25906 Diesel 0 78.4 1.6
## 72432 I40 2015 7830 Manual 59508 Diesel 30 65.7 1.7
## 72433 I10 2017 6830 Manual 13810 Petrol 20 60.1 1.0
## 72434 Tucson 2018 13994 Manual 23313 Petrol 145 44.8 1.6
## 72435 Tucson 2016 15999 Automatic 11472 Diesel 125 57.6 1.7
## Make
## 72430 Hyundai
## 72431 Hyundai
## 72432 Hyundai
## 72433 Hyundai
## 72434 Hyundai
## 72435 Hyundai
dim(cars)
## [1] 72435 10
names(cars)
## [1] "model" "year" "price" "transmission" "mileage"
## [6] "fuelType" "tax" "mpg" "engineSize" "Make"
From our inspection we can know that : cars dataset has 72435 rows and 10 columns. The name of columns are model,year,price,transmission,mileage,mileage,tax,mpg,engineSize,Make.
Check data type of columns.
str(cars)
## 'data.frame': 72435 obs. of 10 variables:
## $ model : chr "A1" "A6" "A1" "A4" ...
## $ year : int 2017 2016 2016 2017 2019 2016 2016 2016 2015 2016 ...
## $ price : int 12500 16500 11000 16800 17300 13900 13250 11750 10200 12000 ...
## $ transmission: chr "Manual" "Automatic" "Manual" "Automatic" ...
## $ mileage : int 15735 36203 29946 25952 1998 32260 76788 75185 46112 22451 ...
## $ fuelType : chr "Petrol" "Diesel" "Petrol" "Diesel" ...
## $ tax : int 150 20 30 145 145 30 30 20 20 30 ...
## $ mpg : num 55.4 64.2 55.4 67.3 49.6 58.9 61.4 70.6 60.1 55.4 ...
## $ engineSize : num 1.4 2 1.4 2 1 1.4 2 2 1.4 1.4 ...
## $ Make : chr "audi" "audi" "audi" "audi" ...
After checking type for each columns, we still find there are some
columns with incorrect type. So we must change it to the correct type
firstly. The process is called as data coertion.
cars$year <- as.character(cars$year)
cars$transmission <- as.factor(cars$transmission)
cars$fuelType <- as.factor(cars$fuelType)
cars$Make <- as.factor(cars$Make)
str(cars)
## 'data.frame': 72435 obs. of 10 variables:
## $ model : chr "A1" "A6" "A1" "A4" ...
## $ year : chr "2017" "2016" "2016" "2017" ...
## $ price : int 12500 16500 11000 16800 17300 13900 13250 11750 10200 12000 ...
## $ transmission: Factor w/ 4 levels "Automatic","Manual",..: 2 1 2 1 2 1 1 2 2 2 ...
## $ mileage : int 15735 36203 29946 25952 1998 32260 76788 75185 46112 22451 ...
## $ fuelType : Factor w/ 5 levels "Diesel","Electric",..: 5 1 5 1 5 5 1 1 5 5 ...
## $ tax : int 150 20 30 145 145 30 30 20 20 30 ...
## $ mpg : num 55.4 64.2 55.4 67.3 49.6 58.9 61.4 70.6 60.1 55.4 ...
## $ engineSize : num 1.4 2 1.4 2 1 1.4 2 2 1.4 1.4 ...
## $ Make : Factor w/ 7 levels "audi","BMW","Ford",..: 1 1 1 1 1 1 1 1 1 1 ...
The columns have changed into correct type.
Now, we continue to check missing value in cars dataset.
colSums(is.na(cars))
## model year price transmission mileage fuelType
## 0 0 0 0 0 0
## tax mpg engineSize Make
## 0 0 0 0
anyNA(cars)
## [1] FALSE
By the result, we can get information that cars dataset has no missing value. It means the dataset is complete.
For this analysis, we will use all of columns.
Brief explanation.
summary(cars)
## model year price transmission
## Length:72435 Length:72435 Min. : 495 Automatic:14046
## Class :character Class :character 1st Qu.: 10175 Manual :43021
## Mode :character Mode :character Median : 14495 Other : 4
## Mean : 16580 Semi-Auto:15364
## 3rd Qu.: 20361
## Max. :145000
##
## mileage fuelType tax mpg
## Min. : 1 Diesel :28918 Min. : 0 Min. : 0.30
## 1st Qu.: 7202 Electric: 5 1st Qu.: 30 1st Qu.: 47.90
## Median : 17531 Hybrid : 2903 Median :145 Median : 55.40
## Mean : 23177 Other : 239 Mean :117 Mean : 55.85
## 3rd Qu.: 32449 Petrol :40370 3rd Qu.:145 3rd Qu.: 62.80
## Max. :323000 Max. :580 Max. :470.80
##
## engineSize Make
## Min. :0.000 audi :10668
## 1st Qu.:1.200 BMW :10781
## Median :1.600 Ford :17964
## Mean :1.636 Hyundai: 4860
## 3rd Qu.:2.000 skoda : 6267
## Max. :6.600 toyota : 6738
## vw :15157
Summary:
There were 7 company with many variance model.
The lowest price of used cars was 495, The highest price was 145000, and The average price was 16580.
The used cars were in range year from 1996 to 2020.
There were only 4 cars which have Other transmission
type.
Petrol was the famous fueltype.
Minimun of mileage was 1. It meant there were cars with minimum activity (used).
Mile per gallon was started from 0.30 to 470.80
The tax of used cars have many variance too with minimun 0 to 580.
Ford was the most highest quantity used cars.
Check outliers within price
aggregate(price~Make,cars,mean)
## Make price
## 1 audi 22896.69
## 2 BMW 22733.41
## 3 Ford 12280.08
## 4 Hyundai 12750.13
## 5 skoda 14275.45
## 6 toyota 12522.39
## 7 vw 16838.95
aggregate(price~Make,cars,var)
## Make price
## 1 audi 137237520
## 2 BMW 130314284
## 3 Ford 22480098
## 4 Hyundai 35915118
## 5 skoda 40094871
## 6 toyota 40259248
## 7 vw 60140261
aggregate(price~Make,cars,sd)
## Make price
## 1 audi 11714.842
## 2 BMW 11415.528
## 3 Ford 4741.318
## 4 Hyundai 5992.922
## 5 skoda 6332.051
## 6 toyota 6345.018
## 7 vw 7755.015
boxplot(cars$price)
From the plot we find posibilities for outliers but from or calculation Deviation Standard value is around 12000 and 22000. It still can be accepted. So we can continue to the next step.
cars[cars$price == 495,]
## model year price transmission mileage fuelType tax mpg engineSize Make
## 38553 Focus 2003 495 Manual 177644 Petrol 200 41.5 1.6 Ford
cars[cars$price == 145000,]
## model year price transmission mileage fuelType tax mpg engineSize Make
## 4784 R8 2020 145000 Semi-Auto 2000 Petrol 145 21.1 5.2 audi
Answer :
-. Price 495 come from Ford company with
Focus model and year 2003.
-. Price 145000 come from Audi company with
R8 model and year 2020.
ford <- cars[cars$Make == "Ford",]
round(prop.table(table(ford$transmission))*100,2)
##
## Automatic Manual Other Semi-Auto
## 7.57 86.38 0.00 6.05
Answer :
Ford with Manual transmission –> 86.38%
avgprice<-cars[cars$price > 16580,]
(sort(table(avgprice$year),decreasing = T)[1])
## 2019
## 14173
Answer :
The highest quantity of used cars with price above average was in 2019.
avgprice2 <- avgprice[avgprice$year == "2019",]
(sort(table(avgprice2$Make), decreasing = T))
##
## vw BMW audi Ford skoda Hyundai toyota
## 3711 3438 3190 1606 1156 576 496
Answer :
The highest quantity of used cars in 2019 with price above
average was VW.
table(cars$fuelType, cars$Make)
##
## audi BMW Ford Hyundai skoda toyota vw
## Diesel 5577 7027 5762 1608 2069 503 6372
## Electric 0 3 2 0 0 0 0
## Hybrid 28 298 22 349 18 2043 145
## Other 0 36 1 1 9 105 87
## Petrol 5063 3417 12177 2902 4171 4087 8553
Answer :
Type of fueltype which has highest quantity from Company Audi
was Diesel with amount 5577.
round(prop.table(table(cars$fuelType, cars$Make))*100,3)
##
## audi BMW Ford Hyundai skoda toyota vw
## Diesel 7.699 9.701 7.955 2.220 2.856 0.694 8.797
## Electric 0.000 0.004 0.003 0.000 0.000 0.000 0.000
## Hybrid 0.039 0.411 0.030 0.482 0.025 2.820 0.200
## Other 0.000 0.050 0.001 0.001 0.012 0.145 0.120
## Petrol 6.990 4.717 16.811 4.006 5.758 5.642 11.808
plot(table(cars$transmission, cars$fuelType))
heatmap(table(cars$transmission, cars$fuelType), cexRow = 1, cexCol = 1, scale = "column", Colv = NA, Rowv = NA)
Answer:
Based on the result, `Manual transmission with
Petrol fueltype made the highest quantity with total
28505.
mileage1<- cars[cars$mileage==1,]
knitr::kable(table(mileage1$mileage))
| Var1 | Freq |
|---|---|
| 1 | 23 |
Answer :
23 cars
min(mileage1$price)
## [1] 899
max(mileage1$price)
## [1] 69000
Answer :
The lowest was 899, and the highest was 69000.
mileage1[mileage1$price == 899 | mileage1$price == 69000,]
## model year price transmission mileage fuelType tax mpg engineSize Make
## 5493 Q7 2019 69000 Automatic 1 Diesel 145 30.1 4 audi
## 52737 Beetle 2001 899 Automatic 1 Petrol 325 29.7 2 vw
Answer :
The cheapest was VW with Beetle model.
The highest was Audi with Q7 model.
min(cars$mpg)
## [1] 0.3
max(cars$mpg)
## [1] 470.8
cars[cars$mpg == 0.3,]
## model year price transmission mileage fuelType tax mpg engineSize Make
## 53918 Golf SV 2019 17950 Manual 100 Petrol 145 0.3 1 vw
mpg_min_max <- cars[cars$mpg == 470.8,]
(sort(table(mpg_min_max$Make), decreasing = T))
##
## BMW audi Ford Hyundai skoda toyota vw
## 43 0 0 0 0 0 0
Answer :
Minimum mpg was VW.
Maximum pgw was BMW.
xtabs(tax~Make, cars)
## Make
## audi BMW Ford Hyundai skoda toyota vw
## 1344290 1419880 2035850 588775 735790 638070 1708865
Answer :
The lowest total sum of tax was Hyundai.
xtabs(engineSize~Make, cars)
## Make
## audi BMW Ford Hyundai skoda toyota vw
## 20596.8 23370.7 24266.2 7085.5 8983.8 9913.6 24261.7
Answer :
The lowest total sum of engineSize was Hyundai.
(sort(table(ford$model), decreasing = T))
##
## Fiesta Focus Kuga
## 6556 4588 2225
## EcoSport C-MAX Ka+
## 1143 543 531
## Mondeo B-MAX S-MAX
## 526 355 296
## Grand C-MAX Galaxy Edge
## 247 228 208
## KA Puma Tourneo Custom
## 199 80 69
## Grand Tourneo Connect Mustang Tourneo Connect
## 59 57 33
## Fusion Streetka Escort
## 16 2 1
## Ranger Transit Tourneo
## 1 1
ford_model <- as.data.frame(sort(table(ford$model), decreasing = T))
ford_model
## Var1 Freq
## 1 Fiesta 6556
## 2 Focus 4588
## 3 Kuga 2225
## 4 EcoSport 1143
## 5 C-MAX 543
## 6 Ka+ 531
## 7 Mondeo 526
## 8 B-MAX 355
## 9 S-MAX 296
## 10 Grand C-MAX 247
## 11 Galaxy 228
## 12 Edge 208
## 13 KA 199
## 14 Puma 80
## 15 Tourneo Custom 69
## 16 Grand Tourneo Connect 59
## 17 Mustang 57
## 18 Tourneo Connect 33
## 19 Fusion 16
## 20 Streetka 2
## 21 Escort 1
## 22 Ranger 1
## 23 Transit Tourneo 1
Answer :
The most famous model from Ford company was Fiesta.
graphics::barplot(xtabs(Freq ~ Var1,ford_model))
Started from year 1996 to 2020, there were 72435 used cars in UK (by the dataset). With many variance of prices, the cheapest was 495 and the most expensive was 145000. There were 4 types for trasmission and can be combined with 5 types of fueltype. The number of mileage can be used to know about the activity of used cars. It started from 1 as minimum value and the highest was 323000. The road tax, milepergallon and also engineSize have minimum value from 0 , 0.30 and 0.
Recomendations :
From our analysis, if the budget is big and want to buy expensive car, Audi or BMW can be a good choice because the average of price was more than 22000.
Quantity of Ford used cars was the biggest. It can be known by its average of price that was the lowest one. And Ford with Fiesta model can be a best choice because it has the highest quantity of used cars from Ford company. And the specifications were good too.
Manual transmission and Petrol Fueltype were favorite users in UK. The electric fueltype was the very lowest. But in the newest technology and innovations, the electric cars is going to develop for future. So it can be alternative choices because the user of electric cars was so little.