Picture are taken from Google

1. Explanation

About Writter

Hello Everyone :)

Welcome to my Rmd.

This is my first HTML_Document which contains some of my analysis about used car’s dataset.

Enjoy reading!

Informations of dataset

This dataset is the stacked version of 100,000 UK Used Car Data set present in Kaggle. Here I have combined the used car information of 7 brands namely Audi, BMW, Skoda, Ford, Volkswagen, Toyota and Hyundai.

Data Source: https://www.kaggle.com/datasets/aishwaryamuthukumar/cars-dataset-audi-bmw-ford-hyundai-skoda-vw.

Note: more explanations will be written in the summary part.

2. Input Data

Firts step : do not forget to put dataset in the same folder with our R project data.

cars <- read.csv(file="data_input/cars_mobil.csv")

Input or read data is done. This process is the beginning of pre-process data that we will use. Then let’s continue to next step.

2.1. Data Inspection

head(cars)
##   model year price transmission mileage fuelType tax  mpg engineSize Make
## 1    A1 2017 12500       Manual   15735   Petrol 150 55.4        1.4 audi
## 2    A6 2016 16500    Automatic   36203   Diesel  20 64.2        2.0 audi
## 3    A1 2016 11000       Manual   29946   Petrol  30 55.4        1.4 audi
## 4    A4 2017 16800    Automatic   25952   Diesel 145 67.3        2.0 audi
## 5    A3 2019 17300       Manual    1998   Petrol 145 49.6        1.0 audi
## 6    A1 2016 13900    Automatic   32260   Petrol  30 58.9        1.4 audi
tail(cars)
##          model year price transmission mileage fuelType tax  mpg engineSize
## 72430 Santa Fe 2019 29995    Semi-Auto    1567   Diesel 145 39.8        2.2
## 72431      I30 2016  8680       Manual   25906   Diesel   0 78.4        1.6
## 72432      I40 2015  7830       Manual   59508   Diesel  30 65.7        1.7
## 72433      I10 2017  6830       Manual   13810   Petrol  20 60.1        1.0
## 72434   Tucson 2018 13994       Manual   23313   Petrol 145 44.8        1.6
## 72435   Tucson 2016 15999    Automatic   11472   Diesel 125 57.6        1.7
##          Make
## 72430 Hyundai
## 72431 Hyundai
## 72432 Hyundai
## 72433 Hyundai
## 72434 Hyundai
## 72435 Hyundai
dim(cars)
## [1] 72435    10
names(cars)
##  [1] "model"        "year"         "price"        "transmission" "mileage"     
##  [6] "fuelType"     "tax"          "mpg"          "engineSize"   "Make"

From our inspection we can know that : cars dataset has 72435 rows and 10 columns. The name of columns are model,year,price,transmission,mileage,mileage,tax,mpg,engineSize,Make.

2.2. Data Cleansing & Coertions

Check data type of columns.

str(cars)
## 'data.frame':    72435 obs. of  10 variables:
##  $ model       : chr  "A1" "A6" "A1" "A4" ...
##  $ year        : int  2017 2016 2016 2017 2019 2016 2016 2016 2015 2016 ...
##  $ price       : int  12500 16500 11000 16800 17300 13900 13250 11750 10200 12000 ...
##  $ transmission: chr  "Manual" "Automatic" "Manual" "Automatic" ...
##  $ mileage     : int  15735 36203 29946 25952 1998 32260 76788 75185 46112 22451 ...
##  $ fuelType    : chr  "Petrol" "Diesel" "Petrol" "Diesel" ...
##  $ tax         : int  150 20 30 145 145 30 30 20 20 30 ...
##  $ mpg         : num  55.4 64.2 55.4 67.3 49.6 58.9 61.4 70.6 60.1 55.4 ...
##  $ engineSize  : num  1.4 2 1.4 2 1 1.4 2 2 1.4 1.4 ...
##  $ Make        : chr  "audi" "audi" "audi" "audi" ...

After checking type for each columns, we still find there are some columns with incorrect type. So we must change it to the correct type firstly. The process is called as data coertion.

cars$year <- as.character(cars$year)
cars$transmission <- as.factor(cars$transmission)
cars$fuelType <- as.factor(cars$fuelType)
cars$Make <- as.factor(cars$Make)
str(cars)
## 'data.frame':    72435 obs. of  10 variables:
##  $ model       : chr  "A1" "A6" "A1" "A4" ...
##  $ year        : chr  "2017" "2016" "2016" "2017" ...
##  $ price       : int  12500 16500 11000 16800 17300 13900 13250 11750 10200 12000 ...
##  $ transmission: Factor w/ 4 levels "Automatic","Manual",..: 2 1 2 1 2 1 1 2 2 2 ...
##  $ mileage     : int  15735 36203 29946 25952 1998 32260 76788 75185 46112 22451 ...
##  $ fuelType    : Factor w/ 5 levels "Diesel","Electric",..: 5 1 5 1 5 5 1 1 5 5 ...
##  $ tax         : int  150 20 30 145 145 30 30 20 20 30 ...
##  $ mpg         : num  55.4 64.2 55.4 67.3 49.6 58.9 61.4 70.6 60.1 55.4 ...
##  $ engineSize  : num  1.4 2 1.4 2 1 1.4 2 2 1.4 1.4 ...
##  $ Make        : Factor w/ 7 levels "audi","BMW","Ford",..: 1 1 1 1 1 1 1 1 1 1 ...

The columns have changed into correct type.

Now, we continue to check missing value in cars dataset.

colSums(is.na(cars))
##        model         year        price transmission      mileage     fuelType 
##            0            0            0            0            0            0 
##          tax          mpg   engineSize         Make 
##            0            0            0            0
 anyNA(cars)
## [1] FALSE

By the result, we can get information that cars dataset has no missing value. It means the dataset is complete.

For this analysis, we will use all of columns.

3. Data Explanation

Brief explanation.

summary(cars)
##     model               year               price           transmission  
##  Length:72435       Length:72435       Min.   :   495   Automatic:14046  
##  Class :character   Class :character   1st Qu.: 10175   Manual   :43021  
##  Mode  :character   Mode  :character   Median : 14495   Other    :    4  
##                                        Mean   : 16580   Semi-Auto:15364  
##                                        3rd Qu.: 20361                    
##                                        Max.   :145000                    
##                                                                          
##     mileage           fuelType          tax           mpg        
##  Min.   :     1   Diesel  :28918   Min.   :  0   Min.   :  0.30  
##  1st Qu.:  7202   Electric:    5   1st Qu.: 30   1st Qu.: 47.90  
##  Median : 17531   Hybrid  : 2903   Median :145   Median : 55.40  
##  Mean   : 23177   Other   :  239   Mean   :117   Mean   : 55.85  
##  3rd Qu.: 32449   Petrol  :40370   3rd Qu.:145   3rd Qu.: 62.80  
##  Max.   :323000                    Max.   :580   Max.   :470.80  
##                                                                  
##    engineSize         Make      
##  Min.   :0.000   audi   :10668  
##  1st Qu.:1.200   BMW    :10781  
##  Median :1.600   Ford   :17964  
##  Mean   :1.636   Hyundai: 4860  
##  3rd Qu.:2.000   skoda  : 6267  
##  Max.   :6.600   toyota : 6738  
##                  vw     :15157

Summary:

  1. There were 7 company with many variance model.

  2. The lowest price of used cars was 495, The highest price was 145000, and The average price was 16580.

  3. The used cars were in range year from 1996 to 2020.

  4. There were only 4 cars which have Other transmission type.

  5. Petrol was the famous fueltype.

  6. Minimun of mileage was 1. It meant there were cars with minimum activity (used).

  7. Mile per gallon was started from 0.30 to 470.80

  8. The tax of used cars have many variance too with minimun 0 to 580.

  9. Ford was the most highest quantity used cars.

Check outliers within price

aggregate(price~Make,cars,mean)
##      Make    price
## 1    audi 22896.69
## 2     BMW 22733.41
## 3    Ford 12280.08
## 4 Hyundai 12750.13
## 5   skoda 14275.45
## 6  toyota 12522.39
## 7      vw 16838.95
aggregate(price~Make,cars,var)
##      Make     price
## 1    audi 137237520
## 2     BMW 130314284
## 3    Ford  22480098
## 4 Hyundai  35915118
## 5   skoda  40094871
## 6  toyota  40259248
## 7      vw  60140261
aggregate(price~Make,cars,sd)
##      Make     price
## 1    audi 11714.842
## 2     BMW 11415.528
## 3    Ford  4741.318
## 4 Hyundai  5992.922
## 5   skoda  6332.051
## 6  toyota  6345.018
## 7      vw  7755.015
boxplot(cars$price)

From the plot we find posibilities for outliers but from or calculation Deviation Standard value is around 12000 and 22000. It still can be accepted. So we can continue to the next step.

4. Data Manipulation & Transformation

  1. Which company has the lowest and the highest? What’s the car model and the year?
cars[cars$price == 495,]
##       model year price transmission mileage fuelType tax  mpg engineSize Make
## 38553 Focus 2003   495       Manual  177644   Petrol 200 41.5        1.6 Ford
cars[cars$price == 145000,]
##      model year  price transmission mileage fuelType tax  mpg engineSize Make
## 4784    R8 2020 145000    Semi-Auto    2000   Petrol 145 21.1        5.2 audi

Answer :

-. Price 495 come from Ford company with Focus model and year 2003.

-. Price 145000 come from Audi company with R8 model and year 2020.

  1. Which transmission sold the most Ford Company?
ford <- cars[cars$Make == "Ford",]
round(prop.table(table(ford$transmission))*100,2)
## 
## Automatic    Manual     Other Semi-Auto 
##      7.57     86.38      0.00      6.05

Answer :

Ford with Manual transmission –> 86.38%

  1. In what year the highest quantity of cars with price above average price?
avgprice<-cars[cars$price > 16580,]
(sort(table(avgprice$year),decreasing = T)[1])
##  2019 
## 14173

Answer :

The highest quantity of used cars with price above average was in 2019.

  1. From the previous question, what was the company name?
avgprice2 <- avgprice[avgprice$year == "2019",]
(sort(table(avgprice2$Make), decreasing = T))
## 
##      vw     BMW    audi    Ford   skoda Hyundai  toyota 
##    3711    3438    3190    1606    1156     576     496

Answer :

The highest quantity of used cars in 2019 with price above average was VW.

  1. What type of fueltype which has highest quantity from Company Audi?
table(cars$fuelType, cars$Make)
##           
##             audi   BMW  Ford Hyundai skoda toyota    vw
##   Diesel    5577  7027  5762    1608  2069    503  6372
##   Electric     0     3     2       0     0      0     0
##   Hybrid      28   298    22     349    18   2043   145
##   Other        0    36     1       1     9    105    87
##   Petrol    5063  3417 12177    2902  4171   4087  8553

Answer :

Type of fueltype which has highest quantity from Company Audi was Diesel with amount 5577.

  1. Now, make the number 5 to percentage table.
round(prop.table(table(cars$fuelType, cars$Make))*100,3)
##           
##              audi    BMW   Ford Hyundai  skoda toyota     vw
##   Diesel    7.699  9.701  7.955   2.220  2.856  0.694  8.797
##   Electric  0.000  0.004  0.003   0.000  0.000  0.000  0.000
##   Hybrid    0.039  0.411  0.030   0.482  0.025  2.820  0.200
##   Other     0.000  0.050  0.001   0.001  0.012  0.145  0.120
##   Petrol    6.990  4.717 16.811   4.006  5.758  5.642 11.808
  1. Made the number 5 into a plot.
plot(table(cars$transmission, cars$fuelType))

heatmap(table(cars$transmission, cars$fuelType), cexRow = 1, cexCol = 1, scale = "column", Colv = NA, Rowv = NA)

Answer:

Based on the result, `Manual transmission with Petrol fueltype made the highest quantity with total 28505.

  1. How many used cars which has the lowest mileage and mileage above average (23177)?
mileage1<- cars[cars$mileage==1,]
knitr::kable(table(mileage1$mileage))
Var1 Freq
1 23

Answer :

23 cars

  1. From the lowest mileage of used cars, how much the prices (the cheapest and the most expensive)?
min(mileage1$price)
## [1] 899
max(mileage1$price)
## [1] 69000

Answer :

The lowest was 899, and the highest was 69000.

  1. What were the specifications of the cheapest and the most expensive used cars with minimum mileage?
mileage1[mileage1$price == 899 | mileage1$price == 69000,]
##        model year price transmission mileage fuelType tax  mpg engineSize Make
## 5493      Q7 2019 69000    Automatic       1   Diesel 145 30.1          4 audi
## 52737 Beetle 2001   899    Automatic       1   Petrol 325 29.7          2   vw

Answer :

The cheapest was VW with Beetle model.

The highest was Audi with Q7 model.

  1. What used cars has the minimum and maximum milepergallon?
min(cars$mpg)
## [1] 0.3
max(cars$mpg)
## [1] 470.8
cars[cars$mpg == 0.3,]
##         model year price transmission mileage fuelType tax mpg engineSize Make
## 53918 Golf SV 2019 17950       Manual     100   Petrol 145 0.3          1   vw
mpg_min_max <- cars[cars$mpg == 470.8,]
(sort(table(mpg_min_max$Make), decreasing = T))
## 
##     BMW    audi    Ford Hyundai   skoda  toyota      vw 
##      43       0       0       0       0       0       0

Answer :

Minimum mpg was VW.

Maximum pgw was BMW.

  1. Total sum of used cars tax by each company?
xtabs(tax~Make, cars)
## Make
##    audi     BMW    Ford Hyundai   skoda  toyota      vw 
## 1344290 1419880 2035850  588775  735790  638070 1708865

Answer :

The lowest total sum of tax was Hyundai.

  1. Total sum of used cars engineSize by each company?
xtabs(engineSize~Make, cars)
## Make
##    audi     BMW    Ford Hyundai   skoda  toyota      vw 
## 20596.8 23370.7 24266.2  7085.5  8983.8  9913.6 24261.7

Answer :

The lowest total sum of engineSize was Hyundai.

  1. Show amount of used cars Ford Company with its model.
(sort(table(ford$model), decreasing = T))
## 
##                Fiesta                 Focus                  Kuga 
##                  6556                  4588                  2225 
##              EcoSport                 C-MAX                   Ka+ 
##                  1143                   543                   531 
##                Mondeo                 B-MAX                 S-MAX 
##                   526                   355                   296 
##           Grand C-MAX                Galaxy                  Edge 
##                   247                   228                   208 
##                    KA                  Puma        Tourneo Custom 
##                   199                    80                    69 
## Grand Tourneo Connect               Mustang       Tourneo Connect 
##                    59                    57                    33 
##                Fusion              Streetka                Escort 
##                    16                     2                     1 
##                Ranger       Transit Tourneo 
##                     1                     1
ford_model <- as.data.frame(sort(table(ford$model), decreasing = T))
ford_model
##                     Var1 Freq
## 1                 Fiesta 6556
## 2                  Focus 4588
## 3                   Kuga 2225
## 4               EcoSport 1143
## 5                  C-MAX  543
## 6                    Ka+  531
## 7                 Mondeo  526
## 8                  B-MAX  355
## 9                  S-MAX  296
## 10           Grand C-MAX  247
## 11                Galaxy  228
## 12                  Edge  208
## 13                    KA  199
## 14                  Puma   80
## 15        Tourneo Custom   69
## 16 Grand Tourneo Connect   59
## 17               Mustang   57
## 18       Tourneo Connect   33
## 19                Fusion   16
## 20              Streetka    2
## 21                Escort    1
## 22                Ranger    1
## 23       Transit Tourneo    1

Answer :

The most famous model from Ford company was Fiesta.

  1. Change table in number 14 to a simple graphics so it can be read easyly.
graphics::barplot(xtabs(Freq ~ Var1,ford_model))

5. Explanatory Text & Business Recomendation

Started from year 1996 to 2020, there were 72435 used cars in UK (by the dataset). With many variance of prices, the cheapest was 495 and the most expensive was 145000. There were 4 types for trasmission and can be combined with 5 types of fueltype. The number of mileage can be used to know about the activity of used cars. It started from 1 as minimum value and the highest was 323000. The road tax, milepergallon and also engineSize have minimum value from 0 , 0.30 and 0.

Recomendations :

  1. From our analysis, if the budget is big and want to buy expensive car, Audi or BMW can be a good choice because the average of price was more than 22000.

  2. Quantity of Ford used cars was the biggest. It can be known by its average of price that was the lowest one. And Ford with Fiesta model can be a best choice because it has the highest quantity of used cars from Ford company. And the specifications were good too.

  3. Manual transmission and Petrol Fueltype were favorite users in UK. The electric fueltype was the very lowest. But in the newest technology and innovations, the electric cars is going to develop for future. So it can be alternative choices because the user of electric cars was so little.