Purpose of Analysis

The primary reason for examining the prices at which vehicles are being sold is to understand the pricing strategies and market differences. This assists enterprises in identifying market trends, setting reasonable and fair prices, and making sound decisions base on stock and sales management therefore improving the overall profitability and positioning in the market.The analysis mainly focuses on the impact of mileage on selling price, types of cars sold, condition and if the sellers always sell the vehicles at the estimated market price.

Data Features

The vehicle sales dataset comprises various attributes:

state of registration, condition rating, odometer reading, selling price,

library(ggplot2)
library(dplyr)
library(readxl)
library(plotly)
library(tidyr)

#Loading the data
car_prices <- read_excel("C:/Users/HP/Desktop/Automobile Data/car_prices.xlsx")

#Viewing the data
head(car_prices)
## # A tibble: 6 × 16
##    year make   model     trim  body  transmission vin   state condition odometer
##   <dbl> <chr>  <chr>     <chr> <chr> <chr>        <chr> <chr>     <dbl>    <dbl>
## 1  2015 Kia    Sorento   LX    SUV   automatic    5xyk… ca            5    16639
## 2  2015 Kia    Sorento   LX    SUV   automatic    5xyk… ca            5     9393
## 3  2014 BMW    3 Series  328i… Sedan automatic    wba3… ca           45     1331
## 4  2015 Volvo  S60       T5    Sedan automatic    yv16… ca           41    14282
## 5  2014 BMW    6 Series… 650i  Sedan automatic    wba6… ca           43     2641
## 6  2015 Nissan Altima    2.5 S Sedan automatic    1n4a… ca            1     5554
## # ℹ 6 more variables: color <chr>, interior <chr>, seller <chr>, mmr <dbl>,
## #   sellingprice <dbl>, saledate <chr>

#cleaning the data

# Checking for missing values in the dataset
missing_values <- sapply(car_prices, function(x) sum(is.na(x)))
missing_values
##         year         make        model         trim         body transmission 
##            0        10301        10399        10651        13195        65352 
##          vin        state    condition     odometer        color     interior 
##            4            0        11820           94          749          749 
##       seller          mmr sellingprice     saledate 
##            0           38           12           12
# Summary of missing values
cat("Summary of missing values:\n")
## Summary of missing values:
print(missing_values)
##         year         make        model         trim         body transmission 
##            0        10301        10399        10651        13195        65352 
##          vin        state    condition     odometer        color     interior 
##            4            0        11820           94          749          749 
##       seller          mmr sellingprice     saledate 
##            0           38           12           12
# Removing rows with missing values
car_prices_clean <- na.omit(car_prices)

# Checking for duplicates
duplicates <- car_prices_clean[duplicated(car_prices_clean), ]
cat("Number of duplicate rows: ", nrow(duplicates), "\n")
## Number of duplicate rows:  0
# Verifying data types
str(car_prices_clean)
## tibble [472,325 × 16] (S3: tbl_df/tbl/data.frame)
##  $ year        : num [1:472325] 2015 2015 2014 2015 2014 ...
##  $ make        : chr [1:472325] "Kia" "Kia" "BMW" "Volvo" ...
##  $ model       : chr [1:472325] "Sorento" "Sorento" "3 Series" "S60" ...
##  $ trim        : chr [1:472325] "LX" "LX" "328i SULEV" "T5" ...
##  $ body        : chr [1:472325] "SUV" "SUV" "Sedan" "Sedan" ...
##  $ transmission: chr [1:472325] "automatic" "automatic" "automatic" "automatic" ...
##  $ vin         : chr [1:472325] "5xyktca69fg566472" "5xyktca69fg561319" "wba3c1c51ek116351" "yv1612tb4f1310987" ...
##  $ state       : chr [1:472325] "ca" "ca" "ca" "ca" ...
##  $ condition   : num [1:472325] 5 5 45 41 43 1 34 2 42 3 ...
##  $ odometer    : num [1:472325] 16639 9393 1331 14282 2641 ...
##  $ color       : chr [1:472325] "white" "white" "gray" "white" ...
##  $ interior    : chr [1:472325] "black" "beige" "black" "black" ...
##  $ seller      : chr [1:472325] "kia motors america  inc" "kia motors america  inc" "financial services remarketing (lease)" "volvo na rep/world omni" ...
##  $ mmr         : num [1:472325] 20500 20800 31900 27500 66000 ...
##  $ sellingprice: num [1:472325] 21500 21500 30000 27750 67000 ...
##  $ saledate    : chr [1:472325] "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)" "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)" "Thu Jan 15 2015 04:30:00 GMT-0800 (PST)" "Thu Jan 29 2015 04:30:00 GMT-0800 (PST)" ...
##  - attr(*, "na.action")= 'omit' Named int [1:86512] 15 17 23 26 29 36 42 45 52 73 ...
##   ..- attr(*, "names")= chr [1:86512] "15" "17" "23" "26" ...
# viewing the cleaned data
head(car_prices_clean)
## # A tibble: 6 × 16
##    year make   model     trim  body  transmission vin   state condition odometer
##   <dbl> <chr>  <chr>     <chr> <chr> <chr>        <chr> <chr>     <dbl>    <dbl>
## 1  2015 Kia    Sorento   LX    SUV   automatic    5xyk… ca            5    16639
## 2  2015 Kia    Sorento   LX    SUV   automatic    5xyk… ca            5     9393
## 3  2014 BMW    3 Series  328i… Sedan automatic    wba3… ca           45     1331
## 4  2015 Volvo  S60       T5    Sedan automatic    yv16… ca           41    14282
## 5  2014 BMW    6 Series… 650i  Sedan automatic    wba6… ca           43     2641
## 6  2015 Nissan Altima    2.5 S Sedan automatic    1n4a… ca            1     5554
## # ℹ 6 more variables: color <chr>, interior <chr>, seller <chr>, mmr <dbl>,
## #   sellingprice <dbl>, saledate <chr>

##Exploring Average Selling Price and Odometer Data

# Calculating the average odometer and selling price
avg_data <- car_prices_clean %>%
  summarize(
    avg_odometer = mean(odometer, na.rm = TRUE),
    avg_sellingprice = mean(sellingprice, na.rm = TRUE)
  )

# Plotting the graph: Average Odometer (Y-axis) vs. Average Selling Price (X-axis)
ggplot(car_prices_clean, aes(x = sellingprice, y = odometer)) +
  geom_point(color = "blue", alpha = 0.5) +  
  geom_hline(yintercept = avg_data$avg_odometer, linetype = "dashed", color = "red") +  
  geom_vline(xintercept = avg_data$avg_sellingprice, linetype = "dashed", color = "red") +  
  scale_x_continuous(labels = scales::dollar_format()) +  
  scale_y_continuous(labels = scales::comma_format()) +  
  labs(
    title = "Selling Price vs. Odometer",
    x = "Selling Price (USD)",
    y = "Odometer (Miles)"
  ) +
  theme_minimal()

The key observation from the graph is the fact that there is a general trend that vehicles that have high odometer readings tend to have a low selling price which is shown by the low concentration of points in the bottom left corner. This illustrates the fact that a car which has been used a lot would fetch low prices which is expected since it has depreciated over time.

##Exploring Total Cars Sold and Average Selling Price for Each Car Make

# The total number of cars sold and average selling price for each make
summary_data <- car_prices_clean %>%
  group_by(make) %>%
  summarize(
    total_cars_sold = n(),
    avg_sellingprice = mean(sellingprice, na.rm = TRUE)
  )

# Creating the plot 
p <- ggplot(summary_data, aes(x = avg_sellingprice, y = total_cars_sold, color = make)) +
  geom_point(size = 3, alpha = 0.7) +  
  scale_x_continuous(labels = scales::dollar_format()) + 
  scale_y_continuous(labels = scales::comma_format(), name = "Total Number of Cars Sold") +
  labs(
    title = "Total Number of Cars Sold vs. Average Selling Price",
    x = "Average Selling Price (USD)",
    color = "Car Make"
  ) +
  theme_minimal()

# Making the plot interactive
interactive_plot <- ggplotly(p)

# Displaying the interactive plot
interactive_plot

The scatter plot showcases the relationship that exists between the total number of cars sold and their average selling price across different makes. One of the key observations from the data is the fact that there are some car makes such as Dodge and Chevrolet which have sold many cars but at a low selling price. On the other hand, there are luxury brands such as Bentley and Ferrari which have higher selling price but they sell fewer cars. Therefore, the data shows that there are cars which attract the mass market and tend to sell many units while the luxury brands sell fewer units.

##Exploring Average MMR and Selling Prices Over Car Years

# Calculating the average MMR and Selling Price for each Car Year
avg_sellingprice_by_year <- car_prices_clean %>%
  group_by(year) %>%
  summarize(
    avg_mmr = mean(mmr, na.rm = TRUE),
    avg_sellingprice = mean(sellingprice, na.rm = TRUE)
  )

# Reshaping the data to have MMR and Selling Price in a single column
avg_sellingprice_long <- avg_sellingprice_by_year %>%
  pivot_longer(
    cols = c(avg_mmr, avg_sellingprice),
    names_to = "price_type",
    values_to = "price"
  )

# Plotting the graph: Car Year (X-axis) vs. Average Price (Y-axis)
ggplot(avg_sellingprice_long, aes(x = year, y = price, color = price_type, group = price_type)) +
  geom_line(linewidth = 1) +  
  scale_y_continuous(labels = scales::dollar_format()) +  
  labs(
    title = "Average MMR and Selling Price by Car Year",
    x = "Car Year",
    y = "Average Price (USD)",
    color = "Price Type"
  ) +
  theme_minimal()

The similarity between mmr and the average selling price indicates that the mmr is a good indicator of the real market price of vehicles. Therefore, the buyers of the cars in the market are paying the correct price and they are not usually overcharged or undercharged by the sellers.

##Understanding the selling Price basing on the car Condition

# Assuming the column containing the condition ratings is named 'condition'

# Dividing the data into four different categories
car_prices_clean <- car_prices_clean %>%
  mutate(condition_category = case_when(
    condition >= 0 & condition <= 19 ~ "Poor",
    condition > 19 & condition <= 25 ~ "Fair",
    condition > 25 & condition <= 37 ~ "Good",
    condition > 37 & condition <= 49 ~ "Excellent",
    TRUE ~ NA_character_  
  ))

# Calculating the average selling price for each condition category
avg_selling_price_by_condition <- car_prices_clean %>%
  group_by(condition_category) %>%
  summarize(avg_sellingprice = mean(sellingprice, na.rm = TRUE))

# Arranging the different categories
avg_selling_price_by_condition$condition_category <- factor(avg_selling_price_by_condition$condition_category,
                                                            levels = c("Poor", "Fair", "Good", "Excellent"))

# Creating a bar chart 
ggplot(avg_selling_price_by_condition, aes(y = condition_category, x = avg_sellingprice, fill = condition_category)) +
  geom_bar(stat = "identity", width = 0.8) + 
  geom_text(aes(label = scales::dollar(avg_sellingprice)),  
            hjust = 2,  
            vjust = 0.5,  
            color = "black",  
            size = 4) +  
  scale_x_continuous(labels = scales::dollar_format()) +  
  labs(
    title = "Average Selling Price by Car Condition",
    x = "Average Selling Price (USD)",
    y = "Car Condition"
  ) +
  scale_fill_manual(values = c("Poor" = "purple", "Fair" = "blue", "Good" = "orange", "Excellent" = "darkgreen")) +  
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 12),  
    axis.text.x = element_text(size = 12),  
    plot.title = element_text(hjust = 0.5, size = 14)  
  )

The graph shows that cars that are new or in excellent condition tend to fetch higher prices compared to cars of any other condition. However, it can also be noted that the cars that are in poor condition tend to fetch higher prices compared to cars in a fair condition. This indicates that there are buyers who may focus a lot on the cars price without paying attention to factors such as age.

#Conclusion

From the analysis, it is evident that vehicles that are new or those that have been well maintained and in good condition cost more due to market forces that need newer models automobiles with modern features and mechanical soundness. Also, the luxury brands command lower volumes of products but at a higher price per unit as compared to the mass market brands that sell their products at comparatively lower prices but in higher volume. Vehicles in poor condition are actually sold at a higher price than vehicles in a fair condition which may be due to some buyers seeking to meet minimum costs or have a desire to renovate.

##Recommendation

Sellers of newer cars, in better condition, from luxury car brands should also focus on the physical condition of the car, as well as the prestigiousness of the car brand to capture the targeted audience of ‘status symbol’ car buyers. For the mass market brands such as automobiles and consumer electronics focusing on the low price and durability can help attract more customers. If selling older or poorly conditioned vehicles, market your car to those who can appreciate a ‘project car’ or a ‘fixer’ because these categories can still bring decent profit, especially if the car is from a well-established manufacture