The primary reason for examining the prices at which vehicles are being sold is to understand the pricing strategies and market differences. This assists enterprises in identifying market trends, setting reasonable and fair prices, and making sound decisions base on stock and sales management therefore improving the overall profitability and positioning in the market.The analysis mainly focuses on the impact of mileage on selling price, types of cars sold, condition and if the sellers always sell the vehicles at the estimated market price.
The vehicle sales dataset comprises various attributes:
state of registration, condition rating, odometer reading, selling price,
library(ggplot2)
library(dplyr)
library(readxl)
library(plotly)
library(tidyr)
#Loading the data
car_prices <- read_excel("C:/Users/HP/Desktop/Automobile Data/car_prices.xlsx")
#Viewing the data
head(car_prices)
## # A tibble: 6 × 16
## year make model trim body transmission vin state condition odometer
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 2015 Kia Sorento LX SUV automatic 5xyk… ca 5 16639
## 2 2015 Kia Sorento LX SUV automatic 5xyk… ca 5 9393
## 3 2014 BMW 3 Series 328i… Sedan automatic wba3… ca 45 1331
## 4 2015 Volvo S60 T5 Sedan automatic yv16… ca 41 14282
## 5 2014 BMW 6 Series… 650i Sedan automatic wba6… ca 43 2641
## 6 2015 Nissan Altima 2.5 S Sedan automatic 1n4a… ca 1 5554
## # ℹ 6 more variables: color <chr>, interior <chr>, seller <chr>, mmr <dbl>,
## # sellingprice <dbl>, saledate <chr>
#cleaning the data
# Checking for missing values in the dataset
missing_values <- sapply(car_prices, function(x) sum(is.na(x)))
missing_values
## year make model trim body transmission
## 0 10301 10399 10651 13195 65352
## vin state condition odometer color interior
## 4 0 11820 94 749 749
## seller mmr sellingprice saledate
## 0 38 12 12
# Summary of missing values
cat("Summary of missing values:\n")
## Summary of missing values:
print(missing_values)
## year make model trim body transmission
## 0 10301 10399 10651 13195 65352
## vin state condition odometer color interior
## 4 0 11820 94 749 749
## seller mmr sellingprice saledate
## 0 38 12 12
# Removing rows with missing values
car_prices_clean <- na.omit(car_prices)
# Checking for duplicates
duplicates <- car_prices_clean[duplicated(car_prices_clean), ]
cat("Number of duplicate rows: ", nrow(duplicates), "\n")
## Number of duplicate rows: 0
# Verifying data types
str(car_prices_clean)
## tibble [472,325 × 16] (S3: tbl_df/tbl/data.frame)
## $ year : num [1:472325] 2015 2015 2014 2015 2014 ...
## $ make : chr [1:472325] "Kia" "Kia" "BMW" "Volvo" ...
## $ model : chr [1:472325] "Sorento" "Sorento" "3 Series" "S60" ...
## $ trim : chr [1:472325] "LX" "LX" "328i SULEV" "T5" ...
## $ body : chr [1:472325] "SUV" "SUV" "Sedan" "Sedan" ...
## $ transmission: chr [1:472325] "automatic" "automatic" "automatic" "automatic" ...
## $ vin : chr [1:472325] "5xyktca69fg566472" "5xyktca69fg561319" "wba3c1c51ek116351" "yv1612tb4f1310987" ...
## $ state : chr [1:472325] "ca" "ca" "ca" "ca" ...
## $ condition : num [1:472325] 5 5 45 41 43 1 34 2 42 3 ...
## $ odometer : num [1:472325] 16639 9393 1331 14282 2641 ...
## $ color : chr [1:472325] "white" "white" "gray" "white" ...
## $ interior : chr [1:472325] "black" "beige" "black" "black" ...
## $ seller : chr [1:472325] "kia motors america inc" "kia motors america inc" "financial services remarketing (lease)" "volvo na rep/world omni" ...
## $ mmr : num [1:472325] 20500 20800 31900 27500 66000 ...
## $ sellingprice: num [1:472325] 21500 21500 30000 27750 67000 ...
## $ saledate : chr [1:472325] "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)" "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)" "Thu Jan 15 2015 04:30:00 GMT-0800 (PST)" "Thu Jan 29 2015 04:30:00 GMT-0800 (PST)" ...
## - attr(*, "na.action")= 'omit' Named int [1:86512] 15 17 23 26 29 36 42 45 52 73 ...
## ..- attr(*, "names")= chr [1:86512] "15" "17" "23" "26" ...
# viewing the cleaned data
head(car_prices_clean)
## # A tibble: 6 × 16
## year make model trim body transmission vin state condition odometer
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 2015 Kia Sorento LX SUV automatic 5xyk… ca 5 16639
## 2 2015 Kia Sorento LX SUV automatic 5xyk… ca 5 9393
## 3 2014 BMW 3 Series 328i… Sedan automatic wba3… ca 45 1331
## 4 2015 Volvo S60 T5 Sedan automatic yv16… ca 41 14282
## 5 2014 BMW 6 Series… 650i Sedan automatic wba6… ca 43 2641
## 6 2015 Nissan Altima 2.5 S Sedan automatic 1n4a… ca 1 5554
## # ℹ 6 more variables: color <chr>, interior <chr>, seller <chr>, mmr <dbl>,
## # sellingprice <dbl>, saledate <chr>
##Exploring Average Selling Price and Odometer Data
# Calculating the average odometer and selling price
avg_data <- car_prices_clean %>%
summarize(
avg_odometer = mean(odometer, na.rm = TRUE),
avg_sellingprice = mean(sellingprice, na.rm = TRUE)
)
# Plotting the graph: Average Odometer (Y-axis) vs. Average Selling Price (X-axis)
ggplot(car_prices_clean, aes(x = sellingprice, y = odometer)) +
geom_point(color = "blue", alpha = 0.5) +
geom_hline(yintercept = avg_data$avg_odometer, linetype = "dashed", color = "red") +
geom_vline(xintercept = avg_data$avg_sellingprice, linetype = "dashed", color = "red") +
scale_x_continuous(labels = scales::dollar_format()) +
scale_y_continuous(labels = scales::comma_format()) +
labs(
title = "Selling Price vs. Odometer",
x = "Selling Price (USD)",
y = "Odometer (Miles)"
) +
theme_minimal()
The key observation from the graph is the fact that there is a general trend that vehicles that have high odometer readings tend to have a low selling price which is shown by the low concentration of points in the bottom left corner. This illustrates the fact that a car which has been used a lot would fetch low prices which is expected since it has depreciated over time.
##Exploring Total Cars Sold and Average Selling Price for Each Car Make
# The total number of cars sold and average selling price for each make
summary_data <- car_prices_clean %>%
group_by(make) %>%
summarize(
total_cars_sold = n(),
avg_sellingprice = mean(sellingprice, na.rm = TRUE)
)
# Creating the plot
p <- ggplot(summary_data, aes(x = avg_sellingprice, y = total_cars_sold, color = make)) +
geom_point(size = 3, alpha = 0.7) +
scale_x_continuous(labels = scales::dollar_format()) +
scale_y_continuous(labels = scales::comma_format(), name = "Total Number of Cars Sold") +
labs(
title = "Total Number of Cars Sold vs. Average Selling Price",
x = "Average Selling Price (USD)",
color = "Car Make"
) +
theme_minimal()
# Making the plot interactive
interactive_plot <- ggplotly(p)
# Displaying the interactive plot
interactive_plot
The scatter plot showcases the relationship that exists between the total number of cars sold and their average selling price across different makes. One of the key observations from the data is the fact that there are some car makes such as Dodge and Chevrolet which have sold many cars but at a low selling price. On the other hand, there are luxury brands such as Bentley and Ferrari which have higher selling price but they sell fewer cars. Therefore, the data shows that there are cars which attract the mass market and tend to sell many units while the luxury brands sell fewer units.
##Exploring Average MMR and Selling Prices Over Car Years
# Calculating the average MMR and Selling Price for each Car Year
avg_sellingprice_by_year <- car_prices_clean %>%
group_by(year) %>%
summarize(
avg_mmr = mean(mmr, na.rm = TRUE),
avg_sellingprice = mean(sellingprice, na.rm = TRUE)
)
# Reshaping the data to have MMR and Selling Price in a single column
avg_sellingprice_long <- avg_sellingprice_by_year %>%
pivot_longer(
cols = c(avg_mmr, avg_sellingprice),
names_to = "price_type",
values_to = "price"
)
# Plotting the graph: Car Year (X-axis) vs. Average Price (Y-axis)
ggplot(avg_sellingprice_long, aes(x = year, y = price, color = price_type, group = price_type)) +
geom_line(linewidth = 1) +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Average MMR and Selling Price by Car Year",
x = "Car Year",
y = "Average Price (USD)",
color = "Price Type"
) +
theme_minimal()
The similarity between mmr and the average selling price indicates that the mmr is a good indicator of the real market price of vehicles. Therefore, the buyers of the cars in the market are paying the correct price and they are not usually overcharged or undercharged by the sellers.
##Understanding the selling Price basing on the car Condition
# Assuming the column containing the condition ratings is named 'condition'
# Dividing the data into four different categories
car_prices_clean <- car_prices_clean %>%
mutate(condition_category = case_when(
condition >= 0 & condition <= 19 ~ "Poor",
condition > 19 & condition <= 25 ~ "Fair",
condition > 25 & condition <= 37 ~ "Good",
condition > 37 & condition <= 49 ~ "Excellent",
TRUE ~ NA_character_
))
# Calculating the average selling price for each condition category
avg_selling_price_by_condition <- car_prices_clean %>%
group_by(condition_category) %>%
summarize(avg_sellingprice = mean(sellingprice, na.rm = TRUE))
# Arranging the different categories
avg_selling_price_by_condition$condition_category <- factor(avg_selling_price_by_condition$condition_category,
levels = c("Poor", "Fair", "Good", "Excellent"))
# Creating a bar chart
ggplot(avg_selling_price_by_condition, aes(y = condition_category, x = avg_sellingprice, fill = condition_category)) +
geom_bar(stat = "identity", width = 0.8) +
geom_text(aes(label = scales::dollar(avg_sellingprice)),
hjust = 2,
vjust = 0.5,
color = "black",
size = 4) +
scale_x_continuous(labels = scales::dollar_format()) +
labs(
title = "Average Selling Price by Car Condition",
x = "Average Selling Price (USD)",
y = "Car Condition"
) +
scale_fill_manual(values = c("Poor" = "purple", "Fair" = "blue", "Good" = "orange", "Excellent" = "darkgreen")) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 12),
axis.text.x = element_text(size = 12),
plot.title = element_text(hjust = 0.5, size = 14)
)
The graph shows that cars that are new or in excellent condition tend to fetch higher prices compared to cars of any other condition. However, it can also be noted that the cars that are in poor condition tend to fetch higher prices compared to cars in a fair condition. This indicates that there are buyers who may focus a lot on the cars price without paying attention to factors such as age.
#Conclusion
From the analysis, it is evident that vehicles that are new or those that have been well maintained and in good condition cost more due to market forces that need newer models automobiles with modern features and mechanical soundness. Also, the luxury brands command lower volumes of products but at a higher price per unit as compared to the mass market brands that sell their products at comparatively lower prices but in higher volume. Vehicles in poor condition are actually sold at a higher price than vehicles in a fair condition which may be due to some buyers seeking to meet minimum costs or have a desire to renovate.
##Recommendation
Sellers of newer cars, in better condition, from luxury car brands should also focus on the physical condition of the car, as well as the prestigiousness of the car brand to capture the targeted audience of ‘status symbol’ car buyers. For the mass market brands such as automobiles and consumer electronics focusing on the low price and durability can help attract more customers. If selling older or poorly conditioned vehicles, market your car to those who can appreciate a ‘project car’ or a ‘fixer’ because these categories can still bring decent profit, especially if the car is from a well-established manufacture