Final Project BAIS 462

Author

Chase Gray

Introduction

This is my introduction as a graduating member of the Business Analytics major from Xavier University! In this post, I will be exploring the intricacies of car production. What drives the changes manufacturers make to their cars? Performance? Fuel efficiency? Practicality? I’m sure all of those influence manufacturers to some extent, but each segment has its own manufacturers competing to release the best vehicle for its customer base. I will be exploring the high-performance segment. Even though these vehicles are way out of my budget, I am interested in how people justify the high price tags and the maintenance that come with owning these luxury cars.

Primary Data Set

The data set records 5,000 vehicles with manufacturing years ranging from 1980 to 2025. In this analysis, I will be using some of its variables to dig into the lure of exotic cars. The data comes from Kaggle, a data-sharing website. Props to the creator for making this data public! A link to the data set is listed here for those who wish to do their own analysis (it might make you sign up, though):

https://www.kaggle.com/datasets/wlwwwlw/elite-sports-cars-in-data

Data Dictionary

This data set contains the following variables:

Variable          Description
Brand             Car manufacturer
Model             Specific model of the car
Year              Manufacturing year
Country           Region of production
Condition         Car condition
Engine Size       Engine displacement in liters
Horsepower        Engine power output in horsepower
Torque            Torque produced by the engine in newton-meters
Weight            Vehicle weight in kilograms
Top speed         Maximum speed of the car in kilometers/hour
Acceleration      Time taken to accelerate from 0 to 100 kilometers/hour, in seconds
Fuel Type         Type of fuel used: Petrol, Diesel, or Electric
Drivetrain        Type of drive system: RWD, AWD, or FWD
Transmission      Transmission type: Manual, Automatic, DCT, or CVT
Fuel Efficiency   Fuel consumption in liters per 100 kilometers
CO2 Emissions     Carbon dioxide emissions in grams/kilometer
Price             Vehicle price in USD
Mileage           Total kilometers driven
Popularity        Market popularity: Low, Medium, or High
Safety Rating     Safety rating: 1 to 5 stars
Number of owners  Number of previous owners
Market Demand     Estimated market demand: Low, Medium, or High
Insurance Cost    Estimated yearly insurance cost in USD
Production Units  Number of units produced (indicates rarity)
Log Price         Log-transformed price for better regression modeling
Log Mileage       Log-transformed mileage for better statistical analysis
Modification      Special edition or modification type

Price

Price is an important factor in this analysis because higher-end cars should carry higher price tags. We can use the price distribution to see what people are willing to pay for high-end cars.

When working with a lot of data like this, it is important to make the visualizations easy on the eyes. To accomplish this, I grouped the price data by brand. I also adjusted the bin range and other details to make the visualization easier to read.
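The grouping and binning step can be sketched roughly as follows. This is a minimal illustration, not the code behind the chart: the sample rows are made up, and the bin width of $50,000 and the helper names are assumptions, since the post does not state them.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows (brand, price in USD); not values from the Kaggle data.
cars = [
    ("Porsche", 95_000),
    ("Porsche", 120_000),
    ("Porsche", 310_000),
    ("Ferrari", 260_000),
    ("Ferrari", 275_000),
    ("Ferrari", 520_000),
]

BIN_WIDTH = 50_000  # assumed bin width; the value used in the post is not stated


def bin_start(price, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains this price."""
    return (price // width) * width


def price_histogram_by_brand(rows, width=BIN_WIDTH):
    """Count how many cars of each brand fall into each price bin."""
    counts = defaultdict(Counter)
    for brand, price in rows:
        counts[brand][bin_start(price, width)] += 1
    return counts


histogram = price_histogram_by_brand(cars)
for brand, bins in histogram.items():
    for start in sorted(bins):
        print(f"{brand}: ${start:,} to ${start + BIN_WIDTH:,}: {bins[start]} car(s)")
```

A wider bin smooths out brand-to-brand noise, while a narrower one shows more detail but leaves more empty bars, so the width is worth experimenting with.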

In this visualization, we have the distribution of price split by manufacturer. Surprisingly, the brands all show similar patterns. The first is a higher frequency around the $100,000 range than anywhere below it. The frequency then stays fairly consistent, rising or falling slightly, until around the $500,000 mark, where it drops significantly. This could be because a lot of these brands only have a few vehicles priced above $500,000, though there is still a market for those vehicles. The small number of people willing to spend half a million dollars on a car is probably what keeps the vehicle count low. There is plenty of market for vehicles in the $100,000 to $400,000 range, though, so companies could leverage this to introduce new products that people would be interested in.

Returning to the initial question of whether price affects which car is purchased, I believe there is a slight impact. Since almost all, if not all, manufacturers slowly increase in frequency as they reach the $200,000 to $400,000 range, there is plenty of reason to believe people are willing to spend that kind of money on a vehicle.

Drivetrain

Most people who buy high-end vehicles do so because they enjoy driving. A popular belief is that sports cars should be rear-wheel drive because it makes for more fun: drifting, less grip on the road, and more responsive steering and acceleration. We can see whether the manufacturers think the same.

To label the chart with percentages, I created a new object that calculates the percentage of cars with each drivetrain. Once calculated, the percentages could easily be added to the visualization.
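A percentage calculation like this one can be sketched as below. The ten drivetrain values are invented for illustration, and the function and variable names are assumptions rather than the names used in the original analysis.

```python
from collections import Counter

# Hypothetical drivetrain values for ten cars; not taken from the Kaggle data.
drivetrains = ["RWD", "AWD", "RWD", "FWD", "AWD", "RWD", "FWD", "AWD", "RWD", "FWD"]


def drivetrain_shares(values):
    """Return each drivetrain's share of all cars, as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


shares = drivetrain_shares(drivetrains)

# Text labels that could be placed on the bars or slices of a chart.
labels = {name: f"{share:.1f}%" for name, share in shares.items()}
```

Keeping the numeric shares separate from the formatted labels means the same numbers can drive both the bar heights and the text on top of them.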

We can see from this visualization that the cars are split fairly evenly across the different drivetrains. As expected, rear-wheel drive is the leader, but it is not as far ahead as we would think. All-wheel drive is the next closest, only .01 percent behind. This makes sense because all-wheel-drive cars have some of the quickest acceleration: sending power to all the wheels makes it easier to grip the road than with the other two drivetrains. It is clear that there is no large difference between drivetrains. Since the popular assumption turns out not to hold, it raises the question: should manufacturers start prioritizing the driving thrill of rear-wheel drive over the practicality of all-wheel-drive and front-wheel-drive cars?

Number of Owners and Price

It would be an easy assumption that the number of owners affects the average price of the car, right?

Since the prices are actually quite similar no matter how many owners a car has had, I added the quartile values to the visualization to make the box plot easier to read.
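The quartile labels for a box plot can be computed per group along these lines. The rows are made-up examples, and the choice of the "inclusive" quartile method is an assumption; different plotting tools compute quartiles slightly differently, so the numbers may not match a given chart exactly.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (number_of_owners, price in USD) rows; not values from the Kaggle data.
rows = [
    (1, 100_000), (1, 200_000), (1, 300_000), (1, 400_000), (1, 500_000),
    (2, 150_000), (2, 250_000), (2, 350_000), (2, 450_000),
]


def price_quartiles_by_owners(data):
    """Return (Q1, median, Q3) of price for each owner count."""
    groups = defaultdict(list)
    for owners, price in data:
        groups[owners].append(price)
    # method="inclusive" treats the smallest and largest prices in each group
    # as the 0th and 100th percentiles and interpolates between data points.
    return {
        owners: tuple(quantiles(prices, n=4, method="inclusive"))
        for owners, prices in groups.items()
    }


quartile_labels = price_quartiles_by_owners(rows)
```

If the medians and quartiles line up closely across owner counts, as the post describes, the boxes will look nearly identical and the printed values become the easiest way to compare them.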

We can see that the prices of these vehicles are quite similar, no matter how many owners they previously had. Previous ownership has very little impact on price, and there are a couple of possible explanations. The first is that buyers assume the previous owners took good care of the very expensive cars on this list, which isn’t very outlandish to think. Another could be that these cars appreciate over time. This one is probably more of a stretch than the other, but cars that are well made, well cared for, and more of a collector’s item typically have pretty good resale value, contrary to the “depreciating asset” most people have become accustomed to when talking about cars.

Mileage and Price

Similar to the last visualization, it is natural to assume that more mileage means a lower price. However, the number of owners did not have much bearing on price, so maybe this assumption will also be wrong.

This visualization seems very messy, but I think the scatter plot with a trend line is the best way to show the relationship with a large amount of data.
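One simple way to put a trend line through a scatter plot is an ordinary least-squares fit, sketched below. This is a simplification: the post does not say how its trend line was computed, and a smoothed curve would behave differently from the straight line assumed here. The data points and function name are illustrative only.

```python
from statistics import mean

# Hypothetical (mileage in km, price in USD) pairs; not values from the Kaggle data.
points = [(10_000, 250_000), (20_000, 260_000), (30_000, 240_000), (40_000, 250_000)]


def fit_trend_line(pairs):
    """Fit price = intercept + slope * mileage by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of mileage and price divided by the variance of mileage.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_trend_line(points)
```

A slope close to zero, measured in dollars per kilometer, is what a nearly flat trend line looks like numerically.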

We can see that the trend line in this scatter plot is not very wavy. If anything, the trend line fluctuates more as mileage increases. This is an interesting discovery because, for the typical daily driver, mileage largely determines how low the price goes, among other dealership shenanigans. Similar to the last visual, mileage has very little influence on price for these high-end cars. For anyone lucky enough to have one of these cars, it is good to know you would not have to take a hit on equity if you ever wanted to sell it. Maintenance is another issue entirely, so keeping the car may be more expensive than selling it in the long run.

Condition and Price

I believe the vehicle’s condition is the most influential variable when drawing a connection to price. Collector cars are among the most expensive and are typically restored, so I would expect restored vehicles to have the highest values. Used car prices should be lower than new car prices, but we saw very little correlation between price and either mileage or number of owners, so this should be an interesting one.

In order to add the label value, I had to calculate the average price per group and input it into the graph.
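The label calculation can be sketched as a group average followed by dollar formatting. The listings below are invented purely to show the mechanics, and the function name is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (condition, price in USD) rows; not values from the Kaggle data.
listings = [
    ("New", 100_000), ("New", 200_000),
    ("Used", 120_000), ("Used", 160_000),
    ("Restored", 300_000), ("Restored", 310_000),
]


def average_price_labels(rows):
    """Return a formatted average-price label for each condition."""
    groups = defaultdict(list)
    for condition, price in rows:
        groups[condition].append(price)
    # Round to whole dollars and add thousands separators for readability.
    return {
        condition: f"${mean(prices):,.0f}"
        for condition, prices in groups.items()
    }


price_labels = average_price_labels(listings)
```

Each label can then be drawn above its bar so that small differences between groups are visible without reading them off the axis.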

As we can see, restored had the highest value. However, it was not much further ahead of used and new. Used being more expensive than new is mind-blowing. It is crazy that buying a used high-end car is more expensive than getting a new one. It corroborates my idea about these cars appreciating over time, but I didn’t think there would actually be a $6,000 difference between the average used car and the average new one. If you are looking to buy a sports car, it may be worth buying new instead of used, because buying used is not cheaper on average.

Secondary source

Google Places API

For my secondary source, I will be using Google’s APIs to supplement my larger data set. In the Google API data, the variables are as follows:

Data Dictionary

Variable               Description
Name                   The name of the dealership
Vicinity               Address of the dealership
Geometry.location.lat  Latitude coordinate of the dealership
Geometry.location.lng  Longitude coordinate of the dealership
Place_id               Unique identifier for this business
Types                  What kind of business it is
Icon                   The image icon for this business
Ratings                The average rating for the dealership
Total_ratings_total    Total number of ratings for the dealership
Brand                  The brand this dealership primarily sells

I will be using the ratings of the local businesses to evaluate which brands are the easiest to buy from at a dealership. Since we established earlier that it may be cheaper to buy a new car, working with a dealership may be unavoidable.
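For readers who want to collect similar data, here is a rough sketch of building a request and flattening one result into the columns listed above. The post does not say which endpoint was called; this assumes the Nearby Search endpoint of the Places API. The Cincinnati coordinates, search radius, helper names, and sample result are all illustrative assumptions, and the API's own field names (`rating`, `user_ratings_total`) may differ from the column names used in this post.

```python
from urllib.parse import urlencode

# Assumed endpoint: Places API Nearby Search (not confirmed by the post).
NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def build_nearby_search_url(api_key, keyword, lat=39.1031, lng=-84.5120, radius_m=40_000):
    """Build a Nearby Search URL for car dealers around an assumed Cincinnati center."""
    params = {
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "type": "car_dealer",
        "keyword": keyword,  # e.g. the brand name being searched for
        "key": api_key,
    }
    return f"{NEARBY_SEARCH_URL}?{urlencode(params)}"


def flatten_place(place, brand):
    """Flatten one result dict into a single row, adding the brand searched for."""
    location = place.get("geometry", {}).get("location", {})
    return {
        "name": place.get("name"),
        "vicinity": place.get("vicinity"),
        "geometry.location.lat": location.get("lat"),
        "geometry.location.lng": location.get("lng"),
        "place_id": place.get("place_id"),
        "types": place.get("types"),
        "icon": place.get("icon"),
        "rating": place.get("rating"),
        "user_ratings_total": place.get("user_ratings_total"),
        "brand": brand,
    }


# Hypothetical result shaped like a Nearby Search entry; not a real dealership.
sample_place = {
    "name": "Example Motors",
    "vicinity": "123 Example St, Cincinnati",
    "geometry": {"location": {"lat": 39.1, "lng": -84.5}},
    "place_id": "example-id",
    "types": ["car_dealer"],
    "rating": 4.6,
    "user_ratings_total": 321,
}

url = build_nearby_search_url("YOUR_API_KEY", "Lexus")
row = flatten_place(sample_place, brand="Lexus")
```

The brand column is not something the API returns; in this sketch it is simply the keyword that was searched for, attached to each row afterwards.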

Ratings

We will be exploring the ratings of luxury dealerships in the Cincinnati area. Hopefully this can point us toward the brand that usually has the best customer experience, since we are spending up to $400,000 on the car itself.

I needed to create an object that calculates the average rating per brand, which could then be used in the visualization.

We can see from this visualization that, generally, if we wanted to buy a luxury car in Cincinnati from one of these five brands, there wouldn’t be much difference when it comes to dealership experience. It’s not very surprising that Lexus has the highest satisfaction, as Lexus is Toyota’s luxury brand and Toyotas are typically lower maintenance and cheaper to maintain. However, Audi and Porsche, which are owned by the same group, slightly edge out BMW and Mercedes in ratings. It seems as though Mercedes and BMW are on the lower end of the reviews because they are typically more expensive to maintain. In any case, if we were to buy a luxury car from a dealer, we would want it to be a Lexus dealer or a Volkswagen Group dealer (Audi and Porsche).

Number of Reviews

We will be seeing whether the number of reviews has any sway over the average rating by brand that we looked at above. A lack of reviews can really impact the overall rating because there are fewer reviews to dilute the bad ones (and bad reviews are bound to happen).

Similar to above, I calculated the average rating and average number of reviews by brand. They can then be used in the visualization.
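A per-brand summary of both measures can be sketched like this. The dealerships are placeholders with invented numbers, the brand names are deliberately generic, and the function name is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (brand, rating, review_count) rows; not real Google Places results.
dealerships = [
    ("Brand A", 4.8, 1200), ("Brand A", 4.6, 800),
    ("Brand B", 4.5, 300), ("Brand B", 4.1, 500),
]


def summarize_by_brand(rows):
    """Return the average rating and average review count for each brand."""
    grouped = defaultdict(list)
    for brand, rating, reviews in rows:
        grouped[brand].append((rating, reviews))
    summary = {}
    for brand, values in grouped.items():
        summary[brand] = {
            "avg_rating": round(mean(r for r, _ in values), 2),
            "avg_reviews": mean(n for _, n in values),
        }
    return summary


summary = summarize_by_brand(dealerships)
```

This takes a simple average across dealerships, so a dealership with 50 reviews counts as much as one with 2,000. Weighting each rating by its review count is an alternative worth considering when review volumes differ a lot.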

This visualization clarifies the question we raised in the previous analysis of ratings. Even though BMW and Mercedes have lower average ratings, they do have more ratings than Audi and Porsche. However, Porsche and Audi have the edge over BMW and Mercedes in quality of ratings. Funnily enough, Lexus dominates the other four manufacturers in both quality and quantity. Lexus definitely seems like the way to go if you’re going to buy a high-end car. Obviously, bad reviews can be deleted or changed, but for this analysis we are going to assume the dealerships are not deleting their bad reviews to improve their ratings.

Final Thoughts

After reviewing loads of information, I would like to give my final thoughts on the high-end car segment. As a college student getting ready to graduate, these cars aren’t exactly in my near future. A guy can window shop, though. It doesn’t hurt to be “in the know” about the segment, and being “in the know” can lead to other opportunities later in life, so I’m glad I spent the time diving into this topic.

We learned from the first data set that price does not vary much with mileage, number of owners, or condition. However, the most popular and expensive cars are those that are restored. This can be anything from old cars being restored to newer cars being changed to suit buyer preference. Also, there is not much difference between drivetrains, probably thanks to the introduction of electric and hybrid cars.

Though there is more data in the first source, the second source gives some important local information about dealers. Knowing who you should buy your vehicle from, no matter the condition, is important before beginning the search for your new vehicle.