2025-10-31

The Dataset

This project uses the BMW_sales_data dataset from Kaggle.

  • The data set contains BMW sales from 2010 to 2024. It records model and trim details, including Year, Fuel_Type, Transmission, Engine_Size_L, and Price_USD.

  • The data set has 11 columns (the preview below shows 12 because it also includes a second Fuel_type column). We use it to analyze how mileage, engine size, and market region relate to price, and how sales patterns differ across regions.

## # A tibble: 10 × 12
##    Model     Year Region   Color Fuel_Type Transmission Engine_Size_L Mileage_KM
##    <chr>    <int> <chr>    <chr> <chr>     <chr>                <dbl>      <dbl>
##  1 5 Series  2016 Asia     Red   Petrol    Manual                 3.5     151748
##  2 i8        2013 North A… Red   Hybrid    Automatic              1.6     121671
##  3 5 Series  2022 North A… Blue  Petrol    Automatic              4.5      10991
##  4 X3        2024 Middle … Blue  Petrol    Automatic              1.7      27255
##  5 7 Series  2020 South A… Black Diesel    Manual                 2.1     122131
##  6 5 Series  2017 Middle … Silv… Diesel    Manual                 1.9     171362
##  7 i8        2022 Europe   White Diesel    Manual                 1.8     196741
##  8 M5        2014 Asia     Black Diesel    Automatic              1.6     121156
##  9 X3        2016 South A… White Diesel    Automatic              1.7      48073
## 10 i8        2019 Europe   White Electric  Manual                 3        35700
## # ℹ 4 more variables: Price_USD <dbl>, Sales_Volume <dbl>,
## #   Sales_Classification <chr>, Fuel_type <chr>

Brief Overview

We will use ggplot and plotly visuals plus a simple statistical summary to answer one question: how much do mileage, engine size, and region explain the variation in BMW prices?

Layout:

  • Line chart: Average price by region and year
  • Scatter plot: Price vs Mileage (size = engine size, color = region)
  • Interactive scatter plot: Explore data with hover details
  • Pie charts: Fuel type shares by region
  • 3D plot (plotly): Price vs Engine Size vs Mileage
  • Box plot: Price distribution by region and by Sales_Classification
  • Stats: Five-number summaries by region and a linear model

Ggplot: Average price by region and year

This shows price trends by region over time. Each region gets a line of yearly mean prices, with a point marked for each year.
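A chart of this kind could be built roughly as follows. This is a minimal sketch, not the report's actual code: `bmw_demo` is a small random placeholder data frame (an assumption) that only borrows the column names Region, Year, and Price_USD from the preview, and the aggregation uses base R `aggregate()`.

```r
library(ggplot2)

# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region    = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Year      = sample(2010:2024, n, replace = TRUE),
  Price_USD = round(runif(n, 30000, 120000))
)

# Mean price for every Region-Year combination.
avg_price <- aggregate(Price_USD ~ Region + Year, data = bmw_demo, FUN = mean)

# One line per region, with a point marking each yearly mean.
p_line <- ggplot(avg_price, aes(x = Year, y = Price_USD, colour = Region)) +
  geom_line() +
  geom_point() +
  labs(title = "Average price by region and year",
       x = "Year", y = "Mean price (USD)")

# Render with print(p_line).
```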

Ggplot: Price vs Mileage (3 variables)

Scatter plot of Price against Mileage; point size represents engine size and color indicates region.
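The mapping described here (two positional variables plus size and color) might look like the sketch below. `bmw_demo` is again random placeholder data (an assumption) using the column names from the preview; the alpha and size range are illustrative choices.

```r
library(ggplot2)

# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region        = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Mileage_KM    = round(runif(n, 0, 200000)),
  Engine_Size_L = round(runif(n, 1.5, 5), 1),
  Price_USD     = round(runif(n, 30000, 120000))
)

# x/y carry mileage and price; size and colour carry two further variables.
p_scatter <- ggplot(
  bmw_demo,
  aes(x = Mileage_KM, y = Price_USD, size = Engine_Size_L, colour = Region)
) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(range = c(1, 5)) +
  labs(title = "Price vs Mileage",
       x = "Mileage (km)", y = "Price (USD)", size = "Engine size (L)")

# Render with print(p_scatter).
```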

Plotly: Interactive scatter

Hover over a point to see the model, year, and specs. Click entries in the legend to show or hide regions.
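One way to get this hover behaviour is to build the tooltip text directly in `plot_ly()`. The sketch below assumes that approach (the report may instead have converted a ggplot with `ggplotly()`), and `bmw_demo` is random placeholder data that reuses the preview's column names.

```r
library(plotly)

# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Model         = sample(c("5 Series", "X3", "i8"), n, replace = TRUE),
  Year          = sample(2010:2024, n, replace = TRUE),
  Region        = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Mileage_KM    = round(runif(n, 0, 200000)),
  Engine_Size_L = round(runif(n, 1.5, 5), 1),
  Price_USD     = round(runif(n, 30000, 120000))
)

# One trace per region, so each legend entry can be toggled on or off.
p_interactive <- plot_ly(
  bmw_demo,
  x = ~Mileage_KM, y = ~Price_USD, color = ~Region,
  type = "scatter", mode = "markers",
  # Tooltip: model, year, and engine size; x and y are appended by hoverinfo.
  text = ~paste0(Model, " (", Year, ")<br>Engine: ", Engine_Size_L, " L"),
  hoverinfo = "text+x+y"
)

# Render with print(p_interactive) or by typing p_interactive at the console.
```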

Plotly 3D plot

3D scatter of price (z) by Mileage (x) and Engine size (y), colored by Region.
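A 3D scatter with these axis assignments could be sketched as follows, assuming plotly's `scatter3d` trace type. `bmw_demo` is random placeholder data (an assumption), and the marker size is an illustrative choice.

```r
library(plotly)

# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region        = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Mileage_KM    = round(runif(n, 0, 200000)),
  Engine_Size_L = round(runif(n, 1.5, 5), 1),
  Price_USD     = round(runif(n, 30000, 120000))
)

# Mileage on x, engine size on y, price on z; colour separates regions.
p_3d <- plot_ly(
  bmw_demo,
  x = ~Mileage_KM, y = ~Engine_Size_L, z = ~Price_USD, color = ~Region,
  type = "scatter3d", mode = "markers",
  marker = list(size = 3)
) %>%
  layout(scene = list(
    xaxis = list(title = "Mileage (km)"),
    yaxis = list(title = "Engine size (L)"),
    zaxis = list(title = "Price (USD)")
  ))

# Render with print(p_3d) or by typing p_3d at the console.
```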

Pie Charts

Fuel type shares by region (Electric, Hybrid, Petrol, Diesel, etc.).
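ggplot2 has no dedicated pie geom, so a common approach is a stacked bar drawn in polar coordinates, with one facet per region. The sketch below assumes that approach and uses random placeholder data (`bmw_demo`); the four fuel types are the ones visible in the preview.

```r
library(ggplot2)

# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region    = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Fuel_Type = sample(c("Petrol", "Diesel", "Hybrid", "Electric"), n, replace = TRUE)
)

# Count cars per Region x Fuel_Type, then convert counts to within-region shares.
counts <- as.data.frame(table(Region = bmw_demo$Region,
                              Fuel_Type = bmw_demo$Fuel_Type))
counts$Share <- counts$Freq / ave(counts$Freq, counts$Region, FUN = sum)

# A stacked bar wrapped into a circle gives one pie per region.
p_pie <- ggplot(counts, aes(x = "", y = Share, fill = Fuel_Type)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~Region) +
  theme_void() +
  labs(title = "Fuel type shares by region", fill = "Fuel type")

# Render with print(p_pie).
```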

Ggplot Boxplot: Price by region

Ggplot Boxplot: Price by Sales_Classification
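Both box plots follow the same pattern and differ only in the grouping variable. In the sketch below, `bmw_demo` is random placeholder data, and the Sales_Classification levels "High" and "Low" are assumed for illustration since the report does not list them.

```r
library(ggplot2)

# Placeholder data (assumption): random values; "High"/"Low" levels are assumed.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region               = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Sales_Classification = sample(c("High", "Low"), n, replace = TRUE),
  Price_USD            = round(runif(n, 30000, 120000))
)

# Price distribution within each region.
p_box_region <- ggplot(bmw_demo, aes(x = Region, y = Price_USD)) +
  geom_boxplot() +
  labs(title = "Price by region", y = "Price (USD)")

# Price distribution within each sales classification.
p_box_class <- ggplot(bmw_demo, aes(x = Sales_Classification, y = Price_USD)) +
  geom_boxplot() +
  labs(title = "Price by Sales_Classification", y = "Price (USD)")

# Render with print(p_box_region) and print(p_box_class).
```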

Stats Analysis

Compute five-number summaries by region and fit a simple linear model explaining price by mileage, engine size, year, and sales volume.

## # A tibble: 6 × 7
##   Region          Min     Q1 Median     Q3    MAX     n
##   <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <int>
## 1 Africa        30002 52114   74742 97181  119997  8253
## 2 Asia          30000 52931.  76292 98348. 119997  8454
## 3 Europe        30028 52488.  74830 97405. 119985  8334
## 4 Middle East   30008 51953   74364 97478  119998  8373
## 5 North America 30032 52470.  74898 97917  119992  8335
## 6 South America 30020 52629   74935 97253  119986  8251
## 
## Call:
## lm(formula = Price_USD ~ Mileage_KM + Engine_Size_L + Year + 
##     Sales_Volume, data = bmw)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -45310 -22606      6  22605  45185 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    3.174e+04  5.423e+04   0.585    0.558
## Mileage_KM    -1.916e-03  2.007e-03  -0.954    0.340
## Engine_Size_L  3.414e+00  1.152e+02   0.030    0.976
## Year           2.155e+01  2.689e+01   0.802    0.423
## Sales_Volume   7.300e-04  4.070e-02   0.018    0.986
## 
## Residual standard error: 26000 on 49995 degrees of freedom
## Multiple R-squared:  3.084e-05,  Adjusted R-squared:  -4.917e-05 
## F-statistic: 0.3854 on 4 and 49995 DF,  p-value: 0.8192
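Output of this shape could be produced along the following lines. This is a base-R sketch on random placeholder data (`bmw_demo`, an assumption), so its numbers will differ from the output above; only the model formula is taken from the `Call:` line.

```r
# Placeholder data (assumption): random values, real column names only.
set.seed(1)
n <- 300
bmw_demo <- data.frame(
  Region        = sample(c("Asia", "Europe", "North America"), n, replace = TRUE),
  Year          = sample(2010:2024, n, replace = TRUE),
  Mileage_KM    = round(runif(n, 0, 200000)),
  Engine_Size_L = round(runif(n, 1.5, 5), 1),
  Sales_Volume  = sample(100:10000, n, replace = TRUE),
  Price_USD     = round(runif(n, 30000, 120000))
)

# Five-number summary (plus group size) of price within each region.
five_num <- do.call(rbind, lapply(
  split(bmw_demo$Price_USD, bmw_demo$Region),
  function(p) {
    q <- quantile(p, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
    data.frame(Min = q[1], Q1 = q[2], Median = q[3], Q3 = q[4], Max = q[5],
               n = length(p))
  }
))
five_num$Region <- rownames(five_num)
print(five_num)

# Same formula as in the Call: line of the model output.
fit <- lm(Price_USD ~ Mileage_KM + Engine_Size_L + Year + Sales_Volume,
          data = bmw_demo)
print(summary(fit))
```

When reading the summary, the p-value column and the R-squared line are the quickest check on whether the predictors explain any of the price variation.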

Conclusion

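Overall, the data give little support for the expected price relationships. The linear model's coefficients have the expected signs, negative for mileage and positive for engine size, but none is statistically significant (all p-values exceed 0.3), and the model explains almost none of the variance in price (R-squared ≈ 0.00003). Regional differences are modest: median prices range from 74,364 USD in the Middle East to 76,292 USD in Asia, and the minimum and maximum prices are nearly identical in every region. Fuel type distribution is fairly balanced, with hybrid and electric models showing steady growth. Taken together, these results suggest that mileage, engine size, and market region explain very little of the variation in BMW prices in this data set.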