2025-11-02

The Dataset

- This project analyzes used-car listings from autos.csv (Germany, eBay Kleinanzeigen-style data).
- Key variables include price, kilometer, powerPS, yearOfRegistration, vehicleType, gearbox, fuelType, and brand.
- We created two helper fields: age (2016 - yearOfRegistration) and km_per_year.
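
The helper fields could be derived along these lines. This is a minimal base R sketch on made-up rows, not the deck's actual code: the definition of km_per_year as kilometer divided by age, and the pmax() guard against a zero age, are assumptions, since the slides do not spell them out.

```r
# Toy rows standing in for autos.csv; the values are invented for illustration.
cars <- data.frame(
  price = c(1500, 7200, 12900),
  kilometer = c(150000, 90000, 40000),
  yearOfRegistration = c(2001, 2010, 2014)
)

# Age relative to 2016, as stated on the slide.
cars$age <- 2016 - cars$yearOfRegistration

# Assumed definition: kilometers driven per year of age.
# pmax(age, 1) avoids dividing by zero for cars registered in 2016.
cars$km_per_year <- cars$kilometer / pmax(cars$age, 1)

print(cars)
```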

Brief Overview

We will explore the data with several visualizations and a brief summary table:

  • Histogram (ggplot2): Distribution of prices after cleaning.
  • Scatter plot (ggplot2): Price vs. Power, colored by Fuel Type (3 variables).
  • Pie Chart (plotly): Share of listings by vehicle type.
  • Box Plot (ggplot2): Price distribution by vehicle type.
  • 3D Scatter (plotly): Price ~ Power & Kilometers, colored by Fuel Type.
  • Statistical Analysis: Five-number summaries of price by vehicle type.

Ggplot Price Distribution

A quick look at the distribution of asking prices (post-cleaning) to understand scale and skew.
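
A histogram of this kind might be built roughly as follows. The sketch uses invented prices, and both the bin count and the log-scaled x-axis are assumptions chosen to cope with right skew; the original plot may use different settings.

```r
library(ggplot2)

# Invented prices standing in for the cleaned price column.
cars <- data.frame(
  price = c(500, 900, 1200, 1800, 2500, 3400, 4800, 7500, 12000, 26000)
)

p_hist <- ggplot(cars, aes(x = price)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  scale_x_log10() +  # assumed: a log scale makes a long right tail easier to read
  labs(title = "Distribution of asking prices",
       x = "Price (log scale)", y = "Listings")

p_hist
```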

Ggplot Scatter (3 variables)

Price vs. engine power (PS), colored by fuel type to show how the price-power relationship clusters differently for each fuel.
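
One way to map three variables onto a single scatter plot is sketched below. The rows are made up and the fuelType values are illustrative; only the column names come from the dataset description.

```r
library(ggplot2)

# Invented rows; the column names follow the variables listed for autos.csv.
cars <- data.frame(
  powerPS = c(60, 75, 105, 140, 190, 245),
  price = c(900, 1800, 4200, 7900, 15500, 28000),
  fuelType = c("benzin", "benzin", "diesel", "diesel", "benzin", "diesel")
)

# x and y carry power and price; color carries the third variable.
p_scatter <- ggplot(cars, aes(x = powerPS, y = price, color = fuelType)) +
  geom_point(alpha = 0.6) +  # partial transparency helps when points overlap
  labs(title = "Price vs. power by fuel type",
       x = "Power (PS)", y = "Price", color = "Fuel type")

p_scatter
```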

Plotly Pie Chart

Share of listings by vehicle type. Useful for context on the composition of the dataset.
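
A plotly pie chart of this sort could be produced from a table of counts, as in the sketch below. The counts are invented, and the assumption is that listings were tallied per vehicleType beforehand.

```r
library(plotly)

# Invented counts per vehicle type; real counts would come from the cleaned data.
type_counts <- data.frame(
  vehicleType = c("limousine", "kleinwagen", "kombi", "suv"),
  n = c(40, 30, 20, 10)
)

# labels names each slice, values sets its size.
p_pie <- plot_ly(type_counts, labels = ~vehicleType, values = ~n, type = "pie")

p_pie
```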

Ggplot Box Plot

Distribution of prices across vehicle types to compare medians and spread; extreme prices were removed in the cleaning step.
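
A grouped box plot like this might look as follows. The data are invented, and ordering the vehicle types by median price plus flipping the axes are presentation assumptions, not necessarily what the original plot does.

```r
library(ggplot2)

# Invented prices for three vehicle types, four listings each.
cars <- data.frame(
  vehicleType = rep(c("kleinwagen", "limousine", "suv"), each = 4),
  price = c(700, 1500, 2200, 3900,
            1400, 3300, 5200, 8100,
            5400, 9800, 13000, 18000)
)

# reorder() sorts the types by median price so the boxes read low to high.
p_box <- ggplot(cars, aes(x = reorder(vehicleType, price, FUN = median), y = price)) +
  geom_boxplot() +
  coord_flip() +  # horizontal boxes keep long type names readable
  labs(title = "Price by vehicle type", x = "Vehicle type", y = "Price")

p_box
```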

Plotly 3D plot

3D scatter of Price, Power (PS), and Kilometers; color encodes Fuel Type.
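
A 3D scatter of this shape can be sketched with plotly as below. The rows are invented, and the axis assignment (power on x, kilometers on y, price on z) is an assumption.

```r
library(plotly)

# Invented rows with three fuel types; values are for illustration only.
cars <- data.frame(
  powerPS = c(60, 90, 110, 150, 200, 75),
  kilometer = c(150000, 125000, 90000, 60000, 30000, 100000),
  price = c(1200, 2900, 5600, 11000, 24000, 3100),
  fuelType = c("benzin", "diesel", "benzin", "diesel", "benzin", "lpg")
)

p_3d <- plot_ly(
  cars,
  x = ~powerPS, y = ~kilometer, z = ~price,
  color = ~fuelType,  # one color per fuel type
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
)

# Axis titles for a 3D plot live under layout(scene = ...).
p_3d <- layout(
  p_3d,
  scene = list(
    xaxis = list(title = "Power (PS)"),
    yaxis = list(title = "Kilometers"),
    zaxis = list(title = "Price")
  )
)

p_3d
```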

Statistical analysis

Five-number summary of Price by Vehicle Type to compare levels and spread. The row labeled "" covers listings with no vehicle type recorded; "andere" is German for "other".

## # A tibble: 9 × 6
##   vehicleType    Min    Q1 Median    Q3    Max
##   <fct>        <dbl> <dbl>  <dbl> <dbl>  <dbl>
## 1 ""               1   400    999  2650 130000
## 2 "andere"         1  1000   2500  5200  99990
## 3 "bus"            1  2300   4500  8500 109000
## 4 "cabrio"         1  3200   6500 13100 149999
## 5 "coupe"          1  2000   5500 14800 149500
## 6 "kleinwagen"     1   799   1599  3699 111111
## 7 "kombi"          1  1500   3500  7999 145000
## 8 "limousine"      1  1490   3399  7950 149000
## 9 "suv"            1  5500  10800 17500 127500
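
A summary with the same columns as the table above could be computed with dplyr roughly as follows. This is a sketch on invented prices; it assumes the quartiles come from quantile() with R's default method, which the slides do not confirm.

```r
library(dplyr)

# Invented prices for two vehicle types, five listings each.
cars <- tibble(
  vehicleType = rep(c("kleinwagen", "suv"), each = 5),
  price = c(500, 1000, 1500, 2000, 2500,
            4000, 8000, 12000, 16000, 20000)
)

# One row per vehicle type: minimum, quartiles, median, and maximum.
price_summary <- cars %>%
  group_by(vehicleType) %>%
  summarise(
    Min = min(price),
    Q1 = quantile(price, 0.25, names = FALSE),
    Median = median(price),
    Q3 = quantile(price, 0.75, names = FALSE),
    Max = max(price),
    .groups = "drop"
  )

print(price_summary)
```

Note that base R's fivenum() reports hinges rather than quantile()-style quartiles, so the two approaches can differ slightly on small groups.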

Conclusion and Thanks!

- Prices rise with power (PS) and tend to fall with high kilometers and age.
- Body style and fuel type exhibit clear differences in both median price and dispersion.
- The 3D view helps reveal clusters/outliers that are harder to see in 2D.