Final Report

Authors

Sarp Ata Kanca, Yagmur Beren Sengezken

ECON 465 Stage 3

Predicting Used Vehicle Prices


Economic Question

Research Question

To what extent do vehicle age, mileage, condition, and engine size predict the market price of a used vehicle?

Why Does It Matter?

  • Used vehicle markets are economically important.
  • Consumers need accurate price expectations.
  • Sellers need efficient pricing strategies.
  • Vehicle depreciation affects household wealth.

Dataset

Source

Kaggle: Craigslist Cars & Trucks Dataset

Sample

  • Original observations: 426,880
  • Cleaned sample: 170,439 observations

Variables

  • Price: vehicle listing price (USD)
  • Year: manufacturing year
  • Odometer: miles driven
  • Condition: seller-reported vehicle condition
  • Cylinders: number of engine cylinders (proxy for engine size)


Price Distribution

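```{r}
library(tidyverse)

# Load the Kaggle Craigslist listings from a relative path
cars_raw <- read_csv("vehicles.csv")

# Keep listings priced between $500 and $100,000 with no missing model variables
cars_clean <- cars_raw %>%
  filter(
    price > 500,
    price < 100000,
    !is.na(year),
    !is.na(odometer),
    !is.na(condition),
    !is.na(cylinders)
  ) %>%
  mutate(log_price = log(price))

# Histogram of raw (untransformed) listing prices
ggplot(cars_clean, aes(price)) +
  geom_histogram(bins = 50) +
  labs(
    title = "Distribution of Vehicle Prices",
    x = "Price (USD)",
    y = "Frequency"
  )
```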

Observation

  • Strongly right-skewed distribution.
  • A long right tail of high-value listings.
  • Modeling log(price) reduces the skew and improves model performance.
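One quick way to see why the log transformation helps is to compare sample skewness before and after taking logs. The sketch below uses simulated log-normal prices, not the Craigslist listings, so the numbers are only illustrative. `skewness()` is a hypothetical helper defined here, not a function from a package.

```r
# Illustrative only: simulated prices, not the Craigslist listings
set.seed(465)

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Log-normal prices with a median near exp(9.5), roughly 13,000 USD
price <- rlnorm(10000, meanlog = 9.5, sdlog = 0.8)
log_price <- log(price)

skew_raw <- skewness(price)
skew_log <- skewness(log_price)

cat("Skewness of price:     ", round(skew_raw, 2), "\n")
cat("Skewness of log(price):", round(skew_log, 2), "\n")
```

Raw prices show strong positive skew. The logged values are close to symmetric, which suits a linear regression with roughly normal errors.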

Model A

Baseline Regression

\[ \log(\text{price}) = \beta_0 + \beta_1\,\text{year} + \beta_2\,\text{odometer} \]

Economic Intuition

  • Newer vehicles should be worth more.
  • Higher mileage should reduce value.
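Model A is an ordinary least squares regression of log price on year and odometer. On the cleaned data this would be a single call, `lm(log_price ~ year + odometer, data = cars_clean)`. Because vehicles.csv is not bundled here, the sketch below simulates a data set with the same column names. The coefficients are made up, so the estimates only show the expected signs.

```r
# Simulated stand-in for cars_clean; coefficients are invented for illustration
set.seed(465)
n <- 5000
sim_cars <- data.frame(
  year     = sample(2000:2021, n, replace = TRUE),
  odometer = runif(n, min = 0, max = 250000)
)

# Assumed data-generating process: newer cars cost more, mileage lowers price
sim_cars$log_price <- 9.5 +
  0.08 * (sim_cars$year - 2010) -
  0.000004 * sim_cars$odometer +
  rnorm(n, sd = 0.6)

# Model A: log(price) = b0 + b1 * year + b2 * odometer
model_a <- lm(log_price ~ year + odometer, data = sim_cars)
coef(model_a)
```

The dependent variable is in logs. The year coefficient is therefore approximately the proportional price change per additional model year.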

Model B

Extended Regression

\[ \log(\text{price}) = \beta_0 + \beta_1\,\text{year} + \beta_2\,\text{odometer} + \beta_3\,\text{condition} + \beta_4\,\text{cylinders} \]

Why Add These Variables?

  • Vehicle quality matters.
  • Engine size affects consumer demand.
  • More realistic representation of market valuation.
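In R, condition and cylinders are categorical. They enter Model B as factors, so each one contributes a set of dummy coefficients measured against a baseline category rather than a single slope. The sketch below assumes simulated data with invented category effects. The condition labels are chosen to resemble the listing categories, but none of the numbers come from this report. Model A is fitted to the same data so the in-sample R² values can be compared.

```r
# Simulated stand-in for cars_clean; all effects below are invented
set.seed(465)
n <- 5000
conditions <- c("salvage", "fair", "good", "excellent", "like new", "new")
cylinder_levels <- c("4 cylinders", "6 cylinders", "8 cylinders")

sim_cars <- data.frame(
  year      = sample(2000:2021, n, replace = TRUE),
  odometer  = runif(n, min = 0, max = 250000),
  # "salvage" is listed first, so it becomes the baseline category
  condition = factor(sample(conditions, n, replace = TRUE), levels = conditions),
  cylinders = factor(sample(cylinder_levels, n, replace = TRUE), levels = cylinder_levels)
)

# Assumed category effects on log price (invented values)
condition_effect <- c(salvage = -0.8, fair = -0.3, good = 0, excellent = 0.1,
                      "like new" = 0.3, new = 0.4)
cylinder_effect <- c("4 cylinders" = 0, "6 cylinders" = 0.15, "8 cylinders" = 0.3)

sim_cars$log_price <- 9.5 +
  0.08 * (sim_cars$year - 2010) -
  0.000004 * sim_cars$odometer +
  unname(condition_effect[as.character(sim_cars$condition)]) +
  unname(cylinder_effect[as.character(sim_cars$cylinders)]) +
  rnorm(n, sd = 0.6)

# Model A versus Model B, where condition and cylinders enter as dummy sets
model_a <- lm(log_price ~ year + odometer, data = sim_cars)
model_b <- lm(log_price ~ year + odometer + condition + cylinders, data = sim_cars)

round(coef(model_b), 4)
c(model_a = summary(model_a)$r.squared, model_b = summary(model_b)$r.squared)
```

Each condition coefficient estimates the log-price gap relative to "salvage". Choosing a different baseline with `relevel()` changes the labels and reference point, but it does not change the fitted values.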

Model Comparison

| Metric | Model A | Model B |
|--------|---------|---------|
| RMSE   | 0.8344  | 0.7392  |
| R²     | 0.1383  | 0.3237  |

Result

Model B clearly performs better.

  • Lower prediction error
  • Higher explanatory power
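Both metrics can be computed directly from observed and predicted log prices. The helpers below are a minimal sketch: the names `rmse()` and `r_squared()` are hypothetical, and they are applied to five made-up values.

```r
# Hypothetical helpers; y and y_hat are on the log-price scale
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))
r_squared <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

# Toy example with made-up values
y     <- c(9.2, 9.8, 10.1, 10.6, 11.0)
y_hat <- c(9.4, 9.7, 10.3, 10.4, 10.9)

rmse(y, y_hat)       # about 0.167
r_squared(y, y_hat)  # about 0.928
```

The reported RMSEs were presumably computed on the log-price scale, since log price is the dependent variable in both models. On that assumption, Model B's 0.7392 corresponds to a typical multiplicative error of about exp(0.7392) ≈ 2.09. In other words, a typical prediction misses the listing price by roughly a factor of two.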

Cross-Validation Results

5-Fold Cross-Validation

| Metric | Mean  |
|--------|-------|
| RMSE   | 0.729 |
| R²     | 0.339 |

Interpretation

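  • Mean cross-validated RMSE (0.729) and R² (0.339) are close to the full-sample Model B values (0.7392 and 0.3237).
  • This suggests the model is not overfitting and should generalize to unseen listings.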
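The sketch below shows how 5-fold cross-validation can be written in base R. It seeds the generator with the report's seed, shuffles fold labels, fits on four folds and scores on the held-out fold. It is an illustration under stated assumptions, not the authors' script: it uses simulated data and the two-regressor Model A formula for brevity. The per-fold rows and their standard deviation are what would support a claim that performance is stable across samples.

```r
# Simulated data with the same column names as cars_clean; effects are invented
set.seed(465)
n <- 5000
sim_cars <- data.frame(
  year     = sample(2000:2021, n, replace = TRUE),
  odometer = runif(n, min = 0, max = 250000)
)
sim_cars$log_price <- 9.5 + 0.08 * (sim_cars$year - 2010) -
  0.000004 * sim_cars$odometer + rnorm(n, sd = 0.6)

k <- 5
# Shuffle fold labels so each row lands in one of k roughly equal folds
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
cv_results <- t(sapply(seq_len(k), function(i) {
  train <- sim_cars[folds != i, ]
  test  <- sim_cars[folds == i, ]
  fit   <- lm(log_price ~ year + odometer, data = train)
  resid <- test$log_price - predict(fit, newdata = test)
  c(rmse = sqrt(mean(resid^2)),
    r2   = 1 - sum(resid^2) / sum((test$log_price - mean(test$log_price))^2))
}))

cv_results                # one row per fold
colMeans(cv_results)      # mean RMSE and R^2 across folds
apply(cv_results, 2, sd)  # fold-to-fold spread
```

Adding condition and cylinders only changes the `lm()` formula inside the loop.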

Key Findings & Economic Interpretation

Vehicle Age

  • Effect: Positive coefficient on manufacturing year, so predicted price falls as vehicles age.

    Interpretation: Newer manufacturing cohorts command statistically significant premiums, as they represent a lower baseline risk of mechanical failures.

Odometer

  • Effect: Negative coefficient.

    Interpretation: Each additional mile lowers the predicted price, reflecting the physical depreciation of the vehicle through use.

Condition

  • Effect: Large category-specific premiums and discounts.

    Interpretation: “Like New” listings capture substantial valuation premiums, whereas “Salvage” classifications trigger heavy immediate market discounts due to title restrictions and structural damage.
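Both models explain log(price). A coefficient β therefore implies a price change of 100·(exp(β·Δx) − 1) percent for a change Δx in the regressor, and for small β·Δx this is close to 100·β·Δx. The report does not list its estimates, so the coefficient values below are hypothetical. `pct_effect()` is a helper written for this sketch.

```r
# Hypothetical helper: percent change in price implied by a log-price coefficient
pct_effect <- function(beta, delta = 1) {
  100 * (exp(beta * delta) - 1)
}

# A coefficient of log(1.1) on year means +10% per model year
pct_effect(log(1.1))                  # 10
# A hypothetical odometer coefficient of -0.000004, over 10,000 extra miles
pct_effect(-0.000004, delta = 10000)  # about -3.92
# A hypothetical "like new" dummy coefficient of 0.3 vs. the baseline category
pct_effect(0.3)                       # about 34.99
```

For condition dummies, the exact formula matters. A coefficient of 0.3 read as "30%" understates the implied premium of about 35%.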


Economic Interpretation

Depreciation

Vehicle prices decrease as usage increases.

Consumer Valuation

Buyers pay premiums for:

  • Newer vehicles
  • Better condition
  • Larger engines

Market Implications

Observable quality characteristics strongly influence market outcomes.


Limitations

Limitation 1

Omitted Variable Bias (OVB): Important vehicle characteristics were omitted from our regressions due to dataset scope limitations, notably manufacturer/brand reputation, specific model line and trim, fuel efficiency (MPG), and regional market location. If these factors are correlated with the included regressors, the estimated coefficients are biased.
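The standard omitted-variable-bias result gives the direction of this bias. Suppose, as a purely illustrative example, that the true model includes a luxury-brand indicator z with effect γ on log(price), but our regression omits it. The odometer coefficient from the short regression then satisfies:

\[ \mathbb{E}[\tilde{\beta}_2] = \beta_2 + \gamma\,\delta_2 \]

Here δ₂ is the odometer coefficient from an auxiliary regression of z on the included regressors. Suppose luxury vehicles carry a premium (γ > 0) and tend to have lower mileage (δ₂ < 0). The bias term is then negative, so the short regression overstates how much each mile lowers the price. The luxury example is an assumption for exposition, not an estimate from our data.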

Limitation 2

Asking vs. Transaction Prices: The dataset contains sellers' initial listing prices, not final transaction prices. Used-vehicle sales typically involve bargaining, so our models capture sellers' price expectations rather than equilibrium market prices.

Limitation 3

Self-Reporting Bias: The data are self-reported by sellers, so categorical variables such as vehicle condition are subjective and may be measured with error.


Future Research

Potential Improvement

Include:

  • Brand effects
  • Fuel efficiency
  • Geographic market differences

New Economic Question

Do vehicle brands create a measurable price premium after controlling for age, mileage, condition, and engine size?


Conclusion

Main Takeaways

Vehicle age and mileage are significant predictors.

Adding condition and cylinders substantially improves prediction accuracy.

Model B outperforms Model A.

Results are consistent with economic theory of depreciation.


Reproducibility Protocols

To make the analysis reproducible on any machine, we followed these practices:

  • Relative Paths: Data are loaded with a relative path (read_csv("vehicles.csv")), so the script does not depend on hard-coded local directories.

  • Fixed Random Seed: The random number generator is seeded with set.seed(465) during setup, before any data partitioning or cross-validation sampling.

  • Standard Packages: The code uses standard, stable tidyverse packages to reduce the risk of breaking changes at runtime.
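Seeding only fixes the random stream if set.seed() is called before the draws. The sketch below shows that re-seeding reproduces the same fold assignment. Recording `sessionInfo()` is a suggestion added here, not one of the protocols above. It helps because R 3.6.0 changed the default algorithm used by `sample()`, so the same seed can produce different draws on older R versions.

```r
# Minimal setup sketch: seed once, then draw fold labels
set.seed(465)
folds_first <- sample(rep(1:5, length.out = 20))

# Re-seeding with the same value reproduces exactly the same assignment
set.seed(465)
folds_second <- sample(rep(1:5, length.out = 20))
identical(folds_first, folds_second)  # TRUE

# Record the R version and loaded packages alongside the results
sessionInfo()
```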

AI Use Log

  • Assisting AI System: Gemini (Advanced Architecture Engine)

  • Applied Interaction Strategy: Layout syntax automation, code consolidation, and structural Markdown optimization.

  • Raw User Prompt Input: “Show me to how to solve this problem in quarto slides document in fastest way possible”

  • Applied Output Implementation: The AI response provided clean styling mechanics (such as the custom {.smaller} header tags and column containment blocks) to fix layout overflow issues.

  • Verification and Modification: The styling tips were verified via local engine compilation (quarto render). These slide design components were then manually converted into a formal, continuous narrative report structure. This was done by replacing presentation slide breaks (---) with descriptive paragraphs to satisfy the final essay requirement.

Final Reflections

Strategic Path Improvements

With a longer research timeline or more computing resources, our main improvement would be to add high-cardinality fixed effects for vehicle brands and models. Controlling for manufacturer would protect the mileage and age coefficients from brand-equity bias (for example, reliable commuter brands depreciate more slowly than high-end luxury lines).
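The sketch below illustrates how brand fixed effects change the mileage coefficient. It uses base `lm()` with one dummy per brand and simulated brands, and it assumes that premium brands are driven fewer miles. Every number is invented. With hundreds of brand and model levels, a package such as fixest can absorb the dummies more efficiently, for instance `feols(log_price ~ year + odometer | manufacturer, data = ...)`, where the `manufacturer` column name is an assumption.

```r
# Illustrative simulation of brand fixed effects; every number here is invented
set.seed(465)
n_brands <- 20
n <- 5000

# Brand-level price premium in log points (hypothetical)
brand_effect <- rnorm(n_brands, mean = 0, sd = 0.5)
brand <- sample(seq_len(n_brands), n, replace = TRUE)

# Assumption: premium brands are driven fewer miles, so mileage correlates with brand
odometer <- pmax(0, 120000 - 40000 * brand_effect[brand] + rnorm(n, sd = 25000))
year <- sample(2000:2021, n, replace = TRUE)

true_beta_odo <- -0.000005
log_price <- 9.5 + 0.08 * (year - 2010) + true_beta_odo * odometer +
  brand_effect[brand] + rnorm(n, sd = 0.3)
cars_sim <- data.frame(log_price, year, odometer, brand = factor(brand))

# Pooled model omits brand; the fixed-effects model adds one dummy per brand
pooled   <- lm(log_price ~ year + odometer, data = cars_sim)
brand_fe <- lm(log_price ~ year + odometer + brand, data = cars_sim)

c(true     = true_beta_odo,
  pooled   = coef(pooled)[["odometer"]],
  brand_fe = coef(brand_fe)[["odometer"]])
```

The pooled estimate absorbs part of the brand premium and overstates mileage depreciation. The fixed-effects estimate uses only within-brand variation, so it lands close to the assumed true value.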

Future Economic Research Questions

This analysis suggests a new research question:

“Do secondary market vehicle brands display asymmetric depreciation elasticities across varying regional economic environments during inflationary contraction cycles?”

Answering this question would reveal whether affordable economy vehicles behave as inferior goods (whose demand rises as income falls) or as defensive assets when aggregate consumer purchasing power contracts.

Conclusion

Main Takeaways

  • Vehicle age and mileage are significant, robust predictors of used-vehicle prices.
  • Adding condition and cylinders substantially improves prediction accuracy and reduces error.

  • The extended Model B outperforms the baseline Model A on both RMSE and R².

  • The results are consistent with economic theories of depreciation: usage and observable features shape consumer valuation.