Final Report
ECON 465 Stage 3
Predicting Used Vehicle Prices
Authors
Sarp Ata Kanca
Yagmur Beren Sengezken
Economic Question
Research Question
To what extent do vehicle age, mileage, condition, and engine size predict the market price of a used vehicle?
Why Does It Matter?
- Used vehicle markets are economically important.
- Consumers need accurate price expectations.
- Sellers need efficient pricing strategies.
- Vehicle depreciation affects household wealth.
Dataset
Source
Kaggle: Craigslist Cars & Trucks Dataset
Sample
- Original observations: 426,880
- Cleaned sample: 170,439 observations
Variables
Price: Vehicle Listing Price , Odometer:Miles Driven , Cylinders:Engine Size
Year: Manufacturing Year , Condition:Vehicle Condition
Price Distribution
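```{r}
# Load packages and the raw Kaggle Craigslist listings (relative path).
library(tidyverse)

cars_raw <- read_csv("vehicles.csv")

# Keep plausible prices and listings with non-missing model inputs.
cars_clean <- cars_raw %>%
  filter(
    price > 500,
    price < 100000,
    !is.na(year),
    !is.na(odometer),
    !is.na(condition),
    !is.na(cylinders)
  ) %>%
  mutate(log_price = log(price))  # dependent variable for Models A and B

# Histogram of raw listing prices.
ggplot(cars_clean, aes(x = price)) +
  geom_histogram(bins = 50) +
  labs(
    title = "Distribution of Vehicle Prices",
    x = "Price (USD)",
    y = "Frequency"
  )
```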
Observation
- Strongly right-skewed distribution.
- Extremely high-value listings are present.
- Modeling log(price) compresses the long right tail and lets coefficients be read as approximate percentage effects.
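The effect of the log transformation on skewness can be illustrated with a minimal base-R sketch. The prices below are simulated from a log-normal distribution (an assumption made only for illustration; they are not the Craigslist data), and `sample_skewness` is a hypothetical helper. The 500 and 100,000 bounds mirror the cleaning filter above.

```r
# Sketch: how log(price) tames a right-skewed price distribution.
# Assumption: simulated log-normal prices, NOT the Craigslist sample.
set.seed(465)

# Hypothetical helper: moment-based sample skewness.
sample_skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

sim_price <- exp(rnorm(10000, mean = log(15000), sd = 0.8))
# Apply the same price bounds as the cleaning filter.
sim_price <- sim_price[sim_price > 500 & sim_price < 100000]

skew_raw <- sample_skewness(sim_price)
skew_log <- sample_skewness(log(sim_price))
cat("Skewness of price:     ", round(skew_raw, 2), "\n")
cat("Skewness of log(price):", round(skew_log, 2), "\n")
```

On data like these, raw prices show a clearly positive skew, while log prices are close to symmetric.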
Model A
Baseline Regression
$$
\log(\text{price}) = \beta_0 + \beta_1(\text{year}) + \beta_2(\text{odometer})
$$
Economic Intuition
- Newer vehicles should be worth more.
- Higher mileage should reduce value.
Model B
Extended Regression
$$
\log(\text{price}) = \beta_0 + \beta_1(\text{year}) + \beta_2(\text{odometer}) + \beta_3(\text{condition}) + \beta_4(\text{cylinders})
$$
Why Add These Variables?
- Vehicle quality (condition) affects what buyers are willing to pay.
- Engine size affects consumer demand.
- More realistic representation of market valuation.
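Because condition is categorical, the single term β₃(condition) stands for a set of dummy coefficients, one for each non-reference category. The sketch below shows how R builds these dummies. It assumes that condition labels look like "good" or "salvage" and that cylinders is stored as text such as "6 cylinders". Both assumptions should be checked against the actual Kaggle columns. `parse_cylinders` is a hypothetical helper, and treating cylinders as numeric is only one option; keeping it as a factor is another.

```r
# Sketch: how categorical regressors enter Model B (toy data, assumed coding).
toy <- data.frame(
  condition = c("fair", "good", "excellent", "like new", "salvage", "good"),
  cylinders = c("4 cylinders", "6 cylinders", "8 cylinders", "6 cylinders",
                "4 cylinders", "other"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract the number from text like "6 cylinders";
# non-numeric labels such as "other" become NA.
parse_cylinders <- function(x) {
  suppressWarnings(as.numeric(sub(" cylinders$", "", x)))
}
toy$cyl_num <- parse_cylinders(toy$cylinders)

# Condition as a factor with "good" as the reference (omitted) category.
toy$condition <- relevel(factor(toy$condition), ref = "good")

# model.matrix() shows the design matrix lm() would use: one dummy per
# non-reference level. The row with NA cylinders is dropped by the default na.action.
X <- model.matrix(~ condition + cyl_num, data = toy)
print(colnames(X))
```

Each condition coefficient is then a price premium or discount relative to the "good" baseline, not a slope along a linear scale.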
Model Comparison
| Metric | Model A | Model B |
|---|---|---|
| RMSE (log price) | 0.8344 | 0.7392 |
| R² | 0.1383 | 0.3237 |
Result
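Model B performs better on both metrics.
- Lower prediction error (RMSE falls from 0.8344 to 0.7392)
- Higher explanatory power (R² rises from 0.1383 to 0.3237)

In-sample R² never falls when regressors are added to a nested OLS model, so the cross-validation results below provide the fairer check.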
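The two metrics in the Model Comparison table can be computed as in the following minimal sketch. It assumes simulated data (not the Craigslist sample) and computes in-sample metrics on the log-price scale. The coefficient values and the helpers `rmse` and `r_squared` are illustrative assumptions.

```r
# Sketch: RMSE and R^2 for nested log-price models on simulated data.
set.seed(465)
n <- 2000
sim <- data.frame(
  year      = sample(2000:2020, n, replace = TRUE),
  odometer  = runif(n, 0, 200000),
  condition = factor(sample(c("fair", "good", "excellent"), n, replace = TRUE))
)
# Assumed condition effects relative to "good" (illustrative only).
cond_effect <- c(fair = -0.4, good = 0, excellent = 0.3)
sim$log_price <- -151 + 0.08 * sim$year - 4e-6 * sim$odometer +
  unname(cond_effect[as.character(sim$condition)]) + rnorm(n, sd = 0.5)

# Hypothetical helpers for the two reported metrics.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

model_a <- lm(log_price ~ year + odometer, data = sim)
model_b <- lm(log_price ~ year + odometer + condition, data = sim)

metrics <- data.frame(
  model = c("A", "B"),
  rmse  = c(rmse(sim$log_price, fitted(model_a)),
            rmse(sim$log_price, fitted(model_b))),
  r2    = c(r_squared(sim$log_price, fitted(model_a)),
            r_squared(sim$log_price, fitted(model_b)))
)
print(metrics)
```

In-sample, the hand-computed R² equals `summary(model)$r.squared`, and the larger nested model cannot do worse on either metric.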
Cross-Validation Results
5-Fold Cross-Validation
| Metric | Mean |
|---|---|
| RMSE (log price) | 0.729 |
| R² | 0.339 |
Interpretation
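- The cross-validated averages (RMSE 0.729, R² 0.339) are close to the Model B figures (RMSE 0.7392, R² 0.3237), so there is no sign of overfitting.
- Results generalize well to unseen observations from the same listing data.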
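The 5-fold procedure can be sketched in base R. The sketch assumes simulated data standing in for the cleaned sample and reuses the seed from the reproducibility notes (`set.seed(465)`). Printing the per-fold spread, not only the mean, is what would support a claim that performance is stable across folds.

```r
# Sketch: 5-fold cross-validation of a log-price regression (simulated data).
set.seed(465)
n <- 2000
sim <- data.frame(
  year     = sample(2000:2020, n, replace = TRUE),
  odometer = runif(n, 0, 200000)
)
sim$log_price <- -151 + 0.08 * sim$year - 4e-6 * sim$odometer + rnorm(n, sd = 0.5)

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random, balanced fold labels

# For each fold: fit on the other 4 folds, score on the held-out fold.
fold_metrics <- t(sapply(1:k, function(f) {
  train <- sim[folds != f, ]
  test  <- sim[folds == f, ]
  fit   <- lm(log_price ~ year + odometer, data = train)
  resid <- test$log_price - predict(fit, newdata = test)
  c(rmse = sqrt(mean(resid^2)),
    r2   = 1 - sum(resid^2) / sum((test$log_price - mean(test$log_price))^2))
}))

print(round(fold_metrics, 3))               # one row per fold
print(round(colMeans(fold_metrics), 3))     # mean across folds
print(round(apply(fold_metrics, 2, sd), 3)) # spread across folds
```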
Key Findings & Economic Interpretation
Vehicle Age (Manufacturing Year)
Effect: Positive coefficient on year (newer vehicles sell for more, older vehicles for less).
Interpretation: Newer model years command statistically significant price premiums, consistent with a lower baseline risk of mechanical failure.
Odometer
Effect: Negative coefficient.
Interpretation: Each additional mile lowers the expected log price, reflecting the physical depreciation of the vehicle through use.
Condition
Effect: Large premiums and discounts that differ by category (condition enters as a set of dummy variables, not as a linear scale).
Interpretation: “Like New” listings earn substantial price premiums, whereas “Salvage” listings carry heavy discounts, consistent with title restrictions and structural damage.
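Because the dependent variable is log(price), each coefficient translates into a percentage price effect. The exact conversion is below. The two coefficient values in the worked lines are hypothetical, chosen only to show when the shortcut "coefficient × 100 = percent" breaks down. That matters most for large condition dummies such as “Salvage”.

$$
\%\Delta\,\widehat{\text{price}} = 100\left(e^{\hat\beta_k} - 1\right) \approx 100\,\hat\beta_k \quad \text{for small } \hat\beta_k
$$

$$
\begin{aligned}
\hat\beta_k = 0.05 &\;\Rightarrow\; 100\,(e^{0.05} - 1) \approx 5.1\% \\
\hat\beta_k = -0.5 &\;\Rightarrow\; 100\,(e^{-0.5} - 1) \approx -39.3\%
\end{aligned}
$$

For odometer, measured in miles, the same formula applied to $10{,}000\,\hat\beta_2$ gives the percentage effect of 10,000 additional miles.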
Economic Interpretation
Depreciation
Vehicle prices decrease as usage increases.
Consumer Valuation
Buyers pay premiums for:
- Newer vehicles
- Better condition
- Larger engines
Market Implications
Observable quality characteristics strongly influence market outcomes.
Limitations
Limitation 1
Omitted Variable Bias (OVB): Important vehicle characteristics were left out of our regressions because of dataset scope limitations, notably manufacturer/brand reputation, model-line tier, fuel efficiency (MPG), and regional market location. These factors are likely correlated with age, mileage, and condition, so omitting them can bias the estimated coefficients.
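The bias mechanism can be illustrated with a small simulation. It assumes two hypothetical brands, "economy" and "luxury". In this made-up data, luxury cars carry a price premium and are also driven fewer miles. All numbers are illustrative, not estimates from the Craigslist sample. Adding brand as a factor (a brand fixed effect) removes the bias in the mileage coefficient.

```r
# Sketch: omitted variable bias in the mileage coefficient (simulated data).
set.seed(465)
n <- 5000
brand    <- factor(sample(c("economy", "luxury"), n, replace = TRUE))
# Assumption: luxury cars are driven fewer miles on average.
odometer <- ifelse(brand == "luxury", 60000, 120000) + rnorm(n, sd = 30000)
true_odo <- -4e-6
# Assumption: luxury brand adds a 0.6 log-point premium.
log_price <- 9.5 + 0.6 * (brand == "luxury") + true_odo * odometer +
  rnorm(n, sd = 0.4)

short_fit <- lm(log_price ~ odometer)          # brand omitted
long_fit  <- lm(log_price ~ odometer + brand)  # brand controlled for

b_short <- unname(coef(short_fit)["odometer"])
b_long  <- unname(coef(long_fit)["odometer"])
cat("True odometer effect: ", true_odo, "\n")
cat("Omitting brand:       ", signif(b_short, 3), "\n")
cat("Controlling for brand:", signif(b_long, 3), "\n")
```

Here, omitting brand roughly doubles the apparent mileage penalty, because low mileage also proxies for the luxury premium.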
Limitation 2
Asking vs. Transaction Prices: The dataset contains sellers' initial listing prices rather than final transaction prices. Used-vehicle sales typically involve bargaining, so our models capture sellers' price expectations rather than market-clearing transaction prices.
Limitation 3
Self-Reporting Bias: The data are self-reported by private sellers, so categorical fields such as vehicle condition reflect subjective judgments rather than standardized inspections.
Future Research
Potential Improvement
Include:
- Brand effects
- Fuel efficiency
- Geographic market differences
New Economic Question
Do vehicle brands create a measurable price premium after controlling for age, mileage, condition, and engine size?
Conclusion
Main Takeaways
- Vehicle age and mileage are significant predictors.
- Condition substantially improves prediction accuracy.
- Model B outperforms Model A.
- Results are consistent with the economic theory of depreciation.
Thank You
Questions?
Reproducibility Protocols
To make the analysis reproducible on other machines, we followed three practices:
- Relative Pathing: Data are loaded with a relative path (`read_csv("vehicles.csv")`), so the script does not depend on hard-coded local drive locations.
- Stochastic Isolation: The random seed is fixed with `set.seed(465)` in the setup phase, before any data partitioning or cross-validation sampling.
- Standardized Coding Environment: The script uses standard, stable tidyverse packages to reduce the risk of breaking changes at run time.
AI Use Log
Assisting AI System: Gemini (Advanced Architecture Engine)
Applied Interaction Strategy: Layout syntax automation, code consolidation, and structural Markdown optimization.
Raw User Prompt Input: “Show me to how to solve this problem in quarto slides document in fastest way possible”
Applied Output Implementation: The AI response provided clean styling mechanics (such as the custom `{.smaller}` header tags and column containment blocks) to fix layout overflow issues.
Verification and Modification: The styling tips were verified via local engine compilation (`quarto render`). These slide design components were then manually converted into a formal, continuous narrative report structure. This was done by replacing presentation slide breaks (`---`) with descriptive paragraphs to satisfy the final essay requirement.
Final Reflections
Strategic Path Improvements
Given a longer research timeline or more computing power, our main improvement would be high-cardinality fixed-effects modeling for vehicle brands and models. Controlling for manufacturer identity would shield our mileage and age coefficients from brand-equity bias (for example, reliable commuter brands depreciating more slowly than high-end luxury lines).
Future Economic Research Questions
The insights developed across this econometric evaluation inspire a compelling new research question:
“Do secondary market vehicle brands display asymmetric depreciation elasticities across varying regional economic environments during inflationary contraction cycles?”
Answering this question could show whether affordable economy vehicles behave as Giffen goods or defensive assets when aggregate consumer purchasing power contracts.
Conclusion
Main Takeaways
- Vehicle age and mileage are significant, robust predictors of asset depreciation.
- Incorporating qualitative condition metrics substantially improves prediction accuracy and reduces error.
- The extended specification (Model B) outperforms the baseline model on every reported metric.
- The empirical results are consistent with classical depreciation theory: usage and observable features shape consumer valuation.