Predicting Medical Insurance Charges Using Regression Analysis
Economic Question & Motivation
Research Question
Which individual characteristics are associated with higher medical insurance charges?
Why is this important?
- Healthcare costs affect households
- Important for insurance pricing
- Useful for healthcare policy
Dataset
Medical Cost Personal Dataset (Kaggle)
- 1,338 observations
- Outcome variable: charges
Predictors:
- Age
- Sex
- BMI
- Children
- Smoking Status
- Region
Probability Analysis
Distribution of Medical Insurance Charges
- The distribution of charges is positively skewed (long right tail)
- A small group of individuals has very high charges
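One way to check the skew numerically is to compare the mean with the median and compute a moment-based skewness. The sketch below uses a handful of made-up values, not the actual dataset, purely to show the calculation.

```python
import statistics

# Hypothetical toy charges, NOT taken from the dataset: most values are
# moderate and a few are very large, mimicking a long right tail.
charges = [2000, 3500, 4200, 5100, 6800, 8300, 9900, 12500, 38000, 52000]

mean = statistics.fmean(charges)
median = statistics.median(charges)
sd = statistics.pstdev(charges)

# Population (moment-based) skewness: the average cubed z-score.
skewness = sum(((x - mean) / sd) ** 3 for x in charges) / len(charges)

print(f"mean={mean:.0f} median={median:.0f} skewness={skewness:.2f}")
```

A mean well above the median, together with a positive skewness, is the numerical counterpart of the long right tail.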
Smoking Status Analysis
Charges by Smoking Status
- Smokers have substantially higher charges
- Smoking appears to be a strong predictor
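The group comparison behind this slide can be sketched as a difference in mean charges between smokers and non-smokers. The rows below are invented for illustration and are not drawn from the dataset.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (smoker, charges) rows, NOT taken from the dataset.
rows = [
    ("yes", 32000.0), ("no", 7500.0), ("no", 4200.0),
    ("yes", 41000.0), ("no", 11000.0), ("yes", 23500.0),
]

# Collect charges by smoking status.
by_group = defaultdict(list)
for smoker, charge in rows:
    by_group[smoker].append(charge)

# Mean charge per group and the raw gap between them.
group_means = {group: fmean(values) for group, values in by_group.items()}
gap = group_means["yes"] - group_means["no"]

print(group_means)
print(f"smoker minus non-smoker mean: {gap:.0f}")
```

This raw gap does not control for age, BMI, or other predictors, which is why the regression models are needed.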
Modeling Approach
Model 1
charges ~ age + bmi + children
Model 2
charges ~ age + bmi + children + smoker + region + sex
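As a rough sketch of how the two specifications could be fitted, the code below runs ordinary least squares with NumPy on synthetic stand-in data. The data, the generating coefficients, the 0/1 coding of `smoker` and `sex`, and the helper names `design` and `fit_ols` are all assumptions for illustration. The slides do not say which software or coding the analysis actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (assumption): column names mirror the dataset,
# but the values and generating coefficients are invented.
age = rng.integers(18, 65, n).astype(float)
bmi = rng.normal(30, 6, n)
children = rng.integers(0, 6, n).astype(float)
smoker = (rng.random(n) < 0.2).astype(float)   # 1 = smoker
sex = (rng.random(n) < 0.5).astype(float)      # 1 = male
region = rng.integers(0, 4, n)                 # four region codes
charges = (250 * age + 300 * bmi + 500 * children
           + 20000 * smoker + rng.normal(0, 3000, n))

def design(*columns):
    """Stack an intercept and the given columns into a design matrix."""
    return np.column_stack([np.ones(n), *columns])

def fit_ols(X, y):
    """Return least-squares coefficients and in-sample R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return beta, 1 - (resid @ resid) / (centered @ centered)

# Dummy-code region, dropping the first level as the baseline.
region_dummies = (region[:, None] == np.arange(1, 4)).astype(float)

X1 = design(age, bmi, children)                               # Model 1
X2 = design(age, bmi, children, smoker, region_dummies, sex)  # Model 2

beta1, r2_1 = fit_ols(X1, charges)
beta2, r2_2 = fit_ols(X2, charges)
print(f"Model 1 R^2: {r2_1:.3f}")
print(f"Model 2 R^2: {r2_2:.3f}, smoker coefficient: {beta2[4]:.0f}")
```

Because Model 1 is nested in Model 2, in-sample R² cannot fall when the extra predictors are added, so a fair comparison should also look at held-out data.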
Data Split:
Model Comparison
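A held-out comparison could look like the sketch below. The slides do not state the split proportions or the comparison metric, so the 80/20 random split and the use of test RMSE are assumptions. The data are synthetic, and the second model adds only `smoker` (omitting region and sex) to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Synthetic stand-in data; values and coefficients are invented.
age = rng.integers(18, 65, n).astype(float)
bmi = rng.normal(30, 6, n)
children = rng.integers(0, 6, n).astype(float)
smoker = (rng.random(n) < 0.2).astype(float)
charges = (250 * age + 300 * bmi + 500 * children
           + 20000 * smoker + rng.normal(0, 3000, n))

# Assumed 80/20 random split; the actual proportions are not given.
idx = rng.permutation(n)
cut = int(0.8 * n)
train, test = idx[:cut], idx[cut:]

def holdout_rmse(X):
    """Fit OLS on the training rows and return RMSE on the held-out rows."""
    beta, *_ = np.linalg.lstsq(X[train], charges[train], rcond=None)
    errors = charges[test] - X[test] @ beta
    return float(np.sqrt(np.mean(errors ** 2)))

ones = np.ones(n)
X1 = np.column_stack([ones, age, bmi, children])          # Model 1 predictors
X2 = np.column_stack([ones, age, bmi, children, smoker])  # Model 1 plus smoker

rmse1, rmse2 = holdout_rmse(X1), holdout_rmse(X2)
print(f"Model 1 test RMSE: {rmse1:.0f}")
print(f"Model 1 + smoker test RMSE: {rmse2:.0f}")
```

A lower test RMSE for the richer model indicates that the added predictors improve out-of-sample prediction rather than only in-sample fit.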
Economic Interpretation
| Predictor | Estimated change in charges |
| --- | --- |
| Smoker | +24,199 |
| Age | +262 |
| BMI | +318 |
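To make the coefficients concrete, the arithmetic below applies them to a few changes in the predictors. It assumes the figures come from a linear model and are read as the change in charges per additional year of age, per additional BMI unit, and for smokers relative to non-smokers, holding the other predictors fixed. The function name `predicted_difference` is hypothetical.

```python
# Coefficients as reported on the slide.
coef = {"smoker": 24199, "age": 262, "bmi": 318}

def predicted_difference(d_age=0, d_bmi=0, smoker_change=0):
    """Predicted change in charges for the given changes in predictors,
    holding all other predictors fixed (linear model assumed)."""
    return (coef["age"] * d_age
            + coef["bmi"] * d_bmi
            + coef["smoker"] * smoker_change)

print(predicted_difference(d_age=10))         # 2620
print(predicted_difference(d_bmi=5))          # 1590
print(predicted_difference(smoker_change=1))  # 24199
```

Under these assumptions, being a smoker is associated with a larger difference in charges than several decades of age.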
Key Finding:
Smoking status is the strongest predictor.
Limitations
- No income information
- No education data
- No medical history
- Limited sample size
Future Improvement:
- Add socioeconomic variables
- Add medical history variables
Conclusion
Key Findings
- Smoking is the strongest predictor
- Higher age and BMI are associated with higher charges
- Model 2 performs substantially better than Model 1
- Lifestyle factors, especially smoking, matter for charges
Thank you for listening.
Questions?