Predicting House Sale Prices

Souleymane Doumbia, Data 622 Final Project

2024-12-22

Introduction

Problem Statement

Objectives:

  1. Develop a robust predictive model for house sale prices.

  2. Identify the property features that most influence sale price.

  3. Simplify the dataset using Principal Component Analysis (PCA).

Business Context:

I. Data Import and Exploration

Dataset Overview

Initial Observations:

II. Data Cleaning and Preprocessing

Key Steps:

III. Exploratory Data Analysis (EDA)

Target Variable: SalePrice

Correlation Insights:
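The deck does not show how its correlations were computed. As a minimal sketch (not the author's code), one way to rank features by the strength of their linear relationship with SalePrice is to compute the Pearson correlation of every column with the target and sort by absolute value. The feature names and the data below are hypothetical and synthetic; they stand in for the real housing dataset only to make the example runnable.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Return (name, r) pairs sorted by |Pearson r| with the target, strongest first."""
    # Center each column and the target so the dot product measures covariance.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson r = cov(x, y) / (std(x) * std(y)), computed for all columns at once.
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    order = np.argsort(-np.abs(r))  # strongest absolute correlation first
    return [(names[i], float(r[i])) for i in order]

# Hypothetical, synthetic stand-in for the housing data (not the real dataset).
rng = np.random.default_rng(0)
n = 500
living_area = rng.normal(1500, 400, n)
quality = rng.integers(1, 11, n).astype(float)      # ratings 1..10
year_sold = rng.integers(2006, 2011, n).astype(float)
sale_price = 50 * living_area + 20000 * quality + rng.normal(0, 20000, n)

X = np.column_stack([living_area, quality, year_sold])
ranking = rank_by_correlation(X, sale_price, ["LivingArea", "Quality", "YearSold"])
for name, r in ranking:
    print(f"{name:12s} r = {r:+.3f}")
```

Pearson correlation only captures linear, one-feature-at-a-time association, so a ranking like this is a screening step rather than a measure of each feature's importance inside a fitted model.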

IV. Modeling: PCA and Feature Selection

Why PCA?
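To illustrate how PCA simplifies a dataset, here is a minimal sketch (an assumption about the general technique, not the project's actual code) that standardizes the features, computes principal components with an SVD, and keeps the fewest components whose cumulative explained-variance ratio reaches a chosen threshold. The 90% threshold and the synthetic six-feature data driven by two hidden factors are hypothetical choices for the example.

```python
import numpy as np

def pca(X, var_threshold=0.90):
    """Project standardized X onto the fewest principal components whose
    cumulative explained-variance ratio reaches var_threshold."""
    # Standardize: PCA is scale-sensitive, so give every feature unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Covariance eigenvalues are proportional to S**2.
    explained = S ** 2 / np.sum(S ** 2)
    # First index where the cumulative ratio reaches the threshold, as a count.
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    k = min(k, len(S))  # guard against floating-point round-off at threshold 1.0
    scores = Z @ Vt[:k].T  # coordinates of each row in the k-dim component space
    return scores, explained, k

# Hypothetical data: six correlated features generated from two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
loadings = np.array([[1.0, 0.8, 0.6, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 0.8, 0.6]])
X = latent @ loadings + 0.05 * rng.normal(size=(300, 6))

scores, explained, k = pca(X)
print(k, explained.round(3))
```

Because the six columns are really two groups of near-duplicates, two components carry almost all of the variance, which is the kind of redundancy PCA is meant to remove. The trade-off is interpretability: each component is a blend of the original features.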

IV. Modeling: Random Forest and XGBoost

Random Forest

XGBoost

V. Model Evaluation and Comparison

Bias-Variance Trade-Off:

Model Evaluation: Bias, Variance, and MSE
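For reference, the standard decomposition behind this slide: assuming the target is generated as y = f(x) + ε, with E[ε] = 0, Var(ε) = σ², and ε independent of the training sample, the expected squared error of a fitted model f̂ at a point x (taken over training samples and noise) splits into three terms.

```latex
\mathbb{E}\!\left[\big(y - \hat f(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

A test-set MSE estimates the left-hand side; the split into bias and variance is not directly observable from a single fitted model and has to be approximated, for example by refitting on resampled training sets.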

Cross-Validation:
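As a minimal sketch of how k-fold cross-validation produces an out-of-sample MSE estimate, the code below shuffles the rows once, splits them into k folds, trains on k − 1 folds and scores on the held-out fold. To keep it dependency-light it uses ordinary least squares as a stand-in model rather than Random Forest or XGBoost; the data is synthetic and the choice k = 5 is an assumption for illustration.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Estimate out-of-sample MSE of an OLS model with k-fold cross-validation."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle rows once
    folds = np.array_split(idx, k)                     # k nearly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit OLS with an intercept on the training folds only.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Score on the held-out fold, which the fit never saw.
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        resid = y[test] - A_test @ beta
        scores.append(np.mean(resid ** 2))
    return np.array(scores)

# Hypothetical synthetic data with noise variance 1.0.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 3.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 1.0, 200)

fold_mse = kfold_mse(X, y, k=5)
print(fold_mse.round(3), fold_mse.mean().round(3))
```

The spread of the per-fold MSEs gives a rough sense of how stable the estimate is; swapping in a different model only changes the two fitting and prediction lines inside the loop.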

Conclusion