IS607 - Project 3

Overview

  1. Importance of Preprocessing

  2. Model Candidates

  3. Cross-Validation

  4. Using Random Forests to Answer our Research Question

  5. Conclusion

Importance of Preprocessing

Size Reduction of the Input Space:

Objective: reduce the dimensionality of the data set so that the model still generalizes reasonably well, without losing the most significant relationships in the data.
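One very simple form of input-space reduction is to drop columns that carry almost no information. The sketch below is an illustration only, not necessarily the method used in this project: it assumes a numeric table stored as a list of rows, and the function name `drop_low_variance`, the `threshold` value, and the toy data are all hypothetical.

```python
from statistics import pvariance


def drop_low_variance(rows, names, threshold=1e-8):
    """Keep only the columns whose population variance exceeds `threshold`."""
    columns = list(zip(*rows))  # transpose: rows -> columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, [names[i] for i in keep]


# Hypothetical toy data: the second column is constant, so it cannot
# help distinguish one observation from another.
rows = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 5.0, 15.0],
]
names = ["x1", "x2", "x3"]

reduced, kept = drop_low_variance(rows, names)
print(kept)     # ['x1', 'x3']
print(reduced)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

A variance filter only removes uninformative columns. Techniques such as principal component analysis go further by combining correlated columns, at the cost of less interpretable features.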

Smoother Relationships:

Normalization (Scaling):
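
The slide does not say which scaling was applied, so the following is a minimal sketch of two common choices, min-max scaling and z-score standardization. The function names and the `incomes` values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature on a large scale.
incomes = [20000.0, 40000.0, 60000.0]

print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
z = standardize(incomes)
print([round(v, 3) for v in z])
```

Tree-based models are largely insensitive to monotonic rescaling of individual features, so scaling matters most when the same data also feed distance-based or gradient-based models.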

Noise Reduction:

Feature Extraction:

- If the key attributes or features that characterize the data can be extracted, the problem at hand becomes much easier to solve.

Model Candidates

Research Question:

  1. classification and regression trees (CART)

  2. bagged trees

  3. random forests
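
How the three candidates relate can be sketched in a few lines. The code below is a simplified illustration, not the project's implementation: it uses one-split decision stumps as a stand-in for full CART trees, and all function names and the toy data are hypothetical. A single stump plays the role of one tree, and bagging fits many stumps on bootstrap samples and takes a majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: choose the (feature, threshold) with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_bagged(X, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the classes separate on the first feature.
X = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [6.0, 4.0], [7.0, 8.0], [8.0, 2.0]]
y = ["a", "a", "a", "b", "b", "b"]

single = fit_stump(X, y)
ensemble = fit_bagged(X, y)
print(predict_stump(single, [0.5, 5.0]), predict_stump(single, [9.0, 5.0]))
print(predict_bagged(ensemble, [0.5, 5.0]), predict_bagged(ensemble, [9.0, 5.0]))
```

A random forest adds one more source of randomness to this scheme: at each split, only a random subset of the features is considered, which decorrelates the individual trees.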

Non-Ensemble Techniques: Motivation for Ensembles

Classification and Regression Tree (CART)