Importance of Preprocessing
Model Candidates
Cross-Validation
Using Random Forests to Answer our Research Question
Conclusion
data preprocessing converts raw data and signals, through a sequence of operations, into a representation suitable for the application
objectives: size reduction of the input space, smoother relationships, data normalization, noise reduction, and feature extraction
even though preprocessing is very time-consuming, it has a large impact on data modelling. The quality of the preprocessing may vary from case to case.
objective: achieve reasonable generalization with a lower-dimensional data set without losing the most significant relationships in the data.
-If the key attributes or features characterizing the data can be extracted, the problem at hand becomes much easier to solve.
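Two of the objectives listed above, noise reduction and data normalization, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the window size, and the sample values are hypothetical and not taken from the original analysis, and a trailing moving average is just one of many possible smoothing choices.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def moving_average(values, window=3):
    """Reduce noise with a trailing moving average over at most `window` points."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical raw measurements, used only to exercise the two steps.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# A preprocessing "sequence of operations": smooth first, then normalize.
normalized = zscore(moving_average(raw))
```

The order of the steps is a design choice: smoothing before normalization means the scaling is computed on the denoised signal rather than on the raw one.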
Research Question:
classification and regression trees (CART)
bagged trees
random forests
Classification And Regression Tree (CART) is a generic term for the class of models that use trees either to predict the class to which data belongs (classification tree) or to predict a real number (regression tree)
Decision trees can be thought of as machine-generated business rules
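To make the "machine-generated business rules" idea concrete, here is a minimal sketch of a single tree split (a decision stump) chosen by Gini impurity and printed as an IF/THEN rule. It is not the full CART algorithm (no recursion, pruning, or regression), and the feature name, data, and helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule 'x <= threshold'."""
    n = len(xs)
    distinct = sorted(set(xs))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


def majority(labels):
    """Most common label in a node; this is the node's prediction."""
    return Counter(labels).most_common(1)[0][0]


def stump_rule(xs, ys, feature_name="x"):
    """Fit one split and express it as a human-readable rule."""
    split = best_split(xs, ys)
    if split is None:
        raise ValueError("need at least two distinct feature values to split")
    threshold, _ = split
    left = majority([y for x, y in zip(xs, ys) if x <= threshold])
    right = majority([y for x, y in zip(xs, ys) if x > threshold])
    return f"IF {feature_name} <= {threshold} THEN {left} ELSE {right}"


# Hypothetical toy data: one numeric feature and a yes/no outcome.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
rule = stump_rule(ages, bought, feature_name="age")
print(rule)
```

A full tree repeats this split search recursively on each resulting subset, so every path from root to leaf reads as a chain of such rules.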