ML 1 - Assignment 13 Random Forest

Lecture 13 – Assignment 13: Random Forest – Chapter 17 Part 1

What can we do to reduce high variance in decision trees?

• Create multiple trees from different samples of the training dataset and combine their predictions (bagging).
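
For classification, "combining their predictions" is commonly done by majority vote. A minimal sketch, assuming each tree has already produced one class label for the same row (the function name combine_predictions is made up for this illustration):

```python
from collections import Counter

def combine_predictions(tree_predictions):
    """Return the most common class label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Five hypothetical trees vote on one row: three say 1, two say 0.
print(combine_predictions([1, 0, 1, 1, 0]))  # prints 1
```

For regression, the usual counterpart is to average the trees' numeric predictions instead of voting.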

What kind of problems is the random forest algorithm applied to?

• Classification and Regression

Explain the greedy selection of the best split point.

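• Greedy selection means evaluating the candidate attributes and threshold values at a node and choosing the split with the lowest cost at that point, without looking ahead to later splits. The chosen split divides the dataset into two subsets, and the process repeats during tree construction.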

Fill-in-the-blank: The model selects the best split with __________ ___________ until you get homogeneous nodes.

• lowest cost

When working with decision trees what can we do to make sure that trees will be uncorrelated?

• Use bootstrap sampling (random sampling of the training data with replacement) to create a different subset of rows for each tree, and limit the features (columns) that the greedy algorithm may evaluate at each split point.
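
Bootstrap sampling can be sketched in a few lines. This is only an illustration under stated assumptions: the name bootstrap_sample, the ratio parameter, and the toy rows are hypothetical, not taken from the course material.

```python
import random

def bootstrap_sample(rows, ratio=1.0, rng=None):
    """Draw a sample of rows WITH replacement (rows may repeat)."""
    rng = rng or random.Random()
    n_sample = round(len(rows) * ratio)
    return [rng.choice(rows) for _ in range(n_sample)]

rows = list(range(10))      # stand-in for ten training rows
rng = random.Random(1)      # seeded only to make the run repeatable
# One independent bootstrap sample per tree
tree_samples = [bootstrap_sample(rows, rng=rng) for _ in range(3)]
for sample in tree_samples:
    print(sorted(sample))
```

Because each draw is made with replacement, a sample usually contains some rows more than once and omits others, which is what makes the trees differ.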

Fill-in-the-blank: For classification problems, the number of attributes to be considered for the split is limited to the ______________ .

• square root of the number of input features (columns).

When working with the Random Forest algorithm, what happens to predictions when trees are made more uncorrelated?

• The predictions of the individual trees become more diverse, and the combined prediction often performs better.

What does a Gini Index of 0 indicate?

• Perfect purity: the class values are perfectly separated, so each of the two groups contains rows of only one class.

For classification problems, what cost function do you use to calculate the purity of the group of data created by the split point?

• Use the Gini index (also called Gini impurity).
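
The Gini calculation for a candidate split can be sketched as follows. This is a minimal illustration rather than the course's reference code; the function name gini_index and the representation of each group as a plain list of class labels are assumptions made here.

```python
def gini_index(groups, classes):
    """Weighted Gini index of a candidate split.

    groups  -- list of groups, each a list of class labels
    classes -- the distinct class labels in the dataset
    """
    n_total = sum(len(group) for group in groups)
    gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue  # an empty group contributes nothing
        # Sum of squared class proportions within this group
        score = sum((group.count(c) / size) ** 2 for c in classes)
        # Weight the group's impurity by its share of the rows
        gini += (1.0 - score) * (size / n_total)
    return gini

perfect = gini_index([[0, 0], [1, 1]], [0, 1])  # each group holds one class
mixed = gini_index([[0, 1], [0, 1]], [0, 1])    # each group is a 50/50 mix
print(perfect, mixed)  # prints 0.0 0.5
```

A value of 0.0 corresponds to the perfectly separated case, and 0.5 is the worst case for a two-class problem.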

Fill-in-the-blank: Bagging uses sampling __________________ replacement.

• With

Fill-in-the-blank: Random Forest uses sampling _______________ replacement.

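• without (this refers to the features sampled at each split point; the rows for each tree are still drawn with replacement, as in bagging)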

True or False: When working with the Random Forest algorithm, we are working with rows.

• False

True or False: When working with bagging, we are usually working with columns.

• False

How is the number of features considered at each split point chosen?

• It is a tunable parameter. For classification it is typically set to the square root of the total number of features; for regression a commonly cited default is one third of the features.
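
The square-root rule for classification can be sketched like this. It is a small illustration under stated assumptions: features_for_split is a hypothetical helper name, and features are identified by column index.

```python
import math
import random

def features_for_split(n_features, rng):
    """Pick sqrt(n_features) distinct column indices for one split point."""
    n_considered = max(1, int(math.sqrt(n_features)))
    # random.sample draws WITHOUT replacement, so no column repeats
    return rng.sample(range(n_features), n_considered)

rng = random.Random(7)  # seeded only to make the run repeatable
candidates = features_for_split(16, rng)
print(len(candidates))  # prints 4
```

A fresh subset would be drawn at every split point, so different splits (and different trees) tend to evaluate different columns.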

Paul Brown