Lecture 12 – Assignment 12: Bootstrap Aggregation – Chapter 16
Part 1
1.  Fill-in-the-blank: Decision trees are a ____________________.

•   simple and powerful predictive modeling technique

2.  What is a weakness of decision trees?

•   They have high variance

3.  True or False: You cannot make predictions with bootstrap models.

•   False

4.  When bootstrapping, do you make samples with replacement or without replacement?

•   With replacement

5.  Does CART have low variance?

•   No, it does not; like other decision trees, CART has high variance.

6.  Explain Bootstrap aggregation.

•   An ensemble technique that trains multiple decision trees on bootstrap samples of the training data and combines their predictions, making the models more robust and achieving better performance (see the sketch below).
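Concretely, bagging fits one model per bootstrap sample and combines the ensemble's outputs, e.g. by majority vote for classification. A minimal Python sketch, where build_tree() and predict() are hypothetical placeholders for a CART learner:

    from random import randrange

    def bootstrap_sample(dataset):
        # draw len(dataset) rows at random, with replacement
        return [dataset[randrange(len(dataset))] for _ in range(len(dataset))]

    def bagging_predict(trees, row, predict):
        # majority vote: return the most common prediction across the trees
        predictions = [predict(tree, row) for tree in trees]
        return max(set(predictions), key=predictions.count)

    def bagging(train, test, n_trees, build_tree, predict):
        # train one tree per bootstrap sample, then vote on each test row
        trees = [build_tree(bootstrap_sample(train)) for _ in range(n_trees)]
        return [bagging_predict(trees, row, predict) for row in test]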

7.  Explain variance.

•   An algorithm's performance is sensitive to the training data: the more the training data is changed, the more the performance of the algorithm will vary.

8.  State the benefits of bagging.

•   Improved predictive performance; in addition, bagging cannot overfit the problem as more trees are added.

9.  Fill-in-the-blanks: We can create a new sample of a dataset by randomly selecting  __________ from the dataset and adding them to a new __________ .

•   rows; list

10. Explain the procedure that the subsample( ) function performs.

•   It builds a random subsample of the dataset by repeatedly selecting a random row (with replacement) and appending it to a new list, until the sample reaches a given ratio of the original dataset's size (see the sketch below).
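A minimal sketch consistent with this description; the ratio argument scales the sample size relative to the original dataset:

    from random import randrange

    def subsample(dataset, ratio=1.0):
        # build a bootstrap sample: rows chosen at random, with replacement
        sample = []
        n_sample = round(len(dataset) * ratio)
        while len(sample) < n_sample:
            index = randrange(len(dataset))  # random row index
            sample.append(dataset[index])
        return sample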

11. What does the randrange( ) function do?

•   It returns a random integer from a given range; here it is used to select a random row index to add to the sample on each iteration of the loop (see the example below).
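randrange(n) returns a random integer from 0 up to, but not including, n, which makes it a direct way to pick a valid row index. A tiny example with a made-up dataset:

    from random import randrange

    dataset = [[2.7, 0], [1.4, 1], [3.3, 0], [0.5, 1]]  # hypothetical rows
    index = randrange(len(dataset))  # random integer in [0, len(dataset))
    sample_row = dataset[index]      # row added to the sample this iteration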

12.  When bootstrapping, what is the default size of the sample?

•   the size of the original dataset.

13. Explain the following line of code:
“def subsample(dataset, ratio=1.0):”

•   It defines a function named subsample that takes a dataset and an optional sampling ratio (defaulting to 1.0, i.e., a sample the size of the original dataset) as arguments.

14. Which resampling technique do we use to estimate the performance of the learned model on unseen data?

•   k-fold cross-validation (see the sketch below)
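A sketch of such a split in the same from-scratch style. Note that, unlike bootstrapping, the folds are drawn without replacement, so no row appears in more than one fold:

    from random import randrange

    def cross_validation_split(dataset, n_folds):
        # partition the dataset into n_folds folds, sampled without replacement
        dataset_split = []
        dataset_copy = list(dataset)
        fold_size = int(len(dataset) / n_folds)
        for _ in range(n_folds):
            fold = []
            while len(fold) < fold_size:
                index = randrange(len(dataset_copy))
                fold.append(dataset_copy.pop(index))
            dataset_split.append(fold)
        return dataset_split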

15. What is the get_split( ) function used for?

•   It finds an optimal split point for the dataset (see the sketch below).
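A condensed sketch of a CART-style get_split(), assuming each row stores its class label in the last column and using Gini impurity to score candidate splits:

    def gini_index(groups, classes):
        # weighted Gini impurity of the groups produced by a candidate split
        n_instances = sum(len(group) for group in groups)
        gini = 0.0
        for group in groups:
            size = len(group)
            if size == 0:
                continue
            score = sum(
                ([row[-1] for row in group].count(c) / size) ** 2 for c in classes
            )
            gini += (1.0 - score) * (size / n_instances)
        return gini

    def get_split(dataset):
        # greedily test every value of every feature as a candidate split point
        class_values = list(set(row[-1] for row in dataset))
        best = {'index': None, 'value': None, 'score': float('inf'), 'groups': None}
        for index in range(len(dataset[0]) - 1):  # last column is the label
            for row in dataset:
                left = [r for r in dataset if r[index] < row[index]]
                right = [r for r in dataset if r[index] >= row[index]]
                gini = gini_index((left, right), class_values)
                if gini < best['score']:
                    best = {'index': index, 'value': row[index],
                            'score': gini, 'groups': (left, right)}
        return best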

16. What is the difficulty of the bagging method?

•   The trees that bagging creates are often very similar to one another, which makes the predictions made by these trees similar as well.

17. Explain the greedy algorithm.

•   A greedy algorithm solves optimization problems by making the locally optimal choice at each step. It can provide efficient, approximate solutions, although it does not guarantee the globally optimal solution in all cases (see the illustration below).
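In this chapter the greedy strategy shows up in get_split(), which picks the best split at each node without looking ahead. As a standalone illustration with made-up denominations, greedy coin change always takes the largest coin that still fits:

    def greedy_coin_change(amount, denominations):
        # locally optimal choice: always take the largest coin that still fits
        coins = []
        for coin in sorted(denominations, reverse=True):
            while amount >= coin:
                amount -= coin
                coins.append(coin)
        return coins

    print(greedy_coin_change(63, [25, 10, 5, 1]))  # [25, 25, 10, 1, 1, 1]
    print(greedy_coin_change(6, [4, 3, 1]))        # [4, 1, 1] -- but [3, 3] is optimal

The second call shows why a greedy choice can miss the global optimum: [3, 3] uses fewer coins.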

18. How does the Random Forest algorithm resolve the similar prediction problem of bagging?

•   In Random Forest, each tree is still trained on a bootstrap sample, but tree construction is further randomized: at each split point, only a random subset of the input features is considered. This is known as random subspace sampling. By randomizing both the rows used for training and the features evaluated at each split, Random Forest creates a more diverse set of decision trees (see the sketch below).
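A sketch of that extra randomization step, using a hypothetical helper that picks the feature indices a single split is allowed to consider:

    from random import randrange

    def random_feature_subset(n_total_features, n_features):
        # choose n_features distinct feature indices at random
        # (without replacement; assumes n_features <= n_total_features)
        features = []
        while len(features) < n_features:
            index = randrange(n_total_features)
            if index not in features:
                features.append(index)
        return features

A Random Forest version of get_split() would then loop only over these indices instead of over every column.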

19. When bagging, how do you make a subsample of a dataset?

•   You create a new sample of a dataset by randomly selecting rows from the dataset and adding them to a new list.  

20. What is another name for bootstrap aggregation?

•   bagging