#Decision Trees - Part 2 - Assignment 5B
#1 Name three predictor variables associated with the Gini index. Numeric, nominal, and regression trees.
#2 Fill-in-the-blank: Any implementation of a decision tree algorithm provides a collection of parameters ___for tuning___ how the tree is built.
#3 Explain tuning parameters. Model parameters that can be modified to increase prediction accuracy.
#4 What is the rpart() function? It stands for Recursive Partitioning, and it implements a two-stage procedure to construct models that can be represented as binary trees.
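# A minimal sketch of an rpart() call (assuming the rpart package is installed;
# the built-in iris data stands in for the course data):
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")  # fit a classification tree
print(fit)                                                # the model prints as a binary tree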
#5 List four splitting functions. Gini index, information gain, entropy, and gain.
# split="information" directs rpart to use the information gain measure.
# split="gini" directs rpart to use the Gini index (the default).
# minsplit= is the minimum number of observations that must exist at a node in the tree before it is considered for splitting.
# minbucket= is the minimum number of observations in any terminal (leaf) node; its default value is about one-third of minsplit=.
# (See the sketch below for how these are passed to rpart.)
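# A sketch of passing a splitting function together with minsplit=/minbucket=;
# note that rpart takes the split choice inside parms=list(split=...)
# (iris is used here only as a stand-in dataset):
library(rpart)
fit_info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"),      # information gain
                  control = rpart.control(minsplit = 20,    # >= 20 obs to attempt a split
                                          minbucket = 7))   # >= 7 obs in every leaf
fit_gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "gini"))             # Gini index (the default)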
#6 Fill-in-the-blank: maxdepth=, minbucket=, minsplit=, and maxcompete= are called ___tuning arguments___.
#7 Explain the following arguments (see the sketch after this list):
# data=data[train, vars] --> the training data (training rows, selected columns)
# method="class" --> a classification problem
# split="information" --> use the information gain measure
# control=control --> complexity-controlling arguments
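# Putting these arguments together, a hedged reconstruction of the call pattern
# (data, train, vars, and form are assumed to have been defined earlier; the
# control values shown are rpart's defaults):
library(rpart)
control <- rpart.control(minsplit = 20, minbucket = 7, maxdepth = 30, cp = 0.01)
model <- rpart(formula = form,                         # response (Y) and covariates (X)
               data    = data[train, vars],            # training rows, selected columns
               method  = "class",                      # classification problem
               parms   = list(split = "information"),  # information gain measure
               control = control)                      # complexity-controlling arguments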
#8 When working with clustering, k-means is sensitive to the number of clusters; the choice requires a delicate balance. Setting k very large will improve the homogeneity of the clusters, but at the same time it risks overfitting the data. Ideally, you will have prior knowledge about the true groupings and can apply this information to choosing the number of clusters. TRUE or FALSE? TRUE
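# A small k-means illustration (built-in iris measurements; k = 3 uses the prior
# knowledge that there are three species):
set.seed(42)                                      # k-means is sensitive to its random start
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
table(km$cluster, iris$Species)                   # compare clusters to the true groupings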
#9 Name four tree-building implementations. Entropy, gain, information gain, and the Gini index.
#10 The default value of the minbucket= argument is about one-third of the default value of minsplit=. TRUE or FALSE? TRUE
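# The defaults can be checked directly; minbucket defaults to round(minsplit / 3):
library(rpart)
defaults <- rpart.control()
defaults$minsplit   # 20
defaults$minbucket  # 7, i.e., about one-third of 20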
#11 Fill-in-the-blank: In general, you will get a larger decision tree by ___allowing more terminal nodes___.
#12 Fill-in-the-blank: The ___minbucket=___ is the minimum number of observations in any leaf node.
#13 A node will be considered for splitting if it has at least minsplit= observations.
#14 List four tuning parameters for the decision tree algorithm. minbucket=, minsplit=, maxdepth=, and the complexity parameter (cp=).
#15 Fill-in-the-blank: The ___complexity parameter (cp=)___ is used to control the size of the decision tree and to select an optimal tree size.
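# Sketch of inspecting the complexity parameter to pick a tree size (fit is
# assumed to be a fitted rpart model, as in the sketches above):
printcp(fit)   # cross-validated error at each cp value
plotcp(fit)    # plot of xerror vs. cp, a visual aid for choosing the tree size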
#16 The larger the decision tree, the more likely it is to overfit the training data. TRUE or FALSE? TRUE
#17 In order to avoid overfitting, we should do what? Increase minbucket=, minsplit=, or the complexity parameter (cp=), or decrease maxdepth=.
#18 In order to make a node split of a decision tree worthwhile, what could we do? The cp= argument sets the minimum "benefit" that must be gained at a split of the decision tree in order to make the split worthwhile.
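# A hedged pruning sketch: take the cp with the lowest cross-validated error
# from the cp table and prune the tree back to that size (fit as above):
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)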
#19 In the code statement,
# Model <- rpart(formula = form, ...)
# what does the argument formula = form tell the model to do?
# It tells the tree model the response variable (Y) and the covariates (X).
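# For example (using the iris column names as stand-ins), form could be:
form <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
# or, more compactly, the response against all remaining columns:
form <- Species ~ .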
#20 Fill-in-the-blank: When a model is complex, it is likely to ___overfit___.