#Decision Trees - Part 2 - Assignment 5B

 

#1 Name three predictor variable types associated with the Gini index. Numeric, nominal, and regression trees.

#2 Fill-in-the-blank: Any implementation of a decision tree algorithm provides a collection of parameters for ___tuning___ how the tree is built.

#3 Explain tuning parameters. Model parameters which can be modified to increase prediction accuracy.

#4 What is the rpart( ) function? The name stands for Recursive Partitioning; it implements a two-stage procedure to construct models that can be represented as binary trees. (A minimal example follows below.)
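# For illustration only -- a minimal sketch, assuming the rpart package
# and R's built-in iris data (neither is specified in the assignment):
library(rpart)
model <- rpart(Species ~ ., data = iris, method = "class")
print(model)   # text view of the fitted binary tree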

#5 List four splitting functions. Gini Index, Information Gain, Entropy, and Gain.

# split="information" directs rpart to use the information gain measure.
# split="gini" directs rpart to use the Gini index.
# minsplit= the minimum number of observations that must exist at a node in the tree before it is considered for splitting.
# minbucket= the minimum number of observations in any terminal (leaf) node. The default value is about 1/3 of minsplit= (see the sketch below).
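# A hedged sketch (iris again is an assumed example): in the rpart() API
# itself the splitting criterion is passed via parms=, while minsplit=
# and minbucket= are passed via rpart.control():
library(rpart)
fit.info <- rpart(Species ~ ., data = iris, method = "class",
                  parms   = list(split = "information"),
                  control = rpart.control(minsplit = 20, minbucket = 7))
fit.gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms   = list(split = "gini"))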


#6 Fill-in-the-blank: maxdepth=, minbucket=, minsplit=, and maxcompete= are called ___tuning arguments___.

#7 Explain the following arguments:
# data=data[train, vars] --> the training data (training rows, selected variables)
# method="class"         --> a classification problem
# split="information"    --> use the information gain measure
# control=control        --> complexity-controlling arguments (see the sketch below)
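# A runnable sketch of this call pattern. The objects data, train, vars,
# and form are assumed placeholders (the assignment does not define them),
# so they are built here from iris; note that in plain rpart() the
# split= choice is supplied through parms=:
library(rpart)
set.seed(42)
data    <- iris
vars    <- names(data)                           # variables to keep
train   <- sample(nrow(data), 0.7 * nrow(data))  # 70% training rows
form    <- Species ~ .                           # response ~ covariates
control <- rpart.control(minsplit = 20, minbucket = 7, maxdepth = 30)
model   <- rpart(formula = form,
                 data    = data[train, vars],            # training data
                 method  = "class",                      # classification
                 parms   = list(split = "information"),  # information gain
                 control = control)                      # tuning arguments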


#8 When working with clustering, K-means is sensitive to the number of clusters, and the choice requires a delicate balance. Setting K very large will improve the homogeneity of the clusters, but at the same time it risks overfitting the data. Ideally, you will have prior knowledge about the true groupings, and you can apply this information to choosing the number of clusters. TRUE or FALSE: TRUE (see the sketch below)
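# A small sketch (an assumed example, not part of the assignment) showing
# why a large K "improves" homogeneity: the total within-cluster sum of
# squares keeps shrinking as K grows, even when the extra clusters just
# overfit the data:
set.seed(42)
x <- iris[, 1:4]                       # numeric features only
for (k in c(2, 3, 10)) {
  km <- kmeans(x, centers = k, nstart = 10)
  cat("K =", k, " total within-cluster SS =", km$tot.withinss, "\n")
}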


#9 Name four tree-building implementations. Entropy, Gain, Information Gain, and Gini Index.

#10 The default value of the minbucket= argument is about one-third of the default value of minsplit=. TRUE or FALSE: TRUE


#11 Fill-in-the-blank: In general, you will get a larger decision tree by ___increasing the number of terminal nodes___.


#12 Fill-in-the-blank: The ___minbucket=___ argument is the minimum number of observations in any leaf node.

#13 A node will be considered for splitting if it has at least minsplit= observations.

#14 List four tuning parameters for the decision tree algorithm. minbucket=, minsplit=, maxdepth=, and the complexity parameter cp= (defaults shown below).
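# The defaults of these tuning parameters can be checked directly --
# a quick sketch, assuming the rpart package:
library(rpart)
defaults <- rpart.control()
defaults$minsplit    # 20
defaults$minbucket   # 7, i.e. round(minsplit/3) -- about one-third, as in #10
defaults$maxdepth    # 30
defaults$cp          # 0.01, the complexity parameter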

#15 Fill-in-the-blank: The ___complexity parameter (cp=)___ is used to control the size of the decision tree and to select an optimal tree size.

#16 The larger the decision tree, the more likely it is to overfit the training data. TRUE or FALSE: TRUE

#17 In order to avoid overfitting, we should do what? Increase minbucket=, minsplit=, or the complexity parameter cp=, and/or decrease maxdepth=.

#18 In order to make a node split of a decision tree worthwhile, what could we do? Set the cp= argument, which specifies the minimum "benefit" (improvement in fit) that must be gained at a split for the split to be made (see the pruning sketch below).
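# A hedged sketch (assumed iris example) of using cp= against overfitting:
# grow a deliberately large tree with a tiny cp, inspect the
# cross-validated error table, and prune back to the best size:
library(rpart)
set.seed(42)
big  <- rpart(Species ~ ., data = iris, method = "class",
              control = rpart.control(cp = 0.001, minsplit = 2))
printcp(big)                                          # cp table with xerror
best <- big$cptable[which.min(big$cptable[, "xerror"]), "CP"]
pruned <- prune(big, cp = best)                       # smaller, less overfit tree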

#19 In the code statement,
#Model <- rpart(formula = form, …)

#   What does the argument, formula = form, tell the model to do?
# It tells the tree model which variable is the response (Y) and which are the covariates (X).
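# A brief sketch of building and passing such a formula (the variable
# names are assumed, taken from iris):
library(rpart)
form  <- Species ~ Sepal.Length + Sepal.Width   # Y ~ X1 + X2
model <- rpart(formula = form, data = iris, method = "class")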

#20 Fill-in-the-blank: When a model is complex, it is likely to ___overfit___ the training data.