#Decision Trees - Part 1 - Assignment 4

 

#1 Fill-in-the-blank: The ____information gain____ measures how much more organized the input values become when we divide them up using a given feature (input data, variables, independent variables, feature selection, …)

#2 Fill-in-the-blank: One of the information-theoretic metrics used for identifying the most informative features for a decision tree is ____entropy____.

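#3 Explain information gain. Information gain is a measure of how much more organized the input values become when we divide them using a given feature. It is the entropy of the set before the split minus the weighted average entropy of the subsets after the split.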
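The definition in #3 can be checked numerically. The following is a minimal Python sketch, not part of the assignment: the function names `entropy` and `information_gain` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted entropy of the
    subsets obtained by grouping the labels on each feature value."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy data: four examples, two classes.
labels = ["yes", "yes", "no", "no"]
separating = ["a", "a", "b", "b"]   # splits the classes perfectly
uninformative = ["a", "b", "a", "b"]  # leaves each subset as mixed as before

print(information_gain(labels, separating))
print(information_gain(labels, uninformative))
```

A feature that separates the classes perfectly recovers all of the parent entropy as gain, while a feature that leaves every subset as mixed as the parent yields a gain of zero.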

#4 Explain

#5 Fill-in-the-blank: The degree to which a subset of examples contains only a single class is known as ____purity____.

#6 Decision trees are usually faced with the challenge to identify which feature is greater than 10. True or False.

#7 What does the argument "lwd =" indicate? The line width.

#8 Fill-in-the-blank: Tree-based modeling techniques can be used to solve ____________.
#9 Tree-based models can address both ____classification____ and ____regression____ tasks.

#10 List four other key characteristics of tree-based models. Computational efficiency, the ability to handle datasets with unknown values, embedded feature selection, and the absence of strong functional assumptions (flexible function approximation).

#11 Will tree-based models achieve top predictive performance on classification or regression tasks? Usually not.

#12 What is a tree-based model in terms of a hierarchy construction? A hierarchy of logical tests on the predictor variables.

#13 Where do we find the predictions of the tree-based model? At the leaf nodes.

#14 The tree-based model is made up of branches and nodes. What is the top node called? Root.

#15 Fill-in-the-blank: As you follow a path from the top node to the leaf node, the path carries you through a conjunction of ____logical____ ____tests____ that leads to the prediction.

#16 Give two examples of tree-based models. A classification tree for the Breast Cancer dataset and a regression tree for the Boston Housing dataset.

#(a)  The breast cancer dataset is found in which package? mlbench
#(b)  The Boston Housing dataset is found in which package? MASS

#17 Fill-in-the-blank: Each node of the tree has a ____logical____ ____test____ on one of the predictors (e.g., size < 2.5).
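To illustrate #12 through #17, here is a small Python sketch of a hand-built tree in which each internal node holds one logical test and each leaf holds a prediction. The tree, the `shape` predictor, its threshold, and the class names are all hypothetical; only the `size < 2.5` test comes from the question above.

```python
# Hypothetical hand-built tree: internal nodes hold one logical test,
# leaves hold the prediction.
tree = {
    "feature": "size", "threshold": 2.5,          # root node test
    "left": {"prediction": "benign"},             # taken when size < 2.5
    "right": {
        "feature": "shape", "threshold": 4.0,
        "left": {"prediction": "benign"},
        "right": {"prediction": "malignant"},
    },
}

def predict(node, example, path=None):
    """Follow the logical tests from the root to a leaf.
    Returns the leaf's prediction and the tests passed on the way."""
    path = [] if path is None else path
    if "prediction" in node:
        return node["prediction"], path
    feature, threshold = node["feature"], node["threshold"]
    if example[feature] < threshold:
        path.append(f"{feature} < {threshold}")
        return predict(node["left"], example, path)
    path.append(f"{feature} >= {threshold}")
    return predict(node["right"], example, path)

label, tests = predict(tree, {"size": 3, "shape": 5})
print(label, "because", " AND ".join(tests))
```

The list of tests returned alongside the prediction is the conjunction described in #15: the example reaches that leaf only if every test on the path holds.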

#18 In tree-based modeling, trees are built using an algorithm that builds these trees recursively (a recursive partitioning algorithm). This algorithm has three key issues. List these issues. The termination criterion, the value to assign to each leaf, and the procedure for selecting the best logical test for each non-leaf node.
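The three issues in #18 can be seen in a compact sketch of recursive partitioning for classification. This is an illustrative Python sketch under stated assumptions, not the algorithm of any particular R package: the weighted Gini index as the split criterion, the majority class as the leaf value, the `min_size` stopping rule, and all function names are choices made here for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_test(X, y):
    """Return the (feature index, threshold) with the lowest weighted
    Gini impurity, or None if no test improves on the unsplit node."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] < t]
            right = [lab for row, lab in zip(X, y) if row[j] >= t]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(X, y, min_size=2):
    test = None
    # Issue 1, termination criterion: stop on a pure or too-small node.
    if len(set(y)) > 1 and len(y) >= min_size:
        # Issue 3, selecting the best logical test for a non-leaf node.
        test = best_test(X, y)
    if test is None:
        # Issue 2, value of the leaf: here, the majority class.
        return {"prediction": Counter(y).most_common(1)[0][0]}
    j, t = test
    left = [i for i, row in enumerate(X) if row[j] < t]
    right = [i for i, row in enumerate(X) if row[j] >= t]
    return {
        "feature": j, "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], min_size),
        "right": grow([X[i] for i in right], [y[i] for i in right], min_size),
    }

# Hypothetical toy data: one numeric predictor, two classes.
print(grow([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"]))
```

Each recursive call repeats the same three decisions on a smaller subset of the data, which is why the procedure is described as recursive partitioning.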


#19 TRUE or FALSE: Both the classification and the regression trees are grown using the recursive partitioning algorithm. True

#20 Classification trees typically use criteria related to the minimization of the error rate. Regression trees typically use the least squares error criterion that minimizes the mean squared error of the tree. List three criteria that classification trees use with respect to minimizing the error rate. Gini index, gain ratio, and entropy.


#21 The Gini Index equations are used for what task? They are used to gauge the purity of the subsets produced by a candidate split point.
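For reference, the standard form of the Gini index for a node whose examples fall into classes with proportions p_k is shown below, followed by a worked value for a hypothetical node holding three examples of one class and one of another. The symbols G and p_k are notation chosen here, not taken from the assignment.

```latex
G = 1 - \sum_{k} p_k^{2}
\qquad\text{e.g.}\qquad
G = 1 - \left(0.75^{2} + 0.25^{2}\right) = 1 - 0.625 = 0.375
```

A pure node has G = 0, and for two classes the maximum of 0.5 is reached when the classes are evenly mixed.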

#22 When working with regression trees, what method should we use for selecting the best logical test? Least squares (LS)
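As a sketch of the least squares criterion in #22, the Python code below searches the thresholds of a single numeric predictor for the split that minimizes the total squared error around each side's mean. The function names, the restriction to one predictor, and the toy data are assumptions for illustration.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_ls_split(x, y):
    """Return (threshold, error) for the test x < threshold that
    minimizes the summed squared error of the two resulting subsets."""
    best_t, best_err = None, sse(y)
    for t in sorted(set(x))[1:]:          # skip the minimum: left side would be empty
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Hypothetical toy data: the target jumps between x = 2 and x = 3.
x = [1, 2, 3, 4]
y = [10, 12, 30, 32]
print(best_ls_split(x, y))
```

Predicting the mean of each subset is what makes the squared error the natural score here, since the mean is the constant that minimizes it.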

#23 Explain the difference between classes and class levels in a decision tree. A class is the "bucket" or label of a category; the class levels are the values within that class. Example: class: fruit; class levels: apple, orange, and pear.

#24 When determining the optimal feature to split upon, what does the R algorithm calculate? Gini Index