# Decision Trees - Part 2 - Assignment 5A
#1 What is information gain for a feature? It is the change in homogeneity (the reduction
#in entropy) that would result from a split on each possible feature.
#2 Information gain is a function of what measure? Entropy.
#3 Explain what is meant by "a high information gain." The group before the split is much more varied than the groups after the split; the higher the information gain, the better the feature is at creating homogeneous groups when split upon.
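
# A minimal sketch (Python, with hypothetical toy data) of how information gain is
# computed: the entropy of the group before the split minus the size-weighted
# entropy of the groups after it.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent group minus the weighted entropy of its child groups."""
    n = len(parent)
    after = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - after

# A perfect split produces homogeneous groups (high gain); a useless split
# leaves the groups just as mixed as before (zero gain).
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0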
#4 Decision trees only use information gain for splitting on nominal features. True or False? False.
#5 What are features? Give examples as well. Features are the input data: the variables
#(independent variables) that describe each case, such as age, income, or temperature.
#6 Fill-in-the-blank: If there are five classes in a decision tree, the entropy will range from ___0___ to ___log2(5)___.
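
# A quick check (Python; reuses entropy() from the sketch above) that with five
# classes the entropy tops out at log2(5) when all five are equally common and
# bottoms out at 0 when only one class is present.
print(entropy([1, 2, 3, 4, 5]), math.log2(5))  # both ~2.32
print(entropy([1, 1, 1, 1, 1]))                # 0.0 (may print as -0.0)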
#7 What does entropy describe? What does it mean? Entropy describes the randomness or disorder within a set of class values: high entropy means the classes are highly mixed, while low entropy means the set is homogeneous.
#8 Explain the process of the decision tree method. Decision trees are built using a heuristic called recursive partitioning. This approach is also commonly known as divide and conquer because it splits
#the data into subsets, which are then split repeatedly into even smaller subsets, and so on, until the algorithm determines the data within the subsets are sufficiently homogeneous or another stopping criterion has been met.
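
# A compact sketch (Python; illustrative only, not any particular textbook algorithm)
# of recursive partitioning: pick the split with the highest information gain, divide
# the data, and recurse on each subset until a subset is homogeneous or no split helps.
# Reuses entropy(), information_gain(), and Counter from the sketch above.
def best_split(rows, labels):
    """Try every (feature, threshold) pair; return the one with the highest gain."""
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        for v in set(r[f] for r in rows):
            left = [l for r, l in zip(rows, labels) if r[f] <= v]
            right = [l for r, l in zip(rows, labels) if r[f] > v]
            if left and right and information_gain(labels, [left, right]) > best[2]:
                best = (f, v, information_gain(labels, [left, right]))
    return best

def build_tree(rows, labels):
    f, v, _ = best_split(rows, labels)
    if len(set(labels)) == 1 or f is None:           # stopping criteria met
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= v]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > v]
    return {"test": (f, v),                          # decision node: split, then recurse
            "left": build_tree(*zip(*left)),
            "right": build_tree(*zip(*right))}

# Hypothetical data: one numeric feature, two classes.
print(build_tree([[1], [2], [8], [9]], ["no", "no", "yes", "yes"]))
# {'test': (0, 2), 'left': 'no', 'right': 'yes'}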
#9 Fill-in-the-blank: The process of ___pruning___ a decision tree involves reducing its size such that it generalizes better to unseen data.
#10 As classifiers, what do decision trees do? They utilize a tree structure to model the
#relationships among the features (the input data) and the potential outcomes.
#11 What process earned its name because it mirrors how a literal tree begins at a wide trunk and, if followed upward, splits into narrower and narrower branches? The decision tree.
#12 What is a leaf node? Leaf nodes (also known as terminal nodes) denote the action that should
#be taken as a result of the series of decisions. In the case of a predictive model, the leaf nodes provide the expected result given the series of events in the tree.
#13 Explain in your own words the idea of the model being overfitted to the training data. The model is fit too closely to the training data, so when it is applied to another dataset it produces varied or erroneous outputs.
#14 What is a root node? The node at the top of the tree; it represents the entire dataset before any splits.
#15 Explain the process of divide and conquer. It splits the data into subsets, which are then split repeatedly into even smaller subsets, and so on, until the algorithm determines the data within the subsets are sufficiently homogeneous or another stopping criterion has been met.
#16 List three decision tree algorithms. ID3, C4.5, and C5.0.
#17 Explain the C5.0 algorithm. It uses entropy (a measure of randomness/disorder) to identify the best split values, increasing information gain as data flows down the tree.
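
# C5.0 itself is distributed as the R 'C50' package; as a rough Python analogue
# (an assumption for illustration, not the same algorithm), scikit-learn can grow
# a tree whose splits are likewise chosen by entropy:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion="entropy")  # entropy-driven splitting
clf.fit([[1], [2], [8], [9]], ["no", "no", "yes", "yes"])
print(clf.predict([[7]]))  # ['yes']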
#18 What metrics do we use to help us identify the best decision tree splitting candidate? Purity measures, such as entropy and the Gini index.
#19 Which metric quantifies randomness or disorder within a set of class values? Entropy.
#20 Fill-in-the-blank: Sets with ___high entropy___ are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality.
#21 Explain post-pruning. Post-pruning involves growing a tree that is intentionally too large and then pruning leaf nodes to reduce the tree to a more appropriate size.
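
# A toy sketch (Python; operates on the nested-dict trees built above) of the
# post-pruning idea: grow first, then collapse splits that no longer earn their
# keep. Here the test is trivial (both children are leaves predicting the same
# class); a real pruner would compare validation error before and after collapsing.
def prune(node):
    if not isinstance(node, dict):       # already a leaf
        return node
    node["left"], node["right"] = prune(node["left"]), prune(node["right"])
    if node["left"] == node["right"] and not isinstance(node["left"], dict):
        return node["left"]              # the split adds nothing; collapse it
    return node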
#22 What are greedy learners? Learners that use data on a first-come, first-served basis.
#23 What is the downside of the greedy approach? Greedy algorithms are not guaranteed to generate the optimal, most accurate, or smallest set of rules for a particular dataset.
#24 Greedy learners take the low-hanging fruit early. Explain what is wrong with this approach. A greedy learner may quickly find a single rule that is accurate for one subset of the data; however, in doing so,
#it may miss the opportunity to develop a more nuanced set of rules with better overall accuracy on the entire dataset. Decision trees employ greedy learning heuristics: once divide and conquer splits on a feature, the partitions created by the split cannot be re-conquered, only further subdivided. In this way, a tree is permanently limited by its history of past decisions.
#25 For what purpose are attribute selection measures with various implementations applied? To choose the feature on which to split a node.
#26 List four common attribute selection measures. Entropy, gain, information gain, and the Gini index.
#27 Fill-in-the-blank: The information gain is a property of ___entropy___ and ___attribute___.
#28 Why do we want to adjust the information gain? To control for the number of groups a split creates; otherwise, features that split the data into many small groups can appear artificially informative.
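
# One common adjustment is C4.5's gain ratio (named here as an assumption; the
# question above does not specify which adjustment): dividing the information gain
# by the entropy of the group sizes ("split info") penalizes splits that shatter
# the data into many small groups. Reuses entropy()/information_gain() from above.
def gain_ratio(parent, children):
    group_ids = [i for i, child in enumerate(children) for _ in child]
    split_info = entropy(group_ids)  # high when there are many small groups
    return information_gain(parent, children) / split_info if split_info else 0.0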
#29 What is the Gini index used for? It is used to gauge the purity (homogeneity) of a
#split point.
#30 Given that D is a sample of cases in a node, explain what a logical test, s, does for the
#sample of cases. A logical test, s, divides the cases in D into two partitions, D_s and its
#complement, D − D_s. We thus have a proportion of cases going to the left branch of the node
#and a proportion going to the right.
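
# A minimal sketch (Python; reuses Counter from above) of scoring such a split with
# the Gini index: gini() measures the impurity of one partition, and the split is
# scored by weighting each partition by its proportion of the cases in D.
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(D_s, D_rest):
    """Size-weighted Gini of the two partitions produced by a logical test s."""
    n = len(D_s) + len(D_rest)
    return len(D_s) / n * gini(D_s) + len(D_rest) / n * gini(D_rest)

print(gini_split(["yes", "yes"], ["no", "no"]))  # 0.0: a pure split
print(gini_split(["yes", "no"], ["yes", "no"]))  # 0.5: maximally impure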