## 1. What is information gain for a feature?
## • measures how much more organized (more homogeneous) the class values become when we divide the data up on that feature
## 2. What is a function of information gain?
## • is the change in homogeneity that will result from a split on each possible feature
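As an illustration (the labels and helper names below are invented for this sketch), information gain can be computed as the parent set's entropy minus the size-weighted entropy of the subsets produced by a split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]           # mixed: entropy = 1.0 bit
left, right = ["yes", "yes"], ["no", "no"]    # pure children: entropy = 0
print(information_gain(parent, left, right))  # 1.0, a perfectly informative split
```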
## 3. Explain what is meant by “a high information gain.”
## • A high information gain means the split produces a group or groups of samples with much lower entropy, i.e., more homogeneous groups
## 4. Decision trees only use information gain for splitting on nominal features. True or False
## • False
## 5. What are features? Give examples as well.
## • Features are the attributes that decision nodes split the data on; each decision node tests a feature and branches according to its values. For example, the root node splits on the single most predictive feature in the dataset
## 6. Fill-in-the-blank: If there are five classes in a decision tree, the entropy will range from ______________________
## • 0 to log2(5) ≈ 2.32, i.e., from zero up to a little greater than 2 (see the sketch below)
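A quick check of that range: entropy is 0 for a perfectly homogeneous set and is maximized at log2(n) when all n classes are equally likely, so for five classes the maximum is a little greater than 2 bits:

```python
from math import log2

n_classes = 5
# Maximum entropy occurs when all five classes appear in equal proportion.
max_entropy = -sum((1 / n_classes) * log2(1 / n_classes) for _ in range(n_classes))
print(max_entropy)      # 2.3219... bits
print(log2(n_classes))  # the same value, computed directly
```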
## 7. What does entropy describe? What does it mean?
## • It quantifies the randomness, or disorder, within a set of class values
## 8. Explain the process of the decision tree method.
## • Decision trees are built using a heuristic called recursive partitioning. It splits the data into subsets, which are then split repeatedly into even smaller subsets, stopping when the algorithm determines that the data within the subsets are sufficiently homogeneous (or another stopping criterion is met); a toy sketch follows
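A toy sketch of recursive partitioning on a single numeric feature. It is deliberately simplified: a real learner would choose the threshold that maximizes a purity measure rather than the median used here, and the function and parameter names are invented:

```python
from collections import Counter

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split (rows, labels) until a subset is pure or the
    depth limit is reached; rows are single numeric feature values."""
    # Stopping criterion: the subset is sufficiently homogeneous.
    if len(set(labels)) == 1 or depth == max_depth:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Illustrative split point: the median value (not the best-gain split).
    threshold = sorted(rows)[len(rows) // 2]
    left = [(x, y) for x, y in zip(rows, labels) if x < threshold]
    right = [(x, y) for x, y in zip(rows, labels) if x >= threshold]
    if not left or not right:  # no useful split remains; stop here
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    return {
        "threshold": threshold,
        "left": build_tree(*zip(*left), depth + 1, max_depth),
        "right": build_tree(*zip(*right), depth + 1, max_depth),
    }

print(build_tree([1, 2, 8, 9], ["a", "a", "b", "b"]))
# {'threshold': 8, 'left': {'leaf': 'a'}, 'right': {'leaf': 'b'}}
```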
## 9. Fill-in-the-blank: The process of __________________ involves reducing its size such that it generalizes better to unseen data.
## • pruning a decision tree
## 10. As classifiers, what do decision trees do?
## • utilize a tree structure to model the relationships among the features (input data, etc.) and the potential outcomes
## 11. What process earned its name because it mirrors how a literal tree begins at a wide trunk and, followed upward, splits into narrower and narrower branches?
## • The decision tree structure
## 12. What is a leaf node?
## • A terminal node that provides the expected result (the predicted outcome) given the series of decisions made along the path through the tree
## 13. Explain in your words the idea of the model being overfitted to the training data.
## • Overfitting occurs when a model makes accurate predictions on its training data but not on new data: it has learned the training set's noise and idiosyncrasies rather than the general pattern, so it fails to generalize
## 14. What is a root node?
## • represents the entire dataset, since no splitting has transpired
## 15. Explain the process of divide and conquer.
## • Working down each branch, the algorithm continues to divide and conquer the data, choosing the best candidate feature each time to create another decision node, until a stopping criterion is reached
## 16. List three decision tree algorithms
## • ID3 (Iterative Dichotomiser 3)
## • C4.5 algorithm
## • C5.0 algorithm (Quinlan's successor to C4.5)
## 17. Explain the C5.0 algorithm.
## • The C5.0 algorithm does well on most types of problems. Compared to other machine learning models such as support vector machines and neural networks, the decision trees built by C5.0 generally perform nearly as well but are much easier to understand and deploy
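C5.0 itself ships as R's C50 package and is not implemented in scikit-learn. As a stand-in only, a comparable tree can be fit with scikit-learn's CART-based DecisionTreeClassifier (a different algorithm, but it illustrates how readable a fitted tree is):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Note: this is CART (scikit-learn's implementation), not C5.0 itself.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)
print(export_text(clf))  # the fitted tree prints as easy-to-read if/else rules
```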
## 18. What metrics do we use to help us identify the best decision tree splitting candidate?
## • Information gain is used to split nodes when the target variable is categorical; reduction in variance is used when the target variable is continuous
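For the continuous case, a minimal sketch of reduction in variance with made-up numbers; a split is good when the children's weighted variance falls far below the parent's:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted

parent = [1.0, 1.2, 9.0, 9.4]
print(variance_reduction(parent, [1.0, 1.2], [9.0, 9.4]))  # ~16.4: a strong split
```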
## 19. Which metric quantifies randomness or disorder within a set of class values?
## • Entropy
## 20. Fill-in-the-blank: Sets with _______________ are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality.
## • high entropy
## 21. Explain post-pruning.
## • Post-pruning involves growing a tree that is intentionally too large and then pruning leaf nodes to reduce the tree to a more appropriate size (one concrete example follows)
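One concrete way to post-prune, shown with scikit-learn's cost-complexity pruning, which follows the same grow-then-trim idea: grow a full tree, compute the candidate pruning strengths it implies, and refit with a nonzero ccp_alpha so the weakest branches are collapsed:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow an intentionally large tree, then compute candidate pruning strengths.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
alphas = full.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

# Refit with a strong alpha: the weakest branches are pruned away.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alphas[-2]).fit(X_tr, y_tr)
print(full.tree_.node_count, "->", pruned.tree_.node_count)
print("test accuracy:", pruned.score(X_te, y_te))
```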
## 22. What are greedy learners?
## • they use data on a first-come, first-served basis
## 23. What is the downside of the greedy approach?
## • Greedy learners are not guaranteed to generate the optimal, most accurate, or smallest set of rules for a particular dataset
## 24. Greedy learners take the low-hanging fruit early. Explain what is wrong with this approach.
## • may quickly find a single rule that is accurate for one subset of data; however, in doing so, the learner may miss the opportunity to develop a more nuanced set of rules with better overall accuracy on the entire set of data
## 25. For what purpose are attribute selection measures with various implementations applied?
## • They are applied at each node to choose the split that partitions the records into the purest (most homogeneous) possible subsets
## 26. List four common attribute selection measures.
## • Entropy, information gain, gain ratio, and the Gini index
## 27. Fill-in-the-blank: The information gain is a property of ____________ and ___________.
## • entropy and the attribute (the gain for an attribute equals the entropy before the split minus the weighted entropy after splitting on it)
## 28. Why do we want to adjust the information gain?
## • Raw information gain is biased toward features that split the data into many small groups; adjusting it (for example, with the gain ratio) penalizes such splits and so controls the number of groups after splitting (see the sketch below)
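One common adjustment is C4.5/C5.0's gain ratio: information gain divided by the entropy of the partition sizes themselves ("split info"), which penalizes splits that shatter the data into many tiny groups. A minimal sketch with invented labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, children):
    """Information gain divided by split info (entropy of partition sizes)."""
    n = len(parent)
    gain = entropy(parent) - sum((len(c) / n) * entropy(c) for c in children)
    split_info = -sum((len(c) / n) * log2(len(c) / n) for c in children)
    return gain / split_info if split_info else 0.0

parent = ["yes", "yes", "no", "no"]
# A four-way split earns the same gain as a two-way split but a lower ratio.
print(gain_ratio(parent, [["yes", "yes"], ["no", "no"]]))      # 1.0
print(gain_ratio(parent, [["yes"], ["yes"], ["no"], ["no"]]))  # 0.5
```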
## 29. What is the Gini index used for?
## • It is used to gauge the purity (how homogeneous the resulting sets are) of a split point
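A small sketch of the Gini index on invented labels: it is 0 for a perfectly pure set and reaches 1 - 1/k when k classes are maximally mixed:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))       # 0.0, perfectly pure
print(gini(["yes", "yes", "no", "no"]))  # 0.5, maximally mixed for 2 classes
```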
## 30. Given that D is a sample of cases in a node, explain what a logical test, s, does for the sample of cases.
## • The test s partitions the sample D into disjoint subsets; candidate tests are compared so that the “best test” (the one producing the purest subsets) can be selected for the node