Decision Trees – Part 2 – Assignment 5A
Data Engineering and Mining II
1. What is information gain for a feature?
The information gain for a feature, F, is calculated as the difference between the entropy of the segment before the split (S1) and the size-weighted entropy of the partitions resulting from the split (S2):
Info gain(F) = Entropy(S1) – Entropy(S2)
This is a splitting criterion used to build decision trees.
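A minimal Python sketch of this calculation (the helper names entropy and info_gain are illustrative, not from the course materials); here Entropy(S2) is computed as the size-weighted entropy of the resulting partitions:

    import math
    from collections import Counter

    def entropy(labels):
        # Entropy of a list of class labels: sum of -p * log2(p) over the classes.
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(parent, partitions):
        # Entropy before the split minus the size-weighted entropy after the split.
        n = len(parent)
        return entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)

    # Example: a 50/50 node split on a hypothetical feature F into two purer partitions.
    parent = ["yes"] * 5 + ["no"] * 5
    left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4
    print(round(info_gain(parent, [left, right]), 3))  # 0.278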
2. What is a function of information gain?
Info gain(F) = Entropy(S1) – Entropy(S2)
3. Explain what is meant by “a high information gain.”
A high information gain means that the chosen covariate and split point produce a large reduction in entropy, i.e., they are close to optimal for splitting that node.
4. Decision trees only use information gain for splitting on nominal features. True or False
FALSE. Decision trees are not limited to nominal features; information gain can also be used for numeric features, for example by testing candidate split points.
5. What are features? Give examples as well.
Features are covariates: random variables that describe, and can be used to predict, the response variable. Examples include a customer's age, income, and credit history when predicting loan default.
6. Fill-in-the-blank:
If there are five classes in a decision tree, the entropy will range from zero to log2(5).
[0, log2(5)] ≈ [0, 2.32]
7. What does entropy describe? What does it mean?
Entropy describes the degree of homogeneity or heterogeneity within a node. If a node is completely homogeneous, its entropy is zero. If a node contains samples from all classes with equal probabilities, it is highly heterogeneous and has high entropy.
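A small sketch showing what these entropy values look like for a node's class proportions (this also illustrates the range for five classes from question 6; the function name is illustrative):

    import math

    def entropy(probs):
        # Entropy computed from the class proportions within a node.
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(entropy([1.0]))        # 0.0   -> completely homogeneous node
    print(entropy([0.5, 0.5]))   # 1.0   -> two classes, maximally mixed
    print(entropy([0.2] * 5))    # ~2.32 -> five classes, maximally mixed (log2(5))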
8. Explain the process of the decision tree method.
(i) Start with the complete dataset.
(ii) Select the best covariate and split point.
(iii) Split the data into subsets.
(iv) Repeat steps (ii) and (iii) on each subset.
(v) Stop growing the tree when a node is completely homogeneous or contains no more than a predefined minimum number of samples.
(vi) Assign each terminal node a class by majority vote (classification) or the mean of the response variable, Y (regression). (A minimal sketch of this loop is given after the list.)
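A minimal, self-contained sketch of this loop (the function names and the toy data are illustrative, not from the course materials; a real implementation would also handle nominal features, regression, and pruning):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_split(rows, labels):
        # Step (ii): choose the covariate and split point with the highest information gain.
        best_f, best_t, best_gain = None, None, 0.0
        for f in range(len(rows[0])):
            for t in sorted({r[f] for r in rows}):
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if not left or not right:
                    continue
                gain = entropy(labels) - (len(left) / len(labels) * entropy(left)
                                          + len(right) / len(labels) * entropy(right))
                if gain > best_gain:
                    best_f, best_t, best_gain = f, t, gain
        return best_f, best_t

    def build_tree(rows, labels, min_samples=2):
        # Steps (v)-(vi): stop if the node is pure or too small; assign the majority class.
        if len(set(labels)) == 1 or len(labels) <= min_samples:
            return Counter(labels).most_common(1)[0][0]
        f, t = best_split(rows, labels)
        if f is None:  # no split improves homogeneity
            return Counter(labels).most_common(1)[0][0]
        # Steps (iii)-(iv): split the data and repeat on each subset.
        left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
        right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
        return {"feature": f, "threshold": t,
                "left": build_tree([r for r, _ in left], [y for _, y in left], min_samples),
                "right": build_tree([r for r, _ in right], [y for _, y in right], min_samples)}

    # Step (i): start with the complete (toy) dataset -- one numeric covariate.
    rows = [[0.5], [1.0], [2.0], [3.5], [4.2], [5.0]]
    labels = ["no", "no", "no", "yes", "yes", "yes"]
    print(build_tree(rows, labels))  # {'feature': 0, 'threshold': 2.0, 'left': 'no', 'right': 'yes'}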
9. Fill-in-the-blank:
The process of ___pruning a decision tree__ involves reducing its size such that it generalizes better to unseen data.
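One way to see the effect: a small sketch using scikit-learn's cost-complexity pruning (an assumption about tooling, and only one of several pruning approaches; the dataset is just a convenient built-in example):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

    # The pruned tree has far fewer nodes and typically generalizes as well or better.
    print(full.tree_.node_count, pruned.tree_.node_count)
    print(full.score(X_te, y_te), pruned.score(X_te, y_te))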
10. As classifiers, what do decision trees do?
Predict the class of a new observation by passing it down the tree's splits until it reaches a leaf node.
11. What process earned its name due to the fact that it mirrors how a literal tree begins at a wide trunk, which if followed upward, splits into narrower and narrower branches.
The decision tree model, named for its tree-like structure.
12. What is a leaf node?
A terminal node of the tree, where no further splitting occurs and the final class (or predicted value) is assigned.
13. Explain in your words the idea of the model being overfitted to the training data.
A model may fit the training data more and more accurately as its complexity increases (for example, by adding more covariates), but this improvement does not necessarily carry over to new data. An overfitted model has learned patterns specific to the training set, so a more complex model can actually have poorer predictive power on unseen data.
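A small sketch of the resulting gap between training and test accuracy (again assuming scikit-learn; the deep vs. shallow comparison is illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)  # grows until pure
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

    # The deep tree fits the training data (almost) perfectly, yet its test accuracy
    # is usually no better, and often worse, than the simpler tree's.
    print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
    print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))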
14. What is a root node?
The starting node of the tree, which contains the complete dataset before any split.
15. Explain the process of divide and conquer.
The decision tree model splits the data into subsets, choosing at each step the split that yields the most homogeneous child nodes. This step is repeated on each subset until a stopping criterion is met.
16. List three decision tree algorithms
(i) ID3 (Iterative Dichotomiser 3): the best covariate is selected by entropy-based information gain.
(ii) C5.0: the best covariate is selected by the gain ratio.
(iii) CART (Classification and Regression Trees): the best covariate is selected by the Gini index for classification and mean squared error (MSE) for regression. (A short illustration follows this list.)
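For reference, a short sketch of how these splitting criteria appear in recent scikit-learn versions, whose trees are CART-style (an assumption about tooling; the course may use different software):

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # CART-style trees: Gini index for classification, squared error (MSE) for regression.
    cart_clf = DecisionTreeClassifier(criterion="gini")
    cart_reg = DecisionTreeRegressor(criterion="squared_error")

    # Entropy-based information gain, as in ID3/C5.0-style splitting, is also available.
    id3_like = DecisionTreeClassifier(criterion="entropy")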
17. Explain the C5.0 algorithm.
The C5.0 algorithm builds reliable tree-based models by growing an overly large tree and then applying a statistical procedure, usually known as post-pruning, that eliminates statistically unreliable branches. Its earlier versions are C4.5 and ID3.
18. What metrics do we use to help us identify the best decision tree splitting candidate?
Some metrics are as follows:
(i) Information gain (entropy-based)
(ii) Gain ratio
(iii) Gini index
19. Which metric quantifies randomness or disorder within a set of class values?
C5.0 uses entropy, a concept borrowed from information theory that quantifies the randomness, or disorder, within a set of class values.
20. Fill-in-the-blank:
Sets with __high entropy__ are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality.
21. Explain post-pruning.
Post-pruning grows an overly large tree and then applies a statistical procedure that tries to eliminate branches of the tree that are statistically unreliable.
22. What are greedy learners?
A greedy learner makes the locally best choice at each step. The tree model selects the best covariate and split point at each node and does not revisit earlier splits.
23. What is the downside of the greedy approach?
The downside to the greedy approach is that greedy algorithms are not guaranteed to generate the optimal, most accurate, or smallest number of rules for a particular dataset.
24. Greedy learners take the low-hanging fruit early. Explain what is wrong with this approach.
By taking the low-hanging fruit early, a greedy learner may quickly find a single rule that is accurate for one subset of data; however, in doing so, the learner may miss the opportunity to develop a more nuanced set of rules with better overall accuracy on the entire set of data.
25. For what purpose are attribute selection measures with various implementations applied?
They are applied to choose the attribute (and split point) on which to split a node, i.e., to identify the best splitting candidate.
26. List four common attribute selection measures.
Entropy, information gain, gain ratio, Gini index
27. Fill-in-the-blank:
The information gain is a property of __entropy and the attribute (Attr)__.
28. Why do we want to adjust the information gain?
To control for the number of groups produced by the split: unadjusted information gain tends to favor attributes that split the data into many small groups.
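A sketch of one common adjustment, the gain ratio used by C4.5/C5.0-style trees, which divides the gain by the "split information" so that splits producing many small groups are penalized (the function names are illustrative):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(parent, partitions):
        n = len(parent)
        gain = entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)
        # Split information grows with the number (and evenness) of groups,
        # so dividing by it penalizes attributes that create many small partitions.
        split_info = sum(-(len(p) / n) * math.log2(len(p) / n) for p in partitions)
        return gain / split_info if split_info > 0 else 0.0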
29. What is the Gini index used for?
The Gini index is used for selecting the best covariate and the best split point in a decision tree model. It is used in CART (Classification and Regression Trees).
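A minimal sketch of the Gini index for a single node (the function name is illustrative):

    from collections import Counter

    def gini(labels):
        # Gini index of a node: 1 minus the sum of squared class proportions.
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini(["yes"] * 5))          # 0.0 -> pure node
    print(gini(["yes", "no"] * 5))    # 0.5 -> maximally mixed two-class node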
30. Given that D is a sample of cases in a node, explain what a logical test, s, does for the sample of cases.
D is the sample of cases in the node (at the root, it is the entire dataset). A logical test, s, evaluates to true or false for each case in D and thereby partitions D into disjoint subsets that are passed to the child nodes.
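A tiny illustration of such a test (the feature names and values are invented for the example):

    # D: a sample of cases in a node; here each case is a dict of feature values.
    D = [{"age": 25, "income": 40}, {"age": 52, "income": 90}, {"age": 40, "income": 60}]

    # A logical test s evaluates to true or false for each case and thereby
    # partitions D into the subsets that are sent to the child nodes.
    s = lambda case: case["age"] <= 40
    D_true = [case for case in D if s(case)]
    D_false = [case for case in D if not s(case)]
    print(len(D_true), len(D_false))  # 2 1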