Decision Trees – Part 1 – Assignment 4
 
1.  Fill-in-the-blank: 
The ____information gain___ measures how much more organized the input values become when we divide them up using a given feature (input data, variables, independent variables, feature selection, …)

2.  Fill-in-the-blank: 
One of the information theoretic metrics used for identifying the most informative features for a decision tree is ___entropy___ .

3.  Explain information gain.

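Information gain is the reduction in entropy achieved by splitting the data on a given feature. It is calculated as the entropy of the examples before the split minus the weighted average entropy of the subsets after the split. The higher the information gain, the more homogeneous the resulting subsets, so the feature with the highest gain is the best one to split on.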
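As a rough illustration of the calculation, here is a minimal Python sketch. The function names and the toy data are made up for this example and do not come from the course material or from any R package.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = {}
    # Group the class labels by the value of the splitting feature.
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Hypothetical toy data: the first feature separates the classes perfectly,
# the second one tells us nothing about the class.
labels = ["yes", "yes", "no", "no"]
useful = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]

gain_useful = information_gain(labels, useful)
gain_useless = information_gain(labels, useless)
print(gain_useful, gain_useless)
```

The perfectly separating feature recovers the full 1 bit of entropy, while the uninformative one gains nothing.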


4.  Explain

It helps to choose a model by providing a score for judging the model.

5.  Fill-in-the-blank: 
The degree to which a subset of examples contains only a single class is known as __purity__.

6.  Decision trees are usually faced with the challenge to identify which feature is greater than 10. True or False.

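FALSE. The challenge a decision tree usually faces is identifying which feature to split upon, not which feature is greater than 10.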

7.  What does the argument, “lwd = “ indicate?
 lwd sets the line width of the lines drawn in an R plot (the default is 1).

8.  Fill-in-the-blank: 
Tree-based modeling techniques can be used to solve _REGRESSION OR CLASSIFICATION__.

9.  Tree-based models can address both _REGRESSION_ and __CLASSIFICATION__ tasks.

10. List four other key characteristics of tree-based models.

Other key characteristics of tree-based models are
  computational efficiency,
  the ability to handle datasets with unknown (missing) values,
  built-in feature selection, and
  being non-parametric models.


11. Will tree-based models achieve top predictive performance on classification or regression tasks?
Not guaranteed. Tree-based models are not certain to achieve top predictive performance on either type of task.

12. What is a tree-based model in terms of a hierarchy construction?

A tree-based model is a hierarchy of logical tests on some of the predictor variables. This
inverted tree ends at the so-called leaf nodes where we have the predictions of the model.
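The hierarchy of logical tests can be pictured as a nested structure. The sketch below is only an illustration: the feature names, thresholds and predictions are hypothetical and are not taken from any fitted model.

```python
# Hypothetical tree: each internal node holds one logical test on a predictor,
# and each leaf node holds a prediction.
tree = {
    "feature": "size", "threshold": 2.5,          # root node: size < 2.5 ?
    "yes": {"prediction": "benign"},               # leaf
    "no": {
        "feature": "shape", "threshold": 4.0,      # inner node: shape < 4.0 ?
        "yes": {"prediction": "benign"},           # leaf
        "no": {"prediction": "malignant"},         # leaf
    },
}

def predict(node, case):
    """Follow the tests from the root node down until a leaf is reached."""
    while "prediction" not in node:
        branch = "yes" if case[node["feature"]] < node["threshold"] else "no"
        node = node[branch]
    return node["prediction"]

result = predict(tree, {"size": 3.0, "shape": 5.0})
print(result)
```

Each prediction corresponds to one path from the root to a leaf, that is, to a conjunction of the tests along that path.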

13. Where do we find the predictions of the tree-based model?
At the terminal nodes, called leaves.

14. The tree-based model is made up of branches and nodes. What is the top node called?
Root node

15. Fill-in-the-blank: 
As you follow a path from the top node to the leaf node, the path carries you through a conjunction of __logical tests___ that leads to the prediction.

16. Give two examples of tree-based models.

Classification tree (first data set): Y = disease/healthy, X = demographic characteristics, blood test results
Regression tree (second data set): Y = hemoglobin count, X = demographic characteristics, blood test results

17. (a)  The breast cancer dataset is found in which package?
 (b)  The Boston Housing dataset is found in which package?
 (a)  breast cancer dataset -> mlbench
 (b)  Boston Housing dataset -> MASS

18. Fill-in-the-blank:  
Each node of the tree has ___two branches, which are related to the outcome of a test____ on one of the predictors.  (e.g., size < 2.5)
 
19. In tree-based modeling, trees are built using an algorithm that builds these trees recursively (a recursive partitioning algorithm). This algorithm has three key issues.  List these issues. 
This algorithm has three key issues: 
(i) the termination criterion that decides when we stop growing the tree creating a leaf node; 
(ii) the value that is selected for these leaves (the representative of the cases in each leaf); and 
(iii) the procedure used for selecting the best logical test for each non-leaf node.
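The three issues can be seen together in a small recursive partitioning sketch for a classification tree. This is a simplified illustration, not the algorithm used by any particular R package: the Gini-based test selection, the `max_depth` stopping rule and all function names are assumptions chosen for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_test(rows, labels):
    """(iii) Pick the test 'feature j < t' with the lowest weighted Gini."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            if not left or not right:
                continue  # this test does not actually divide the cases
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow(rows, labels, depth=0, max_depth=3):
    # (i) Termination: stop when the node is pure, the depth limit is
    #     reached, or no test divides the cases.
    test = None
    if len(set(labels)) > 1 and depth < max_depth:
        test = best_test(rows, labels)
    if test is None:
        # (ii) Leaf value: the most common class among the cases in the leaf.
        return {"prediction": Counter(labels).most_common(1)[0][0]}
    _, j, t = test
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return {
        "feature": j, "threshold": t,
        "yes": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "no": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

# Hypothetical toy data with a single numeric predictor.
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = grow(rows, labels)
print(tree)
```

On this toy data a single test on feature 0 at threshold 3.0 is enough, and both branches end in pure leaves.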

20. TRUE or FALSE: 
Both the classification and the regression trees are grown using the recursive partitioning algorithm.
TRUE

21. Classification trees typically use criteria related to the minimization of the error rate. Regression trees typically use the least squares error criterion that minimizes the mean squared error of the tree.  List three criteria that classification trees use with respect to minimizing the error rate.
Gini index, 
Gain ratio, 
Entropy

22. The Gini Index equations are used for what task?
The Gini index (Gini impurity) measures how well a split separates the classes in a decision tree algorithm. It is used to choose the covariate and split point at each node: the split that gives the lowest weighted impurity in the resulting subsets is preferred.
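For a node with class proportions p_k, the Gini impurity is 1 minus the sum of the squared p_k. A minimal Python sketch of that formula follows; the function names and the toy labels are invented for the illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of the two subsets produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = gini(["a", "a", "a", "a"])     # only one class in the node
mixed = gini(["a", "a", "b", "b"])    # two classes in equal proportion
clean_split = split_gini(["a", "a"], ["b", "b"])
print(pure, mixed, clean_split)
```

A pure node has impurity 0, a 50/50 two-class node has impurity 0.5, and a split that separates the classes completely brings the weighted impurity back to 0.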

23. When working with regression trees, what method should we use for selecting the best logical test?
 The least squares method: choose the logical test that minimizes the squared error of the resulting subsets.
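A small sketch of how a least squares test could be selected for one numeric predictor is shown below. The helper names and data are hypothetical, and real implementations are more elaborate than this.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(x, y):
    """Return (threshold, total SSE) for the test 'x < threshold' with the lowest SSE."""
    best = None
    # Skip the smallest value, since the left subset would be empty.
    for t in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best

# Hypothetical toy data: the target jumps from 10 to 20 once x reaches 3.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 10.0, 20.0, 20.0]
threshold, error = best_threshold(x, y)
print(threshold, error)
```

The leaf prediction under this criterion is the mean of the target values in each subset, which is why the error is measured around the mean.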

24. Explain the difference between classes and class levels in a decision tree.
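The class is the categorical target feature that the decision tree predicts (for example, disease status). The class levels are the possible values that the class can take (for example, disease and healthy). If the data is not completely homogeneous, the tree divides it into subsets that are more homogeneous with respect to these levels.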

25. When determining the optimal feature to split upon, what does the R algorithm calculate?
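The algorithm calculates the change in homogeneity that would result from a split on each candidate feature at each node, and the feature giving the greatest improvement is selected. For classification trees this can be measured with information gain (the reduction in entropy); for regression trees the least squares error criterion is used.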