Data Engineering and Mining II                            
Name: ____Paul Brown______
Fall 2022   - Spring 2023                                                          
      Assignment 8 - Model Performance Evaluation - Part 1
Directions:  Complete the following exercises.

1. Fill-in-the-blank:
_________  ____________  is the art and science of intelligent data analysis.
   •    Data Mining

2.  What is the aim of data science?
   •    The main objective of data science is to discover patterns in data. It makes sense of the data through a variety of statistical techniques. After data extraction, wrangling, and pre-processing, a data scientist must carefully examine the data.

3.  Fill-in-the-blank:
We often describe _________  ___________  as the process of building models.
   •    Machine learning 

4.  What can we do to evaluate a model?
   •    identify a dataset on which to perform the evaluation.  

5.  What is the problem with evaluating our model on the training dataset?
   •    Evaluating on the training dataset does not give us a good idea of how well the model will perform in general on previously unseen data, because the model was built from those same observations.
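
A minimal sketch of why training-set evaluation is optimistic. The "model" here is a hypothetical lookup table that simply memorizes its training examples, and all data values are made up for illustration.

```python
# A deliberately naive "model" that memorizes its training examples.
# All observations below are invented for illustration only.
train = [((1, 0), "yes"), ((0, 1), "no"), ((1, 1), "yes"), ((0, 0), "no")]
unseen = [((2, 0), "yes"), ((0, 2), "no"), ((2, 2), "yes"), ((3, 1), "yes")]

lookup = dict(train)  # "training" is just storing every example


def predict(features, default="no"):
    # Known inputs are recalled exactly; anything new gets the default guess.
    return lookup.get(features, default)


def error_rate(examples):
    # Proportion of examples whose predicted class differs from the actual class.
    wrong = sum(1 for features, actual in examples if predict(features) != actual)
    return wrong / len(examples)


train_error = error_rate(train)
unseen_error = error_rate(unseen)
print("training error:", train_error)
print("unseen error:  ", unseen_error)
```

The training error is zero because every training observation is recalled exactly, yet the same model misclassifies most of the unseen observations.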

6.  Why do we use validation and/or testing datasets? (We can obtain an error rate)
   •    We use the validation dataset during the modeling process to build the final model which is identified as the “best” model. Last step is to assess the model’s performance on the testing dataset.

7.  What is the validation dataset used for?
   •    the validation dataset is used during the modeling process to build the final model.  

8.  What is a dataset?
   •    a collection of data

9.  Fill-in-the-blank:
The ________ of a dataset refers to the number of observations (rows) and the number of variables (columns).
   •    dimension
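
As a small illustration, the dimension of a dataset can be read off as (rows, columns). The column names and values below are made up.

```python
# A tiny invented dataset: each inner list is one observation (row).
columns = ["age", "weight", "gender"]
rows = [
    [34, 70.5, "F"],
    [51, 82.0, "M"],
    [27, 64.3, "F"],
    [45, 90.1, "M"],
]

# Dimension = (number of observations, number of variables).
dimension = (len(rows), len(columns))
print(dimension)
```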

10. What variables are also known as target, response or dependent variables?
   •    Output variables

11. How do we represent values such as the names or the qualities of objects in data mining?
   •    Character strings

12. How do we represent values such as quantitative values in data mining?
   •    numerical

13. Categorical variables are always discrete. Give three examples of categorical variables.
   •    Gender, race, color

14. Categoric variables may also be known as factors. TRUE or FALSE
   •    True

15. A numeric variable has values that are real numbers, such as a person’s age or weight. TRUE or FALSE
   •    True

16. For building a predictive model, we often partition a dataset into three independent datasets. What are those three types of datasets?
   •    a training dataset, a validation dataset, and a testing dataset.
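
One way to sketch this partition in code is shown below. The function name `partition`, the 70/15/15 proportions, and the seed are assumptions chosen for illustration; the assignment does not specify them.

```python
import random


def partition(observations, train_frac=0.70, validate_frac=0.15, seed=42):
    """Randomly split observations into training, validation, and testing sets.

    The 70/15/15 proportions are an illustrative assumption, not a rule.
    """
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(observations)
    rng.shuffle(shuffled)          # random order, so each subset is a random sample

    n = len(shuffled)
    n_train = round(n * train_frac)
    n_validate = round(n * validate_frac)

    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]   # whatever remains is held out
    return train, validate, test


train, validate, test = partition(range(100))
print(len(train), len(validate), len(test))
```

Because the three slices do not overlap, no observation in the testing dataset is ever used while building or tuning the model.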

17. Fill-in-the-blank:
When building a model, we build our model using the training dataset.  The ___________  __________ is used to assess the model’s performance.
   •    validation dataset

18. Fill-in-the-blank:
Once we are satisfied with the model, we assess its expected performance into the future using the ___________   ____________ .
   •    testing dataset

19. Which dataset must be a holdout or out-of-sample dataset?
   •    Testing dataset

21. Which dataset consists of randomly selected observations from the full dataset that are not used in the building of the model?
   •    Testing data

22. When evaluating model performance, what is the name of the table that compares predictions with actual answers?
   •    Confusion matrix
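
A minimal sketch of building such a table by counting (actual, predicted) pairs. The class labels and the two lists are made-up example data.

```python
from collections import Counter

# Invented actual classes and model predictions for eight observations.
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

# Count how often each (actual, predicted) combination occurs.
counts = Counter(zip(actual, predicted))

# Print the table with actual classes as rows and predicted classes as columns.
classes = ["no", "yes"]
print("actual/predicted", *classes)
for a in classes:
    print(a, *(counts[(a, p)] for p in classes))
```

The diagonal cells hold the correct predictions; the off-diagonal cells hold the two kinds of errors.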

23. In applying a model to a new dataset, the new dataset must contain all of the same variables and have different data types. TRUE or FALSE
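   •    False. The new dataset must contain all of the same variables with the same data types as the dataset on which the model was built.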

24. When working with some measures of performance of a model, we sometimes use a binary classification model.  When working with binary classification, we often identify the predictions as ___________ and ____________ classes.
   •    positive and negative

25. List two machine learning algorithm categories.
   •    Predictive and descriptive 

26. Fill-in-the-blanks:
A ____________  _____________  is used for tasks that involve the prediction of one value using other values in the dataset.
   •    predictive model

26. The machine learning algorithm attempts to discover and model the relationship between the _________ _________ and the other __________ .
   •    target feature; features

27. Fill-in-the-blank:
Because predictive models are given clear instruction on what they need to learn and how they are intended to learn it, the process of training a predictive model is known as _________  _________ .
   •    supervised learning

28. Fill-in-the-blank:
The often used supervised machine learning task of predicting which category an example belongs to is known as ________________ .
   •    classification

29. Complete the following statement:
A random forest algorithm is used for tasks that involve _________...
   •    classification and regression. It combines the output of multiple decision trees to reach a single result.
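
The combining step can be sketched as a majority vote. The three "trees" below are hypothetical hand-written rules standing in for trained decision trees, and the feature names are invented; a real random forest also trains each tree on a random sample of the data, which this sketch leaves out.

```python
from collections import Counter


# Hypothetical stand-ins for trained decision trees (assumed rules, not learned).
def tree_a(obs):
    return "rain" if obs["humidity"] > 70 else "dry"


def tree_b(obs):
    return "rain" if obs["pressure"] < 1010 else "dry"


def tree_c(obs):
    return "rain" if obs["cloud_cover"] > 0.6 else "dry"


forest = [tree_a, tree_b, tree_c]


def forest_predict(obs):
    # Each tree votes for a class; the most common vote is the forest's answer.
    votes = Counter(tree(obs) for tree in forest)
    return votes.most_common(1)[0][0]


observation = {"humidity": 80, "pressure": 1015, "cloud_cover": 0.9}
prediction = forest_predict(observation)
print(prediction)
```

For a regression task the same idea applies, with the trees' numeric outputs averaged instead of voted on.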

30. Fill-in-the-blank:
The _________  _________  is calculated as the proportion of observations for which the model incorrectly predicts the class with respect to the actual class.
   •    Error rate
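
As a short worked example, the error rate can be computed by counting mismatches between predicted and actual classes. The two lists are made-up data.

```python
# Invented actual classes and model predictions for eight observations.
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

# Count observations where the predicted class differs from the actual class.
errors = sum(1 for a, p in zip(actual, predicted) if a != p)

# Error rate = incorrect predictions / all predictions.
error_rate = errors / len(actual)
print(error_rate)
```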

31. What is a true positive?
   •    When the model predicts the positive class and the actual class is positive

32. What is a false negative?
   •    When the model predicts the negative class and the actual class is positive

33. What are the two types of classification errors?
   •    False negative and false positive

34. Fill-in-the-blank:
The precision of a model is a measure of how accurate the positive predictions are, or how precise the model is in ___________ .
   •    predicting

35. Fill-in-the-blank:
The recall of a model is just another name for the true positive rate.  The recall is also known as the ___________ of the model.
   •    sensitivity
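
Precision and recall can both be computed from the counts of true positives, false positives, and false negatives. The sketch below uses made-up data, with 1 as the positive class and 0 as the negative class.

```python
# Invented actual classes and predictions (1 = positive, 0 = negative).
actual    = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Precision: of the positive predictions, how many were actually positive?
precision = tp / (tp + fp)

# Recall (true positive rate, sensitivity): of the actual positives,
# how many did the model find?
recall = tp / (tp + fn)

print(precision, recall)
```

Here the model makes few false alarms (high precision) but misses two of the five actual positives (lower recall), which shows why the two measures are reported together.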