Machine Learning I - Assignment #10

Lecture 10 – Assignment 10 – Naïve Bayes
Paul Brown

Part 1
1.  Fill-in-the-blank:  The Naïve Bayes algorithm uses ___________ to make predictions.

•   probability

2.  Fill-in-the-blank: Given prior knowledge, the ____________ provides a way that we can calculate the probability of a piece of data belonging to a given class.

•   Bayes' Theorem

3.  Interpret the following: “P(data|class)”

•   It is the conditional probability of observing the data given a specific class. "P" denotes probability, and "data|class" reads "data given class": how likely the observed data is if we assume it came from that class. The reverse quantity, P(class|data), is the probability that the data belongs to the class.
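As a rough illustration of how questions 2 and 3 fit together, Bayes' Theorem combines P(data|class) with the prior P(class) to give P(class|data). The sketch below uses a hypothetical helper name (bayes_posterior) and made-up numbers chosen only for illustration.

```python
# Bayes' Theorem: P(class | data) = P(data | class) * P(class) / P(data)
# The helper name and all numbers here are hypothetical, for illustration only.

def bayes_posterior(likelihood, prior, evidence):
    """Return P(class | data) from P(data | class), P(class), and P(data)."""
    return likelihood * prior / evidence

p_data_given_class = 0.8  # P(data | class), made-up likelihood
p_class = 0.3             # P(class), made-up prior
p_data = 0.5              # P(data), made-up evidence

posterior = bayes_posterior(p_data_given_class, p_class, p_data)
print(posterior)  # P(class | data)
```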

4.  The Naïve Bayes algorithm is used on what kinds of problems?

•   Classification problems, both binary (two-class) and multiclass

5.  Fill-in-the-blank: We calculate the ____________ by the class to which they belong.

•   probability

6.  Interpret the following statement: “class_value = vector[ -1]”

•   "class_value = vector[-1]" is used to assign the value of the last element in the list or array vector to the variable class_value.

7.  Which statistics from a given dataset are used in the calculation of probabilities in a few steps?

•   Mean and Standard Deviation


8.  State the interpretation of the following: “from math import sqrt.”


•   It imports the square root function (sqrt) from the math module in Python's standard library.
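The two statistics from question 7 and the sqrt import from question 8 could be combined along these lines. This is a minimal sketch: the function names mean and stdev match the ones used in question 9, but the bodies are an assumption, including the choice of the sample standard deviation (dividing by n - 1).

```python
from math import sqrt

def mean(numbers):
    """Arithmetic mean of a list of numbers."""
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    """Sample standard deviation (divides by n - 1; an assumption here)."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1)
    return sqrt(variance)

values = [1, 2, 3, 4, 5]
print(mean(values), stdev(values))
```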

9.  Explain and interpret the following statements:
    >   “summaries = [(mean(column), stdev(column), len(column))
                       for column in zip(*dataset)]
    >   del(summaries[ -1])”
 
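•   The first statement uses zip(*dataset) to regroup the rows of the dataset into columns, then calculates the mean, standard deviation, and number of values (len) of each column. It stores one (mean, stdev, count) tuple per column in the summaries list.

•   The second statement uses the del keyword to delete the last element of the summaries list. That element summarizes the class-label column, for which these statistics are not needed.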
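The two statements can be tried on a tiny dataset as sketched below. The dataset values are made up, and the standard library's statistics.mean and statistics.stdev are used as stand-ins for the course's own mean and stdev helpers, which is an assumption.

```python
from statistics import mean, stdev  # stand-ins for the course's helpers (assumption)

def summarize_dataset(dataset):
    """Return one (mean, stdev, count) tuple per feature column."""
    # zip(*dataset) turns the list of rows into a sequence of columns.
    summaries = [(mean(column), stdev(column), len(column))
                 for column in zip(*dataset)]
    # The last column holds the class labels, so its summary is dropped.
    del summaries[-1]
    return summaries

# Made-up rows: two features followed by a class label.
dataset = [
    [1.0, 10.0, 0],
    [2.0, 20.0, 0],
    [3.0, 30.0, 1],
]
summaries = summarize_dataset(dataset)
print(summaries)
```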

10. What is the separate_by_class( ) function used for?

•   It is used to separate a dataset into distinct groups or classes based on the class labels associated with each data instance.
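One way such a function might look is sketched below. It assumes, as in question 6, that the class label is the last element of each row; the body and the sample rows are illustrative, not necessarily the course's exact code.

```python
def separate_by_class(dataset):
    """Group rows into a dict keyed by class label (assumed to be the last element)."""
    separated = {}
    for vector in dataset:
        class_value = vector[-1]  # last element is the class label
        separated.setdefault(class_value, []).append(vector)
    return separated

# Made-up rows: two features followed by a class label.
dataset = [
    [1.0, 2.0, 0],
    [3.0, 4.0, 1],
    [5.0, 6.0, 0],
]
separated = separate_by_class(dataset)
for class_value, rows in separated.items():
    print(class_value, rows)
```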

11. What is the summarize_dataset( ) function used for?


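•   It summarizes each column in a dataset by calculating its mean, standard deviation, and number of values, leaving out the class-label column.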

12.  Fill-in-the-blank: Calculating the probability or likelihood of observing a given real-value like X1 is difficult.  One way that we can do this is to assume that X1 values are _______________ .

•   Drawn from a bell curve or Gaussian distribution

13.  What is the “calculate_probability( )” function used for?

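•   It calculates the Gaussian probability density of a given value x, using the mean and standard deviation estimated for an attribute from the training data. The result serves as the likelihood of observing x for a given class.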

14.  What are the arguments of the calculate_probability( ) function?

•   x (the value to evaluate), mean, and stdev (the standard deviation)
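A minimal sketch of such a function, assuming it evaluates the Gaussian probability density for a value x given a mean and standard deviation (the parameter names x, mean, and stdev are assumptions):

```python
from math import sqrt, pi, exp

def calculate_probability(x, mean, stdev):
    """Gaussian probability density of x for the given mean and standard deviation."""
    exponent = exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (sqrt(2 * pi) * stdev)) * exponent

# The density is highest at the mean and falls off symmetrically on either side.
print(calculate_probability(1.0, 1.0, 1.0))
print(calculate_probability(2.0, 1.0, 1.0))
print(calculate_probability(0.0, 1.0, 1.0))
```

Note that the returned value is a density rather than a true probability, which is enough for comparing classes against each other.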


15. True or False: When we calculate the probabilities needed for Naïve Bayes, we use the statistics calculated from our testing data and use them to make predictions.

•   False

16. True or False: When working with class probabilities and the Naïve Bayes Theorem, we use the statistics calculated from our training data to calculate probabilities for new data.  Also, although the result is not strictly the probability of data belonging to a class, the value is still maximized: the calculation for the class that results in the largest value is taken as the prediction.

•   True

17. What are the two arguments of the function, calculate_class_probabilities( ) ?

•   summaries and input_vector
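Questions 15 to 17 can be tied together with a sketch like the one below. It assumes summaries is a dict mapping each class label to a list of (mean, stdev, count) tuples, one per feature, built from training data. The predict helper, the layout of summaries, and all numbers are hypothetical and only for illustration.

```python
from math import sqrt, pi, exp

def calculate_probability(x, mean, stdev):
    """Gaussian probability density of x for the given mean and standard deviation."""
    exponent = exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (sqrt(2 * pi) * stdev)) * exponent

def calculate_class_probabilities(summaries, input_vector):
    """Score each class as its prior times the per-feature Gaussian densities."""
    total_rows = sum(stats[0][2] for stats in summaries.values())
    probabilities = {}
    for class_value, class_summaries in summaries.items():
        # Prior P(class): share of training rows belonging to this class.
        probabilities[class_value] = class_summaries[0][2] / float(total_rows)
        for i, (mean, stdev, _count) in enumerate(class_summaries):
            probabilities[class_value] *= calculate_probability(input_vector[i], mean, stdev)
    return probabilities

def predict(summaries, input_vector):
    """Hypothetical helper: return the class with the largest score."""
    probabilities = calculate_class_probabilities(summaries, input_vector)
    return max(probabilities, key=probabilities.get)

# Made-up training statistics: two classes, two features each.
summaries = {
    0: [(1.0, 0.5, 5), (2.0, 0.5, 5)],
    1: [(5.0, 0.5, 5), (6.0, 0.5, 5)],
}
print(predict(summaries, [1.1, 2.1]))
print(predict(summaries, [4.9, 6.2]))
```

The scores are not normalized by P(data), so they are not strict probabilities, but the largest score still identifies the predicted class.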

6/7/2023