Question 1

Which of the following are components in building a machine learning algorithm?
Answer:
Asking the right question.

Question 2

Suppose we build a prediction algorithm on a data set and it is 100% accurate on that data set. Why might the algorithm not work well if we collect a new data set?
Answer:
Our algorithm may be overfitting the training data, predicting both the signal and the noise.
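A small R sketch shows how overfitting produces perfect accuracy on the training data and worse accuracy on new data. The data are simulated for illustration: a sine curve plus random noise, with a degree-9 polynomial fitted to only 10 points. That gives the model as many coefficients as observations, so it reproduces the training noise exactly. "New" data with the same signal but fresh noise is then predicted noticeably worse.

```r
# Simulated example (assumed data, not from the quiz): signal = sin(2*pi*x), noise sd = 0.3
set.seed(1)
x <- seq(0, 1, length.out = 10)
signal  <- sin(2 * pi * x)
y_train <- signal + rnorm(10, sd = 0.3)   # training set: signal + noise
y_new   <- signal + rnorm(10, sd = 0.3)   # new data set: same signal, fresh noise

# Degree-9 polynomial: 10 coefficients for 10 points, so it interpolates the training data
fit <- lm(y_train ~ poly(x, 9))

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
train_rmse <- rmse(y_train, fitted(fit))  # essentially 0: the noise has been "learned"
new_rmse   <- rmse(y_new, fitted(fit))    # larger: the memorized noise does not repeat

c(train = train_rmse, new = new_rmse)
```

Because the new observations are taken at the same `x` values, the model's predictions for them are just its fitted training values, which makes it easy to see that the extra error comes entirely from the noise the model memorized.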

Question 3

What are typical sizes for the training and test sets?
Answer:
60% in the training set, 40% in the testing set.
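A 60/40 split can be made in base R by sampling row indices at random. The helper name `split_train_test` and the use of the built-in `mtcars` data are assumptions for illustration. The caret package's `createDataPartition()` (with `p = 0.6`) is a common alternative that also balances the outcome across the two sets.

```r
# Hypothetical helper: randomly assign train_frac of the rows to training, the rest to testing
split_train_test <- function(data, train_frac = 0.6, seed = 123) {
  set.seed(seed)                                   # reproducible split
  n <- nrow(data)
  train_idx <- sample(seq_len(n), size = floor(train_frac * n))
  list(
    train = data[train_idx, , drop = FALSE],       # 60% of rows
    test  = data[-train_idx, , drop = FALSE]       # remaining 40% of rows
  )
}

parts <- split_train_test(mtcars)                  # mtcars has 32 rows
c(train = nrow(parts$train), test = nrow(parts$test))
```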

Question 4

What are some common error rates for predicting binary variables (i.e. variables with two possible values like yes/no, disease/normal, clicked/didn’t click)? Check the correct answer(s).
Answer:
Positive predictive value: \(PPV = \frac{TP}{TP + FP}\)
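The PPV and the other standard rates for a binary prediction can all be computed from the four confusion-matrix counts. The function name `binary_rates` and the example counts below are hypothetical, chosen only to illustrate the formulas.

```r
# Hypothetical helper: common rates for a binary classifier from confusion-matrix counts
binary_rates <- function(TP, FP, FN, TN) {
  c(
    sensitivity = TP / (TP + FN),                    # true positive rate
    specificity = TN / (TN + FP),                    # true negative rate
    ppv         = TP / (TP + FP),                    # positive predictive value
    npv         = TN / (TN + FN),                    # negative predictive value
    accuracy    = (TP + TN) / (TP + FP + FN + TN)    # overall fraction correct
  )
}

rates <- binary_rates(TP = 40, FP = 10, FN = 20, TN = 30)
rates
```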

Question 5

Suppose that we have created a machine learning algorithm that predicts whether a link will be clicked with 99% sensitivity and 99% specificity. The link is clicked on 1/1000 of visits to a website. If we predict the link will be clicked on a specific visit, what is the probability it will actually be clicked?
Answer:
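Assume 100,000 visits. With a click rate of 1/1000, 100 visits end in a click and 99,900 do not. A sensitivity of 99% gives TP = 0.99 × 100 = 99 (so FN = 1), and a specificity of 99% gives TN = 0.99 × 99,900 = 98,901 (so FP = 999). The confusion matrix (rows are predictions, columns are actual outcomes) is:

| Predicted \ Actual | Clicked (+) | Not clicked (-) |
|--------------------|-------------|-----------------|
| Clicked (+)        | TP = 99     | FP = 999        |
| Not clicked (-)    | FN = 1      | TN = 98,901     |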
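The counts in this table can be reproduced from the click rate, sensitivity and specificity. The population of 100,000 visits is an assumption (any size works, since the PPV is a ratio), and the function name `expected_counts` is hypothetical.

```r
# Hypothetical helper: expected confusion-matrix counts for a population of n cases
expected_counts <- function(n, prevalence, sensitivity, specificity) {
  positives <- round(n * prevalence)      # visits that actually end in a click
  negatives <- n - positives              # visits without a click
  TP <- round(sensitivity * positives)    # clicks correctly predicted
  TN <- round(specificity * negatives)    # non-clicks correctly predicted
  c(TP = TP, FP = negatives - TN, FN = positives - TP, TN = TN)
}

# Assumed population of 100,000 visits, click rate 1/1000, 99% sensitivity and specificity
counts <- expected_counts(n = 1e5, prevalence = 1 / 1000,
                          sensitivity = 0.99, specificity = 0.99)
counts
```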

Using the formula from the previous question, the positive predictive value (PPV) is:

```r
# PPV = TP / (TP + FP), using the counts from the confusion matrix
ppv <- 99 / (99 + 999)
ppv_percentage <- ppv * 100
ppv_percentage
## [1] 9.016393
```

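So the PPV is about 9%. Even though the algorithm has 99% sensitivity and 99% specificity, only about 9% of the visits it flags as clicks are actual clicks: clicks are so rare (1 in 1000) that the 1% false-positive rate applied to the 99,900 non-clicks produces 999 false positives, far outnumbering the 99 true positives.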
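The same answer follows directly from Bayes' rule, without choosing a population size: \(PPV = \frac{\text{sens} \cdot p}{\text{sens} \cdot p + (1 - \text{spec})(1 - p)}\), where \(p\) is the click rate. The sketch below (function name `ppv_bayes` is hypothetical) also shows how strongly the PPV of the same 99%/99% classifier depends on how common clicks are.

```r
# Hypothetical helper: PPV from sensitivity, specificity and prevalence via Bayes' rule
ppv_bayes <- function(sensitivity, specificity, prevalence) {
  true_pos  <- sensitivity * prevalence               # P(predicted click and actual click)
  false_pos <- (1 - specificity) * (1 - prevalence)   # P(predicted click and no click)
  true_pos / (true_pos + false_pos)
}

ppv_bayes(0.99, 0.99, 1 / 1000) * 100   # about 9%, matching the confusion-matrix result

# PPV of the same classifier as the click rate increases
sapply(c(0.001, 0.01, 0.1, 0.5), function(p) ppv_bayes(0.99, 0.99, p))
```

At a click rate of 1% the PPV rises to 50%, and at 50% it matches the 99% sensitivity and specificity, which illustrates that low prevalence, not a weak classifier, is what drives the 9% result.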