Which of the following are components in building a machine learning algorithm?
Answer: Collecting data to answer the question.
Suppose we build a prediction algorithm on a data set and it is 100% accurate on that data set. Why might the algorithm not work well if we collect a new data set?
Answer: Our algorithm may be overfitting the training data, predicting both the signal and the noise.
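A small simulation makes the overfitting point concrete. This is a minimal sketch under assumptions not in the quiz: the toy data (the true signal is y = 1 when x > 0.5, with 20% of labels flipped at random as noise) and the helper names `make_data`, `predict_1nn` and `accuracy` are all hypothetical. A 1-nearest-neighbour classifier memorizes every training point, noise included, so it scores 100% on the data it was built on but does worse on a fresh sample from the same process.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical toy data: signal is y = 1 when x > 0.5; each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # noise: the label is flipped at random
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point (memorizes the training set)."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
train = make_data(200, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)  # a "new data set" from the same process

train_acc = accuracy(train, train)  # each training point is its own nearest neighbour
test_acc = accuracy(train, test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Training accuracy is exactly 1.00 because every training point finds itself. On new data the memorized noise hurts twice (a noisy neighbour or a noisy test label), so in expectation accuracy is only around 1 - 2 × 0.2 × 0.8 = 0.68 for this setup.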
What are typical sizes for the training and test sets?
Answer: 80% training set, 20% test set.
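A random 80/20 split can be sketched with the standard library alone. The helper `train_test_split` below is hypothetical and written for illustration (scikit-learn ships its own `sklearn.model_selection.train_test_split` with a `test_size` parameter); the 1,000 integer "records" are placeholders.

```python
import random

def train_test_split(data, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle a copy of `data`, then cut it into training and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

records = list(range(1000))  # placeholder records
train, test = train_test_split(records)
print(len(train), len(test))  # 800 200
```

Shuffling before cutting matters: if the records are stored in some order (by date, by class), taking the first 80% without shuffling would give training and test sets that differ systematically.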
What are some common error rates for predicting binary variables (i.e. variables with two possible values like yes/no, disease/normal, clicked/didn’t click)? Check the correct answer(s).
Answer: Predictive value of a positive.
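The usual binary error rates are all ratios of the four cells of a confusion matrix (true/false positives and negatives). The function name and the example counts below are hypothetical, chosen only to illustrate the formulas.

```python
def binary_error_rates(tp, fp, tn, fn):
    """Common summaries of a 2x2 confusion matrix built from counts."""
    return {
        "sensitivity": tp / (tp + fn),                # Pr(predicted + | truly +)
        "specificity": tn / (tn + fp),                # Pr(predicted - | truly -)
        "positive_predictive_value": tp / (tp + fp),  # Pr(truly + | predicted +)
        "negative_predictive_value": tn / (tn + fn),  # Pr(truly - | predicted -)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # fraction of all predictions that are right
    }

rates = binary_error_rates(tp=40, fp=10, tn=45, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these counts the positive predictive value is 40/50 = 0.8, while sensitivity is 40/45, about 0.889: the two answer different questions even on the same table.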
Suppose that we have created a machine learning algorithm that predicts whether a link will be clicked with 99% sensitivity and 99% specificity. The rate the link is clicked is 1/1000 of visits to a website. If we predict the link will be clicked on a specific visit, what is the probability it will actually be clicked?
Answer: By definition we have:
* sensitivity = TP/(TP+FN)
* specificity = TN/(TN+FP)
* prevalence = (TP+FN)/(TP+FN+TN+FP)
and we know that sensitivity = 0.99, specificity = 0.99 and prevalence = 1/1000 = 10^-3.
We want to compute the positive predictive value: p = Pr(clicked | predicted click) = TP/(TP+FP)
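Writing TP and FP as fractions of all visits, TP = sensitivity × prevalence = 0.99 × 10^-3 and FP = (1 - specificity) × (1 - prevalence) = 0.01 × 0.999, so p = (0.99 × 10^-3)/(0.99 × 10^-3 + 0.01 × 0.999) = 0.00099/0.01098 ≈ 0.09, i.e. about 9%.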
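The same Bayes' rule calculation as a small function; the name `positive_predictive_value` is hypothetical, and the inputs are the sensitivity, specificity and click rate stated in the question.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Pr(truly positive | predicted positive), via Bayes' rule."""
    true_pos = sensitivity * prevalence               # TP as a fraction of all visits
    false_pos = (1 - specificity) * (1 - prevalence)  # FP as a fraction of all visits
    return true_pos / (true_pos + false_pos)

p = positive_predictive_value(0.99, 0.99, 0.001)
print(f"{p:.3f}")  # 0.090
```

In counts: out of 1,000,000 visits there are 1,000 clicks, of which 990 are predicted; among the 999,000 non-clicks, 1% (9,990) are also predicted as clicks. So only 990 of the 10,980 predicted clicks are real, about 9%, because the rare event is swamped by false positives from the much larger negative group.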