Coursera - Practical Machine Learning

Question 1

Which of the following are components in building a machine learning algorithm?

Training and test sets
Statistical inference
Collecting data to answer the question
Artificial intelligence
Machine learning

Answer : Collecting data to answer the question

Question 2

Suppose we build a prediction algorithm on a data set and it is 100% accurate on that data set. Why might the algorithm not work well if we collect a new data set?

We may be using a bad algorithm that doesn’t predict well on this kind of data.
We have too few predictors to get good out of sample accuracy.
We have used neural networks which has notoriously bad performance.
Our algorithm may be overfitting the training data, predicting both the signal and the noise.

Answer : Our algorithm may be overfitting the training data, predicting both the signal and the noise.

Question 3

What are typical sizes for the training and test sets?

0% training set, 100% test set.
80% training set, 20% test set
90% training set, 10% test set
50% in the training set, 50% in the testing set.

Answer : 80% training set, 20% test set.

Question 4

What are some common error rates for predicting binary variables (i.e. variables with two possible values like yes/no, disease/normal, clicked/didn’t click)? Check the correct answer(s).

Correlation
Root mean squared error
Median absolute deviation
Predictive value of a positive
R^2

Answer : Predictive value of a positive.

Question 5

Suppose that we have created a machine learning algorithm that predicts whether a link will be clicked with 99% sensitivity and 99% specificity. The rate the link is clicked is 1/1000 of visits to a website. If we predict the link will be clicked on a specific visit, what is the probability it will actually be clicked?

50%
89.9%
0.009%
9%

Answer : y definition we have : * sensivity = TP/(TP+FN) * specificity = TN/(TN+FP) * prevalence = (TP+FN)/(TP+FN+TN+FP)

and we know that :

TP=(TP+FN).sensitivity, FP=(TN+FP).(1−specificity)
sensitivity.prevalence=TP/(TP+FN+TN+FP)
(1−specificity).(1−prevalence)=FP/(TP+FN+TN+FP)

We want to compute : p = Pr(click +|test click +) = TP/(TP+FP)

p=(specificity.prevalence)/(sensitivity.prevalence+(1−specificity).(1−prevalence))

So p = (10^{−3.0.99)/(10}−3.0.99+0.01∗0.999) ~ 9%

Coursera - Practical Machine Learning - Quiz1

Question 1

Question 2

Question 3

Question 4

Question 5