Question 1
You want to develop a scary monster to star in a horror film. You are pretty sure that blood
and teeth make the monster more scary, whereas cuteness makes the monster less scary.
You want to put this to the test with a linear regression model.
Which of the following formulas would be appropriate?
a. Scariness = a + 1.6 * (blood) - 2.8 * (teeth) + 3.2 * (cuteness)
b. Scariness = a + 1.6 * (blood) - 2.8 * (teeth) - 3.2 * (cuteness)
c. Scariness = a - 1.6 * (blood) + (teeth) + 3.2 * (cuteness)
d. Scariness = a + 1.6 * (blood) + 2.8 * (teeth) - 3.2 * (cuteness)
Question 2
Your friend says that your scary monster model is useless, and the only thing that makes
a difference to scariness is whether the monster has a weapon.
What could you show your friend to prove that your model is useful?
a. Compare the slope values for each predictor.
b. The model's R-squared.
c. The model's intercept.
d. The correlation between scariness and each predictor.
Question 3
Calculate the Total Sum of Squares for the table below.
Predicted_Scariness    Observed_Scariness
2.88                   3.8
3.22                   2
3.56                   4
3.90                   3
4.24                   5
Mean = 3.56            Mean = 3.56
Question 4
Calculate the Regression Sum of Squares for the table above.
Question 5
Based on the regression sum of squares and total sum of squares you calculated in the previous
two questions, calculate the R-squared.
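A hand calculation for Questions 3 to 5 can be checked with a short script. The sketch below assumes the usual textbook definitions, which the quiz itself does not spell out: the total sum of squares measures how far the observed scores lie from their mean, the regression sum of squares measures how far the predicted scores lie from that same mean, and R-squared is their ratio. The variable names are illustrative only.

```python
# Values copied from the table in Question 3.
predicted = [2.88, 3.22, 3.56, 3.90, 4.24]
observed = [3.8, 2, 4, 3, 5]

mean_observed = sum(observed) / len(observed)

# Total sum of squares: spread of the observed scores around their mean.
ss_total = sum((y - mean_observed) ** 2 for y in observed)

# Regression sum of squares: spread of the predicted scores around the same mean.
ss_regression = sum((y_hat - mean_observed) ** 2 for y_hat in predicted)

# R-squared: share of the total variation that the predictions reproduce.
r_squared = ss_regression / ss_total

print(round(ss_total, 3), round(ss_regression, 3), round(r_squared, 3))
```

Comparing the printed R-squared with 0, 0.5 and 1 is one way to approach Question 6.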
Question 6
Based on the Total Sum of Squares and Regression Sum of Squares, what can we say about our
model of monster scariness?
a. Blood, teeth and cuteness cannot account for any of the variation in monster scariness.
b. Blood, teeth and cuteness can account for less than half of the variation in monster scariness.
c. Blood, teeth and cuteness can account for half of the variation in monster scariness.
d. Blood, teeth and cuteness can account for more than half of the variation in monster scariness.
e. Blood, teeth and cuteness can account for all of the variation in monster scariness.
Question 7
Sometimes it's scary to ask people out on dates, and sometimes it's easier. A dating researcher
decides to try to build a model to predict how likely a person is to ask someone on a date
based on the following predictors: level of attraction, amount of loneliness, desperation,
fear of rejection. How many parameters are in the model?
Question 8
After 20 observations, the model predicting how likely a person is to ask someone on a date based on level of attraction, amount of loneliness, desperation and fear of rejection has an error sum
of squares of 10.6 and a total sum of squares of 26.2.
What is the F-test statistic?
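One way to set up the F-test in Question 8 is sketched below. It assumes the standard overall F-test for a regression with an intercept, so the error degrees of freedom are the number of observations minus the number of predictors minus one. The variable names are illustrative.

```python
n_observations = 20
n_predictors = 4  # attraction, loneliness, desperation, fear of rejection
ss_error = 10.6
ss_total = 26.2

# Whatever the error does not account for is attributed to the regression.
ss_regression = ss_total - ss_error

df_regression = n_predictors
df_error = n_observations - n_predictors - 1  # minus 1 for the intercept

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_statistic = ms_regression / ms_error
print(df_regression, df_error, round(f_statistic, 3))
```

The two degrees-of-freedom values printed here are also the ones needed to look up a critical value in an F-table.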
Question 9
Your null hypothesis was that the regression coefficients for level of attraction, amount of
loneliness, desperation and fear of rejection are all zero.
What is the threshold value above which the F-statistic must lie in order to reject the null
hypothesis at the 0.05 level? Use the F-table and round the value to three decimal places.
Question 10
A TV company is interested in the people watching their period drama show "Downtown Castle" that
is set in the early 1900s. They found an overall F-statistic suggesting that together, age and
hours of free time significantly predicted the number of episodes of Downtown Castle that
people watched in a sample of 30. The slope coefficient for age was 4.5, with a standard
error of 2.5. What is the t-value of the predictor age?
Question 11
Based on the Downtown Castle model above, calculate the upper boundary of the 95% confidence
interval for the age slope coefficient.
You can select the critical t-value from the table.
Question 12
Based on the Downtown Castle model above, calculate the lower boundary of the 95% confidence
interval for the age slope coefficient.
Question 13
Based on the confidence interval calculated in the previous two questions, is age significantly
related to number of episodes watched, when controlling for hours of free time?
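The calculations in Questions 10 to 13 can be sketched as follows. Two things here are assumptions rather than part of the question text: the error degrees of freedom are taken as sample size minus number of predictors minus one, and the critical value 2.052 is the two-tailed 5% entry for 27 degrees of freedom as read from a standard t-table.

```python
slope = 4.5
standard_error = 2.5
n_observations = 30
n_predictors = 2  # age and hours of free time

# t-value of a single predictor: coefficient divided by its standard error.
t_value = slope / standard_error

df_error = n_observations - n_predictors - 1

# Assumption: two-tailed 5% critical t for df = 27, read from a standard t-table.
t_critical = 2.052

# 95% confidence interval: coefficient plus or minus critical t times standard error.
margin = t_critical * standard_error
lower = slope - margin
upper = slope + margin

# A slope differs significantly from zero only if the interval excludes zero.
interval_excludes_zero = lower > 0 or upper < 0

print(t_value, df_error, round(lower, 2), round(upper, 2), interval_excludes_zero)
```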
Question 14
Suppose both age and amount of free time positively predict the number of episodes of Downtown
Castle that people watched. The TV company looked at viewers of Downtown Castle between the ages
of 20 and 50. Suppose they looked at a full range of ages in the population, and found that
people beyond this range of 20 to 50 watch much less Downtown Castle with much less variation in
their scores. Which assumptions will this new data set likely violate? Select all that apply.
a. Independence
b. Normality
c. Linearity
d. Homoscedasticity
e. Sufficient observations
f. Absence of outliers
Question 15
Both age and amount of free time positively predict the number of episodes of Downtown Castle that people watched.
However, later on the TV company found out that one of their participants, who was high on both
age and free time, did not have access to a TV! Which standardized residual values would you most
expect this participant to show? [Hint: -3 < x < 3]
Question 16
A company desperately wants people to complete a survey, and decides to offer prizes to
incentivize participation. They have data from past surveys. In some cases participants
could win cash, in other cases they could win vouchers.
The company built a model to predict survey participation with prize, age and income
as predictors. A value of 1 for the prize indicator means cash was offered; a value of 0
means vouchers were offered.
The company finds that the prize indicator coefficient is 2.6. What does this mean?
a. The intercept is 2.6 units higher when vouchers are offered rather than cash.
b. The intercept is 2.6 units higher when cash is offered rather than vouchers.
c. The slope is 2.6 units higher when vouchers are offered rather than cash.
d. The slope is 2.6 units higher when cash is offered rather than vouchers.
Question 17
The company decides to look at three other prizes in addition to cash and vouchers: a car,
a holiday and a computer. How many dummy variables will the company use in their model?
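Dummy coding for a categorical predictor with several levels can be illustrated with a small sketch. It assumes the common reference-category scheme, in which one level serves as the baseline and every other level gets its own 0/1 indicator. Choosing vouchers as the baseline is a hypothetical choice made here to match the coding in Question 16; the helper `encode` is illustrative.

```python
prizes = ["cash", "vouchers", "car", "holiday", "computer"]
reference = "vouchers"  # hypothetical baseline category

# One indicator per category, except the reference category.
dummy_names = [prize for prize in prizes if prize != reference]

def encode(prize):
    """Return the 0/1 indicator values for one observation."""
    return {name: int(prize == name) for name in dummy_names}

print(len(dummy_names))
print(encode("car"))
print(encode("vouchers"))  # the baseline is coded as all zeros
```

The baseline category needs no indicator of its own because it is identified by all indicators being zero.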
Question 18
A politician is interested in voting turnout. She runs a logistic regression model where 1 = turn up to vote, 0 = don't turn up to vote, and finds that Conscientiousness as a personality trait was a significant predictor of voting. The odds coefficient for Conscientiousness was 3.210.
Which of the following is true?
a. The probability of voting increases by 3.210 when Conscientiousness increases by 1.
b. The probability of voting increases by 1 when Conscientiousness increases by 3.210.
c. The odds of voting will change by a multiplicative factor of 3.210 when Conscientiousness
increases by 1.
d. The odds of voting will change by a multiplicative factor of 1 when Conscientiousness increases by 3.210.
Question 19
Based on the politician's model expressed in terms of log-odds, with an intercept of 0.400 and
a regression coefficient of 1.166, what is the probability that a person with a
conscientiousness score of 0.5 will turn up to vote?
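Converting a log-odds prediction into a probability takes two steps, sketched below with the numbers from Question 19. The sketch assumes a single-predictor model of the form log-odds = intercept + coefficient * score; the variable names are illustrative.

```python
import math

intercept = 0.400
coefficient = 1.166
conscientiousness = 0.5

# Linear predictor on the log-odds scale.
log_odds = intercept + coefficient * conscientiousness

# Exponentiate to get odds, then convert odds to a probability.
odds = math.exp(log_odds)
probability = odds / (1 + odds)

print(round(log_odds, 3), round(probability, 3))
```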
Question 20
The table below shows the predicted and observed values for 20 voters using the politician's
logistic regression model.
| Observed     | Vote | Did_Not_Vote |
| Vote         | 8    | 1            |
| Did_Not_Vote | 2    | 9            |
Calculate the sensitivity.
Question 21
For the data shown in the table above, calculate the specificity.
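Sensitivity and specificity for Questions 20 and 21 can be computed from the four cell counts. The sketch below rests on an assumption about the table layout that the question does not state explicitly: rows are taken to be the model's predictions and columns the observed outcomes. If the table is oriented the other way, the two off-diagonal counts swap roles.

```python
# Assumption: rows are predictions, columns are observed outcomes.
true_positives = 8   # predicted Vote, observed Vote
false_positives = 1  # predicted Vote, observed Did_Not_Vote
false_negatives = 2  # predicted Did_Not_Vote, observed Vote
true_negatives = 9   # predicted Did_Not_Vote, observed Did_Not_Vote

# Sensitivity: share of actual voters that the model correctly predicts.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: share of actual non-voters that the model correctly predicts.
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)
```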