Question 1

You want to develop a scary monster to star in a horror film. You are pretty sure that blood
and teeth make the monster more scary, whereas cuteness makes the monster less scary. 
You want to put this to the test with a linear regression model. 
Which of the following formulas would be appropriate?

a.  Scariness = a + 1.6 * (blood) - 2.8 * (teeth) + 3.2 * (cuteness)
b.  Scariness = a + 1.6 * (blood) - 2.8 * (teeth) - 3.2 * (cuteness)
c.  Scariness = a -  1.6 * (blood) + (teeth) + 3.2 * (cuteness)
d.  Scariness = a + 1.6 * (blood) + 2.8 * (teeth) - 3.2 * (cuteness)

Question 2

Your friend says that your scary monster model is useless, and the only thing that makes 
a difference to scariness is whether the monster has a weapon.
What could you show your friend to prove that your model is useful?

a.  Compare the slope values for each predictor.
b.  The model's R-squared.
c.  The model's intercept.
d.  The correlation between scariness and each predictor.

Question 3

Calculate the Total Sum of Squares for the table below.

Predicted_Scariness Observed_Scariness
      2.88            3.8
      3.22            2
      3.56            4
      3.90            3
      4.24            5

Mean = 3.56         Mean = 3.56

Question 4

Calculate the Regression Sum of Squares for the table above.

Question 5

Based on the regression sum of squares and total sum of squares you calculated in the previous 
two questions. Calculate the R-squared.

Question 6

Based on the Total Sum of Squares and Regression Sum of Squares, what can we say about our 
model of monster scariness?
a. Blood, teeth and cuteness cannot account for any of the variation in monster scariness.
b. Blood, teeth and cuteness can account for less than half of the variation in monster scariness.
c. Blood, teeth and cuteness can account for half of the variation in monster scariness.
d. Blood, teeth and cuteness can account for more than half of the variation in monster scariness.
e. Blood, teeth and cuteness can account for all of the variation in monster scariness.

Question 7

Sometimes it's scary to ask people out on dates, and sometimes it's easier. A dating researcher
decides to try to build a model to predict how likely a person is to ask someone on a date 
based on the following predictors: level of attraction, amount of loneliness, desperation, 
fear of rejection. How many parameters are in the model?

Question 8

After 20 observations, the model predicting how likely a person is to ask someone on a date based on level of attraction, amount of loneliness, desperation and fear of rejection has an error sum 
of squares of 10.6 and a total sum of squares of 26.2.
What is the F-test statistic?

Question 9

Your null hypothesis was that the regression coefficients for level of attraction, amount of 
loneliness, desperation and fear of rejection are all Zeros.
What is the threshold value above which the F-statistic must lie in order to reject the null 
hypothesis at the 0.05 level? Use the F-table and round the value to three decimal places.

Question 10

A TV company is interested in the people watching their period drama show "Downtown Castle" that 
is set in the early 1900s. They found an overall F-statistic suggesting that together, age and 
hours of free time significantly predicted the number of episodes of Downtown Castle that 
people watched in a sample of 30. The slope coefficient for age was 4.5, with a standard 
error of 2.5.  What is the t-value of the predictor age?

Question 11

Based on the Downtown Castle model above, calculate the upper boundary of the 95% confidence 
interval for the age slope coefficient. 
You can select the critical t-value from the table.

Question 12

Based on the Downtown Castle model above, calculate the lower boundary of the 95% confidence
interval for the age slope coefficient.

Question 13

Based on the confidence interval calculated in the previous two questions, is age significantly
related to number of episodes watched, when controlling for hours of free time?

Question 14

Suppose both age and amount of free time positively predict the number of episodes of Downtown 
Castle that people watched. The TV company looked at viewers of Downtown Castle between the ages
of 20 and 50. Suppose they looked at a full range of ages in the population, and found that 
people beyond this range of 20 to 50 watch much less Downtown Castle with much less variation in 
their scores. Which assumptions will this new data set likely violate? Select all that apply
a.  Independence
b.  Normality
c.  Linearity
d.  Homoscedasticity
e.  Sufficient observations
f.  Absence of outliers

Question 15

Both age and amount of free time positively predict the number of episodes of Downtown Castle that people watched. 
However, later on the TV company found out that one of their participants who were both high on 
age and free time did not have access to a TV! Which standardized residual values would you most 
expect this participant to show? [Hint -3 < x < 3] 

Question 16

A company desperately wants people to complete a survey, and decide to offer prizes to 
incentivize participation. They have data from past surveys. In some cases participants 
could win cash, in other cases they could win vouchers. 
The company built a model to predict survey participation with prize, age and income 
as predictors. A value of 1 for the indicator prize indicates cash, a value of Zero 
indicates vouchers were offered.
The company finds that the prize indicator coefficient is 2.6. What does this mean?
a. The intercept is 2.6 units higher when vouchers are offered rather than cash.
b. The intercept is 2.6 units higher when cash is offered rather than vouchers.
c. The slope is 2.6 units higher when vouchers are offered rather than cash.
d. The slope is 2.6 units higher when cash is offered rather than vouchers.

Question 17

The company decides to look at three other prizes in addition to cash and vouchers: a car, 
a holiday and a computer. How many dummy variables will the company use in their model?

Question 18

A politician is interested in voting turnout. She runs a logistic regression model where 1 = turn up to vote, 0 = don't turn up to vote, and finds that Conscientiousness as a personality trait was a significant predictor of voting. The odds coefficient for Conscientiousness was 3.210.
Which of the following is true?

a. The probability of voting increases by 3.210 when Conscientiousness increases by 1.

b. The probability of voting increases by 1 when Conscientiousness increases by 3.210.

c. The odds of voting will change by a multiplicative factor of 3.210 when Conscientiousness 
increases by 1.

d. The odds of voting will change by a multiplicative factor of 1 when Conscientiousness increases by 3.210.

Question 19

Based on the politician's model expressed in terms of log-odds, with an intercept of 0.400 and 
a regression coefficient of 1.166, what is the probability that a person with a 
conscientiousness score of 0.5 will turn up to vote?

Question 20

The table below shows the predicted and observed values for 20 voters using the politician's 
logistic regression model.
Predicted
Observed Vote Did_Not_Vote
Vote 8 1
Did_Not_Vote 2 9
Calculate the sensitivity.    

Question 21

For the data shown in above table, Calculate the specificity.