Quiz 1A: Simple Linear Regression

####################
# READING IN DATA
####################

dog_data <- read.csv("dog_lifespan (1).csv")

Correlation Analysis

# Test H0: the population correlation between height and lifespan is zero
cor.test(dog_data$height, dog_data$lifespan)

    Pearson's product-moment correlation

data:  dog_data$height and dog_data$lifespan
t = -5.0306, df = 28, p-value = 2.551e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8406159 -0.4372929
sample estimates:
       cor 
-0.6890129 

Question 1

The degrees of freedom for the test of significance of the correlation coefficient is given by n-1.

  • True

  • False

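False. It is given by \(n-2\), where \(n\) is the number of observations. The correlation test is equivalent to the \(t\) test on the slope of a simple linear regression, which estimates two parameters (the intercept and the slope). Here \(n = 30\), which gives the \(28\) degrees of freedom shown in the output.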

Question 2

The value of the test statistic for the test of significance of the correlation coefficient is:

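\(-5.03\)

This is the \(t\) statistic reported by cor.test(). The value \(-0.69\) is the sample correlation coefficient itself, not the test statistic.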
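The test statistic reported by cor.test() can be recovered from the sample correlation with \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). The sketch below reuses the printed value of \(r\) and assumes \(n = 30\), which is inferred from the \(28\) degrees of freedom rather than printed; it should reproduce \(t \approx -5.03\) and the reported \(p\)-value up to rounding.

```r
# Values copied from the cor.test() output (n is inferred, not printed)
r   <- -0.6890129   # sample correlation coefficient
n   <- 30           # assumed sample size: df = 28 = n - 2
dof <- n - 2        # degrees of freedom for the correlation test

# Test statistic for H0: population correlation = 0
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = dof)

c(t = t_stat, p = p_value)   # t is about -5.03
```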

Question 3

The value of the upper bound of the confidence interval for the correlation coefficient is:

\(-0.44\)
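According to the R documentation for cor.test(), the interval for Pearson's correlation is an asymptotic one based on Fisher's \(z\) transform. A minimal sketch of that calculation, using the printed \(r\) and again assuming \(n = 30\) (inferred from the degrees of freedom), is:

```r
# Values copied from the cor.test() output (n inferred from df = 28)
r <- -0.6890129
n <- 30

z      <- atanh(r)          # Fisher z transform of r
se_z   <- 1 / sqrt(n - 3)   # approximate standard error on the z scale
z_crit <- qnorm(0.975)      # two-sided 95% normal critical value

# Interval on the z scale, back-transformed to the correlation scale
ci <- tanh(z + c(-1, 1) * z_crit * se_z)
ci
```

Because the interval is built on the \(z\) scale and then back-transformed, it is not symmetric about \(r\).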

Simple Linear Regression

# Fit lifespan as a linear function of height
model <- lm(lifespan ~ height, data = dog_data)
summary(model)

Call:
lm(formula = lifespan ~ height, data = dog_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6864 -0.6530  0.2211  0.9356  2.2936 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  15.0046     0.6052  24.794  < 2e-16 ***
height       -4.9375     0.9815  -5.031 2.55e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.302 on 28 degrees of freedom
Multiple R-squared:  0.4747,    Adjusted R-squared:  0.456 
F-statistic: 25.31 on 1 and 28 DF,  p-value: 2.551e-05

Question 4

The value of the test statistic for the test of overall model significance is:

\(25.31\)
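With a single predictor, the overall \(F\) statistic is the square of the \(t\) statistic for the slope, which is why the correlation test, the slope test and the \(F\) test all report a \(p\)-value of about \(2.55 \times 10^{-5}\) here. A small check using the printed \(t\) value:

```r
# t statistic for the slope, as printed by cor.test() and summary(model)
t_slope <- -5.0306

# With one predictor, the overall F statistic equals t^2
f_stat <- t_slope^2

# Upper-tail p-value from the F distribution on 1 and 28 degrees of freedom
f_p <- pf(f_stat, df1 = 1, df2 = 28, lower.tail = FALSE)

c(F = f_stat, p = f_p)   # F is about 25.31
```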

Question 5

The upper bound of the 95% confidence interval for the intercept parameter is:

# 95% confidence intervals for the intercept and slope
confint(model)
                2.5 %   97.5 %
(Intercept) 13.764924 16.24423
height      -6.948035 -2.92701

\(16.24\)

Question 6

The lower bound of the 95% confidence interval for the slope parameter is:

\(-6.95\)
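The bounds from confint() are each estimate plus or minus the \(97.5\%\) quantile of the \(t\) distribution on the residual degrees of freedom (\(28\)) times its standard error. The sketch below rebuilds them from the rounded values in summary(model), so the results should agree with confint() to about three decimal places.

```r
# Estimates and standard errors copied (rounded) from summary(model)
est <- c(intercept = 15.0046, height = -4.9375)
se  <- c(intercept = 0.6052,  height = 0.9815)
df_resid <- 28                      # residual degrees of freedom, n - 2

t_crit <- qt(0.975, df = df_resid)  # two-sided 95% critical value, about 2.05

lower <- est - t_crit * se
upper <- est + t_crit * se
cbind(lower, upper)
```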

Question 7

The proportion of the variation in lifespan that is explained by a dog’s height is:

\(0.47\)
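In simple linear regression, the multiple R-squared equals the square of the sample correlation. A short check using the printed \(r\), with \(n = 30\) (inferred from the output) used only for the adjusted value:

```r
r <- -0.6890129   # sample correlation from cor.test()
n <- 30           # inferred from 28 residual degrees of freedom
k <- 1            # number of predictors

r_squared     <- r^2                                          # about 0.4747
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # about 0.456
```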

Question 8

The estimated value of the slope coefficient is:

\(-4.94\)

Question 9

The estimated value of the intercept coefficient is:

\(15.00\)

Multiple Choice Questions

Question 10

Which is the correct interpretation of the slope estimate from the regression model fitted in the simple linear regression section?

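  (a) On average, there is a 15 unit increase in lifespan when height decreases by 4.94m.
  (b) On average, a dog’s lifespan is 15 years when its height is 0m.
  (c) On average, there is a 1 unit decrease in lifespan when height decreases by 4.94m.
  (d) On average, there is a 4.94 year decrease in lifespan when height increases by 1m.
  (e) We cannot interpret the slope coefficient as it is not statistically significant.

(d). The slope estimate is \(-4.94\), so each 1 m increase in height is associated with an average decrease of 4.94 years in lifespan. Option (b) interprets the intercept rather than the slope, and (e) is wrong because the slope is significant (\(p = 2.55 \times 10^{-5}\)).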
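To see why (d) is the slope interpretation, the sketch below writes the fitted line as a small function; predict_lifespan is a hypothetical helper, not part of the original analysis. It assumes height is measured in metres and lifespan in years, as the answer options suggest. The change in predicted lifespan for a 1 m increase equals the slope whatever the starting height, and a 0.1 m increase changes the prediction by one tenth of the slope.

```r
b0 <- 15.0046   # intercept estimate from summary(model)
b1 <- -4.9375   # slope estimate (years per metre, assumed units)

# Hypothetical helper: fitted mean lifespan for a given height in metres
predict_lifespan <- function(height) b0 + b1 * height

# Change in predicted lifespan for a 1 m increase, from two starting heights
diff_a <- predict_lifespan(1.4) - predict_lifespan(0.4)
diff_b <- predict_lifespan(1.0) - predict_lifespan(0.0)

# Change for a 0.1 m (10 cm) increase
diff_10cm <- predict_lifespan(0.5) - predict_lifespan(0.4)
```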

Question 11

A significant slope coefficient in a simple linear regression model means that: (you can select more than one correct answer)

  (a) The standard error of the slope is small relative to the size of the estimate.
  (b) The model is appropriate for the data.
  (c) A change in the dependent variable causes a change in the independent variable.
  (d) The independent and dependent variables are linearly related.
  (e) A change in the independent variable causes a change in the dependent variable.

(a) and (d) are true; (b), (c) and (e) are false.

(c) and (e) are false because statistical significance does not establish causation in either direction. A significant slope tells us that the average value of the dependent variable changes with the independent variable, which lets us describe and predict that association, but a causal claim needs a suitable study design (for example, a randomised experiment) rather than a small \(p\)-value. Option (c) also reverses the roles of the dependent and independent variables.

(b) is false, since slope significance does not tell us whether the model is appropriate for the data. Appropriateness is judged by checking the model assumptions (linearity, independence, constant variance and normality of the residuals), for example with residual plots. The significance of the \(F\) statistic does not answer this question either: in simple linear regression it tests the same hypothesis as the \(t\) test on the slope.

(d) is true. A significant slope tells us that there is a statistically significant linear relationship between the two variables.

For (a), the \(t\) statistic for the slope is the ratio of the estimate to its standard error:

\[t=\frac{\hat{\beta}_{1}}{se(\hat{\beta}_{1})}=\frac{-4.9375}{0.9815}\approx-5.03\]

If the estimate and its standard error were about the same size, this ratio would be close to \(\pm 1\), which is far from significant. Here the slope estimate is about \(5\) times larger than its standard error. Every significant slope has a ratio that is large in absolute value, though not necessarily \(5\): with \(28\) degrees of freedom, significance at the \(5\%\) level requires \(|t| > 2.05\), so the estimate must be more than about twice its standard error. So (a) is also true.
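The threshold for (a) can be checked with qt(). The sketch compares the slope's \(t\) ratio with the two-sided \(5\%\) critical value on \(28\) degrees of freedom, and shows that a ratio of \(1\) (estimate about the same size as its standard error) would be far from significant. All inputs are copied from summary(model).

```r
est      <- -4.9375   # slope estimate from summary(model)
se       <- 0.9815    # its standard error
df_resid <- 28        # residual degrees of freedom

t_ratio <- est / se                   # about -5.03
t_crit  <- qt(0.975, df = df_resid)   # about 2.05

significant <- abs(t_ratio) > t_crit  # slope significant at the 5% level?

# Two-sided p-value if the estimate were the same size as its standard error
p_if_ratio_one <- 2 * pt(-1, df = df_resid)
```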

Question 12

Which of the following statements about linear regression are true? (you can select more than one correct answer)

  1. The regression model is based on the formula for a straight line.
  2. We can use the independent variable to predict the value of the dependent variable.
  3. Hypothesis testing is done on the estimated coefficients from regression models so that we can make inference about the population coefficients.
  4. In a linear regression analysis, the dependent variable can be continuous or categorical.
  5. The coefficients in the model output are the population parameters.

Statements 1 to 3 are true and the last two are false. In linear regression the dependent variable has to be continuous, and the coefficients in the model output are sample estimates of the population parameters, not the population parameters themselves.

Question 13

In which scenarios would we conduct a simple linear regression analysis instead of a correlation analysis?

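When we want to quantify how the dependent variable changes, on average, as the independent variable changes (the slope), or when we want to predict the dependent variable from the independent variable. A correlation analysis only measures the strength and direction of the linear relationship and treats the two variables symmetrically.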

Question 14

What conclusion would you make for the hypothesis test on the correlation coefficient using a significance level of 1%?

We reject the null hypothesis, since the \(p\)-value (\(2.551 \times 10^{-5}\)) is less than the chosen significance level of \(0.01\). There is significant evidence that the population correlation coefficient is non-zero, so there is a significant (negative) linear relationship between dog height and lifespan.
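The decision rule can be written out in a few lines. The sketch compares the printed \(p\)-value with \(\alpha = 0.01\) and, equivalently, compares \(|t|\) with the two-sided \(1\%\) critical value from the \(t\) distribution on \(28\) degrees of freedom; both inputs are copied from the cor.test() output.

```r
p_value <- 2.551e-05   # p-value from cor.test()
alpha   <- 0.01        # 1% significance level
reject_by_p <- p_value < alpha

# Equivalent critical-value check: |t| against the 99.5% t quantile on 28 df
t_stat <- -5.0306                     # t statistic from cor.test()
t_crit <- qt(1 - alpha / 2, df = 28)  # about 2.76
reject_by_t <- abs(t_stat) > t_crit
```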