Quiz 1 Regression Models

📚 Specialization: Data Science: Statistics and Machine Learning Specialization
📖 Course: Regression Models
- 🧑‍🏫 Instructor: Brian Caffo
📆 Week 1
- 🚦 Start: Tuesday, 05 July 2022
- 🏁 Finish: Monday, 18 July 2022
📦 GitHub Repository: Static Document

Question 1

Consider the data set given below

x <- c(0.18, -1.54, 0.42, 0.95)

And weights given by

w <- c(2, 1, 3, 1)

Give the value of \(\mu\) that minimizes the least squares equation

\[\sum_{i = 1}^{n}{w_i \cdot (x_i - \mu)^2}\]

0.300
0.1471
1.077
0.0025

Answer

The value of \(\mu\) that minimizes this weighted sum of squares is the weighted mean of x, so I will calculate the weighted average.

\[\mu = \frac{\sum{x_i \cdot w_i}}{\sum{w_i}}\]
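A short derivation of why the weighted mean is the minimizer (a sketch, assuming all weights are positive so that the objective is convex in \(\mu\)): setting the derivative with respect to \(\mu\) to zero gives

\[\frac{d}{d\mu}\sum_{i = 1}^{n}{w_i \cdot (x_i - \mu)^2} = -2 \sum_{i = 1}^{n}{w_i \cdot (x_i - \mu)} = 0 \quad \Rightarrow \quad \mu = \frac{\sum{x_i \cdot w_i}}{\sum{w_i}}\]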

# Calculating the weighted average.
sum(x * w)/sum(w)

## [1] 0.1471429
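
As a numerical cross-check (a sketch, not part of the original quiz answer), the objective can also be minimized directly with `optimize()` and compared with `weighted.mean()`. The helper name `wls_loss` is hypothetical and introduced only for this illustration.

```r
# Data and weights from Question 1.
x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)

# Weighted least squares objective as a function of mu (hypothetical helper).
wls_loss <- function(mu) sum(w * (x - mu)^2)

# Minimizing numerically over the range of the data.
opt <- optimize(wls_loss, interval = range(x))

# The numerical minimizer and the closed-form weighted mean
# should agree up to the optimizer's tolerance.
mu_numeric <- opt$minimum
mu_closed <- weighted.mean(x, w)
```

Both `mu_numeric` and `mu_closed` should be close to 0.1471, the value obtained above.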

Question 2

Consider the following data set:

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

Fit the regression through the origin and get the slope treating y as the outcome and x as the regressor.

(Hint, do not center the data since we want regression through the origin, not through the means of the data.)

0.59915
-0.04462
0.8263
-1.713

Answer

With the lm() function, the intercept must be removed from the formula so that the fitted line passes through the origin.

# Creating a data frame.
df_q2 <- data.frame(x, y)

# Fitting a linear regression model without an intercept (using -1).
fit_q2 <- lm(data = df_q2, formula = y ~ x - 1)

# Printing the coefficients
summary(fit_q2)$coeff

##    Estimate Std. Error t value Pr(>|t|)
## x 0.8262517  0.5816544 1.42052  0.18916
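
For regression through the origin, the least squares slope has the closed form \(\sum{x_i y_i} / \sum{x_i^2}\). The following sketch checks that this matches the `lm()` estimate; the variable names `slope_origin` and `slope_lm` are illustrative only.

```r
# Data from Question 2.
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Closed-form slope for a regression through the origin.
slope_origin <- sum(x * y) / sum(x^2)

# Slope estimated by lm() without an intercept, for comparison.
slope_lm <- unname(coef(lm(y ~ x - 1)))
```

Both values should be about 0.8263, in agreement with the coefficient table above.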

Question 3

Do data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and weight as the predictor. Give the slope coefficient.

30.2851
0.5591
-9.559
-5.344

Answer

The model is quite simple:

# Fitting a model using mpg and weight.
fit_q3 <- lm(data = mtcars, formula = mpg ~ wt)

# Printing the coefficients
summary(fit_q3)$coeff

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
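
The same slope can be recovered without `lm()`, using the covariance and correlation formulas that appear in Question 4. This is a minimal sketch; `slope_cov` and `slope_cor` are names chosen here for illustration.

```r
# Loading the data set.
data(mtcars)

# Slope as Cov(Y, X) / Var(X).
slope_cov <- cov(mtcars$mpg, mtcars$wt) / var(mtcars$wt)

# Slope as Cor(Y, X) * sd(Y) / sd(X).
slope_cor <- cor(mtcars$mpg, mtcars$wt) * sd(mtcars$mpg) / sd(mtcars$wt)
```

Both expressions should reproduce the `wt` estimate of about -5.344 from the table above.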

Question 4

Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one half that of the outcome. The correlation between the two variables is 0.5. What would the slope coefficient be for the regression model with \(Y\) as the outcome and \(X\) as the predictor?

3
1
4
0.25

Answer

The slope \(\beta_1\) is given by formula (1):

\[\beta_1 = \frac{Cov(Y,X)}{(sd(X))^2}\]

The correlation is given by formula (2):

\[Cor(Y,X) = \frac{Cov(Y,X)}{sd(X) \cdot sd(Y)}\]

Substituting (2) into (1):

\[\beta_1 = Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}\]

Since sd(X) is one half of sd(Y), the ratio sd(Y)/sd(X) equals 2.

# Creating the variables
cor_x_y <- 0.5
sdy_sdx <- 2

# Calculating the beta_1
beta_1 <- cor_x_y * sdy_sdx

# Printing beta_1
beta_1

## [1] 1

Question 5

Students were given two hard tests and scores were normalized to have empirical mean 0 and variance 1. The correlation between the scores on the two tests was 0.4. What would be the expected score on Quiz 2 for a student who had a normalized score of 1.5 on Quiz 1?

0.16
0.4
0.6
1.0

Answer

Because both scores are normalized to have standard deviation 1, the slope equals the correlation, 0.4. Because both means are zero, the intercept is zero.

\[y = \underbrace{\beta_0}_{\text{Should be zero}} + \underbrace{\beta_1}_{\text{Should be equal to Cor(X,Y)}} \cdot x\]

# Creating the regression as a function.
q5 <- function(x) {

    beta_0 <- 0
    beta_1 <- 0.4

    return(beta_0 + beta_1 * x)
}

# Calculating the predicted value at x = 1.5.
q5(1.5)

## [1] 0.6

Question 6

Consider the data given by the following

x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?

8.86
-0.9719
8.58
9.31

Answer

# Normalizing the x vector.
x_norm <- (x - mean(x))/sd(x)

# Printing the first element
x_norm[1]

## [1] -0.9718658
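
The base R function `scale()` performs the same centering and scaling. The sketch below uses it as a cross-check and confirms that the result has mean 0 and standard deviation 1; `x_scaled` is an illustrative name.

```r
# Data from Question 6.
x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

# scale() centers by the mean and divides by the sample standard deviation;
# as.vector() drops the matrix attributes it returns.
x_scaled <- as.vector(scale(x))

# Checking the normalization: mean close to 0 and standard deviation 1.
mean_scaled <- mean(x_scaled)
sd_scaled <- sd(x_scaled)
```

The first element of `x_scaled` should match the value of about -0.9719 printed above.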

Question 7

Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

2.105
1.567
1.252
-1.713

Answer

# Creating a data frame.
df_q7 <- data.frame(x,y)


# Fitting a model with intercept
fit_q7 <- lm(data = df_q7, formula = y ~ x)

# Printing the coefficients
summary(fit_q7)$coeff

##              Estimate Std. Error    t value  Pr(>|t|)
## (Intercept)  1.567461   1.252107  1.2518582 0.2459827
## x           -1.712846   2.105259 -0.8136034 0.4394144
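
The intercept can also be computed by hand from the slope formula of Question 4 together with the fact that the fitted line passes through the point of means. A minimal sketch, with variable names chosen for illustration:

```r
# Data from Question 7.
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Slope: Cor(Y, X) * sd(Y) / sd(X).
beta_1 <- cor(y, x) * sd(y) / sd(x)

# Intercept: the fitted line passes through (mean(x), mean(y)).
beta_0 <- mean(y) - beta_1 * mean(x)
```

Here `beta_0` should be about 1.567 and `beta_1` about -1.713, matching the `lm()` output above.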

Question 8

You know that both the predictor and response have mean 0. What can be said about the intercept when you fit a linear regression?

Nothing about the intercept can be said from the information given.
It must be identically 0.
It is undefined as you have to divide by zero.
It must be exactly one.

Answer

# Using the same values of Question 7
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Centering the data to have mean zero.
x_c <- x - mean(x)
y_c <- y - mean(y)

# Creating a data frame
df_q8 <- data.frame(x_c, y_c)

# Fitting a model
fit_q8 <- lm(data = df_q8, formula = y_c ~ x_c)

# Printing the coefficients
summary(fit_q8)$coeff

##                  Estimate Std. Error       t value  Pr(>|t|)
## (Intercept)  1.002843e-16  0.3355297  2.988834e-16 1.0000000
## x_c         -1.712846e+00  2.1052592 -8.136034e-01 0.4394144
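
Why the intercept vanishes can be sketched from the least squares estimate of the intercept, writing \(\bar{X}\) and \(\bar{Y}\) for the sample means (notation assumed here):

\[\beta_0 = \bar{Y} - \beta_1 \cdot \bar{X} = 0 - \beta_1 \cdot 0 = 0\]

So the intercept must be identically 0. The estimate of about 1e-16 in the output above is floating-point rounding error, not a genuinely nonzero intercept.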

Question 9

Consider the data given by

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)

What value minimizes the sum of the squared distances between these points and itself?

0.36
0.8
0.573
0.44

Answer

The answer is similar to Question 1, except that these data are unweighted. The value that minimizes the sum of squared distances is therefore the mean of x.

# Calculating the average.
mean(x)

## [1] 0.573

Question 10

Let the slope having fit Y as the outcome and X as the predictor be denoted as \(\beta_1\). Let the slope from fitting X as the outcome and Y as the predictor be denoted as \(\gamma_1\). Suppose that you divide \(\beta_1\) by \(\gamma_1\); in other words consider \(\frac{\beta_1}{\gamma_1}\). What is this ratio always equal to?

\(Var(Y)/Var(X)\)
\(2 \cdot sd(X)/sd(Y)\)
1
\(Cor(Y,X)\)

Answer

The slope with Y as the outcome and X as the predictor is:

\[\beta_1 = Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}\]

Swapping the roles of X and Y gives the slope with X as the outcome and Y as the predictor:

\[\gamma_1 = Cor(Y,X) \cdot \frac{sd(X)}{sd(Y)}\]

Calculating \(\frac{\beta_1}{\gamma_1}\):

\[\frac{\beta_1}{\gamma_1} = \frac{Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}}{Cor(Y,X) \cdot \frac{sd(X)}{sd(Y)}}=\Big( \frac{sd(Y)}{sd(X)} \Big)^2 = \frac{Var(Y)}{Var(X)}\]
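
The identity can be checked numerically. The sketch below reuses the data from Question 7 as an arbitrary example, fits both regressions, and compares the ratio of slopes with the ratio of variances; the variable names are illustrative.

```r
# Data from Question 7, used here only as an example.
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Slope with y as the outcome and x as the predictor.
beta_1 <- unname(coef(lm(y ~ x))[2])

# Slope with x as the outcome and y as the predictor.
gamma_1 <- unname(coef(lm(x ~ y))[2])

# The ratio of slopes should equal the ratio of variances.
slope_ratio <- beta_1 / gamma_1
var_ratio <- var(y) / var(x)
```

Up to floating-point error, `slope_ratio` and `var_ratio` should coincide.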

👨🏻‍💻 Anderson H Uyekita
