Quiz 1: Regression Models
Question 1
Consider the data set given below:
x <- c(0.18, -1.54, 0.42, 0.95)
And weights given by
w <- c(2, 1, 3, 1)
Give the value of \(\mu\) that minimizes the least squares equation
\[\sum_{i = 1}^{n}{w_i \cdot (x_i - \mu)^2}\]
Answer
The \(\mu\) that minimizes this function is the weighted mean. Setting the derivative with respect to \(\mu\) to zero, \(-2\sum{w_i \cdot (x_i - \mu)} = 0\), and solving for \(\mu\) gives the weighted average:
\[\mu = \frac{\sum{x_i \cdot w_i}}{\sum{w_i}}\]
# Calculating the weighted average.
sum(x * w)/sum(w)
## [1] 0.1471429
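As a sanity check, the weighted least squares criterion can also be minimized numerically and compared with the closed-form weighted mean. This sketch uses base R's `optimize()` and `weighted.mean()`, which are not part of the original answer; the search interval (the range of the data) and the tolerance are my own choices.

```r
# Data and weights from the question
x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)

# Weighted least squares criterion as a function of mu
wls <- function(mu) sum(w * (x - mu)^2)

# Numerical minimizer, searching over the range of the data
mu_opt <- optimize(wls, interval = range(x), tol = 1e-10)$minimum

# Closed-form answer: the weighted mean
mu_closed <- weighted.mean(x, w)

c(numerical = mu_opt, closed_form = mu_closed)
```

Both values agree, which is what the derivative argument predicts for a quadratic criterion.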
Question 2
Consider the following data set:
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
Fit the regression through the origin and get the slope treating y as the outcome and x as the regressor.
(Hint: do not center the data since we want regression through the origin, not through the means of the data.)
Answer
For regression through the origin, the intercept must be removed from the lm() formula, which is done by adding -1 (or, equivalently, + 0) to the right-hand side.
# Creating a data frame.
df_q2 <- data.frame(x, y)
# Fitting a linear regression model without intercept (using -1).
fit_q2 <- lm(data = df_q2, formula = y ~ x - 1)
# Printing the coefficients
summary(fit_q2)$coeff
## Estimate Std. Error t value Pr(>|t|)
## x 0.8262517 0.5816544 1.42052 0.18916
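Without an intercept, minimizing \(\sum{(y_i - \beta x_i)^2}\) gives the closed form \(\beta = \sum{x_i y_i} / \sum{x_i^2}\), using the raw (uncentered) data. A minimal sketch comparing that formula with `lm()`, here written with `+ 0` instead of `- 1`:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Closed-form slope for regression through the origin (no centering)
beta_origin <- sum(x * y) / sum(x^2)

# Same model fitted with lm(); "+ 0" drops the intercept like "- 1"
beta_lm <- unname(coef(lm(y ~ x + 0)))

c(closed_form = beta_origin, lm = beta_lm)
```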
Question 3
Run data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and weight (wt) as the predictor. Give the slope coefficient.
Answer
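This is a simple linear regression of mpg on wt; the slope is the coefficient of wt. Since wt is measured in units of 1000 lb, each additional 1000 lb of weight is associated with about 5.34 fewer miles per gallon.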
# Fitting a model using mpg and weight.
fit_q3 <- lm(data = mtcars, formula = mpg ~ wt)
# Printing the coefficients
summary(fit_q3)$coeff
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.285126 1.877627 19.857575 8.241799e-19
## wt -5.344472 0.559101 -9.559044 1.293959e-10
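The same coefficients can be recovered without lm() from the closed-form least squares estimates, \(\beta_1 = Cor(Y,X) \cdot sd(Y)/sd(X)\) and \(\beta_0 = \bar{Y} - \beta_1 \bar{X}\). A short sketch on the mtcars columns:

```r
data(mtcars)  # from the datasets package

# Closed-form slope and intercept for mpg ~ wt
b1 <- with(mtcars, cor(mpg, wt) * sd(mpg) / sd(wt))
b0 <- with(mtcars, mean(mpg) - b1 * mean(wt))

c(intercept = b0, slope = b1)
```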
Question 4
Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one half that of the outcome. The correlation between the two variables is 0.5. What value would the slope coefficient for the regression model with \(Y\) as the outcome and \(X\) as the predictor be?
Answer
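The slope of the least squares regression of \(Y\) on \(X\) is given by formula (1):
\[\beta_1 = \frac{Cov(Y,X)}{(sd(X))^2} \qquad (1)\]
The correlation is given by formula (2):
\[Cor(Y,X) = \frac{Cov(Y,X)}{sd(X) \cdot sd(Y)} \qquad (2)\]
Solving (2) for \(Cov(Y,X)\) and substituting into (1):
\[\beta_1 = Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}\]
Here \(sd(X) = sd(Y)/2\), so \(sd(Y)/sd(X) = 2\) and \(\beta_1 = 0.5 \cdot 2 = 1\).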
# Correlation between X and Y
cor_x_y <- 0.5
# Ratio sd(Y)/sd(X): sd(X) is half of sd(Y), so the ratio is 2
sdy_sdx <- 2
# Calculating the beta_1
beta_1 <- cor_x_y * sdy_sdx
# Printing beta_1
beta_1
## [1] 1
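The identity \(\beta_1 = Cor(Y,X) \cdot sd(Y)/sd(X)\) holds for any data set, not only this one. A quick check on simulated data; the seed, sample size, and generating model below are arbitrary, hypothetical choices used only for illustration:

```r
set.seed(42)

# Hypothetical simulated data (not from the quiz)
x <- rnorm(100)
y <- 3 + 0.5 * x + rnorm(100)

# Slope from lm() versus the correlation-based formula
slope_lm <- unname(coef(lm(y ~ x))[2])
slope_formula <- cor(x, y) * sd(y) / sd(x)

c(lm = slope_lm, formula = slope_formula)
```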
Question 5
Students were given two hard tests and scores were normalized to have empirical mean 0 and variance 1. The correlation between the scores on the two tests was 0.4. What would be the expected score on Quiz 2 for a student who had a normalized score of 1.5 on Quiz 1?
Answer
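Because both scores are normalized, \(sd(Y)/sd(X) = 1\), so the slope \(\beta_1 = Cor(Y,X) \cdot sd(Y)/sd(X)\) equals the correlation, 0.4. Because both means are zero, the intercept \(\beta_0 = \bar{Y} - \beta_1 \bar{X}\) is zero.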
\[y = \underbrace{\beta_0}_{\text{Should be zero}} + \underbrace{\beta_1}_{\text{Should be equal to Cor(X,Y)}} \cdot x\]
# Writing the fitted regression line as a function.
q5 <- function(x) {
  beta_0 <- 0
  beta_1 <- 0.4
  return(beta_0 + beta_1 * x)
}
# Expected Quiz 2 score for a Quiz 1 score of 1.5.
q5(1.5)
## [1] 0.6
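To see this with an actual fit, one can build a hypothetical pair of score vectors that are exactly normalized and have a sample correlation of exactly 0.4, then fit lm() and predict at 1.5. The construction below is an assumption for illustration only: it combines a standardized vector with a second standardized vector made orthogonal to it (residuals from a regression on the first), weighted as \(0.4\) and \(\sqrt{1 - 0.4^2}\).

```r
set.seed(2024)
n <- 200

# Hypothetical standardized Quiz 1 scores: mean 0, variance 1
quiz1 <- as.numeric(scale(rnorm(n)))

# A standardized vector uncorrelated with quiz1
# (regression residuals are orthogonal to the regressor and the intercept)
noise <- rnorm(n)
z <- as.numeric(scale(resid(lm(noise ~ quiz1))))

# Quiz 2 scores: mean 0, variance 1, correlation 0.4 with quiz1
quiz2 <- 0.4 * quiz1 + sqrt(1 - 0.4^2) * z

fit <- lm(quiz2 ~ quiz1)
coef(fit)  # intercept essentially 0, slope 0.4
pred <- unname(predict(fit, newdata = data.frame(quiz1 = 1.5)))
pred
```

The prediction is 0.6 up to floating-point rounding: the student is expected to score closer to the mean on Quiz 2 than on Quiz 1 (regression to the mean).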
Question 6
Consider the data given by the following:
x <- c(8.58, 10.46, 9.01, 9.64, 8.86)
What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?
Answer
# Normalizing the x vector.
x_norm <- (x - mean(x))/sd(x)
# Printing the first element
x_norm[1]
## [1] -0.9718658
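Base R's `scale()` performs the same normalization: with its default arguments it subtracts the mean and divides by the standard deviation (with the \(n - 1\) denominator). A short sketch that also confirms the result has mean 0 and standard deviation 1:

```r
x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

# scale() returns a one-column matrix; as.numeric() turns it back into a vector
x_scaled <- as.numeric(scale(x))

x_scaled[1]
c(mean = mean(x_scaled), sd = sd(x_scaled))
```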
Question 7
Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
Answer
# Creating a data frame.
df_q7 <- data.frame(x,y)
# Fitting a model with intercept
fit_q7 <- lm(data = df_q7, formula = y ~ x)
# Printing the coefficients
summary(fit_q7)$coeff
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.567461 1.252107 1.2518582 0.2459827
## x -1.712846 2.105259 -0.8136034 0.4394144
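The least squares line always passes through the point \((\bar{x}, \bar{y})\), which is why the intercept equals \(\bar{y} - \beta_1 \bar{x}\). A small sketch checking both facts on this data:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

fit <- lm(y ~ x)
b <- unname(coef(fit))

# Intercept from the means and the slope
b0_formula <- mean(y) - b[2] * mean(x)

# Fitted value at mean(x) should equal mean(y)
at_mean <- unname(predict(fit, newdata = data.frame(x = mean(x))))

c(intercept_lm = b[1], intercept_formula = b0_formula,
  fitted_at_mean_x = at_mean, mean_y = mean(y))
```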
Question 8
You know that both the predictor and response have mean 0. What can be said about the intercept when you fit a linear regression?
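Answer
The intercept must be zero. The least squares intercept is \(\beta_0 = \bar{Y} - \beta_1 \bar{X}\), and both means are zero, so \(\beta_0 = 0\). As a check, the data from Question 7 can be centered and refit: the estimated intercept is on the order of 1e-16, which is zero up to floating-point rounding.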
# Using the same values of Question 7
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
# Centering the data to have mean zero.
x_c <- x - mean(x)
y_c <- y - mean(y)
# Creating a data frame
df_q8 <- data.frame(x_c, y_c)
# Fitting a model
fit_q8 <- lm(data = df_q8, formula = y_c ~ x_c)
# Printing the coefficients
summary(fit_q8)$coeff
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.002843e-16 0.3355297 2.988834e-16 1.0000000
## x_c -1.712846e+00 2.1052592 -8.136034e-01 0.4394144
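Because the printed intercept is a tiny floating-point number rather than an exact 0, a check should compare against a tolerance instead of using `==`. This sketch centers inside the formula with `I()` and also confirms that centering leaves the slope unchanged:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Fit on the raw data and on centered data (centering done inside the formula)
b_raw <- unname(coef(lm(y ~ x)))
b_c <- unname(coef(lm(I(y - mean(y)) ~ I(x - mean(x)))))

# Intercept is zero up to rounding; slope is the same in both fits
c(intercept_centered = b_c[1], slope_raw = b_raw[2], slope_centered = b_c[2])
```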
Question 9
Consider the data given by
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
What value minimizes the sum of the squared distances between these points and itself?
Answer
This is Question 1 with every weight equal to 1, so the value that minimizes the sum of squared distances is the ordinary (unweighted) mean of x.
# Calculating the average.
mean(x)
## [1] 0.573
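In regression terms, this minimizer is the estimate from an intercept-only model: fitting `x ~ 1` minimizes \(\sum{(x_i - \mu)^2}\) over the single coefficient \(\mu\). A short sketch:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)

# Intercept-only model: its only coefficient is the least squares center of x
mu_lm <- unname(coef(lm(x ~ 1)))

c(lm_intercept = mu_lm, mean = mean(x))
```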
Question 10
Let the slope from fitting Y as the outcome and X as the predictor be denoted \(\beta_1\). Let the slope from fitting X as the outcome and Y as the predictor be denoted \(\gamma_1\). Suppose that you divide \(\beta_1\) by \(\gamma_1\); in other words, consider \(\frac{\beta_1}{\gamma_1}\). What is this ratio always equal to?
Answer
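Given \(\beta_1\) (regression of \(Y\) on \(X\)):
\[\beta_1 = Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}\]
Given \(\gamma_1\) (regression of \(X\) on \(Y\), so the roles of the standard deviations are swapped):
\[\gamma_1 = Cor(Y,X) \cdot \frac{sd(X)}{sd(Y)}\]
Calculating \(\frac{\beta_1}{\gamma_1}\) (assuming \(Cor(Y,X) \neq 0\), otherwise both slopes are zero and the ratio is undefined):
\[\frac{\beta_1}{\gamma_1} = \frac{Cor(Y,X) \cdot \frac{sd(Y)}{sd(X)}}{Cor(Y,X) \cdot \frac{sd(X)}{sd(Y)}} = \Big( \frac{sd(Y)}{sd(X)} \Big)^2 = \frac{Var(Y)}{Var(X)}\]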
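A numerical check of \(\frac{\beta_1}{\gamma_1} = \frac{Var(Y)}{Var(X)}\), reusing the x and y data from Question 7 (whose correlation is nonzero, so the ratio is defined):

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

beta1 <- unname(coef(lm(y ~ x))[2])   # slope of Y on X
gamma1 <- unname(coef(lm(x ~ y))[2])  # slope of X on Y

c(ratio = beta1 / gamma1, var_ratio = var(y) / var(x))
```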