# Consider the data set given by
x<-c(0.22, -2.54, 0.52, 0.75, 25)
# and weights given by
w<-c(2, 1, 3, 1, 2)
# Give the value of mu that minimizes the least squares equation
mu <- sum(w * x) / sum(w)
print(mu)
## [1] 5.578889
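This is the weighted mean: setting the derivative of the least squares criterion to zero,
\[\frac{d}{d\mu} \sum_i w_i (x_i - \mu)^2 = -2 \sum_i w_i (x_i - \mu) = 0 \quad \Rightarrow \quad \hat \mu = \frac{\sum_i w_i x_i}{\sum_i w_i}\]
As a check, base R's weighted.mean gives the same value:
weighted.mean(x, w)
## [1] 5.578889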
x1<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y1<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)
# Fit the regression through the origin and get the slope treating y as the outcome and x as the regressor. (Hint: do not center the data, since we want regression through the origin, not through the means of the data.)
lm(y1 ~ x1 - 1)
##
## Call:
## lm(formula = y1 ~ x1 - 1)
##
## Coefficients:
## x1
## 1.151
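As a check, the slope for regression through the origin is \(\sum_i x_i y_i / \sum_i x_i^2\), which can be computed directly:
sum(x1 * y1) / sum(x1^2) # matches the lm slope of about 1.151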
data(mtcars)
# Question 3: fit a linear model with mpg as the outcome and drat (rear axle ratio) as the predictor
three <- lm(mtcars$mpg ~ mtcars$drat)
summary(three)
##
## Call:
## lm(formula = mtcars$mpg ~ mtcars$drat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0775 -2.6803 -0.2095 2.2976 9.0225
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.525 5.477 -1.374 0.18
## mtcars$drat 7.678 1.507 5.096 1.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.485 on 30 degrees of freedom
## Multiple R-squared: 0.464, Adjusted R-squared: 0.4461
## F-statistic: 25.97 on 1 and 30 DF, p-value: 1.776e-05
cor(mtcars$drat,mtcars$mpg)
## [1] 0.6811719
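The slope can also be recovered from the correlation and the standard deviations via \(\hat \beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}\); as a check against the lm output:
cor(mtcars$drat, mtcars$mpg) * sd(mtcars$mpg) / sd(mtcars$drat) # about 7.678, matching the drat coefficient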
As the summary shows, the slope for drat is positive (7.678) and the correlation is about 0.68, so the rear axle ratio and miles per gallon have a positive linear relationship.
Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one third that of the outcome. The correlation between the two variables is 0.7. What value would the slope coefficient be for the regression model with Y as the outcome and X as the predictor?
\[\hat \beta = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}\]
where \(\hat \beta\) is the slope and \(Cor(Y,X)\) is the correlation between \(Y\) and \(X\).
Given that \(Cor(Y,X) = 0.7\) and \(Sd(X) = \frac{1}{3} Sd(Y)\), we have \[\hat \beta = 0.7 \frac{Sd(Y)}{\frac{1}{3} Sd(Y)} \]
\[\hat \beta = 0.7 (3) \]
\[\hat \beta = 2.1 \]
Therefore, the value of the slope coefficient is 2.1.
You ask a collection of husbands and wives to guess how many jellybeans are in a jar. The correlation is 0.6. The standard deviation for the husbands is 14 beans while the standard deviation for wives is 10 beans. Assume that the data were centered so that 0 is the mean for each. The centered guess for a husband was 40 beans (above the mean). What would be your best estimate of the wife’s guess?
The \(Cor(Y,X) = 0.6\), the standard deviation for the husbands is \(Sd(H) = 14\), while for the wives \(Sd(W) = 10\). Treating the wife's guess as the outcome and the husband's guess as the predictor, first we need to get the slope. \[\hat \beta = 0.6 \left(\frac{10}{14}\right) \]
\[\hat \beta \approx 0.4286 \]
Hence, since the data are centered, the regression equation would be
\[\hat Y = 0.4286 \, X \]
where \(X\) is the husband's centered guess. The centered guess for a husband was 40 beans, hence the wife's guess would be
\[\hat Y = 0.4286(40) \approx 17.14 \]
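As a quick arithmetic check in R:
0.6 * (10 / 14) * 40
## [1] 17.14286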
Consider the data given by the following
x2 <- c(10.45, 9.45, 12.41, 14.46, 15.26)
# What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?
xnorm <- scale(x2, center = TRUE, scale = TRUE)
norm_meas <-xnorm[1]
print(norm_meas)
## [1] -0.7835272
The first measurement of the normalized x would be -0.7835.
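Equivalently, the first measurement can be normalized by hand as \(\frac{x_1 - \bar x}{s}\):
(x2[1] - mean(x2)) / sd(x2) # same value as scale() above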
x3<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y3<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)
# What is the intercept for fitting the model with x3 as the predictor and y3 as the outcome?
lm(y3~x3)
##
## Call:
## lm(formula = y3 ~ x3)
##
## Coefficients:
## (Intercept) x3
## 2.2472 -0.2627
print(coef(lm(y3~x3))[1])
## (Intercept)
## 2.247175
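The intercept can also be obtained from summary statistics as \(\hat \beta_0 = \bar y - \hat \beta_1 \bar x\); a quick check using the same data:
beta1 <- cor(y3, x3) * sd(y3) / sd(x3)
mean(y3) - beta1 * mean(x3) # about 2.2472, matching the lm intercept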
x4 <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
What value minimizes the sum of the squared distances between these points and that value?
print(mean(x4))
## [1] 1.573
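As a numerical sanity check (a sketch using base R's optimize over the range of the data), the minimizer of \(\sum_i (x_i - \mu)^2\) is indeed the mean:
optimize(function(mu) sum((x4 - mu)^2), interval = range(x4))$minimum # about 1.573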
Fit a linear regression model to the mtcars dataset with the variable drat as the predictor and the variable mpg as the outcome. Plot the drat (horizontal axis) versus the residuals (vertical axis).
Looking back at question 3, I have already fit a model (three) with drat as the predictor and mpg as the outcome, so I can use its residuals directly.
residuals(three)
## 1 2 3 4 5 6
## -1.42048871 -1.42048871 0.76342292 5.27566203 2.03818575 4.43269646
## 7 8 9 10 11 12
## -2.82250821 3.59194014 0.22594664 -3.37405336 -4.77405336 0.35244435
## 13 14 15 16 17 18
## 1.25244435 -0.84755565 -4.57260308 -5.11007936 -2.57607286 8.59742943
## 19 20 21 22 23 24
## 0.07093171 9.02247686 0.61515782 1.83269646 -1.46181425 -7.81518916
## 25 26 27 28 29 30
## 3.07566203 3.49742943 -0.48995198 8.97768153 -9.07752314 -0.57058358
## 31 32
## -4.65632497 -2.63291755
plot(mtcars$drat, residuals(three), main = "Residuals vs drat", xlab = "drat", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)
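The residual variance can be estimated directly from the residuals as
\[\hat \sigma^2 = \frac{1}{n - p} \sum_{i=1}^n e_i^2\]
where \(n\) is the number of observations and \(p\) the number of estimated coefficients (here \(p = 2\)); the code below compares this direct estimate with the value reported by lm.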
n <- length(mtcars$mpg)
p <- length(coef(three))
residual_variance_direct <- sum(residuals(three)^2) / (n - p)
cat("Residual Variance (direct estimate):", residual_variance_direct, "\n")
## Residual Variance (direct estimate): 20.11889
cat("Residual Variance (lm output):", summary(three)$sigma^2, "\n")
## Residual Variance (lm output): 20.11889
summary(three)$r.squared
## [1] 0.4639952
# Plot the residuals from the drat model against horsepower
plot(mtcars$hp, residuals(three), main = "Residuals vs Horsepower", xlab = "Horsepower", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)
# The residuals come from the same model, so the variance estimate is unchanged
n1 <- length(mtcars$hp)
p1 <- length(coef(three))
residual_variance_direct <- sum(residuals(three)^2) / (n1 - p1)
cat("Residual Variance (direct estimate):", residual_variance_direct, "\n")
## Residual Variance (direct estimate): 20.11889
cat("Residual Variance (lm output):", summary(three)$sigma^2, "\n")
## Residual Variance (lm output): 20.11889
summary(three)$r.squared
## [1] 0.4639952