\(y_i=\beta_0+\beta_1x_i+\epsilon_i\)
\(\hat{\beta}_1=\frac{\sum x_iy_i-\frac{1}{n}\sum x_i\sum y_i}{\sum x_i^2-\frac{1}{n}(\sum x_i)^2}\)
\(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\)
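A minimal R sketch of the two estimator formulas, applied to mtcars with disp as x and hp as y. The helper name ols_by_hand() is hypothetical. Comparing its output with coef(lm(...)) is only a sanity check that the formulas were transcribed correctly.

```r
# Hypothetical helper: closed-form least-squares estimates for y = b0 + b1 * x
ols_by_hand <- function(x, y) {
  n <- length(x)
  # Slope: (sum(xy) - sum(x) * sum(y) / n) / (sum(x^2) - (sum(x))^2 / n)
  b1 <- (sum(x * y) - sum(x) * sum(y) / n) /
        (sum(x^2) - sum(x)^2 / n)
  # Intercept: ybar - b1 * xbar
  b0 <- mean(y) - b1 * mean(x)
  c(intercept = b0, slope = b1)
}

by_hand <- ols_by_hand(mtcars$disp, mtcars$hp)
by_lm   <- coef(lm(hp ~ disp, data = mtcars))

by_hand
by_lm
# The two sets of estimates should agree up to floating-point error
all.equal(unname(by_hand), unname(by_lm))
```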
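In R, this model is written as the formula y ~ x, where y is the outcome (response) and x is the predictor, and it is fitted with lm().
The summary() function reports the additional information usually wanted from a regression: the coefficient estimates with their standard errors, t-statistics and p-values, plus R-squared and the overall F-test.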
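attach(mtcars)        # lets hp, disp, etc. be referenced without the mtcars$ prefix
f1 <- lm(hp ~ disp)   # regress horsepower on engine displacement
summary(f1)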
##
## Call:
## lm(formula = hp ~ disp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.623 -28.378 -6.558 13.588 157.562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.7345 16.1289 2.836 0.00811 **
## disp 0.4375 0.0618 7.080 7.14e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.65 on 30 degrees of freedom
## Multiple R-squared: 0.6256, Adjusted R-squared: 0.6131
## F-statistic: 50.13 on 1 and 30 DF, p-value: 7.143e-08
names(f1)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
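Each component named above can be read with $, and base R also offers extractor functions such as coef(), fitted() and residuals(). A short sketch, refitting f1 with data = mtcars so it runs without attach():

```r
f1 <- lm(hp ~ disp, data = mtcars)

# Components returned by lm(), accessed with $
f1$coefficients          # intercept and slope
head(f1$fitted.values)   # predicted hp for the first six cars
f1$df.residual           # n - 2 = 30 for this model

# Equivalent extractor functions, generally preferred in code
coef(f1)
head(fitted(f1))
head(residuals(f1))

# Fitted values plus residuals reproduce the observed hp
all.equal(unname(fitted(f1) + residuals(f1)), mtcars$hp)
```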
Intercept: When disp is zero, the predicted horsepower is approximately 45.73 hp. A displacement of zero is not physically meaningful and lies far outside the data (the smallest displacement in mtcars is 71.1 cu.in.), so the intercept is an extrapolation that positions the fitted line rather than a prediction for a real car.
Slope: Each additional cubic inch of engine displacement is associated with an increase of approximately 0.4375 in predicted horsepower (hp). This indicates a positive relationship between the two quantitative variables.
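P-value: Both coefficients have p-values below 0.01 (0.00811 for the intercept, 7.14e-08 for disp), so both are statistically significant at the 1% level. The test of main interest is usually the one on the slope, which asks whether hp is linearly related to disp at all.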
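The coefficient interpretations can be checked with predict(): predictions at two displacements one cubic inch apart differ by exactly the slope. The displacements 200 and 201 cu.in. are arbitrary illustration values, not cars in the data.

```r
f1 <- lm(hp ~ disp, data = mtcars)

# Two hypothetical engines, 200 and 201 cubic inches
new_cars <- data.frame(disp = c(200, 201))
pred <- predict(f1, newdata = new_cars)
pred

# The difference between the two predictions equals the slope
diff(pred)
coef(f1)[["disp"]]

# The intercept is the prediction at disp = 0, an extrapolation:
# the smallest observed displacement is well above zero
min(mtcars$disp)
```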
Goodness of Fit:
R-squared: Approximately 62.56% of the variability in horsepower (hp) is explained by engine displacement (disp). R-squared ranges from 0 to 1, and higher values mean the model explains more of the variation in the outcome.
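Residual standard error: Observed horsepower typically deviates from the fitted line by about 42.65 hp, estimated on 30 degrees of freedom (32 cars minus 2 estimated coefficients). Smaller values mean predictions closer to the actual values, but whether 42.65 hp counts as small has to be judged against the scale of hp itself.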
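Both goodness-of-fit measures can be recomputed from the residuals: R-squared as 1 minus the ratio of the residual to the total sum of squares, and the residual standard error as the square root of the residual sum of squares divided by its 30 degrees of freedom. A sketch that checks each against summary(f1), refitting with data = mtcars so it runs without attach():

```r
f1 <- lm(hp ~ disp, data = mtcars)

rss <- sum(residuals(f1)^2)                   # residual sum of squares
tss <- sum((mtcars$hp - mean(mtcars$hp))^2)   # total sum of squares

# R-squared: share of the total variation in hp explained by the model
r2 <- 1 - rss / tss
r2
summary(f1)$r.squared            # reported as "Multiple R-squared"
cor(mtcars$disp, mtcars$hp)^2    # equals R-squared in simple regression

# Residual standard error: sqrt(RSS / residual df), with df = 32 - 2 = 30
rse <- sqrt(rss / df.residual(f1))
rse
summary(f1)$sigma                # reported as "Residual standard error"
sd(mtcars$hp)                    # spread of hp ignoring disp, for scale
```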
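F-statistic: The F-test assesses the overall significance of the model by testing whether at least one predictor has a non-zero coefficient. Here F = 50.13 on 1 and 30 degrees of freedom (p = 7.143e-08), so the model fits significantly better than one with no predictors (intercept only). With a single predictor, this test is equivalent to the t-test on the slope: 7.080 squared is about 50.13, and the two p-values agree.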
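The overall F-statistic can be recomputed from R-squared, and with one predictor it matches the squared t value of the slope. A minimal sketch, assuming n = 32 cars and p = 1 predictor as in f1:

```r
f1 <- lm(hp ~ disp, data = mtcars)
s  <- summary(f1)

n  <- nrow(mtcars)   # 32 cars
p  <- 1              # one predictor (disp)
r2 <- s$r.squared

# Overall F from R-squared: (R2 / p) / ((1 - R2) / (n - p - 1))
f_manual <- (r2 / p) / ((1 - r2) / (n - p - 1))
f_manual
s$fstatistic[["value"]]

# With a single predictor, F equals the squared t value of the slope
s$coefficients["disp", "t value"]^2

# Upper-tail p-value from the F distribution with (p, n - p - 1) df
pf(f_manual, p, n - p - 1, lower.tail = FALSE)
```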
Final Interpretation: As engine displacement increases, horsepower tends to increase.
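Adding fuel economy (mpg) and rear axle ratio (drat) as predictors gives a multiple regression model:
f2 <- lm(hp ~ mpg + disp + drat)
summary(f2)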
##
## Call:
## lm(formula = hp ~ mpg + disp + drat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.74 -19.46 -11.69 17.51 139.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.3580 94.9085 0.362 0.72006
## mpg -5.2383 2.2384 -2.340 0.02663 *
## disp 0.3401 0.1132 3.005 0.00555 **
## drat 38.6732 19.0215 2.033 0.05162 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.96 on 28 degrees of freedom
## Multiple R-squared: 0.7084, Adjusted R-squared: 0.6771
## F-statistic: 22.67 on 3 and 28 DF, p-value: 1.189e-07
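Adjusted R-squared (0.6131 for f1, 0.6771 for f2) penalizes R-squared for the number of predictors p, which makes it better suited than plain R-squared for comparing models of different sizes. A sketch of the usual formula, using a hypothetical helper adj_r2() and checking it against summary():

```r
f1 <- lm(hp ~ disp, data = mtcars)
f2 <- lm(hp ~ mpg + disp + drat, data = mtcars)

# Hypothetical helper: adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- length(residuals(fit))
  p  <- length(coef(fit)) - 1   # number of predictors, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

c(f1 = adj_r2(f1), f2 = adj_r2(f2))
c(f1 = summary(f1)$adj.r.squared, f2 = summary(f2)$adj.r.squared)
```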
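Since f1 is nested within f2, anova() compares the two models with a partial F-test of whether mpg and drat together improve the fit over disp alone:
anova(f1, f2)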
## Analysis of Variance Table
##
## Model 1: hp ~ disp
## Model 2: hp ~ mpg + disp + drat
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 54560
## 2 28 42495 2 12065 3.9748 0.03023 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
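The partial F-test printed by anova() can be reproduced from the residual sums of squares of the two models. A sketch, refitting both models with data = mtcars so it runs standalone. The resulting F of about 3.97 with p of about 0.030 suggests that, at the 5% level, mpg and drat jointly improve on the displacement-only model.

```r
f1 <- lm(hp ~ disp, data = mtcars)               # reduced model
f2 <- lm(hp ~ mpg + disp + drat, data = mtcars)  # full model

rss1 <- sum(residuals(f1)^2)   # RSS of the reduced model (30 residual df)
rss2 <- sum(residuals(f2)^2)   # RSS of the full model (28 residual df)
df1  <- df.residual(f1)
df2  <- df.residual(f2)

# Partial F: drop in RSS per added coefficient, relative to the full model's error variance
f_partial <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_partial <- pf(f_partial, df1 - df2, df2, lower.tail = FALSE)

c(F = f_partial, p = p_partial)
```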
# Scatter plot with regression line
plot(mtcars$disp, mtcars$hp,
main = "Horsepower vs. Displacement",
xlab = "Displacement (cu.in.)",
ylab = "Gross Horsepower")
abline(f1, col = "red", lwd = 2)
x <- c(-2, -1, 0, 1, 2)
y <- c(0, 0, 1, 1, 3)
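Assuming these vectors are meant for applying the estimator formulas above by hand, the arithmetic works out as follows with n = 5:
\(\sum x_i=0,\quad \sum y_i=5,\quad \sum x_iy_i=7,\quad \sum x_i^2=10\)
\(\hat{\beta}_1=\frac{7-\frac{1}{5}(0)(5)}{10-\frac{1}{5}(0)^2}=\frac{7}{10}=0.7\)
\(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=1-0.7(0)=1\)
so the fitted line is \(\hat{y}=1+0.7x\).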