Linear Modeling: used for explaining or modeling the
relationship between a variable Y and one or more variables X1, X2, …,
Xp
* Y: dependent variable, response, outcome, output variable
* X1, X2, …, Xp: independent, predictor, input, explanatory variables
The response variable Y must be continuous
The explanatory variables X1, X2, …, Xp can be continuous, discrete or
categorical
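For a response y and a single predictor x, the regression line can be written in standardized form:
\(\frac{y - \bar y}{SD_y} = r \frac{x - \bar x}{SD_x}\)
r: the sample correlation between x and y
In simpler terms: \(y = \beta_0 + \beta_1 x\), where \(\beta_1 = r \frac{SD_y}{SD_x}\) and \(\beta_0 = \bar y - \beta_1 \bar x\)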
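The standardized form can be turned into an intercept and slope directly from summary statistics. The following is a minimal sketch, assuming the `cats` data from the MASS package (the same data used in the rest of these notes); the variable names `x`, `y`, `slope` and `intercept` are only illustrative.

```r
library(MASS)  # provides the cats data set

x <- cats$Bwt  # body weight (kg)
y <- cats$Hwt  # heart weight (g)

r <- cor(x, y)                          # sample correlation between x and y
slope <- r * sd(y) / sd(x)              # slope = r * SDy / SDx
intercept <- mean(y) - slope * mean(x)  # line passes through (xbar, ybar)

c(intercept = intercept, slope = slope)
```

Solving the standardized equation for y gives exactly these two quantities, so no model-fitting function is needed for the single-predictor case.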
library(MASS)
help(cats)
summary(cats)
##  Sex         Bwt             Hwt
##  F:47   Min.   :2.000   Min.   : 6.30
##  M:97   1st Qu.:2.300   1st Qu.: 8.95
##         Median :2.700   Median :10.10
##         Mean   :2.724   Mean   :10.63
##         3rd Qu.:3.025   3rd Qu.:12.12
##         Max.   :3.900   Max.   :20.50
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Scatter plot of heart weight against body weight, colored by sex
ggplot(cats, aes(x = Bwt, y = Hwt, color = Sex)) +
  geom_point()
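Least Squares Estimation: find \((\beta_0, \beta_1)\) that minimize the
residual sum of squares (RSS)
\(RSS(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\)
Setting the partial derivatives with respect to \(\beta_0\) and \(\beta_1\) to zero gives
\(\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0\) (1)
\(\sum_{i=1}^n x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0\) (2)
Rearrange the equations
\(\sum_{i=1}^n y_i = n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^n x_i\) (1)
\(\sum_{i=1}^n x_i y_i = \hat\beta_0 \sum_{i=1}^n x_i + \hat\beta_1 \sum_{i=1}^n x_i^2\) (2)
From (1), we have \(\bar y =
\hat\beta_0+\hat\beta_1\bar x\)
Plug (1) back into (2)
\(\sum_{i=1}^n x_i (y_i - \bar y) = \hat\beta_1 \sum_{i=1}^n x_i (x_i - \bar x)\)
The LS estimates of \((\beta_0, \beta_1)\) can be expressed as
\(\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}\), \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\)
The final equation of y as a function of x is given by:
\(\hat y = \hat\beta_0 + \hat\beta_1 x\)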
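The least squares estimates can be computed by hand from the standard closed-form expressions (slope = Sxy / Sxx, intercept = ybar minus slope times xbar). The sketch below assumes the `cats` data from MASS; the helper `rss()` and the names `b0_hat` and `b1_hat` are hypothetical and not part of the original notes.

```r
library(MASS)  # provides the cats data set

x <- cats$Bwt
y <- cats$Hwt

# Residual sum of squares for a candidate pair (b0, b1)
rss <- function(b0, b1) sum((y - b0 - b1 * x)^2)

# Closed-form least squares estimates
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

c(b0_hat = b0_hat, b1_hat = b1_hat)

# Moving either estimate away from the solution increases the RSS
rss(b0_hat, b1_hat) < rss(b0_hat + 0.1, b1_hat)
rss(b0_hat, b1_hat) < rss(b0_hat, b1_hat - 0.1)
```

The two comparisons at the end are a quick sanity check that the closed-form pair really sits at the minimum of the RSS surface.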
out <- lm(Hwt ~ Bwt, data = cats)
summary(out)
##
## Call:
## lm(formula = Hwt ~ Bwt, data = cats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5694 -0.9634 -0.0921  1.0426  5.1238
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.3567     0.6923  -0.515    0.607
## Bwt           4.0341     0.2503  16.119   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 142 degrees of freedom
## Multiple R-squared: 0.6466, Adjusted R-squared: 0.6441
## F-statistic: 259.8 on 1 and 142 DF, p-value: < 2.2e-16
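Two of the summary lines can be reproduced from the residuals, which may help in reading the output. This is a sketch only, assuming the same `cats` model; the names `fit`, `res`, `rss` and `tss` are illustrative.

```r
library(MASS)  # provides the cats data set

fit <- lm(Hwt ~ Bwt, data = cats)
res <- residuals(fit)

rss <- sum(res^2)                           # residual sum of squares
tss <- sum((cats$Hwt - mean(cats$Hwt))^2)   # total sum of squares

sqrt(rss / df.residual(fit))  # residual standard error (142 degrees of freedom)
1 - rss / tss                 # multiple R-squared
```

The residual standard error divides by the residual degrees of freedom (n minus the two estimated coefficients), not by n.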
names(out)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
coef(out)
## (Intercept)         Bwt
##  -0.3566624   4.0340627
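The coefficients define the fitted line, so they can be used to predict heart weight for a given body weight. The sketch below assumes a hypothetical cat weighing 3 kg (not an observation discussed in the notes) and compares a hand calculation with `predict()`.

```r
library(MASS)  # provides the cats data set

fit <- lm(Hwt ~ Bwt, data = cats)
new_cat <- data.frame(Bwt = 3)  # hypothetical cat with body weight 3 kg

# Fitted line evaluated by hand: intercept + slope * Bwt
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["Bwt"]] * new_cat$Bwt

# Same prediction through the model object
by_predict <- predict(fit, newdata = new_cat)

c(by_hand = by_hand, by_predict = unname(by_predict))
```

Both values come out at roughly 11.75 g, that is, -0.3567 + 4.0341 × 3.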
# In simple linear regression, the squared correlation equals the Multiple R-squared
cor(cats$Hwt, cats$Bwt)^2
## [1] 0.6466209