Using the linear model function `lm()` in R with the “Galton data” dataset
galton = read.csv("D:\\phan mem\\Phan mem R software\\Dataset thuc hanh\\Galton data.csv")
head(galton)
##   id parent child
## 1  1   70.5  61.7
## 2  2   68.5  61.7
## 3  3   65.5  61.7
## 4  4   64.5  61.7
## 5  5   64.0  61.7
## 6  6   67.5  62.2
m=lm(child~parent,data = galton)
summary(m)
##
## Call:
## lm(formula = child ~ parent, data = galton)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.8050 -1.3661  0.0487  1.6339  5.9264
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.94153    2.81088   8.517   <2e-16 ***
## parent       0.64629    0.04114  15.711   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.239 on 926 degrees of freedom
## Multiple R-squared: 0.2105, Adjusted R-squared: 0.2096
## F-statistic: 246.8 on 1 and 926 DF, p-value: < 2.2e-16
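The fitted line can be read straight off the coefficient table above. As a minimal sketch, the following uses the rounded coefficients printed in the output rather than the Galton file itself; the helper name `predict_child` is hypothetical and only serves the illustration.

```r
# Coefficients copied from the summary(m) output above (rounded as printed)
b0 <- 23.94153   # intercept
b1 <- 0.64629    # slope for parent height

# Hypothetical helper: predicted child height for a given parent height
predict_child <- function(parent) b0 + b1 * parent

# Predictions for three parent heights (same unit as the data)
predict_child(c(64, 68, 72))

# The slope is the expected difference in child height
# for a one-unit difference in parent height
predict_child(69) - predict_child(68)
```

With the model object available, `predict(m, newdata = data.frame(parent = c(64, 68, 72)))` should give the same fitted values without copying coefficients by hand, and `confint(m)` returns confidence intervals for the intercept and slope.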
Checking the model assumptions
Assumption 1: X has no random error (in practice this assumption is usually ignored, because some measurement error is normal in real-world measurement)
Assumption 3: The random errors are normally distributed
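One common way to check the normality assumption is to look at the residuals of the fitted model. The sketch below uses simulated data (not the Galton file), so the variable names `x`, `y` and `fit` are assumptions made for illustration; the same calls would apply to `residuals(m)`.

```r
set.seed(1)                                  # simulated data, for illustration only
x <- rnorm(200, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(200, sd = 2.2)    # normally distributed errors by construction
fit <- lm(y ~ x)

res <- residuals(fit)

# Normal Q-Q plot: points close to the reference line suggest
# approximately normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
sw$p.value
```

With large samples the Shapiro-Wilk test can flag very small departures from normality, so the Q-Q plot is usually read alongside it rather than replaced by it.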
Use the “obesity” dataset to create a standardized variable
ob=read.csv("D:\\phan mem\\Phan mem R software\\Dataset thuc hanh\\Obesity data.csv")
head(ob)
##   id gender height weight  bmi age  bmc  bmd   fat  lean pcfat
## 1  1      F    150     49 21.8  53 1312 0.88 17802 28600  37.3
## 2  2      M    165     52 19.1  65 1309 0.84  8381 40229  16.8
## 3  3      F    157     57 23.1  64 1230 0.84 19221 36057  34.0
## 4  4      F    156     53 21.8  56 1171 0.80 17472 33094  33.8
## 5  5      M    160     51 19.9  54 1681 0.98  7336 40621  14.8
## 6  6      F    153     47 20.1  52 1358 0.91 14904 30068  32.2
Standardize bmi into a new variable “zbmi” by subtracting mean(bmi) from each bmi value and dividing the result by sd(bmi). There are two ways to do this; the first is:
ob$zbmi=(ob$bmi-mean(ob$bmi))/sd(ob$bmi)
mean(ob$zbmi)
## [1] -5.220391e-16
sd(ob$zbmi)
## [1] 1
The second way is:
ob$zbmi=scale(ob$bmi)
mean(ob$zbmi)
## [1] -5.220391e-16
sd(ob$zbmi)
## [1] 1
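The two approaches give the same numbers but not the same kind of object: `scale()` returns a one-column matrix carrying the centering and scaling values as attributes. The sketch below illustrates this on the six bmi values shown in `head(ob)` above; the names `z1`, `z2` and `z2_vec` are introduced here for illustration.

```r
# First six bmi values, copied from the head(ob) output above
bmi <- c(21.8, 19.1, 23.1, 21.8, 19.9, 20.1)

z1 <- (bmi - mean(bmi)) / sd(bmi)   # manual standardization: plain numeric vector
z2 <- scale(bmi)                    # scale(): one-column matrix with attributes

is.matrix(z2)                       # TRUE
attr(z2, "scaled:center")           # the mean that was subtracted
attr(z2, "scaled:scale")            # the standard deviation used for division

z2_vec <- as.numeric(z2)            # drop the matrix structure
all.equal(z1, z2_vec)               # TRUE
```

Storing `as.numeric(scale(ob$bmi))` instead of the raw `scale()` result keeps the new column an ordinary numeric vector, which avoids surprises in functions that treat matrix columns differently.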
Linear regression rmarkdown
library(visreg)
library(ggfortify)
## Loading required package: ggplot2
ob=read.csv("D:\\phan mem\\Phan mem R software\\Dataset thuc hanh\\Obesity data.csv")
women = subset(ob, gender=="F")
head(women)
##    id gender height weight  bmi age  bmc  bmd   fat  lean pcfat
## 1   1      F    150     49 21.8  53 1312 0.88 17802 28600  37.3
## 3   3      F    157     57 23.1  64 1230 0.84 19221 36057  34.0
## 4   4      F    156     53 21.8  56 1171 0.80 17472 33094  33.8
## 6   6      F    153     47 20.1  52 1358 0.91 14904 30068  32.2
## 7   7      F    155     58 24.1  66 1546 0.96 20233 35599  35.3
## 10 10      F    158     60 24.0  58 1404 0.86 21365 35534  36.6
Fit the linear regression model
m1=lm(pcfat~bmi, data=women)
summary(m1)
##
## Call:
## lm(formula = pcfat ~ bmi, data = women)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17.4308  -2.3335   0.1359   2.5871  15.1984
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.61490    0.94288   9.137   <2e-16 ***
## bmi          1.17079    0.04197  27.895   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.761 on 860 degrees of freedom
## Multiple R-squared: 0.475, Adjusted R-squared: 0.4744
## F-statistic: 778.1 on 1 and 860 DF, p-value: < 2.2e-16
visreg(m1)

autoplot(m1)
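If ggfortify is not installed, base R can draw the standard diagnostic panels for an `lm` fit. This is a minimal sketch on simulated data (not the obesity file); the variable names and the simulated coefficients are assumptions chosen only to resemble the fit above.

```r
set.seed(2)                                   # simulated data, for illustration only
bmi   <- runif(150, min = 15, max = 35)
pcfat <- 8.6 + 1.17 * bmi + rnorm(150, sd = 3.8)
fit   <- lm(pcfat ~ bmi)

# Four default diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)                                       # restore the previous layout
```

A curved pattern in the residuals-vs-fitted panel would suggest that a straight line is not enough, which is one motivation for trying polynomial terms.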

Fit the quadratic (parabolic) and cubic models
ggplot(data=women, aes(x=bmi, y=pcfat))+geom_point()+geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

m2=lm(pcfat~bmi+I(bmi^2), data=women)
m3=lm(pcfat~bmi+I(bmi^2)+I(bmi^3), data=women)
summary(m2)
##
## Call:
## lm(formula = pcfat ~ bmi + I(bmi^2), data = women)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17.4126  -2.3894   0.0644   2.5644  14.9304
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.821101   4.297335  -4.380 1.33e-05 ***
## bmi           3.574746   0.370065   9.660  < 2e-16 ***
## I(bmi^2)     -0.051653   0.007903  -6.536 1.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.673 on 859 degrees of freedom
## Multiple R-squared: 0.4999, Adjusted R-squared: 0.4987
## F-statistic: 429.3 on 2 and 859 DF, p-value: < 2.2e-16
summary(m3)
##
## Call:
## lm(formula = pcfat ~ bmi + I(bmi^2) + I(bmi^3), data = women)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17.5100  -2.4021   0.0373   2.6260  14.8127
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -42.817565  18.013403  -2.377  0.01767 *
## bmi           6.614148   2.246437   2.944  0.00332 **
## I(bmi^2)     -0.177044   0.091753  -1.930  0.05399 .
## I(bmi^3)      0.001683   0.001227   1.372  0.17051
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.671 on 858 degrees of freedom
## Multiple R-squared: 0.501, Adjusted R-squared: 0.4992
## F-statistic: 287.1 on 3 and 858 DF, p-value: < 2.2e-16
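The `I(bmi^2)` and `I(bmi^3)` terms can also be written with `poly()`. The sketch below, on simulated data (an assumption; it does not read the obesity file), shows that `poly(..., raw = TRUE)` reproduces the `I()` coefficients, while the default orthogonal polynomials change the coefficients but leave the fitted values unchanged.

```r
set.seed(3)                                   # simulated data, for illustration only
bmi   <- runif(200, min = 15, max = 35)
pcfat <- -19 + 3.6 * bmi - 0.05 * bmi^2 + rnorm(200, sd = 3.7)

fit_I    <- lm(pcfat ~ bmi + I(bmi^2))             # same form as m2
fit_poly <- lm(pcfat ~ poly(bmi, 2, raw = TRUE))   # raw polynomial basis

# Same coefficients, only the labels differ
all.equal(unname(coef(fit_I)), unname(coef(fit_poly)))

# Orthogonal polynomials (the default): different coefficients, same fit
fit_orth <- lm(pcfat ~ poly(bmi, 2))
all.equal(fitted(fit_I), fitted(fit_orth))
```

Powers of the same variable are strongly correlated, which is one plausible reason the standard errors grow so much between the quadratic and cubic summaries above; an orthogonal basis avoids that correlation at the cost of less directly interpretable coefficients.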
anova(m1,m2,m3)
## Analysis of Variance Table
##
## Model 1: pcfat ~ bmi
## Model 2: pcfat ~ bmi + I(bmi^2)
## Model 3: pcfat ~ bmi + I(bmi^2) + I(bmi^3)
##   Res.Df   RSS Df Sum of Sq       F    Pr(>F)
## 1    860 12163
## 2    859 11587  1    576.28 42.7662 1.058e-10 ***
## 3    858 11562  1     25.35  1.8816    0.1705
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
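The F statistics in this table can be reproduced by hand, which makes clear what is being compared. The sketch below copies the rounded values printed above (so results agree only to about two decimals) and assumes nothing beyond them; the variable names are introduced here for illustration.

```r
# Values copied from the anova(m1, m2, m3) table above (rounded as printed)
rss    <- c(m1 = 12163, m2 = 11587, m3 = 11562)   # residual sums of squares
res_df <- c(m1 = 860,   m2 = 859,   m3 = 858)     # residual degrees of freedom
ss     <- c(576.28, 25.35)                        # "Sum of Sq" column

# anova() scales every comparison by the residual mean square
# of the largest model in the sequence (here m3)
mse_full <- rss[["m3"]] / res_df[["m3"]]

# Each step adds one parameter, so Df = 1 in the numerator
f_stat <- (ss / 1) / mse_full
p_val  <- pf(f_stat, df1 = 1, df2 = res_df[["m3"]], lower.tail = FALSE)

round(f_stat, 2)
signif(p_val, 3)
```

Read this way, adding the quadratic term reduces the residual sum of squares far more than chance would explain, while the further reduction from the cubic term is not statistically significant at the 0.05 level, which favors the quadratic model `m2` among the three.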