基于简单线性回归模型研究父亲身高与成年儿子身高之间的关系,采用UsingR程序包中的father.son数据集,其中包括1078对父子身高,数据如下:
library(UsingR)
## Warning: package 'UsingR' was built under R version 4.0.3
## Loading required package: MASS
## Loading required package: HistData
## Warning: package 'HistData' was built under R version 4.0.3
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 4.0.3
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'UsingR'
## The following object is masked from 'package:survival':
##
## cancer
data(father.son)
head(father.son)
## fheight sheight
## 1 65.04851 59.77827
## 2 63.25094 63.21404
## 3 64.95532 63.34242
## 4 65.75250 62.79238
## 5 61.13723 64.28113
## 6 63.02254 64.24221
利用lm函数拟合线性回归模型,父亲身高是自变量,儿子身高是因变量。
lm.sol<-lm(sheight~fheight,data=father.son)
lm.sol
##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
##
## Coefficients:
## (Intercept) fheight
## 33.8866 0.5141
summary(lm.sol)
##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8772 -1.5144 -0.0079 1.6285 8.9685
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.88660 1.83235 18.49 <2e-16 ***
## fheight 0.51409 0.02705 19.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared: 0.2513, Adjusted R-squared: 0.2506
## F-statistic: 361.2 on 1 and 1076 DF, p-value: < 2.2e-16
利用ggplot2程序包作图,把1078对身高绘制散点图,穿过点的蓝色线是回归线,周围灰色区域表示模型拟合中的不确定性。
reg.p<-ggplot(father.son,aes(x=fheight,y=sheight))+
geom_point()+geom_smooth(mehtod="lm")+xlim(58,76)+ylim(58,76)+
labs(x="父亲身高 Father's height (inch)",y="儿子身高 Son's height (inch)")+
annotate("text", x=60.5, y=75, parse=TRUE,color="blue",
label=expression(paste(italic(y),"=0.5141",italic(x),"+33.8866")))+
annotate("text", x=60, y=74, parse=TRUE,color="blue",
label=expression(paste(italic(R^2),"=0.2506")))+
annotate("text", x=60, y=73, parse=TRUE,color="blue",
label=expression(paste(italic(P),"<0.001")))
## Warning: Ignoring unknown parameters: mehtod
reg.p
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量
## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量
## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量
##ggsave("reg.png", reg.p, width=6, height=6, path="D://R total/biostatistics/")