基于简单线性回归模型研究父亲身高与成年儿子身高之间的关系,采用UsingR程序包中的father.son数据集,其中包括1078对父子身高,数据如下:

library(UsingR)
## Warning: package 'UsingR' was built under R version 4.0.3
## Loading required package: MASS
## Loading required package: HistData
## Warning: package 'HistData' was built under R version 4.0.3
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 4.0.3
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## 
## Attaching package: 'UsingR'
## The following object is masked from 'package:survival':
## 
##     cancer
data(father.son)
head(father.son)
##    fheight  sheight
## 1 65.04851 59.77827
## 2 63.25094 63.21404
## 3 64.95532 63.34242
## 4 65.75250 62.79238
## 5 61.13723 64.28113
## 6 63.02254 64.24221

利用lm函数拟合线性回归模型,父亲身高是自变量,儿子身高是因变量。

lm.sol<-lm(sheight~fheight,data=father.son)
lm.sol
## 
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
## 
## Coefficients:
## (Intercept)      fheight  
##     33.8866       0.5141
summary(lm.sol)
## 
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8772 -1.5144 -0.0079  1.6285  8.9685 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.88660    1.83235   18.49   <2e-16 ***
## fheight      0.51409    0.02705   19.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared:  0.2513, Adjusted R-squared:  0.2506 
## F-statistic: 361.2 on 1 and 1076 DF,  p-value: < 2.2e-16

利用ggplot2程序包作图,把1078对身高绘制散点图,穿过点的蓝色线是回归线,周围灰色区域表示模型拟合中的不确定性。

reg.p<-ggplot(father.son,aes(x=fheight,y=sheight))+
  geom_point()+geom_smooth(mehtod="lm")+xlim(58,76)+ylim(58,76)+
  labs(x="父亲身高 Father's height (inch)",y="儿子身高 Son's height (inch)")+
  annotate("text", x=60.5, y=75, parse=TRUE,color="blue",
           label=expression(paste(italic(y),"=0.5141",italic(x),"+33.8866")))+
  annotate("text", x=60, y=74, parse=TRUE,color="blue",
           label=expression(paste(italic(R^2),"=0.2506")))+
  annotate("text", x=60, y=73, parse=TRUE,color="blue",
           label=expression(paste(italic(P),"<0.001")))
## Warning: Ignoring unknown parameters: mehtod
reg.p
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量

## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量

## Warning in is.na(x): is.na()不适用于类别为'expression'的非串列或非矢量

##ggsave("reg.png", reg.p, width=6, height=6, path="D://R total/biostatistics/")