Reading the data into R

ob = read.csv("~/Dropbox/_Conferences and Workshops/SiS Lectures 1-2025/Data/obesity data.csv")

head(ob)
##   id gender height weight  bmi age  bmc  bmd   fat  lean pcfat
## 1  1      F    150     49 21.8  53 1312 0.88 17802 28600  37.3
## 2  2      M    165     52 19.1  65 1309 0.84  8381 40229  16.8
## 3  3      F    157     57 23.1  64 1230 0.84 19221 36057  34.0
## 4  4      F    156     53 21.8  56 1171 0.80 17472 33094  33.8
## 5  5      M    160     51 19.9  54 1681 0.98  7336 40621  14.8
## 6  6      F    153     47 20.1  52 1358 0.91 14904 30068  32.2

Fitting a linear regression model

summary(lm(pcfat ~ bmi, data=ob))
## 
## Call:
## lm(formula = pcfat ~ bmi, data = ob)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.612  -4.181   1.392   4.690  18.241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.39889    1.36777   6.141 1.11e-09 ***
## bmi          1.03619    0.06051  17.123  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.45 on 1215 degrees of freedom
## Multiple R-squared:  0.1944, Adjusted R-squared:  0.1937 
## F-statistic: 293.2 on 1 and 1215 DF,  p-value: < 2.2e-16
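
The fitted line reads pcfat = 8.40 + 1.04 × bmi: each additional BMI unit is associated with about 1.04 percentage points more body fat, and BMI alone explains roughly 19% of the variance in pcfat. As a minimal sketch, the snippet below copies the coefficients from the output above (rather than refitting the model) to compute predictions at a few illustrative BMI values and an approximate 95% confidence interval for the slope. The names `b0`, `b1`, `se_b1`, `df_resid` and `bmi_new` are introduced here for illustration only.

```r
# Values copied from the summary() output above (not re-estimated here)
b0       <- 8.39889   # intercept
b1       <- 1.03619   # slope for bmi
se_b1    <- 0.06051   # standard error of the slope
df_resid <- 1215      # residual degrees of freedom

# Predicted percent body fat at a few illustrative BMI values
bmi_new <- c(20, 25, 30)
pred <- b0 + b1 * bmi_new
print(round(pred, 1))

# Approximate 95% confidence interval for the slope (t distribution)
ci_slope <- b1 + c(-1, 1) * qt(0.975, df_resid) * se_b1
print(round(ci_slope, 2))
```

With the full data set loaded, `predict()` and `confint()` applied to the fitted `lm` object should give the same quantities directly.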

Summary

The mean percent body fat (pcfat) is approximately 31.6%.
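
The code that produced this mean is not shown; on the full data it was presumably something like `mean(ob$pcfat)`, which is an assumption. The sketch below illustrates the same calculation using only the six `pcfat` values printed by `head(ob)`, so its result differs from the full-sample mean. The names `pcfat_head` and `mean_head` are hypothetical.

```r
# pcfat values of the first six rows, copied from the head(ob) output
pcfat_head <- c(37.3, 16.8, 34.0, 33.8, 14.8, 32.2)

# Mean over these six rows only (not the full sample)
mean_head <- mean(pcfat_head)
print(round(mean_head, 2))
```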

Plotting the relationship between BMI and pcfat

library(ggplot2)
p = ggplot(data=ob, aes(x=bmi, y=pcfat, col=gender))
p + geom_point() + geom_smooth(method="lm")
## `geom_smooth()` using formula = 'y ~ x'
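
Because `col=gender` is set in `aes()`, `geom_smooth(method="lm")` fits a separate regression line for each gender instead of one pooled line. The sketch below mimics that per-group fit with `split()` and `lm()`. It uses only the six rows shown by `head(ob)`, so the coefficients are purely illustrative and say nothing about the full sample. The names `ob6` and `fits` are hypothetical.

```r
# First six rows of the data, copied from the head(ob) output
ob6 <- data.frame(
  gender = c("F", "M", "F", "F", "M", "F"),
  bmi    = c(21.8, 19.1, 23.1, 21.8, 19.9, 20.1),
  pcfat  = c(37.3, 16.8, 34.0, 33.8, 14.8, 32.2)
)

# One simple linear regression per gender, as the coloured smooths do
fits <- lapply(split(ob6, ob6$gender),
               function(d) coef(lm(pcfat ~ bmi, data = d)))
print(fits)
```

With only two male rows the male line passes exactly through both points, which is why six rows are far too few for inference. On the full data, the same idea can be expressed in a single model with an interaction term, for example `lm(pcfat ~ bmi * gender, data = ob)`.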